Correspondence analysis [Gre84] is a geometric-based method for
displaying the rows and columns of a matrix (or a two-way contingency
table) as points in dual low-dimensional vector spaces. In contingency
tables [Gif90] or term-by-document matrices such as the
matrix A
defined in Equation (1), the cell or matrix element
contains the frequency with which row category (keyword) i co-occurs
with column category (document) j.
Define
as the
vector of all 1's. Then,
and
define vectors of row and column sums, respectively. It then
follows that
is the sum of all the
nonnegative
matrix elements
. Let
and
define diagonal matrices
composed of the elements of vectors r and c, respectively. As
originally described by Benzécri in [Ben73], the goal
of Correspondence Analysis is to find another matrix representation
(say matrix X) of the rows of A such that the Euclidean
distances between rows in X approximate certain
profile distances between the rows of A.
The term profile as used in [Gre84]
refers to a set of relative frequencies
found in the representation of a data matrix as a long, flat
table having frequencies in each row expressed as percentages
of their respective row sums. Simultaneously,
another matrix representation (say matrix Y) of the columns
of A is desired whose Euclidean distances among columns approximate
certain profile distances between columns of A.
The squared distance or
distance
between rows i and j
of the matrix A is defined as
where
and
denote the i-th and k-th elements of
the column vectors r and c, respectively. It can easily be shown
that
is the same squared Euclidean distance
between rows i and j for the matrix
whose
elements are defined by

The matrix B can also be written as
so that
for any Euclidean representation X of B (see [Gif90]).
One derivation of the Euclidean representation X is given by the
SVD (see Equation (3)) of
defined by
where
, and
).
By defining
it follows that

Using Equation (7), it follows that

and hence the Euclidean distances among the rows of X are equal to the Euclidean distances among the rows of B.
The matrix X by construction will have one column of constant elements
(or equal to
with appropriate scaling). This can be easily shown
by first noting that the matrix
has a singular triplet
corresponding to the largest singular value
, i.e.,
. This triplet can be derived
from the following equalities:


Consequently, the scaled right singular vector (corresponding to
) given by
has unit length so that
is a column
of the matrix X. This constant column of X does not contribute
to the distance between any two rows of X, and can be removed
by computing the SVD

which deflates or prevents the trivial singular vectors
(
,
) from occurring. Hence, this correction
yields a Euclidean representation of

rather than that of the matrix B.
Whereas the rows of the matrix X (see Equation (8)) have the
same profile distances of the rows of matrix A, the columns of the
matrix
have the same profile distances
of the columns of A. Note that
. As discussed
in [Gif90], this suggests that the row elements of the matrix X are,
in fact, the center of gravity or centroid of the elements
of the column elements of A, weighted by their frequency in the
row profile. Benzécri
[Ben73] referred to this as
le principle barycentrique.
If the matrix representation
Y for the matrix A had initially been sought,
distances
between columns of A (as opposed to the rows of A) would be used to
produce column elements of the matrix Y which are at the centroid of the
row elements of A. Using the same SVD from Equation (7),
the derived matrix Y and corresponding matrix X are given by

An alternative formulation for the matrix X determined by correspondence analysis is discussed in [Gre84]. Here the generalized singular value decomposition [GL89]
is used to minimize
where
and
are the i-th rows of the matrices A
and X, respectively, and
.
The k-largest generalized singular triplets
from Equation (9) provide the optimal matrix X in
Equation (10) of rank k [Mir60].
For the right generalized singular vectors
from
Equation (9), the first k column vectors
are referred to as the k
principal axes of the rows of A. The total variation or
inertia of the matrix A or how well A is represented along
the k principal axes is given by
where
is the i-th largest generalized singular
value (diagonal element of
) from Equation (9).
The unexplained variation
when approximating A via
, where the subscript k denotes the first k columns
of each factor of the generalized singular value decomposition in
Equation (9), is given by

Since the total inertia is decomposed along the principal axes
[Gre84], the i-th principal axis accounts for an amount
of
of the total inertia in Equation (11).
When used as permutation vectors for the rows and columns of an
matrix A, the k principal axes of the rows and
columns of A (i.e.,
and
, respectively) as determined
by Equations (7) or (9) can produce a
reordering of the original matrix A having more banded (or block diagonal)
form [Gre84]. Typically, the elements of the second largest left
and right generalized singular vectors (
) are
sorted in ascending order to produce the required row and column permutations.
Table 7 illustrates the reordering using the second
largest generalized singular triplets from Equation (9)
when A is the
matrix
defined in Table 2.
term-by-document
matrix of the technical memoranda titles using Correspondence
Analysis .