In the LSI model, terms and documents are represented by an
incidence matrix A. Each of the m unique terms in the document
collection is assigned a row in the matrix, while each of the n
documents in the collection is assigned
a column in the matrix. A non-zero element $a_{ij}$ of A
indicates not only that term i occurs in document j, but also the number of times the term appears in that document. Since the number of distinct terms in a given document is typically far smaller than the number of terms in the entire document collection, A is usually very sparse [5].
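
As an illustration, the construction of A can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name build_term_document_matrix, the whitespace tokenizer, and the three sample documents are all assumptions made for the example. Only the non-zero elements are stored, keyed by (i, j), which is one simple way to exploit the sparsity of A.

```python
from collections import Counter


def build_term_document_matrix(docs):
    """Return (terms, entries); entries[(i, j)] is the count of term i in document j."""
    # Naive tokenization (an assumption of this sketch): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Assign a row index i to each of the m unique terms, in sorted order.
    terms = sorted({tok for toks in tokenized for tok in toks})
    row = {term: i for i, term in enumerate(terms)}
    # Keep only the non-zero elements of A, keyed by (row i, column j).
    entries = {}
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            entries[(row[term], j)] = count
    return terms, entries


# Hypothetical sample collection with n = 3 documents.
docs = [
    "human machine interface",
    "user interface system system",
    "graph minors survey",
]
terms, entries = build_term_document_matrix(docs)
m, n = len(terms), len(docs)
# Fraction of the m * n cells that are non-zero.
density = len(entries) / (m * n)
print(m, n, len(entries), density)
```

Even in this tiny collection most cells of A are zero, and the fraction of non-zero cells generally shrinks further as the vocabulary grows. For larger collections a dedicated sparse matrix format would normally replace the plain dictionary used here.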