Latent Semantic Indexing:
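Latent Semantic Indexing (LSI), also known as Latent Semantic Analysis (LSA), is essentially Principal Component Analysis (PCA) applied to document analysis. It computes a singular value decomposition of a term-document matrix X, which is equivalent to an eigendecomposition of the variance-covariance matrix of X when X is mean-centered (in practice LSA usually skips the centering step so that X stays sparse). The principal directions (eigenvectors) then define topics.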
It uses a term-document matrix X that describes the occurrences of terms in documents. Rows correspond to terms (the vocabulary) and columns correspond to documents. Elements of X are typically weights proportional to the number of times a term appears in a document, with rare terms upweighted to reflect their relative importance (tf-idf is a common choice). The matrix X is usually large and sparse.
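As a rough illustration of how such a matrix could be built, here is a minimal sketch using NumPy only. The toy corpus, the function name build_term_document_matrix, and the particular weighting (raw count times log(N/df) + 1) are assumptions made for this example; real systems use various tf-idf variants and sparse storage.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "car engine repair",
    "automobile engine service",
    "banana fruit smoothie",
    "fruit banana salad",
]

def build_term_document_matrix(docs):
    """Return (vocab, X) where X[i, j] is the weight of term i in document j."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    # Raw counts: rows are terms, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            counts[index[t], j] += 1

    # Inverse document frequency: terms found in fewer documents get larger weights.
    # This specific formula is one assumed variant among many.
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(len(docs) / df) + 1.0
    return vocab, counts * idf[:, None]

vocab, X = build_term_document_matrix(docs)
print(X.shape)  # (number of terms, number of documents)
```

In this toy corpus "car" appears in one document and "engine" in two, so "car" receives the larger weight in the first column. Most entries are zero even here, which is why X is stored sparsely at realistic vocabulary sizes.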
LSA finds a low-rank approximation of the original term-document matrix, which merges the dimensions of terms that have similar meanings.
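The low-rank approximation can be sketched with a truncated SVD. The tiny count matrix, the choice k = 2, and the helper names truncated_svd and cosine below are assumptions for illustration, not part of any particular library.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "banana", "fruit"]
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # banana
    [0, 0, 1, 1],  # fruit
], dtype=float)

def truncated_svd(X, k):
    """Keep only the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

k = 2
U_k, s_k, Vt_k = truncated_svd(X, k)

# Rank-k approximation of the original matrix.
X_k = U_k @ np.diag(s_k) @ Vt_k

# Each row is a term expressed in the k-dimensional topic space.
term_vectors = U_k * s_k

print(cosine(X[0], X[1]))                        # "car" vs "automobile", raw rows
print(cosine(term_vectors[0], term_vectors[1]))  # same pair in the topic space
print(np.round(X_k, 2))
```

In the raw matrix "car" and "automobile" never share a document, so their rows are orthogonal. After truncation both terms load on the same topic (through the shared term "engine"), and X_k assigns "car" a nonzero weight in the document that only contains "automobile". The signs of the singular vectors are arbitrary, so individual columns of U_k may come out negative.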
What is it used for:
LSA can be applied to compare documents in the low-dimensional space (document classification), find relations between terms (synonym identification), find matching documents by translating a query of terms into the low-dimensional space (information retrieval), and so on.
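The information retrieval use case can be sketched as follows: project both the query and the documents onto the top-k left singular vectors, then rank documents by cosine similarity. The toy matrix, the function names lsa_basis and rank_documents, and the choice to project with U_k^T (rather than some other scaling of the topic axes) are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "banana", "fruit"]
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # banana
    [0, 0, 1, 1],  # fruit
], dtype=float)

def lsa_basis(X, k):
    """Top-k left singular vectors: the term-space directions that define topics."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_documents(query_terms, terms, X, k):
    """Score every document against a bag-of-words query in the topic space."""
    U_k = lsa_basis(X, k)

    # Build the query as a term-count vector over the known vocabulary.
    q = np.zeros(len(terms))
    for t in query_terms:
        if t in terms:
            q[terms.index(t)] += 1

    q_k = U_k.T @ q     # query in the k-dimensional space
    docs_k = U_k.T @ X  # every document in the same space, one column each
    return [cosine(q_k, docs_k[:, j]) for j in range(X.shape[1])]

scores = rank_documents(["automobile"], terms, X, k=2)
print(np.round(scores, 3))
```

Here the query "automobile" scores highly against the first document even though that document only contains "car" and "engine", while the two fruit documents score near zero. Comparing two columns of docs_k in the same way gives a document-to-document similarity.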
Limitations include:
The resulting dimensions can be difficult to interpret
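LSA cannot capture multiple meanings of a word (polysemy), because each term maps to a single point in the low-dimensional space
Word order is ignored: each document is represented as an unordered bag of terms
Eigenvectors can have negative components, which makes them hard to read as term weights for a topic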
Reference:
https://en.wikipedia.org/wiki/Latent_semantic_analysis