Latent Semantic Analysis

Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other.
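To make the idea concrete, here is a minimal sketch of LSA on a toy corpus. It assumes NumPy is available. The five tiny "documents", the choice of k = 2, and the helper names (`raw_similarity`, `lsa_similarity`) are all made up for illustration. Real LSA runs on a large corpus and usually weights the counts before the decomposition.

```python
import numpy as np

# Hypothetical toy corpus: "doctor" and "physician" never appear in the
# same document, but they share the context words "hospital" and "patient".
docs = [
    "doctor hospital patient",
    "physician hospital patient",
    "apple fruit tree",
    "pear fruit tree",
    "plum fruit tree",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: one row per word, one column per document.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Truncated SVD: keep only the k largest singular values (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
word_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per word


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def raw_similarity(a, b):
    """Similarity of two words based on raw co-occurrence counts."""
    return cosine(A[index[a]], A[index[b]])


def lsa_similarity(a, b):
    """Similarity of two words in the reduced (latent) space."""
    return cosine(word_vecs[index[a]], word_vecs[index[b]])
```

On this corpus, `raw_similarity("doctor", "physician")` is 0 because the two words never share a document, while `lsa_similarity("doctor", "physician")` comes out close to 1 because they occur in the same contexts. `lsa_similarity("doctor", "apple")` stays close to 0.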

How Does Latent Semantic Indexing in Search Engines Work?

Latent semantic indexing allows a search engine to determine what a page is about beyond literal matches against the search query text. For example, a page about Apple computers will naturally tend to include terms such as iMac or iPod.

What effect does Latent Semantic Indexing have on Google ranking?

Initially, search engines looked solely at the presence and frequency of keywords on a webpage to determine relevance. Such an approach can produce poor results. For example, it might fail to connect synonyms, such as “car” and “automobile,” or fail to distinguish between the senses of polysemous words (words which have multiple meanings), such as “apple” the fruit and “Apple” the computer maker. Latent Semantic Indexing (LSI) is an approach to understanding keywords in the context of the other words on the entire webpage.
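The difference between plain keyword matching and LSI can be sketched in a few lines. This is only an illustration under stated assumptions: the five "pages", the value k = 2, the 0.5 similarity threshold, and the function names `keyword_search` and `lsi_search` are all hypothetical, and nothing here reflects how any real search engine is implemented. It assumes NumPy is available.

```python
import numpy as np

# Hypothetical pages: "car" and "automobile" never occur on the same page.
pages = [
    "car engine repair",
    "automobile engine repair",
    "apple pie recipe",
    "cherry pie recipe",
    "peach pie recipe",
]

vocab = sorted({w for p in pages for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-page count matrix: one row per word, one column per page.
A = np.zeros((len(vocab), len(pages)))
for j, p in enumerate(pages):
    for w in p.split():
        A[index[w], j] += 1


def keyword_search(query):
    """Indices of pages that literally contain every query word."""
    words = query.split()
    return [j for j, p in enumerate(pages) if all(w in p.split() for w in words)]


# Truncated SVD with k latent dimensions (k = 2 is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
page_vecs = Vt[:k].T  # one k-dimensional vector per page


def lsi_search(query, threshold=0.5):
    """Indices of pages whose latent vector is close to the query's."""
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1
    # Fold the query into the latent space: q^T U_k S_k^{-1}.
    q_vec = (q @ U[:, :k]) / s[:k]
    if not q_vec.any():
        return []  # no known words in the query
    sims = page_vecs @ q_vec / (
        np.linalg.norm(page_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return [j for j, sim in enumerate(sims) if sim > threshold]
```

With these toy pages, `keyword_search("automobile")` returns only page 1, while `lsi_search("automobile")` returns pages 0 and 1, because "car" and "automobile" share the context words "engine" and "repair". Neither search returns the pie recipe pages for that query.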

The adequacy of LSA’s reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word–word and passage–word lexical priming data; and, as reported in three articles that accompanied the original paper, it accurately estimates passage coherence, learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.

Download the free PDF introduction to LSA provided by lsa.colorado.edu:
http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
