Search
Improving algorithmic search
Accurate WSD technology has been shown to improve algorithmic search relevancy by 10% on average, and by over 100% on hard-to-resolve queries, where performance counts most.
WSD technology improves relevancy without any changes to a search engine’s proprietary ranking algorithms. In its simplest implementation, WSD enriches the pool of results that the ranking algorithms consider by removing irrelevant results and replacing them with relevant ones. More specifically, the following two mechanisms are at work in a WSD-enabled search engine:
- Word sense matching improves precision
WSD eliminates false matches (also known as “improving precision”) by matching word senses in queries to word senses in documents. Idilia’s WSD technology enables the meanings of words in queries and/or ads to be matched to an index of word senses, so that a query about river banks doesn’t return information about commercial bank branches.
Lest one think that ambiguity in query keywords is the exception, consider that almost all words used in queries are ambiguous (they have multiple senses depending on context). An ordinary word such as “jaguar” can have 8 different meanings, and an even more common word such as “left” can have more than 20 meanings spanning several parts of speech. Even a simple word such as “in” can have multiple meanings (e.g. inch, indium, Indiana). WSD is also applied to distinguish between different meanings of proper nouns. In short, word sense ambiguity is the central problem in improving search relevancy, not a peripheral issue that sometimes crops up.
Matching works in two steps. First, Idilia’s technology is used to build a word-sense index of any document collection (e.g. the web, an enterprise server, a PC hard-drive). In an Idilia “word-sense and semantic relations index” (or simply, a “semantic index”), the word “bank”, for example, is indexed discretely for its various meanings (e.g. “river bank”, “bank of switches”, “bank branch”, “bank an airplane”). Then, users’ queries are disambiguated and the resulting word senses in the query are matched to the semantic index. Matching the keyword senses in a query against a semantic index in this way yields superior precision.
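The two steps described above (build a word-sense index, then match disambiguated query senses against it) can be illustrated with a minimal Python sketch. It is not Idilia’s implementation: the sense IDs, the cue-word table `SENSE_CUES`, and the toy `disambiguate` function are all assumptions made up for illustration, standing in for a real WSD engine.

```python
from collections import defaultdict

# Hypothetical sense inventory: each sense ID maps to context cue words.
SENSE_CUES = {
    "bank": {
        "bank/river": {"river", "water", "fishing"},
        "bank/finance": {"branch", "loan", "account"},
    }
}

def disambiguate(tokens):
    """Toy stand-in for WSD: pick the sense whose cue words overlap the context most."""
    context = set(tokens)
    senses = []
    for tok in tokens:
        cues = SENSE_CUES.get(tok)
        if cues is None:
            senses.append(tok)  # treated as unambiguous in this toy inventory
        else:
            senses.append(max(sorted(cues), key=lambda s: len(cues[s] & context)))
    return senses

def build_sense_index(docs):
    """Index each document under word senses rather than raw words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sense in disambiguate(text.lower().split()):
            index[sense].add(doc_id)
    return index

def sense_search(index, query):
    """Return documents that contain every disambiguated query sense."""
    senses = disambiguate(query.lower().split())
    postings = [index.get(s, set()) for s in senses]
    return set.intersection(*postings) if postings else set()

def keyword_search(docs, query):
    """Plain keyword matching, for comparison."""
    terms = set(query.lower().split())
    return {d for d, text in docs.items() if terms <= set(text.lower().split())}

docs = {
    "d1": "fishing from the river bank",
    "d2": "open an account at the bank branch",
    "d3": "the bank branch near the river gave me a loan",
}
index = build_sense_index(docs)
keyword_hits = keyword_search(docs, "river bank")
sense_hits = sense_search(index, "river bank")
```

In this toy collection, keyword matching for “river bank” also returns `d3`, which mentions a river but is about a bank branch, while sense matching returns only `d1`.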
- Paraphrasing improves recall
Idilia’s WSD engine provides new and better matches (also known as “improving recall”) by generating precise paraphrases of the original query: phrases that express the same concept but use a different set of word senses. Once WSD has automatically selected the meanings of the words in a query, Idilia’s technology can paraphrase the query into new phrases with equivalent meaning, generating hundreds or even thousands of highly accurate paraphrases. A search engine can use these paraphrases to return additional correct matches (documents or ads) that may not contain the original query terms but nonetheless answer the query, so that a query for “text-to-speech software” can return results matching the terms “speech synthesis application”.
It is important to note that paraphrasing is not possible without WSD. In the above example, the word “software” cannot be paraphrased without first selecting its correct meaning. Attempting to paraphrase all possible meanings of a query term leads to massive over-generation of paraphrases, most of which are wrong, and to a resulting overall degradation in the relevancy of results. (Consider attempting to paraphrase the term “Java” without first understanding the sense, which could conceivably yield paraphrases as diverse as “object-oriented programming language”, “Indonesian island” and “coffee”, only one of which would generate relevant results in any given context.) This is why search engines today do not generally attempt paraphrasing.
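A small Python sketch can make the dependence of paraphrasing on WSD concrete. The sense IDs and the `SYNONYMS` and `SENSES_OF` tables below are hypothetical, chosen only to mirror the examples in the text; a real system would draw them from a lexical resource.

```python
from itertools import product

# Hypothetical table: sense ID -> surface phrases that express that sense.
SYNONYMS = {
    "text-to-speech/1": ["text-to-speech", "speech synthesis"],
    "software/1": ["software", "application", "program"],
    "java/language": ["object-oriented programming language"],
    "java/island": ["Indonesian island"],
    "java/coffee": ["coffee"],
}

# Hypothetical table: ambiguous word -> all of its candidate senses.
SENSES_OF = {"java": ["java/language", "java/island", "java/coffee"]}

def paraphrase(sense_ids):
    """Combine the surface forms of each already-selected sense."""
    options = [SYNONYMS[s] for s in sense_ids]
    return [" ".join(parts) for parts in product(*options)]

def paraphrase_without_wsd(word):
    """Expand every sense of a word: over-generates unrelated phrases."""
    return [p for s in SENSES_OF[word] for p in SYNONYMS[s]]

# With senses selected by WSD, every paraphrase keeps the query's meaning.
tts_paraphrases = paraphrase(["text-to-speech/1", "software/1"])

# Without WSD, "java" expands into three unrelated concepts.
java_expansions = paraphrase_without_wsd("java")
```

Here `tts_paraphrases` includes “speech synthesis application”, while `java_expansions` mixes a programming language, an island and a drink, at most one of which fits any given query.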
Question answering
It is important to note that WSD technology requires no fundamental change in the way users search. Users can continue to search using keywords; they will just get better results. However, WSD technology also allows a search engine to process questions intelligently, so that it can return a result containing an answer rather than just a set of keyword matches. This is because WSD allows specific concepts in queries to be identified and matched to specific instances. For example, a search for a physical location (e.g. “Where is Montreal?”) can return only results that are themselves physical locations. Likewise, a search for a generic type of entity (e.g. “Which airlines fly to St. Louis?”) can return results that are entities of that type (e.g. “airlines”), and only for meanings that have the properties of that entity (e.g. “American”, “United”, but only when these words refer to an airline).
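The airline example can be sketched in Python as a filter over disambiguated mentions. The `Mention` record and its `sense_type` labels are assumptions for illustration only; in practice the type would come from the sense that WSD assigns to each occurrence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    surface: str     # the word or phrase as it appears in a document
    sense_type: str  # hypothetical type of its disambiguated sense

def answers_of_type(mentions, wanted_type):
    """Keep only mentions whose disambiguated sense has the wanted type."""
    return [m.surface for m in mentions if m.sense_type == wanted_type]

# Hand-labelled mentions standing in for WSD output on candidate documents.
mentions = [
    Mention("American", "airline"),
    Mention("American", "nationality"),
    Mention("United", "airline"),
    Mention("united", "adjective"),
    Mention("St. Louis", "city"),
]

airlines = answers_of_type(mentions, "airline")
```

Only the occurrences of “American” and “United” that refer to airlines survive the filter; the same words used in other senses are dropped.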
Integration with a search engine
Idilia’s technology can readily be integrated into an existing search engine in a modular fashion, without necessarily requiring modification of a search engine’s (highly sensitive) core proprietary ranking technology. WSD adds a series of steps, first when building the search index and second when matching queries, to yield a far richer collection of results for ranking than an existing keyword-based system. More specifically:
- First, WSD is performed prior to indexing a document collection (e.g. the web, an enterprise server, a PC hard-drive), so that word meanings, rather than simply words, are indexed. In a word-sense index, the word “bank”, for example, is indexed separately for its various meanings (e.g. “river bank”, “bank branch”, “bank an airplane”).
- Second, queries are disambiguated prior to being matched and, where appropriate, paraphrased into equivalent sets of word meanings. At this point, the meanings of the query terms can be matched to the word meanings in a word-sense index, exactly as in keyword matching, but yielding only precise sense matches. Optionally, additional matching results can be obtained by matching paraphrased word meanings, enriching the result set. The search engine’s existing proprietary ranking technology can then be used to rank a more precise, enriched set of results.
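The modular integration described in these two steps might look like the following Python sketch, in which the existing ranker is passed in as an opaque function and is never modified. All names (`wsd_search`, `candidates`, the sense IDs, and the stand-in `disambiguate`, `paraphrase` and `rank` functions) are hypothetical.

```python
def candidates(sense_sets, sense_index):
    """Union, over the query and its paraphrases, of documents that
    contain every sense in a given sense set."""
    found = set()
    for senses in sense_sets:
        postings = [sense_index.get(s, set()) for s in senses]
        if postings:
            found |= set.intersection(*postings)
    return found

def wsd_search(query, disambiguate, paraphrase, sense_index, rank):
    """Disambiguate, optionally paraphrase, match senses, then hand the
    enriched candidate set to the unchanged ranking function."""
    senses = disambiguate(query)
    sense_sets = [senses] + paraphrase(senses)
    return rank(query, candidates(sense_sets, sense_index))

# Toy stand-ins for the WSD components and a pre-built word-sense index.
sense_index = {
    "text-to-speech/1": {"d1"},
    "software/1": {"d1"},
    "speech-synthesis/1": {"d2"},
    "application/1": {"d2"},
    "application/2": {"d3"},  # e.g. a job application: never matched here
}
disambiguate = lambda query: ["text-to-speech/1", "software/1"]
paraphrase = lambda senses: [["speech-synthesis/1", "application/1"]]
rank = lambda query, docs: sorted(docs)  # placeholder for the existing ranker

results = wsd_search("text-to-speech software", disambiguate,
                     paraphrase, sense_index, rank)
```

The design point is that WSD only changes which documents reach `rank`; the ranking function itself is treated as a black box.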
Improved ranking
Although WSD can improve search results relevancy without any changes to a search engine’s proprietary ranking algorithms, it can deliver additional benefit if semantic information is also used to influence ranking. One important way of doing so is to augment the basic collocation algorithm, which ranks a result higher when the query words appear in it side-by-side and in the same sequence as in the query. A more intelligent version considers whether the words in the result have the same semantic relationship to one another as they do in the query. Idilia’s WSD provides this information. Thus, rather than simply considering whether “buy cheap flights” is collocated, the ranking algorithm can look for “cheap” in a modifier relationship with “flight” and “flight” as the direct object (or “theme”) of the verb “buy”, which makes “buy very cheap airline flights” a perfect match.
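As a rough Python illustration of relation-aware ranking, semantic relations can be represented as (head, relation, dependent) triples and a result scored by the share of the query’s relations it contains. The triple format, the relation labels and the `relation_score` function are assumptions for this sketch; the triples are written by hand here, whereas in practice they would be produced by the WSD engine.

```python
def relation_score(query_rels, doc_rels):
    """Fraction of the query's semantic relations that also hold in the document."""
    if not query_rels:
        return 0.0
    return len(query_rels & doc_rels) / len(query_rels)

# Query: "buy cheap flights"
query_rels = {
    ("flight", "modifier", "cheap"),
    ("buy", "theme", "flight"),
}

# Result A: "buy very cheap airline flights"
doc_a = {
    ("flight", "modifier", "cheap"),
    ("flight", "modifier", "airline"),
    ("cheap", "modifier", "very"),
    ("buy", "theme", "flight"),
}

# Result B: "buy a cheap hotel near flights" (same words, different relations)
doc_b = {
    ("hotel", "modifier", "cheap"),
    ("buy", "theme", "hotel"),
}

score_a = relation_score(query_rels, doc_a)
score_b = relation_score(query_rels, doc_b)
```

Result A scores 1.0 even though the query words are not adjacent in it, while Result B scores 0.0 despite containing “buy”, “cheap” and “flights”.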