Using Query Rewriting
In some enterprise search applications, recall can be improved drastically by rewriting each search query into alternate, equivalent forms (called “rewrites”). Idilia’s technology performs this automatically.
- Idilia’s Query Rewrite API improves recall from any keyword index
- Queries are automatically and instantly rewritten into semantically equivalent queries
- Idilia’s software understands the precise meaning of each keyword in the original query and rewrites the query using synonyms, hypernyms, hyponyms, and transformations
- Users don’t need to spend time reformulating queries to find what they’re looking for
Let’s say an employee is searching a corporate intranet for documents that will help with a client pitch, and searches with the keywords “client” and “pitch”.
Unless every relevant document contains both of those keywords, recall is going to be low.
Idilia’s Query Rewrite API determines the exact meaning of the query keywords and rewrites the query into semantically equivalent queries. In this case, it recognizes that “pitch” means to put forward a sales argument, rather than to throw a ball.
The resulting rewrites can be expressed as four separate queries, each submitted to the search engine, or the output can be combined using Boolean logic into a single query: “(client or customer) + (pitch or “sales talk”)”.
| Weight | Rewrite |
|---|---|
| 0.98 | client sales talk |
| 0.98 | customer sales talk |
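The two ways of using the rewrites can be sketched in a few lines of Python. This is a minimal illustration, not Idilia’s implementation: it assumes the alternatives for each keyword are already available as a list of lists, and the function names `expand_queries` and `combine_boolean` are hypothetical.

```python
from itertools import product


def expand_queries(alternatives):
    """Return every combination of per-keyword alternatives as its own query."""
    return [" ".join(combo) for combo in product(*alternatives)]


def combine_boolean(alternatives):
    """Join per-keyword alternatives into a single Boolean query string."""
    groups = []
    for options in alternatives:
        # Quote multi-word phrases so the engine can treat them as one unit.
        terms = [f'"{t}"' if " " in t else t for t in options]
        groups.append("(" + " or ".join(terms) + ")")
    return " + ".join(groups)


# Assumed input: one list of alternatives per keyword in the original query.
alternatives = [["client", "customer"], ["pitch", "sales talk"]]

separate_queries = expand_queries(alternatives)
boolean_query = combine_boolean(alternatives)

print(separate_queries)
print(boolean_query)  # (client or customer) + (pitch or "sales talk")
```

Two alternatives for each of two keywords give four separate queries, while the Boolean form sends the same coverage to the search engine in one request.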
How It Works
- The Query Rewrite API routes a query to Idilia’s Sense Analysis software
- A specially trained recipe determines the precise meaning of each keyword in the query
- Then, the sense-annotated query is routed to Idilia’s Paraphrasing software, where it is rewritten into several semantically equivalent queries
- The number of rewrites is configurable using several parameters including a maximum number, weighting for proximity to the original query, confidence of the sense analysis, and variations on the paraphrasing recipe (selecting whether to rewrite adjectives, verbs, nouns, etc.)
- Finally, the rewritten queries are returned with proximity weighting, and the original query is returned with sense annotation, including confidence scores
- Depending on your search engine, you can process the rewrites individually or combine the unique keyword rewrites using Boolean logic into a single query
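To make the effect of a maximum number and a weight threshold concrete, here is a small client-side sketch. It does not call the Idilia API or use its real parameter names: `Rewrite`, `select_rewrites`, `min_weight`, and `max_rewrites` are hypothetical, and the sample weights are taken from the example table later on this page.

```python
from typing import NamedTuple


class Rewrite(NamedTuple):
    weight: float  # proximity to the original query, 1.00 = identical
    text: str


def select_rewrites(rewrites, min_weight=0.0, max_rewrites=None):
    """Keep rewrites at or above min_weight, best first, capped at max_rewrites."""
    kept = [r for r in rewrites if r.weight >= min_weight]
    # sorted() is stable, so equally weighted rewrites keep their input order.
    kept = sorted(kept, key=lambda r: r.weight, reverse=True)
    return kept if max_rewrites is None else kept[:max_rewrites]


rewrites = [
    Rewrite(1.00, "conducting performance reviews"),
    Rewrite(0.90, "conducting performance appraisals"),
    Rewrite(0.90, "conducting performance evaluations"),
    Rewrite(0.79, "conducting performance reassessments"),
]

selected = select_rewrites(rewrites, min_weight=0.85, max_rewrites=3)
for r in selected:
    print(f"{r.weight:.2f}  {r.text}")
```

A higher threshold keeps the rewrites closest to the original query, trading some recall for precision.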
Four Ways to Deploy Query Rewriting
- Cached Queries – If you maintain a cache of queries and use auto-complete to suggest queries, then the entire cache can be sense-annotated, each query rewritten, and each set of rewrites turned into a Boolean query, all offline
- Real-Time – Individual user queries can be routed to the Idilia API in real-time, or rules relating to the number of results returned by the original query can determine whether to route the query for semantic processing
- User Control – The interface can allow the user to specify precise senses for keywords, helping improve precision when the original query returns too many incorrect results
- Sense-Annotate the Index – The entire index of documents can be sense-annotated by Idilia, allowing queries and rewritten queries to be sense-matched to the index, yielding simultaneous improvements in precision and recall
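The rule-based variant of the Real-Time option can be sketched as follows: the original query runs first, and rewrites are requested only when it returns too few results. This is an illustration under stated assumptions. The `search` and `rewrite` callables, the `min_results` threshold, and the toy index are all hypothetical stand-ins for your search engine and for a call to the Query Rewrite API.

```python
def search_with_fallback(query, search, rewrite, min_results=5):
    """Run the original query; use rewrites only if it returns too few results."""
    results = list(search(query))
    if len(results) >= min_results:
        return results  # enough hits, no semantic processing needed

    seen = set(results)
    for alternative in rewrite(query):
        for doc in search(alternative):
            if doc not in seen:  # avoid listing the same document twice
                seen.add(doc)
                results.append(doc)
    return results


# Toy stand-ins for a keyword index and a rewrite service (assumptions).
toy_index = {
    "client pitch": ["doc1"],
    "customer pitch": ["doc2"],
    "client sales talk": ["doc1", "doc3"],
}


def toy_search(query):
    return toy_index.get(query, [])


def toy_rewrite(query):
    return ["customer pitch", "client sales talk", "customer sales talk"]


hits = search_with_fallback("client pitch", toy_search, toy_rewrite, min_results=2)
print(hits)  # ['doc1', 'doc2', 'doc3']
```

Routing only low-result queries keeps latency unchanged for queries that already perform well.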
Customize Your Rewrites
Some keywords in a query are more important than others. In particular, verbs are often unhelpful in enterprise-search queries.
The Idilia Query Rewrite API allows you to control how queries are rewritten by selecting a part of speech (e.g., a verb), and specifying how that part of speech will be managed by Idilia’s paraphrasing software.
Consider the query made up of the keywords “conducting”, “performance”, and “reviews”.
Let’s see how this query is rewritten in two different customization scenarios:
Default Scenario – Rewrite All Parts of Speech (POS)
| Weight | Rewrite |
|---|---|
| 1.00 | conducting performance reviews |
| 0.90 | conducting performance appraisals |
| 0.90 | conducting performance evaluations |
| 0.90 | conducting employee evaluations |
| 0.90 | conducting employee performance appraisals |
| 0.90 | conducting employee appraisals |
| 0.90 | conducting performance ratings |
| 0.90 | conducting employee reviews |
| 0.79 | conducting doing reviews |
| 0.79 | conducting performance reassessments |
In the example above, the customized scenario yields a better set of rewrites that can be combined using Boolean logic into “(performance or employee) + (reviews or appraisals or evaluations or ratings)”.
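As a rough sketch of how leaving the verb out leads to that Boolean query, the snippet below skips any keyword tagged as a verb and combines the rest. It assumes the alternatives have already been grouped per keyword and tagged with a part of speech; that input shape, the `skip_pos` parameter, and the function name `boolean_query` are hypothetical and do not reflect the actual API.

```python
def boolean_query(slots, skip_pos=("verb",)):
    """Combine per-keyword alternatives, leaving out the skipped parts of speech."""
    groups = []
    for pos, options in slots:
        if pos in skip_pos:
            continue  # e.g. drop "conducting", which adds little to the search
        # Quote multi-word phrases so the engine can treat them as one unit.
        terms = [f'"{t}"' if " " in t else t for t in options]
        groups.append("(" + " or ".join(terms) + ")")
    return " + ".join(groups)


# Assumed input: (part of speech, alternatives) for each keyword in the query.
slots = [
    ("verb", ["conducting"]),
    ("noun", ["performance", "employee"]),
    ("noun", ["reviews", "appraisals", "evaluations", "ratings"]),
]

query = boolean_query(slots)
print(query)
# (performance or employee) + (reviews or appraisals or evaluations or ratings)
```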