Technology

Idilia offers artificial intelligence technology for natural language processing. Idilia’s technology gives software applications access to the precise semantics of naturally expressed language, ultimately allowing them to manipulate and use naturally expressed information in a manner comparable to humans. Idilia’s core word sense disambiguation (WSD) technology disambiguates words, including proper nouns, at close to human performance. This represents an artificial intelligence breakthrough.

What is ambiguity?

The common words “book” and “cut” illustrate the problem of ambiguity. The word “book” can mean:

  • noun: a novel, a set of financial records, a collection of bound objects (e.g. “a book of tickets”), or
  • verb: to make a reservation, to register (e.g. “book into a hotel”), to record an offense, as a police officer does (e.g. “book for speeding”).

Similarly, a word such as “cut” has even more meanings, and can be:

  • noun: a wound, a reduction (“cut in pay”), or
  • verb: wounding someone, chopping something, removing content from an electronic document in order to move it (“cut and paste”), or even
  • adjective: having a severed stem (“cut flowers”), chiseled (“cut glass”).

In total, the word “book” has at least 14 distinct meanings and the word “cut” 73 distinct meanings. Without WSD, a random assignment of meaning in a sentence containing both words would be accurate less than one time in a thousand. Even proper nouns such as “JFK” often have multiple meanings, and require WSD. Idilia’s system currently identifies the correct meaning of a word in context at very high levels of accuracy – accuracy significantly higher than any published research results.
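The "less than one time in a thousand" figure follows from the sense counts quoted above. The short sketch below works through the arithmetic, assuming each word's sense is guessed independently and uniformly at random; the function name is illustrative only.

```python
from fractions import Fraction

# Sense counts quoted in the text ("at least 14" and 73).
sense_counts = {"book": 14, "cut": 73}

def random_guess_accuracy(counts):
    """Probability that independent, uniform random guesses pick
    the correct sense for every word at once."""
    p = Fraction(1)
    for n in counts.values():
        p *= Fraction(1, n)
    return p

p = random_guess_accuracy(sense_counts)
print(p)                        # 1/1022
print(p < Fraction(1, 1000))    # True
```

Because "book" has at least 14 meanings, 1/1022 is an upper bound on the chance of a correct random assignment for the pair.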

Idilia’s WSD technology is also able to specify precise semantic relations between pairs of word senses in unstructured text. This allows software applications to accurately relate the concepts in unstructured text, adding further structure to the information. For example, in the sentence “Book me a flight to New York”, the word “flight” is identified as the “theme” of the verb “book”, while the proper noun “New York” is identified as modifying the noun “flight”.
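One way to picture the relations in the "Book me a flight to New York" example is as a list of labeled links between word senses. The sketch below is only an illustration: the `Relation` type, the "modifier" label, and sense identifiers such as "book/verb:reserve" are assumptions made for this example and do not describe Idilia's actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    head: str       # the governing word sense
    label: str      # the type of semantic relation
    dependent: str  # the word sense attached to the head

# Hypothetical sense identifiers for "Book me a flight to New York".
relations = [
    Relation("book/verb:reserve", "theme", "flight/noun:air-trip"),
    Relation("flight/noun:air-trip", "modifier", "New York/proper:city"),
]

def dependents_of(head, rels):
    """Return (label, dependent) pairs attached to a given head sense."""
    return [(r.label, r.dependent) for r in rels if r.head == head]

print(dependents_of("book/verb:reserve", relations))
```

An application can then walk these links, for example from the verb to its theme and on to the theme's modifier, instead of working with an unstructured string.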

Idilia’s word sense disambiguation technology

Idilia’s technology determines the precise meanings of words in naturally expressed language, a process known as word sense disambiguation (WSD). It is revolutionizing critical software applications that deal predominantly with naturally expressed language.

Word sense disambiguation (WSD) is the process of resolving ambiguity in text, and is a well-known and heavily researched problem in artificial intelligence. Simply put, language is ambiguous because words have multiple meanings. This feature of language deals a knockout blow to any computer system attempting to extract and work with meaning in ordinary human language (also referred to as “natural language” or, when in textual form, as “unstructured text”). Humans resolve word meanings using the information provided by the context, which allows them to understand naturally expressed language seemingly effortlessly.

Accurate WSD technology opens up to computers and software applications the entire range of human tasks that depend on natural language understanding, from rewriting a search query to extracting sentiment from a Tweet. Until now, WSD had proved impossible to perform accurately with software, and no system had been built that performed WSD at levels approaching human accuracy. In fact, despite decades of research, WSD research systems hit a ceiling of 60-70% accuracy, some 25-35 percentage points below human performance. This level of accuracy is not much better than selecting the most frequent word sense in every context, an indication of just how difficult the problem is and how little progress had been made in solving it. Idilia has decisively broken through the research ceiling and developed the world’s first WSD technology able to automatically select word meanings at close to human levels of accuracy. Idilia’s WSD technology is over 85% accurate, which is within 5-10 percentage points of human performance. This means that Idilia has closed over 80% of the gap between humans and machines for understanding language.
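The "most frequent word sense" baseline mentioned above is simple to state in code: for each word, always predict the sense seen most often in sense-tagged training text. The sketch below uses a handful of invented examples purely to show the mechanics; the data, sense labels, and resulting accuracy are made up and say nothing about real systems.

```python
from collections import Counter

# Invented sense-tagged examples: (word, correct sense).
train = [
    ("book", "novel"), ("book", "novel"), ("book", "reserve"),
    ("cut", "wound"), ("cut", "reduction"), ("cut", "reduction"),
]
test_set = [
    ("book", "reserve"), ("book", "novel"),
    ("cut", "reduction"), ("cut", "wound"),
]

def most_frequent_senses(tagged):
    """Map each word to the sense it carries most often in `tagged`."""
    counts = {}
    for word, sense in tagged:
        counts.setdefault(word, Counter())[sense] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def accuracy(predictions, tagged):
    """Fraction of tagged examples whose sense matches the prediction."""
    correct = sum(1 for w, s in tagged if predictions.get(w) == s)
    return correct / len(tagged)

mfs = most_frequent_senses(train)
print(mfs)                      # {'book': 'novel', 'cut': 'reduction'}
print(accuracy(mfs, test_set))  # 0.5
```

The baseline ignores context entirely, which is why a system that barely beats it has made little real progress on disambiguation.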

[Figure: Relative word sense disambiguation accuracy]

Advanced Linguistic Services

Building on its word sense technology, Idilia has developed the following advanced linguistic services.

Query Rewriting

Idilia has developed technology that, given a set of word senses, can paraphrase it into an equivalent set of word senses that collectively mean the same thing but use different words and/or syntax. This technology is applicable to queries, among other possible applications. Once the meanings of the words in a query have been automatically selected by WSD, Idilia’s paraphrasing technology can identify equivalent collections of meanings using a set of algorithms combined with Idilia’s massive Language Graph. Thus, for example, “text to speech software” can be paraphrased into “speech synthesis application”. It is important to note that paraphrasing is not possible without accurate WSD. An ambiguous word such as “charter”, for instance, cannot be paraphrased without first selecting its correct meaning. Attempting to paraphrase all possible meanings of a term leads to massive over-generation of paraphrases, the vast majority of which are wrong.
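The over-generation problem can be illustrated with a toy sketch. It assumes a small, entirely hypothetical synonym table keyed by word sense, and it substitutes word by word, which is a deliberate simplification and not a description of Idilia's algorithms or Language Graph.

```python
from itertools import product

# Toy table: word -> sense -> substitutes. Entirely hypothetical.
SYNONYMS = {
    "book": {
        "reserve": ["book", "reserve"],
        "record_offense": ["book", "charge"],
        "novel": ["book", "volume"],
    },
    "flight": {
        "air_trip": ["flight", "plane trip"],
        "escape": ["flight", "escape"],
        "stairs": ["flight", "staircase"],
    },
}

def paraphrases(words, chosen_senses=None):
    """Generate word-by-word paraphrases.

    With `chosen_senses`, only substitutes for the selected sense are used.
    Without it, substitutes from every sense are pooled (no WSD).
    """
    options = []
    for w in words:
        senses = SYNONYMS[w]
        if chosen_senses is not None:
            options.append(senses[chosen_senses[w]])
        else:
            pooled = []
            for subs in senses.values():
                for s in subs:
                    if s not in pooled:
                        pooled.append(s)
            options.append(pooled)
    return [" ".join(combo) for combo in product(*options)]

words = ["book", "flight"]
with_wsd = paraphrases(words, {"book": "reserve", "flight": "air_trip"})
without_wsd = paraphrases(words)
print(len(with_wsd), len(without_wsd))  # 4 16
```

Even with two words and three senses each, skipping sense selection quadruples the candidates, and most of the extras (such as "charge staircase") are wrong for a travel query.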

This technology is useful in a variety of applications, such as search retargeting, advertising keyword expansion, and small-index search. For more information, see Solutions.

Social Media Filtering

Idilia has a unique solution for filtering social media by identifying content that contains the relevant word senses. For example, Tweets containing the word “subway” can be divided based on the possible senses of the word. Those interested in public transportation do not have to manually sift through Tweets about the popular fast-food chain, and vice versa. For more information, refer to Social Media Monitoring.
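The "subway" example amounts to grouping messages by the sense assigned to one target word. The sketch below shows that grouping step only. The disambiguator is a crude keyword-overlap stand-in with made-up cue words and sense labels, used here just to make the example runnable; it is not how Idilia's WSD works.

```python
from collections import defaultdict

# Hypothetical cue words per sense of "subway".
CUES = {
    "subway/transit": {"train", "station", "commute", "line", "delayed"},
    "subway/restaurant": {"sandwich", "footlong", "lunch", "cookie", "eat"},
}

def guess_sense(text):
    """Stand-in disambiguator: pick the sense whose cue words overlap most."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    scores = {sense: len(tokens & cues) for sense, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "subway/unknown"

def group_by_sense(tweets):
    """Bucket tweets by the sense guessed for the target word."""
    groups = defaultdict(list)
    for t in tweets:
        groups[guess_sense(t)].append(t)
    return dict(groups)

tweets = [
    "The subway was delayed again, missed my train.",
    "Grabbed a footlong at Subway for lunch.",
    "Subway!",
]
groups = group_by_sense(tweets)
for sense, items in groups.items():
    print(sense, items)
```

A monitoring tool would then show each audience only its own bucket, with the "unknown" bucket holding messages that carry too little context to decide.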

Automated Knowledge Extraction

WSD requires a semantic knowledge base containing an inventory of word meanings and the semantic relations between them, sufficient to cover the vast majority of word senses, including proper noun senses, found in general domain texts. Proper nouns are a special case, as there are literally millions of them; for example, names of people, companies, products, and places. Idilia’s semantic analysis technology incorporates a massive general-purpose semantic knowledge base, called the Language Graph, consisting of millions of proper and common word meanings, as well as tens of millions of precise semantic relations of different types connecting them. New proper noun senses are created all the time. Idilia has therefore researched and developed new knowledge acquisition technology for automatically mining new terms and anchoring them in the general-purpose knowledge base. More specifically, Idilia’s knowledge extraction technology includes:

  • Technology for automatically extracting new terminology and semantic relations from unstructured and semi-structured sources;
  • Methods for precisely linking new terminology into the general-purpose knowledge base, building sub-ontologies whenever required. This means that the knowledge base can be constantly updated as new terminology is invented.

Idilia’s Language Graph is regularly updated from continuous mining of several data sources using this technology.
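The idea of anchoring a newly mined term in an existing knowledge base can be sketched as adding a typed edge from the new sense to a sense the graph already contains. The `SenseGraph` class, the relation names "is_a" and "instance_of", and the sense identifiers below are all assumptions made for illustration; the actual structure of the Language Graph is not described in this text.

```python
class SenseGraph:
    """Minimal graph of word senses joined by typed relations."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (source, relation, target) triples

    def add_relation(self, source, relation, target):
        self.nodes.update((source, target))
        self.edges.add((source, relation, target))

    def anchor(self, new_sense, relation, existing):
        """Link a new sense to a sense that must already be in the graph."""
        if existing not in self.nodes:
            raise KeyError(f"unknown anchor sense: {existing}")
        self.add_relation(new_sense, relation, existing)

    def related(self, sense, relation):
        """Targets reachable from `sense` over one edge of type `relation`."""
        return sorted(t for s, r, t in self.edges if s == sense and r == relation)

graph = SenseGraph()
graph.add_relation("airport/noun", "is_a", "facility/noun")

# A newly mined proper noun sense, anchored to an existing common noun sense.
graph.anchor("JFK/proper:airport", "instance_of", "airport/noun")
print(graph.related("JFK/proper:airport", "instance_of"))  # ['airport/noun']
```

Refusing to anchor against an unknown sense keeps new terms from floating free of the general-purpose inventory, which is the point of linking them in.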