Word Sense Disambiguation

What is ambiguity?

The common words "book" and "cut" illustrate the problem of ambiguity. The word "book" can mean:

  • noun: a novel, a set of financial records, a collection of bound objects (e.g. "a book of tickets"), or
  • verb: making a reservation, registering (e.g. "book into a hotel"), or recording an offense, as a police officer does (e.g. "book for speeding").

Similarly, a word such as "cut" has even more meanings, and can be:

  • noun: a wound, a reduction ("cut in pay"), or
  • verb: wounding someone, chopping something, removing a passage from an electronic document ("cut and paste"), or even
  • adjective: having a severed stem ("cut flowers"), chiseled ("cut glass").

In total, the word "book" has at least 14 distinct meanings and the word "cut" 73 distinct meanings. Without WSD, a random assignment of meaning in a sentence containing both words would have to pick the right pair out of at least 14 × 73 = 1,022 possible combinations, and so would be accurate less than one time in a thousand. Even proper nouns such as "JFK" often have multiple meanings and require WSD. Idilia’s system currently identifies the correct meaning of a word in context at very high levels of accuracy – accuracy significantly higher than any published research results.
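The "one time in a thousand" figure follows from multiplying the sense counts. The short Python sketch below reproduces the arithmetic; it assumes each word's sense is picked uniformly at random and independently of the other word, and the function name is purely illustrative.

```python
from fractions import Fraction
from math import prod

# Sense counts quoted in the text, treated as exact for illustration.
sense_counts = {"book": 14, "cut": 73}

def random_guess_accuracy(counts):
    """Chance that one uniformly random, independent pick per word
    gets every word's sense right."""
    return Fraction(1, prod(counts.values()))

p = random_guess_accuracy(sense_counts)
print(p)                      # 1/1022
print(p < Fraction(1, 1000))  # True
```

Each additional ambiguous word multiplies the number of combinations again, so the chance of a fully correct random reading shrinks quickly as sentences get longer.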

Idilia’s WSD technology is also able to specify precise semantic relations between pairs of word senses in unstructured text. This allows software applications to accurately relate the concepts in unstructured text, adding further structure to the information. For example, in the sentence "Book me a flight to New York", the word "flight" is identified as the "theme" of the verb "book", while the proper noun "New York" is identified as modifying the noun "flight".
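One way an application might hold such relations is as a list of labeled links between words. The Python sketch below is an assumption for illustration only: the `SenseRelation` class, its field names, and the "modifier" label are hypothetical and do not describe Idilia's actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SenseRelation:
    """Hypothetical record linking two words in a sentence."""
    head: str       # word the relation attaches to
    dependent: str  # word that fills the role or modifies the head
    label: str      # relation name, e.g. "theme" or "modifier"

# Relations for "Book me a flight to New York", as described in the text.
relations = [
    SenseRelation(head="book", dependent="flight", label="theme"),
    SenseRelation(head="flight", dependent="New York", label="modifier"),
]

def dependents_of(head, rels):
    """Return (label, dependent) pairs attached to the given head word."""
    return [(r.label, r.dependent) for r in rels if r.head == head]

print(dependents_of("book", relations))    # [('theme', 'flight')]
print(dependents_of("flight", relations))  # [('modifier', 'New York')]
```

With the relations in this form, an application can follow the chain from the verb to its theme and on to the destination without re-parsing the sentence.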

Idilia's word sense disambiguation technology

Idilia has researched and developed revolutionary artificial intelligence technology that allows software to understand and use the precise meanings of words in naturally expressed language, a process known as word sense disambiguation (WSD). Idilia’s core WSD technology will revolutionize the critical software applications that deal predominantly with naturally expressed language. Idilia has developed the world’s first system capable of performing WSD at accuracy approaching that of a human being, and the system is ready for commercial deployment. It represents a milestone in the field of artificial intelligence.

Word sense disambiguation (WSD) is the process of resolving ambiguity in text, and is a well-known and heavily researched problem in artificial intelligence. Simply put, language is ambiguous because words have multiple meanings. This feature of language deals a knockout blow to any computer system attempting to extract and work with meaning in ordinary human language (also referred to as “natural language”, or, when in textual form, as “unstructured text”). Humans resolve word meanings using the information provided by the context, which lets them understand naturally expressed language seemingly effortlessly.

Accurate WSD technology opens up the entire range of human tasks that depend on natural language understanding – from paraphrasing a search query to extracting sentiment from a Tweet – to computers and software applications. Until now, WSD has proved impossible to perform accurately in software, and no system has been built that performs it at levels approaching human accuracy. In fact, despite decades of research, WSD research systems hit a ceiling of 60-70% accuracy, a level some 25-35 percentage points below human performance. This level of accuracy is not much better than selecting the most frequent word sense in every context, an indication of just how difficult the problem is and how little progress had been made in solving it. Idilia has decisively broken through the research ceiling and developed the world’s first WSD technology able to automatically select word meanings at close to human levels of accuracy. Idilia’s WSD technology is over 85% accurate, which is within 5-10 percentage points of human performance. This means that Idilia has closed over 80% of the gap between humans and machines for understanding language.
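The "most frequent word sense" baseline mentioned above ignores context entirely: for each word it always predicts whichever sense was seen most often in labeled examples. The Python sketch below illustrates the idea on a tiny invented data set; the sense labels and examples are made up for illustration and say nothing about real accuracy figures.

```python
from collections import Counter, defaultdict

# Toy sense-tagged examples as (word, sense) pairs, invented for illustration.
training = [
    ("book", "novel"), ("book", "novel"), ("book", "reserve"),
    ("cut", "reduce"), ("cut", "wound"), ("cut", "reduce"),
]
held_out = [
    ("book", "reserve"), ("book", "novel"),
    ("cut", "reduce"), ("cut", "chop"),
]

def train_most_frequent_sense(examples):
    """Map each word to the sense it carries most often in the examples."""
    counts = defaultdict(Counter)
    for word, sense in examples:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def accuracy(model, examples):
    """Fraction of examples whose sense matches the model's fixed guess."""
    correct = sum(1 for word, sense in examples if model.get(word) == sense)
    return correct / len(examples)

mfs = train_most_frequent_sense(training)
print(mfs)                      # {'book': 'novel', 'cut': 'reduce'}
print(accuracy(mfs, held_out))  # 0.5
```

Because the baseline never looks at the surrounding words, every use of a less common sense is guaranteed to be wrong, which is why a context-aware system is expected to beat it by a clear margin.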