
Understanding the Language Graph

The Language Graph is a linguistic knowledge base containing millions of concepts and over 100 million facts in the form of relations and annotations. It is used extensively by the linguistic algorithms to predict the senses of words, generate paraphrases, etc.

It contains almost all common words, many common multi-word expressions (e.g., nuclear reactor), and millions of named entities (proper nouns or adjectives). The named entities were mined from popular sources such as Wikipedia, MusicBrainz, etc. Where a sense was obtained from an external reference, that reference is recorded. An application can use it to map the Idilia sense to the corresponding external entry, such as the matching Wikipedia page.

Senses are available in two granularities: fine senses and coarse senses. A fine sense corresponds to a single definition, as would appear in a dictionary. Several words can express the same fine sense; each of these wordings is called a "sensekey" of that sense. Coarse senses group together fine senses that are closely related.
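The sense labels used in this page's examples (such as car/N1, passenger/J1 and passenger/C3) appear to combine a lemma, a one-letter tag and a sense number. The following Python sketch assumes that format, reading N as noun, J as adjective and C as coarse sense based only on those examples, and shows how fine senses could be grouped under their coarse sense. `SenseLabel`, `parse_label`, `COARSE_OF` and `fine_senses_by_coarse` are hypothetical names, not part of any Idilia API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed label format: <lemma>/<tag letter><sense number>, e.g. "passenger/N1".
LABEL_RE = re.compile(r"^(?P<lemma>.+)/(?P<tag>[A-Z])(?P<num>\d+)$")


@dataclass(frozen=True)
class SenseLabel:
    lemma: str   # word or multi-word expression, e.g. "nuclear_reactor"
    tag: str     # assumed: "N" noun, "J" adjective, "C" coarse sense
    number: int  # sense number within the lemma and tag


def parse_label(label: str) -> SenseLabel:
    """Split a label such as 'car/N1' into its parts (hypothetical helper)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a sense label: {label!r}")
    return SenseLabel(match["lemma"], match["tag"], int(match["num"]))


# Hypothetical fine-to-coarse mapping, using the passenger example from the table.
COARSE_OF = {
    "passenger/N1": "passenger/C3",
    "passenger/J1": "passenger/C3",
}


def fine_senses_by_coarse(coarse_of: dict[str, str]) -> dict[str, list[str]]:
    """Invert the fine -> coarse mapping so each coarse sense lists its fine senses."""
    groups: dict[str, list[str]] = defaultdict(list)
    for fine, coarse in coarse_of.items():
        groups[coarse].append(fine)
    return dict(groups)


if __name__ == "__main__":
    print(parse_label("nuclear_reactor/N1"))
    print(fine_senses_by_coarse(COARSE_OF))
```

Storing only the fine-to-coarse direction and inverting it on demand keeps a single source of truth; a real client would more likely receive both directions from the service.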

For each fine sense in the knowledge base, the following major linkages are available:

Relationship | Description
--- | ---
Synonyms | First, the other sensekeys (alternate wordings) for the same definition. Second, other senses that are closely related (near synonyms or match synonyms).
Coarse sense | The coarse sense to which the fine sense belongs and, by extension, the other fine senses forming that coarse sense. For example, passenger/N1 and passenger/J1 form the coarse sense passenger/C3.
Generalization | The parent sense (hypernym). For example, this connects car/N1 to automotive_vehicle/N1, which is itself connected to self-propelled_vehicle/N1, which in turn is a wheeled_vehicle/N1, and so on up to entity/N1.
Specialization | The opposite of generalization: for any given sense, the list of its more specific senses (hyponyms). For example, car/N1 has thousands of "children".
Classification | The Knowledge Base contains several hundred categories into which the senses are classified. Each sense may belong to more than one category. For example, car/N1 belongs to the categories auto_industry/N1 and consumer_road_vehicle/N1.
Property (Is-A) | The Knowledge Base contains several hundred properties that describe what a sense is. Again, each sense may have multiple properties. For example, car/N1 has the properties consumer_vehicle/N1 and self-propelled_vehicle/N1. Similarly, the sense for a politician who used to be a military person is linked to both properties (politician and military person).
Constituents | For each multi-word sense (such as nuclear_reactor/N1), the Knowledge Base lists the fine senses it is composed of. For example, nuclear/J1 and reactor/N8.
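The Generalization and Specialization links form a hierarchy that an application can walk. The Python sketch below follows parent links upward from car/N1 using the chain given in the table, and derives the Specialization direction by inverting the same links. `HYPERNYM`, `generalizations` and `specializations` are hypothetical names for an in-memory stand-in, not the Language Graph's actual interface, and the toy chain stops at wheeled_vehicle/N1 because the table does not name the steps between it and entity/N1.

```python
from collections import defaultdict

# Hypothetical toy data: each fine sense maps to its single parent (hypernym).
# Only the links named in the table are included; the real chain continues
# above wheeled_vehicle/N1 up to entity/N1.
HYPERNYM = {
    "car/N1": "automotive_vehicle/N1",
    "automotive_vehicle/N1": "self-propelled_vehicle/N1",
    "self-propelled_vehicle/N1": "wheeled_vehicle/N1",
}


def generalizations(sense: str, hypernym: dict[str, str]) -> list[str]:
    """Follow parent links upward until a sense with no known parent is reached."""
    chain: list[str] = []
    seen = {sense}
    current = sense
    while current in hypernym:
        current = hypernym[current]
        if current in seen:  # guard against cycles in malformed data
            break
        seen.add(current)
        chain.append(current)
    return chain


def specializations(hypernym: dict[str, str]) -> dict[str, list[str]]:
    """Invert parent links to get each sense's direct children (hyponyms)."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in hypernym.items():
        children[parent].append(child)
    return dict(children)


if __name__ == "__main__":
    print(generalizations("car/N1", HYPERNYM))
    print(specializations(HYPERNYM)["automotive_vehicle/N1"])
```

Using one parent per sense follows the table's wording ("the parent sense"). If a sense could have several parents, the dictionary values would become lists and the upward walk a breadth-first search.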

There are also many other relationships not described here.

In addition to the linkages, the Language Graph stores key information about each sense:

  • Frames and lexical properties: grammatical information about the way verbs and nouns can be used.
  • Frequencies of usage
  • Textual description
  • External references
  • Attributes such as "vulgar", "technical", "archaic", etc.
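An application might combine this per-sense information, for example to hide senses marked "vulgar" or "archaic" and rank the remaining ones by usage frequency. In the Python sketch below, the `SenseRecord` fields, the `usable_senses` helper and the placeholder senses and numbers are all hypothetical, chosen only to illustrate the idea; they do not reflect the Language Graph's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class SenseRecord:
    """Hypothetical per-sense record mirroring the information listed above."""
    label: str
    description: str
    frequency: int  # usage frequency; the real scale is not specified on this page
    attributes: set[str] = field(default_factory=set)  # e.g. {"vulgar", "technical"}
    external_refs: dict[str, str] = field(default_factory=dict)  # source -> identifier


def usable_senses(records: list[SenseRecord], excluded: set[str]) -> list[SenseRecord]:
    """Drop senses carrying any excluded attribute, most frequent first."""
    kept = [r for r in records if not (r.attributes & excluded)]
    return sorted(kept, key=lambda r: r.frequency, reverse=True)


if __name__ == "__main__":
    # Placeholder senses and numbers, invented for illustration only.
    records = [
        SenseRecord("example/N1", "a common everyday sense", 120),
        SenseRecord("example/N2", "an old-fashioned sense", 40, {"archaic"}),
        SenseRecord("example/N3", "a specialist sense", 15, {"technical"}),
    ]
    for r in usable_senses(records, excluded={"vulgar", "archaic"}):
        print(r.label, r.frequency)
```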