
Understanding the Confidence Thresholds

Sense Analysis combines the results of several machine learning models. These models predict not only the output senses but also a confidence for each prediction. When the confidence is low, an application may want to ignore the semantic output and fall back to the text. The confidence is affected by the semantic connections in the text, the case correctness (are the words capitalized correctly?), the ambiguity of the text, and other factors.

Different confidence thresholds can be used depending on the application's objective (e.g., improving precision vs. improving recall). Thresholds can even be adjusted dynamically within an application (e.g., more relaxed thresholds when very few results are found, and much tighter thresholds when many possible results are found). Application thresholds are usually determined empirically but could also be learned.
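The dynamic adjustment described above can be sketched as follows. This is a minimal illustration, not part of the product: the function names, the cut-off counts (5 and 50), and the threshold values (0.40 and 0.80) are all assumptions chosen for the example and would need to be tuned empirically.

```python
def pick_threshold(num_candidates, few=5, many=50, relaxed=0.40, strict=0.80):
    """Return a confidence threshold that tightens as more candidates are found.

    All default values are illustrative assumptions, not recommended settings.
    """
    if num_candidates <= few:
        return relaxed
    if num_candidates >= many:
        return strict
    # Interpolate linearly between the relaxed and strict thresholds.
    fraction = (num_candidates - few) / (many - few)
    return relaxed + fraction * (strict - relaxed)


def filter_results(results, threshold):
    """Keep only (item, confidence) pairs whose confidence meets the threshold."""
    return [(item, conf) for item, conf in results if conf >= threshold]
```

An application would call pick_threshold with the size of its candidate list and pass the result to filter_results, so that a sparse result set is filtered leniently and a crowded one strictly.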

There are several confidence values available to the application. The following table describes them.

Threshold | Feature | Description
--------- | ------- | -----------
ccfmp | fine sense | Positive confidence for the most probable fine sense, taking into account the possible lexical path (i.e., formation of long compounds), sense, and part of speech ambiguities. Most applications are based on the fine sense, and this is the key threshold to use. It also accounts for the confidences of the lexical category and coarse sense. This is an attribute on each frag in the semdoc output.
cccmp | coarse sense | Equivalent to "ccfmp" described above, but applied to the coarse sense. Useful for the rare applications that make use of the coarse sense.
pc | fine sense | Positive confidence for the fine sense. It accounts for the confidences of the lexical category and coarse sense. Assumes that the correct lexical path has been chosen.
pc | coarse sense | Positive confidence for the coarse sense. This indicates whether the coarse sense is correct. Applicable only to the few applications that can make use of the coarse sense.
pc | lexical category (part of speech) | Positive confidence for the part of speech (i.e., verb, noun, etc.). Usually ignored, as it is already integrated into the fine sense confidence.
c | dependency | Positive confidence for the syntactic dependency. Dependencies are obtained from the parse tree of each sentence and might be useful to applications that need to know whether a term is the theme or agent of a phrase.
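As a rough sketch of how an application might read the "ccfmp" value from each frag, the snippet below parses a small XML fragment with Python's standard library. Only the fact that "ccfmp" is an attribute on each frag comes from the table; the XML layout, the element names, and the "text" and "sense" attributes are hypothetical and will likely differ from the real semdoc output.

```python
import xml.etree.ElementTree as ET

# Hypothetical semdoc fragment. Only the "ccfmp" attribute on each frag is
# taken from the documentation; everything else is assumed for illustration.
SAMPLE_SEMDOC = """
<semdoc>
  <frag text="Java" sense="island" ccfmp="0.91"/>
  <frag text="bank" sense="financial_institution" ccfmp="0.42"/>
</semdoc>
"""


def read_frag_confidences(xml_text):
    """Return a list of (text, ccfmp) pairs, one per frag element."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for frag in root.iter("frag"):
        # Treat a missing ccfmp attribute as zero confidence.
        confidence = float(frag.get("ccfmp", "0.0"))
        pairs.append((frag.get("text"), confidence))
    return pairs
```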

Example

As an example of threshold usage, the Sense Mapping application available on the site uses a threshold of 0.70 on the fine sense (pc). This high threshold ensures that external links added to the text are almost always relevant. Because the cost of a wrong link is high, it was better to under-link than to increase coverage.
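The under-linking behavior can be illustrated with a short sketch. This is not the Sense Mapping implementation: the function name, the input tuple format, and the example URLs are assumptions, and only the 0.70 threshold is taken from the text above.

```python
FINE_SENSE_THRESHOLD = 0.70  # value quoted in the example above


def link_terms(terms, threshold=FINE_SENSE_THRESHOLD):
    """Render terms as text, linking only those with a confident fine sense.

    terms is a list of (text, url, confidence) tuples; this input format is
    hypothetical and chosen only for the illustration.
    """
    rendered = []
    for text, url, confidence in terms:
        if confidence >= threshold:
            rendered.append(f'<a href="{url}">{text}</a>')
        else:
            # Low confidence: fall back to the plain text rather than risk
            # adding an irrelevant link.
            rendered.append(text)
    return " ".join(rendered)
```

With this rule, a term whose fine sense confidence is 0.55 is left as plain text, while one at 0.85 is linked. Lowering the threshold would link more terms at the price of more irrelevant links.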