Relevance

In the Geneea Media API, every Tag and Entity is assigned a relevance score between 0 and 100. The score indicates how central that item is to the meaning of the article.

While frequency (how many times a name is mentioned) is a factor, relevance is a much more sophisticated metric. It allows you to distinguish between the primary subject of an article and a peripheral mention.

How to Use Relevance

When building your editorial workflow or automated tagging system, we recommend using these thresholds as a starting point.

Score Range | Significance | Recommended Action
--- | --- | ---
70–100 | Primary Subject | The "About" entities or topics; always include in metadata, SEO tags, etc.
40–69 | Significant Context | Important supporting actors or secondary topics.
10–39 | Peripheral Mention | Mentioned in passing (e.g., as a comparison or background); usually safe to ignore.
< 10 | Background Noise | Low relevance; these should typically be filtered out.
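
If you apply these bands in code, a small helper keeps the thresholds in one place. The sketch below is a minimal illustration, assuming scores arrive as numbers between 0 and 100; the function name relevance_band is hypothetical and not part of the Geneea API. Non-integer scores such as 69.5 fall into the lower band, because each band starts at its lower bound.

```python
# Sketch: map a relevance score (0-100) to the recommended bands in the table.
# relevance_band is a hypothetical helper, not a Geneea API function.

def relevance_band(score: float) -> str:
    """Return the recommended band for a relevance score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"relevance must be between 0 and 100, got {score}")
    if score >= 70:
        return "Primary Subject"
    if score >= 40:
        return "Significant Context"
    if score >= 10:
        return "Peripheral Mention"
    return "Background Noise"


if __name__ == "__main__":
    for s in (85, 55, 12.5, 3):
        print(s, relevance_band(s))
```

Treat the cut-offs as a starting point; they are easy to tune once you see how they behave on your own content.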

In the mixed mode, we recommend marking tags with a relevance of

  • 70 and above as accepted by default (opt-out),
  • below 70 as rejected by default (opt-in).
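
As a rough illustration of the mixed mode, the sketch below computes each tag's default state from its relevance. It assumes a minimal, hypothetical Tag shape with only a name and a relevance score; real API responses carry more fields, and how you store the journalist's opt-in or opt-out is up to your system.

```python
from dataclasses import dataclass

# Recommended starting point for the mixed mode; tune it for your newsroom.
ACCEPT_THRESHOLD = 70


@dataclass
class Tag:
    # Hypothetical minimal shape for illustration, not the API schema.
    name: str
    relevance: float


def default_states(tags: list[Tag], threshold: float = ACCEPT_THRESHOLD) -> dict[str, bool]:
    """Map each tag name to its default state in the mixed mode.

    True  = accepted by default (the journalist may opt out),
    False = rejected by default (the journalist may opt in).
    """
    return {t.name: t.relevance >= threshold for t in tags}


if __name__ == "__main__":
    tags = [Tag("France", 88), Tag("Paris", 45)]
    print(default_states(tags))
```

With the example tags, France is pre-accepted and Paris is pre-rejected, so the journalist only has to act on the exceptions.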

For more on this, see Journalist Tag Acceptance Models.

How Relevance is Calculated

Internally, Geneea uses a combination of dozens of features to calculate these scores, considering factors such as:

  • Position: Mentions in the headline or lead paragraph carry more weight.
  • Frequency and Density: How often the item appears relative to the document length.
  • Salience: How central the entity is to the specific domain or topic of the text.

We consider both direct and indirect mentions. For example, when calculating the relevance of the tag France, a mention of France is direct, while a mention of Paris is indirect.

Some cases are intricate because several candidates compete for relevance, and it is not obvious which matters more:

  • the individuals or their shared name? (John + Paul + George + Ringo vs. the Beatles)
  • the people or the entities they stand for? (Emmanuel Macron and Keir Starmer vs. France and the UK)
  • the people or the topic they discussed?
  • etc.

The relevance of IPTC Media Topics, IAB Content Taxonomy and other categorizations is calculated differently, but the scores should be roughly comparable.

Relevance vs. Confidence

It is important not to confuse relevance (how important the entity is to the article) with confidence (how sure the AI is that it detected the entity correctly). An AI can be 99% confident that it found the word Facebook, but that entity might have only 1% relevance if it was mentioned in a footer link.
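
Because the two measures answer different questions, a filter that cares about both should check them independently. The sketch below assumes each entity exposes a relevance score (0 to 100) and a detection confidence; the Entity shape, the confidence field name, its 0 to 1 scale and the default thresholds are all hypothetical choices for illustration, not the documented API.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    # Hypothetical shape: field names and the confidence scale are assumptions.
    name: str
    relevance: float   # 0-100: how central the entity is to the article
    confidence: float  # assumed 0-1: how sure the detection is


def keep(entities: list[Entity], min_relevance: float = 40.0,
         min_confidence: float = 0.8) -> list[Entity]:
    """Keep entities that are both reliably detected and important to the article.

    The checks are independent: high confidence does not imply high relevance.
    """
    return [
        e for e in entities
        if e.confidence >= min_confidence and e.relevance >= min_relevance
    ]


if __name__ == "__main__":
    found = [
        Entity("Facebook", relevance=1, confidence=0.99),  # e.g. a footer link
        Entity("Emmanuel Macron", relevance=82, confidence=0.95),
    ]
    print([e.name for e in keep(found)])
```

Here Facebook is dropped despite its high confidence, because it contributes almost nothing to what the article is about.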

See Also