Relevance
In the Geneea Media API, every Tag and Entity is assigned a relevance score between 0 and 100. This score indicates how central that item is to the meaning of the article.
While frequency (how many times a name is mentioned) is a factor, relevance is a much more sophisticated metric. It allows you to distinguish between the primary subject of an article and a peripheral mention.
How to Use Relevance
When building your editorial workflow or automated tagging system, we recommend using these thresholds as a starting point.
| Score Range | Significance | Recommended Action |
|---|---|---|
| 70 – 100 | Primary Subject | The "About" entities or topics; always include in metadata, SEO tags, etc. |
| 40 – 69 | Significant Context | Important supporting actors or secondary topics. |
| 10 – 39 | Peripheral Mention | Mentioned in passing (e.g., as a comparison or background); usually safe to ignore. |
| < 10 | Background Noise | Low relevance; these should typically be filtered out. |
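
As a rough illustration, the thresholds in the table can be expressed as a small lookup. This is a minimal sketch, not part of the Geneea API: the function name `relevance_band` and the `BANDS` constant are hypothetical, and the only input is the numeric score. Comparing against lower bounds means a fractional score such as 69.5 falls into the lower band.

```python
# Bands from the table above as (lower bound, label), highest band first.
# BANDS and relevance_band are illustrative names, not Geneea API identifiers.
BANDS = [
    (70, "Primary Subject"),
    (40, "Significant Context"),
    (10, "Peripheral Mention"),
    (0, "Background Noise"),
]


def relevance_band(score: float) -> str:
    """Return the significance label for a relevance score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"relevance must be between 0 and 100, got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

Since the thresholds are only a recommended starting point, keeping them in one constant makes them easy to tune for a particular newsroom.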
In mixed mode, we recommend setting the default state of each tag according to its relevance:
- 70 and above: accepted by default (the journalist can opt out),
- below 70: rejected by default (the journalist can opt in).
For more on this, see Journalist Tag Acceptance Models.
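The mixed-mode recommendation can be sketched as a simple partition of tags around the threshold of 70. The dictionary keys `name` and `relevance`, the function name, and the sample scores below are assumptions made for illustration; they are not taken from the API response format.

```python
def split_by_default_acceptance(tags, threshold=70):
    """Partition tags into (accepted by default, rejected by default).

    Each tag is assumed to be a dict with a numeric "relevance" key;
    this shape is hypothetical and used only for the example.
    """
    accepted = [tag for tag in tags if tag["relevance"] >= threshold]
    rejected = [tag for tag in tags if tag["relevance"] < threshold]
    return accepted, rejected


# Made-up sample data, not real API output.
tags = [
    {"name": "France", "relevance": 82},
    {"name": "Paris", "relevance": 55},
    {"name": "UK", "relevance": 70},
]
accepted, rejected = split_by_default_acceptance(tags)
```

With this sample, France and UK would be pre-selected for the journalist, while Paris would be offered as an optional tag.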
How Relevance is Calculated
Internally, Geneea uses a combination of dozens of features to calculate these scores, considering factors such as:
- Position: Mentions in the headline or lead paragraph carry more weight.
- Frequency and Density: How often the item appears relative to the document length.
- Salience: How central the entity is to the specific domain or topic of the text.
We consider both direct and indirect mentions. For example, when calculating the relevance for the tag France, France is a direct mention, Paris is indirect.
Some cases are genuinely ambiguous, because the relevance of an entity depends on several competing factors. For example, which is more important:
- the individual members or their shared name? (John + Paul + George + Ringo vs. the Beatles)
- the individuals or the entities they stand for? (Emmanuel Macron and Keir Starmer vs. France and the UK)
- the participants or the topic they discussed?
The relevance of IPTC Media Topics, IAB Content Taxonomy and other categorizations is calculated differently, but the scores should be roughly comparable.
Relevance vs. Confidence
It is important not to confuse relevance (how important the entity is to the article) with confidence (how sure the AI is that it detected the entity correctly). An AI can be 99% confident that it found the word Facebook, but that entity might have a relevance of only 1 if it was mentioned in a footer link.
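The distinction can be illustrated by treating the two values as independent filters. This is a sketch under stated assumptions: this page does not say whether or how the API exposes confidence, so the `confidence` key, its 0 to 100 scale, the function name, and the sample data are all hypothetical.

```python
def central_and_reliable(entities, min_relevance=40, min_confidence=90):
    """Keep entities that are both central to the article and reliably detected.

    Each entity is assumed to be a dict with numeric "relevance" and
    "confidence" keys (both 0 to 100); this shape is hypothetical.
    """
    return [
        entity
        for entity in entities
        if entity["relevance"] >= min_relevance
        and entity["confidence"] >= min_confidence
    ]


# Made-up sample data mirroring the Facebook footer-link example.
entities = [
    {"name": "Facebook", "relevance": 1, "confidence": 99},
    {"name": "France", "relevance": 82, "confidence": 97},
]
kept = central_and_reliable(entities)
```

Here Facebook is dropped despite its high confidence, because its relevance is far below the threshold.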