Relevance
In the Geneea Media API, every Tag and Entity is assigned a relevance score between 0 and 100. This score indicates how central that item is to the meaning of the article.
Frequency (how many times a name is mentioned) is one factor, but relevance also weighs other signals, such as where in the article the mentions appear. This lets you distinguish between the primary subject of an article and a peripheral mention.
How to Use Relevance
When building your editorial workflow or automated tagging system, we recommend using these thresholds as a starting point.
| Score | Significance | Recommended Action |
|---|---|---|
| 70–100 | Primary Subject | The "About" entities or topics; always include in metadata, SEO tags, etc. |
| 40–69 | Significant Context | Important supporting actors or secondary topics. |
| 10–39 | Peripheral Mention | Mentioned in passing (e.g., as a comparison or background); usually safe to ignore. |
| < 10 | Background Noise | Low relevance; these should typically be filtered out. |
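The thresholds in the table can be applied with a small helper. This is a minimal sketch, not part of the Geneea API: the function name `relevance_band`, the band labels, and the `name`/`relevance` keys in the sample data are assumptions made for illustration.

```python
from typing import Literal

Band = Literal["primary", "significant", "peripheral", "noise"]


def relevance_band(score: float) -> Band:
    """Map a 0-100 relevance score to one of the bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"relevance must be between 0 and 100, got {score}")
    if score >= 70:
        return "primary"
    if score >= 40:
        return "significant"
    if score >= 10:
        return "peripheral"
    return "noise"


# Hypothetical tags; the real API response may use different field names.
tags = [
    {"name": "Olympic Games", "relevance": 85.0},
    {"name": "Paris", "relevance": 42.5},
    {"name": "Facebook", "relevance": 3.0},
]

# Drop background noise, keep everything else for further handling.
kept = [t["name"] for t in tags if relevance_band(t["relevance"]) != "noise"]
```

The cut-offs are starting points, so in practice you would probably keep them in configuration rather than hard-coding them.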
In the mixed supervision mode, we recommend handling tags based on their relevance score:
- 70 and above: Mark as accepted by default (opt-out).
- Below 70: Mark as rejected by default (opt-in).
You can lower the acceptance threshold to 50 if your strategy favors high-volume tagging. This supports discoverability and SEO even with minimal manual review, at the cost of accepting more secondary tags by default. For more on this, see Journalist Tag Supervision Models.
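The default-status rule for mixed supervision could be sketched as follows. The function name `default_status`, the status strings, and the `accept_threshold` parameter are hypothetical and only illustrate the recommendation; they are not identifiers from the Geneea API.

```python
def default_status(relevance: float, accept_threshold: float = 70.0) -> str:
    """Return the pre-selected status a journalist sees for a suggested tag.

    Tags at or above the threshold start as accepted (opt-out);
    tags below it start as rejected (opt-in).
    """
    return "accepted" if relevance >= accept_threshold else "rejected"


# Recommended default threshold of 70.
strict = [default_status(score) for score in (85, 70, 55)]

# High-volume strategy with the threshold lowered to 50.
lenient = [default_status(score, accept_threshold=50) for score in (85, 70, 55)]
```

With the default threshold a tag scoring 55 starts as rejected; with the threshold lowered to 50 the same tag starts as accepted.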
How Relevance is Calculated
Internally, Geneea uses a combination of dozens of features to calculate these scores, considering factors such as:
- Frequency and Density: How often the item appears relative to the document length.
- Placement: Mentions in prominent positions, such as the headline, carry more weight.
- Salience: How central the entity is to the specific domain or topic of the text.
We consider both direct and indirect mentions. For example, when calculating the relevance for the tag France, France and French Republic are direct mentions, while Paris is an indirect mention.
We also support custom adjustments of scores for specific entity types, article topics, etc.
The relevance for IPTC Media Topics, IAB Content Taxonomy, and other categorizations is calculated differently, but the scores should be roughly comparable.
Cross-Article Comparison
The relevance score is a standardized metric that allows for direct comparison across different documents.
For example, if the tag Olympic Games has a score of 50 in one article and 85 in another, the latter article is significantly more focused on that topic.
Using these scores, media organizations can programmatically sort or select the top N most relevant articles for a specific topic across their archive.
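Selecting the top N articles for a topic might look like the sketch below. It assumes the archive is available as a list of dictionaries with `id` and `tags` keys, where each tag has `name` and `relevance`; this data shape and the function name `top_articles` are illustrative assumptions, not the API's actual response format.

```python
import heapq


def top_articles(articles: list[dict], tag_name: str, n: int = 10) -> list[str]:
    """Return the ids of the n articles in which tag_name is most relevant."""
    scored = []
    for article in articles:
        for tag in article["tags"]:
            if tag["name"] == tag_name:
                scored.append((tag["relevance"], article["id"]))
                break  # each article contributes at most one score per tag
    best = heapq.nlargest(n, scored, key=lambda pair: pair[0])
    return [article_id for _, article_id in best]


# Hypothetical archive entries.
archive = [
    {"id": "a1", "tags": [{"name": "Olympic Games", "relevance": 50}]},
    {"id": "a2", "tags": [{"name": "Olympic Games", "relevance": 85}]},
    {"id": "a3", "tags": [{"name": "France", "relevance": 90}]},
]

olympic_top = top_articles(archive, "Olympic Games", n=2)  # ["a2", "a1"]
```

Articles that do not carry the tag at all are skipped rather than treated as having a relevance of 0.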
Relevance vs. Confidence
Do not confuse relevance (how important the entity is to the article) with confidence (how sure the AI is that it detected the entity correctly). The AI can be 99% confident that it found a mention of Facebook, yet that entity might have a relevance of only 1 if it appeared just in a footer link.
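Because the two values answer different questions, a filter would typically check them independently. The sketch below assumes each entity carries both values; the `confidence` key, its 0-1 scale, the default cut-offs, and the function name `select_entities` are hypothetical and chosen only for illustration.

```python
def select_entities(
    entities: list[dict],
    min_relevance: float = 10.0,
    min_confidence: float = 0.5,
) -> list[str]:
    """Keep entities that are both reliably detected and relevant enough."""
    return [
        e["name"]
        for e in entities
        if e["confidence"] >= min_confidence and e["relevance"] >= min_relevance
    ]


# Hypothetical entities: Facebook is detected with high confidence
# but is nearly irrelevant to the article.
entities = [
    {"name": "Facebook", "confidence": 0.99, "relevance": 1.0},
    {"name": "Olympic Games", "confidence": 0.95, "relevance": 85.0},
    {"name": "Paris", "confidence": 0.30, "relevance": 60.0},
]

selected = select_entities(entities)  # ["Olympic Games"]
```

Here Facebook is dropped for low relevance despite high confidence, and Paris is dropped for low confidence despite moderate relevance.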