Semantic Tagging

The Media API can perform semantic tagging of articles. Semantic tags represent entities, keywords, or concepts relevant to the article. We rank and standardize them based on their purpose and context.

For a non-technical overview, see this page and this case study.

The sections below cover technical topics related to obtaining semantic tags.

For full API reference, see the reference pages. Note that the exact output depends on your account plan and configuration.

Basic Code

To use the API, you'll need a valid API key with the appropriate permissions. If you don't have one, please contact us here.

In the code below, replace <YOUR_API_KEY> with your actual API key.

Note: We do not currently provide dedicated SDKs for this API, but our G3 SDKs can be used to perform NLP analysis.

# No special setup necessary

Tags – Basic Analysis

To perform a basic semantic analysis and obtain tags (keywords), use the following request:

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"title": "Emmanuel Macron in Germany.",
"text": "Mr. Macron visited a trade show in Munich."
}'

The code above produces results similar to the example below. Your actual results may include additional features (e.g., relations, sentiment), depending on your configuration – see entities and sentiment below.

{
"version": "3.3.0",
"id": "1234",
"language": {"detected": "en"},
"tags": [
{"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
{"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
{"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
{"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
],
"usedChars": 100,
"metadata": {"referenceKey": "241014-164726-9bdaf485"}
}

This example includes two types of tags:

  • Entity-based tags ("type": "media"): most relevant entities, both names (e.g., people, locations, organizations) and keywords. See here.
  • IPTC Media Topics ("type": "media-topic"): an industry taxonomy with over 1,200 categories organized hierarchically. The above result includes politics; other examples are sport, basketball, music, classical music, etc.
    For more detail, see this article.

Each tag includes:

  • A unique ID ("gkbId") linking it to the knowledge base.
  • A standardized name ("stdForm"), optionally localized (see Presentation Language).
  • A relevance score ("relevance") from 0 to 100, representing the importance of the tag in relation to both the article and the customer's needs.
    This is distinct from entity relevance, which only considers the article itself when determining importance.
  • Third-party identifiers (e.g., Wikidata, IPTC Media Topics).
  • The type of the knowledge base item (person, organization, location, event, product, general).
  • An internal reference ID (e.g., "id": "t2") used for linking within the response.
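
The fields listed above can be read with a few lines of client code. The following is a minimal Python sketch, not an official client: it assumes the response has already been fetched and parsed, and the helper name tags_by_type is made up for illustration. The sample data is an abbreviated copy of the example response above.

```python
import json
from collections import defaultdict

# Abbreviated copy of the example response shown above.
SAMPLE_RESPONSE = json.loads("""
{
  "id": "1234",
  "tags": [
    {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0,
     "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
    {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0,
     "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
    {"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0,
     "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
    {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51,
     "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
  ]
}
""")


def tags_by_type(response):
    """Group tags by their "type" and sort each group by descending relevance."""
    groups = defaultdict(list)
    for tag in response.get("tags", []):
        groups[tag["type"]].append(tag)
    for tags in groups.values():
        tags.sort(key=lambda t: t["relevance"], reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for tag_type, tags in tags_by_type(SAMPLE_RESPONSE).items():
        for tag in tags:
            # gkbEntityType may be absent in some configurations, so fall back gracefully.
            kind = tag.get("feats", {}).get("gkbEntityType", "unknown")
            print(f'{tag_type:12} {tag["relevance"]:6.2f} {tag["stdForm"]} ({kind})')
```

Grouping by "type" keeps entity-based tags and IPTC Media Topics apart, which is usually what a CMS integration needs when the two are stored in different fields.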

Tag Mentions

To receive mentions — text snippets in the article that correspond to tags — use the "returnMentions": "true" parameter.

Mentions help link tags to specific expressions in the article. Typically, entity-based tags (people, organizations, etc.) have mentions; abstract topics like IPTC Media Topics categories usually do not.

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"title": "Emmanuel Macron in Germany.",
"text": "Mr. Macron visited a trade show in Munich.",
"returnMentions": "true"
}'

In comparison with the previous response, this one includes mentions of the individual tags: their text and a reference to the relevant tokens. The full tokenized structure of the text—split into paragraphs, sentences, and tokens—is automatically added to the response.
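
For readers who prefer Python to curl, the same request can be sketched with the standard library only. This is an illustration under assumptions rather than an official client: the function names build_analyze_request and analyze are hypothetical, and the returnMentions flag is sent as the string "true" simply because that is how the curl example above passes it.

```python
import json
import urllib.request

API_URL = "https://media-api.geneea.com/v2/nlp/analyze"


def build_analyze_request(api_key, doc_id, title, text, return_mentions=False):
    """Build (but do not send) a POST request mirroring the curl examples."""
    payload = {"id": doc_id, "title": title, "text": text}
    if return_mentions:
        # Mirrors the curl example, which passes the flag as the string "true".
        payload["returnMentions"] = "true"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,
            "accept": "*/*",
            "content-type": "application/json",
        },
        method="POST",
    )


def analyze(request, timeout=30):
    """Send the request and return the parsed JSON body.

    Requires network access and a valid API key.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_analyze_request(
        "<YOUR_API_KEY>",
        "1234",
        "Emmanuel Macron in Germany.",
        "Mr. Macron visited a trade show in Munich.",
        return_mentions=True,
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or log before any call is made.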

Entities, Sentiment, etc.

By default, only tags are returned. Depending on your account plan and configuration, additional outputs may include:

  • Entities
  • Relations
  • Document-level sentiment

With entities enabled, the response to the basic request looks similar to this:

{
"version": "3.3.0",
"id": "1234",
"language": {"detected": "en"},
"entities": [
{"id": "e0", "gkbId": "G57305", "stdForm": "trade fair", "type": "general", "feats": {"relevance": "11", "ranking": "11"}},
{"id": "e1", "gkbId": "G183", "stdForm": "Germany", "type": "location", "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
{"id": "e2", "gkbId": "G1726", "stdForm": "Munich", "type": "location", "feats": {"derivedBy": "city", "relevance": "66", "ranking": "66"}},
{"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person", "feats": {"relevance": "96", "ranking": "96"}},
{"id": "e4", "gkbId": "G980", "stdForm": "Bavaria", "type": "location", "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
{"id": "e5", "gkbId": "G10562", "stdForm": "Upper Bavaria", "type": "location", "feats": {"derivedBy": "district", "derivedOnly": "true", "relevance": "41", "ranking": "41"}}
],
"tags": [
{"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
{"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
{"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
{"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
],
"usedChars": 100,
"metadata": {"referenceKey": "241014-164726-9bdaf485"}
}

In addition to tags, we now also receive entities. Here are a few key points to keep in mind:

  • media tags are a subset of the identified entities. In general, their relevance scores match those of the corresponding entities, so they can be read as the most relevant entities. This equivalence is not guaranteed, however: entity relevance is determined solely by the article's content, while tag relevance can also be influenced by other factors, such as editorial preferences or contextual weighting. Although tags usually reflect the top N entities, we can adjust the relevance of specific tag types, or of individual tags, based on their context. For example, location tags might carry more weight in a travel article than in a sports article.
  • Some entities are classified as derived. These are inferred from context rather than mentioned explicitly. For example, the state of Bavaria and the region of Upper Bavaria may be included because the text refers to Munich, even though those entities are not directly stated. An entity like Germany may combine both direct and indirect references: it may be explicitly mentioned while also implied through mentions of locations like Munich.
  • Certain metadata is attached to entities and tags in the form of features (e.g., relevance, derivation method). These are expressed as key-value pairs, where both the keys and values are always strings. If a feature represents a different semantic type—such as a number—it must be converted accordingly. For example, a "relevance":"94" feature should be interpreted as the number 94, not a string.
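
The points above can be illustrated with a short Python sketch. It is only an example under assumptions: the helper names are hypothetical, the sample data is an abbreviated copy of the response above, and joining tags to entities on gkbId is inferred from the example, where the values match, rather than from a documented guarantee.

```python
# Abbreviated copy of the example response above (entities plus tags).
SAMPLE = {
    "entities": [
        {"id": "e1", "gkbId": "G183", "stdForm": "Germany", "type": "location",
         "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
        {"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person",
         "feats": {"relevance": "96", "ranking": "96"}},
        {"id": "e4", "gkbId": "G980", "stdForm": "Bavaria", "type": "location",
         "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
    ],
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51},
    ],
}


def numeric_feat(item, name, default=None):
    """Return feats[name] as a float; feature values always arrive as strings."""
    raw = item.get("feats", {}).get(name)
    return float(raw) if raw is not None else default


def is_derived_only(entity):
    """True for entities that were inferred from context and never mentioned directly."""
    return entity.get("feats", {}).get("derivedOnly") == "true"


def relevance_pairs(response):
    """Map each tag's stdForm to (tag relevance, entity relevance or None).

    Assumption: tags and entities can be joined on "gkbId".
    """
    by_gkb = {e["gkbId"]: e for e in response.get("entities", [])}
    pairs = {}
    for tag in response.get("tags", []):
        entity = by_gkb.get(tag["gkbId"])
        entity_rel = numeric_feat(entity, "relevance") if entity else None
        pairs[tag["stdForm"]] = (tag["relevance"], entity_rel)
    return pairs


if __name__ == "__main__":
    mentioned = [e["stdForm"] for e in SAMPLE["entities"] if not is_derived_only(e)]
    print("explicitly mentioned:", mentioned)
    print(relevance_pairs(SAMPLE))
```

Note the asymmetry in the data: tag relevance is a JSON number, while entity relevance is a string inside feats and has to be converted before the two can be compared.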

Paragraphs

The API and SDKs allow easy specification of an article's title and body. To include other types of paragraphs, such as the lead paragraph, or multiple text blocks, use the paraSpecs field. The public API currently recognizes three paragraph types: title, abstract (also referred to as lead), and text (the body of the article).

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"paraSpecs": [
{"type": "title", "text": "Macron in Germany."},
{"type": "abstract", "text": "Emmanuel Macron is visiting Germany again."},
{"type": "text", "text": "Mr. Macron visited a trade show in Munich."}
]
}'
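
When articles come from a CMS as separate fields, the paraSpecs array can be assembled programmatically. The Python sketch below is illustrative only: the function name build_para_specs and its parameters are hypothetical, and it uses just the three paragraph types named above.

```python
import json


def build_para_specs(title=None, lead=None, body_blocks=()):
    """Assemble a paraSpecs list from an optional title, optional lead, and body blocks.

    Empty or missing parts are skipped so that no blank paragraphs are sent.
    """
    specs = []
    if title:
        specs.append({"type": "title", "text": title})
    if lead:
        # The lead paragraph uses the "abstract" type.
        specs.append({"type": "abstract", "text": lead})
    specs.extend({"type": "text", "text": block} for block in body_blocks if block)
    return specs


if __name__ == "__main__":
    payload = {
        "id": "1234",
        "paraSpecs": build_para_specs(
            title="Macron in Germany.",
            lead="Emmanuel Macron is visiting Germany again.",
            body_blocks=["Mr. Macron visited a trade show in Munich."],
        ),
    }
    print(json.dumps(payload, indent=2))
```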

Topic Categories (Sections)

Often, the topic or section of an article is known in advance—for example, when the article appears under a particular section of a website, such as sport or hobby. Providing this information is optional, as the system will always attempt to detect the topic automatically during analysis. However, if the category is known, including it can improve the quality and accuracy of the results.

We support two types of topic categories:

  • Standard IPTC Media Topics
  • Custom categories or sections defined by the publisher (these must be configured on our side to have any effect)

These two types can be used together, as shown in the example below:

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"title": "Emmanuel Macron in Germany.",
"text": "Mr. Macron visited a trade show in Munich.",
"presentationLanguage": "fr",
"categories": [{"taxonomy": "MediaTopic", "code": "11000000"}, {"taxonomy": "Custom", "code": "politics"} ]
}'
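
The categories field can be built the same way from whatever section information the publishing system holds. This is a small Python sketch with a hypothetical helper name, build_categories; the taxonomy names "MediaTopic" and "Custom" are taken from the request above, and custom codes still need to be configured on the API side to have any effect.

```python
import json


def build_categories(media_topic_codes=(), custom_codes=()):
    """Return a categories list combining IPTC Media Topic codes and custom section codes."""
    categories = [{"taxonomy": "MediaTopic", "code": str(code)} for code in media_topic_codes]
    categories += [{"taxonomy": "Custom", "code": str(code)} for code in custom_codes]
    return categories


if __name__ == "__main__":
    payload = {
        "id": "1234",
        "title": "Emmanuel Macron in Germany.",
        "text": "Mr. Macron visited a trade show in Munich.",
        "categories": build_categories(["11000000"], ["politics"]),
    }
    print(json.dumps(payload, indent=2))
```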

Presentation Language

By default, entities and tags are presented in the language of the document—typically English. However, you can request that they be returned in a different language by specifying the presentationLanguage parameter using the appropriate ISO code.

Supported languages include Czech, Dutch, English, French, German, Polish, Portuguese, Slovak, and Spanish.

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"title": "Emmanuel Macron in Germany.",
"text": "Mr. Macron visited a trade show in Munich.",
"presentationLanguage": "fr"
}'

The following is an example response. For an explanation of each field, see the Analysis reference page. Note that we've omitted the relations field for brevity.

{
"version": "3.3.0",
"id": "1234",
"language": {"detected": "en"},
"entities": [
{"id": "e0", "gkbId": "G57305", "stdForm": "salon", "type": "general", "feats": {"relevance": "11", "ranking": "11"}},
{"id": "e1", "gkbId": "G183", "stdForm": "Allemagne", "type": "location", "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
{"id": "e2", "gkbId": "G1726", "stdForm": "Munich", "type": "location", "feats": {"derivedBy": "city", "relevance": "66", "ranking": "66"}},
{"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person", "feats": {"relevance": "96", "ranking": "96"}},
{"id": "e4", "gkbId": "G980", "stdForm": "Bavière", "type": "location", "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
{"id": "e5", "gkbId": "G10562", "stdForm": "Haute-Bavière", "type": "location", "feats": {"derivedBy": "district", "derivedOnly": "true", "relevance": "41", "ranking": "41"}}
],
"tags": [
{"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
{"id": "t2", "gkbId": "G183", "stdForm": "Allemagne", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
{"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
{"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "Politique", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
],
"usedChars": 100,
"metadata": {"referenceKey": "241014-164726-ab2eaf07"}
}

If you need tags and entities translated into more than one language, see Multiple Presentation Languages.
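
A client may want to validate the requested language before sending the request. The Python sketch below is an assumption-laden illustration: only "fr" appears in the examples above, so the other two-letter values are the standard ISO 639-1 codes for the listed languages and should be checked against the API reference. The helper names are hypothetical.

```python
# Supported presentation languages mapped to ISO 639-1 codes.
# Assumption: the API expects two-letter codes, as in the "fr" example above.
PRESENTATION_LANGUAGES = {
    "czech": "cs",
    "dutch": "nl",
    "english": "en",
    "french": "fr",
    "german": "de",
    "polish": "pl",
    "portuguese": "pt",
    "slovak": "sk",
    "spanish": "es",
}


def presentation_language_code(language_name):
    """Return the ISO 639-1 code for a supported language name (case-insensitive)."""
    key = language_name.strip().lower()
    try:
        return PRESENTATION_LANGUAGES[key]
    except KeyError:
        raise ValueError(f"Unsupported presentation language: {language_name!r}") from None


def with_presentation_language(payload, language_name):
    """Return a copy of the request payload with presentationLanguage set."""
    return {**payload, "presentationLanguage": presentation_language_code(language_name)}


if __name__ == "__main__":
    base = {"id": "1234", "title": "Emmanuel Macron in Germany."}
    print(with_presentation_language(base, "French"))
```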

Knowledge Base Properties

Additional information from the Geneea Knowledge Base can be returned along with tags and entities. The specific set of properties is configurable. In the example below, the description property is returned for each tag or entity.

A GKB property has three types of attributes:

  • name: a language-independent identifier. Multiple properties may share the same name (e.g., several occupation values).
  • label: a human-readable label of the property in the presentation language of the analysis.
  • One of the following value fields:
    • boolValue
    • floatValue
    • intValue
    • strValue
    Exactly one of these fields will be present for each property.

If a property is not available for a specific tag or entity, it will not be included in the output.

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
"id": "1234",
"title": "Emmanuel Macron in Germany.",
"text": "Mr. Macron visited a trade show in Munich."
}'

The response then includes a gkbProperties array for each tag:

{
"version": "3.3.0",
"id": "1234",
"language": { "detected": "en" },
"tags": [
{ "id": "t0", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 22.605,
"feats": { "wikidataId": "Q3052772" },
"gkbProperties": [{"name": "description", "label": "description", "strValue": "President of France and Co-Prince of Andorra since 2017"}]
},
{ "id": "t1", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 18.365,
"feats": { "wikidataId": "Q183" },
"gkbProperties": [{"name": "description", "label": "description", "strValue": "country in Central Europe"}]
},
{ "id": "t2", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 7.57,
"feats": { "wikidataId": "Q1726" },
"gkbProperties": [{"name": "description", "label": "description", "strValue": "capital and most populous city of Bavaria, Germany"}]
}
],
"usedChars": 100,
"metadata": {"referenceKey": "311441-120020-a24f0281"}
}
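
Because each property carries exactly one of the four value fields, and several properties may share a name, client code typically needs a small accessor. The following Python sketch shows one way to do this; the helper names property_value and properties_by_name are hypothetical, and the sample tag is copied from the response above.

```python
VALUE_FIELDS = ("boolValue", "floatValue", "intValue", "strValue")

# One tag copied from the example response above.
SAMPLE_TAG = {
    "id": "t1", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 18.365,
    "feats": {"wikidataId": "Q183"},
    "gkbProperties": [
        {"name": "description", "label": "description", "strValue": "country in Central Europe"}
    ],
}


def property_value(prop):
    """Return the value of a GKB property, whichever value field holds it."""
    present = [field for field in VALUE_FIELDS if field in prop]
    if len(present) != 1:
        raise ValueError(f"Expected exactly one value field, found {present}")
    return prop[present[0]]


def properties_by_name(item):
    """Group property values by name; a name may occur more than once."""
    grouped = {}
    # Tags or entities without properties simply have no gkbProperties key.
    for prop in item.get("gkbProperties", []):
        grouped.setdefault(prop["name"], []).append(property_value(prop))
    return grouped


if __name__ == "__main__":
    print(properties_by_name(SAMPLE_TAG))
```

Returning a list per name keeps multi-valued properties, such as several occupation values, from overwriting each other.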