Semantic Tagging

The Media API can perform semantic tagging of articles. Semantic tags are entities, keywords or concepts relevant to the article. We rank and standardize them based on their purpose. For a non-technical overview, see this page and this case study.

Below, we discuss various technical topics related to obtaining semantic tags:

First steps: the common setup for calling the API, used throughout the rest of this article
Basic tagging: a simple call to the API to obtain semantic tags
Other features: depending on your configuration, entities, sentiment and other information can be returned as well
Paragraphs: handle lead and multiple text paragraphs
Topic categories: improve the analysis by specifying the topic or section of the article
Presentation language: return tags and entities in a particular language
Knowledge base properties: information about tags and entities drawn from the knowledge base as part of the call

For a full description of the API, see the reference guide.

First Steps

To use the API, you need a valid API key with the appropriate authorizations. If you do not have one, please get in touch with us here.

Note that we do not yet provide SDKs for the Media API itself, but our G3 SDKs can be used for its NLP analysis part.

Common Basic Code

We will first define some common code (replace <YOUR_API_KEY> with your API key):

No special setup is necessary for curl.

// HTTP client; see https://github.com/axios/axios
const axios = require('axios');

const config = {
    baseURL: 'https://media-api.geneea.com/v2/',
    headers: {
        'X-API-KEY': '<YOUR_API_KEY>'
    }
};

// A simple function to report the returned json objects
const report = (output) => console.dir(output, { depth: null });

// In a production environment, always call the API from the backend;
// otherwise you will run into CORS problems

# http client; see https://docs.python-requests.org/en/latest/
import requests

BASE_URL = 'https://media-api.geneea.com/v2/'
HEADERS = {
    'content-type': 'application/json',
    'X-API-Key': '<YOUR_API_KEY>'
}

# Geneea NLP client SDK; see https://help.geneea.com/sdk/index.html
# The SDK can be used for the content analysis (i.e. NLP) part of the Media API
from geneeanlpclient import g3

BASE_URL = 'https://media-api.geneea.com/v2/'
API_KEY = '<YOUR_API_KEY>'

Tags – Basic analysis

To perform a basic analysis of a document and obtain tags (keywords), use the following code:

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
    "id": "1234",
    "title": "Emmanuel Macron in Germany.",
    "text": "Mr. Macron visited a trade show in Munich."
}'

curl -X POST -H "X-API-KEY: <YOUR_API_KEY>" -H "accept: */*" -H "content-type: application/json" "https://media-api.geneea.com/v2/nlp/analyze" -d "{
    \"id\": \"1234\",
    \"title\": \"Emmanuel Macron in Germany.\",
    \"text\": \"Mr. Macron visited a trade show in Munich.\"
}"

const analyze = async (config, input) => {
    const response = await axios.post('nlp/analyze', input, config);
    return response.data;
};

const input = {
    id: '1234',
    title: 'Emmanuel Macron in Germany.',
    text: 'Mr. Macron visited a trade show in Munich.'
}

analyze(config, input).then(report);

def analyze(input):
    return requests.post(f'{BASE_URL}nlp/analyze', json=input, headers=HEADERS).json()

input = {
    'id': '1234',
    'title': 'Emmanuel Macron in Germany.',
    'text': 'Mr. Macron visited a trade show in Munich.'
}

analyze(input)

requestBuilder = g3.Request.Builder()

with g3.Client.create(url=f'{BASE_URL}nlp/analyze') as analyzer:
    analyzer.session.headers.update({'X-API-Key': API_KEY})

    request = requestBuilder.build(id='1234', title='Emmanuel Macron in Germany.', text='Mr. Macron visited a trade show in Munich.')
    result = analyzer.analyze(request)

    print("Entities:")  # there will be no entities, unless they are enabled in your plan and configuration
    for e in result.entities:
        print(f'   {e.type}: {e.stdForm} ({e.gkbId}) relevance: {e.feats.get("relevance")}, derivedBy: {e.feats.get("derivedBy", "N/A")}, derivedOnly: {e.feats.get("derivedOnly", "false")}')

    print("Tags:")
    for t in result.tags:
        print(f'\t{t.type}: {t.stdForm} ({t.gkbId}) relevance: {t.relevance}')

The above code produces a result similar to the following. Depending on your configuration, the result might also contain relations, sentiment, etc. (see below).

{
    "version": "3.3.0",
    "id": "1234",
    "language": {"detected": "en"},
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
        {"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
        {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
    ],
    "usedChars": 100,
    "metadata": {"referenceKey": "241014-164726-9bdaf485"}
}

{
    version: '3.3.0',
    id: '1234',
    language: {detected: 'en'},
    tags: [
        {id: 't1', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'media', relevance: 96.0, feats: {wikidataId: 'Q3052772', gkbEntityType: 'person'}},
        {id: 't2', gkbId: 'G183', stdForm: 'Germany', type: 'media', relevance: 94.0, feats: {wikidataId: 'Q183', gkbEntityType: 'location'}},
        {id: 't3', gkbId: 'G1726', stdForm: 'Munich', type: 'media', relevance: 66.0, feats: {wikidataId: 'Q1726', gkbEntityType: 'location'}},
        {id: 't4', gkbId: 'IPTC-11000000', stdForm: 'politics', type: 'media-topic', relevance: 68.51, feats: {MediaTopicId: '11000000', wikidataId: 'Q7163', gkbEntityType: 'general'}}
    ],
    usedChars: 100,
    metadata: {referenceKey: '241014-164726-9bdaf485'},
}

{
    'version': '3.3.0',
    'id': '1234',
    'language': {'detected': 'en'},
    'tags': [
        {'id': 't1', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'media', 'relevance': 96.0, 'feats': {'wikidataId': 'Q3052772', 'gkbEntityType': 'person'}},
        {'id': 't2', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'media', 'relevance': 94.0, 'feats': {'wikidataId': 'Q183', 'gkbEntityType': 'location'}},
        {'id': 't3', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'media', 'relevance': 66.0, 'feats': {'wikidataId': 'Q1726', 'gkbEntityType': 'location'}},
        {'id': 't4', 'gkbId': 'IPTC-11000000', 'stdForm': 'politics', 'type': 'media-topic', 'relevance': 68.51, 'feats': {'MediaTopicId': '11000000', 'wikidataId': 'Q7163', 'gkbEntityType': 'general'}}
    ],
    'usedChars': 100,
    'metadata': {'referenceKey': '241014-164726-9bdaf485'},
}

Entities: # there are no entities here, unless they are enabled in your plan and configuration
Tags:
    media: Emmanuel Macron (G3052772) relevance: 96.0
    media: Germany (G183) relevance: 94.0
    media: Munich (G1726) relevance: 66.0
    media-topic: politics (IPTC-11000000) relevance: 68.51

In this case, we see two types of tags:

entity-based tags ("type": "media"): this is a selection of the most important entities, names (e.g., organizations, cities) and keywords (see here).
IPTC media topics ("type": "media-topic"): an industry-standard taxonomy used for categorizing articles by content. The current version consists of over 1200 categories organized into a hierarchy of up to 5 levels. The above result contains politics; other examples are sport, basketball, music, and classical music. For more detail, see this article.

Each tag has

A unique identifier (e.g., "gkbId": "G183") that links it to our knowledge base.
A standard name in one of the supported languages (e.g., "stdForm": "Germany"); see Presentation Language below
A relevance score between 0 and 100 (e.g., "relevance": 94.0), which indicates the tag's importance with respect to both the article and the customer’s needs. This is distinct from entity relevance, which considers only the article itself.
Third-party identifiers, such as Wikidata IDs or IPTC media topic codes (e.g., the wikidataId and MediaTopicId features)
The type of the corresponding knowledge base item (the gkbEntityType feature: person, organization, location, event, product, or general)
An internal identifier (e.g., "id": "t2") used for cross-referencing in more complex configurations.
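Both kinds of tags arrive in the same tags list, so client code typically separates them before display. The following Python sketch works on the sample response shown above, trimmed to the fields it uses. The helper name split_tags and the relevance threshold of 50 are illustrative assumptions, not part of the API.

```python
# Sample response from the basic analysis example above, trimmed to the fields used here.
response = {
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0,
         "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
        {"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0,
         "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
        {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0,
         "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51,
         "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}},
    ]
}


def split_tags(response, min_relevance=0.0):
    """Hypothetical helper: split tags into entity-based tags and IPTC media topics.

    Only tags with relevance >= min_relevance are kept, and each list is sorted
    by decreasing relevance. Tags of any other type are ignored.
    """
    entity_tags, media_topics = [], []
    for tag in response.get("tags", []):
        if tag["relevance"] < min_relevance:
            continue
        if tag["type"] == "media":
            entity_tags.append(tag)
        elif tag["type"] == "media-topic":
            media_topics.append(tag)
    entity_tags.sort(key=lambda t: t["relevance"], reverse=True)
    media_topics.sort(key=lambda t: t["relevance"], reverse=True)
    return entity_tags, media_topics


# The threshold of 50 is an arbitrary example value.
entity_tags, media_topics = split_tags(response, min_relevance=50)
for tag in entity_tags:
    feats = tag["feats"]
    print(f'{tag["stdForm"]} [{feats["gkbEntityType"]}, {feats.get("wikidataId")}] {tag["relevance"]}')
for tag in media_topics:
    print(f'IPTC {tag["feats"]["MediaTopicId"]}: {tag["stdForm"]} {tag["relevance"]}')
```

Unlike the string-valued features, the top-level relevance of a tag is already a number, so it can be compared directly.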

Entities, sentiment, etc.

The exact set of returned features depends on your account plan and configuration. Above, the result contains just tags, which is the most common situation. However, other information can be included as well (entities, relations, document sentiment), for example:

{
    "version": "3.3.0",
    "id": "1234",
    "language": {"detected": "en"},
    "entities": [
        {"id": "e0", "gkbId": "G57305", "stdForm": "trade fair", "type": "general", "feats": {"relevance": "11", "ranking": "11"}},
        {"id": "e1", "gkbId": "G183", "stdForm": "Germany", "type": "location", "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
        {"id": "e2", "gkbId": "G1726", "stdForm": "Munich", "type": "location", "feats": {"derivedBy": "city", "relevance": "66", "ranking": "66"}},
        {"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person", "feats": {"relevance": "96", "ranking": "96"}},
        {"id": "e4", "gkbId": "G980", "stdForm": "Bavaria", "type": "location", "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
        {"id": "e5", "gkbId": "G10562", "stdForm": "Upper Bavaria", "type": "location", "feats": {"derivedBy": "district", "derivedOnly": "true", "relevance": "41", "ranking": "41"}}
    ],
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
        {"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
        {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
    ],
    "usedChars": 100,
    "metadata": {"referenceKey": "241014-164726-9bdaf485"}
}

{
    version: '3.3.0',
    id: '1234',
    language: {detected: 'en'},
    entities: [
        {id: 'e0', gkbId: 'G57305', stdForm: 'trade fair', type: 'general', feats: {relevance: '11', ranking: '11'}},
        {id: 'e1', gkbId: 'G183', stdForm: 'Germany', type: 'location', feats: {derivedBy: 'country', relevance: '94', ranking: '94'}},
        {id: 'e2', gkbId: 'G1726', stdForm: 'Munich', type: 'location', feats: {derivedBy: 'city', relevance: '66', ranking: '66'}},
        {id: 'e3', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'person', feats: {relevance: '96', ranking: '96'}},
        {id: 'e4', gkbId: 'G980', stdForm: 'Bavaria', type: 'location', feats: {derivedBy: 'region', derivedOnly: 'true', relevance: '42', ranking: '42'}},
        {id: 'e5', gkbId: 'G10562', stdForm: 'Upper Bavaria', type: 'location', feats: {derivedBy: 'district', derivedOnly: 'true', relevance: '41', ranking: '41'}}
    ],
    tags: [
        {id: 't1', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'media', relevance: 96.0, feats: {wikidataId: 'Q3052772', gkbEntityType: 'person'}},
        {id: 't2', gkbId: 'G183', stdForm: 'Germany', type: 'media', relevance: 94.0, feats: {wikidataId: 'Q183', gkbEntityType: 'location'}},
        {id: 't3', gkbId: 'G1726', stdForm: 'Munich', type: 'media', relevance: 66.0, feats: {wikidataId: 'Q1726', gkbEntityType: 'location'}},
        {id: 't4', gkbId: 'IPTC-11000000', stdForm: 'politics', type: 'media-topic', relevance: 68.51, feats: {MediaTopicId: '11000000', wikidataId: 'Q7163', gkbEntityType: 'general'}}
    ],
    usedChars: 100,
    metadata: {referenceKey: '241014-164726-9bdaf485'},
}

{
    'version': '3.3.0',
    'id': '1234',
    'language': {'detected': 'en'},
    'entities': [
        {'id': 'e0', 'gkbId': 'G57305', 'stdForm': 'trade fair', 'type': 'general', 'feats': {'relevance': '11', 'ranking': '11'}},
        {'id': 'e1', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'location', 'feats': {'derivedBy': 'country', 'relevance': '94', 'ranking': '94'}},
        {'id': 'e2', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'location', 'feats': {'derivedBy': 'city', 'relevance': '66', 'ranking': '66'}},
        {'id': 'e3', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'person', 'feats': {'relevance': '96', 'ranking': '96'}},
        {'id': 'e4', 'gkbId': 'G980', 'stdForm': 'Bavaria', 'type': 'location', 'feats': {'derivedBy': 'region', 'derivedOnly': 'true', 'relevance': '42', 'ranking': '42'}},
        {'id': 'e5', 'gkbId': 'G10562', 'stdForm': 'Upper Bavaria', 'type': 'location', 'feats': {'derivedBy': 'district', 'derivedOnly': 'true', 'relevance': '41', 'ranking': '41'}}
    ],
    'tags': [
        {'id': 't1', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'media', 'relevance': 96.0, 'feats': {'wikidataId': 'Q3052772', 'gkbEntityType': 'person'}},
        {'id': 't2', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'media', 'relevance': 94.0, 'feats': {'wikidataId': 'Q183', 'gkbEntityType': 'location'}},
        {'id': 't3', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'media', 'relevance': 66.0, 'feats': {'wikidataId': 'Q1726', 'gkbEntityType': 'location'}},
        {'id': 't4', 'gkbId': 'IPTC-11000000', 'stdForm': 'politics', 'type': 'media-topic', 'relevance': 68.51, 'feats': {'MediaTopicId': '11000000', 'wikidataId': 'Q7163', 'gkbEntityType': 'general'}}
    ],
    'usedChars': 100,
    'metadata': {'referenceKey': '241014-164726-9bdaf485'},
}

Entities:
    general: trade fair (G57305) relevance: 11
    location: Germany (G183) relevance: 94
    location: Munich (G1726) relevance: 66
    person: Emmanuel Macron (G3052772) relevance: 96
    location: Bavaria (G980) relevance: 42
    location: Upper Bavaria (G10562) relevance: 41
Tags:
    media: Emmanuel Macron (G3052772) relevance: 96.0
    media: Germany (G183) relevance: 94.0
    media: Munich (G1726) relevance: 66.0
    media-topic: politics (IPTC-11000000) relevance: 68.51

In addition to tags, we now also receive entities. Here are some important points to note:

The entity-based tags ("type": "media") correspond to a subset of the entities. In this setup, the relevance of each tag equals the relevance of the corresponding entity, so these tags can be viewed as the most relevant entities. However, this is not always the case: entity relevance is determined solely by the content of the article, while tag relevance may also be influenced by other factors. Although tags can be viewed as the top N entities, the relevance of specific types of tags, or even of individual tags, can be adjusted based on their context. For instance, locations may be more relevant in a travel section than in a sports section.
Some entities are classified as derived entities. For example, the state of Bavaria and the region of Upper Bavaria are not mentioned explicitly, but they are included because the text mentions Munich. The Germany entity combines both explicit and implicit mentions: it is stated directly and also implied by Munich.
Certain information is encoded as features (e.g., relevance, nature of derived entities). These features are represented as key-value pairs, where both the keys and values are always strings. If a feature has a different semantic type (e.g., relevance is a number), it must be converted.
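The points above can be illustrated with a short Python sketch over the sample response, trimmed to the fields it uses. It converts the string-valued features, separates explicitly mentioned entities from derived-only ones, and pairs entity-based tags with their entities through the shared gkbId. The helper parse_entity and the default relevance of 0 for a missing feature are our own illustrative assumptions.

```python
# Sample response from the example above, trimmed to the fields used here.
response = {
    "entities": [
        {"id": "e0", "gkbId": "G57305", "stdForm": "trade fair", "type": "general",
         "feats": {"relevance": "11", "ranking": "11"}},
        {"id": "e1", "gkbId": "G183", "stdForm": "Germany", "type": "location",
         "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
        {"id": "e2", "gkbId": "G1726", "stdForm": "Munich", "type": "location",
         "feats": {"derivedBy": "city", "relevance": "66", "ranking": "66"}},
        {"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person",
         "feats": {"relevance": "96", "ranking": "96"}},
        {"id": "e4", "gkbId": "G980", "stdForm": "Bavaria", "type": "location",
         "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
        {"id": "e5", "gkbId": "G10562", "stdForm": "Upper Bavaria", "type": "location",
         "feats": {"derivedBy": "district", "derivedOnly": "true", "relevance": "41", "ranking": "41"}},
    ],
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0},
        {"id": "t2", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 94.0},
        {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "politics", "type": "media-topic", "relevance": 68.51},
    ],
}


def parse_entity(entity):
    """Hypothetical helper: copy an entity, converting its string-valued features."""
    feats = entity.get("feats", {})
    return {
        "gkbId": entity["gkbId"],
        "stdForm": entity["stdForm"],
        "type": entity["type"],
        # Feature values are always strings, so numbers and booleans are converted here.
        "relevance": float(feats.get("relevance", "0")),  # 0 if missing (our assumption)
        "derivedBy": feats.get("derivedBy"),  # None when the feature is absent
        "derivedOnly": feats.get("derivedOnly", "false") == "true",
    }


entities = [parse_entity(e) for e in response.get("entities", [])]

# Entities mentioned in the text vs. entities that are only implied (e.g. by a city).
explicit = [e["stdForm"] for e in entities if not e["derivedOnly"]]
derived_only = [e["stdForm"] for e in entities if e["derivedOnly"]]

# Entity-based tags share their gkbId with an entity; in this sample,
# the media-topic tag has no matching entity.
by_gkb_id = {e["gkbId"]: e for e in entities}
tag_vs_entity = {}
for tag in response.get("tags", []):
    entity = by_gkb_id.get(tag["gkbId"])
    tag_vs_entity[tag["stdForm"]] = (tag["relevance"], entity["relevance"] if entity else None)

for name, (tag_rel, entity_rel) in tag_vs_entity.items():
    print(f"{name}: tag relevance {tag_rel}, entity relevance {entity_rel}")
```

In this configuration the paired relevances are equal. As noted above, this does not hold in general, so code should not rely on it.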

Paragraphs

The API and the SDKs support easy specification of the title and body of an article. To specify other types of paragraphs (e.g., the lead paragraph) or multiple text paragraphs, it is necessary to use the paraSpecs field. Currently, the standard public API distinguishes title, abstract (lead) and text (body) paragraph types.

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
    "id": "1234",
    "paraSpecs": [
        {"type": "title", "text": "Macron in Germany."},
        {"type": "abstract", "text": "Emmanuel Macron is visiting Germany again."},
        {"type": "text", "text": "Mr. Macron visited a trade show in Munich."}
    ]
}'

curl -X POST -H "X-API-KEY: <YOUR_API_KEY>" -H "accept: */*" -H "content-type: application/json" "https://media-api.geneea.com/v2/nlp/analyze" -d "{
    \"id\": \"1234\",
    \"paraSpecs\": [
        {\"type\": \"title\", \"text\": \"Macron in Germany.\"},
        {\"type\": \"abstract\", \"text\": \"Emmanuel Macron is visiting Germany again.\"},
        {\"type\": \"text\", \"text\": \"Mr. Macron visited a trade show in Munich.\"}
    ]
}"

const input = {
    id: '1234',
    paraSpecs: [
        {type: 'title', text: 'Macron in Germany.'},
        {type: 'abstract', text: 'Emmanuel Macron is visiting Germany again.'},
        {type: 'text', text: 'Mr. Macron visited a trade show in Munich.'}
    ]
}

// see the definition of analyze above
analyze(config, input).then(report);

input = {
    'id': '1234',
    'paraSpecs': [
        {'type': 'title', 'text': 'Macron in Germany.'},
        {'type': 'abstract', 'text': 'Emmanuel Macron is visiting Germany again.'},
        {'type': 'text', 'text': 'Mr. Macron visited a trade show in Munich.'}
    ]
}

# see the definition of analyze above
analyze(input)

requestBuilder = g3.Request.Builder()

with g3.Client.create(url=f'{BASE_URL}nlp/analyze') as analyzer:
    analyzer.session.headers.update({'X-API-Key': API_KEY})

    request = requestBuilder.build(
        id='1234',
        paraSpecs=[
            g3.ParaSpec.title('Macron in Germany.'),
            g3.ParaSpec.lead('Emmanuel Macron is visiting Germany again.'),     # g3.ParaSpec.abstract is equivalent
            g3.ParaSpec.body('Mr. Macron visited a trade show in Munich.')
        ]
    )
    result = analyzer.analyze(request)
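When the article body comes from a CMS as a list of paragraphs, a small helper can assemble paraSpecs for the plain REST call. This Python sketch assumes that each body paragraph may be sent as its own "text" entry, which is how we read the mention of multiple text paragraphs above. The name build_para_specs is hypothetical; it is not part of the API or the SDK.

```python
def build_para_specs(title, body_paragraphs, lead=None):
    """Hypothetical helper: build the paraSpecs list for an nlp/analyze request.

    Uses the paragraph types of the public API: "title", "abstract" (lead)
    and "text" (body), with one "text" entry per non-empty body paragraph.
    """
    specs = [{"type": "title", "text": title}]
    if lead:
        specs.append({"type": "abstract", "text": lead})
    specs.extend({"type": "text", "text": p} for p in body_paragraphs if p.strip())
    return specs


payload = {
    "id": "1234",
    "paraSpecs": build_para_specs(
        title="Macron in Germany.",
        lead="Emmanuel Macron is visiting Germany again.",
        body_paragraphs=[
            "Mr. Macron visited a trade show in Munich.",
            "",  # empty paragraphs (e.g. from splitting on blank lines) are dropped
            "A second, purely illustrative body paragraph.",
        ],
    ),
}
# payload can then be passed to the analyze() function defined in the Python example.
print([spec["type"] for spec in payload["paraSpecs"]])
```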

Topic categories (sections)

Often, the topic of the article is known before the analysis; for example, the article is published in a particular section of the website (e.g., sport, hobby). Providing this information is optional, because the article topic is always detected automatically as part of the analysis. However, when it is available, it further improves the quality of the results. We support two types of topic categories:

standard IPTC media topics, and
custom categories/sections of the publisher. The custom categories have to be configured on our side to have any effect.

These two types can even be combined, as you can see in the example below:

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
    "id": "1234",
    "title": "Emmanuel Macron in Germany.",
    "text": "Mr. Macron visited a trade show in Munich.",
    "categories": [{"taxonomy": "MediaTopic", "code": "11000000"}, {"taxonomy": "Custom", "code": "politics"}]
}'

curl -X POST -H "X-API-KEY: <YOUR_API_KEY>" -H "accept: */*" -H "content-type: application/json" "https://media-api.geneea.com/v2/nlp/analyze" -d "{
    \"id\": \"1234\",
    \"title\": \"Emmanuel Macron in Germany.\",
    \"text\": \"Mr. Macron visited a trade show in Munich.\",
    \"categories\": [{\"taxonomy\": \"MediaTopic\", \"code\": \"11000000\"}, {\"taxonomy\": \"Custom\", \"code\": \"politics\"}]
}"

const categories = [
    {taxonomy: 'MediaTopic', code: '11000000'}, // IPTC category
    {taxonomy: 'Custom', code: 'politics'} // custom category
]

const input = {
    id: '1234',
    title: 'Emmanuel Macron in Germany.',
    text: 'Mr. Macron visited a trade show in Munich.',
    categories: categories
}

// see the definition of analyze above
analyze(config, input).then(report);

categories = [
    {'taxonomy': 'MediaTopic', 'code': '11000000'}, # IPTC category
    {'taxonomy': 'Custom', 'code': 'politics'}, # custom category
]

input = {
    'id': '1234',
    'title': 'Emmanuel Macron in Germany.',
    'text': 'Mr. Macron visited a trade show in Munich.',
    'categories': categories,
}

# see the definition of analyze above
analyze(input)

requestBuilder = g3.Request.Builder()

with g3.Client.create(url=f'{BASE_URL}nlp/analyze') as analyzer:
    analyzer.session.headers.update({'X-API-Key': API_KEY})

    categories = [
        {'taxonomy': 'MediaTopic', 'code': '11000000'}, # IPTC category
        {'taxonomy': 'Custom', 'code': 'politics'}, # custom category
    ]
    request = requestBuilder.build(
        id='1234',
        title='Emmanuel Macron in Germany.',
        text='Mr. Macron visited a trade show in Munich.'
    )
    request.setCustomConfig(categories=categories)

    result = analyzer.analyze(request)
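If section data comes from a CMS, a small Python helper can assemble the categories list and catch malformed IPTC codes before the request is sent. The name build_categories is hypothetical. The eight-digit check reflects the format of IPTC Media Topic codes such as 11000000 used above; this documentation does not describe the API's own validation rules.

```python
import re

# IPTC Media Topic codes are eight-digit strings such as "11000000" (politics).
MEDIA_TOPIC_CODE = re.compile(r"[0-9]{8}")


def build_categories(media_topic_codes=(), custom_codes=()):
    """Hypothetical helper: build the "categories" list of an nlp/analyze request."""
    categories = []
    for code in media_topic_codes:
        code = str(code)
        if not MEDIA_TOPIC_CODE.fullmatch(code):
            raise ValueError(f"not an IPTC Media Topic code: {code!r}")
        categories.append({"taxonomy": "MediaTopic", "code": code})
    # Custom codes are passed through unchanged; they only have an effect
    # if the corresponding sections are configured on the Geneea side.
    categories.extend({"taxonomy": "Custom", "code": str(code)} for code in custom_codes)
    return categories


payload = {
    "id": "1234",
    "title": "Emmanuel Macron in Germany.",
    "text": "Mr. Macron visited a trade show in Munich.",
    "categories": build_categories(media_topic_codes=["11000000"], custom_codes=["politics"]),
}
# payload can then be passed to the analyze() function defined in the Python example.
print(payload["categories"])
```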

Presentation Language

Above, the entities and tags were reported in the language of the document, i.e., English. However, we can also request them in other languages (currently Czech, Dutch, English, French, German, Polish, Portuguese, Slovak, and Spanish are supported) by setting the presentationLanguage parameter to the ISO code of the desired language (e.g., fr for French):

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
    "id": "1234",
    "title": "Emmanuel Macron in Germany.",
    "text": "Mr. Macron visited a trade show in Munich.",
    "presentationLanguage": "fr"
}'

curl -X POST -H "X-API-KEY: <YOUR_API_KEY>" -H "accept: */*" -H "content-type: application/json" "https://media-api.geneea.com/v2/nlp/analyze" -d "{
    \"id\": \"1234\",
    \"title\": \"Emmanuel Macron in Germany.\",
    \"text\": \"Mr. Macron visited a trade show in Munich.\",
    \"presentationLanguage\": \"fr\"
}"

const analyze = async (config, input) => {
    const response = await axios.post('nlp/analyze', input, config);
    return response.data;
};

const input = {
    id: '1234',
    title: 'Emmanuel Macron in Germany.',
    text: 'Mr. Macron visited a trade show in Munich.',
    presentationLanguage: 'fr'
}

analyze(config, input).then(report);

def analyze(input):
    return requests.post(f'{BASE_URL}nlp/analyze', json=input, headers=HEADERS).json()

input = {
    'id': '1234',
    'title': 'Emmanuel Macron in Germany.',
    'text': 'Mr. Macron visited a trade show in Munich.',
    'presentationLanguage': 'fr'
}

analyze(input)

requestBuilder = g3.Request.Builder(customConfig={'presentationLanguage': 'fr'})

with g3.Client.create(url=f'{BASE_URL}nlp/analyze') as analyzer:
    analyzer.session.headers.update({'X-API-Key': API_KEY})

    request = requestBuilder.build(id='1234', title='Emmanuel Macron in Germany.', text='Mr. Macron visited a trade show in Munich.')
    result = analyzer.analyze(request)

    print("Entities:")
    for e in result.entities:
        print(f'   {e.type}: {e.stdForm} ({e.gkbId}) relevance: {e.feats.get("relevance")}, derivedBy: {e.feats.get("derivedBy", "N/A")}, derivedOnly: {e.feats.get("derivedOnly", "false")}')

    print("Tags:")
    for t in result.tags:
        print(f'   {t.type}: {t.stdForm} ({t.gkbId}) relevance: {t.relevance}')

The code above produces the following result (see the Analysis reference page for an explanation; note that we have omitted the relations field for simplicity).

{
    "version": "3.3.0",
    "id": "1234",
    "language": {"detected": "en"},
    "entities": [
        {"id": "e0", "gkbId": "G57305", "stdForm": "salon", "type": "general", "feats": {"relevance": "11", "ranking": "11"}},
        {"id": "e1", "gkbId": "G183", "stdForm": "Allemagne", "type": "location", "feats": {"derivedBy": "country", "relevance": "94", "ranking": "94"}},
        {"id": "e2", "gkbId": "G1726", "stdForm": "Munich", "type": "location", "feats": {"derivedBy": "city", "relevance": "66", "ranking": "66"}},
        {"id": "e3", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "person", "feats": {"relevance": "96", "ranking": "96"}},
        {"id": "e4", "gkbId": "G980", "stdForm": "Bavière", "type": "location", "feats": {"derivedBy": "region", "derivedOnly": "true", "relevance": "42", "ranking": "42"}},
        {"id": "e5", "gkbId": "G10562", "stdForm": "Haute-Bavière", "type": "location", "feats": {"derivedBy": "district", "derivedOnly": "true", "relevance": "41", "ranking": "41"}}
    ],
    "tags": [
        {"id": "t1", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 96.0, "feats": {"wikidataId": "Q3052772", "gkbEntityType": "person"}},
        {"id": "t2", "gkbId": "G183", "stdForm": "Allemagne", "type": "media", "relevance": 94.0, "feats": {"wikidataId": "Q183", "gkbEntityType": "location"}},
        {"id": "t3", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 66.0, "feats": {"wikidataId": "Q1726", "gkbEntityType": "location"}},
        {"id": "t4", "gkbId": "IPTC-11000000", "stdForm": "Politique", "type": "media-topic", "relevance": 68.51, "feats": {"MediaTopicId": "11000000", "wikidataId": "Q7163", "gkbEntityType": "general"}}
    ],
    "usedChars": 100,
    "metadata": {"referenceKey": "241014-164726-ab2eaf07"}
}

{
    version: '3.3.0',
    id: '1234',
    language: {detected: 'en'},
    entities: [
        {id: 'e0', gkbId: 'G57305', stdForm: 'salon', type: 'general', feats: {relevance: '11', ranking: '11'}},
        {id: 'e1', gkbId: 'G183', stdForm: 'Allemagne', type: 'location', feats: {derivedBy: 'country', relevance: '94', ranking: '94'}},
        {id: 'e2', gkbId: 'G1726', stdForm: 'Munich', type: 'location', feats: {derivedBy: 'city', relevance: '66', ranking: '66'}},
        {id: 'e3', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'person', feats: {relevance: '96', ranking: '96'}},
        {id: 'e4', gkbId: 'G980', stdForm: 'Bavière', type: 'location', feats: {derivedBy: 'region', derivedOnly: 'true', relevance: '42', ranking: '42'}},
        {id: 'e5', gkbId: 'G10562', stdForm: 'Haute-Bavière', type: 'location', feats: {derivedBy: 'district', derivedOnly: 'true', relevance: '41', ranking: '41'}}
    ],
    tags: [
        {id: 't1', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'media', relevance: 96.0, feats: {wikidataId: 'Q3052772', gkbEntityType: 'person'}},
        {id: 't2', gkbId: 'G183', stdForm: 'Allemagne', type: 'media', relevance: 94.0, feats: {wikidataId: 'Q183', gkbEntityType: 'location'}},
        {id: 't3', gkbId: 'G1726', stdForm: 'Munich', type: 'media', relevance: 66.0, feats: {wikidataId: 'Q1726', gkbEntityType: 'location'}},
        {id: 't4', gkbId: 'IPTC-11000000', stdForm: 'Politique', type: 'media-topic', relevance: 68.51, feats: {MediaTopicId: '11000000', wikidataId: 'Q7163', gkbEntityType: 'general'}}
    ],
    usedChars: 100,
    metadata: {referenceKey: '241014-164726-ab2eaf07'},
}

{
    'version': '3.3.0',
    'id': '1234',
    'language': {'detected': 'en'},
    'entities': [
        {'id': 'e0', 'gkbId': 'G57305', 'stdForm': 'salon', 'type': 'general', 'feats': {'relevance': '11', 'ranking': '11'}},
        {'id': 'e1', 'gkbId': 'G183', 'stdForm': 'Allemagne', 'type': 'location', 'feats': {'derivedBy': 'country', 'relevance': '94', 'ranking': '94'}},
        {'id': 'e2', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'location', 'feats': {'derivedBy': 'city', 'relevance': '66', 'ranking': '66'}},
        {'id': 'e3', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'person', 'feats': {'relevance': '96', 'ranking': '96'}},
        {'id': 'e4', 'gkbId': 'G980', 'stdForm': 'Bavière', 'type': 'location', 'feats': {'derivedBy': 'region', 'derivedOnly': 'true', 'relevance': '42', 'ranking': '42'}},
        {'id': 'e5', 'gkbId': 'G10562', 'stdForm': 'Haute-Bavière', 'type': 'location', 'feats': {'derivedBy': 'district', 'derivedOnly': 'true', 'relevance': '41', 'ranking': '41'}}
    ],
    'tags': [
        {'id': 't1', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'media', 'relevance': 96.0, 'feats': {'wikidataId': 'Q3052772', 'gkbEntityType': 'person'}},
        {'id': 't2', 'gkbId': 'G183', 'stdForm': 'Allemagne', 'type': 'media', 'relevance': 94.0, 'feats': {'wikidataId': 'Q183', 'gkbEntityType': 'location'}},
        {'id': 't3', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'media', 'relevance': 66.0, 'feats': {'wikidataId': 'Q1726', 'gkbEntityType': 'location'}},
        {'id': 't4', 'gkbId': 'IPTC-11000000', 'stdForm': 'Politique', 'type': 'media-topic', 'relevance': 68.51, 'feats': {'MediaTopicId': '11000000', 'wikidataId': 'Q7163', 'gkbEntityType': 'general'}}
    ],
    'usedChars': 100,
    'metadata': {'referenceKey': '241014-164726-ab2eaf07'},
}

Entities:  # returned only if entities are enabled in your plan and configuration
    general: salon (G57305) relevance: 11
    location: Allemagne (G183) relevance: 94
    location: Munich (G1726) relevance: 66
    person: Emmanuel Macron (G3052772) relevance: 96
    location: Bavière (G980) relevance: 42
    location: Haute-Bavière (G10562) relevance: 41
Tags:
    media: Emmanuel Macron (G3052772) relevance: 96.0
    media: Allemagne (G183) relevance: 94.0
    media: Munich (G1726) relevance: 66.0
    media-topic: Politique (IPTC-11000000) relevance: 68.51

If you need the entities and tags translated into multiple languages, see Multiple Presentation Languages.
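The responses above score tags and entities differently. Tags carry a numeric top-level `relevance`. For entities, the relevance is a string inside `feats` (e.g. `'94'`). Two entities, Bavière and Haute-Bavière, are marked `derivedOnly: 'true'`, and neither name appears in the article text. This suggests they were derived from the mention of Munich. The sketch below works on the plain-JSON response returned by the `requests` variant of `analyze`. It converts both scores to numbers and ranks the items. The helper names, and the choice to skip derived-only entities by default, are our own assumptions, not API features.

```python
from typing import Any, Dict, List, Tuple

Ranked = List[Tuple[str, str, float]]  # (type, stdForm, relevance)


def ranked_entities(response: Dict[str, Any], include_derived: bool = False) -> Ranked:
    """Entities sorted by relevance, highest first.

    Entity relevance is a string inside `feats`, so it is converted to float here.
    Entities marked derivedOnly='true' are skipped unless include_derived is True
    (a hypothetical filtering choice, not an API option).
    """
    ranked = []
    for e in response.get('entities', []):
        feats = e.get('feats', {})
        if feats.get('derivedOnly') == 'true' and not include_derived:
            continue
        ranked.append((e['type'], e['stdForm'], float(feats.get('relevance', '0'))))
    return sorted(ranked, key=lambda item: item[2], reverse=True)


def ranked_tags(response: Dict[str, Any]) -> Ranked:
    """Tags sorted by relevance, highest first (tag relevance is already numeric)."""
    ranked = [(t['type'], t['stdForm'], float(t['relevance'])) for t in response.get('tags', [])]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# A trimmed copy of the response shown above
response = {
    'entities': [
        {'id': 'e1', 'gkbId': 'G183', 'stdForm': 'Allemagne', 'type': 'location',
         'feats': {'derivedBy': 'country', 'relevance': '94', 'ranking': '94'}},
        {'id': 'e3', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'person',
         'feats': {'relevance': '96', 'ranking': '96'}},
        {'id': 'e4', 'gkbId': 'G980', 'stdForm': 'Bavière', 'type': 'location',
         'feats': {'derivedBy': 'region', 'derivedOnly': 'true', 'relevance': '42', 'ranking': '42'}},
    ],
    'tags': [
        {'id': 't3', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'media', 'relevance': 66.0},
        {'id': 't4', 'gkbId': 'IPTC-11000000', 'stdForm': 'Politique', 'type': 'media-topic', 'relevance': 68.51},
    ],
}

for kind, name, rel in ranked_entities(response):
    print(f'{kind}: {name} relevance: {rel}')
for kind, name, rel in ranked_tags(response):
    print(f'{kind}: {name} relevance: {rel}')
```

Keeping derived-only entities (`include_derived=True`) can still be useful, for example to file an article under its region.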

Knowledge base properties

Knowledge base properties can be returned along with tags and entities. The exact set of properties is configurable; the example below returns the description of each tag and entity.

A knowledge base (GKB) property has three types of attributes:

name: a language-independent identifier. There might be multiple properties with the same name (e.g., multiple occupations).
label: a human-readable label of the property in the presentation language of the analysis
boolValue/floatValue/intValue/strValue: the value of the property. Exactly one of these attributes is non-empty.

If a given property does not exist for a particular tag or entity, it is not returned at all.
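The same rules can be applied by hand when the response is consumed as plain JSON, for example through the `requests` variant of `analyze`:

- pick whichever typed value attribute is present;
- group properties by `name`, since a name can repeat;
- treat a missing `gkbProperties` list as "no properties".

This is a minimal sketch. The helper names are hypothetical, and it assumes that value attributes which are not set are either omitted from the JSON or `null`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Union

Value = Union[bool, float, int, str]
VALUE_KEYS = ('boolValue', 'floatValue', 'intValue', 'strValue')


def property_value(prop: Dict[str, Any]) -> Optional[Value]:
    """Return the single non-empty typed value of a GKB property (None if none is set)."""
    for key in VALUE_KEYS:
        # `is not None` keeps legitimate falsy values such as False or 0
        if prop.get(key) is not None:
            return prop[key]
    return None


def properties_by_name(item: Dict[str, Any]) -> Dict[str, List[Value]]:
    """Group the GKB property values of a tag or entity by property name.

    Properties that do not exist for the item are simply absent from the result,
    mirroring the API, which omits them instead of returning empty values.
    """
    grouped: Dict[str, List[Value]] = defaultdict(list)
    for prop in item.get('gkbProperties', []):
        v = property_value(prop)
        if v is not None:
            grouped[prop['name']].append(v)
    return dict(grouped)


tag = {'id': 't1', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'media', 'relevance': 18.365,
       'gkbProperties': [{'name': 'description', 'label': 'description',
                          'strValue': 'country in Central Europe'}]}

print(properties_by_name(tag))           # {'description': ['country in Central Europe']}
print(properties_by_name({'id': 't9'}))  # {}
```

Each name maps to a list of values, so a property that occurs several times (such as multiple occupations) keeps all of its values.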

curl -X POST -H 'X-API-KEY: <YOUR_API_KEY>' -H 'accept: */*' -H 'content-type: application/json' 'https://media-api.geneea.com/v2/nlp/analyze' -d '{
    "id": "1234",
    "title": "Emmanuel Macron in Germany.",
    "text": "Mr. Macron visited a trade show in Munich."
}'

curl -X POST -H "X-API-KEY: <YOUR_API_KEY>" -H "accept: */*" -H "content-type: application/json" "https://media-api.geneea.com/v2/nlp/analyze" -d "{
    \"id\": \"1234\",
    \"title\": \"Emmanuel Macron in Germany.\",
    \"text\": \"Mr. Macron visited a trade show in Munich.\"
}"

const analyze = async (config, input) => {
    const response = await axios.post('nlp/analyze', input, config);
    return response.data;
};

const input = {
    id: '1234',
    title: 'Emmanuel Macron in Germany.',
    text: 'Mr. Macron visited a trade show in Munich.'
};

analyze(config, input).then(report);

def analyze(input):
    return requests.post(f'{BASE_URL}nlp/analyze', json=input, headers=HEADERS).json()

input = {
    'id': '1234',
    'title': 'Emmanuel Macron in Germany.',
    'text': 'Mr. Macron visited a trade show in Munich.'
}

analyze(input)

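from typing import Optional, Union

requestBuilder = g3.Request.Builder()

def value(prop: g3.GkbProperty) -> Optional[Union[bool, float, int, str]]:
    # Return whichever typed value attribute is set (exactly one should be non-empty)
    for v in [prop.boolValue, prop.floatValue, prop.intValue, prop.strValue]:
        if v is not None:
            return v
    return None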

with g3.Client.create(url=f'{BASE_URL}nlp/analyze') as analyzer:
    analyzer.session.headers.update({'X-API-Key': API_KEY})

    request = requestBuilder.build(id='1234', title='Emmanuel Macron in Germany.', text='Mr. Macron visited a trade show in Munich.')
    # request.custom['returnGkbProperties'] = False   # optionally disable the feature
    result = analyzer.analyze(request)

    print("Tags:")
    for t in result.tags:
        print(f'\t{t.type}: {t.stdForm} ({t.gkbId}) relevance: {t.relevance}')
        for prop in t.gkbProperties:
            print(f'\t\t{prop.name} - {prop.label}: {value(prop)}')

{
  "version": "3.3.0",
  "id": "1234",
  "language": { "detected": "en" },
  "tags": [
    { "id": "t0", "gkbId": "G3052772", "stdForm": "Emmanuel Macron", "type": "media", "relevance": 22.605,
        "feats": { "wikidataId": "Q3052772" },
        "gkbProperties": [{"name": "description", "label":  "description", "strValue": "President of France and Co-Prince of Andorra since 2017"}]
    },
    { "id": "t1", "gkbId": "G183", "stdForm": "Germany", "type": "media", "relevance": 18.365,
        "feats": { "wikidataId": "Q183" },
        "gkbProperties": [{"name": "description", "label":  "description", "strValue": "country in Central Europe"}]
    },
    { "id": "t2", "gkbId": "G1726", "stdForm": "Munich", "type": "media", "relevance": 7.57,
        "feats": { "wikidataId": "Q1726" },
        "gkbProperties": [{"name": "description", "label":  "description", "strValue": "capital and most populous city of Bavaria, Germany"}]
    }
  ],
  "usedChars": 100,
  "metadata": {"referenceKey": "311441-120020-a24f0281"}
}

{
  version: '3.3.0',
  id: '1234',
  language: { detected: 'en' },
  tags: [
    { id: 't0', gkbId: 'G3052772', stdForm: 'Emmanuel Macron', type: 'media', relevance: 22.605,
        feats: { wikidataId: 'Q3052772' },
        gkbProperties: [{name: 'description', label:  'description', strValue: 'President of France and Co-Prince of Andorra since 2017'}]
    },
    { id: 't1', gkbId: 'G183', stdForm: 'Germany', type: 'media', relevance: 18.365,
        feats: { wikidataId: 'Q183' },
        gkbProperties: [{name: 'description', label:  'description', strValue: 'country in Central Europe'}]
    },
    { id: 't2', gkbId: 'G1726', stdForm: 'Munich', type: 'media', relevance: 7.57,
        feats: { wikidataId: 'Q1726' },
        gkbProperties: [{name: 'description', label:  'description', strValue: 'capital and most populous city of Bavaria, Germany'}]
    }
  ],
  usedChars: 100,
  metadata: {referenceKey: '311441-120020-a24f0281'}
}

{
  'version': '3.3.0',
  'id': '1234',
  'language': { 'detected': 'en' },
  'tags': [
    { 'id': 't0', 'gkbId': 'G3052772', 'stdForm': 'Emmanuel Macron', 'type': 'media', 'relevance': 22.605,
        'feats': { 'wikidataId': 'Q3052772' },
        'gkbProperties': [{'name': 'description', 'label':  'description', 'strValue': 'President of France and Co-Prince of Andorra since 2017'}]
    },
    { 'id': 't1', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'media', 'relevance': 18.365,
        'feats': { 'wikidataId': 'Q183' },
        'gkbProperties': [{'name': 'description', 'label':  'description', 'strValue': 'country in Central Europe'}]
    },
    { 'id': 't2', 'gkbId': 'G1726', 'stdForm': 'Munich', 'type': 'media', 'relevance': 7.57,
        'feats': { 'wikidataId': 'Q1726' },
        'gkbProperties': [{'name': 'description', 'label':  'description', 'strValue': 'capital and most populous city of Bavaria, Germany'}]
    }
  ],
  'usedChars': 100,
  'metadata': {'referenceKey': '311441-120020-a24f0281'}
}

Tags:
    media: Emmanuel Macron (G3052772) relevance: 22.605
        description - description: President of France and Co-Prince of Andorra since 2017
    media: Germany (G183) relevance: 18.365
        description - description: country in Central Europe
    media: Munich (G1726) relevance: 7.57
        description - description: capital and most populous city of Bavaria, Germany
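The SDK listing above can also be produced without the SDK, from the plain JSON returned by the `requests` variant of `analyze`. This minimal sketch assumes the response structure shown above. `format_tags` is a hypothetical helper. It yields the lines rather than printing them, so they are easy to reuse or test.

```python
from typing import Any, Dict, Iterator

VALUE_KEYS = ('boolValue', 'floatValue', 'intValue', 'strValue')


def format_tags(response: Dict[str, Any]) -> Iterator[str]:
    """Yield the same lines as the SDK example: one per tag, then one per GKB property."""
    yield 'Tags:'
    for t in response.get('tags', []):
        yield f"\t{t['type']}: {t['stdForm']} ({t['gkbId']}) relevance: {t['relevance']}"
        # gkbProperties is absent when no requested property exists for the tag
        for prop in t.get('gkbProperties', []):
            value = next((prop[k] for k in VALUE_KEYS if prop.get(k) is not None), None)
            yield f"\t\t{prop['name']} - {prop['label']}: {value}"


# A trimmed copy of the response shown above
response = {
    'tags': [
        {'id': 't1', 'gkbId': 'G183', 'stdForm': 'Germany', 'type': 'media', 'relevance': 18.365,
         'feats': {'wikidataId': 'Q183'},
         'gkbProperties': [{'name': 'description', 'label': 'description',
                            'strValue': 'country in Central Europe'}]},
    ],
}

print('\n'.join(format_tags(response)))
```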