Entities
Entities are meaningful expressions, either named (e.g., organizations, cities) or unnamed (e.g., dates). The exact set of supported entities depends on the selected domain.
Each entity includes:
- Name or standard form – A disambiguated and standardized version of the entity.
For example, we return USA for both USA and United States. We also handle morphology, e.g., returning Německo when the text contains the form Německu. Media API V2 can return the standard form in a specified language (Germany, Deutschland, Německo, etc.).
- ID – A unique identifier for the entity in a knowledge base (available in selected domains).
- Link to Geneea Knowledge Base – If supported by the domain.
- Type – A string indicating the entity type (e.g., person, date). See the list below.
- Instances or mentions – The actual mentions of the entity in the document.
See the Entity object reference for more detail.
Entity Types
The standard media domains support the following entity types:
- Basic:
  - person – John Doe
  - organization – UNESCO, IBM
  - location – London, France
  - product – Skoda Octavia, iPhone 13
  - event – Brexit, World War II
  - general – electric vehicle, trade war
- Internet:
  - url – geneea.com
  - email – info@geneea.com
  - hashtag – #hashtag
  - mention – @mention
- Date and Time: These can be resolved relative to a specific point in time (see referenceDate in the Request). Standard forms follow the TIMEX3 format.
  - date – September 3 (XXXX-09-03 when unresolved), next Monday, summer of 2015 (2015-SU)
  - time – 12:03 (YYYY-MM-DDT12:03), tonight (YYYY-MM-DDTNI)
  - duration – 3 years and 4 days (P3Y4D), 5 minutes (PT5M). Format: P(n)Y(n)M(n)DT(n)H(n)M(n)S
  - set – Recurring time expressions: every Monday (XXXX-WXX-1), semiannual (P6M)
- Numbers:
  - number – 3; five (words supported only in English)
  - ordinal – third (only for English)
  - money – $40
  - percent – 5%
The standard VoC domains support selected named entities, general entities, industry-specific entities (e.g., food items in restaurant reviews), as well as Internet, date, and numeric entities.
Custom domains can also support many additional types, such as colors, modes of transport, food items, economic terms, legal references, product numbers, and more.
We use a combination of machine learning models, rules, and lexicons, all of which are fully customizable.
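Client code often needs the numeric parts of a duration standard form. The following is a minimal sketch, not part of the API or SDK, that parses strings in the P(n)Y(n)M(n)DT(n)H(n)M(n)S layout listed above. The function name parse_duration is hypothetical, and the sketch assumes whole-number values only; other TIMEX3 variants (for example week-based or unspecified values) are not handled.

```python
import re

# Matches the layout P(n)Y(n)M(n)DT(n)H(n)M(n)S; every component is optional.
_DURATION_RE = re.compile(
    r'^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$'
)


def parse_duration(std_form):
    """Split a duration standard form such as 'P3Y4D' into its numeric parts."""
    match = _DURATION_RE.match(std_form)
    if match is None:
        raise ValueError(f'not a recognized duration: {std_form!r}')
    parts = {name: int(value)
             for name, value in match.groupdict().items()
             if value is not None}
    if not parts:
        # Strings like 'P' or 'PT' match the pattern but carry no value.
        raise ValueError(f'empty duration: {std_form!r}')
    return parts


print(parse_duration('P3Y4D'))  # {'years': 3, 'days': 4}
print(parse_duration('PT5M'))   # {'minutes': 5}
print(parse_duration('P6M'))    # {'months': 6}
```

Note that the letter M means months before the T separator and minutes after it, which is why the pattern keeps the date and time parts in separate groups.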
Sample Call
You can obtain entities using the following call:
- cURL
- Python SDK
- plain Python
curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"]
}'
## On Windows, use \" instead of " and " instead of '
from geneeanlpclient import g3
requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES])
with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))
    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')
import requests
def callGeneea(input):
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }
    return requests.post(url, json=input, headers=headers).json()
responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities']
})
print(responseObj)
Expected response:
- cURL
- Python SDK
- plain Python
{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}
2015-SU: date
London: location
Cambridge: location
{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}
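As an illustration of working with such a response, the sketch below groups the returned standard forms by entity type. It uses only the fields shown in the response above (entities, stdForm, type); the helper name group_by_type is hypothetical and not part of the SDK.

```python
from collections import defaultdict

# Response copied from the example above ("language" and "usedChars" omitted).
response = {
    'id': '1',
    'entities': [
        {'id': 'E0', 'stdForm': '2015-SU', 'type': 'date'},
        {'id': 'E1', 'stdForm': 'London', 'type': 'location'},
        {'id': 'E2', 'stdForm': 'Cambridge', 'type': 'location'},
    ],
}


def group_by_type(response_obj):
    """Map each entity type to the list of standard forms in the response."""
    grouped = defaultdict(list)
    for entity in response_obj.get('entities', []):
        grouped[entity['type']].append(entity['stdForm'])
    return dict(grouped)


print(group_by_type(response))
# {'date': ['2015-SU'], 'location': ['London', 'Cambridge']}
```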
Mentions and Highlighting
To retrieve entity mentions, include "returnMentions": "true" in your request.
Mentions include the actual text as it appears and reference the relevant tokens (useful for highlighting).
- cURL
- Python SDK
- plain Python
curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"],
    "returnMentions": "true"
}'
## On Windows, use \" instead of " and " instead of '
from geneeanlpclient import g3
requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES], returnMentions=True)
with g3.Client.create() as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))
    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')
        for m in e.mentions:
            ## charSpan can be used for highlighting in the original text
            print(f'\t{m.text}; {m.mwl}; {m.tokens.charSpan}')
import requests
def callGeneea(input):
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }
    }
    return requests.post(url, json=input, headers=headers).json()
responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities'],
    'returnMentions': True
})
print(responseObj)
The response now includes mentions of individual entities, along with their text and references to the relevant tokens. The full text is automatically split into paragraphs, sentences, and tokens as part of the response.
- cURL
- Python SDK
- plain Python
{
    "id": "1",
    "language": {"detected": "en"},
    "paragraphs": [{
        "id": "P2",
        "type": "BODY",
        "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "sentences": [{
            "id": "s0",
            "tokens": [
                {"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
                {"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
                {"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
                {"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
                {"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
                {"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
                {"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
                {"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
                {"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
        }, {
            "id": "s1",
            "tokens": [
                {"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
                {"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
                {"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
                {"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
                {"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
                {"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
                {"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
            ]
        }]
    }],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
    ],
    "usedChars": 100
}
2015-SU: date
    last summer; last summer; CharSpan(start=19, end=30)
London: location
    London; London; CharSpan(start=12, end=18)
Cambridge: location
    Cambridge; Cambridge; CharSpan(start=55, end=64)
{
    "id": "1",
    "language": {"detected": "en"},
    "paragraphs": [{
        "id": "P2",
        "type": "BODY",
        "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "sentences": [{
            "id": "s0",
            "tokens": [
                {"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
                {"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
                {"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
                {"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
                {"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
                {"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
                {"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
                {"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
                {"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
        }, {
            "id": "s1",
            "tokens": [
                {"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
                {"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
                {"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
                {"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
                {"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
                {"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
                {"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
            ]
        }]
    }],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
    ],
    "usedChars": 100
}
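When working with the raw JSON rather than the SDK, the token references can be resolved to character spans by hand. The sketch below is one possible approach, under two assumptions that the response above suggests but does not state: that off is a character offset into the submitted text, and that mentions do not overlap. The helper names mention_spans and highlight are hypothetical, and the response is trimmed to the tokens that the mentions reference.

```python
text = 'The trip to London last summer was great. I also liked Cambridge a lot. '

# Trimmed copy of the response above: only the referenced tokens are kept.
response = {
    'paragraphs': [{
        'sentences': [
            {'tokens': [
                {'id': 't3', 'off': 12, 'text': 'London'},
                {'id': 't4', 'off': 19, 'text': 'last'},
                {'id': 't5', 'off': 24, 'text': 'summer'},
            ]},
            {'tokens': [
                {'id': 't12', 'off': 55, 'text': 'Cambridge'},
            ]},
        ],
    }],
    'entities': [
        {'id': 'E0', 'type': 'date', 'mentions': [{'tokenIds': ['t4', 't5']}]},
        {'id': 'E1', 'type': 'location', 'mentions': [{'tokenIds': ['t3']}]},
        {'id': 'E2', 'type': 'location', 'mentions': [{'tokenIds': ['t12']}]},
    ],
}


def mention_spans(response_obj):
    """Return sorted (start, end, entity type) tuples, one per mention."""
    # Index tokens by ID: token ID -> (start offset, end offset).
    token_span = {}
    for paragraph in response_obj.get('paragraphs', []):
        for sentence in paragraph['sentences']:
            for token in sentence['tokens']:
                start = token['off']
                token_span[token['id']] = (start, start + len(token['text']))
    spans = []
    for entity in response_obj.get('entities', []):
        for mention in entity.get('mentions', []):
            offsets = [token_span[tid] for tid in mention['tokenIds']]
            # A mention covers everything from its first to its last token.
            start = min(s for s, _ in offsets)
            end = max(e for _, e in offsets)
            spans.append((start, end, entity['type']))
    return sorted(spans)


def highlight(source, spans):
    """Wrap each span as [mention](type), right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        source = f'{source[:start]}[{source[start:end]}]({label}){source[end:]}'
    return source


print(highlight(text, mention_spans(response)))
# The trip to [London](location) [last summer](date) was great. I also liked [Cambridge](location) a lot. 
```

The computed spans (12 to 18, 19 to 30, 55 to 64) agree with the CharSpan values printed by the Python SDK example above.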