Entities

Entities are meaningful expressions—either named (e.g., organizations, cities) or unnamed (e.g., dates). The exact set of supported entities depends on the selected domain.

Each entity includes:

- Name or standard form – A disambiguated and standardized version of the entity. For example, we return USA for both USA and United States. We also handle morphology, e.g., returning Německo when the text contains the inflected form Německu. Media API V2 can return the standard form in a specified language (Germany, Deutschland, Německo, etc.).
- ID – A unique identifier for the entity in a knowledge base (available in selected domains).
- Link to Geneea Knowledge Base – If supported by the domain.
- Type – A string indicating the entity type (e.g., person, date). See the list below.
- Instances or mentions – The actual occurrences of the entity in the document.

See the Entity object reference for more detail.

Entity Types

The standard media domains support the following entity types:

Basic:
- person – John Doe
- organization – UNESCO, IBM
- location – London, France
- product – Skoda Octavia, iPhone 13
- event – Brexit, World War II
- general – electric vehicle, trade war
Internet:
- url – geneea.com
- email – info@geneea.com
- hashtag – #hashtag
- mention – @mention
Date and Time:

These can be resolved relative to a specific point in time (see referenceDate in the Request). Standard forms follow the TIMEX3 format.
- date – September 3 (XXXX-09-03 when unresolved), next Monday, summer of 2015 (2015-SU)
- time – 12:03 (YYYY-MM-DDT12:03), tonight (YYYY-MM-DDTNI)
- duration – 3 years and 4 days (P3Y4D), 5 minutes (PT5M). Format: P(n)Y(n)M(n)DT(n)H(n)M(n)S
- set – Recurring time expressions – every Monday (XXXX-WXX-1), semiannual (P6M)
Numbers:
- number – 3; five (words supported only in English)
- ordinal – third (only for English)
- money – $40
- percent – 5%
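
The duration layout listed under Date and Time, P(n)Y(n)M(n)DT(n)H(n)M(n)S, can be split into its components with a regular expression. The following is a minimal sketch, not part of the Geneea SDK: the helper name `parse_duration` is hypothetical, and it assumes fully numeric components only (it does not handle week durations or unspecified values).

```python
import re

# Pattern for the duration layout P(n)Y(n)M(n)DT(n)H(n)M(n)S.
# Every component is optional; the time components follow the "T" separator.
DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(std_form: str) -> dict:
    """Split a duration standard form into its numeric components (hypothetical helper)."""
    match = DURATION_RE.match(std_form)
    # "P" and "PT" match the pattern but carry no components, so reject them.
    if match is None or std_form in ("P", "PT"):
        raise ValueError(f"not a supported duration: {std_form!r}")
    return {name: int(value) for name, value in match.groupdict().items() if value is not None}


print(parse_duration("P3Y4D"))  # {'years': 3, 'days': 4}
print(parse_duration("PT5M"))   # {'minutes': 5}
print(parse_duration("P6M"))    # {'months': 6}
```

Note that `M` means months before the `T` separator and minutes after it, which is why P6M and PT5M parse differently.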

The standard VoC domains support selected named entities, general entities, industry-specific entities (e.g., food items in restaurant reviews), as well as Internet, date, and numeric entities.

Custom domains can also support many additional types, such as colors, modes of transport, food items, economic terms, legal references, product numbers, and more.

We use a combination of machine learning models, rules, and lexicons—and all of this is fully customizable.

Sample Call

You can obtain entities using the following call:

The three examples below show the same request using, in order, cURL, the Python SDK, and plain Python.

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"]
}'

## On Windows, use \" instead of " and " instead of '

from geneeanlpclient import g3

requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES])

with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))

    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')

import requests

def callGeneea(payload):
    # Send the request body as JSON and return the parsed JSON response.
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }

    return requests.post(url, json=payload, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities']
})

print(responseObj)

Expected response:

The three outputs below correspond, in order, to the cURL, Python SDK, and plain Python calls.

{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}

2015-SU: date
London: location
Cambridge: location

{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}
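
Once the response is parsed, the entities can be processed like any other JSON list. As a rough sketch, the snippet below groups standard forms by entity type; the response is copied from the expected output above, and the helper name `entities_by_type` is hypothetical rather than part of the API or SDK.

```python
import json
from collections import defaultdict

# Response copied from the expected output above.
RESPONSE_JSON = """
{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}
"""


def entities_by_type(response: dict) -> dict:
    """Map each entity type to the list of standard forms found in the response."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["type"]].append(entity["stdForm"])
    return dict(grouped)


response = json.loads(RESPONSE_JSON)
print(entities_by_type(response))
# {'date': ['2015-SU'], 'location': ['London', 'Cambridge']}
```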

Mentions and Highlighting

To retrieve entity mentions, include "returnMentions": "true" in your request. Mentions include the actual text as it appears and reference the relevant tokens (useful for highlighting).

The three examples below show the same request using, in order, cURL, the Python SDK, and plain Python.

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"],
    "returnMentions": "true"
}'

## On Windows, use \" instead of " and " instead of '

from geneeanlpclient import g3

requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES], returnMentions=True)

with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))

    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')
        for m in e.mentions:
            ## charSpan can be used for highlighting in the original text
            print(f'\t{m.text}; {m.mwl}; {m.tokens.charSpan}')

import requests

def callGeneea(payload):
    # Send the request body as JSON and return the parsed JSON response.
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }

    return requests.post(url, json=payload, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities'],
    'returnMentions': True
})

print(responseObj)

The response now includes mentions of individual entities, along with their text and references to the relevant tokens. The full text is automatically split into paragraphs, sentences, and tokens as part of the response.

The three outputs below correspond, in order, to the cURL, Python SDK, and plain Python calls.

{
    "id": "1",
    "language": {"detected": "en"},
    "paragraphs": [{
        "id": "P2",
        "type": "BODY",
        "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "sentences": [{
            "id": "s0",
            "tokens": [
                {"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
                {"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
                {"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
                {"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
                {"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
                {"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
                {"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
                {"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
                {"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
        }, {
            "id": "s1",
            "tokens": [
                {"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
                {"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
                {"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
                {"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
                {"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
                {"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
                {"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
            ]
        }]
    }],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
    ],
    "usedChars": 100
}

2015-SU: date
    last summer; last summer; CharSpan(start=19, end=30)
London: location
    London; London; CharSpan(start=12, end=18)
Cambridge: location
    Cambridge; Cambridge; CharSpan(start=55, end=64)

{
    "id": "1",
    "language": {"detected": "en"},
    "paragraphs": [{
        "id": "P2",
        "type": "BODY",
        "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "sentences": [{
            "id": "s0",
            "tokens": [
                {"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
                {"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
                {"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
                {"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
                {"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
                {"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
                {"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
                {"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
                {"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
        }, {
            "id": "s1",
            "tokens": [
                {"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
                {"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
                {"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
                {"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
                {"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
                {"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
                {"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
            ]
        }]
    }],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
    ],
    "usedChars": 100
}
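
With the raw JSON response, character spans for highlighting can be derived from each mention's tokenIds and the tokens' off values. The sketch below is one way to do this under stated assumptions: off is treated as a character offset into the paragraph text (which equals the full text here, since there is only one paragraph), each mention is assumed to cover one contiguous, non-overlapping span, and the response is trimmed to the tokens and fields actually used. The helper names `mention_spans` and `highlight` are hypothetical.

```python
TEXT = "The trip to London last summer was great. I also liked Cambridge a lot. "

# Trimmed copy of the response above: only the tokens that mentions refer to,
# and only the fields this sketch reads.
RESPONSE = {
    "paragraphs": [{
        "text": TEXT,
        "sentences": [
            {"tokens": [
                {"id": "t3", "off": 12, "text": "London"},
                {"id": "t4", "off": 19, "text": "last"},
                {"id": "t5", "off": 24, "text": "summer"},
            ]},
            {"tokens": [
                {"id": "t12", "off": 55, "text": "Cambridge"},
            ]},
        ],
    }],
    "entities": [
        {"stdForm": "2015-SU", "type": "date",
         "mentions": [{"text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"stdForm": "London", "type": "location",
         "mentions": [{"text": "London", "tokenIds": ["t3"]}]},
        {"stdForm": "Cambridge", "type": "location",
         "mentions": [{"text": "Cambridge", "tokenIds": ["t12"]}]},
    ],
}


def mention_spans(response):
    """Return sorted (start, end, type) character spans, one per mention."""
    tokens = {
        tok["id"]: tok
        for par in response["paragraphs"]
        for sent in par["sentences"]
        for tok in sent["tokens"]
    }
    spans = []
    for entity in response["entities"]:
        for mention in entity.get("mentions", []):
            toks = [tokens[tid] for tid in mention["tokenIds"]]
            start = min(t["off"] for t in toks)
            end = max(t["off"] + len(t["text"]) for t in toks)
            spans.append((start, end, entity["type"]))
    return sorted(spans)


def highlight(text, spans):
    """Wrap each span in [type: ...] markers, right to left so earlier offsets stay valid."""
    for start, end, etype in sorted(spans, reverse=True):
        text = text[:start] + f"[{etype}: {text[start:end]}]" + text[end:]
    return text


spans = mention_spans(RESPONSE)
print(spans)
# [(12, 18, 'location'), (19, 30, 'date'), (55, 64, 'location')]
print(highlight(TEXT, spans))
```

The computed spans agree with the CharSpan values printed by the Python SDK example above.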
