Entities

Entities are important expressions in the text, both named (e.g., organizations, cities) and unnamed (e.g., dates). The exact set of supported entities depends on the domain.

Each entity has:

- name or standard form - the disambiguated and standardized form of the entity. For example, both USA and United States are returned as USA. Morphology is handled as well: Německo is returned even when the text contains the inflected form Německu. Media API V2 can also return the standard form in a specified language (Germany, Deutschland, Německo, etc.).
- id - a unique id of the entity in some knowledge base (supported only in certain domains)
- link to the Geneea Knowledge Base, if the domain supports it
- type - a string indicating the kind of entity, such as person or date; see below for the list of types
- instances or mentions - the actual occurrences of the entity in the document

See the Entity object reference page for more information.
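
As a rough illustration of the properties listed above, the following sketch maps one entity of a JSON response onto small Python data classes. The classes `Entity` and `Mention` and the helper `parse_entity` are hypothetical names introduced here; they are not part of the Geneea SDK. The JSON field names (`id`, `stdForm`, `type`, `mentions`, `mwl`, `text`, `tokenIds`) are copied from the sample responses on this page, and the sketch assumes no other fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Mention:
    """One occurrence of an entity in the text (hypothetical helper class)."""
    id: str
    text: str             # the mention as it appears in the document
    mwl: str              # the "mwl" field of the response, copied as-is
    token_ids: List[str]  # ids of the tokens that make up the mention


@dataclass
class Entity:
    """One item of the response's "entities" array (hypothetical helper class)."""
    id: str
    std_form: str
    type: str
    mentions: List[Mention] = field(default_factory=list)


def parse_entity(raw: Dict[str, Any]) -> Entity:
    """Build an Entity from one raw JSON entity."""
    mentions = [
        Mention(m["id"], m["text"], m["mwl"], list(m.get("tokenIds", [])))
        for m in raw.get("mentions", [])  # absent unless mentions were requested
    ]
    return Entity(raw["id"], raw["stdForm"], raw["type"], mentions)


raw_entity = {
    "id": "E0", "stdForm": "2015-SU", "type": "date",
    "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer",
                  "tokenIds": ["t4", "t5"]}],
}
entity = parse_entity(raw_entity)
print(entity.std_form, entity.type, [m.text for m in entity.mentions])
```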

Entity types

The standard media domains support the following entity types:

Basic:
- person - John Doe
- organization - UNESCO, IBM
- location - London, France
- product - Skoda Octavia, iPhone 13
- event - Brexit, World War II
- general - electric vehicle, trade war
Internet:
- url - geneea.com
- email - info@geneea.com
- hashtag - #hashtag
- mention - @mention
Date and Time:

Date and time entities can be resolved relative to a reference point in time (see referenceDate in Request). Their standard forms follow the TIMEX3 format.
- date - September 3 (XXXX-09-03 when unresolved), next Monday, summer of 2015 (2015-SU)
- time - 12:03 (YYYY-MM-DDT12:03), tonight (YYYY-MM-DDTNI)
- duration - 3 years and 4 days (P3Y4D), 5 minutes (PT5M). Standard form P(n)Y(n)M(n)DT(n)H(n)M(n)S
- set - set of times/dates - every Monday (XXXX-WXX-1), semiannual (P6M)
Numbers:
- number - 3, five (number words are supported only in English)
- ordinal - third (English only)
- money - $40
- percent - 5%
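
The duration standard form listed under Date and Time, P(n)Y(n)M(n)DT(n)H(n)M(n)S, is regular enough to take apart with a regular expression. The sketch below is one possible way to do that on the client side; the function `parse_duration` is a hypothetical helper, not part of the API or the SDK, and it assumes whole-number components in exactly that form (no weeks, no fractions).

```python
import re
from typing import Dict

# P(n)Y(n)M(n)DT(n)H(n)M(n)S, where every component is optional
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(std_form: str) -> Dict[str, int]:
    """Split a duration standard form into its components (hypothetical helper)."""
    match = _DURATION.match(std_form)
    if match is None:
        raise ValueError(f"not a duration standard form: {std_form!r}")
    parts = {name: int(value)
             for name, value in match.groupdict().items()
             if value is not None}
    if not parts:
        # a bare "P" or "PT" carries no components
        raise ValueError(f"empty duration: {std_form!r}")
    return parts


print(parse_duration("P3Y4D"))  # 3 years and 4 days
print(parse_duration("PT5M"))   # 5 minutes
```

Note that the letter M means months before the T separator and minutes after it, which is why the pattern treats the two halves separately.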

The standard VoC domains support selected named entities, general entities, industry-specific entities (e.g., food for restaurants), and Internet/date/numeric entities.

In addition, we can support many other entity types (colors, means of transport, food items, economic terms, laws, product numbers, ...) in custom domains.

Entities are detected with a combination of machine learning models, rules, and lexicons, all of which can be customized.

Sample call

You can easily try it yourself:

cURL:

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"]
}'

# On Windows, use \" instead of " inside the JSON body and " instead of ' around it.

Python SDK:

from geneeanlpclient import g3

requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES])

with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))

    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')

plain Python:

import requests

def callGeneea(input):
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }

    return requests.post(url, json=input, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities']
})

print(responseObj)

You should get the following response:

cURL:

{
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"}
    ],
    "usedChars": 100
}

Python SDK:

2015-SU: date
London: location
Cambridge: location

plain Python: the same JSON object as in the cURL response.
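
Once the response is parsed, a typical next step is to group the standard forms by entity type. The following is a minimal sketch of that step; `std_forms_by_type` is a hypothetical helper, and the dictionary is a copy of the sample response above rather than a live API result.

```python
from collections import defaultdict
from typing import Any, Dict, List


def std_forms_by_type(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the standard forms of all entities by their type (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["type"]].append(entity["stdForm"])
    return dict(grouped)


# Copy of the sample response shown above
response = {
    "id": "1",
    "language": {"detected": "en"},
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date"},
        {"id": "E1", "stdForm": "London", "type": "location"},
        {"id": "E2", "stdForm": "Cambridge", "type": "location"},
    ],
    "usedChars": 100,
}

print(std_forms_by_type(response))
# {'date': ['2015-SU'], 'location': ['London', 'Cambridge']}
```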

Mentions and highlighting

Set "returnMentions": "true" in the request to get the mentions of each entity:

cURL:

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
    "id": "1",
    "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "referenceDate": "2016-02-01",
    "analyses": ["entities"],
    "returnMentions": "true"
}'

# On Windows, use \" instead of " inside the JSON body and " instead of ' around it.

Python SDK:

from geneeanlpclient import g3

requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.ENTITIES], returnMentions=True)

with g3.Client.create() as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        referenceDate='2016-02-01',
        text='The trip to London last summer was great. I also liked Cambridge a lot.'
    ))

    for e in result.entities:
        print(f'{e.stdForm}: {e.type}')
        for m in e.mentions:
            # charSpan can be used for highlighting in the original text
            print(f'\t{m.text}; {m.mwl}; {m.tokens.charSpan}')

plain Python:

import requests

def callGeneea(input):
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <YOUR USER KEY>'
    }

    return requests.post(url, json=input, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to London last summer was great. I also liked Cambridge a lot. ',
    'referenceDate': '2016-02-01',
    'analyses': ['entities'],
    'returnMentions': True
})

print(responseObj)

Compared with the previous response, this one also contains the mentions of each entity: their text and references to the relevant tokens. The text itself, split into paragraphs, sentences, and tokens, is added to the response automatically.

cURL:

{
    "id": "1",
    "language": {"detected": "en"},
    "paragraphs": [{
        "id": "P2",
        "type": "BODY",
        "text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
        "sentences": [{
            "id": "s0",
            "tokens": [
                {"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
                {"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
                {"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
                {"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
                {"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
                {"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
                {"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
                {"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
                {"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
        }, {
            "id": "s1",
            "tokens": [
                {"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
                {"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
                {"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
                {"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
                {"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
                {"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
                {"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
            ]
        }]
    }],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
    ],
    "usedChars": 100
}

Python SDK:

2015-SU: date
    last summer; last summer; CharSpan(start=19, end=30)
London: location
    London; London; CharSpan(start=12, end=18)
Cambridge: location
    Cambridge; Cambridge; CharSpan(start=55, end=64)

plain Python: the same JSON object as in the cURL response.
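
When working with the raw JSON instead of the SDK, the character span of a mention can be derived from its tokens. The sketch below is one way to do this: it looks up each `tokenIds` entry, takes the smallest `off` as the start and the largest `off` plus token length as the end, and wraps the spans in brackets. The helpers `mention_spans` and `highlight` are hypothetical. The sketch assumes that `off` is a character offset into the text being highlighted (true for the single-paragraph sample above) and that mentions do not overlap. Only the tokens referenced by mentions are included in the sample data.

```python
from typing import Any, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type)


def mention_spans(response: Dict[str, Any]) -> List[Span]:
    """Collect one character span per mention, sorted by position."""
    tokens = {
        tok["id"]: tok
        for par in response.get("paragraphs", [])
        for sent in par.get("sentences", [])
        for tok in sent.get("tokens", [])
    }
    spans: List[Span] = []
    for entity in response.get("entities", []):
        for mention in entity.get("mentions", []):
            toks = [tokens[tid] for tid in mention["tokenIds"]]
            start = min(t["off"] for t in toks)
            end = max(t["off"] + len(t["text"]) for t in toks)
            spans.append((start, end, entity["type"]))
    return sorted(spans)


def highlight(text: str, spans: List[Span]) -> str:
    """Wrap every span as [text|type]; assumes sorted, non-overlapping spans."""
    out, pos = [], 0
    for start, end, label in spans:
        out.append(text[pos:start])
        out.append(f"[{text[start:end]}|{label}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "The trip to London last summer was great. I also liked Cambridge a lot. "
response = {
    "paragraphs": [{"id": "P2", "type": "BODY", "text": text, "sentences": [
        {"id": "s0", "tokens": [
            {"id": "t3", "off": 12, "text": "London"},
            {"id": "t4", "off": 19, "text": "last"},
            {"id": "t5", "off": 24, "text": "summer"}]},
        {"id": "s1", "tokens": [
            {"id": "t12", "off": 55, "text": "Cambridge"}]},
    ]}],
    "entities": [
        {"id": "E0", "stdForm": "2015-SU", "type": "date",
         "mentions": [{"id": "m0", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "stdForm": "London", "type": "location",
         "mentions": [{"id": "m1", "text": "London", "tokenIds": ["t3"]}]},
        {"id": "E2", "stdForm": "Cambridge", "type": "location",
         "mentions": [{"id": "m2", "text": "Cambridge", "tokenIds": ["t12"]}]},
    ],
}

print(highlight(text, mention_spans(response)))
```

The computed spans (19 to 30, 12 to 18, 55 to 64) agree with the CharSpan values printed by the Python SDK example.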
