
Entities

Entities are meaningful expressions—either named (e.g., organizations, cities) or unnamed (e.g., dates). The exact set of supported entities depends on the selected domain.

Each entity includes:

  • Name or standard form – A disambiguated and standardized version of the entity. For example, we return USA for both USA and United States. We also handle morphology, e.g., returning Německo when the text contains the inflected form Německu. Media API V2 can return the standard form in a specified language (Germany, Deutschland, Německo, etc.).
  • ID – A unique identifier for the entity in a knowledge base (available in selected domains).
  • Link to Geneea Knowledge Base – If supported by the domain.
  • Type – A string indicating the entity type (e.g., person, date). See the list below.
  • Instances or mentions – The actual mentions of the entity in the document.

See the Entity object reference for more detail.

Entity Types

The standard media domains support the following entity types:

  • Basic:

    • person – John Doe
    • organization – UNESCO, IBM
    • location – London, France
    • product – Skoda Octavia, iPhone 13
    • event – Brexit, World War II
    • general – electric vehicle, trade war
  • Internet:

    • url – geneea.com
    • email – info@geneea.com
    • hashtag – #hashtag
    • mention – @mention
  • Date and Time:

    These can be resolved relative to a specific point in time (see referenceDate in the Request). Standard forms follow the TIMEX3 format.

    • date – September 3 (XXXX-09-03 when unresolved), next Monday, summer of 2015 (2015-SU)
    • time – 12:03 (YYYY-MM-DDT12:03), tonight (YYYY-MM-DDTNI)
    • duration – 3 years and 4 days (P3Y4D), 5 minutes (PT5M). Format: P(n)Y(n)M(n)DT(n)H(n)M(n)S
    • set – recurring time expressions, such as every Monday (XXXX-WXX-1), semiannual (P6M)
  • Numbers:

    • number – 3, five (number words are supported only in English)
    • ordinal – third (English only)
    • money – $40
    • percent – 5%
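
The duration layout quoted in the list above, P(n)Y(n)M(n)DT(n)H(n)M(n)S, can be unpacked on the client side. The following is a minimal sketch, not part of the API: parse_duration is a hypothetical helper, and it assumes every component is a plain integer. Other TIMEX3 variants, such as weeks or unspecified quantities, are not handled.

```python
import re

# Pattern for the layout P(n)Y(n)M(n)DT(n)H(n)M(n)S.
# Every component is optional; "T" separates date parts from time parts,
# which is how "M" can mean months before it and minutes after it.
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(std_form: str) -> dict:
    """Split a duration standard form into its named integer components.

    Hypothetical helper for illustration; raises ValueError on anything
    that does not fit the simple layout above.
    """
    match = _DURATION.match(std_form)
    if match is None:
        raise ValueError(f"not a duration in the expected layout: {std_form!r}")
    parts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    if not parts:
        # Strings such as "P" or "PT" match the pattern but carry no value.
        raise ValueError(f"duration has no components: {std_form!r}")
    return parts


three_years_four_days = parse_duration("P3Y4D")
five_minutes = parse_duration("PT5M")
semiannual = parse_duration("P6M")
```

With the examples from the list, P3Y4D yields years and days, PT5M yields minutes, and P6M yields months.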

The standard VoC domains support selected named entities, general entities, industry-specific entities (e.g., food items in restaurant reviews), as well as Internet, date, and numeric entities.

Custom domains can also support many additional types, such as colors, modes of transport, food items, economic terms, legal references, product numbers, and more.

Entity detection combines machine learning models, rules, and lexicons, all of which can be customized.

Sample Call

You can obtain entities using the following call:

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
"referenceDate": "2016-02-01",
"analyses": ["entities"]
}'

## On Windows, use \" instead of " and " instead of '
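
The same request can be prepared from Python. This is a sketch using only the standard library; the endpoint, headers, and body fields are copied from the curl call above, while build_request is a hypothetical helper name. The request is only constructed here; sending it requires network access and a valid user key.

```python
import json
import urllib.request

API_URL = "https://api.geneea.com/v3/analysis"  # endpoint from the curl call


def build_request(user_key: str, text: str, reference_date: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for entities only."""
    payload = {
        "id": "1",
        "text": text,
        "referenceDate": reference_date,
        "analyses": ["entities"],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"user_key {user_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "<YOUR USER KEY>",
    "The trip to London last summer was great. I also liked Cambridge a lot. ",
    "2016-02-01",
)

# To send it, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```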

Expected response:

{
"id": "1",
"language": {"detected": "en"},
"entities": [
{"id": "E0", "stdForm": "2015-SU", "type": "date"},
{"id": "E1", "stdForm": "London", "type": "location"},
{"id": "E2", "stdForm": "Cambridge", "type": "location"}
],
"usedChars": 100
}
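
Once the response is parsed, the entities can be grouped by type. The sketch below works on the sample response above, pasted in as a string; entities_by_type is a hypothetical helper, and it assumes every entity carries the stdForm and type fields shown in the sample.

```python
import json
from collections import defaultdict

# The expected response from above, embedded so the example runs offline.
RESPONSE_TEXT = """
{
  "id": "1",
  "language": {"detected": "en"},
  "entities": [
    {"id": "E0", "stdForm": "2015-SU", "type": "date"},
    {"id": "E1", "stdForm": "London", "type": "location"},
    {"id": "E2", "stdForm": "Cambridge", "type": "location"}
  ],
  "usedChars": 100
}
"""


def entities_by_type(response: dict) -> dict:
    """Map each entity type to the list of standard forms found for it."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["type"]].append(entity["stdForm"])
    return dict(grouped)


grouped = entities_by_type(json.loads(RESPONSE_TEXT))
# grouped == {"date": ["2015-SU"], "location": ["London", "Cambridge"]}
```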

Mentions and Highlighting

To retrieve entity mentions, include "returnMentions": "true" in your request. Mentions include the actual text as it appears and reference the relevant tokens (useful for highlighting).

curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
"referenceDate": "2016-02-01",
"analyses": ["entities"],
"returnMentions": "true"
}'

## On Windows, use \" instead of " and " instead of '

The response now includes mentions of individual entities, along with their text and references to the relevant tokens. The full text is automatically split into paragraphs, sentences, and tokens as part of the response.

{
"id": "1",
"language": {"detected": "en"},
"paragraphs": [{
"id": "P2",
"type": "BODY",
"text": "The trip to London last summer was great. I also liked Cambridge a lot. ",
"corrText": "The trip to London last summer was great. I also liked Cambridge a lot. ",
"sentences": [{
"id": "s0",
"tokens": [
{"id": "t0", "off": 0, "text": "The", "corrOff": 0, "corrText": "The"},
{"id": "t1", "off": 4, "text": "trip", "corrOff": 4, "corrText": "trip"},
{"id": "t2", "off": 9, "text": "to", "corrOff": 9, "corrText": "to"},
{"id": "t3", "off": 12, "text": "London", "corrOff": 12, "corrText": "London"},
{"id": "t4", "off": 19, "text": "last", "corrOff": 19, "corrText": "last"},
{"id": "t5", "off": 24, "text": "summer", "corrOff": 24, "corrText": "summer"},
{"id": "t6", "off": 31, "text": "was", "corrOff": 31, "corrText": "was"},
{"id": "t7", "off": 35, "text": "great", "corrOff": 35, "corrText": "great"},
{"id": "t8", "off": 40, "text": ".", "corrOff": 40, "corrText": "."}]
}, {
"id": "s1",
"tokens": [
{"id": "t9", "off": 42, "text": "I", "corrOff": 42, "corrText": "I"},
{"id": "t10", "off": 44, "text": "also", "corrOff": 44, "corrText": "also"},
{"id": "t11", "off": 49, "text": "liked", "corrOff": 49, "corrText": "liked"},
{"id": "t12", "off": 55, "text": "Cambridge", "corrOff": 55, "corrText": "Cambridge"},
{"id": "t13", "off": 65, "text": "a", "corrOff": 65, "corrText": "a"},
{"id": "t14", "off": 67, "text": "lot", "corrOff": 67, "corrText": "lot"},
{"id": "t15", "off": 70, "text": ".", "corrOff": 70, "corrText": "."}
]
}]
}],
"entities": [
{"id": "E0", "stdForm": "2015-SU", "type": "date", "mentions": [{"id": "m0", "mwl": "last summer", "text": "last summer", "tokenIds": ["t4", "t5"]}]},
{"id": "E1", "stdForm": "London", "type": "location", "mentions": [{"id": "m1", "mwl": "London", "text": "London", "tokenIds": ["t3"]}]},
{"id": "E2", "stdForm": "Cambridge", "type": "location", "mentions": [{"id": "m2", "mwl": "Cambridge", "text": "Cambridge", "tokenIds": ["t12"]}]}
],
"usedChars": 100
}
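
The token references can be turned into character spans for highlighting. The sketch below is one possible approach, with two assumptions that the sample is consistent with but that are not confirmed here: off is a character offset into the paragraph text, and the tokens of one mention are contiguous. The helpers mention_spans and highlight are hypothetical, and the response is trimmed to the tokens that the mentions reference.

```python
TEXT = "The trip to London last summer was great. I also liked Cambridge a lot. "

# Trimmed copy of the response above: only the fields and tokens used here.
RESPONSE = {
    "paragraphs": [{
        "text": TEXT,
        "sentences": [
            {"id": "s0", "tokens": [
                {"id": "t3", "off": 12, "text": "London"},
                {"id": "t4", "off": 19, "text": "last"},
                {"id": "t5", "off": 24, "text": "summer"},
            ]},
            {"id": "s1", "tokens": [
                {"id": "t12", "off": 55, "text": "Cambridge"},
            ]},
        ],
    }],
    "entities": [
        {"id": "E0", "type": "date", "mentions": [{"tokenIds": ["t4", "t5"]}]},
        {"id": "E1", "type": "location", "mentions": [{"tokenIds": ["t3"]}]},
        {"id": "E2", "type": "location", "mentions": [{"tokenIds": ["t12"]}]},
    ],
}


def mention_spans(response: dict) -> list:
    """Return sorted (start, end, entity_type) tuples, one per mention."""
    # Token id -> (start, end) character offsets; end is exclusive.
    offsets = {}
    for paragraph in response["paragraphs"]:
        for sentence in paragraph["sentences"]:
            for token in sentence["tokens"]:
                start = token["off"]
                offsets[token["id"]] = (start, start + len(token["text"]))
    spans = []
    for entity in response["entities"]:
        for mention in entity.get("mentions", []):
            bounds = [offsets[token_id] for token_id in mention["tokenIds"]]
            start = min(b[0] for b in bounds)
            end = max(b[1] for b in bounds)
            spans.append((start, end, entity["type"]))
    return sorted(spans)


def highlight(text: str, spans: list) -> str:
    """Wrap each span as [text|type], working right to left so that
    earlier offsets stay valid while the string grows."""
    for start, end, entity_type in sorted(spans, reverse=True):
        text = f"{text[:start]}[{text[start:end]}|{entity_type}]{text[end:]}"
    return text


spans = mention_spans(RESPONSE)
highlighted = highlight(TEXT, spans)
```

For the sample text, highlighted marks London and Cambridge as location and last summer as date. A real application would likely emit HTML or editor ranges instead of the bracket notation used here.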