Entities are important expressions, both named (e.g., organizations, cities) and unnamed (e.g., dates). The exact set of supported entity types depends on the domain. Each entity has the following attributes:
name or standard form – the disambiguated, standardized form of the entity. For example, we return USA for both USA and United States. We also handle morphology: we return Německo (Germany) even when the text contains the inflected form Německu.
id – a unique id of the entity in a knowledge base (supported only in certain domains)
type – a string indicating the kind of entity, e.g. person or date; see below for a list of types.
instances or mentions – the actual mentions of the entity in the document
See the Entity object reference page for more information.
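To make the relationship between an entity and its mentions concrete, here is a minimal Python sketch. The field names (std_form, entity_type, kb_id, mentions) are hypothetical and only mirror the attribute list above; they are not necessarily the keys used in the API's JSON, which are documented on the Entity object reference page. The helper group_mentions is also hypothetical: it assumes some earlier step has already produced (surface text, standard form, type) triples, and it only shows how several mentions collapse into one entity. It is not a description of how the API itself works.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mention:
    text: str  # the surface form exactly as it appears in the document


@dataclass
class Entity:
    # Hypothetical field names modelled on the attribute list above;
    # they are not necessarily the keys used in the API's JSON.
    std_form: str                # disambiguated, standardized name, e.g. "USA"
    entity_type: str             # e.g. "location", "person", "date"
    kb_id: Optional[str] = None  # knowledge-base id (only in some domains)
    mentions: list[Mention] = field(default_factory=list)


def group_mentions(found: list[tuple[str, str, str]]) -> list[Entity]:
    """Group (surface text, standard form, type) triples into entities.

    Mentions sharing a standard form and type collapse into one entity,
    so "USA" and "United States" both end up under the entity USA.
    """
    by_key: dict[tuple[str, str], Entity] = {}
    for text, std_form, entity_type in found:
        key = (std_form, entity_type)
        if key not in by_key:
            by_key[key] = Entity(std_form=std_form, entity_type=entity_type)
        by_key[key].mentions.append(Mention(text=text))
    # dicts keep insertion order, so entities appear in first-mention order
    return list(by_key.values())
```

For example, grouping the mentions USA, United States and Německu:

```python
found = group_mentions([
    ("USA", "USA", "location"),
    ("United States", "USA", "location"),
    ("Německu", "Německo", "location"),
])
for e in found:
    print(e.std_form, [m.text for m in e.mentions])
# USA ['USA', 'United States']
# Německo ['Německu']
```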
The standard public workflows support the following entity types:
location – London, France
organization – UNESCO, IBM
person – John Doe
Relations and phrases:
VERB – verb relation (action + objects), e.g. buy lunch
ATTR – attribute relation (attribute + noun), e.g. denied credit card
Note that for relations, the text field of each instance contains the structure of the entity, e.g. CLAUSE:attempt(AMOD:First). The format is fnc:lemma(fnc:lemma, ...), where fnc can be any dependency label from Universal Dependencies V1, mainly CS_REFL_CLITIC (reflexive). For most purposes, you can ignore the first function, which expresses the function of the whole phrase relative to the rest of the sentence.
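If you need the structure of a relation programmatically, the fnc:lemma(fnc:lemma, ...) format can be read with a small recursive-descent parser. This is a sketch under stated assumptions: it assumes arguments may themselves contain nested parentheses, that arguments are separated by commas, and that neither labels nor lemmas contain the characters ":", "(", ")" or ",". The Node type and parse_relation function are hypothetical helpers, not part of the API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    fnc: str    # dependency function label, e.g. "CLAUSE" or "AMOD"
    lemma: str  # lemma of the word, e.g. "attempt"
    args: list["Node"] = field(default_factory=list)


def parse_relation(s: str) -> Node:
    """Parse a string of the form fnc:lemma(fnc:lemma, ...) into a tree."""
    s = s.strip()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        # Read "fnc:lemma" up to the next structural character.
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        head = s[start:pos].strip()
        fnc, sep, lemma = head.partition(":")
        if not sep:
            raise ValueError(f"expected 'fnc:lemma', got {head!r}")
        node = Node(fnc=fnc, lemma=lemma)
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.args.append(parse_node())
                if pos < len(s) and s[pos] == ",":
                    pos += 1  # consume "," and parse the next argument
                elif pos < len(s) and s[pos] == ")":
                    pos += 1  # consume ")" and finish this node
                    break
                else:
                    raise ValueError(f"unbalanced parentheses in {s!r}")
        return node

    root = parse_node()
    if pos != len(s):
        raise ValueError(f"trailing input in {s!r}")
    return root
```

Since the first function can usually be ignored, most callers will only look at the root lemma and its arguments:

```python
tree = parse_relation("CLAUSE:attempt(AMOD:First)")
print(tree.lemma, [(a.fnc, a.lemma) for a in tree.args])
# attempt [('AMOD', 'First')]
```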
Date and Time:
Date and time entities can be resolved relative to a reference point in time (see referenceDate in Request). Their standard forms follow the TIMEX3 format.
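To illustrate what resolution against a reference date means, here is a deliberately naive Python sketch that fills the unknown year of a TIMEX3-style value such as XXXX-09-03. The rule it uses (take the year of the reference date) is an assumption for illustration only; this page does not describe the API's actual resolution logic, which may, for instance, choose the nearest matching date instead. The function name fill_year is hypothetical.

```python
import re
from datetime import date

# Matches only fully specified month/day values with an unknown year.
_UNRESOLVED_DATE = re.compile(r"XXXX-(\d{2})-(\d{2})")


def fill_year(timex_value: str, reference: date) -> str:
    """Fill the unknown year of an XXXX-MM-DD value from a reference date.

    Naive rule (an assumption for illustration): the missing year is the
    reference date's year. Any other value, including sets such as
    XXXX-WXX-1, is returned unchanged.
    """
    m = _UNRESOLVED_DATE.fullmatch(timex_value)
    if m is None:
        return timex_value
    month, day = m.groups()
    return f"{reference.year:04d}-{month}-{day}"
```

For example, fill_year("XXXX-09-03", date(2015, 6, 1)) returns "2015-09-03", while an already resolved value such as "2014-09-03" is left as it is.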
date – September 3 (XXXX-09-03 when unresolved), next Monday, summer of 2015
time – 12:03 (YYYY-MM-DDT12:03), tonight
duration – 3 years and 4 days (P3Y4D), 5 minutes (PT5M)
set – set of times/dates: every Monday (XXXX-WXX-1), semiannual
number – 3; five (number words are recognized only in English)
ordinal – third (only in English)
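Duration values such as P3Y4D and PT5M use the ISO 8601 duration syntax that TIMEX3 builds on: P starts the value, T separates date units from time units, and M means months before T but minutes after it. The Python sketch below splits such a value into its components. It is an assumption-laden helper, not part of the API: it handles whole numbers only and rejects TIMEX3 placeholders such as PXY and fractional values. The name parse_duration is hypothetical.

```python
import re

# Integer-only ISO 8601 / TIMEX3-style durations such as P3Y4D or PT5M.
_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(value: str) -> dict[str, int]:
    """Split a duration value into the components that are present.

    Only whole-number fields are supported; placeholders such as 'PXY'
    and fractional values raise ValueError.
    """
    m = _DURATION.fullmatch(value)
    # Reject empty shells like "P", "PT" or a dangling "T".
    if m is None or value in ("P", "PT") or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    return {unit: int(n) for unit, n in m.groupdict().items() if n is not None}
```

With the examples from the list above, parse_duration("P3Y4D") gives {'years': 3, 'days': 4} and parse_duration("PT5M") gives {'minutes': 5}.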
In addition, we support many other entity types (food items, colors, means of transport, economic terms, laws, product numbers, …) in custom workflows aimed at particular industries.
To find the entities in a document, we use a combination of machine learning models, rules and lexicons. As always, we can customize all of these.
You can easily try it yourself:
You should get the following response:
You can use "returnMentions": "true" to return the entity mentions:
Compared with the previous response, this one also contains the mentions of the individual entities: their text and references to the relevant tokens (the text, split into paragraphs, sentences and tokens, is added to the response automatically).
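A request body like the ones described above could be assembled in Python roughly as follows. Only the parameter names returnMentions and referenceDate come from this page; the "text" key, the API_URL value and the way authentication headers are passed are assumptions, so check the Request reference for the real endpoint, field names and credentials before using it. The string value "true" for returnMentions follows the form written above.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder: substitute the real endpoint from the API docs.
API_URL = "https://example.invalid/entities"


def build_request_body(text: str, return_mentions: bool = False,
                       reference_date: Optional[str] = None) -> dict:
    """Assemble a JSON request body.

    "returnMentions" and "referenceDate" are the parameters named on this
    page; the "text" key is an assumption.
    """
    body = {"text": text}
    if return_mentions:
        body["returnMentions"] = "true"  # string value, as written above
    if reference_date is not None:
        body["referenceDate"] = reference_date
    return body


def send(body: dict, headers: dict[str, str]) -> dict:
    """POST the body as JSON to API_URL and decode the JSON reply.

    Authentication headers must be supplied by the caller; their exact
    format is not specified on this page.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping body construction separate from sending makes it easy to inspect exactly what will be posted, e.g. build_request_body("...", return_mentions=True), before wiring in real credentials.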