Entities
Entities are elements explicitly mentioned in the article. They include:
- Names of people, locations, organizations, products, and events (e.g., Barack Obama, Paris, Apple Inc., iPhone 13, Olympic Games)
- General concepts, sometimes referred to as keywords, which may not refer to a specific named item but represent meaningful ideas or topics—such as flu season, electric vehicle, or income tax.
These entities are identified to help you understand what the article is about and to support tasks like tagging, linking, or analysis.
Entity Information
The following information is provided for each detected entity (JSON field names are in parentheses):
- GKB ID (gkbId): The entity's identifier in our Geneea Knowledge Base (GKB), typically in its generic bucket (e.g., G145 for Britain). This field is missing for synthetic entities (see below).
- Standard Form (stdForm): The standard (canonical) name of the entity in the relevant language. For example, the standard form for G145 is:
  - United Kingdom in English
  - Spojené království in Czech
  - Royaume-Uni in French
  - Storbritannien in Danish
  While these are typically translations of one another, standard forms do not need to be direct equivalents.
- Type (type): One of the supported entity types listed below.
- Mentions (mentions): If enabled, we return all mentions of the entity within the text. For example, G145 might appear as United Kingdom, UK, Britain, Great Britain and Northern Ireland in English. For each mention of an entity, we provide the following information:
  - References to the article text (tokenIds): These are references to specific tokens in the article where the entity is mentioned. This allows you to link back to the exact positions in the text, which is useful, for example, if you want to highlight or hyperlink entity mentions to tag or detail pages.
  - Normalized form of the mention (mwl): This abstracts away language-specific morphological variations. For instance, in Czech, the entity Spojené království (United Kingdom in English) may appear in several different grammatical cases (e.g., Spojeného království, Spojenému království), depending on the context and grammatical function of the entity in the sentence. The mwl field always contains the base form: Spojené království. This normalization helps with consistent tagging, grouping, and analysis across inflected forms.
- Relevance Score (feats.relevance): A numeric value between 0 and 100 that indicates how central the entity is to the meaning of the text. For example, if the article is primarily about Britain, the relevance score of the G145 entity (United Kingdom) will be high. If Britain is mentioned only briefly or in passing, the score will be much lower. This helps to distinguish main topics from incidental references.
- Other Features (depending on configuration, under the feats key):
  - Wikidata ID – when available, providing a link to structured knowledge in Wikidata
  - Social media handles, Wikipedia links – e.g., for public figures or organizations
  - Other metadata from our internal GKB
These properties are especially useful for enriching the entity with external context or linking it to structured datasets.
See the Entity object reference page for more details.
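As a rough illustration of how these fields fit together, the sketch below reads a hand-written entity list in Python. Only the field names gkbId, stdForm, type, mentions, tokenIds, mwl, and feats.relevance come from the description above. The exact nesting, the sample token IDs and scores, the placeholder identifier, the threshold of 50, and all helper names are assumptions made for this example, so treat the Entity object reference as the authoritative source for the real shape.

```python
from typing import Any

# Hand-written sample, NOT real API output. Field names follow the text above;
# the nesting and every value are illustrative assumptions.
sample_entities: list[dict[str, Any]] = [
    {
        "gkbId": "G145",
        "stdForm": "United Kingdom",
        "type": "location",
        "mentions": [
            {"tokenIds": ["t0"], "mwl": "Britain"},
            {"tokenIds": ["t7", "t8"], "mwl": "United Kingdom"},
        ],
        "feats": {"relevance": 87.5},
    },
    {
        "gkbId": "G-PLACEHOLDER",  # placeholder, not a real GKB identifier
        "stdForm": "Paris",
        "type": "location",
        "mentions": [{"tokenIds": ["t15"], "mwl": "Paris"}],
        "feats": {"relevance": 12.0},
    },
]


def relevance(entity: dict[str, Any]) -> float:
    """Return the relevance score as a float, or 0.0 when it is absent."""
    # float() also copes with a score delivered as a string (an assumption).
    return float(entity.get("feats", {}).get("relevance", 0.0))


def main_topics(entities: list[dict[str, Any]], threshold: float = 50.0) -> list[dict[str, Any]]:
    """Keep entities at or above the threshold, most relevant first."""
    kept = [e for e in entities if relevance(e) >= threshold]
    return sorted(kept, key=relevance, reverse=True)


def mention_token_ids(entity: dict[str, Any]) -> list[str]:
    """Collect the token IDs of every mention, e.g. for highlighting."""
    return [tid for m in entity.get("mentions", []) for tid in m.get("tokenIds", [])]


def normalized_mentions(entity: dict[str, Any]) -> list[str]:
    """Distinct normalized mention forms (mwl), in order of first appearance."""
    forms = (m["mwl"] for m in entity.get("mentions", []) if "mwl" in m)
    return list(dict.fromkeys(forms))


for entity in main_topics(sample_entities):
    print(entity["stdForm"], mention_token_ids(entity), normalized_mentions(entity))
```

The threshold is arbitrary here; a suitable cut-off between main topics and incidental references depends on your content and use case.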
Entity Types
The standard configuration includes the following entity types:
- person – John Doe
- organization – UNESCO, IBM
- location – London, France
- product – Skoda Octavia, iPhone 13
- event – Brexit, World War II
- general – electric vehicle, trade war, flu season, income tax
In addition, we support detection of numeric and temporal expressions such as dates, currencies, and amounts.
We can also enable detection of custom entity types tailored to your use case, for example colors, food items, economic terms, laws, or product numbers.
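One common way to use the type field is to bucket detected entities by type, for instance to fill separate tag groups. The following is a minimal sketch under the assumption that each entity is a dictionary with stdForm and type keys; the function name and the "unknown" fallback are hypothetical and not part of the API.

```python
from collections import defaultdict


def group_by_type(entities):
    """Map each entity type to the standard forms of its entities."""
    grouped = defaultdict(list)
    for entity in entities:
        # "unknown" is a fallback invented for this sketch.
        grouped[entity.get("type", "unknown")].append(entity["stdForm"])
    return dict(grouped)


# Hand-written sample built from the examples in the list above.
entities = [
    {"stdForm": "John Doe", "type": "person"},
    {"stdForm": "UNESCO", "type": "organization"},
    {"stdForm": "IBM", "type": "organization"},
    {"stdForm": "London", "type": "location"},
    {"stdForm": "Brexit", "type": "event"},
]

by_type = group_by_type(entities)
print(by_type["organization"])
```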
Derived Entities
Derived entities are not explicitly mentioned in the text but are logically related to those that are.
For example, if an article mentions Prague, we infer that it is also about the Czech Republic, based on geographic relationships stored in our Knowledge Base.
Here:
- Prague is a direct entity
- Czech Republic is a derived entity
Derived entities inherit all mentions of the original (direct) entity.
So, in the example above, every mention of Prague would also count as a (derived) mention of the Czech Republic.
It's also possible for an entity to have both a direct and derived mention in the same sentence.
For example, in the sentence “Prague is the capital of the Czech Republic,”
the entity Czech Republic has two mentions:
- A direct mention: Czech Republic
- A derived mention: Prague
We currently support a selection of derived entity types such as:
- manufacturer
- industry
- country, region, district, city, cityPart
Coverage varies depending on geography and data availability.
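The way derived entities inherit mentions can be sketched in a few lines of Python. This is a toy model, not the actual implementation: the CONTAINED_IN dictionary is a hypothetical stand-in for the relationships stored in the Knowledge Base, the function name is invented, and only a single level of derivation is handled.

```python
# Hypothetical stand-in for the geographic relationships in the Knowledge Base.
CONTAINED_IN = {"Prague": "Czech Republic"}


def with_derived_mentions(direct_mentions):
    """Return {entity: [(mention_text, kind), ...]}, kind being 'direct' or 'derived'."""
    result = {
        entity: [(text, "direct") for text in texts]
        for entity, texts in direct_mentions.items()
    }
    for entity, texts in direct_mentions.items():
        parent = CONTAINED_IN.get(entity)
        if parent is None:
            continue
        # The parent inherits every mention of the direct entity as a derived mention.
        result.setdefault(parent, []).extend((text, "derived") for text in texts)
    return result


# "Prague is the capital of the Czech Republic."
sentence_mentions = {"Prague": ["Prague"], "Czech Republic": ["Czech Republic"]}
mentions = with_derived_mentions(sentence_mentions)
print(mentions["Czech Republic"])
```

For the example sentence, Czech Republic ends up with one direct and one derived mention, while Prague keeps only its direct mention.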
Synthetic Entities
While we aim to include all entities in the GKB, this is not always possible. Entities that are reliably detected in the text but lack a corresponding GKB entry are called synthetic entities.
Characteristics of synthetic entities:
- They do not have GKB identifiers, Wikidata IDs, or social media links.
- Their standard form is generated from the mention using grammatical heuristics.
- Quality may be lower due to lack of structured data.
In general, GKB-linked entities are preferred. Synthetic entities are useful for completeness but are not recommended for most production use cases, especially where accuracy or linkage is important.
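If you want to keep only GKB-linked entities, one simple approach is to check for the presence of gkbId, since that field is missing for synthetic entities. The sketch below assumes entities are plain dictionaries; the function name and the second sample entity are made up for illustration.

```python
def split_by_gkb_link(entities):
    """Split entities into (GKB-linked, synthetic) based on the presence of gkbId."""
    linked, synthetic = [], []
    for entity in entities:
        (linked if entity.get("gkbId") else synthetic).append(entity)
    return linked, synthetic


entities = [
    {"gkbId": "G145", "stdForm": "United Kingdom", "type": "location"},
    # No gkbId, so it is treated as synthetic. The name is invented for this example.
    {"stdForm": "Example Widgets Ltd", "type": "organization"},
]

linked, synthetic = split_by_gkb_link(entities)
print([e["stdForm"] for e in linked])
print([e["stdForm"] for e in synthetic])
```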