Knowledge base
All returned tags and entities are linked to the Geneea Knowledge Base (GKB, Geneea KB). GKB combines existing open data (Wikidata, DBpedia, OpenStreetMap, company registries, etc.) with our own private resources. GKB also supports custom properties (e.g. your internal IDs) and custom items.
This section discusses:
Entity information: obtaining information about an entity with a particular ID
Multiple Presentation Languages: getting an entity's standard form (title) in a particular language
Deprecated and duplicate IDs: duplicate IDs and what to do with them
Knowledge base search: searching the knowledge base for entities by their name
Entity information (Info boxes)
Sometimes we want to display more than just the name of an entity. The following code retrieves a brief description of each entity and links to Wikipedia, Wikidata and other resources. More is coming soon.
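// axios, config (base URL and auth headers) and report are assumed to be set up as in the earlier sections of this guide
const kbInfoBoxes = async (input, config) => {
    const response = await axios.post('knowledgebase/infoboxes', input, config);
    return response.data;
};
const input = {
    IDs: ['G458', 'G567'],
    language: 'fr'
};
kbInfoBoxes(input, config).then(report);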
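from typing import Any, Mapping
import requests
# BASE_URL and HEADERS are assumed to be defined as in the earlier sections of this guide
def kbInfoBoxes(input: Mapping[str, Any]):
    return requests.post(f'{BASE_URL}knowledgebase/infoboxes', json=input, headers=HEADERS).json()
input = {
    'IDs': ['G458', 'G567'],
    'language': 'fr',
}
kbInfoBoxes(input)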
{
G458: {
value: {
title: 'Union européenne',
header: "union politico-économique sui generis d'États européens",
body: '',
footer: {
cswiki: 'https://cs.wikipedia.org/wiki/Evropská_unie',
enwiki: 'https://en.wikipedia.org/wiki/European_Union',
wikidata: 'http://www.wikidata.org/entity/Q458',
gkb: 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G458'
}
},
language: 'fr'
},
G567: {
value: {
title: 'Angela Merkel',
header: 'chancelière fédérale allemande',
body: '',
footer: {
cswiki: 'https://cs.wikipedia.org/wiki/Angela_Merkelová',
enwiki: 'https://en.wikipedia.org/wiki/Angela_Merkel',
facebook: 'https://www.facebook.com/AngelaMerkel',
instagram: 'https://www.instagram.com/bundeskanzlerin',
wikidata: 'http://www.wikidata.org/entity/Q567',
gkb: 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G567'
}
},
language: 'fr'
}
}
{'G458': {'value': {'title': 'Union européenne',
'header': "union politico-économique sui generis d'États européens",
'body': '',
'footer': {'cswiki': 'https://cs.wikipedia.org/wiki/Evropská_unie',
'enwiki': 'https://en.wikipedia.org/wiki/European_Union',
'wikidata': 'http://www.wikidata.org/entity/Q458',
'gkb': 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G458'}},
'language': 'fr'},
'G567': {'value': {'title': 'Angela Merkel',
'header': 'chancelière fédérale allemande',
'body': '',
'footer': {'cswiki': 'https://cs.wikipedia.org/wiki/Angela_Merkelová',
'enwiki': 'https://en.wikipedia.org/wiki/Angela_Merkel',
'facebook': 'https://www.facebook.com/AngelaMerkel',
'instagram': 'https://www.instagram.com/bundeskanzlerin',
'wikidata': 'http://www.wikidata.org/entity/Q567',
'gkb': 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G567'}},
'language': 'fr'}}
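The info box response is a plain mapping from GKB ID to an info box, so it can be post-processed directly. The following minimal sketch assumes the response structure shown above; the helper name `infobox_links` is hypothetical. It collects the title and the external links of each entity and returns an empty link mapping when an info box has no `footer`.

```python
from typing import Any, Dict, Mapping


def infobox_links(response: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return {gkbId: {'title': ..., 'links': {source: url}}} for each info box.

    Hypothetical helper; assumes the response shape shown above, where each
    ID maps to {'value': {'title', 'header', 'body', 'footer'}, 'language'}.
    """
    result = {}
    for gkb_id, box in response.items():
        value = box.get('value', {})
        result[gkb_id] = {
            'title': value.get('title', ''),
            # footer maps a source name (e.g. 'enwiki', 'wikidata') to a URL
            'links': dict(value.get('footer', {})),
        }
    return result


# Trimmed-down copy of the response shown above
sample = {
    'G458': {'value': {'title': 'Union européenne', 'header': '', 'body': '',
                       'footer': {'enwiki': 'https://en.wikipedia.org/wiki/European_Union',
                                  'wikidata': 'http://www.wikidata.org/entity/Q458'}},
             'language': 'fr'},
}
links = infobox_links(sample)
print(links['G458']['links']['enwiki'])  # https://en.wikipedia.org/wiki/European_Union
```

In practice, `sample` would be replaced by the return value of `kbInfoBoxes(input)`.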
Multiple Presentation Languages
The knowledge base can also be used when we need to present entities or tags in more than one language. We could, of course, repeat the analysis once for each desired presentation language (see Presentation Language), but it is usually more convenient to look up the standard forms of the already identified IDs in the knowledge base. The following code translates two tags from the example in the photo recommendation guide into Polish:
const kbStdForms = async (input, config) => {
    const response = await axios.post('knowledgebase/stdforms', input, config);
    return response.data;
};
const input = {
    IDs: ['G458', 'G567'],
    language: 'pl',
};
kbStdForms(input, config).then(report);
def kbStdForms(input: Mapping[str, Any]):
    return requests.post(f'{BASE_URL}knowledgebase/stdforms', json=input, headers=HEADERS).json()
input = {
    'IDs': ['G458', 'G567'],
    'language': 'pl',
}
kbStdForms(input)
{
G458: { value: 'Unia Europejska', language: 'pl' },
G567: { value: 'Angela Merkel', language: 'pl' }
}
{'G458': {'value': 'Unia Europejska', 'language': 'pl'},
'G567': {'value': 'Angela Merkel', 'language': 'pl'}}
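Each stdforms call returns a single language, so presenting entities in several languages means one call per language. The minimal sketch below merges such per-language responses into one table of labels. `merge_std_forms` is a hypothetical helper, and the French response in the example is assumed for illustration only (it is not taken from an actual API call).

```python
from typing import Dict, Iterable, Mapping


def merge_std_forms(responses: Iterable[Mapping[str, Mapping[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Combine several stdforms responses into {gkbId: {language: stdForm}}."""
    labels: Dict[str, Dict[str, str]] = {}
    for response in responses:
        for gkb_id, form in response.items():
            # each entry looks like {'value': 'Unia Europejska', 'language': 'pl'}
            labels.setdefault(gkb_id, {})[form['language']] = form['value']
    return labels


polish = {'G458': {'value': 'Unia Europejska', 'language': 'pl'},
          'G567': {'value': 'Angela Merkel', 'language': 'pl'}}
# Assumed French response, for illustration only
french = {'G458': {'value': 'Union européenne', 'language': 'fr'},
          'G567': {'value': 'Angela Merkel', 'language': 'fr'}}

labels = merge_std_forms([polish, french])
print(labels['G458'])  # {'pl': 'Unia Europejska', 'fr': 'Union européenne'}
```

With the `kbStdForms` function above, the input list could be built as `[kbStdForms({'IDs': ids, 'language': lang}) for lang in ('pl', 'fr')]`.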
Deprecated and Duplicate IDs
Over time, duplicate items may be created in the knowledge base.
By duplicates, we mean items that have different IDs but represent the same concept in the real world.
For example, the two items with IDs G22262439 and G8ad70d13-E are duplicates: both represent the same person, Jiří Kulhánek, a Czech local politician.
There are various reasons for such duplicates, for example:
Some of the sources for our knowledge base (such as Wikidata) are crowdsourced, and multiple items might have been created independently.
A real-world entity is recorded by multiple sources (e.g., a business register and Wikipedia), and linking them is not always straightforward.
Knowledge base items that are derived from automated data analysis are inherently noisy. For example, we might identify a John Doe, an actor, and a John Doe, a politician, and create separate knowledge base items for each, only to later discover that Mr. Doe is actually both an actor and a politician.
Whenever such duplicate items are detected, one of them is selected as the primary item, and the remaining items are marked as inactive. This means that they will no longer be used and that only the primary item can appear in the results.
Our API communicates this information in two ways:
As part of the NLP response: if an entity or tag has ever had any alternative IDs, these IDs are returned as well.
A dedicated redirects API, which returns the status of any knowledge base ID together with information about its duplicates.
See below for more information.
Duplicates in NLP Response
Information about deprecated IDs is also part of the NLP response (see Article Content Analysis). If an entity or tag has any deprecated duplicate IDs, they are listed under feats.duplicateGkbIds. Multiple IDs are separated by commas.
This can be seen in the following example response, where the person Jiří Kulhánek has the active ID G22262439 and two deprecated IDs. Note that the response is simplified, with many irrelevant keys omitted.
{
...
entities: [
{
id: 'e1',
stdForm: 'Jiří Kulhánek',
gkbId: 'G22262439',
feats: {
duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C'
}
},
],
tags: [
{
id: 't1',
stdForm: 'Jiří Kulhánek',
gkbId: 'G22262439',
feats: {
duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C'
}
}
],
...
}
{
...
'entities': [
{
'id': 'e1',
'stdForm': 'Jiří Kulhánek',
'gkbId': 'G22262439',
'feats': {
'duplicateGkbIds': 'G8ad70d13-E,Gfd6d708c-C'
}
},
],
'tags': [
{
'id': 't1',
'stdForm': 'Jiří Kulhánek',
'gkbId': 'G22262439',
'feats': {
'duplicateGkbIds': 'G8ad70d13-E,Gfd6d708c-C'
}
}
],
...
}
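Because `duplicateGkbIds` is a single comma-separated string, it has to be split before use. The following minimal sketch assumes the simplified response structure shown above; the helper name `deprecated_to_active` is hypothetical. It builds a mapping from each deprecated ID to the active ID that replaced it, which could be used, for example, to update IDs stored in your own data.

```python
from typing import Any, Dict, Mapping


def deprecated_to_active(response: Mapping[str, Any]) -> Dict[str, str]:
    """Map every deprecated GKB ID found in an NLP response to its active ID."""
    mapping: Dict[str, str] = {}
    for item in list(response.get('entities', [])) + list(response.get('tags', [])):
        active = item.get('gkbId')
        dups = item.get('feats', {}).get('duplicateGkbIds', '')
        if not active or not dups:
            continue
        # several deprecated IDs are joined by commas, e.g. 'G8ad70d13-E,Gfd6d708c-C'
        for dup in dups.split(','):
            dup = dup.strip()
            if dup:
                mapping[dup] = active
    return mapping


response = {
    'entities': [{'id': 'e1', 'stdForm': 'Jiří Kulhánek', 'gkbId': 'G22262439',
                  'feats': {'duplicateGkbIds': 'G8ad70d13-E,Gfd6d708c-C'}}],
    'tags': [{'id': 't1', 'stdForm': 'Jiří Kulhánek', 'gkbId': 'G22262439',
              'feats': {'duplicateGkbIds': 'G8ad70d13-E,Gfd6d708c-C'}}],
}
print(deprecated_to_active(response))
# {'G8ad70d13-E': 'G22262439', 'Gfd6d708c-C': 'G22262439'}
```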
Knowledge base item redirects
We can also ask for the status of some IDs directly, using the knowledge base redirect API:
const itemRedirects = async (IDs, config) => {
    const response = await axios.post('knowledgebase/redirects', {gkbIds: IDs}, config);
    return response.data;
};
const IDs = ['G1', 'G22262439', 'G8ad70d13-E', 'Gfd6d708c-C'];
itemRedirects(IDs, config).then(report);
from typing import List
def itemRedirects(IDs: List[str]):
    return requests.post(f'{BASE_URL}knowledgebase/redirects', json={'gkbIds': IDs}, headers=HEADERS).json()
IDs = ['G1', 'G22262439', 'G8ad70d13-E', 'Gfd6d708c-C']
itemRedirects(IDs)
The result below shows, for example, that the item G8ad70d13-E is marked as inactive and has been replaced by the item G22262439.
{
G1: {status: 'active'},
G22262439: {status: 'active', replaces: ['G8ad70d13-E', 'Gfd6d708c-C']},
'G8ad70d13-E': {status: 'inactive', replacedBy: 'G22262439'},
'Gfd6d708c-C': {status: 'inactive', replacedBy: 'G22262439'}
}
{
'G1': {'status': 'active'},
'G22262439': {'status': 'active', 'replaces': ['G8ad70d13-E', 'Gfd6d708c-C']},
'G8ad70d13-E': {'status': 'inactive', 'replacedBy': 'G22262439'},
'Gfd6d708c-C': {'status': 'inactive', 'replacedBy': 'G22262439'}
}
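A typical use of the redirects response is to replace outdated IDs in your own data with the active ones. The minimal sketch below assumes the response format shown above; `resolve_ids` is a hypothetical helper. It follows `replacedBy` links and leaves unknown or active IDs unchanged. The section above does not say whether a replacement item can itself be deprecated later, so the sketch also follows a chain of redirects and stops if it meets an ID it has already visited. Both of these behaviors are assumptions.

```python
from typing import List, Mapping


def resolve_ids(ids: List[str], redirects: Mapping[str, Mapping[str, object]]) -> List[str]:
    """Replace inactive GKB IDs with their active replacements."""
    resolved = []
    for gkb_id in ids:
        current = gkb_id
        seen = {current}
        # Follow replacedBy links; chains of redirects are an assumption
        while True:
            target = redirects.get(current, {}).get('replacedBy')
            if not isinstance(target, str) or target in seen:
                break
            seen.add(target)
            current = target
        resolved.append(current)
    return resolved


redirects = {
    'G1': {'status': 'active'},
    'G22262439': {'status': 'active', 'replaces': ['G8ad70d13-E', 'Gfd6d708c-C']},
    'G8ad70d13-E': {'status': 'inactive', 'replacedBy': 'G22262439'},
    'Gfd6d708c-C': {'status': 'inactive', 'replacedBy': 'G22262439'},
}
print(resolve_ids(['G1', 'G8ad70d13-E', 'Gfd6d708c-C'], redirects))
# ['G1', 'G22262439', 'G22262439']
```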
Knowledge base search
We provide an API for searching the knowledge base using text queries. It is mainly intended for providing feedback on the results of our analysis. For example, when the correct knowledge base ID of an entity is not known, it can be found by searching for the entity's name. Because many different entities can share the same name, the search results can contain multiple items.
If we define the following function
const itemSearch = async (query, lang) => {
const response = await axios.post('knowledgebase/search', {query: query, language: lang}, config)
return response.data;
};
def itemSearch(query: str, lang: str):
    return requests.post(
        f'{BASE_URL}knowledgebase/search',
        json={'query': query, 'language': lang},
        headers=HEADERS
    ).json()
then we can use it to search for entities named Michael Jordan (in English), for example:
itemSearch('Michael Jordan', 'en').then(report);
itemSearch('Michael Jordan', 'en')
and we get this result:
{
query: 'Michael Jordan',
hits: 8,
itemDetails: [
{
gkbId: 'G41421',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'American basketball player and businessman', language: 'en'},
type: 'person'
},
{
gkbId: 'G3308285',
stdForm: {value: 'Michael I. Jordan', language: 'en'},
description: {value: 'American computer scientist, University of California, Berkeley', language: 'en'},
type: 'person'
},
{
gkbId: 'G6831716',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'English footballer (born 1984)', language: 'en'},
type: 'person'
},
{
gkbId: 'G65029442',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'American football offensive lineman', language: 'en'},
type: 'person'
},
{
gkbId: 'G27069141',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'American football cornerback', language: 'en'},
type: 'person'
},
{
gkbId: 'G6831715',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'Irish politician', language: 'en'},
type: 'person'
},
{
gkbId: 'G1928047',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'German draughtsperson, artist and comics artist', language: 'en'},
type: 'person'
},
{
gkbId: 'G6831719',
stdForm: {value: 'Michael Jordan', language: 'en'},
description: {value: 'British mycologist', language: 'en'},
type: 'person'
}
]
}
{
'query': 'Michael Jordan',
'hits': 8,
'itemDetails': [
{
'gkbId': 'G41421',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'American basketball player and businessman', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G3308285',
'stdForm': {'value': 'Michael I. Jordan', 'language': 'en'},
'description': {'value': 'American computer scientist, University of California, Berkeley', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G6831716',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'English footballer (born 1984)', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G65029442',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'American football offensive lineman', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G27069141',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'American football cornerback', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G6831715',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'Irish politician', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G1928047',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'German draughtsperson, artist and comics artist', 'language': 'en'},
'type': 'person'
},
{
'gkbId': 'G6831719',
'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
'description': {'value': 'British mycologist', 'language': 'en'},
'type': 'person'
}
]
}
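Because a name search can return several items, the caller usually has to choose among them. The following minimal sketch assumes the response structure shown above. The helper `candidates` and its keyword-matching strategy are illustrative only and are not part of the API. It narrows the hits by entity type and by a case-insensitive keyword in the description.

```python
from typing import Any, List, Mapping, Optional


def candidates(result: Mapping[str, Any], keyword: str = '',
               entity_type: Optional[str] = None) -> List[str]:
    """Return GKB IDs of hits whose description contains `keyword` (case-insensitive)."""
    ids = []
    for item in result.get('itemDetails', []):
        if entity_type is not None and item.get('type') != entity_type:
            continue
        description = item.get('description', {}).get('value', '')
        if keyword.lower() in description.lower():
            ids.append(item['gkbId'])
    return ids


# A subset of the itemDetails from the Michael Jordan result shown above
result = {
    'itemDetails': [
        {'gkbId': 'G41421', 'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
         'description': {'value': 'American basketball player and businessman', 'language': 'en'},
         'type': 'person'},
        {'gkbId': 'G3308285', 'stdForm': {'value': 'Michael I. Jordan', 'language': 'en'},
         'description': {'value': 'American computer scientist, University of California, Berkeley', 'language': 'en'},
         'type': 'person'},
        {'gkbId': 'G6831715', 'stdForm': {'value': 'Michael Jordan', 'language': 'en'},
         'description': {'value': 'Irish politician', 'language': 'en'},
         'type': 'person'},
    ],
}
print(candidates(result, 'basketball'))  # ['G41421']
```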
Note that the search currently matches whole entity names, not substrings. Therefore, a search for Jordan does not return any of the entities above:
itemSearch('Jordan', 'en').then(report);
{
query: 'Jordan',
hits: 1,
itemDetails: [
{
gkbId: 'G810',
stdForm: {value: 'Jordan', language: 'en'},
description: {value: 'constitutional monarchy in Western Asia', language: 'en'},
type: 'location'
}
]
}
itemSearch('Jordan', 'en')
{
'query': 'Jordan',
'hits': 1,
'itemDetails': [
{
'gkbId': 'G810',
'stdForm': {'value': 'Jordan', 'language': 'en'},
'description': {'value': 'constitutional monarchy in Western Asia', 'language': 'en'},
'type': 'location'
}
]
}