
Knowledge base

All returned tags and entities are linked to the Geneea Knowledge Base (GKB, Geneea KB). GKB combines existing open data (Wikidata, DBpedia, OpenStreetMap, company registries, etc.) with our own private resources. GKB also supports custom properties (e.g., your internal IDs) and custom items.

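This section discusses:

  • Entity information (info boxes)
  • Multiple presentation languages
  • Deprecated and duplicate IDs
  • Searching the knowledge base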

Basic code common for all the guide pages

Basic code

To use the API, you need a valid API key with appropriate authorizations. Please get in touch with us if you do not have one. In the code below, replace <YOUR_API_KEY> with your API key.

Note that we do not provide SDKs for the API yet, but our G3 SDKs can be used to perform NLP analysis.
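The snippets in this guide pass a config object to axios and a report callback to the returned promise, neither of which is defined on this page. The following is a minimal sketch of what they might look like. The makeConfig helper, the base URL and the X-API-Key header name are assumptions made for illustration only; use the values that come with your API key.

```javascript
// Hypothetical helper: builds the request config passed to axios in this guide.
// ASSUMPTIONS: the base URL is a placeholder and the header name is a guess;
// replace both with the values provided with your API key.
const makeConfig = (apiKey, baseURL = 'https://api.example.com/v2/') => ({
  baseURL,
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': apiKey,
  },
});

const config = makeConfig('<YOUR_API_KEY>');

// Hypothetical reporter used as `.then(report)`: pretty-prints the response data.
const report = (data) => console.log(JSON.stringify(data, null, 2));
```

With a baseURL set in the config, relative paths such as 'knowledgebase/infoboxes' are resolved by axios against that base URL.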

No special setup necessary

Entity information (Info boxes)

Sometimes we want to display more than the name of an entity. The following code returns a brief description and links to Wikipedia and Wikidata. More is coming soon.

const kbInfoBoxes = async (input, config) => {
  const response = await axios.post('knowledgebase/infoboxes', input, config);
  return response.data;
};

const input = {
  IDs: ['G458', 'G567'],
  language: 'fr'
};

kbInfoBoxes(input, config).then(report);

This returns:

{
  G458: {
    value: {
      title: "Union européenne",
      header: "union politico-économique sui generis d'États européens",
      body: "",
      footer: {
        cswiki: "https://cs.wikipedia.org/wiki/Evropská_unie",
        enwiki: "https://en.wikipedia.org/wiki/European_Union",
        wikidata: "http://www.wikidata.org/entity/Q458",
        gkb: "http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G458",
      },
    },
    language: "fr",
  },
  G567: {
    value: {
      title: "Angela Merkel",
      header: "chancelière fédérale allemande",
      body: "",
      footer: {
        cswiki: "https://cs.wikipedia.org/wiki/Angela_Merkelová",
        enwiki: "https://en.wikipedia.org/wiki/Angela_Merkel",
        facebook: "https://www.facebook.com/AngelaMerkel",
        instagram: "https://www.instagram.com/bundeskanzlerin",
        wikidata: "http://www.wikidata.org/entity/Q567",
        gkb: "http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G567",
      },
    },
    language: "fr",
  },
}
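As an illustration of how such a response could be displayed, the sketch below turns one info box entry into a few lines of plain text. The formatInfoBox helper is hypothetical and not part of the API; it only assumes the value.title, value.header and value.footer keys seen in the response above.

```javascript
// Hypothetical helper: formats one info box entry as plain text.
// Relies only on the keys visible in the example response (title, header, footer).
const formatInfoBox = (gkbId, box) => {
  const { title, header, footer = {} } = box.value;
  const links = Object.entries(footer).map(([name, url]) => `  ${name}: ${url}`);
  // Drop empty parts (e.g., a missing header) before joining the lines.
  return [`${title} (${gkbId})`, header, ...links].filter(Boolean).join('\n');
};

// Sample entry copied from the response above, with a shortened footer.
const merkelBox = {
  value: {
    title: 'Angela Merkel',
    header: 'chancelière fédérale allemande',
    body: '',
    footer: {
      enwiki: 'https://en.wikipedia.org/wiki/Angela_Merkel',
      wikidata: 'http://www.wikidata.org/entity/Q567',
    },
  },
  language: 'fr',
};

console.log(formatInfoBox('G567', merkelBox));
```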

Multiple Presentation Languages

The knowledge base can also be used when we need to present entities or tags in more than one language. We could perform repeated analyses, each with one of the desired presentation languages (see Presentation Language), but a knowledge base lookup avoids re-running the analysis. The following code translates two tags from the example in the photo recommendation guide to Polish:

const kbStdForms = async (input, config) => {
  const response = await axios.post('knowledgebase/stdforms', input, config);
  return response.data;
};

const input = {
  IDs: ['G458', 'G567'],
  language: 'pl',
};

kbStdForms(input, config).then(report);

This returns:

{
  G458: { value: 'Unia Europejska', language: 'pl' },
  G567: { value: 'Angela Merkel', language: 'pl' }
}
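When several languages are needed, one request per language yields one such response each. The sketch below shows one possible way to combine them into a single lookup keyed by ID and language. The mergeStdForms helper is hypothetical, and the French sample response is illustrative input written for this example, not recorded API output.

```javascript
// Hypothetical helper: merges several stdforms responses (one per language)
// into a lookup of the form { gkbId: { language: label } }.
const mergeStdForms = (responses) => {
  const labels = {};
  for (const response of responses) {
    for (const [gkbId, { value, language }] of Object.entries(response)) {
      labels[gkbId] = { ...labels[gkbId], [language]: value };
    }
  }
  return labels;
};

// Polish response copied from the example above.
const plForms = {
  G458: { value: 'Unia Europejska', language: 'pl' },
  G567: { value: 'Angela Merkel', language: 'pl' },
};
// Illustrative French response (assumed values, same shape).
const frForms = {
  G458: { value: 'Union européenne', language: 'fr' },
  G567: { value: 'Angela Merkel', language: 'fr' },
};

const stdFormLabels = mergeStdForms([plForms, frForms]);
console.log(stdFormLabels.G458.pl); // Unia Europejska
```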

Deprecated and Duplicate IDs

Over time, duplicate items may be created in the knowledge base. By duplicates, we mean items that have different IDs but represent the same real-world concept. For example, the items with IDs G22262439 and G8ad70d13-E are duplicates: both represent the same person, Jiří Kulhánek, a Czech local politician.

There are various reasons for such duplicates, for example:

  • Some of the sources for our knowledge base (such as Wikidata) are crowdsourced, and multiple items might have been created independently.
  • A real-world entity is recorded by multiple sources (e.g., a business register and Wikipedia), and linking them is not always straightforward.
  • Knowledge base items that are derived from automated data analysis are inherently noisy. For example, we might identify a John Doe, an actor, and a John Doe, a politician, and create separate knowledge base items for each, only to later discover that Mr. Doe is actually both an actor and a politician.

Whenever such duplicate items are detected, one of them is selected as the primary item, and the remaining items are marked as inactive. This means that they will no longer be used and that only the primary item can appear in the results.

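Our API communicates this information in two ways:

  • In the NLP response, where deprecated IDs are listed alongside the active ID of an entity or tag.
  • Through the knowledge base redirects API, which reports the status of individual IDs.

See below for more information.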

Duplicates in NLP Response

Information about deprecated IDs is part of the NLP response (see Article Content Analysis). If an entity or tag has deprecated duplicate IDs, they are listed under feats.duplicateGkbIds. Multiple IDs are separated by commas.

This can be seen in the following example response, where the person Jiří Kulhánek has the active ID G22262439 and two deprecated IDs. Note that the response is simplified, with many irrelevant keys omitted.

{
  ...
  entities: [
    {
      id: "e1",
      stdForm: "Jiří Kulhánek",
      gkbId: "G22262439",
      feats: {
        duplicateGkbIds: "G8ad70d13-E,Gfd6d708c-C",
      },
    },
  ],
  tags: [
    {
      id: "t1",
      stdForm: "Jiří Kulhánek",
      gkbId: "G22262439",
      feats: {
        duplicateGkbIds: "G8ad70d13-E,Gfd6d708c-C",
      },
    },
  ],
  ...
}
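If previously stored IDs need to be updated, the comma-separated value can be turned into a mapping from each deprecated ID to the active one. The following is a minimal sketch; deprecatedToActive is a hypothetical helper that assumes only the entities, tags, gkbId and feats.duplicateGkbIds keys shown above.

```javascript
// Hypothetical helper: maps every deprecated ID found in an NLP response
// to the active ID of the entity or tag that lists it.
const deprecatedToActive = (analysis) => {
  const mapping = {};
  const items = [...(analysis.entities || []), ...(analysis.tags || [])];
  for (const { gkbId, feats } of items) {
    // duplicateGkbIds is a comma-separated string; it may be absent.
    const duplicates = (feats && feats.duplicateGkbIds) || '';
    for (const oldId of duplicates.split(',').map((id) => id.trim()).filter(Boolean)) {
      mapping[oldId] = gkbId;
    }
  }
  return mapping;
};

// Simplified response from the example above.
const analysisSample = {
  entities: [
    { id: 'e1', gkbId: 'G22262439', feats: { duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C' } },
  ],
  tags: [
    { id: 't1', gkbId: 'G22262439', feats: { duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C' } },
  ],
};

const idMapping = deprecatedToActive(analysisSample);
console.log(idMapping['G8ad70d13-E']); // G22262439
```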

Knowledge base item redirects

We can also ask for the status of some IDs directly, using the knowledge base redirect API:

const itemRedirects = async (IDs, config) => {
  const response = await axios.post('knowledgebase/redirects', {gkbIds: IDs}, config);
  return response.data;
};

const IDs = ['G1', 'G22262439', 'G8ad70d13-E', 'Gfd6d708c-C'];

itemRedirects(IDs, config).then(report);

In the result below, you can see, for example, that the item G8ad70d13-E is marked as inactive and has been replaced by the item G22262439.

{
  G1: {status: 'active'},
  G22262439: {status: 'active', replaces: ['G8ad70d13-E', 'Gfd6d708c-C']},
  G8ad70d13-E: {status: 'inactive', replacedBy: 'G22262439'},
  Gfd6d708c-C: {status: 'inactive', replacedBy: 'G22262439'}
}

Search

We provide an API for searching the knowledge base using text queries. It is mainly intended for providing feedback on the results of our analysis. For example, when the correct knowledge base ID is not known, it can be found by searching for the item's name. Because many different entities can have the same name, the search results can contain multiple items.

If we define the following function

const itemSearch = async (query, lang) => {
  const response = await axios.post('knowledgebase/search', {query: query, language: lang}, config);
  return response.data;
};

then we can use it to search for entities named Michael Jordan (in English), for example:

itemSearch('Michael Jordan', 'en').then(report);

and we get this result:

{
  query: "Michael Jordan",
  hits: 8,
  itemDetails: [
    {
      gkbId: "G41421",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: {
        value: "American basketball player and businessman",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G3308285",
      stdForm: { value: "Michael I. Jordan", language: "en" },
      description: {
        value: "American computer scientist, University of California, Berkeley",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G6831716",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: {
        value: "English footballer (born 1984)",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G65029442",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: {
        value: "American football offensive lineman",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G27069141",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: {
        value: "American football cornerback",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G6831715",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: { value: "Irish politician", language: "en" },
      type: "person",
    },
    {
      gkbId: "G1928047",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: {
        value: "German draughtsperson, artist and comics artist",
        language: "en",
      },
      type: "person",
    },
    {
      gkbId: "G6831719",
      stdForm: { value: "Michael Jordan", language: "en" },
      description: { value: "British mycologist", language: "en" },
      type: "person",
    },
  ],
}

Note that at the moment, the search matches whole entity names, not substrings. Therefore, when searching for Jordan, the above entities are not included in the result:

itemSearch('Jordan', 'en').then(report);

{
  query: 'Jordan',
  hits: 1,
  itemDetails: [
    {
      gkbId: 'G810',
      stdForm: {value: 'Jordan', language: 'en'},
      description: {value: 'constitutional monarchy in Western Asia', language: 'en'},
      type: 'location'
    }
  ]
}
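Since a name search can return many items, it can help to narrow the hits on the client side before choosing an ID for feedback. The following sketch filters a search result by item type and by a keyword in the description. The filterHits helper is hypothetical and works only on the itemDetails, type and description.value keys shown in the results above.

```javascript
// Hypothetical helper: keeps the hits that match the given type and whose
// description contains the given keyword (case-insensitive). Both are optional.
const filterHits = (result, { type, keyword } = {}) =>
  result.itemDetails.filter((item) => {
    const description = ((item.description && item.description.value) || '').toLowerCase();
    return (!type || item.type === type)
      && (!keyword || description.includes(keyword.toLowerCase()));
  });

// Shortened copy of the "Michael Jordan" result above.
const searchSample = {
  query: 'Michael Jordan',
  itemDetails: [
    { gkbId: 'G41421', type: 'person',
      description: { value: 'American basketball player and businessman', language: 'en' } },
    { gkbId: 'G3308285', type: 'person',
      description: { value: 'American computer scientist, University of California, Berkeley', language: 'en' } },
    { gkbId: 'G6831715', type: 'person',
      description: { value: 'Irish politician', language: 'en' } },
  ],
};

const basketballHits = filterHits(searchSample, { type: 'person', keyword: 'Basketball' });
console.log(basketballHits.map((item) => item.gkbId)); // ['G41421']
```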