Knowledge base

All returned tags and entities are linked to the Geneea Knowledge Base (GKB, Geneea KB). GKB combines existing open data (Wikidata, DBpedia, OpenStreetMap, company registries, etc.) with our own private resources. GKB also supports custom properties (e.g. your internal IDs) and custom items.

This section discusses:

  • Entity information (info boxes)

  • Multiple presentation languages

  • Deprecated and duplicate IDs

Entity information (Info boxes)

Sometimes we want to display more than just the name of an entity. The following code returns a brief description of each entity together with links to Wikipedia and Wikidata. More information is coming soon.

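// axios, config and report are assumed to be defined elsewhere.
const kbInfoBoxes = async (input, config) => {
    const response = await axios.post('knowledgebase/infoboxes', input, config);
    return response.data;
};

// Request French info boxes for two knowledge base items.
const infoBoxInput = {
    IDs: ['G458', 'G567'],
    language: 'fr'
};

kbInfoBoxes(infoBoxInput, config).then(report);

The result contains one info box per requested ID:
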
{
  G458: {
    value: {
      title: 'Union européenne',
      header: "union politico-économique sui generis d'États européens",
      body: '',
      footer: {
        cswiki: 'https://cs.wikipedia.org/wiki/Evropská_unie',
        enwiki: 'https://en.wikipedia.org/wiki/European_Union',
        wikidata: 'http://www.wikidata.org/entity/Q458',
        gkb: 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G458'
      }
    },
    language: 'fr'
  },
  G567: {
    value: {
      title: 'Angela Merkel',
      header: 'chancelière fédérale allemande',
      body: '',
      footer: {
        cswiki: 'https://cs.wikipedia.org/wiki/Angela_Merkelová',
        enwiki: 'https://en.wikipedia.org/wiki/Angela_Merkel',
        facebook: 'https://www.facebook.com/AngelaMerkel',
        instagram: 'https://www.instagram.com/bundeskanzlerin',
        wikidata: 'http://www.wikidata.org/entity/Q567',
        gkb: 'http://gkbs1-env2.eu-west-1.elasticbeanstalk.com/gkbs/v1//item/G567'
      }
    },
    language: 'fr'
  }
}
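
To display such an info box, the fields of the response can be flattened into a short text. The following is only a minimal sketch: formatInfoBox is a hypothetical helper, not part of the API, and it assumes the response shape shown in the sample above (title, header and a footer of named links).

```javascript
// Hypothetical helper: build a plain-text summary from one info box.
// Assumes the response shape of the sample above.
const formatInfoBox = (id, infoBoxes) => {
    const box = infoBoxes[id];
    if (!box || !box.value) {
        return null; // unknown ID or no info box available
    }
    const { title, header, footer = {} } = box.value;
    // List the links (e.g. enwiki, wikidata) in a stable, sorted order.
    const links = Object.keys(footer)
        .sort()
        .map((key) => `${key}: ${footer[key]}`);
    const headline = header ? `${title} (${header})` : title;
    return [headline, ...links].join('\n');
};

// Shortened copy of the sample response, used only for illustration.
const sampleInfoBoxes = {
    G567: {
        value: {
            title: 'Angela Merkel',
            header: 'chancelière fédérale allemande',
            body: '',
            footer: {
                enwiki: 'https://en.wikipedia.org/wiki/Angela_Merkel',
                wikidata: 'http://www.wikidata.org/entity/Q567'
            }
        },
        language: 'fr'
    }
};

console.log(formatInfoBox('G567', sampleInfoBoxes));
```

The helper returns null for IDs that are missing from the response, so the caller can decide how to render entities without an info box.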

Multiple Presentation Languages

The knowledge base can also be used when we need to present entities or tags in more than one language. We could perform repeated analyses, each with one of the desired presentation languages (see Presentation Language), but the knowledge base can supply the translated names directly. The following code translates two tags from the example in the photo recommendation guide into Polish:

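// axios, config and report are assumed to be defined elsewhere.
const kbStdForms = async (input, config) => {
    const response = await axios.post('knowledgebase/stdforms', input, config);
    return response.data;
};

// Request the Polish standard forms of two knowledge base items.
const stdFormInput = {
    IDs: ['G458', 'G567'],
    language: 'pl',
};

kbStdForms(stdFormInput, config).then(report);

The result contains the standard form of each item in the requested language:
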
{
  G458: { value: 'Unia Europejska', language: 'pl' },
  G567: { value: 'Angela Merkel', language: 'pl' }
}
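
When several languages are needed, one request per language can be made and the results combined into a single lookup table. The sketch below shows only the combining step. mergeStdForms is a hypothetical helper, and the French response is an assumed example modelled on the Polish one above; only the response shape is taken from this guide.

```javascript
// Hypothetical helper: combine stdforms responses (one per language)
// into { id: { language: name } }.
const mergeStdForms = (responses) => {
    const merged = {};
    for (const response of responses) {
        for (const [id, { value, language }] of Object.entries(response)) {
            merged[id] = merged[id] || {};
            merged[id][language] = value;
        }
    }
    return merged;
};

// Response shape as shown above; the French values are assumed for illustration.
const polishForms = {
    G458: { value: 'Unia Europejska', language: 'pl' },
    G567: { value: 'Angela Merkel', language: 'pl' }
};
const frenchForms = {
    G458: { value: 'Union européenne', language: 'fr' },
    G567: { value: 'Angela Merkel', language: 'fr' }
};

const namesById = mergeStdForms([polishForms, frenchForms]);
console.log(namesById.G458.pl, '/', namesById.G458.fr);
```

In practice, the responses could be gathered with something like Promise.all over the desired languages, calling kbStdForms once per language.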

Deprecated and Duplicate IDs

Over time, duplicate items may be created in the knowledge base. By duplicates, we mean items that have different IDs but represent the same real-world concept. For example, the items with IDs G22262439 and G8ad70d13-E are duplicates: both represent the same person, Jiří Kulhánek, a Czech local politician.

There are various reasons for such duplicates, for example:

  • Some of the sources for our knowledge base (such as Wikidata) are crowdsourced, and multiple items might have been created independently.

  • A real-world entity is recorded by multiple sources (e.g., a business register and Wikipedia), and linking them is not always straightforward.

  • Knowledge base items that are derived from automated data analysis are inherently noisy. For example, we might identify a John Doe, an actor, and a John Doe, a politician, and create separate knowledge base items for each, only to later discover that Mr. Doe is actually both an actor and a politician.

Whenever such duplicate items are detected, one of them is selected as the primary item, and the remaining items are marked as inactive. This means that they will no longer be used and that only the primary item can appear in the results.

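Our API communicates this information in two ways:

  • In the NLP response, under feats.duplicateGkbIds of each affected entity or tag.

  • Through the knowledge base redirect API, which reports the status of the given IDs.

See below for more information.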

Duplicates in NLP Response

Information about deprecated IDs is part of the NLP response (see Article Content Analysis). If an entity or tag has deprecated duplicate IDs, they are listed under feats.duplicateGkbIds. Multiple duplicate IDs are separated by commas.

This can be seen in the following example response, where the person Jiří Kulhánek has the active ID G22262439 and two deprecated IDs. Note that the response is simplified; many irrelevant keys are omitted.

{
  ...
  entities: [
        {
            id: 'e1',
            stdForm: 'Jiří Kulhánek',
            gkbId: 'G22262439',
            feats: {
                duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C'
            }
        },
  ],
  tags: [
        {
            id: 't1',
            stdForm: 'Jiří Kulhánek',
            gkbId: 'G22262439',
            feats: {
                duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C'
            }
        }
  ],
  ...
}
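
Since feats.duplicateGkbIds is a comma-separated string, it has to be split before it can be matched against stored IDs. The following sketch collects a map from each deprecated ID to its active ID. collectDuplicateIds is a hypothetical helper, and the input is the simplified response shown above.

```javascript
// Hypothetical helper: map every deprecated ID found in an NLP response
// to the active gkbId of the entity or tag that lists it.
const collectDuplicateIds = (analysis) => {
    const activeByDeprecated = {};
    const items = [...(analysis.entities || []), ...(analysis.tags || [])];
    for (const item of items) {
        const raw = item.feats && item.feats.duplicateGkbIds;
        if (!item.gkbId || !raw) {
            continue; // no knowledge base link or no duplicates
        }
        for (const part of raw.split(',')) {
            const deprecatedId = part.trim();
            if (deprecatedId) {
                activeByDeprecated[deprecatedId] = item.gkbId;
            }
        }
    }
    return activeByDeprecated;
};

// Simplified response from the example above.
const sampleAnalysis = {
    entities: [
        {
            id: 'e1',
            stdForm: 'Jiří Kulhánek',
            gkbId: 'G22262439',
            feats: { duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C' }
        }
    ],
    tags: [
        {
            id: 't1',
            stdForm: 'Jiří Kulhánek',
            gkbId: 'G22262439',
            feats: { duplicateGkbIds: 'G8ad70d13-E,Gfd6d708c-C' }
        }
    ]
};

console.log(collectDuplicateIds(sampleAnalysis));
```

Such a map could be used, for instance, to update previously stored tag IDs while processing new analyses.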

Knowledge base item redirects

We can also ask for the status of some IDs directly, using the knowledge base redirect API:

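// axios, config and report are assumed to be defined elsewhere.
const itemRedirects = async (IDs, config) => {
    const response = await axios.post('knowledgebase/redirects', {gkbIds: IDs}, config);
    return response.data;
};

// One unrelated item, the active item and its two deprecated duplicates.
const IDs = ['G1', 'G22262439', 'G8ad70d13-E', 'Gfd6d708c-C'];

itemRedirects(IDs, config).then(report);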

In the result below, you can see, for example, that the item G8ad70d13-E is marked as inactive and has been replaced by the item G22262439.

{
    G1: {status: 'active'},
    G22262439: {status: 'active', replaces: ['G8ad70d13-E', 'Gfd6d708c-C']},
    G8ad70d13-E: {status: 'inactive', replacedBy: 'G22262439'},
    Gfd6d708c-C: {status: 'inactive', replacedBy: 'G22262439'}
}
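
A typical use of this result is to bring a list of stored IDs up to date. The following is a minimal sketch, assuming the response shape shown above. toActiveId is a hypothetical helper, and leaving unknown IDs unchanged is a choice made here, not documented API behavior.

```javascript
// Hypothetical helper: replace an inactive ID with the ID that replaced it.
// IDs missing from the response are returned unchanged (an assumption).
const toActiveId = (id, redirects) => {
    const entry = redirects[id];
    if (entry && entry.status === 'inactive' && entry.replacedBy) {
        return entry.replacedBy;
    }
    return id;
};

// Redirect result from the example above.
const sampleRedirects = {
    'G1': { status: 'active' },
    'G22262439': { status: 'active', replaces: ['G8ad70d13-E', 'Gfd6d708c-C'] },
    'G8ad70d13-E': { status: 'inactive', replacedBy: 'G22262439' },
    'Gfd6d708c-C': { status: 'inactive', replacedBy: 'G22262439' }
};

// Map stored IDs to active ones and drop the resulting repeats.
const storedIds = ['G1', 'G8ad70d13-E', 'Gfd6d708c-C', 'G22262439'];
const activeIds = [...new Set(storedIds.map((id) => toActiveId(id, sampleRedirects)))];

console.log(activeIds);
```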