Feedback

If you send us feedback, our models learn from it. This means that quality is tuned automatically and your preferences are taken into account.

In the following examples, we show how to provide feedback on tags and entities.

Providing feedback on sentiment is analogous; see the API reference.

Feedback on Tags

Opt-in/Opt-out/Mixed

The form of the feedback depends on how journalists accept or reject tag suggestions. There are three basic scenarios for using Geneea’s tag suggestions:

  • opt-in: journalists accept some of the tags suggested by the API; the rest are not used.

  • opt-out: journalists reject some of the tags suggested by the API; the rest are used.

  • mixed: tags are separated into two groups by their score: the most relevant tags are in the opt-out mode (journalists reject rare errors), and the rest are in the opt-in mode (journalists pick rare omissions).
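
The mixed scenario can be sketched as a simple split by score. This is only an illustration: the function name split_for_mixed_mode and the threshold of 15.0 are assumptions made for this example (the API does not prescribe a cut-off), and the relevance values are taken from the example response later on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off chosen for this sketch; pick one that fits your editorial workflow.
OPT_OUT_THRESHOLD = 15.0


def split_for_mixed_mode(
    tags: List[Dict], threshold: float = OPT_OUT_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Split suggested tags into (opt_out, opt_in) groups by their relevance."""
    # High-scoring tags are used unless the journalist rejects them.
    opt_out = [t for t in tags if t["relevance"] >= threshold]
    # Lower-scoring tags are used only if the journalist picks them.
    opt_in = [t for t in tags if t["relevance"] < threshold]
    return opt_out, opt_in


suggested = [
    {"stdForm": "Tesla, Inc.", "relevance": 24.33},
    {"stdForm": "Elon Musk", "relevance": 22.5},
    {"stdForm": "Dogecoin", "relevance": 19.24},
    {"stdForm": "Reuters", "relevance": 9.275},
]

opt_out, opt_in = split_for_mixed_mode(suggested)
print([t["stdForm"] for t in opt_out])  # ['Tesla, Inc.', 'Elon Musk', 'Dogecoin']
print([t["stdForm"] for t in opt_in])   # ['Reuters']
```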

In the opt-in case (when no tag is used without explicit approval), the following types of feedback are typically sent:

  • accepted-actively: the tag was explicitly selected from the suggestions as correct and relevant.

  • rejected-passively: the tag was not selected (it is either incorrect or not relevant).

  • rejected-actively: a stronger version of rejected-passively, used when the tag is explicitly marked as undesirable.

  • expected: the tag was not part of the suggestions but should have been.

In the opt-out case (when all tags are used unless explicitly forbidden), the following types of feedback are typically sent:

  • rejected-actively: the tag was explicitly marked as wrong (it is either incorrect or not relevant).

  • accepted-passively: the tag was not removed from the suggestions.

  • expected: the tag was not part of the suggestions but should have been.

In all three scenarios, the rejected-actively status can be replaced by one of three more informative values:

  • rejected-actively-wrong: the tag has been explicitly rejected because the article does not mention the concept at all (e.g., it might mention a different person or place with the same name).

  • rejected-actively-marginal: the tag has been explicitly rejected because, despite being correct, it is not relevant enough to be selected (for example, it is mentioned only marginally).

  • blocked: the tag should never be returned (e.g., the tag is too specific and a more general tag should be returned).

The rejected-actively value always works in their place, but providing the more specific values helps the system to improve its predictions faster.
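
The rules above can be summarized in a small helper that turns what the journalist did with a tag into a status value. This is a minimal sketch: the function tag_status, the action names ("added", "selected", "rejected", "untouched") and the reason names are hypothetical and only model a possible editorial UI; only the returned status strings come from this page. In the mixed scenario, pass the mode of the group the tag belongs to.

```python
from typing import Optional

# Hypothetical rejection reasons mapped to the more specific status values.
SPECIFIC_REJECTIONS = {
    "wrong": "rejected-actively-wrong",
    "marginal": "rejected-actively-marginal",
    "blocked": "blocked",
}


def tag_status(mode: str, action: str, reason: Optional[str] = None) -> str:
    """Return the feedback status for one tag.

    mode   -- "opt-in" or "opt-out"
    action -- "added", "selected", "rejected" or "untouched" (names assumed here)
    reason -- optional key of SPECIFIC_REJECTIONS for explicit rejections
    """
    if mode not in ("opt-in", "opt-out"):
        raise ValueError(f"unknown mode: {mode!r}")
    if action == "added":
        # The journalist added a tag that was missing from the suggestions.
        return "expected"
    if action == "rejected":
        # Fall back to the generic value when no specific reason is known.
        return SPECIFIC_REJECTIONS.get(reason, "rejected-actively")
    if action == "selected":
        return "accepted-actively"
    if action == "untouched":
        # An untouched suggestion is dropped in opt-in and kept in opt-out.
        return "rejected-passively" if mode == "opt-in" else "accepted-passively"
    raise ValueError(f"unknown action: {action!r}")


print(tag_status("opt-in", "untouched"))             # rejected-passively
print(tag_status("opt-out", "untouched"))            # accepted-passively
print(tag_status("opt-in", "rejected", "marginal"))  # rejected-actively-marginal
```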

Technical details

Feedback on tags consists of two main pieces of information: a tag and its status. The tag is identified by the GKB id and/or the standard form of a tag that was returned or should have been returned. The status expresses whether the tag was returned correctly, incorrectly, or not at all. The subset of reasonable status values depends on the scenario (opt-in, opt-out, or mixed) described above.

Specifying the GKB id for a tag is optional but highly desirable, as it allows automatic processing of the feedback. Tags without a GKB id are regularly reviewed by our data scientists.

To summarize the most common status values:

  • accepted-actively: the tag has been explicitly accepted as correct.

  • accepted-passively: the tag has been accepted, but not explicitly. Typically, this means it was not rejected in the opt-out mode.

  • rejected-actively: the tag has been explicitly rejected. Optionally, a more specific value can be used instead:

    • rejected-actively-wrong: the tag has been explicitly rejected because the article does not mention the concept at all (e.g., it might mention a different person or place with the same name).

    • rejected-actively-marginal: the tag has been explicitly rejected because, despite being correct, it is not relevant enough to be selected (for example, it is mentioned only marginally).

    • blocked: the tag should never be returned (e.g., the tag is too specific and a more general tag should be returned).

  • rejected-passively: the tag has been rejected, but not explicitly. Typically, this means it was not accepted in the opt-in mode.

  • expected: the tag should have been returned but wasn’t.

We also accept the following legacy values, each listed with its preferred replacement in parentheses: correct (accepted-actively), accepted (accepted-passively), wrong (rejected-actively), and rejected (rejected-passively).
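
If an older integration still produces the legacy values, they can be translated on the client before sending. The following is a minimal sketch of such a translation; the names LEGACY_STATUSES, CURRENT_STATUSES and normalize_status are made up for this example, and the mapping simply restates the pairs listed above.

```python
# Legacy value -> preferred value, as listed in the text.
LEGACY_STATUSES = {
    "correct": "accepted-actively",
    "accepted": "accepted-passively",
    "wrong": "rejected-actively",
    "rejected": "rejected-passively",
}

# Status values described on this page.
CURRENT_STATUSES = {
    "accepted-actively",
    "accepted-passively",
    "rejected-actively",
    "rejected-actively-wrong",
    "rejected-actively-marginal",
    "blocked",
    "rejected-passively",
    "expected",
}


def normalize_status(status: str) -> str:
    """Translate a legacy status to its preferred form; reject unknown values."""
    normalized = LEGACY_STATUSES.get(status, status)
    if normalized not in CURRENT_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return normalized


print(normalize_status("correct"))   # accepted-actively
print(normalize_status("rejected"))  # rejected-passively
print(normalize_status("blocked"))   # blocked
```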

Example

As an example, consider the following article (source: Reuters). When we use the API to obtain tags:

POST https://media-api.geneea.com/v2/nlp/analyze
{
  "id": "article-123",
  "title": "Tesla to accept Dogecoin as payment for merchandise, says Musk",
  "text": "Dec 14 (Reuters) - Tesla Inc (TSLA.O) chief Elon Musk said on Tuesday the electric carmaker will accept Dogecoin as payment for merchandise on a test basis, sending the meme-based cryptocurrency up 24%.",
  "language": "en"
}

we receive the following results (depending on your account, there might be other features than tags):

{
  "id": "article-123",
  "language": { "detected": "en" },
  "tags": [
    { "id": "t0", "gkbId": "G478214", "stdForm": "Tesla, Inc.", "type": "media", "relevance": 24.33},
    { "id": "t1", "gkbId": "G317521", "stdForm": "Elon Musk", "type": "media", "relevance": 22.5},
    { "id": "t2", "gkbId": "G15377916", "stdForm": "Dogecoin", "type": "media", "relevance": 19.24},
    { "id": "t3", "gkbId": "G130879", "stdForm": "Reuters", "type": "media", "relevance": 9.275 }
  ],
  "metadata": {"referenceKey": "211201-103000-d64a0290"}
}

Let’s assume that the journalist is using the opt-in system, i.e., they select some tags. Namely, they:

  • select Tesla and Elon Musk,

  • explicitly mark Reuters as not relevant (they could have just left it alone, in which case it would get the status rejected-passively, but they decided to be more specific, which is nice),

  • are missing cryptocurrency (GKB id G13479982) among the results.

To provide this feedback, we can use the following call:

POST https://media-api.geneea.com/v2/nlp/analyze/feedback
{
  "docId": "article-123",
  "referenceKey": "211201-103000-d64a0290",
  "tags": [
    { "gkbId": "G478214", "stdForm": "Tesla, Inc.", "status": "accepted-actively"},
    { "gkbId": "G317521", "stdForm": "Elon Musk", "status": "accepted-actively"},
    { "gkbId": "G130879", "stdForm": "Reuters", "status": "rejected-actively-marginal", "comment": "Reuters not wanted" },
    { "gkbId": "G13479982", "stdForm": "cryptocurrency", "status": "expected", "comment": "cryptocurrency is missing"}
  ]
}

Providing the id of the originally analyzed document (article-123) is required; providing the referenceKey identifying the actual analysis (211201-103000-d64a0290) is highly recommended. For each tag, specify its GKB id, its standard form, or both. Specifying the GKB id is recommended. However, for expected but missing items, it is often unknown. In that case, specify the standard form only.
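
The requirements above can be checked on the client before the call is made. The sketch below builds the request body and a POST request with Python's standard library. The helper names build_tag_feedback and make_request are hypothetical, and authentication is deliberately left out because it is not described on this page; pass whatever headers your account requires.

```python
import json
import urllib.request
from typing import Dict, List, Optional

FEEDBACK_URL = "https://media-api.geneea.com/v2/nlp/analyze/feedback"


def build_tag_feedback(
    doc_id: str, tags: List[Dict], reference_key: Optional[str] = None
) -> Dict:
    """Build the feedback body, enforcing the rules stated in the text."""
    if not doc_id:
        raise ValueError("docId is required")
    for tag in tags:
        # Each tag needs a GKB id, a standard form, or both.
        if not (tag.get("gkbId") or tag.get("stdForm")):
            raise ValueError(f"tag needs gkbId and/or stdForm: {tag!r}")
        if not tag.get("status"):
            raise ValueError(f"tag needs a status: {tag!r}")
    payload: Dict = {"docId": doc_id}
    if reference_key:
        # Optional, but highly recommended.
        payload["referenceKey"] = reference_key
    payload["tags"] = list(tags)
    return payload


def make_request(
    payload: Dict, headers: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Prepare (but do not send) the POST request; extra headers are an assumption."""
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        FEEDBACK_URL, data=data, headers=all_headers, method="POST"
    )


payload = build_tag_feedback(
    "article-123",
    [
        {"gkbId": "G478214", "stdForm": "Tesla, Inc.", "status": "accepted-actively"},
        {"stdForm": "cryptocurrency", "status": "expected"},
    ],
    reference_key="211201-103000-d64a0290",
)
request = make_request(payload)
print(request.get_method(), request.full_url)
```

Calling urllib.request.urlopen(request) would then send the feedback; it is omitted here so that the sketch runs without network access or credentials.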

Feedback on Entities

Providing feedback for entities is analogous:

POST https://media-api.geneea.com/v2/nlp/analyze/feedback
{
  "docId": "article-123",
  "referenceKey": "211201-103000-d64a0290",
  "entities": [
    { "gkbId": "G130879", "status": "accepted-actively", "comment": "Reuters is ok as an entity" },
    { "stdForm": "cryptocurrency", "status": "expected", "comment": "cryptocurrency is missing"},
    { "gkbId": "G478214", "stdForm": "Tesla, Inc.", "status": "accepted-actively"},
    { "gkbId": "G317521", "stdForm": "Elon Musk", "status": "accepted-actively"}
  ]
}