Keboola App

Our Keboola app makes it easy to use our General API in Keboola Connection, a cloud ETL platform.

The app can be used to analyze any text, but the standard models are optimized for three domains: news articles, hospitality customer care and transportation customer care. The quality of the results will not be as high if used outside of these domains. In order to ensure the best possible outcome for your domain, we will be happy to provide you with a customized model. We offer a basic customization for free. Contact us at info@geneea.com.

Output Tables

When you run the app, it creates the following tables:

  • analysis-result-documents – document-level results
  • analysis-result-entities – entity-level results
  • analysis-result-relations – contains relations and attributes found
  • analysis-result-sentences – contains information about individual sentences

analysis-result-documents table

The analysis-result-documents table contains document-level results in the following columns:

  • id – all id columns from the input table (used as primary keys)
  • language – detected language of the document, as ISO 639-1 language code
  • sentimentValue – detected sentiment of the document (a decimal number between -1 and 1)
  • sentimentPolarity – detected sentiment of the document (1, 0 or -1)
  • sentimentLabel – sentiment of the document as a label (positive, neutral, negative, or ambivalent)
  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.
  • usedChars – the number of characters used by this document

For the sentence "We bought some excellent wine.", the table will contain the following information:

| id_article | language | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel | usedChars |
|---|---|---|---|---|---|---|
| 123 | en | 0.5 | 1 | positive | positive | 100 |
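
As a rough illustration of working with this table downstream, the sketch below reads a CSV export with Python's standard csv module and keeps the documents with positive polarity. It assumes the table is available as CSV text with the column names listed above and id_article as the only ID column; the inline sample and the positive_documents helper are hypothetical and not part of the app.

```python
import csv
import io

# Hypothetical CSV export of analysis-result-documents.
# The single row mirrors the example above; the CSV layout is an assumption.
DOCUMENTS_CSV = """id_article,language,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel,usedChars
123,en,0.5,1,positive,positive,100
"""


def positive_documents(csv_text):
    """Return (id_article, sentimentValue) pairs for rows whose sentimentPolarity is 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["id_article"], float(row["sentimentValue"]))
        for row in reader
        if int(row["sentimentPolarity"]) == 1
    ]


print(positive_documents(DOCUMENTS_CSV))
```

Filtering on sentimentPolarity rather than sentimentLabel avoids string comparisons, though either column should work for this purpose.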

analysis-result-entities table

The analysis-result-entities table contains entity-level results in the following columns:

  • id – all id columns from the input table (used as primary keys)
  • type – type of the found entity, e.g., person, time, number, organization, tag (main topic of the document)
  • text – disambiguated and standardized form of the entity, e.g., John Smith, Keboola, safe carseat
  • score – expresses the importance of a tag in the text
  • entityUid – ID of the entity
  • sentimentValue – detected sentiment of the entity (a decimal number between -1 and 1)
  • sentimentPolarity – detected sentiment of the entity (1, 0 or -1)
  • sentimentLabel – sentiment of the entity as a label (positive, neutral or negative)
  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

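For the sentence "We bought some excellent wine." and the hospitality domain, the table will contain the following information:

| id_article | type | text | score | entityUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|---|
| 2 | voc-topic | food & drink | 0 | HSP-6009 | | | | |
| 2 | voc-topic | food & drink > quality | 0 | HSP-6015 | | | | |
| 2 | food | drink | 0 | HSP-1477 | 0 | 0 | neutral | neutral |
| 2 | food | alcoholic drink | 0 | HSP-190 | 0 | 0 | neutral | neutral |
| 2 | food | wine | 1.9 | HSP-521 | 0.5 | 1 | positive | positive |
| 2 | tag | wine | 6 | HSP-521 | | | | |
| 2 | tag | alcoholic drink | 6 | HSP-190 | | | | |
| 2 | tag | drink | 6 | HSP-1477 | | | | |
| 2 | tag | buy(we,wine) | 3.75 | | | | | |
| 2 | tag | wine: excellent | 3.75 | | | | | |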

Notes:

  • There can be multiple rows per document: each entity is on a separate row. When an entity is detected as important and becomes a tag, the same entity appears on two rows.
  • The variety of entity types depends on the chosen domain. For all domains we distinguish entities such as person, time, number, organization and more. For specific domains we add other types, e.g., food and restaurant for voc-hospitality.
  • For some entities, we perform ontology expansion. In the example above, the text mentions wine, but the table contains multiple entities: wine, alcoholic drink, drink. The exact set depends on the domain and workflow.
  • Entity sentiment is calculated from the sentiment of the sentence.
  • In short documents, tags are similar to entities.
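
To make the first note concrete, the following sketch groups entity rows by document and entityUid and records the row types each entity appears under, so an entity that also became a tag shows up with both types. It assumes a CSV export in which empty cells are empty strings; the sample rows follow the hospitality example for "We bought some excellent wine.", and the types_by_entity helper is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of analysis-result-entities (subset of the example rows).
# Empty cells being exported as empty strings is an assumption.
ENTITIES_CSV = """id_article,type,text,score,entityUid,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel
2,food,drink,0,HSP-1477,0,0,neutral,neutral
2,food,alcoholic drink,0,HSP-190,0,0,neutral,neutral
2,food,wine,1.9,HSP-521,0.5,1,positive,positive
2,tag,wine,6,HSP-521,,,,
2,tag,alcoholic drink,6,HSP-190,,,,
2,tag,drink,6,HSP-1477,,,,
2,tag,"buy(we,wine)",3.75,,,,,
"""


def types_by_entity(csv_text):
    """Map (id_article, entityUid) to the set of row types that entity appears under."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["entityUid"]:  # skip rows without a UID, such as relation-like tags
            grouped[(row["id_article"], row["entityUid"])].add(row["type"])
    return dict(grouped)


for key, types in sorted(types_by_entity(ENTITIES_CSV).items()):
    print(key, sorted(types))
```

Here wine (HSP-521) is listed with both the food and tag types, which is the two-row case described in the notes.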

analysis-result-relations table

The analysis-result-relations table contains relations and attributes found in the text. For example, good in a good pizza or the pizza is good is an attribute of pizza, while eat in John ate a pizza is a relation between John and pizza.

The table has the following columns:

  • id – all id columns from the input table (used as primary keys)
  • type – ATTR for an attribute relation, VERB for a verb relation, EXTERNAL for knowledgebase relations
  • name – the standard form of the relation (e.g., expensive for type=ATTR, buy for type=VERB and parent for type=EXTERNAL)
  • negated – true for negated relations, false otherwise
  • subject – the subject of the relation or target of the attribute
  • object – the object of the relation, if any
  • subjectType – when the subject is an entity, its type (e.g., organization, food)
  • objectType – when the object is an entity, its type
  • subjectUid – ID of the subject entity, when the subject is an entity
  • objectUid – ID of the object entity, when the object is an entity
  • sentimentValue – detected sentiment of the relation (a decimal number between -1 and 1)
  • sentimentPolarity – detected sentiment of the relation (1, 0 or -1)
  • sentimentLabel – sentiment of the relation as a label (positive, neutral or negative)
  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

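For the sentence "We bought some excellent wine.", the table will contain the following information:

| id_article | type | name | negated | subject | object | subjectType | objectType | subjectUid | objectUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 123 | VERB | buy | false | we | wine | | food | | HSP-521 | 0.5 | 1 | positive | positive |
| 123 | ATTR | excellent | false | wine | | food | | HSP-521 | | 0.5 | 1 | positive | positive |
| 123 | EXTERNAL | parent | false | wine | drink | food | food | HSP-521 | HSP-1477 | 0 | 0 | neutral | neutral |
| 123 | EXTERNAL | parent | false | wine | alcoholic drink | food | food | HSP-521 | HSP-190 | 0 | 0 | neutral | neutral |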

There can be multiple relations per one document.
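
One possible way to present these rows is to render each relation as a short string, similar to the buy(we,wine) and wine: excellent tags in the entities example. The sketch below is only an illustration: the format_relation helper is hypothetical, the "not " prefix for negated relations is an invented convention, and the negated column is assumed to hold the strings true and false.

```python
# Sample rows follow the relations example for "We bought some excellent wine.";
# only the columns needed for formatting are included.
RELATIONS = [
    {"type": "VERB", "name": "buy", "negated": "false", "subject": "we", "object": "wine"},
    {"type": "ATTR", "name": "excellent", "negated": "false", "subject": "wine", "object": ""},
    {"type": "EXTERNAL", "name": "parent", "negated": "false", "subject": "wine", "object": "drink"},
]


def format_relation(row):
    """Render one relations-table row as a short human-readable string."""
    # Assumption: negation is stored as the string "true"; the "not " prefix is our own choice.
    prefix = "not " if row["negated"] == "true" else ""
    if row["type"] == "ATTR":
        # Attributes read as "<target>: <attribute>".
        return f'{row["subject"]}: {prefix}{row["name"]}'
    # Verb and knowledgebase relations read as "<name>(<subject>,<object>)".
    args = ",".join(arg for arg in (row["subject"], row["object"]) if arg)
    return f'{prefix}{row["name"]}({args})'


for relation in RELATIONS:
    print(format_relation(relation))
```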

analysis-result-sentences table

The analysis-result-sentences table contains information about individual sentences in the documents. These results are in beta.

  • id_article – all id columns from the input table (used as primary keys)
  • index – a zero-based index of the sentence in the document
  • segment – segment of the document: text, title or lead
  • text – text of the sentence
  • sentimentValue – detected sentiment of the sentence (a decimal number between -1 and 1)
  • sentimentPolarity – detected sentiment of the sentence (1, 0 or -1)
  • sentimentLabel – sentiment of the sentence as a label (positive, neutral or negative)
  • sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment.

For the sentence "We bought some excellent wine.", the table will contain the following information:

| id_article | index | segment | text | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|
| 123 | 1 | text | We bought some excellent wine. | 0.5 | 1 | positive | positive |
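
Because each row carries the document ID, sentence-level results can be aggregated per document. The sketch below averages sentimentValue over the sentences of each document. The mean_sentence_sentiment helper is hypothetical, the second sample row is invented purely for illustration, and the resulting average is not claimed to match the document-level sentimentValue that the app reports.

```python
from collections import defaultdict

SENTENCES = [
    # Row taken from the example above.
    {"id_article": "123", "index": "1", "segment": "text",
     "text": "We bought some excellent wine.", "sentimentValue": "0.5"},
    # Invented row, for illustration only; not produced by the app.
    {"id_article": "123", "index": "2", "segment": "text",
     "text": "The service was slow.", "sentimentValue": "-0.5"},
]


def mean_sentence_sentiment(rows):
    """Return {id_article: average sentimentValue over that document's sentences}."""
    totals = defaultdict(lambda: [0.0, 0])  # id_article -> [sum of values, sentence count]
    for row in rows:
        entry = totals[row["id_article"]]
        entry[0] += float(row["sentimentValue"])
        entry[1] += 1
    return {doc_id: total / count for doc_id, (total, count) in totals.items()}


print(mean_sentence_sentiment(SENTENCES))
```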