Keboola App

Our Keboola app makes it easy to use our General API in Keboola Connection, a cloud ETL.

The app can be used to analyze any text, but the standard models are optimized for three domains: news articles, hospitality customer care and transportation customer care. The quality of the results will not be as high if used outside of these domains. In order to ensure the best possible outcome for your domain, we will be happy to provide you with a customized model. We offer a basic customization for free. Contact us at info@geneea.com.

Output Tables

When you run the app, it creates the following tables:

  • analysis-result-documents – document-level results

  • analysis-result-entities – entity-level results

  • analysis-result-relations – contains relations and attributes found

  • analysis-result-sentences – contains information about individual sentences

analysis-result-documents table

The analysis-result-documents table contains document-level results in the following columns:

  • id – all id columns from the input table (used as primary keys)

  • language – detected language of the document, as ISO 639-1 language code

  • sentimentValue – detected sentiment of the document (a decimal number between -1 and 1)

  • sentimentPolarity – detected sentiment of the document (1, 0 or -1)

  • sentimentLabel – sentiment of the document as a label (positive, neutral, negative, or ambivalent)

  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

  • usedChars – the number of characters used by this document

For "We bought some excellent wine.", the table will contain the following information:

| id_article | language | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel | usedChars |
|---|---|---|---|---|---|---|
| 123 | en | 0.5 | 1 | positive | positive | 100 |
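As an illustration of how the document-level columns might be consumed downstream, the following minimal sketch parses a CSV export of the table and selects documents with negative polarity. The CSV layout, the second data row and the helper names `load_documents` and `negative_documents` are assumptions made for this example, not part of the app.

```python
import csv
import io

# Hypothetical CSV export of analysis-result-documents; the first row mirrors
# the example above, the second row is made up for illustration.
DOCUMENTS_CSV = """id_article,language,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel,usedChars
123,en,0.5,1,positive,positive,100
124,en,-0.7,-1,negative,negative,100
"""


def load_documents(csv_text):
    """Parse the CSV text into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["sentimentValue"] = float(row["sentimentValue"])
        row["sentimentPolarity"] = int(row["sentimentPolarity"])
        row["usedChars"] = int(row["usedChars"])
        rows.append(row)
    return rows


def negative_documents(rows):
    """Return the ids of documents whose polarity is -1."""
    return [row["id_article"] for row in rows if row["sentimentPolarity"] == -1]


documents = load_documents(DOCUMENTS_CSV)
print(negative_documents(documents))
```

The id values are kept as strings because the id columns come straight from the input table and need not be numeric.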

analysis-result-entities table

The analysis-result-entities table contains entity-level results in the following columns:

  • id – all id columns from the input table (used as primary keys)

  • type – type of the found entity, e.g. person, time, number, organization, tag (main topic of the document)

  • text – disambiguated and standardized form of the entity, e.g. John Smith, Keboola, safe carseat

  • score – expresses the importance of a tag in the text

  • entityUid – ID of the entity

  • sentimentValue – detected sentiment of the entity (a decimal number between -1 and 1)

  • sentimentPolarity – detected sentiment of the entity (1, 0 or -1)

  • sentimentLabel – sentiment of the entity as a label (positive, neutral or negative)

  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

For "We bought some excellent wine." and the hospitality domain, the table will contain the following information:

| id_article | type | text | score | entityUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|---|
| 2 | voc-topic | food & drink | 0 | HSP-6009 | | | | |
| 2 | voc-topic | food & drink > quality | 0 | HSP-6015 | | | | |
| 2 | food | drink | 0 | HSP-1477 | 0 | 0 | neutral | neutral |
| 2 | food | alcoholic drink | 0 | HSP-190 | 0 | 0 | neutral | neutral |
| 2 | food | wine | 1.9 | HSP-521 | 0.5 | 1 | positive | positive |
| 2 | tag | wine | 6 | HSP-521 | | | | |
| 2 | tag | alcoholic drink | 6 | HSP-190 | | | | |
| 2 | tag | drink | 6 | HSP-1477 | | | | |
| 2 | tag | buy(we,wine) | 3.75 | | | | | |
| 2 | tag | wine: excellent | 3.75 | | | | | |

Notes:

  • There can be multiple rows per document: each entity is on a separate row. When an entity is detected as important and becomes a tag, the same entity appears on two rows.

  • The variety of entity types depends on the chosen domain. For all domains we distinguish entities such as person, time, number, organization and more. For specific domains we add other types, e.g. food and restaurant for voc-hospitality.

  • For some entities, we perform ontology expansion. In the example above, the text mentions wine, but the table contains multiple entities: wine, alcoholic drink, drink. The exact set depends on the domain and the workflow.

  • Entity sentiment is calculated from the sentiment of the sentence.

  • In short documents, _tags_ are similar
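The note about an entity appearing on two rows can be checked by grouping rows on entityUid. The sketch below uses a few rows copied from the example table; the helper names `types_by_entity` and `entities_also_tagged` are hypothetical, and the tuple layout is an assumption chosen for brevity.

```python
from collections import defaultdict

# A few (type, text, entityUid) triples copied from the example table above.
ENTITY_ROWS = [
    ("food", "drink", "HSP-1477"),
    ("food", "alcoholic drink", "HSP-190"),
    ("food", "wine", "HSP-521"),
    ("tag", "wine", "HSP-521"),
    ("tag", "alcoholic drink", "HSP-190"),
    ("tag", "drink", "HSP-1477"),
    ("tag", "buy(we,wine)", ""),
]


def types_by_entity(rows):
    """Map each non-empty entityUid to the list of row types it appears under."""
    grouped = defaultdict(list)
    for row_type, _text, uid in rows:
        if uid:  # the relation-derived tag in the example has no entityUid
            grouped[uid].append(row_type)
    return dict(grouped)


def entities_also_tagged(rows):
    """Return the UIDs that occur both as a tag and under another type."""
    return sorted(
        uid
        for uid, types in types_by_entity(rows).items()
        if "tag" in types and any(t != "tag" for t in types)
    )


print(entities_also_tagged(ENTITY_ROWS))
```

Grouping on entityUid rather than on text keeps the expanded entities (wine, alcoholic drink, drink) distinct while still pairing each one with its tag row.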

analysis-result-relations table

The analysis-result-relations table contains relations and attributes found in the text. For example, good in a good pizza or the pizza is good is an attribute of pizza, while eat in John ate a pizza is a relation between John and pizza.

The table has the following columns:

  • id – all id columns from the input table (used as primary keys)

  • type – ATTR for an attribute relation, VERB for a verb relation, EXTERNAL for knowledgebase relations

  • name – the standard form of the relation (e.g. expensive for type=ATTR, buy for type=VERB and parent for type=EXTERNAL)

  • negated – true for negated relations, false otherwise

  • subject – the subject of the relation or target of the attribute

  • object – the object of the relation, if any

  • subjectType – when the subject is an entity, its type (e.g. organization, food)

  • objectType – when the object is an entity, its type

  • subjectUid – ID of the subject entity

  • objectUid – ID of the object entity

  • sentimentValue – detected sentiment of the relation (a decimal number between -1 and 1)

  • sentimentPolarity – detected sentiment of the relation (1, 0 or -1)

  • sentimentLabel – sentiment of the relation as a label (positive, neutral or negative)

  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

For "We bought some excellent wine.", the table will contain the following information:

| id_article | type | name | negated | subject | object | subjectType | objectType | subjectUid | objectUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 123 | VERB | buy | false | we | wine | | food | | HSP-521 | 0.5 | 1 | positive | positive |
| 123 | ATTR | excellent | false | wine | | food | | HSP-521 | | 0.5 | 1 | positive | positive |
| 123 | EXTERNAL | parent | false | wine | drink | food | food | HSP-521 | HSP-1477 | 0 | 0 | neutral | neutral |
| 123 | EXTERNAL | parent | false | wine | alcoholic drink | food | food | HSP-521 | HSP-190 | 0 | 0 | neutral | neutral |

There can be multiple relations per document.
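To make the relation columns easier to read, a row can be rendered as a short string. The sketch below is only an illustration: the output format is an assumption modeled on the tag texts in the entities example (buy(we,wine) and wine: excellent), the "not " prefix for negated relations is hypothetical, and negated is assumed to arrive as the string true or false.

```python
def format_relation(row):
    """Render one relation row as a short string.

    The format is an assumption modeled on the tag texts in the entities
    example ("buy(we,wine)", "wine: excellent").
    """
    name = row["name"]
    if row["negated"] == "true":
        name = "not " + name  # hypothetical marker for negated relations
    if row["type"] == "ATTR":
        # Attributes have a target (stored in subject) and no object.
        return f'{row["subject"]}: {name}'
    args = [row["subject"]]
    if row.get("object"):
        args.append(row["object"])
    return f'{name}({",".join(args)})'


# Rows reduced to the columns used here, copied from the example above.
relations = [
    {"type": "VERB", "name": "buy", "negated": "false", "subject": "we", "object": "wine"},
    {"type": "ATTR", "name": "excellent", "negated": "false", "subject": "wine", "object": ""},
    {"type": "EXTERNAL", "name": "parent", "negated": "false", "subject": "wine", "object": "drink"},
]

for relation in relations:
    print(format_relation(relation))
```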

analysis-result-sentences table

The analysis-result-sentences table contains information about individual sentences in the documents. These results are in beta.

  • id_article – all id columns from the input table (used as primary keys)

  • index – a zero-based index of the sentence in the document

  • segment – segment of the document: text, title or lead

  • text – text of the sentence

  • sentimentValue – detected sentiment of the sentence (a decimal number between -1 and 1)

  • sentimentPolarity – detected sentiment of the sentence (1, 0 or -1)

  • sentimentLabel – sentiment of the sentence as a label (positive, neutral or negative)

  • sentimentDetailedLabel – similar to sentimentLabel but adding very positive and very negative labels for extreme sentiment.

For "We bought some excellent wine.", the table will contain the following information:

| id_article | index | segment | text | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
|---|---|---|---|---|---|---|---|
| 123 | 1 | text | We bought some excellent wine. | 0.5 | 1 | positive | positive |
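Sentence-level rows can be aggregated per document, for example to compare the spread of sentence sentiment with the document-level value. The following sketch averages sentimentValue per id; the helper name `mean_sentence_sentiment` and the rows for document 124 are made up for illustration, and this average is not necessarily how the app derives document sentiment.

```python
from collections import defaultdict

# (id_article, index, sentimentValue); the first row mirrors the example
# above, the rows for document 124 are made up for illustration.
SENTENCES = [
    ("123", 1, 0.5),
    ("124", 0, -0.5),
    ("124", 1, 0.0),
]


def mean_sentence_sentiment(rows):
    """Return the mean sentimentValue of the sentences of each document."""
    totals = defaultdict(lambda: [0.0, 0])  # id -> [sum of values, count]
    for doc_id, _index, value in rows:
        totals[doc_id][0] += value
        totals[doc_id][1] += 1
    return {doc_id: total / count for doc_id, (total, count) in totals.items()}


print(mean_sentence_sentiment(SENTENCES))
```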