Keboola App

Our Keboola app makes it easy to use our General API in Keboola Connection, a cloud ETL platform.

The app can analyze any text, but the standard models are optimized for three domains: news articles, hospitality customer care and transportation customer care. Results for texts outside these domains will be of lower quality. To get the best possible results for your domain, we will be happy to provide a customized model; basic customization is free. Contact us at info@geneea.com.

Output Tables

When you run the app, it creates the following tables:

analysis-result-documents – document-level results
analysis-result-entities – entity-level results
analysis-result-relations – relations and attributes found in the text
analysis-result-sentences – information about individual sentences
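All four tables are keyed by the id columns of your input table. Every table except the documents table can hold several rows per document. If you post-process the results outside the app, the usual first step is to group rows by that id. Below is a minimal Python sketch that uses only the standard library. It assumes a table has been exported as CSV with a single id column named `id_article`, as in the examples on this page. `group_by_document` is a hypothetical helper, not part of the app. With several id columns you would key by a tuple instead.

```python
import csv
import io
from collections import defaultdict


def group_by_document(csv_text, id_column="id_article"):
    """Group the rows of one result table by document id.

    Assumes CSV with a header row and a single id column (hypothetical
    setup; `id_article` is the id column used in the examples).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[id_column]].append(row)
    return dict(groups)


# Two rows from the entities example plus one hypothetical row for document 3.
entities_csv = """id_article,type,text,score
2,food,wine,1.9
2,tag,wine,6
3,person,John Smith,1
"""

by_doc = group_by_document(entities_csv)
print(len(by_doc["2"]), len(by_doc["3"]))  # 2 1
```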

analysis-result-documents table

The analysis-result-documents table contains document-level results in the following columns:

id – all id columns from the input table (used as primary keys)

language – detected language of the document, as an ISO 639-1 language code

sentimentValue – detected sentiment of the document (a decimal number between -1 and 1)

sentimentPolarity – detected sentiment of the document (1, 0 or -1)

sentimentLabel – sentiment of the document as a label (positive, neutral, negative, or ambivalent)

sentimentDetailedLabel – like sentimentLabel, but adds the labels very positive and very negative for extreme sentiment

usedChars – the number of characters used by this document

For the text "We bought some excellent wine.", the table will contain the following information:

id_article	language	sentimentValue	sentimentPolarity	sentimentLabel	sentimentDetailedLabel	usedChars
123	en	0.5	1	positive	positive	100
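When a table is read from CSV, every value is a string. The Python sketch below uses only the standard library. It parses the example row above and converts the sentiment and usedChars columns to numbers. It then totals usedChars and counts documents per sentimentLabel. Exporting the table as CSV is an assumption about your setup, and `load_documents` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# The example row above, written as CSV (CSV export is an assumption).
documents_csv = """id_article,language,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel,usedChars
123,en,0.5,1,positive,positive,100
"""


def load_documents(csv_text):
    """Parse the documents table, converting numeric columns to numbers."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["sentimentValue"] = float(row["sentimentValue"])      # between -1 and 1
        row["sentimentPolarity"] = int(row["sentimentPolarity"])  # 1, 0 or -1
        row["usedChars"] = int(row["usedChars"])
        docs.append(row)
    return docs


docs = load_documents(documents_csv)
total_chars = sum(d["usedChars"] for d in docs)
labels = Counter(d["sentimentLabel"] for d in docs)
print(total_chars, labels)  # 100 Counter({'positive': 1})
```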

analysis-result-entities table

The analysis-result-entities table contains entity-level results in the following columns:

id – all id columns from the input table (used as primary keys)

type – type of the found entity, e.g. person, time, number, organization, tag (main topic of the document)

text – disambiguated and standardized form of the entity, e.g. John Smith, Keboola, safe carseat

score – expresses the importance of the entity or tag in the text

entityUid – ID of the entity

sentimentValue – detected sentiment of the entity (a decimal number between -1 and 1)

sentimentPolarity – detected sentiment of the entity (1, 0 or -1)

sentimentLabel – sentiment of the entity as a label (positive, neutral or negative)

sentimentDetailedLabel – like sentimentLabel, but adds the labels very positive and very negative for extreme sentiment

For the text "We bought some excellent wine." and the hospitality domain, the table will contain the following information:

id_article	type	text	score	entityUid	sentimentValue	sentimentPolarity	sentimentLabel	sentimentDetailedLabel
2	voc-topic	food & drink	0	HSP-6009
2	voc-topic	food & drink > quality	0	HSP-6015
2	food	drink	0	HSP-1477	0	0	neutral	neutral
2	food	alcoholic drink	0	HSP-190	0	0	neutral	neutral
2	food	wine	1.9	HSP-521	0.5	1	positive	positive
2	tag	wine	6	HSP-521
2	tag	alcoholic drink	6	HSP-190
2	tag	drink	6	HSP-1477
2	tag	buy(we,wine)	3.75
2	tag	wine: excellent	3.75
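Tags are rows with type tag, and their score expresses importance. Sorting a document's tags by score therefore gives its main topics. This is a minimal Python sketch that assumes the entities table is available as CSV. The rows come from the example above, using a subset of the columns. `top_tags` is a hypothetical helper.

```python
import csv
import io

# Rows from the example above (subset of columns); note the quoted field,
# because the tag text buy(we,wine) contains a comma.
entities_csv = """id_article,type,text,score,entityUid
2,voc-topic,food & drink,0,HSP-6009
2,food,wine,1.9,HSP-521
2,tag,wine,6,HSP-521
2,tag,alcoholic drink,6,HSP-190
2,tag,drink,6,HSP-1477
2,tag,"buy(we,wine)",3.75,
2,tag,wine: excellent,3.75,
"""


def top_tags(csv_text, doc_id, n=3):
    """Return the texts of the n highest-scoring tags of one document."""
    tags = [
        row for row in csv.DictReader(io.StringIO(csv_text))
        if row["id_article"] == doc_id and row["type"] == "tag"
    ]
    # Higher score means more important; the sort is stable, so ties keep table order.
    tags.sort(key=lambda row: float(row["score"]), reverse=True)
    return [row["text"] for row in tags[:n]]


print(top_tags(entities_csv, "2"))  # ['wine', 'alcoholic drink', 'drink']
```

Tags without an entityUid, such as buy(we,wine), are ranked like any other tag here.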

Notes:

A document can have multiple rows, one per entity. When an entity is detected as important and becomes a tag, it appears on two rows: once with its own type and once with the type tag.

The available entity types depend on the chosen domain. In all domains, we distinguish types such as person, time, number and organization, among others; specific domains add further types, e.g. food and restaurant for voc-hospitality.

For some entities, we perform ontology expansion. In the example above, the text mentions only wine, but the table contains several entities: wine, alcoholic drink and drink. The exact set depends on the domain and workflow.

Entity sentiment is calculated from the sentiment of the sentence.

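In short documents, tags are often similar to the detected entities; in the example above, the tags wine, alcoholic drink and drink mirror the food entities.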
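An entity that becomes a tag appears on two rows, so counting rows overstates the number of entities. One way to collapse such duplicates is to key rows by document id and entityUid, keeping the row with the domain-specific type. Using entityUid as the deduplication key is an assumption. This is a minimal Python sketch over a subset of the columns from the example, and `distinct_entities` is a hypothetical helper.

```python
import csv
import io

# Subset of the columns and rows from the entities example.
entities_csv = """id_article,type,text,score,entityUid
2,food,wine,1.9,HSP-521
2,food,alcoholic drink,0,HSP-190
2,tag,wine,6,HSP-521
2,tag,alcoholic drink,6,HSP-190
2,tag,wine: excellent,3.75,
"""


def distinct_entities(csv_text):
    """Collapse rows that share (document id, entityUid).

    A non-tag row is preferred over a tag row for the same entity, so the
    domain-specific type (e.g. food) is kept. Rows without an entityUid,
    such as some tags, are returned unchanged after the deduplicated rows.
    """
    chosen = {}
    without_uid = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        uid = row["entityUid"]
        if not uid:
            without_uid.append(row)
            continue
        key = (row["id_article"], uid)
        # Replace only a previously chosen tag row, never a non-tag row.
        if key not in chosen or chosen[key]["type"] == "tag":
            chosen[key] = row
    return list(chosen.values()) + without_uid


for row in distinct_entities(entities_csv):
    print(row["type"], row["text"])
# food wine
# food alcoholic drink
# tag wine: excellent
```

Entities added by ontology expansion (wine, alcoholic drink, drink) have different entityUid values, so this sketch keeps all of them.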

analysis-result-relations table

The analysis-result-relations table contains relations and attributes found in the text. For example, good in "a good pizza" or "the pizza is good" is an attribute of pizza, while eat in "John ate a pizza" is a relation between John and pizza.
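To make the distinction concrete, the sketch below writes the two pizza examples as relation rows. It uses the type, name, negated, subject and object columns described below and renders each row as a short string. The notation mimics the relation-like tags in the entities example, such as wine: excellent and buy(we,wine). The `not` prefix for negated relations is purely illustrative, and `describe` is a hypothetical helper.

```python
def describe(row):
    """Render one relation row (a dict of column values) as a short string."""
    # `negated` holds the text "true" or "false", as in the table.
    prefix = "not " if row["negated"] == "true" else ""
    if row["type"] == "ATTR":
        # Attribute: the subject column holds the target of the attribute.
        return f"{row['subject']}: {prefix}{row['name']}"
    # VERB and EXTERNAL relations; the object may be empty.
    args = ",".join(arg for arg in (row["subject"], row["object"]) if arg)
    return f"{prefix}{row['name']}({args})"


rows = [
    # "a good pizza" / "the pizza is good"
    {"type": "ATTR", "name": "good", "negated": "false", "subject": "pizza", "object": ""},
    # "John ate a pizza"
    {"type": "VERB", "name": "eat", "negated": "false", "subject": "John", "object": "pizza"},
]
for row in rows:
    print(describe(row))
# pizza: good
# eat(John,pizza)
```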

The table has the following columns:

id – all id columns from the input table (used as primary keys)

type – ATTR for an attribute relation, VERB for a verb relation, EXTERNAL for a knowledge-base relation

name – the standard form of the relation (e.g. expensive for type=ATTR, buy for type=VERB and parent for type=EXTERNAL)

negated – true for negated relations, false otherwise

subject – the subject of the relation or target of the attribute

object – the object of the relation, if any

subjectType – when the subject is an entity, its type (e.g. organization, food)

objectType – when the object is an entity, its type

subjectUid – ID of the subject entity

objectUid – ID of the object entity

sentimentValue – detected sentiment of the relation (a decimal number between -1 and 1)

sentimentPolarity – detected sentiment of the relation (1, 0 or -1)

sentimentLabel – sentiment of the relation as a label (positive, neutral or negative)

sentimentDetailedLabel – like sentimentLabel, but adds the labels very positive and very negative for extreme sentiment

For the text "We bought some excellent wine.", the table will contain the following information:

id_article	type	name	negated	subject	object	subjectType	objectType	subjectUid	objectUid	sentimentValue	sentimentPolarity	sentimentLabel	sentimentDetailedLabel
123	VERB	buy	false	we	wine		food		HSP-521	0.5	1	positive	positive
123	ATTR	excellent	false	wine		food		HSP-521		0.5	1	positive	positive
123	EXTERNAL	parent	false	wine	drink	food	food	HSP-521	HSP-1477	0	0	neutral	neutral
123	EXTERNAL	parent	false	wine	alcoholic drink	food	food	HSP-521	HSP-190	0	0	neutral	neutral

A document can have multiple relations, each on a separate row.
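EXTERNAL relations named parent link an entity to a more general one. In the example, wine links to drink and to alcoholic drink. These relations can be used to roll entities up a hierarchy. This is a minimal Python sketch that assumes the relations table is available as CSV. The rows reuse a subset of the columns from the example, and `parent_map` is a hypothetical helper. The sketch makes no assumption about whether the knowledge base lists only direct parents or all ancestors.

```python
import csv
import io
from collections import defaultdict

# Subset of the columns and rows from the relations example.
relations_csv = """id_article,type,name,subject,object,subjectUid,objectUid
123,VERB,buy,we,wine,,HSP-521
123,EXTERNAL,parent,wine,drink,HSP-521,HSP-1477
123,EXTERNAL,parent,wine,alcoholic drink,HSP-521,HSP-190
"""


def parent_map(csv_text):
    """Map each entity UID to the set of its parent UIDs.

    Only EXTERNAL relations named `parent` are used; all other rows are skipped.
    """
    parents = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] == "EXTERNAL" and row["name"] == "parent":
            parents[row["subjectUid"]].add(row["objectUid"])
    return dict(parents)


print(parent_map(relations_csv))
```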

analysis-result-sentences table

The analysis-result-sentences table contains information about individual sentences in the documents. These results are in beta.

id – all id columns from the input table (used as primary keys)

index – a zero-based index of the sentence in the document

segment – the part of the document the sentence belongs to: text, title or lead

text – text of the sentence

sentimentValue – detected sentiment of the sentence (a decimal number between -1 and 1)

sentimentPolarity – detected sentiment of the sentence (1, 0 or -1)

sentimentLabel – sentiment of the sentence as a label (positive, neutral or negative)

sentimentDetailedLabel – like sentimentLabel, but adds the labels very positive and very negative for extreme sentiment

For the text "We bought some excellent wine.", the table will contain the following information:

id_article	index	segment	text	sentimentValue	sentimentPolarity	sentimentLabel	sentimentDetailedLabel
123	0	text	We bought some excellent wine.	0.5	1	positive	positive
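Per-sentence sentiment makes it possible to point at the part of a document that drives its overall sentiment. The Python sketch below finds the lowest-sentiment sentence of each document. It uses the example sentence plus a hypothetical document 124 with two made-up sentences. The sketch assumes the table is available as CSV, and `most_negative_sentence` is a hypothetical helper.

```python
import csv
import io

# Document 123 is the example above; document 124 is hypothetical.
sentences_csv = """id_article,index,segment,text,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel
123,0,text,We bought some excellent wine.,0.5,1,positive,positive
124,0,title,Disappointing stay,-0.6,-1,negative,negative
124,1,text,The room was clean.,0.3,1,positive,positive
"""


def most_negative_sentence(csv_text):
    """Return {document id: text of its lowest-sentiment sentence}."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row["sentimentValue"])
        doc = row["id_article"]
        # Keep the sentence with the smallest sentimentValue seen so far.
        if doc not in best or value < best[doc][0]:
            best[doc] = (value, row["text"])
    return {doc: text for doc, (value, text) in best.items()}


print(most_negative_sentence(sentences_csv))
```

Sorting a document's rows by the numeric value of index restores the order of its sentences.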