Keboola App¶
Our Keboola app makes it easy to use our General API in Keboola Connection, a cloud ETL.
The app can analyze any text, but the standard models are optimized for three domains: news articles, hospitality customer care and transportation customer care. Results will be less accurate for text outside these domains. To get the best results for your own domain, we will be happy to provide you with a customized model. We offer a basic customization for free. Contact us at info@geneea.com.
Output Tables¶
When you run the app, it creates the following tables:
analysis-result-documents – document-level results
analysis-result-entities – entity-level results
analysis-result-relations – contains relations and attributes found
analysis-result-sentences – contains information about individual sentences
analysis-result-documents table¶
The analysis-result-documents table contains document-level results in the following columns:
id – all id columns from the input table (used as primary keys)
language – detected language of the document, as an ISO 639-1 language code
sentimentValue – detected sentiment of the document (a decimal number between -1 and 1)
sentimentPolarity – detected sentiment of the document (1, 0 or -1)
sentimentLabel – sentiment of the document as a label (positive, neutral, negative, or ambivalent)
sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
usedChars – the number of characters used by this document
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | language | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel | usedChars |
---|---|---|---|---|---|---|
123 | en | 0.5 | 1 | positive | positive | 100 |
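As a rough illustration of working with this table downstream, the sketch below parses a CSV export of analysis-result-documents and picks out documents by label. It assumes the table is exported as CSV with the column names listed above; the helper names `load_documents` and `ids_with_label` are made up for this example and are not part of the app.

```python
import csv
import io

# Inline stand-in for an exported analysis-result-documents table
# (assumption: a CSV export whose header matches the columns above).
DOCUMENTS_CSV = """id_article,language,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel,usedChars
123,en,0.5,1,positive,positive,100
"""


def load_documents(csv_text):
    """Parse the documents table, converting numeric columns from strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["sentimentValue"] = float(row["sentimentValue"])
        row["sentimentPolarity"] = int(row["sentimentPolarity"])
        row["usedChars"] = int(row["usedChars"])
        rows.append(row)
    return rows


def ids_with_label(rows, label, id_column="id_article"):
    """Return the ids of documents whose sentimentLabel equals `label`."""
    return [row[id_column] for row in rows if row["sentimentLabel"] == label]


documents = load_documents(DOCUMENTS_CSV)
positive_ids = ids_with_label(documents, "positive")
total_chars = sum(row["usedChars"] for row in documents)
print(positive_ids, total_chars)
```

The id column is passed as a parameter because the table copies whatever id columns the input table has; `id_article` is simply the name used in the example above.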
analysis-result-entities table¶
The analysis-result-entities table contains entity-level results in the following columns:
id – all id columns from the input table (used as primary keys)
type – type of the found entity, e.g. person, time, number, organization, tag (main topic of the document)
text – disambiguated and standardized form of the entity, e.g. John Smith, Keboola, safe carseat
score – expresses the importance of a tag in the text
entityUid – ID of the entity
sentimentValue – detected sentiment of the entity (a decimal number between -1 and 1)
sentimentPolarity – detected sentiment of the entity (1, 0 or -1)
sentimentLabel – sentiment of the entity as a label (positive, neutral or negative)
sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine." and the hospitality domain, the table will contain the following information:
id_article | type | text | score | entityUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|---|
2 | voc-topic | food & drink | 0 | HSP-6009 | | | | |
2 | voc-topic | food & drink > quality | 0 | HSP-6015 | | | | |
2 | food | drink | 0 | HSP-1477 | 0 | 0 | neutral | neutral |
2 | food | alcoholic drink | 0 | HSP-190 | 0 | 0 | neutral | neutral |
2 | food | wine | 1.9 | HSP-521 | 0.5 | 1 | positive | positive |
2 | tag | wine | 6 | HSP-521 | | | | |
2 | tag | alcoholic drink | 6 | HSP-190 | | | | |
2 | tag | drink | 6 | HSP-1477 | | | | |
2 | tag | buy(we,wine) | 3.75 | | | | | |
2 | tag | wine: excellent | 3.75 | | | | | |
Notes:
There can be multiple rows per document: each entity is on a separate row. When an entity is detected as important and becomes a tag, the same entity appears on two rows.
The variety of entity types depends on the chosen domain. For all domains we distinguish entities such as person, time, number, organization and more. For specific domains we add other types, e.g. food and restaurant for voc-hospitality.
For some entities, we perform ontology expansion. In the example above, the text mentions wine, but the table contains multiple entities: wine, alcoholic drink, drink. The exact set depends on the domain and workflow.
Entity sentiment is calculated from the sentiment of the sentence.
In short documents, _tags_ are similar
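The notes above mention that an important entity shows up twice, once under its own type and once as a tag. A minimal sketch of separating the two and matching them through entityUid could look like this. It assumes a CSV export in which empty cells are empty strings, and it uses only a few rows copied from the example table.

```python
import csv
import io

# A few rows copied from the example table above; empty cells are assumed
# to be exported as empty strings.
ENTITIES_CSV = """id_article,type,text,score,entityUid,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel
2,food,drink,0,HSP-1477,0,0,neutral,neutral
2,food,wine,1.9,HSP-521,0.5,1,positive,positive
2,tag,wine,6,HSP-521,,,,
2,tag,drink,6,HSP-1477,,,,
2,tag,"buy(we,wine)",3.75,,,,,
"""

rows = list(csv.DictReader(io.StringIO(ENTITIES_CSV)))

# Tags carry type "tag"; every other row is a typed entity or topic.
tags = [r for r in rows if r["type"] == "tag"]
entities = [r for r in rows if r["type"] != "tag"]

# Entities that were promoted to tags share an entityUid with a tag row.
tag_uids = {r["entityUid"] for r in tags if r["entityUid"]}
promoted = sorted(r["text"] for r in entities if r["entityUid"] in tag_uids)

# Tags ordered by score, highest first (ties keep their original order).
top_tags = sorted(tags, key=lambda r: float(r["score"]), reverse=True)

print(promoted)
print([t["text"] for t in top_tags])
```

Matching on entityUid rather than on text avoids confusing two different entities that happen to share a standardized form; tags without an entityUid, such as relation-like tags, are simply left unmatched.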
analysis-result-relations table¶
The analysis-result-relations table contains relations and attributes found in the text. For example, "good" in "a good pizza" or "the pizza is good" is an attribute of "pizza", while "eat" in "John ate a pizza" is a relation between "John" and "pizza".
The table has the following columns:
id – all id columns from the input table (used as primary keys)
type – ATTR for an attribute relation, VERB for a verb relation, EXTERNAL for knowledge-base relations
name – the standard form of the relation (e.g. expensive for type=ATTR, buy for type=VERB and parent for type=EXTERNAL)
negated – true for negated relations, false otherwise
subject – the subject of the relation or target of the attribute
object – the object of the relation, if any
subjectType – when the subject is an entity, its type (e.g. organization, food)
objectType – when the object is an entity, its type
subjectUid – when the subject is an entity, its ID
objectUid – when the object is an entity, its ID
sentimentValue – detected sentiment of the relation (a decimal number between -1 and 1)
sentimentPolarity – detected sentiment of the relation (1, 0 or -1)
sentimentLabel – sentiment of the relation as a label (positive, neutral or negative)
sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | type | name | negated | subject | object | subjectType | objectType | subjectUid | objectUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
123 | VERB | buy | false | we | wine | | food | | HSP-521 | 0.5 | 1 | positive | positive |
123 | ATTR | excellent | false | wine | | food | | HSP-521 | | 0.5 | 1 | positive | positive |
123 | EXTERNAL | parent | false | wine | drink | food | food | HSP-521 | HSP-1477 | 0 | 0 | neutral | neutral |
123 | EXTERNAL | parent | false | wine | alcoholic drink | food | food | HSP-521 | HSP-190 | 0 | 0 | neutral | neutral |
There can be multiple relations per document.
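To make the column semantics concrete, here is a small sketch that renders relation rows as compact strings. The output shape mirrors the tag texts shown in the entities example (buy(we,wine) and wine: excellent), but whether the app builds its tag text this way is an assumption. The function `format_relation`, the "not " prefix for negated rows, and the representation of negated as the strings "true" and "false" are illustrative choices only.

```python
def format_relation(rel):
    """Render one relations row as a compact string.

    ATTR rows become "subject: name"; all other rows become
    "name(subject,object)". The "not " prefix for negated rows is an
    illustrative choice, not something the app is known to produce.
    """
    name = ("not " if rel["negated"] == "true" else "") + rel["name"]
    if rel["type"] == "ATTR":
        return f'{rel["subject"]}: {name}'
    # Skip an empty object so that objectless relations still render cleanly.
    args = ",".join(a for a in (rel["subject"], rel["object"]) if a)
    return f"{name}({args})"


# Rows taken from the example table above, reduced to the columns used here.
relations = [
    {"type": "VERB", "name": "buy", "negated": "false",
     "subject": "we", "object": "wine"},
    {"type": "ATTR", "name": "excellent", "negated": "false",
     "subject": "wine", "object": ""},
    {"type": "EXTERNAL", "name": "parent", "negated": "false",
     "subject": "wine", "object": "drink"},
]

formatted = [format_relation(r) for r in relations]
print(formatted)
```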
analysis-result-sentences table¶
The analysis-result-sentences table contains information about individual sentences in the documents. These results are in beta.
id_article – all id columns from the input table (used as primary keys)
index – a zero-based index of the sentence in the document
segment – segment of the document: text, title or lead
text – text of the sentence
sentimentValue – detected sentiment of the sentence (a decimal number between -1 and 1)
sentimentPolarity – detected sentiment of the sentence (1, 0 or -1)
sentimentLabel – sentiment of the sentence as a label (positive, neutral or negative)
sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | index | segment | text | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|
123 | 1 | text | We bought some excellent wine. | 0.5 | 1 | positive | positive |
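One way to use the sentence-level rows is to group them by document and look for the sentence with the lowest sentimentValue. The sketch below is only an illustration under stated assumptions: the second row is hypothetical, invented so that the grouping has something to compare, and the helpers `group_by_document` and `most_negative` are not part of the app.

```python
from collections import defaultdict

sentences = [
    # Row from the example table above.
    {"id_article": "123", "index": 1, "segment": "text",
     "text": "We bought some excellent wine.", "sentimentValue": 0.5},
    # Hypothetical extra row, invented only to make the grouping visible.
    {"id_article": "123", "index": 2, "segment": "text",
     "text": "The delivery was late.", "sentimentValue": -0.4},
]


def group_by_document(rows, id_column="id_article"):
    """Group sentence rows by document id, ordered by sentence index."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_column]].append(row)
    for doc_rows in grouped.values():
        doc_rows.sort(key=lambda r: r["index"])
    return dict(grouped)


def most_negative(rows):
    """Return the sentence row with the lowest sentimentValue."""
    return min(rows, key=lambda r: r["sentimentValue"])


by_document = group_by_document(sentences)
worst = {doc_id: most_negative(rows)["text"]
         for doc_id, rows in by_document.items()}
print(worst)
```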