Keboola App
Our Keboola app makes it easy to use our General API in Keboola Connection, a cloud ETL.
The app can be used to analyze any text, but the standard models are
optimized for three domains: news articles, hospitality customer care
and transportation customer care. The quality of the results will not be
as high if used outside of these domains. In order to ensure the best
possible outcome for your domain, we will be happy to provide you with a
customized model. We offer a basic customization for free. Contact us at
info@geneea.com.
Output Tables
When you run the app, it creates the following tables:
- analysis-result-documents – document-level results
- analysis-result-entities – entity-level results
- analysis-result-relations – contains relations and attributes found
- analysis-result-sentences – contains information about individual sentences
analysis-result-documents table
The analysis-result-documents table contains document-level results in the following columns:
- id – all id columns from the input table (used as primary keys)
- language – detected language of the document, as an ISO 639-1 language code
- sentimentValue – detected sentiment of the document (a decimal number between -1 and 1)
- sentimentPolarity – detected sentiment of the document (1, 0 or -1)
- sentimentLabel – sentiment of the document as a label (positive, neutral, negative, or ambivalent)
- sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
- usedChars – the number of characters used by this document
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | language | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel | usedChars |
---|---|---|---|---|---|---|
123 | en | 0.5 | 1 | positive | positive | 100 |
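The columns above can be consumed with a few lines of code once the table is exported. The following is a minimal sketch, assuming the table is available as CSV text with the column names listed above; the export step itself, and the helper names `load_documents` and `count_by_label`, are illustrative and not part of the app.

```python
import csv
import io

# Sample data mirroring the example row above. In practice this text would
# come from a CSV export of analysis-result-documents (an assumption).
DOCUMENTS_CSV = """id_article,language,sentimentValue,sentimentPolarity,sentimentLabel,sentimentDetailedLabel,usedChars
123,en,0.5,1,positive,positive,100
"""


def load_documents(csv_text):
    """Parse the documents table and convert numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["sentimentValue"] = float(row["sentimentValue"])
        row["sentimentPolarity"] = int(row["sentimentPolarity"])
        row["usedChars"] = int(row["usedChars"])
        rows.append(row)
    return rows


def count_by_label(rows):
    """Count how many documents carry each sentimentLabel."""
    counts = {}
    for row in rows:
        label = row["sentimentLabel"]
        counts[label] = counts.get(label, 0) + 1
    return counts


documents = load_documents(DOCUMENTS_CSV)
label_counts = count_by_label(documents)
total_chars = sum(row["usedChars"] for row in documents)
print(label_counts, total_chars)
```

Summing usedChars this way gives a rough picture of how many characters a batch of documents consumed.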
analysis-result-entities table
The analysis-result-entities table contains entity-level results in the following columns:
- id – all id columns from the input table (used as primary keys)
- type – type of the found entity, e.g., person, time, number, organization, tag (main topic of the document)
- text – disambiguated and standardized form of the entity, e.g., John Smith, Keboola, safe carseat
- score – expresses the importance of a tag in the text
- entityUid – ID of the entity
- sentimentValue – detected sentiment of the entity (a decimal number between -1 and 1)
- sentimentPolarity – detected sentiment of the entity (1, 0 or -1)
- sentimentLabel – sentiment of the entity as a label (positive, neutral or negative)
- sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine." and the hospitality domain, the table will contain the following information:
id_article | type | text | score | entityUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|---|
2 | voc-topic | food & drink | 0 | HSP-6009 | | | | |
2 | voc-topic | food & drink > quality | 0 | HSP-6015 | | | | |
2 | food | drink | 0 | HSP-1477 | 0 | 0 | neutral | neutral |
2 | food | alcoholic drink | 0 | HSP-190 | 0 | 0 | neutral | neutral |
2 | food | wine | 1.9 | HSP-521 | 0.5 | 1 | positive | positive |
2 | tag | wine | 6 | HSP-521 | | | | |
2 | tag | alcoholic drink | 6 | HSP-190 | | | | |
2 | tag | drink | 6 | HSP-1477 | | | | |
2 | tag | buy(we,wine) | 3.75 | | | | | |
2 | tag | wine: excellent | 3.75 | | | | | |
Notes:
- There can be multiple rows per document; each entity is on a separate row. When an entity is detected as important and becomes a tag, the same entity appears on two rows.
- The variety of entity types depends on the chosen domain. For all domains we distinguish entities such as person, time, number, organization and more. For specific domains we add other types, e.g., food and restaurant for voc-hospitality.
- For some entities, we perform ontology expansion. In the example above, the text mentions wine, but the table contains multiple entities: wine, alcoholic drink, drink. The exact set depends on the domain and the workflow.
- Entity sentiment is calculated from the sentiment of the sentence.
- In short documents, tags are similar to entities.
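Because an entity that becomes a tag appears on two rows, downstream processing may want to pair those rows up. The sketch below does so by entityUid, using rows copied from the example table; the tuple layout and the helper name `group_by_uid` are illustrative assumptions, and rows without an entityUid are simply skipped.

```python
from collections import defaultdict

# Rows copied from the example above as (type, text, score, entityUid).
# An empty string stands for an empty entityUid cell.
ENTITY_ROWS = [
    ("food", "drink", 0.0, "HSP-1477"),
    ("food", "alcoholic drink", 0.0, "HSP-190"),
    ("food", "wine", 1.9, "HSP-521"),
    ("tag", "wine", 6.0, "HSP-521"),
    ("tag", "alcoholic drink", 6.0, "HSP-190"),
    ("tag", "drink", 6.0, "HSP-1477"),
    ("tag", "buy(we,wine)", 3.75, ""),
    ("tag", "wine: excellent", 3.75, ""),
]


def group_by_uid(rows):
    """Map each entityUid to the set of types it was reported under."""
    types_by_uid = defaultdict(set)
    for entity_type, _text, _score, uid in rows:
        if uid:  # rows without an entityUid cannot be matched to an entity
            types_by_uid[uid].add(entity_type)
    return dict(types_by_uid)


types_by_uid = group_by_uid(ENTITY_ROWS)

# Entities reported both under their own type and as a tag.
also_tags = sorted(
    uid for uid, types in types_by_uid.items()
    if "tag" in types and len(types) > 1
)
print(also_tags)
```

Here all three entities produced by ontology expansion (wine, alcoholic drink, drink) are also reported as tags.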
analysis-result-relations table
The analysis-result-relations table contains relations and attributes found in the text.
For example, "good" in "a good pizza" or "the pizza is good" is an attribute of "pizza", while
"eat" in "John ate a pizza" is a relation between "John" and "pizza".
The table has the following columns:
- id – all id columns from the input table (used as primary keys)
- type – ATTR for an attribute relation, VERB for a verb relation, EXTERNAL for knowledge base relations
- name – the standard form of the relation (e.g., expensive for type=ATTR, buy for type=VERB and parent for type=EXTERNAL)
- negated – true for negated relations, false otherwise
- subject – the subject of the relation or target of the attribute
- object – the object of the relation, if any
- subjectType – when the subject is an entity, its type (e.g., organization, food)
- objectType – when the object is an entity, its type
- subjectUid – when the subject is an entity, its ID
- objectUid – when the object is an entity, its ID
- sentimentValue – detected sentiment of the relation (a decimal number between -1 and 1)
- sentimentPolarity – detected sentiment of the relation (1, 0 or -1)
- sentimentLabel – sentiment of the relation as a label (positive, neutral or negative)
- sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | type | name | negated | subject | object | subjectType | objectType | subjectUid | objectUid | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
123 | VERB | buy | false | we | wine | | food | | HSP-521 | 0.5 | 1 | positive | positive |
123 | ATTR | excellent | false | wine | | food | | HSP-521 | | 0.5 | 1 | positive | positive |
123 | EXTERNAL | parent | false | wine | drink | food | food | HSP-521 | HSP-1477 | 0 | 0 | neutral | neutral |
123 | EXTERNAL | parent | false | wine | alcoholic drink | food | food | HSP-521 | HSP-190 | 0 | 0 | neutral | neutral |
There can be multiple relations per document.
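For reporting, a relation row can be condensed into a short readable string. The sketch below renders rows in the same shape as the buy(we,wine) and wine: excellent tags seen in the entities example. The function name `format_relation`, the "not " prefix for negated relations, and the use of the same format for EXTERNAL relations are assumptions made for illustration, not documented behaviour of the app.

```python
# Rows taken from the example table above, reduced to the columns used here.
RELATION_ROWS = [
    {"type": "VERB", "name": "buy", "negated": "false",
     "subject": "we", "object": "wine"},
    {"type": "ATTR", "name": "excellent", "negated": "false",
     "subject": "wine", "object": ""},
    {"type": "EXTERNAL", "name": "parent", "negated": "false",
     "subject": "wine", "object": "drink"},
]


def format_relation(relation):
    """Render a relation row as a compact string.

    ATTR rows become "subject: name"; all other rows become
    "name(subject,object)", with the object omitted when it is empty.
    """
    name = relation["name"]
    if relation["negated"] == "true":
        name = "not " + name  # assumed way of marking negation
    if relation["type"] == "ATTR":
        return f'{relation["subject"]}: {name}'
    arguments = [relation["subject"]]
    if relation["object"]:
        arguments.append(relation["object"])
    return f'{name}({",".join(arguments)})'


formatted = [format_relation(row) for row in RELATION_ROWS]
print(formatted)
```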
analysis-result-sentences table
The analysis-result-sentences table contains information about individual sentences in the documents.
These results are in beta.
- id_article – all id columns from the input table (used as primary keys)
- index – a zero-based index of the sentence in the document
- segment – segment of the document: text, title or lead
- text – text of the sentence
- sentimentValue – detected sentiment of the sentence (a decimal number between -1 and 1)
- sentimentPolarity – detected sentiment of the sentence (1, 0 or -1)
- sentimentLabel – sentiment of the sentence as a label (positive, neutral or negative)
- sentimentDetailedLabel – similar to sentimentLabel, but adding very positive and very negative labels for extreme sentiment
For the sentence "We bought some excellent wine.", the table will contain the following information:
id_article | index | segment | text | sentimentValue | sentimentPolarity | sentimentLabel | sentimentDetailedLabel |
---|---|---|---|---|---|---|---|
123 | 1 | text | We bought some excellent wine. | 0.5 | 1 | positive | positive |
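Sentence-level rows are useful for finding the part of a document that drives its sentiment. The sketch below groups rows by id_article, orders them by index, and picks the sentence with the lowest sentimentValue. The second sample row is made up purely for illustration, and the helper names are assumptions.

```python
from collections import defaultdict

SENTENCE_ROWS = [
    # Made-up row, added only so that there is something to compare against.
    {"id_article": "123", "index": 2, "segment": "text",
     "text": "The delivery was late.", "sentimentValue": -0.4},
    # Row from the example table above.
    {"id_article": "123", "index": 1, "segment": "text",
     "text": "We bought some excellent wine.", "sentimentValue": 0.5},
]


def sentences_by_document(rows):
    """Group sentence rows by document and order them by sentence index."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id_article"]].append(row)
    for sentences in grouped.values():
        sentences.sort(key=lambda row: row["index"])
    return dict(grouped)


def most_negative_sentence(sentences):
    """Return the sentence row with the lowest sentimentValue."""
    return min(sentences, key=lambda row: row["sentimentValue"])


by_document = sentences_by_document(SENTENCE_ROWS)
worst = {
    doc_id: most_negative_sentence(sentences)["text"]
    for doc_id, sentences in by_document.items()
}
print(worst)
```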