geneeanlpclient.g3 package

SDK encapsulating the Geneea General API (G3); see the G3 API documentation. For an example of how to call the General API from Python 3.6+, see the G3 API documentation.

The SDK has the following main parts:

  • Client – a simple REST client

  • Request – an object encapsulating the request the Client sends to the G3 API; typically it is built via Request.Builder

  • G3 – object encapsulating the response of the API

  • Readers/Writers – objects for reading/writing from/to JSON
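A minimal sketch of how these parts might fit together, using only names documented on this page (Client.create, Client.analyze, Request.Builder, AnalysisType, G3.entities, Entity.stdForm, Entity.type). The helper name analyze_text, the document id and the choice of analyses are illustrative assumptions. Calling the function requires the geneeanlpclient package, network access and a valid user key.

```python
from typing import List, Optional, Tuple


def analyze_text(text: str, user_key: Optional[str] = None) -> List[Tuple[str, str]]:
    """Send one document to the G3 API and return (stdForm, type) pairs of its entities."""
    # Imported lazily so this sketch can be loaded even when the SDK is not installed.
    from geneeanlpclient.g3.client import Client
    from geneeanlpclient.g3.request import AnalysisType, Request

    # The builder holds the settings shared by all requests.
    builder = Request.Builder(analyses=[AnalysisType.ENTITIES])
    # build() adds the per-document parts (id, text).
    request = builder.build(id="doc-1", text=text)

    # With userKey=None the key is documented to come from GENEEA_API_KEY.
    client = Client.create(userKey=user_key)
    analysis = client.analyze(request)
    return [(entity.stdForm, entity.type) for entity in analysis.entities]
```

The builder is kept separate from the per-document call so that the same settings can be reused for many documents.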

The client

This module implements functions that encapsulate Geneea General NLP REST API V3 calls.

class geneeanlpclient.g3.client.Client(*, url: str, userKey: str)[source]

Bases: object

DEFAULT_URL = 'https://api.geneea.com/v3/analysis'

The default address of the Geneea NLP G3 API.

analyze(req: geneeanlpclient.g3.request.Request, *, connectTimeout: float = 3.05, readTimeout: float = 600) → geneeanlpclient.g3.model.G3[source]

Call Geneea G3 API.

Return type

G3

Parameters
Returns

analysis as a G3 object

static create(*, url: str = 'https://api.geneea.com/v3/analysis', userKey: str = None) → geneeanlpclient.g3.client.Client[source]

Create a Geneea G3 client.

Parameters
  • url (str) – Interpretor API URL to call

  • userKey (str) – API user key; if not specified, it is loaded from the GENEEA_API_KEY environment variable

Returns

G3 client
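The key lookup described for userKey can be sketched as follows. This is an illustration and not the SDK's code: the function name resolve_user_key is hypothetical, and raising ValueError when no key is available is an assumption, since this page does not say what the SDK does in that case.

```python
import os
from typing import Mapping, Optional


def resolve_user_key(user_key: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, otherwise the value of GENEEA_API_KEY."""
    if user_key:
        return user_key
    key = env.get("GENEEA_API_KEY")
    if not key:
        # Assumption: the behavior for a missing key is not documented here.
        raise ValueError("No user key given and GENEEA_API_KEY is not set")
    return key
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.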

The Request Objects

class geneeanlpclient.g3.request.AnalysisType[source]

Bases: enum.Enum

The linguistic analyses the G3 API can perform; more detail

ALL = 1

Perform all analyses at once

ENTITIES = 2

Recognize and standardize entities in text; more detail

LANGUAGE = 6

Detect the language the text is written in; more detail

RELATIONS = 4

Relations between entities and their attributes; more detail

SENTIMENT = 5

Detect the emotions of the author contained in the text; more detail

TAGS = 3

Assign semantic tags to a document; more detail

parse = <function AnalysisType.parse>[source]
class geneeanlpclient.g3.request.Diacritization[source]

Bases: enum.Enum

Supported diacritization modes.

AUTO = 'auto'

Diacritics are added if needed.

NONE = 'none'

No diacritization is performed.

REDO = 'redo'

Diacritics are first removed and then added if needed.

YES = 'yes'

Diacritics are added to words without them if needed.

class geneeanlpclient.g3.request.Domain[source]

Bases: enum.Enum

Typically used domains. For more info see.

MEDIA = 'media'

General media articles.

NEWS = 'news'

Media articles covering news.

SPORT = 'sport'

Media articles covering sport news.

TABLOID = 'tabloid'

Tabloid articles.

TECH = 'tech'

Media articles covering technology and science.

VOC = 'voc'

General voice-of-the-customer documents (e.g. reviews).

VOC_BANKING = 'voc-banking'

Voice-of-the-customer documents covering banking (e.g. reviews of banks).

VOC_HOSPITALITY = 'voc-hospitality'

Voice-of-the-customer documents covering restaurants (e.g. reviews of restaurants).

class geneeanlpclient.g3.request.LanguageCode[source]

Bases: enum.Enum

Typically used ISO 639-1 language codes.

CS = 'cs'
DE = 'de'
EN = 'en'
ES = 'es'
PL = 'pl'
SK = 'sk'
class geneeanlpclient.g3.request.ParaSpec(type, text)[source]

Bases: tuple

static abstract(text: str) → geneeanlpclient.g3.request.ParaSpec[source]

Paragraph representing an abstract (lead or perex) of the whole document.

static body(text: str) → geneeanlpclient.g3.request.ParaSpec[source]

Paragraph containing regular text (for now this is used for the whole body of the document).

property text

Text of the paragraph

static title(text: str) → geneeanlpclient.g3.request.ParaSpec[source]

Paragraph representing a title of the whole document. Also used for email subjects.

property type

Type of the paragraph, typically one of Paragraph.TYPE_TITLE, Paragraph.TYPE_ABSTRACT, Paragraph.TYPE_BODY; possibly Paragraph.TYPE_SECTION_HEADING

class geneeanlpclient.g3.request.Request(id, title, text, paraSpecs, analyses, language, langDetectPrior, domain, textType, referenceDate, diacritization, returnMentions, returnItemSentiment, metadata, custom)[source]

Bases: tuple

class Builder(*, analyses: Iterable[geneeanlpclient.g3.request.AnalysisType] = None, language: Union[geneeanlpclient.g3.request.LanguageCode, str] = None, langDetectPrior: str = None, domain: Union[geneeanlpclient.g3.request.Domain, str] = None, textType: Union[geneeanlpclient.g3.request.TextType, str] = None, referenceDate: Union[datetime.date, str] = None, diacritization: Union[geneeanlpclient.g3.request.Diacritization, str] = None, returnMentions: bool = False, returnItemSentiment: bool = False, metadata: Mapping[str, str] = None, customConfig: Dict[str, Any] = None)[source]

Bases: object

build(*, id: Union[str, int] = None, title: str = None, text: str = None, paraSpecs: List[geneeanlpclient.g3.request.ParaSpec] = None, language: Union[geneeanlpclient.g3.request.LanguageCode, str] = None, referenceDate: Union[datetime.date, str] = None, metadata: Mapping[str, str] = None, customConfig: Dict[str, Any] = None) → geneeanlpclient.g3.request.Request[source]

Creates a new request object to be passed to the G3 client.

Parameters
  • id – Unique identifier of the document

  • title (str) – The title or subject of the document, when available; mutually exclusive with the paraSpecs parameter

  • text (str) – The main text of the document; mutually exclusive with the paraSpecs parameter

  • paraSpecs (List[ParaSpec]) – The document paragraphs; mutually exclusive with title and text parameters.

  • language – The language of the document as ISO 639-1; auto-detection will be used if None.

  • referenceDate – Date to be used for the analysis as a reference; values: NOW or in format YYYY-MM-DD

  • metadata (Mapping) – extra non-NLP type of information related to the document, key-value pairs

  • customConfig (Dict) – Any custom options passed to the G3 API endpoint

Returns

Request object to be passed to the G3 client.

Return type

Request
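The mutual exclusion between title/text and paraSpecs can be illustrated with a small stand-in. This is a sketch under assumptions: ParaSpecSketch and normalize_paragraphs are hypothetical names, and this page does not state whether the SDK converts title and text into paragraph specifications or which error it raises on a conflict. The type values 'TITLE' and 'BODY' are taken from Paragraph.TYPE_TITLE and Paragraph.TYPE_BODY.

```python
from typing import List, NamedTuple, Optional, Sequence


class ParaSpecSketch(NamedTuple):
    """Stand-in for ParaSpec(type, text)."""
    type: str
    text: str


def normalize_paragraphs(title: Optional[str] = None,
                         text: Optional[str] = None,
                         para_specs: Optional[Sequence[ParaSpecSketch]] = None
                         ) -> List[ParaSpecSketch]:
    """Return paragraph specs, rejecting paraSpecs combined with title or text."""
    if para_specs is not None:
        if title is not None or text is not None:
            # Assumption: the exception type is chosen for this sketch only.
            raise ValueError("paraSpecs is mutually exclusive with title and text")
        return list(para_specs)
    result = []
    if title is not None:
        result.append(ParaSpecSketch("TITLE", title))
    if text is not None:
        result.append(ParaSpecSketch("BODY", text))
    return result
```

In this sketch a document given as title plus text ends up in the same shape as one given as explicit paragraphs, which is one way to read the two alternatives.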

setCustomConfig(**customConfig) → geneeanlpclient.g3.request.Request.Builder[source]

Add custom options to the request builder. Existing custom options are overwritten.

Parameters

customConfig – Any custom options passed to the G3 API endpoint

Returns

The builder for fluent style chaining.

STD_KEYS = frozenset({'analyses', 'diacritization', 'domain', 'htmlExtractor', 'id', 'langDetectPrior', 'language', 'metadata', 'paraSpecs', 'referenceDate', 'returnItemSentiment', 'returnMentions', 'text', 'textType', 'title'})

Standard keys used by the G3 request.

property analyses

What analyses to return

property custom

Custom options passed to the G3 API endpoint (alias for field number 14)

property diacritization

Determines whether to perform text diacritization

property domain

The source domain from which the document originates. See the available domains.

static fromDict(raw: Dict[str, Any]) → geneeanlpclient.g3.request.Request[source]

Reads a request object from a json-like dictionary.

property id

Unique identifier of the document

property langDetectPrior

The language detection prior; e.g. ‘de,en’.

property language

The language of the document as ISO 639-1; auto-detection will be used if omitted.

property metadata

Extra non-NLP type of information related to the document, key-value pairs

property paraSpecs

The document paragraphs; mutually exclusive with title and text parameters.

property referenceDate

Date to be used for the analysis as a reference; values: “NOW” or in format YYYY-MM-DD.

property returnItemSentiment

Should entity/mention/tag/relation etc. sentiment be returned? No sentiment is returned if None

property returnMentions

Should entity/tag/relation mentions be returned? No mentions are returned if None.

property text

The main text of the document; mutually exclusive with the paraSpecs parameter

property textType

The type or genre of text; not supported in public workflows/domains yet.

property title

The title or subject of the document, when available; mutually exclusive with the paraSpecs parameter

toDict() → Dict[str, Any][source]

Converts the request object to a json-like dictionary.

class geneeanlpclient.g3.request.TextType[source]

Bases: enum.Enum

Typically used text types.

CASUAL = 'casual'

Text that ignores many formal grammatical, orthographical and typographical conventions, e.g. social media posts.

CLEAN = 'clean'

Text that is mostly grammatically, orthographically and typographically correct, e.g. news articles.

The Response objects

Objects encapsulating the result of full analysis.

Basic objects:

  • G3 - analysis of a single document

  • Paragraph, Sentence, CharSpan

  • Entity

  • Tag

  • Relation

Objects related to tokens and tecto tokens:

  • Token - surface token (basic unit of morphology and surface syntax)

  • TectoToken - tectogrammatical token (basic unit of deep syntax)

  • NodeUtils - general utility classes for manipulating lists of tokens and tectotokens

  • Tree - class encapsulating ordered rooted trees of tokens or tecto tokens

  • TreeBuilder - builder for syntactic and tecto trees (tokens and tecto tokens should not be constructed directly)

  • TokenSupport - list of tokens within a sentence (used for Entity.Mention, Tag.Mention, Relation.Support TectoToken, etc)

class geneeanlpclient.g3.model.CharSpan[source]

Bases: tuple

Continuous non-empty span of text, relative to some larger text

property end

Zero-based index of the character immediately following this span

extractText(fullText: str) → str[source]

Substring of a full text as denoted by this span

isValid() → bool[source]

Returns true if the span is valid, i.e. the start index precedes the end index.

static of(start: int, end: int) → geneeanlpclient.g3.model.CharSpan[source]

Creates a CharSpan object from start and end indexes.

Parameters
  • start (int) – the first character of this span as a zero-based offset within the full text

  • end (int) – the character immediately following this span. The span cannot be empty.

property start

The first character of this span as a zero-based offset within the full text

static withLen(start: int, length: int) → geneeanlpclient.g3.model.CharSpan[source]

Creates a CharSpan object from start index and text length.

Parameters
  • start (int) – the first character of this span as a zero-based offset within the full text

  • length (int) – the length of this span
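The span semantics described above (inclusive start, exclusive end, both zero-based) can be sketched with a stand-in class. CharSpanSketch and its snake_case methods are hypothetical and only mirror the documented behavior of of, withLen, extractText and isValid; they are not the SDK's implementation.

```python
from typing import NamedTuple


class CharSpanSketch(NamedTuple):
    """Stand-in for CharSpan: start is inclusive, end is exclusive."""
    start: int
    end: int

    @staticmethod
    def with_len(start: int, length: int) -> "CharSpanSketch":
        # The end index is the character immediately following the span.
        return CharSpanSketch(start, start + length)

    def is_valid(self) -> bool:
        # Valid means non-empty: the start index precedes the end index.
        return self.start < self.end

    def extract_text(self, full_text: str) -> str:
        # An exclusive end maps directly onto Python slicing.
        return full_text[self.start:self.end]


span = CharSpanSketch.with_len(6, 5)
word = span.extract_text("hello world")
```

Here span covers offsets 6 to 10 and word is the substring those offsets denote.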

class geneeanlpclient.g3.model.Entity(*, id: str, gkbId: str = None, stdForm: str, entityType: str, feats: Mapping[str, str] = None, mentions: List[geneeanlpclient.g3.model.Entity.Mention], sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

A class encapsulating an Entity.

class Mention(*, id: str, mwl: str, text: str, tokenSupport: geneeanlpclient.g3.model.TokenSupport, feats: Mapping[str, str] = None, derivedFrom: Optional[geneeanlpclient.g3.model.Entity] = None, sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

derivedFrom = None

Entity from which this mention can be derived (e.g. mention salmon for entity fish), if applicable

feats = None

Custom features/properties.

id = None

ID of the mention used to refer to it from other objects

property isContinuous

Checks whether the entity mention is continuous (most are).

property isDerived

True iff this entity mention is derived from some other entity (e.g. mention salmon for entity fish).

mentionOf = None

Entity this mention belongs to

mwl = None

Lemma of this mention (potentially multiword lemma), i.e. base form of the entity expression.

property sentence

Sentence containing this entity mention. Entity mention belongs to maximally one sentence; artificial mentions without tokens belong to no sentence.

sentiment = None

Sentiment of this mention. Note: Not supported yet.

text = None

The form of this entity mention, as it occurs in the text.

tokenSupport = None

Tokens of this entity mention.

vectors = None

Optional vectors for this mention.

feats = None

Custom features/properties.

gkbId = None

Unique identifier of this entity in Geneea knowledge-base

id = None

ID of the entity used to refer to it from other objects

mentions = None

Actual occurrences of this entity in the text. Empty if not requested/supported.

sentiment = None

Sentiment of this entity. None if not requested.

stdForm = None

Standard form of the entity, abstracting from alternative names

type = None

Basic type of this entity (e.g. person, location, …)

vectors = None

Optional vectors for this entity.

class geneeanlpclient.g3.model.G3(*, docId: str = None, language: geneeanlpclient.g3.model.Language, paragraphs: List[geneeanlpclient.g3.model.Paragraph], entities: List[geneeanlpclient.g3.model.Entity], tags: List[geneeanlpclient.g3.model.Tag], relations: List[geneeanlpclient.g3.model.Relation], docSentiment: Optional[geneeanlpclient.g3.model.Sentiment], docVectors: List[geneeanlpclient.g3.model.Vector] = None, usedChars: int = None, metadata: Mapping[str, Any] = None, debugInfo: Any = None)[source]

Bases: object

body() → Optional[geneeanlpclient.g3.model.Paragraph][source]

Returns the body paragraph if present, None if not, and throws a ValueError if there are multiple body paragraphs.

debugInfo = None

Debugging information, if any

docId = None

Document id

docSentiment = None

Sentiment of the document.

docVectors = None

Optional vectors for the whole document.

entities = None

The entities in the document.

getParaByType(paraType: str) → Optional[geneeanlpclient.g3.model.Paragraph][source]

Returns a paragraph with the specified type. Throws a ValueError if there is more than one, and returns None if there are none. This is intended for legacy paragraphs corresponding to title/lead/text segments.

Returns

a paragraph with the specified type.
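The three documented outcomes (one match, no match, several matches) can be sketched as below. ParaSketch and get_para_by_type are hypothetical stand-ins written for illustration; only the None and ValueError behavior is taken from the description above.

```python
from typing import NamedTuple, Optional, Sequence


class ParaSketch(NamedTuple):
    """Stand-in for a paragraph with a type and a text."""
    type: str
    text: str


def get_para_by_type(paragraphs: Sequence[ParaSketch],
                     para_type: str) -> Optional[ParaSketch]:
    """Return the single paragraph of the given type, or None if there is none."""
    matching = [p for p in paragraphs if p.type == para_type]
    if len(matching) > 1:
        raise ValueError("Multiple paragraphs of type " + para_type)
    return matching[0] if matching else None
```

The title(), lead() and body() accessors are described with the same three outcomes, so they could be thought of as this lookup with a fixed paragraph type.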

language = None

Language of the document and analysis.

lead() → Optional[geneeanlpclient.g3.model.Paragraph][source]

Returns the lead paragraph if present, None if not, and throws a ValueError if there are multiple lead paragraphs.

metadata = None

The extra non-NLP type of information related to analysis.

paragraphs = None

The paragraphs within the document. For F2, these are segments.

relations = None

The relations in the document.

property sentences

Sentences across all paragraphs.

tags = None

The tags of the document.

property tectoTokens

Tecto tokens across all paragraphs.

title() → Optional[geneeanlpclient.g3.model.Paragraph][source]

Returns the title paragraph if present, None if not, and throws a ValueError if there are multiple title paragraphs.

property tokens

Tokens across all paragraphs.

usedChars = None

Characters billed for the analysis.

class geneeanlpclient.g3.model.Language[source]

Bases: tuple

Language of the document.

property detected

Language of the document as detected

class geneeanlpclient.g3.model.NodeUtils[source]

Bases: object

static coverage(node: Node, reflexive=True, ordered=True) → List[Node][source]

All nodes dominated by a node.

Return type

List

Parameters
  • node – node to get coverage of

  • reflexive (bool) – whether the node itself is included

  • ordered (bool) – whether the result should be ordered by word order

Returns

coverage of a node

static filteredInOrder(node: Node, skipPredicate: Callable[Node, bool], includeFilteredRoot: bool = True) → Iterable[Node][source]

In-order iterator over the subtree of this token which optionally skips some subtrees.

Return type

Iterable

Parameters
  • node – root of the tree to traverse

  • skipPredicate (Callable) – when this predicate is true on any token, the token’s subtree is not traversed

  • includeFilteredRoot (bool) – if true the tokens on which skipPredicate function returns true are included in the result; otherwise they are not

static filteredPreOrder(node: Node, skipPredicate: Callable[Node, bool], includeFilteredRoot: bool = True) → Iterable[Node][source]

Pre-order iterator over the subtree of a node which optionally skips some subtrees.

Return type

Iterable

Parameters
  • node – root of the tree to traverse

  • skipPredicate (Callable) – when this predicate is true on any node, the node’s subtree is not traversed

  • includeFilteredRoot (bool) – if true, the nodes on which skipPredicate function returns true are included in the result; otherwise they are not
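A minimal sketch of a pre-order traversal with a skip predicate, following the parameter descriptions above. SketchNode and filtered_pre_order are hypothetical stand-ins, not the SDK's classes; the only property assumed of a node is a children list in word order.

```python
from typing import Callable, Iterator, List, Optional


class SketchNode:
    """Stand-in tree node with a label and ordered children."""

    def __init__(self, label: str, children: Optional[List["SketchNode"]] = None):
        self.label = label
        self.children = list(children or [])


def filtered_pre_order(node: SketchNode,
                       skip_predicate: Callable[[SketchNode], bool],
                       include_filtered_root: bool = True) -> Iterator[SketchNode]:
    """Yield nodes in pre-order, not descending below nodes matching the predicate."""
    if skip_predicate(node):
        # The matching node itself is reported only when requested.
        if include_filtered_root:
            yield node
        return
    yield node
    for child in node.children:
        yield from filtered_pre_order(child, skip_predicate, include_filtered_root)


tree = SketchNode("a", [SketchNode("b", [SketchNode("c"), SketchNode("d")]), SketchNode("e")])
with_root = [n.label for n in filtered_pre_order(tree, lambda n: n.label == "b")]
without_root = [n.label for n in filtered_pre_order(tree, lambda n: n.label == "b", False)]
```

In both runs the nodes c and d below b are never visited; the flag only decides whether b itself appears.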

static inOrder(node: Node) → Iterable[Node][source]

In-order iterator over the subtree of this token.

Return type

Iterable

Parameters

node – root of the tree to traverse

static isContinuous(tokens: Sequence[Node]) → bool[source]

Checks if the tokens form a continuous sequence. Assumes the tokens to be sorted and from the same sentence (not checked).

Returns

true if the list is continuous, false otherwise.
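The continuity check can be sketched on plain sentence indexes. This is an illustration only: is_continuous here takes integers rather than tokens, and treating an empty or single-element sequence as continuous is an assumption.

```python
from typing import Sequence


def is_continuous(indexes: Sequence[int]) -> bool:
    """True if each index is exactly one greater than the previous one."""
    # The input is assumed to be sorted and from one sentence, as in the description.
    return all(b - a == 1 for a, b in zip(indexes, indexes[1:]))
```

For example, the indexes 3, 4, 5 form a continuous sequence, while 3, 5 leave a gap.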

static isFromSameSentence(tokens: Sequence[Node]) → bool[source]

Checks if all the tokens come from the same sentence.

Returns

true if the list of tokens is empty or they are all within the same sentence; false otherwise.

static isSorted(tokens: Sequence[Node]) → bool[source]

Checks if a list of tokens is sorted by word-order (i.e. their sentence index). Requires the tokens to be from the same sentence (not checked).

static preOrder(node: Node) → Iterable[Node][source]

Pre-order iterator over the subtree of this token.

Return type

Iterable

Parameters

node – root of the tree to traverse

static sorted(tokens: Sequence[Node]) → List[Node][source]

Orders a list of tokens by word-order (i.e. their sentence index). Requires the tokens to be from the same sentence (not checked).

Returns

sorted list of tokens

static toSimpleString(tokens: Sequence[Node], quote: bool = False) → str[source]

Utility method for creating strings with a simplified token list.

Return type

str

Parameters
  • tokens (Sequence) – tokens to print

  • quote (bool) – surround each node string with single quotes; useful for __repr__ string

class geneeanlpclient.g3.model.Paragraph(*, id: str, type: str, text: str, corrText: str, sentences: List[geneeanlpclient.g3.model.Sentence], sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

TYPE_ABSTRACT = 'ABSTRACT'

Type of a paragraph representing an abstract (lead or perex) of the whole document

TYPE_BODY = 'BODY'

Type of a paragraph containing regular text (for now this is used for the whole body of the document)

TYPE_SECTION_HEADING = 'section_heading'

Type of a paragraph representing a section/chapter heading (not used yet)

TYPE_TITLE = 'TITLE'

Type of a paragraph representing a title of the whole document. Also used for email subjects.

container = None

the full analysis object containing this paragraph

corrText = None

the paragraph text after correction (corrected token offsets link here)

id = None

ID of the paragraph used to refer to it from other objects

sentences = None

the sentences the paragraph consists of

sentiment = None

Optional sentiment of the paragraph

property tectoTokens

Tecto tokens across all sentences.

text = None

the original paragraph text (token offsets link here)

property tokens

Tokens across all sentences.

type = None

title, section heading, lead, body text, etc. For now, it is simply the segment type: title, lead, body

vectors = None

Optional vectors for this paragraph.

class geneeanlpclient.g3.model.Relation(*, id: str, name: str, textRepr: str, type: str, args: List[geneeanlpclient.g3.model.Argument], feats: Mapping[str, str] = None, support: List[geneeanlpclient.g3.model.Support], sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

class Argument(name, type, entity)[source]

Bases: tuple

property entity

The entity corresponding to this argument, if any. None if the argument is not an entity.

property name

Name of the argument (e.g. John)

property type

Type of the argument (subject, object)

FEAT_MODALITY = 'modality'
FEAT_NEGATED = 'negated'
class Support[source]

Bases: tuple

Tokens corresponding to a single head (predicate) of a relation

property tectoToken

Tecto token corresponding to the tokens. None if tecto tokens are not part of the model.

property tokenSupport

Tokens corresponding to the head of the relation

TYPE_ATTR = 'attr'

Attribute relation (e.g. good(pizza) for _good pizza_, _pizza is good_), the attribute is

TYPE_EXTERNAL = 'external'

Relation where at least one argument is outside of the document (e.g. between pizza in the document and food item in the knowledgebase)

TYPE_RELATION = 'relation'

Verbal relation (e.g. eat(pizza) for _eat a pizza_).

args = None

Arguments of the relation (subject, possibly an object).

feats = None

Any features of the relation e.g. [modality: can]

id = None

ID of the relation used to refer to it from other objects

property isNegated
property modality
name = None

Name of the relation, e.g. eat for _eat a pizza_ or good for _a good pizza_

sentiment = None

Sentiment of this relation. None if not requested.

support = None

Tecto-tokens of all the mentions of the relations (restricted to its head). Empty if not requested.

textRepr = None

Human readable representation of the relation, e.g. eat-not(SUBJ:John, DOBJ:pizza)

type = None

One of Relation.TYPE_ATTR, Relation.TYPE_RELATION, Relation.TYPE_EXTERNAL

vectors = None

Optional vectors for this relation.

class geneeanlpclient.g3.model.Sentence(*, id: str, root: geneeanlpclient.g3.model.Token, tokens: List[geneeanlpclient.g3.model.Token], tectoRoot: geneeanlpclient.g3.model.TectoToken = None, tectoTokens: List[geneeanlpclient.g3.model.TectoToken], sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

A single sentence with its morphological, syntactic, deep-syntactic and sentiment analysis

property charSpan

text span within the paragraph

property corrCharSpan

corrected text span within the paragraph

property corrText

corrected text of the sentence

id = None

ID of the sentence used to refer to it from other objects

paragraph = None

the paragraph containing this sentence

root = None

Token which is the root of the syntactic structure of the sentence

sentiment = None

Optional sentiment of the sentence

tectoRoot = None

Tecto token which is the root of the tecto structure of the sentence

tectoTokens = None

All tecto tokens of the sentence; the order has no meaning

property text

text of the sentence (before correction)

tokens = None

All tokens of the sentence ordered by word-order

vectors = None

Optional vectors for this sentence.

class geneeanlpclient.g3.model.Sentiment[source]

Bases: tuple

Class encapsulating sentiment of a document, sentence or relation

property label

Human readable label describing the average sentiment

property mean

Average sentiment

property negative

Average sentiment of negative items

property positive

Average sentiment of positive items

class geneeanlpclient.g3.model.Tag(*, id: str, gkbId: str = None, stdForm: str, tagType: str, relevance: float, feats: Mapping[str, str] = None, mentions: List[geneeanlpclient.g3.model.Tag.Mention], sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

class Mention(*, id: str, tokenSupport: geneeanlpclient.g3.model.TokenSupport, feats: Mapping[str, str] = None, sentiment: geneeanlpclient.g3.model.Sentiment = None, vectors: List[geneeanlpclient.g3.model.Vector] = None)[source]

Bases: object

feats = None

Custom features/properties.

id = None

ID of the mention used to refer to it from other objects

property isContinuous

Checks whether the tag mention is continuous (most are).

mentionOf = None

Tag this mention belongs to

property sentence

Sentence containing this tag mention. Tag mention belongs to maximally one sentence; artificial mentions without tokens belong to no sentence.

sentiment = None

Sentiment of this mention. Not supported yet.

tokenSupport = None

Tokens of this tag mention.

vectors = None

Optional vectors for this mention.

TYPE_TOPIC = 'topic'

Type of the tag with the main topic of the document

TYPE_TOPIC_DISTRIBUTION = 'topic.distribution'

Type of the tags with the topic distribution of the document

feats = None

Custom features

gkbId = None

Unique identifier of this tag in Geneea knowledge-base. None if not found/linked.

id = None

ID of the tag used to refer to it from other objects

mentions = None

Text segments related to this tag. Empty if not appropriate/requested/supported.

relevance = None

Relevance of the tag relative to the content of the document

sentiment = None

Sentiment of this tag. Not supported yet.

stdForm = None

Standard form of the tag, abstracting from its alternative names

type = None

Domain-specific type (e.g. content, theme, iAB, department)

vectors = None

Optional vectors for this tag.

class geneeanlpclient.g3.model.TectoToken(*, id: str, idx: int, fnc: str, lemma: str, tokenSupport: geneeanlpclient.g3.model.TokenSupport = None, entityMention: Optional[geneeanlpclient.g3.model.Entity.Mention] = None, entity: Optional[geneeanlpclient.g3.model.Entity] = None, feats: Mapping[str, str] = None)[source]

Bases: geneeanlpclient.g3.model._Node

A tecto token, i.e. a tectogrammatical abstraction of a word (e.g. ‘did not sleep’ are three tokens but a single tecto-token). Tecto tokens have a zero-based index reflecting their position within their sentence.

property children

Dependents of this token ordered by word-order.

entity = None

Entity associated with this tecto token; None if there is no such entity.

entityMention = None

Entity mention associated with this tecto token; None if there is no such mention.

feats = None

Grammatical and other features of the tecto token

fnc = None

Label of the dependency edge.

lemma = None

Tecto lemma

property parent

Dependency parent of this tecto token. None if this token is the root of the sentence.

toSimpleString() → str[source]

Converts the tecto token to a default non-recursive string: index + lemma

toStringIL() → str[source]

Converts the tecto token to a non-recursive string: index + lemma

toStringILF() → str[source]

Converts the tecto token to a non-recursive string: index + lemma + function

tokenSupport = None

Surface tokens corresponding to this tecto token; not necessarily adjacent; ordered by word-order

class geneeanlpclient.g3.model.Token(*, id: str, idx: int, text: str, charSpan: geneeanlpclient.g3.model.CharSpan, corrText: str, corrCharSpan: geneeanlpclient.g3.model.CharSpan, deepLemma: str = None, lemma: str = None, pos: geneeanlpclient.common.ud.UPos = None, feats: Mapping[str, str] = None, morphTag: str = None, fnc: geneeanlpclient.common.ud.UDep = None)[source]

Bases: geneeanlpclient.g3.model._Node

A token including basic morphological and syntactic information. A token is similar to a word, but includes punctuation. Tokens have a zero-based index reflecting their position within their sentence. The morphological and syntactic features might be None (deepLemma, lemma, morphTag, pos, fnc, parent), or empty (children) if not requested or supported.

FEAT_LEMMA_INFO = 'lemmaInfo'
FEAT_NEGATED = 'negated'
FEAT_UNKNOWN = 'unknown'
charSpan = None

Character span within the paragraph

property children

Dependents of this token ordered by word-order.

corrCharSpan = None

Character span within the paragraph (after correction)

corrText = None

Text of this token after correction

deepLemma = None

Lemma of the token e.g. bezpecny. None if not requested/supported.

feats = None

Universal and custom features

fnc = None

Label of the dependency edge. None if not requested/supported.

property isNegated

True iff the token form contains a negation prefix.

property isUnknown

True iff the token is unknown to the lemmatizer. The lemma provided is the same as the token itself.

property leftChildren

Children of this token that precede it.

lemma = None

Simple lemma of the token, e.g. nejnebezpecnejsi (in Cz, includes negation and grade). None if not requested/supported.

morphTag = None

Morphological tag, e.g. AAMS1-…., VBD, … None if not requested/supported.

next() → Optional[geneeanlpclient.g3.model.Token][source]

The next token or None if this token is sentence final.

offsetToken(offset: int) → Optional[geneeanlpclient.g3.model.Token][source]

Token following or preceding this token within the sentence.

Parameters

offset (int) – relative offset. The following tokens have a positive offset, preceding ones a negative one. The next token has offset = 1.

Returns

the token at the relative offset or None if the offset is invalid
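The relative-offset lookup can be sketched on a plain list of token texts. The function offset_token below is a hypothetical stand-in that takes the sentence and the current index explicitly; the real method is called on a token.

```python
from typing import Optional, Sequence


def offset_token(tokens: Sequence[str], idx: int, offset: int) -> Optional[str]:
    """Return the token at idx + offset, or None if that falls outside the sentence."""
    target = idx + offset
    # The explicit lower bound matters: a negative index would otherwise
    # wrap around to the end of the sequence in Python.
    if 0 <= target < len(tokens):
        return tokens[target]
    return None
```

With this reading, next() corresponds to an offset of 1 and previous() to an offset of -1.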

property parent

Dependency parent of this token. None if this token is the root of the sentence or if the result contains no syntax info.

pos = None

Google universal tag. None if not requested/supported.

previous() → Optional[geneeanlpclient.g3.model.Token][source]

The previous token or None if this token is sentence initial.

property rightChildren

Children of this token that follow it.

text = None

Text of this token

toSimpleString() → str[source]

Converts the token to a default non-recursive string: index + text

toStringITx() → str[source]

Converts the token to a non-recursive string: index + text

toStringITxF() → str[source]

Converts the token to a non-recursive string: index + text + function

class geneeanlpclient.g3.model.TokenSupport[source]

Bases: tuple

Tokens within a single sentence; ordered by word-order; non-empty, continuous or discontinuous. Do not construct directly, use TokenSupport.of

property charSpan

The character span between the first and last token relative to the enclosing paragraph; for discontinuous support this includes intervening gaps.

property first

The first token.

property firstCharParaIdx

Index of the first character within the enclosing paragraph.

property isContinuous

Is this support a continuous sequence of tokens, i.e. a token span?

property last

The last token.

property lastCharParaIdx

Index of the last character within the enclosing paragraph.

len() → int[source]

Number of covered tokens.

static of(tokens: Sequence[geneeanlpclient.g3.model.Token]) → geneeanlpclient.g3.model.TokenSupport[source]

Creates a TokenSupport object from a list of tokens.

Parameters

tokens (Sequence) – non-empty list of tokens (might not be sorted)

property sentence
spans() → Iterable[geneeanlpclient.g3.model.TokenSupport][source]

Breaks this token support into continuous sub-sequences of tokens.

Returns

series of token supports together equivalent to this token support
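Splitting a possibly discontinuous support into continuous pieces can be sketched on sorted sentence indexes. The function continuous_runs is a hypothetical illustration that works on integers; the real method returns TokenSupport objects.

```python
from typing import List, Sequence


def continuous_runs(indexes: Sequence[int]) -> List[List[int]]:
    """Group sorted indexes into maximal runs of consecutive values."""
    runs: List[List[int]] = []
    for idx in indexes:
        if runs and idx == runs[-1][-1] + 1:
            # Extends the current run without a gap.
            runs[-1].append(idx)
        else:
            # A gap (or the first index) starts a new run.
            runs.append([idx])
    return runs
```

Concatenating the runs gives back the original sequence, which matches the statement that the pieces together are equivalent to the whole support.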

property text

Substring of a full text as denoted by this support (before correction). For discontinuous supports, the result includes the intervening gaps. It differs from ' '.join(tokenSupport.texts()) in correctly reflecting whitespace in the original text.

texts() → List[str][source]

The coverage texts of each of the continuous spans, ordered by word-order.

property tokens

The tokens of this support.

class geneeanlpclient.g3.model.Tree(root: Node, tokens: Sequence[Node])[source]

Bases: typing.Generic

class geneeanlpclient.g3.model.TreeBuilder[source]

Bases: typing.Generic

Builder creating a dependency tree out of tokens.

addDependency(childIdx: int, parentIdx: int) → geneeanlpclient.g3.model.TreeBuilder[Node][source]

Record a dependency edge. The tokens connected by the edge might be added later.

Parameters
  • childIdx (int) – index of the child token (note: tokens are indexed within their sentences)

  • parentIdx (int) – index of the parent token (note: tokens are indexed within their sentences)

Returns

the builder to allow chained calls

addDummyDependecies()[source]

All nodes are attached to the first one.

addNode(node: Node) → geneeanlpclient.g3.model.TreeBuilder[Node][source]

Record a single token as a node of the tree.

Parameters

node – token to add. Its index must be correct, parent and children fields are ignored.

Returns

the builder to allow chained calls

addNodes(nodes: Iterable[Node]) → geneeanlpclient.g3.model.TreeBuilder[Node][source]

Record a collection of tokens as nodes of the tree.

Parameters

nodes (Iterable) – tokens to add. Their index must be correct, parent and children fields are ignored.

Returns

the builder to allow chained calls

build() → Optional[geneeanlpclient.g3.model.Tree[Node]][source]
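The builder workflow (record nodes and edges in any order, then build) can be sketched with a stand-in. TreeBuilderSketch is hypothetical and stores plain labels instead of tokens; returning None for an empty builder and raising ValueError for an unknown parent or a missing single root are assumptions, since this page does not describe when build() returns None or fails.

```python
from typing import Dict, List, Optional, Tuple


class TreeBuilderSketch:
    """Stand-in builder: collects nodes and edges, links them in build()."""

    def __init__(self) -> None:
        self._nodes: Dict[int, str] = {}
        self._parent_of: Dict[int, int] = {}

    def add_node(self, idx: int, label: str) -> "TreeBuilderSketch":
        self._nodes[idx] = label
        return self  # allows chained calls

    def add_dependency(self, child_idx: int, parent_idx: int) -> "TreeBuilderSketch":
        # Edges may be recorded before the nodes they connect are added.
        self._parent_of[child_idx] = parent_idx
        return self

    def build(self) -> Optional[Tuple[int, Dict[int, List[int]]]]:
        """Return (root index, children by index), or None if no nodes were added."""
        if not self._nodes:
            return None
        children: Dict[int, List[int]] = {idx: [] for idx in self._nodes}
        roots = []
        for idx in sorted(self._nodes):  # sorted, so children end up in word order
            parent = self._parent_of.get(idx)
            if parent is None:
                roots.append(idx)
            elif parent not in children:
                raise ValueError("Unknown parent index %d" % parent)
            else:
                children[parent].append(idx)
        if len(roots) != 1:
            raise ValueError("A tree needs exactly one root")
        return roots[0], children


built = (TreeBuilderSketch()
         .add_dependency(0, 1)
         .add_node(0, "John").add_node(1, "eats").add_node(2, "pizza")
         .add_dependency(2, 1)
         .build())
```

Deferring all linking to build() is what lets edges and nodes arrive in any order.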
class geneeanlpclient.g3.model.Vector[source]

Bases: tuple

Class encapsulating a vector

property dimension

Returns dimension of this vector.

property name

Name identifying the model of this vector

property values

The vector values

property version

A particular version of the model which produced this vector

Reading/Writing from/to json

The SDK contains a reader/writer from/to the JSON returned by the G3 API endpoint. It also supports converting from/to the legacy F2 JSON (Full Analysis V2).

geneeanlpclient.g3.reader.G3_KEYS = frozenset({'debugInfo', 'docSentiment', 'docVectors', 'entities', 'id', 'itemSentiments', 'itemVectors', 'language', 'metadata', 'paragraphs', 'relations', 'tags', 'usedChars', 'version'})

Standard keys used in G3 analysis json
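One possible use of such a key set is to spot non-standard top-level keys in a raw analysis dictionary. The sketch below copies the key names listed above into a local constant; the helper non_standard_keys is hypothetical, and whether the reader itself uses G3_KEYS this way is not stated on this page.

```python
from typing import Any, Dict, List

# Local copy of the key names listed for geneeanlpclient.g3.reader.G3_KEYS.
G3_KEYS_SKETCH = frozenset({
    'debugInfo', 'docSentiment', 'docVectors', 'entities', 'id',
    'itemSentiments', 'itemVectors', 'language', 'metadata',
    'paragraphs', 'relations', 'tags', 'usedChars', 'version',
})


def non_standard_keys(raw: Dict[str, Any]) -> List[str]:
    """Return the top-level keys of raw that are not standard G3 keys, sorted."""
    return sorted(set(raw) - G3_KEYS_SKETCH)
```

A check like this could be run before passing a dictionary to fromDict, for instance to log unexpected fields.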

geneeanlpclient.g3.reader.fromDict(rawAnalysis: Dict[str, Any]) → geneeanlpclient.g3.model.G3[source]

Reads the G3 object from a json-based dictionary as returned from Geneea G3 API.

geneeanlpclient.g3.writer.toDict(obj: geneeanlpclient.g3.model.G3) → Dict[str, Any][source]

Writes the G3 model to a JSON-based dictionary in the format returned by the Geneea G3 API.

geneeanlpclient.g3.f2converter.F2_KEYS = frozenset({'debugInfo', 'entities', 'hashtags', 'id', 'keywords', 'language', 'lead', 'leadLemmas', 'relations', 'sentiment', 'tectoSentences', 'text', 'textLemmas', 'title', 'titleLemmas', 'topic', 'version'})

Standard keys used in F2 analysis json

geneeanlpclient.g3.f2converter.fromF2Dict(rawAnalysis: Dict[str, Any]) → geneeanlpclient.g3.model.G3[source]

Reads from the legacy F2 object

geneeanlpclient.g3.f2converter.toF2Dict(obj: geneeanlpclient.g3.model.G3) → Dict[str, Any][source]

Writes to the legacy F2 object