Token

A token represents a unit of text, such as a word, separator, or other meaningful component.

Fields

| Property | Required | Type | Description |
| --- | --- | --- | --- |
| id | True | string | ID of the token used to refer to it from other objects |
| off | True | integer | Offset in the token's parent paragraph |
| text | True | string | Token's text |
| origOff | False | integer | Token's offset in the original paragraph (omitted when identical to "off") |
| origText | False | string | Token's text in the original paragraph (omitted when identical to "text") |
| lemma | False | string | Lemma, the canonical or dictionary form of the word |
| pos | False | string | Universal POS |
| parId | False | string | ID of the syntactic parent token; missing for the root token |
| fnc | False | string | Grammatical function of this token according to Universal Dependencies (e.g., nsubj, obj) |
| feats | False | Map[string, string] | Universal and custom features |
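
The fields above can be modeled on the client side roughly as follows. This is a minimal sketch, not an official client: the `Token` dataclass, its `from_dict` helper, and the `is_root` property are hypothetical names, and the sample payload (including the ID format `t1`, `t2` and the feature `Number`) is made up for illustration. It shows how the two omission rules might be handled: `origOff` and `origText` fall back to `off` and `text` when absent, and a missing `parId` marks the root token.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class Token:
    # Required fields
    id: str
    off: int
    text: str
    # Optional fields; None means the key was absent from the payload
    origOff: Optional[int] = None
    origText: Optional[str] = None
    lemma: Optional[str] = None
    pos: Optional[str] = None
    parId: Optional[str] = None
    fnc: Optional[str] = None
    feats: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict) -> "Token":
        """Build a Token from a decoded JSON object (hypothetical helper)."""
        known = {f.name for f in fields(cls)}
        token = cls(**{k: v for k, v in data.items() if k in known})
        # origOff/origText are omitted when identical to off/text,
        # so restore them from the required fields.
        if token.origOff is None:
            token.origOff = token.off
        if token.origText is None:
            token.origText = token.text
        return token

    @property
    def is_root(self) -> bool:
        # parId is missing for the root token.
        return self.parId is None


# Made-up sample payload for the two-word paragraph "Dogs bark".
sample = [
    {"id": "t1", "off": 0, "text": "Dogs", "lemma": "dog", "pos": "NOUN",
     "parId": "t2", "fnc": "nsubj", "feats": {"Number": "Plur"}},
    {"id": "t2", "off": 5, "text": "bark", "lemma": "bark", "pos": "VERB"},
]

tokens = [Token.from_dict(item) for item in sample]
by_id = {t.id: t for t in tokens}
root = next(t for t in tokens if t.is_root)
```

With the tokens indexed by `id`, the syntactic parent of any non-root token can be looked up as `by_id[token.parId]`.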