Language Detection
Language detection can be called separately or as part of other functions.
Recognized languages
Our default mode distinguishes 31 languages:
ar - Arabic | el - Greek | he - Hebrew | it - Italian | nl - Dutch | sk - Slovak | zh - Chinese
bg - Bulgarian | en - English | hi - Hindi | ja - Japanese | pa - Punjabi | sv - Swedish
cs - Czech | es - Spanish | hr - Croatian | ko - Korean | pl - Polish | tr - Turkish
da - Danish | fi - Finnish | hu - Hungarian | lt - Lithuanian | pt - Portuguese | uk - Ukrainian
de - German | fr - French | id - Indonesian | nl - Dutch | ru - Russian | vi - Vietnamese
Sample call
curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"text": "The trip to Innsbruck was great.",
"analyses": ["language"]
}'
# On Windows, use \" instead of " and " instead of '
# Python, using the Geneea SDK (geneeanlpclient)
from geneeanlpclient import g3

# Request only the language analysis
requestBuilder = g3.Request.Builder(analyses=[g3.AnalysisType.LANGUAGE])

with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        text='The trip to Innsbruck was great.'
    ))
    # Print the detected language code
    print(result.language.detected)
# Python, using the requests library
import requests

def callGeneea(payload):
    # POST the request body as JSON and return the parsed JSON response
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <your user key>'
    }
    return requests.post(url, json=payload, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to Innsbruck was great.',
    'analyses': ['language']
})
print(responseObj)
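To read only the detected language from the raw response, a small helper along the following lines may be enough. It is a sketch that assumes the JSON body mirrors the SDK attribute path `result.language.detected`, that is, a `language` object containing a `detected` field. The helper name `detected_language` and the sample body are hypothetical and serve only to illustrate the idea offline.

```python
from typing import Optional


def detected_language(response: dict) -> Optional[str]:
    """Return the detected language code, or None if the field is absent."""
    # Assumption: the response nests the code under "language" -> "detected",
    # mirroring result.language.detected in the SDK sample.
    language = response.get("language")
    if not isinstance(language, dict):
        return None
    return language.get("detected")


# Hypothetical response body, used only to exercise the helper without a network call.
sample_response = {"id": "1", "language": {"detected": "en"}}
print(detected_language(sample_response))  # prints: en
```

Returning `None` instead of raising keeps the helper usable on responses where the language analysis was not requested.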
Priors
If you know that your texts can only be in certain languages, you can specify a prior – a single language or a combination of several languages. The currently supported priors are:
cs,de | cs,en,sk | cs,de,es,nl,pl | de,en,es,nl | es,nl,pl
cs,en | cs,es,nl | cs,en,es,nl,pl | de,en,es,pl | nl,pl
cs,es | cs,es,pl | cs,de,en,es,nl,pl | de,en,nl,pl | en,zh
cs,nl | cs,nl,pl | de,en | de,es,nl,pl
cs,pl | cs,de,en,es | de,es | en,es
cs,sk | cs,de,en,nl | de,nl | en,nl
cs,de,en | cs,de,en,pl | de,pl | en,pl
cs,de,es | cs,de,en,sk | de,en,es | en,es,nl
cs,de,nl | cs,en,es,nl | de,en,nl | en,es,pl
cs,de,pl | cs,en,es,pl | de,en,pl | en,nl,pl
cs,en,es | cs,es,nl,pl | de,es,nl | en,es,nl,pl
cs,en,nl | cs,de,en,es,nl | de,es,pl | es,nl
cs,en,pl | cs,de,en,es,pl | de,nl,pl | es,pl | EU
Use the prior exactly as written above (same order, no spaces) and pass it as the lang_prior value in the options parameter:
curl -X POST https://api.geneea.com/v3/analysis \
-H 'Authorization: user_key <YOUR USER KEY>' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"text": "The trip to Innsbruck was great.",
"options": {"lang_prior":"en,nl"},
"analyses": ["language"]
}'
# On Windows, use \" instead of " and " instead of '
# Python, using the Geneea SDK (geneeanlpclient)
from geneeanlpclient import g3

# Request only the language analysis and restrict detection to English and Dutch
requestBuilder = g3.Request.Builder(
    analyses=[g3.AnalysisType.LANGUAGE],
    customConfig={'options': {'lang_prior': 'en,nl'}}
)

with g3.Client.create(userKey='<YOUR USER KEY>') as analyzer:
    result = analyzer.analyze(requestBuilder.build(
        id=str(1),
        text='The trip to Innsbruck was great.'
    ))
    # Print the detected language code
    print(result.language.detected)
# Python, using the requests library
import requests

def callGeneea(payload):
    # POST the request body as JSON and return the parsed JSON response
    url = 'https://api.geneea.com/v3/analysis'
    headers = {
        'content-type': 'application/json',
        'Authorization': 'user_key <your user key>'
    }
    return requests.post(url, json=payload, headers=headers).json()

responseObj = callGeneea({
    'id': '1',
    'text': 'The trip to Innsbruck was great.',
    # Restrict detection to English and Dutch
    'options': {'lang_prior': 'en,nl'},
    'analyses': ['language']
})
print(responseObj)
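Because a prior must match a supported entry exactly (same order, no spaces), it can help to build the string client-side before sending a request. The following is a minimal sketch, not part of the Geneea SDK: the names `SUPPORTED_PRIORS` and `normalize_prior` are hypothetical, the list is copied from the table above, and the sketch relies on the observation that every listed combination is in alphabetical order. The special `EU` prior is left out and would be passed as-is.

```python
# Language combinations copied from the priors table; update if the service's list changes.
# The special "EU" prior is not a combination of codes and is intentionally not handled here.
SUPPORTED_PRIORS = frozenset("""
cs,de cs,en cs,es cs,nl cs,pl cs,sk
cs,de,en cs,de,es cs,de,nl cs,de,pl cs,en,es cs,en,nl cs,en,pl cs,en,sk
cs,es,nl cs,es,pl cs,nl,pl
cs,de,en,es cs,de,en,nl cs,de,en,pl cs,de,en,sk cs,en,es,nl cs,en,es,pl cs,es,nl,pl
cs,de,en,es,nl cs,de,en,es,pl cs,de,es,nl,pl cs,en,es,nl,pl cs,de,en,es,nl,pl
de,en de,es de,nl de,pl
de,en,es de,en,nl de,en,pl de,es,nl de,es,pl de,nl,pl
de,en,es,nl de,en,es,pl de,en,nl,pl de,es,nl,pl
en,es en,nl en,pl en,zh
en,es,nl en,es,pl en,nl,pl en,es,nl,pl
es,nl es,pl es,nl,pl nl,pl
""".split())


def normalize_prior(languages):
    """Build a prior string from an iterable of language codes.

    Codes are trimmed, lower-cased, de-duplicated and sorted alphabetically,
    then joined with commas and no spaces. Raises ValueError if the resulting
    combination is not in SUPPORTED_PRIORS.
    """
    codes = sorted({code.strip().lower() for code in languages})
    prior = ",".join(codes)
    if prior not in SUPPORTED_PRIORS:
        raise ValueError(f"unsupported language prior: {prior!r}")
    return prior


# Build the options object for a request restricted to Dutch and English.
options = {"lang_prior": normalize_prior(["nl", "en"])}
print(options)  # prints: {'lang_prior': 'en,nl'}
```

Failing early with a `ValueError` avoids sending a request whose prior the service would not recognize.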
Customization
We can customize our language detection to your needs. For example, your emails may contain error messages in English, or product names that sound French.