Geneea Knowledge Base
The Geneea Knowledge Base (GKB or Geneea KB) is a comprehensive repository of information about entities and concepts that powers Geneea's NLP services. It serves as the foundation for entity recognition, linking, and enrichment across all Geneea products.
What is GKB?
GKB contains structured information about millions of entities. For each entity it records both real-world properties (for example, that the entity is a German politician, a Russian sci-fi movie, or a multinational corporation) and linguistic properties (such as its names in various languages and contexts, alternative spellings, and abbreviations).
When Geneea's NLP services analyze text, they identify mentions of entities and link them to the corresponding entries in GKB. Through this link, each detected entity is enriched with the information stored in GKB, such as its type, description, and links to external resources.
Data Sources
GKB integrates data from multiple high-quality sources:
- Wikidata - A free, collaborative knowledge base that serves as central storage for structured data from Wikipedia and other Wikimedia projects
- DBpedia - Structured content extracted from Wikipedia
- OpenStreetMap - Geographic data for locations, landmarks, and points of interest
- Company registries - Official business registration data from various countries
- Geneea's proprietary data - Curated information maintained by Geneea's team
Data from these sources is continuously updated and reconciled to keep GKB accurate and up to date.
Entity Types
GKB contains various types of entities, including:
- People - Politicians, athletes, artists, business leaders, and other notable individuals
- Organizations - Companies, government bodies, NGOs, sports teams, and other institutions
- Locations - Countries, cities, regions, landmarks, and geographic features
- Events - Historical events, recurring events, sports competitions, and cultural happenings
- Products - Consumer products, brands, and services
- Creative works - Books, movies, music albums, TV shows, and other media
- Concepts - Abstract ideas, topics, and general keywords
Each entity has a unique GKB identifier (e.g., G42 for Douglas Adams) that remains stable over time, even as the entity's properties are updated.
Data Organization
Buckets
The data in GKB is organized into buckets. Each bucket is a logical container for a set of entities:
- Generic bucket - Contains publicly available entities from open data sources, accessible to all customers
- Private buckets - Custom buckets that can contain customer-specific entities, such as internal products, proprietary taxonomies, or company-specific data
Customers can use the generic bucket alone or combine it with their own private bucket for enhanced coverage.
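To illustrate what "combining" buckets means, the following minimal sketch models each bucket as an in-memory dictionary from entity ID to name and builds their union. Everything here is hypothetical: the bucket contents, the private ID format `ACME-P1`, and the rule that a private entry overrides a generic one with the same ID are assumptions made for the example, not documented GKB behavior.

```python
from typing import Dict

# Hypothetical stand-ins for two buckets: entity ID -> entity name.
# Real buckets live inside GKB; these dicts only illustrate the idea.
GENERIC_BUCKET: Dict[str, str] = {
    "G42": "Douglas Adams",
}
PRIVATE_BUCKET: Dict[str, str] = {
    "ACME-P1": "Acme Widget Pro",  # hypothetical customer-specific product
}


def combined_view(generic: Dict[str, str], private: Dict[str, str]) -> Dict[str, str]:
    """Return the union of both buckets.

    Assumption: if both buckets define the same ID, the private entry wins.
    """
    merged = dict(generic)  # start from the publicly available entities
    merged.update(private)  # add (or override with) customer-specific ones
    return merged


view = combined_view(GENERIC_BUCKET, PRIVATE_BUCKET)
print(sorted(view))  # ['ACME-P1', 'G42']
```

The union is what gives the "enhanced coverage": lookups can find both public entities and your own.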
Entity Properties
Each entity in GKB includes:
- Standard form (stdForm) - The canonical name of the entity in a given language
- Labels - Alternative names, aliases, and variations in multiple languages
- Type and subtypes - Classification of the entity (e.g., person, organization, location)
- Description - A brief textual description of the entity
- External IDs - Links to external databases (Wikidata ID, Wikipedia URL, etc.)
- Custom properties - Additional metadata that can be configured per customer
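If you cache GKB data on your side, a small client-side model can mirror the properties listed above. The sketch below is hypothetical: the class name `GkbEntity`, its field names, and the `names` helper are illustrative choices and do not reproduce the actual API response format. The example values (Douglas Adams, Wikidata ID `Q42`) are for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GkbEntity:
    """Hypothetical client-side model of one GKB entity; field names are assumptions."""

    gkb_id: str                    # stable GKB identifier, e.g. "G42"
    std_forms: Dict[str, str]      # language code -> standard form (stdForm)
    labels: Dict[str, List[str]] = field(default_factory=dict)  # language -> aliases
    entity_type: str = ""          # e.g. "person", "organization", "location"
    subtypes: List[str] = field(default_factory=list)
    description: str = ""
    external_ids: Dict[str, str] = field(default_factory=dict)  # e.g. {"wikidata": "Q42"}
    custom_properties: Dict[str, str] = field(default_factory=dict)

    def names(self, lang: str) -> List[str]:
        """Return the standard form first, then any distinct labels, for one language."""
        result = [self.std_forms[lang]] if lang in self.std_forms else []
        for label in self.labels.get(lang, []):
            if label not in result:
                result.append(label)
        return result


adams = GkbEntity(
    gkb_id="G42",
    std_forms={"en": "Douglas Adams"},
    labels={"en": ["Douglas Adams", "Douglas Noël Adams"]},
    entity_type="person",
    external_ids={"wikidata": "Q42"},
)
print(adams.names("en"))  # ['Douglas Adams', 'Douglas Noël Adams']
```

Keeping the standard form separate from the labels reflects the distinction above: one canonical name per language, plus any number of alternatives.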
Multilingual Support
GKB provides comprehensive multilingual support:
- Entity names are available in multiple languages
- The presentation language feature allows you to retrieve entity names in your preferred language
- Fallback languages can be specified for cases where a name is not available in the preferred language
For example, you can analyze an article in German but retrieve entity names in English for your English-speaking audience.
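The API performs this resolution for you. Still, the order it implies (preferred language first, then each fallback in turn) can be sketched locally. The function `pick_std_form` and the sample names are hypothetical and do not show the actual request parameters.

```python
from typing import Dict, Optional, Sequence


def pick_std_form(std_forms: Dict[str, str], preferred: str,
                  fallbacks: Sequence[str] = ()) -> Optional[str]:
    """Return the name in the preferred language, else the first available fallback."""
    for lang in (preferred, *fallbacks):
        if lang in std_forms:
            return std_forms[lang]
    return None  # no name in any of the requested languages


names = {"de": "Vereinigte Staaten", "en": "United States"}
print(pick_std_form(names, "en"))                # United States
print(pick_std_form(names, "cs", ["de", "en"]))  # Vereinigte Staaten
```

The order of the fallback list matters: with `["de", "en"]`, German wins over English whenever both are present.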
Integration with NLP Services
When you use Geneea's NLP services (such as the Media API), entities detected in your text are automatically linked to GKB. The analysis results include:
- The GKB identifier for each entity
- The entity's standard form in your preferred language
- Entity type and subtypes
- Links to external resources (Wikipedia, Wikidata, etc.)
You can then use the Knowledge Base API endpoints to retrieve additional information about these entities.
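A typical first step is to collect the GKB identifiers from an analysis result so you can pass them to those endpoints. The sketch below assumes a trimmed JSON response in which each entity may carry a `gkbId` field. That field name, the surrounding structure, and the ID `G123` are illustrative assumptions, so check them against your actual output.

```python
import json
from typing import List

# Hypothetical, trimmed analysis response; the field names ("entities", "gkbId",
# "stdForm", "type") and the ID "G123" are illustrative assumptions.
RESPONSE = json.loads("""
{
  "entities": [
    {"id": "E0", "stdForm": "Douglas Adams", "type": "person", "gkbId": "G42"},
    {"id": "E1", "stdForm": "towel", "type": "keyword"},
    {"id": "E2", "stdForm": "Example Corp", "type": "organization", "gkbId": "G123"}
  ]
}
""")


def linked_gkb_ids(response: dict) -> List[str]:
    """Collect distinct GKB IDs in order of first appearance, skipping unlinked entities."""
    ids = (entity.get("gkbId") for entity in response.get("entities", []))
    return list(dict.fromkeys(i for i in ids if i))  # dict keys keep insertion order


print(linked_gkb_ids(RESPONSE))  # ['G42', 'G123']
```

Deduplicating before calling the knowledge base endpoints avoids requesting the same entity twice when you process several documents together.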
Custom Data Integration
For customers with specific needs, GKB can be enhanced with:
- Custom entity IDs - Map GKB entities to your internal identifiers
- Custom properties - Add your own metadata to entities
- Private entities - Add entities that are specific to your domain or organization
- Custom taxonomies - Integrate your internal classification systems
Contact Geneea to discuss custom data integration options.
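Custom entity IDs are configured within GKB itself. To show what such a mapping achieves, the following sketch keeps a hypothetical GKB-to-internal mapping on the client side and splits a list of GKB IDs into those that have an internal counterpart and those that do not. The mapping contents and the `to_internal` helper are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: GKB ID -> your internal identifier.
GKB_TO_INTERNAL: Dict[str, str] = {
    "G42": "author-0042",
}


def to_internal(gkb_ids: List[str],
                mapping: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split IDs into those with an internal counterpart and those without."""
    mapped = {g: mapping[g] for g in gkb_ids if g in mapping}
    unmapped = [g for g in gkb_ids if g not in mapping]
    return mapped, unmapped


mapped, unmapped = to_internal(["G42", "G123"], GKB_TO_INTERNAL)
print(mapped)    # {'G42': 'author-0042'}
print(unmapped)  # ['G123']
```

The unmapped list can be used to spot entities that your internal systems do not know yet.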
API Access
GKB data is accessible through the Media API's knowledge base endpoints:
- /v2/knowledgebase/details - Retrieve detailed information about entities
- /v2/knowledgebase/infoboxes - Get brief info boxes for display
- /v2/knowledgebase/stdforms - Get entity names in specific languages
- /v2/knowledgebase/search - Search for entities by name
- /v2/knowledgebase/redirects - Check for deprecated or merged entity IDs
See the Knowledge Base Guide for detailed usage examples.
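As a rough sketch, a request to one of these endpoints can be built with the Python standard library. The base URL, the query parameter names (`id`, `lang`), and the `Authorization` header format are placeholders and assumptions, not the documented API. Take the real values from the Knowledge Base Guide.

```python
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL; replace it with the real Media API address.
BASE_URL = "https://media-api.example.com"


def stdforms_request(gkb_ids: Sequence[str], lang: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for /v2/knowledgebase/stdforms.

    Assumption: IDs are passed as repeated "id" parameters and the language
    as "lang"; the authorization scheme below is also hypothetical.
    """
    query = urlencode([("id", i) for i in gkb_ids] + [("lang", lang)])
    return Request(
        f"{BASE_URL}/v2/knowledgebase/stdforms?{query}",
        headers={"Authorization": f"user_key {api_key}"},
        method="GET",
    )


req = stdforms_request(["G42", "G123"], "en", "YOUR_KEY")
print(req.full_url)
# https://media-api.example.com/v2/knowledgebase/stdforms?id=G42&id=G123&lang=en
```

Once the placeholders are replaced with real values, the request can be sent with `urllib.request.urlopen(req)` and the body parsed with `json.load`.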