
This page pertains to UD version 2.

UD for Georgian

This is a work-in-progress overview of the UD annotation for Georgian.

Tokenization and Word Segmentation


Lemmatization

Lemmatization Strategies

Georgian dictionaries represent lemmas differently for nominal and verbal paradigms, and each choice has consequences for computational processing and linguistic analysis:

Nominals: The lemma is the nominative singular form, a straightforward and standard convention.
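To illustrate the nominative-singular convention, here is a toy sketch that maps inflected forms of a consonant-stem noun such as სახლი "house" back to their lemma by stripping a case/number ending and restoring nominative -ი. It is an assumption-laden illustration, not a real Georgian lemmatizer: the suffix list and the function name `toy_noun_lemma` are hypothetical, and vowel-stem nouns, stem syncope and postpositions are deliberately ignored.

```python
# Toy lemmatizer for consonant-stem Georgian nouns (hypothetical illustration only).
# It strips one case/number ending and restores the nominative singular ending -ი.
# Not handled: vowel-stem nouns, stem syncope, postpositions, irregular forms.
SUFFIXES = [
    "ებმა", "ების", "ებით", "ებად", "ებო",  # plural ERG, GEN, INS, ADV, VOC
    "ები", "ებს",                          # plural NOM, DAT
    "მა", "ის", "ით", "ად",                # singular ERG, GEN, INS, ADV
    "ი", "ს", "ო",                         # singular NOM, DAT, VOC
]


def toy_noun_lemma(form: str) -> str:
    """Guess the nominative singular lemma of a consonant-stem noun form."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest ending first
        if form.endswith(suffix) and len(form) > len(suffix):
            return form[: -len(suffix)] + "ი"
    return form  # no known ending: return the form unchanged


for f in ["სახლი", "სახლმა", "სახლს", "სახლის", "სახლები", "სახლების"]:
    print(f, "->", toy_noun_lemma(f))
```

Matching the longest ending first matters: სახლის must lose -ის (genitive), not just the final -ს (dative).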

Verbs: Unlike nominals, Georgian verbs lack an infinitive form, resulting in diverse lemmatization strategies:

Lemmatization in UD Treebanks

In Universal Dependencies (UD) treebanks, lemmatization practice typically reflects a hybrid approach shaped by the diverse strategies used for Georgian verbs. Depending on the treebank, two main approaches are observed:

By combining these approaches, UD treebanks aim to balance linguistic tradition with computational utility and user accessibility.
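Because the verb convention differs between treebanks, one practical check is to list the lemmas a treebank assigns to VERB tokens together with the forms that carry them. A minimal sketch, assuming input in CoNLL-U, the standard UD format with ten tab-separated columns (ID, FORM, LEMMA, UPOS, ...). The embedded sentence "კაცი წერს წერილს." ("The man writes a letter.") and its annotation are hypothetical; in particular the verb lemma წერა is only illustrative and not a claim about any existing treebank.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical CoNLL-U sample; lemmas and features are illustrative only.
SAMPLE = """\
# text = კაცი წერს წერილს.
1\tკაცი\tკაცი\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_
2\tწერს\tწერა\tVERB\t_\t_\t0\troot\t_\t_
3\tწერილს\tწერილი\tNOUN\t_\tCase=Dat|Number=Sing\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def lemmas_by_upos(conllu: str, upos: str) -> Dict[str, Set[str]]:
    """Map each lemma whose UPOS equals `upos` to the set of forms that carry it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        form, lemma, tag = cols[1], cols[2], cols[3]
        if tag == upos:
            result[lemma].add(form)
    return dict(result)


print(lemmas_by_upos(SAMPLE, "VERB"))
```

Run over a real treebank file, the output shows at a glance which form the annotators chose as the verb lemma.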


Morphology

Tags


Features

Lexical Features

Inflectional Features

Nominal Features
Verbal Features

Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.
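In UD, both lexical and inflectional features are stored in the FEATS column of CoNLL-U as `Name=Value` pairs separated by `|`, with `_` when a word has no features and commas separating multiple values of one feature. A minimal sketch of reading that column; the function name `parse_feats` is hypothetical and the feature strings below are illustrative, not taken from a Georgian treebank.

```python
from typing import Dict, List


def parse_feats(feats: str) -> Dict[str, List[str]]:
    """Parse a UD FEATS value such as 'Case=Dat|Number=Sing' into a dict."""
    if feats == "_":
        return {}  # "_" marks an empty FEATS column
    parsed: Dict[str, List[str]] = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        parsed[name] = values.split(",")  # multiple values are comma-separated
    return parsed


print(parse_feats("Case=Dat|Number=Sing"))
```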


Syntax

v-type    m-type
NOM NOM (v-set)
NOM NOM (v-set) + DAT DAT (m-set)
NOM ERG (v-set) + DAT NOM (m-set)
NOM ERG (v-set) + DAT NOM (m-set) + DAT DAT ( -a)
NOM ERG (v-set) + DAT NOM (-set) + DAT DAT (m- -a)
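Each row of the table above appears to list, per argument, two case values followed by the agreement set that indexes it. Assuming the two values are the argument's case in Series I and Series II (consistent with the familiar Georgian pattern where a transitive subject is NOM in Series I and ERG in Series II, and its object DAT and NOM respectively), the first three rows could be encoded as below. This reading is an assumption, the frame names are hypothetical labels, and rows 4 and 5 are left out because parts of their agreement labels are illegible.

```python
from typing import Dict, List, NamedTuple


class Argument(NamedTuple):
    series_i: str   # case in Series I (assumed reading of the first value)
    series_ii: str  # case in Series II (assumed reading of the second value)
    agreement: str  # agreement set as labelled in the table


# Rows 1-3 of the table; the frame names are hypothetical.
FRAMES: Dict[str, List[Argument]] = {
    "one_argument": [Argument("NOM", "NOM", "v-set")],
    "two_argument_dat": [
        Argument("NOM", "NOM", "v-set"),
        Argument("DAT", "DAT", "m-set"),
    ],
    "transitive": [
        Argument("NOM", "ERG", "v-set"),
        Argument("DAT", "NOM", "m-set"),
    ],
}


def cases_in_series(frame: str, series: int) -> List[str]:
    """Return the case of each argument of `frame` in Series 1 or Series 2."""
    if series not in (1, 2):
        raise ValueError("series must be 1 or 2")
    return [a.series_i if series == 1 else a.series_ii for a in FRAMES[frame]]


print(cases_in_series("transitive", 1), cases_in_series("transitive", 2))
```

Keeping the agreement set on each argument, rather than on the frame, mirrors the table, where every argument carries its own label.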

Treebanks

UD_Georgian-GLC is the first UD treebank for Georgian.