UD Latgalian Cairo
Language: Latgalian (code: ltg)
Family: Indo-European, Baltic
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Lauma Pretkalniņa, Gunta Nešpore-Bērzkalne.
Repository: UD_Latgalian-Cairo
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Latgalian-specific or cross-linguistic) can be raised in the main UD issue tracker.
Annotation | Source |
Lemmas | annotated manually |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
UD_Latgalian-Cairo is an example treebank that provides a minimal dataset for Latgalian based on the Cairo sample sentences. It was created by the AI Lab at the Institute of Mathematics and Computer Science, University of Latvia.
This treebank was developed as a proof of concept by the team behind the Latvian UD Treebank (UD_Latvian-LVTB). It contains the 20 Cairo example sentences and is, as far as we know, the only Latgalian treebank in existence.
This work was supported by the State Research Programme’s project Research on Modern Latvian Language and Development of Language Technology under the grant agreement No. VPP-LETONIKA-2021/1-0006.
- Pretkalniņa L., Rituma L., Saulīte B. Deriving enhanced Universal Dependencies from a hybrid dependency-constituency treebank. Proceedings of the 21st International Conference Text, Speech, and Dialogue, LNCS, Vol. 11107, Springer Link, 2018, pp. 95-105.
Statistics of UD Latgalian Cairo
POS Tags
ADJ – ADP – ADV – CCONJ – DET – NOUN – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Definite – Degree – Evident – Gender – Mood – Number – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – amod – appos – case – cc – ccomp – conj – csubj – det – discourse – fixed – flat:name – iobj – mark – nmod – nsubj – obj – obl – orphan – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 20 sentences and 170 tokens.
- This corpus contains 31 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, CCONJ, DET, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: NUM, AUX, INTJ, SYM, X
- This corpus contains 4 word types tagged as particles (PART): Voi, koč, ni, tik
- This corpus contains 8 lemmas tagged as pronouns (PRON): es, jei, jis, jī, kas, kurs, tu, tys
- This corpus contains 6 lemmas tagged as determiners (DET): itei, kaids, muns, sova, sovs, tei
- This corpus contains 0 lemmas tagged as auxiliaries (AUX).
- There are 3 (de)verbal forms:
- Fin
- VERB: ir, pīraksteja, Navarēja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga
- Inf
- VERB: apgrīzt, atmest, atīt, izalaseit, nūmozguot, tikt, īt
- Part
- VERB: pīguoduota
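The token and tag counts above can be recomputed from a CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the script that generated this page: the function names `token_stats` and `percent` are hypothetical, the sample sentence is an English toy example rather than treebank data, and the counting assumes a corpus without multiword tokens, so that syntactic words and surface tokens coincide.

```python
from collections import Counter

# The 17 universal POS tags defined by Universal Dependencies.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def token_stats(conllu_text):
    """Count sentences, tokens, no-space tokens and UPOS usage (hypothetical helper)."""
    sentences = tokens = no_space = 0
    upos = Counter()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        cols = line.split("\t")
        # Skip comments, multiword token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        upos[cols[3]] += 1
        # MISC is the tenth column; SpaceAfter=No marks a token glued to the next one.
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space": no_space,
        "used": sorted(upos),
        "unused": sorted(UD_UPOS - set(upos)),
    }


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


# Toy English sentence for illustration only; it is not part of the treebank.
SAMPLE = (
    "# text = She left.\n"
    "1\tShe\tshe\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_\n"
    "2\tleft\tleave\tVERB\t_\tVerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

stats = token_stats(SAMPLE)
print(stats["sentences"], stats["tokens"], stats["no_space"])
print(stats["used"], len(stats["unused"]))
print(percent(31, 170))  # the no-space share reported for this corpus
```

Applied to the figures on this page, 31 of 170 tokens rounds to 18%, and the 12 used tags plus the 5 unused ones account for all 17 UPOS tags.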
Nominal Features
- Fem
- ADJ: lela, moza, sorkonā
- DET: Itei, sovai, tamā
- NOUN: mašynu, Meitine, bronzu, draudzinei, dzeršonu, dīnā, golvyspiļsātā, jausmys, kruosā, peipiešonu
- PRON: jei, Jai, Jis
- PROPN: Mareja, Braunys, Džeina, Fraņcejis, Marejis, Parizē
- VERB-Part: pīguoduota
- Masc
- ADJ: foršuoks, tovejais
- DET: Muns, kaida, sovam
- NOUN: bruoļs, leits, lūgu, motus, ritini, statini, sudobru, sābri, tēte, veiram
- PRON: jis, Jim, jam, juo, kurs, tuo, tū
- PROPN: Pītera, Pīters, Sem, Smita
- Plur
- NOUN: motus, sābri
- PRON: Jim
- Sing
- ADJ: foršuoks, lela, moza, sorkonā, tovejais
- DET: Itei, Muns, kaida, sovai, sovam, tamā
- NOUN: mašynu, Meitine, bronzu, bruoļs, draudzinei, dzeršonu, dīnā, golvyspiļsātā, jausmys, kruosā
- PRON: jis, tu, jei, Es, Jai, Maņ, jam, juo, kurs, tuo
- PROPN: Pītera, Mareja, Pīters, Braunys, Džeina, Fraņcejis, Marejis, Parizē, Sem, Smita
- VERB-Fin: attaisi, dūmoj, gribi, navarieju, variesi, verīs
- VERB-Part: pīguoduota
- Acc
- NOUN: mašynu, bronzu, dzeršonu, lūgu, motus, peipiešonu, ritini, statini, sudobru, ustobu
- PRON: kuo, tuo, tū
- Dat
- DET: sovai, sovam
- NOUN: draudzinei, veiram
- PRON: Jai, Jim, Maņ, jam
- Gen
- DET: kaida
- NOUN: jausmys, īmesļa
- PRON: juo
- PROPN: Pītera, Braunys, Fraņcejis, Marejis, Smita
- Loc
- ADJ: sorkonā
- DET: tamā
- NOUN: dīnā, golvyspiļsātā, kruosā
- PROPN: Parizē
- Nom
- ADJ: foršuoks, lela, moza, tovejais
- DET: Itei, Muns
- NOUN: Meitine, bruoļs, leits, sābri, tēte, vaļsts, viestule
- PRON: jis, tu, jei, Es, kurs
- PROPN: Mareja, Pīters, Džeina
- VERB-Part: pīguoduota
- Voc
- PROPN: Sem
- Def
- ADJ: tovejais
- Ind
- ADJ: foršuoks, lela, moza, sorkonā
- VERB-Part: pīguoduota
Degree and Polarity
- Degree=Cmp
- ADJ: foršuoks
- Degree=Pos
- ADJ: lela, moza, sorkonā, tovejais
- ADV: mudri
- VERB-Part: pīguoduota
- Polarity=Neg
- VERB-Fin: Navarēja, naizdareja, nav, navarieju
- Polarity=Pos
- VERB-Fin: ir, pīraksteja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
- VERB-Inf: apgrīzt, atmest, atīt, izalaseit, nūmozguot, tikt, īt
- VERB-Part: pīguoduota
Verbal Features
- Perf
- VERB-Part: pīguoduota
- Imp
- VERB-Fin: attaisi
- Ind
- VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
- Fut
- VERB-Fin: variesi
- Past
- VERB-Fin: pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, izauga, lyka, naizdareja, navarieju, nūkruosuoja
- VERB-Part: pīguoduota
- Pres
- VERB-Fin: ir, dūmoj, gribi, leist, nav, ruodīs, verīs
- Act
- VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga
- Pass
- VERB-Part: pīguoduota
- Fh
- VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
Pronouns, Determiners, Quantifiers
- Dem
- DET: Itei, tamā
- PRON: tuo, tū
- Ind
- DET: kaida
- Prs
- DET: Muns, sovai, sovam
- PRON: jis, tu, jei, Es, Jai, Jim, Maņ, jam, juo
- Rel
- PRON: kuo, kurs
- Poss=Yes
- DET: Muns, sovai, sovam
- Reflex=Yes
- VERB-Fin: apsaskuove, ceņtēs, ruodīs, verīs
- 1
- PRON: Es, Maņ
- VERB-Fin: navarieju
- 2
- PRON: tu
- VERB-Fin: attaisi, dūmoj, gribi, variesi, verīs
- 3
- DET: Itei, tamā
- PRON: jis, jei, Jai, Jim, jam, juo, tuo, tū
- VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, izauga, leist, lyka, naizdareja
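The per-feature listings above group word forms by feature value and by part of speech, with verbs further split by `VerbForm` (labels such as VERB-Fin). A rough sketch of that grouping is shown below. It is an illustration under assumptions, not the code behind this page: `forms_by_feature` is a hypothetical name, the sample is an English toy sentence, forms are sorted alphabetically (the page may order them differently), and multi-valued features are not split.

```python
from collections import defaultdict


def forms_by_feature(conllu_text):
    """Map 'Feature=Value' -> POS label -> sorted distinct word forms (hypothetical helper)."""
    table = defaultdict(lambda: defaultdict(set))
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only ordinary word lines: ten columns and a plain integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":
            continue
        pairs = dict(pair.split("=", 1) for pair in feats.split("|"))
        # Verbs are labelled together with their VerbForm, e.g. VERB-Fin.
        label = upos
        if upos == "VERB" and "VerbForm" in pairs:
            label = "VERB-" + pairs["VerbForm"]
        for name, value in pairs.items():
            table[name + "=" + value][label].add(form)
    return {
        key: {label: sorted(forms) for label, forms in by_label.items()}
        for key, by_label in table.items()
    }


# Toy English sentence for illustration only; it is not part of the treebank.
SAMPLE = (
    "# text = cats sleep\n"
    "1\tcats\tcat\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\tMood=Ind|Number=Plur|VerbForm=Fin\t0\troot\t_\t_\n"
    "\n"
)

for key, by_label in sorted(forms_by_feature(SAMPLE).items()):
    print(key, by_label)
```

Keying the table by the full `Feature=Value` pair keeps values that share a name, such as Ind for Definite, Mood and PronType, from being merged.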
Other Features
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN-Gen (1)
- VERB-Fin--NOUN-Nom (5)
- VERB-Fin--PRON-Nom (12)
- obj
- VERB-Fin--NOUN-Acc (6)
- VERB-Fin--PRON-Acc (2)
- VERB-Inf--NOUN-Acc (3)
- iobj
- VERB-Fin--NOUN-Dat (2)
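Patterns such as VERB-Fin--PRON-Nom can be counted by pairing each noun or pronoun with its verbal head and reading `VerbForm` from the parent and `Case` from the child. The sketch below shows one way to do this. It rests on assumptions: the helper names are hypothetical, only the three relations listed above are considered, relation subtypes are not handled, and the sample is an English toy sentence rather than treebank data.

```python
from collections import Counter

# Relations shown in the listing above; restricting to them is an assumption.
CORE = {"nsubj", "obj", "iobj"}


def parse_feats(field):
    """Turn a FEATS column such as 'Case=Nom|Number=Sing' into a dict."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def read_sentences(conllu_text):
    """Yield one {id: columns} dict per sentence."""
    sent = {}
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sent[cols[0]] = cols
        elif not line.strip() and sent:
            yield sent
            sent = {}
    if sent:  # last sentence when the text lacks a trailing blank line
        yield sent


def argument_patterns(conllu_text):
    """Count (relation, 'VERB-Form--POS-Case') pairs for verb parents."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for cols in sent.values():
            head = sent.get(cols[6])  # None for the root (HEAD = 0)
            if cols[7] not in CORE or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            parent = "VERB-" + parse_feats(head[5]).get("VerbForm", "_")
            child = cols[3] + "-" + parse_feats(cols[5]).get("Case", "_")
            counts[(cols[7], parent + "--" + child)] += 1
    return counts


# Toy English sentence for illustration only; it is not part of the treebank.
SAMPLE = (
    "# text = She left.\n"
    "1\tShe\tshe\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_\n"
    "2\tleft\tleave\tVERB\t_\tVerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for (relation, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(relation, pattern, "(%d)" % n)
```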