UD Latgalian Cairo
Language: Latgalian (code: ltg)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Lauma Pretkalniņa, Gunta Nešpore-Bērzkalne.
Repository: UD_Latgalian-Cairo
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Latgalian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [lauma (æt) ailab • lv]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
UD_Latgalian-Cairo is an example treebank that provides a minimal dataset for Latgalian, based on the Cairo sample sentences. It was created by the AI Lab at the Institute of Mathematics and Computer Science, University of Latvia.
This treebank was developed as a proof of concept by the team behind the Latvian UD Treebank (UD_Latvian-LVTB). It contains the 20 Cairo example sentences and is, as far as we know, the only Latgalian treebank in existence.
Acknowledgments
This work was supported by the State Research Programme’s project Research on Modern Latvian Language and Development of Language Technology under the grant agreement No. VPP-LETONIKA-2021/1-0006.
References
- Pretkalniņa L., Rituma L., Saulīte B. Deriving enhanced Universal Dependencies from a hybrid dependency-constituency treebank. Proceedings of the 21st International Conference Text, Speech, and Dialogue, LNCS, Vol. 11107, Springer Link, 2018, pp. 95-105.
Statistics of UD Latgalian Cairo
POS Tags
ADJ – ADP – ADV – CCONJ – DET – NOUN – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Definite – Degree – Evident – Gender – Mood – Number – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – advmod:emph – advmod:neg – amod – appos – case – cc – ccomp – conj – csubj – det – discourse – fixed – flat:name – iobj – mark – nmod – nsubj – obj – obl – orphan – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 20 sentences and 170 tokens.
- This corpus contains 31 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
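Counts of this kind can be derived from a CoNLL-U file. The following is a minimal sketch, assuming the standard CoNLL-U layout (ten tab-separated columns, with `SpaceAfter=No` stored in the MISC column). The helper name `count_tokens` is hypothetical, and the two-sentence sample is invented for illustration; it is not taken from the treebank.

```python
def count_tokens(conllu_text):
    """Return (sentences, tokens, tokens not followed by a space)."""
    sentences = tokens = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:            # a blank line ends the current sentence
            in_sentence = False
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        # Count only plain integer IDs; skip multiword-token ranges (1-2)
        # and empty nodes (1.1). Assumes well-formed ten-column lines.
        if not cols[0].isdigit():
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return sentences, tokens, no_space


def row(*cols):
    """Join column values with tabs to form one CoNLL-U token line."""
    return "\t".join(cols)


# Invented toy sample (NOT from UD_Latgalian-Cairo).
SAMPLE = "\n".join([
    "# text = She came.",
    row("1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "came", "come", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    "",
    "# text = Go!",
    row("1", "Go", "go", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("2", "!", "!", "PUNCT", "_", "_", "1", "punct", "_", "_"),
    "",
])

print(count_tokens(SAMPLE))
```

Running the same function over the treebank's own `.conllu` file should yield the sentence and token counts reported above, provided the file contains no multiword tokens.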
Morphology
Tags
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, CCONJ, DET, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: NUM, AUX, INTJ, SYM, X
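The list of unused tags follows from a set difference between the 17 universal POS tags and the 12 tags listed above. A small sketch (the variable names are illustrative only):

```python
# The 17 universal POS tags defined by UD.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags this page reports as used in the corpus.
used = {
    "ADJ", "ADP", "ADV", "CCONJ", "DET", "NOUN", "PART", "PRON",
    "PROPN", "PUNCT", "SCONJ", "VERB",
}

unused = sorted(UD_UPOS - used)
print(len(used), "of", len(UD_UPOS), "tags used")
print("unused:", unused)
```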
- This corpus contains 4 word types tagged as particles (PART): Voi, koč, ni, tik
- This corpus contains 7 lemmas tagged as pronouns (PRON): es, jei, jis, jī, kas, tu, tys
- This corpus contains 7 lemmas tagged as determiners (DET): itei, kaids, kurs, muns, sova, sovs, tei
- This corpus contains 0 lemmas tagged as auxiliaries (AUX).
- There are 3 (de)verbal forms:
  - Fin
    - VERB: ir, pīraksteja, Navarēja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga
  - Inf
    - VERB: apgrīzt, atmest, atīt, izalaseit, nūmozguot, tikt, īt
  - Part
    - VERB: pīguoduota
Nominal Features
- Gender
  - Fem
    - ADJ: lela, moza, sorkonā
    - DET: Itei, sovai, tamā
    - NOUN: mašynu, Meitine, bronzu, draudzinei, dzeršonu, dīnā, golvyspiļsātā, jausmys, kruosā, peipiešonu
    - PRON: jei, Jai, Jis
    - PROPN: Mareja, Braunys, Džeina, Fraņcejis, Marejis, Parizē
    - VERB-Part: pīguoduota
  - Masc
    - ADJ: foršuoks, tovejais
    - DET: Muns, kaida, kurs, sovam
    - NOUN: bruoļs, leits, lūgu, motus, ritini, statini, sudobru, sābri, tēte, veiram
    - PRON: jis, Jim, jam, juo, tuo, tū
    - PROPN: Pītera, Pīters, Sem, Smita
- Number
  - Plur
    - NOUN: motus, sābri
    - PRON: Jim
  - Sing
    - ADJ: foršuoks, lela, moza, sorkonā, tovejais
    - DET: Itei, Muns, kaida, kurs, sovai, sovam, tamā
    - NOUN: mašynu, Meitine, bronzu, bruoļs, draudzinei, dzeršonu, dīnā, golvyspiļsātā, jausmys, kruosā
    - PRON: jis, tu, jei, Es, Jai, Maņ, jam, juo, tuo, tū
    - PROPN: Pītera, Mareja, Pīters, Braunys, Džeina, Fraņcejis, Marejis, Parizē, Sem, Smita
    - VERB-Fin: attaisi, dūmoj, gribi, navarieju, variesi, verīs
    - VERB-Part: pīguoduota
- Case
  - Acc
    - NOUN: mašynu, bronzu, dzeršonu, lūgu, motus, peipiešonu, ritini, statini, sudobru, ustobu
    - PRON: kuo, tuo, tū
  - Dat
    - DET: sovai, sovam
    - NOUN: draudzinei, veiram
    - PRON: Jai, Jim, Maņ, jam
  - Gen
    - DET: kaida
    - NOUN: jausmys, īmesļa
    - PRON: juo
    - PROPN: Pītera, Braunys, Fraņcejis, Marejis, Smita
  - Loc
    - ADJ: sorkonā
    - DET: tamā
    - NOUN: dīnā, golvyspiļsātā, kruosā
    - PROPN: Parizē
  - Nom
    - ADJ: foršuoks, lela, moza, tovejais
    - DET: Itei, Muns, kurs
    - NOUN: Meitine, bruoļs, leits, sābri, tēte, vaļsts, viestule
    - PRON: jis, tu, jei, Es
    - PROPN: Mareja, Pīters, Džeina
    - VERB-Part: pīguoduota
  - Voc
    - PROPN: Sem
- Definite
  - Def
    - ADJ: tovejais
  - Ind
    - ADJ: foršuoks, lela, moza, sorkonā
    - VERB-Part: pīguoduota
Degree and Polarity
- Degree
  - Cmp
    - ADJ: foršuoks
  - Pos
    - ADJ: lela, moza, sorkonā, tovejais
    - ADV: mudri
    - VERB-Part: pīguoduota
- Polarity
  - Neg
    - VERB-Fin: Navarēja, naizdareja, nav, navarieju
  - Pos
    - VERB-Fin: ir, pīraksteja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
    - VERB-Inf: apgrīzt, atmest, atīt, izalaseit, nūmozguot, tikt, īt
    - VERB-Part: pīguoduota
Verbal Features
- Aspect
  - Perf
    - VERB-Part: pīguoduota
- Mood
  - Imp
    - VERB-Fin: attaisi
  - Ind
    - VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
- Tense
  - Fut
    - VERB-Fin: variesi
  - Past
    - VERB-Fin: pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, izauga, lyka, naizdareja, navarieju, nūkruosuoja
    - VERB-Part: pīguoduota
  - Pres
    - VERB-Fin: ir, dūmoj, gribi, leist, nav, ruodīs, verīs
- Voice
  - Act
    - VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, attaisi, ceņtēs, dabuoja, dūmoj, gribi, izauga
  - Pass
    - VERB-Part: pīguoduota
- Evident
  - Fh
    - VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, dūmoj, gribi, izauga, leist
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - DET: Itei, tamā
    - PRON: tuo, tū
  - Ind
    - DET: kaida
  - Prs
    - DET: Muns, sovai, sovam
    - PRON: jis, tu, jei, Es, Jai, Jim, Maņ, jam, juo
  - Rel
    - DET: kurs
    - PRON: kuo
- Poss
  - Yes
    - DET: Muns, sovai, sovam
- Reflex
  - Yes
    - VERB-Fin: apsaskuove, ceņtēs, ruodīs, verīs
- Person
  - 1
    - PRON: Es, Maņ
    - VERB-Fin: navarieju
  - 2
    - PRON: tu
    - VERB-Fin: attaisi, dūmoj, gribi, variesi, verīs
  - 3
    - DET: Itei, tamā
    - PRON: jis, jei, Jai, Jim, jam, juo, tuo, tū
    - VERB-Fin: ir, pīraksteja, Navarēja, apsaskuove, ceņtēs, dabuoja, izauga, leist, lyka, naizdareja
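The feature lists in this Morphology section group word forms by feature value and part of speech. A minimal sketch of that grouping, assuming CoNLL-U style FEATS strings (`Name=Value` pairs joined by `|`). The helper names are hypothetical. The three rows use forms from this page, but their FEATS strings are assembled only from the values listed here, so the actual treebank annotation may carry additional features.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_value(rows, feature):
    """Group word forms by feature value, then by UPOS."""
    groups = defaultdict(lambda: defaultdict(set))
    for form, upos, feats in rows:
        value = parse_feats(feats).get(feature)
        if value is not None:
            groups[value][upos].add(form)
    # Convert to plain dicts with sorted lists for stable output.
    return {
        value: {upos: sorted(forms) for upos, forms in by_upos.items()}
        for value, by_upos in groups.items()
    }


# (form, UPOS, FEATS) rows; FEATS reconstructed from the lists on this page.
ROWS = [
    ("foršuoks", "ADJ", "Case=Nom|Definite=Ind|Degree=Cmp|Gender=Masc|Number=Sing"),
    ("tovejais", "ADJ", "Case=Nom|Definite=Def|Degree=Pos|Gender=Masc|Number=Sing"),
    ("sorkonā", "ADJ", "Case=Loc|Definite=Ind|Degree=Pos|Gender=Fem|Number=Sing"),
]

print(forms_by_value(ROWS, "Degree"))
print(forms_by_value(ROWS, "Case"))
```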
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN-Gen (1)
- VERB-Fin--NOUN-Nom (5)
- VERB-Fin--PRON-Nom (11)
- obj
- VERB-Fin--NOUN-Acc (6)
- VERB-Fin--PRON-Acc (2)
- VERB-Inf--NOUN-Acc (3)
- iobj
- VERB-Fin--NOUN-Dat (2)
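Patterns such as `VERB-Fin--NOUN-Nom` pair the parent's `VerbForm` with the child's `Case`. The sketch below shows one way to count them, assuming each token is a dict with `id`, `upos`, `feats`, `head` and `deprel` keys. That representation, the helper names, and the three-token sentence are invented for illustration and are not taken from the treebank.

```python
from collections import Counter


def feats_dict(feats):
    """Parse a CoNLL-U FEATS string; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def argument_patterns(sentence):
    """Count (deprel, pattern) pairs for VERB parents with NOUN/PRON children."""
    by_id = {tok["id"]: tok for tok in sentence}
    counts = Counter()
    for tok in sentence:
        parent = by_id.get(tok["head"])  # head 0 (root) has no parent token
        if parent is None or parent["upos"] != "VERB":
            continue
        if tok["upos"] not in {"NOUN", "PRON"}:
            continue
        verb_form = feats_dict(parent["feats"]).get("VerbForm", "?")
        case = feats_dict(tok["feats"]).get("Case", "?")
        pattern = f"VERB-{verb_form}--{tok['upos']}-{case}"
        counts[(tok["deprel"], pattern)] += 1
    return counts


# Invented toy sentence: pronoun subject, finite verb, noun object, punctuation.
SENTENCE = [
    {"id": 1, "upos": "PRON", "feats": "Case=Nom", "head": 2, "deprel": "nsubj"},
    {"id": 2, "upos": "VERB", "feats": "VerbForm=Fin", "head": 0, "deprel": "root"},
    {"id": 3, "upos": "NOUN", "feats": "Case=Acc", "head": 2, "deprel": "obj"},
    {"id": 4, "upos": "PUNCT", "feats": "_", "head": 2, "deprel": "punct"},
]

counts = argument_patterns(SENTENCE)
for (deprel, pattern), n in sorted(counts.items()):
    print(f"{deprel}: {pattern} ({n})")
```

The function counts every relation between a verb and a noun or pronoun, so a caller interested only in core arguments would filter the result for `nsubj`, `obj` and `iobj`.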
Relations Overview
- This corpus uses 3 relation subtypes: advmod:emph, advmod:neg, flat:name
- The following main type is never used alone; it is always subtyped: flat
- The following 12 relation types are not used in this corpus at all: expl, dislocated, aux, cop, nummod, clf, compound, list, parataxis, goeswith, reparandum, dep
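The three statements above can be checked against the relation list given under Relations near the top of this page. A short sketch, using the 37 universal relations of UD v2 as the reference inventory (the variable names are illustrative):

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relations listed for this corpus on this page.
corpus_relations = [
    "acl", "advcl", "advmod", "advmod:emph", "advmod:neg", "amod", "appos",
    "case", "cc", "ccomp", "conj", "csubj", "det", "discourse", "fixed",
    "flat:name", "iobj", "mark", "nmod", "nsubj", "obj", "obl", "orphan",
    "punct", "root", "vocative", "xcomp",
]

subtypes = sorted(r for r in corpus_relations if ":" in r)
used_alone = {r for r in corpus_relations if ":" not in r}
subtyped_main = {r.split(":")[0] for r in subtypes}

always_subtyped = sorted(subtyped_main - used_alone)
unused = sorted(UD_RELATIONS - used_alone - subtyped_main)

print(len(subtypes), "subtypes:", subtypes)
print("always subtyped:", always_subtyped)
print(len(unused), "unused:", unused)
```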