
This page pertains to UD version 2.

UD for Latgalian

Note that the UD guidelines for annotating Latgalian are at a very early stage, as little text has been annotated so far.

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters, and punctuation is split off as separate tokens. Exceptions and related conventions are described below.

Paragraph boundaries from the original text are indicated by the comment line # newpar when the paragraph boundary coincides with a sentence boundary, and by the MISC value NewPar=Yes on the token that follows a mid-sentence paragraph break. The MISC value SpaceAfter=No marks tokens that are not followed by whitespace.
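The effect of these MISC values can be illustrated with a minimal Python sketch that rebuilds running text from one CoNLL-U sentence. The function names (parse_misc, rebuild_text, row) and the toy forms "Aa", "bb", "cc" are assumptions made for illustration only; they are not part of any treebank or tool. The sketch also assumes the standard ten tab-separated CoNLL-U columns and skips multiword-token ranges and empty nodes.

```python
from typing import Dict


def parse_misc(misc: str) -> Dict[str, str]:
    """Parse a MISC column such as 'NewPar=Yes|SpaceAfter=No' into a dict."""
    if not misc or misc == "_":
        return {}
    return dict(item.split("=", 1) for item in misc.split("|") if "=" in item)


def rebuild_text(conllu: str) -> str:
    """Rebuild running text from one CoNLL-U sentence block."""
    pieces = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # comment lines such as '# newpar' carry no tokens
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, ranges (1-2) and empty nodes (1.1)
        form, misc = cols[1], parse_misc(cols[9])
        if misc.get("NewPar") == "Yes" and pieces:
            # Mid-sentence paragraph break: replace the pending space by a newline.
            pieces[-1] = pieces[-1].rstrip(" ")
            pieces.append("\n")
        pieces.append(form)
        if misc.get("SpaceAfter") != "No":
            pieces.append(" ")
    return "".join(pieces).strip()


def row(idx: int, form: str, misc: str = "_") -> str:
    """Build a toy CoNLL-U line with only ID, FORM and MISC filled in."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", "_", "_", "_", misc])


# Toy tokens, not real Latgalian text.
sample = "\n".join([
    "# newpar",
    row(1, "Aa"),
    row(2, "bb", "SpaceAfter=No"),
    row(3, ","),
    row(4, "cc", "NewPar=Yes|SpaceAfter=No"),
    row(5, "."),
])

print(rebuild_text(sample))
```

In this toy sentence the comma and the final period attach to the preceding token because of SpaceAfter=No, and the token marked NewPar=Yes starts a new line.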

Morphology

Tags

Latgalian uses all 17 universal POS categories.

Particles

The PART tag is used for the following function words: ar, ari, , ba, da, dīvamžāļ, dīz, gon, ik, it, kab, kazyn, konče, koč, kod, kuo, lai, laikam, mošeit, mož, na, nabejs, naviņ, naz, nazyn, nui, , pat, prūtams, rikti, ta, tak, tik, tikai, to, tok, tože, varbyut, viņ, vys, vīneigi. This list may be expanded in the future.

Pronouns and Determiners

Reliably distinguishing the PRON and DET categories in Latgalian (as in Latvian) is very hard, and no clear guidelines have been developed yet. Following the example of Latvian, the distinction is made by lemma.

Currently DET are: itei, itys, kaida, kaids, kura, kurs, muna, muns, sova, sovs, tei, tis, toveja, tovejs.

PRON are: es, jei, jis, , kas, tu.

These lists will be expanded in the future.
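Because the distinction is made by lemma, it amounts to a lookup in two closed lists. The following Python sketch shows one way to express that; the names DET_LEMMAS, PRON_LEMMAS and pron_or_det are hypothetical, and the sets contain only the lemmas listed above (the entry that is blank in the PRON list is left out).

```python
from typing import Optional

# Lemma lists copied from this page; they are expected to grow.
DET_LEMMAS = frozenset({
    "itei", "itys", "kaida", "kaids", "kura", "kurs", "muna",
    "muns", "sova", "sovs", "tei", "tis", "toveja", "tovejs",
})
PRON_LEMMAS = frozenset({"es", "jei", "jis", "kas", "tu"})


def pron_or_det(lemma: str) -> Optional[str]:
    """Return 'DET' or 'PRON' for a listed lemma, or None if it is not listed."""
    key = lemma.lower()
    if key in DET_LEMMAS:
        return "DET"
    if key in PRON_LEMMAS:
        return "PRON"
    return None  # not covered by the current lists


for lemma in ("kurs", "jis", "skrīšona"):
    print(lemma, pron_or_det(lemma))
```

Returning None for unlisted lemmas, rather than guessing, keeps the sketch in line with the statement that the lists are still incomplete.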

Auxiliary Verbs

Latgalian has one auxiliary verb AUX: byut “to be”. The auxiliary verb is used in several types of constructions:

Byut may still occur as a regular VERB when it is used in purely existential sentences or indicates location.

Verbs with modal meaning are not considered auxiliary in Latgalian.

Deverbal Nouns, Participles, Converbs

Latgalian features a rich set of deverbal derivations, and not all of them have been analyzed for alignment with the UD guidelines yet. However, deverbal nouns with the endings -šona, -šonuos (skrīšona “running”) are tagged as NOUN. Most converbs with the endings -ūt, -ūts, -ūte, -ūtīs, -om, -omīs, -dams, -dama, -damīs, -damuos are tagged as VERB or AUX. Most adjectival participles (radzams, aizguojs, nagaideits, valkūšs) are tagged as VERB. The exceptions are lexicalized uses with a separate meaning, such as prūtams “of course” and acimradzūt “obvious”, which are tagged as PART, and īspiejams “possible”, which is tagged as ADJ.
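The ending-based rules above can be sketched as a small helper that proposes a tag for an annotator to review. This is only an illustration under stated assumptions: the name suggest_upos and the label "VERB_OR_AUX" are hypothetical, plain suffix matching will over-generate (the text says "most", and short endings such as -om also occur in unrelated words), and adjectival participles are not covered because the text gives examples rather than endings for them.

```python
import unicodedata
from typing import Optional


def nfc(text: str) -> str:
    """Normalize to NFC so that letters with diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)


# Lexicalized exceptions named on this page; they are checked first.
LEXICALIZED = {nfc(k): v for k, v in {
    "prūtams": "PART",
    "acimradzūt": "PART",
    "īspiejams": "ADJ",
}.items()}

NOUN_ENDINGS = tuple(nfc(e) for e in ("šona", "šonuos"))
CONVERB_ENDINGS = tuple(nfc(e) for e in (
    "ūt", "ūts", "ūte", "ūtīs", "om", "omīs",
    "dams", "dama", "damīs", "damuos",
))


def suggest_upos(form: str) -> Optional[str]:
    """Suggest a tag from the ending alone; None means 'no rule applies'."""
    word = nfc(form).lower()
    if word in LEXICALIZED:
        return LEXICALIZED[word]
    if word.endswith(NOUN_ENDINGS):
        return "NOUN"
    if word.endswith(CONVERB_ENDINGS):
        return "VERB_OR_AUX"  # the choice between VERB and AUX needs context
    return None


for form in ("skrīšona", "acimradzūt", "īspiejams", "radzams"):
    print(form, suggest_upos(form))
```

Checking the lexicalized list before the endings matters: acimradzūt ends in -ūt and would otherwise be proposed as a converb.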

Features

Nominal Features

Verbal Features

Pronouns, Determiners, Quantifiers

Unused Features

Features not applicable for Latgalian:

Syntax

Core Arguments

Non-verbal Clauses

The copula verb byut “be” is used in equational and attributional nonverbal clauses. Purely existential clauses (including those indicating location) use byut as well, but there it is treated as the head of the clause and tagged VERB.
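The split between copular and existential byut suggests a simple consistency check on annotated tokens. The Python sketch below is an assumption-laden illustration, not a validator used by the treebank: the function name check_byut is hypothetical, and it assumes that AUX byut normally attaches with the standard UD relations aux or cop (possibly with a subtype), which may not hold in cases such as ellipsis.

```python
from typing import Optional

FUNCTION_RELATIONS = frozenset({"aux", "cop"})


def check_byut(upos: str, deprel: str) -> Optional[str]:
    """Return a warning for one token with lemma 'byut', or None if consistent."""
    base = deprel.split(":", 1)[0]  # drop a subtype such as ':pass'
    if upos == "AUX" and base not in FUNCTION_RELATIONS:
        return "AUX byut usually attaches as aux or cop; please review"
    if upos == "VERB" and base in FUNCTION_RELATIONS:
        return "VERB byut should head its clause, not attach as aux or cop"
    return None


print(check_byut("AUX", "cop"))    # consistent, so no warning
print(check_byut("VERB", "root"))  # existential use heading the clause
print(check_byut("VERB", "cop"))   # inconsistent combination
```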

Relations Overview

The following relation subtypes are used in Latgalian:

The following relation types are not used for Latgalian: clf, dislocated, list, reparandum. However, reparandum should be introduced in the future, once appropriate speech texts are annotated.
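A list of unused relations lends itself to an automatic check. As a minimal sketch, assuming standard ten-column CoNLL-U input, the following Python function reports tokens whose DEPREL uses one of the four relations named above. The names UNUSED_RELATIONS and find_unused_relations are hypothetical, and the toy sentence is invented for the example.

```python
from typing import List, Tuple

UNUSED_RELATIONS = frozenset({"clf", "dislocated", "list", "reparandum"})


def find_unused_relations(conllu: str) -> List[Tuple[int, str, str]]:
    """Return (line number, token ID, DEPREL) for each unexpected relation."""
    hits = []
    for lineno, line in enumerate(conllu.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a token line in the expected format
        base = cols[7].split(":", 1)[0]  # compare the relation without subtype
        if base in UNUSED_RELATIONS:
            hits.append((lineno, cols[0], cols[7]))
    return hits


# Toy tokens, not real Latgalian text.
toy = "\n".join([
    "# sent_id = toy",
    "\t".join(["1", "Aa", "_", "_", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bb", "_", "_", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "cc", "_", "_", "_", "_", "2", "list", "_", "_"]),
])

print(find_unused_relations(toy))
```

If reparandum is later adopted for speech data, removing it from the set is the only change the sketch would need.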

Treebanks

There is 1 Latgalian UD treebank: