This page pertains to UD version 2.

UD for Latvian

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters, and punctuation is split off into separate tokens. A description of the exceptions follows:

Paragraph borders from the original text are indicated by the comment line # newpar when a paragraph border coincides with a sentence border, and by the MISC value NewPar=Yes on the token that follows a mid-sentence paragraph break. The MISC value SpaceAfter=No marks tokens that are not followed by any whitespace.
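The following minimal Python sketch illustrates how these three markers could be used to rebuild the original text from CoNLL-U lines. The function name `detokenize` and the sample sentences are made up for illustration, and the sketch simply skips multiword-token ranges and empty nodes.

```python
def detokenize(conllu: str) -> str:
    """Rebuild surface text from CoNLL-U, using '# newpar', NewPar=Yes
    and SpaceAfter=No. Illustrative sketch, not an official UD tool."""
    pieces = []   # text fragments emitted so far
    sep = ""      # separator to place before the next token
    for raw in conllu.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Paragraph border that coincides with a sentence border.
            if line == "# newpar" and pieces:
                sep = "\n\n"
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # simplification: skip ranges (1-2) and empty nodes (1.1)
        form, misc = cols[1], cols[9]
        feats = set() if misc == "_" else set(misc.split("|"))
        # Paragraph break in the middle of a sentence.
        if "NewPar=Yes" in feats and pieces:
            sep = "\n\n"
        pieces.append(sep + form)
        sep = "" if "SpaceAfter=No" in feats else " "
    return "".join(pieces)


# Invented sample: two sentences, the second with a mid-sentence paragraph break.
SAMPLE = "\n".join([
    "# newpar",
    "1\tLabi\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No",
    "2\t.\t_\t_\t_\t_\t_\t_\t_\t_",
    "",
    "1\tViņš\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tteica\t_\t_\t_\t_\t_\t_\t_\tSpaceAfter=No",
    "3\t:\t_\t_\t_\t_\t_\t_\t_\t_",
    "4\tjā\t_\t_\t_\t_\t_\t_\t_\tNewPar=Yes|SpaceAfter=No",
    "5\t.\t_\t_\t_\t_\t_\t_\t_\t_",
])

print(detokenize(SAMPLE))
```

For the sample, the two sentences end up in separate paragraphs only where NewPar=Yes occurs, and no space is inserted before the punctuation marks.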

Morphology

Tags

Latvian uses all 17 universal POS categories.

Particles

The PART tag is used for the following function words: acīmredzot, ak, ar, arī, arīdzan, da, diemžēl, diez, diezin, droši, gan, i, ij, ik, ir, it, itin, ja, jau, , jel, jo, kaut, , lai, laikam, mjā, ne, nea, nebūt, nez, nezin, , nu, nudien, nujā, , nūja, nūjā, pat, patiesi, patiešām, protams, proti, taču, tad, tak, , tāpat, tātad, tiešām, tik, tikai, tikpat, tipa, tomēr, turklāt, vai, varbūt, vēl, vien, vienīgi, vis.

Particles can be homonymous with other parts of speech, most notably conjunctions (CCONJ and SCONJ), interjections (INTJ), and adjectives (ADJ). The correct POS is assigned based on sentence context.

Pronouns and Determiners

Effectively distinguishing the PRON and DET categories in Latvian is very hard, because words used as DET can also be used as PRON; accordingly, traditional Latvian grammar does not define determiners as a distinct POS. Since version 2.15, the pronoun (PRON) vs. determiner (DET) distinction is made by lemma (similar to what is done in PDT). In earlier versions, the distinction was made based on tree structure.

Currently DET are: abas, abi, cikais, cikas, ciki, cita, cits, daudzi, daža, dažs, ikkatra, ikkatrs, ikkura, ikkurš, ikviena, ikviens, jebkāda, jebkāds, jebkura, jebkurš, jelkāda, jelkāds, jūsējs, kāda, kādā, kādais, kāds, katra, katrs, kura, kurā, kurais, kurs, kurš, manējs, mana, mans, mūsējs, nekāda, nekādā, nekādais, nekāds, neviena, neviens, pate, pati, pats, savējs, sava, savs, šāda, šāds, šī, šis, šitāda, šitāds, šitaids, šitejāda, šitejāds, šitā, šitais, šitas, šitentāda, šitentāds, šitentas, štā, štas, štis, tāda, tāds, , tas, taste, tāte, tavējs, tava, tavs, vairāki, vēlviena, vēlviens, vienotra, vienotrs, viņējs, viņā, viņais, visa, viss.

PRON are: daudzkas, es, jebkas, jelkas, jis, jūs, kas, mēs, nekas, nezinkas, sevis, tu, viņa, viņš, viš.
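As a rough illustration of the lemma-based distinction, the sketch below looks up a lemma in two sets. The function name `pronominal_upos` is hypothetical, `PRON_LEMMAS` repeats the PRON list above, and `DET_LEMMAS` contains only a small subset of the DET list to keep the example short.

```python
# Subset of the DET lemma list given above (not exhaustive).
DET_LEMMAS = {
    "abi", "cits", "dažs", "katrs", "kurš", "mans",
    "pats", "savs", "šis", "tas", "tāds", "viss",
}

# The PRON lemma list given above.
PRON_LEMMAS = {
    "daudzkas", "es", "jebkas", "jelkas", "jis", "jūs", "kas", "mēs",
    "nekas", "nezinkas", "sevis", "tu", "viņa", "viņš", "viš",
}


def pronominal_upos(lemma):
    """Return 'DET' or 'PRON' for a listed lemma, or None if it is in neither set."""
    key = lemma.lower()
    if key in DET_LEMMAS:
        return "DET"
    if key in PRON_LEMMAS:
        return "PRON"
    return None


for lemma in ("šis", "kas", "galds"):
    print(lemma, pronominal_upos(lemma))
```

Because the decision depends only on the lemma, the same word receives the same tag regardless of its position in the tree.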

The syntactic role det is used for Latvian pronouns that modify a noun in the sentence and agree with that noun in gender, number and case. The pronominal quantifiers daudzi “many” and vairāki “several”, and the personal possessives manējais, tavējais, mūsējais, jūsējais, viņējais are DET; in Latvian grammar, however, they are described as adjectives.

Auxiliary Verbs

Latvian has three auxiliary verbs (AUX): būt “to be”, tikt “to get”, and tapt “to become” (obsolete). The auxiliary verb is used in several types of constructions:

* Analytic word forms of verbs (būt, tikt).
* The copula in non-verbal predicates (būt).
* The copula in infinitive predicates (būt).

Būt, tikt and tapt may still occur as normal VERB if they are used in purely existential sentences or indicate location. Verbs with modal meaning are not considered auxiliary in Latvian.
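One consequence of the closed auxiliary list is that it can be checked mechanically. The sketch below flags tokens tagged AUX whose lemma is not one of the three auxiliaries. The function name `unexpected_aux` and the `(form, lemma, upos)` tuple layout are assumptions made for this example.

```python
AUX_LEMMAS = {"būt", "tikt", "tapt"}


def unexpected_aux(tokens):
    """Return the (form, lemma, upos) tuples tagged AUX with a lemma outside AUX_LEMMAS."""
    return [tok for tok in tokens if tok[2] == "AUX" and tok[1] not in AUX_LEMMAS]


# Invented sample: the modal verb varēt "can" is wrongly tagged AUX.
sample = [
    ("ir", "būt", "AUX"),
    ("tika", "tikt", "AUX"),
    ("var", "varēt", "AUX"),
    ("ir", "būt", "VERB"),
]

print(unexpected_aux(sample))
```

The check deliberately ignores būt, tikt and tapt tagged as VERB, since those uses are legitimate in existential and locative sentences.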

Deverbal Nouns, Participles, Converbs

Deverbal nouns with the endings -šana, -šanās (skriešana “running”) are tagged as NOUN. Most converbs with the endings -ot, -oties, -am, -ām, -amies, -āmies, -dams, -dama, -damies, -damās are tagged as VERB or AUX. Most adjectival participles (redzams, aizgājis, negaidīts, velkošs) are tagged as VERB. Exceptions are lexicalized uses with a separate meaning, such as protams “of course” and acīmredzot “obviously”, which are tagged as PART, and iespējams “possible”, which is tagged as ADJ.

Features

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Unused Features

Features not applicable for Latvian:

Syntax

Core Arguments

Non-verbal Clauses

The copula verb būt “be” is used in equational and attributional nonverbal clauses. Purely existential clauses (also indicating location) use būt as well, but it is treated as the head of the clause and tagged VERB.

Relations Overview

The following relation subtypes are used in Latvian:

The following relation types are not used for Latvian: clf, dislocated, list, reparandum. However, reparandum should be introduced in the future, once appropriate speech texts are annotated.
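A simple consistency check for the unused relation types could look like the sketch below. The function name `unused_relations_found` is hypothetical; subtypes are reduced to their base relation by splitting at the colon.

```python
UNUSED_RELATIONS = {"clf", "dislocated", "list", "reparandum"}


def unused_relations_found(deprels):
    """Return the sorted base relations from UNUSED_RELATIONS that occur in deprels."""
    found = set()
    for deprel in deprels:
        base = deprel.split(":", 1)[0]  # "acl:relcl" -> "acl"
        if base in UNUSED_RELATIONS:
            found.add(base)
    return sorted(found)


print(unused_relations_found(["nsubj", "obl", "list", "acl:relcl"]))
```

If reparandum is introduced later for speech data, it would only need to be removed from the set.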

Annotating Textual Errors

The following MISC values can be used to annotate errors in the source text that interfere with treebank annotation:

Treebanks

There are 2 Latvian UD treebanks: