
This page pertains to UD version 2.

UD for Latvian

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters, and punctuation is separated. A description of the exceptions follows:

Morphology

Tags

Latvian uses all 17 universal POS categories.

Particles

The PART tag is used for the following function words: acīmredzot, ak, ar, arī, arīdzan, da, diemžēl, diez, diezin, droši, gan, i, ij, ik, ir, it, itin, ja, jau, jel, jo, kaut, lai, laikam, mjā, ne, nea, nebūt, nez, nezin, nu, nudien, nujā, nūja, nūjā, pat, patiesi, patiešām, protams, proti, taču, tad, tak, tāpat, tātad, tiešām, tik, tikai, tikpat, tipa, tomēr, turklāt, vai, varbūt, vēl, vien, vienīgi, vis.

Particles can be homonymous with other parts of speech, most notably conjunctions (CCONJ and SCONJ), interjections (INTJ), and adjectives (ADJ). The correct POS is assigned based on sentence context.

Pronouns and Determiners

Reliably distinguishing the PRON and DET categories in Latvian is very hard, because words used as DET can also be used as PRON; accordingly, traditional Latvian grammar does not define determiners as a distinct POS. The pronoun (PRON) vs. determiner (DET) distinction is therefore based on the role of the word in the UD tree: if its relation in the current sentence is det, the word is tagged DET. In turn, the det relation is used for words of the Latvian pronoun category that modify a noun in the sentence and agree with that noun in gender, number, and case. If these words are used independently in a given sentence, they are tagged PRON. The pronominal quantifiers daudzi “many” and vairāki “several” and the personal possessives manējais, tavējais, mūsējais, jūsējais, viņējais are also tagged DET if they modify a noun in the sentence, although Latvian grammar describes them as adjectives.
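The tagging rule can be sketched as a small function. This is a minimal illustration, not the treebank's actual conversion code: the function name `pron_or_det` is hypothetical, and treating any subtyped relation of the form `det:...` as det is an assumption.

```python
def pron_or_det(deprel: str) -> str:
    """Return the UPOS tag for a pronoun-class word given its dependency relation.

    Assumption: a subtyped relation ("det:<subtype>") counts as det.
    """
    base = deprel.split(":", 1)[0]  # strip an optional relation subtype
    return "DET" if base == "det" else "PRON"


# A pronoun-class word that modifies a noun (relation det) is tagged DET;
# the same word used independently, e.g. as a subject, is tagged PRON.
print(pron_or_det("det"))    # DET
print(pron_or_det("nsubj"))  # PRON
```

The tag is derived from the relation alone, which mirrors the statement that the distinction depends on the word's role in the tree rather than on its form.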

Auxiliary Verbs

Features

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Unused Features

Features not applicable for Latvian:

Syntax

Core Arguments

Non-verbal Clauses

The copula verbs būt “be” and kļūt “become” are used in equational and attributional nonverbal clauses. Purely existential clauses (also indicating location) use būt as well, but there it is treated as the head of the clause and tagged VERB.
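A consistency check for this distinction might look like the sketch below. It assumes the general UD v2 convention that a copula is attached with the cop relation and tagged AUX, which this page does not state explicitly; the function name `check_copula_token` is hypothetical.

```python
COPULA_LEMMAS = frozenset({"būt", "kļūt"})


def check_copula_token(lemma, upos, deprel):
    """Return a warning string for a suspicious copula token, or None if it looks fine.

    Assumption: copulas carry the relation cop and the tag AUX (UD v2 convention).
    """
    if deprel != "cop":
        # Not attached as a copula, e.g. existential būt heading the clause as VERB.
        return None
    if lemma not in COPULA_LEMMAS:
        return f"unexpected copula lemma: {lemma}"
    if upos != "AUX":
        return f"copula {lemma} should be AUX, found {upos}"
    return None
```

Existential būt passes the check because it heads its clause and therefore never carries the cop relation.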

Relations Overview

The following relation subtypes are used in Latvian:

The following relation types are not used for Latvian: clf, dislocated, list, reparandum. However, reparandum should be introduced in the future, once appropriate speech texts are annotated.
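One way to check a file against this list is to scan the DEPREL column (the eighth of the ten tab-separated CoNLL-U columns). The sketch below is an illustration only: the function name `find_unused_relations` is hypothetical, and the token forms in the sample are placeholders, not real Latvian data.

```python
from typing import List, Tuple

UNUSED_RELATIONS = frozenset({"clf", "dislocated", "list", "reparandum"})


def find_unused_relations(conllu_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, deprel) pairs for tokens whose relation is on the unused list."""
    hits = []
    for line_no, line in enumerate(conllu_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        columns = line.split("\t")
        if len(columns) != 10:
            continue  # not a well-formed token line
        deprel = columns[7]
        base = deprel.split(":", 1)[0]  # compare without the subtype
        if base in UNUSED_RELATIONS:
            hits.append((line_no, deprel))
    return hits


# Placeholder sample: two tokens, the second attached with the unused relation "list".
sample = (
    "# text = placeholder\n"
    "1\ta\ta\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tb\tb\tNOUN\t_\t_\t1\tlist\t_\t_\n"
)
print(find_unused_relations(sample))  # [(3, 'list')]
```

Once speech texts are annotated and reparandum is introduced, removing it from `UNUSED_RELATIONS` would be the only change needed.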

Treebanks

There is 1 Latvian UD treebank: