
This page pertains to UD version 2.

UD for Latvian

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters and punctuation is separated. A description of the exceptions follows:

Paragraph borders from the original text are indicated by the comment line # newpar when the paragraph border coincides with a sentence border, and by the MISC value NewPar=Yes on the token that follows a mid-sentence paragraph break. The MISC value SpaceAfter=No marks tokens that are not followed by any whitespace.
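The effect of these two MISC values can be illustrated with a minimal Python sketch that rebuilds surface text from tokens. The function names parse_misc and detokenize, the (form, misc) pair representation, and the use of a blank line as the paragraph separator are assumptions made for illustration; they are not part of the UD tooling.

```python
def parse_misc(misc):
    """Turn a CoNLL-U MISC string such as 'SpaceAfter=No|NewPar=Yes' into a dict."""
    if not misc or misc == "_":
        return {}
    return dict(item.split("=", 1) for item in misc.split("|") if "=" in item)


def detokenize(tokens):
    """Rebuild surface text from (form, misc) pairs (hypothetical representation).

    A space follows every token unless its MISC carries SpaceAfter=No.
    A token with NewPar=Yes starts a new paragraph in mid-sentence; the
    blank-line separator used here is an assumption of this sketch.
    """
    pieces = []
    for form, misc in tokens:
        attrs = parse_misc(misc)
        if attrs.get("NewPar") == "Yes" and pieces:
            # Replace the separator after the previous token with a paragraph break.
            pieces[-1] = pieces[-1].rstrip(" ") + "\n\n"
        separator = "" if attrs.get("SpaceAfter") == "No" else " "
        pieces.append(form + separator)
    return "".join(pieces).rstrip(" ")


# Invented example sentence with a mid-sentence paragraph break.
example = [
    ("Viņš", "_"),
    ("teica", "SpaceAfter=No"),
    (":", "_"),
    ("Jā", "NewPar=Yes|SpaceAfter=No"),
    (".", "_"),
]
print(detokenize(example))
```

Sentence-level paragraph breaks (the # newpar comment) would be handled one level up, when sentences are joined, and are outside the scope of this sketch.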

Morphology

Tags

Latvian uses all 17 universal POS categories.

Particles

The PART tag is used for the following function words: acīmredzot, ak, ar, arī, arīdzan, da, diemžēl, diez, diezin, droši, gan, i, ij, ik, ir, it, itin, ja, jau, , jel, jo, kaut, , lai, laikam, mjā, ne, nea, nebūt, nez, nezin, , nu, nudien, nujā, , nūja, nūjā, pat, patiesi, patiešām, protams, proti, taču, tad, tak, , tāpat, tātad, tiešām, tik, tikai, tikpat, tipa, tomēr, turklāt, vai, varbūt, vēl, vien, vienīgi, vis.

Particles can be homonymous with other parts of speech, most notably conjunctions (CCONJ and SCONJ), interjections (INTJ), and adjectives (ADJ); the correct POS is assigned based on sentence context.

Pronouns and Determiners

Effectively distinguishing the PRON and DET categories in Latvian is very hard because words used as DET can also be used as PRON; accordingly, traditional Latvian grammar does not define determiners as a distinct POS. The pronoun (PRON) vs. determiner (DET) distinction is therefore based on the role of the word in the UD tree: if its relation in the current sentence is det, the word is tagged as DET. In turn, the det relation is used for Latvian pronouns that modify a noun in the sentence and agree with that noun in gender, number and case. If these words are used independently in a given sentence, they are tagged as PRON. The pronominal quantifiers daudzi “many” and vairāki “several”, and the personal possessives manējais, tavējais, mūsējais, jūsējais, viņējais, are also DET when they modify a noun in the sentence, although Latvian grammar describes them as adjectives.
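The relation-driven rule above can be sketched as a small Python function. The name pron_or_det is hypothetical, and the sketch assumes the caller has already established that the word belongs to the pronominal class discussed here and knows its dependency relation.

```python
def pron_or_det(deprel):
    """Return the UPOS for a pronominal word given its dependency relation.

    Hypothetical helper: a word attached with det (with or without a
    subtype) is tagged DET; any other relation means it is used
    independently and is tagged PRON.
    """
    base_relation = deprel.split(":", 1)[0]
    return "DET" if base_relation == "det" else "PRON"


print(pron_or_det("det"))    # word modifies a noun
print(pron_or_det("nsubj"))  # word is used independently
```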

Auxiliary Verbs

Latvian has three auxiliary verbs AUX: būt “to be”, tikt “to get”, and tapt “to become” (obsolete). The auxiliary verb is used in several types of constructions:

* Analytic word forms of verbs (būt, tikt).
* The copula in non-verbal predicates (būt).
* The copula in infinitive predicates (būt).

Būt, tikt and tapt may still occur as normal VERB if they are used in purely existential sentences or indicate location. Verbs with modal meaning are not considered auxiliary in Latvian.
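As a rough illustration of the AUX vs. VERB split, the following Python sketch tags a verb from its lemma and dependency relation. The helper name verb_upos is hypothetical. Treating the aux and cop relations as the markers of auxiliary use is an assumption of this sketch, based on general UD practice rather than on anything stated on this page.

```python
AUX_LEMMAS = {"būt", "tikt", "tapt"}
# Assumption: auxiliary uses surface as the aux or cop relation (any subtype).
AUX_RELATIONS = {"aux", "cop"}


def verb_upos(lemma, deprel):
    """Return 'AUX' or 'VERB' for a verb token (hypothetical helper)."""
    base_relation = deprel.split(":", 1)[0]
    if lemma in AUX_LEMMAS and base_relation in AUX_RELATIONS:
        return "AUX"
    # Existential or locational uses of būt, tikt and tapt, and all other
    # verbs (including modal verbs), fall through to VERB.
    return "VERB"


print(verb_upos("būt", "cop"))
print(verb_upos("būt", "root"))
```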

Deverbal Nouns, Participles, Converbs

Deverbal nouns with the endings -šana, -šanās (skriešana “running”) are tagged as NOUN. Most converbs with the endings -ot, -oties, -am, -ām, -amies, -āmies, -dams, -damies, -damās are tagged as VERB or AUX. Most adjectival participles (redzams, aizgājis, negaidīts, velkošs) are tagged as VERB. Exceptions are lexicalized uses with a separate meaning, like protams “of course” and acīmredzot “obviously”, which are tagged as PART, and iespējams “possible”, which is tagged as ADJ.

Features

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Unused Features

Features not applicable for Latvian:

Syntax

Core Arguments

Non-verbal Clauses

The copula verb būt “be” is used in equational and attributional nonverbal clauses. Purely existential clauses (also indicating location) use būt as well, but it is treated as the head of the clause and tagged VERB.

Relations Overview

The following relation subtypes are used in Latvian:

The following relation types are not used for Latvian: clf, dislocated, list, reparandum. However, reparandum should be introduced in the future, once appropriate speech texts are annotated.
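A simple consistency check for the unused relations could look like the Python sketch below. The function name find_unused_relations and the plain list of relation strings as input are assumptions for illustration; reading the relations out of a treebank file is left to the caller.

```python
UNUSED_RELATIONS = {"clf", "dislocated", "list", "reparandum"}


def find_unused_relations(deprels):
    """Return, sorted, the base relations in deprels that Latvian does not use."""
    base_relations = {deprel.split(":", 1)[0] for deprel in deprels}
    return sorted(base_relations & UNUSED_RELATIONS)


# Invented sample of relation labels.
print(find_unused_relations(["nsubj", "list", "obl", "det"]))
```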

Annotating Textual Errors

The following MISC values can be used to annotate errors in the source text that interfere with treebank annotation:

Treebanks

There are 2 Latvian UD treebanks: