
This page pertains to UD version 2.

Treebank Statistics: UD_Latvian-LVTB: POS Tags: PROPN

There are 4255 PROPN lemmas (18%), 5666 PROPN types (11%) and 13281 PROPN tokens (4%). Out of 17 observed tags, the rank of PROPN is: 2 in number of lemmas, 4 in number of types and 8 in number of tokens.

The 10 most frequent PROPN lemmas: Latvija, Rīga, Eiropa, Krievija, ES, Sofija, Saeima, LETA, Baltija, Vācija

The 10 most frequent PROPN types: Latvijas, Latvijā, Eiropas, Rīgas, ES, Krievijas, LETA, Baltijas, Sofija, Rīgā

The 10 most frequent ambiguous lemmas: v. (PROPN 14, NOUN 1), m. (PROPN 12, X 1), d. (PROPN 10, X 1), EK (PROPN 15, NOUN 1), g. (NOUN 11, PROPN 4), FM (PROPN 10, X 1), V (PROPN 6, NUM 3), BKUS (PROPN 3, NOUN 1), Huawei (PROPN 3, X 2), t. (PROPN 1, SYM 1)

The 10 most frequent ambiguous types: M. (PROPN 26, NOUN 1, X 1), Satversmes (PROPN 26, NOUN 1), Saules (PROPN 26, NOUN 2), D. (PROPN 19, X 1), EK (PROPN 15, NOUN 1), Mēness (PROPN 14, NOUN 2), FM (PROPN 10, X 1), Jūrmalā (PROPN 10, NOUN 1), airBaltic (PROPN 10, X 1), vilnis (NOUN 3, PROPN 2)

Morphology

The form / lemma ratio of PROPN is 1.331610 (the average of all parts of speech is 2.328168).
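The reported ratio appears to be the number of distinct PROPN types divided by the number of distinct PROPN lemmas, using the counts given at the top of this page. The following is a minimal check under that assumption; the variable names are illustrative and not taken from the tool that generated the page.

```python
# Counts copied from the summary line of this page.
propn_types = 5666    # distinct PROPN word forms
propn_lemmas = 4255   # distinct PROPN lemmas

# Assumption: form / lemma ratio = types / lemmas.
ratio = propn_types / propn_lemmas
print(f"{ratio:.6f}")  # 1.331610
```

A value close to 1 means that most proper-noun lemmas are attested in only one or two inflected forms, which fits the much higher average of 2.328168 across all parts of speech.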

The highest number of forms (8) was observed with the lemma “Jānis”: JĀNIS, Jāni, Jānim, Jānis, Jāņa, Jāņi, Jāņiem, Jāņu.

The second highest number of forms (6) was observed with the lemma “Eiropa”: EIROPAS, Eiropa, Eiropai, Eiropas, Eiropu, Eiropā.

The third highest number of forms (6, tied with “Eiropa”) was observed with the lemma “Rīga”: RĪGAS, Rīga, Rīgai, Rīgas, Rīgu, Rīgā.

PROPN occurs with 5 features: Gender (11437; 86% instances), Case (11409; 86% instances), Number (11409; 86% instances), Abbr (1229; 9% instances), Typo (19; 0% instances)
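The percentages above can be reproduced by dividing each feature count by the 13281 PROPN tokens and rounding to a whole percent. This is a small sketch assuming that rounding rule; the dictionary and variable names are illustrative.

```python
propn_tokens = 13281  # total PROPN tokens reported on this page

# Feature counts copied from the line above.
feature_counts = {
    "Gender": 11437,
    "Case": 11409,
    "Number": 11409,
    "Abbr": 1229,
    "Typo": 19,
}

# Share of PROPN tokens carrying each feature, as a whole percent.
shares = {name: round(100 * n / propn_tokens) for name, n in feature_counts.items()}
print(shares)
# {'Gender': 86, 'Case': 86, 'Number': 86, 'Abbr': 9, 'Typo': 0}
```

The 0% shown for Typo is therefore a rounding effect (19 of 13281 tokens is about 0.14%), not an absence of the feature.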

PROPN occurs with 13 feature-value pairs: Abbr=Yes, Case=Acc, Case=Dat, Case=Gen, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Number=Plur, Number=Ptan, Number=Sing, Typo=Yes

PROPN occurs with 44 feature combinations. The most frequent feature combination is Case=Gen|Gender=Fem|Number=Sing (3001 tokens). Examples: Latvijas, Eiropas, Rīgas, Krievijas, Baltijas, Saeimas, Jelgavas, Liepājas, Bauskas, Lietuvas

Relations

PROPN nodes are attached to their parents using 23 different relations: nmod (4500; 34% instances), nsubj (2848; 21% instances), flat:name (2061; 16% instances), obl (1381; 10% instances), conj (1058; 8% instances), iobj (377; 3% instances), obj (293; 2% instances), parataxis (210; 2% instances), root (171; 1% instances), appos (104; 1% instances), nsubj:pass (85; 1% instances), discourse (47; 0% instances), vocative (34; 0% instances), dep (26; 0% instances), orphan (26; 0% instances), acl (24; 0% instances), xcomp (21; 0% instances), ccomp (9; 0% instances), flat (2; 0% instances), advcl (1; 0% instances), amod (1; 0% instances), csubj (1; 0% instances), flat:foreign (1; 0% instances)

Parents of PROPN nodes belong to 16 different parts of speech: NOUN (4893; 37% instances), VERB (4777; 36% instances), PROPN (3107; 23% instances), [no parent: sentence root] (171; 1% instances), ADJ (112; 1% instances), ADV (61; 0% instances), X (57; 0% instances), PRON (43; 0% instances), NUM (40; 0% instances), AUX (6; 0% instances), INTJ (5; 0% instances), SYM (4; 0% instances), DET (2; 0% instances), CCONJ (1; 0% instances), PART (1; 0% instances), PUNCT (1; 0% instances)
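As a consistency check, both the relation counts and the parent part-of-speech counts should add up to the 13281 PROPN tokens, since every node has exactly one incoming relation. The sketch below uses only numbers copied from this page; the label "(root)" for the 171 parents without a part of speech is an interpretation, based on the matching count of 171 root attachments.

```python
PROPN_TOKENS = 13281

# Counts of incoming relations, in the order listed on this page.
relation_counts = [4500, 2848, 2061, 1381, 1058, 377, 293, 210, 171, 104,
                   85, 47, 34, 26, 26, 24, 21, 9, 2, 1, 1, 1, 1]

# Counts of parent parts of speech; "(root)" is an assumed label for the
# 171 parents listed without a tag.
parent_pos_counts = {
    "NOUN": 4893, "VERB": 4777, "PROPN": 3107, "(root)": 171,
    "ADJ": 112, "ADV": 61, "X": 57, "PRON": 43, "NUM": 40, "AUX": 6,
    "INTJ": 5, "SYM": 4, "DET": 2, "CCONJ": 1, "PART": 1, "PUNCT": 1,
}

relation_total = sum(relation_counts)
parent_total = sum(parent_pos_counts.values())
print(len(relation_counts), relation_total)   # 23 13281
print(len(parent_pos_counts), parent_total)   # 16 13281
```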

8225 (62%) PROPN nodes are leaves.

2394 (18%) PROPN nodes have one child.

1533 (12%) PROPN nodes have two children.

1129 (9%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 34.
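A child-degree distribution like the one above could be computed from a CoNLL-U file roughly as follows. This is a sketch, not the script that produced this page: the function name is hypothetical, and the sample sentence is a made-up toy example, not taken from UD_Latvian-LVTB.

```python
from collections import Counter

def propn_child_degrees(conllu_text):
    """Count how many PROPN nodes have 0, 1, 2, ... children."""
    degrees = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep plain word rows; skip multiword ranges (1-2) and empty nodes (1.1).
        rows = [r for r in rows if r[0].isdigit()]
        # Column 7 (index 6) is HEAD: count children per head ID.
        child_count = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == "PROPN":  # column 4 (index 3) is UPOS
                degrees[child_count[r[0]]] += 1
    return degrees

# Toy sentence (hypothetical): "Jānis Bērziņš dzīvo Rīgā ."
TOY = "\n".join("\t".join(row) for row in [
    ["1", "Jānis", "Jānis", "PROPN", "_", "_", "3", "nsubj", "_", "_"],
    ["2", "Bērziņš", "Bērziņš", "PROPN", "_", "_", "1", "flat:name", "_", "_"],
    ["3", "dzīvo", "dzīvot", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "Rīgā", "Rīga", "PROPN", "_", "_", "3", "obl", "_", "_"],
    [".".join(["5"]), ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
])

print(dict(propn_child_degrees(TOY)))  # {1: 1, 0: 2}
```

In the toy sentence, "Jānis" has one child (the flat:name dependent) and the other two proper nouns are leaves. On this page the four reported bins (8225, 2394, 1533 and 1129 nodes) add up to the 13281 PROPN tokens.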

Children of PROPN nodes are attached using 25 different relations: punct (2352; 24% instances), flat:name (2123; 22% instances), nmod (1661; 17% instances), conj (1107; 11% instances), case (755; 8% instances), cc (541; 6% instances), acl (214; 2% instances), parataxis (209; 2% instances), discourse (196; 2% instances), amod (149; 2% instances), appos (70; 1% instances), det (60; 1% instances), orphan (59; 1% instances), advmod (39; 0% instances), dep (39; 0% instances), nsubj (35; 0% instances), cop (33; 0% instances), advcl (20; 0% instances), obl (16; 0% instances), mark (8; 0% instances), nummod (6; 0% instances), flat (5; 0% instances), flat:foreign (4; 0% instances), aux (1; 0% instances), iobj (1; 0% instances)

Children of PROPN nodes belong to 17 different parts of speech: PROPN (3107; 32% instances), PUNCT (2352; 24% instances), NOUN (1916; 20% instances), ADP (715; 7% instances), CCONJ (533; 5% instances), VERB (271; 3% instances), PART (167; 2% instances), ADJ (143; 1% instances), X (106; 1% instances), NUM (97; 1% instances), ADV (87; 1% instances), DET (55; 1% instances), SYM (52; 1% instances), SCONJ (38; 0% instances), AUX (34; 0% instances), PRON (28; 0% instances), INTJ (2; 0% instances)