
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-CLTT: POS Tags: NUM

There are 83 NUM lemmas (3%), 97 NUM types (2%) and 434 NUM tokens (1%). Out of 15 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.
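The three counts above can be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: the function name `pos_counts` is hypothetical, and it assumes types are compared case-sensitively, which may differ from how the official statistics were computed.

```python
def pos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct word forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        if cols[3] == upos:
            lemmas.add(cols[2])
            types.add(cols[1])
            tokens += 1
    return len(lemmas), len(types), tokens
```

Called with the full text of the treebank and `"NUM"`, a function like this would be expected to return counts comparable to the 83 lemmas, 97 types and 434 tokens reported here.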

The 10 most frequent NUM lemmas: 1, 3, 2, jeden, 4, 5, dva, 41, 7, dvanáct

The 10 most frequent NUM types: 1, 3, 2, 4, jeden, 5, 41, 7, jedné, tří

The 10 most frequent ambiguous lemmas: (none observed)

The 10 most frequent ambiguous types: jednou (ADV 3, NUM 3)
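An ambiguous type here is a word form that occurs with NUM and with at least one other tag. A minimal sketch of how such forms could be found, assuming the input is a sequence of (form, UPOS) pairs; the function name `ambiguous_types` is hypothetical.

```python
from collections import Counter, defaultdict

def ambiguous_types(tagged_forms, upos="NUM"):
    """Map each form seen with `upos` and at least one other tag to its tag counts."""
    counts = defaultdict(Counter)
    for form, tag in tagged_forms:
        counts[form][tag] += 1
    return {
        form: dict(tags)
        for form, tags in counts.items()
        if upos in tags and len(tags) > 1
    }
```

For the single ambiguous type listed above, the result would have the shape `{"jednou": {"ADV": 3, "NUM": 3}}`.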

Morphology

The form / lemma ratio of NUM is 1.168675 (the average of all parts of speech is 1.713272).

The highest number of forms (9) was observed with the lemma “jeden”: jeden, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (4) was observed with the lemma “dva”: dva, dvou, dvě, dvěma.

The 3rd highest number of forms (2) was observed with the lemma “dvanáct”: dvanáct, dvanácti.
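The form / lemma ratio and the ranking of lemmas by number of forms can be sketched as follows. The ratio reported above matches the number of NUM types divided by the number of NUM lemmas (97 / 83 ≈ 1.168675). The function names are hypothetical, and the input is assumed to be (form, lemma) pairs for NUM tokens only.

```python
from collections import defaultdict

def forms_by_lemma(pairs):
    """Group the distinct word forms observed for each lemma."""
    forms = defaultdict(set)
    for form, lemma in pairs:
        forms[lemma].add(form)
    return forms

def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas."""
    pairs = list(pairs)
    distinct_forms = {form for form, _ in pairs}
    distinct_lemmas = {lemma for _, lemma in pairs}
    return len(distinct_forms) / len(distinct_lemmas)

def richest_lemmas(pairs, top=3):
    """Lemmas with the most distinct forms, largest first (ties broken by lemma)."""
    ranked = sorted(forms_by_lemma(pairs).items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(lemma, len(forms)) for lemma, forms in ranked[:top]]
```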

NUM occurs with 6 features: NumForm (434; 100% instances), NumType (434; 100% instances), Case (68; 16% instances), Number (68; 16% instances), Gender (46; 11% instances), Animacy (12; 3% instances)

NUM occurs with 17 feature-value pairs: Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Roman, NumForm=Word, NumType=Card, Number=Plur, Number=Sing

NUM occurs with 19 feature combinations. The most frequent feature combination is NumForm=Roman|NumType=Card (366 tokens). Examples: 1, 3, 2, 4, 5, 41, 7, 10, 2004, 2008
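Feature statistics like these are derived from the FEATS column of CoNLL-U, where pairs are separated by `|`, names and values by `=`, and `_` marks an empty feature set. A minimal sketch of counting how many tokens carry each feature, with hypothetical function names and whole-number percentages as on this page:

```python
from collections import Counter

def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Gen|NumType=Card' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def feature_coverage(feats_column):
    """Map each feature name to (token count, rounded percentage of all tokens)."""
    feats_column = list(feats_column)
    counts = Counter()
    for feats in feats_column:
        counts.update(parse_feats(feats).keys())
    total = len(feats_column)
    return {name: (n, round(100 * n / total)) for name, n in counts.items()}
```

A multi-valued feature such as `Gender=Fem,Neut` is kept as a single value string in this sketch, which matches how it is listed among the feature-value pairs above.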

Relations

NUM nodes are attached to their parents using 12 different relations: nummod (283; 65% instances), nmod (47; 11% instances), conj (44; 10% instances), nummod:gov (24; 6% instances), advcl (17; 4% instances), obj (8; 2% instances), obl (5; 1% instances), nsubj (2; 0% instances), compound (1; 0% instances), dep (1; 0% instances), obl:arg (1; 0% instances), orphan (1; 0% instances)
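The percentages are shares of all 434 NUM tokens, rounded to whole numbers, which is why relations seen only once or twice appear as 0%. The check below uses the counts listed above; the variable names are illustrative only.

```python
# Counts of incoming relations for NUM nodes, copied from the list above.
incoming = {
    "nummod": 283, "nmod": 47, "conj": 44, "nummod:gov": 24,
    "advcl": 17, "obj": 8, "obl": 5, "nsubj": 2,
    "compound": 1, "dep": 1, "obl:arg": 1, "orphan": 1,
}

total = sum(incoming.values())

# Whole-number percentage of NUM tokens attached by each relation.
shares = {rel: round(100 * n / total) for rel, n in incoming.items()}

for rel, n in incoming.items():
    print(f"{rel}: {n} ({shares[rel]}%)")
```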

Parents of NUM nodes belong to 7 different parts of speech: NOUN (332; 76% instances), NUM (40; 9% instances), VERB (36; 8% instances), ADV (13; 3% instances), ADJ (6; 1% instances), X (6; 1% instances), SYM (1; 0% instances)

231 (53%) NUM nodes are leaves.

164 (38%) NUM nodes have one child.

31 (7%) NUM nodes have two children.

8 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 8.
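The child-degree figures above come from counting, for every NUM node, how many other nodes name it as their head. A minimal sketch, assuming each sentence is given as a list of (id, head, upos) triples with integer ids and head 0 for the root; the function name is hypothetical.

```python
from collections import Counter

def child_degree_histogram(sentences, upos="NUM"):
    """Map number of children -> number of `upos` nodes with that many children."""
    histogram = Counter()
    for sentence in sentences:
        # How many times each token id is used as a head within this sentence.
        children = Counter(head for _, head, _ in sentence)
        for token_id, _, tag in sentence:
            if tag == upos:
                histogram[children[token_id]] += 1
    return histogram
```

Leaves are the nodes counted under key 0, and the highest child degree is the largest key in the returned histogram.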

Children of NUM nodes are attached using 13 different relations: punct (84; 31% instances), nmod (56; 21% instances), conj (42; 16% instances), cc (34; 13% instances), mark (17; 6% instances), advmod:emph (14; 5% instances), dep (7; 3% instances), obl (6; 2% instances), advmod (3; 1% instances), case (3; 1% instances), compound (1; 0% instances), cop (1; 0% instances), nsubj (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: PUNCT (84; 31% instances), NUM (40; 15% instances), X (33; 12% instances), NOUN (25; 9% instances), CCONJ (24; 9% instances), SCONJ (21; 8% instances), ADV (19; 7% instances), SYM (13; 5% instances), PART (6; 2% instances), ADP (2; 1% instances), AUX (1; 0% instances), DET (1; 0% instances)