
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: NUM

There are 1226 NUM lemmas (4%), 1276 NUM types (2%) and 8531 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: jeden, dva, 1, tři, 2, oba, 3, čtyři, pět, 4

The 10 most frequent NUM types: 1, 2, 3, tři, dva, dvě, 4, 10, jeden, 5

The most frequent ambiguous lemmas: I (NUM 20, NOUN 13, X 2), V (NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

The most frequent ambiguous types: tří (NUM 44, ADJ 1), jednou (ADV 32, NUM 32), I (CCONJ 91, NUM 20, NOUN 13, X 2), V (ADP 797, NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.040783 (the average of all parts of speech is 1.964432).
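The NUM ratio is consistent with dividing the number of NUM types by the number of NUM lemmas reported at the top of this page. The following is a minimal sketch under that assumption; the function and variable names are illustrative and not part of any UD tool.

```python
# Counts taken from the summary line at the top of this page.
num_types = 1276   # distinct NUM word forms
num_lemmas = 1226  # distinct NUM lemmas


def form_lemma_ratio(types: int, lemmas: int) -> float:
    """Average number of distinct forms per lemma (assumed definition)."""
    return types / lemmas


ratio = form_lemma_ratio(num_types, num_lemmas)
print(f"{ratio:.6f}")
```

A ratio this close to 1 reflects that most NUM lemmas (digits in particular) occur in a single form.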

The 1st highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (6) was observed with the lemma “čtyři”: čtyř, čtyřech, čtyřem, čtyři, čtyřma, čtyřmi.

The 3rd highest number of forms (5) was observed with the lemma “tři”: třech, třem, třemi, tři, tří.

NUM occurs with 6 features: NumType (8531; 100% instances), NumForm (8530; 100% instances), Case (2368; 28% instances), Number (2368; 28% instances), Gender (966; 11% instances), Animacy (56; 1% instances)

NUM occurs with 21 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, Number=Dual, Number=Plur, Number=Sing

NUM occurs with 33 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (6033 tokens). Examples: 1, 2, 3, 4, 10, 5, 1992, 6, 1993, 15
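A feature combination such as NumForm=Digit|NumType=Card follows the CoNLL-U FEATS layout: pairs separated by `|`, name and value separated by `=`, and multiple values of one feature separated by commas (as in Gender=Fem,Neut). The sketch below shows one way to split such a string; `parse_feats` is a hypothetical helper, and the second input string is an illustrative combination assembled from values listed on this page, not a quoted treebank entry.

```python
def parse_feats(feats: str) -> dict[str, list[str]]:
    """Split a CoNLL-U FEATS string into {feature: [values]}."""
    if feats == "_":  # "_" marks an empty FEATS column
        return {}
    result: dict[str, list[str]] = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        result[name] = values.split(",")  # e.g. "Fem,Neut" -> ["Fem", "Neut"]
    return result


combo = parse_feats("NumForm=Digit|NumType=Card")
multi = parse_feats("Case=Nom|Gender=Fem,Neut|NumForm=Word|NumType=Card|Number=Dual")
print(combo)
print(multi["Gender"])
```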

Relations

NUM nodes are attached to their parents using 23 different relations: nummod (4215; 49% instances), nummod:gov (1143; 13% instances), conj (801; 9% instances), dep (448; 5% instances), compound (442; 5% instances), obl (431; 5% instances), root (273; 3% instances), obj (266; 3% instances), nsubj (170; 2% instances), orphan (94; 1% instances), obl:arg (83; 1% instances), appos (47; 1% instances), nsubj:pass (33; 0% instances), nmod (31; 0% instances), xcomp (17; 0% instances), advcl (11; 0% instances), flat (11; 0% instances), ccomp (5; 0% instances), acl (3; 0% instances), iobj (3; 0% instances), acl:relcl (2; 0% instances), csubj:pass (1; 0% instances), mark (1; 0% instances)
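Counts like these can be reproduced by tallying the DEPREL column of every token whose UPOS is NUM. The sketch below does this for a tiny hand-made CoNLL-U fragment; the sample sentence and the `count_deprels` helper are assumptions for illustration, not the script that generated this page.

```python
from collections import Counter

# Hand-made sample (not taken from UD_Czech-PDT); columns are tab-separated:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = tři domy a pět aut",
    "1\ttři\ttři\tNUM\t_\t_\t2\tnummod\t_\t_",
    "2\tdomy\tdům\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\ta\ta\tCCONJ\t_\t_\t5\tcc\t_\t_",
    "4\tpět\tpět\tNUM\t_\t_\t5\tnummod:gov\t_\t_",
    "5\taut\tauto\tNOUN\t_\t_\t2\tconj\t_\t_",
])


def count_deprels(conllu: str, upos: str = "NUM") -> Counter:
    """Count incoming relations of all tokens with the given UPOS tag."""
    counts: Counter = Counter()
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == upos:
            counts[cols[7]] += 1
    return counts


relation_counts = count_deprels(SAMPLE)
print(relation_counts.most_common())
```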

Parents of NUM nodes belong to 13 different parts of speech (counting the artificial root, which has no part-of-speech tag): NOUN (5166; 61% instances), NUM (1184; 14% instances), VERB (805; 9% instances), PROPN (364; 4% instances), root (273; 3% instances), ADJ (204; 2% instances), DET (184; 2% instances), SYM (119; 1% instances), X (88; 1% instances), ADV (86; 1% instances), PRON (36; 0% instances), AUX (20; 0% instances), CCONJ (2; 0% instances)

4250 (50%) NUM nodes are leaves.

2842 (33%) NUM nodes have one child.

866 (10%) NUM nodes have two children.

573 (7%) NUM nodes have three or more children.

The highest child degree of a NUM node is 27.
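The four child-degree counts above sum to the total number of NUM tokens (8531), and the percentages are each count divided by that total, rounded to a whole number. A short sketch of that arithmetic, with illustrative variable names:

```python
# Child-degree counts as reported above.
child_degree_counts = {"0": 4250, "1": 2842, "2": 866, "3+": 573}

total = sum(child_degree_counts.values())

# Share of NUM nodes per child degree, rounded to whole percent.
shares = {
    degree: round(100 * count / total)
    for degree, count in child_degree_counts.items()
}

print(total)
print(shares)
```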

Children of NUM nodes are attached using 30 different relations: punct (2224; 33% instances), nmod (846; 12% instances), conj (794; 12% instances), case (531; 8% instances), compound (442; 7% instances), advmod:emph (413; 6% instances), det (386; 6% instances), cc (287; 4% instances), dep (193; 3% instances), amod (122; 2% instances), cop (99; 1% instances), nsubj (85; 1% instances), mark (83; 1% instances), orphan (70; 1% instances), advmod (58; 1% instances), appos (50; 1% instances), obl (27; 0% instances), flat (23; 0% instances), acl:relcl (12; 0% instances), parataxis (10; 0% instances), xcomp (9; 0% instances), advcl (8; 0% instances), csubj (5; 0% instances), obl:arg (4; 0% instances), aux (3; 0% instances), obj (3; 0% instances), acl (2; 0% instances), discourse (2; 0% instances), det:nummod (1; 0% instances), fixed (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (2224; 33% instances), NUM (1184; 17% instances), NOUN (901; 13% instances), ADP (529; 8% instances), DET (444; 7% instances), ADV (299; 4% instances), CCONJ (280; 4% instances), SYM (232; 3% instances), PART (181; 3% instances), ADJ (144; 2% instances), AUX (102; 2% instances), SCONJ (84; 1% instances), PROPN (82; 1% instances), VERB (50; 1% instances), PRON (35; 1% instances), X (22; 0% instances)