
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: NUM

There are 1226 NUM lemmas (4%), 1276 NUM types (2%) and 8531 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: jeden, dva, 1, tři, 2, oba, 3, čtyři, pět, 4

The 10 most frequent NUM types: 1, 2, 3, tři, dva, dvě, 4, 10, jeden, 5

The most frequent ambiguous lemmas (3 listed): I (NUM 20, NOUN 13, X 2), V (NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

The most frequent ambiguous types (5 listed): tří (NUM 44, ADJ 1), jednou (ADV 32, NUM 32), I (CCONJ 91, NUM 20, NOUN 13, X 2), V (ADP 797, NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.040783 (the average of all parts of speech is 1.964432).
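The reported ratio matches the number of NUM types divided by the number of NUM lemmas given at the top of this page. The following is a minimal sketch of that arithmetic. The function name `form_lemma_ratio` is hypothetical, and the assumption that the statistic is computed exactly this way is inferred from the numbers, not confirmed by the page.

```python
# Counts taken from the summary at the top of this page (NUM in UD_Czech-PDT).
num_types = 1276   # distinct NUM word forms
num_lemmas = 1226  # distinct NUM lemmas


def form_lemma_ratio(types: int, lemmas: int) -> float:
    """Average number of distinct forms per lemma (assumed definition)."""
    return types / lemmas


ratio = form_lemma_ratio(num_types, num_lemmas)
print(f"{ratio:.6f}")  # prints 1.040783
```

A value this close to 1 means that most NUM lemmas were observed with a single form, which fits the large share of digit tokens.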

The 1st highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (6) was observed with the lemma “čtyři”: čtyř, čtyřech, čtyřem, čtyři, čtyřma, čtyřmi.

The 3rd highest number of forms (5) was observed with the lemma “tři”: třech, třem, třemi, tři, tří.

NUM occurs with 7 features: NumType (8531; 100% instances), NumForm (8530; 100% instances), Case (2368; 28% instances), Number (2368; 28% instances), NumValue (1210; 14% instances), Gender (966; 11% instances), Animacy (56; 1% instances)

NUM occurs with 22 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumValue=1,2,3, Number=Dual, Number=Plur, Number=Sing

NUM occurs with 38 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (6033 tokens). Examples: 1, 2, 3, 4, 10, 5, 1992, 6, 1993, 15
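Feature combinations such as `NumForm=Digit|NumType=Card` follow the CoNLL-U FEATS convention: pairs are separated by `|`, and each pair is `Name=Value`. The sketch below splits such a string into pairs; the helper name `parse_feats` is hypothetical. It keeps multi-valued entries such as `Gender=Fem,Neut` as a single value, which is how the list of feature-value pairs above appears to count them.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS string into a {feature: value} mapping.

    An underscore (or empty string) means "no features".
    Comma-joined values such as "Fem,Neut" are kept as one value.
    """
    if not feats or feats == "_":
        return {}
    pairs = {}
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        pairs[name] = value
    return pairs


print(parse_feats("NumForm=Digit|NumType=Card"))
# prints {'NumForm': 'Digit', 'NumType': 'Card'}
```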

Relations

NUM nodes are attached to their parents using 22 different relations: nummod (4215; 49% instances), nummod:gov (1143; 13% instances), conj (801; 9% instances), dep (448; 5% instances), compound (442; 5% instances), obl (437; 5% instances), root (271; 3% instances), obj (266; 3% instances), nsubj (170; 2% instances), orphan (94; 1% instances), obl:arg (83; 1% instances), appos (47; 1% instances), nsubj:pass (33; 0% instances), nmod (31; 0% instances), xcomp (17; 0% instances), flat (11; 0% instances), advcl (10; 0% instances), ccomp (5; 0% instances), iobj (3; 0% instances), acl (2; 0% instances), acl:relcl (1; 0% instances), mark (1; 0% instances)
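The percentages in these lists appear to be each count divided by the 8531 NUM tokens and rounded to the nearest whole percent, which is why rare relations show up as 0%. The sketch below reproduces a few of the figures under that assumption; the helper name `share` is hypothetical.

```python
TOTAL_NUM_TOKENS = 8531  # total NUM tokens reported on this page

# A few relation counts copied from the list above.
relation_counts = {
    "nummod": 4215,
    "nummod:gov": 1143,
    "conj": 801,
    "appos": 47,
    "nsubj:pass": 33,
}


def share(count: int, total: int = TOTAL_NUM_TOKENS) -> int:
    """Percentage of `total`, rounded to the nearest integer (assumed rule)."""
    return round(100 * count / total)


for relation, count in relation_counts.items():
    print(f"{relation} ({count}; {share(count)}% instances)")
```

Under this rule, `nsubj:pass` (33 of 8531, about 0.4%) rounds down to 0%, while `appos` (47 of 8531, about 0.55%) rounds up to 1%.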

Parents of NUM nodes belong to 13 different parts of speech: NOUN (5143; 60% instances), NUM (1184; 14% instances), VERB (803; 9% instances), PROPN (363; 4% instances), [no parent: NUM is the root] (271; 3% instances), ADJ (203; 2% instances), DET (183; 2% instances), SYM (119; 1% instances), X (88; 1% instances), ADV (82; 1% instances), AUX (56; 1% instances), PRON (34; 0% instances), CCONJ (2; 0% instances)

4250 (50%) NUM nodes are leaves.

2844 (33%) NUM nodes have one child.

868 (10%) NUM nodes have two children.

569 (7%) NUM nodes have three or more children.

The highest child degree of a NUM node is 27.
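The child-degree figures above (leaves, one child, two children, and so on) can be recomputed from a CoNLL-U file by counting, for every NUM node, how many tokens name it as their head. The sketch below is one possible way to do this, not the script that generated this page. The function name `child_degrees` is hypothetical, and the sample sentence is an invented illustration, not a sentence from the treebank.

```python
from collections import Counter

# Invented example sentence in CoNLL-U format (ID, FORM, LEMMA, UPOS,
# XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); not taken from the treebank.
SAMPLE = (
    "1\tAsi\tasi\tPART\t_\t_\t2\tadvmod:emph\t_\t_\n"
    "2\ttři\ttři\tNUM\t_\t_\t3\tnummod\t_\t_\n"
    "3\tpsi\tpes\tNOUN\t_\t_\t4\tnsubj\t_\t_\n"
    "4\tspí\tspát\tVERB\t_\t_\t0\troot\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
)


def child_degrees(conllu: str, upos: str = "NUM") -> Counter:
    """Map child count -> number of nodes with that UPOS tag."""
    degrees = Counter()
    sentence = []
    # The extra empty string flushes the last sentence.
    for line in conllu.splitlines() + [""]:
        if line.startswith("#"):
            continue  # sentence-level comment
        if line.strip():
            cols = line.split("\t")
            # Skip multiword token ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                sentence.append(cols)
            continue
        # Blank line: the sentence is complete, so count its children.
        children = Counter(cols[6] for cols in sentence)
        for cols in sentence:
            if cols[3] == upos:
                degrees[children[cols[0]]] += 1
        sentence = []
    return degrees


print(child_degrees(SAMPLE))  # prints Counter({1: 1})
```

In the sample, the NUM node "tři" has exactly one child ("Asi"), so it falls into the "one child" bucket. As a consistency check on the figures above, the four buckets add up to the NUM token total: 4250 + 2844 + 868 + 569 = 8531.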

Children of NUM nodes are attached using 30 different relations: punct (2218; 33% instances), nmod (846; 13% instances), conj (791; 12% instances), case (531; 8% instances), compound (442; 7% instances), advmod:emph (413; 6% instances), det (386; 6% instances), cc (287; 4% instances), dep (193; 3% instances), amod (122; 2% instances), cop (93; 1% instances), mark (80; 1% instances), nsubj (80; 1% instances), orphan (70; 1% instances), advmod (57; 1% instances), appos (50; 1% instances), obl (27; 0% instances), flat (23; 0% instances), acl:relcl (12; 0% instances), parataxis (10; 0% instances), xcomp (9; 0% instances), advcl (8; 0% instances), csubj (5; 0% instances), obl:arg (4; 0% instances), obj (3; 0% instances), acl (2; 0% instances), aux (2; 0% instances), discourse (2; 0% instances), det:nummod (1; 0% instances), fixed (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (2218; 33% instances), NUM (1184; 17% instances), NOUN (894; 13% instances), ADP (529; 8% instances), DET (443; 7% instances), ADV (298; 4% instances), CCONJ (280; 4% instances), SYM (232; 3% instances), PART (181; 3% instances), ADJ (144; 2% instances), AUX (98; 1% instances), PROPN (82; 1% instances), SCONJ (81; 1% instances), VERB (48; 1% instances), PRON (34; 1% instances), X (22; 0% instances)