This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

NUM: numeral

Definition

A numeral is a word that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Both cardinal and ordinal numerals get the POS tag NUM. Words such as paar “pair”, paarkümmend “about twenty”, paarsada “about two hundred” and tosin “dozen” are also labelled NUM.


Treebank Statistics (UD_Estonian)

There are 773 NUM lemmas (3%), 963 NUM types (2%) and 4131 NUM tokens (2%). Out of 15 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.
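The three counts above can be reproduced from a treebank in CoNLL-U format. The following is a minimal sketch, not the script that generated this page: the function name `count_tag` and the sample rows are hypothetical, and it assumes that types are distinct word forms compared case-sensitively.

```python
# Hypothetical four-token sample in CoNLL-U (10 tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = (
    "# text = kaks kolm kahe maja\n"
    "1\tkaks\tkaks\tNUM\t_\t_\t0\troot\t_\t_\n"
    "2\tkolm\tkolm\tNUM\t_\t_\t1\tconj\t_\t_\n"
    "3\tkahe\tkaks\tNUM\t_\t_\t4\tnummod\t_\t_\n"
    "4\tmaja\tmaja\tNOUN\t_\t_\t1\tconj\t_\t_\n"
)

def count_tag(conllu_text, upos="NUM"):
    """Return (number of lemmas, number of types, number of tokens) for one tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM, case-sensitive (an assumption)
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens

print(count_tag(SAMPLE))  # prints (2, 3, 3)
```

In the sample, kaks and kahe share the lemma kaks, so three tokens yield three types but only two lemmas.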

The 10 most frequent NUM lemmas: kaks, üks, kolm, miljon, viis, kümme, neli, pool, paar, 000

The 10 most frequent NUM types: kaks, kolm, 000, üks, kahe, miljonit, ühe, paar, viis, neli

The 10 most frequent ambiguous lemmas: üks (PRON 495, NUM 207), viis (NUM 114, NOUN 27), pool (NUM 98, NOUN 49, ADV 20, ADP 17), paar (NUM 97, NOUN 12), 2000 (NUM 48, ADJ 1), seitse (NUM 48, NOUN 1), sada (NUM 45, VERB 4, NOUN 2), kolmandik (NUM 18, NOUN 6), 2001 (NUM 12, ADJ 1), 1990 (NUM 11, ADJ 1)

The 10 most frequent ambiguous types: üks (PRON 152, NUM 70), ühe (PRON 70, NUM 58), paar (NUM 50, NOUN 3), viis (NUM 57, VERB 18, NOUN 5), 2000 (NUM 47, ADJ 1), pool (NUM 37, ADV 20, ADP 16, NOUN 9), poole (ADP 116, NUM 31, NOUN 6, ADV 2), seitse (NUM 28, NOUN 1), kuus (NUM 27, NOUN 27), paari (NUM 17, NOUN 6)

Morphology

The form / lemma ratio of NUM is 1.245796 (the average of all parts of speech is 1.839644).
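The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given in the statistics above. A short check, assuming that this is how the figure was derived:

```python
num_lemmas = 773  # NUM lemmas reported in the treebank statistics
num_types = 963   # NUM types reported in the treebank statistics

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # prints 1.245796
```

A ratio close to 1 means that most NUM lemmas occur in only one form, which fits the large share of digit-only tokens.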

The highest number of forms (17) was observed with the lemma “üks”: ühe, ühe-, ühega, üheks, ühel, ühele, ühelgi, ühelt, ühena, ühes, ühest, üht, ühte, ühtegi, ühtki, üks, ükski.

The 2nd highest number of forms (13) was observed with the lemma “kümme”: Kümned, kümme, kümmet, kümne, kümnega, kümneid, kümneks, kümnel, kümnele, kümnest, kümnete, kümnetesse, kümnetest.

The 3rd highest number of forms (12) was observed with the lemma “miljon”: Miljonitel, miljon, miljoneid, miljoni, miljonid, miljoniga, miljonilt, miljonini, miljonist, miljonit, miljonite, miljonitest.

NUM occurs with 6 features: NumType (4131; 100% instances), NumForm (4077; 99% instances), Case (3275; 79% instances), Number (3275; 79% instances), Hyph (6; 0% instances), PronType (1; 0% instances)

NUM occurs with 22 feature-value pairs: Case=Abl, Case=Add, Case=Ade, Case=All, Case=Com, Case=Ela, Case=Ess, Case=Gen, Case=Ill, Case=Ine, Case=Nom, Case=Par, Case=Ter, Case=Tra, Hyph=Yes, NumForm=Digit, NumForm=Letter, NumType=Card, NumType=Ord, Number=Plur, Number=Sing, PronType=Ind

NUM occurs with 50 feature combinations. The most frequent feature combination is Case=Nom|Number=Sing|NumForm=Digit|NumType=Card (983 tokens). Examples: 000, 2000, 1997, 1999, 15, 1998, 20, 2002, 50, 1
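The feature, feature-value pair and feature combination counts all come from the FEATS column, where pairs are joined with `|` and written as `Feature=Value`. The sketch below shows one way to tally them. The helper `parse_feats` and the three sample values are hypothetical and are not taken from the treebank.

```python
from collections import Counter

def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; "_" means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# Hypothetical FEATS values for three NUM tokens.
feats_column = [
    "Case=Nom|Number=Sing|NumForm=Digit|NumType=Card",
    "Case=Nom|Number=Sing|NumForm=Digit|NumType=Card",
    "Case=Gen|Number=Sing|NumForm=Letter|NumType=Card",
]

features = Counter()      # feature name -> tokens carrying it
pairs = Counter()         # "Feature=Value" -> tokens carrying it
combinations = Counter()  # full FEATS string -> tokens carrying it
for feats in feats_column:
    combinations[feats] += 1
    for name, value in parse_feats(feats).items():
        features[name] += 1
        pairs[f"{name}={value}"] += 1

# Most frequent combination and its token count.
top_combination, top_count = combinations.most_common(1)[0]
```

With these sample values, `len(features)`, `len(pairs)` and `len(combinations)` correspond to the "6 features", "22 feature-value pairs" and "50 feature combinations" figures on this page.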

Relations

NUM nodes are attached to their parents using 15 different relations: nummod (3386; 82% instances), compound (342; 8% instances), conj (108; 3% instances), root (98; 2% instances), nsubj (80; 2% instances), dobj (52; 1% instances), parataxis (27; 1% instances), nsubj:cop (19; 0% instances), dep (5; 0% instances), acl:relcl (4; 0% instances), list (3; 0% instances), nmod (3; 0% instances), csubj (2; 0% instances), amod (1; 0% instances), name (1; 0% instances)

Parents of NUM nodes belong to 11 different parts of speech: NOUN (2584; 63% instances), NUM (566; 14% instances), VERB (462; 11% instances), PROPN (279; 7% instances), ROOT (98; 2% instances), ADJ (82; 2% instances), ADV (39; 1% instances), SYM (9; 0% instances), ADP (8; 0% instances), X (3; 0% instances), AUX (1; 0% instances)
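Both parent statistics can be read off the HEAD and DEPREL columns: the relation is the token's own DEPREL, and the parent's part of speech is the UPOS of the token that HEAD points to, with HEAD 0 counted as ROOT. A minimal sketch on a hypothetical sentence follows. The tuple layout and the name `parent_stats` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sentence: (ID, FORM, UPOS, HEAD, DEPREL); HEAD 0 marks the root.
SENTENCE = [
    (1, "kaks", "NUM", 2, "nummod"),
    (2, "maja", "NOUN", 0, "root"),
    (3, "kolm", "NUM", 4, "nummod"),
    (4, "autot", "NOUN", 2, "conj"),
]

def parent_stats(sentence, upos="NUM"):
    """Count incoming relations and parent tags for tokens with the given tag."""
    pos_by_id = {tok[0]: tok[2] for tok in sentence}
    pos_by_id[0] = "ROOT"  # artificial root node, as in the list above
    relations, parent_pos = Counter(), Counter()
    for _id, _form, tag, head, deprel in sentence:
        if tag == upos:
            relations[deprel] += 1
            parent_pos[pos_by_id[head]] += 1
    return relations, parent_pos

relations, parent_pos = parent_stats(SENTENCE)
```

Here both numerals attach to a noun via nummod, the pattern that accounts for most NUM tokens in the treebank.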

2623 (63%) NUM nodes are leaves.

1115 (27%) NUM nodes have one child.

250 (6%) NUM nodes have two children.

143 (3%) NUM nodes have three or more children.

The highest child degree of a NUM node is 14.
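The child degree of a node is the number of tokens whose HEAD points to it. The four counts above (2623 + 1115 + 250 + 143) add up to the 4131 NUM tokens, so every NUM node falls into exactly one class. The sketch below computes the same kind of distribution for a hypothetical sentence. The tuple layout and the name `child_degrees` are assumptions.

```python
from collections import Counter

# Hypothetical sentence: (ID, FORM, UPOS, HEAD, DEPREL); HEAD 0 marks the root.
SENTENCE = [
    (1, "umbes", "ADV", 2, "advmod"),
    (2, "kaks", "NUM", 3, "nummod"),
    (3, "maja", "NOUN", 0, "root"),
    (4, "kolm", "NUM", 3, "conj"),
]

def child_degrees(sentence, upos="NUM"):
    """Map child degree -> number of nodes with the given tag having that degree."""
    # How many tokens name each ID as their HEAD.
    children = Counter(tok[3] for tok in sentence)
    # A Counter returns 0 for IDs that never occur as a HEAD, i.e. leaves.
    return Counter(children[tok[0]] for tok in sentence if tok[2] == upos)

degrees = child_degrees(SENTENCE)
highest = max(degrees)  # highest child degree among the selected nodes
```

In the sample, kaks has one child (umbes) and kolm is a leaf.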

Children of NUM nodes are attached using 21 different relations: punct (440; 20% instances), advmod (383; 17% instances), compound (339; 15% instances), nmod (318; 14% instances), nummod (156; 7% instances), conj (124; 6% instances), case (100; 4% instances), amod (97; 4% instances), cc (80; 4% instances), det (53; 2% instances), nsubj:cop (44; 2% instances), cop (41; 2% instances), appos (12; 1% instances), advcl (8; 0% instances), parataxis (8; 0% instances), mark (6; 0% instances), nmod:poss (6; 0% instances), dep (5; 0% instances), advmod:quant (4; 0% instances), nsubj (2; 0% instances), xcomp (1; 0% instances)

Children of NUM nodes belong to 13 different parts of speech: NUM (566; 25% instances), PUNCT (440; 20% instances), ADV (396; 18% instances), NOUN (343; 15% instances), ADJ (109; 5% instances), ADP (100; 4% instances), PRON (83; 4% instances), CONJ (79; 4% instances), VERB (54; 2% instances), PROPN (46; 2% instances), SCONJ (6; 0% instances), SYM (4; 0% instances), X (1; 0% instances)


NUM in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]