NUM

`NUM`: numeral

Definition

A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.

Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows 7) and whether they are expressed as words (čtyři), digits (4) or Roman numerals (IV).
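The three written forms mentioned above correspond to the NumForm values Digit, Roman and Word that appear in the feature statistics of this page. As a rough illustration only, the sketch below guesses that value from the token string alone. The function name `guess_num_form` and both regular expressions are assumptions made for this example, not part of any UD tool, and a form-only heuristic cannot decide whether a token is a numeral in the first place (a capitalised conjunction "I" looks exactly like the Roman numeral I).

```python
import re

# Hypothetical helper, not part of any UD tool: guess NumForm from the string.
# Digits, optionally with "." or "," separators (e.g. 2014, 3.14159265359).
_DIGITS = re.compile(r"^\d+([.,]\d+)*$")
# Upper-case Roman numerals up to the thousands (e.g. IV, MMXIV).
_ROMAN = re.compile(
    r"^(?=[MDCLXVI])M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)

def guess_num_form(form: str) -> str:
    """Return 'Digit', 'Roman' or 'Word' for a token already tagged NUM."""
    if _DIGITS.match(form):
        return "Digit"
    if _ROMAN.match(form):
        return "Roman"
    return "Word"

for token in ["4", "IV", "čtyři", "MMXIV", "3.14159265359"]:
    print(token, guess_num_form(token))
```

The heuristic is only meant to be applied to tokens that have already been identified as cardinal numerals by other means.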

Czech grammar distinguishes several subclasses of pronominal numerals (quantifiers): interrogative and relative (kolik “how many”); demonstrative (tolik “this many”); indefinite (několik, mnoho, málo “several, many, few”). These words behave similarly to (most) cardinal numerals, e.g. they require the counted noun phrase to be in the genitive. Unlike their English counterparts, they do not behave like adjectives. Nevertheless, in accord with the UD standard, they should be tagged DET, not NUM.

In addition, several types of (non-pronominal) numerals, such as ordinal numerals and multiplicative numerals, are tagged ADJ or ADV, based on their syntactic and morphological behavior.

Examples

0, 1, 2, 3, 4, 5, 2014, 1000000, 3.14159265359
I, II, III, IV, V, MMXIV
jeden, dva, tři, čtyři, pět, sedmdesát “one, two, three, four, five, seventy”
polovina, třetina, čtvrtina, pětina “one half, one third, one quarter, one fifth”: denominators of fractions constitute a separate class of cardinal numerals.
čtvero, patero “four, five” (These are special forms, so-called generic numerals. They are used rarely, in literary or archaic style.)
jedny, dvoje, troje, čtvery, patery, sedmdesátery “one set of, two sets of, three sets of, four sets of, five sets of, seventy sets of”

Counterexamples

první, druhý, třetí “first, second, third”: adjectival ordinal numerals. They are tagged ADJ, and the cs-feat/NumType feature reveals their semantic relation to numbers.
poprvé, podruhé, potřetí “for the first time, for the second time, for the third time”: adverbial ordinal numerals. They are tagged ADV, and the cs-feat/NumType feature reveals their semantic relation to numbers.
jednou, dvakrát, třikrát “once, twice, three times”: multiplicative numerals. They are tagged ADV, and the cs-feat/NumType feature reveals their semantic relation to numbers.
dvojí, trojí, čtverý, paterý, sedmdesáterý “two kinds of, three kinds of, four kinds of, five kinds of, seventy kinds of”: generic numerals. They are tagged ADJ.
dvojice, trojice, čtveřice “pair, triplet, foursome”: n-tuples (n-tice) are not considered numerals in Czech grammar. They are tagged NOUN.
jednička, dvojka, trojka, čtyřka, pětka “number one, number two, number three, number four, number five”: names of numbers, or of objects identified by the number (e.g. of a bus route). They are not considered numerals and they are tagged NOUN.
tisíc, milión, miliarda, bilión “thousand, million, billion, trillion”: words for large quantities are ambiguous between cardinal numerals (tagged NUM) and nouns. If they inflect as nouns, they are tagged NOUN; but the borderline is fuzzy. For instance, in phrases like tisíce lidí demonstrovaly v ulicích (“thousands of people demonstrated in the streets”), tisíce is a noun. In numeric expressions, e.g. 110 tisíc dolarů (“110 thousand dollars”), it is a cardinal numeral.

Treebank Statistics (UD_Czech)

There are 3435 NUM lemmas (6%), 3542 NUM types (3%) and 41510 NUM tokens (3%). Out of 17 observed tags, NUM ranks 5th by number of lemmas, 5th by number of types and 10th by number of tokens.
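Counts of this kind can be obtained from a treebank in CoNLL-U format, where each token line has ten tab-separated columns and FORM, LEMMA and UPOS are the second, third and fourth. The following is a minimal sketch, not the script that generated this page: the function name `upos_counts` and the toy sample are made up for illustration, and it assumes that types are counted case-sensitively, which the page does not state.

```python
from collections import defaultdict

def upos_counts(conllu_text: str) -> dict:
    """Map each UPOS tag to (distinct lemmas, distinct forms, tokens)."""
    lemmas, types, tokens = defaultdict(set), defaultdict(set), defaultdict(int)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        types[upos].add(form)
        tokens[upos] += 1
    return {u: (len(lemmas[u]), len(types[u]), tokens[u]) for u in tokens}

def row(tok_id, form, lemma, upos):
    # Toy CoNLL-U line; the columns not needed here are left as "_".
    return "\t".join([str(tok_id), form, lemma, upos] + ["_"] * 6)

# Made-up sample, for illustration only.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row(1, "dva", "dva", "NUM"),
    row(2, "dvě", "dva", "NUM"),
    row(3, "roky", "rok", "NOUN"),
    "",
])

print(upos_counts(SAMPLE))
```

In the toy sample the two NUM tokens share one lemma but have two distinct forms, which is the distinction between the lemma and type counts.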

The 10 most frequent NUM lemmas: jeden, dva, 1, tři, 2, oba, 3, 4, pět, čtyři

The 10 most frequent NUM types: 1, 2, 3, dva, tři, 4, jeden, 6, dvě, tisíc

The 10 most frequent ambiguous lemmas: jeden (NUM 2526, ADJ 31), tři (NUM 1207, ADJ 1), pět (NUM 625, VERB 1), tisíc (NUM 539, NOUN 330, ADV 1), 12 (NUM 307, ADV 1), osm (NUM 236, ADJ 1), I (NUM 97, PROPN 62, ADJ 17, PRON 16), půl (NOUN 177, NUM 64), třináct (NUM 53, ADJ 1), sto (NOUN 304, NUM 41)

The 10 most frequent ambiguous types: tisíc (NUM 538, NOUN 92), dvou (NUM 519, ADJ 1), 12 (NUM 306, ADV 1), tří (NUM 239, ADJ 3), jedno (NUM 152, ADJ 1), jednou (ADV 165, NUM 129), čtyř (NUM 100, ADJ 1), I (CONJ 465, NUM 97, PROPN 62, ADJ 19, PRON 6, NOUN 1), osmi (NUM 91, ADJ 1), půl (NOUN 164, NUM 64)

tisíc
- NUM 538: Ročně vyprodukovaných 280 - 350 tisíc tun popelovin se musí ukládat .
- NOUN 92: Pak je tu jediný problém , a sice uplatnit všech tisíc bodů .
dvou
- NUM 519: Kompletní informace pro drobného investora v LN na dvou stránkách
- ADJ 1: Izraelský premiér Jicchak Rabin včera prohlásil , že palestinský předák Jásir Arafat požádal o dvou až třítýdenní odklad , který by umožnil Palestincům připravit se na převzetí správy nad autonomními územími v pásmu Gazy a v Jerichu na západním břehu Jordánu .
12
- NUM 306: Přenosová rychlost : ( A 4 / sec ) 12
- ADV 1: B . Clemensha , D . Simonich , P . Batista měřili od r . 1972 do r . 1987 výšku sodíkové vrstvy a zjistili , že výška této vrstvy klesá v průměru o 50 m ( 12 m ) ročně .
tří
- NUM 239: Počet policistů by měl do dvou až tří let odpovídat potřebám policie .
- ADJ 3: Ubytování ve tří , čtyř a pětilůžkových pokojích s vlastním sociálním zařízením .
jedno
- NUM 152: Když jedno chybí , nepodaří se to .
- ADJ 1: Ke snížení úroků z depozit dochází u T - Kont ( o 1.5 - 2 % ) , u vkladových certifikátů s výjimkou jedno - a dvouměsíčních certifikátů ( o 0 , 2 až 2.5 % ) a u vkladů právnických osob a fyzických osob - podnikatelů na tři , šest a devět měsíců ( o 0.15 až 0.8 % ) .
jednou
- ADV 165: Až jednou . . .
- NUM 129: Finanční otázka je jednou stránkou věci , druhou je otázka technická .
čtyř
- NUM 100: Přítomni byli také zástupci čtyř bank .
- ADJ 1: Ubytování ve tří , čtyř a pětilůžkových pokojích s vlastním sociálním zařízením .
I
- CONJ 465: I velké firmy se specializují jen na několik málo teritorií .
- NUM 97: KAREL HAVLÍČEK BOROVSKÝ , Dílo I
- PROPN 62: Akademikem se zato stal známý teoretik antisemitismu I . Šafarevič .
- ADJ 19: S V . I . P . prostory však prý byla na obou stadionech spokojenost . . .
- PRON 6: Jen zřídkakdy Moby bere do svých rukou i hardcoreovou kytaru ( All That I Need Is To Be Loved ) .
- NOUN 1: V Soluně například stojí 0.11 karátový diamant ( barvy I , velmi dobrého až dobrého brusu a čistoty SI 1 ) včetně DPH 2700 korun ( do konce dubna ho pořídíte za 2400 korun ) .
osmi
- NUM 91: K účasti je letos přihlášeno třicet osm sborů z osmi zemí Evropy .
- ADJ 1: Podle názoru Tomáše Duba z ministerstva hospodářství představuje osmi až desetiprocentní odhad podílu šedé ekonomiky na HDP , který provedla ČNB , spíše spodní hranici reálného stavu .
půl
- NOUN 164: Při troše štěstí získáte za čtyři roky půl milionu z nájemného *
- NUM 64: Ty mají většinou do půl karátu .

Morphology

The form / lemma ratio of NUM is 1.031150 (the average of all parts of speech is 2.195970).
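The ratio for NUM appears to be the number of NUM types divided by the number of NUM lemmas reported in the statistics above (3542 and 3435). Under that assumption, a quick check reproduces the figure:

```python
# Figures taken from the statistics on this page; the interpretation of the
# ratio as types / lemmas is an assumption that this check supports.
num_types = 3542
num_lemmas = 3435

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.031150
```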

The highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním

The 2nd highest number of forms (8) was observed with the lemma “třetina”: třetin, třetina, třetinou, třetinu, třetiny, třetinách, třetinám, třetině

The 3rd highest number of forms (7) was observed with the lemma “čtvrtina”: čtvrtina, čtvrtinami, čtvrtinou, čtvrtinu, čtvrtiny, čtvrtinách, čtvrtině

NUM occurs with 10 features: cs-feat/NumType (41510; 100% instances), cs-feat/NumForm (41168; 99% instances), cs-feat/Number (11649; 28% instances), cs-feat/Case (11623; 28% instances), cs-feat/NumValue (8050; 19% instances), cs-feat/Gender (4759; 11% instances), cs-feat/Animacy (303; 1% instances), cs-feat/Foreign (29; 0% instances), cs-feat/NameType (20; 0% instances), cs-feat/Style (2; 0% instances)

NUM occurs with 25 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NameType=Com, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumValue=1,2,3, Number=Dual, Number=Plur, Number=Sing, Style=Arch

NUM occurs with 59 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (29484 tokens). Examples: 1, 2, 3, 4, 6, 5, 1992, 10, 1994, 1993
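A feature combination such as NumForm=Digit|NumType=Card is the content of the FEATS column in CoNLL-U: pairs are separated by `|`, name and value by `=`, and multiple values of one feature by commas (as in Gender=Fem,Neut). The sketch below shows one way to read and write such strings; the function names `parse_feats` and `format_feats` are chosen for this example only.

```python
def parse_feats(feats: str) -> dict:
    """Turn 'A=x|B=y,z' into {'A': ['x'], 'B': ['y', 'z']}; '_' means no features."""
    if feats == "_":
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        parsed[name] = value.split(",")
    return parsed

def format_feats(parsed: dict) -> str:
    """Inverse of parse_feats; feature names are sorted case-insensitively."""
    if not parsed:
        return "_"
    names = sorted(parsed, key=str.lower)
    return "|".join(f"{name}={','.join(parsed[name])}" for name in names)

feats = parse_feats("NumForm=Digit|NumType=Card")
print(feats)
print(format_feats(feats))
```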

Relations

NUM nodes are attached to their parents using 21 different relations: cs-dep/nummod (19668; 47% instances), cs-dep/nummod:gov (7353; 18% instances), cs-dep/conj (4243; 10% instances), cs-dep/compound (2797; 7% instances), cs-dep/dep (1948; 5% instances), cs-dep/advmod (1880; 5% instances), cs-dep/root (1219; 3% instances), cs-dep/dobj (937; 2% instances), cs-dep/nsubj (743; 2% instances), cs-dep/appos (286; 1% instances), cs-dep/nmod (124; 0% instances), cs-dep/xcomp (81; 0% instances), cs-dep/iobj (61; 0% instances), cs-dep/nsubjpass (58; 0% instances), cs-dep/advcl (39; 0% instances), cs-dep/acl (28; 0% instances), cs-dep/ccomp (25; 0% instances), cs-dep/parataxis (10; 0% instances), cs-dep/advmod:emph (5; 0% instances), cs-dep/csubj (3; 0% instances), cs-dep/cc (2; 0% instances)

Parents of NUM nodes belong to 15 different parts of speech: NOUN (26036; 63% instances), NUM (6329; 15% instances), VERB (3765; 9% instances), PROPN (2561; 6% instances), ROOT (1219; 3% instances), ADJ (773; 2% instances), ADV (318; 1% instances), SYM (254; 1% instances), PRON (196; 0% instances), CONJ (28; 0% instances), PUNCT (24; 0% instances), DET (4; 0% instances), ADP (1; 0% instances), INTJ (1; 0% instances), PART (1; 0% instances)

24326 (59%) NUM nodes are leaves.

9736 (23%) NUM nodes have one child.

4008 (10%) NUM nodes have two children.

3440 (8%) NUM nodes have three or more children.

The highest child degree of a NUM node is 85.
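The child degree of a node is the number of tokens whose HEAD points to it; a leaf has degree 0. As a hedged sketch of how such a distribution could be computed for one sentence, the function below takes a list of (ID, UPOS, HEAD) triples. The function name `child_degree_histogram` and the toy tree are made up for illustration and do not come from the treebank.

```python
from collections import Counter

def child_degree_histogram(sentence, upos="NUM"):
    """sentence: list of (id, upos, head) triples for one tree; head 0 is the root.

    Returns a Counter mapping child degree -> number of nodes with the given tag.
    """
    # How many tokens name each id as their head.
    n_children = Counter(head for _, _, head in sentence)
    # Counter returns 0 for ids that never occur as a head, i.e. for leaves.
    return Counter(
        n_children[tok_id] for tok_id, tag, _ in sentence if tag == upos
    )

# Toy tree (made up): token 2 is the root; token 3 has one child (token 4).
TOY = [
    (1, "NUM", 2),
    (2, "NOUN", 0),
    (3, "NUM", 2),
    (4, "PUNCT", 3),
]

hist = child_degree_histogram(TOY)
print(dict(hist))
```

Summing such histograms over all sentences of a treebank would give counts comparable to the leaf, one-child and two-children figures listed above.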

Children of NUM nodes are attached using 27 different relations: cs-dep/punct (11568; 37% instances), cs-dep/nmod (4349; 14% instances), cs-dep/conj (3939; 12% instances), cs-dep/compound (2797; 9% instances), cs-dep/case (2110; 7% instances), cs-dep/advmod:emph (2020; 6% instances), cs-dep/cc (1359; 4% instances), cs-dep/dep (792; 3% instances), cs-dep/amod (616; 2% instances), cs-dep/cop (469; 1% instances), cs-dep/nsubj (390; 1% instances), cs-dep/mark (305; 1% instances), cs-dep/advmod (244; 1% instances), cs-dep/appos (244; 1% instances), cs-dep/nummod (99; 0% instances), cs-dep/parataxis (50; 0% instances), cs-dep/acl (46; 0% instances), cs-dep/csubj (33; 0% instances), cs-dep/xcomp (30; 0% instances), cs-dep/advcl (25; 0% instances), cs-dep/dobj (20; 0% instances), cs-dep/det:nummod (16; 0% instances), cs-dep/aux (9; 0% instances), cs-dep/neg (7; 0% instances), cs-dep/discourse (4; 0% instances), cs-dep/foreign (1; 0% instances), cs-dep/vocative (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (11568; 37% instances), NUM (6329; 20% instances), NOUN (4435; 14% instances), ADP (2094; 7% instances), ADV (1496; 5% instances), CONJ (1238; 4% instances), SYM (922; 3% instances), PART (912; 3% instances), ADJ (784; 2% instances), VERB (715; 2% instances), PROPN (381; 1% instances), PRON (342; 1% instances), SCONJ (301; 1% instances), DET (16; 0% instances), AUX (9; 0% instances), INTJ (1; 0% instances)
