nummod: numeric modifier
A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun with a quantity.
Jan snědl tři řízky . \n Jan ate three steaks .
nummod(řízky, tři)
nummod(steaks, three)
Agreement and government with Czech quantifiers
The morphological and syntactic behavior of Czech numerals is a complex matter. Small cardinal numerals jeden “one”, dva “two”, tři “three” and čtyři “four” agree with the counted noun in Case (jeden also agrees in Gender and Number; dva also agrees in Gender). They behave as if they modify the counted noun; they are similar to adjectives in this respect. Examples:
- Jeden muž spal, dva muži hráli karty. “One man slept, two men played cards.”
- Jedna žena spala, dvě ženy hrály karty. “One woman slept, two women played cards.”
- Jedno kotě spalo, dvě koťata si hrála. “One kitten slept, two kittens played.”
In PDT, these numerals are attached to their counted nouns as Atr (attribute). It is straightforward to convert such dependencies to nummod:
Jedno kotě spalo . \n One kitten slept .
nummod(kotě, Jedno)
nsubj(spalo, kotě)
punct(spalo, .-4)
nummod(kitten, One)
nsubj(slept, kitten)
punct(slept, .-9)
Larger cardinals behave differently. They require that the counted noun be in the genitive case; this indicates that they actually govern the noun. Such constructions are parallel to nouns modified by other noun phrases in genitive. The whole phrase (numeral + counted noun) behaves as a noun phrase in neuter gender and singular number (which is important for subject-verb agreement).
- Pět mužů hrálo karty. “Five men played cards.”
- Skupina mužů hrála karty. “A group of men played cards.”
In PDT, these numerals are analyzed as heads of the counted nouns, which are attached to the numeral as Atr:
# This is not UD, it is Prague Dependency Treebank, and we want to clearly distinguish it from the UD examples.
# visual-style nodes yellow
# visual-style arcs blue
1 Pět pět NUM _ Case=Nom 3 Sb _ Five
2 mužů muž NOUN _ Case=Gen|Gender=Masc|Number=Plur 1 Atr _ men
3 hrálo hrát VERB _ Gender=Neut|Number=Sing 0 Pred _ played
4 karty karta NOUN _ Case=Acc|Gender=Fem|Number=Plur 3 Obj _ cards
5 . . PUNCT _ _ 0 AuxK _ .
There are both advantages and drawbacks to this solution. On the one hand, it accurately reflects the agreement in case, gender and number. On the other hand, it is confusing that there are two different analyses of counted noun constructions, depending on the numeric value.
Moreover, the numeral does not govern the noun in all morphological cases. The following table shows the case of the whole phrase (numeral + noun; first column) and the consequences for the case of the parts (note that these numerals have only two distinct morphological forms, resulting in homonymy).
Phrase Case | Example | Numeral Case | Noun Case |
---|---|---|---|
Nom | pět mužů | Nom | Gen |
Gen | pěti mužů | Gen | Gen |
Dat | pěti mužům | Dat | Dat |
Acc | pět mužů | Acc | Gen |
Voc | pět mužů | Voc | Gen |
Loc | pěti mužích | Loc | Loc |
Ins | pěti muži | Ins | Ins |
We can say that the noun has the case of the whole phrase if it is dative, locative or instrumental. The numeral then agrees with the noun in case. The numeral forces the noun to the genitive case if the whole phrase is nominative, accusative or vocative (but the vocative usage is rather hypothetical). In genitive, the noun and the numeral agree with each other; but note that the numeral uses its inflected form, as in the other cases where it agrees with the noun.
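The table above can be restated as a small lookup. The following is only an illustrative sketch: the function name `part_cases` and the constant names are hypothetical, and it covers only the high-value numerals (pět and above) described in this section.

```python
# Cases of the whole phrase in which the numeral forces the noun into genitive.
GOVERNING_CASES = {"Nom", "Acc", "Voc"}
ALL_CASES = {"Nom", "Gen", "Dat", "Acc", "Voc", "Loc", "Ins"}


def part_cases(phrase_case):
    """Return (numeral_case, noun_case, numeral_governs) for a counted
    noun phrase with a high-value numeral such as "pět"."""
    if phrase_case not in ALL_CASES:
        raise ValueError(f"unknown case: {phrase_case}")
    if phrase_case in GOVERNING_CASES:
        # The numeral carries the case of the phrase; the noun is genitive.
        return phrase_case, "Gen", True
    # Gen, Dat, Loc, Ins: numeral and noun agree in case.
    return phrase_case, phrase_case, False


for case in ("Nom", "Gen", "Ins"):
    print(case, part_cases(case))
```

Genitive is grouped with the agreeing cases here because both words surface in genitive and the numeral uses its inflected form (pěti), as the paragraph above notes.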
In PDT, the genitive, dative, locative and instrumental cases are analyzed in parallel to the low-value numerals, i.e. the noun governs the numeral:
# This is not UD, it is Prague Dependency Treebank, and we want to clearly distinguish it from the UD examples.
# visual-style nodes yellow
# visual-style arcs blue
1 Hrál hrát VERB _ Gender=Masc|Number=Sing 0 Pred _ He-played
2 karty karta NOUN _ Case=Acc|Gender=Fem|Number=Plur 1 Obj _ cards
3 s s ADP _ _ 1 AuxP _ with
4 pěti pět NUM _ Case=Ins 6 Atr _ five
5 dalšími další ADJ _ Case=Ins|Gender=Masc|Number=Plur 6 Atr _ other
6 muži muž NOUN _ Case=Ins|Gender=Masc|Number=Plur 3 Obj _ men
7 . . PUNCT _ _ 0 AuxK _ .
High-value numerals where the lowest-order digit is more than zero and less than five (e.g. 21, 22, 23, 24) may behave both ways:
- dvacet dva muži (noun governs numeral)
- dvacet dva mužů (numeral governs noun)
- dvaadvacet mužů (alternative form; it does not end with dva, thus the numeral governs the noun)
- 22 muži (assuming the reader will pronounce 22 as dvacet dva, not dvaadvacet)
- 22 mužů (pronounced either way)
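As a rough illustration, the three patterns can be keyed to the numeric value. The sketch below is a simplification and not part of any conversion tool: the function name `counting_pattern` is hypothetical. Treating 11 to 14 as governing is an assumption extrapolated from the dvaadvacet example, since those numerals are single words that do not end in jeden, dva, tři or čtyři.

```python
def counting_pattern(n):
    """Classify a positive integer by how its Czech numeral combines
    with the counted noun: 'agrees', 'governs' or 'either'."""
    if n < 1:
        raise ValueError("only positive integers are covered by this sketch")
    if n <= 4:
        return "agrees"    # jeden, dva, tři, čtyři behave like adjectives
    last_digit = n % 10
    last_two = n % 100
    # Assumption: 11-14 are single words (jedenáct ... čtrnáct), so they
    # are treated like other high-value numerals.
    if 1 <= last_digit <= 4 and not 11 <= last_two <= 14:
        return "either"    # e.g. dvacet dva muži / dvacet dva mužů
    return "governs"       # e.g. pět mužů, dvacet mužů


for n in (2, 5, 12, 20, 22):
    print(n, counting_pattern(n))
```

For numbers written in digits, the actual choice depends on how the reader pronounces them, so a value-based rule like this can only narrow down the options.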
Pronominal quantifiers behave as high-value numerals and govern the quantified nouns:
- Kolik mužů hrálo karty? “How many men played cards?”
- Několik (mnoho, málo) mužů hrálo karty. “Several (many, few) men played cards.”
- Tolik mužů hrát karty jsem ještě neviděl. “I have never seen so many men playing cards.”
# This is not UD, it is Prague Dependency Treebank, and we want to clearly distinguish it from the UD examples.
# visual-style nodes yellow
# visual-style arcs blue
1 Kolik kolik NUM _ Case=Nom 3 Sb _ How-many
2 mužů muž NOUN _ Case=Gen|Gender=Masc|Number=Plur 1 Atr _ men
3 hrálo hrát VERB _ Gender=Neut|Number=Sing 0 Pred _ played
4 karty karta NOUN _ Case=Acc|Gender=Fem|Number=Plur 3 Obj _ cards
5 ? ? PUNCT _ _ 0 AuxK _ ?
The UD conversion of the PDT data unifies analyses of counted noun phrases and uses a structure that is parallel among all the above cases, and also with universal dependencies in other languages. The counted noun is always the head and the numeral is always attached as its modifier. Nevertheless, we use different relation labels to mark situations where the numeral (or quantifier) actually governs the morphological case of the noun. There are four labels used:
Case government | Numeric | Pronominal |
---|---|---|
Noun governs | nummod | det:nummod |
Numeral governs | nummod:gov | det:numgov |
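The choice among the four labels depends on two yes/no properties, so it can be sketched as a simple lookup. The names `LABELS` and `quantifier_label` are hypothetical and only serve this illustration.

```python
# (is_pronominal, quantifier_governs) -> relation label
LABELS = {
    (False, False): "nummod",
    (False, True): "nummod:gov",
    (True, False): "det:nummod",
    (True, True): "det:numgov",
}


def quantifier_label(is_pronominal, quantifier_governs):
    """Pick the relation label for a numeral or pronominal quantifier
    attached to its counted noun."""
    return LABELS[(bool(is_pronominal), bool(quantifier_governs))]


print(quantifier_label(False, False))  # tři muži
print(quantifier_label(False, True))   # pět mužů
print(quantifier_label(True, True))    # kolik mužů
print(quantifier_label(True, False))   # s kolika muži
```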
Tři muži hráli karty . \n Three men played cards .
nummod(muži, Tři)
nsubj(hráli, muži)
dobj(hráli, karty)
punct(hráli, .-5)
nummod(men, Three)
nsubj(played, men)
dobj(played, cards)
punct(played, .-11)
Pět mužů hrálo karty . \n Five men played cards .
nummod:gov(mužů, Pět)
nsubj(hrálo, mužů)
dobj(hrálo, karty)
punct(hrálo, .-5)
nummod:gov(men, Five)
nsubj(played, men)
dobj(played, cards)
punct(played, .-11)
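The re-attachment that turns the PDT analysis of Pět mužů hrálo karty into the UD tree above might be sketched as follows. This is a minimal illustration under stated assumptions, not the actual conversion code: the token dictionaries and the function name `rehead_governing_numerals` are hypothetical, only the numeral and noun are re-attached, the remaining PDT labels are left unconverted, and pronominal quantifiers (which would receive det:numgov) are not distinguished.

```python
def rehead_governing_numerals(tokens):
    """Return a copy of the tree in which every genitive noun attached
    as Atr to a numeral becomes the head of that numeral."""
    result = [dict(t) for t in tokens]          # do not modify the input
    by_id = {t["id"]: t for t in result}
    for noun in result:
        numeral = by_id.get(noun["head"])
        if (numeral is not None
                and numeral["upos"] == "NUM"
                and noun["upos"] == "NOUN"
                and noun["deprel"] == "Atr"
                and noun["case"] == "Gen"):
            # The noun takes over the numeral's place in the sentence ...
            noun["head"], noun["deprel"] = numeral["head"], numeral["deprel"]
            # ... and the numeral becomes its governing modifier.
            numeral["head"], numeral["deprel"] = noun["id"], "nummod:gov"
    return result


# PDT analysis of "Pět mužů hrálo karty ." as shown earlier in this section.
PDT_TREE = [
    {"id": 1, "form": "Pět", "upos": "NUM", "case": "Nom", "head": 3, "deprel": "Sb"},
    {"id": 2, "form": "mužů", "upos": "NOUN", "case": "Gen", "head": 1, "deprel": "Atr"},
    {"id": 3, "form": "hrálo", "upos": "VERB", "case": None, "head": 0, "deprel": "Pred"},
    {"id": 4, "form": "karty", "upos": "NOUN", "case": "Acc", "head": 3, "deprel": "Obj"},
    {"id": 5, "form": ".", "upos": "PUNCT", "case": None, "head": 0, "deprel": "AuxK"},
]

for tok in rehead_governing_numerals(PDT_TREE):
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

After the swap the noun still carries the PDT function Sb; a full converter would map it to nsubj in a separate step and would also have to move any other dependents of the numeral.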
Kolik mužů hrálo karty ? \n How-many men played cards ?
det:numgov(mužů, Kolik)
nsubj(hrálo, mužů)
dobj(hrálo, karty)
punct(hrálo, ?-5)
det:numgov(men, How-many)
nsubj(played, men)
dobj(played, cards)
punct(played, ?-11)
Hrál jsem karty s pěti muži . \n Played I-have cards with five men .
aux(Hrál, jsem)
dobj(Hrál, karty)
iobj(Hrál, muži)
case(muži, s)
nummod(muži, pěti)
punct(Hrál, .-7)
aux(Played, I-have)
dobj(Played, cards)
iobj(Played, men)
case(men, with)
nummod(men, five)
punct(Played, .-15)
Nepamatuji si , s kolika muži jsem hrál karty . \n I-do-not-remember myself , with how-many men I-have played cards .
ccomp(Nepamatuji, hrál)
compound:reflex(Nepamatuji, si)
punct(hrál, ,-3)
aux(hrál, jsem)
dobj(hrál, karty)
iobj(hrál, muži)
case(muži, s)
det:nummod(muži, kolika)
punct(Nepamatuji, .-10)
ccomp(I-do-not-remember, played)
compound:reflex(I-do-not-remember, myself)
punct(played, ,-14)
aux(played, I-have)
dobj(played, cards)
iobj(played, men)
case(men, with)
det:nummod(men, how-many)
punct(I-do-not-remember, .-21)
Additional remarks
In PDT the words milión “million”, miliarda “billion” and higher are usually tagged as nouns, not as numerals. In the typical case, the million is in genitive, it is preceded by a smaller number, and it is not followed by smaller numerals (as it is in million five hundred thousand). It is followed by the counted noun. Thus the following examples receive parallel analyses:
50 miliónů korun \n 50 millions of-crowns
nummod:gov(miliónů, 50-1)
nummod:gov(millions, 50-5)
nmod(miliónů, korun)
nmod(millions, of-crowns)
50 pytlů bankovek \n 50 sacks of-bills
nummod:gov(pytlů, 50-1)
nummod:gov(sacks, 50-5)
nmod(pytlů, bankovek)
nmod(sacks, of-bills)
On the other hand the word tisíc “thousand” may be a noun (na náměstí byly tisíce lidí “there were thousands of people in the square”) or a numeral:
nanejvýš 50 tisíc korun \n at-most 50 thousand crowns
advmod:emph(korun, nanejvýš)
nummod:gov(korun, tisíc)
compound(tisíc, 50-2)
advmod:emph(crowns, at-most)
nummod:gov(crowns, thousand)
compound(thousand, 50-7)
Note that the two numeral words in the above example are joined using the compound relation. Also note that the intensifier nanejvýš is attached to the head of the phrase (korun) and not to the number. This is in accord both with the UD guidelines and with the original PDT annotation of agreeing numerals (e.g. jen čtyři firmy, jen několik procent).
Similarly, other nodes (such as punctuation) may be attached to the head of the phrase even though they relate to the whole phrase rather than to the head noun itself:
( 9 dní ) \n ( 9 days )
punct(dní, (-1)
nummod:gov(dní, 9-2)
punct(dní, )-4)
punct(days, (-6)
nummod:gov(days, 9-7)
punct(days, )-9)
5 minut včetně seřízení \n 5 minutes including adjustment
nummod:gov(minut, 5-1)
nmod(minut, seřízení)
case(seřízení, včetně)
nummod:gov(minutes, 5-6)
nmod(minutes, adjustment)
case(adjustment, including)
Dates
# This is not UD, it is Prague Dependency Treebank, and we want to clearly distinguish it from the UD examples.
# visual-style nodes yellow
# visual-style arcs blue
1 Ředitel ředitel NOUN _ _ 2 Sb _ The-director
2 navrhl navrhnout VERB _ _ 0 Pred _ proposed
3 zrušit zrušit VERB _ _ 2 Obj _ to-disband
4 profesionální profesionální ADJ _ _ 5 Atr _ the-professional
5 scénu scéna NOUN _ _ 3 Obj _ scene
6 k k ADP _ _ 3 AuxP _ towards
7 31 31 NUM _ _ 9 Atr _ the-31
8 . . PUNCT _ _ 7 AuxG _ th
9 12 12 NUM _ _ 6 Adv _ December
10 . . PUNCT _ _ 9 AuxG _ .
Ředitel navrhl zrušit profesionální scénu k 31 . 12 . \n Director proposed to-disband professional scene towards 31 st December .
advmod(zrušit, 12)
case(12, k)
punct(12, .-10)
nummod(12, 31-7)
punct(31-7, .-8)
advmod(to-disband, December)
case(December, towards)
punct(December, .-21)
nummod(December, 31-18)
punct(31-18, st)
Numerals expressed using digits are labeled nummod even if they represent ordinal numerals, which would otherwise be labeled amod:
# This is not UD, it is Prague Dependency Treebank, and we want to clearly distinguish it from the UD examples.
# visual-style nodes yellow
# visual-style arcs blue
1 Letošní letošní ADJ _ _ 2 Atr _ This-year's
2 veletrh veletrh NOUN _ _ 4 Sb _ fair
3 se se PRON _ _ 4 AuxR _ itself
4 uskuteční uskutečnit VERB _ _ 0 Pred _ will-take-place
5 od od ADP _ _ 4 AuxP _ from
6 9 9 NUM _ _ 5 ExD _ 9
7 . . PUNCT _ _ 6 AuxG _ th
8 do do ADP _ _ 4 AuxP _ to
9 12 12 NUM _ _ 11 Atr _ 12
10 . . PUNCT _ _ 9 AuxG _ th
11 března březen NOUN _ _ 8 Adv _ March
12 . . PUNCT _ _ 0 AuxK _ .
Letošní veletrh se uskuteční od 9 . do 12 . března . \n This-year's fair itself will-take-place from 9 th to 12 th March .
advmod(uskuteční, března)
case(března, do)
nummod(března, 12-9)
remnant(12-9, 9-6)
remnant(do, od)
advmod(will-take-place, March)
case(March, to)
nummod(March, 12-22)
remnant(12-22, 9-19)
remnant(to, from)
Numbered objects
A house number in an address is attached as nummod to the name of the street:
v budově Na poříčí 12 \n in the-building Na poříčí 12
nmod(budově, poříčí-4)
case(poříčí-4, Na-3)
nummod(poříčí-4, 12-5)
nmod(the-building, poříčí-10)
case(poříčí-10, Na-9)
nummod(poříčí-10, 12-11)
Treebank Statistics (UD_Czech)
This relation is universal.
There is 1 language-specific subtype of nummod: nummod:gov.
19668 nodes (1%) are attached to their parents as nummod.
11411 instances of nummod (58%) are right-to-left (child precedes parent).
Average distance between parent and child is 1.57982509660362.
The following 11 pairs of parts of speech are connected with nummod: NOUN-NUM (17449; 89% instances), PROPN-NUM (1624; 8% instances), ADJ-NUM (260; 1% instances), SYM-NUM (152; 1% instances), NUM-NUM (99; 1% instances), PRON-NUM (30; 0% instances), CONJ-NUM (28; 0% instances), PUNCT-NUM (10; 0% instances), VERB-NUM (10; 0% instances), ADV-NUM (5; 0% instances), INTJ-NUM (1; 0% instances).
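Counts of this kind (number of attachments, direction, average distance) can be recomputed from CoNLL-U data. The sketch below is not the script that produced the figures above; the helper names `row` and `relation_stats` are hypothetical, and the two sample sentences are reduced to the columns that matter here, with all other columns set to `_`.

```python
def row(idx, form, lemma, upos, head, deprel):
    """Build one tab-separated 10-column CoNLL-U line (unused columns are '_')."""
    return "\t".join([str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


def relation_stats(conllu_text, deprel="nummod"):
    """Count nodes attached with `deprel`, how many precede their parent,
    and the average linear distance between child and parent."""
    nodes = count = child_first = distance_sum = 0
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # blank lines and comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue                      # skip multiword ranges and empty nodes
        nodes += 1
        if cols[7] != deprel:
            continue
        child, parent = int(cols[0]), int(cols[6])
        count += 1
        if child < parent:
            child_first += 1              # "right-to-left" in the statistics
        distance_sum += abs(child - parent)
    avg = distance_sum / count if count else 0.0
    return {"nodes": nodes, "count": count,
            "child_first": child_first, "avg_distance": avg}


SAMPLE = "\n".join([
    row(1, "Obrázek", "obrázek", "NOUN", 0, "root"),
    row(2, ":", ":", "PUNCT", 3, "punct"),
    row(3, "3", "3", "NUM", 1, "nummod"),
    "",
    row(1, "Tři", "tři", "NUM", 2, "nummod"),
    row(2, "muži", "muž", "NOUN", 3, "nsubj"),
    row(3, "hráli", "hrát", "VERB", 0, "root"),
    row(4, "karty", "karta", "NOUN", 3, "dobj"),
    row(5, ".", ".", "PUNCT", 3, "punct"),
])

stats = relation_stats(SAMPLE)
print(stats)
```

The label is matched exactly, so subtypes such as nummod:gov are counted only when they are passed explicitly as `deprel`.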
# visual-style 3 bgColor:blue
# visual-style 3 fgColor:white
# visual-style 1 bgColor:blue
# visual-style 1 fgColor:white
# visual-style 1 3 nummod color:blue
1 Obrázek obrázek NOUN NNIS1-----A---- Animacy=Inan|Case=Nom|Gender=Masc|Negative=Pos|Number=Sing 0 root _ SpaceAfter=No
2 : : PUNCT Z:------------- _ 3 punct _ _
3 3 3 NUM C=------------- NumForm=Digit|NumType=Card 1 nummod _ _
# visual-style 9 bgColor:blue
# visual-style 9 fgColor:white
# visual-style 7 bgColor:blue
# visual-style 7 fgColor:white
# visual-style 7 9 nummod color:blue
1 Výrobce výrobce NOUN NNMS1-----A---- Animacy=Anim|Case=Nom|Gender=Masc|Negative=Pos|Number=Sing 0 root _ _
2 - - PUNCT Z:------------- _ 1 punct _ _
3 typ typ NOUN NNIS1-----A---- Animacy=Inan|Case=Nom|Gender=Masc|Negative=Pos|Number=Sing 1 conj _ SpaceAfter=No
4 : : PUNCT Z:------------- _ 5 punct _ _
5 PANASONIC Panasonic PROPN NNIS1-----A---- Animacy=Inan|Case=Nom|Gender=Masc|NameType=Com,Pro|Negative=Pos|Number=Sing 3 nmod _ _
6 PANAFAX Panafax PROPN NNIS1-----A---- Animacy=Inan|Case=Nom|Gender=Masc|NameType=Pro|Negative=Pos|Number=Sing 5 nmod _ _
7 UF UF PROPN NNXXX-----A---8 Abbr=Yes|NameType=Pro|Negative=Pos 6 nmod _ SpaceAfter=No|LId=UF-98
8 - - PUNCT Z:------------- _ 9 punct _ SpaceAfter=No
9 311 311 NUM C=------------- NumForm=Digit|NumType=Card 7 nummod _ _
# visual-style 8 bgColor:blue
# visual-style 8 fgColor:white
# visual-style 9 bgColor:blue
# visual-style 9 fgColor:white
# visual-style 9 8 nummod color:blue
1 The the ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|Negative=Pos 11 foreign _ LId=the-1|LGloss=(obv._souč._anglických_názvů,_urč._člen)
2 Black Black ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|NameType=Com,Oth|Negative=Pos 11 foreign _ _
3 Box Box ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|NameType=Com,Oth|Negative=Pos 11 foreign _ _
4 Summer Summer ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|NameType=Oth|Negative=Pos 11 foreign _ _
5 Festival Festival PROPN NNIXX-----A---- Animacy=Inan|Foreign=Foreign|Gender=Masc|NameType=Oth|Negative=Pos 11 foreign _ _
6 of of ADP RR--X---------- AdpType=Prep|Foreign=Foreign 11 foreign _ LId=of-1|LGloss=(obv._souč._anglických_názvů,_předl._2._p.)
7 Czech Czech ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|Negative=Pos 11 foreign _ LId=Czech-2
8 20 20 NUM C=------------- NumForm=Digit|NumType=Card 9 nummod _ SpaceAfter=No
9 th th ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|Negative=Pos 11 foreign _ LId=th-2
10 Century Century ADJ AAXXX----1A---- Degree=Pos|Foreign=Foreign|NameType=Oth|Negative=Pos 11 foreign _ _
11 Plays Plays PROPN NNFPX-----A---- Foreign=Foreign|Gender=Fem|NameType=Oth|Negative=Pos|Number=Plur 0 root _ SpaceAfter=No
12 - - PUNCT Z:------------- _ 11 punct _ _