NumType
: numeral type
Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, “numeral” is one of the main parts of speech; it includes almost everything where counting is involved, and it has various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so such a word may have a non-empty value of PronType at the same time. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)
From the syntactic point of view, some numeral types behave like adjectives and some behave like adverbs. We tag them u-pos/ADJ and u-pos/ADV respectively. Thus the NumType feature applies to several different parts of speech:
- u-pos/NUM: cardinal numerals
- u-pos/DET: quantifiers
- u-pos/ADJ: definite adjectival, e.g. ordinal numerals
- u-pos/ADV: adverbial (e.g. ordinal and multiplicative) numerals, both definite and pronominal
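The part-of-speech list above can be turned into a quick sanity check on annotated tokens. The following is a minimal sketch, not a rule of any official validator: the helper names `parse_feats` and `numtype_fits_upos` are hypothetical, and the input is assumed to be a CoNLL-U style FEATS string in which features are separated by `|` and multiple values by `,`.

```python
# UPOS tags that the list above names as carrying NumType.
# Assumption: used here only as a heuristic, not as a hard constraint.
NUMTYPE_UPOS = {"NUM", "DET", "ADJ", "ADV"}


def parse_feats(feats):
    """Parse a FEATS string such as 'NumType=Card|PronType=Int,Rel' into a dict."""
    if feats in ("", "_"):  # "_" marks an empty FEATS column
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        parsed[name] = values.split(",")
    return parsed


def numtype_fits_upos(upos, feats):
    """Return True if the token has no NumType or its UPOS is one listed above."""
    if "NumType" not in parse_feats(feats):
        return True
    return upos in NUMTYPE_UPOS
```

A token flagged by this check is not necessarily wrong. Fraction words, for instance, may behave as nouns in some languages, so a treebank could reasonably annotate them outside these four tags.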
Card
: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word
Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.
Examples
- [en] one, two, three
- [cs] jeden, dva, tři “one, two, three”; kolik “how many”; několik “some”; tolik “so many”; mnoho “many”; málo “few”
Ord
: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or (in some languages) of adverb.
Examples
- [en] first, second, third
- [cs] adjectival: první “first”; druhý “second”; třetí “third”; kolikátý lit. “how manieth”, i.e. “which rank”; několikátý “some rank”; tolikátý “this/that rank”
- [cs] adverbial: poprvé “for the first time”; podruhé “for the second time”; potřetí “for the third time”; pokolikáté “for which time”; poněkolikáté “for the x-th time”; potolikáté
Mult
: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adverb.
Examples
- [cs] jednou “once”; dvakrát “twice”; třikrát “three times”; kolikrát “how many times”; několikrát “several times”; tolikrát “so many times”
Frac
: fraction
This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In various languages these words may behave morphologically and syntactically as nouns or ordinal numerals.
Examples
- [en] three-quarters
- [cs] půl / polovina “half”; třetina “one third”; čtvrt / čtvrtina “quarter”
Sets
: number of sets of things
Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum.
Examples
- [cs] dvoje / troje boty “two / three [pairs of] shoes”; as opposed to normal cardinal numbers: dvě / tři boty “two / three shoes”
Dist
: distributive numeral
Used to express that the same quantity is distributed to each member in a set of targets.
Examples
- [hu] három-három in gyermekenként három-három ezer forinttal “three thousand forint per child”
Range
: range of values
This could be considered a subtype of cardinal numbers, occasionally distinguished in corpora.
Examples
- [en] two-five “two to five” (provided tokenization leaves it as one token)
Gen
: generic numeral, i.e. a numeral that is none of the above
Czech school grammar distinguishes this subclass, which is why it appears in Czech tagsets. Other Slavic languages may have similar words but their traditional classification may differ. (Note that “generic numerals” in Czech grammar also include the Sets subclass mentioned above.)
Examples
- [cs] čtvero, patero, desatero (specific forms of four, five, ten; they are morphologically, syntactically and stylistically distinct from the default forms čtyři, pět, deset); dvojí, trojí, čtverý (twofold, threefold, fourfold; these are morphologically and syntactically adjectives).
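For tooling that needs to recognize the values described on this page, the inventory can be kept as a small lookup table. This is only an illustrative sketch: the names `NUMTYPE_VALUES` and `describe_numtype` are hypothetical, the glosses are shortened from the definitions above, and values are assumed to be matched case-sensitively.

```python
# NumType values and short glosses, taken from the definitions on this page.
NUMTYPE_VALUES = {
    "Card": "cardinal number",
    "Ord": "ordinal number",
    "Mult": "multiplicative numeral",
    "Frac": "fraction",
    "Sets": "number of sets of things",
    "Dist": "distributive numeral",
    "Range": "range of values",
    "Gen": "generic numeral",
}


def describe_numtype(value):
    """Return the gloss for a NumType value, or None if it is not listed here."""
    return NUMTYPE_VALUES.get(value)
```

A value missing from the table may simply be a language-specific extension rather than an annotation error.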