NumType: numeral type
In the Slovenian UD Treebank, NumType is a lexical feature of numerals and of some adjectives that denote counting by numbers.
Card: cardinal number
Examples
- en, dva, tri “one, two, three”
- 1, 2, 3
- I, II, III
Ord: ordinal number
Examples
- prvi, drugi, tretji “first, second, third”
- 1., 2., 3.
- I., II., III.
Sets: number of sets of things
Numerals used to count sets of things or nouns that are pluralia tantum.
Examples
- enoj, dvoj, troj “one-fold, two-fold, three-fold”
Gen: generic numeral, i.e. a numeral that is none of the above
Examples
- enojen, dvojen, trojen “single, double, triple”
Conversion from JOS
All numerals with Type=cardinal are converted to NumType=Card, and all numerals with Type=ordinal are converted to NumType=Ord. Numerals with Type=pronominal are converted either to NumType=Card (lemmas en and eden) or to NumType=Ord (lemma drug). Numerals with Type=special are converted either to NumType=Sets (lemmas not ending in -en) or to NumType=Gen (lemmas ending in -en).
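The conversion rules above can be sketched as a small lookup function. This is only an illustration of the stated rules, not the converter actually used for the treebank: the function name `jos_type_to_numtype` is hypothetical, and returning `None` for cases the rules do not mention is an assumption.

```python
from typing import Optional


def jos_type_to_numtype(jos_type: str, lemma: str) -> Optional[str]:
    """Map a JOS numeral Type value and a lemma to a UD NumType value.

    Hypothetical helper that follows the rules stated in the text.
    """
    if jos_type == "cardinal":
        return "Card"
    if jos_type == "ordinal":
        return "Ord"
    if jos_type == "pronominal":
        if lemma in ("en", "eden"):
            return "Card"
        if lemma == "drug":
            return "Ord"
        # Assumption: the text names no other pronominal lemmas.
        return None
    if jos_type == "special":
        # Lemmas ending in -en become Gen, all other special numerals Sets.
        return "Gen" if lemma.endswith("en") else "Sets"
    # Assumption: any other Type value receives no NumType.
    return None


print(jos_type_to_numtype("special", "dvoj"))    # Sets
print(jos_type_to_numtype("special", "dvojen"))  # Gen
```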
Other types of quantifying words are not explicitly marked in JOS. Assigning these and other NumType values to other words or part-of-speech categories therefore remains for future work. This includes adjectives (enkraten, dvakraten, trikraten), adverbs (enkrat, dvakrat, trikrat; prvič, drugič, tretjič), determiners (veliko, malo, nekaj, koliko) and nouns (tretjina, polovica, četrtina).
Treebank Statistics (UD_Slovenian)
This feature is universal.
It occurs with 4 different values: Card, Gen, Ord, Sets.
2241 tokens (2%) have a non-empty value of NumType.
623 types (2%) occur at least once with a non-empty value of NumType.
509 lemmas (3%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (1927; 1% of instances), sl-pos/ADJ (314; 0% of instances).
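Counts like the ones above could be reproduced from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: the function names and the three-token sample are hand-written assumptions, and only the standard CoNLL-U column layout (UPOS in column 4, FEATS in column 6) is relied on.

```python
from collections import Counter


def parse_feats(feats: str) -> dict:
    """Turn a CoNLL-U FEATS string such as 'A=x|B=y' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def numtype_counts(conllu_text: str) -> Counter:
    """Count (UPOS, NumType) pairs over all syntactic words."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        value = parse_feats(cols[5]).get("NumType")
        if value is not None:
            counts[(cols[3], value)] += 1
    return counts


# Hand-written illustration, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = prva dva dneva",
    "1\tprva\tprvi\tADJ\t_\tNumType=Ord\t3\tamod\t_\t_",
    "2\tdva\tdva\tNUM\t_\tNumForm=Word|NumType=Card\t3\tnummod\t_\t_",
    "3\tdneva\tdan\tNOUN\t_\t_\t0\troot\t_\t_",
])

for (upos, value), n in sorted(numtype_counts(SAMPLE).items()):
    print(upos, value, n)
```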
NUM
1927 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (1441; 75%), Number=EMPTY (1187; 62%), Case=EMPTY (1187; 62%), NumForm=Digit (1166; 61%).
NUM tokens may have the following values of NumType:
- Card (1665; 86% of non-empty NumType): eno, tri, dveh, dva, ena, eden, tisoč, štiri, štirih, dve
- Ord (257; 13% of non-empty NumType): 1., 20., 18., 9., 14., 17., 19., 3., 6., 15.
- Sets (5; 0% of non-empty NumType): dvoje, tisočerih, troje
NumType seems to be a lexical feature of NUM. 100% of lemmas (485) occur with only one value of NumType.
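The "lexical feature" observation amounts to checking that each lemma is only ever seen with a single NumType value. A minimal sketch of such a check follows; the function name `single_value_lemmas` and the input pairs are illustrative assumptions, not data from the treebank.

```python
from collections import defaultdict


def single_value_lemmas(pairs):
    """Return (lemmas with exactly one NumType value, all lemmas seen).

    `pairs` is an iterable of (lemma, numtype) tuples for tokens whose
    NumType is non-empty.
    """
    values_by_lemma = defaultdict(set)
    for lemma, numtype in pairs:
        values_by_lemma[lemma].add(numtype)
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return single, len(values_by_lemma)


# Illustrative input only.
pairs = [("dva", "Card"), ("dva", "Card"), ("tri", "Card"), ("dvoj", "Sets")]
single, total = single_value_lemmas(pairs)
print(f"{single} of {total} lemmas occur with only one value")
```

If every lemma falls into the single-value group, the feature behaves lexically for that part of speech in the sense used above.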
ADJ
314 sl-pos/ADJ tokens (2% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Definite=EMPTY (314; 100%), Degree=EMPTY (314; 100%), VerbForm=EMPTY (314; 100%), Number=Sing (251; 80%).
ADJ tokens may have the following values of NumType:
- Gen (4; 1% of non-empty NumType): dvojnega, dvojnim, dvojno, trojnim
- Ord (310; 99% of non-empty NumType): prvi, prva, prvo, prve, prvem, prvih, prvega, tretji, tretje, prvim
- EMPTY (14713): drugi, mogoče, druge, sam, novo, drugih, nove, različnih, slovenski, veliko
NumType seems to be a lexical feature of ADJ. 100% of lemmas (24) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (94; 100%),
NUM –[compound]–> NUM (31; 91%).
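Agreement figures of this kind could be computed roughly as follows. The page does not define the denominator of the percentages, so this sketch assumes it is the number of parent-child pairs in which both nodes carry a NumType value. The function name, the tuple layout and the sample sentence are all hypothetical.

```python
from collections import Counter


def numtype_agreement(rows):
    """Count NumType agreement per (parent UPOS, relation, child UPOS).

    `rows` is a list of (id, upos, numtype_or_None, head, deprel) tuples
    for one sentence. Returns {key: (agreeing pairs, pairs where both
    nodes have a NumType value)}.
    """
    by_id = {row[0]: row for row in rows}
    agree, both = Counter(), Counter()
    for _tid, upos, numtype, head, deprel in rows:
        parent = by_id.get(head)  # None for the root (head 0)
        if parent is None or numtype is None or parent[2] is None:
            continue
        key = (parent[1], deprel, upos)
        both[key] += 1
        if parent[2] == numtype:
            agree[key] += 1
    return {key: (agree[key], both[key]) for key in both}


# Hand-written illustration: "dva ali tri" ("two or three").
sentence = [
    (1, "NUM", "Card", 0, "root"),
    (2, "CCONJ", None, 3, "cc"),
    (3, "NUM", "Card", 1, "conj"),
]
for (parent, rel, child), (n_agree, n_both) in numtype_agreement(sentence).items():
    print(f"{parent} -[{rel}]-> {child} ({n_agree}; {100 * n_agree // n_both}%)")
```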
Treebank Statistics (UD_Slovenian-SST)
This feature is universal.
It occurs with 4 different values: Card, Gen, Ord, Sets.
586 tokens (2%) have a non-empty value of NumType.
121 types (2%) occur at least once with a non-empty value of NumType.
76 lemmas (2%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (499; 2% of instances), sl-pos/ADJ (87; 0% of instances).
NUM
499 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: NumForm=Word (499; 100%), Number=Plur (287; 58%).
NUM tokens may have the following values of NumType:
- Card (498; 100% of non-empty NumType): eno, dva, en, ena, tri, tisoč, dvajset, dve, pet, enega
- Sets (1; 0% of non-empty NumType): dvoje
NumType seems to be a lexical feature of NUM. 100% of lemmas (53) occur with only one value of NumType.
ADJ
87 sl-pos/ADJ tokens (5% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: VerbForm=EMPTY (87; 100%), Degree=EMPTY (87; 100%), Definite=EMPTY (85; 98%), Number=Sing (82; 94%).
ADJ tokens may have the following values of NumType:
- Gen (3; 3% of non-empty NumType): dvojni, dvojno, trojni
- Ord (84; 97% of non-empty NumType): prvi, prvo, prva, tretjo, prvega, devetindvajseti, peta, tretja, tretji, trideseti
- EMPTY (1578): dobro, drugo, drugi, dober, zanimivo, druga, drugega, glavnem, lep, lepa
NumType seems to be a lexical feature of ADJ. 100% of lemmas (23) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[compound]–> NUM (48; 100%),
NUM –[conj]–> NUM (29; 100%),
ADJ –[conj]–> ADJ (5; 56%),
NUM –[reparandum]–> NUM (4; 100%),
NUM –[mwe]–> NUM (4; 100%),
ADJ –[reparandum]–> ADJ (2; 100%),
NUM –[nummod]–> NUM (1; 100%),
NUM –[advmod]–> NUM (1; 100%).