Treebank Statistics: UD_Finnish-OOD: POS Tags: NUM
There are 186 NUM lemmas (3%), 209 NUM types (3%) and 381 NUM tokens (2%).
Out of 15 observed tags, the rank of NUM is: 6 in number of lemmas, 7 in number of types and 10 in number of tokens.
The 10 most frequent NUM lemmas: yksi, 2, 40, 20, kaksi, kolme, 100, 5, 10, 60
The 10 most frequent NUM types: 2, 40, yksi, 20, 5, 10, 100, 60, 90, kaksi
The 10 most frequent ambiguous lemmas: yksi (NUM 18, PRON 1), pari (NUM 8, NOUN 1), puoli (NOUN 14, NUM 2), toinen (ADJ 12, PRON 12, NUM 1)
The 10 most frequent ambiguous types: yksi (NUM 10, PRON 1), 8 (NUM 3, PUNCT 1)
Morphology
The form / lemma ratio of NUM is 1.123656 (the average of all parts of speech is 1.565977).
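The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given above. A minimal sketch of that arithmetic, assuming this is how the figure was derived (the variable names are illustrative):

```python
# Counts taken from the NUM summary line of this page.
num_types = 209   # distinct NUM word forms
num_lemmas = 186  # distinct NUM lemmas

# Assumption: the form / lemma ratio is simply types divided by lemmas.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")
```

A value close to 1 means that most NUM lemmas were seen in only one form, which fits the large share of digit-only tokens in the frequency lists.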
The 1st highest number of forms (5) was observed with the lemma “sata”: sadan, sadasta, sata, satoja, satojen.
The 2nd highest number of forms (5) was observed with the lemma “yksi”: yhdellä, yhden, yhdestä, yhtenä, yksi.
The 3rd highest number of forms (4) was observed with the lemma “kolme”: kolme, kolmeen, kolmen, kolmessa.
NUM occurs with 4 features: NumType (339; 89% instances), Case (83; 22% instances), Number (82; 22% instances), Typo (1; 0% instances)
NUM occurs with 13 feature-value pairs: Case=Abl, Case=Ade, Case=Ela, Case=Ess, Case=Gen, Case=Ill, Case=Ine, Case=Nom, Case=Par, NumType=Card, Number=Plur, Number=Sing, Typo=Yes
NUM occurs with 15 feature combinations.
The most frequent feature combination is NumType=Card (259 tokens).
Examples: 2, 40, 20, 5, 10, 100, 60, 90, 2014, yksi
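Feature, feature-value and combination counts like the ones above can be tallied from the FEATS column of a CoNLL-U file. The sketch below is one possible way to do it, not the script that produced this page; the helper `parse_feats` and the sample values are hypothetical and are not taken from the treebank.

```python
from collections import Counter


def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS value such as 'Case=Gen|NumType=Card' into a dict."""
    if feats == "_":  # an underscore means the token has no features
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Illustrative FEATS values only; not real treebank data.
sample = [
    "NumType=Card",
    "Case=Gen|NumType=Card|Number=Sing",
    "NumType=Card",
    "_",
]

features = Counter()      # how many tokens carry each feature
combinations = Counter()  # how often each full FEATS string occurs
for feats in sample:
    features.update(parse_feats(feats).keys())
    combinations[feats] += 1

print(features.most_common())
print(combinations.most_common(1))
```

Counting the full FEATS string gives the feature combinations, while counting the individual keys gives the per-feature totals.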
Relations
NUM nodes are attached to their parents using 16 different relations: nummod (218; 57% instances), obl (53; 14% instances), root (27; 7% instances), nmod (16; 4% instances), flat (15; 4% instances), parataxis (13; 3% instances), nmod:poss (10; 3% instances), conj (8; 2% instances), obj (4; 1% instances), orphan (4; 1% instances), advcl (3; 1% instances), appos (3; 1% instances), flat:name (2; 1% instances), nsubj (2; 1% instances), nsubj:cop (2; 1% instances), ccomp (1; 0% instances)
Parents of NUM nodes belong to 11 different parts of speech: NOUN (223; 59% instances), VERB (57; 15% instances), [no parent: root] (27; 7% instances), SYM (26; 7% instances), PROPN (16; 4% instances), ADJ (11; 3% instances), ADV (6; 2% instances), NUM (6; 2% instances), X (6; 2% instances), PRON (2; 1% instances), AUX (1; 0% instances)
270 (71%) NUM nodes are leaves.
53 (14%) NUM nodes have one child.
32 (8%) NUM nodes have two children.
26 (7%) NUM nodes have three or more children.
The highest child degree of a NUM node is 7.
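The child-degree figures above can be derived from the HEAD column of a CoNLL-U file by counting how many tokens point at each NUM node. The following is a minimal sketch under that assumption; the rows are a made-up six-token sentence, not data from the treebank.

```python
from collections import Counter

# Hypothetical (ID, UPOS, HEAD) triples for one sentence; illustrative only.
rows = [
    (1, "NUM", 2),
    (2, "NOUN", 3),
    (3, "VERB", 0),   # HEAD 0 marks the root of the sentence
    (4, "ADP", 5),
    (5, "NUM", 3),
    (6, "PUNCT", 3),
]

# Number of children per token ID: count how often each ID appears as a head.
children = Counter(head for _, _, head in rows)

# Distribution of child degree restricted to NUM nodes (degree 0 means leaf).
degrees = Counter(children[tok_id] for tok_id, upos, _ in rows if upos == "NUM")

print(dict(degrees))
print(max(degrees))
```

As a sanity check on the reported figures, the four degree buckets (270, 53, 32 and 26) add up to the 381 NUM tokens.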
Children of NUM nodes are attached using 19 different relations: punct (56; 26% instances), advmod (43; 20% instances), nsubj:cop (35; 16% instances), case (26; 12% instances), nmod (8; 4% instances), conj (7; 3% instances), discourse (7; 3% instances), cop (6; 3% instances), obl (6; 3% instances), advcl (4; 2% instances), appos (4; 2% instances), mark (4; 2% instances), parataxis (4; 2% instances), cc (3; 1% instances), orphan (2; 1% instances), aux (1; 0% instances), det (1; 0% instances), nmod:poss (1; 0% instances), nummod (1; 0% instances)
Children of NUM nodes belong to 14 different parts of speech: PUNCT (56; 26% instances), NOUN (47; 21% instances), ADV (39; 18% instances), ADP (25; 11% instances), SYM (10; 5% instances), AUX (7; 3% instances), ADJ (6; 3% instances), NUM (6; 3% instances), PRON (6; 3% instances), VERB (5; 2% instances), SCONJ (4; 2% instances), CCONJ (3; 1% instances), PROPN (3; 1% instances), X (2; 1% instances)