
This page pertains to UD version 2.

Treebank Statistics: UD_Indonesian-CSUI: POS Tags: NUM

There are 684 NUM lemmas (16%), 688 NUM types (15%) and 1857 NUM tokens (7%). Out of 17 observed tags, the rank of NUM is: 3 in number of lemmas, 4 in number of types and 6 in number of tokens.
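
Counts like these can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page. It assumes the standard 10-column CoNLL-U layout and treats a type as a distinct word form exactly as written; whether the official statistics normalize case is not confirmed here. The function name `upos_counts` and the sample rows are hypothetical.

```python
def upos_counts(conllu_text, upos="NUM"):
    """Return (n_lemmas, n_types, n_tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Made-up sample rows (not taken from the treebank), joined with tabs.
rows = [
    ("1", "dua", "dua", "NUM", "_", "NumType=Card", "2", "nummod", "_", "_"),
    ("2", "buku", "buku", "NOUN", "_", "_", "0", "root", "_", "_"),
    ("3", "jutaan", "juta", "NUM", "_", "NumType=Card", "2", "nmod", "_", "_"),
    ("4", "juta", "juta", "NUM", "_", "NumType=Card", "2", "nmod", "_", "_"),
]
sample = "\n".join("\t".join(r) for r in rows)
print(upos_counts(sample))  # (2, 3, 3)
```

Run over all files of the treebank, the same function would be expected to give the three NUM totals reported above, provided the assumptions hold.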

The 10 most frequent NUM lemmas: 2007, triliun, miliar, 2006, juta, 2008, satu, dua, 30, 10

The 10 most frequent NUM types: 2007, triliun, miliar, 2006, juta, 2008, satu, dua, 30, 10

The most frequent ambiguous lemmas (7 listed): 2006 (NUM 80, PROPN 2), kedua (NUM 11, ADJ 8), 4 (NUM 10, CCONJ 1), 8 (NUM 8, PROPN 2), 33 (NUM 1, PROPN 1), kelima (ADJ 2, NUM 1), paruh (NOUN 1, NUM 1)

The most frequent ambiguous types (6 listed): 2006 (NUM 80, PROPN 2), kedua (ADJ 8, NUM 8), 4 (NUM 10, CCONJ 1), 8 (NUM 8, PROPN 2), 33 (NUM 1, PROPN 1), kelima (ADJ 2, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.005848 (the average of all parts of speech is 1.085880).
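
The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given earlier on this page (688 and 684). The short check below assumes that this is how the figure is defined; the variable names are illustrative only.

```python
# Counts taken from the NUM summary on this page.
num_types = 688
num_lemmas = 684

# Assumed definition: distinct forms divided by distinct lemmas.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.005848
```

A value this close to 1 means that almost every NUM lemma appears in a single form, compared with the treebank-wide average of 1.085880.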

The highest number of forms observed for a single NUM lemma is 2. The top three lemmas are “dua” (dua, dua-duanya), “juta” (juta, jutaan) and “puluh” (puluh, puluhan).

NUM occurs with 2 features: NumType (1857; 100% instances), PronType (1; 0% instances)

NUM occurs with 2 feature-value pairs: NumType=Card, PronType=Tot

NUM occurs with 2 feature combinations. The most frequent feature combination is NumType=Card (1856 tokens). Examples: 2007, triliun, miliar, 2006, juta, 2008, satu, dua, 30, 10

Relations

NUM nodes are attached to their parents using 13 different relations: nummod (1315; 71% instances), flat (383; 21% instances), nmod (41; 2% instances), obl (37; 2% instances), obl:tmod (24; 1% instances), obj (22; 1% instances), conj (14; 1% instances), nmod:tmod (12; 1% instances), nsubj (3; 0% instances), root (3; 0% instances), acl (1; 0% instances), advcl (1; 0% instances), nsubj:pass (1; 0% instances)
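
The percentages in the list above appear to be each relation's count divided by the total number of NUM tokens, rounded to the nearest integer. The sketch below checks this using only the counts from this page; the rounding rule is an assumption, and the variable names are illustrative.

```python
# Relation counts copied from the paragraph above.
relations = {
    "nummod": 1315, "flat": 383, "nmod": 41, "obl": 37, "obl:tmod": 24,
    "obj": 22, "conj": 14, "nmod:tmod": 12, "nsubj": 3, "root": 3,
    "acl": 1, "advcl": 1, "nsubj:pass": 1,
}

total = sum(relations.values())
# Assumed rule: share of all NUM tokens, rounded to a whole percent.
percentages = {rel: round(100 * n / total) for rel, n in relations.items()}

print(total)                                        # 1857
print(percentages["nummod"], percentages["flat"])   # 71 21
```

The counts sum to 1857, matching the number of NUM tokens, so every NUM node is covered by exactly one incoming relation.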

Parents of NUM nodes fall into 8 categories, 7 parts of speech plus the artificial root: NOUN (1213; 65% instances), NUM (401; 22% instances), PROPN (137; 7% instances), VERB (83; 4% instances), SYM (12; 1% instances), ADJ (4; 0% instances), X (4; 0% instances), no parent, i.e. attached as root (3; 0% instances)

1293 (70%) NUM nodes are leaves.

450 (24%) NUM nodes have one child.

75 (4%) NUM nodes have two children.

39 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 10.
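
The leaf and child-degree figures above can be derived from the HEAD column of a CoNLL-U file. The following is a minimal sketch under stated assumptions: it handles one sentence at a time, expects rows already split into the 10 standard columns, and uses a hypothetical function name `child_degree_histogram` with a made-up example sentence.

```python
from collections import Counter


def child_degree_histogram(sentence_rows, upos="NUM"):
    """Map number of children -> how many nodes with this UPOS have that many."""
    # Column 7 (index 6) is HEAD: count how often each ID is used as a head.
    children = Counter(row[6] for row in sentence_rows)
    # Counter returns 0 for IDs that never appear as a head (leaves).
    return Counter(
        children[row[0]] for row in sentence_rows if row[3] == upos
    )


# Made-up sentence (not from the treebank): "dua puluh buku",
# with the second numeral attached to the first via flat.
sentence = [
    ("1", "dua", "dua", "NUM", "_", "NumType=Card", "3", "nummod", "_", "_"),
    ("2", "puluh", "puluh", "NUM", "_", "NumType=Card", "1", "flat", "_", "_"),
    ("3", "buku", "buku", "NOUN", "_", "_", "0", "root", "_", "_"),
]
hist = child_degree_histogram(sentence)
print(hist[0], hist[1])  # 1 1
```

Summing such histograms over all sentences would give the treebank-wide distribution; the largest key would correspond to the highest child degree.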

Children of NUM nodes are attached using 17 different relations: flat (410; 56% instances), case (111; 15% instances), punct (57; 8% instances), advmod (46; 6% instances), nmod (35; 5% instances), nummod (17; 2% instances), acl:relcl (15; 2% instances), cc (14; 2% instances), conj (13; 2% instances), nmod:tmod (7; 1% instances), det (3; 0% instances), nmod:lmod (3; 0% instances), acl (2; 0% instances), appos (1; 0% instances), mark (1; 0% instances), nsubj (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 11 different parts of speech: NUM (401; 54% instances), ADP (111; 15% instances), PUNCT (57; 8% instances), NOUN (47; 6% instances), PROPN (40; 5% instances), ADV (34; 5% instances), ADJ (22; 3% instances), CCONJ (15; 2% instances), VERB (6; 1% instances), DET (3; 0% instances), SCONJ (1; 0% instances)