
This page pertains to UD version 2.

Treebank Statistics: UD_Indonesian-GSD: POS Tags: NUM

There are 891 NUM lemmas (4%), 895 NUM types (4%) and 4383 NUM tokens (4%). Out of 16 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 9 in number of tokens.

The 10 most frequent NUM lemmas: dua, pertama, kedua, 1, satu, 2, 3, 5, tiga, 4

The 10 most frequent NUM types: dua, pertama, kedua, 1, satu, 2, 3, 5, tiga, 4

The 10 most frequent ambiguous lemmas: dua (NUM 144, PROPN 3), pertama (NUM 121, ADJ 68, PROPN 4, NOUN 1, SCONJ 1), kedua (NUM 109, DET 16, NOUN 13, PROPN 9), 1 (NUM 100, PROPN 12), satu (DET 227, NUM 98, NOUN 23, PROPN 7, ADJ 1), 2 (NUM 86, PROPN 7, DET 1), 3 (NUM 79, PROPN 10), 5 (NUM 78, PROPN 1), tiga (NUM 76, PROPN 9), ke (ADP 359, NUM 63, DET 9, VERB 1, X 1)

The 10 most frequent ambiguous types: pertama (NUM 106, ADJ 62), kedua (NUM 97, DET 15), 1 (NUM 100, PROPN 12), satu (DET 209, NUM 92, NOUN 20, ADJ 1), 2 (NUM 86, PROPN 7, DET 1), 3 (NUM 79, PROPN 10), 5 (NUM 78, PROPN 1), tiga (NUM 75, PROPN 1), ke (ADP 356, NUM 62, DET 9, VERB 1, X 1), 6 (NUM 51, PROPN 1)

Morphology

The form / lemma ratio of NUM is 1.004489 (the average of all parts of speech is 1.045328).
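The ratio above appears to be the number of distinct NUM types divided by the number of distinct NUM lemmas, using the counts reported at the top of this page. That reading is an assumption, though it reproduces the published figure. A minimal check in Python:

```python
# Counts taken from the summary line at the top of this page.
num_lemmas = 891  # distinct NUM lemmas
num_types = 895   # distinct NUM word forms (types)

# Assumption: "form / lemma ratio" means distinct types per distinct lemma.
form_lemma_ratio = num_types / num_lemmas

print(f"{form_lemma_ratio:.6f}")  # 1.004489
```

A value this close to 1 means almost every NUM lemma occurs in a single form; the few exceptions are the suffixed forms listed below, such as keduanya.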

The 1st highest number of forms (2) was observed with the lemma “kedua”: kedua, keduanya.

The 2nd highest number of forms (2) was observed with the lemma “keenam”: keenam, keenamnya.

The 3rd highest number of forms (2) was observed with the lemma “pertama”: pertama, pertamanya.

NUM occurs with 6 features: NumType (4383; 100% instances), Number (28; 1% instances), Number[psor] (21; 0% instances), Person[psor] (21; 0% instances), Degree (5; 0% instances), Voice (4; 0% instances)

NUM occurs with 6 feature-value pairs: Degree=Pos, NumType=Card, Number=Sing, Number[psor]=Sing, Person[psor]=3, Voice=Act

NUM occurs with 5 feature combinations. The most frequent feature combination is NumType=Card (4334 tokens). Examples: dua, pertama, kedua, 1, satu, 2, 3, 5, tiga, 4

Relations

NUM nodes are attached to their parents using 19 different relations: nummod (4106; 94% instances), appos (83; 2% instances), det (58; 1% instances), conj (50; 1% instances), amod (32; 1% instances), fixed (14; 0% instances), root (10; 0% instances), obj (7; 0% instances), compound (5; 0% instances), nsubj (3; 0% instances), advmod (2; 0% instances), dep (2; 0% instances), iobj (2; 0% instances), nsubj:pass (2; 0% instances), obl (2; 0% instances), punct (2; 0% instances), acl (1; 0% instances), nmod (1; 0% instances), parataxis (1; 0% instances)
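As a sanity check on the list above, the relation counts should add up to the 4383 NUM tokens, and each percentage should be the count's share of that total rounded to a whole percent. The rounding rule is an assumption inferred from the figures shown; the helper name `share` is hypothetical.

```python
# Incoming-relation counts for NUM nodes, copied from the list above.
relation_counts = {
    "nummod": 4106, "appos": 83, "det": 58, "conj": 50, "amod": 32,
    "fixed": 14, "root": 10, "obj": 7, "compound": 5, "nsubj": 3,
    "advmod": 2, "dep": 2, "iobj": 2, "nsubj:pass": 2, "obl": 2,
    "punct": 2, "acl": 1, "nmod": 1, "parataxis": 1,
}

total = sum(relation_counts.values())  # expected to equal the NUM token count


def share(count: int, total: int) -> int:
    """Percentage of `total`, rounded to the nearest whole percent (assumed rule)."""
    return round(100 * count / total)


for relation, count in relation_counts.items():
    print(f"{relation} ({count}; {share(count, total)}% instances)")
```

Under this rounding, any relation with fewer than about 22 occurrences is displayed as 0%, which is why most of the tail of the list shows "0% instances" despite non-zero counts.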

Parents of NUM nodes belong to 14 different parts of speech: NOUN (2323; 53% instances), PROPN (1129; 26% instances), NUM (483; 11% instances), VERB (231; 5% instances), SYM (143; 3% instances), ADJ (22; 1% instances), ADV (19; 0% instances), [no parent: root] (10; 0% instances), DET (7; 0% instances), PRON (5; 0% instances), PUNCT (5; 0% instances), ADP (3; 0% instances), CCONJ (2; 0% instances), X (1; 0% instances)

3703 (84%) NUM nodes are leaves.

193 (4%) NUM nodes have one child.

392 (9%) NUM nodes have two children.

95 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 8.
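The child-degree figures above could be recomputed from the treebank's CoNLL-U files roughly as follows. This is a sketch, not the script that generated this page: the function name `num_child_degrees` is hypothetical, and the sample sentence is an invented toy example, not taken from UD_Indonesian-GSD.

```python
from collections import Counter


def num_child_degrees(conllu: str) -> Counter:
    """Map child degree -> number of NUM nodes having that many children."""
    degrees = Counter()
    sentence = []  # (id, upos, head) for each syntactic word

    def flush():
        # Count how many tokens point at each head within this sentence.
        children = Counter(head for _, _, head in sentence)
        for tok_id, upos, _ in sentence:
            if upos == "NUM":
                degrees[children[tok_id]] += 1
        sentence.clear()

    for line in conllu.splitlines():
        line = line.strip()
        if not line:              # a blank line ends a sentence
            flush()
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        sentence.append((cols[0], cols[3], cols[6]))
    flush()  # handle a final sentence without a trailing blank line
    return degrees


# Invented toy sentence; columns follow the 10-field CoNLL-U layout.
rows = [
    "1 Saya saya PRON _ _ 2 nsubj _ _",
    "2 membeli beli VERB _ _ 0 root _ _",
    "3 dua dua NUM _ _ 5 nummod _ _",
    "4 puluh puluh NUM _ _ 3 flat _ _",
    "5 buku buku NOUN _ _ 2 obj _ _",
    "6 . . PUNCT _ _ 2 punct _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in rows)

print(num_child_degrees(SAMPLE))
```

In the toy sentence, one NUM node is a leaf and one has a single child. On this page's figures, the four degree classes (3703 + 193 + 392 + 95) account for all 4383 NUM tokens.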

Children of NUM nodes are attached using 23 different relations: punct (609; 45% instances), nummod (365; 27% instances), det (61; 5% instances), conj (47; 3% instances), cc (42; 3% instances), case (27; 2% instances), nmod (26; 2% instances), nsubj (26; 2% instances), amod (25; 2% instances), appos (24; 2% instances), fixed (23; 2% instances), flat (19; 1% instances), compound (13; 1% instances), cop (13; 1% instances), advmod (11; 1% instances), dep (5; 0% instances), acl (2; 0% instances), csubj (2; 0% instances), obj (2; 0% instances), ccomp (1; 0% instances), mark (1; 0% instances), parataxis (1; 0% instances), xcomp (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: PUNCT (550; 41% instances), NUM (483; 36% instances), SYM (62; 5% instances), NOUN (53; 4% instances), CCONJ (43; 3% instances), PROPN (37; 3% instances), ADP (29; 2% instances), DET (19; 1% instances), PRON (17; 1% instances), ADJ (14; 1% instances), AUX (13; 1% instances), ADV (12; 1% instances), VERB (8; 1% instances), X (5; 0% instances), SCONJ (1; 0% instances)