
This page pertains to UD version 2.

Treebank Statistics: UD_Indonesian-PUD: POS Tags: NUM

There are 211 NUM lemmas (5%), 222 NUM types (5%) and 501 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 11 in number of tokens.

The 10 most frequent NUM lemmas: satu, dua, tiga, juta, puluh, empat, 1, 10, miliar, 3

The 10 most frequent NUM types: satu, dua, kedua, tiga, juta, empat, 1, 10, 3, puluh

The 10 most frequent ambiguous lemmas: satu (NUM 51, VERB 1), dua (NUM 49, ADJ 8), tiga (NUM 17, ADJ 7), empat (NUM 10, ADJ 1), 3 (NUM 7, ADJ 2), enam (NUM 6, ADJ 1), 20 (NUM 5, ADJ 1), 8 (NUM 3, ADJ 1), 14 (NUM 2, ADJ 1), 16 (ADJ 2, NUM 2)

The 10 most frequent ambiguous types: kedua (NUM 14, ADJ 6), III (PROPN 4, NUM 1), ketiga (ADJ 7, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.052133 (the average of all parts of speech is 1.137196).
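The ratio above is consistent with dividing the number of distinct NUM word forms (222 types) by the number of distinct NUM lemmas (211). The following minimal sketch reproduces the figure under that assumption; the function name `form_lemma_ratio` is hypothetical and not part of any UD tool.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma.

    Assumption: the page divides the type count by the lemma count.
    """
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_types / n_lemmas


# Counts reported on this page for NUM in UD_Indonesian-PUD.
ratio = form_lemma_ratio(n_types=222, n_lemmas=211)
print(f"{ratio:.6f}")  # 1.052133
```

A value close to 1 means that most NUM lemmas occur in a single form, which matches the short list of multi-form lemmas that follows.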

The highest number of forms (3) was observed with the lemma “dua”: berdua, dua, kedua.

The 2nd highest number of forms (3) was observed with the lemma “puluh”: puluh, puluhan, sepuluh.

The 3rd highest number of forms (2) was observed with the lemma “enam”: enam, keenam.

NUM occurs with 1 feature: NumType (501; 100% instances)

NUM occurs with 1 feature-value pair: NumType=Card

NUM occurs with 1 feature combination. The most frequent feature combination is NumType=Card (501 tokens). Examples: satu, dua, kedua, tiga, juta, empat, 1, 10, 3, puluh

Relations

NUM nodes are attached to their parents using 12 different relations: nummod (358; 71% instances), flat (76; 15% instances), obl:tmod (16; 3% instances), conj (11; 2% instances), nsubj (10; 2% instances), nmod (8; 2% instances), appos (6; 1% instances), nmod:tmod (5; 1% instances), obl (5; 1% instances), nsubj:pass (2; 0% instances), obj (2; 0% instances), root (2; 0% instances)
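Counts of this kind can be recomputed from a treebank's CoNLL-U file, where each token line has ten tab-separated columns with UPOS in the fourth and DEPREL in the eighth. The sketch below is one possible way to do it, not the script that generated this page; `count_relations` is a hypothetical helper and `SAMPLE` is an invented two-token sentence, not taken from UD_Indonesian-PUD.

```python
from collections import Counter


def count_relations(conllu_text: str, upos: str = "NUM") -> Counter:
    """Count the DEPREL of every token whose UPOS equals `upos`."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == upos:
            counts[cols[7]] += 1
    return counts


# Invented sample for illustration only (assumption, not treebank data).
SAMPLE = "\n".join([
    "# text = dua buku",
    "1\tdua\tdua\tNUM\t_\tNumType=Card\t2\tnummod\t_\t_",
    "2\tbuku\tbuku\tNOUN\t_\t_\t0\troot\t_\t_",
])

for relation, n in count_relations(SAMPLE).most_common():
    print(relation, n)
```

Run on the full treebank file, the same loop should yield a frequency list comparable to the one above, assuming the file matches the release these statistics were computed from.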

Parents of NUM nodes belong to 8 different parts of speech: NOUN (332; 66% instances), NUM (62; 12% instances), SYM (38; 8% instances), VERB (37; 7% instances), PROPN (25; 5% instances), ADJ (3; 1% instances), PRON (2; 0% instances), [no parent; the node is the root] (2; 0% instances)

360 (72%) NUM nodes are leaves.

90 (18%) NUM nodes have one child.

28 (6%) NUM nodes have two children.

23 (5%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
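The four child-degree counts above add up to the 501 NUM tokens, while the rounded percentages add up to 101. The short check below illustrates this; the dictionary keys are labels chosen here for readability, and the assumption is that each share is rounded to the nearest whole percent.

```python
# Child-degree counts for NUM nodes as reported on this page.
degree_counts = {"0": 360, "1": 90, "2": 28, "3+": 23}

total = sum(degree_counts.values())

# Assumption: each share is rounded to the nearest integer percent.
shares = {
    degree: round(100 * count / total)
    for degree, count in degree_counts.items()
}

print(total)                 # 501
print(shares)                # {'0': 72, '1': 18, '2': 6, '3+': 5}
print(sum(shares.values()))  # 101, a rounding artifact
```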

Children of NUM nodes are attached using 16 different relations: flat (76; 33% instances), advmod (31; 14% instances), punct (30; 13% instances), case (25; 11% instances), nmod (21; 9% instances), cc (11; 5% instances), conj (8; 3% instances), det (8; 3% instances), nsubj (5; 2% instances), cop (4; 2% instances), acl:relcl (2; 1% instances), advmod:emph (2; 1% instances), nmod:lmod (2; 1% instances), nmod:tmod (2; 1% instances), advcl (1; 0% instances), csubj (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: NUM (62; 27% instances), PUNCT (30; 13% instances), PROPN (26; 11% instances), ADP (25; 11% instances), NOUN (25; 11% instances), ADJ (16; 7% instances), ADV (15; 7% instances), CCONJ (11; 5% instances), DET (8; 3% instances), AUX (4; 2% instances), VERB (4; 2% instances), PART (3; 1% instances)