
This page pertains to UD version 2.

Treebank Statistics: UD_Tamil-MWTT: POS Tags: NUM

There are 13 NUM lemmas (3%), 21 NUM types (2%) and 105 NUM tokens (4%). Out of 13 observed tags, the rank of NUM is: 8 in number of lemmas, 8 in number of types and 7 in number of tokens.
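As a rough illustration of how lemma, type and token counts like these could be derived, the sketch below scans CoNLL-U text and counts distinct lemmas, distinct word forms and total tokens for one UPOS tag. The function name `upos_counts` and the two-sentence sample are hypothetical and not taken from UD_Tamil-MWTT; the actual statistics script may differ in details such as case normalization.

```python
# Tiny hypothetical CoNLL-U fragment (illustrative only, not from the treebank).
SAMPLE = (
    "1\tஒரு\tஒன்று\tNUM\t_\t_\t2\tnummod\t_\t_\n"
    "2\tவீடு\tவீடு\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tஒன்று\tஒன்று\tNUM\t_\t_\t0\troot\t_\t_\n"
)


def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, forms, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:  # column 4 is UPOS
            forms.add(cols[1])   # column 2 is FORM
            lemmas.add(cols[2])  # column 3 is LEMMA
            tokens += 1
    return len(lemmas), len(forms), tokens
```

In the sample, the two NUM tokens share one lemma but have two distinct forms, so `upos_counts(SAMPLE, "NUM")` yields one lemma, two types and two tokens.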

The 10 most frequent NUM lemmas: ஒன்று, ஐந்து, இரண்டு, மூன்று, ஆயிரம், நூறு, பத்து, ஆறு, இன்னொன்று, ஒவ்வொரு

The 10 most frequent NUM types: ஒரு, ஐந்து, மூன்று, இரண்டு, ஒன்று, ஆயிரம், இரண்டாவது, பத்து, ஆறு, இன்னொன்று

The most frequent ambiguous lemmas (3 listed): ஒன்று (NUM 66, ADV 1), இரண்டு (NUM 8, ADV 1, NOUN 1), ஆறு (NOUN 1, NUM 1, VERB 1)

The most frequent ambiguous types (1 listed): இரண்டு (NUM 4, NOUN 1)

Morphology

The form / lemma ratio of NUM is 1.615385 (the average of all parts of speech is 1.743028).
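The reported NUM ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given earlier on this page. The short check below assumes that this is how the figure is obtained; the variable names are illustrative.

```python
num_types = 21   # NUM types reported on this page
num_lemmas = 13  # NUM lemmas reported on this page

# Assumed definition: distinct forms per distinct lemma.
form_lemma_ratio = num_types / num_lemmas
formatted = f"{form_lemma_ratio:.6f}"
print(formatted)  # 1.615385
```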

The highest number of forms (4) was observed with the lemma “இரண்டு”: இரண்டாம், இரண்டாவது, இரண்டு, இரண்டே.

The lemma “ஒன்று” ties for the highest, also with 4 forms: ஒன்று, ஒன்றை, ஒரு, ஒரே.

The next highest number of forms (2) was observed with the lemma “ஐந்து”: ஐந்து, ஐந்தை.

NUM occurs with 4 features: Case (31; 30% instances), NumType (9; 9% instances), Number (2; 2% instances), Person (2; 2% instances)

NUM occurs with 7 feature-value pairs: Case=Acc, Case=Dat, Case=Nom, NumType=Card, NumType=Ord, Number=Sing, Person=3

NUM occurs with 6 feature combinations. The most frequent feature combination is _, i.e. no features (65 tokens). Examples: ஒரு, இன்னொன்று, ஒன்றை, ஒரே, ஒவ்வொரு

Relations

NUM nodes are attached to their parents using 5 different relations: nummod (96; 91% instances), obj (3; 3% instances), obl (3; 3% instances), root (2; 2% instances), amod (1; 1% instances)
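The percentages above can be reproduced from the raw counts. The sketch below assumes each share is the count divided by the total number of NUM tokens, rounded to the nearest whole percent; the dictionary simply restates the counts listed above.

```python
# Relation counts for NUM nodes, as listed above.
relation_counts = {"nummod": 96, "obj": 3, "obl": 3, "root": 2, "amod": 1}

# The counts should add up to the total number of NUM tokens.
total = sum(relation_counts.values())

# Assumed rounding: nearest whole percent.
shares = {rel: round(100 * n / total) for rel, n in relation_counts.items()}

for rel, n in relation_counts.items():
    print(f"{rel} ({n}; {shares[rel]}% instances)")
```

Because each share is rounded independently, the percentages need not sum to exactly 100.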

Parents of NUM nodes belong to 3 different parts of speech, plus the artificial root, which has no part of speech: NOUN (90; 86% instances), VERB (10; 10% instances), ADV (3; 3% instances), root (2; 2% instances)

97 (92%) NUM nodes are leaves.

6 (6%) NUM nodes have one child.

2 (2%) NUM nodes have two children.

The highest child degree of a NUM node is 2.
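Child degree here means the number of nodes whose HEAD points at a given node. A minimal sketch of that computation for one sentence follows; the function `child_degrees`, the tuple layout and the sample sentence are hypothetical and not drawn from the treebank.

```python
from collections import Counter


def child_degrees(rows, upos):
    """Map the ID of each node with the given UPOS to its number of children.

    rows: list of (id, upos, head) tuples for a single sentence.
    """
    # Count how many nodes name each ID as their head.
    children = Counter(head for _, _, head in rows)
    return {node_id: children[node_id] for node_id, tag, _ in rows if tag == upos}


# Hypothetical sentence: node 4 (NUM) governs node 3 (ADP); node 1 (NUM) is a leaf.
rows = [(1, "NUM", 2), (2, "NOUN", 0), (3, "ADP", 4), (4, "NUM", 2)]
degrees = child_degrees(rows, "NUM")

# Consistency check on the figures reported on this page.
nodes = 97 + 6 + 2             # leaves + one-child nodes + two-child nodes
child_total = 6 * 1 + 2 * 2    # children contributed by non-leaf NUM nodes
```

With the reported figures, the node count matches the 105 NUM tokens, and the implied number of children (10) matches the sum of the child relation counts listed below.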

Children of NUM nodes are attached using 4 different relations: case (4; 40% instances), det (2; 20% instances), nsubj (2; 20% instances), punct (2; 20% instances)

Children of NUM nodes belong to 4 different parts of speech: ADP (4; 40% instances), DET (2; 20% instances), NOUN (2; 20% instances), PUNCT (2; 20% instances)