
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: X

There are 6049 X lemmas (39%), 6112 X types (22%) and 17168 X tokens (6%). Out of 17 observed tags, the rank of X is: 1 in number of lemmas, 2 in number of types and 7 in number of tokens.

The 10 most frequent X lemmas: ب، محمد، اف، وَاشِنطُن، عبد، أَفرِيقِيَا، مبارك، سُورِيَا، شَارُون، اَلسَّارس

The 10 most frequent X types: ب، اف، محمد، واشنطن، عبد، مبارك، سوريا، شارون، السارس، أفريقيا

The 10 most frequent ambiguous lemmas: أَ (X 48, PART 6, AUX 4), لكن (X 23, ADV 2), ر (PUNCT 22, X 20), آل (X 18, VERB 2, NOUN 1), لقد (X 18, ADV 9), أَي (CCONJ 40, X 13), إِن (SCONJ 20, X 13), تَلّ (X 13, NOUN 1), كِيلُو (X 8, NOUN 1), إلا (ADV 11, X 6)

The 10 most frequent ambiguous types: ب (ADP 6079, X 208), محمد (X 136, NOUN 49), عبد (X 104, NOUN 37), مبارك (X 99, NOUN 12), أفريقيا (X 89, ADJ 1, NOUN 1), الله (X 73, NOUN 53), ذلك (DET 273, X 69), علي (ADP 313, X 69, NOUN 7), عرفات (X 67, NOUN 8), الذي (DET 712, X 65)

Morphology

The form / lemma ratio of X is 1.010415 (the average of all parts of speech is 1.761981).
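The ratio above is consistent with dividing the number of X types by the number of X lemmas reported at the top of this page. The following minimal sketch reproduces the figure under that assumption; the variable names are illustrative and are not taken from the script that generated the statistics.

```python
# Counts copied from the summary line of this page.
x_types = 6112   # distinct X word forms (types)
x_lemmas = 6049  # distinct X lemmas

# Assumption: the reported ratio is simply types divided by lemmas.
ratio = x_types / x_lemmas
print(f"{ratio:.6f}")  # 1.010415
```

A value this close to 1 means that almost every X lemma appears in a single written form, which matches the short list of multi-form lemmas that follows.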

The 1st highest number of forms (4) was observed with the lemma “أَي”: أى, أي, اى, اي.

The 2nd highest number of forms (3) was observed with the lemma “آبَاد”: آباد, أباد, اباد.

The 3rd highest number of forms (3) was observed with the lemma “آسِيَا”: آسيا, أسيا, اسيا.

X occurs with 3 features: Foreign (5097; 30% instances), Abbr (504; 3% instances), ExtPos (16; 0% instances)

X occurs with 4 feature-value pairs: Abbr=Yes, ExtPos=ADP, ExtPos=SCONJ, Foreign=Yes

X occurs with 5 feature combinations. The most frequent feature combination is _, i.e. no features at all (11551 tokens). Examples: محمد، اف، عبد، مبارك، الله، ذلك، علي، عرفات، الذي، حسين
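The percentages quoted for the features appear to be rounded to whole numbers, which is why a feature with only 16 occurrences is listed at 0%. The sketch below recomputes the shares from the counts on this page and checks how many X tokens remain without any feature. It assumes that the three features never co-occur on the same X token, which the five feature combinations listed above suggest but which this page does not state outright.

```python
# Counts copied from this page.
total_x_tokens = 17168
feature_counts = {"Foreign": 5097, "Abbr": 504, "ExtPos": 16}

# Share of X tokens carrying each feature, rounded to a whole percent.
shares = {name: round(100 * n / total_x_tokens)
          for name, n in feature_counts.items()}

# Assumption: no X token carries more than one of these features,
# so the remainder is the number of tokens with no features at all.
without_features = total_x_tokens - sum(feature_counts.values())

print(shares)            # {'Foreign': 30, 'Abbr': 3, 'ExtPos': 0}
print(without_features)  # 11551
```

The remainder equals the token count reported for the most frequent feature combination, which supports the assumption.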

Relations

X nodes are attached to their parents using 26 different relations: nmod (10397; 61% instances), nsubj (2068; 12% instances), conj (787; 5% instances), flat (745; 4% instances), obl (680; 4% instances), obj (525; 3% instances), dep (524; 3% instances), root (470; 3% instances), obl:arg (260; 2% instances), cc (243; 1% instances), appos (113; 1% instances), advcl:pred (80; 0% instances), parataxis (60; 0% instances), orphan (39; 0% instances), nsubj:pass (35; 0% instances), fixed (31; 0% instances), mark (27; 0% instances), case (26; 0% instances), iobj (21; 0% instances), dislocated (18; 0% instances), acl (5; 0% instances), advcl (5; 0% instances), ccomp (5; 0% instances), xcomp (2; 0% instances), acl:relcl (1; 0% instances), csubj (1; 0% instances)

Parents of X nodes belong to 15 different parts of speech: NOUN (6799; 40% instances), X (5328; 31% instances), VERB (3361; 20% instances), no parent, i.e. the X node is the root (470; 3% instances), ADJ (451; 3% instances), NUM (364; 2% instances), PROPN (138; 1% instances), PRON (65; 0% instances), CCONJ (48; 0% instances), PART (46; 0% instances), ADP (41; 0% instances), DET (38; 0% instances), ADV (16; 0% instances), INTJ (2; 0% instances), SCONJ (1; 0% instances)

8114 (47%) X nodes are leaves.

4650 (27%) X nodes have one child.

2350 (14%) X nodes have two children.

2054 (12%) X nodes have three or more children.

The highest child degree of an X node is 26.
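The leaf and child-degree figures above can be derived from a CoNLL-U file by counting, for each X token, how many other tokens name it as their head. The following is a minimal sketch, not the script that produced this page: the function name `child_degree_histogram` is hypothetical, and the three-token sample is invented for illustration and does not come from UD_Arabic-PADT.

```python
from collections import Counter

# Hypothetical three-token sentence in CoNLL-U format (invented sample).
SAMPLE = """\
1\tA\ta\tX\t_\t_\t0\troot\t_\t_
2\tB\tb\tX\t_\t_\t1\tflat\t_\t_
3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""

def child_degree_histogram(conllu_text, upos="X"):
    """Map child count -> number of nodes tagged with the given UPOS."""
    histogram = Counter()
    for sentence in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in sentence.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary tokens only: skip multiword ranges (1-2)
        # and empty nodes (1.1), whose IDs are not plain integers.
        rows = [r for r in rows if r[0].isdigit()]
        # Column 7 (index 6) is HEAD; count how often each ID is a head.
        children = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == upos:  # column 4 (index 3) is UPOS
                histogram[children[r[0]]] += 1
    return histogram

hist = child_degree_histogram(SAMPLE)
print(dict(hist))
```

In the sample, the first X token has two children and the second is a leaf. Applied to the full treebank, the same counting would be expected to yield the four buckets reported above (leaves, one child, two children, three or more), whose counts sum to the 17168 X tokens.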

Children of X nodes are attached using 30 different relations: nmod (5835; 34% instances), punct (3207; 18% instances), case (2014; 12% instances), conj (1048; 6% instances), cc (878; 5% instances), flat (745; 4% instances), dep (644; 4% instances), obl (513; 3% instances), nsubj (353; 2% instances), obl:arg (316; 2% instances), obj (270; 2% instances), mark (250; 1% instances), amod (244; 1% instances), parataxis (202; 1% instances), acl (190; 1% instances), nummod (105; 1% instances), advmod (100; 1% instances), appos (69; 0% instances), ccomp (58; 0% instances), acl:relcl (56; 0% instances), advmod:emph (48; 0% instances), advcl (47; 0% instances), orphan (39; 0% instances), advcl:pred (37; 0% instances), det (29; 0% instances), cop (26; 0% instances), aux (17; 0% instances), fixed (16; 0% instances), dislocated (5; 0% instances), csubj (3; 0% instances)

Children of X nodes belong to 17 different parts of speech: X (5328; 31% instances), NOUN (3226; 19% instances), PUNCT (3207; 18% instances), ADP (2029; 12% instances), ADJ (810; 5% instances), CCONJ (765; 4% instances), VERB (626; 4% instances), NUM (508; 3% instances), PRON (204; 1% instances), SCONJ (195; 1% instances), DET (132; 1% instances), PART (112; 1% instances), PROPN (92; 1% instances), ADV (60; 0% instances), AUX (43; 0% instances), SYM (25; 0% instances), INTJ (2; 0% instances)