
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: X

There are 6055 X lemmas (39% of all lemmas), 6118 X types (22% of all types) and 17211 X tokens (6% of all tokens). Among the 16 observed tags, X ranks 1st in number of lemmas, 2nd in number of types and 7th in number of tokens.
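Counts of this kind can be reproduced from a CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the script that generated this page: the function name `upos_counts` and the toy sample are assumptions made up for illustration, and types are compared case-sensitively.

```python
def upos_counts(conllu_text, upos):
    """Return (number of lemmas, number of types, number of tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4 is UPOS
            tokens += 1
            types.add(cols[1])       # column 2 is FORM
            lemmas.add(cols[2])      # column 3 is LEMMA
    return len(lemmas), len(types), tokens


def row(tok_id, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line (hypothetical helper for the toy sample)."""
    return "\t".join([str(tok_id), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


# Toy two-sentence sample, invented for illustration (not taken from the treebank).
SAMPLE = "\n".join([
    "# sent_id = 1",
    row(1, "fi", "fi", "ADP", 2, "case"),
    row(2, "Washington", "Washington", "X", 0, "root"),
    row(3, ".", ".", "PUNCT", 2, "punct"),
    "",
    "# sent_id = 2",
    row(1, "zara", "zar", "VERB", 0, "root"),
    row(2, "Mubarak", "Mubarak", "X", 1, "nsubj"),
    row(3, "washington", "Washington", "X", 1, "obj"),
])

print(upos_counts(SAMPLE, "X"))  # prints (2, 3, 3)
```

In the sample, the two spellings "Washington" and "washington" count as two types of a single lemma, which is why the type count exceeds the lemma count.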

The 10 most frequent X lemmas: ب، محمد، اف، وَاشِنطُن، عبد، أَفرِيقِيَا، مبارك، سُورِيَا، شَارُون، اَلسَّارس

The 10 most frequent X types: ب، اف، محمد، واشنطن، عبد، مبارك، سوريا، شارون، السارس، أفريقيا

The 10 most frequent ambiguous lemmas: أَ (X 48, PART 10), ر (PUNCT 22, X 20), آل (X 18, VERB 2, NOUN 1), أَي (CCONJ 40, X 13), إِن (CCONJ 20, X 13), تَلّ (X 13, NOUN 1), كِيلُو (X 8, NOUN 1), رَام (X 6, VERB 1), فِي (ADP 8766, X 4), ما (DET 4, X 4)

The 10 most frequent ambiguous types: ب (ADP 6079, X 208), محمد (X 136, NOUN 49), عبد (X 104, NOUN 37), مبارك (X 99, NOUN 12), أفريقيا (X 89, ADJ 1, NOUN 1), الله (X 73, NOUN 53), ذلك (DET 273, X 69), علي (ADP 313, X 69, NOUN 7), عرفات (X 67, NOUN 8), الذي (DET 712, X 65)

Morphology

The form / lemma ratio of X is 1.010405 (the average of all parts of speech is 1.761701).
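The page does not state how the ratio is computed, but the reported value is consistent with dividing the number of X types by the number of X lemmas given above. A quick check, under that assumption:

```python
# Counts reported on this page for the X tag.
x_types = 6118
x_lemmas = 6055

# Assumed definition: distinct forms divided by distinct lemmas.
ratio = x_types / x_lemmas
print(f"{ratio:.6f}")  # prints 1.010405
```

A ratio this close to 1 means that almost every X lemma appears in a single spelling.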

The highest number of forms (4) was observed with the lemma “أَي”: أى, أي, اى, اي.

The second highest number of forms (3) was observed with the lemma “آبَاد”: آباد, أباد, اباد.

The lemma “آسِيَا” also has 3 forms: آسيا, أسيا, اسيا.

X occurs with 2 features: Foreign (5097; 30% instances), Abbr (504; 3% instances)

X occurs with 2 feature-value pairs: Abbr=Yes, Foreign=Yes

X occurs with 3 feature combinations. The most frequent feature combination is _ (no features; 11610 tokens). Examples: محمد، اف، عبد، مبارك، الله، ذلك، علي، عرفات، الذي، حسين
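The feature counts can be cross-checked against the token total. The sketch below uses only figures from this page; the variable names are made up for illustration. Because the three counts add up exactly to the number of X tokens, the figures imply that Foreign and Abbr never appear on the same X token.

```python
total_x_tokens = 17211
feature_counts = {"Foreign": 5097, "Abbr": 504}
no_feature_tokens = 11610  # the "_" combination

# The three combinations partition the X tokens exactly.
partition_ok = no_feature_tokens + sum(feature_counts.values()) == total_x_tokens
print(partition_ok)  # prints True

# Rounded shares, as shown in the feature list above.
shares = {name: round(100 * n / total_x_tokens) for name, n in feature_counts.items()}
print(shares)  # prints {'Foreign': 30, 'Abbr': 3}
```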

Relations

X nodes are attached to their parents using 27 different relations: nmod (10136; 59% instances), nsubj (2179; 13% instances), advmod (811; 5% instances), conj (782; 5% instances), flat:foreign (745; 4% instances), obj (531; 3% instances), root (470; 3% instances), dep (412; 2% instances), obl:arg (320; 2% instances), cc (260; 2% instances), appos (132; 1% instances), xcomp (80; 0% instances), parataxis (63; 0% instances), iobj (50; 0% instances), orphan (44; 0% instances), nsubj:pass (36; 0% instances), fixed (31; 0% instances), mark (27; 0% instances), case (26; 0% instances), aux (23; 0% instances), advmod:emph (19; 0% instances), cop (17; 0% instances), acl (7; 0% instances), ccomp (5; 0% instances), advcl (3; 0% instances), csubj (1; 0% instances), obl (1; 0% instances)

Parents of X nodes belong to 16 different parts of speech: NOUN (6817; 40% instances), X (5334; 31% instances), VERB (3316; 19% instances), ADJ (476; 3% instances), no parent, i.e. the X node is the sentence root (470; 3% instances), NUM (364; 2% instances), PROPN (138; 1% instances), CCONJ (88; 1% instances), PRON (67; 0% instances), PART (46; 0% instances), ADP (42; 0% instances), DET (35; 0% instances), ADV (14; 0% instances), INTJ (2; 0% instances), AUX (1; 0% instances), PUNCT (1; 0% instances)
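The two distributions above (incoming relation and parent part of speech) can be tallied in one pass over parsed sentences. This is a minimal sketch, assuming a hypothetical in-memory representation in which each token is a dict with the keys `id`, `upos`, `head` and `deprel`; it is not the code that produced this page. A node attached to the artificial root (`head` 0) gets an empty parent tag.

```python
from collections import Counter


def attachment_stats(sentences, upos):
    """Count incoming relations and parent UPOS tags for nodes with the given UPOS."""
    rel_counts, parent_pos = Counter(), Counter()
    for sent in sentences:
        pos_by_id = {tok["id"]: tok["upos"] for tok in sent}
        for tok in sent:
            if tok["upos"] != upos:
                continue
            rel_counts[tok["deprel"]] += 1
            # head == 0 has no entry in pos_by_id, so the parent tag is "".
            parent_pos[pos_by_id.get(tok["head"], "")] += 1
    return rel_counts, parent_pos


# Toy sentences, invented for illustration (not taken from the treebank).
toy_sentences = [
    [
        {"id": 1, "upos": "ADP", "head": 2, "deprel": "case"},
        {"id": 2, "upos": "X", "head": 0, "deprel": "root"},
        {"id": 3, "upos": "PUNCT", "head": 2, "deprel": "punct"},
    ],
    [
        {"id": 1, "upos": "VERB", "head": 0, "deprel": "root"},
        {"id": 2, "upos": "X", "head": 1, "deprel": "nsubj"},
        {"id": 3, "upos": "X", "head": 1, "deprel": "obj"},
    ],
]

rels, parents = attachment_stats(toy_sentences, "X")
print(dict(rels))     # prints {'root': 1, 'nsubj': 1, 'obj': 1}
print(dict(parents))  # prints {'': 1, 'VERB': 2}
```

On this page the count for the `root` relation (470) equals the count of X nodes without a parent, as the sketch would predict.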

8152 (47%) X nodes are leaves.

4660 (27%) X nodes have one child.

2348 (14%) X nodes have two children.

2051 (12%) X nodes have three or more children.

The highest child degree of an X node is 26.
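The four child-count buckets above sum to the 17211 X tokens (8152 + 4660 + 2348 + 2051). A histogram of this shape could be computed as in the following sketch, which assumes a hypothetical representation of each token as an `(id, upos, head)` tuple and groups degrees of three or more into one bucket.

```python
from collections import Counter


def child_degree_histogram(sentences, upos):
    """Histogram of child counts (0, 1, 2, 3 = three or more) for nodes with the given UPOS."""
    hist = Counter()
    for sent in sentences:
        # Number of children per node id within this sentence.
        n_children = Counter(head for _, _, head in sent)
        for tok_id, tok_upos, _ in sent:
            if tok_upos == upos:
                hist[min(n_children[tok_id], 3)] += 1
    return hist


# Toy sentences, invented for illustration (not taken from the treebank).
toy = [
    [(1, "ADP", 2), (2, "X", 0), (3, "PUNCT", 2)],  # the X node has two children
    [(1, "VERB", 0), (2, "X", 1), (3, "X", 1)],     # both X nodes are leaves
]

hist = child_degree_histogram(toy, "X")
print(hist[0], hist[1], hist[2], hist[3])  # prints 2 0 1 0
```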

Children of X nodes are attached using 28 different relations: nmod (5419; 31% instances), punct (3190; 18% instances), case (2036; 12% instances), conj (1044; 6% instances), cc (899; 5% instances), flat:foreign (745; 4% instances), amod (641; 4% instances), dep (627; 4% instances), obl (388; 2% instances), nsubj (359; 2% instances), obl:arg (303; 2% instances), obj (284; 2% instances), acl (245; 1% instances), mark (242; 1% instances), parataxis (200; 1% instances), advmod (148; 1% instances), nummod (114; 1% instances), appos (74; 0% instances), advmod:emph (65; 0% instances), ccomp (58; 0% instances), advcl (51; 0% instances), orphan (43; 0% instances), det (42; 0% instances), xcomp (38; 0% instances), cop (34; 0% instances), aux (26; 0% instances), fixed (16; 0% instances), csubj (3; 0% instances)

Children of X nodes belong to 16 different parts of speech: X (5334; 31% instances), NOUN (3235; 19% instances), PUNCT (3190; 18% instances), ADP (2029; 12% instances), CCONJ (958; 6% instances), ADJ (815; 5% instances), VERB (620; 4% instances), NUM (508; 3% instances), PRON (191; 1% instances), DET (125; 1% instances), PART (124; 1% instances), PROPN (92; 1% instances), ADV (56; 0% instances), AUX (30; 0% instances), SYM (25; 0% instances), INTJ (2; 0% instances)