
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: X

There are 6050 X lemmas (39%), 6113 X types (22%) and 17189 X tokens (6%). Out of 16 observed tags, the rank of X is: 1 in number of lemmas, 2 in number of types and 7 in number of tokens.

The 10 most frequent X lemmas: ب، محمد، اف، وَاشِنطُن، عبد، أَفرِيقِيَا، مبارك، سُورِيَا، شَارُون، اَلسَّارس

The 10 most frequent X types: ب، اف، محمد، واشنطن، عبد، مبارك، سوريا، شارون، السارس، أفريقيا

The 10 most frequent ambiguous lemmas: أَ (X 48, PART 6, AUX 4), لكن (X 23, ADV 2), ر (PUNCT 22, X 20), آل (X 18, VERB 2, NOUN 1), لقد (X 18, ADV 9), أَي (CCONJ 40, X 13), إِن (CCONJ 20, X 13), تَلّ (X 13, NOUN 1), كِيلُو (X 8, NOUN 1), رَام (X 6, VERB 1)

The 10 most frequent ambiguous types: ب (ADP 6079, X 208), محمد (X 136, NOUN 49), عبد (X 104, NOUN 37), مبارك (X 99, NOUN 12), أفريقيا (X 89, ADJ 1, NOUN 1), الله (X 73, NOUN 53), ذلك (DET 273, X 69), علي (ADP 313, X 69, NOUN 7), عرفات (X 67, NOUN 8), الذي (DET 712, X 65)

Morphology

The form / lemma ratio of X is 1.010413 (the average of all parts of speech is 1.762014).
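The ratio above appears consistent with dividing the number of X types by the number of X lemmas reported at the top of this page. A minimal sketch of that arithmetic, assuming this is how the ratio is derived (the variable names are illustrative only):

```python
# Counts copied from the summary line of this page.
x_types = 6113    # distinct X word forms
x_lemmas = 6050   # distinct X lemmas

# Assumed definition: form / lemma ratio = types / lemmas.
form_lemma_ratio = x_types / x_lemmas

print(f"{form_lemma_ratio:.6f}")
```

A value this close to 1 means that almost every X lemma was seen with a single spelling; the lemmas listed below with 3 or 4 forms differ only in hamza and alif maqsura spelling variants.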

The 1st highest number of forms (4) was observed with the lemma “أَي”: أى, أي, اى, اي.

The 2nd highest number of forms (3) was observed with the lemma “آبَاد”: آباد, أباد, اباد.

The 3rd highest number of forms (3) was observed with the lemma “آسِيَا”: آسيا, أسيا, اسيا.

X occurs with 2 features: Foreign (5097; 30% instances), Abbr (504; 3% instances)

X occurs with 2 feature-value pairs: Abbr=Yes, Foreign=Yes

X occurs with 3 feature combinations. The most frequent feature combination is _ (no features; 11588 tokens). Examples: محمد، اف، عبد، مبارك، الله، ذلك، علي، عرفات، الذي، حسين
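The feature counts can be cross-checked against the token total. The sketch below assumes that the three combinations are the empty set, Foreign=Yes alone and Abbr=Yes alone; the page names only the first explicitly, so the other two are inferred from the counts adding up.

```python
# Token total and per-feature counts copied from this page.
total_x_tokens = 17189
combinations = {
    "_": 11588,           # no features
    "Foreign=Yes": 5097,  # assumed to occur without Abbr
    "Abbr=Yes": 504,      # assumed to occur without Foreign
}

# If the assumption holds, the three counts partition all X tokens.
combination_sum = sum(combinations.values())

# Rounded percentage share of each combination.
shares = {name: round(100 * count / total_x_tokens)
          for name, count in combinations.items()}

print(combination_sum == total_x_tokens)
print(shares)
```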

Relations

X nodes are attached to their parents using 24 different relations: nmod (10182; 59% instances), nsubj (2171; 13% instances), obl (838; 5% instances), conj (782; 5% instances), flat:foreign (745; 4% instances), obj (496; 3% instances), root (470; 3% instances), dep (420; 2% instances), obl:arg (320; 2% instances), cc (260; 2% instances), appos (114; 1% instances), xcomp (82; 0% instances), parataxis (63; 0% instances), iobj (50; 0% instances), orphan (44; 0% instances), nsubj:pass (36; 0% instances), fixed (31; 0% instances), mark (27; 0% instances), case (26; 0% instances), dislocated (18; 0% instances), acl (6; 0% instances), ccomp (5; 0% instances), advcl (2; 0% instances), csubj (1; 0% instances)

Parents of X nodes belong to 15 different parts of speech: NOUN (6805; 40% instances), X (5331; 31% instances), VERB (3350; 19% instances), no tag, i.e. the artificial root node (470; 3% instances), ADJ (448; 3% instances), NUM (364; 2% instances), PROPN (138; 1% instances), CCONJ (88; 1% instances), PRON (64; 0% instances), PART (46; 0% instances), ADP (41; 0% instances), DET (27; 0% instances), ADV (14; 0% instances), INTJ (2; 0% instances), AUX (1; 0% instances)

8131 (47%) X nodes are leaves.

4644 (27%) X nodes have one child.

2352 (14%) X nodes have two children.

2062 (12%) X nodes have three or more children.

The highest child degree of an X node is 26.
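The child-degree figures above could be recomputed from a CoNLL-U file roughly as follows. This is only a sketch, not the script that generated this page: the function name child_degrees and the toy sentence are hypothetical, and the code relies solely on the standard 10-column CoNLL-U layout (ID in column 1, UPOS in column 4, HEAD in column 7).

```python
from collections import Counter

def child_degrees(conllu_text, upos="X"):
    """Map child count -> number of nodes with the given UPOS tag."""
    histogram = Counter()
    for sentence in conllu_text.strip().split("\n\n"):
        tags = {}             # token ID -> UPOS
        children = Counter()  # head ID -> number of dependents
        for line in sentence.splitlines():
            if not line or line.startswith("#"):
                continue      # skip blank lines and sentence comments
            cols = line.split("\t")
            tok_id = cols[0]
            if "-" in tok_id or "." in tok_id:
                continue      # skip multiword ranges and empty nodes
            tags[tok_id] = cols[3]
            children[cols[6]] += 1
        for tok_id, tag in tags.items():
            if tag == upos:
                histogram[children[tok_id]] += 1
    return histogram

# Toy sentence for illustration only, not taken from the treebank.
SAMPLE = "\n".join([
    "# toy sentence",
    "1\tA\ta\tX\t_\t_\t0\troot\t_\t_",
    "2\tB\tb\tX\t_\t_\t1\tflat:foreign\t_\t_",
    "3\tC\tc\tX\t_\t_\t1\tflat:foreign\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

degrees = child_degrees(SAMPLE)
print(dict(degrees))
```

In the toy sentence, token 1 has three children and tokens 2 and 3 are leaves. On the real treebank, the leaf, one-child, two-children and three-or-more buckets should add up to the 17189 X tokens reported above (8131 + 4644 + 2352 + 2062).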

Children of X nodes are attached using 29 different relations: nmod (5423; 31% instances), punct (3204; 18% instances), case (2018; 12% instances), conj (1044; 6% instances), cc (876; 5% instances), flat:foreign (745; 4% instances), amod (641; 4% instances), dep (628; 4% instances), obl (512; 3% instances), nsubj (360; 2% instances), obl:arg (304; 2% instances), obj (284; 2% instances), acl (252; 1% instances), mark (238; 1% instances), parataxis (202; 1% instances), nummod (114; 1% instances), advmod (99; 1% instances), appos (69; 0% instances), ccomp (58; 0% instances), advmod:emph (57; 0% instances), advcl (51; 0% instances), orphan (43; 0% instances), det (42; 0% instances), xcomp (38; 0% instances), cop (26; 0% instances), aux (17; 0% instances), fixed (16; 0% instances), dislocated (5; 0% instances), csubj (3; 0% instances)

Children of X nodes belong to 16 different parts of speech: X (5331; 31% instances), NOUN (3236; 19% instances), PUNCT (3204; 18% instances), ADP (2033; 12% instances), CCONJ (959; 6% instances), ADJ (812; 5% instances), VERB (634; 4% instances), NUM (508; 3% instances), PRON (194; 1% instances), DET (126; 1% instances), PART (109; 1% instances), PROPN (92; 1% instances), ADV (61; 0% instances), AUX (43; 0% instances), SYM (25; 0% instances), INTJ (2; 0% instances)