
This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-PADT: POS Tags: X

There are 6049 X lemmas (39%), 6112 X types (22%) and 17168 X tokens (6%). Out of 17 observed tags, the rank of X is: 1 in number of lemmas, 2 in number of types and 7 in number of tokens.
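As a rough illustration of how counts like these can be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens for one UPOS tag in CoNLL-U text. The function name `count_upos` and the toy sample are hypothetical, and the sketch assumes case-sensitive comparison of forms; the script that generated this page may normalize differently.

```python
def count_upos(conllu_text, upos="X"):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4 is UPOS
            tokens += 1
            types.add(cols[1])       # column 2 is FORM
            lemmas.add(cols[2])      # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Toy data for illustration only; not taken from the treebank.
SAMPLE = "\n".join([
    "# text = toy example",
    "1\tfoo\tfoo\tX\t_\t_\t0\troot\t_\t_",
    "2\tFoo\tfoo\tX\t_\t_\t1\tflat:foreign\t_\t_",
    "3\tbar\tbar\tNOUN\t_\t_\t1\tnmod\t_\t_",
])

print(count_upos(SAMPLE))  # (1, 2, 2)
```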

The 10 most frequent X lemmas: ب، محمد، اف، وَاشِنطُن، عبد، أَفرِيقِيَا، مبارك، سُورِيَا، شَارُون، اَلسَّارس

The 10 most frequent X types: ب، اف، محمد، واشنطن، عبد، مبارك، سوريا، شارون، السارس، أفريقيا

The 10 most frequent ambiguous lemmas: أَ (X 48, PART 6, AUX 4), لكن (X 23, ADV 2), ر (PUNCT 22, X 20), آل (X 18, VERB 2, NOUN 1), لقد (X 18, ADV 9), أَي (CCONJ 40, X 13), إِن (SCONJ 20, X 13), تَلّ (X 13, NOUN 1), كِيلُو (X 8, NOUN 1), إلا (ADV 11, X 6)

The 10 most frequent ambiguous types: ب (ADP 6079, X 208), محمد (X 136, NOUN 49), عبد (X 104, NOUN 37), مبارك (X 99, NOUN 12), أفريقيا (X 89, ADJ 1, NOUN 1), الله (X 73, NOUN 53), ذلك (DET 273, X 69), علي (ADP 313, X 69, NOUN 7), عرفات (X 67, NOUN 8), الذي (DET 712, X 65)

Morphology

The form / lemma ratio of X is 1.010415 (the average of all parts of speech is 1.761966).
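The reported ratio is consistent with dividing the number of X types by the number of X lemmas given earlier on this page. The short check below assumes that this is how the figure was obtained; the generating script is not shown here, so treat it as a plausibility check rather than the actual method.

```python
x_types = 6112   # distinct X word forms reported on this page
x_lemmas = 6049  # distinct X lemmas reported on this page

ratio = x_types / x_lemmas
print(f"{ratio:.6f}")  # 1.010415
```

A ratio this close to 1 means that almost every X lemma appears with a single surface form.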

The highest number of forms (4) was observed with the lemma “أَي”: أى, أي, اى, اي.

The 2nd highest number of forms (3) was observed with the lemma “آبَاد”: آباد, أباد, اباد.

The 3rd highest number of forms (3) was observed with the lemma “آسِيَا”: آسيا, أسيا, اسيا.

X occurs with 2 features: Foreign (5097; 30% instances), Abbr (504; 3% instances)

X occurs with 2 feature-value pairs: Abbr=Yes, Foreign=Yes

X occurs with 3 feature combinations. The most frequent feature combination is _ (11567 tokens). Examples: محمد، اف، عبد، مبارك، الله، ذلك، علي، عرفات، الذي، حسين

Relations

X nodes are attached to their parents using 25 different relations: nmod (10181; 59% instances), nsubj (2171; 13% instances), obl (836; 5% instances), conj (782; 5% instances), flat:foreign (745; 4% instances), obj (496; 3% instances), root (470; 3% instances), dep (420; 2% instances), obl:arg (320; 2% instances), cc (243; 1% instances), appos (113; 1% instances), xcomp (82; 0% instances), parataxis (60; 0% instances), iobj (50; 0% instances), orphan (44; 0% instances), nsubj:pass (36; 0% instances), fixed (31; 0% instances), mark (27; 0% instances), case (26; 0% instances), dislocated (18; 0% instances), acl (5; 0% instances), advcl (5; 0% instances), ccomp (5; 0% instances), acl:relcl (1; 0% instances), csubj (1; 0% instances)
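The percentages in this list appear to be each count's share of all 17168 X tokens, rounded to the nearest whole percent, which is why rare relations show as 0%. The sketch below reproduces three of the listed values under that assumption; the variable names are illustrative.

```python
total_x_tokens = 17168  # X tokens reported on this page

# A few relation counts copied from the list above.
counts = {"nmod": 10181, "nsubj": 2171, "xcomp": 82}

# Share of all X tokens, rounded to the nearest whole percent.
shares = {rel: round(100 * n / total_x_tokens) for rel, n in counts.items()}
print(shares)  # {'nmod': 59, 'nsubj': 13, 'xcomp': 0}
```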

Parents of X nodes belong to 15 different parts of speech: NOUN (6805; 40% instances), X (5330; 31% instances), VERB (3362; 20% instances), [no parent: root] (470; 3% instances), ADJ (451; 3% instances), NUM (364; 2% instances), PROPN (138; 1% instances), PRON (64; 0% instances), CCONJ (48; 0% instances), PART (46; 0% instances), ADP (41; 0% instances), DET (30; 0% instances), ADV (16; 0% instances), INTJ (2; 0% instances), SCONJ (1; 0% instances)

8114 (47%) X nodes are leaves.

4640 (27%) X nodes have one child.

2352 (14%) X nodes have two children.

2062 (12%) X nodes have three or more children.

The highest child degree of an X node is 26.
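The child-degree figures above can be computed by counting, for every node with the tag of interest, how many other nodes name it as their head. The following is a minimal sketch on toy data; the `(id, upos, head)` tuple layout and the function name `child_degree_histogram` are assumptions made for illustration, not the format used by the script behind this page.

```python
from collections import Counter


def child_degree_histogram(sentences, upos="X"):
    """Map child count -> number of nodes with the given UPOS having that many children.

    sentences: list of sentences, each a list of (id, upos, head) tuples.
    """
    hist = Counter()
    for sent in sentences:
        # Number of children per node id within this sentence.
        n_children = Counter(head for _, _, head in sent)
        for node_id, tag, _ in sent:
            if tag == upos:
                hist[n_children[node_id]] += 1
    return hist


# Toy sentence: node 1 (X) is the root and governs nodes 2, 3 and 4.
toy = [[(1, "X", 0), (2, "X", 1), (3, "NOUN", 1), (4, "PUNCT", 1)]]
hist = child_degree_histogram(toy)

print(hist[0])    # 1 leaf
print(max(hist))  # highest child degree: 3
```

Summing the histogram values gives the total number of nodes with that tag, which matches the way the four counts above add up to 17168.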

Children of X nodes are attached using 30 different relations: nmod (5421; 31% instances), punct (3210; 18% instances), case (2017; 12% instances), conj (1043; 6% instances), cc (878; 5% instances), flat:foreign (745; 4% instances), amod (641; 4% instances), dep (628; 4% instances), obl (517; 3% instances), nsubj (370; 2% instances), obl:arg (317; 2% instances), obj (270; 2% instances), mark (249; 1% instances), parataxis (202; 1% instances), acl (196; 1% instances), nummod (114; 1% instances), advmod (100; 1% instances), appos (69; 0% instances), ccomp (58; 0% instances), acl:relcl (56; 0% instances), advmod:emph (48; 0% instances), advcl (47; 0% instances), orphan (43; 0% instances), det (42; 0% instances), xcomp (37; 0% instances), cop (26; 0% instances), aux (17; 0% instances), fixed (16; 0% instances), dislocated (5; 0% instances), csubj (3; 0% instances)

Children of X nodes belong to 17 different parts of speech: X (5330; 31% instances), NOUN (3231; 19% instances), PUNCT (3210; 18% instances), ADP (2032; 12% instances), ADJ (810; 5% instances), CCONJ (765; 4% instances), VERB (632; 4% instances), NUM (508; 3% instances), PRON (206; 1% instances), SCONJ (195; 1% instances), DET (132; 1% instances), PART (112; 1% instances), PROPN (92; 1% instances), ADV (60; 0% instances), AUX (43; 0% instances), SYM (25; 0% instances), INTJ (2; 0% instances)