This page pertains to UD version 2.

Treebank Statistics: UD_Arabic-NYUAD: POS Tags: NOUN

There are 92 NOUN lemmas (2%), 1 NOUN type (6%) and 218254 NOUN tokens (30%). Out of 16 observed tags, the rank of NOUN is: 4 in number of lemmas, 8 in number of types and 1 in number of tokens.
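Lemmas here are distinct LEMMA values, types are distinct FORM values and tokens are individual occurrences. The Python sketch below shows one way such per-tag counts and ranks could be computed from a CoNLL-U file. The function names and the transliterated toy sample are hypothetical, and breaking ties alphabetically in `rank` is an assumption, since this page does not say how ties are ordered. If every FORM is `_`, as the type lists on this page show, `tag_stats` would report exactly one type per tag.

```python
from collections import defaultdict

# Hypothetical toy sample in CoNLL-U (10 tab-separated columns per word).
SAMPLE = """\
# sent_id = toy-1
1\tbayt\tbayt\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tkabIr\tkabIr\tADJ\t_\t_\t0\troot\t_\t_
3\tbuyUt\tbayt\tNOUN\t_\t_\t2\tobj\t_\t_
4\tkitAb\tkitAb\tNOUN\t_\t_\t2\tobj\t_\t_
"""

def read_words(text):
    """Yield the columns of each syntactic word, skipping comments, blank
    lines, multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def tag_stats(text):
    """Map each UPOS tag to (number of lemmas, number of types, number of tokens)."""
    lemmas, forms = defaultdict(set), defaultdict(set)
    tokens = defaultdict(int)
    for cols in read_words(text):
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        forms[upos].add(form)
        tokens[upos] += 1
    return {tag: (len(lemmas[tag]), len(forms[tag]), tokens[tag]) for tag in tokens}

def rank(stats, tag, index):
    """1-based rank of `tag` by stats[tag][index] (0=lemmas, 1=types, 2=tokens),
    largest first; ties are broken alphabetically (an assumption)."""
    ordered = sorted(stats, key=lambda t: (-stats[t][index], t))
    return ordered.index(tag) + 1

stats = tag_stats(SAMPLE)
print(stats["NOUN"], rank(stats, "NOUN", 2))  # → (2, 3, 3) 1
```

With the toy sample, NOUN has two lemmas, three forms and three tokens, and it ranks first by tokens.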

The 10 most frequent NOUN lemmas: _، None، TBupdate، w، “، .، ,، l، b، h

The 10 most frequent NOUN types: _

The 10 most frequent ambiguous lemmas: _ (NOUN 216429, PUNCT 72574, ADJ 66760, ADP 62646, VERB 54473, PROPN 48965, ADV 26129, SCONJ 23987, NUM 15122, AUX 6581, DET 6330, PART 5856, CCONJ 5168, PRON 2460, INTJ 54, X 32), None (NOUN 457, X 344, VERB 264, ADJ 125, PROPN 124, ADV 34, CCONJ 20, PRON 16, SCONJ 16, PART 14, ADP 8, DET 6, AUX 2), TBupdate (NOUN 401, ADJ 280, VERB 263, X 174, ADV 74, PROPN 69, ADP 4, SCONJ 2, CCONJ 1, DET 1, PART 1, PRON 1), w (CCONJ 43321, NOUN 190, PUNCT 136, ADP 120, ADV 117, PROPN 78, VERB 71, SCONJ 69, ADJ 55, PRON 33, PART 10, DET 9, NUM 8, AUX 5, X 3), “ (NOUN 112, ADP 34, CCONJ 20, PROPN 20, ADJ 12, VERB 8, PART 6, PRON 6, SCONJ 6, ADV 5, AUX 2, DET 2, X 2), . (NOUN 107, ADJ 95, PROPN 67, PRON 20, VERB 12, PART 6, ADP 5, X 5, CCONJ 3, ADV 2, AUX 2, DET 2, SCONJ 1), , (NOUN 100, CCONJ 96, VERB 34, PROPN 33, ADJ 30, ADP 30, PRON 11, SCONJ 11, PART 10, AUX 5, DET 5, ADV 4), l (ADP 15449, PART 123, NOUN 98, AUX 67, CCONJ 33, ADJ 30, PUNCT 19, VERB 9, SCONJ 8, PROPN 7, ADV 6, PRON 5, DET 2, INTJ 2, NUM 1, X 1), b (ADP 12204, NOUN 65, VERB 17, ADJ 16, PUNCT 15, PRON 12, CCONJ 10, SCONJ 7, PROPN 6, ADV 5, AUX 2, PART 2, X 2, DET 1, NUM 1), h (PRON 12201, SCONJ 390, AUX 107, NOUN 62, ADP 36, PUNCT 14, CCONJ 12, ADJ 7, NUM 7, VERB 7, PROPN 2, PART 1, X 1)

The 10 most frequent ambiguous types: _ (NOUN 218254, ADP 91694, PUNCT 75148, ADJ 67604, PROPN 58325, VERB 55215, CCONJ 50032, PRON 31239, ADV 26527, SCONJ 26034, NUM 15147, PART 8612, AUX 7723, DET 6362, X 917, INTJ 56)
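The ambiguous lemmas above are ordered by their NOUN count (216429, 457, 401, 190, ...), not by total frequency: `w` occurs 43321 times as CCONJ yet ranks fourth. Within each entry, tags appear in descending order of count, and equal counts are listed alphabetically (e.g. "CCONJ 20, PROPN 20"). The sketch below reproduces that kind of listing under those observations; the function names and sample are hypothetical, and the alphabetical tie-break between items is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical toy sample: lemma w is CCONJ twice and NOUN once, kabIr is ADJ
# once and NOUN once, bayt is only ever NOUN (so it is not ambiguous).
SAMPLE = """\
1\tw\tw\tCCONJ\t_\t_\t2\tcc\t_\t_
2\tbayt\tbayt\tNOUN\t_\t_\t0\troot\t_\t_
3\tw\tw\tCCONJ\t_\t_\t4\tcc\t_\t_
4\tbayt\tbayt\tNOUN\t_\t_\t2\tconj\t_\t_
5\tw\tw\tNOUN\t_\t_\t2\tnmod\t_\t_
6\tkabIr\tkabIr\tADJ\t_\t_\t2\tamod\t_\t_
7\tkabIr\tkabIr\tNOUN\t_\t_\t2\tnmod\t_\t_
"""

def read_words(text):
    """Yield word columns, skipping comments, blank lines, ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def ambiguous_for_tag(text, tag, column=2, top=10):
    """Items (lemmas for column=2, forms for column=1) that occur with `tag`
    and at least one other UPOS, ordered by their count with `tag`."""
    tags_by_item = defaultdict(Counter)
    for cols in read_words(text):
        tags_by_item[cols[column]][cols[3]] += 1
    items = [k for k, c in tags_by_item.items() if c[tag] > 0 and len(c) > 1]
    items.sort(key=lambda k: (-tags_by_item[k][tag], k))
    return [(k, sorted(tags_by_item[k].items(), key=lambda kv: (-kv[1], kv[0])))
            for k in items[:top]]

def render(entries):
    """Format entries like the listing above: item (TAG n, TAG n, ...)."""
    return ", ".join(
        f"{item} (" + ", ".join(f"{t} {n}" for t, n in counts) + ")"
        for item, counts in entries
    )

print(render(ambiguous_for_tag(SAMPLE, "NOUN")))
# → kabIr (ADJ 1, NOUN 1), w (CCONJ 2, NOUN 1)
```

Passing `column=1` applies the same logic to forms, which is what the ambiguous-types line reports.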

Morphology

The form / lemma ratio of NOUN is 0.010870 (the average over all parts of speech is 0.002933).

The highest number of forms (1) was observed with the lemma “!”: _.

The second-highest number of forms (1) was observed with the lemma “””: _.

The third-highest number of forms (1) was observed with the lemma “(”: _.
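The form / lemma ratio is the number of NOUN types divided by the number of NOUN lemmas: 1 / 92 ≈ 0.010870, which matches the figure above. The following sketch computes that ratio and lists the lemmas with the most distinct forms. The function names and toy sample are hypothetical; the page does not state how lemmas with equal form counts are ordered, so the alphabetical tie-break in `richest_lemmas` is an assumption.

```python
from collections import defaultdict

# Hypothetical toy sample: NOUN lemma bayt has forms bayt and buyUt,
# NOUN lemma kitAb has forms kitAb and kutub.
SAMPLE = """\
1\tbayt\tbayt\tNOUN\t_\t_\t0\troot\t_\t_
2\tkabIr\tkabIr\tADJ\t_\t_\t1\tamod\t_\t_
3\tbuyUt\tbayt\tNOUN\t_\t_\t1\tconj\t_\t_
4\tkutub\tkitAb\tNOUN\t_\t_\t1\tconj\t_\t_
5\tkitAb\tkitAb\tNOUN\t_\t_\t1\tconj\t_\t_
"""

def read_words(text):
    """Yield word columns, skipping comments, blank lines, ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def forms_by_lemma(text, tag):
    """Map each lemma of `tag` to the set of distinct forms it occurs with."""
    forms = defaultdict(set)
    for cols in read_words(text):
        if cols[3] == tag:
            forms[cols[2]].add(cols[1])
    return forms

def form_lemma_ratio(text, tag):
    """Distinct forms (types) of `tag` divided by distinct lemmas of `tag`."""
    forms = forms_by_lemma(text, tag)
    n_types = len(set().union(*forms.values()))
    return n_types / len(forms)

def richest_lemmas(text, tag, top=3):
    """Lemmas with the most distinct forms; ties broken alphabetically (assumed)."""
    forms = forms_by_lemma(text, tag)
    ordered = sorted(forms, key=lambda lemma: (-len(forms[lemma]), lemma))
    return [(lemma, sorted(forms[lemma])) for lemma in ordered[:top]]

print(form_lemma_ratio(SAMPLE, "NOUN"))  # → 2.0
print(richest_lemmas(SAMPLE, "NOUN"))
# → [('bayt', ['bayt', 'buyUt']), ('kitAb', ['kitAb', 'kutub'])]
print(f"{1 / 92:.6f}")  # → 0.010870 (1 NOUN type / 92 NOUN lemmas)
```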

NOUN occurs with 8 features: Gender (217040; 99% of instances), Number (217040; 99% of instances), Definite (216797; 99% of instances), Case (209062; 96% of instances), Person (400; 0% of instances), Voice (243; 0% of instances), Mood (238; 0% of instances), Polarity (14; 0% of instances)

NOUN occurs with 20 feature-value pairs: Case=Acc, Case=Gen, Case=Nom, Definite=Com, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Mood=Ind, Mood=Jus, Mood=Sub, Number=Dual, Number=Plur, Number=Sing, Person=1, Person=2, Person=3, Polarity=Neg, Voice=Act, Voice=Pass

NOUN occurs with 107 feature combinations. The most frequent feature combination is Case=Gen|Definite=Def|Gender=Masc|Number=Sing (40528 tokens). Examples: _
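Each feature percentage is the share of NOUN tokens whose FEATS column contains that feature; for Gender, 217040 / 218254 ≈ 99%. The sketch below derives per-feature counts, the set of feature-value pairs and the frequency of full feature combinations. Names and sample are hypothetical. It treats an empty FEATS value (`_`) as "no features" and does not count it as a combination; whether the page's 107 combinations include the empty one is not stated.

```python
from collections import Counter

# Hypothetical toy sample: four NOUN tokens, one of them without features.
SAMPLE = """\
1\tkitAb\tkitAb\tNOUN\t_\tCase=Gen|Definite=Def|Gender=Masc|Number=Sing\t0\troot\t_\t_
2\tbayt\tbayt\tNOUN\t_\tCase=Gen|Definite=Def|Gender=Masc|Number=Sing\t1\tnmod\t_\t_
3\tmadInap\tmadInap\tNOUN\t_\tCase=Nom|Definite=Ind|Gender=Fem|Number=Sing\t1\tnmod\t_\t_
4\tqalam\tqalam\tNOUN\t_\t_\t1\tdep\t_\t_
5\tkabIr\tkabIr\tADJ\t_\tCase=Gen|Gender=Masc\t1\tamod\t_\t_
"""

def read_words(text):
    """Yield word columns, skipping comments, blank lines, ranges and empty nodes."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def feature_stats(text, tag):
    """Return (token count, per-feature token counts, sorted feature=value
    pairs, counts of full FEATS strings) for words tagged `tag`."""
    n_tokens = 0
    features, combos = Counter(), Counter()
    pairs = set()
    for cols in read_words(text):
        if cols[3] != tag:
            continue
        n_tokens += 1
        if cols[5] == "_":  # no features; not counted as a combination here
            continue
        combos[cols[5]] += 1
        for pair in cols[5].split("|"):
            features[pair.split("=", 1)[0]] += 1
            pairs.add(pair)
    return n_tokens, features, sorted(pairs), combos

n, features, pairs, combos = feature_stats(SAMPLE, "NOUN")
for name, count in features.most_common():
    print(f"{name} ({count}; {100 * count / n:.0f}% of instances)")
print(len(pairs), "feature-value pairs;", len(combos), "combinations")
print(combos.most_common(1))
```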

Relations

NOUN nodes are attached to their parents using 18 different relations: nmod (87788; 40% of instances), nmod:poss (66743; 31% of instances), nsubj (20288; 9% of instances), obj (20226; 9% of instances), conj (17054; 8% of instances), root (2255; 1% of instances), parataxis (1818; 1% of instances), nsubj:pass (1360; 1% of instances), iobj (268; 0% of instances), aux (266; 0% of instances), flat (77; 0% of instances), xcomp (34; 0% of instances), acl (24; 0% of instances), dep (24; 0% of instances), nummod (12; 0% of instances), amod (9; 0% of instances), ccomp (7; 0% of instances), mark (1; 0% of instances)

Parents of NOUN nodes belong to 16 different parts of speech: NOUN (112056; 51% of instances), VERB (74296; 34% of instances), ADV (16160; 7% of instances), ADJ (7184; 3% of instances), PROPN (2938; 1% of instances), the artificial root node (2255; 1% of instances), PRON (1114; 1% of instances), NUM (635; 0% of instances), CCONJ (588; 0% of instances), PUNCT (310; 0% of instances), PART (224; 0% of instances), X (223; 0% of instances), DET (115; 0% of instances), SCONJ (95; 0% of instances), AUX (57; 0% of instances), INTJ (4; 0% of instances)
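Both lists above describe NOUN nodes from the dependent's side: the DEPREL of each NOUN node, and the UPOS of the node its HEAD points to. The 2255 NOUN nodes whose head is the artificial root (HEAD 0, which has no part of speech) correspond to the 2255 `root` relations. A sketch of both counts follows; the function names, the sample and the `(root)` label for the artificial root are hypothetical conventions. Subtypes such as `nmod:poss` are kept separate, as on this page.

```python
from collections import Counter

# Hypothetical toy sample with two sentences.
SAMPLE = """\
# sent_id = toy-1
1\tbayt\tbayt\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tkabIr\tkabIr\tADJ\t_\t_\t0\troot\t_\t_

# sent_id = toy-2
1\tkitAb\tkitAb\tNOUN\t_\t_\t0\troot\t_\t_
2\tw\tw\tCCONJ\t_\t_\t3\tcc\t_\t_
3\tqalam\tqalam\tNOUN\t_\t_\t1\tconj\t_\t_
"""

def read_sentences(text):
    """Yield each sentence as a list of word columns; skips comments,
    multiword-token ranges and empty nodes."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(cols)
    if sentence:
        yield sentence

def attachment_stats(text, tag):
    """Count the DEPRELs of `tag` nodes and the UPOS tags of their parents."""
    deprels, parent_pos = Counter(), Counter()
    for sentence in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            if cols[3] != tag:
                continue
            deprels[cols[7]] += 1
            # HEAD 0 is the artificial root, which has no UPOS of its own.
            parent_pos[upos_by_id.get(cols[6], "(root)")] += 1
    return deprels, parent_pos

deprels, parent_pos = attachment_stats(SAMPLE, "NOUN")
print(deprels)
print(parent_pos)
```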

39473 (18%) NOUN nodes are leaves.

72464 (33%) NOUN nodes have one child.

62527 (29%) NOUN nodes have two children.

43790 (20%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 106.
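The four buckets partition all NOUN tokens: 39473 + 72464 + 62527 + 43790 = 218254, the NOUN token count given at the top of the page. The sketch below buckets nodes the same way, taking a node's child degree to be the number of words whose HEAD column points to it; enhanced dependencies (DEPS) are ignored, which is an assumption. Function names and sample are hypothetical.

```python
from collections import Counter

# Hypothetical toy sample: kitAb has 3 children, qalam 1, bayt none.
SAMPLE = """\
1\tkitAb\tkitAb\tNOUN\t_\t_\t0\troot\t_\t_
2\tqalam\tqalam\tNOUN\t_\t_\t1\tnmod\t_\t_
3\tjadId\tjadId\tADJ\t_\t_\t1\tamod\t_\t_
4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
5\tbayt\tbayt\tNOUN\t_\t_\t2\tnmod:poss\t_\t_
"""

def read_sentences(text):
    """Yield each sentence as a list of word columns; skips comments,
    multiword-token ranges and empty nodes."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(cols)
    if sentence:
        yield sentence

def child_degrees(text, tag):
    """Bucket `tag` nodes by number of children (0, 1, 2, 3 = three or more)
    and return the buckets together with the largest degree seen."""
    buckets, max_degree = Counter(), 0
    for sentence in read_sentences(text):
        n_children = Counter(cols[6] for cols in sentence)  # HEAD -> child count
        for cols in sentence:
            if cols[3] == tag:
                degree = n_children[cols[0]]
                buckets[min(degree, 3)] += 1
                max_degree = max(max_degree, degree)
    return buckets, max_degree

buckets, max_degree = child_degrees(SAMPLE, "NOUN")
print(dict(sorted(buckets.items())), max_degree)  # → {0: 1, 1: 1, 3: 1} 3
```

As on this page, `sum(buckets.values())` equals the number of tokens with the given tag.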

Children of NOUN nodes are attached using 26 different relations: nmod:poss (80173; 22% of instances), case (69452; 19% of instances), amod (55214; 15% of instances), nmod (48054; 13% of instances), punct (19460; 5% of instances), cc (18775; 5% of instances), conj (16210; 4% of instances), ccomp (10271; 3% of instances), appos (8623; 2% of instances), advmod (8141; 2% of instances), mark (6690; 2% of instances), xcomp (6125; 2% of instances), nummod (5610; 2% of instances), det (4215; 1% of instances), obj (2276; 1% of instances), cop (1531; 0% of instances), nsubj (1485; 0% of instances), dep (1477; 0% of instances), parataxis (1159; 0% of instances), flat (388; 0% of instances), csubj (277; 0% of instances), flat:name (54; 0% of instances), acl (40; 0% of instances), iobj (15; 0% of instances), aux (10; 0% of instances), compound (1; 0% of instances)

Children of NOUN nodes belong to 16 different parts of speech: NOUN (112056; 31% of instances), ADP (69541; 19% of instances), ADJ (56584; 15% of instances), PROPN (22973; 6% of instances), PUNCT (19460; 5% of instances), CCONJ (18825; 5% of instances), VERB (17911; 5% of instances), PRON (17803; 5% of instances), ADV (8786; 2% of instances), NUM (7319; 2% of instances), SCONJ (6732; 2% of instances), DET (4540; 1% of instances), AUX (1857; 1% of instances), PART (1098; 0% of instances), X (236; 0% of instances), INTJ (5; 0% of instances)
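The children lists count the same kind of edges from the head's side. Every NOUN→NOUN edge is therefore counted once among the parents of NOUN nodes and once among their children, which is why NOUN has the same count (112056) in both lists. A sketch of the head-side counts follows; function names and sample are hypothetical.

```python
from collections import Counter

# Hypothetical toy sample: kitAb heads qalam, jadId and "."; qalam heads bayt.
SAMPLE = """\
1\tkitAb\tkitAb\tNOUN\t_\t_\t0\troot\t_\t_
2\tqalam\tqalam\tNOUN\t_\t_\t1\tnmod\t_\t_
3\tjadId\tjadId\tADJ\t_\t_\t1\tamod\t_\t_
4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
5\tbayt\tbayt\tNOUN\t_\t_\t2\tnmod:poss\t_\t_
"""

def read_sentences(text):
    """Yield each sentence as a list of word columns; skips comments,
    multiword-token ranges and empty nodes."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(cols)
    if sentence:
        yield sentence

def children_stats(text, tag):
    """Count the DEPRELs and UPOS tags of words whose parent is tagged `tag`."""
    deprels, child_pos = Counter(), Counter()
    for sentence in read_sentences(text):
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            if upos_by_id.get(cols[6]) == tag:  # HEAD 0 maps to None
                deprels[cols[7]] += 1
                child_pos[cols[3]] += 1
    return deprels, child_pos

deprels, child_pos = children_stats(SAMPLE, "NOUN")
print(deprels.most_common())
print(child_pos.most_common())
```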