
This page pertains to UD version 2.

Treebank Statistics: UD_Catalan: POS Tags: SYM

There are 276 SYM lemmas (1%), 273 SYM types (1%) and 4633 SYM tokens (1%). Out of 17 observed tags, the rank of SYM is: 8 in number of lemmas, 8 in number of types and 14 in number of tokens.
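Counts of this kind can be reproduced, at least approximately, from a CoNLL-U file. The sketch below is a minimal illustration and not the script that generated this page. It assumes standard 10-column CoNLL-U input, treats a type as a distinct FORM string (case-sensitive, which is an assumption), and skips multiword-token ranges and empty nodes. The sample sentence and the helper names `read_tokens` and `upos_stats` are invented for illustration.

```python
# Toy CoNLL-U fragment (invented for illustration, not taken from UD_Catalan).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """\
# text = Els preus pugen un 5%.
1\tEls\tel\tDET\t_\t_\t2\tdet\t_\t_
2\tpreus\tpreu\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tpugen\tpujar\tVERB\t_\t_\t0\troot\t_\t_
4\tun\tun\tDET\t_\t_\t5\tdet\t_\t_
5\t5%\t5/100\tSYM\t_\tNumForm=Digit|NumType=Frac\t3\tobj\t_\tSpaceAfter=No
6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

def read_tokens(text):
    """Yield the column list of every basic token line (hypothetical helper)."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank sentence separator or comment line
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols

def upos_stats(text, upos):
    """Return (#distinct lemmas, #distinct forms, #tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for cols in read_tokens(text):
        if cols[3] == upos:
            lemmas.add(cols[2])
            types.add(cols[1])
            tokens += 1
    return len(lemmas), len(types), tokens

print(upos_stats(SAMPLE, "SYM"))  # (1, 1, 1)
```

The percentages and ranks on this page would then follow from running the same count for every tag and comparing the results.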

The 10 most frequent SYM lemmas: ’, %, 50/100, 10/100, 30/100, 5/100, 1/100, 2/100, 25/100, 20/100

The 10 most frequent SYM types: ’, %, 50%, 10%, 30%, 5%, 40%, 1%, 2%, 25%

The 10 most frequent ambiguous lemmas: (SYM 3820, PUNCT 31), 10/100 (SYM 16, NUM 1), 15/100 (SYM 6, NUM 1), 75/100 (SYM 4, NUM 1), 40 (NUM 47, SYM 2, NOUN 1), - (PUNCT 950, SYM 1), 10 (NUM 160, NOUN 9, SYM 1), 34 (NUM 12, SYM 1), 50 (NUM 75, SYM 1)

The 10 most frequent ambiguous types: (SYM 3820, PUNCT 32), - (PUNCT 950, SYM 1)
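An item counts as ambiguous when it occurs with SYM and with at least one other tag. Judging from the two lists above, items appear to be ordered by their SYM count (3820, 16, 6, 4, 2, 1, …) rather than by total frequency. The sketch below follows that ordering and breaks ties by string order, which is an assumption. The function name `ambiguous` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def ambiguous(pairs, tag="SYM", top=10):
    """Items (lemmas or forms) seen with `tag` and at least one other UPOS.

    `pairs` is an iterable of (item, upos) tuples: pass (LEMMA, UPOS) for
    ambiguous lemmas or (FORM, UPOS) for ambiguous types.
    """
    by_item = defaultdict(Counter)
    for item, upos in pairs:
        by_item[item][upos] += 1
    amb = [(item, c) for item, c in by_item.items() if tag in c and len(c) > 1]
    # Most frequent under `tag` first; ties broken by string order (an assumption).
    amb.sort(key=lambda kv: (-kv[1][tag], kv[0]))
    return [(item, c.most_common()) for item, c in amb[:top]]

# Invented toy data, not treebank counts.
pairs = ([("-", "PUNCT")] * 3 + [("-", "SYM")]
         + [("10/100", "SYM")] * 2 + [("10/100", "NUM")]
         + [("%", "SYM")] * 5)
print(ambiguous(pairs))
# [('10/100', [('SYM', 2), ('NUM', 1)]), ('-', [('PUNCT', 3), ('SYM', 1)])]
```

“%” is left out because in this toy data it only ever occurs as SYM.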

Morphology

The form / lemma ratio of SYM is 0.989130 (the average of all parts of speech is 1.413188).
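The reported value matches the number of SYM types divided by the number of SYM lemmas from the first paragraph. Assuming that is how the ratio is computed, the arithmetic is:

```latex
\text{form/lemma ratio} = \frac{\#\,\text{SYM types}}{\#\,\text{SYM lemmas}} = \frac{273}{276} \approx 0.989130
```

A value below 1 means SYM has slightly fewer distinct forms than distinct lemmas.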

The highest number of forms (2) was observed with the lemmas “1.82/100” (1’82%, 1,82%), “3.5/100” (3’5%, 3,5%) and “6.94/100” (6’94%, 6,94%). In each case the two forms differ only in the decimal separator (’ or ,).
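In the examples above, the lemma writes the percentage as a fraction of 100 with a period as decimal separator, whichever separator the form uses. This is consistent with the frequent lemmas and types listed earlier (e.g. 50/100 alongside 50%). The hypothetical helper `percent_lemma` below reproduces those form-to-lemma pairs. It only illustrates the mapping and is not the lemmatization procedure actually used for UD_Catalan.

```python
import re

# Hypothetical FORM -> LEMMA mapping for digit percentages, reconstructed from
# the examples on this page; not the treebank's actual lemmatizer.
# Digits, optionally followed by ’ or , and more digits, then a percent sign.
_PERCENT = re.compile(r"(\d+)(?:[’,](\d+))?%")

def percent_lemma(form):
    """Return e.g. '1.82/100' for '1’82%' or '1,82%'; None if no match."""
    m = _PERCENT.fullmatch(form)
    if m is None:
        return None
    whole, frac = m.groups()
    return f"{whole}.{frac}/100" if frac else f"{whole}/100"

for form in ["1’82%", "1,82%", "3’5%", "50%"]:
    print(form, "->", percent_lemma(form))
```

A bare “%” is deliberately not handled; the sketch covers only forms that start with digits.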

SYM occurs with 4 features: NumForm (806; 17% instances), NumType (793; 17% instances), AdvType (1; 0% instances), Gender (1; 0% instances)

SYM occurs with 4 feature-value pairs: AdvType=Tim, Gender=Masc, NumForm=Digit, NumType=Frac

SYM occurs with 5 feature combinations. The most frequent feature combination is _, i.e. no features at all (3825 tokens). Examples: ’, 2%, -, 4%, 5%
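In CoNLL-U, the FEATS column holds `|`-separated `Name=Value` pairs, and `_` marks a token without features. The minimal sketch below shows how the feature, feature-value pair and combination counts above could be tallied. It assumes a list of the FEATS strings of all SYM tokens. The function name `feature_stats` and the toy input are hypothetical.

```python
from collections import Counter

def feature_stats(feats_column):
    """Count features, feature-value pairs and whole FEATS strings.

    `feats_column` holds the FEATS value of each SYM token ("_" = no features).
    """
    features, pairs, combos = Counter(), Counter(), Counter()
    for feats in feats_column:
        combos[feats] += 1  # the whole string is one feature combination
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name = pair.split("=", 1)[0]
            features[name] += 1
            pairs[pair] += 1
    return features, pairs, combos

# Invented toy input, not treebank data.
feats = ["_", "_", "NumForm=Digit|NumType=Frac", "AdvType=Tim"]
features, pairs, combos = feature_stats(feats)
print(combos.most_common(1))  # [('_', 2)]
```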

Relations

SYM nodes are attached to their parents using 13 different relations: nmod (4166; 90% instances), obj (185; 4% instances), advmod (77; 2% instances), appos (77; 2% instances), nsubj (62; 1% instances), conj (43; 1% instances), root (10; 0% instances), acl (4; 0% instances), dep (3; 0% instances), advcl (2; 0% instances), parataxis (2; 0% instances), ccomp (1; 0% instances), iobj (1; 0% instances)

Parents of SYM nodes belong to 17 different parts of speech: VERB (1563; 34% instances), NOUN (1035; 22% instances), PROPN (879; 19% instances), ADJ (389; 8% instances), DET (232; 5% instances), NUM (230; 5% instances), SYM (114; 2% instances), AUX (47; 1% instances), ADV (42; 1% instances), CCONJ (29; 1% instances), PRON (29; 1% instances), ADP (11; 0% instances), (10; 0% instances), X (9; 0% instances), PART (7; 0% instances), PUNCT (4; 0% instances), SCONJ (3; 0% instances)
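The relation and parent statistics look one edge upward from each SYM node, at its own DEPREL and at the UPOS of its HEAD. The minimal sketch below uses a toy sentence and assumes each token is reduced to an (ID, UPOS, HEAD, DEPREL) tuple. The sentence and the name `incoming` are invented. HEAD 0 is the artificial root and carries no UPOS, so the sketch counts it under an empty label.

```python
from collections import Counter

# Toy sentence (invented, not from UD_Catalan): (ID, UPOS, HEAD, DEPREL).
SENT = [
    (1, "DET", 2, "det"),
    (2, "NOUN", 3, "nsubj"),
    (3, "VERB", 0, "root"),
    (4, "DET", 5, "det"),
    (5, "SYM", 3, "obj"),
    (6, "PUNCT", 3, "punct"),
]

def incoming(sentences, tag="SYM"):
    """Count DEPRELs of `tag` nodes and the UPOS of their parents."""
    rels, parent_pos = Counter(), Counter()
    for sent in sentences:
        upos = {tid: pos for tid, pos, _, _ in sent}
        for _, pos, head, deprel in sent:
            if pos == tag:
                rels[deprel] += 1
                # HEAD 0 is the artificial root, which has no UPOS: count it as "".
                parent_pos[upos.get(head, "")] += 1
    return rels, parent_pos

print(incoming([SENT]))  # (Counter({'obj': 1}), Counter({'VERB': 1}))
```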

3931 (85%) SYM nodes are leaves.

226 (5%) SYM nodes have one child.

221 (5%) SYM nodes have two children.

255 (6%) SYM nodes have three or more children.

The highest child degree of a SYM node is 8.
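The four buckets above (3931 + 226 + 221 + 255) add up to 4633, the total number of SYM tokens. Each count comes from checking, for every SYM node, how many tokens name it as their HEAD. The sketch below uses the same hypothetical (ID, UPOS, HEAD, DEPREL) tuple format and groups nodes into 0, 1, 2 and 3 or more children, as this page does. The name `degree_profile` and the toy sentence are invented.

```python
from collections import Counter

# Toy sentence (invented, not from UD_Catalan): (ID, UPOS, HEAD, DEPREL).
SENT = [
    (1, "DET", 2, "det"),
    (2, "NOUN", 3, "nsubj"),
    (3, "VERB", 0, "root"),
    (4, "DET", 5, "det"),
    (5, "SYM", 3, "obj"),
    (6, "PUNCT", 3, "punct"),
]

def degree_profile(sentences, tag="SYM"):
    """Bucket `tag` nodes by child count (0, 1, 2, "3+") and track the maximum."""
    buckets, max_degree = Counter(), 0
    for sent in sentences:
        # Number of tokens that point to each ID as their HEAD.
        children = Counter(head for _, _, head, _ in sent)
        for tid, pos, _, _ in sent:
            if pos == tag:
                d = children[tid]
                buckets[d if d < 3 else "3+"] += 1
                max_degree = max(max_degree, d)
    return buckets, max_degree

print(degree_profile([SENT]))  # (Counter({1: 1}), 1)
```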

Children of SYM nodes are attached using 21 different relations: nmod (479; 29% instances), case (338; 21% instances), punct (219; 13% instances), det (202; 12% instances), advmod (69; 4% instances), obl (64; 4% instances), conj (40; 2% instances), cc (38; 2% instances), appos (25; 2% instances), mark (23; 1% instances), cop (22; 1% instances), nsubj (22; 1% instances), amod (18; 1% instances), advcl (15; 1% instances), obj (15; 1% instances), flat (14; 1% instances), acl (10; 1% instances), aux (10; 1% instances), xcomp (5; 0% instances), compound (2; 0% instances), ccomp (1; 0% instances)

Children of SYM nodes belong to 15 different parts of speech: ADP (345; 21% instances), NOUN (277; 17% instances), PUNCT (220; 13% instances), PRON (214; 13% instances), DET (201; 12% instances), SYM (114; 7% instances), ADV (64; 4% instances), PROPN (50; 3% instances), CCONJ (38; 2% instances), VERB (36; 2% instances), AUX (32; 2% instances), ADJ (23; 1% instances), SCONJ (12; 1% instances), NUM (3; 0% instances), PART (2; 0% instances)
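The children statistics look one edge downward, where every token whose HEAD is a SYM node contributes its DEPREL and UPOS. The relation counts and the part-of-speech counts above both sum to 1631, the total number of children of SYM nodes. Each SYM-to-SYM edge is counted once in the parent list (SYM 114) and once in this child list (SYM 114), so those two figures must agree. Below is a sketch in the same hypothetical (ID, UPOS, HEAD, DEPREL) tuple format. The name `outgoing` and the toy sentence are invented.

```python
from collections import Counter

# Toy sentence (invented, not from UD_Catalan): (ID, UPOS, HEAD, DEPREL).
SENT = [
    (1, "DET", 2, "det"),
    (2, "NOUN", 3, "nsubj"),
    (3, "VERB", 0, "root"),
    (4, "DET", 5, "det"),
    (5, "SYM", 3, "obj"),
    (6, "PUNCT", 3, "punct"),
]

def outgoing(sentences, tag="SYM"):
    """Count DEPREL and UPOS of every token whose HEAD is a `tag` node."""
    rels, child_pos = Counter(), Counter()
    for sent in sentences:
        upos = {tid: pos for tid, pos, _, _ in sent}
        for _, pos, head, deprel in sent:
            # upos.get(0) is None, so attachments to the root never match.
            if upos.get(head) == tag:
                rels[deprel] += 1
                child_pos[pos] += 1
    return rels, child_pos

print(outgoing([SENT]))  # (Counter({'det': 1}), Counter({'DET': 1}))
```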