
This page pertains to UD version 2.

Treebank Statistics: UD_Portuguese-PUD: POS Tags: NUM

There are 3 NUM lemmas (0%), 235 NUM types (4%) and 469 NUM tokens (2%). Out of 16 observed tags, the rank of NUM is: 12 in number of lemmas, 5 in number of types and 12 in number of tokens.
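As a rough illustration of how lemma, type and token counts of this kind could be derived, the sketch below scans CoNLL-U text and counts distinct lemmas, distinct word forms and token occurrences for one UPOS tag. The sample data and the function name `upos_counts` are hypothetical, and counting forms case-sensitively is an assumption; the script that generated this page may differ.

```python
# Hypothetical CoNLL-U fragment: 10 tab-separated columns per token line,
# sentences separated by a blank line. Not taken from UD_Portuguese-PUD.
SAMPLE = """\
1\tdois\t_\tNUM\t_\t_\t2\tnummod\t_\t_
2\tlivros\tlivro\tNOUN\t_\t_\t0\troot\t_\t_

1\tDois\t_\tNUM\t_\t_\t2\tnummod\t_\t_
2\tbilhões\tbilhão\tNUM\t_\t_\t0\troot\t_\t_
"""


def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:
            lemmas.add(cols[2])   # LEMMA column
            types.add(cols[1])    # FORM column, case-sensitive (assumption)
            tokens += 1
    return len(lemmas), len(types), tokens


print(upos_counts(SAMPLE, "NUM"))  # (2, 3, 3)
```

In the sample, the unannotated lemma `_` is counted as one lemma, which mirrors how `_` appears among the NUM lemmas on this page.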

The 10 most frequent NUM lemmas (only 3 are attested): _, bilhão, três

The 10 most frequent NUM types: dois, um, três, duas, milhões, quatro, uma, 10, 3, seis

The 10 most frequent ambiguous lemmas: _ (VERB 1051, PRON 891, ADP 675, NUM 467, AUX 416, NOUN 311, DET 290, CCONJ 131, SCONJ 98, ADJ 94, SYM 40, ADV 14, X 5, INTJ 1)

The 10 most frequent ambiguous types: um (DET 213, NUM 20, NOUN 3), uma (DET 186, NUM 8, NOUN 1), bilhões (NUM 3, NOUN 1)

Morphology

The form / lemma ratio of NUM is 78.333333 (the average of all parts of speech is 1.570742).
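The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given earlier on this page. A minimal check, assuming that is how the figure is defined:

```python
num_forms = 235   # NUM types reported on this page
num_lemmas = 3    # NUM lemmas reported on this page

ratio = num_forms / num_lemmas
print(f"{ratio:.6f}")  # 78.333333
```

The value is this high because nearly all NUM tokens share the unannotated lemma `_`.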

The 1st highest number of forms (235) was observed with the lemma “_”: 0, 1, 1,4, 1,5, 1.165, 1.335, 1.365, 1.4, 1.5, 10, 10,000, 10.000, 100, 100.000, 1000, 103.7, 1072, 1075, 10:00, 11, 12, 12,000, 120, 1200, 125, 1340, 1350, 137, 1399, 14, 1415, 1492, 15, 15,001, 15.000, 15.5, 1519, 1530, 1538, 1563, 1566, 16, 16.500, 1600, 1610, 1632, 168.000, 17, 1770, 1777, 1794, 18, 1820, 1832, 1839, 1842, 1856, 1858, 1860, 1879, 1882, 1886, 1887, 1896, 19, 19,999, 1900, 1903, 1904, 1911, 1912, 1913, 1914, 1916, 1917, 1918, 1925, 1926, 1927, 1928, 1933, 1945, 1947, 1948, 1950, 1952, 1954, 1955, 1960, 1961, 1962, 1969, 1970, 1973, 1975, 1976, 1977, 1979, 1980, 1981, 1984, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1996, 1997, 1998, 2, 2.900, 20, 200, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2013-2014, 2014, 2015, 2015-2016, 2016, 2017, 2019, 2020, 2035, 2050, 21, 221, 23.45, 24, 25, 25,000, 27, 28, 3, 3,000, 3.000, 30, 31, 328, 33, 330, 330.000, 3300, 34, 35.000, 352, 36, 360, 363, 367, 393, 4, 40, 400, 42, 45, 49, 5, 5,000, 5,7, 5.000, 50, 500, 512-511, 53, 550, 56, 6, 6.000, 60, 600.000, 62, 66, 6:30, 7, 7,5, 70, 700, 71, 760, 80, 830, 833, 84, 846, 9, 90, X, XIII, XIV, XV, XVI, XX, bilhão, bilhões, bn, cinco, cinquenta, dez, dezessete, dezoito, dois, duas, mil, milhão, milhões, nove, oito, quarenta, quatro, quinze, seis, sessenta, sete, setenta, treze, trinta, três, um, uma, vinte.

The 2nd highest number of forms (1) was observed with the lemma “bilhão”: bilhões.

The 3rd highest number of forms (1) was observed with the lemma “três”: três.

NUM occurs with 2 features: Gender (274; 58% instances), Number (24; 5% instances)

NUM occurs with 4 feature-value pairs: Gender=Fem, Gender=Masc, Number=Plur, Number=Sing

NUM occurs with 6 feature combinations. The most frequent feature combination is Gender=Masc (240 tokens). Examples: dois, um, 1, 1492, 2010, 2012, 2014, 2015, 2017, 1980

Relations

NUM nodes are attached to their parents using 11 different relations: nummod (200; 43% instances), obl (100; 21% instances), nmod (85; 18% instances), appos (30; 6% instances), conj (14; 3% instances), obl:tmod (14; 3% instances), nsubj (11; 2% instances), obj (8; 2% instances), nsubj:pass (3; 1% instances), root (3; 1% instances), compound (1; 0% instances)

Parents of NUM nodes belong to 10 different parts of speech: NOUN (254; 54% instances), VERB (117; 25% instances), SYM (41; 9% instances), NUM (34; 7% instances), PROPN (10; 2% instances), ADP (3; 1% instances), ADV (3; 1% instances), no tag, presumably the artificial root that heads the 3 NUM nodes attached as root (3; 1% instances), ADJ (2; 0% instances), PRON (2; 0% instances)

194 (41%) NUM nodes are leaves.

161 (34%) NUM nodes have one child.

76 (16%) NUM nodes have two children.

38 (8%) NUM nodes have three or more children.

The highest child degree of a NUM node is 5.
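The child-degree figures above (leaves, one child, two children, three or more) can be sketched as a count of how many tokens point to each NUM node through the HEAD column. The sentence and the function name `child_degrees` below are hypothetical; the function handles a single sentence, since token IDs restart in every sentence.

```python
from collections import Counter

# Hypothetical single sentence in CoNLL-U format (not from the treebank).
SAMPLE = """\
1\tcerca\tcerca\tADV\t_\t_\t3\tadvmod\t_\t_
2\tde\tde\tADP\t_\t_\t3\tcase\t_\t_
3\t10\t_\tNUM\t_\t_\t0\troot\t_\t_
4\tou\tou\tCCONJ\t_\t_\t5\tcc\t_\t_
5\t12\t_\tNUM\t_\t_\t3\tconj\t_\t_
"""


def child_degrees(sentence, upos):
    """Map child degree -> number of nodes with that degree, for one UPOS tag."""
    rows = [line.split("\t") for line in sentence.splitlines()
            if line and not line.startswith("#")]
    # Keep only regular tokens (skip ranges like 1-2 and empty nodes like 1.1).
    rows = [r for r in rows if r[0].isdigit()]
    # HEAD column (index 6): how many dependents each token ID has.
    n_children = Counter(r[6] for r in rows)
    return Counter(n_children[r[0]] for r in rows if r[3] == upos)


degrees = child_degrees(SAMPLE, "NUM")
print(degrees[0], degrees[1], degrees[3])  # 0 1 1
```

Here token 3 has three children (tokens 1, 2 and 5) and token 5 has one (token 4), so the sample contains no NUM leaves. On this page the four reported buckets sum to the NUM token count: 194 + 161 + 76 + 38 = 469.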

Children of NUM nodes are attached using 14 different relations: case (186; 43% instances), punct (74; 17% instances), nmod (68; 16% instances), advmod (33; 8% instances), nummod (19; 4% instances), cc (13; 3% instances), det (13; 3% instances), conj (12; 3% instances), acl:relcl (5; 1% instances), cop (5; 1% instances), nsubj (5; 1% instances), amod (2; 0% instances), obl:tmod (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: ADP (188; 43% instances), PUNCT (74; 17% instances), NOUN (66; 15% instances), NUM (34; 8% instances), ADV (32; 7% instances), CCONJ (13; 3% instances), DET (13; 3% instances), AUX (5; 1% instances), ADJ (4; 1% instances), VERB (4; 1% instances), PRON (2; 0% instances), PROPN (2; 0% instances)