This page pertains to UD version 2.

Treebank Statistics: UD_Portuguese-PUD: POS Tags: NUM

There is 1 NUM lemma (4%), and there are 238 NUM types (4%) and 471 NUM tokens (2%). Out of 14 observed tags, the rank of NUM is: 9 in number of lemmas, 5 in number of types and 11 in number of tokens.
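As a rough illustration of how lemma, type and token counts like these can be derived, the sketch below counts them for one tag in a CoNLL-U string. The sample sentence and the function name `upos_counts` are hypothetical and not taken from the treebank; the sketch also assumes that types are word forms compared exactly as written, which may differ from how the statistics on this page were produced.

```python
# Hypothetical CoNLL-U fragment (not from UD_Portuguese-PUD); lemmas are "_".
SAMPLE = "\n".join([
    "# text = dois livros e duas casas e dois carros",
    "1\tdois\t_\tNUM\t_\tGender=Masc\t2\tnummod\t_\t_",
    "2\tlivros\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\te\t_\tCCONJ\t_\t_\t5\tcc\t_\t_",
    "4\tduas\t_\tNUM\t_\tGender=Fem\t5\tnummod\t_\t_",
    "5\tcasas\t_\tNOUN\t_\t_\t2\tconj\t_\t_",
    "6\te\t_\tCCONJ\t_\t_\t8\tcc\t_\t_",
    "7\tdois\t_\tNUM\t_\tGender=Masc\t8\tnummod\t_\t_",
    "8\tcarros\t_\tNOUN\t_\t_\t2\tconj\t_\t_",
])


def upos_counts(conllu, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            lemmas.add(cols[2])
            types.add(cols[1])
            tokens += 1
    return len(lemmas), len(types), tokens


print(upos_counts(SAMPLE, "NUM"))  # (1, 2, 3)
```

In the sample, every NUM token carries the lemma `_`, so the lemma count is 1 even though two distinct forms occur.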

The 10 most frequent NUM lemmas: _ (the underscore is the CoNLL-U placeholder for an unspecified value)

The 10 most frequent NUM types: dois, um, três, duas, milhões, quatro, uma, 10, 3, seis

The 10 most frequent ambiguous lemmas: _ (NOUN 4636, ADP 2571, PUNCT 2547, VERB 2512, DET 2070, ADJ 1554, PROPN 1352, PRON 910, ADV 841, CCONJ 578, NUM 471, AUX 328, SYM 34, X 9)

The 10 most frequent ambiguous types: um (DET 213, NUM 20, NOUN 3), uma (DET 186, NUM 8, NOUN 1), bilhões (NUM 3, NOUN 1)

Morphology

The form / lemma ratio of NUM is 238.000000 (the average of all parts of speech is 228.814815).
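The figure appears to be the number of distinct NUM forms divided by the number of distinct NUM lemmas. A minimal check under that assumption, using the counts reported on this page (the variable names are illustrative):

```python
# Counts reported on this page for NUM.
num_forms = 238   # distinct word forms (types)
num_lemmas = 1    # distinct lemmas (only "_")

ratio = num_forms / num_lemmas
print(f"{ratio:.6f}")  # 238.000000
```

Because the only lemma is `_`, the ratio simply equals the number of types.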

The 1st highest number of forms (238) was observed with the lemma “_”: $1,5, $1.4, $103.7, $15.000, $25,000, $5,000, 0, 1, 1,4, 1,5, 1.165, 1.335, 1.365, 1.5, 10, 10,000, 10.000, 100, 100.000, 1000, 1072, 1075, 10:00, 11, 12, 12,000, 120, 1200, 125, 1340, 1350, 137, 1399, 14, 1415, 1492, 15, 15,001, 15.5, 1519, 1530, 1538, 1563, 1566, 16, 16.500, 1600, 1610, 1632, 168.000, 17, 1770, 1777, 1794, 18, 1820, 1832, 1839, 1842, 1856, 1858, 1860, 1879, 1882, 1886, 1887, 1896, 19, 19,999, 1900, 1903, 1904, 1911, 1912, 1913, 1914, 1916, 1917, 1918, 1925, 1926, 1927, 1928, 1933, 1945, 1947, 1948, 1950, 1952, 1954, 1955, 1960, 1961, 1962, 1969, 1970, 1973, 1975, 1976, 1977, 1979, 1980, 1981, 1984, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1996, 1997, 1998, 2, 2.900, 20, 200, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2013-2014, 2014, 2015, 2015-2016, 2016, 2017, 2019, 2020, 2035, 2050, 21, 221, 23.45, 24, 25, 27, 28, 3, 3,000, 3.000, 30, 31, 328, 33, 330, 330.000, 3300, 34, 35.000, 352, 36, 360, 363, 367, 393, 4, 40, 400, 42, 45, 49, 5, 5,000, 5,7, 5.000, 50, 500, 512-511, 53, 550, 56, 6, 6.000, 60, 600.000, 62, 66, 6:30, 7, 7,5, 70, 700, 71, 760, 8, 80, 830, 833, 84, 846, 9, 90, X, XIII, XIV, XV, XVI, XX, bilhão, bilhões, bn, cinco, cinquenta, dez, dezessete, dezoito, dois, duas, mil, milhão, milhões, nove, oito, quarenta, quatro, quinze, seis, sessenta, sete, setenta, treze, trinta, três, um, uma, vinte.

NUM occurs with 2 features: Gender (274; 58% instances), Number (24; 5% instances)

NUM occurs with 4 feature-value pairs: Gender=Fem, Gender=Masc, Number=Plur, Number=Sing

NUM occurs with 6 feature combinations. The most frequent feature combination is Gender=Masc (240 tokens). Examples: dois, um, 1, 1492, 2010, 2012, 2014, 2015, 2017, 1980

Relations

NUM nodes are attached to their parents using 12 different relations: nummod (195; 41% instances), obl (101; 21% instances), nmod (85; 18% instances), appos (33; 7% instances), obl:tmod (14; 3% instances), conj (13; 3% instances), nsubj (11; 2% instances), obj (9; 2% instances), root (5; 1% instances), nsubj:pass (3; 1% instances), case (1; 0% instances), compound (1; 0% instances)
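The percentages can be reproduced from the raw counts. The sketch below assumes each percentage is the count divided by the total number of NUM tokens, rounded to the nearest integer; the dictionary name is illustrative.

```python
# Counts copied from the relation list above.
relation_counts = {
    "nummod": 195, "obl": 101, "nmod": 85, "appos": 33, "obl:tmod": 14,
    "conj": 13, "nsubj": 11, "obj": 9, "root": 5, "nsubj:pass": 3,
    "case": 1, "compound": 1,
}

total = sum(relation_counts.values())
percentages = {rel: round(100 * n / total) for rel, n in relation_counts.items()}

print(total)                  # 471, the number of NUM tokens
print(percentages["nummod"])  # 41
print(percentages["case"])    # 0
```

The counts sum to 471, matching the NUM token total, and relations with a single occurrence round down to 0%.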

Parents of NUM nodes belong to 11 different categories (10 parts of speech plus the artificial root, which has no part of speech): NOUN (257; 55% instances), VERB (118; 25% instances), SYM (34; 7% instances), NUM (33; 7% instances), PROPN (9; 2% instances), root (5; 1% instances), ADJ (4; 1% instances), ADP (3; 1% instances), ADV (3; 1% instances), PRON (3; 1% instances), CCONJ (2; 0% instances)

196 (42%) NUM nodes are leaves.

157 (33%) NUM nodes have one child.

77 (16%) NUM nodes have two children.

41 (9%) NUM nodes have three or more children.

The highest child degree of a NUM node is 5.
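The four child-degree groups above (196 + 157 + 77 + 41) add up to the 471 NUM tokens. The sketch below shows one way such a distribution could be computed from the HEAD column of a CoNLL-U string; the sample sentence and the function name `child_degrees` are hypothetical.

```python
from collections import Counter

# Hypothetical CoNLL-U sentence (not from the treebank).
SAMPLE = "\n".join([
    "# text = Em 2010 vieram dois homens",
    "1\tEm\t_\tADP\t_\t_\t2\tcase\t_\t_",
    "2\t2010\t_\tNUM\t_\t_\t3\tobl\t_\t_",
    "3\tvieram\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tdois\t_\tNUM\t_\t_\t5\tnummod\t_\t_",
    "5\thomens\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_",
])


def child_degrees(conllu, upos):
    """Map number of children -> number of nodes with that UPOS tag."""
    degrees = Counter()
    for block in conllu.strip().split("\n\n"):  # one block per sentence
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if r[0].isdigit()]  # basic tokens only
        children_of = Counter(r[6] for r in rows)   # HEAD id -> child count
        for r in rows:
            if r[3] == upos:
                degrees[children_of[r[0]]] += 1
    return degrees


degrees = child_degrees(SAMPLE, "NUM")
print(dict(degrees))  # {1: 1, 0: 1}
```

In the sample, `2010` has one child (the preposition `Em`) and `dois` is a leaf.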

Children of NUM nodes are attached using 15 different relations: case (187; 42% instances), punct (77; 17% instances), nmod (61; 14% instances), advmod (44; 10% instances), nummod (21; 5% instances), cc (12; 3% instances), det (12; 3% instances), conj (11; 2% instances), cop (6; 1% instances), nsubj (6; 1% instances), acl:relcl (5; 1% instances), amod (2; 0% instances), appos (1; 0% instances), obl:tmod (1; 0% instances), parataxis (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: ADP (190; 43% instances), PUNCT (77; 17% instances), NOUN (70; 16% instances), ADV (33; 7% instances), NUM (33; 7% instances), CCONJ (14; 3% instances), DET (12; 3% instances), AUX (6; 1% instances), ADJ (4; 1% instances), VERB (4; 1% instances), PRON (2; 0% instances), PROPN (2; 0% instances)