
This page pertains to UD version 2.

Treebank Statistics: UD_Tupinamba-TuDeT: POS Tags: NOUN

There are 554 NOUN lemmas (45%), 954 NOUN types (49%) and 1436 NOUN tokens (32%). Out of 14 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: _, iko, eko, aβa, jar, maʔe, apɨaβ, tupã, so, sɨ

The 10 most frequent NOUN types: aβa, janejara, maʔe, taβa, paʔi, Tupã, apɨaβa, cruz, seko, teko

The 10 most frequent ambiguous lemmas: _ (NOUN 87, VERB 34, PUNCT 12, ADP 9, PRON 9, PROPN 9, PART 6, ADV 5, NUM 2, DET 1, X 1), iko (NOUN 43, VERB 27, DET 5, ADV 1), aβa (NOUN 37, PRON 7), jar (NOUN 28, VERB 5, ADV 1), maʔe (NOUN 22, PRON 7, INTJ 1), so (VERB 32, NOUN 16), mojaŋ (NOUN 13, VERB 4), awsuβ (VERB 13, NOUN 12), ereko (NOUN 12, VERB 4), poʃɨ (NOUN 12, VERB 1)

The 10 most frequent ambiguous types: aβa (NOUN 20, PRON 2), maʔe (NOUN 13, PROPN 1), Tupã (PROPN 35, NOUN 12), kujã (NOUN 9, PART 1), São (NOUN 6, PROPN 1, PUNCT 1), marã (ADV 7, NOUN 5, PRON 2), Jesu (NOUN 2, PROPN 1), kotɨ (ADP 3, NOUN 2), pe (NOUN 2, PART 1), βeβe (NOUN 2, VERB 1)

Morphology

The form / lemma ratio of NOUN is 1.722022 (the average of all parts of speech is 1.577170).
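The ratio above appears to be the number of distinct NOUN word forms (types) divided by the number of distinct NOUN lemmas, since the counts reported at the top of this page reproduce it. A minimal check, assuming that definition (the variable names are illustrative only):

```python
# Counts taken from the summary at the top of this page.
noun_types = 954   # distinct NOUN word forms
noun_lemmas = 554  # distinct NOUN lemmas

# Assumed definition: forms per lemma = types / lemmas.
form_lemma_ratio = noun_types / noun_lemmas

print(f"{form_lemma_ratio:.6f}")  # 1.722022
```

A value above the all-POS average (1.577170) suggests that NOUN lemmas in this treebank surface in more distinct forms than lemmas of a typical part of speech.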

The 1st highest number of forms (82) was observed with the lemma “_”: Nasawsuβarɨpɨramo, Naʃeremimotara, Perenosema, Takwara, apɨaβ, apɨaβaiβa, apɨaβaíβa, atɨraβeβo, aíβa, culpa, iatõjmɨreʔɨma, ijaʔo, ijukasarama, ikajemi, ikawĩwasuβaʔe, imoerapwanɨmɨra, imojarɨpɨ́ramo, imoperepereβawera, imoreʔɨmara, inupãsawera, ipira, ipotasape, ipoʃɨpwera, jemeʔeŋa, katupaβẽ, kɨreʔɨmβawera, manemwera, maramojaŋape, maraneʔɨma, maʔe, maʔeaiβa, mojaŋawama, monɨaβo, neratãŋatu, nererupa, nesawsuβa, nijɨpɨj, oarõanamo, oaʔo, oguβa, omara, omaramojãβaʔepwera, omoʔaŋekoaime, oreamotareʔɨmara, orewasemaβa, oreɨβɨjme, peposaŋa, pepɨsɨrõ, pepɨsɨrõawama, perekomojaŋaβa, perekorama, perekoreme, peremimojaŋa, peremimojaŋwama, pererekoaíme, pesemawama, porarasara, poreteramo, pɨtɨβõsara, rekoreme, rɨrɨja, sejtɨkite, sekow, soβajɨwaramo, sɨrɨki, tamanwa, tuisamojaŋape, tɨmawereʔɨma, upiarwera, uʔuβorwera, ɨsɨkatãsɨapwana, ɨβɨtu, ʃejeʔeŋa, ʃemomwerapane, ʃepapera, ʃepo, ʃerejtɨk, ʃereminuʔune, ʃererekow, ʃerorɨkatu, ʃerorɨβetene, ʃeruβisaβa.

The 2nd highest number of forms (19) was observed with the lemma “iko”: Ejmoiŋokatu, Sekote, janereko, moiŋow, nereko, oeko, ojkoβaʔe, pereko, perekopwera, perekoreme, reko, rekow, seko, sekoreme, sekow, serekow, teko, tekwara, ʃerekoape.

The 3rd highest number of forms (16) was observed with the lemma “eko”: Oeko, nereko, owekorama, pereko, reko, rekow, seko, sekopwera, sekoreme, sekow, serekow, teko, tekoara, tekopwera, ʃereko, ʃerekoape.

NOUN occurs with 33 features: Case (526; 37% instances), Rel (497; 35% instances), Number (209; 15% instances), Person (170; 12% instances), Nomzr (133; 9% instances), Person[psor] (120; 8% instances), NonFoc (92; 6% instances), Tense (80; 6% instances), Number[psor] (77; 5% instances), Reflex (61; 4% instances), Intens (49; 3% instances), Clusivity (43; 3% instances), Voice (41; 3% instances), Polarity (33; 2% instances), VerbForm (29; 2% instances), Mood (23; 2% instances), Int (19; 1% instances), Corf (18; 1% instances), Aspect (11; 1% instances), Degree (11; 1% instances), Priv (9; 1% instances), Hum (8; 1% instances), Red (6; 0% instances), Person[subj] (5; 0% instances), Dev (3; 0% instances), Emph (2; 0% instances), Number[subj] (2; 0% instances), Person[obj] (2; 0% instances), Animacy (1; 0% instances), Delib (1; 0% instances), Foreign (1; 0% instances), Poss (1; 0% instances), PronType (1; 0% instances)

NOUN occurs with 63 feature-value pairs: Animacy=Hum, Aspect=Iter, Aspect=Lus, Case=All, Case=Dat, Case=Loc, Case=Per, Case=Ref, Case=Tra, Case=Voc, Clusivity=Ex, Clusivity=In, Corf=Yes, Degree=Aug, Delib=Yes, Dev=Pass, Emph=Yes, Foreign=Yes, Hum=Yes, Int=Yes, Intens=Yes, Mood=Cnd, Mood=Irr, Mood=Per, Mood=Sub, Nomzr=Ag, Nomzr=CCirc, Nomzr=Circ, Nomzr=DevPass, Nomzr=Hab, Nomzr=Pas, Nomzr=Rel, NonFoc=Yes, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Number[subj]=Sing, Person=1, Person=2, Person=3, Person[obj]=3, Person[psor]=1, Person[psor]=2, Person[psor]=3, Person[subj]=1, Polarity=Neg, Poss=Hum, Priv=Yes, PronType=Rcp, Red=Di, Reflex=Yes, Rel=Abs, Rel=Cont, Rel=Corf, Rel=Hum, Rel=NCont, Tense=Fut, Tense=Past, VerbForm=Ger, Voice=Cau, Voice=Mid, Voice=SCau

NOUN occurs with 324 feature combinations. The most frequent feature combination is _ (372 tokens). Examples: aβa, maʔe, paʔi, Tupã, cruz, judeus, kujã, muru, São, kawĩ

Relations

NOUN nodes are attached to their parents using 21 different relations: obl (279; 19% instances), nmod (253; 18% instances), root (223; 16% instances), obj (210; 15% instances), nsubj (111; 8% instances), conj (78; 5% instances), parataxis (73; 5% instances), appos (63; 4% instances), advcl (53; 4% instances), dep (25; 2% instances), xcomp (15; 1% instances), ccomp (12; 1% instances), acl (10; 1% instances), compound (10; 1% instances), vocative (7; 0% instances), discourse (5; 0% instances), nummod (3; 0% instances), amod (2; 0% instances), case (2; 0% instances), dislocated (1; 0% instances), iobj (1; 0% instances)
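Counts like the ones above can be recomputed from a treebank's CoNLL-U file. The following is a rough sketch, not the script that generated this page: the function name and the two-token sample are hypothetical, and the sample uses placeholder forms rather than real treebank data. It relies only on the standard CoNLL-U layout of ten tab-separated columns, with UPOS in column 4 and DEPREL in column 8.

```python
from collections import Counter


def noun_relation_counts(conllu_text):
    """Count the DEPREL values of all tokens whose UPOS is NOUN."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        tok_id, upos, deprel = cols[0], cols[3], cols[7]
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in tok_id or "." in tok_id:
            continue
        if upos == "NOUN":
            counts[deprel] += 1
    return counts


# Hypothetical toy sentence with placeholder forms, for illustration only.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "\t".join(["1", "w1", "w1", "NOUN", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "w2", "w2", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "w3", "w3", "NOUN", "_", "_", "2", "obl", "_", "_"]),
    "",
])

relation_counts = noun_relation_counts(SAMPLE)
for deprel, n in relation_counts.most_common():
    print(deprel, n)
```

Run over the full treebank file, the same loop would be expected to yield the 21 relations listed above, provided relation subtypes are handled the same way as in the original statistics.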

Parents of NOUN nodes belong to 11 different parts of speech, counting the artificial root, which has no tag: NOUN (625; 44% instances), VERB (508; 35% instances), [root, no tag] (223; 16% instances), PROPN (55; 4% instances), PRON (12; 1% instances), ADP (4; 0% instances), ADV (4; 0% instances), NUM (2; 0% instances), DET (1; 0% instances), INTJ (1; 0% instances), PART (1; 0% instances)

575 (40%) NOUN nodes are leaves.

397 (28%) NOUN nodes have one child.

199 (14%) NOUN nodes have two children.

265 (18%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 12.
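The four child-degree counts above partition the NOUN tokens, so they should sum to the 1436 NOUN tokens reported at the top of the page, and each percentage is the count's share of that total. A short check, assuming the percentages are rounded to the nearest integer (the dictionary keys are illustrative labels):

```python
# Child-degree counts for NOUN nodes, as listed above.
degree_counts = {
    "leaf": 575,
    "one child": 397,
    "two children": 199,
    "three or more children": 265,
}

total_nouns = sum(degree_counts.values())

# Share of each degree class, rounded to whole percent (assumed rounding).
shares = {label: round(100 * n / total_nouns) for label, n in degree_counts.items()}

print(total_nouns)  # 1436
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```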

Children of NOUN nodes are attached using 24 different relations: punct (432; 23% instances), nmod (328; 18% instances), case (188; 10% instances), obl (157; 8% instances), nsubj (122; 7% instances), advmod (119; 6% instances), advcl (88; 5% instances), discourse (78; 4% instances), conj (67; 4% instances), parataxis (67; 4% instances), appos (52; 3% instances), obj (42; 2% instances), dep (33; 2% instances), det (22; 1% instances), compound (17; 1% instances), nummod (13; 1% instances), xcomp (12; 1% instances), cc (8; 0% instances), acl (6; 0% instances), mark (4; 0% instances), vocative (4; 0% instances), dislocated (3; 0% instances), amod (2; 0% instances), ccomp (1; 0% instances)

Children of NOUN nodes belong to 14 different parts of speech: NOUN (625; 34% instances), PUNCT (432; 23% instances), ADP (207; 11% instances), ADV (138; 7% instances), PRON (115; 6% instances), VERB (101; 5% instances), PART (85; 5% instances), PROPN (76; 4% instances), DET (59; 3% instances), NUM (10; 1% instances), INTJ (9; 0% instances), CCONJ (4; 0% instances), SCONJ (3; 0% instances), X (1; 0% instances)