
This page pertains to UD version 2.

Treebank Statistics: UD_Bororo-BDT: POS Tags: PUNCT

There are 33 PUNCT lemmas (0%), 98 PUNCT types (1%) and 32183 PUNCT tokens (20%). Out of 17 observed tags, the rank of PUNCT is: 14 in number of lemmas, 10 in number of types and 2 in number of tokens.

The 10 most frequent PUNCT lemmas: ., ,, !, …, ?, ;, _, (, :, )

The 10 most frequent PUNCT types: ., ,, !, …, ?, :, (, ), ;, -

The 10 most frequent ambiguous lemmas: _ (NOUN 5910, VERB 3398, ADV 1856, PRON 1359, ADP 1308, PROPN 1165, X 926, PUNCT 459, DET 149, INTJ 122, SCONJ 55, CCONJ 30, PART 29), ) (PUNCT 230, NOUN 4)

The 10 most frequent ambiguous types: (none listed)

Morphology

The form / lemma ratio of PUNCT is 2.969697 (the average of all parts of speech is 1.360106).
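The reported ratio is consistent with dividing the number of PUNCT types (98) by the number of PUNCT lemmas (33) given above. The following is a minimal sketch of that arithmetic; the function name `form_lemma_ratio` is hypothetical, and the assumption that the page computes the ratio this way is inferred from the numbers, not stated by the source.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma (assumed definition)."""
    return n_types / n_lemmas

# Counts taken from the PUNCT summary on this page.
ratio = form_lemma_ratio(98, 33)
print(f"{ratio:.6f}")  # 2.969697
```

A ratio near 3 is unusually high for punctuation; it is driven almost entirely by the lemma “_”, which alone accounts for 70 of the 98 forms.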

The 1st highest number of forms (70) was observed with the lemma “_”: …, …‘kororo’., …‘pa’., …‘tuku’., :, ;, ?, Bororo’., Erore’puku’., Taci’., aoraji’pa’‘pa’., bakowuto’taci’., barigu’cu’., barigu’kuci’., baruto’taci’., bea’., ceebu’ta’., cere’rezado’‘terçoji’., d’d’d’d’., ei’taci’., etagedudo’taci’., etugu’tuku’., hu’uo’uo’uo’., ikajeje’tuku’., inagodo’jiwudo’., ipiji’taci’., ire’comunhaodo’., ire’confessado’., iwo’confessado’., jamedu’pu’., ji’pa’., ji’ta’., jo’., kae’., kae’tu’., kajeje’taci’., kajeje’tuku’., keje’bea’., keje’ta’., keje’tuku’., kowuje’tuku’., ku’ku’ku’., kuri’cuku’., kuri’pao’., kuri’po’., kuri’tuku’., oiagi’pa’., oiko’taci’., p’p’p’p’., pa’., pado’ta’., padure’kuri’., po’., poboto’c’., poboto’pu’., pu’., pugeje’co’., pugeje’ta’., pugeje’taci’., pui’pao’., rugadu’taci’., rurudo’tuku’‘tuku’‘tuku’., t’t’t’t’t’t’t’., ta’., taiado’paci’., to’tuku’., tu’., tumugudo’taci’., turagojedo’ta’., turagojedo’taci’..

The 2nd highest number of forms (2) was observed with the lemma “.”: ., ….

The 3rd highest number of forms (2) was observed with the lemma “;”: ,, ;.

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 2 different relations: punct (32114; 100% instances), root (69; 0% instances)
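The percentages on this page appear to be rounded to whole numbers, which is why a non-zero count can show as 0% and a non-exhaustive count as 100%. A small sketch of that rounding, assuming plain rounding to the nearest integer (the exact rounding rule used by the page generator is an assumption):

```python
# Relation counts for PUNCT nodes, copied from the line above.
counts = {"punct": 32114, "root": 69}

total = sum(counts.values())  # 32183, the PUNCT token count reported earlier
shares = {rel: round(100 * n / total) for rel, n in counts.items()}

print(total)   # 32183
print(shares)  # {'punct': 100, 'root': 0}
```

Before rounding, `punct` covers about 99.79% of PUNCT nodes and `root` about 0.21%.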

Parents of PUNCT nodes belong to 17 different parts of speech: VERB (22473; 70% instances), NOUN (5062; 16% instances), ADV (1731; 5% instances), PROPN (688; 2% instances), X (511; 2% instances), INTJ (377; 1% instances), PRON (371; 1% instances), AUX (234; 1% instances), ADP (164; 1% instances), NUM (142; 0% instances), PART (102; 0% instances), DET (74; 0% instances), [no tag; presumably the artificial root, matching the 69 root attachments above] (69; 0% instances), ADJ (66; 0% instances), CCONJ (51; 0% instances), PUNCT (48; 0% instances), SCONJ (20; 0% instances)

32109 (100%) PUNCT nodes are leaves.

39 (0%) PUNCT nodes have one child.

21 (0%) PUNCT nodes have two children.

14 (0%) PUNCT nodes have three or more children.

The highest child degree of a PUNCT node is 4.
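The leaf and child-degree figures above could be reproduced from the treebank's CoNLL-U files along the following lines. This is a minimal sketch, not the script that generated this page: the function name `punct_child_degrees` is hypothetical, and the sample sentence is a toy example with placeholder forms, not data from UD_Bororo-BDT.

```python
import re
from collections import Counter

def punct_child_degrees(conllu_text: str) -> Counter:
    """Map child count -> number of PUNCT nodes with that many children."""
    degrees = Counter()
    # Sentences in CoNLL-U are separated by blank lines.
    for block in re.split(r"\n\s*\n", conllu_text.strip()):
        rows = []
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip sentence-level comments
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            rows.append(cols)
        # Column 7 (index 6) is HEAD: count how often each ID is a head.
        children = Counter(cols[6] for cols in rows)
        for cols in rows:
            if cols[3] == "PUNCT":  # column 4 (index 3) is UPOS
                degrees[children[cols[0]]] += 1
    return degrees

# Toy sentence with placeholder forms (hypothetical, for illustration only).
SAMPLE = "\n".join([
    "# text = toy example",
    "1\tw1\tw1\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

print(punct_child_degrees(SAMPLE))  # Counter({0: 1})
```

A degree of 0 corresponds to a leaf; on this page the counts are 32109 leaves, 39 nodes with one child, 21 with two and 14 with three or more, which together sum to the 32183 PUNCT tokens.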

Children of PUNCT nodes are attached using 12 different relations: punct (48; 38% instances), nsubj (26; 20% instances), advmod (17; 13% instances), parataxis (12; 9% instances), obl (11; 9% instances), conj (5; 4% instances), nmod (3; 2% instances), dep (2; 2% instances), advcl (1; 1% instances), cc (1; 1% instances), discourse (1; 1% instances), flat (1; 1% instances)

Children of PUNCT nodes belong to 11 different parts of speech: PUNCT (48; 38% instances), ADV (19; 15% instances), NOUN (19; 15% instances), PROPN (13; 10% instances), PRON (11; 9% instances), VERB (9; 7% instances), X (4; 3% instances), AUX (2; 2% instances), ADP (1; 1% instances), CCONJ (1; 1% instances), NUM (1; 1% instances)