PART

This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.


`PART`: particle

In Portuguese, PART is used to tag prefixes that form complex words, but not compounds. In ex-presidente, anti-capitalista, vice-diretor, pós-graduação, the morphemes ex-, anti-, vice-, pós- should be tagged as PART. Note that when one of those prefixes is used alone, as in Minha pós não acaba nunca. (My post-grad never ends.), “pós” still stands for “pós-graduação”. This differs from compound words such as norte-americano, meio-campo, porta-voz, which contain no particle and whose first element cannot stand in for the whole compound. Weekday names such as segunda-feira are analysed as compound words, even though the first part can be used for the whole, e.g. Essa quarta, sem falta (This Wednesday, without fail.). Expressions such as fim-de-semana, a partir de, de novo are multiword expressions (MWEs), and their elements should not be tagged as PART.

This means that prefixed words should be split at the tokenization step. Hyphenation remains a significant problem here, since many complex words formed with particles are not necessarily written with a hyphen. Hyphenation is discussed in the new Regulation of Portuguese Orthography (2009), and some specific cases are explicitly regulated: vice- and ex- always take a hyphen. But not all cases are specified, and many dictionaries (and older corpora) carry both forms, e.g. anti-capitalista and anticapitalista.
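As a rough illustration of the splitting step, the sketch below separates a hyphenated prefix from its base. The function name `split_prefixed`, the prefix set, and the choice to keep the hyphen on the prefix token (as in the lemmas anti- and ex-) are assumptions made for this example, not part of any UD tool. Unhyphenated spellings such as anticapitalista are deliberately left whole, since the text does not say how they should be handled.

```python
# Illustrative prefix set drawn from the examples on this page; not exhaustive.
PREFIXES = {"anti", "ex", "pós", "vice", "primeiro", "pró", "infra", "pré"}


def split_prefixed(word):
    """Split a hyphenated prefixed word into [prefix-, base].

    Words without a hyphen, or whose first element is not a known
    prefix (compounds such as norte-americano), are returned whole.
    """
    head, sep, rest = word.partition("-")
    if sep and rest and head.lower() in PREFIXES:
        return [head + "-", rest]
    return [word]


for w in ["ex-presidente", "pós-graduação", "norte-americano", "anticapitalista"]:
    print(w, "->", split_prefixed(w))
```

Only the first hyphen is considered, so a multiword expression such as fim-de-semana is also returned unsplit, because fim is not in the prefix set.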

PART is also used for negative particles, such as não and nem, in predicative contexts. Note that negative adverbs, such as nunca and jamais, are still tagged as ADV.

Examples:

Negative particles: não, nem

Prefixes: anti-, ex-, pós-, vice-, primeiro-, pró-, infra-
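The word lists above can be read as a simple lookup. The sketch below is only a context-free approximation: the function name `candidate_upos` is hypothetical, and a real annotation decision depends on context (não, for instance, is far more often tagged ADV than PART in the treebank statistics on this page).

```python
# Word lists taken from the guideline text; a lookup like this ignores context.
NEGATIVE_PARTICLES = {"não", "nem"}
NEGATIVE_ADVERBS = {"nunca", "jamais"}
PREFIX_TOKENS = {"anti-", "ex-", "pós-", "vice-", "primeiro-", "pró-", "infra-"}


def candidate_upos(form):
    """Return a candidate tag for a token, or None if the lists do not cover it."""
    form = form.lower()
    if form in NEGATIVE_PARTICLES or form in PREFIX_TOKENS:
        return "PART"
    if form in NEGATIVE_ADVERBS:
        return "ADV"
    return None


for token in ["Não", "nunca", "ex-", "presidente"]:
    print(token, candidate_upos(token))
```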

Treebank Statistics (UD_Portuguese)

There are 6 PART lemmas (0%), 6 PART types (0%) and 44 PART tokens (0%). Out of 17 observed tags, the rank of PART is: 15 in number of lemmas, 16 in number of types and 16 in number of tokens.

The 10 most frequent PART lemmas: não, anti-, ex, ex-, pré, pós

The 10 most frequent PART types: não, anti-, ex, ex-, pré-, pós

The 10 most frequent ambiguous lemmas: não (ADV 1343, PART 38, INTJ 9, NOUN 2)

The 10 most frequent ambiguous types: não (ADV 1205, PART 35, INTJ 3, NOUN 2)

não
- ADV 1205: Eu não me associo com moda .
- PART 35: Já não há o império de o mal para combater .
- INTJ 3: Não , não e não
- NOUN 2: Apesar de evitar dar um não definitivo , Marise deixou claro que deve recusar o convite de Brizola .

Morphology

The form / lemma ratio of PART is 1.000000 (the average of all parts of speech is 1.432674).
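The ratio appears to be the number of distinct forms divided by the number of distinct lemmas for the tag, which matches 6 types over 6 lemmas here. A minimal sketch under that assumption follows; the function name `form_lemma_ratio` is hypothetical, and the pairing of each form with a lemma is illustrative, based on the lemma and type lists above.

```python
def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas, for (form, lemma) pairs of one tag."""
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    return len(forms) / len(lemmas)


# Illustrative pairing of the six PART types and lemmas listed above.
part_pairs = [
    ("não", "não"),
    ("anti-", "anti-"),
    ("ex", "ex"),
    ("ex-", "ex-"),
    ("pré-", "pré"),
    ("pós", "pós"),
]
print(f"{form_lemma_ratio(part_pairs):.6f}")  # 1.000000
```

When a treebank stores a placeholder such as `_` for every lemma, all forms collapse onto a single lemma and the ratio simply equals the number of distinct forms.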

The 1st highest number of forms (1) was observed with the lemma “NÃO”: NÃO.

The 2nd highest number of forms (1) was observed with the lemma “anti-”: anti-.

The 3rd highest number of forms (1) was observed with the lemma “ex”: ex.

PART occurs with 2 features: Negative (39; 89% instances), pt-feat/Hyph (5; 11% instances)

PART occurs with 2 feature-value pairs: Hyph=Yes, Negative=Neg

PART occurs with 2 feature combinations. The most frequent feature combination is Negative=Neg (39 tokens). Examples: não

Relations

PART nodes are attached to their parents using 5 different relations: mwe (29; 66% instances), cc (7; 16% instances), nmod (5; 11% instances), advmod (2; 5% instances), neg (1; 2% instances)
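The percentages in such lists are each count's share of all PART tokens (44 here), rounded to a whole number. The sketch below reproduces the figures from the counts given above; the function name `distribution` is hypothetical, and the rounding rule of the original statistics generator is assumed rather than known.

```python
from collections import Counter


def distribution(counts):
    """Return (label, count, rounded percentage) tuples, most frequent first."""
    total = sum(counts.values())
    return [(label, n, round(100 * n / total)) for label, n in counts.most_common()]


# Relation counts for PART nodes in UD_Portuguese, as reported above.
relations = Counter({"mwe": 29, "cc": 7, "nmod": 5, "advmod": 2, "neg": 1})
for label, n, pct in distribution(relations):
    print(f"{label} ({n}; {pct}% instances)")
```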

Parents of PART nodes belong to 8 different parts of speech: ADV (24; 55% instances), NOUN (6; 14% instances), DET (5; 11% instances), VERB (4; 9% instances), PROPN (2; 5% instances), ADJ (1; 2% instances), NUM (1; 2% instances), SCONJ (1; 2% instances)

34 (77%) PART nodes are leaves.

8 (18%) PART nodes have one child.

1 (2%) PART nodes have two children.

1 (2%) PART nodes have three or more children.

The highest child degree of a PART node is 4.
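Child-degree figures like these can be derived from the head index of each token: a node's child degree is the number of tokens that name it as their head. The sketch below assumes a sentence is given as `(id, upos, head)` tuples, mirroring the ID, UPOSTAG and HEAD columns of a CoNLL-U file; the function names and the toy tree are hypothetical and not taken from the treebank.

```python
from collections import Counter


def bucket(degree):
    """Map a child count to the categories used in the statistics."""
    if degree == 0:
        return "leaf"
    if degree == 1:
        return "one child"
    if degree == 2:
        return "two children"
    return "three or more children"


def child_degree_buckets(tokens, tag="PART"):
    """Count nodes with the given tag per child-degree bucket.

    tokens: list of (id, upos, head) tuples for one sentence; head 0 is the root.
    """
    n_children = Counter(head for _, _, head in tokens)
    return Counter(bucket(n_children[tok_id]) for tok_id, upos, _ in tokens if upos == tag)


# Toy tree (invented): a prefix token attached to a noun, with no children of its own.
toy = [(1, "DET", 3), (2, "PART", 3), (3, "NOUN", 4), (4, "VERB", 0)]
print(child_degree_buckets(toy))
```

Summing the resulting counters over all sentences of a treebank would give totals of the kind listed above.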

Children of PART nodes are attached using 6 different relations: mwe (9; 64% instances), case (1; 7% instances), cc (1; 7% instances), conj (1; 7% instances), det (1; 7% instances), nmod (1; 7% instances)

Children of PART nodes belong to 4 different parts of speech: NOUN (11; 79% instances), ADP (1; 7% instances), CONJ (1; 7% instances), DET (1; 7% instances)

Treebank Statistics (UD_Portuguese-Bosque)

There are 4 PART lemmas (0%), 4 PART types (0%) and 4 PART tokens (0%). Out of 17 observed tags, the rank of PART is: 15 in number of lemmas, 15 in number of types and 16 in number of tokens.

The 10 most frequent PART lemmas: anti-, ex, pré, pós

The 10 most frequent PART types: anti-, ex, pré-, pós

The 10 most frequent ambiguous lemmas:

The 10 most frequent ambiguous types:

Morphology

The form / lemma ratio of PART is 1.000000 (the average of all parts of speech is 1.449059).

The 1st highest number of forms (1) was observed with the lemma “anti-”: anti-.

The 2nd highest number of forms (1) was observed with the lemma “ex”: ex.

The 3rd highest number of forms (1) was observed with the lemma “pré”: pré-.

PART occurs with 2 features: Gender (1; 25% instances), Number (1; 25% instances)

PART occurs with 2 feature-value pairs: Gender=Masc, Number=Sing

PART occurs with 2 feature combinations. The most frequent feature combination is _ (3 tokens). Examples: anti-, ex, pré-

Relations

PART nodes are attached to their parents using 1 different relations: dep (4; 100% instances)

Parents of PART nodes belong to 2 different parts of speech: NOUN (3; 75% instances), NUM (1; 25% instances)

2 (50%) PART nodes are leaves.

0 (0%) PART nodes have one child.

0 (0%) PART nodes have two children.

2 (50%) PART nodes have three or more children.

The highest child degree of a PART node is 4.

Children of PART nodes are attached using 5 different relations: punct (3; 43% instances), case (1; 14% instances), cc (1; 14% instances), conj (1; 14% instances), det (1; 14% instances)

Children of PART nodes belong to 5 different parts of speech: PUNCT (3; 43% instances), ADP (1; 14% instances), CONJ (1; 14% instances), DET (1; 14% instances), NOUN (1; 14% instances)

Treebank Statistics (UD_Portuguese-BR)

There are 1 PART lemmas (7%), 74 PART types (0%) and 748 PART tokens (0%). Out of 14 observed tags, the rank of PART is: 9 in number of lemmas, 12 in number of types and 13 in number of tokens.

The 10 most frequent PART lemmas: _

The 10 most frequent PART types: se, ex, vice, pré, auto, claro, latino, pós, recém, ai

The 10 most frequent ambiguous lemmas: _ (NOUN 57316, ADP 51928, PUNCT 42033, PROPN 32948, VERB 29700, DET 26122, ADJ 15107, CONJ 10984, ADV 9773, NUM 8491, PRON 7392, AUX 5242, PART 748, X 539)

The 10 most frequent ambiguous types: se (PRON 755, PART 392, CONJ 186, ADP 3, PROPN 1), ex (PART 145, X 1, NOUN 1), vice (PART 45, NOUN 11, ADJ 3), pré (PART 34, ADJ 1), claro (ADJ 28, PART 5, NOUN 2), latino (PART 7, ADJ 3), recém (PART 5, ADV 1), ai (PART 3, ADV 2), aí (ADV 13, PART 1), bem (ADV 140, NOUN 6, PART 2)

se
- PRON 755: Muitos clientes se anteciparam e garantiram as reservas .
- PART 392: Especula - se sobre a possibilidade de estar extinta .
- CONJ 186: ” Mas se precisasse , usaria sim “ , diisse .
- ADP 3: Se tiver , vamos atender , se não , vamos usar outros .
- PROPN 1: A experiência adquirida ao longo de 20 anos acaba de virar o livro “ Vá se drenar !
ex
- PART 145: No Twitter , a ex - BBB voltou a comentar .
- X 1: Os ministros Paulo Bernardo ( Comunicações ) e Gleisi Hoffmann ( Casa Civil ) discutirão nesta terça - feira ( 9 ) estratégia para tentar convencer o ex - presidente Lula a subir no palanque de Gustavo Fruet ( PDT ) no segundo turno da disputa pela Prefeitura de Curitiba .
- NOUN 1: A entojada se aproxima de Conrado bem na hora que ele está admirando a ex .
vice
- PART 45: A série de participações receberá os dez candidatos a vice - prefeito .
- NOUN 11: Agra seria o vice que Cássio tanto quis e nunca teve .
- ADJ 3: Entre os nomes cotados para receber o apoio do prefeito está o candidato a vice na chapa de Magalhães , Orly Gomes ( DEM ) , que é o mais forte deles .
pré
- PART 34: ” Estou muito feliz porque Tevez trabalhou muito bem na pré - temporada .
- ADJ 1: Tozo não chegou nem a viajar para Sairé , onde a equipe realiza pré - temporada .
claro
- ADJ 28: Quero deixar claro que estou longe de ser um atleta exemplar .
- PART 5: Como todos , claro que ele sabe quanto vai ser difícil seu trabalho .
- NOUN 2: Mas é claro que vejo com bons olhos o fato de que a minha música pode agradar pessoas que são fãs de outros estilos musicais “ , afirmou .
latino
- PART 7: A situação envolvia três terroristas latino - americanos que levavam explosivos para o Estádio do Maracanã .
- ADJ 3: Elzevirium – nome latino da prestigiada editora , ainda existente , Elsevier .
recém
- PART 5: O impacto das batidas lançou o recém - nascido , mesmo com o assento infantil , para dentro do porta - malas .
- ADV 1: No minuto seguinte , Léo Gago , que recém entrara na partida , cobrou falta de intermediária com força , no cantinho : golaço do Grêmio .
ai
- PART 3: Ai , ai , ai .
- ADV 2: Quando ela for adolescente e quiser , ai tudo bem , a decisão é dela “ .
aí
- ADV 13: Mais uma , estamos juntos aí na tão sonhada final !
- PART 1: E aí , o que acharam ?
bem
- ADV 140: Os gandharas foram um povo furioso , bem treinados na arte da guerra .
- NOUN 6: Aliás , isso é o que me realiza e me faz querer ser prefeito : o desejo de fazer o bem às pessoas .
- PART 2: E , pois bem , quem era Jack Riley ?

Morphology

The form / lemma ratio of PART is 74.000000 (the average of all parts of speech is 2514.000000).

The 1st highest number of forms (74) was observed with the lemma “_”: ’s, Agora, Avante, Cara, Desculpe, Nè, Ok, Olá, Oxalá, Sucesso, afro, ai, alvi, ante, anti, ar, arqui, atenção, auto, aí, bem, claro, co, contra, cyber, eba, então, ex, extra, foi, franco, germano, greco, grão, hein, hélio, in, infanto, infra, inter, intra, ir, latino, lá, mamilo, micro, multi, on, pan, para, pois, prático, pré, pró, pós, pô, público, recém, rs, s, se, su, sub, supra, tele, to, tá, ultra, utz, vice, viu, á, ão, é.

PART does not occur with any features.

Relations

PART nodes are attached to their parents using 18 different relations: expl (398; 53% instances), nmod (83; 11% instances), nsubj (51; 7% instances), conj (47; 6% instances), dep (38; 5% instances), amod (36; 5% instances), appos (34; 5% instances), dobj (27; 4% instances), root (12; 2% instances), advmod (7; 1% instances), nsubjpass (6; 1% instances), mark (2; 0% instances), parataxis (2; 0% instances), acl:relcl (1; 0% instances), advcl (1; 0% instances), cop (1; 0% instances), iobj (1; 0% instances), name (1; 0% instances)

Parents of PART nodes belong to 10 different parts of speech: VERB (488; 65% instances), NOUN (139; 19% instances), PROPN (40; 5% instances), ADJ (39; 5% instances), PART (18; 2% instances), ROOT (12; 2% instances), PRON (5; 1% instances), ADV (4; 1% instances), AUX (2; 0% instances), NUM (1; 0% instances)

461 (62%) PART nodes are leaves.

17 (2%) PART nodes have one child.

31 (4%) PART nodes have two children.

239 (32%) PART nodes have three or more children.

The highest child degree of a PART node is 13.

Children of PART nodes are attached using 22 different relations: punct (376; 32% instances), name (256; 22% instances), det (112; 9% instances), nmod (106; 9% instances), appos (98; 8% instances), case (90; 8% instances), conj (35; 3% instances), amod (28; 2% instances), cc (22; 2% instances), acl:relcl (12; 1% instances), acl:part (11; 1% instances), cop (11; 1% instances), det:poss (8; 1% instances), nsubj (8; 1% instances), advmod (5; 0% instances), nummod (3; 0% instances), advcl (1; 0% instances), expl (1; 0% instances), mark (1; 0% instances), mwe (1; 0% instances), parataxis (1; 0% instances), xcomp (1; 0% instances)

Children of PART nodes belong to 12 different parts of speech: PUNCT (376; 32% instances), NOUN (296; 25% instances), PROPN (178; 15% instances), DET (120; 10% instances), ADP (90; 8% instances), VERB (37; 3% instances), ADJ (33; 3% instances), CONJ (23; 2% instances), PART (18; 2% instances), ADV (6; 1% instances), NUM (5; 0% instances), PRON (5; 0% instances)

PART in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]
