
This page still pertains to UD version 1.

POS tags


ADJ: adjective


Adjectives are words that typically modify nouns and specify their properties or attributes. They may also function as predicates, as in

O carro é verde. “The car is green.”

Adjectives in Portuguese agree in number and gender with the noun they modify, e.g. a casa amarela (feminine singular), as casas amarelas (feminine plural).
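When nouns and adjectives carry UD-style morphological features, this agreement can be checked mechanically. The following is a minimal sketch, assuming the features are stored as CoNLL-U FEATS strings (e.g. `Gender=Fem|Number=Sing`). The function names `parse_feats` and `agrees` are hypothetical and not part of any existing UD tool.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats in ("", "_"):  # '_' marks an empty FEATS column in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def agrees(noun_feats: str, adj_feats: str) -> bool:
    """Return True if the adjective does not contradict the noun in Gender or Number.

    A feature missing on either side counts as compatible, so forms left
    unmarked for a feature are not rejected.
    """
    noun, adj = parse_feats(noun_feats), parse_feats(adj_feats)
    return all(
        noun[feat] == adj[feat]
        for feat in ("Gender", "Number")
        if feat in noun and feat in adj
    )


# 'as casas amarelas': noun and adjective are both feminine plural
print(agrees("Gender=Fem|Number=Plur", "Gender=Fem|Number=Plur"))  # True
# a number mismatch, as in the ungrammatical '*a casa amarelas'
print(agrees("Gender=Fem|Number=Sing", "Gender=Fem|Number=Plur"))  # False
```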

To conform to the UD guidelines, possessive adjectives are tagged as determiners (DET).



ADP: adposition


Adposition is a cover term for prepositions and postpositions. Adpositions belong to a closed set of items that occur before (preposition) or after (postposition) a complement composed of a noun phrase, noun, pronoun, or clause that functions as a noun phrase, and that form a single structure with the complement to express its grammatical and semantic relation to another unit within a clause.

In many languages, including Portuguese, adpositions can take the form of fixed multiword expressions, such as graças a (“thanks to”) and por causa de (“because of”).
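Recognizing such fixed expressions amounts to matching token sequences against a lexicon. Below is a minimal sketch using greedy longest match. It assumes that contractions such as da have already been split into de + a, and it uses a hypothetical lexicon containing only the two expressions above; the function name `group_mwe_adpositions` is likewise hypothetical.

```python
# Hypothetical lexicon: only the two expressions mentioned in the text.
MWE_ADPOSITIONS = {
    ("graças", "a"),
    ("por", "causa", "de"),
}
MAX_LEN = max(len(mwe) for mwe in MWE_ADPOSITIONS)


def group_mwe_adpositions(tokens: list[str]) -> list[list[str]]:
    """Group tokens greedily, preferring the longest listed MWE at each position."""
    lowered = [t.lower() for t in tokens]
    groups: list[list[str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible match first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in MWE_ADPOSITIONS:
                groups.append(tokens[i:i + n])
                i += n
                break
        else:  # no MWE starts here: keep the token on its own
            groups.append([tokens[i]])
            i += 1
    return groups


print(group_mwe_adpositions("Saiu por causa de a chuva".split()))
# [['Saiu'], ['por', 'causa', 'de'], ['a'], ['chuva']]
```

Greedy longest match is the simplest choice here; it does not handle ambiguous spans where a shorter match would be the correct reading.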



ADV: adverb


Adverbs are words that typically modify verbs for such categories as time, place, direction or manner. They may also modify adjectives (as in claramente falso “clearly false”), other adverbs (as in muito brevemente “very briefly”) or even nouns or pronouns (as in apenas você “only you”).



AUX: auxiliary verb


An auxiliary verb is a verb that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, and voice.



CCONJ: coordinating conjunction


A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.

For subordinating conjunctions, see SCONJ.



DET: determiner


Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

In Portuguese corpora, numerals are not tagged as DET. In a noun phrase such as “os cinco mortos” (the five dead [people]), only “os” is tagged as DET.



INTJ: interjection


An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language. Portuguese also has multiword interjections, such as Deus me livre and pois é.

Note that words that primarily belong to another part of speech retain their original category when used in exclamations. For example, Deus (“God”) is a NOUN even in exclamatory uses. (This is still an open issue in our current version.)

As a special case of interjections, we recognize feedback particles such as taí, não, etc.



NOUN: noun


Nouns are a part of speech typically denoting a person, place, thing, animal or idea.

The NOUN tag is intended for common nouns only. See PROPN for proper nouns and PRON for pronouns.

Portuguese nouns have the features Gender and Number.



NUM: numeral


A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.

Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows Sete) and whether they are expressed as words (quatro), digits (4) or Roman numerals (IV). Other words functioning as determiners (including quantifiers such as muito and pouco) are tagged DET.
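The form-based part of this rule (digits, Roman numerals or number words) can be sketched as a simple predicate. `is_cardinal_form` is a hypothetical helper, not part of any treebank pipeline. The word list is deliberately incomplete, and the Roman-numeral pattern will also accept ordinary uppercase strings such as CD or MIX, so context is still needed to decide the final tag.

```python
import re

# Hypothetical, deliberately incomplete list of Portuguese cardinal words.
# 'um'/'uma' are left out: they are usually articles or indefinite pronouns.
CARDINAL_WORDS = {"dois", "duas", "três", "quatro", "cinco", "seis",
                  "sete", "oito", "nove", "dez", "cem", "mil"}
# Digits, optionally with '.' or ',' separators (e.g. 1.000, 3,5).
DIGITS = re.compile(r"[0-9]+(?:[.,][0-9]+)*")
# Standard Roman numerals up to MMMCMXCIX; the pattern also matches "", handled below.
ROMAN = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")


def is_cardinal_form(token: str) -> bool:
    """True if the token has the form of a cardinal numeral."""
    if DIGITS.fullmatch(token):
        return True
    if token and ROMAN.fullmatch(token):
        return True
    return token.lower() in CARDINAL_WORDS


# 'os cinco mortos': only 'cinco' has a cardinal form ('os' is DET)
print([t for t in ["os", "cinco", "mortos"] if is_cardinal_form(t)])  # ['cinco']
```

Ordinals such as primeiro and quantifiers such as muito are intentionally rejected, matching the rule that they are not NUM.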

Note that some words traditionally called numerals in some languages (e.g. Czech and Portuguese) are not tagged NUM. These non-cardinal numerals belong to other parts of speech in the universal tagging scheme, based mainly on syntactic criteria: they are adjectives, such as the ordinal numerals primeiro, segundo and terceiro, or adverbs, such as mais_uma_vez (“once more”) and pela_primeira_vez (“for the first time”).



PART: particle

In Portuguese, PART is used to tag prefixes that form complex words, but not compounds. In ex-presidente, anti-capitalista, vice-diretor and pós-graduação, the morphemes ex-, anti-, vice- and pós- should be tagged as PART. Note that when one of these prefixes is used alone, as in Minha pós não acaba nunca. (“My post-grad never ends.”), pós still stands for pós-graduação. This differs from compound words such as norte-americano, meio-campo and porta-voz, which contain no particle and whose first element alone cannot evoke the meaning of the whole compound. Weekday names such as segunda-feira are analysed as compound words, even though the first part can be used for the whole, e.g. Essa quarta, sem falta (“This Wednesday, without fail.”). Words such as fim-de-semana, a partir de and de novo are MWEs, and their elements should not be tagged as PART.

This means that prefixed words should be split in the tokenization step. Note that hyphenation is still a big issue here, since many of these complex words are not necessarily written with a hyphen. Hyphenation is addressed in the new Regulation of Portuguese Orthography (2009), and some specific cases are explicitly ruled: vice- and ex- are always followed by a hyphen. However, not all cases are specified, and many dictionaries (and older corpora) contain both forms, anti-capitalista and anticapitalista.

PART is also used for the negative particles não and nem in predicative contexts. Note that negative adverbs such as nunca and jamais are still tagged as ADV.


Negative particles: não, nem

Prefixes: anti-, ex-, pós-, vice-, primeiro-, pró-, infra-
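The tokenization step described above can be sketched for the hyphenated case, using the prefix list from this section. The function name `split_prefix` is hypothetical, and keeping the hyphen on the split-off prefix token is an assumption made here for illustration. Unhyphenated spellings such as anticapitalista are deliberately left alone, since they cannot be split reliably without a lexicon.

```python
import unicodedata

# Prefixes listed under PART above (stored without the hyphen).
PART_PREFIXES = {"anti", "ex", "pós", "vice", "primeiro", "pró", "infra"}


def split_prefix(token: str) -> list[str]:
    """Split a hyphenated prefixed word into [prefix-with-hyphen, base].

    Only the first hyphen is inspected and only listed prefixes are split, so
    compounds such as 'norte-americano' or 'segunda-feira' stay whole.
    """
    head, sep, rest = token.partition("-")
    # NFC normalization so that 'pós' matches whether the accent is precomposed or not
    prefix = unicodedata.normalize("NFC", head.lower())
    if sep and rest and prefix in PART_PREFIXES:
        return [head + "-", rest]  # the prefix token would then be tagged PART
    return [token]


for word in ["ex-presidente", "pós-graduação", "norte-americano", "anticapitalista"]:
    print(word, split_prefix(word))
```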


PRON: pronoun

Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.

Lemmatization rules: still to be defined.


clitic pronouns: se, me, te, lhe (including reflexive pronouns)

demonstrative pronouns: isto, esse, aquilo

personal pronouns: eu, tu, ele, vocês

indefinite pronouns: um, outro, qualquer

possessive pronouns: meu, seu, dele

interrogative pronouns: que, quanto, qual

relative pronouns: que, cujo, qual

totality pronouns: todo, todas

negative pronouns: nenhum, ninguém


PROPN: proper noun


A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.

Note that PROPN is only used for the subclass of nouns that are used as names and that often exhibit special syntactic properties. When other phrases or sentences are used as names, the component words retain their original tags. For example, in Cat on a Hot Tin Roof, Cat is NOUN, on is ADP, a is DET, etc. However, for now, the Portuguese corpora are not annotated consistently in this respect, since many multiword proper nouns (MWEs) are not split.

Acronyms of proper nouns, such as EUA and NATO, should be tagged PROPN. Even if they contain numbers (as in various product names), they are tagged PROPN and not SYM: 130XE, DC10, DC-10. However, if the token consists entirely of digits (like 7 in Windows 7), it is tagged NUM.
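The digit-related part of this rule can be sketched as follows. The helper `tag_token_with_digits` is hypothetical; it only decides between NUM and PROPN for tokens that contain digits, and leaves everything else (including EUA and NATO) to other evidence.

```python
import re
from typing import Optional

# ASCII letters and digits, optionally joined by hyphens (e.g. DC-10).
ALNUM_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tag_token_with_digits(token: str) -> Optional[str]:
    """Decide between NUM and PROPN for a token that contains digits.

    Returns 'NUM' for all-digit tokens, 'PROPN' for letter/digit strings such
    as product names, and None when the rule does not apply.
    """
    if not re.search(r"[0-9]", token):
        return None  # no digits: e.g. EUA, NATO need other evidence
    if re.fullmatch(r"[0-9]+", token):
        return "NUM"  # e.g. the 7 in 'Windows 7'
    if ALNUM_NAME.fullmatch(token) and re.search(r"[A-Za-z]", token):
        return "PROPN"  # e.g. 130XE, DC10, DC-10
    return None


print([tag_token_with_digits(t) for t in ["7", "130XE", "DC10", "DC-10", "NATO"]])
# ['NUM', 'PROPN', 'PROPN', 'PROPN', None]
```

The sketch is deliberately narrow: a string such as 10km would also come out as PROPN, so it assumes the token has already been identified as part of a name or as a bare number.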



PUNCT: punctuation


Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.

Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.




SCONJ: subordinating conjunction


A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other. The subordinating conjunction typically marks the incorporated constituent which has the status of a (subordinate) clause.

For coordinating conjunctions, see CCONJ.



SYM: symbol


A symbol is a word-like entity that differs from ordinary words by form, function, or both.

Many symbols are or contain special non-alphanumeric characters, similarly to punctuation. What distinguishes them from punctuation is that they can be substituted by ordinary words. This includes all currency symbols; e.g. $ 75 is equivalent to seventy-five dollars.

Mathematical operators form another group of symbols.

Emoticons and emoji form another group of symbols.

Strings that consist entirely of alphanumeric characters are not symbols, but they may be proper nouns: 130XE, DC10; others may be tagged PROPN (rather than SYM) even if they contain special characters: DC-10. Similarly, abbreviations for single words are not symbols but are assigned the part of speech of the full form. For example, Sr. (senhor), kg (quilograma), km (quilômetro) and Dr (doutor) should be tagged NOUN. Acronyms for proper names, such as PT and IBM, should be tagged PROPN.

Characters used as bullets in itemized lists (•, ‣) and parentheses are not symbols, they are punctuation.
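The character-level part of the SYM/PUNCT distinction can be approximated with Unicode general categories, plus two overrides: Unicode classifies % and § as punctuation (Po), but they are tagged SYM here. This is a minimal sketch under those assumptions; `sym_or_punct` is a hypothetical name, the emoticon list is a hypothetical sample, and multi-code-point emoji (with variation selectors or joiners) fall through to None.

```python
import unicodedata
from typing import Optional

# Unicode classifies these as punctuation (Po), but the guidelines tag them SYM.
SYM_OVERRIDES = {"%", "§"}
# Hypothetical, incomplete sample of ASCII emoticons.
EMOTICONS = {":)", ":-)", ":(", ":-(", ";)", ":D"}


def sym_or_punct(token: str) -> Optional[str]:
    """Classify a non-alphanumeric token as 'SYM' or 'PUNCT'.

    Returns None for tokens with letters or digits (abbreviations such as
    'kg' take the tag of the full word) and for mixed or unknown cases.
    """
    if token in EMOTICONS:
        return "SYM"
    if not token or any(ch.isalnum() for ch in token):
        return None
    # Unicode 'S*' categories: math (Sm), currency (Sc), modifier (Sk), other (So, incl. emoji)
    if all(ch in SYM_OVERRIDES or unicodedata.category(ch).startswith("S") for ch in token):
        return "SYM"
    # Unicode 'P*' categories cover brackets, dashes, quotes and bullets such as '•'
    if all(unicodedata.category(ch).startswith("P") for ch in token):
        return "PUNCT"
    return None


print([sym_or_punct(t) for t in ["$", "%", "+", "•", "(", ","]])
# ['SYM', 'SYM', 'SYM', 'PUNCT', 'PUNCT', 'PUNCT']
```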



VERB: verb


A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause. Verbs are often associated with grammatical categories like tense, mood, aspect and voice, which can be expressed either inflectionally or by means of auxiliary verbs or particles.

Note that the VERB tag covers main verbs (content verbs) and copulas, but not auxiliary verbs, which are tagged AUX. Auxiliaries are verbs used in verb phrases that express only grammatical functions (such as tense or mood) and carry no semantic content. For example, in estava comendo (“was eating”), estar (“be”) is the auxiliary verb.

Portuguese traditional grammar also treats as auxiliaries some verbs that would not be considered auxiliaries in English, such as começar, acabar and permanecer (“begin”, “finish”, “stay”). For example, in começou a fazer (“began to do”), começar (“begin”) is an auxiliary.

Note that a verbal phrase can contain more than one auxiliary. A simple example is tendo sido nomeado (“having been appointed”), where both tendo and sido are auxiliaries. A Portuguese-specific example is parece estar a influenciar (“seems to be influencing”), where parece and estar are tagged as auxiliaries.
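Tagging such chains can be sketched as follows. The sketch assumes the verbal chain has already been identified and uses a hypothetical auxiliary lexicon built only from the examples on this page (ter, ser, estar, parecer, começar, acabar, permanecer). Intervening non-verbal tokens, such as the a in estar a influenciar, are assumed to have been removed beforehand. The function name `tag_verb_chain` is hypothetical.

```python
# Hypothetical auxiliary lexicon built from the examples on this page;
# a real treebank list would be longer and context-dependent.
AUX_LEMMAS = {"ter", "ser", "estar", "parecer", "começar", "acabar", "permanecer"}


def tag_verb_chain(chain: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Tag a verbal chain given as (form, lemma) pairs.

    The last verb is the main verb (VERB); every earlier verb whose lemma is
    in AUX_LEMMAS is tagged AUX, and any other earlier verb stays VERB.
    """
    if not chain:
        return []
    tagged = [(form, "AUX" if lemma in AUX_LEMMAS else "VERB")
              for form, lemma in chain[:-1]]
    tagged.append((chain[-1][0], "VERB"))
    return tagged


print(tag_verb_chain([("tendo", "ter"), ("sido", "ser"), ("nomeado", "nomear")]))
# [('tendo', 'AUX'), ('sido', 'AUX'), ('nomeado', 'VERB')]
```

The sketch always tags the last verb VERB; for participles such as nomeado, the annotation may instead choose ADJ depending on context, as discussed for participles on this page.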

Note that participles are word forms that may share properties and usage of adjectives and verbs. Depending on language and context, they may be classified as either VERB or ADJ, e.g. “nomeado/appointed”.

Gerunds (“comendo/eating”) and infinitives (“nomear/appoint”) are classified as VERB.




X: other

This document is a placeholder for the language-specific documentation for X.
