|Open class words||Closed class words||Other|
ADJ: adjective
Adjectives are words that typically modify nouns and specify their properties or attributes. They may also function as predicates, as in
O carro é verde. “The car is green.”
Adjectives in Portuguese agree in number and gender with the noun they modify, e.g. a casa amarela (feminine singular), as casas amarelas (feminine plural).
To conform to the UD guidelines, possessive adjectives are handled as determiners (DET).
- grande “big”
- velho “old”
- verde “green”
- incompreensível “incomprehensible”
- primeiro “first”, segundo “second”, terceiro “third”
ADP: adposition
Adposition is a cover term for prepositions and postpositions. Adpositions belong to a closed set of items that occur before (preposition) or after (postposition) a complement composed of a noun phrase, noun, pronoun, or clause that functions as a noun phrase, and that form a single structure with the complement to express its grammatical and semantic relation to another unit within a clause.
In many languages, including Portuguese, adpositions can take the form of fixed multiword expressions, such as graças a (“thanks to”) and por causa de (“because of”).
- em, de
- para, a (preposition)
ADV: adverb
Adverbs are words that typically modify verbs for such categories as time, place, direction or manner. They may also modify adjectives (as in claramente falso “clearly false”), other adverbs (as in muito brevemente “very briefly”) or even nouns / pronouns (as in apenas você “only you”).
- muito “very”
- bem “well”
- exatamente “exactly”
- amanhã “tomorrow”
- acima, abaixo “above, below”
- interrogative or exclamative adverbs: onde, quando, como, por que “where, when, how, why”
- demonstrative adverbs: aqui, ali, agora, depois “here, there, now, after”
- totality adverbs: sempre “always”
- negative adverbs: nunca, sem “never, without”
AUX: auxiliary verb
An auxiliary verb is a verb that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, and voice.
- Tense auxiliary: ir (futuro perifrástico, “periphrastic future”)
- Modal auxiliary (+ infinitive): poder, dever, continuar
- Passive auxiliary: ser, ter, ir
CONJ: coordinating conjunction
A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
For subordinating conjunctions, see SCONJ.
DET: determiner
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.
In Portuguese corpora, numerals are not tagged as DET. In a noun phrase such as “os cinco mortos” (“the five dead [people]”), only “os” is tagged as DET.
- articles (a closed class indicating definiteness, specificity or givenness): o, a, os, as
- possessive determiners: meu, teu, seu, minha, meus, dele, nosso
- demonstrative determiners: este, isto, esta, aquele
- interrogative determiners: qual
- relative determiners: que
- quantity/quantifier determiners: nenhum, todos
INTJ: interjection
An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language. In Portuguese, some interjections are multiword expressions, such as Deus me livre and pois é.
Note that words primarily belonging to another part of speech retain their original category when used in exclamations. For example, God is a NOUN even in exclamatory uses. (This is an issue in our current version.)
As a special case of interjections, we recognize feedback particles such as taí, não, etc.
NOUN: noun
Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
- menina “girl”
- gato “cat”
- árvore “tree”
- ar “air”
- beleza “beauty”
NUM: numeral
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows Sete) and whether they are expressed as words (quatro), digits (4) or Roman numerals (IV). Other words functioning as determiners (including quantifiers such as muito and pouco) are tagged DET.
Note that there are words that may be traditionally called numerals in some languages (e.g. Czech and Portuguese) but which are not tagged NUM. Such non-cardinal numerals belong to other parts of speech in the universal tagging scheme, based mainly on syntactic criteria. These are adjectives (ordinal numerals such as primeiro, segundo, terceiro) or adverbs (mais_uma_vez “once more”; pela_primeira_vez “for the first time”).
- 0, 1, 2, 3, 4, 5, 2014, 1000000, 3.14159265359
- um, dois, três, trinta e sete
- I, II, III, IV, V, MMXIV
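The three surface forms of cardinal numerals listed above (digits, words, Roman numerals) can be recognised with a few simple checks. The following is only a minimal sketch, not part of any UD tool: the function name `is_cardinal` and the tiny word list are assumptions made for illustration, and a real tagger would need a full lexicon plus context to tell a Roman numeral such as V from an ordinary capital letter.

```python
import re

# Digit strings, optionally with "." or "," separators (e.g. 3.14159265359).
DIGITS = re.compile(r"\d+(?:[.,]\d+)*")

# Standard Roman numerals up to 3999; the lookahead rejects the empty string.
ROMAN = re.compile(
    r"(?=[IVXLCDM])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
)

# Tiny illustrative word list (an assumption); a real lexicon is much larger.
CARDINAL_WORDS = {"um", "dois", "três", "quatro", "sete", "trinta"}


def is_cardinal(token: str) -> bool:
    """Return True if a single token looks like a cardinal numeral (NUM)."""
    if DIGITS.fullmatch(token) or ROMAN.fullmatch(token):
        return True
    return token.lower() in CARDINAL_WORDS


for token in ["4", "IV", "quatro", "Sete", "primeiro"]:
    print(token, is_cardinal(token))
```

The check works on single tokens, so a multiword numeral such as trinta e sete would be tested word by word; ordinals such as primeiro are deliberately rejected because they are tagged as adjectives.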
PART: particle
PART is used to tag prefixes that form complex words, but not compounds. In ex-presidente, anti-capitalista, vice-diretor, pós-graduação, the morphemes ex-, anti-, vice-, pós- should be tagged as PART. Note that when one of those prefixes is used alone, as in Minha pós não acaba nunca. (“My post-grad never ends.”), “pós” still stands for “pós-graduação”. This is different from compound words, such as norte-americano, meio-campo, porta-voz, in which there is no particle and the first element alone cannot be used to recall the sense of the entire compound. Weekday names, such as segunda-feira, are analysed as compound words, even if the first part is used for the whole, e.g. Essa quarta, sem falta (“This Wednesday, without fail.”). Words such as fim-de-semana, a partir de, de novo are MWEs and their elements should not be tagged as PART.
This means that prefixed words should be split in the tokenization step. Note that hyphenation is still a big issue here, since many of the complex words formed with particles are not necessarily written with a hyphen. Hyphenation is discussed in the new Regulation of Portuguese Orthography (2009), and some specific cases are explicitly ruled: vice- and ex- always take a hyphen. But not all cases are specified, and many dictionaries (and old corpora) carry both forms.
PART is also used for negative particles, such as não, nem in predicative contexts. Note that negative adverbs, such as nunca, jamais, are still tagged as ADV.
- Negative particles: não, nem
- Prefixes: anti-, ex-, pós-, vice-, primeiro-, pró-, infra-
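The tokenization step described in this section can be sketched for the hyphenated case. This is a minimal illustration under stated assumptions: `split_prefix` is a hypothetical helper, the prefix set simply copies the examples listed above and is not exhaustive, and the tag of the base word is left undecided (`None`). Prefixed words written without a hyphen are not handled.

```python
from typing import List, Optional, Tuple

# Prefixes taken from the examples in this section; not an exhaustive list.
PREFIXES = {"anti", "ex", "pós", "vice", "primeiro", "pró", "infra"}


def split_prefix(word: str) -> List[Tuple[str, Optional[str]]]:
    """Split a hyphenated prefixed word into (form, tag) pairs.

    The prefix is tagged PART. The base word's tag is left as None,
    because choosing it is outside the scope of this sketch.
    Compounds and unprefixed words are returned as a single token.
    """
    head, sep, rest = word.partition("-")
    if sep and rest and head.lower() in PREFIXES:
        return [(head + "-", "PART"), (rest, None)]
    return [(word, None)]


for word in ["ex-presidente", "pós-graduação", "norte-americano", "segunda-feira"]:
    print(word, split_prefix(word))
```

Compounds such as norte-americano and weekday names such as segunda-feira stay whole because their first element is not in the prefix set.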
PRON: pronoun
Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
Lemmatization rules = ?
- clitic pronouns: se, me, te, lhe (including reflexive pronouns)
- demonstrative pronouns: isto, esse, aquilo
- personal pronouns: eu, tu, ele, vocês
- indefinite pronouns: um, outro, qualquer
- possessive pronouns: meu, seu, dele
- interrogative pronouns: que, quanto, qual
- relative pronouns: que, cujo, qual
- totality pronouns: todo, todas
- negative pronouns: nenhum, ninguém
PROPN: proper noun
A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
PROPN is only used for the subclass of nouns that are used
as names and that often exhibit special syntactic properties. When other
phrases or sentences are used as names, the component words retain
their original tags. For example, in Cat on a Hot Tin Roof, Cat is
NOUN, on is ADP, a is DET, etc.
However, for now, the Portuguese corpora are not very consistently annotated, as many proper nouns that are MWEs are not split.
Acronyms of proper nouns, such as EUA and NATO, should be tagged PROPN. Even if they contain numbers (as in various product names), they are tagged PROPN and not SYM: 130XE, DC10, DC-10.
However, if the token consists entirely of digits (like 7 in Windows 7), it is tagged NUM.
- Maria, João
- Londres, Goiânia
- ONG, EUA
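The digit rule for acronyms and product designations can be written down in a few lines. This is only a sketch: `tag_designation` is a hypothetical helper, and it assumes the token has already been identified as an acronym or product designation, which is the hard part in practice.

```python
def tag_designation(token: str) -> str:
    """Tag a token already identified as an acronym or product designation.

    Tokens made up only of digits get NUM; everything else (letters,
    mixed letter-digit forms, hyphenated forms) gets PROPN.
    """
    return "NUM" if token.isascii() and token.isdigit() else "PROPN"


for token in ["EUA", "130XE", "DC-10", "7"]:
    print(token, tag_designation(token))
```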
PUNCT: punctuation
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
- Period: .
- Comma: ,
- Parentheses: ()
- Quotes: «, », “
SCONJ: subordinating conjunction
A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other. The subordinating conjunction typically marks the incorporated constituent which has the status of a (subordinate) clause.
For coordinating conjunctions, see CONJ.
- que, as in Ele disse que ele viria. (“He said that he would come.”)
SYM: symbol
A symbol is a word-like entity that differs from ordinary words by form, function, or both.
Many symbols are or contain special non-alphanumeric characters, similarly to punctuation. What makes them different from punctuation is that they can be substituted by normal words. This involves all currency symbols, e.g. $ 75 is identical to seventy-five dollars.
Mathematical operators form another group of symbols.
Another group of symbols is emoticons and emoji.
Strings that consist entirely of alphanumeric characters are not symbols, but they may be proper nouns: 130XE, DC10; others may be tagged PROPN (rather than SYM) even if they contain special characters: DC-10. Similarly, abbreviations for single words are not symbols but are assigned the part of speech of the full form. For example, Sr. (senhor), kg (quilograma), km (quilômetro), Dr (doutor) should be tagged as nouns. Acronyms for proper names such as PT and IBM should be tagged as proper nouns.
Characters used as bullets in itemized lists (•, ‣) and parentheses are not symbols; they are punctuation.
- $, %, §, ©
- +, −, ×, ÷, =, <, >
- :), ♥‿♥, 😝
- email@example.com, http://universaldependencies.org/, 1-800-COMPANY
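The split between PUNCT and SYM for non-alphanumeric tokens can be illustrated with two small character sets. This is a sketch under assumptions: the sets contain only characters that appear as examples in this document, `tag_non_alphanumeric` is a hypothetical helper, and emoticons, emoji, e-mail addresses and URLs would need separate handling that is not shown here.

```python
from typing import Optional

# Illustrative, non-exhaustive sets built from the examples in this document.
PUNCT_CHARS = set(".,()«»“•‣")
SYM_CHARS = set("$%§©+−×÷=<>")


def tag_non_alphanumeric(token: str) -> Optional[str]:
    """Return PUNCT or SYM for tokens made only of known characters.

    Returns None when the token is empty or contains anything else,
    so that the caller can fall back to other rules.
    """
    if token and all(ch in PUNCT_CHARS for ch in token):
        return "PUNCT"
    if token and all(ch in SYM_CHARS for ch in token):
        return "SYM"
    return None


for token in ["$", "%", "(", "•", "DC-10"]:
    print(token, tag_non_alphanumeric(token))
```

Returning `None` for DC-10 reflects the rule above that such strings are not symbols even though they contain a special character.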
VERB: verb
A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause. Verbs are often associated with grammatical categories like tense, mood, aspect and voice, which can either be expressed inflectionally or using auxiliary verbs or particles.
Note that the VERB tag covers main verbs (content verbs) and copulas, but it does not cover auxiliary verbs, for which there is the AUX tag. Auxiliaries are verbs that are used in verb phrases to express only a grammatical function (such as tense or mood) and that have no semantic content of their own. For example, in estava comendo (“was eating”), “estar/be” is the auxiliary verb.
Traditional Portuguese grammar also treats as auxiliaries some verbs that would not be considered so in English, such as começar, acabar, permanecer (“begin, finish, stay”), as in começou a fazer (“began to do”), where “começar/begin” is the auxiliary.
Note that a verb phrase can contain more than one auxiliary. A simple example is tendo sido nomeado (“having been appointed”), where both “tendo” and “sido” are auxiliaries. A Portuguese-specific example is parece estar a influenciar (“seems to be influencing”), where “parece” and “estar” would be tagged as auxiliaries.
Note that participles are word forms that may share properties and usage of adjectives and verbs. Depending on language and context, they may be classified as either VERB or ADJ, e.g. “nomeado/appointed”. Gerunds (“comendo/eating”) and infinitives (“nomear/appoint”) are classified as VERB.
- correr, comer
- correu, comia
- correndo, comendo
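The treatment of auxiliary chains described in this section can be sketched as follows. This is a simplification under stated assumptions: `tag_verb_group` is a hypothetical helper, the verb group has already been identified, the input contains only verb forms (a linking preposition such as the a in estar a influenciar is left out), and the final participle is taken to be VERB rather than ADJ.

```python
def tag_verb_group(verb_forms):
    """Tag a chain of verb forms: every form but the last is AUX.

    The last form is treated as the lexical verb and tagged VERB.
    """
    if not verb_forms:
        return []
    tagged = [(form, "AUX") for form in verb_forms[:-1]]
    tagged.append((verb_forms[-1], "VERB"))
    return tagged


print(tag_verb_group(["tendo", "sido", "nomeado"]))
print(tag_verb_group(["parece", "estar", "influenciar"]))
print(tag_verb_group(["comia"]))
```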