
POS tags


ADJ: adjective

The English ADJ is currently precisely the union of PTB JJ, JJR, and JJS.


ADP: adposition

The English ADP covers the Penn Treebank tag RP and a subset of the uses of IN (those where it is not a complementizer or subordinating conjunction) and of TO (in old treebanks, which used this tag for to even when it is a preposition).


ADV: adverb

The English ADV covers all uses of PTB tags RB, RBR, RBS, and WRB except the clausal negation not and reduced forms of it, which become PART.


AUX: auxiliary verb

The English AUX covers PTB MD, and the verbal tags (VB, VBP, VBG, VBN, VBD, VBZ) when the word is a form of be, have, do, or get used as an auxiliary (we count passive get as an auxiliary).
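The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `aux_or_verb` and its parameters are hypothetical, and the `is_auxiliary_use` flag is assumed to come from a syntactic analysis, since the PTB tag alone does not say whether a form of be, have, do, or get is an auxiliary.

```python
# PTB tags for verb forms, as listed in the text.
VERBAL_TAGS = {"VB", "VBP", "VBG", "VBN", "VBD", "VBZ"}
# Lemmas that can be auxiliaries according to the text.
AUX_LEMMAS = {"be", "have", "do", "get"}


def aux_or_verb(ptb_tag, lemma, is_auxiliary_use):
    """Return "AUX" or "VERB" for a modal or verbal PTB tag, else None.

    is_auxiliary_use is an assumed input: whether the token functions as
    an auxiliary must be decided from the syntax, not from the tag.
    """
    if ptb_tag == "MD":
        return "AUX"
    if ptb_tag in VERBAL_TAGS:
        if lemma.lower() in AUX_LEMMAS and is_auxiliary_use:
            return "AUX"
        return "VERB"
    return None  # not a modal or verbal tag


print(aux_or_verb("MD", "can", True))    # AUX
print(aux_or_verb("VBD", "get", True))   # AUX (passive get)
print(aux_or_verb("VBZ", "have", False)) # VERB (main-verb have)
```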


CONJ: coordinating conjunction

The English CONJ corresponds to PTB CC.


DET: determiner

The English DET covers most cases of Penn Treebank DT, PDT, WDT. However, when a Penn Treebank word with one of these tags stands alone as a noun phrase rather than modifying another word, then it becomes PRON.
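A minimal sketch of this DET/PRON split, assuming the caller can already tell whether the word stands alone as a noun phrase (the name `det_or_pron` and the boolean parameter are hypothetical):

```python
# PTB determiner tags named in the text.
DETERMINER_TAGS = {"DT", "PDT", "WDT"}


def det_or_pron(ptb_tag, stands_alone_as_np):
    """Return "DET" or "PRON" for a PTB determiner tag, else None.

    stands_alone_as_np is an assumed input: True when the word forms a
    noun phrase by itself instead of modifying another word.
    """
    if ptb_tag not in DETERMINER_TAGS:
        return None
    return "PRON" if stands_alone_as_np else "DET"


print(det_or_pron("DT", False))  # DET, as in "this book"
print(det_or_pron("DT", True))   # PRON, as in "I like this"
```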


INTJ: interjection

The English INTJ corresponds to the PTB UH.


NOUN: noun

The English NOUN corresponds to all cases of PTB NN and NNS, except for %, which we retag as SYM.


NUM: numeral

The English NUM corresponds exactly to the PTB CD.


PART: particle

Only the following English words are currently treated as PART: the possessive marker, the clausal negation not and its reduced forms, and infinitival to.

(This is a slightly motley list, and we may still want to rethink this category for English.)

This covers PTB tags POS and some (old PTB style) or all uses of TO, and the subset of RB that is negation.
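As an illustrative sketch of how these three sources of PART could be separated from their neighbours, the hypothetical function below takes the PTB tag and word form. Two things are assumptions rather than statements from the text: the spelling set `NEGATION_FORMS` (taking n't as the reduced form of not), and the `is_infinitival_to` flag, which only matters for old-style treebanks that also tag prepositional to as TO.

```python
# Assumed spellings of the clausal negation and its reduced form.
NEGATION_FORMS = {"not", "n't"}


def part_or_other(ptb_tag, form, is_infinitival_to=True):
    """Map POS, TO, and RB to a universal tag; return None for other tags."""
    if ptb_tag == "POS":
        return "PART"
    if ptb_tag == "TO":
        # Old-style PTB also uses TO for the preposition, which is ADP.
        return "PART" if is_infinitival_to else "ADP"
    if ptb_tag == "RB":
        # Only the negation subset of RB is PART; other adverbs are ADV.
        return "PART" if form.lower() in NEGATION_FORMS else "ADV"
    return None


print(part_or_other("RB", "n't"))      # PART
print(part_or_other("RB", "quickly"))  # ADV
print(part_or_other("TO", "to", is_infinitival_to=False))  # ADP
```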


PRON: pronoun

PRON is used for English pronouns, such as we, her, it, who, and that when used as a relative pronoun.

The English PRON corresponds to the PTB PRP, PRP$, WP, WP$, and EX, and to certain words that are tagged DT (demonstratives and wh-words, such as this and that) when they constitute a nominal by themselves rather than functioning as the determiner of a nominal head (usually a noun). (The assignment of PRP$ and WP$ to PRON might be subject to revision; they could also become DET.)


PROPN: proper noun

The English PROPN corresponds to everything tagged NNP or NNPS in the PTB tag set. (Note that at present we make no attempt to exclude words arguably of other parts of speech which appear in proper noun phrases that the PTB tag set would tag with NNP(S). So, United States is United/PROPN States/PROPN.)


PUNCT: punctuation

The English PUNCT covers PTB tags:


SCONJ: subordinating conjunction

SCONJ is used for these two subclasses of subordinating conjunctions:

These are a subset of the things that the IN tag is used for in the PTB.

We treat the putative relativizer use of that (e.g., Jespersen 1924) as a relative pronoun in modern English, so that it gets the POS tag PRON.


SYM: symbol

The English SYM covers PTB tags NFP (except for lines of separators, which become PUNCT), #, $, and SYM, as well as the percent sign (%), which the PTB tags as a noun.
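The symbol-related exceptions can be sketched as follows. The function name `symbol_rule` and the `is_separator_line` flag are hypothetical; deciding what counts as a line of separators is left to the caller.

```python
def symbol_rule(ptb_tag, form, is_separator_line=False):
    """Map the PTB tags involved in the SYM rules; return None otherwise."""
    if ptb_tag in {"#", "$", "SYM"}:
        return "SYM"
    if ptb_tag == "NFP":
        # Lines of separators become PUNCT; other NFP tokens are SYM.
        return "PUNCT" if is_separator_line else "SYM"
    if ptb_tag in {"NN", "NNS"}:
        # The percent sign is tagged as a noun in the PTB but retagged SYM.
        return "SYM" if form == "%" else "NOUN"
    return None


print(symbol_rule("NN", "%"))        # SYM
print(symbol_rule("NN", "percent"))  # NOUN
print(symbol_rule("NFP", "-----", is_separator_line=True))  # PUNCT
```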


VERB: verb

The tag VERB covers PTB tags VB, VBP, VBZ, VBD, VBG, VBN, except for auxiliary verb uses of be, have, do, and get.

(Auxiliary verbs and modals are AUX and the infinitive to is PART.)


X: other

The English tag X is used for the PTB tags FW, LS, XX, ADD, AFX, and GW. Some things tagged AFX would be candidates for retagging with other tags, but that has not been attempted.
