POS tags
| Open class words | Closed class words | Other |
|---|---|---|
| ADJ | ADP | PUNCT |
| ADV | AUX | SYM |
| INTJ | CONJ | X |
| NOUN | DET | |
| PROPN | NUM | |
| VERB | PART | |
| | PRON | |
| | SCONJ | |
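As a quick illustration, the grouping in the table can be written down as three sets. This is only a sketch: the names `OPEN_CLASS`, `CLOSED_CLASS`, `OTHER`, and `tag_class` are hypothetical helpers invented here, not part of any annotation tool.

```python
# Coarse grouping of the 17 tags, copied from the table of POS tags.
# All names below are illustrative assumptions, not an official API.
OPEN_CLASS = frozenset({"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"})
CLOSED_CLASS = frozenset(
    {"ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"}
)
OTHER = frozenset({"PUNCT", "SYM", "X"})


def tag_class(upos: str) -> str:
    """Return 'open', 'closed', or 'other' for a tag from the table."""
    if upos in OPEN_CLASS:
        return "open"
    if upos in CLOSED_CLASS:
        return "closed"
    if upos in OTHER:
        return "other"
    raise ValueError(f"unknown tag: {upos!r}")
```

For example, `tag_class("PRON")` returns `"closed"`, matching the column in which PRON appears.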
ADJ
: adjective
The English ADJ is currently exactly the union of PTB JJ, JJR, and JJS.
ADP
: adposition
The English ADP covers the Penn Treebank RP, a subset of the uses of IN (those that are not complementizers or subordinating conjunctions), and prepositional uses of TO (in old treebanks, which used this tag for to even when it is a preposition).
ADV
: adverb
The English ADV covers all uses of PTB tags RB, RBR, RBS, and WRB except the clausal negation not and reduced forms of it, which become PART.
AUX
: auxiliary verb
The English AUX covers PTB MD, and the various verbal tags (VB, VBP, VBG, VBN, VBD, VBZ) when the word is a form of be, have, do, or get used as an auxiliary (we count passive get as an auxiliary).
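The AUX rule depends on the lemma and on how the word is used, not on the PTB tag alone. A minimal sketch, assuming the caller already knows the lemma and whether the token functions as an auxiliary (the function name and the `used_as_auxiliary` flag are hypothetical):

```python
VERBAL_PTB_TAGS = {"VB", "VBP", "VBG", "VBN", "VBD", "VBZ"}
AUX_LEMMAS = {"be", "have", "do", "get"}


def verbal_upos(ptb_tag: str, lemma: str, used_as_auxiliary: bool) -> str:
    """Choose AUX or VERB for a modal or verbal PTB tag.

    `used_as_auxiliary` is an assumption of this sketch: it must come from
    syntactic context, which this function does not inspect.
    """
    if ptb_tag == "MD":
        return "AUX"  # modals are always AUX
    if ptb_tag not in VERBAL_PTB_TAGS:
        raise ValueError(f"not a verbal PTB tag: {ptb_tag!r}")
    if lemma.lower() in AUX_LEMMAS and used_as_auxiliary:
        return "AUX"  # includes passive get
    return "VERB"
```

Under these assumptions, has in "She has left" (`verbal_upos("VBZ", "have", True)`) comes out as AUX, while has in "She has a car" (`verbal_upos("VBZ", "have", False)`) stays VERB.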
CONJ
: coordinating conjunction
The English CONJ corresponds to PTB CC.
DET
: determiner
The English DET covers most cases of Penn Treebank DT, PDT, and WDT. However, when a word with one of these tags stands alone as a noun phrase rather than modifying another word, it becomes PRON.
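The DET/PRON split can be sketched as a single decision on whether the word stands alone as a noun phrase. The function name and the `stands_alone_as_np` flag are assumptions made for illustration; deciding that flag requires a parse, which is outside this sketch.

```python
DETERMINER_PTB_TAGS = {"DT", "PDT", "WDT"}


def determiner_upos(ptb_tag: str, stands_alone_as_np: bool) -> str:
    """Map PTB DT/PDT/WDT to DET, or to PRON when the word is its own NP."""
    if ptb_tag not in DETERMINER_PTB_TAGS:
        raise ValueError(f"not a determiner PTB tag: {ptb_tag!r}")
    return "PRON" if stands_alone_as_np else "DET"
```

For instance, this in "this book" would be `determiner_upos("DT", False)`, giving DET, whereas this in "I like this" would be `determiner_upos("DT", True)`, giving PRON.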
INTJ
: interjection
The English INTJ corresponds to the PTB UH.
NOUN
: noun
The English NOUN corresponds to all cases of PTB NN and NNS, except for %, which we retag as SYM.
NUM
: numeral
The English NUM corresponds exactly to the PTB CD.
PART
: particle
Only the following words are currently treated as PART in English:
- Possessive marker: ’s or ’ (and non-standard forms s, -s)
- Predicate negation: not, n’t, nt
- Infinitive marker: to (and non-standard forms ta, na, too, ot, 2, a)
(This is a slightly motley list, and we may still want to rethink this category for English.)
This covers PTB tags POS and some (old PTB style) or all uses of TO, and the subset of RB that is negation.
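Because PART is defined by a closed word list, membership can be sketched as a lookup keyed by PTB tag. The helper below is hypothetical; it assumes the non-standard infinitive forms carry the TO tag, and it relies on a caller-supplied flag to separate infinitive to from prepositional to in old-style PTB data.

```python
# Word lists copied from the three bullets above. Apostrophes are
# normalized so that both ' and the curly ’ are accepted.
POSSESSIVE_FORMS = {"'s", "'", "s", "-s"}
NEGATION_FORMS = {"not", "n't", "nt"}
INFINITIVE_FORMS = {"to", "ta", "na", "too", "ot", "2", "a"}


def _normalize(form: str) -> str:
    return form.lower().replace("\u2019", "'")


def is_part(form: str, ptb_tag: str, is_infinitive_marker: bool = False) -> bool:
    """Return True if (form, ptb_tag) falls in the PART list.

    `is_infinitive_marker` is an assumption of this sketch; it is only
    consulted for the TO tag.
    """
    word = _normalize(form)
    if ptb_tag == "POS":
        return word in POSSESSIVE_FORMS
    if ptb_tag == "RB":
        return word in NEGATION_FORMS
    if ptb_tag == "TO":
        return is_infinitive_marker and word in INFINITIVE_FORMS
    return False
```

With this sketch, `is_part("n't", "RB")` is true, while `is_part("quickly", "RB")` is false and the word stays ADV.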
PRON
: pronoun
PRON is used for English pronouns, such as we, her, it, who, and that when used as a relative pronoun.
The English PRON corresponds to the PTB PRP, PRP$, WP, WP$, and EX, and to certain things that are tagged DT (question and Wh pronouns, such as who, this, and that) when they constitute a nominal by themselves rather than functioning as the determiner of a nominal head (usually a noun). (The assignment of PRP$ and WP$ to PRON might be subject to revision; they could also become DET.)
PROPN
: proper noun
The English PROPN corresponds to everything tagged NNP or NNPS in the PTB tag set. (Note that at present we make no attempt to exclude words arguably of other parts of speech which appear in proper noun phrases that the PTB tag set would tag with NNP(S). So, United States is United/PROPN States/PROPN.)
PUNCT
: punctuation
The English PUNCT covers PTB tags:
- ``
- ''
- -LRB-
- -RRB-
- ,
- .
- :
- HYPH
- Some uses of NFP (for lines of hyphens, asterisks or tildes)
SCONJ
: subordinating conjunction
SCONJ is used for these two subclasses of subordinating conjunctions:
- Complementizers: that, whether, if, etc.
- Adverbial clause introducers: when, since, before, etc. (when introducing a clause, not a nominal)
These are a subset of the things that the IN tag is used for in the PTB.
We treat the putative relativizer use of that (e.g., Jespersen 1924) as a relative pronoun in modern English, so that it gets the POS tag PRON.
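Read together with the ADP definition, the treatment of PTB IN reduces to one question: does the word introduce a clause? A minimal sketch under that reading (the function name and the boolean argument are hypothetical, and the clause/nominal decision is assumed to come from a parse):

```python
def in_tag_upos(introduces_clause: bool) -> str:
    """Split PTB IN into SCONJ (introduces a clause) and ADP (otherwise)."""
    return "SCONJ" if introduces_clause else "ADP"


# Illustrative inputs only; the clause/nominal judgments are hand-assigned.
examples = [
    ("before", "before we left", True),      # adverbial clause introducer
    ("before", "before the meeting", False), # takes a nominal
    ("that", "I know that she left", True),  # complementizer
]
for word, context, introduces_clause in examples:
    print(f"{word!r} in {context!r}: {in_tag_upos(introduces_clause)}")
```

The same word, such as before, can therefore receive either tag depending on what follows it.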
SYM
: symbol
The English SYM covers PTB tags NFP (except for lines of separators, which become PUNCT), #, $, and SYM, as well as the percent sign (%).
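The NFP split and the % exception can be sketched with a small pattern test. This is an illustration under assumptions: a "line of separators" is taken here to mean a token made up of two or more hyphens, asterisks, or tildes, which is one possible reading of the PUNCT definition, and `symbol_upos` is a hypothetical helper.

```python
import re

# Assumption: a separator line is 2+ characters drawn from - * ~ only.
SEPARATOR_LINE = re.compile(r"[-*~]{2,}")


def symbol_upos(form: str, ptb_tag: str) -> str:
    """Resolve the PTB tags that interact with SYM in the text above."""
    if ptb_tag == "NFP":
        return "PUNCT" if SEPARATOR_LINE.fullmatch(form) else "SYM"
    if ptb_tag in {"#", "$", "SYM"}:
        return "SYM"
    if ptb_tag in {"NN", "NNS"}:
        return "SYM" if form == "%" else "NOUN"  # % is retagged from NN
    raise ValueError(f"tag not handled by this sketch: {ptb_tag!r}")
```

Under these assumptions, `symbol_upos("-----", "NFP")` gives PUNCT, while `symbol_upos(":)", "NFP")` and `symbol_upos("%", "NN")` both give SYM.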
VERB
: verb
The tag VERB covers PTB tags VB, VBP, VBZ, VBD, VBG, and VBN, except for auxiliary verb uses of be, have, do, and get.
(Auxiliary verbs and modals are AUX, and the infinitive to is PART.)
X
: other
The English tag X is used for the PTB tags FW, LS, XX, ADD, AFX, and GW. Some things tagged AFX would be candidates for retagging with other tags, but that has not been attempted.
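Pulling the definitions on this page together, the PTB tags fall into two groups: those that map to a single tag, and those that need context. The lookup below is a sketch of that reading, not an official conversion table; `candidate_upos` and both dictionaries are hypothetical names, and treating RBR, RBS, and WRB as unambiguous assumes that negation is only ever tagged RB.

```python
# PTB tags that, per the definitions on this page, map to exactly one tag.
# The closing-quote tag is written as two straight apostrophes.
_ONE_TO_ONE = {
    "ADJ": ["JJ", "JJR", "JJS"],
    "ADP": ["RP"],
    "ADV": ["RBR", "RBS", "WRB"],
    "AUX": ["MD"],
    "CONJ": ["CC"],
    "INTJ": ["UH"],
    "NUM": ["CD"],
    "PART": ["POS"],
    "PRON": ["PRP", "PRP$", "WP", "WP$", "EX"],
    "PROPN": ["NNP", "NNPS"],
    "PUNCT": ["``", "''", "-LRB-", "-RRB-", ",", ".", ":", "HYPH"],
    "SYM": ["#", "$", "SYM"],
    "X": ["FW", "LS", "XX", "ADD", "AFX", "GW"],
}
DIRECT = {ptb: upos for upos, tags in _ONE_TO_ONE.items() for ptb in tags}

# PTB tags whose target depends on the word or its syntactic context.
AMBIGUOUS = {
    "IN": ("ADP", "SCONJ"),
    "TO": ("ADP", "PART"),
    "RB": ("ADV", "PART"),
    "NFP": ("PUNCT", "SYM"),
    **{t: ("NOUN", "SYM") for t in ("NN", "NNS")},
    **{t: ("DET", "PRON") for t in ("DT", "PDT", "WDT")},
    **{t: ("VERB", "AUX") for t in ("VB", "VBP", "VBZ", "VBD", "VBG", "VBN")},
}


def candidate_upos(ptb_tag: str) -> tuple:
    """Return the possible tags for a PTB tag; empty if not covered here."""
    if ptb_tag in DIRECT:
        return (DIRECT[ptb_tag],)
    return AMBIGUOUS.get(ptb_tag, ())
```

A converter built on this idea would resolve the one-to-one tags by lookup and reserve word-level or syntactic checks for the ambiguous ones.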