UD English EWT
Language: English (code: en)
Family: Indo-European (IE)
This treebank has been part of Universal Dependencies since the UD v1.0 release.
The following people have contributed to making this treebank part of UD: Natalia Silveira, Timothy Dozat, Christopher Manning, Sebastian Schuster, Ethan Chi, John Bauer, Miriam Connor, Marie-Catherine de Marneffe, Nathan Schneider, Sam Bowman, Hanzhi Zhu, Daniel Galbraith.
Repository: UD_English-EWT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: blog, social, reviews, email, web
Questions, comments?
General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker.
You can report bugs in this treebank in the treebank-specific issue tracker on GitHub.
If you want to collaborate, please contact [syntacticdependencies (æt) lists • stanford • edu].
Development of the treebank happens in the UD repository but not directly in the final CoNLL-U files.
You may submit bug fixes as pull requests against the dev branch, but you have to go to the folder called not-to-release and locate the source files there.
Contact the treebank maintainers if in doubt.
Annotation | Source |
---|---|
Lemmas | assigned by a program, with some manual corrections, but not a full manual verification |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Relations | annotated manually, natively in UD style |
Description
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
The corpus comprises 254,820 words and 16,622 sentences, taken from five genres of web media: weblogs, newsgroups, emails, reviews, and Yahoo! answers. See the LDC2012T13 documentation for more details on the sources of the sentences. The trees were automatically converted into Stanford Dependencies and then hand-corrected to Universal Dependencies. All the basic dependency annotations have been single-annotated, a limited portion of them have been double-annotated, and subsequent correction has been done to improve consistency. Other aspects of the treebank, such as Universal POS, features, and enhanced dependencies, have mainly been produced automatically, with very limited hand-correction.
Acknowledgments
Annotation of the Universal Dependencies English Web Treebank was carried out by (in order of size of contribution):
- Natalia Silveira
- Timothy Dozat
- Sebastian Schuster
- Miriam Connor
- Marie-Catherine de Marneffe
- Nathan Schneider
- Ethan Chi
- Samuel Bowman
- Christopher Manning
- Hanzhi Zhu
- Daniel Galbraith
- John Bauer
Creation of the CoNLL-U files, including calculating UPOS, feature, and lemma information, was primarily done by:
- Sebastian Schuster
- Natalia Silveira
The construction of the Universal Dependencies English Web Treebank was partially funded by a gift from Google, Inc., which we gratefully acknowledge.
Statistics of UD English EWT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Abbr – Case – Definite – Degree – ExtPos – Foreign – Gender – Mood – Number – NumForm – NumType – Person – Polarity – Poss – PronType – Reflex – Style – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:relcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – csubj:outer – csubj:pass – dep – det – det:predet – discourse – dislocated – expl – fixed – flat – goeswith – iobj – list – mark – nmod – nmod:desc – nmod:poss – nmod:unmarked – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:agent – obl:unmarked – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 16622 sentences, 251493 tokens and 254822 syntactic words.
- This corpus contains 31028 tokens (12%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 925 types of words that contain both letters and punctuation. Examples: 's, n't, 'm, 'll, 've, 're, 'd, Dr., e-mail, Mr., ’s, U.S., st., Inc., etc., Sept., vs., W., it's, .doc, carol.st.clair@enron.com, 01-Feb-02, n’t, Dec., Ft., Oct., alt.animals.cat, p&l, :D, Corp., Ms., No., Non-Bondad, PG&E, S., Yahoo!, i.e., A., Analysis_0712, D.C., E., ENRON.XLS, MEH-risk, Sha'lan, b/c, co., ekrapels@esaibos.com, enrongss.xls, p.m., 80's
- This corpus contains 3327 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 606 types of multi-word tokens. Examples: don't, i'm, it's, i've, didn't, can't, its, i'll, you'll, you're, cannot, doesn't, he's, that's, dont, won't, they're, wouldn't, there's, haven't, isn't, bush's, i'd, wasn't, couldn't, we've, China's, im, we're, here's, what's, aren't, you've, we'll, ive, wont, let's, she's, weren't, your, cant, they'll, world's, you'd, Enron's, Iran's, thats, India's, Qaeda's, he'd.
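The distinction between surface tokens, syntactic words, and multi-word tokens in the counts above can be sketched as follows. This is a minimal illustration that assumes standard CoNLL-U input (ten tab-separated columns, range IDs such as `2-3` for multi-word tokens, `SpaceAfter=No` in the MISC column); the function name `tokenization_stats` and the sample sentence are hypothetical and are not the script that produced this page.

```python
def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = dict(sentences=0, tokens=0, words=0, mwt=0, mwt_words=0, no_space=0)
    covered_until = 0    # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():                 # blank line ends a sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):             # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            in_sentence = True
            stats["sentences"] += 1
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:                    # empty node: neither token nor word
            continue
        no_space = "SpaceAfter=No" in misc.split("|")
        if "-" in tok_id:                    # multi-word token range, e.g. 2-3
            start, end = map(int, tok_id.split("-"))
            stats["tokens"] += 1
            stats["mwt"] += 1
            stats["mwt_words"] += end - start + 1
            stats["no_space"] += no_space
            covered_until = end
        else:                                # ordinary syntactic word
            stats["words"] += 1
            if int(tok_id) > covered_until:  # not inside a multi-word token
                stats["tokens"] += 1
                stats["no_space"] += no_space
    return stats

# Hypothetical sample ("I don't know."); spaces stand in for tabs for readability.
ROWS = [
    "1 I I PRON PRP _ 4 nsubj _ _",
    "2-3 don't _ _ _ _ _ _ _ _",
    "2 do do AUX VBP _ 4 aux _ _",
    "3 n't not PART RB _ 4 advmod _ _",
    "4 know know VERB VB _ 0 root _ SpaceAfter=No",
    "5 . . PUNCT . _ 4 punct _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
stats = tokenization_stats(SAMPLE)
print(stats)
```

Under this counting scheme, tokens = words - (mwt_words - mwt). The reported figures fit that relation: 254822 - 251493 = 3329 extra words spread over 3327 multi-word tokens, which rounds to the stated average of 2.00 words per multi-word token.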
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 20 word types tagged as particles (PART): ', 's, -s, 2, `s, a, n, n't, na, not, nt, n’t, ot, s, ta, the, to, too, ’, ’s
- This corpus contains 52 lemmas tagged as pronouns (PRON): I, anybody, anyone, anything, everybody, everyone, everything, he, her, herself, himself, his, it, its, itself, my, myself, no-one, nobody, none, nothing, one, our, ourselves, she, somebody, someone, something, that, their, themselves, there, they, this, thou, thy, we, what, whatever, which, who, whoever, whom, whomever, whose, wtf, y'all, ye, you, your, yourself, yourselves
- This corpus contains 21 lemmas tagged as determiners (DET): a, all, another, any, both, each, either, every, half, many, neither, no, quite, some, such, that, the, this, what, whatever, which
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: that, this, what, whatever, which
- This corpus contains 14 lemmas tagged as auxiliaries (AUX): be, can, could, do, get, have, may, might, must, ought, shall, should, will, would
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: be, can, do, get, have
- There are 4 (de)verbal forms:
- Fin
- AUX: is, will, can, would, was, are, do, could, should, have
- VERB: have, had, said, has, want, need, let, is, think, know
- Ger
- VERB: following, going, working, getting, having, making, using, doing, taking, living
- Inf
- AUX: be, have, get, do, of, 've, b, by
- VERB: have, get, know, do, go, make, see, take, like, find
- Part
- AUX: been, being, getting, having
- VERB: going, had, attached, looking, done, made, doing, used, based, called
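The per-tag lemma inventories above, and the overlaps such as lemmas that occur both as PRON and as DET, could be recomputed along these lines. This is a sketch under the assumption that the input is CoNLL-U text with LEMMA in the third column and UPOS in the fourth; `lemmas_by_upos` and the two sample sentences are made up for illustration.

```python
from collections import defaultdict

def lemmas_by_upos(conllu_text):
    """Return a mapping UPOS tag -> set of lemmas observed with that tag."""
    lemmas = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():    # skip multi-word ranges and empty nodes
            continue
        lemmas[cols[3]].add(cols[2])
    return lemmas

# Hypothetical sample: "The dog saw that" / "That dog barked" (spaces stand in for tabs).
ROWS = [
    "1 The the DET DT _ 2 det _ _",
    "2 dog dog NOUN NN _ 3 nsubj _ _",
    "3 saw see VERB VBD _ 0 root _ _",
    "4 that that PRON DT _ 3 obj _ _",
    "",
    "1 That that DET DT _ 2 det _ _",
    "2 dog dog NOUN NN _ 3 nsubj _ _",
    "3 barked bark VERB VBD _ 0 root _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
lemmas = lemmas_by_upos(SAMPLE)
both = lemmas["PRON"] & lemmas["DET"]   # lemmas tagged sometimes PRON, sometimes DET
print(sorted(lemmas["DET"]), sorted(both))
```

The same set intersection applied to `lemmas["AUX"]` and `lemmas["VERB"]` would give the AUX/VERB overlap.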
Nominal Features
- Gender
- Fem
- PRON: she, her, herself, hers
- Masc
- PRON: he, his, him, himself
- Neut
- PRON: it, its, itself, it's, THERE, is, ti
- Number
- Plur
- AUX-Fin: are, were, have, do, 're, did, had, 've, where, 'd
- DET: these, those
- NOUN: people, years, days, things, questions, times, months, guys, friends, places
- PRON: they, we, their, our, them, us, those, these, themselves, there
- PROPN: states, americans, Beatles, Iraqis, Palestinians, Islands, Tigers, Shiites, Nations, Seas
- SYM: $
- VERB-Fin: have, are, had, need, want, know, do, were, took, got
- Ptan
- NOUN: regards, troops, supplies, means, politics, clothes, thanks, grounds, contents, goods
- PROPN: Philippines, Netherlands
- Sing
- AUX-Fin: is, was, has, 's, do, have, am, 'm, did, are
- DET: this, that, Thi$, dat, dthat, his
- NOUN: time, service, place, thanks, food, way, year, day, number, pm
- PRON: i, it, my, he, me, this, his, that, him, she
- PROPN: bush, US, al, Iraq, enron, Iran, China, Qaeda, John, india
- SYM: #, %, 1%P701!.doc
- VERB-Fin: have, said, has, had, is, want, think, need, know, got
- Case
- Acc
- PRON: me, it, you, them, him, us, her, yourself, myself, itself
- Gen
- PRON: my, your, their, his, our, its, her, you, it's, there
- Nom
- PRON: i, you, it, they, we, he, she, u, the, There
- Definite
- Def
- DET: the, teh, da, he, te, then, ther, thes, to, tttthhhhh
- Ind
- DET: a, an, and, aa
Degree and Polarity
- Degree
- Cmp
- ADJ: more, better, less, larger, bigger, earlier, smaller, higher, older, greater
- ADV: more, later, better, earlier, longer, less, further, sooner, closer, higher
- Pos
- ADJ: good, great, new, other, many, last, same, few, little, sure
- ADV: well, far, soon, long, hard, early, late, little, close, high
- Sup
- ADJ: best, most, least, worst, cheapest, largest, latest, easiest, highest, oldest
- ADV: most, best, least, worst, highest, longest
- Polarity
- Neg
- CCONJ: nor, neither
- INTJ: no
- PART: not, n't, nt, n’t, n
- Pos
- INTJ: yes, Ye$, Υes
Verbal Features
- Mood
- Imp
- AUX-Fin: do, be, get
- VERB-Fin: let, go, see, take, try, get, make, give, call, put
- Ind
- AUX-Fin: is, was, are, do, have, has, were, 's, am, did
- VERB-Fin: have, had, said, has, want, need, is, are, know, think
- Sub
- AUX-Fin: be, were, do
- VERB-Fin: go, come, get, have, take, build, buy, call, compare, comply
- Tense
- Past
- AUX-Fin: was, were, did, had, 'd, got, where, wase, we
- AUX-Part: been
- VERB-Fin: had, said, got, took, came, went, told, called, made, did
- VERB-Part: had, attached, done, made, used, based, called, given, seen, sent
- Pres
- AUX-Fin: is, are, do, have, has, 's, am, 'm, does, 've
- AUX-Part: being, getting, having, been
- VERB-Fin: have, has, want, need, is, are, know, think, thank, get
- VERB-Part: going, looking, doing, trying, getting, including, having, regarding, using, according
- Voice
- Pass
- VERB-Part: attached, made, based, done, used, called, sent, given, told, known
Pronouns, Determiners, Quantifiers
- PronType
- Art
- DET: the, a, an, and, teh, aa, da, he, te, then
- Dem
- ADV: now, then, there, here, than, their, them, hear, that, thr
- DET: this, that, these, those, Thi$, dat, dthat, his
- PRON: this, that, those, these
- Emp
- PRON: itself, themselves, myself, himself, herself, yourself, my, ourselves
- Ind
- ADV: ever, sometimes, anywhere, somewhere, either, anytime, sometime, someplace, any, anyplace
- DET: some, any, another, such, quite, either, half, many, $ome, and
- PRON: anyone, something, anything, someone, anybody, somebody, any, any1, some, someon
- Int
- ADV: when, how, why, where, however, Wherever, were, who, y
- DET: what, which, whatever
- PRON: what, who, which, whatever, whom, Wtf, waht, whoooooo, wht
- Neg
- ADV: never, nowhere, Neither, NEEEEEEEEEVERRRR, no
- DET: no, neither
- PRON: nothing, none, one, nobody, noone
- Prs
- PRON: i, you, it, they, my, we, he, your, me, their
- Rcp
- DET: each
- PRON: one
- Rel
- ADV: where, when, why, whenever, how, were, however, wherein, wherever, where-ever
- DET: which, whatever, what
- PRON: that, which, who, what, whom, whatever, whose, who's, whoever, whomever
- Tot
- ADV: always, everywhere
- DET: all, every, each, both
- PRON: everything, everyone, everybody, everbody, every
- NumType
- Card
- NOUN: 1970s, 80's, 1980s, 1990s, 1590's, 1920s, 1960s, 20s, 60's, 70's
- NUM: one, two, 2, 1, 3, 5, 4, 10, three, 20
- Frac
- ADJ: half
- ADV: half
- DET: half
- NOUN: half, third, fifth, fourth, tenth, thirds
- NUM: 1.5, 20.000, 3.5, 1.00, 4.6, 6.00, 6.1, 1.1, 10.0, 10.2
- Mult
- ADV: once, twice
- Ord
- ADJ: first, second, third, 17th, fourth, 5th, 19th, 21st, 2nd, 7th
- ADV: first, second, Third, fifth
- NOUN: 23rd, 26th, 30th, 15th, 20th, 22nd, 13th, 1st, 29th, 4th
- Poss
- Yes
- PRON: my, your, their, his, our, its, her, mine, you, it's
- Reflex
- Yes
- PRON: yourself, myself, itself, themselves, himself, ourselves, herself, my, yourselves, your
- Person
- 1
- AUX-Fin: have, am, 'm, do, 've, are, was, did, were, had
- PRON: i, my, we, me, our, us, myself, mine, 's, ourselves
- VERB-Fin: have, had, think, thank, hope, know, need, got, want, love
- 2
- AUX-Fin: are, do, 're, have, did, were, 've, r, re, be
- PRON: you, your, yourself, u, Yo, ur, yours, thy, ya, ye
- VERB-Fin: have, want, need, get, know, think, go, see, take, use
- 3
- AUX-Fin: is, was, are, has, 's, were, have, does, did, do
- PRON: it, they, he, their, his, them, him, she, her, its
- VERB-Fin: said, has, is, had, have, are, came, took, told, says
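The feature listings in this section pair each feature value with a category label (a UPOS tag, extended with the VerbForm value for verbs and auxiliaries, as in AUX-Fin) and the forms observed with it. A rough sketch of how such an index might be built from the FEATS column is shown below. The names `parse_feats` and `feature_index` and the sample sentence are assumptions for illustration; multi-valued features and frequency cut-offs are not handled.

```python
from collections import Counter, defaultdict

def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing|Person=3' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def feature_index(conllu_text):
    """Map (feature, value) -> category label -> Counter of word forms."""
    index = defaultdict(lambda: defaultdict(Counter))
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():            # skip multi-word ranges, empty nodes
            continue
        form, upos, feats = cols[1], cols[3], parse_feats(cols[5])
        label = upos
        if upos in ("VERB", "AUX") and "VerbForm" in feats:
            label = f"{upos}-{feats['VerbForm']}"
        for name, value in feats.items():
            # Under VerbForm itself the plain UPOS is used as the label.
            key_label = upos if name == "VerbForm" else label
            index[(name, value)][key_label][form] += 1
    return index

# Hypothetical sample: "She is going" (spaces stand in for tabs).
ROWS = [
    "1 She she PRON PRP Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs 3 nsubj _ _",
    "2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _",
    "3 going go VERB VBG Tense=Pres|VerbForm=Part 0 root _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
index = feature_index(SAMPLE)
for label, forms in index[("Number", "Sing")].items():
    print(label, [form for form, _ in forms.most_common(10)])
```

Sorting each counter by frequency and keeping the top ten forms would give lists of the same shape as those above, including the misspelled or misannotated forms that surface in a web corpus.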
Other Features
- Abbr
- Yes
- ADJ: d, gud, lil
- ADP: o, thru, vs, w, ta, f, a, w/, 2, 4
- ADV: asap, 4-ever, aka, ie, ovr, Def, deffly, eg, eg., prolly
- AUX-Fin: ar, r, re, shal, v, wud
- AUX-Inf: b
- CCONJ: n, 'n
- DET: da, dat, sm
- INTJ: pls, wel, plllz
- NOUN: etc, etc., mins, No., b, luv, ppl, thanx, yrs, UV
- NUM: m, k, b, bn, t
- PART: na, ta, nt, 2, a
- PRON: u, ur, any1, somethin, wht
- PROPN: Sept., Dec., Oct, Oct., feb, Jan, Nov, Nov., Sat., Fri
- SCONJ: b/c, 4, bc, cos, cus, tho, w/out, coz
- VERB-Fin: wan, SMS, hav
- VERB-Ger: xferring, findin
- VERB-Inf: hav, Arrv., wan
- VERB-Part: gon, OK'd, b., est
- ExtPos
- ADP
- ADJ: due, such, prior, d, do
- ADP: because, as, in, b/c, on, becuse
- ADV: instead, next
- SCONJ: as
- VERB-Part: according
- ADV
- ADJ: more, less
- ADP: of, at, up, in, A
- ADV: as, How
- DET: all
- NOUN: kind, sort
- PRON: that
- CCONJ
- ADP: rather
- ADV: as, rather
- PART: not
- VERB-Inf: let
- PRON
- DET: each
- PRON: one
- SCONJ
- ADJ: due, such, prior
- ADP: in, as, rather
- ADV: instead
- SCONJ: so, whether, in, as
- Foreign
- Yes
- INTJ: Bon, appetit
- NOUN: empanadas, arabes, cordobes, empanada
- X: de, la, Baba, Kevalam, Nam, a, del, guerre, hoc, non
- NumForm
- Combi
- ADJ: 17th, 5th, 19th, 21st, 2nd, 7th, 10th, 14th, 1st, 20th
- NOUN: 1970s, 23rd, 26th, 30th, 80's, 15th, 1980s, 20th, 22nd, 13th
- Digit
- NOUN: 22s
- NUM: 2, 1, 3, 5, 4, 10, 20, 6, 2005, 2003
- Roman
- NUM: ii, VI, iii, i, v, XIII, iv, VII, VIII
- Word
- ADJ: first, second, third, fourth, half, sixth, fifth
- ADV: first, once, twice, second, Third, fifth, half
- DET: half
- NOUN: half, first, third, Sixties, eighties, fifteenth, fifth, fourth, mid-nineties, sixth
- NUM: one, two, three, four, m, million, five, six, k, billion
- Style
- Arch
- AUX-Fin: wilt, art
- PRON: thy, ye, Thou
- Coll
- PRON: ya, 'em, em
- Expr
- ADJ: Brilll, F%#king, FANFUCKINGTASTIC, Pho-nomenal, bl**dy, comfyy, grrrrrrrreeeaaat
- ADV: sooooo, sooo, soooo, REAAAALLY, VERYYY, VERYYYY, preety, soo, waaaaaaaaaaaaay, NEEEEEEEEEVERRRR
- INTJ: hmmm, Hmmmmmm, Ummmm, AAAAAGGGHHHHHH, GRRRRRRR, ewww, hmmmm, pleasseee, riiight, uhh
- NOUN: *ss, Assh@%$e, F'ers, b****, f*ck, poneh, sh*t
- PRON: Wtf, whoooooo
- PROPN: EARTHHHHHHH, saaaaaam
- VERB-Part: f*ed
- Slng
- ADV: Def, deffly, prolly
- PRON: Yo
- Vrnc
- AUX-Fin: ai
- AUX-Inf: of
- NOUN: lovin'
- PRON: Ya'll
- SCONJ: coz
- VERB-Fin: c'm
- VERB-Ger: goin, playin
- VERB-Part: cookin', wagin, walkin
- Typo
- Yes
- ADJ: over, rediculous, accomodating, Arial, afore, knowledgable, unic, 0nside, Aweesome, Awsome
- ADP: a, then, of, and, in, int, the, aboout, abou, admidst
- ADV: to, definately, aboard, all, completly, half, on, realy, truely, were
- AUX-Fin: s, m, r, ve, `s, re, where, d, ll, have
- AUX-Inf: by
- AUX-Part: been
- CCONJ: an, adn, a, ad=nd, afnd, amd, ans, at, of
- DET: and, teh, $ome, Thi$, aa, anothers, dthat, he, his, te
- INTJ: high, Ye$, Υes
- NOUN: mid, Compaq.com, area's, catagory, chnages, collages, e, ect, hamburguers, resturant
- NUM: 3,, on
- PART: s, nt, ', too, -s, `s, ot, the
- PRON: you, there, it's, their, the, s, out, they, any, who's
- PROPN: John, Ken, David, Lorie, Sara, Nasim, Robert, Sear's, penines, Adnan
- PUNCT: 1?!?!?, =
- SCONJ: becuse, then, wether, I'd, Seince, Whie, altough, ask, beacuse, becouse
- VERB-Fin: s, taste, new, recieved, know, over, reccomend, see, want, For
- VERB-Ger: EATTING, developiong, goin, playin, usint
- VERB-Inf: loose, reccommend, recieve, recomend, accomodate, answers, bare, bouild, charger, chose
- VERB-Part: excepted, name, suppose, ASWERING, Compare, Over, Rcommended, U, amplifiaed, botn
- X: et.
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): be.
- This corpus uses 12 lemmas as auxiliaries (aux). Examples: have, be, will, do, can, would, could, should, may, might, must, shall.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: be, get.
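The lemma counts for cop, aux, and aux:pass can be reproduced by grouping lemmas on the DEPREL column. The sketch below assumes CoNLL-U input and matches relation labels exactly; `lemmas_by_relation` and the sample sentences are hypothetical.

```python
def lemmas_by_relation(conllu_text, relations=("cop", "aux", "aux:pass")):
    """Collect the set of lemmas attached with each of the given relations."""
    found = {rel: set() for rel in relations}
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit() and cols[7] in found:   # DEPREL is column 8
            found[cols[7]].add(cols[2])              # LEMMA is column 3
    return found

# Hypothetical sample: "It was made" / "She is happy" / "I will go".
ROWS = [
    "1 It it PRON PRP _ 3 nsubj:pass _ _",
    "2 was be AUX VBD _ 3 aux:pass _ _",
    "3 made make VERB VBN _ 0 root _ _",
    "",
    "1 She she PRON PRP _ 3 nsubj _ _",
    "2 is be AUX VBZ _ 3 cop _ _",
    "3 happy happy ADJ JJ _ 0 root _ _",
    "",
    "1 I I PRON PRP _ 3 nsubj _ _",
    "2 will will AUX MD _ 3 aux _ _",
    "3 go go VERB VB _ 0 root _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
found = lemmas_by_relation(SAMPLE)
print({rel: sorted(lemmas) for rel, lemmas in found.items()})
```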
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (1803)
- VERB-Fin--NOUN-ADP(in) (1)
- VERB-Fin--PRON (759)
- VERB-Fin--PRON-Acc (2)
- VERB-Fin--PRON-Nom (4682)
- VERB-Ger--NOUN (18)
- VERB-Ger--PRON (2)
- VERB-Ger--PRON-Acc (4)
- VERB-Ger--PRON-Gen (4)
- VERB-Ger--PRON-Nom (3)
- VERB-Inf--NOUN (622)
- VERB-Inf--PRON (283)
- VERB-Inf--PRON-Acc (29)
- VERB-Inf--PRON-Nom (2579)
- VERB-Part--NOUN (484)
- VERB-Part--PRON (150)
- VERB-Part--PRON-Acc (6)
- VERB-Part--PRON-Gen (7)
- VERB-Part--PRON-Nom (1444)
- obj
- VERB-Fin--NOUN (3526)
- VERB-Fin--PRON (325)
- VERB-Fin--PRON-Acc (793)
- VERB-Fin--PRON-Nom (66)
- VERB-Ger--NOUN (479)
- VERB-Ger--PRON (13)
- VERB-Ger--PRON-Acc (43)
- VERB-Ger--PRON-Nom (5)
- VERB-Inf--NOUN (3151)
- VERB-Inf--NOUN-ADP('s) (1)
- VERB-Inf--PRON (333)
- VERB-Inf--PRON-Acc (720)
- VERB-Inf--PRON-Nom (88)
- VERB-Part--NOUN (1325)
- VERB-Part--PRON (141)
- VERB-Part--PRON-Acc (150)
- VERB-Part--PRON-Nom (11)
- iobj
- VERB-Fin--NOUN (59)
- VERB-Fin--PRON (2)
- VERB-Fin--PRON-Acc (288)
- VERB-Fin--PRON-Nom (5)
- VERB-Ger--NOUN (10)
- VERB-Ger--PRON-Acc (14)
- VERB-Inf--NOUN (36)
- VERB-Inf--PRON (6)
- VERB-Inf--PRON-Acc (221)
- VERB-Inf--PRON-Nom (9)
- VERB-Part--NOUN (16)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Acc (47)
- VERB-Part--PRON-Nom (1)
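Pattern labels such as VERB-Fin--PRON-Nom combine the parent verb's VerbForm with the child's UPOS and Case. A minimal sketch of such a count follows. It assumes CoNLL-U input, considers only the exact labels nsubj, obj, and iobj, and leaves out the adposition marker that appears in a few patterns above (for example NOUN-ADP(in)); all function names and the sample are illustrative only.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}

def read_sentences(conllu_text):
    """Yield each sentence as a dict: word ID -> list of CoNLL-U columns."""
    sent = {}
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            if sent:
                yield sent
            sent = {}
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():            # skip multi-word ranges, empty nodes
                sent[cols[0]] = cols

def feat(cols, name):
    """Return the value of one feature from the FEATS column, or None."""
    for pair in cols[5].split("|"):
        if pair.startswith(name + "="):
            return pair.split("=", 1)[1]
    return None

def core_argument_patterns(conllu_text):
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for cols in sent.values():
            head = sent.get(cols[6])         # HEAD is column 7; "0" has no entry
            if cols[7] not in CORE or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            parent = "-".join(filter(None, ["VERB", feat(head, "VerbForm")]))
            child = "-".join(filter(None, [cols[3], feat(cols, "Case")]))
            counts[(cols[7], f"{parent}--{child}")] += 1
    return counts

# Hypothetical sample: "She saw him" / "Dogs bark" (spaces stand in for tabs).
ROWS = [
    "1 She she PRON PRP Case=Nom 2 nsubj _ _",
    "2 saw see VERB VBD VerbForm=Fin 0 root _ _",
    "3 him he PRON PRP Case=Acc 2 obj _ _",
    "",
    "1 Dogs dog NOUN NNS Number=Plur 2 nsubj _ _",
    "2 bark bark VERB VBP VerbForm=Fin 0 root _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
counts = core_argument_patterns(SAMPLE)
for (rel, pattern), n in sorted(counts.items()):
    print(rel, pattern, n)
```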
Verbs with Reflexive Core Objects
- This corpus contains 59 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: find yourself, save yourself, ask yourself, blow himself, burn itself, consider themselves, describe themselves, do yourself, feel yourself, give yourself, protect ourselves, work themselves, absent himself, absent yourself, adapt itself, ally itself, avail myself, blow herself, bunker themselves, call himself, cloak himself, commit ourselves, compose himself, contradict themselves, do your, embarrass himself, enjoy myself, enjoy yourself, explode himself, explode yourself, find himself, find themselves, get myself, hurt themselves, imagine yourself, introduce herself, introduce myself, keep himself, keep myself, kill themselves, land herself, leave yourself, make yourself, manifest itself, misrepresent themselves, organize themselves, picture yourself, present yourself, pride themselves, prove himself
- Out of those, 1 lemma occurred more than once, but never without a reflexive dependent: absent
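One way to find verbs with reflexive core objects is to look for obj or iobj dependents that carry Reflex=Yes and then compare, per verb lemma, how many occurrences have such a dependent. The sketch below makes that assumption explicit; it is not the script behind this page, and `reflexive_verbs` and the sample sentences are invented for illustration.

```python
from collections import Counter

def read_sentences(conllu_text):
    """Yield each sentence as a dict: word ID -> list of CoNLL-U columns."""
    sent = {}
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            if sent:
                yield sent
            sent = {}
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():            # skip multi-word ranges, empty nodes
                sent[cols[0]] = cols

def reflexive_verbs(conllu_text):
    """Return (lemmas seen with a reflexive object, lemmas never seen without one)."""
    total, reflexive = Counter(), Counter()
    for sent in read_sentences(conllu_text):
        # IDs of words that govern a reflexive obj or iobj
        heads = {c[6] for c in sent.values()
                 if c[7] in ("obj", "iobj") and "Reflex=Yes" in c[5].split("|")}
        for word_id, c in sent.items():
            if c[3] == "VERB":
                total[c[2]] += 1
                reflexive[c[2]] += word_id in heads
    with_reflexive = {lemma for lemma, n in reflexive.items() if n > 0}
    always = {lemma for lemma in with_reflexive
              if total[lemma] > 1 and total[lemma] == reflexive[lemma]}
    return with_reflexive, always

# Hypothetical sample; spaces stand in for tabs.
ABSENT = [
    "1 He he PRON PRP Case=Nom 2 nsubj _ _",
    "2 absented absent VERB VBD VerbForm=Fin 0 root _ _",
    "3 himself himself PRON PRP Reflex=Yes 2 obj _ _",
    "",
]
FIND = [
    "1 She she PRON PRP Case=Nom 2 nsubj _ _",
    "2 found find VERB VBD VerbForm=Fin 0 root _ _",
    "3 herself herself PRON PRP Reflex=Yes 2 obj _ _",
    "",
    "1 She she PRON PRP Case=Nom 2 nsubj _ _",
    "2 found find VERB VBD VerbForm=Fin 0 root _ _",
    "3 it it PRON PRP Case=Acc 2 obj _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ABSENT + ABSENT + FIND)
with_reflexive, always = reflexive_verbs(SAMPLE)
print(sorted(with_reflexive), sorted(always))
```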
Relations Overview
- This corpus uses 15 relation subtypes: acl:relcl, advcl:relcl, aux:pass, cc:preconj, compound:prt, csubj:outer, csubj:pass, det:predet, nmod:desc, nmod:poss, nmod:unmarked, nsubj:outer, nsubj:pass, obl:agent, obl:unmarked
- The following relation type is not used in this corpus at all: clf
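The overview can be cross-checked against the relation inventory listed under Statistics: subtypes are the labels containing a colon, and unused relations are the universal relations whose base label never appears. The sketch below hard-codes the 37 universal relations of UD v2 and the labels listed on this page; `summarize_relations` is a hypothetical helper.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels listed for this treebank in the Relations inventory.
USED = set("""
acl acl:relcl advcl advcl:relcl advmod amod appos aux aux:pass case cc
cc:preconj ccomp compound compound:prt conj cop csubj csubj:outer csubj:pass
dep det det:predet discourse dislocated expl fixed flat goeswith iobj list
mark nmod nmod:desc nmod:poss nmod:unmarked nsubj nsubj:outer nsubj:pass
nummod obj obl obl:agent obl:unmarked orphan parataxis punct reparandum root
vocative xcomp
""".split())

def summarize_relations(used, universal=UNIVERSAL):
    """Return (sorted subtypes, sorted universal relations that never occur)."""
    subtypes = sorted(rel for rel in used if ":" in rel)
    base_types = {rel.split(":", 1)[0] for rel in used}
    unused = sorted(universal - base_types)
    return subtypes, unused

subtypes, unused = summarize_relations(USED)
print(len(subtypes), unused)
```

With the inventory above this yields 15 subtypes and a single unused relation, clf, matching the two statements in the overview.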