UD English LinES
Language: English (code: en)
Family: IE
This treebank has been part of Universal Dependencies since the UD v1.3 release.
The following people have contributed to making this treebank part of UD: Lars Ahrenberg.
Repository: UD_English-LinES
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-NC-SA 4.0
Genre: fiction, nonfiction, spoken
Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [lars • ahrenberg (æt) liu • se]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually in non-UD style, automatically converted to UD |
| UPOS | annotated manually in non-UD style, automatically converted to UD |
| XPOS | annotated manually |
| Features | not available |
| Relations | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Description
UD English_LinES is the English half of the LinES Parallel Treebank with the original dependency annotation first automatically converted into Universal Dependencies and then partially reviewed. Its contents cover literature, an online manual and Europarl data.
UD English_LinES is the English half of the LinES Parallel Treebank with UD annotations. The majority of segments are from literature, but there is also a section with online manual data and a section with Europarl data. All segments have an associated translation in the UD Swedish_LinES treebank (with the same segment index). The original dependency annotation was first automatically converted to Universal Dependencies and then partially reviewed (Ahrenberg, 2015). In January-February 2017 it was converted to UD version 2 and again reviewed for errors. Lemma information was added in version 2.1.
The treebank is being developed continuously.
Acknowledgments
Three of the source texts were collected as part of the Linköping Translation Corpus (Merkel, 1999). The treebank was first developed in the project ‘Micro- and macro-level analysis of translations’ funded by the Swedish Research Council (Ahrenberg, 2007).
Statistics of UD English LinES
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Case – Definite – Degree – ExtPos – Foreign – Gender – Mood – Number – NumType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – compound – compound:prt – conj – cop – csubj – csubj:outer – csubj:pass – dep – det – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nmod:desc – nmod:poss – nmod:unmarked – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:agent – obl:unmarked – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 5696 sentences, 105137 tokens and 106305 syntactic words.
- This corpus contains 12530 tokens (12%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 553 types of words that contain both letters and punctuation. Examples: 's, n't, 'd, 've, 'll, 'm, 're, ANSI-92, Mr., ANSI-89, Mrs., o'clock, 31-Dec-1999, 01-Jul-1999, Gai-Hinnom, drop-down, middle-aged, well-formed, &, Ben-Gurion, Hong-Kong, XML-based, a.m., by-and-by, cat-flap, custom-house, d', forty-eight, good-by, p.m, second-class, second-hand, .xsl, Dar-es-Salaam, Jo-Ann, No-6, Sha'ananim, St., Vice-President, anti-Semitic, case-sensitive, coup-d'etat, crew-cut, dark-green, eat-e, eight-inch, first-class, forty-one, great-grandfather, higher-level
- This corpus contains 1168 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 263 types of multi-word tokens. Examples: don't, it's, didn't, I'm, that's, wasn't, there's, I've, Harry's, he'd, can't, he's, couldn't, wouldn't, you're, hadn't, I'd, doesn't, I'll, isn't, you've, won't, Stillman's, they're, Mweta's, Ron's, they'll, what's, Clelia's, haven't, Commission's, weren't, Auster's, aren't, we'll, father's, hasn't, mother's, she's, you'll, Europe's, we'd, we've, Dando's, Quinn's, Vernon's, company's, shouldn't, we're, Weasley's.
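The counts above follow from the CoNLL-U ID column: integer IDs are syntactic words, ranges such as `2-3` are multi-word tokens, and a surface token is either a range or a word not covered by one. The following is a minimal sketch of that counting, assuming standard CoNLL-U input; the `SAMPLE` text and the `count_units` helper are illustrative and are not taken from the treebank or its tooling.

```python
# Hand-made two-sentence CoNLL-U sample (hypothetical, not treebank data).
SAMPLE = """\
# text = I don't know.
1\tI\tI\tPRON\t_\t_\t4\tnsubj\t_\t_
2-3\tdon't\t_\t_\t_\t_\t_\t_\t_\t_
2\tdo\tdo\tAUX\t_\t_\t4\taux\t_\t_
3\tn't\tnot\tPART\t_\t_\t4\tadvmod\t_\t_
4\tknow\tknow\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_

# text = It's late.
1-2\tIt's\t_\t_\t_\t_\t_\t_\t_\t_
1\tIt\tit\tPRON\t_\t_\t3\tnsubj\t_\t_
2\t's\tbe\tAUX\t_\t_\t3\tcop\t_\t_
3\tlate\tlate\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def count_units(conllu: str) -> dict:
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space_after": 0}
    for block in conllu.strip().split("\n\n"):
        counts["sentences"] += 1
        covered_until = 0  # last word ID covered by the current multi-word token
        for line in block.splitlines():
            if line.startswith("#"):
                continue  # sentence-level comment
            cols = line.split("\t")
            tok_id, misc = cols[0], cols[9]
            if "." in tok_id:
                continue  # empty node (enhanced dependencies), not a word
            no_space = "SpaceAfter=No" in misc.split("|")
            if "-" in tok_id:
                # A range line is one surface token spanning several words.
                covered_until = int(tok_id.split("-")[1])
                counts["mwts"] += 1
                counts["tokens"] += 1
                counts["no_space_after"] += no_space
                continue
            counts["words"] += 1
            if int(tok_id) > covered_until:
                # A word outside any range is also a surface token.
                counts["tokens"] += 1
                counts["no_space_after"] += no_space
    return counts


print(count_units(SAMPLE))
```

The reported figures are consistent with this scheme if every multi-word token spans exactly two words, as the stated average of 2.00 suggests: 106305 words minus 2 × 1168 covered words plus 1168 multi-word tokens gives 105137 tokens, and 12530 / 105137 rounds to 12%.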
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 6 word types tagged as particles (PART): ', 's, n't, not, t', to
- This corpus contains 60 lemmas tagged as pronouns (PRON): I, Much, _, a, all, another, any, anybody, anyone, anything, both, each, either, everybody, everyone, everything, half, he, her, herself, himself, his, it, its, money, my, myself, neither, no, nobody, none, nothing, one, other, our, ourselves, own, she, some, somebody, someone, something, such, te, that, their, themselves, there, they, this, we, what, whatever, whatnot, which, who, whoever, you, your, yourself
- This corpus contains 23 lemmas tagged as determiners (DET): a, all, an, another, any, both, du, each, either, every, la, le, no, none, one, some, that, the, this, what, whatever, which, who
- Out of the above, 17 lemmas occurred sometimes as PRON and sometimes as DET: a, all, another, any, both, each, either, no, none, one, some, that, this, what, whatever, which, who
- This corpus contains 14 lemmas tagged as auxiliaries (AUX): be, can, could, do, get, have, may, might, must, ought, shall, should, will, would
- Out of the above, 7 lemmas occurred sometimes as AUX and sometimes as VERB: be, could, do, get, have, ought, will
- There are 3 (de)verbal forms:
  - Fin
    - AUX: was, had, is, were, would, are, can, could, have, 's
    - VERB: said, was, had, is, came, seemed, looked, went, made, felt
  - Inf
    - AUX: be, have, do, get
    - VERB: see, know, do, make, go, get, have, say, take, be
  - Part
    - AUX: been, being, having, had
    - VERB: going, done, using, come, made, looking, taken, trying, moving, taking
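The PRON/DET overlap reported in the list above can be reproduced from the two lemma lists with a plain set intersection. The sketch below copies the lemma lists from this page; the variable names are illustrative.

```python
# Lemma lists copied from the PRON and DET entries on this page.
PRON = (
    "I, Much, _, a, all, another, any, anybody, anyone, anything, both, each, "
    "either, everybody, everyone, everything, half, he, her, herself, himself, "
    "his, it, its, money, my, myself, neither, no, nobody, none, nothing, one, "
    "other, our, ourselves, own, she, some, somebody, someone, something, such, "
    "te, that, their, themselves, there, they, this, we, what, whatever, "
    "whatnot, which, who, whoever, you, your, yourself"
).split(", ")

DET = (
    "a, all, an, another, any, both, du, each, either, every, la, le, no, none, "
    "one, some, that, the, this, what, whatever, which, who"
).split(", ")

# Lemmas that occur under both tags.
shared = sorted(set(PRON) & set(DET))

print(len(PRON), len(DET), len(shared))
print(", ".join(shared))
```

The same intersection would apply to the AUX/VERB overlap, but that needs the full VERB lemma list, which this page does not give.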
Nominal Features
- Gender
  - Fem
    - PRON: her, she, herself, hers, itself
  - Masc
    - PRON: he, his, him, himself
  - Neut
    - PRON: its, itself
- Number
  - Plur
    - AUX-Fin: were
    - DET: these, those
    - NOUN: people, eyes, things, men, fields, years, items, women, children, hands
    - NUM: fifteen, forty-one, two
    - PRON: they, we, their, them, us, our, themselves, these, those, others
    - PROPN: Dursleys, Hogwarts, Weasleys, Mets, States, Beatles, Bayleys, Cloughs, Pettigrews, Masons
    - VERB-Fin: mix
  - Sing
    - AUX-Fin: was, is, 's, has, does, am, 'm
    - DET: this, that, each
    - NOUN: data, man, time, field, way, father, page, room, file, place
    - NUM: one
    - PRON: he, I, his, my, him, her, she, me, that, this
    - PROPN: Harry, Quinn, Stillman, XML, Access, Auster, Bray, SQL, Ron, Mweta
    - SYM: %
    - VERB-Fin: was, is, 's, has, says, goes, makes, knows, means, comes
- Case
  - Acc
    - PRON: him, me, them, himself, you, us, her, myself, itself, themselves
  - Gen
    - PRON: his, my, her, their, your, its, our
  - Nom
    - PRON: he, I, you, they, we, she, all, others, some, another
- Definite
  - Def
    - DET: the, Le
  - Ind
    - DET: a, an, Tha
Degree and Polarity
- Degree
  - Cmp
    - ADJ: more, better, older, most, worse, lower, easier, greater, higher, younger
    - ADV: longer, farther, more, nearer, sooner, closer, faster, harder, higher
  - Pos
    - ADJ: other, white, old, own, new, good, long, same, little, black
    - ADV: well, far, long, soon, close, hard, early, little, badly, fast
  - Sup
    - ADJ: best, nearest, greatest, biggest, worst, largest, least, closest, commonest, deepest
    - ADV: least, Whilst, best
- Polarity
  - Neg
    - CCONJ: neither, nor
    - INTJ: no
    - PART: not, n't
  - Pos
    - INTJ: yes
Verbal Features
- Mood
  - Imp
    - VERB: let, see, come, look, Note, click, Go, Imagine, have, make
    - VERB-Fin: let, come, look, see, Note, click, Imagine, have, make, remember
    - VERB-Inf: Go
  - Ind
    - AUX-Fin: was, had, is, were, would, are, can, could, have, 's
    - VERB: said, was, had, is, came, seemed, looked, went, made, felt
    - VERB-Fin: said, was, had, is, came, seemed, looked, went, made, felt
    - VERB-Inf: Land, filter, hurt, march, trouble
    - VERB-Part: want, appeared, had, made, paid, promising, shut, startled, storm, welcome
  - Sub
    - VERB-Fin: were, get, lost, post
- Tense
  - Past
    - AUX-Fin: was, had, were, did, 'd, got, might
    - AUX-Part: been, had
    - VERB: said, was, had, made, came, seemed, looked, went, felt, got
    - VERB-Fin: said, was, had, came, seemed, looked, went, made, felt, saw
    - VERB-Part: done, made, come, taken, seen, used, given, gone, written, displayed
  - Pres
    - AUX: is, are, have, 's, do, has, being, 've, does, 're
    - AUX-Fin: is, are, have, 's, do, has, 've, does, 'm, am
    - AUX-Part: being, having
    - VERB-Fin: is, know, are, have, 's, has, want, says, see, think
    - VERB-Inf: filter
    - VERB-Part: going, using, looking, trying, moving, taking, talking, making, coming, waiting
- Voice
  - Pass
    - VERB-Fin: pushed, thought
    - VERB-Part: made, used, displayed, done, based, taken, given, created, hidden, put
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: the, a, an, Le, what, Tha
  - Dem
    - ADV: then, now, there, here
    - DET: this, that, these, those
    - PRON: that, this, these, those
  - Emp
    - PRON: himself
  - Ind
    - ADV: ever, sometimes, somewhere, anywhere
    - DET: some, any, another, either
    - PRON: something, someone, anything, anyone, one, either, some, ones
  - Int
    - ADV: how, why, where, when, wherever, whatever
    - DET: what, which, whatever
    - PRON: what, who, which, whatever, whom, whose, Those
  - Neg
    - ADV: never, nowhere
    - DET: no, none
    - PRON: nothing, one, none, neither
  - Prs
    - PRON: he, I, his, you, they, my, him, her, we, she
  - Rcp
    - DET: each
    - PRON: one
  - Rel
    - ADV: where, why
    - DET: what, whose
    - PRON: that, who, which, what, whom, whose
  - Tot
    - ADV: always, everywhere
    - DET: all, each, every, both
    - PRON: each, both, all
- NumType
  - Card
    - NUM: one, two, three, 2002, five, six, ten, four, 2000, 2
  - Mult
    - ADV: once, twice
  - Ord
    - ADJ: first, second, third, fourth, seventh, sixth, eleventh
- Poss
  - Yes
    - DET: whose
    - PRON: his, my, her, their, its, your, our, whose, theirs, hers
- Reflex
  - Yes
    - PRON: himself, myself, itself, themselves, herself, yourself, ourselves, oneself
- Person
  - 1
    - AUX-Fin: was, am, 'm
    - PRON: I, my, we, me, us, our, myself, ourselves, mine, ours
    - VERB-Fin: was, 'm, am
  - 2
    - PRON: you, your, yourself, yours, itself
  - 3
    - AUX-Fin: is, 's, has, does, was
    - PRON: he, his, they, him, her, she, their, them, himself, its
    - VERB-Fin: is, 's, has, says, goes, makes, knows, means, comes, contains
Other Features
- ExtPos
  - ADP
    - ADJ: such, due, more, prior
    - ADP: because, in, As, instead, on
    - ADV: because, Instead, as, regardless
    - SCONJ: because
    - VERB-Part: according
  - ADV
    - ADP: of, at, on, in, after, before
    - ADV: as, By
    - NOUN: kind, Sort, face
    - PRON: all
  - CCONJ
    - ADV: as
    - VERB-Inf: let
  - PRON
    - DET: each
    - PRON: one
  - SCONJ
    - ADP: in, as
    - ADV: so, instead
    - SCONJ: so, instead, as, whether
  - ADP
- Foreign
  - Yes
    - PRON: te
  - Yes
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: be.
- This corpus uses 14 lemmas as auxiliaries (aux). Examples: have, be, do, would, can, will, could, must, should, might, may, shall, get, ought.
- This corpus uses 10 lemmas as passive auxiliaries (aux:pass). Examples: be, have, will, can, get, could, must, should, may, would.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (2)
  - VERB-Fin--NOUN (1285)
  - VERB-Fin--PRON (691)
  - VERB-Fin--PRON-Gen (1)
  - VERB-Fin--PRON-Nom (2388)
  - VERB-Fin--PRON-Nom-ADP(as) (1)
  - VERB-Inf--NOUN (213)
  - VERB-Inf--PRON (136)
  - VERB-Inf--PRON-ADP(for) (1)
  - VERB-Inf--PRON-Acc (7)
  - VERB-Inf--PRON-Gen (2)
  - VERB-Inf--PRON-Nom (871)
  - VERB-Part--NOUN (338)
  - VERB-Part--PRON (143)
  - VERB-Part--PRON-Acc (3)
  - VERB-Part--PRON-Nom (497)
- obj
  - VERB--NOUN (1)
  - VERB--PRON (1)
  - VERB--PRON-Acc (1)
  - VERB-Fin--NOUN (1459)
  - VERB-Fin--NOUN-ADP(for) (1)
  - VERB-Fin--NOUN-ADP(in) (1)
  - VERB-Fin--NOUN-ADP(out) (1)
  - VERB-Fin--NOUN-ADP(to) (3)
  - VERB-Fin--NOUN-ADP(up) (1)
  - VERB-Fin--PRON (286)
  - VERB-Fin--PRON-Acc (295)
  - VERB-Fin--PRON-Acc-ADP(with) (1)
  - VERB-Fin--PRON-Gen (18)
  - VERB-Fin--PRON-Nom (10)
  - VERB-Inf--NOUN (1009)
  - VERB-Inf--PRON (228)
  - VERB-Inf--PRON-ADP(as) (1)
  - VERB-Inf--PRON-Acc (195)
  - VERB-Inf--PRON-Gen (16)
  - VERB-Inf--PRON-Nom (9)
  - VERB-Part--NOUN (799)
  - VERB-Part--NOUN-ADP(to) (1)
  - VERB-Part--PRON (117)
  - VERB-Part--PRON-ADP(at) (1)
  - VERB-Part--PRON-ADP(by) (1)
  - VERB-Part--PRON-ADP(into) (1)
  - VERB-Part--PRON-Acc (113)
  - VERB-Part--PRON-Gen (8)
  - VERB-Part--PRON-Nom (3)
- iobj
  - VERB-Fin--NOUN (8)
  - VERB-Fin--PRON (5)
  - VERB-Fin--PRON-Acc (39)
  - VERB-Fin--PRON-Gen (2)
  - VERB-Inf--NOUN (7)
  - VERB-Inf--PRON-Acc (24)
  - VERB-Part--NOUN (3)
  - VERB-Part--PRON-Acc (7)
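Pattern labels such as `VERB-Fin--PRON-Nom` combine the parent's UPOS and VerbForm with the child's UPOS and Case. A rough sketch of how such counts could be gathered from CoNLL-U data is shown below. The `core_patterns` function and the sample sentence are hypothetical, this is not the script that produced the figures above, and it omits the `ADP(...)` suffix seen in some of the labels.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}

# Hand-made CoNLL-U sentence (hypothetical, not treebank data).
SAMPLE = """\
# text = She gave him books.
1\tShe\tshe\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_
2\tgave\tgive\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_
3\thim\the\tPRON\t_\tCase=Acc\t2\tiobj\t_\t_
4\tbooks\tbook\tNOUN\t_\tNumber=Plur\t2\tobj\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def core_patterns(conllu: str) -> Counter:
    """Count (relation, parent--child) patterns for verb parents and nominal children."""
    counts = Counter()
    for block in conllu.strip().split("\n\n"):
        # Keep only syntactic words (integer IDs), indexed by ID.
        rows = {}
        for line in block.splitlines():
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            if cols[0].isdigit():
                rows[int(cols[0])] = cols
        for cols in rows.values():
            deprel = cols[7]
            head = rows.get(int(cols[6]))  # None for the root (head 0)
            if deprel not in CORE or head is None:
                continue
            if head[3] != "VERB" or cols[3] not in {"NOUN", "PRON"}:
                continue
            parent = "VERB"
            verb_form = parse_feats(head[5]).get("VerbForm")
            if verb_form:
                parent += "-" + verb_form
            child = cols[3]
            case = parse_feats(cols[5]).get("Case")
            if case:
                child += "-" + case
            counts[(deprel, parent + "--" + child)] += 1
    return counts


for (deprel, pattern), n in sorted(core_patterns(SAMPLE).items()):
    print(deprel, pattern, n)
```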
Verbs with Reflexive Core Objects
- This corpus contains 91 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: feel himself, find himself, find myself, remind himself, ask myself, put himself, tell himself, enjoy himself, excuse himself, give himself, hang himself, imagine himself, lose myself, post himself, prove himself, punish himself, recover himself, wedge himself, absorb himself, advise myself, allow himself, avoid himself, beat himself, brace himself, break yourself, busy himself, buy himself, buy yourself, buzz themselves, carry myself, clean itself, collect themselves, commit himself, conduct themselves, control himself, convince myself, cover myself, cut yourself, detach itself, drink himself, drown himself, earn himself, efface himself, enjoy themselves, enjoy yourself, exhaust himself, find herself, find ourselves, fix herself, fix itself
Relations Overview
- This corpus uses 12 relation subtypes: acl:relcl, aux:pass, compound:prt, csubj:outer, csubj:pass, nmod:desc, nmod:poss, nmod:unmarked, nsubj:outer, nsubj:pass, obl:agent, obl:unmarked
- The following 3 relation types are not used in this corpus at all: clf, list, goeswith
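Both figures in this overview can be checked against the Relations list near the top of this page. The sketch below assumes the 37 universal relations of UD v2 as the reference inventory; the variable names are illustrative.

```python
# The 37 universal dependency relations of UD v2 (reference inventory).
UNIVERSAL = """
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split()

# Relations listed for this treebank in the Relations section of this page.
USED = """
acl acl:relcl advcl advmod amod appos aux aux:pass case cc ccomp compound
compound:prt conj cop csubj csubj:outer csubj:pass dep det discourse
dislocated expl fixed flat iobj mark nmod nmod:desc nmod:poss nmod:unmarked
nsubj nsubj:outer nsubj:pass nummod obj obl obl:agent obl:unmarked orphan
parataxis punct reparandum root vocative xcomp
""".split()

# Subtypes carry a colon; the part before it is the universal relation.
subtypes = sorted(rel for rel in USED if ":" in rel)
base_used = {rel.split(":")[0] for rel in USED}
unused = sorted(set(UNIVERSAL) - base_used)

print(len(subtypes), "subtypes:", ", ".join(subtypes))
print(len(unused), "unused:", ", ".join(unused))
```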