This page pertains to UD version 2.

UD English GUM

Language: English (code: en)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Siyao Peng, Amir Zeldes.

Repository: UD_English-GUM
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-NC-SA 4.0

Genre: academic, blog, fiction, government, news, nonfiction, social, spoken, web, wiki

Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [amir • zeldes (æt) georgetown • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	annotated manually
UPOS	annotated manually in non-UD style, automatically converted to UD
XPOS	annotated manually
Features	annotated manually in non-UD style, automatically converted to UD
Relations	annotated manually, natively in UD style

Description

Universal Dependencies syntax annotations from the GUM corpus (https://gucorpling.org/gum/)

GUM, the Georgetown University Multilayer corpus, is an open source collection of richly annotated texts from multiple text types. The corpus is collected and expanded by students as part of the curriculum in the course LING-4427 “Computational Corpus Linguistics” at Georgetown University. The selection of text types is meant to represent different communicative purposes, while coming from sources that are readily and openly available (usually Creative Commons licenses), so that new texts can be annotated and published with ease.

The dependencies in the corpus up to GUM version 5 were originally annotated using Stanford Typed Dependencies (de Marneffe & Manning 2013) and converted automatically to UD using DepEdit (https://gucorpling.org/depedit/). The rule-based conversion took into account gold annotations found in other annotation layers of the GUM corpus (e.g. entity annotations), and its output has since been corrected manually in native UD. The original conversion script can be found in the GUM build bot code from version 5, available from the (non-UD) GUM repository. Documents from version 6 of GUM onwards were annotated directly in UD, and subsequent manual error correction of all GUM data has also been done directly using the UD guidelines. Enhanced dependencies were added semi-automatically from version 7.1 of the corpus. For more details see the corpus website.

Acknowledgments

GUM annotation team (so far - thanks for participating!)

Adrienne Isaac, Akitaka Yamada, Alex Giorgioni, Alexandra Berends, Alexandra Slome, Amani Aloufi, Amber Hall, Amelia Becker, Andrea Price, Andrew O’Brien, Anna Runova, Anne Butler, Arianna Janoff, Aryaman Arora, Ayan Mandal, Aysenur Sagdic, Bertille Baron, Bradford Salen, Brandon Tullock, Brent Laing, Candice Penelton, Charlie Dees, Chenyue Guo, Colleen Diamond, Connor O’Dwyer, Cristina Lopez, Dan Simonson, Derek Reagan, Didem Ikizoglu, Edwin Ko, Emile Zahr, Emily Pace, Emma Manning, Ethan Beaman, Felipe De Jesus, Han Bu, Hana Altalhi, Hang Jiang, Hannah Wingett, Hanwool Choe, Hassan Munshi, Helen Dominic, Ho Fai Cheng, Hortensia Gutierrez, Jakob Prange, James Maguire, Janine Karo, Jehan al-Mahmoud, Jemm Excelle Dela Cruz, Jessica Cusi, Jessica Kotfila, Joaquin Gris Roca, John Chi, Jongbong Lee, Juliet May, Jungyoon Koh, Katarina Starcevic, Katelyn MacDougald, Katherine Vadella, Khalid Alharbi, Lara Bryfonski, Lauren Levine, Leah Northington, Lindley Winchester, Linxi Zhang, Siyao Peng, Lucia Donatelli, Luke Gessler, Mackenzie Gong, Margaret Anne Rowe, Margaret Borowczyk, Maria Stoianova, Mariko Uno, Mary Henderson, Maya Barzilai, Md. Jahurul Islam, Michael Kranzlein, Michaela Harrington, Minnie Annan, Mitchell Abrams, Mohammad Ali Yektaie, Naomee-Minh Nguyen, Negar Siyari, Nicholas Mararac, Nicholas Workman, Nicole Steinberg, Nitin Venkateswaran, Phoebe Fisher, Rachel Thorson, Rebecca Childress, Rebecca Farkas, Riley Breslin Amalfitano, Rima Elabdali, Robert Maloney, Ruizhong Li, Ryan Mannion, Ryan Murphy, Sakol Suethanapornkul, Sarah Bellavance, Sasha Slone, Sean Macavaney, Sean Simpson, Seyma Toker, Shane Quinn, Shannon Mooney, Shelby Lake, Shira Wein, Sichang Tu, Siddharth Singh, Siyu Liang, Stephanie Kramer, Sylvia Sierra, Talal Alharbi, Tatsuya Aoyama, Timothy Ingrassia, Trevor Adriaanse, Ulie Xu, Wai Ching Leung, Wenxi Yang, Xiaopei Wu, Yang Liu, Yi-Ju Lin, Yifu Mu, Yilun Zhu, Yingzhu Chen, Yiran Xu, Young-A Son, Yu-Tzu Chang, Yuhang Hu, Yunjung Ku, Yushi Zhao, Zhuosi Luo, Zhuxin Wang, Amir Zeldes

… and other annotators who wish to remain anonymous!

References

As a scholarly citation for the corpus in articles, please use this paper:

Zeldes, Amir (2017) “The GUM Corpus: Creating Multilayer Resources in the Classroom”. Language Resources and Evaluation 51(3), 581–612.

@Article{Zeldes2017,
author = {Amir Zeldes},
title = {The {GUM} Corpus: Creating Multilayer Resources in the Classroom},
journal = {Language Resources and Evaluation},
year = {2017},
volume = {51},
number = {3},
pages = {581--612},
doi = {10.1007/s10579-016-9343-x}
}

Statistics of UD English GUM

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X

Features

Abbr – Case – Definite – Degree – Foreign – Gender – Mood – Number – NumForm – NumType – Person – Polarity – Poss – PronType – Reflex – Tense – Typo – VerbForm – Voice

Relations

acl – acl:relcl – advcl – advcl:relcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – csubj:outer – csubj:pass – dep – det – det:predet – discourse – dislocated – expl – fixed – flat – goeswith – iobj – list – mark – nmod – nmod:npmod – nmod:poss – nmod:tmod – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:agent – obl:npmod – obl:tmod – orphan – parataxis – punct – reparandum – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 10761 sentences, 184373 tokens and 187417 syntactic words.

This corpus contains 26131 tokens (14%) that are not followed by a space.

This corpus does not contain words with spaces.

This corpus contains 378 types of words that contain both letters and punctuation. Examples: 's, n't, ’s, 're, 'm, n’t, 've, 'll, 'd, ’re, U.S., ’ve, ’m, e.g., Mr., ’d, ’ll, L'Enfant, al., th-, w-, St., c., d-, n-, non-avian, Mof-Ávvi, a.m., etc., i.e., pro-Beijing, f-, d., s-, D.C., Naqsh-e, b., cross-sectional, t-, L., Mrs., m., y-, A., Dr., J., Ph.D., Vava'u, W., a-

This corpus contains 3044 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
There are 515 types of multi-word tokens. Examples: it's, don't, I'm, that's, you're, gonna, they're, didn't, it’s, we're, I've, there's, can't, don’t, let's, I'll, he's, doesn't, I’m, cannot, what's, she's, that’s, won't, you'll, city's, haven't, wasn't, we’re, you'd, couldn't, she’s, we've, you’re, didn’t, isn't, wanna, who's, world's, I'd, can’t, wouldn't, you've, you’ve, let’s, Galois', aren't, Dalton’s, Warhol's, doesn’t.
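
The counts in this section can in principle be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: it assumes the standard ten-column CoNLL-U layout, where an ID such as 1-2 marks a multi-word token, an ID such as 1.1 marks an empty node, and SpaceAfter=No appears in the MISC column. The helper names row and count_units and the sample sentence are made up for illustration.

```python
def row(*cols: str) -> str:
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Hypothetical one-sentence sample: "I'm here."
SAMPLE = "\n".join([
    "# text = I'm here.",
    row("1-2", "I'm", "_", "_", "_", "_", "_", "_", "_", "_"),
    row("1", "I", "I", "PRON", "PRP", "_", "3", "nsubj", "_", "_"),
    row("2", "'m", "be", "AUX", "VBP", "_", "3", "cop", "_", "_"),
    row("3", "here", "here", "ADV", "RB", "_", "0", "root", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"),
    "",
])

def count_units(conllu: str) -> dict:
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0,
              "multiword_tokens": 0, "no_space_after": 0}
    covered_until = 0   # highest word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():            # blank line ends a sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9].split("|")
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        if "." in tok_id:               # empty node, not a word or token
            continue
        if "-" in tok_id:               # multi-word token range, e.g. 1-2
            covered_until = int(tok_id.split("-")[1])
            counts["multiword_tokens"] += 1
            is_token = True
        else:
            counts["words"] += 1
            is_token = int(tok_id) > covered_until
        if is_token:
            counts["tokens"] += 1
            counts["no_space_after"] += "SpaceAfter=No" in misc
    return counts

print(count_units(SAMPLE))

# Consistency check on the figures reported above: every multi-word token
# of exactly two words adds one word beyond the token count.
assert 187417 - 184373 == 3044
```

The final assertion only restates the numbers given on this page: 187417 words minus 184373 tokens equals the 3044 multi-word tokens, which agrees with the reported average of 2.00 syntactic words per multi-word token.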

Morphology

Nominal Features

Gender

Fem
- PRON: she, her, herself

Fem,Masc
- PRON: s/he

Masc
- PRON: he, his, him, himself

Neut
- PRON: it, its, itself, it's

Number

Plur
- AUX-Fin: are, were, have, 're, do, did, had, will, can, ’re
- DET: these, those
- NOUN: people, years, things, guys, data, days, studies, minutes, children, months
- PRON: we, they, their, our, them, us, you, those, these, 's
- PROPN: States, Americans, Nations, skittles, Chathams, Mets, Netherlands, Sox, Democrats, Olmec
- VERB-Fin: have, are, had, know, need, want, got, did, make, see

Sing
- ADJ: New, national, International, Democratic, American, Creative, Red, Civic, Main, Open
- ADV: Always, Little, Loud, Out, Too, Truly, northwest, south
- AUX: is, was, 's, has, do, 'm, did, had, ’s, 're
- AUX-Fin: is, was, 's, has, do, 'm, did, had, ’s, 're
- DET: this, that, half
- NOUN: time, day, way, city, world, year, today, life, work, example
- NUM: half, Seven, Three
- PRON: i, it, you, he, his, my, your, that, this, she
- PROPN: President, University, York, New, America, figure, north, Scientology, south, Warhol
- PUNCT: point
- SYM: %
- VERB: said, know, has, have, think, had, want, is, mean, 's
- VERB-Fin: said, know, has, have, think, had, want, is, mean, 's
- VERB-Ger: Concerning, Talking
- VERB-Inf: Avoid, Ditch, Hydrodynamica, Talk, Write
- VERB-Part: United, Combined, Protected, Rated

Case

Acc
- PRON: it, you, me, them, him, us, her, 's, himself, yourself

Gen
- PRON: his, my, your, their, our, its, her, it's, yours, he

Nom
- PRON: i, you, it, we, he, they, she, me, him, s/he

Definite

Def
- DET: the

Ind
- DET: a, an

Degree and Polarity

Degree

Cmp
- ADJ: more, better, greater, larger, further, higher, lower, smaller, easier, less
- ADV: more, later, less, longer, earlier, better, further, sooner, slower, Lesser

Pos
- ADJ: other, many, new, good, little, first, different, same, such, last
- ADV: really, well, back, still, too, again, away, much, all, probably
- DET: all
- PUNCT: —

Sup
- ADJ: most, best, least, largest, highest, greatest, worst, biggest, latest, smallest
- ADV: most, best, least, longest, fastest, foremost

Polarity

Neg
- ADJ: universal, non-avian, unknown, unlikely, unable, unprecedented, unfamiliar, unconscious, uncertain, unclear
- ADV: never, no, unfortunately, nowhere, Ne, pas, unambiguously, unanimously, unawares, uncertainly
- DET: no
- INTJ: no
- NOUN: discomfort, non-realism, none, non-art, non-fiction, non-locals, non-philosophers, non-proliferation, nowhere
- PART: not, n't, n’t, n`t
- PRON: nothing
- PROPN: Non-Proliferation, pas
- VERB-Fin: dismounted, Uncover, unclenched, uncovered
- VERB-Inf: undo, disband, disentangle
- VERB-Part: uncovered, disbanded
- X: no

Verbal Features

Mood

Imp
- AUX-Fin: be, Do
- VERB-Fin: let, see, look, make, get, use, add, try, place, take

Ind
- AUX-Fin: is, was, are, 's, do, were, has, have, 're, 'm
- VERB-Fin: have, know, said, has, had, think, are, want, is, mean

Sub
- AUX-Fin: be
- VERB-Fin: collaborate, do, look, rise, wear

Tense

Past
- AUX-Fin: was, were, did, had, 'd, ’d, got, where
- AUX-Part: been, done, had
- VERB-Fin: said, had, got, came, made, took, went, wanted, thought, did
- VERB-Part: united, called, known, used, based, made, given, done, seen, taken

Pres
- AUX-Fin: is, are, 's, do, has, have, 're, 'm, ’s, 've
- AUX-Part: doing
- VERB-Fin: have, know, has, think, are, want, is, mean, need, get
- VERB-Part: gon, going, doing, trying, getting, coming, looking, working, taking, talking

Voice

Pass
- VERB-Part: known, called, based, used, made, given, born, found, done, taken

Pronouns, Determiners, Quantifiers

PronType

Art
- DET: the, a, an

Dem
- ADV: then, here, there
- DET: this, these, that, those, such, yonder
- PRON: there, that, this, those, these

Emp
- PRON: itself, themselves, himself

Ind
- DET: some, all, any, every, another, each, both, half, Mat, and
- PRON: something, anything, someone, anyone, somebody, anybody

Int
- ADV: when, how, why, where, whither, whenever
- DET: which, what, whatever
- PRON: what, who, which, whatever, Whoever, whose

Neg
- DET: no, neither
- PRON: nothing, one, nobody

Prs
- PRON: i, it, you, we, he, they, his, my, your, she

Rcp
- DET: each
- PRON: one

Rel
- ADV: where, how, why, when, whenever, wherever, however
- DET: what, whatever
- PRON: that, which, who, what, whom, whose, whatever, Whosoever, whoever, wish

Tot
- DET: all, both, each
- PRON: everything, everyone, everybody

NumType

Card
- NUM: one, two, 1, 2, three, 3, four, 10, 4, 6
- PROPN: EIGHT, One

Frac
- ADV: half
- DET: half
- NOUN: half, quarter, third, thirds, quarters, fifths, halves, hundredths, millionth, tenth
- NUM: 7.2, 1.5, 6.8, 1.3, 1.4, 11.5, 2.3, 8.3, half, 1.6

Mult
- ADV: once, twice

Ord
- ADJ: first, second, third, 19th, fourth, 20th, fifth, 10th, 30th, seventh
- ADV: first, second, 135th, third, 15th, sixth

Poss

Yes
- PRON: his, my, your, their, our, its, her, whose, theirs, yours

Reflex

Yes
- PRON: himself, themselves, yourself, itself, myself, herself, ourselves

Person

1
- AUX-Fin: 'm, do, was, have, are, did, 've, am, 're, were
- PRON: i, we, my, our, me, us, 's, myself, ’s, mine
- VERB-Fin: have, think, mean, know, thank, had, got, want, said, wanted

2
- AUX-Fin: do, 're, are, did, have, be, were, ’re, 've, ’ve
- PRON: you, your, yourself, yours, ya, y', ye
- VERB-Fin: know, let, have, get, see, want, look, make, use, take
- VERB-Inf: see, let, Describe, get, go, use, Discuss, Do, Explain, continue

3
- AUX-Fin: is, was, are, 's, were, has, had, have, ’s, will
- PRON: it, he, they, his, she, their, her, them, its, him
- VERB-Fin: said, has, are, have, had, is, 's, says, came, makes

Other Features

Abbr
- Yes
  - ADJ: US, OK, Jr.
  - ADP: vs.
  - ADV: e.g., i.e., c., ca., approx.
  - INTJ: OK
  - NOUN: a.m., etc., GIS, DNA, p., p.m., No., Ph.D., DAB, Ed.
  - PROPN: US, U.S., NASA, NATO, Mr., USI, DH, St., DAB, UNESCO
  - VERB-Part: b., d., div., m.
  - X: al., Mlle.

Foreign
- Yes
  - ADJ: National
  - ADP: x
  - ADV: Ne, pas
  - DET: Une
  - NOUN: Comédie
  - PROPN: de, Cérebro, Escola, do, et, Catarin, Federal, Jim, Jules, La
  - PUNCT: !, ,, -, ?, “, ”
  - SYM: 33A, 56A
  - X: de, alcalde, 樋口, Ciao, Información, Montejo, Módulo, Palacio, Paseo, Turística

NumForm
- Combi
  - ADJ: 19th, 20th, 10th, 30th, 17th, 21st, 2nd, 33rd, 3rd, 50th
  - ADV: 135th, 15th
- Digit
  - NUM: 1, 2, 3, 10, 4, 6, 5, 15, 7, 20
- Roman
  - NUM: II, I, III, VI, XV, XVII
- Word
  - ADJ: first, second, third, fourth, fifth, seventh, ninth, sixth, tenth
  - ADV: first, once, second, twice, half, third, sixth
  - DET: half
  - NOUN: half, quarter, third, thirds, quarters, fifths, halves, hundredths, millionth, tenth
  - NUM: one, two, three, four, five, six, million, ten, eight, seven
  - PROPN: EIGHT, One

Typo
- Yes
  - ADJ: residential, I.=, Water, completed, digital, first, flashest, luxerious, non-Muslim, northeastern
  - ADP: on, to, of, With, a, as, fro, from, in, than
  - ADV: aka, all, before, Non, really, them, then, alr-, any, for
  - AUX-Fin: are, is, can, ll, was, get, has, s, were, where
  - AUX-Inf: be
  - AUX-Part: been
  - DET: a, an, on, some, the, to
  - INTJ: y-, Ca-, Ro-, T-, alreet, alroot, f-, n-, plo-, reve-
  - NOUN: lotos, etc, per, type, dodge, fisherman, kind, order, thing, while
  - NUM: 6:00, five, one
  - PART: s, 's, do, the, not
  - PRON: em, it, you, i, it's, we, ya, She, Who, he
  - PROPN: sea, skittles, Chatnam, Hutter, JOHN, Tale, Trump, bd, june, petri
  - PUNCT: ", -, ., [, |, ’
  - SCONJ: cuz, cause, despite, that
  - VERB: dwibbling, Pre, got, set, understand, United, Untied, address, begun, breath
  - VERB-Fin: set, address, begun, cause, counteracts, cross-breeded, get, gives, got, has
  - VERB-Ger: dwibbling, deeping, exper-, going, knowing, leading, recurring
  - VERB-Inf: understand, breath, contribute, experience, loose, to, understan-, very
  - VERB-Part: United, Untied, disappeared, dwibbling, food, got, know, motivated, raise, reach
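
Listings like the ones in the Morphology section (feature, value, part of speech, most frequent word forms) could be tallied from the FEATS column roughly as follows. This is a sketch under assumptions: the labelling rule (UPOS plus the VerbForm value when one is present) is inferred from labels such as VERB-Fin above and may not match the actual statistics script in every detail, and the function name feature_examples and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def row(*cols: str) -> str:
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Hypothetical sample rows (not a full sentence; only columns 2, 4 and 6 matter here).
SAMPLE = "\n".join([
    row("1", "the", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"),
    row("2", "a", "a", "DET", "DT", "Definite=Ind|PronType=Art", "3", "det", "_", "_"),
    row("3", "the", "the", "DET", "DT", "Definite=Def|PronType=Art", "4", "det", "_", "_"),
    row("4", "said", "say", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"),
])

def feature_examples(conllu: str, top_n: int = 10) -> dict:
    """Map (feature, value, POS label) to its most frequent word forms."""
    tallies = defaultdict(Counter)
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multi-word token ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, feats_col = cols[1], cols[3], cols[5]
        feats = dict(p.split("=", 1) for p in feats_col.split("|") if "=" in p)
        # Assumed labelling: append the VerbForm value, e.g. VERB-Fin.
        label = f"{upos}-{feats['VerbForm']}" if "VerbForm" in feats else upos
        for feat, value in feats.items():
            tallies[(feat, value, label)][form] += 1
    return {key: [form for form, _ in counter.most_common(top_n)]
            for key, counter in tallies.items()}

examples = feature_examples(SAMPLE)
print(examples[("PronType", "Art", "DET")])
```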

Syntax

Auxiliary Verbs and Copula

This corpus uses 1 lemma as a copula (cop). Example: be.

This corpus uses 13 lemmas as auxiliaries (aux). Examples: have, be, do, can, will, would, should, may, could, must, might, shall, ought.
This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: be, get.
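
A lemma inventory of this kind can be collected by grouping the LEMMA column by the DEPREL column. The sketch below assumes exact matching on the relation label (so aux and aux:pass are kept apart); whether the figures above were computed that way is not stated here. The function name lemmas_by_relation and the two sample sentences are hypothetical.

```python
from collections import defaultdict

def row(*cols: str) -> str:
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Hypothetical sample: "She has been seen" and "It is good".
SAMPLE = "\n".join([
    row("1", "She", "she", "PRON", "PRP", "_", "4", "nsubj:pass", "_", "_"),
    row("2", "has", "have", "AUX", "VBZ", "_", "4", "aux", "_", "_"),
    row("3", "been", "be", "AUX", "VBN", "_", "4", "aux:pass", "_", "_"),
    row("4", "seen", "see", "VERB", "VBN", "_", "0", "root", "_", "_"),
    "",
    row("1", "It", "it", "PRON", "PRP", "_", "3", "nsubj", "_", "_"),
    row("2", "is", "be", "AUX", "VBZ", "_", "3", "cop", "_", "_"),
    row("3", "good", "good", "ADJ", "JJ", "_", "0", "root", "_", "_"),
])

def lemmas_by_relation(conllu: str, relations=("cop", "aux", "aux:pass")) -> dict:
    """Collect the set of lemmas attached with each of the given relations."""
    found = defaultdict(set)
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Keep only ordinary word lines (integer ID, ten columns).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        lemma, deprel = cols[2], cols[7]
        if deprel in relations:
            found[deprel].add(lemma)
    return {rel: sorted(found[rel]) for rel in relations}

print(lemmas_by_relation(SAMPLE))
```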

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB-Fin--NOUN (1779)
- VERB-Fin--PRON (651)
- VERB-Fin--PRON-Nom (3217)
- VERB-Ger--NOUN (51)
- VERB-Ger--PRON-Gen (1)
- VERB-Ger--PRON-Nom (15)
- VERB-Inf--NOUN (393)
- VERB-Inf--PRON (139)
- VERB-Inf--PRON-Nom (1331)
- VERB-Part--NOUN (292)
- VERB-Part--PRON (93)
- VERB-Part--PRON-Nom (865)

obj
- VERB-Fin--NOUN (2719)
- VERB-Fin--PRON (219)
- VERB-Fin--PRON-Acc (526)
- VERB-Fin--PRON-Gen (2)
- VERB-Ger--NOUN (882)
- VERB-Ger--PRON (28)
- VERB-Ger--PRON-Acc (82)
- VERB-Inf--NOUN (1886)
- VERB-Inf--PRON (281)
- VERB-Inf--PRON-Acc (389)
- VERB-Inf--PRON-Gen (1)
- VERB-Part--NOUN (459)
- VERB-Part--PRON (77)
- VERB-Part--PRON-Acc (74)

iobj
- VERB-Fin--NOUN (33)
- VERB-Fin--PRON-Acc (124)
- VERB-Ger--NOUN (8)
- VERB-Ger--PRON (2)
- VERB-Ger--PRON-Acc (12)
- VERB-Inf--NOUN (41)
- VERB-Inf--PRON (1)
- VERB-Inf--PRON-Acc (62)
- VERB-Part--NOUN (6)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Acc (10)
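
The pattern labels in the lists above combine the parent's UPOS and VerbForm with the child's UPOS and Case. A possible way to count such patterns is sketched below; it assumes sentences are separated by blank lines and that a label drops the feature suffix when the feature is absent, as in VERB-Fin--NOUN. The names core_argument_patterns and label, and the sample sentence, are illustrative only.

```python
from collections import Counter

def row(*cols: str) -> str:
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Hypothetical sample: "She gave him books".
SAMPLE = "\n".join([
    "# text = She gave him books",
    row("1", "She", "she", "PRON", "PRP", "Case=Nom|Number=Sing|Person=3|PronType=Prs", "2", "nsubj", "_", "_"),
    row("2", "gave", "give", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"),
    row("3", "him", "he", "PRON", "PRP", "Case=Acc|Number=Sing|Person=3|PronType=Prs", "2", "iobj", "_", "_"),
    row("4", "books", "book", "NOUN", "NNS", "Number=Plur", "2", "obj", "_", "_"),
])

def label(upos: str, feats_col: str, feature: str) -> str:
    """Return UPOS, suffixed with the value of the given feature if present."""
    feats = dict(p.split("=", 1) for p in feats_col.split("|") if "=" in p)
    return f"{upos}-{feats[feature]}" if feature in feats else upos

def core_argument_patterns(conllu: str) -> Counter:
    """Count (relation, parent--child) patterns for verb parents and nominal children."""
    counts = Counter()
    for block in conllu.strip().split("\n\n"):      # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            head = words.get(cols[6])               # None for the root (head 0)
            if head is None or cols[7] not in ("nsubj", "obj", "iobj"):
                continue
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            pattern = f"{label(head[3], head[5], 'VerbForm')}--{label(cols[3], cols[5], 'Case')}"
            counts[(cols[7], pattern)] += 1
    return counts

for (relation, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```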

Verbs with Reflexive Core Objects

This corpus contains 66 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: find yourself, find himself, call themselves, force yourself, give yourself, proclaim himself, teach himself, ask yourself, assert himself, associate itself, attach itself, better myself, bind ourselves, bring myself, bring themselves, buy myself, call myself, coin myself, comfort yourself, declare himself, declare myself, devote himself, discover herself, distinguish himself, distinguish itself, establish herself, exalt itself, expose yourself, feel himself, find myself, find themselves, fling themselves, get themselves, give themselves, go yourself, good yourself, govern himself, haul themselves, infect themselves, introduce themselves, maintain himself, make herself, make themselves, make yourself, pick herself, pledge ourselves, prepare yourself, pride themselves, prove itself, prove themselves
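
Pairs such as "find himself" can be located by looking for obj or iobj dependents that carry Reflex=Yes and reporting the parent's lemma together with the pronoun's lemma. The sketch below makes exactly that assumption and does not restrict the parent's part of speech; the function name reflexive_object_pairs and the sample sentence are hypothetical.

```python
def row(*cols: str) -> str:
    """Join the ten CoNLL-U columns of one line with tabs."""
    return "\t".join(cols)

# Hypothetical sample: "He found himself".
SAMPLE = "\n".join([
    "# text = He found himself",
    row("1", "He", "he", "PRON", "PRP", "Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs", "2", "nsubj", "_", "_"),
    row("2", "found", "find", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"),
    row("3", "himself", "himself", "PRON", "PRP", "Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs|Reflex=Yes", "2", "obj", "_", "_"),
])

def reflexive_object_pairs(conllu: str) -> set:
    """Return (parent lemma, pronoun lemma) pairs for reflexive obj/iobj dependents."""
    pairs = set()
    for block in conllu.strip().split("\n\n"):      # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            is_core_object = cols[7] in ("obj", "iobj")
            is_reflexive = "Reflex=Yes" in cols[5].split("|")
            head = words.get(cols[6])
            if is_core_object and is_reflexive and head is not None:
                pairs.add((head[2], cols[2]))
    return pairs

print(sorted(reflexive_object_pairs(SAMPLE)))
```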

Relations Overview

This corpus uses 16 relation subtypes: acl:relcl, advcl:relcl, aux:pass, cc:preconj, compound:prt, csubj:outer, csubj:pass, det:predet, nmod:npmod, nmod:poss, nmod:tmod, nsubj:outer, nsubj:pass, obl:agent, obl:npmod, obl:tmod
The following relation type is not used in this corpus at all: clf
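
Both figures in this overview can be checked against the relation list given under Relations earlier on this page. The sketch below copies that list into a string, treats every label containing a colon as a subtype, and compares the remaining base labels with the 37 universal relations of UD v2. The variable names are illustrative.

```python
# Relations listed for this treebank (copied from the Relations list on this page).
USED = """
acl acl:relcl advcl advcl:relcl advmod amod appos aux aux:pass case cc
cc:preconj ccomp compound compound:prt conj cop csubj csubj:outer csubj:pass
dep det det:predet discourse dislocated expl fixed flat goeswith iobj list
mark nmod nmod:npmod nmod:poss nmod:tmod nsubj nsubj:outer nsubj:pass nummod
obj obl obl:agent obl:npmod obl:tmod orphan parataxis punct reparandum root
vocative xcomp
""".split()

# The 37 universal relations defined by UD v2.
UNIVERSAL = """
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split()

# A subtype is any label of the form base:subtype.
subtypes = sorted(rel for rel in USED if ":" in rel)

# Universal relations that never occur, either bare or as the base of a subtype.
used_bases = {rel.split(":")[0] for rel in USED}
unused = sorted(set(UNIVERSAL) - used_bases)

print(len(subtypes), subtypes)
print(unused)
```

Run on the list above, this yields 16 subtypes and a single unused universal relation, clf, in agreement with the two statements in this overview.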