UD English LittlePrince
Language: English (code: en)
Family: Indo-European (IE)
This treebank has been part of Universal Dependencies since the UD v2.17 release.
The following people have contributed to making this treebank part of UD: Lori Levin, Annie Zhang, Thomas Palakapilly, Jack Sun, Larry Zhang.
Repository: UD_English-LittlePrince
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: fiction
Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [levin (æt) andrew • cmu • edu]. Development of the treebank happens directly in the UD repository, so bug fixes can be submitted as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
This treebank contains manually corrected Universal Dependencies annotations for 500 sentences from the English translation of The Little Prince.
The source text is the English translation of The Little Prince; the treebank is a manually validated subset of 500 sentences. The project was undertaken by students in the 11-422 Grammar Formalisms course at Carnegie Mellon University, taught by Prof. Lori Levin, with the aim of practicing dependency parsing and analyzing linguistic structures. As a base we used the silver parses from the English Little Prince SNACS corpus (v1.0), which were produced by the Stanza parser. We sampled 500 of these sentences and corrected them manually in ArboratorGrew. Two annotators worked on each sentence, and the course coordinator checked approximately 25% of the sentences. The annotations follow the Universal Dependencies guidelines and have been validated for correctness and consistency.
Acknowledgments
We would like to thank Georgetown for providing the initial silver parse of The Little Prince. We also thank Prof. Lori Levin for supervising this project and the students of 11-422 Grammar Formalisms for their dedicated work on this treebank. This project is also part of the course’s broader effort to practice syntactic theory and Universal Dependencies annotation.
Statistics of UD English LittlePrince
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB
Features
Case – Definite – Degree – ExtPos – Gender – Mood – Number – NumType – Person – Polarity – Poss – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:relcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – det – det:predet – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nmod:poss – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:agent – obl:npmod – obl:tmod – obl:unmarked – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 500 sentences and 6852 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: n't, 's, 've
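Counts like the ones above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that produced this page: the helper names `parse_conllu` and `tokenization_stats` and the two-sentence sample are hypothetical, and it assumes that only plain integer IDs are counted (multiword-token ranges and empty nodes are skipped).

```python
import string

def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of 10-field token rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            fields = line.split("\t")
            # Assumption: skip multiword-token ranges (1-2) and empty nodes (1.1).
            if fields[0].isdigit():
                current.append(fields)
    if current:
        sentences.append(current)
    return sentences

def tokenization_stats(sentences):
    """Compute sentence/token counts and simple word-segmentation figures."""
    tokens = [tok for sent in sentences for tok in sent]
    forms = [tok[1] for tok in tokens]
    mixed = sorted({f for f in forms
                    if any(c.isalpha() for c in f)
                    and any(c in string.punctuation for c in f)})
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        # Tokens NOT followed by a space carry SpaceAfter=No in MISC.
        "space_after_no": sum("SpaceAfter=No" in tok[9].split("|") for tok in tokens),
        "words_with_spaces": sum(" " in f for f in forms),
        "letter_punct_types": mixed,
    }

# Hypothetical sample (not taken from the treebank); spaces become tabs.
ROWS = [
    "1 I I PRON _ _ 4 nsubj _ _",
    "2 do do AUX _ _ 4 aux _ _",
    "3 n't not PART _ _ 4 advmod _ _",
    "4 know know VERB _ _ 0 root _ _",
    "5 . . PUNCT _ _ 4 punct _ _",
    "",
    "1 Come come VERB _ _ 0 root _ _",
    "2 ! ! PUNCT _ _ 1 punct _ _",
]
SAMPLE = "\n".join(row.replace(" ", "\t") for row in ROWS)
stats = tokenization_stats(parse_conllu(SAMPLE))
print(stats)
```

On the sample, `space_after_no` is 0 because no token has `SpaceAfter=No`, which corresponds to the statement that all tokens are followed by a space.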
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB
- This corpus does not use the following tags: X
- This corpus contains 4 word types tagged as particles (PART): 's, n't, not, to
- This corpus contains 38 lemmas tagged as pronouns (PRON): I, all, anyone, anything, everybody, everything, he, herself, himself, his, it, its, itself, mine, my, myself, nobody, nothing, one, oneself, our, she, somebody, something, that, their, themselves, there, they, this, we, what, which, who, whom, you, your, yourself
- This corpus contains 13 lemmas tagged as determiners (DET): a, all, another, any, each, no, some, such, that, the, this, what, which
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: all, that, this, what, which
- This corpus contains 14 lemmas tagged as auxiliaries (AUX): be, can, could, do, get, have, may, might, must, ought, shall, should, will, would
- Out of the above, 4 lemmas occurred sometimes as AUX and sometimes as VERB: be, do, get, have
- There are 4 (de)verbal forms:
  - Fin
    - AUX: was, is, will, would, are, can, am, do, have, could
    - VERB: said, made, have, answered, had, replied, added, make, asked, is
  - Ger
    - AUX: being, having
    - VERB: coming, having, asking, bending, blundering, crying, dreaming, falling, glittering, growing
  - Inf
    - AUX: be, have, get
    - VERB: do, know, go, see, judge, make, understand, eat, look, have
  - Part
    - AUX: been
    - VERB: let, grown, inhabited, come, asked, known, born, disturbed, doing, drinking
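The tag coverage and the PRON/DET overlap reported in the Tags list follow from simple set operations. The sketch below copies the tag and lemma lists from this page and compares them with the 17 universal POS tags of UD v2; the variable names are illustrative only.

```python
# The 17 universal POS tags defined by UD v2.
UNIVERSAL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags and lemma lists copied from the statistics on this page.
used_upos = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT "
    "SCONJ SYM VERB".split()
)
pron_lemmas = set(
    "I all anyone anything everybody everything he herself himself his it "
    "its itself mine my myself nobody nothing one oneself our she somebody "
    "something that their themselves there they this we what which who whom "
    "you your yourself".split()
)
det_lemmas = set(
    "a all another any each no some such that the this what which".split()
)

# Tags never used in the corpus, and lemmas tagged both PRON and DET.
unused_upos = sorted(UNIVERSAL_UPOS - used_upos)
pron_and_det = sorted(pron_lemmas & det_lemmas)

print(len(used_upos), unused_upos)          # 16 ['X']
print(len(pron_lemmas), len(det_lemmas))    # 38 13
print(pron_and_det)                         # ['all', 'that', 'this', 'what', 'which']
```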
Nominal Features
- Gender
  - Fem
    - PRON: she, her, herself
  - Masc
    - PRON: he, his, him, himself, herself
  - Neut
    - PRON: it, its, Mine, itself
- Number
  - Plur
    - AUX-Fin: are, were, have
    - DET: these, those
    - NOUN: baobabs, flowers, thorns, stars, millions, sheep, matters, years, bushes, seeds
    - PRON: they, them, we, their, our, us, themselves, these
    - PROPN: States
    - VERB-Fin: eat, grow, resemble, sleep, split, start
  - Sing
    - AUX-Fin: was, is, am, have, has, did, do, had, does, were
    - DET: this, that
    - NOUN: prince, planet, king, flower, time, day, man, morning, sunset, consequence
    - PRON: I, he, it, me, his, my, him, that, she, her
    - PROPN: Chapter, France, Goodbye, Earth, Justice, Minster, afar, bush
    - VERB: is, said, was, added, began, eats, knows, made, were, Approach
    - VERB-Fin: is, said, was, added, began, eats, knows, made, were, cleaned
- Case
  - Acc
    - PRON: me, him, it, you, them, himself, her, myself, herself, us
  - Gen
    - PRON: my, his, your, its, our, their
  - Nom
    - PRON: I, he, you, it, she, they, we
- Definite
  - Def
    - DET: the
  - Ind
    - DET: a, an
Degree and Polarity
- Degree
  - Cmp
    - ADJ: more, tippler
    - ADV: later, better, more, tippler
  - Pos
    - ADJ: little, good, conceited, other, bad, important, last, old, same, able
    - ADV: soon, far, little, too, well, Abruptly, all, carefully, first, hard
  - Sup
    - ADJ: best, greatest, earliest, handsomest, richest, worst
- Polarity
  - Neg
    - INTJ: no
    - PART: not
Verbal Features
- Mood
  - Imp
    - AUX-Fin: Do
    - VERB-Fin: come, go, let, Forget, Try, Admire, Clap, Do, Order, Wait
    - VERB-Inf: like
  - Ind
    - AUX-Fin: was, is, are, am, have, do, had, did, were, has
    - VERB-Fin: said, made, have, answered, had, replied, added, make, asked, is
- Tense
  - Past
    - AUX-Fin: was, had, did, were
    - AUX-Part: been
    - VERB-Fin: said, made, answered, had, replied, added, asked, took, went, came
    - VERB-Part: grown, inhabited, come, asked, known, born, disturbed, faced, gone, had
  - Pres
    - AUX-Fin: is, are, am, have, do, has, does, 's, 've
    - VERB-Fin: have, make, is, order, think, believe, eats, know, knows, beg
    - VERB-Part: doing, drinking, going, saying, speaking, becoming, beginning, breaking, eating, growing
- Voice
  - Pass
    - VERB-Part: inhabited, born, disturbed, obeyed, acquainted, ashamed, carried, caught, choked, cleaned
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: the, a, an
  - Dem
    - ADV: then, there, here
    - DET: this, that, these, those
    - PRON: that, this, these
  - Ind
    - PRON: something, anyone, anything
  - Int
    - ADV: how, when, why, where, wherever
    - DET: which, what
    - PRON: what, who, whom
  - Prs
    - PRON: I, he, you, it, me, his, my, him, she, they
  - Rel
    - ADV: where
    - PRON: that, which, who
- NumType
  - Card
    - NUM: one, hundred, five, two, twenty, four, three, million, seven, six
  - Mult
    - ADV: once
  - Ord
    - ADJ: first, fourth, second, third, fifth
    - ADV: first
- Poss
  - Yes
    - PRON: his, my, your, her, their, our, its, Mine, myself
- Reflex
  - Yes
    - PRON: himself, myself, herself, yourself, themselves, itself
- Person
  - 1
    - AUX-Fin: am, have, did, do, was
    - PRON: I, me, my, myself, we, our, us, Mine
    - VERB-Fin: ordered, cried, felt, forbid, had, heard, knew, learned, made, pass
  - 2
    - AUX-Fin: are, 've, Were, do
    - PRON: you, your, yourself, oneself
    - VERB-Fin: think, Come, Forget, attend, confuse, know, laughed, live, mean, pull
  - 3
    - AUX-Fin: is, was, has, were, are, had, does, 's, did, have
    - PRON: he, it, his, him, she, they, her, them, himself, their
    - VERB-Fin: is, said, was, added, began, eats, knows, were, cleaned, eat
Other Features
- ExtPos
- ADP
- ADP: as, at, in
- SCONJ: so
- VERB-Ger: according
- ADV
- ADP: Of, at
- ADV: any, at
- DET: a
- NOUN: Bit, hand
- PRON
- DET: any
- SCONJ
- ADP: As, in
- SCONJ: as, so
- ADP
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): be.
- This corpus uses 13 lemmas as auxiliaries (aux). Examples: have, do, be, will, would, can, could, should, must, ought, shall, might, may.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: be, get.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB-Fin--NOUN (116)
  - VERB-Fin--PRON (26)
  - VERB-Fin--PRON-ADP(for) (1)
  - VERB-Fin--PRON-Nom (187)
  - VERB-Ger--NOUN (2)
  - VERB-Ger--PRON (1)
  - VERB-Ger--PRON-Nom (1)
  - VERB-Inf--NOUN (19)
  - VERB-Inf--PRON (8)
  - VERB-Inf--PRON-Nom (82)
  - VERB-Part--NOUN (13)
  - VERB-Part--PRON (2)
  - VERB-Part--PRON-Nom (51)
- obj
  - VERB-Fin--NOUN (89)
  - VERB-Fin--PRON (12)
  - VERB-Fin--PRON-Acc (38)
  - VERB-Ger--NOUN (6)
  - VERB-Ger--PRON (2)
  - VERB-Ger--PRON-Acc (3)
  - VERB-Inf--NOUN (55)
  - VERB-Inf--PRON (14)
  - VERB-Inf--PRON-ADP(over) (1)
  - VERB-Inf--PRON-Acc (41)
  - VERB-Part--NOUN (16)
  - VERB-Part--PRON (4)
  - VERB-Part--PRON-Acc (7)
- iobj
  - VERB-Fin--PRON-Acc (3)
  - VERB-Ger--PRON-Acc (1)
  - VERB-Inf--NOUN (1)
  - VERB-Inf--PRON-Acc (7)
  - VERB-Part--PRON-Acc (1)
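Pattern labels such as `VERB-Fin--PRON-Nom` combine the parent's UPOS and VerbForm with the child's UPOS and Case. The sketch below shows one way such counts could be derived from parsed CoNLL-U rows. It is an assumption about the labeling scheme, not the script behind this page: the function names are hypothetical, the order of the `ADP(...)` and Case suffixes is a guess, and the sample sentence is made up.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}

def feats(field):
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if field == "_":
        return {}
    return dict(kv.split("=", 1) for kv in field.split("|"))

def core_patterns(sentence):
    """Count (relation, label) pairs for VERB parents with NOUN/PRON children."""
    by_id = {tok[0]: tok for tok in sentence}
    counts = Counter()
    for tok in sentence:
        upos, feat, head, deprel = tok[3], tok[5], tok[6], tok[7]
        parent = by_id.get(head)
        if deprel not in CORE or upos not in {"NOUN", "PRON"}:
            continue
        if parent is None or parent[3] != "VERB":
            continue
        verb_form = feats(parent[5]).get("VerbForm")
        label = "VERB" + (f"-{verb_form}" if verb_form else "") + "--" + upos
        # Assumption: an adposition attached to the child via `case` is
        # recorded as ADP(lemma), before the Case value.
        for dep in sentence:
            if dep[6] == tok[0] and dep[7] == "case" and dep[3] == "ADP":
                label += f"-ADP({dep[2]})"
        case = feats(feat).get("Case")
        if case:
            label += f"-{case}"
        counts[(deprel, label)] += 1
    return counts

# Hypothetical sentence "He ate apples ." as 10-field rows.
SENTENCE = [row.split() for row in [
    "1 He he PRON _ Case=Nom|Number=Sing 2 nsubj _ _",
    "2 ate eat VERB _ Tense=Past|VerbForm=Fin 0 root _ _",
    "3 apples apple NOUN _ Number=Plur 2 obj _ _",
    "4 . . PUNCT _ _ 2 punct _ _",
]]
counts = core_patterns(SENTENCE)
for (rel, label), n in sorted(counts.items()):
    print(rel, label, n)
```

Summing the resulting counters over all sentences would give per-relation tables of the kind listed above.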
Verbs with Reflexive Core Objects
- This corpus contains 14 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: change himself, judge yourself, amuse myself, ask himself, find himself, interrupt herself, judge myself, let herself, reassure themselves, show herself, shut himself, stop myself, stretch itself, throw themselves
- Out of those, 1 lemma occurred more than once, but never without a reflexive dependent. Example: change
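The two reflexive-object figures can be approximated as follows. This is a sketch under stated assumptions: a "reflexive core object" is taken to be an `obj` or `iobj` dependent with `Reflex=Yes`, "never without a reflexive dependent" is read as "every occurrence of the verb has such an object", and the function name and the four sample sentences are hypothetical.

```python
from collections import defaultdict

def feats(field):
    """Turn a FEATS string such as 'Reflex=Yes|Case=Acc' into a dict."""
    if field == "_":
        return {}
    return dict(kv.split("=", 1) for kv in field.split("|"))

def reflexive_object_lemmas(sentences):
    """Return (lemmas ever seen with a reflexive obj/iobj,
    lemmas seen more than once and always with one)."""
    total = defaultdict(int)      # occurrences of each verb lemma
    reflexive = defaultdict(int)  # occurrences with a reflexive core object
    for sent in sentences:
        for verb in sent:
            if verb[3] != "VERB":
                continue
            total[verb[2]] += 1
            has_refl = any(
                dep[6] == verb[0]
                and dep[7] in {"obj", "iobj"}
                and feats(dep[5]).get("Reflex") == "Yes"
                for dep in sent
            )
            reflexive[verb[2]] += int(has_refl)
    ever = sorted(lemma for lemma, n in reflexive.items() if n > 0)
    always = sorted(lemma for lemma in ever
                    if total[lemma] > 1 and reflexive[lemma] == total[lemma])
    return ever, always

def rows(*lines):
    """Split space-separated sample rows into 10-field token lists."""
    return [line.split() for line in lines]

# Hypothetical sample: "change" is always reflexive, "judge" only sometimes.
SENTENCES = [
    rows("1 He he PRON _ Case=Nom 2 nsubj _ _",
         "2 changed change VERB _ VerbForm=Fin 0 root _ _",
         "3 himself himself PRON _ Reflex=Yes 2 obj _ _"),
    rows("1 She she PRON _ Case=Nom 2 nsubj _ _",
         "2 changed change VERB _ VerbForm=Fin 0 root _ _",
         "3 herself herself PRON _ Reflex=Yes 2 obj _ _"),
    rows("1 I I PRON _ Case=Nom 2 nsubj _ _",
         "2 judge judge VERB _ VerbForm=Fin 0 root _ _",
         "3 myself myself PRON _ Reflex=Yes 2 obj _ _"),
    rows("1 I I PRON _ Case=Nom 2 nsubj _ _",
         "2 judge judge VERB _ VerbForm=Fin 0 root _ _",
         "3 him he PRON _ Case=Acc 2 obj _ _"),
]
ever, always = reflexive_object_lemmas(SENTENCES)
print(ever)    # ['change', 'judge']
print(always)  # ['change']
```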
Relations Overview
- This corpus uses 13 relation subtypes: acl:relcl, advcl:relcl, aux:pass, cc:preconj, compound:prt, det:predet, nmod:poss, nsubj:outer, nsubj:pass, obl:agent, obl:npmod, obl:tmod, obl:unmarked
- The following 5 relation types are not used in this corpus at all: clf, list, orphan, goeswith, dep
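Both figures in this overview can be checked against the relation inventory listed earlier on this page. The sketch below copies that inventory, treats every label containing a colon as a subtype, and compares the base labels with the 37 universal relations of UD v2; the variable names are illustrative.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relation labels copied from the Relations list on this page.
used_relations = (
    "acl acl:relcl advcl advcl:relcl advmod amod appos aux aux:pass case cc "
    "cc:preconj ccomp compound compound:prt conj cop csubj det det:predet "
    "discourse dislocated expl fixed flat iobj mark nmod nmod:poss nsubj "
    "nsubj:outer nsubj:pass nummod obj obl obl:agent obl:npmod obl:tmod "
    "obl:unmarked parataxis punct reparandum root vocative xcomp"
).split()

# A subtype is any label of the form base:subtype.
subtypes = sorted(rel for rel in used_relations if ":" in rel)
# A universal relation counts as used if it occurs with or without a subtype.
used_base = {rel.split(":", 1)[0] for rel in used_relations}
unused = sorted(UNIVERSAL_RELATIONS - used_base)

print(len(subtypes))  # 13
print(unused)         # ['clf', 'dep', 'goeswith', 'list', 'orphan']
```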