
This page pertains to UD version 2.

UD English CHILDES

Language: English (code: en)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.16 release.

The following people have contributed to making this treebank part of UD: Xiulin Yang, Zhuoxuan Ju, Lanni Bu, Zoey Liu, Nathan Schneider.

Repository: UD_English-CHILDES
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.16

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [xy236 (æt) georgetown • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas assigned by a program, with some manual corrections, but not a full manual verification
UPOS assigned by a program, with some manual corrections, but not a full manual verification
XPOS assigned by a program, not checked manually
Features not available
Relations annotated manually, natively in UD style

Description

This repository contains Universal Dependencies (UD) trees for utterances from child–adult spoken interactions in English, drawn from CHILDES transcripts.

This treebank is built from three existing treebanks (details under References). We compile the major existing UD-style annotations of CHILDES data, harmonize them into a single, consistent UD format, and manually correct them, resulting in a gold-standard treebank of 48K sentences and 236K tokens.
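UD treebanks are distributed in the CoNLL-U format, so totals like the sentence and token counts above can be recomputed from the files. The sketch below is not the authors' counting script: it assumes a sentence is a block of lines ended by a blank line and a token is a word line with a plain integer ID. The function name and the two toy sentences are illustrative only and are not taken from the treebank.

```python
# Minimal sketch: count sentences and word tokens in CoNLL-U text.
# Assumption: a token is a word line whose ID is a plain integer; multiword
# ranges ("1-2") and empty nodes ("3.1") are skipped. The counting convention
# behind the published figures may differ.

SAMPLE = (
    "# text = I want juice\n"
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\twant\twant\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tjuice\tjuice\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
    "# text = see the dog\n"
    "1\tsee\tsee\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tthe\tthe\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tdog\tdog\tNOUN\t_\t_\t1\tobj\t_\t_\n"
)


def count_sents_and_tokens(conllu_text: str) -> tuple[int, int]:
    sents = 0
    tokens = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sents += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # comment lines such as "# text = ..."
        in_sentence = True
        if line.split("\t", 1)[0].isdigit():
            tokens += 1
    if in_sentence:  # text that does not end with a blank line
        sents += 1
    return sents, tokens


print(count_sents_and_tokens(SAMPLE))  # (2, 6)
```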

Overall Statistics

| Child | Corpus | Child Age Range (Corpus Age Range) | Gold Sents | Gold Toks |
|---|---|---|---:|---:|
| Laura | Braunwald | 1;3–7;0 (1;3–7;0) | 4,622 | 21,079 |
| Adam | Brown | 1;6–5;2 (1;6–5;2) | 16,736 | 84,643 |
| Eve | Brown | 1;6–5;1 (1;6–5;2) | 2,207 | 8,497 |
| Abe | Kuczaj | 2;4–5;0 (2;4–5;0) | 4,167 | 22,437 |
| Sarah | Brown | 1;6–5;2 (1;6–5;2) | 5,347 | 23,233 |
| Lily | Providence | 0;11–4;0 (0;11–4;0) | 1,499 | 6,337 |
| Naima | Providence | 1;3–3;11 (0;11–4;0) | 2,534 | 14,360 |
| Violet | Providence | 0;11–4;0 (0;11–4;0) | 721 | 1,857 |
| Thomas | Thomas | 2;0–4;11 (2;0–4;11) | 4,240 | 20,333 |
| Emma | Weist | 2;2–4;10 (2;1–5;0) | 2,423 | 13,730 |
| Roman | Weist | 2;2–4;9 (2;1–5;0) | 3,653 | 20,557 |
| Overall | NA | NA | 48,183 | 236,941 |
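Ages in the table follow the CHILDES convention of years;months, so 1;3 means one year and three months; the range in parentheses appears to be that of the whole source corpus. A hypothetical helper like the one below converts these strings to months so that ranges can be compared numerically. It also accepts the years;months.days form used in CHILDES transcripts and ignores the days; the function and pattern names are illustrative.

```python
import re

# Hypothetical helper: convert a CHILDES-style age "Y;M" (or "Y;M.D") to months.
# Days, if present, are ignored.
AGE_RE = re.compile(r"(\d+);(\d{1,2})(?:\.(\d{1,2}))?")


def age_to_months(age: str) -> int:
    match = AGE_RE.fullmatch(age.strip())
    if match is None:
        raise ValueError(f"not a CHILDES-style age: {age!r}")
    years, months = int(match.group(1)), int(match.group(2))
    return 12 * years + months


def range_to_months(age_range: str) -> tuple[int, int]:
    # The table separates the two ends of a range with an en dash (U+2013).
    start, end = age_range.split("\u2013")
    return age_to_months(start), age_to_months(end)


print(range_to_months("1;3\u20137;0"))   # (15, 84), Laura
print(range_to_months("0;11\u20134;0"))  # (11, 48), Lily and Violet
```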

Train, dev, test split statistics

| Split | Children | Corpus | Gold Sents |
|---|---|---|---:|
| Train | Adam, Lily, Naima, Sarah, Roman, Laura, Abe | Brown, Providence, Weist, Kuczaj, Braunwald | 34,732 |
| Dev | Adam, Lily, Naima, Sarah, Roman, Laura, Abe | Brown, Providence, Weist, Kuczaj, Braunwald | 3,860 |
| Test | Eve, Violet, Emma, Thomas | Brown, Providence, Weist, Thomas | 9,591 |
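The split is made by child rather than by corpus: Eve, Violet, Emma, and Thomas occur only in the test set, although three of the four test corpora (Brown, Providence, Weist) also contribute training and dev data. The consistency check below uses figures copied by hand from the overall statistics and split tables; it confirms that the test row equals the sum of the four test children's gold sentence counts and that the three splits add up to the overall 48,183 sentences. The constant and function names are illustrative.

```python
# Figures copied by hand from the overall statistics and split tables above.
GOLD_SENTS = {
    "Laura": 4622, "Adam": 16736, "Eve": 2207, "Abe": 4167,
    "Sarah": 5347, "Lily": 1499, "Naima": 2534, "Violet": 721,
    "Thomas": 4240, "Emma": 2423, "Roman": 3653,
}
TRAIN_DEV_CHILDREN = {"Adam", "Lily", "Naima", "Sarah", "Roman", "Laura", "Abe"}
TEST_CHILDREN = {"Eve", "Violet", "Emma", "Thomas"}
SPLIT_SENTS = {"train": 34732, "dev": 3860, "test": 9591}
OVERALL_SENTS = 48183


def check_split() -> int:
    # The split is by child: no test child occurs in train or dev,
    assert TRAIN_DEV_CHILDREN.isdisjoint(TEST_CHILDREN)
    # and the two groups together cover all eleven children.
    assert TRAIN_DEV_CHILDREN | TEST_CHILDREN == set(GOLD_SENTS)
    # The test row should equal the sum of the test children's gold sentences.
    test_total = sum(GOLD_SENTS[child] for child in TEST_CHILDREN)
    assert test_total == SPLIT_SENTS["test"]
    # The three splits should add up to the overall sentence count.
    assert sum(SPLIT_SENTS.values()) == OVERALL_SENTS
    return test_total


print(check_split())  # 9591
```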

Example

Acknowledgments

We acknowledge Ida Szubert, Omri Abend, Samuel Gibbon, Louis Mahon, Sharon Goldwater, Mark Steedman, and Emily Prud’hommeaux for their contributions to the original UD treebanking efforts. We also thank Brian MacWhinney for helpful discussions.

Statistics of UD English CHILDES

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

ExtPos, Typo
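In UD, ExtPos records the external part of speech of an expression and Typo=Yes flags a misspelled word. Both live in the FEATS column of CoNLL-U as Name=Value pairs separated by |, with _ marking a token that has no features. A minimal parser, assuming well-formed input (the function name and example strings are illustrative):

```python
# Minimal sketch: parse a CoNLL-U FEATS cell into a dict, assuming well-formed
# "Name=Value|Name=Value" input; "_" means the token has no features.
def parse_feats(feats: str) -> dict[str, str]:
    if feats == "_":
        return {}
    pairs = (item.split("=", 1) for item in feats.split("|"))
    return {name: value for name, value in pairs}


print(parse_feats("_"))         # {}
print(parse_feats("Typo=Yes"))  # {'Typo': 'Yes'}
```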

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, cc:preconj, ccomp, compound, compound:prt, conj, cop, csubj, dep, det, det:predet, discourse, dislocated, expl, fixed, flat, goeswith, iobj, mark, nmod, nmod:poss, nmod:tmod, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:npmod, obl:tmod, obl:unmarked, orphan, parataxis, punct, reparandum, root, vocative, xcomp
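Many labels in this list are subtypes written as universal:subtype, such as acl:relcl, nsubj:pass, or obl:tmod. When comparing against data that uses only the universal relations, it can help to collapse each subtype onto the part before the colon. The sketch below tallies DEPREL labels over CoNLL-U text with optional collapsing; the function name and the toy sentence are illustrative and not drawn from the treebank.

```python
from collections import Counter

SAMPLE = (
    "# text = I see the dog that barks\n"
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsee\tsee\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tthe\tthe\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tdog\tdog\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "5\tthat\tthat\tPRON\t_\t_\t6\tnsubj\t_\t_\n"
    "6\tbarks\tbark\tVERB\t_\t_\t4\tacl:relcl\t_\t_\n"
)


def deprel_counts(conllu_text: str, collapse_subtypes: bool = False) -> Counter:
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword ranges ("1-2") and empty nodes ("3.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        deprel = cols[7]
        if collapse_subtypes:
            deprel = deprel.split(":", 1)[0]  # "acl:relcl" -> "acl"
        counts[deprel] += 1
    return counts


print(deprel_counts(SAMPLE)["acl:relcl"])                    # 1
print(deprel_counts(SAMPLE, collapse_subtypes=True)["acl"])  # 1
```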

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
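A minimal sketch of that selection over CoNLL-U text: it keeps edges whose head has UPOS VERB and whose dependent has UPOS NOUN or PRON, then tallies their DEPREL labels, which separates core arguments (nsubj, obj, iobj) from obliques (obl). Reading "nouns or pronouns" as exactly the tags NOUN and PRON is an assumption (PROPN could reasonably be added), as is the requirement that sentences be separated by a single blank line. The function name and toy sentences are illustrative.

```python
from collections import Counter

SAMPLE = (
    "# text = you give the doggie a cookie\n"
    "1\tyou\tyou\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgive\tgive\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tthe\tthe\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tdoggie\tdoggie\tNOUN\t_\t_\t2\tiobj\t_\t_\n"
    "5\ta\ta\tDET\t_\t_\t6\tdet\t_\t_\n"
    "6\tcookie\tcookie\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
    "# text = put it in the box\n"
    "1\tput\tput\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tit\tit\tPRON\t_\t_\t1\tobj\t_\t_\n"
    "3\tin\tin\tADP\t_\t_\t5\tcase\t_\t_\n"
    "4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_\n"
    "5\tbox\tbox\tNOUN\t_\t_\t1\tobl\t_\t_\n"
)


def verb_nominal_relations(conllu_text: str) -> Counter:
    counts = Counter()
    # Assumption: sentences are separated by exactly one blank line.
    for block in conllu_text.strip().split("\n\n"):
        rows = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
                continue
            rows[cols[0]] = cols  # keyed by ID so HEAD can be looked up
        for cols in rows.values():
            head = rows.get(cols[6])  # HEAD "0" (the root) has no row
            if head is not None and head[3] == "VERB" and cols[3] in {"NOUN", "PRON"}:
                counts[cols[7]] += 1
    return counts


print(sorted(verb_nominal_relations(SAMPLE).items()))
# [('iobj', 1), ('nsubj', 1), ('obj', 2), ('obl', 1)]
```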

Relations Overview