This page pertains to UD version 2.

UD English LinES

Language: English (code: en)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.3 release.

The following people have contributed to making this treebank part of UD: Lars Ahrenberg.

Repository: UD_English-LinES

License: CC BY-NC-SA 4.0

Genre: fiction, nonfiction, spoken

Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [lars • ahrenberg (æt) liu • se]. Development of the treebank happens outside the UD repository, so any bug must be fixed in either the original data source or the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: (unrecognized value: “manual”)
Features: not available
Relations: (unrecognized value: “converted from manual and corrected”)

Description

UD English_LinES is the English half of the LinES Parallel Treebank with the original dependency annotation first automatically converted into Universal Dependencies and then partially reviewed. Its contents cover literature, an online manual and Europarl data.

UD English_LinES is the English half of the LinES Parallel Treebank with UD annotations. The majority of segments are from literature, but there is also one section with online manual data and one section with Europarl data. Every segment has an associated translation, with the same segment index, in the UD Swedish_LinES treebank. The original dependency annotation was first automatically converted to Universal Dependencies and then partially reviewed (Ahrenberg, 2015). In January and February 2017 it was converted to UD version 2 and reviewed again for errors. Lemma information was added in version 2.1.
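Because the English and Swedish segments share an index, the two treebanks can be read side by side. The following is a minimal sketch of that pairing, not the treebank's own tooling. It relies on the standard `# sent_id` and `# text` comment lines of CoNLL-U files. How the shared segment index is encoded in the released `sent_id` values is an assumption here: the hypothetical `segment_index` helper simply takes whatever follows the last hyphen, and the sample ids (`en-demo-1`, `sv-demo-1`) are invented for illustration.

```python
def read_segments(conllu_text):
    """Map each sentence's `# sent_id` value to its `# text` value."""
    segments = {}
    # Sentences in CoNLL-U are separated by blank lines.
    for block in conllu_text.strip().split("\n\n"):
        meta = {}
        for line in block.splitlines():
            if line.startswith("#") and "=" in line:
                name, value = line[1:].split("=", 1)
                meta[name.strip()] = value.strip()
        if "sent_id" in meta:
            segments[meta["sent_id"]] = meta.get("text", "")
    return segments


def segment_index(sent_id):
    """Hypothetical key: the part of the id after the last hyphen (an assumption)."""
    return sent_id.rsplit("-", 1)[-1]


def pair_by_index(english, swedish, key=segment_index):
    """Return (index, English text, Swedish text) for indices present in both."""
    swedish_by_key = {key(sid): text for sid, text in swedish.items()}
    pairs = []
    for sid, text in english.items():
        k = key(sid)
        if k in swedish_by_key:
            pairs.append((k, text, swedish_by_key[k]))
    return pairs


def make_sample(prefix, texts):
    """Build a tiny comment-only sample; token lines are omitted for brevity."""
    return "\n\n".join(
        f"# sent_id = {prefix}-{i}\n# text = {t}"
        for i, t in enumerate(texts, start=1)
    )


english = read_segments(make_sample("en-demo", ["Hello.", "Thank you."]))
swedish = read_segments(make_sample("sv-demo", ["Hej.", "Tack."]))
for index, en_text, sv_text in pair_by_index(english, swedish):
    print(index, en_text, "|", sv_text)
```

If the real ids encode the index differently, only the `key` function passed to `pair_by_index` needs to change.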

The treebank is being developed continuously.

Acknowledgments

Three of the source texts were collected as part of the Linköping Translation Corpus (Merkel, 1999). The treebank was first developed in the project ‘Micro- and macro-level analysis of translations’, funded by the Swedish Research Council (Ahrenberg, 2007).

Statistics of UD English LinES

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, compound:prt, conj, cop, csubj, csubj:pass, det, discourse, dislocated, expl, fixed, flat, iobj, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
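The restriction above can be illustrated with a short sketch that counts dependency relations whose parent is a verb and whose child is a noun or pronoun. It is an illustration only and not the script used to produce the statistics on this page. It assumes the standard ten-column CoNLL-U layout, and it assumes that "nouns or pronouns" means the UPOS tags NOUN and PRON; whether PROPN is also counted is not stated, so the tag set is a parameter. The sample sentence is invented.

```python
from collections import Counter


def verb_nominal_relations(conllu_text, child_tags=("NOUN", "PRON")):
    """Count relations with a VERB parent and a child tagged with one of child_tags."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}  # token id -> (UPOS, HEAD, DEPREL)
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            tokens[cols[0]] = (cols[3], cols[6], cols[7])
        for upos, head, deprel in tokens.values():
            parent = tokens.get(head)  # None for the root, whose HEAD is 0
            if parent is not None and parent[0] == "VERB" and upos in child_tags:
                counts[deprel] += 1
    return counts


# Invented sample sentence, built with explicit tab separators.
ROWS = [
    ("1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "read", "read", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "the", "the", "DET", "_", "_", "4", "det", "_", "_"),
    ("4", "book", "book", "NOUN", "_", "_", "2", "obj", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# text = She read the book.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

print(verb_nominal_relations(SAMPLE))
```

In the sample, only `nsubj` and `obj` are counted: `det` has a noun parent, and the child of `punct` is neither a noun nor a pronoun.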

Relations Overview