
This page pertains to UD version 2.

UD Icelandic PUD

Language: Icelandic (code: is)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v2.6 release.

The following people have contributed to making this treebank part of UD: Hildur Jónsdóttir.

Repository: UD_Icelandic-PUD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: news, wiki

Questions, comments? General annotation questions (either Icelandic-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [hildur • jonsdottir (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas assigned by a program, with some manual corrections, but not a full manual verification
UPOS assigned by a program, with some manual corrections, but not a full manual verification
XPOS assigned by a program, with some manual corrections, but not a full manual verification
Features assigned by a program, with some manual corrections, but not a full manual verification
Relations assigned by a program, with some manual corrections, but not a full manual verification

Description

Icelandic-PUD is the Icelandic part of the Parallel Universal Dependencies (PUD) treebanks.

Icelandic-PUD consists of Icelandic translations of 1,000 sentences from the news domain and from Wikipedia. The morphological and syntactic annotations have been manually validated. Unlike the other PUD treebanks, Icelandic-PUD was not created as part of the CoNLL 2017 shared task.

Acknowledgments

Translations were produced by Ölvir Gíslason, a professional translator. The automatic tagging was carried out using ABLTagger, which is based on BiLSTM models, a morphological lexicon and lexical category identification. It is developed by Steinþór Steingrímsson, Örvar Kárason and Hrafn Loftsson and is available from https://github.com/steinst/ABLTagger. Lemmatization was done with the high-accuracy lemmatizer Nefnir, which is developed by Jón Daði Ingólfsson, Svanhvít Lilja Ingólfsdóttir and Hrafn Loftsson and is available at https://github.com/jonfd/nefnir. To pre-annotate the syntax, a delexicalized parser was run using UDPipe, developed by Milan Straka; see https://ufal.mff.cuni.cz/udpipe.

The morphological and syntactic annotations were checked and corrected manually by Hildur Jónsdóttir.

Statistics of UD Icelandic PUD

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Case, Definite, Degree, Foreign, Gender, Mood, Number, Person, Poss, PronType, PunctSide, Tense, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, case, cc, ccomp, compound, compound:prt, conj, cop, csubj, det, dislocated, expl, fixed, flat, flat:name, iobj, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:arg, parataxis, punct, root, vocative, xcomp
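The inventories of POS tags, features and relations listed above can be recomputed from the treebank's CoNLL-U file. The following is a minimal sketch of such a tally, not the script used to generate this page. It assumes the standard 10-column, tab-separated CoNLL-U layout; the sample sentence and the function name `tally` are made up for illustration and are not taken from the treebank.

```python
from collections import Counter

# Hypothetical sample in CoNLL-U format (10 tab-separated columns);
# invented for illustration, not a sentence from Icelandic-PUD.
SAMPLE = (
    "# text = Hún las.\n"
    "1\tHún\thún\tPRON\t_\tCase=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs\t2\tnsubj\t_\t_\n"
    "2\tlas\tlesa\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin|Voice=Act\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def tally(conllu_text):
    """Count UPOS tags, feature names and relation labels in CoNLL-U text."""
    upos, feats, deprels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        upos[cols[3]] += 1      # column 4: UPOS
        deprels[cols[7]] += 1   # column 8: DEPREL
        if cols[5] != "_":      # column 6: FEATS, e.g. Case=Nom|Number=Sing
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1
    return upos, feats, deprels


if __name__ == "__main__":
    upos, feats, deprels = tally(SAMPLE)
    print(sorted(upos), sorted(feats), sorted(deprels))
```

To run it on the real data, read the treebank's CoNLL-U file into a string and pass that string to `tally` in place of `SAMPLE`.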

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
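The restriction to verb parents and noun or pronoun children can be expressed as a filter over CoNLL-U data. The sketch below is one possible reading of that restriction, not the procedure used to produce this page. It assumes that "nouns or pronouns" means the UPOS tags NOUN and PRON (whether PROPN should be included is left open, so the tag set is a parameter); the sample sentence and the function name are invented for illustration.

```python
from collections import Counter

# Hypothetical sample in CoNLL-U format; not a sentence from Icelandic-PUD.
SAMPLE = (
    "# text = Hún las bókina.\n"
    "1\tHún\thún\tPRON\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tlas\tlesa\tVERB\t_\tTense=Past|VerbForm=Fin\t0\troot\t_\t_\n"
    "3\tbókina\tbók\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def verb_nominal_relations(conllu_text, child_tags=("NOUN", "PRON")):
    """Count relation labels where the parent is a VERB and the child's
    UPOS is in child_tags (an assumption; adjust as needed)."""
    counts = Counter()
    sentence = []
    # The appended empty string flushes the last sentence even if the
    # text does not end with a blank line.
    for line in conllu_text.splitlines() + [""]:
        if line.startswith("#"):
            continue
        if line:
            cols = line.split("\t")
            # Keep only ordinary word lines (integer IDs, 10 columns).
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
            continue
        # Blank line: the sentence is complete, so heads can be resolved.
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            # cols[6] is HEAD; "0" (root) has no entry in upos_by_id.
            if cols[3] in child_tags and upos_by_id.get(cols[6]) == "VERB":
                counts[cols[7]] += 1
        sentence = []
    return counts


if __name__ == "__main__":
    print(verb_nominal_relations(SAMPLE))
```

Resolving heads per sentence matters because token IDs restart at 1 in every sentence, so a lookup table shared across sentences would pair children with the wrong parents.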

Relations Overview