
This page pertains to UD version 2.


UD for Dutch

Tokenization and Word Segmentation

Words are delimited by whitespace or punctuation
Words do not contain spaces, although some lemmas of multi-word expressions do (au serieux, dat wil zeggen, onder ander, onder veel, ter plaatse, tot en met)
Words (e.g. abbreviations, names, URLs etc.) may contain arbitrary punctuation signs (http://www.speelgoedmuseum.be, vroeg-renaissance, o.a., ex-VU&ID)
No multiword tokens occur (i.e. forms like ten are treated as a single token, not as te+een)
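Two of the tokenization conventions above (no multiword-token ranges, no spaces inside a word form) can be checked mechanically on CoNLL-U data. The following is a minimal Python sketch, assuming standard 10-column, tab-separated CoNLL-U input. The helper names row and tokenization_violations are hypothetical and not part of any UD tool, and the BAD sample is an invented counter-example, not treebank data. The lemma column is deliberately not checked, since lemmas of multi-word expressions may contain spaces.

```python
def row(token_id: str, form: str) -> str:
    """Build a minimal 10-column CoNLL-U line; columns 3-10 are left as '_'."""
    return "\t".join([token_id, form] + ["_"] * 8)


def tokenization_violations(conllu: str) -> list[str]:
    """List lines that break two conventions: no multiword-token ranges
    (IDs such as '1-2') and no spaces inside a word form (FORM column)."""
    problems = []
    for lineno, line in enumerate(conllu.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append(f"line {lineno}: expected 10 columns, got {len(cols)}")
            continue
        token_id, form = cols[0], cols[1]
        if "-" in token_id:
            problems.append(f"line {lineno}: multiword token range {token_id}")
        if " " in form:
            problems.append(f"line {lineno}: space in form {form!r}")
    return problems


# Conforms: 'ten' is a single token and the abbreviation keeps its periods.
GOOD = "\n".join([row("1", "ten"), row("2", "aanzien"), row("3", "van"), row("4", "o.a.")])

# Hypothetical counter-example: a te+een split and a form containing spaces.
BAD = "\n".join([row("1-2", "ten"), row("1", "te"), row("2", "een"), row("3", "dat wil zeggen")])

print(tokenization_violations(GOOD))  # []
print(tokenization_violations(BAD))
```

Running the check on BAD reports the range ID on line 1 and the space-containing form on line 4.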

Morphology

Features

Abbr=Yes for abbreviations (POS=X, XPOS=SPEC afk)
Case=Acc,Nom for PRON (Nom for XPOS=VNW nomin, Acc for XPOS=VNW obl).
Definite=Def,Ind for DET (Def for XPOS=LID bep, Ind for XPOS=LID onbep)
Degree=Pos,Cmp,Sup for adjectives (POS=ADJ, Pos for XPOS=ADJ basis, Sup for ADJ sup, Cmp for ADJ comp)
Gender=Com,Neut for NOUN and PROPN (Com for N zijd, Neut for N onz, Com,Neut for N genus)
Person=1,2,3 for PRON (1 for XPOS=VNW 1; 2 for XPOS=VNW 2,2v,2b; 3 for XPOS=VNW 3,3p,3v,3o)

PronType=Int,Prs,Ind,Rel,Dem for PRON (Dem for demonstratives, VNW aanw; Rel for relative pronouns, VNW betr; Prs for personal and possessive pronouns, VNW pers and VNW bez; Ind for indefinite pronouns, VNW onbep; Int for interrogative pronouns, VNW vb).

Number=Sing,Plur for AUX, NOUN, VERB, PROPN (Sing for WW ev, WW met-t, N ev; Plur for WW mv, N mv)
Poss=Yes for PRON with VNW bez
Reflex=Yes for PRON with VNW refl
VerbForm=Inf,Fin,Part for AUX and VERB with Inf for WW inf, Fin for WW pv and Part for WW od or WW vd
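The feature list above is essentially a lookup from XPOS attribute values to UD features. The Python sketch below encodes that lookup, under a few assumptions. The table name FEATURE_MAP and the function xpos_to_feats are hypothetical. The attribute values are assumed to use the D-COI/CGN spelling shown above (for example zijd and onz for gender). The function accepts both the CGN-style N(soort,ev,...) notation and a pipe-separated N|soort|ev|... spelling, because this page does not say which one a given release uses, and the example tag strings are illustrative. It also ignores UPOS, so it does not enforce restrictions such as Person applying only to PRON.

```python
import re

# Hypothetical lookup built from the feature list above:
# (main XPOS tag, XPOS attribute value) -> UD features it contributes.
FEATURE_MAP: dict[tuple[str, str], dict[str, str]] = {
    ("SPEC", "afk"): {"Abbr": "Yes"},
    ("VNW", "nomin"): {"Case": "Nom"},
    ("VNW", "obl"): {"Case": "Acc"},
    ("LID", "bep"): {"Definite": "Def"},
    ("LID", "onbep"): {"Definite": "Ind"},
    ("ADJ", "basis"): {"Degree": "Pos"},
    ("ADJ", "comp"): {"Degree": "Cmp"},
    ("ADJ", "sup"): {"Degree": "Sup"},
    ("N", "zijd"): {"Gender": "Com"},
    ("N", "onz"): {"Gender": "Neut"},
    ("N", "genus"): {"Gender": "Com,Neut"},
    ("VNW", "1"): {"Person": "1"},
    **{("VNW", p): {"Person": "2"} for p in ("2", "2v", "2b")},
    **{("VNW", p): {"Person": "3"} for p in ("3", "3p", "3v", "3o")},
    ("VNW", "aanw"): {"PronType": "Dem"},
    ("VNW", "betr"): {"PronType": "Rel"},
    ("VNW", "pers"): {"PronType": "Prs"},
    ("VNW", "bez"): {"PronType": "Prs", "Poss": "Yes"},
    ("VNW", "onbep"): {"PronType": "Ind"},
    ("VNW", "vb"): {"PronType": "Int"},
    ("VNW", "refl"): {"Reflex": "Yes"},
    ("N", "ev"): {"Number": "Sing"},
    ("N", "mv"): {"Number": "Plur"},
    ("WW", "ev"): {"Number": "Sing"},
    ("WW", "met-t"): {"Number": "Sing"},
    ("WW", "mv"): {"Number": "Plur"},
    ("WW", "inf"): {"VerbForm": "Inf"},
    ("WW", "pv"): {"VerbForm": "Fin"},
    ("WW", "od"): {"VerbForm": "Part"},
    ("WW", "vd"): {"VerbForm": "Part"},
}


def xpos_to_feats(xpos: str) -> str:
    """Map an XPOS string to a UD FEATS value, or '_' if nothing applies."""
    # Split 'N(soort,ev,...)' or 'N|soort|ev|...' into the main tag and its attributes.
    parts = [p for p in re.split(r"[(),|]", xpos) if p]
    if not parts:
        return "_"
    main, attrs = parts[0], parts[1:]
    feats: dict[str, str] = {}
    for attr in attrs:
        feats.update(FEATURE_MAP.get((main, attr), {}))
    # UD lists features sorted alphabetically (case-insensitively) by name.
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered) or "_"


print(xpos_to_feats("N(soort,ev,basis,onz,stan)"))      # Gender=Neut|Number=Sing
print(xpos_to_feats("WW|pv|tgw|met-t"))                 # Number=Sing|VerbForm=Fin
print(xpos_to_feats("VNW(pers,pron,nomin,vol,1,ev)"))   # Case=Nom|Person=1|PronType=Prs
```

Keying the table on the main tag as well as the attribute is what keeps values such as onbep (an indefinite pronoun under VNW, an indefinite article under LID) and basis (which only yields Degree=Pos for ADJ) from being mapped to the wrong feature.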

Detailed documentation of the decisions w.r.t. features in the original data can be found in the D-COI POS-tagging and lemmatization manual.

Syntax

The Dutch treebanks are automatically converted from annotated and manually corrected treebanks. Detailed documentation of the original syntactic annotation is in the syntactic annotation manual of the Lassy project. The data included in the UD treebanks can be explored using the PaQu interface, which supports querying both the original and UD annotation.

acl, acl:relcl acl is used for phrases headed by a verb that modify a noun. These can be prenominal (as in thans geldende rentestand) or postnominal (as in de vraag of de rente zal stijgen). acl:relcl is used for relative clauses. In the original syntactic annotation these are nodes with a mod dependency relation that occur as sisters of a nominal head and have category ppres or ppart (prenominal), cp or oti (postnominal), or rel (relative clauses). Prenominal verbs without dependents are labeled amod instead.
advcl is used for modifying phrases (adjuncts) that are dependents of a verbal head. In the original annotation they have the relation mod, and their category can be cp, oti, or ppart, among others.
advmod is used for adverbs and adverbial phrases modifying a verb. The POS of advmod elements is almost always ADV or ADJ.
amod is used for adjectives and other elements modifying a noun. The POS of amod elements is usually ADJ, but ADV, NOUN, and others occur as well. ADV occurs for elements such as slechts (5 euro), vele (kookboeken), and zo’n (25 optredens). amod also occurs in nominalisations (het niet doen terugkeren, where niet is amod of the verb terugkeren, which is itself used as a noun), and is used for adverbial pronouns (de verlenging ervan).
appos is used for appositions. In the original annotation, the relation app is used for a wide range of nominal phrases in postnominal position (de fotograaf Philip Mechanicus, Nooteboom’s debuut ‘Philip en de anderen’, de jaren 1979-1981, de wethouder cultuur, presentatie Slibreeks, Hans Groenewegen, dichter en publicist, Zuiderzinnen, Festival van het woord, zondag 18 september 2005). All of these are mapped to appos, even though this stretches the intended use of appos in UD.
aux, aux:pass aux is used for auxiliaries as defined above in the section on POS tags. Note that this implies that auxiliaries are dependents of the main verb with which they co-occur. In the original annotation, no distinction between verbs and auxiliaries is made, and auxiliaries always have a sister that is a clause headed by the main verb. Note that this also means that elements such as subjects, complementizers, and even the marker ‘te’ become dependents of the main verb, and not the auxiliary.
case is used for prepositions (ADP) that introduce a prepositional phrase. The preposition is a dependent of the head of the nominal phrase. Where there is both a preposition and a postposition (door de eeuwen heen, om hem heen), both elements are case dependents of the nominal head. Where the nominal element is replaced by an R-pronoun (er, etc.), the R-pronoun precedes the preposition and may be nonadjacent to it (U doet er verstandig aan). This is a source of non-projective annotations.
cc is used for coordination words such as en, of, maar.
ccomp is used for complement clauses that are dependents of a verb. In the original annotation, complement clauses are phrases with the relation vc that are headed by a finite verb or a te-infinitive, so they can be of category cp, whsub, ti, or oti. ccomp clauses have no controlled subject.
compound:prt is used for separable verbal prefixes (groeide uit, aan te wijzen) and the non-verbal part of phrasal verbs (op prijs stellen, bekend staan, kenbaar maken).
conj is used for conjuncts.
cop is used for the copula zijn only. The copula is a dependent of the predicate. If the copula is preceded by the infinitival marker te, the marker also becomes a dependent of the predicate (in wordt aangeraden waakzaam te zijn, we have (waakzaam, mark, te)).
csubj is used for clausal subjects. Clausal subjects are sometimes introduced by expletive het (marked as expl), as in het blijft onduidelijk wat Japix bedoelt. Clausal subjects can be of category cp, whsub, ti, or oti in the original annotation.
det is used for determiners, i.e. for elements with the DET POS tag, as explained above.
expl, expl:pv expl is used for het or er when they introduce a clausal subject (het is verstandig u te laten adviseren, u dient er rekening mee te houden dat…). expl:pv is used for inherent reflexives (richt zich op, bevindt zich in, scheidt zich af, jaagt NP tegen zich in het harnas).
fixed is used for the non-initial parts of multi-word expressions (such as ten aanzien van, voor zover, dan wel, fine fleur). Titles of books and other works of art, some institution names (De ontdekking van de hemel, Faculteit Kunst en Cultuur), and some amounts (EUR 37,50, 15 uur) are also annotated as fixed expressions. The decision on what to label as fixed follows largely from the original annotation (i.e. phrases with category mwu whose parts are not labeled as proper names). Fixed elements can in fact be coordinated (maandag 18 t/m zaterdag 23 april 2005, where april 2005 is shared between the two conjuncts in the original annotation), and discontinuous fixed expressions exist (exclusively in the so-called wat-voor construction, as in wat is dit voor een kutfilm).

flat is used for the non-initial tokens of multi-word proper names (Kees van Kooten) and other multi-word expressions that contain at least one proper name. In particular, in dates like 20 augustus 2000, 20 is the head, with augustus and 2000 as flat dependents, since augustus is a name. Some titles of works of art are also labeled flat, if at least one of their tokens was labeled SPEC deeleigen in the original annotation. ISSUE: there is some inconsistency in when a multi-word unit introduces flat or fixed dependents, but this is caused at least in part by the underlying annotation.

iobj is used for indirect objects that are NOT introduced by a preposition. The original annotation has both prepositional (geef het boek aan haar) and nominal (geef haar het boek) obj2 constituents. In UD, only the latter are iobj, while the former are obl dependents.
mark is used for subordinating conjunctions (dat, omdat, wanneer, hoewel, etc.). The word om is also a mark if it introduces a te-infinitive. The word te preceding a verb is also a mark dependent of the verb. As auxiliaries take no dependents, the te that may precede an auxiliary is attached, somewhat counterintuitively, to the main verb (na door het moeras gedwaald te hebben, here te is a dependent of gedwaald)
nmod, nmod:poss nmod is used for nominal and prepositional phrases modifying a noun (een neiging tot dalen, de rente in de VS). In het Dow Jones gemiddelde, Dow is an nmod dependent of gemiddelde. Some nouns can also be used as adjectives, as in de afzijdige waarnemer, where afzijdige is a NOUN and thus an nmod dependent of waarnemer. In Enkele malen, the pronoun Enkele is a modifier of the noun in the original annotation, and is therefore also labeled nmod. nmod:poss is used for possessive pronouns (hun oude boeken) and genitives (Nootebooms debuut).
nsubj, nsubj:pass nsubj is used for the nominal subject of finite sentences. nsubj:pass is used for the subject of passives. Clausal subjects are labeled csubj.
nummod nummod is used for NUM elements occurring in prenominal position (tien arrestaties, 450.000 mark). In zeven miljard gulden, zeven is a nummod dependent of miljard, while miljard (a NOUN) is an nmod dependent of gulden.
obj is used for the direct object of verbal heads (winst boeken, een shock oplopen). Reflexives are labeled obj if the verb is not inherently reflexive (in zich emanciperen, zich is an obj).
obl, obl:agent obl is used for prepositional arguments and adjuncts of a verbal head (klopt met de werkelijkheid). (Temporal) nominal adjuncts can also appear without a preposition (enkele malen); these are labeled obl as well. obl:agent is used for the door-phrase that can be present in passives (hij moet door zijn vrouw tot kalmte worden gebracht). As the underlying annotation does not mark such prepositional phrases, this labeling is based on heuristics and may contain errors.
orphan is used in elliptic constructions where the syntactic head has been elided and more than one dependent remains. The leftmost dependent is attached to the preceding constituent, while the remaining dependents are attached as orphan to the initial dependent (In 850 fondsen boekten winst tegenover 512 een verlies, een verlies is an orphan dependent of 512 which itself is a conj dependent of boekten).
parataxis is used to label utterances that do not form a syntactic unit but consist of a number of phrases for which no obvious dependency label can be given (in dit in verband met de langere levensduur van de vrouw, dit is the root, and the rest of the phrase, headed by levensduur, is a parataxis dependent of dit). This differs from ellipsis: there, a preceding conjunct contains a predicate that can be seen as identical to the elided element, whereas parataxis constructions have no such conjunct. Parataxis is also used for attribution, as in Het deksel was er afgeslagen, zei Rijkers, where the speech verb zei is a parataxis dependent of afgeslagen.
punct is used for punctuation signs.
root is the root of the utterance. This is usually the main verb, but in copula constructions it is the head of the predicate.
xcomp is used for the head of non-finite verbal complements of verbs (de burgemeester wil een traditie handhaven, de debiteuren staan te dringen, hij vraagt om een krediet beschikbaar te stellen), and for predicative complements of non-copula verbs (Fennema werd raadslid, de aandeelhouders vonden het bod onaanvaardbaar). In the enhanced dependencies, the controlled subject is added for xcomp dependents that are non-finite clauses; for other predicative elements, no controlled subject is identified.
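Two consequences of the relation definitions above can be checked on any parsed sentence. First, auxiliaries (and the copula) take no dependents, so te and other material attach to the main verb or predicate. Second, an R-pronoun separated from its preposition yields a non-projective arc. The Python sketch below checks both, using the standard definition that an arc is projective when its head dominates every token between head and dependent. Both example analyses are hand-constructed for illustration and are not taken from the treebanks. GEDWAALD follows the aux, mark, case, det, and obl descriptions for the fragment door het moeras gedwaald te hebben (the leading na from the example above is left out). In VERSTANDIG, aan is a case dependent of er as described under case; the attachment of verstandig to doet is an assumption, and its label is left as "_". All names in the code are hypothetical.

```python
# Each token is (id, form, head, deprel); head 0 is the artificial root.
Token = tuple[int, str, int, str]

# Hand-constructed: "te" and the auxiliary "hebben" both attach to "gedwaald".
GEDWAALD: list[Token] = [
    (1, "door", 3, "case"),
    (2, "het", 3, "det"),
    (3, "moeras", 4, "obl"),
    (4, "gedwaald", 0, "root"),
    (5, "te", 4, "mark"),
    (6, "hebben", 4, "aux"),
]

# Hand-constructed: "aan" is a case dependent of the R-pronoun "er".
# Attaching "verstandig" to "doet" (label left open) is an assumption.
VERSTANDIG: list[Token] = [
    (1, "U", 2, "nsubj"),
    (2, "doet", 0, "root"),
    (3, "er", 2, "obl"),
    (4, "verstandig", 2, "_"),
    (5, "aan", 3, "case"),
]

AUX_RELS = {"aux", "aux:pass", "cop"}


def aux_with_dependents(tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (aux_id, dependent_id) pairs violating 'auxiliaries take no dependents'."""
    rel_of = {tid: rel for tid, _, _, rel in tokens}
    return [(head, tid) for tid, _, head, _ in tokens if rel_of.get(head) in AUX_RELS]


def dominates(heads: dict[int, int], anc: int, node: int) -> bool:
    """True if `anc` is a proper ancestor of `node` (0 is the root)."""
    for _ in range(len(heads) + 1):  # bound the walk in case of a malformed tree
        if node == 0:
            return False
        node = heads[node]
        if node == anc:
            return True
    return False


def nonprojective_arcs(tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (head, dependent) arcs that span a token the head does not dominate."""
    heads = {tid: head for tid, _, head, _ in tokens}
    arcs = []
    for dep, head in heads.items():
        lo, hi = sorted((dep, head))
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            arcs.append((head, dep))
    return sorted(arcs)


# Counter-example: "te" wrongly attached to the auxiliary "hebben".
TE_ON_AUX = GEDWAALD[:4] + [(5, "te", 6, "mark"), (6, "hebben", 4, "aux")]

print(aux_with_dependents(GEDWAALD))    # []
print(aux_with_dependents(TE_ON_AUX))   # [(6, 5)]
print(nonprojective_arcs(GEDWAALD))     # []
print(nonprojective_arcs(VERSTANDIG))   # [(3, 5)]
```

The arc from er to aan is reported because verstandig lies between the two but is not dominated by er. This is the kind of non-projectivity the description of case refers to.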

Treebanks

There are 2 Dutch UD treebanks: