
This page pertains to UD version 2.


Specification of v1-to-v2 conversion and validation

| Phenomenon | Conversion | Validation |
|---|---|---|
| Word segmentation | – | Make list of word forms and lemmas containing spaces |
| PoS tags | CONJ → CCONJ | CONJ → CCONJ |
| | – | Make list of lemmas of auxiliaries (`AUX`) |
| | – | Make list of lemmas of approved particles (`PART`) |
| Features | Rename a number of features/values | Revise inventory according to v2 |
| | – | Some v1 language-specific features are now disallowed because the new universal features should be used instead |
| Core arguments | dobj → obj | dobj → obj |
| | nsubjpass → nsubj:pass | nsubjpass → nsubj:pass |
| | csubjpass → csubj:pass | csubjpass → csubj:pass |
| Nominal modifiers | nmod with non-nominal head → obl | – |
| | Flag other nmod for manual inspection | – |
| Copula | – | Make list of lemmas with cop relation |
| Functional relations | auxpass → aux:pass | auxpass → aux:pass |

Validation: The set of lists keeps growing, so we should no longer require every list to be present for every language, even if empty (as has been the practice so far). We should also set up a validation procedure for the lists themselves: sometimes people bypass validation by adding to a list of allowed labels an item that should never appear there.
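A minimal sketch of these list rules, assuming each list is a plain UTF-8 file with one item per line (the file layout and the helper names `load_list`, `invalid_labels` and `unlisted_aux` are hypothetical). A missing file is treated as an empty list rather than an error. An allowed-label list is itself checked: every label must be a universal v2 relation, optionally followed by `:subtype`, so an entry such as `dobj` is rejected whichever language's list it was added to.

```python
from pathlib import Path

# The 37 universal dependency relations of UD v2.
UNIVERSAL_DEPRELS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}


def load_list(path):
    """Read a one-item-per-line list (hypothetical layout).

    A missing file is not an error: it simply means an empty list.
    Blank lines and lines starting with '#' are ignored.
    """
    p = Path(path)
    if not p.exists():
        return set()
    items = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        item = line.strip()
        if item and not item.startswith("#"):
            items.add(item)
    return items


def invalid_labels(labels):
    """Labels whose part before ':' is not a universal v2 relation."""
    return sorted(l for l in labels if l.partition(":")[0] not in UNIVERSAL_DEPRELS)


def unlisted_aux(rows, aux_lemmas):
    """IDs of AUX tokens whose lemma is missing from the approved list."""
    return [r[0] for r in rows if r[3] == "AUX" and r[2] not in aux_lemmas]
```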

Validation: Selected tests that have so far been part of the content validation should become part of the format validation. They would be optional when the validator is invoked from the command line, but the online overview would turn them on by default. Treebanks that contain, for example, right-to-left `fixed` relations would then simply be invalid and highlighted in red.
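The split between command-line and online behavior could look roughly like this. It is a sketch, not the actual validator's interface: the `--content-tests` flag and all function names are hypothetical. A `fixed` dependent counts as right-to-left when it precedes its head, i.e. its HEAD value is larger than its own ID.

```python
import argparse


def right_to_left_fixed(rows):
    """IDs of `fixed` dependents that precede their head."""
    bad = []
    for r in rows:
        if len(r) != 10 or not r[0].isdigit():
            continue  # skip malformed rows, multiword-token ranges, empty nodes
        if r[7].partition(":")[0] == "fixed" and int(r[6]) > int(r[0]):
            bad.append(r[0])
    return bad


def validate_sentence(rows, content_tests=False):
    """Format checks always run; selected content tests only when enabled."""
    errors = [f"expected 10 columns, got {len(r)}" for r in rows if len(r) != 10]
    if content_tests:
        errors += [f"right-to-left fixed relation at word {i}"
                   for i in right_to_left_fixed(rows)]
    return errors


def build_arg_parser():
    # Hypothetical CLI: content tests are off unless explicitly requested;
    # an online overview would simply call validate_sentence(..., True).
    parser = argparse.ArgumentParser(description="Sketch of a CoNLL-U validator front end")
    parser.add_argument("--content-tests", action="store_true",
                        help="also run selected content tests")
    return parser
```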