This page pertains to UD version 2.

UD for Welsh

Tokenization and Word Segmentation

Morphology

Tags


Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.


Features

Syntax

Inflected prepositions

Originally, inflected prepositions (which inflect for Gender, Person and Number) were annotated as single words. If followed by the corresponding pronoun, they were attached as case to that pronoun; otherwise they were attached as nmod or obl to their nominal or verbal head, as if they were case-marked pronouns.

gwrando arni (listening to her)

1  gwrando   gwrando   VERB   ...
2  arni      ar        ADP    ...  1  obl

gwrando arni hi (listening to her)

1  gwrando   gwrando   VERB   ...
2  arni      ar        ADP    ...  3  case
3  hi        hi        PRON   ...  1  obl
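
The original attachment rule can be summarised as a small decision function. This is only a sketch: the function name and parameters are hypothetical, and the set of nominal tags is a simplifying assumption (nmod for nominal heads, obl for everything else).

```python
# Hypothetical helper illustrating the pre-2.7 attachment rule described above.
# Assumption: heads tagged NOUN, PROPN or PRON count as nominal; all other
# heads (e.g. VERB) are treated as taking obl.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}


def pre_2_7_deprel(followed_by_matching_pronoun: bool, head_upos: str) -> str:
    """Return the relation an inflected preposition received before version 2.7."""
    if followed_by_matching_pronoun:
        # "gwrando arni hi": arni is attached as case to the pronoun hi.
        return "case"
    # "gwrando arni": arni itself behaves like a case-marked pronoun.
    return "nmod" if head_upos in NOMINAL_UPOS else "obl"


print(pre_2_7_deprel(False, "VERB"))
print(pre_2_7_deprel(True, "VERB"))
```

The two calls correspond to the two examples above: arni alone under a verbal head, and arni followed by hi.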

From version 2.7, inflected prepositions are annotated as multiword tokens: the preposition is attached to the following pronoun as case. An additional (reduplicated) pronoun is attached to that pronoun as compound:redup.

gwrando arni (listening to her)

1    gwrando  gwrando   VERB   ...
2-3  arni
2    ar       ar        ADP    ...  3  case
3    hi       hi        PRON   ...  1  obl

gwrando arni hi (listening to her)

1    gwrando  gwrando   VERB   ...
2-3  arni
2    ar       ar        ADP    ...  3  case
3    hi       hi        PRON   ...  1  obl
4    hi       hi        PRON   ...  3  compound:redup
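
To illustrate how the multiword-token annotation is read by tools, here is a minimal sketch in Python that parses a CoNLL-U fragment and recovers the surface tokens. The function names are hypothetical, and the columns elided in the example above are filled with the CoNLL-U placeholder `_` rather than real values.

```python
# Ten tab-separated CoNLL-U columns per row. Columns not shown in the
# documentation example are left as "_" (unspecified); this is an assumption
# made only so the fragment is well-formed.
SAMPLE = (
    "1\tgwrando\tgwrando\tVERB\t_\t_\t_\t_\t_\t_\n"
    "2-3\tarni\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tar\tar\tADP\t_\t_\t3\tcase\t_\t_\n"
    "3\thi\thi\tPRON\t_\t_\t1\tobl\t_\t_\n"
    "4\thi\thi\tPRON\t_\t_\t3\tcompound:redup\t_\t_\n"
)


def parse_sentence(text):
    """Split one sentence into syntactic words and multiword-token ranges."""
    words, ranges = {}, []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "." in cols[0]:
            continue  # empty nodes (e.g. 1.1) are ignored in this sketch
        if "-" in cols[0]:
            start, end = (int(part) for part in cols[0].split("-"))
            ranges.append((start, end, cols[1]))
            continue
        words[int(cols[0])] = {
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        }
    return words, ranges


def surface_tokens(words, ranges):
    """Return the surface tokens, emitting each multiword token once."""
    covered = {}
    for start, end, form in ranges:
        for word_id in range(start, end + 1):
            covered[word_id] = (start, form)
    tokens = []
    for word_id in sorted(words):
        if word_id in covered:
            start, form = covered[word_id]
            if word_id == start:
                tokens.append(form)  # emit the range form at its first word
        else:
            tokens.append(words[word_id]["form"])
    return tokens


words, ranges = parse_sentence(SAMPLE)
print(surface_tokens(words, ranges))
print(words[2]["deprel"], words[4]["deprel"])
```

The surface form arni appears once in the token sequence, while the syntactic words ar and hi carry the dependency relations.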

Treebanks

There is one Welsh UD treebank:


Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.