UD for Welsh
Tokenization and Word Segmentation
- In general, Welsh is tokenized like English. Upper case is used for the first word of a sentence, proper names, months and weekdays.
- A notable difference is the apostrophe, which is part of the following word. Examples:
- Mae o’n dod («He is coming»): Mae, o, ’n, dod
- i’m weld («to see me»): i, ’m, weld
- â’u ffrindiau («with their friends»): â, ’u, ffrindiau
- other shortened forms: ’w, ’i, ’th
- In some cases the apostrophe goes with the preceding word:
- … yw f’enw i («… is my name»): yw, f’, enw, i
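The apostrophe rule above can be sketched in a few lines of Python. This is only an illustration, not the tokenizer used for the treebank: the names `LEADING`, `TRAILING`, `split_apostrophe` and `tokenize` are hypothetical, the clitic lists contain only the forms mentioned above, and punctuation is not handled.

```python
import re

# Shortened forms that take the apostrophe with them (o'n -> o, 'n).
# Taken from the examples above; not an exhaustive list.
LEADING = {"n", "m", "u", "w", "i", "th"}
# Elided forms that keep the apostrophe at the end (f'enw -> f', enw).
# Assumption: only "f" is listed because it is the only case shown above.
TRAILING = {"f"}

APOSTROPHES = "'’"  # ASCII and typographic apostrophe

def split_apostrophe(word):
    """Split one whitespace-delimited word at its apostrophe, if it has one."""
    m = re.fullmatch(rf"(.+?)([{APOSTROPHES}])(.+)", word)
    if not m:
        return [word]
    left, apo, right = m.groups()
    if left.lower() in TRAILING:
        return [left + apo, right]   # apostrophe stays with the preceding word
    if right.lower() in LEADING:
        return [left, apo + right]   # apostrophe goes with the following word
    return [word]                    # unknown pattern: leave untouched

def tokenize(text):
    """Whitespace tokenization followed by the apostrophe split."""
    tokens = []
    for word in text.split():
        tokens.extend(split_apostrophe(word))
    return tokens

print(tokenize("Mae o’n dod"))
print(tokenize("yw f’enw i"))
```

Words with an apostrophe that match neither list are deliberately left unsplit, so that unknown cases can be reviewed by hand rather than guessed.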
Morphology
Tags
- PROPN: the XPOS distinguishes between person, place and organisation (and propn for all other types)
- DET is used for the definite article (XPOS: art)
- PRON: six subclasses (XPOS):
- dem: demonstrative pronouns (hwn, hon, etc.)
- refl: reflexive pronouns (hun, hunan, etc.)
- rel: relative pronoun (a)
- indep: independent pronouns (used in subject position)
- dep: dependent pronouns (used in object position and as possessives, e.g. fy nhŷ «my house», fy ngweld «[to] see me»)
- pron: interrogatives and others (beth, neb, pwy, rhai, sawl)
- AUX is used in three cases:
- for the auxiliary verb bod, if inflected and in copula position
- for TAM markers: yn (XPOS: impf), wedi (ante), newydd (ante), heb (ante), hen (ante), ar (post), am (post); these may be changed to PART in a future version
- for preverbals (y, a, mi, fe); these may also be changed to PART in a future version
- VERB is used for all finite verbs, including bod if it is the main verb (followed by a verbnoun). Verbnouns, however, are marked as NOUN (with XPOS verbnoun) since they function syntactically as nouns (the direct object is in a genitive construction, the subject is marked with a preposition). See the conjugation tables for forms and the corresponding UD Tense/Mood values.
- ADP: inflected prepositions are marked with the XPOS iprep; other prepositions have the XPOS prep.
- PART is only used for the predicative marker yn, which triggers soft mutation on the following word (unlike the TAM marker yn, which does not trigger any mutation, and the preposition yn, which triggers nasal mutation). The predicative yn is used before nouns and adjectives in head position: Mae Siôn yn athro «Siôn is a teacher», Roedd Nia yn gyflym «Nia was fast».
- The ADV class also contains dyma, dyna and dacw, even though they can function as a copula.
Features
- Additional features exist to indicate the initial mutation:
- Mutation with values AM, NM or SM for aspirated, nasal or soft mutation
- Impersonal forms of verbs use Person=0: cyhoeddwyd y llyfr y llynedd «One has published the book last year» (cf. French «on a publié le livre l’an dernier» or German «man hat das Buch letztes Jahr veröffentlicht»). Impersonal forms are usually translated by passive forms in English, French or German.
- Tense with values Fut, Imp, Pres, Past, Pqp
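As a small illustration of how these features can be read back from the FEATS column of a CoNLL-U file, the following sketch parses a FEATS string and reports the mutation type and the impersonal marking. The helper names `parse_feats` and `describe` are hypothetical, and the example strings are constructed for illustration rather than taken from the treebank.

```python
# Mutation values as listed above.
MUTATION_NAMES = {"AM": "aspirated", "NM": "nasal", "SM": "soft"}

def parse_feats(feats):
    """Turn a FEATS string such as 'Mutation=SM|Number=Sing' into a dict."""
    if feats in ("", "_"):          # '_' marks an empty FEATS column
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))

def describe(feats):
    """List the Welsh-specific properties encoded in a FEATS string."""
    parsed = parse_feats(feats)
    notes = []
    if "Mutation" in parsed:
        kind = MUTATION_NAMES.get(parsed["Mutation"], "unknown")
        notes.append(kind + " mutation")
    if parsed.get("Person") == "0":
        notes.append("impersonal verb form")
    return notes

print(describe("Mutation=SM|Number=Sing"))
print(describe("Person=0|Tense=Past"))
```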
Syntax
- Verbnouns function as nouns in Welsh: the direct object is in a genitive construction (like possessives for other nouns), and subjects (unless linked indirectly via an xcomp relation) are attached using a prepositional phrase. However, we currently still use nsubj, obj, obl, csubj, ccomp and xcomp for dependents of verbnouns, rather than nmod etc. as for nouns.
- Welsh-specific dependency relation:
- case:pred, used only to attach the predicative yn (PART) to its head noun or adjective
- Other relations with subtypes:
- acl:relcl
- flat:name
- obl:agent (gan attached agents for impersonal verb forms)
- nmod:agent (gan attached agents for verbnouns in cael passives)
- The following multi-word expressions use the fixed dependency relation
- o hyd «always»
- ar hyd «along»
- ar draws «across»
- hyd at «as far as»
- hyd yn oed «even»
- dim ond «only»
- i fyny «up»
- i mewn «into»
- o fewn «within»
- o dan «under»
- ynglŷn â «in connection with»
- yn hytrach «rather»
- yn erbyn «against»
- modd bynnag «however»
- oddi ar «since»
- oddi yma «from here»
- oddi wrth «from»
- oddi mewn «within»
- er gwaethaf «in spite of»
- er mai «although»
- er drwg «despite»
- wrth gwrs «of course»
- wrth i «to»
- yn hytrach «rather than»
- ynglŷn â «regarding»
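In UD, the first word of a fixed expression is the head and every later word attaches to it with the fixed relation. A minimal sketch of how candidate spans from the list above could be located in a tokenized sentence follows. The function name `find_fixed` and the four-entry `FIXED` list are illustrative assumptions; pure string matching will over-generate, since a sequence such as o hyd is not necessarily a fixed expression in every context.

```python
# A few entries from the list above; extend as needed.
FIXED = [("hyd", "yn", "oed"), ("o", "hyd"), ("dim", "ond"), ("wrth", "gwrs")]

def find_fixed(tokens):
    """Return (head_id, dependent_ids) pairs using 1-based CoNLL-U token IDs.

    The first word of each match is the head; the remaining words would be
    attached to it with the 'fixed' relation.
    """
    lowered = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(lowered):
        # Try longer expressions first so 'hyd yn oed' wins over shorter ones.
        for expr in sorted(FIXED, key=len, reverse=True):
            if tuple(lowered[i:i + len(expr)]) == expr:
                head_id = i + 1
                dependents = list(range(i + 2, i + len(expr) + 1))
                found.append((head_id, dependents))
                i += len(expr) - 1   # skip the words already consumed
                break
        i += 1
    return found

print(find_fixed(["hyd", "yn", "oed", "wrth", "gwrs"]))
```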
Inflected prepositions
Originally, inflected prepositions (Gender, Person, Number) were annotated as single words. If followed by the corresponding pronoun, they were attached as case to the pronoun; otherwise they were attached as nmod or obl to their nominal or verbal head, as if they were case-marked pronouns.
gwrando arni (listening to her)
1 gwrando gwrando VERB ...
2 arni ar ADP ... 1 obl
gwrando arni hi (listening to her)
1 gwrando gwrando VERB ...
2 arni ar ADP ... 3 case
3 hi hi PRON ... 1 obl
From version 2.7, inflected prepositions are annotated as multiword tokens: the preposition is attached to the following pronoun with case. An additional pronoun is attached with compound:redup.
gwrando arni
1 gwrando gwrando VERB ...
2-3 arni
2 ar ar ADP ... 3 case
3 hi hi PRON ... 1 obl
gwrando arni hi
1 gwrando gwrando VERB ...
2-3 arni
2 ar ar ADP ... 3 case
3 hi hi PRON ... 1 obl
4 hi hi PRON ... 3 compound:redup
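The multiword-token scheme can be sketched as a small expansion routine. Everything here is illustrative: the `INFLECTED` table contains only the form arni from the example above, the helper `expand` is hypothetical, the rows are simplified to five columns (ID, FORM, UPOS, HEAD, DEPREL) rather than the full ten-column CoNLL-U layout, and every pronoun is simply attached to a given head as obl.

```python
# Surface form -> (preposition, pronoun). Only the form from the example
# above is listed; a real table would cover all inflected prepositions.
INFLECTED = {"arni": ("ar", "hi")}

def expand(tokens, head=1):
    """Build simplified rows (ID, FORM, UPOS, HEAD, DEPREL) in the 2.7+ style."""
    rows = []
    next_id = 1
    i = 0
    while i < len(tokens):
        form = tokens[i]
        if form in INFLECTED:
            prep, pron = INFLECTED[form]
            prep_id, pron_id = next_id, next_id + 1
            # Range line for the multiword token, then its two syntactic words.
            rows.append((f"{prep_id}-{pron_id}", form, "_", "_", "_"))
            rows.append((str(prep_id), prep, "ADP", str(pron_id), "case"))
            rows.append((str(pron_id), pron, "PRON", str(head), "obl"))
            next_id += 2
            # An overt copy of the pronoun is attached with compound:redup.
            if i + 1 < len(tokens) and tokens[i + 1] == pron:
                rows.append((str(next_id), pron, "PRON", str(pron_id), "compound:redup"))
                next_id += 1
                i += 1
        else:
            # Other tokens are passed through without analysis.
            rows.append((str(next_id), form, "_", "_", "_"))
            next_id += 1
        i += 1
    return rows

for row in expand(["gwrando", "arni", "hi"]):
    print("\t".join(row))
```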
Treebanks
There is one Welsh UD treebank: