UD for Luxembourgish
Tokenization and Word Segmentation
- Words are generally delimited by whitespace characters.
- According to typographical rules, many punctuation marks are attached to a neighboring word.
We usually tokenize them as separate tokens (words), with the following exceptions:
- The determiner d’ “the” is kept as one token with the apostrophe.
- Abbreviations, which are kept as one token with the period.
- Luxembourgish compounds are written as one word and we do not split them.
- There are two classes of multi-word tokens:
- The contractions of prepositions and definite articles. (Example: vum = vun + dem “from the”.)
- Univerbation of the infinitive marker ze with a main verb that carries a verb particle. (Example: opzehalen = ze + ophalen “to stop”.)
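The tokenization rules above can be sketched in Python. This is a minimal illustration, not the procedure used to build any treebank: the abbreviation list, the multi-word token table (which contains only the two examples given above), and the function names tokenize, split_chunk and to_rows are all hypothetical. The output rows follow the CoNLL-U convention of an ID range line for the surface token, followed by one line per syntactic word.

```python
import re

# Hypothetical, illustrative abbreviation list; a real tokenizer needs a fuller one.
ABBREVIATIONS = {"z.B.", "etc.", "Dr."}

# Surface form -> syntactic words. Only the two examples from the text above.
MULTIWORD_TOKENS = {
    "vum": ["vun", "dem"],
    "opzehalen": ["ze", "ophalen"],
}


def split_chunk(chunk):
    """Split one whitespace-delimited chunk into tokens."""
    # Exception: abbreviations keep their period.
    if chunk in ABBREVIATIONS:
        return [chunk]
    # Exception: the determiner d' stays one token together with its apostrophe.
    m = re.match(r"([dD][’'])(.*)", chunk)
    if m:
        rest = m.group(2)
        return [m.group(1)] + (split_chunk(rest) if rest else [])
    # Default: runs of word characters and single punctuation marks are separate tokens.
    return re.findall(r"\w+|[^\w\s]", chunk)


def tokenize(text):
    """Whitespace split first, then detach punctuation inside each chunk."""
    tokens = []
    for chunk in text.split():
        tokens.extend(split_chunk(chunk))
    return tokens


def to_rows(tokens):
    """Assign CoNLL-U style IDs, expanding multi-word tokens into range + word rows."""
    rows = []
    next_id = 1
    for tok in tokens:
        parts = MULTIWORD_TOKENS.get(tok.lower())
        if parts:
            last = next_id + len(parts) - 1
            rows.append((f"{next_id}-{last}", tok))  # surface token, e.g. "3-4 vum"
            for part in parts:
                rows.append((str(next_id), part))
                next_id += 1
        else:
            rows.append((str(next_id), tok))
            next_id += 1
    return rows


if __name__ == "__main__":
    for row in to_rows(tokenize("Hie kënnt vum Haus.")):
        print("\t".join(row))
```

Compounds need no special handling in this sketch because they are written as one word and are never split. An abbreviation directly followed by another punctuation mark (for example a comma) is not recognized here and would need extra handling.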
Morphology
Tags
- Luxembourgish uses all 17 universal POS categories, including particles (PART).
- Luxembourgish auxiliary verbs (AUX) are:
- sinn for perfect tenses of certain verbs (Si ass zu Paräis opgewuess. “She grew up in Paris”) and passive constructions (D’Haus ass gebaut. “The house is built”).
- hunn for perfect tenses of certain verbs (D’Meedchen huet e Bréif u säi Frënd geschriwwen. “The girl wrote a letter to her friend.”).
- goen for subjunctive constructions (Ech géing heem goen. “I would go home.”).
- ginn for subjunctive (Ech géif gär mat him schwätzen. “I would like to talk to her”) and passive constructions (Et gouf gëschter zougestallt. “It was delivered yesterday”).
- kréien for passive constructions (Ech kréie gehollef. “I am being helped”).
- wäerten for future constructions (Dat wäert sech änneren. “That will change.”).
- modal verbs kënnen “can”, mussen “must”, sollen “shall”, däerfen “may”, wollen “want”.
- The verbs sinn, hunn, goen, ginn, and kréien can also occur as normal verbs (VERB), meaning “be”, “have”, “go”, “give”, and “get”, respectively.
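The AUX/VERB split above can be summarized as a small lookup, which may be useful when validating annotations. This is only a sketch: the helper name possible_upos is hypothetical, and treating every unlisted lemma as VERB assumes the input is already known to be a verbal lemma. Choosing between AUX and VERB for an ambiguous lemma still depends on the construction and is not attempted here.

```python
# Lemmas listed above as auxiliaries (AUX), including the modal verbs.
AUX_LEMMAS = {
    "sinn", "hunn", "goen", "ginn", "kréien", "wäerten",
    "kënnen", "mussen", "sollen", "däerfen", "wollen",
}

# Lemmas the text says can also occur as normal verbs (VERB).
ALSO_VERB = {"sinn", "hunn", "goen", "ginn", "kréien"}


def possible_upos(lemma):
    """Return the UPOS tags a verbal lemma may receive (hypothetical helper)."""
    if lemma in ALSO_VERB:
        return {"AUX", "VERB"}
    if lemma in AUX_LEMMAS:
        return {"AUX"}
    # Assumption: the caller only passes verbal lemmas, so anything else is VERB.
    return {"VERB"}
```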
Features
- For s clitics in subordinate clauses, the feature clitic will be used (currently in the MISC column).
- For negation particles, the feature negation will be used (currently in the MISC column).
Syntax
- The copula verb sinn “be” is used in equational, attributive, locative, possessive and benefactive nonverbal clauses.
- The following relation subtypes are used in Luxembourgish:
Treebanks
There is 1 Luxembourgish UD treebank: