UD for Afrikaans
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation. Exceptions:
- Numerical expressions (including dates) are treated as single words and may contain punctuation or whitespace: 1-1-1970, 11:00, 2 000. This is not yet fully implemented.
- Abbreviations are treated as single words and may contain punctuation or whitespace: art.
- Contractions like dis are considered to be one token. However, they don’t occur in this treebank.
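The segmentation rules above can be illustrated with a small regular-expression tokenizer. This is only a minimal sketch, not the tool used to build the treebank. The names `ABBREVIATIONS`, `TOKEN_RE` and `tokenize` are hypothetical. The abbreviation list holds only the example given above, and the number patterns cover only the three formats shown (1-1-1970, 11:00, 2 000).

```python
import re

# Hypothetical abbreviation list; the guidelines only give "art." as an example.
ABBREVIATIONS = {"art."}

TOKEN_RE = re.compile(
    r"""
    \d{1,2}-\d{1,2}-\d{4}     # dates such as 1-1-1970
  | \d{1,2}:\d{2}             # times such as 11:00
  | \d{1,3}(?:\ \d{3})+       # space-separated thousands such as 2 000
  | \d+(?:[.,]\d+)?           # any other number
  | \w+\.?                    # a word, optionally followed by a full stop
  | [^\w\s]                   # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens, keeping numbers and known abbreviations whole."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        # Detach a trailing full stop unless the token is a known abbreviation.
        if len(tok) > 1 and tok.endswith(".") and tok.lower() not in ABBREVIATIONS:
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    # Made-up example sentence, not taken from the treebank.
    print(tokenize("Art. 5 begin om 11:00 op 1-1-1970 met 2 000 mense."))
```

On the made-up sentence in the example, the date, the time, the number with a space, and the abbreviation each stay a single token, while the sentence-final full stop is split off.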
Morphology
Tags
- Afrikaans uses all 17 universal POS categories. Currently interjections are not covered by the treebank, as it contains legal texts from government websites.
- The tag PART (particle) is used for te introducing an infinitive, for the negation particle nie, and for the genitival particle se.
- The tag DET is used for articles, demonstrative and indefinite pronouns. Other pronouns get the tag PRON.
- Auxiliaries (AUX) are all verbal in Afrikaans and can be grouped into four types:
- The copula wees.
- The temporal auxiliary het (have), which combines with the past participle of the main verb to form perfect tenses.
- The passive auxiliaries word (present) and wees (past), which combine with the past participle of the main verb to form passives.
- The modal verbs sal, wil, mag, durf, kan, moet, moenie, behoort, hoef.
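The four auxiliary types listed above can be written down as a lookup table, for example to check whether a lemma tagged AUX belongs to one of them. This is a minimal sketch. The names `AUX_TYPES` and `possible_aux_types` are hypothetical, and the type labels are descriptive strings rather than the treebank's actual VerbType values.

```python
# Auxiliary lemmas grouped by type, copied from the list above.
# The keys are descriptive labels, not official VerbType values.
AUX_TYPES = {
    "copula": {"wees"},
    "temporal": {"het"},
    "passive": {"word", "wees"},
    "modal": {"sal", "wil", "mag", "durf", "kan", "moet", "moenie", "behoort", "hoef"},
}


def possible_aux_types(lemma):
    """Return the sorted list of auxiliary types a lemma can belong to."""
    return sorted(name for name, lemmas in AUX_TYPES.items() if lemma in lemmas)


if __name__ == "__main__":
    print(possible_aux_types("wees"))  # ['copula', 'passive']
    print(possible_aux_types("kan"))   # ['modal']
```

Note that wees appears under two types, so the lemma alone does not determine the type; the syntactic context has to decide between copula and passive auxiliary.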
Features
- Nouns (NOUN) and proper names (PROPN) inflect for Number and Degree (diminutives).
- Most adjectives (ADJ) have different forms in attributive and predicative positions, which is indicated by the AdjType feature. In addition, many adjectives inflect for Degree (positive, comparative, superlative) and, rarely, for Case (genitive, when the adjective is used independently).
- Adverbs inflect only for Degree.
- Verbs can be finite or non-finite (reflected in VerbForm). Finite forms inflect for Tense. For auxiliaries, the type (copula, modal, temporal/passive) is reflected in VerbType. The transitivity of a verb is indicated by Subcat.
- The type of determiner and pronoun is reflected in PronType. Articles are definite or indefinite (Definite). Personal and reflexive pronouns are inflected for Number, Case (nominative or accusative/oblique) and Person; possessive pronouns for Number and Person.
- The type of particle is reflected in PartType.
- The type of adposition is reflected in AdpType.
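The feature inventory above can be summarized as a table from part-of-speech tag to expected feature names and used for a rough consistency check on the FEATS column of a CoNLL-U file. This is a minimal sketch. `EXPECTED_FEATURES`, `parse_feats` and `unexpected_features` are hypothetical names, and the table is one reading of the description above (for example, Subcat is assumed for VERB only and VerbType for AUX only), not an official validation rule.

```python
# Feature names expected per UPOS tag, as one reading of the list above.
EXPECTED_FEATURES = {
    "NOUN": {"Number", "Degree"},
    "PROPN": {"Number", "Degree"},
    "ADJ": {"AdjType", "Degree", "Case"},
    "ADV": {"Degree"},
    "VERB": {"VerbForm", "Tense", "Subcat"},
    "AUX": {"VerbForm", "Tense", "VerbType"},
    "DET": {"PronType", "Definite"},
    "PRON": {"PronType", "Number", "Case", "Person"},
    "PART": {"PartType"},
    "ADP": {"AdpType"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Number=Sing|Degree=Pos'."""
    if feats == "_":  # CoNLL-U uses an underscore for an empty column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unexpected_features(upos, feats):
    """Return the feature names in feats that are not listed for the tag."""
    allowed = EXPECTED_FEATURES.get(upos, set())
    return sorted(name for name in parse_feats(feats) if name not in allowed)


if __name__ == "__main__":
    print(unexpected_features("NOUN", "Degree=Dim|Number=Sing"))  # []
    print(unexpected_features("ADV", "Degree=Pos|Tense=Past"))    # ['Tense']
```

A tag that is missing from the table is treated as having no expected features, so every feature on it is reported; this makes gaps in the table visible rather than silently accepted.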
Syntax
- Subjects have the following characteristics:
- Word order: Subjects immediately follow the finite verb and precede negation in verb-initial main clauses.
- Case marking: Subjects occur in nominative case without adpositions.
- Passivization: Subjects are suppressed when verbs (both intransitive and transitive) are passivized.
- Control: Subjects control the subjects of absolute adverbials.
- Relativization: Relative pronouns with subject function cannot be omitted.
- Objects have the following characteristics:
- Word order: Objects immediately follow the main verb unless topicalized.
- Case marking: Objects occur in nominative case (if nouns) or accusative case (if pronouns) without adpositions.
- Passivization: Objects become (non-expletive) subjects when verbs are passivized.
- The copula verb wees (be) is used in equational, attributional, locative, possessive, existential and benefactory nonverbal clauses.
Relations Overview
- The following relation subtypes can be used in Afrikaans:
- nsubj:pass for nominal subjects of passive verbs
- csubj:pass for clausal subjects of passive verbs (currently not present)
- aux:pass for passive auxiliaries
- compound:prt for verb particles
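The subtype list above can be turned into a simple check on the DEPREL column. This is a minimal sketch under the assumption that the four subtypes listed are the only ones in use. `AFRIKAANS_SUBTYPES` and `is_allowed_deprel` are hypothetical names, and plain universal relations (without a colon) are accepted without further validation.

```python
# Relation subtypes listed above for Afrikaans.
AFRIKAANS_SUBTYPES = {"nsubj:pass", "csubj:pass", "aux:pass", "compound:prt"}


def is_allowed_deprel(deprel):
    """Accept any unsubtyped relation and only the listed subtyped ones."""
    if ":" not in deprel:
        return True  # universal relations are not validated in this sketch
    return deprel in AFRIKAANS_SUBTYPES


if __name__ == "__main__":
    print(is_allowed_deprel("nsubj:pass"))  # True
    print(is_allowed_deprel("obl:agent"))   # False
    print(is_allowed_deprel("nsubj"))       # True
```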
Treebanks
There is one Afrikaans UD treebank: