UD Haitian Creole Autogramm
Language: Haitian Creole (code: ht)
Family: Creole
This treebank has been part of Universal Dependencies since the UD v2.13 release.
The following people have contributed to making this treebank part of UD: Claudel Pierre-Louis, Sandra Jagodzińska, Sylvain Kahane, Agata Savary, Emmanuel Schang.
Repository: UD_Haitian_Creole-Autogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.14
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Haitian Creole-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [sylvain (æt) kahane • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
This is a treebank of Haitian Creole. It contains 144 sentences selected from three major genres: the Bible, literary texts, and newspapers.
Kreyòl (Kreyòl Ayisyen, Haitian Creole, ISO 639-1: ht) is the main language of Haiti. The dialect described here is the Cap-Haïtien dialect, whose lexicon differs slightly from that of the Central and Southern varieties.
This treebank contains a selection of sentences from the following sources:
- the Bible in Haitian Creole
- excerpts from a novel: Roy (2021) “Lanmou titato”
- newspaper texts from “VOA kreyol” and “PAPDA”
The corpus contains 144 sentences and 3418 tokens. The annotation was done in ArboratorGrew in the SUD format and then automatically converted to the UD format.
Acknowledgments
This treebank is the outcome of a Master's internship project by Sandra Jagodzińska (LACITO, CNRS, France) and Claudel Pierre-Louis (LISN, Université Paris-Saclay, CNRS, France). It was funded by:
- an AIP project at the LISN laboratory and Paris-Saclay University
- the French ANR AUTOGRAMM project (ANR-21-CE38-0017)
References
- Sandra Jagodzińska, Claudel Pierre-Louis, Sylvain Kahane, Agata Savary, Emmanuel Schang (submitted). Le premier corpus arboré en créole haïtien. In Journée d’études “Le créole haïtien : histoire, évolution, grammaire et lexique”, Université d’Etat d’Haïti.
Statistics of UD Haitian Creole Autogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Aspect – Definite – ExtPos – Mood – Number – NumType – Person – Polarity – Poss – PronType – Tense – Typo
Relations
acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – ccomp – compound:svc – conj – cop – dep – det – discourse – dislocated – fixed – flat:name – goeswith – iobj – mark – nmod – nsubj – nummod – obj – obl – obl:arg – obl:mod – parataxis – parataxis:insert – punct – reparandum – root – vocative
Tokenization and Word Segmentation
- This corpus contains 144 sentences and 3279 tokens.
- This corpus contains 279 tokens (9%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 9 types of words that contain both letters and punctuation. Examples: Ing-wen, Jean-Dickens, Ayiti', Chia-lung, Jean-Pierre, kè-sote, sere-sere, tèt-chaje, wo-nivo
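Figures like the ones above can be recomputed by scanning the CoNLL-U files. The sketch below is a minimal illustration, assuming plain CoNLL-U input (10 tab-separated columns, comment lines starting with `#`, a blank line after each sentence). The helper names and the two-sentence toy sample are hypothetical and not taken from the treebank. It counts syntactic words (integer IDs), which coincide with tokens only when a file has no multiword tokens.

```python
import unicodedata

# Hypothetical two-sentence toy sample, NOT taken from the treebank.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
TOY_ROWS = [
    [
        ["1", "Mwen", "mwen", "PRON", "_", "Number=Sing|Person=1|PronType=Prs", "3", "nsubj", "_", "_"],
        ["2", "te", "te", "AUX", "_", "Tense=Past", "3", "aux", "_", "_"],
        ["3", "wè", "wè", "VERB", "_", "_", "0", "root", "_", "_"],
        ["4", "li", "li", "PRON", "_", "Number=Sing|Person=3|PronType=Prs", "3", "obj", "_", "SpaceAfter=No"],
        ["5", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
    ],
    [
        ["1", "Jean-Pierre", "Jean-Pierre", "PROPN", "_", "_", "3", "nsubj", "_", "_"],
        ["2", "te", "te", "AUX", "_", "Tense=Past", "3", "aux", "_", "_"],
        ["3", "vini", "vini", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"],
        ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
    ],
]
# Serialize the toy rows as CoNLL-U text: tabs between columns, blank line after each sentence.
TOY_CONLLU = "".join(
    "\n".join("\t".join(row) for row in sentence) + "\n\n" for sentence in TOY_ROWS
)


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of 10-column rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # skip '# sent_id', '# text', ...
            current.append(line.split("\t"))
    if current:  # tolerate a missing final blank line
        sentences.append(current)
    return sentences


def mixes_letters_and_punct(form):
    """True if the form has at least one letter and one Unicode punctuation character."""
    has_letter = any(ch.isalpha() for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct


def tokenization_stats(text):
    sentences = read_conllu(text)
    # Keep plain integer IDs only: skips multiword ranges ("1-2") and empty nodes ("1.1").
    words = [row for sent in sentences for row in sent if row[0].isdigit()]
    no_space = sum("SpaceAfter=No" in row[9].split("|") for row in words)
    mixed = sorted({row[1] for row in words if mixes_letters_and_punct(row[1])})
    return {
        "sentences": len(sentences),
        "words": len(words),
        "no_space_after": no_space,
        "percent_no_space": round(100 * no_space / len(words)) if words else 0,
        "letter_punct_types": mixed,
    }


print(tokenization_stats(TOY_CONLLU))
```

On the toy sample this reports 2 sentences, 9 words, 2 words without a following space (22%), and one mixed letter/punctuation type, `Jean-Pierre`.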
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus does not use the following tags: PART
- This corpus contains 10 lemmas tagged as pronouns (PRON): anyen, ke, ki, kwa, li, mwen, nou, ou, sa, yo
- This corpus contains 14 lemmas tagged as determiners (DET): chak, ki, kèk, la, lòt, nempòt, non, oken, okenn, plizyè, sa, tout, yo, yon
- Out of the above, 3 lemmas occurred sometimes as PRON and sometimes as DET: ki, sa, yo
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): ap, dwe, ka, pral, se, ta, te
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: ka
- This corpus does not use the VerbForm feature.
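The "sometimes as X and sometimes as Y" lines above amount to intersecting the lemma sets of two UPOS tags. A minimal sketch, assuming (LEMMA, UPOS) pairs have already been read from columns 3 and 4 of the CoNLL-U files; the toy pairs and function names are hypothetical, not counts from the treebank. The small `count_phrase` helper also shows the number agreement ("1 lemma" vs "2 lemmas") that such generated summaries need.

```python
from collections import defaultdict

# Hypothetical (LEMMA, UPOS) pairs, as they would be read from columns 3 and 4
# of a CoNLL-U file; illustrative only, NOT counts from the treebank.
TOY_PAIRS = [
    ("sa", "PRON"), ("sa", "DET"), ("yo", "PRON"), ("yo", "DET"),
    ("la", "DET"), ("li", "PRON"), ("ka", "AUX"), ("ka", "VERB"),
    ("te", "AUX"), ("wè", "VERB"),
]


def lemmas_by_upos(pairs):
    """Map each UPOS tag to the set of distinct lemmas that received it."""
    table = defaultdict(set)
    for lemma, upos in pairs:
        table[upos].add(lemma)
    return table


def shared_lemmas(table, tag_a, tag_b):
    """Lemmas tagged sometimes as tag_a and sometimes as tag_b (sorted)."""
    return sorted(table.get(tag_a, set()) & table.get(tag_b, set()))


def count_phrase(n, noun="lemma"):
    """English count phrase with number agreement: '1 lemma', '3 lemmas'."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


table = lemmas_by_upos(TOY_PAIRS)
for a, b in [("PRON", "DET"), ("AUX", "VERB")]:
    shared = shared_lemmas(table, a, b)
    print(f"{count_phrase(len(shared))} occurred sometimes as {a} "
          f"and sometimes as {b}: {', '.join(shared)}")
```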
Nominal Features
- Number
  - Plur
    - DET: yo, kèk, plizyè
    - PRON: yo, nou, n, y, yon
  - Sing
    - DET: yon, a, la, an, sa, nan, chak, lòt, yo, yoon
    - PRON: m, li, mwen, l, sa, w, ni, Ou, nou
- Definite
  - Def
    - DET: a, yo, la, an, nan, sa
  - Ind
    - DET: yon, a, yo, yoon
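Each feature listing in this section groups word forms by feature value and then by UPOS. A minimal sketch of that grouping, assuming (FORM, UPOS, FEATS) triples taken from a CoNLL-U file; the toy rows and function names are hypothetical and the rows are not taken from the treebank.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing|Person=1' into a dict ('_' means none)."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def forms_by_feature(rows, feature):
    """Group forms as value -> UPOS -> forms (first-seen order, no duplicates)."""
    table = defaultdict(lambda: defaultdict(list))
    for form, upos, feats in rows:
        value = parse_feats(feats).get(feature)
        if value is not None and form not in table[value][upos]:
            table[value][upos].append(form)
    return table


# Hypothetical (FORM, UPOS, FEATS) rows; illustrative only, NOT taken from the treebank.
TOY_ROWS = [
    ("yon", "DET", "Definite=Ind|Number=Sing|PronType=Art"),
    ("la", "DET", "Definite=Def|Number=Sing|PronType=Art"),
    ("yo", "DET", "Definite=Def|Number=Plur|PronType=Art"),
    ("mwen", "PRON", "Number=Sing|Person=1|PronType=Prs"),
    ("nou", "PRON", "Number=Plur|Person=1|PronType=Prs"),
    ("la", "DET", "Definite=Def|Number=Sing|PronType=Art"),  # repeated form, listed once
]

# Print in the same nested "- " layout as the listings on this page.
for feature in ("Number", "Definite"):
    print(f"- {feature}")
    for value, by_upos in forms_by_feature(TOY_ROWS, feature).items():
        print(f"  - {value}")
        for upos, forms in by_upos.items():
            print(f"    - {upos}: {', '.join(forms)}")
```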
Degree and Polarity
- Polarity
  - Neg
    - ADV: pa, p
    - DET: okenn, oken
Verbal Features
- Aspect
  - Prog
    - AUX: ap
- Mood
  - Cnd
    - AUX: ta
- Tense
  - Fut
    - AUX: pral, pwal
  - Past
    - AUX: te, t
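Since these tense, aspect and mood values are carried by separate preverbal auxiliaries rather than by the verb itself, the combined TAM of a clause can be recovered by collecting the features of the verb's `aux` dependents. A minimal sketch, assuming simplified six-field rows and a hypothetical toy sentence "Mwen t ap manje." ('I was eating') whose annotation is illustrative only; the function name is also hypothetical.

```python
# Hypothetical toy sentence "Mwen t ap manje." ('I was eating'); annotation illustrative only.
# Fields: (ID, FORM, UPOS, FEATS, HEAD, DEPREL)
TOY_SENTENCE = [
    (1, "Mwen", "PRON", "Number=Sing|Person=1|PronType=Prs", 4, "nsubj"),
    (2, "t", "AUX", "Tense=Past", 4, "aux"),
    (3, "ap", "AUX", "Aspect=Prog", 4, "aux"),
    (4, "manje", "VERB", "_", 0, "root"),
    (5, ".", "PUNCT", "_", 4, "punct"),
]


def tam_by_verb(sentence):
    """Collect the FEATS of each VERB's aux dependents, keyed by the verb's (ID, FORM)."""
    verbs = {tid: form for tid, form, upos, _, _, _ in sentence if upos == "VERB"}
    collected = {(tid, form): [] for tid, form in verbs.items()}
    for _, _, _, feats, head, deprel in sentence:
        if deprel == "aux" and head in verbs and feats != "_":
            collected[(head, verbs[head])].extend(feats.split("|"))
    return {key: sorted(values) for key, values in collected.items()}


print(tam_by_verb(TOY_SENTENCE))  # {(4, 'manje'): ['Aspect=Prog', 'Tense=Past']}
```

Keying by (ID, FORM) rather than by form alone keeps two occurrences of the same verb in one sentence apart.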
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: yon, yo, a, la, an, nan, yoon
  - Dem
    - DET: sa, a
    - PRON: sa
  - Neg
    - ADV: anyen
    - PRON: anyen
  - Prs
    - PRON: m, li, yo, l, mwen, nou, n, w, y, ni
  - Rel
    - ADV: kote
    - DET: ki
    - PRON: ki, ke, k
    - SCONJ: ke
- NumType
  - Card
    - NUM: 200,000
- Poss
  - Yes
    - DET: yo
    - PRON: li, l, m, nou
- Person
  - 1
    - PRON: m, mwen, nou, n
  - 2
    - PRON: w, nou, Ou, n
  - 3
    - PRON: li, yo, l, y, ni, sa, yon
Other Features
- ExtPos
  - CCONJ
    - SCONJ: kòm
- Typo
  - Yes
    - ADJ: o
    - NOUN: d, jennon
    - PROPN: d
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: se.
- This corpus uses 6 lemmas as auxiliaries (aux). Examples: te, ap, ka, ta, pral, dwe.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (111)
  - VERB--PRON (219)
- obj
  - VERB--NOUN (158)
  - VERB--NOUN-ADP(ak) (1)
  - VERB--NOUN-ADP(de) (2)
  - VERB--NOUN-ADP(kijanm) (1)
  - VERB--NOUN-ADP(sou) (1)
  - VERB--PRON (34)
- iobj
  - VERB--NOUN (5)
  - VERB--PRON (8)
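The pattern labels above pair the UPOS of a verbal head with that of a nominal or pronominal child, and a suffix such as `-ADP(de)` marks a child that has an adposition attached to it. A minimal sketch of how such counts could be produced, assuming simplified dict rows; the toy sentence "Li pale de lavi." ('He/she talks about life'), its relation labels, and the function name are hypothetical choices made to exercise the code, not the treebank's analysis.

```python
from collections import Counter

FIELDS = ("id", "form", "lemma", "upos", "head", "deprel")
# Hypothetical toy sentence "Li pale de lavi." ('He/she talks about life'); the
# deprels are assumptions chosen to exercise the code, not the treebank's analysis.
TOY_SENTENCE = [dict(zip(FIELDS, row)) for row in [
    (1, "Li", "li", "PRON", 2, "nsubj"),
    (2, "pale", "pale", "VERB", 0, "root"),
    (3, "de", "de", "ADP", 4, "case"),
    (4, "lavi", "lavi", "NOUN", 2, "obj"),
    (5, ".", ".", "PUNCT", 2, "punct"),
]]


def core_patterns(sentence, relations=("nsubj", "obj", "iobj")):
    """Count (DEPREL, 'VERB--CHILD[-ADP(lemma)]') for VERB heads with NOUN/PRON children."""
    by_id = {tok["id"]: tok for tok in sentence}
    counts = Counter()
    for tok in sentence:
        head = by_id.get(tok["head"])  # None for the root (HEAD = 0)
        if tok["deprel"] not in relations or head is None:
            continue
        if head["upos"] != "VERB" or tok["upos"] not in ("NOUN", "PRON"):
            continue
        pattern = f"VERB--{tok['upos']}"
        # Append any word attached to the child as 'case', e.g. -ADP(de).
        for dep in sentence:
            if dep["head"] == tok["id"] and dep["deprel"] == "case":
                pattern += f"-{dep['upos']}({dep['lemma']})"
        counts[(tok["deprel"], pattern)] += 1
    return counts


for (deprel, pattern), n in sorted(core_patterns(TOY_SENTENCE).items()):
    print(f"{deprel}: {pattern} ({n})")
```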
Relations Overview
- This corpus uses 7 relation subtypes: acl:relcl, advcl:cleft, compound:svc, flat:name, obl:arg, obl:mod, parataxis:insert
- The following 2 main types are never used alone; they are always subtyped: compound, flat
- The following 6 relation types are not used in this corpus at all: csubj, xcomp, expl, clf, list, orphan
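The three lines above follow from comparing the labels in use with the universal relation inventory. A minimal sketch, assuming the 37 universal relation types of UD v2 (hard-coded below) and the label list given under "Relations" on this page; the function name is illustrative.

```python
# The 37 universal relation types of UD v2 (hard-coded assumption for this sketch).
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels listed for this treebank under "Relations" on this page.
USED = """acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc ccomp
compound:svc conj cop dep det discourse dislocated fixed flat:name goeswith iobj
mark nmod nsubj nummod obj obl obl:arg obl:mod parataxis parataxis:insert punct
reparandum root vocative""".split()


def relation_overview(labels):
    """Return (subtypes, main types only ever used subtyped, unused universal types)."""
    labels = set(labels)
    subtypes = sorted(rel for rel in labels if ":" in rel)
    bare = {rel for rel in labels if ":" not in rel}
    subtyped_mains = {rel.split(":", 1)[0] for rel in subtypes}
    always_subtyped = sorted(subtyped_mains - bare)
    unused = sorted(UD_RELATIONS - bare - subtyped_mains)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED)
print(len(subtypes), "subtypes:", ", ".join(subtypes))
print("always subtyped:", ", ".join(always_subtyped))
print("unused:", ", ".join(unused))
```

Run on the label list from this page, it reproduces the 7 subtypes, the 2 always-subtyped main types, and the 6 unused types stated above.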