UD Haitian Creole Adolphe
Language: Haitian Creole (code: ht)
Family: Creole
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Jephtey Adolphe.
Repository: UD_Haitian_Creole-Adolphe
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.16
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Haitian Creole-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ja983 (æt) scarletmail • rutgers • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | not available |
Features | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Relations | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Description
This is a treebank for Haitian Creole. It contains 3314 sentences and 300,000+ words selected from one Bible-related source and was annotated programmatically.
Kreyòl (Kreyòl Ayisyen, Haitian Creole; ISO 639-1: ht) is the main language of Haiti.
This treebank contains a selection of sentences from the following source:
- “Ann egzamine Ekriti yo chak jou - 2017”
The data is split as follows: Train: 55,527 tokens; Dev: 6,186 tokens; Test: 10,021 tokens.
Sentences (and hence their tokens) were assigned to these three splits at random.
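As a quick consistency check, the three split sizes can be summed and turned into percentages. This is a minimal sketch that uses only the token counts quoted above; the variable names are illustrative.

```python
# Token counts per split, copied from the figures quoted above.
splits = {"train": 55_527, "dev": 6_186, "test": 10_021}

total = sum(splits.values())

# Share of each split as a percentage of all tokens, rounded to one decimal.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)
print(shares)
```

The sum comes to 71,734 tokens, which matches the corpus size reported in the tokenization statistics, and the split is roughly 77% / 9% / 14%.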
Acknowledgments
This conversion has been performed by Jephtey Adolphe, a Rutgers University alum.
Statistics of UD Haitian Creole Adolphe
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Aspect – Definite – ExtPos – Mood – Number – NumType – Person – Polarity – Poss – PronType – Tense – Typo
Relations
acl – acl:relcl – advcl – advcl:cleft – advmod – amod – appos – aux – case – cc – ccomp – compound – compound:svc – conj – cop – dep – det – discourse – dislocated – fixed – flat – flat:name – goeswith – iobj – mark – nmod – nsubj – nummod – obj – obl – obl:arg – obl:mod – parataxis – parataxis:insert – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 3314 sentences and 71734 tokens.
- This corpus contains 6790 tokens (9%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 14 types of words that contain both letters and punctuation. Examples: Ing-wen, Jean-Dickens, jw.org, l', n', 'dèt', 'peche', Chia-lung, Jean-Pierre, Jr., kè-sote, sere-sere, tèt-chaje, wo-nivo
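The "not followed by a space" figure corresponds to tokens carrying `SpaceAfter=No` in the MISC column of the CoNLL-U files. The sketch below shows one way such a count could be reproduced. It is a simplification (it counts word lines only and skips multiword-token ranges and empty nodes), the function name is hypothetical, and the sample sentence is made up for illustration rather than taken from the treebank.

```python
def space_after_stats(conllu_text):
    """Return (word_count, words_without_following_space) for CoNLL-U text."""
    total = 0
    no_space = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        total += 1
        # MISC is the 10th column; attributes are separated by "|".
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return total, no_space


# Made-up sample sentence, for illustration only.
sample = "\n".join([
    "# text = Li vini.",
    "1\tLi\tli\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tvini\tvini\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

print(space_after_stats(sample))

# The reported percentage follows from the two corpus-level counts above.
print(round(100 * 6790 / 71734))
```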
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 9 word types tagged as particles (PART): a, ann, men, non, pa, pou, t, wi, èske
- This corpus contains 29 lemmas tagged as pronouns (PRON): a, anyen, de, ke, ki, kiyès, kote, kwa, kèlkeswa, ladan, li, lui, lòt, menm, mwen, nou, noumenm, ou, pa, pèsonn, sa, sila, tan, te, toulede, tout, wou, yo, youn
- This corpus contains 32 lemmas tagged as determiners (DET): anpil, chak, de, kelkeswa, ki, konbyen, konsa, kèk, kèlkeswa, la, lòt, menm, nenpòt, non, oken, okenn, pa, pifò, plizyè, pyès, sa, sila, sèt, ti, toude, toule, tout, twòp, tèl, yo, yon, youn
- Out of the above, 11 lemmas occurred sometimes as PRON and sometimes as DET: de, ki, kèlkeswa, lòt, menm, pa, sa, sila, tout, yo, youn
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): ap, dwe, ka, pral, se, ta, te
- Out of the above, 4 lemmas occurred sometimes as AUX and sometimes as VERB: dwe, ka, pral, te
- This corpus does not use the VerbForm feature.
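The PRON/DET overlap reported above can be re-derived from the two lemma lists with a set intersection. A small sketch, with the lists copied from the bullets above and illustrative variable names:

```python
# Lemmas tagged PRON and DET, copied from the lists above.
pron = {
    "a", "anyen", "de", "ke", "ki", "kiyès", "kote", "kwa", "kèlkeswa",
    "ladan", "li", "lui", "lòt", "menm", "mwen", "nou", "noumenm", "ou",
    "pa", "pèsonn", "sa", "sila", "tan", "te", "toulede", "tout", "wou",
    "yo", "youn",
}
det = {
    "anpil", "chak", "de", "kelkeswa", "ki", "konbyen", "konsa", "kèk",
    "kèlkeswa", "la", "lòt", "menm", "nenpòt", "non", "oken", "okenn",
    "pa", "pifò", "plizyè", "pyès", "sa", "sila", "sèt", "ti", "toude",
    "toule", "tout", "twòp", "tèl", "yo", "yon", "youn",
}

# Lemmas that occur with both tags.
both = pron & det

print(len(pron), len(det), len(both))
print(sorted(both))
```

The intersection has 11 members, in agreement with the count given above.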
Nominal Features
- Number=Plur
  - DET: yo, kèk, plizyè, anpil, de, sa, Toule, toude
  - PRON: yo, nou, n, y, tout, n', de, noumenm, toulede, yon
- Number=Sing
  - DET: a, yon, la, an, nan, sa, lan, chak, youn, lòt
  - PRON: l, li, w, ou, m, mwen, sa, n, t, youn
- Definite=Def
  - DET: a, la, an, nan, yo, lan, sa
- Definite=Ind
  - DET: yon, anpil, a, plizyè, kèk, youn, chak, de, sa, yo
Degree and Polarity
- Polarity=Neg
  - ADV: pa, pap, t, ap, non
  - DET: okenn, pa, oken
  - PRON: anyen, pèsonn
Verbal Features
- Aspect=Prog
  - AUX: ap, pral
- Mood=Cnd
  - AUX: ta
- Mood=Pot
  - AUX: ka
- Tense=Fut
  - AUX: pral, apral, pwal, ap
- Tense=Past
  - AUX: te, t
Pronouns, Determiners, Quantifiers
- PronType=Art
  - DET: a, yon, la, an, yo, nan, lan, youn, anpil, tout
- PronType=Dem
  - DET: sa, konsa, sila, a
  - PRON: sa, sila
- PronType=Neg
  - ADV: anyen
  - PRON: anyen, pèsonn
- PronType=Prs
  - DET: pa, Chak
  - PRON: yo, l, nou, n, li, w, ou, m, mwen, y
- PronType=Rel
  - ADV: kote, ki
  - PRON: ki, k, ke, kote, kiyès, sa
  - SCONJ: ki, k, ke
- NumType=Card
  - NUM: de, 12, 14, 144000, 1914, 200,000, 2013, 28, 3, 3000
- Poss=Yes
  - DET: pa, yo
  - PRON: li, nou, l, m
- Person=1
  - PRON: nou, n, m, mwen, n', noumenm, pa, t
- Person=2
  - PRON: w, ou, nou, n, pa
- Person=3
  - DET: yo
  - PRON: yo, l, li, y, t, sa, l', menm, ni, a
Other Features
- ExtPos=CCONJ
  - SCONJ: kòm
- Typo=Yes
  - ADJ: o
  - NOUN: d, jennon
  - PROPN: d
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): se.
- This corpus uses 6 lemmas as auxiliaries (aux). Examples: te, ap, ka, dwe, pral, ta.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (866)
  - VERB--NOUN-ADP(ak) (1)
  - VERB--NOUN-ADP(an) (1)
  - VERB--NOUN-ADP(nan) (6)
  - VERB--NOUN-ADP(pou) (1)
  - VERB--PRON (4540)
  - VERB--PRON-ADP(ak) (2)
  - VERB--PRON-ADP(nan) (3)
  - VERB--PRON-ADP(pou) (2)
  - VERB--PRON-ADP(sou) (2)
- obj
  - VERB--NOUN (3485)
  - VERB--NOUN-ADP(ak) (5)
  - VERB--NOUN-ADP(anrapò) (1)
  - VERB--NOUN-ADP(bay) (1)
  - VERB--NOUN-ADP(de) (8)
  - VERB--NOUN-ADP(kijanm) (1)
  - VERB--NOUN-ADP(konsenan) (1)
  - VERB--NOUN-ADP(kont) (1)
  - VERB--NOUN-ADP(nan) (11)
  - VERB--NOUN-ADP(pou) (1)
  - VERB--NOUN-ADP(sou) (2)
  - VERB--PRON (1633)
  - VERB--PRON-ADP(ak) (2)
  - VERB--PRON-ADP(de) (1)
  - VERB--PRON-ADP(konsenan) (1)
  - VERB--PRON-ADP(kont) (1)
  - VERB--PRON-ADP(nan) (8)
  - VERB--PRON-ADP(pou) (2)
- iobj
  - VERB--NOUN (12)
  - VERB--PRON (73)
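Counts of this kind can be approximated by walking the CoNLL-U files and, for every noun or pronoun attached to a verb as `nsubj`, `obj` or `iobj`, recording the child's tag together with any adposition attached to it via `case`. The following is a sketch under assumptions, not the script that generated the numbers above: the function name is hypothetical, the adposition is labeled by its lemma (an assumption), and the sample sentence is made up for illustration.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}


def core_argument_patterns(conllu_text):
    """Count (deprel, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary word lines only, indexed by their ID column.
        words = {r[0]: r for r in rows if r[0].isdigit()}
        for wid, row in words.items():
            upos, head, deprel = row[3], row[6], row[7]
            if deprel not in CORE or upos not in {"NOUN", "PRON"}:
                continue
            parent = words.get(head)
            if parent is None or parent[3] != "VERB":
                continue
            pattern = f"VERB--{upos}"
            # Append adposition lemmas attached to the child via `case`.
            for child in words.values():
                if child[6] == wid and child[7] == "case" and child[3] == "ADP":
                    pattern += f"-ADP({child[2]})"
            counts[(deprel, pattern)] += 1
    return counts


# Made-up sample sentence, for illustration only.
sample = "\n".join([
    "# text = Li renmen mizik.",
    "1\tLi\tli\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\trenmen\trenmen\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tmizik\tmizik\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

for (deprel, pattern), n in sorted(core_argument_patterns(sample).items()):
    print(deprel, pattern, n)
```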
Relations Overview
- This corpus uses 7 relation subtypes: acl:relcl, advcl:cleft, compound:svc, flat:name, obl:arg, obl:mod, parataxis:insert
- The following 6 relation types are not used in this corpus at all: csubj, expl, clf, list, orphan, reparandum
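Both figures can be re-derived from the relation inventory listed earlier on this page: subtypes are the labels containing a colon, and the unused relations are the universal UD v2 relations whose base label never appears. A minimal sketch, with illustrative variable names:

```python
# Relation labels used in this treebank, copied from the "Relations" list.
used = """acl acl:relcl advcl advcl:cleft advmod amod appos aux case cc ccomp
compound compound:svc conj cop dep det discourse dislocated fixed flat
flat:name goeswith iobj mark nmod nsubj nummod obj obl obl:arg obl:mod
parataxis parataxis:insert punct root vocative xcomp""".split()

# The 37 universal dependency relations of UD v2.
universal = """acl advcl advmod amod appos aux case cc ccomp clf compound conj
cop csubj dep det discourse dislocated expl fixed flat goeswith iobj list mark
nmod nsubj nummod obj obl orphan parataxis punct reparandum root vocative
xcomp""".split()

# Language-specific subtypes have the form "relation:subtype".
subtypes = [rel for rel in used if ":" in rel]

# A universal relation counts as used if it appears bare or as a subtype base.
used_bases = {rel.split(":")[0] for rel in used}
unused = sorted(set(universal) - used_bases)

print(len(subtypes), subtypes)
print(len(unused), unused)
```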