UD Maghrebi Arabic French Arabizi
Language: Maghrebi Arabic French (code: qaf)
Family: Code switching
This treebank has been part of Universal Dependencies since the UD v2.12 release.
The following people have contributed to making this treebank part of UD: Arij Riabi, Farah Essaidi, Amal Fethi, Menel Mahamdi, Djamé Seddah.
Repository: UD_Maghrebi_Arabic_French-Arabizi
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Maghrebi Arabic French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [djame • seddah (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
A Universal Dependencies corpus for a romanized, user-generated-content variety of Algerian Arabic, a North African dialect known for frequent code-switching. On top of the UD annotations, we added NER annotations that extend the French Treebank NER scheme (Sagot et al., 2012) and offensive-language classification labels, and we corrected many of the translations (this correction work is still ongoing).
This repository includes the dataset presented in the paper “Enriching the NArabizi Treebank: A Multifaceted Approach to Supporting an Under-Resourced Language”.
The first version of the NArabizi Corpus was presented in Seddah et al. (2020), with extensive parsing results in Riabi et al. (2021).
Acknowledgments
- contributors: Arij Riabi, Farah Essaidi, Amal Fethi, Menel Mahamdi, Djamé Seddah
- contact: Arij Riabi: arij.riabi@inria.fr, Djamé Seddah: djame.seddah@gmail.com
- UD maintainer: Arij Riabi: arij.riabi@inria.fr, Djamé Seddah: djame.seddah@gmail.com
Statistics of UD Maghrebi Arabic French Arabizi
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
AdpType – Gender – Mood – Number – Person – Polarity – PronType – Tense – Typo – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – expl – expl:pv – fixed – flat – goeswith – iobj – list – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1287 sentences, 18561 tokens and 19793 syntactic words.
- This corpus contains 656 tokens (4%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 88 types of words that contain both letters and punctuation. Examples: l', c', d', j', l’, n', m', c’, qu', d’, t', ech-chourouk, el-, f', jusqu', s', y', ''m3ak, 'Algérie, 'al, 'algérien, 'anseg, 'entente, 'orope), -t-il, /puisque, /w, 3alaycom;, ;essentiel, ;hada, ;les, ;mem, Est-, _bien, agression2-de7*la, al-, amina2003alg@yahoofr, b', be', bel-air, belhaje;, bezaf;, bi-, ch'fa, chaqu'un, chfae;, chika64@hotmailfr, d\, el', f'tourkem
- This corpus contains 1166 multi-word tokens. On average, one multi-word token consists of 2.06 syntactic words.
- There are 857 types of multi-word tokens. Examples: l'algerie, c'est, fel, bel, lalgerie, l'algérie, l3am, brabi, mel, lyoum, j'ai, l'equipe, l'équipe, c’est, el3am, jm'en, lilah, ya, asalamo, d'afrique, l3ame, lel, n'est, alkhadra, bilah, billah, ces, cest, el3ame, elyahoud, essah, jai, l'etat, l3abd, l3ali, lablade, lblad, lebled, lkhadra, lkhir, léquipe, l’affaire, qu'on, t'aime, ta, wala, 3ladik, Bessah, al3alamine, al3am.
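
The counts in this section can be recomputed from a CoNLL-U file. The following is a minimal sketch, assuming standard CoNLL-U conventions (ten tab-separated columns, range IDs such as `1-2` for multi-word tokens, `SpaceAfter=No` in the MISC column). The function name `tokenization_stats` is illustrative and is not part of any UD tool.

```python
def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0,
             "mwts": 0, "mwt_words": 0, "no_space_after": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False              # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue                         # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue                         # malformed line, ignored here
        if not in_sentence:
            in_sentence = True
            covered_until = 0
            stats["sentences"] += 1
        tok_id, misc = cols[0], cols[9].split("|")
        if "." in tok_id:
            continue                         # empty node: neither token nor word
        if "-" in tok_id:                    # multi-word token range, e.g. "1-2"
            start, end = (int(part) for part in tok_id.split("-"))
            stats["mwts"] += 1
            stats["mwt_words"] += end - start + 1
            covered_until = end
            is_token = True
        else:
            stats["words"] += 1
            # a word inside a multi-word token is not a surface token itself
            is_token = int(tok_id) > covered_until
        if is_token:
            stats["tokens"] += 1
            if "SpaceAfter=No" in misc:
                stats["no_space_after"] += 1
    return stats
```

The reported figures are mutually consistent under the relation tokens = words - mwt_words + mwts: 18561 = 19793 - mwt_words + 1166 gives 2398 words inside multi-word tokens, and 2398 / 1166 ≈ 2.06 words per multi-word token.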
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 49 word types tagged as particles (PART): 3la, Annaba, bla, fa, fala, gadi, ghayr, ghi, ghir, ghire, ghure, gir, gira, guir, ila, illa, l, la, lah, lala, lame, lan, lasa, layssa, len, lla, m3a, ma, maa, machi, macho, mafich, mahich, makach, manach, mani, may, mchi, mechi, mechni, mechou, michi, moch, raire, rhir, ri, rir, rire, sawfa
- This corpus contains 191 lemmas tagged as pronouns (PRON): -t-il, 10, 2, 66, 67, 7, 99, ;_ça, Allah, a_lui, ainsi, autour_de_lui, autres, avec, avec_ceci, avec_cela, avec_elle, avec_eux, avec_lui, avec_moi, avec_nous, avec_qui, avec_quoi, avec_toi, avec_vous, c', c'est, c'est_elle_qui, c'est_nous_que, ca, ce, ce-ci, ce_que, ce_qui, ceci, cela, celle-ci, celle_là, celles-ci, celui, celui-ci, celui-là, celui_là, celà, ces, cette, cette_elle, ceux, ceux-là, ceux_la, ceux_là, ceux_qui, chacun, chez_elle, chez_eux, chez_lui, chez_moi, chez_nous, chez_toi, chez_vous, ci, comme, comme_lui, comme_toi, comme_ça, dans_cela, dans_elle, dans_eux, dans_lui, dans_toi, de, de_cela, de_celle, de_elle, de_eux, de_lui, de_moi, de_notre, de_nous, de_tien, de_toi, de_vous, dedans, dessus_d'elle, dessus_nous, elle, elles, en, en_elle, en_nous, en_toi, en_vous, entre_eux, entre_nous, est, et_il, eux, eux_tous, i, ici, il, il_n'y, il_pas, il_y, ils, ils_eux, j, j', je, je_suis, l', l'état, l`, le, lequel, les, leur, leurs, loins_vous, lui, là, m, m', m'en, me, moi, mon, n'est_pas, nous, nous_avons, nous_tous, on, on_a, où, parmi_vous, pas_moi, pour_cette, pour_eux, pour_lui, pour_nous, pour_qui, pour_toi, pour_vous, qu', qu'il, qu'on, quand, que, que_eux, que_nous, que_vous, quelqu'un, qui, quoi, rien, sa, sauf_vous, se, suis, sur_cela, sur_elle, sur_eux, sur_lui, sur_nous, sur_toi, sur_vous, t, t', t-il, te, toi, tous, tous_ceux, tout, toute_chose, tu, vous, vous_êtes, y, you, à, à_cela, à_elle, à_eux, à_lui, à_moi, à_nous, à_tien, à_toi, à_vous, ça
- This corpus contains 64 lemmas tagged as determiners (DET): 3, 5, EL, El, Personne, Quelques, _, al, aucun, aucune, aujourd', ce, ces, cette, chaque, comme_cela, d', de, de_le, des, deux, du, il, l, l', l'équipe, la, la_punition, le, les, leurs, ma, mes, mille, milliers, moitié, mon, nos, notre, numéro, oh_la, plusieurs, premier, première, quatre_cent, quoi, sa, sans, seize, ses, son, ta, ton, tous, tout, toute, toutes, trois, un, une, vos, votre, à, ça
- Out of the above, 16 lemmas occurred sometimes as PRON and sometimes as DET: ce, ces, cette, de, il, l', le, les, leurs, mon, quoi, sa, tous, tout, à, ça
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): avoir, être
- Out of the above, 2 lemmas occurred sometimes as AUX and sometimes as VERB: avoir, être
- There are 3 (de)verbal forms:
  - Fin
    - VERB: importe, vive, eport, import
  - Inf
    - VERB: dire, voir, faire, guerrir, vivre, construire, payer, DEPASSE, S, amené
  - Part
    - VERB: donné, perdu, fait, jouer, posée, classé, commencé, commis, comparé, connu
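
The tag inventory and the lemma overlaps listed above (for example, lemmas occurring both as PRON and as DET) can be derived from the LEMMA and UPOS columns. The following is a minimal sketch, assuming a standard CoNLL-U file. The helper names `lemmas_by_upos`, `shared_lemmas` and `unused_upos` are illustrative, not taken from the UD tooling.

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD.
ALL_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
            "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas that occur with it."""
    table = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only syntactic words: skip comments, blank lines,
        # multi-word token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        table[cols[3]].add(cols[2])
    return table

def shared_lemmas(table, upos_a, upos_b):
    """Lemmas attested with both tags, sorted alphabetically."""
    return sorted(table.get(upos_a, set()) & table.get(upos_b, set()))

def unused_upos(table):
    """Universal tags that never occur in the data."""
    return sorted(ALL_UPOS - set(table))
```

Calling `shared_lemmas(table, "PRON", "DET")` or `shared_lemmas(table, "AUX", "VERB")` on the table built from the treebank files should correspond to the overlap lists above, provided the page counts lemmas case-sensitively, which is an assumption here.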
Nominal Features
- Gender
  - Fem
    - VERB: rabhat, tkoun, raha, rahet, rahi, rouhi, 3adaw, 7atecheki, Kanet, MATASTAHELCH
  - Masc
    - ADP: m3a
    - AUX: Kount, a, kona, nkoon
    - PRON: t
    - VERB: yal3ab, rana, ykoun, ngoul, yahdina, ndir, ndirou, ya3tik, yjib, ina
- Number
  - Plur
    - AUX: kona
    - PRON: t
    - VERB: rana, ndirou, ina, dertou, narbhou, ndiro, rabhat, rahoum, tkoun, ya3arfou
  - Sing
    - ADP: m3a
    - AUX: Kount, a, nkoon
    - VERB: yal3ab, ykoun, ngoul, ndir, ya3tik, yahdina, yjib, nesma3, rah, rak
    - VERB-Fin: importe, vive, eport, import
Degree and Polarity
- Polarity
  - Neg
    - ADJ: machabah
    - ADP: ma3andna, ma3andnach, ma3andouche, mafihach, m3andiche, ma33andouche, ma3adna, ma3adnach, ma3and'homch, ma3anda
    - ADV: pas, ne, n', ni, n, pa, donc, ka, la, no
    - AUX: mahouche
    - CCONJ: ni
    - DET: la
    - NOUN: balahabe, la3ala9a, makida, ma3labalkomch, ma3labalnach
    - PART: la, ma, machi, lala, lan, maa, makach, mechni, Annaba, fala
    - PRON: walou, rien, mechni, walo, mahou, mechou, wallou, waloo, walouuuu, y
    - VERB: makan, makanch, makanche, makanech, matkhafouche, liysa, makach, makache, makch, mamat
    - VERB-Inf: mayderanger
Verbal Features
- Mood
  - Imp
    - VERB: allez, roh, diri, goulou, qoulli, casse, kon, rouh, afham, al3ab
    - VERB-Fin: vive
  - Ind
    - VERB-Fin: importe, eport, import
  - Sub
    - VERB: vive, viva, tahya, tahia, ViVeeeeeeeeeeeeeeee, tahiati, viiiiiiiiiiiiiive, viiiiiiiva
- Tense
  - Pres
    - VERB-Fin: importe, vive, eport, import
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - DET: dak, had, hadh, hadi
    - PRON: hada, had, hadi, hado, haka, hadou, hadak, hade, hda, hadha
  - Ind
    - PRON: quoi
  - Int
    - PRON: me3amen
  - Rel
    - PRON: li, qui, l, ma, ali, man, les, eli, elli, ki
- Person
  - 1
    - AUX: Kount, kona, nkoon
    - VERB: rana, ngoul, ina, ndirou, nrouh, ndir, nesma3, rani, na7ki, narbhou
  - 2
    - PRON: t
    - VERB: rak, dertou, tgoul, habit, rouhi, ta3arfou, tatabi3a, tebka, troh, 3ajbek
  - 3
    - AUX: a
    - PRON: quoi
    - VERB: yal3ab, ykoun, ya3tik, yahdina, yjib, rah, yahdik, rahou, yarham, ychafik
    - VERB-Fin: importe, vive, eport, import
Other Features
- AdpType
  - Prep
    - ADJ: ahssenelek
    - ADP: fi, de, m3a, f, b, pour, ta3, a, 3la, bi
    - ADV: likole, ma, madabikom, madabikoum, malhom, po, pour
    - DET: du
    - NOUN: 3anbalek, 3labalkoum, contrra, homme, l3aklek, madabiya, rwahkoum, ma3labalkomch, ma3labalnach
    - PRON: lahou, li, 3andhoum, 3lik, bi, bih, menou, mina, on, alihada
    - PROPN: eurgway
    - SCONJ: beli, bli, belli, bili
    - VERB: choufoulina, daroulhom, galalkom, idjiboulna, imedouli, ta3matelhoum, ysirmatlama, zidlou, Lazmetlek, darolhom
- Typo
  - Yes
    - ADJ: la, tous
    - ADP: par
    - ADV: sur, every, c, en, mada, on, peut, pout, tout, pour
    - CCONJ: si, w
    - INTJ: in, inchaa, incha, macha, bien, bon, inshaa, y
    - NOUN: bla, chwin, foot, rwah, w
    - PRON: ce, ha, hk
    - PROPN: face, a, laah, r
    - SCONJ: par, ki, li, pars, wue
    - VERB: ay, la, nchaa, tou, De, sa, ya, you, yéf', yéff'
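
Per-feature listings like the ones in this Morphology section (word forms grouped by feature value and UPOS tag) can be approximated from the FEATS column. The following is a minimal sketch, assuming standard `Name=Value|Name=Value` feature strings. The function name `forms_by_feature` is illustrative, and ranking forms by frequency is an assumption about how the examples on the page were selected.

```python
from collections import Counter, defaultdict

def forms_by_feature(conllu_text):
    """Map (feature, value, UPOS) to a Counter of the word forms carrying it."""
    table = defaultdict(Counter)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only syntactic words (plain integer IDs with ten columns).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":
            continue                      # word without morphological features
        for pair in feats.split("|"):
            name, _, values = pair.partition("=")
            for value in values.split(","):   # multi-valued features: "A,B"
                table[(name, value, upos)][form] += 1
    return table
```

For instance, `forms_by_feature(text)[("Polarity", "Neg", "PART")].most_common(10)` would return the ten most frequent negative particles with their counts.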
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): être.
- This corpus uses 2 lemmas as auxiliaries (aux). Examples: avoir, être.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (229)
  - VERB--NOUN-ADP(car) (1)
  - VERB--NOUN-ADP(dans) (1)
  - VERB--NOUN-ADP(de) (1)
  - VERB--NOUN-ADP(jusque) (1)
  - VERB--NOUN-ADP(la) (1)
  - VERB--NOUN-ADP(pour) (1)
  - VERB--NOUN-ADP(à) (2)
  - VERB--PRON (547)
  - VERB--PRON-ADP(comme) (1)
  - VERB--PRON-ADP(dans) (1)
  - VERB--PRON-ADP(en) (1)
  - VERB--PRON-ADP(jusque) (1)
  - VERB--PRON-ADP(pour) (3)
  - VERB--PRON-ADP(sur) (1)
  - VERB--PRON-ADP(à) (1)
  - VERB-Inf--NOUN (2)
  - VERB-Inf--PRON (3)
  - VERB-Inf--PRON-ADP(de) (1)
  - VERB-Part--NOUN (5)
  - VERB-Part--PRON (18)
- obj
  - VERB--NOUN (904)
  - VERB--NOUN-ADP(au) (3)
  - VERB--NOUN-ADP(avec) (4)
  - VERB--NOUN-ADP(comme) (2)
  - VERB--NOUN-ADP(d') (1)
  - VERB--NOUN-ADP(de) (6)
  - VERB--NOUN-ADP(en) (1)
  - VERB--NOUN-ADP(jusque) (1)
  - VERB--NOUN-ADP(à) (7)
  - VERB--PRON (125)
  - VERB--PRON-ADP(comme) (2)
  - VERB--PRON-ADP(de) (1)
  - VERB--PRON-ADP(à) (1)
  - VERB-Inf--NOUN (25)
  - VERB-Inf--NOUN-ADP(au) (1)
  - VERB-Inf--PRON (7)
  - VERB-Part--NOUN (21)
  - VERB-Part--NOUN-ADP(de) (1)
  - VERB-Part--PRON (4)
- iobj
  - VERB--NOUN (1)
  - VERB--PRON (27)
  - VERB-Inf--PRON (2)
  - VERB-Part--PRON (4)
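
Pattern counts like those above can be approximated from the dependency columns. The following is a minimal sketch under one reading of the notation, which is an assumption: the parent is `VERB`, optionally suffixed with its `VerbForm` value, the child is `NOUN` or `PRON`, and `ADP(x)` is the lemma of an adposition attached to the child with the `case` relation. The function names are illustrative.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}

def sentences(conllu_text):
    """Yield each sentence as a list of word rows (10 columns, integer ID)."""
    current = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if current:
                yield current
                current = []
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            current.append(cols)
    if current:
        yield current

def verb_form(feats):
    """Return the VerbForm value from a FEATS string, or None if absent."""
    for pair in feats.split("|"):
        if pair.startswith("VerbForm="):
            return pair.split("=", 1)[1]
    return None

def core_argument_patterns(conllu_text):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for sent in sentences(conllu_text):
        by_id = {cols[0]: cols for cols in sent}
        # First adposition lemma attached to each word through `case`.
        case_of = {}
        for cols in sent:
            if cols[7] == "case" and cols[3] == "ADP":
                case_of.setdefault(cols[6], cols[2])
        for cols in sent:
            relation, upos = cols[7], cols[3]
            if relation not in CORE_RELATIONS or upos not in ("NOUN", "PRON"):
                continue
            head = by_id.get(cols[6])
            if head is None or head[3] != "VERB":
                continue
            form = verb_form(head[5])
            parent = "VERB-" + form if form else "VERB"
            child = upos
            if cols[0] in case_of:
                child += "-ADP(" + case_of[cols[0]] + ")"
            counts[(relation, parent + "--" + child)] += 1
    return counts
```

Exact agreement with the published counts is not guaranteed, since the page may resolve adpositions or verb forms differently from this sketch.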
Reflexive Verbs
- This corpus contains 8 lemmas that occur at least once with an expl:pv child. Examples: fou m', enflammez vous, fou m, fous m, fous m', goure se, occupez vous, résume se
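
The reflexive entries above can be collected by pairing each `expl:pv` dependent with its head. The following is a minimal sketch, assuming that the listed examples are the head lemma followed by the lemma of the `expl:pv` child. This is a reading of the examples, not a documented format, and `reflexive_lemmas` is an illustrative name.

```python
def reflexive_lemmas(conllu_text):
    """Return sorted 'head_lemma child_lemma' strings for expl:pv dependents."""
    found = set()
    sentence = []

    def flush():
        # Resolve heads only once the whole sentence has been read.
        lemma_of = {cols[0]: cols[2] for cols in sentence}
        for cols in sentence:
            if cols[7] == "expl:pv" and cols[6] in lemma_of:
                found.add(lemma_of[cols[6]] + " " + cols[2])
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()                       # a blank line ends the sentence
            continue
        cols = line.split("\t")
        # Keep only syntactic words (plain integer IDs with ten columns).
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append(cols)
    flush()                               # last sentence without trailing blank
    return sorted(found)
```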