UD Cappadocian AMGiC
Language: Cappadocian (code: cpg)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.15 release.
The following people have contributed to making this treebank part of UD: Konstantinos Sampanis, Prokopis Prokopidis, Furkan Akkurt, Helin Binici.
Repository: UD_Cappadocian-AMGiC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Cappadocian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [konstantinos • sampanis (æt) yahoo • com]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually in non-UD style, automatically converted to UD |
| UPOS | annotated manually in non-UD style, automatically converted to UD |
| XPOS | annotated manually |
| Features | annotated manually in non-UD style, automatically converted to UD |
| Relations | annotated manually in non-UD style, automatically converted to UD |
Description
The “Asia Minor Greek in Contact” treebank (AMGiC, UD_AMGiC) is compiled from sentences that exhibit contact-induced morphosyntactic phenomena (CIMSP) resulting from contact between Greek and Turkish varieties in Anatolia and adjacent regions. The sentences are drawn from Asia Minor Greek (AMG) dialectal sources. In addition to the UD analysis, the AMGiC treebank provides information on the sociolinguistic context in which CIMSP arise.
AMGiC is a UD treebank of contact-induced morphosyntactic phenomena (CIMSP) in Inner Asia Minor Greek (AMG) that emerged under the influence of Turkish. Inner AMG comprises several interrelated but clearly distinct Cappadocian subdialects as well as the varieties of Silliot and Pharasiot (cf. Manolessou 2019); Cappadocian Greek (CG), Silliot and Pharasiot are in fact classified as distinct dialects (cf. Janse 2020: 203). However, since the ISO 639-3 code used for AMGiC is cpg, i.e. “Cappadocian Greek”, we employ CG as a pars pro toto designation for all Inner AMG varieties.
Apart from the annotation, AMGiC offers a detailed metadata section in which CIMSP are tagged (cf. Sampanis & Prokopidis 2021). The current version of AMGiC (as of v2.18) includes CIMSP attested in Silliot and in the Cappadocian Greek (CG) subdialect of Delmeso. Future versions of AMGiC will also include Pharasiot and other CG varieties.
Acknowledgments
This work was supported by COST Action CA21167 — Universality, diversity and idiosyncrasy in language technology (UniDive).
Statistics of UD Cappadocian AMGiC
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Clitic – Definite – Gender – Mood – Number – NumType – PartType – Person – Polarity – Poss – PronType – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – advmod:emph – amod – appos – aux – aux:q – case – cc – ccomp – conj – cop – csubj – dep – det – det:poss – discourse – expl – iobj – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 70 sentences, 817 tokens and 820 syntactic words.
- This corpus contains 147 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 20 types of words that contain both letters and punctuation. Examples: c', m', 'ne, apés', s', t'emélia, 'ni, 'ton, 'tun, (é)rχete, Ksevasám', as', dilimléisam', kiriós', op', put', yüsártsisam', és'kam', és'kin, ípsam'
- This corpus contains 3 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 3 types of multi-word tokens. Examples: domuškam, stu, tórχete.
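The counts above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that produced this page: `tokenization_stats`, `row` and `SAMPLE` are hypothetical names, and the sample uses placeholder forms rather than treebank data. It assumes the standard 10-column CoNLL-U layout, with multi-word tokens given as ID ranges and `SpaceAfter=No` recorded in the MISC column.

```python
def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = dict(sentences=0, tokens=0, words=0, mwts=0, mwt_words=0, no_space_after=0)
    in_sentence = False
    covered_until = 0  # highest word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():          # blank line closes a sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        cols = line.split("\t")
        word_id, misc = cols[0], cols[9].split("|")
        if "." in word_id:            # empty node: neither token nor word
            continue
        if "-" in word_id:            # multi-word token range such as 1-2
            start, end = map(int, word_id.split("-"))
            stats["mwts"] += 1
            stats["mwt_words"] += end - start + 1
            covered_until = end
        else:
            stats["words"] += 1
            if int(word_id) <= covered_until:
                continue              # word inside a multi-word token: not a surface token
        stats["tokens"] += 1
        if "SpaceAfter=No" in misc:
            stats["no_space_after"] += 1
    return stats


def row(word_id, form, misc="_"):
    # Build a 10-column CoNLL-U line; columns not needed here are left as "_".
    return "\t".join([word_id, form, "_", "_", "_", "_", "_", "_", "_", misc])


# Placeholder sample (not treebank data): one multi-word token "ab" = "a" + "b".
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "ab"), row("1", "a"), row("2", "b"),
    row("3", "c", "SpaceAfter=No"), row("4", "."),
    "",
    "# sent_id = toy-2",
    row("1", "d"), row("2", "."),
    "",
])

print(tokenization_stats(SAMPLE))
```

The reported figures are mutually consistent: 3 multi-word tokens of 2 syntactic words each account for the difference between 817 tokens and 820 words, and 147 of 817 tokens is about 18%.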
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 12 word types tagged as particles (PART): Ep, as, de, den, dom, dén, dε, mi, na, re, ren, či
- This corpus contains 20 lemmas tagged as pronouns (PRON): (e)tútus, (e)γó, Ešís, _, cínus, do, ekínos, emís, esí, eγó, kaneís, ne, o, ra, ro, su, to, táre, óči, óčis
- This corpus contains 10 lemmas tagged as determiners (DET): (o), (ο), Etó, o, tiás, tu, téna, énas, ís, χer
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: o
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): mi, na, se, ímu
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: ímu
- There are 2 (de)verbal forms:
- Fin
- AUX: nde, ne, 'ne, éni, 'ni, 'ton, 'tun, se, índe, íse
- VERB: laí, leχ, eksévin, gréviz, kásun, qazánǰisi, sikoθún, éršiti, éχu, írten
- Part
- VERB: kimizméni
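The UPOS coverage reported at the top of this list (16 of 17 tags, with SYM unused) is a set difference against the universal tag inventory. A minimal sketch, with `UNIVERSAL_UPOS`, `USED` and `upos_coverage` as hypothetical names chosen for illustration:

```python
# The 17 universal part-of-speech tags defined by UD.
UNIVERSAL_UPOS = frozenset(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ SYM VERB X".split()
)


def upos_coverage(used_tags):
    """Return (number of tags used, sorted list of universal tags not used)."""
    used = set(used_tags)
    unknown = used - UNIVERSAL_UPOS
    if unknown:
        raise ValueError(f"not universal UPOS tags: {sorted(unknown)}")
    return len(used), sorted(UNIVERSAL_UPOS - used)


# Tags listed for this corpus on this page.
USED = "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split()

print(upos_coverage(USED))
```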
Nominal Features
- Gender
- Fem
- ADJ: kalí, meγáli, yerasméni
- DET: čin, či, tes
- NOUN: kóri, Dunyá, enéka, mána, Güzelidyú, Güzelí, ciriás, cirjás, enékan, góri
- NUM: Tris, triz
- PRON: či, čis, ǰis, ekín, zin, zis, ǰi
- VERB-Part: kimizméni
- Masc
- ADJ: A, fikirsúzis
- DET: tu, to, tus, éna
- NOUN: pará, patisáχu, staχtiǰís, vaván, Aγás, Mándis, Qujumǰís, Vavás, astenár, gjavúriri
- PRON: du, tútus, tu, Kaneís, cínus, do, su, tus, tútunu
- PROPN: Yóryis
- Neut
- ADJ: mikró, polá, bašká, kaló, mávra, yavanúδia, ála, áspra, úla, χošá
- DET: to, éna, ta, tu, so, sa, t, Etó, da, tiyá
- NOUN: peδí, pará, psomí, dergizmú, kenér, neró, spíči, t'emélia, Psémata, alísia
- PRON: to, da, ta, do, Τúta, δa
- Number
- Plur
- ADJ: polá, mávra, yavanúδia, ála, áspra, úla
- AUX-Fin: nde, índe
- DET: ta, sa, tes, tus, čin
- NOUN: pará, Psémata, alísia, alóγata, gjavúriri, güzelmá, ksíla, líres, maχéria, méres
- NUM: Tris, triz
- PRON: emís, mas, sas, Ešís, más, tun, tus, Τúta
- VERB-Fin: kásun, sikoθún, Ksevasám', dilimléisam', drúte, fam, férum, ipúmi, istedízete, kasinonǰískaši
- Sing
- ADJ: ko, mikró, A, bašká, fikirsúzis, kalí, kaló, meγáli, yerasméni, χošá
- AUX-Fin: ne, 'ne, éni, 'ni, 'ton, 'tun, se, íse, ísu
- DET: to, tu, éna, so, čin, t, či, Etó, da, ta
- NOUN: peδí, kóri, pará, psomí, Dunyá, dergizmú, enéka, kenér, mána, neró
- PRON: to, du, su, tu, da, mu, či, do, ta, tútus
- PROPN: Yóryis
- VERB: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Fin: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Part: kimizméni
- Case
- Acc
- ADJ: polá, bašká, mikró, ála, úla
- DET: to, tu, éna, ta, čin, so, sa, t, či, da
- NOUN: pará, peδí, psomí, kenér, spíči, vaván, Psémata, alísia, alóγata, cüréi
- NUM: Tris, triz
- PRON: to, da, do, ta, sas, se, či, m, me, s'
- VERB-Part: kimizméni
- Gen
- DET: tu, so
- NOUN: dergizmú, patisáχu, Dunyá, Güzelidyú, cirjás, insanjú, klišás, korízju, zuliás, ölzüjü
- PRON: du, tu, mu, su, čis, mas, ǰis, m, m', s'
- Nom
- ADJ: kalí, kaló, mikró, mávra, yavanúδia, yerasméni, áspra, χošá
- DET: to, éna, ta, Etó
- NOUN: kóri, peδí, staχtiǰís, t'emélia, Aγás, Dunyá, Güzelí, Mándis, Qujumǰís, Vavás
- PRON: tútus, emís, ši, Ešís, Kaneís, cínus, ekín, esí, eγó, su
- Voc
- ADJ: A, fikirsúzis
- NOUN: mána, Peδí, javrú, teté, ádras, ǰaním
- PROPN: Yóryis
- Definite
- Def
- DET: to, tu, ta, so, čin, sa, t, či, da, tes
- Ind
- DET: éna, téna, tóna
Degree and Polarity
- Polarity
- Neg
- PART: den, de, dom, dén, dε, re
Verbal Features
- Aspect
- Imp
- AUX-Fin: éni, 'ne, 'ni, 'ton, 'tun, ne, se, íse
- VERB-Fin: laí, gréviz, leχ, éχu, (é)rχete, Rotá, cimáse, drúte, eršinónǰiska, eršístiniz
- Perf
- VERB: eksévin, kásun, qazánǰisi, sikoθún, írten, írtis, Ksevasám', Ránsen, baγərtzísi, báris
- VERB-Fin: eksévin, kásun, qazánǰisi, sikoθún, írten, írtis, Ksevasám', Ránsen, baγərtzísi, báris
- VERB-Part: kimizméni
- Mood
- Imp
- VERB-Fin: pe, pike, skáma, ápar
- Ind
- AUX-Fin: nde, ne, 'ne, éni, 'ni, 'ton, 'tun, se, índe, íse
- VERB: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Fin: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- Sub
- VERB: kásun, sikoθún, baγərtzísi, báris, düšünǰísu, erzí, fam, forósu, fáγu, galatzépši
- VERB-Fin: kásun, sikoθún, baγərtzísi, báris, düšünǰísu, erzí, fam, forósu, fáγu, galatzépši
- Tense
- Fut
- VERB-Fin: eleísis, pári, páru, vlépis
- Past
- AUX-Fin: 'ton, 'tun
- VERB: eksévin, qazánǰisi, írten, írtis, Ksevasám', Ránsen, dilimléisam', déken, emóšh, eršinónǰiska
- VERB-Fin: eksévin, qazánǰisi, írten, írtis, Ksevasám', Ránsen, dilimléisam', déken, eršinónǰiska, estáθin
- Pres
- AUX-Fin: nde, ne, 'ne, éni, 'ni, se, índe, íse, ísu
- VERB: laí, gréviz, leχ, éršiti, éχu, (é)rχete, Rotá, cimáse, drúte, düšünǰísu
- VERB-Fin: laí, gréviz, leχ, éršiti, éχu, (é)rχete, Rotá, cimáse, drúte, düšünǰísu
- Voice
- Act
- VERB-Fin: laí, qazánǰisi, éχu, Ksevasám', Rotá, baγərtzísi, báris, dilimléisam', düšünǰísu, eleísis
- Pass
- AUX-Fin: 'ne, 'ni, éni
- VERB-Fin: sikoθún, éršiti, írtis, cimáse, eršinónǰiska, eršístiniz, estáθin, kasinonǰískaši, kimáti, pénišken
- VERB-Part: kimizméni
Pronouns, Determiners, Quantifiers
- PronType
- Art
- DET: to, tu, éna, ta, so, čin, sa, t, či, da
- Dem
- DET: Etó
- PRON: ro, tútus, ra, cínus, ekín, tútunu, Τúta
- Ind
- DET: χer
- PRON: táre, Kaneís
- Int
- DET: tiyá
- PRON: ne
- Prs
- PRON: to, du, su, tu, da, mu, či, do, ta, čis
- Rel
- PRON: to, óči, óčis
- NumType
- Card
- NUM: seránda, enyá, tría, δyo, Tris, triz
- Poss
- Yes
- PRON: du, mu, su, čis, mas, tu, ǰis, m, m', más
- Person
- 1
- PRON: mu, emís, m, mas, eγó, m', me, más, ši, γo
- VERB: éχu, Ksevasám', dilimléisam', düšünǰísu, emóšh, fam, filáto, forósu, fáγu, férum
- VERB-Fin: éχu, Ksevasám', dilimléisam', düšünǰísu, fam, filáto, forósu, fáγu, férum, ipúmi
- 2
- AUX-Fin: se, íse, ísu
- PRON: su, s', sas, se, Ešís, esí, séna, ši
- VERB: gréviz, báris, cimáse, drúte, eleísis, istedízete, klóθete, ksévreté, les, pe
- VERB-Fin: gréviz, báris, cimáse, drúte, eleísis, istedízete, klóθete, ksévreté, les, pe
- 3
- AUX-Fin: nde, ne, 'ne, éni, 'ni, 'ton, 'tun, índe
- PRON: to, du, tu, da, či, do, ta, tútus, čis, ǰis
- VERB-Fin: laí, leχ, eksévin, kásun, qazánǰisi, sikoθún, éršiti, írten, (é)rχete, Rotá
Other Features
- Clitic
- Yes
- PRON: du, tu, da, mu, su, či, do, m, ta, to
- PartType
- Neg
- PART: re
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: ímu.
- This corpus uses 2 lemmas as auxiliaries (aux). Examples: na, se.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN-Nom (5)
- VERB--PRON (1)
- VERB-Fin--NOUN-Acc (1)
- VERB-Fin--NOUN-Acc-ADP(kadár) (1)
- VERB-Fin--NOUN-Nom (22)
- VERB-Fin--PRON-Nom (14)
- obj
- VERB--NOUN (1)
- VERB--NOUN-Acc (4)
- VERB--NOUN-Nom (1)
- VERB--PRON (1)
- VERB--PRON-Acc (2)
- VERB-Fin--NOUN (1)
- VERB-Fin--NOUN-Acc (33)
- VERB-Fin--NOUN-Nom (1)
- VERB-Fin--PRON (1)
- VERB-Fin--PRON-Acc (21)
- VERB-Fin--PRON-Gen (1)
- iobj
- VERB--PRON-Acc (1)
- VERB-Fin--NOUN-Acc (1)
- VERB-Fin--NOUN-Acc-ADP(s(e)) (1)
- VERB-Fin--PRON-Acc (2)
- VERB-Fin--PRON-Gen (3)
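The pattern labels above can be read as `PARENT-VerbForm--CHILD-Case-ADP(adposition)`. The sketch below shows one way such counts could be derived from a parsed sentence. It is an assumption about how the labels are built, not the script behind this page: the names `feature`, `argument_patterns` and `TOY` are hypothetical, the adposition is taken from the lemma of an ADP attached as `case`, and the toy rows use placeholder forms rather than treebank data.

```python
from collections import Counter


def feature(feats, name):
    """Return the value of `name` in a FEATS string such as 'Case=Acc|Number=Sing'."""
    for pair in feats.split("|"):
        key, _, value = pair.partition("=")
        if key == name:
            return value
    return None


def argument_patterns(rows, relations=("nsubj", "obj", "iobj")):
    """Count parent--child patterns for one sentence given as lists of 10 columns."""
    words = {r[0]: r for r in rows if r[0].isdigit()}  # skip ranges and empty nodes
    adpositions = {}  # head ID -> lemmas of ADP children attached as `case`
    for r in words.values():
        if r[3] == "ADP" and r[7] == "case":
            adpositions.setdefault(r[6], []).append(r[2])
    counts = Counter()
    for r in words.values():
        parent = words.get(r[6])
        relation = r[7].split(":")[0]  # ignore relation subtypes
        if parent is None or relation not in relations:
            continue
        if parent[3] != "VERB" or r[3] not in ("NOUN", "PRON"):
            continue
        left = "-".join(x for x in ("VERB", feature(parent[5], "VerbForm")) if x)
        right = "-".join(x for x in (r[3], feature(r[5], "Case")) if x)
        right += "".join(f"-ADP({lemma})" for lemma in adpositions.get(r[0], []))
        counts[relation, f"{left}--{right}"] += 1
    return counts


# Placeholder sentence: columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
TOY = [
    ["1", "w1", "l1", "NOUN", "_", "Case=Nom", "2", "nsubj", "_", "_"],
    ["2", "w2", "l2", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"],
    ["3", "w3", "l3", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "w4", "l4", "NOUN", "_", "Case=Acc", "2", "obj", "_", "_"],
    ["5", "w5", "l5", "PRON", "_", "_", "2", "iobj", "_", "_"],
]

for (relation, pattern), n in sorted(argument_patterns(TOY).items()):
    print(relation, pattern, n)
```

Under this reading, a label without a VerbForm or Case suffix (for example `VERB--PRON`) simply means that the corresponding feature is not annotated on that word.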
Relations Overview
- This corpus uses 4 relation subtypes: acl:relcl, advmod:emph, aux:q, det:poss
- The following 9 relation types are not used in this corpus at all: dislocated, clf, fixed, flat, compound, list, orphan, goeswith, reparandum
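Both figures in this overview follow from the relation labels listed earlier on this page: subtypes are the labels containing a colon, and unused types are the universal relations whose base label never occurs. A minimal sketch, with `UNIVERSAL_RELATIONS`, `USED_RELATIONS` and `relations_overview` as hypothetical names:

```python
# The 37 universal dependency relations defined by UD.
UNIVERSAL_RELATIONS = frozenset("""
    acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
    dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
    nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels listed for this corpus on this page.
USED_RELATIONS = """
    acl acl:relcl advcl advmod advmod:emph amod appos aux aux:q case cc ccomp
    conj cop csubj dep det det:poss discourse expl iobj mark nmod nsubj nummod
    obj obl parataxis punct root vocative xcomp
""".split()


def relations_overview(used_labels):
    """Return (sorted subtypes in use, sorted universal relations never used)."""
    subtypes = sorted(label for label in used_labels if ":" in label)
    base_types = {label.split(":")[0] for label in used_labels}
    unused = sorted(UNIVERSAL_RELATIONS - base_types)
    return subtypes, unused


subtypes, unused = relations_overview(USED_RELATIONS)
print(len(subtypes), subtypes)
print(len(unused), unused)
```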