This page pertains to UD version 2.

UD Cappadocian AMGiC

Language: Cappadocian (code: cpg)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.15 release.

The following people have contributed to making this treebank part of UD: Konstantinos Sampanis, Prokopis Prokopidis, Furkan Akkurt, Helin Binici.

Repository: UD_Cappadocian-AMGiC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18

License: CC BY-SA 4.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Cappadocian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [konstantinos • sampanis (æt) yahoo • com]. Development of the treebank happens outside the UD repository, so bugs must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	annotated manually in non-UD style, automatically converted to UD
UPOS	annotated manually in non-UD style, automatically converted to UD
XPOS	annotated manually
Features	annotated manually in non-UD style, automatically converted to UD
Relations	annotated manually in non-UD style, automatically converted to UD

Description

The “Asia Minor Greek in Contact” treebank (AMGiC, UD_AMGiC) is compiled from sentences that exhibit contact-induced morphosyntactic phenomena (CIMSP) resulting from contact between Greek and Turkish varieties in Anatolia and adjacent regions. The sentences are drawn from Asia Minor Greek (AMG) dialectal sources. In addition to the UD analysis, the AMGiC treebank provides information on the sociolinguistic context in which CIMSP arise.

AMGiC is a UD treebank of contact-induced morphosyntactic phenomena (CIMSP) in Inner Asia Minor Greek (AMG) that emerged under the influence of Turkish. Inner AMG comprises several interrelated but clearly distinct Cappadocian subdialects as well as the varieties of Silliot and Pharasiot (cf. Manolessou 2019). Cappadocian Greek (CG), Silliot and Pharasiot are in fact classified as distinct dialects (cf. Janse 2020: 203). However, since the ISO 639-3 code used for AMGiC is cpg, i.e. “Cappadocian Greek”, we use CG as a pars pro toto designation for all Inner AMG varieties.

Apart from the annotation, AMGiC offers a detailed metadata section in which CIMSP are tagged (cf. Sampanis & Prokopidis 2021). The current version of AMGiC (as of v2.18) includes CIMSP attested in Silliot and in the CG subdialect of Delmeso. Future versions of AMGiC will also include Pharasiot and other CG varieties.

Acknowledgments

This work was supported by COST Action CA21167 — Universality, diversity and idiosyncrasy in language technology (UniDive).

Statistics of UD Cappadocian AMGiC

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Aspect – Case – Clitic – Definite – Gender – Mood – Number – NumType – PartType – Person – Polarity – Poss – PronType – Tense – VerbForm – Voice

Relations

acl – acl:relcl – advcl – advmod – advmod:emph – amod – appos – aux – aux:q – case – cc – ccomp – conj – cop – csubj – dep – det – det:poss – discourse – expl – iobj – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 70 sentences, 817 tokens and 820 syntactic words.

This corpus contains 147 tokens (18%) that are not followed by a space.

This corpus does not contain words with spaces.

This corpus contains 20 types of words that contain both letters and punctuation. Examples: c', m', 'ne, apés', s', t'emélia, 'ni, 'ton, 'tun, (é)rχete, Ksevasám', as', dilimléisam', kiriós', op', put', yüsártsisam', és'kam', és'kin, ípsam'
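
The count of word types mixing letters and punctuation can be approximated with Unicode general categories. The following is a minimal sketch, not the script that produced the statistics above: the function name `has_letters_and_punctuation` is hypothetical, and treating every `L*` category as a letter and every `P*` category as punctuation is an assumption about the criterion.

```python
import unicodedata

def has_letters_and_punctuation(form: str) -> bool:
    """True if the form contains at least one letter and one punctuation mark."""
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct

# Forms taken from the examples above, plus one plain word for contrast.
forms = ["c'", "'ne", "t'emélia", "(é)rχete", "kóri"]
mixed = [f for f in forms if has_letters_and_punctuation(f)]
```

Here `mixed` keeps the four forms with an apostrophe or parentheses and drops the plain word kóri.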

This corpus contains 3 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
There are 3 types of multi-word tokens. Examples: domuškam, stu, tórχete.
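
The difference between tokens and syntactic words comes from multi-word tokens, which occupy one surface token but several word lines in a CoNLL-U file. The sketch below shows one way such counts could be derived, assuming standard 10-column CoNLL-U input. The helper names `row` and `count_conllu` are hypothetical, and the sample sentence uses placeholder forms rather than treebank data.

```python
def row(tok_id, form, misc="_"):
    """Build one 10-column CoNLL-U line; unused columns are '_' placeholders."""
    return "\t".join([tok_id, form] + ["_"] * 7 + [misc])

def count_conllu(text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space": 0}
    covered_until = 0   # highest word ID covered by the current multi-word token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():            # a blank line ends a sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        if "." in tok_id:               # empty node: neither token nor word
            continue
        is_range = "-" in tok_id
        if is_range:
            stats["mwts"] += 1
            covered_until = int(tok_id.split("-")[1])
        else:
            stats["words"] += 1
        # A surface token is a range line or a word outside any range.
        if is_range or int(tok_id) > covered_until:
            stats["tokens"] += 1
            if "SpaceAfter=No" in misc.split("|"):
                stats["no_space"] += 1
    return stats

# Placeholder sentence (not treebank data): "ab" is a multi-word token
# split into the words "a" and "b".
sample = "\n".join([
    "# text = ab c.",
    row("1-2", "ab"),
    row("1", "a"),
    row("2", "b"),
    row("3", "c", "SpaceAfter=No"),
    row("4", "."),
    "",
])
stats = count_conllu(sample)

# Consistency check on the figures reported above: each of the 3 multi-word
# tokens spans 2 words, so it adds one word beyond its single token.
reported_words = 817 + 3 * (2 - 1)
no_space_share = round(100 * 147 / 817)
```

The same relation holds for the reported figures: 817 tokens plus one extra word for each of the 3 two-word multi-word tokens gives 820 syntactic words, and 147 of 817 tokens rounds to 18%.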

Morphology

Nominal Features

Gender

Fem
- ADJ: kalí, meγáli, yerasméni
- DET: čin, či, tes
- NOUN: kóri, Dunyá, enéka, mána, Güzelidyú, Güzelí, ciriás, cirjás, enékan, góri
- NUM: Tris, triz
- PRON: či, čis, ǰis, ekín, zin, zis, ǰi
- VERB-Part: kimizméni

Masc
- ADJ: A, fikirsúzis
- DET: tu, to, tus, éna
- NOUN: pará, patisáχu, staχtiǰís, vaván, Aγás, Mándis, Qujumǰís, Vavás, astenár, gjavúriri
- PRON: du, tútus, tu, Kaneís, cínus, do, su, tus, tútunu
- PROPN: Yóryis

Neut
- ADJ: mikró, polá, bašká, kaló, mávra, yavanúδia, ála, áspra, úla, χošá
- DET: to, éna, ta, tu, so, sa, t, Etó, da, tiyá
- NOUN: peδí, pará, psomí, dergizmú, kenér, neró, spíči, t'emélia, Psémata, alísia
- PRON: to, da, ta, do, Τúta, δa

Number

Plur
- ADJ: polá, mávra, yavanúδia, ála, áspra, úla
- AUX-Fin: nde, índe
- DET: ta, sa, tes, tus, čin
- NOUN: pará, Psémata, alísia, alóγata, gjavúriri, güzelmá, ksíla, líres, maχéria, méres
- NUM: Tris, triz
- PRON: emís, mas, sas, Ešís, más, tun, tus, Τúta
- VERB-Fin: kásun, sikoθún, Ksevasám', dilimléisam', drúte, fam, férum, ipúmi, istedízete, kasinonǰískaši

Sing
- ADJ: ko, mikró, A, bašká, fikirsúzis, kalí, kaló, meγáli, yerasméni, χošá
- AUX-Fin: ne, 'ne, éni, 'ni, 'ton, 'tun, se, íse, ísu
- DET: to, tu, éna, so, čin, t, či, Etó, da, ta
- NOUN: peδí, kóri, pará, psomí, Dunyá, dergizmú, enéka, kenér, mána, neró
- PRON: to, du, su, tu, da, mu, či, do, ta, tútus
- PROPN: Yóryis
- VERB: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Fin: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Part: kimizméni

Case

Acc
- ADJ: polá, bašká, mikró, ála, úla
- DET: to, tu, éna, ta, čin, so, sa, t, či, da
- NOUN: pará, peδí, psomí, kenér, spíči, vaván, Psémata, alísia, alóγata, cüréi
- NUM: Tris, triz
- PRON: to, da, do, ta, sas, se, či, m, me, s'
- VERB-Part: kimizméni

Gen
- DET: tu, so
- NOUN: dergizmú, patisáχu, Dunyá, Güzelidyú, cirjás, insanjú, klišás, korízju, zuliás, ölzüjü
- PRON: du, tu, mu, su, čis, mas, ǰis, m, m', s'

Nom
- ADJ: kalí, kaló, mikró, mávra, yavanúδia, yerasméni, áspra, χošá
- DET: to, éna, ta, Etó
- NOUN: kóri, peδí, staχtiǰís, t'emélia, Aγás, Dunyá, Güzelí, Mándis, Qujumǰís, Vavás
- PRON: tútus, emís, ši, Ešís, Kaneís, cínus, ekín, esí, eγó, su

Voc
- ADJ: A, fikirsúzis
- NOUN: mána, Peδí, javrú, teté, ádras, ǰaním
- PROPN: Yóryis

Definite

Def
- DET: to, tu, ta, so, čin, sa, t, či, da, tes

Ind
- DET: éna, téna, tóna

Degree and Polarity

Polarity

Neg
- PART: den, de, dom, dén, dε, re

Verbal Features

Aspect

Imp
- AUX-Fin: éni, 'ne, 'ni, 'ton, 'tun, ne, se, íse
- VERB-Fin: laí, gréviz, leχ, éχu, (é)rχete, Rotá, cimáse, drúte, eršinónǰiska, eršístiniz

Perf
- VERB: eksévin, kásun, qazánǰisi, sikoθún, írten, írtis, Ksevasám', Ránsen, baγərtzísi, báris
- VERB-Fin: eksévin, kásun, qazánǰisi, sikoθún, írten, írtis, Ksevasám', Ránsen, baγərtzísi, báris
- VERB-Part: kimizméni

Mood

Imp
- VERB-Fin: pe, pike, skáma, ápar

Ind
- AUX-Fin: nde, ne, 'ne, éni, 'ni, 'ton, 'tun, se, índe, íse
- VERB: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete
- VERB-Fin: laí, leχ, eksévin, gréviz, qazánǰisi, éršiti, éχu, írten, írtis, (é)rχete

Sub
- VERB: kásun, sikoθún, baγərtzísi, báris, düšünǰísu, erzí, fam, forósu, fáγu, galatzépši
- VERB-Fin: kásun, sikoθún, baγərtzísi, báris, düšünǰísu, erzí, fam, forósu, fáγu, galatzépši

Tense

Fut
- VERB-Fin: eleísis, pári, páru, vlépis

Past
- AUX-Fin: 'ton, 'tun
- VERB: eksévin, qazánǰisi, írten, írtis, Ksevasám', Ránsen, dilimléisam', déken, emóšh, eršinónǰiska
- VERB-Fin: eksévin, qazánǰisi, írten, írtis, Ksevasám', Ránsen, dilimléisam', déken, eršinónǰiska, estáθin

Pres
- AUX-Fin: nde, ne, 'ne, éni, 'ni, se, índe, íse, ísu
- VERB: laí, gréviz, leχ, éršiti, éχu, (é)rχete, Rotá, cimáse, drúte, düšünǰísu
- VERB-Fin: laí, gréviz, leχ, éršiti, éχu, (é)rχete, Rotá, cimáse, drúte, düšünǰísu

Voice

Act
- VERB-Fin: laí, qazánǰisi, éχu, Ksevasám', Rotá, baγərtzísi, báris, dilimléisam', düšünǰísu, eleísis

Pass
- AUX-Fin: 'ne, 'ni, éni
- VERB-Fin: sikoθún, éršiti, írtis, cimáse, eršinónǰiska, eršístiniz, estáθin, kasinonǰískaši, kimáti, pénišken
- VERB-Part: kimizméni

Pronouns, Determiners, Quantifiers

PronType

Art
- DET: to, tu, éna, ta, so, čin, sa, t, či, da

Dem
- DET: Etó
- PRON: ro, tútus, ra, cínus, ekín, tútunu, Τúta

Ind
- DET: χer
- PRON: táre, Kaneís

Int
- DET: tiyá
- PRON: ne

Prs
- PRON: to, du, su, tu, da, mu, či, do, ta, čis

Rel
- PRON: to, óči, óčis

NumType

Card
- NUM: seránda, enyá, tría, δyo, Tris, triz

Poss

Yes
- PRON: du, mu, su, čis, mas, tu, ǰis, m, m', más

Person

1
- PRON: mu, emís, m, mas, eγó, m', me, más, ši, γo
- VERB: éχu, Ksevasám', dilimléisam', düšünǰísu, emóšh, fam, filáto, forósu, fáγu, férum
- VERB-Fin: éχu, Ksevasám', dilimléisam', düšünǰísu, fam, filáto, forósu, fáγu, férum, ipúmi

2
- AUX-Fin: se, íse, ísu
- PRON: su, s', sas, se, Ešís, esí, séna, ši
- VERB: gréviz, báris, cimáse, drúte, eleísis, istedízete, klóθete, ksévreté, les, pe
- VERB-Fin: gréviz, báris, cimáse, drúte, eleísis, istedízete, klóθete, ksévreté, les, pe

3
- AUX-Fin: nde, ne, 'ne, éni, 'ni, 'ton, 'tun, índe
- PRON: to, du, tu, da, či, do, ta, tútus, čis, ǰis
- VERB-Fin: laí, leχ, eksévin, kásun, qazánǰisi, sikoθún, éršiti, írten, (é)rχete, Rotá

Other Features

Clitic
- Yes
  - PRON: du, tu, da, mu, su, či, do, m, ta, to

PartType
- Neg
  - PART: re
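
The feature listings above are aggregated from the FEATS column of the CoNLL-U files, where features are written as `Name=Value` pairs joined by `|`. A minimal parsing sketch follows. The function name `parse_feats` is hypothetical, and the FEATS string for the particle re is an assumption that combines its PartType and Polarity listings rather than a value quoted from the treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Case=Acc|Number=Sing' into a dict."""
    if feats == "_":                    # "_" marks an empty feature set
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# Assumed FEATS value for the negative particle "re" (see lead-in).
feats = parse_feats("PartType=Neg|Polarity=Neg")
empty = parse_feats("_")
```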

Syntax

Auxiliary Verbs and Copula

This corpus uses 1 lemma as a copula (cop): ímu.

This corpus uses 2 lemmas as auxiliaries (aux). Examples: na, se.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN-Nom (5)
- VERB--PRON (1)
- VERB-Fin--NOUN-Acc (1)
- VERB-Fin--NOUN-Acc-ADP(kadár) (1)
- VERB-Fin--NOUN-Nom (22)
- VERB-Fin--PRON-Nom (14)

obj
- VERB--NOUN (1)
- VERB--NOUN-Acc (4)
- VERB--NOUN-Nom (1)
- VERB--PRON (1)
- VERB--PRON-Acc (2)
- VERB-Fin--NOUN (1)
- VERB-Fin--NOUN-Acc (33)
- VERB-Fin--NOUN-Nom (1)
- VERB-Fin--PRON (1)
- VERB-Fin--PRON-Acc (21)
- VERB-Fin--PRON-Gen (1)

iobj
- VERB--PRON-Acc (1)
- VERB-Fin--NOUN-Acc (1)
- VERB-Fin--NOUN-Acc-ADP(s(e)) (1)
- VERB-Fin--PRON-Acc (2)
- VERB-Fin--PRON-Gen (3)
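
The pattern lists above can be tallied to see how case marking is distributed for a given relation. The sketch below reads each pattern as parent, `--`, child, with the child's Case value as its second hyphen-separated part; that reading of the notation is an assumption, and the names `PATTERN` and `tally` are hypothetical. Children listed without a Case value are grouped as `unmarked`.

```python
import re

# Patterns and counts copied from the nsubj listing above.
NSUBJ = """\
VERB--NOUN-Nom (5)
VERB--PRON (1)
VERB-Fin--NOUN-Acc (1)
VERB-Fin--NOUN-Acc-ADP(kadár) (1)
VERB-Fin--NOUN-Nom (22)
VERB-Fin--PRON-Nom (14)
"""

PATTERN = re.compile(r"^(?P<parent>.+?)--(?P<child>.+) \((?P<count>\d+)\)$")

def tally(listing):
    """Return (total, by_case), where by_case maps the child's Case to a count."""
    total = 0
    by_case = {}
    for line in listing.splitlines():
        match = PATTERN.match(line.strip())
        if not match:
            continue
        count = int(match.group("count"))
        parts = match.group("child").split("-")
        case = parts[1] if len(parts) > 1 else "unmarked"
        total += count
        by_case[case] = by_case.get(case, 0) + count
    return total, by_case

total, by_case = tally(NSUBJ)
```

On the nsubj listing this yields 44 subjects in total, 41 of them nominative.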

Relations Overview

This corpus uses 4 relation subtypes: acl:relcl, advmod:emph, aux:q, det:poss
The following 9 relation types are not used in this corpus at all: dislocated, clf, fixed, flat, compound, list, orphan, goeswith, reparandum
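
Both figures can be re-derived from the relation inventory listed under Relations. The sketch below is an illustrative check, assuming the 37 universal relations of UD v2 as the reference set; the variable names are hypothetical.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels copied from the Relations list of this treebank.
USED = """
acl acl:relcl advcl advmod advmod:emph amod appos aux aux:q case cc ccomp
conj cop csubj dep det det:poss discourse expl iobj mark nmod nsubj nummod
obj obl parataxis punct root vocative xcomp
""".split()

# A subtype is any label of the form universal:extension.
subtypes = sorted(label for label in USED if ":" in label)

# Strip subtypes to find which universal relations occur at all.
base_used = {label.split(":")[0] for label in USED}
unused = sorted(UNIVERSAL - base_used)
```

With these inputs, `subtypes` holds the 4 subtyped labels and `unused` holds the 9 universal relations that never occur in the corpus.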