UD Kyrgyz TueCL
Language: Kyrgyz (code: ky)
Family: Turkic
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Bermet Chontaeva, Çağrı Çöltekin.
Repository: UD_Kyrgyz-TueCL
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Kyrgyz-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [bermet • chontaeva (æt) student • uni-tuebingen • de, cagri • coeltekin (æt) uni-tuebingen • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
This is a small treebank of grammatical examples for Kyrgyz.
Kyrgyz-TueCL contains 145 sentences in total, including 20 Cairo sentences and about 100 sentences suggested by the UD Turkic Group. This treebank is part of the UD Turkic Treebank. Translations of all sentences are available in English, Turkish, and Azerbaijani.
Acknowledgments
We are deeply thankful to the UD Turkic Group and the Kyrgyz team (Jonathan North Washington, Aida Kasieva, Gulnura Dzumalieva, Aigul Tursunova, Meerim Ryspakova, and Aizat Kadyrbekova) for their informative weekly meetings and discussions and for all the support we have received.
Statistics of UD Kyrgyz TueCL
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Definite – Evident – Mood – Number – Number[psor] – Person – Person[psor] – PronType – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – advmod:emph – amod – appos – aux – case – cc – ccomp – compound – compound:lvc – compound:svc – conj – cop – csubj – det – discourse – fixed – flat – mark – nmod – nmod:poss – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:cau – obl:tmod – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 145 sentences, 972 tokens and 1001 syntactic words.
- This corpus contains 183 tokens (19%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus contains 29 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 19 types of multi-word tokens. Examples: бердиби, барбы, беле, жатабы, үйдѳгү, Дениздикинин, Окугандарын, ашкананыкын, бекен, бересиңби, жаттыңызбы, жокпу, келдиби, кичинекейби, сеникинен, текчесиндегилер, ѳрѳѳнбү, ѳткѳрүлѳбү, үйдѳбү.
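The counts above follow from how CoNLL-U encodes multi-word tokens: a range line such as `2-3` is one surface token, and the numbered lines it covers are syntactic words. Since every multi-word token here has two words, 972 tokens plus 29 multi-word tokens gives 1001 words. The sketch below shows one way such counts could be computed. The function name `conllu_counts`, the helper `row`, and the sample sentence are illustrative assumptions; the sample is hand-made, not taken from the treebank, and its word segmentation is only a guess.

```python
def row(tok_id, form, misc="_"):
    """Build a 10-column CoNLL-U line; columns not needed here are left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 7 + [misc])


# Hand-made sample, NOT taken from the treebank. The split of the
# multi-word token into two words is an assumption for illustration.
SAMPLE = "\n".join([
    "# text = Ал келдиби?",
    row("1", "Ал"),
    row("2-3", "келдиби", "SpaceAfter=No"),
    row("2", "келди"),
    row("3", "би"),
    row("4", "?"),
])


def conllu_counts(text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space": 0}
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        if not rows:
            continue
        counts["sentences"] += 1
        covered_until = 0  # last word ID covered by the current multi-word token
        for cols in rows:
            tok_id, misc = cols[0], cols[9]
            no_space = "SpaceAfter=No" in misc.split("|")
            if "." in tok_id:
                continue  # empty node: neither a token nor a syntactic word
            if "-" in tok_id:
                # Range line: one surface token covering several words.
                counts["mwts"] += 1
                counts["tokens"] += 1
                counts["no_space"] += no_space
                covered_until = int(tok_id.split("-")[1])
                continue
            counts["words"] += 1
            if int(tok_id) > covered_until:
                # Word outside any range: it is its own surface token.
                counts["tokens"] += 1
                counts["no_space"] += no_space
    return counts


print(conllu_counts(SAMPLE))
```

On the sample, the difference between words and tokens equals the number of multi-word tokens, which mirrors the relation between the three corpus-level figures above.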
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 7 word types tagged as particles (PART): б, би, бы, бү, пу, тургандыгы, ээ
- This corpus contains 12 lemmas tagged as pronouns (PRON): _, Питер, ал, алар, биз, бир, бул, ки, ким, мен, сен, эмне
- This corpus contains 6 lemmas tagged as determiners (DET): бардык, бул, кѳп, ошо, эч, ѳз
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: бул
- This corpus contains 11 lemmas tagged as auxiliaries (AUX): ал, бол, жат, жок, кал, кет, кой, окшо, тур, э, экен
- Out of the above, 6 lemmas occurred sometimes as AUX and sometimes as VERB: ал, бол, жат, кал, кет, окшо
- There are 4 (de)verbal forms:
  - Conv
    - VERB: Түнѳп
  - Fin
    - AUX: болчу, жаткан
    - VERB: такылдатыптыр, ѳткѳрүлѳ
  - Inf
    - VERB: жыйнап
  - Part
    - VERB: берген
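The Tags list above reports lemmas that occur under two different UPOS tags (PRON and DET, AUX and VERB). A minimal sketch of how such overlaps could be found from (lemma, UPOS) pairs follows. The function name `ambiguous_lemmas` and the input pairs are illustrative assumptions; the pairs are hand-made from lemmas mentioned in that list and are not an extract of the treebank.

```python
from collections import defaultdict


def ambiguous_lemmas(pairs, tag_a, tag_b):
    """Return the sorted lemmas that occur with both tag_a and tag_b."""
    tags_by_lemma = defaultdict(set)
    for lemma, upos in pairs:
        tags_by_lemma[lemma].add(upos)
    return sorted(lemma for lemma, tags in tags_by_lemma.items()
                  if tag_a in tags and tag_b in tags)


# Hand-made (lemma, UPOS) pairs for illustration only.
PAIRS = [
    ("бол", "AUX"), ("бол", "VERB"),
    ("э", "AUX"),
    ("жок", "AUX"),
    ("бул", "PRON"), ("бул", "DET"),
    ("ошо", "DET"),
]

print(ambiguous_lemmas(PAIRS, "AUX", "VERB"))
print(ambiguous_lemmas(PAIRS, "PRON", "DET"))
```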
Nominal Features
- Number
  - Sing
    - AUX-Fin: болчу, жаткан
    - PRON: ал
    - VERB-Fin: такылдатыптыр, ѳткѳрүлѳ
- Case
  - Abl
    - NOUN: ресторандан
  - Acc
    - NOUN: эшикти, үйдү
  - Gen
    - NOUN: үйдүн
  - Nom
    - NOUN: конок, тамак, буюртма, ээси
    - PRON: ал
- Definite
  - Def
    - NOUN: эшикти, үйдү, үйдүн
  - Ind
    - NOUN: буюртма
Degree and Polarity
Verbal Features
- Aspect
  - Imp
    - AUX-Fin: болчу
  - Perf
    - VERB-Conv: Түнѳп
    - VERB-Inf: жыйнап
- Mood
  - Ind
    - AUX-Fin: болчу, жаткан
    - VERB-Fin: такылдатыптыр, ѳткѳрүлѳ
- Tense
  - Past
    - AUX-Fin: болчу, жаткан
    - VERB-Fin: такылдатыптыр
    - VERB-Part: берген
  - Pres
    - VERB-Fin: ѳткѳрүлѳ
- Voice
  - Cau
    - VERB-Fin: такылдатыптыр
  - Pass
    - VERB-Fin: ѳткѳрүлѳ
- Evident
  - Nfh
    - VERB-Fin: такылдатыптыр
Pronouns, Determiners, Quantifiers
- PronType
  - Prs
    - PRON: ал
- Person
  - 3
    - AUX-Fin: болчу, жаткан
    - PRON: ал
    - VERB-Fin: такылдатыптыр, ѳткѳрүлѳ
- Number[psor]
  - Plur,Sing
    - NOUN: ээси
Other Features
- Person[psor]
  - 3
    - NOUN: ээси
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): э.
- This corpus uses 11 lemmas as auxiliaries (aux). Examples: жат, кал, экен, ал, бол, э, кой, жок, кет, тур, окшо.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (25)
- VERB--NOUN-Nom (1)
- VERB--PRON (16)
- VERB-Fin--NOUN-Nom (1)
- VERB-Inf--NOUN-Nom (1)
- obj
- VERB--NOUN (48)
- VERB--PRON (5)
- VERB-Fin--NOUN-Acc (1)
- VERB-Inf--NOUN-Acc (1)
- VERB-Part--NOUN-Nom (1)
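The parent--child patterns listed above can be tallied by looking up the UPOS tag of each word's head. The following is a rough sketch under stated assumptions: the function name `verb_argument_pairs`, the dictionary layout of a word, and the sample sentence are all hypothetical, and the deprel is matched exactly, so subtypes such as `nsubj:pass` are not folded into `nsubj`.

```python
from collections import Counter


def verb_argument_pairs(sentence, relations=("nsubj", "obj")):
    """Count relations whose parent is a VERB and whose child is a NOUN or PRON."""
    upos_by_id = {word["id"]: word["upos"] for word in sentence}
    counts = Counter()
    for word in sentence:
        parent_upos = upos_by_id.get(word["head"])  # None for the root (head 0)
        if (word["deprel"] in relations
                and parent_upos == "VERB"
                and word["upos"] in ("NOUN", "PRON")):
            counts[(word["deprel"], "VERB--" + word["upos"])] += 1
    return counts


# Hand-made sentence skeleton for illustration only (not from the treebank).
SENTENCE = [
    {"id": 1, "upos": "PRON", "head": 3, "deprel": "nsubj"},
    {"id": 2, "upos": "NOUN", "head": 3, "deprel": "obj"},
    {"id": 3, "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 4, "upos": "PUNCT", "head": 3, "deprel": "punct"},
]

for (relation, pattern), n in sorted(verb_argument_pairs(SENTENCE).items()):
    print(relation, pattern, n)
```

Summing the returned counters over all sentences would give corpus-level totals of the kind shown in the list.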
Relations Overview
- This corpus uses 9 relation subtypes: acl:relcl, advmod:emph, compound:lvc, compound:svc, nmod:poss, nsubj:outer, nsubj:pass, obl:cau, obl:tmod
- The following 8 relation types are not used in this corpus at all: iobj, expl, dislocated, clf, list, goeswith, reparandum, dep
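Both figures above can be cross-checked against the relation inventory listed under Relations earlier on this page. The sketch below does that check: subtypes are the labels containing a colon, and unused types are the universal relations whose base label never appears. The variable names are illustrative; the set of 37 universal relations is the UD v2 inventory.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations used in this treebank, copied from the Relations list on this page.
USED_RELATIONS = """
acl acl:relcl advcl advmod advmod:emph amod appos aux case cc ccomp compound
compound:lvc compound:svc conj cop csubj det discourse fixed flat mark nmod
nmod:poss nsubj nsubj:outer nsubj:pass nummod obj obl obl:cau obl:tmod orphan
parataxis punct root vocative xcomp
""".split()

# A subtype is any label of the form base:subtype.
subtypes = sorted(rel for rel in USED_RELATIONS if ":" in rel)

# A universal relation counts as used if it appears bare or as the base of a subtype.
used_bases = {rel.split(":")[0] for rel in USED_RELATIONS}
unused = sorted(UNIVERSAL_RELATIONS - used_bases)

print(len(subtypes), subtypes)
print(len(unused), unused)
```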