UD Kangri KDTB
Language: Kangri (code: xnr)
Family: Indo-European, Indic
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Shweta Chauhan, Shefali Saxena, Apoorva Jha, Philemon Daniel.
Repository: UD_Kangri-KDTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Kangri-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [shweta (æt) nith • ac • in, shefali (æt) nith • ac • in, apoorva • jha (æt) gmail • com, phildani7 (æt) nith • ac • in]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
The Kangri UD Treebank (KDTB) is part of the Universal Dependencies treebank project.
It consists of 2,249 tokens in 288 sentences, with a vocabulary of 1,108. Himachal Academy of Arts Culture and Languages, Shimla, Himachal Pradesh, India helped by providing annotators for the Universal Dependencies tagging. The KDTB data contains syntactic annotation according to a dependency-constituency schema, as well as morphological tags. XPOS is annotated according to the Bureau of Indian Standards (BIS) part-of-speech (POS) tagset.
Acknowledgments
- Mr. Bhupender Bhupi
- Dr. Rajeev Kumar Trigarti
Statistics of UD Kangri KDTB
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Relations
advcl – advmod – amod – aux – aux:pass – case – cc – ccomp – compound – conj – cop – dep – det – discourse – iobj – mark – nmod – nsubj – nummod – obj – obl – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 288 sentences and 2514 tokens.
- This corpus contains 288 tokens (11%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
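Counts like the ones above can be recomputed from a treebank's CoNLL-U files. The following is a minimal sketch, not the script that produced this page: the function name `corpus_stats` and the two-sentence `SAMPLE` are hypothetical, and the sketch simplifies by counting only integer-ID word lines (multiword token ranges and empty nodes are skipped).

```python
# Hypothetical two-sentence CoNLL-U sample; the forms are placeholders, not KDTB data.
SAMPLE = """\
# sent_id = demo-1
1\tw1\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tw2\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_

# sent_id = demo-2
1\tw3\t_\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tw4\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def corpus_stats(conllu_text):
    """Return (sentences, tokens, tokens not followed by a space)."""
    sentences = tokens = no_space_after = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False          # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue                     # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue                     # skip ranges (1-2), empty nodes (1.1), malformed lines
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        # MISC is the tenth column; attributes are separated by "|"
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space_after += 1
    return sentences, tokens, no_space_after


sentences, tokens, no_space = corpus_stats(SAMPLE)
print(sentences, tokens, no_space)
```

With the figures reported above, 288 of 2514 tokens is roughly 11%, which matches the stated percentage.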
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 17 word types tagged as particles (PART): इसा, ऊञा, कदेया, कैस, कैह्जो, कोई, क्या, ता, तां, न, नी, प्रति, भर, भी, मत, लगभग, ही
- This corpus contains 1 lemma tagged as a pronoun (PRON): _
- This corpus contains 1 lemma tagged as a determiner (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- This corpus does not use the VerbForm feature.
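The tag inventory reported in this list can be cross-checked against the 17 universal POS tags with a simple set difference. This is only an illustrative sketch; the variable names are hypothetical, and the list of used tags is copied from the bullet above.

```python
# The 17 UPOS tags defined by Universal Dependencies.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags this page reports as used in the corpus.
used = ("ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, "
        "PART, PRON, PROPN, PUNCT, SCONJ, VERB").split(", ")

# Tags that are defined by UD but never used in the corpus.
unused = sorted(UD_UPOS - set(used))
print(len(used), "of", len(UD_UPOS), "tags used; unused:", unused)
```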
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Examples: _.
- This corpus uses 1 lemma as an auxiliary (aux). Examples: _.
- This corpus uses 1 lemma as a passive auxiliary (aux:pass). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (69)
  - VERB--NOUN-ADP(_) (3)
  - VERB--PRON (68)
  - VERB--PRON-ADP(_) (2)
- obj
  - VERB--NOUN (52)
  - VERB--NOUN-ADP(_) (11)
  - VERB--PRON (7)
  - VERB--PRON-ADP(_) (1)
- iobj
  - VERB--NOUN (1)
  - VERB--PRON (1)
  - VERB--PRON-ADP(_) (1)
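The pattern labels above can be read as "parent UPOS, child UPOS, and the lemma of any adposition attached to the child". The sketch below counts such patterns under that reading. It is an assumption about how the labels are built, not the generator used for this page; the helper names `read_sentences` and `argument_patterns` and the one-sentence `SAMPLE` are hypothetical.

```python
from collections import Counter

# Hypothetical one-sentence CoNLL-U sample; forms are placeholders, lemmas are "_".
SAMPLE = """\
1\tw1\t_\tPRON\t_\t_\t4\tnsubj\t_\t_
2\tw2\t_\tNOUN\t_\t_\t4\tobj\t_\t_
3\tw3\t_\tADP\t_\t_\t2\tcase\t_\t_
4\tw4\t_\tVERB\t_\t_\t0\troot\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of token dicts (integer-ID word lines only)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append({
                "id": int(cols[0]),
                "lemma": cols[2],
                "upos": cols[3],
                "head": int(cols[6]) if cols[6].isdigit() else 0,
                "deprel": cols[7],
            })
    if sentence:
        yield sentence


def argument_patterns(text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for sentence in read_sentences(text):
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            parent = by_id.get(tok["head"])
            if parent is None or tok["deprel"] not in relations:
                continue
            if parent["upos"] != "VERB" or tok["upos"] not in ("NOUN", "PRON"):
                continue
            # Adpositions attached to the argument via the case relation.
            adps = [c["lemma"] for c in sentence
                    if c["head"] == tok["id"]
                    and c["deprel"] == "case" and c["upos"] == "ADP"]
            pattern = "VERB--" + tok["upos"]
            if adps:
                pattern += "-ADP(" + ",".join(adps) + ")"
            counts[(tok["deprel"], pattern)] += 1
    return counts


for (relation, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```

Because the sample lemmas are underscores, the adposition shows up as `ADP(_)`, the same shape as the entries listed above.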