UD Korean LittlePrince
Language: Korean (code: ko)
Family: Korean
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Junghyun Min, Jena Hwang, Nathan Schneider.
Repository: UD_Korean-LittlePrince
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: fiction
Questions, comments? General annotation questions (either Korean-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [jm3743 (æt) georgetown • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | assigned by a program, not checked manually |
| UPOS | assigned by a program, not checked manually |
| XPOS | assigned by a program, not checked manually |
| Features | assigned by a program, not checked manually |
| Relations | assigned by a program, not checked manually |
Description
UD Korean-LittlePrince is a UD adaptation of the k-SNACS dataset (Hwang et al. 2020), a Korean version of the wider SNACS effort (Schneider et al. 2018) that annotates case and adposition supersense. Lemmas, POS tags, and dependency relations are supplied by Stanza (Qi et al. 2020) models trained on UD Korean-KAIST (Chun et al. 2018) and manually adjusted to satisfy UD validation.
- Title: 어린 왕자 (erin wangca) “The Little Prince”
- Author: Antoine de Saint-Exupéry
- Original Language: French (Le Petit Prince)
- Genre: Fiction
Acknowledgments
Contributors are as follows:
- Junghyun Min (Georgetown University)
- Jena D. Hwang (AI2)
- Nathan Schneider (Georgetown University)
Project repository: https://github.com/Aatlantise/k-snacs-ud
References
- Jayeol Chun, Na-Rae Han, Jena D. Hwang, and Jinho D. Choi. 2018. Building Universal Dependency Treebanks in Korean. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
- Jena D. Hwang, Hanwool Choe, Na-Rae Han, and Nathan Schneider. 2020. K-SNACS: Annotating Korean Adposition Semantics. In Proceedings of the Second International Workshop on Designing Meaning Representations, pages 53–66, Barcelona, Spain (online). Association for Computational Linguistics.
- Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 101–108, Online. Association for Computational Linguistics.
- Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, and Omri Abend. 2018. Comprehensive Supersense Disambiguation of English Prepositions and Possessives. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 185–196, Melbourne, Australia. Association for Computational Linguistics.
Citing this work
When using this data, please cite the following as appropriate:
Original k-SNACS annotations (Hwang et al., 2020):
Hwang, Jena D., Hanwool Choe, Na-Rae Han, and Nathan Schneider. “K-SNACS: Annotating Korean adposition semantics.” In Proceedings of the Second International Workshop on Designing Meaning Representations. 2020.
Universal Dependencies adaptation (Min et al., 2025):
Junghyun Min, Jena D. Hwang, and Nathan Schneider. “UD-Korean-LittlePrince.” 2025.
Statistics of UD Korean LittlePrince
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Case – ExtPos – Mood – Tense – VerbForm
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – compound – compound:lvc – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – iobj – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1551 sentences and 13656 tokens.
- This corpus contains 4179 tokens (31%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 14 types of words that contain both letters and punctuation. Examples: 돼., )가, )과, )도, )를, .내가, .물론이지, 것(, 곳', 별들', 비행기(, 왕(, 전철수(, 줘.
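Counts like these can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, assuming the standard ten-column CoNLL-U layout and the `SpaceAfter=No` convention in the MISC column. The function name `token_stats` and the sample sentence are illustrative and are not taken from the treebank; the sketch also assumes the corpus has no multiword-token ranges, so that tokens and syntactic words coincide.

```python
def token_stats(conllu_text):
    """Return (sentences, tokens, tokens without a following space)."""
    sentences = tokens = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, ranges (1-2), empty nodes (1.1)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return sentences, tokens, no_space


# Hypothetical three-token sentence, not copied from the treebank files.
SAMPLE = "\n".join([
    "# text = 나는 말했다.",
    "1\t나는\t_\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_",
    "2\t말했다\t_\tVERB\t_\tMood=Ind|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

print(token_stats(SAMPLE))  # (1, 3, 1)
# Share of tokens not followed by a space, from the figures reported above.
print(round(100 * 4179 / 13656))  # 31
```

Applied to the full treebank file, the same function should reproduce the sentence and token totals, provided the assumptions above hold.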
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 1 word type tagged as a particle (PART): 는
- This corpus contains 26 lemmas tagged as pronouns (PRON): _, 거기, 그, 그거, 그것, 그곳, 그렇, 나, 내, 너, 너희, 네, 누구, 당신, 무엇, 뭐, 어디, 여기, 여러분, 우리, 이거, 이것, 이제, 자기, 저, 제
- This corpus contains 1 lemma tagged as a determiner (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- There are 2 (de)verbal forms:
  - Fin
    - VERB: 말했다, 것이다, 물었다, 대답했다, 것이었다, 했다, 생각했다, 이었다, 되었다, 바라보았다
  - Ger
    - VERB: 한아름
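The tag inventory reported in this list (15 of the 17 UPOS tags in use, with SYM and X absent) can be checked with a simple set difference. This is a minimal sketch: the names `UPOS_ALL` and `unused_upos` are illustrative, and the list of used tags is copied from the statistics above rather than read from the treebank files.

```python
# The 17 universal part-of-speech tags defined by UD.
UPOS_ALL = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def unused_upos(used):
    """Return the UPOS tags that do not occur in `used`, sorted by name."""
    return sorted(UPOS_ALL - set(used))


# Tags reported as used in this corpus.
USED = ("ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART "
        "PRON PROPN PUNCT SCONJ VERB").split()

print(len(USED), unused_upos(USED))  # 15 ['SYM', 'X']
```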
Nominal Features
- Acc
  - AUX: 있는지를, 있음을
  - NOUN: 말을, 것을, 걸, 별을, 꽃을, 그림을, 양을, 별들을, 일을, 가로등을
  - PRON: 나를, 그를, 그것을, 너를, 그들을, 그걸, 무엇을, 그것들을, 너희들을, 무얼
  - PROPN: 두레박을, 아리조나를, 한송이를
- Gen
  - NOUN: 개의, 왕자의, 바오밥나무의, 보아뱀의, 별의, 짐의, 사람들의, 사람의, 양의, 중의
  - PRON: 그의, 나의, 너의, 그들의, 자기의, 저의
  - PROPN: 남아메리카의, 북아메리카의, 시베리아의, 오스트레일리아의, 오페라의, 유럽의, 인도의
- Nom
  - NOUN: 왕자가, 왕자는, 사람이, 꽃은, 것은, 왕이, 건, 여우가, 별은, 사람들은
  - PRON: 나는, 그는, 내가, 그가, 난, 그건, 그들은, 그것은, 네가, 너는
  - PROPN: 게으름뱅이가, 드디어는, 프랑스는, 프랑스에서는
Degree and Polarity
Verbal Features
- Imp
  - VERB-Fin: 말라, 하라, 것이라고, 까닭이니라, 다스리노라, 명하노라, 무어라, 뭐라고, 보라고, 일이니라
- Ind
  - VERB-Fin: 말했다, 것이다, 물었다, 대답했다, 것이었다, 했다, 생각했다, 이었다, 되었다, 바라보았다
- Fut
  - VERB: 알, 될, 할, 될거야, 볼, 이해할, 그럴, 모를, 둘, 못할
- Past
  - VERB: 만났을, 끝났어, 나타났다, 났다, 났었지, 냈다, 떠났다, 일어섰다, 가져갔다, 갔다
  - VERB-Fin: 나타났다, 났다, 냈다, 떠났다, 일어섰다, 가져갔다, 갔다, 걸어갔다, 꺼냈다, 나갔다
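The values listed here (Imp and Ind for Mood, Fut and Past for Tense) are stored in the FEATS column of each CoNLL-U token as pipe-separated `Name=Value` pairs. The sketch below shows one way to read that column and to build labels such as `VERB-Fin`. The helper names `parse_feats` and `pos_label` are hypothetical, and the reading of `VERB-Fin` as "UPOS plus VerbForm value" is an assumption about how the statistics were generated.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Mood=Ind|VerbForm=Fin' into a dict."""
    if not feats or feats == "_":
        return {}  # "_" marks an empty feature set in CoNLL-U
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def pos_label(upos, feats):
    """Append the VerbForm value to the UPOS tag when one is present."""
    verb_form = parse_feats(feats).get("VerbForm")
    return f"{upos}-{verb_form}" if verb_form else upos


# Illustrative feature strings, not copied from the treebank files.
print(parse_feats("Mood=Ind|Tense=Past|VerbForm=Fin")["Tense"])  # Past
print(pos_label("VERB", "Mood=Ind|VerbForm=Fin"))  # VERB-Fin
print(pos_label("VERB", "Tense=Fut"))  # VERB
```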
Pronouns, Determiners, Quantifiers
Other Features
- ExtPos
- AUX
- NOUN: 수, 것, 수도, 수가, 수는, 뿐, 수만
- AUX
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Examples: _.
- This corpus uses 1 lemma as an auxiliary (aux). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (4)
  - VERB--NOUN-ADP(_) (1)
  - VERB--NOUN-Nom (206)
  - VERB--PRON-Nom (118)
  - VERB-Fin--NOUN (1)
  - VERB-Fin--NOUN-Nom (200)
  - VERB-Fin--PRON-ADP(_) (1)
  - VERB-Fin--PRON-Nom (27)
- obj
  - VERB--NOUN (1)
  - VERB--NOUN-Acc (361)
  - VERB--PRON-Acc (55)
  - VERB-Fin--NOUN (2)
  - VERB-Fin--NOUN-Acc (127)
  - VERB-Fin--PRON-Acc (10)
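Pattern counts of this kind can be approximated by walking each sentence and pairing every nominal dependent with its verbal head. The following is a simplified sketch under stated assumptions: the parent label is taken to be the UPOS tag plus its VerbForm value, the child label the UPOS tag plus its Case value, and the `ADP(_)` patterns (children marked by an adposition) are not modelled. The names `argument_patterns` and `label`, and the sample sentence, are illustrative.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def label(upos, feats, key):
    """Return 'UPOS-Value' if feature `key` is set, else just 'UPOS'."""
    value = parse_feats(feats).get(key)
    return f"{upos}-{value}" if value else upos


def argument_patterns(rows, relations=("nsubj", "obj")):
    """Count (relation, 'PARENT--CHILD') patterns in one sentence.

    `rows` is a list of ten-column CoNLL-U token lines already split on tabs.
    Only verb parents with noun or pronoun children are considered.
    """
    by_id = {row[0]: row for row in rows}
    counts = Counter()
    for row in rows:
        head = by_id.get(row[6])  # None for the root (HEAD = 0)
        if head is None or row[7] not in relations:
            continue
        if head[3] != "VERB" or row[3] not in ("NOUN", "PRON"):
            continue
        parent = label(head[3], head[5], "VerbForm")
        child = label(row[3], row[5], "Case")
        counts[(row[7], f"{parent}--{child}")] += 1
    return counts


# Hypothetical sentence "나는 꽃을 바라보았다." built for illustration only.
ROWS = [line.split("\t") for line in [
    "1\t나는\t_\tPRON\t_\tCase=Nom\t3\tnsubj\t_\t_",
    "2\t꽃을\t_\tNOUN\t_\tCase=Acc\t3\tobj\t_\t_",
    "3\t바라보았다\t_\tVERB\t_\tMood=Ind|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
]]

for (relation, pattern), n in sorted(argument_patterns(ROWS).items()):
    print(relation, pattern, n)
# nsubj VERB-Fin--PRON-Nom 1
# obj VERB-Fin--NOUN-Acc 1
```

Summing the per-sentence counters over the whole treebank would give totals comparable to the figures above, to the extent that these labelling assumptions match the original statistics script.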