UD Korean KSL
Language: Korean (code: ko)
Family: Korean
This treebank has been part of Universal Dependencies since the UD v2.15 release.
The following people have contributed to making this treebank part of UD: Hakyung Sung, Gyu-Ho Shin.
Repository: UD_Korean-KSL
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.16
License: CC BY-SA 4.0
Genre: learner-essays
Questions, comments? General annotation questions (either Korean-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [hsung (æt) uoregon • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually, natively in UD style |
Description
UD_Korean-KSL is a dependency treebank of second-language (L2) Korean.
The treebank contains 12,977 sentences (10,323 in the training set, 1,311 in the dev set, and 1,343 in the test set). These sentences are sourced from two datasets: (1) the Kyung Hee dataset, with sentence IDs starting with “KH” and annotated with classroom proficiency levels (A1–C2); and (2) the KoLLA dataset, with sentence IDs starting with “KL” and grouped as fb (foreign beginners), fi (foreign intermediates), and hb (heritage beginners).
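The prefix convention described above can be used to split a CoNLL-U file by source dataset. The following is a minimal sketch, not part of the treebank tooling: it assumes the standard `# sent_id = ...` comment line of the CoNLL-U format, and the IDs in `sample` are hypothetical (only the "KH" and "KL" prefixes come from the description).

```python
from collections import Counter


def source_of(sent_id: str) -> str:
    """Map a sentence ID to its source dataset using the documented prefixes."""
    if sent_id.startswith("KH"):
        return "Kyung Hee"
    if sent_id.startswith("KL"):
        return "KoLLA"
    return "unknown"


def count_sources(conllu_text: str) -> Counter:
    """Count sentences per source dataset from '# sent_id = ...' comments."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            counts[source_of(sent_id)] += 1
    return counts


# Hypothetical IDs for illustration only; the real ID format may differ.
sample = "\n".join([
    "# sent_id = KH_0001",
    "",
    "# sent_id = KH_0002",
    "",
    "# sent_id = KL_fb_0001",
    "",
])
print(count_sources(sample))
```

The split sizes quoted in the description are also easy to check: 10,323 + 1,311 + 1,343 = 12,977.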
Acknowledgments
We acknowledge the original data contributors: the Kyung Hee dataset (credit to Jungyeul Park and Jung Hee Lee; note that this dataset is no longer maintained and its sentences are no longer used for further annotation) and the KoLLA dataset (credit to Markus Dickinson, Ross Israel, and Sun-Hee Lee). We also acknowledge our annotators: Hee-June Koh, Chanyoung Lee, and Youkyung Sung.
Statistics of UD Korean KSL
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SYM – VERB – X
Features
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – flat – goeswith – list – mark – nmod – nmod:poss – nsubj – nummod – obj – obl – parataxis – punct – reparandum – root – vocative
Tokenization and Word Segmentation
- This corpus contains 12977 sentences and 108072 tokens.
- This corpus contains 14232 tokens (13%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 26 types of words that contain both letters and punctuation. Examples: 없어.라고, 1.명동에, 10일-17일에, 2.동대문어, Duluth., K-POP을, Ph.D.를, T-EXPRESS, choco-pie를, 남주인공-카일, 성인-199,000원, 아동-174,000원, 여주인공-사라, 용산-목포-홍도-흑산도까지, 용산-목포-홍도-흑산도다, 용산-목포-홍도-흑산도와, 용산-목포-홍도-흑산도이라는, 용산-목포-홍도-흑산도입니다, 용산-목표-홍도-족산도, 용산-목표-홍도-흑단도이다, 용산-목표-홍도-흑산도, 용산-목표-홍도-흑산도이다, 있., 전자사전,mp3인터넷, 청.훙의, 초.한으로
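Counts like the ones above can be approximated directly from a CoNLL-U file. The sketch below assumes the standard 10-column CoNLL-U layout, with `SpaceAfter=No` in the MISC column; it skips multiword-token ranges and empty nodes, which may or may not match how the official statistics were computed. The function names and the toy sentence are illustrative, not taken from the treebank.

```python
def read_sentences(conllu_text):
    """Yield each sentence as a list of 10-column token rows."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain integer IDs; skip ranges (1-2) and empty nodes (1.1).
            if cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows


def space_stats(conllu_text):
    """Return (sentences, tokens, tokens not followed by a space)."""
    n_sent = n_tok = n_nospace = 0
    for sent in read_sentences(conllu_text):
        n_sent += 1
        for cols in sent:
            n_tok += 1
            if "SpaceAfter=No" in cols[9].split("|"):
                n_nospace += 1
    return n_sent, n_tok, n_nospace


def row(idx, form, misc="_"):
    """Build a toy CoNLL-U row; unused columns are left as '_'."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", "_", "_", "_", misc])


# Toy sentence for illustration, not copied from the treebank.
sample = "\n".join([
    "# text = 저는 학생입니다.",
    row(1, "저는"),
    row(2, "학생입니다", "SpaceAfter=No"),
    row(3, "."),
    "",
])
print(space_stats(sample))  # (1, 3, 1)
```

The reported percentage is consistent with the raw counts: 14232 / 108072 is about 13.2%, which rounds to 13%.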
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SYM, VERB, X
- This corpus does not use the following tags: SCONJ, INTJ
- This corpus contains 1 word type tagged as a particle (PART): 에는
- This corpus contains 115 lemmas tagged as pronouns (PRON): 거기+는, 그, 그+는, 그+대, 그+도, 그+들+도, 그+들+은, 그+들+을, 그+들+의, 그+들+이, 그+를, 그+의, 그거+도, 그것, 그것+ㄴ, 그것+도, 그것+들+도, 그것+은, 그것+을, 그것+이, 그녀+가, 그녀+는, 그녀+를, 그녀+만+의, 그녀+의, 그때+의, 나, 나+ㄴ, 나+는, 나+도, 나+를, 나+의, 남+이, 내, 내+가, 너, 너+ㄴ, 너+는, 너+도, 너+의, 네+가, 누구+ㄴ+가, 누구+가, 누구+나, 누구+도, 누구+를, 누구+이+ㄴ가, 니+는, 다+들, 당신, 당신+은, 당신+의, 둘째+는, 모두+가, 무엇+을, 무엇+이, 무엇+이+ㄴ가, 뭐+ㄹ, 뭐+가, 비, 어디, 어디+가, 여기, 여기+가, 여기+는, 여러분, 여러분+들+은, 여러분+들+이, 영, 용+은, 우리, 우리+가, 우리+는, 우리+도, 우리+들+은, 우리+들+이, 우리+를, 우리+만+의, 우리+의, 이, 이+는, 이+를, 이거, 이것, 이것+도, 이것+들+은, 이것+들+을, 이것+만, 이것+은, 이것+을, 이것+이, 자+기, 자기, 자기+가, 자기+도, 자기+를, 자기+만, 자기+의, 자신+들+의, 자신+을, 자신+의, 자신+이, 저, 저+는, 저+도, 저+랑, 저+를, 저+와, 저+의, 저기+는, 저희, 저희+는, 전, 제+가, 중
- This corpus contains 39 lemmas tagged as determiners (DET): 각, 그, 그떤, 그러+ㄹ, 그런, 그런+한, 너+ㄴ, 몇, 몇+개, 모든, 무슨, 아무, 약, 어누, 어느, 어던, 어떤, 어러, 어려, 어쩌+ㄹ, 여러, 여려, 예기, 오+ㄴ, 이, 이+들, 이+번+에+는, 이러하+ㄴ, 이런, 이런+저런, 이런+하+ㄴ, 이럼, 이렇, 이번, 일, 저, 저런, 지지난, 한
- Out of the above, 4 lemmas occurred sometimes as PRON and sometimes as DET: 그, 너+ㄴ, 이, 저
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): 싶, 않, 이, 있, 하
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: 있
- This corpus does not use the VerbForm feature.
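The tag inventory and the lemma overlaps reported above can be reproduced from (lemma, UPOS) pairs. This is a minimal sketch under that assumption; `unused_tags` and `lemmas_with_both` are hypothetical helper names, and the pairs in `pairs` are a small hand-picked subset of the lemmas listed above, not the full data.

```python
from collections import defaultdict

# The 17 universal POS tags defined by UD.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def unused_tags(pairs):
    """Return the UPOS tags that never occur in the (lemma, upos) pairs."""
    return UPOS_TAGS - {upos for _, upos in pairs}


def lemmas_with_both(pairs, tag_a, tag_b):
    """Return the sorted lemmas that occur with both tag_a and tag_b."""
    tags_by_lemma = defaultdict(set)
    for lemma, upos in pairs:
        tags_by_lemma[lemma].add(upos)
    return sorted(l for l, tags in tags_by_lemma.items() if {tag_a, tag_b} <= tags)


# Small illustrative subset of the lemmas listed above.
pairs = [
    ("그", "PRON"), ("그", "DET"),
    ("이", "PRON"), ("이", "DET"),
    ("있", "AUX"), ("있", "VERB"),
    ("모든", "DET"),
]
print(lemmas_with_both(pairs, "PRON", "DET"))  # ['그', '이']
print(lemmas_with_both(pairs, "AUX", "VERB"))  # ['있']

# Checking the 15 tags listed for this corpus against the full inventory.
listed = "ADJ ADP ADV AUX CCONJ DET NOUN NUM PART PRON PROPN PUNCT SYM VERB X".split()
print(sorted(UPOS_TAGS - set(listed)))  # ['INTJ', 'SCONJ']
```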
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
- Typo
  - Yes
    - ADJ: 만습니다, 중용한, 경재적인, 관찮아요, 다뜻합니다, 다향한, 달리다, 됬다, 마싯었습니다, 맜습니다
    - ADP: 떄문에, 두, 동해, 때, 땜누에, 떼문에, 슾에, 은, 이을, 장도
    - ADV: 그레서, 고리고, 대, 외냐하면, 그래고, 도, 때, 아프로, 재일, 하지마
    - AUX: 싶어니다, 했는, 싶어, 않는, 않을면, 않있습니다, 않했습니다, 있엇다고, 하겠말이다, 하지마
    - DET: 그떤, 그런한, 어누, 어던, 어러, 어려, 여려, 예기, 이, 이럼
    - NOUN: 궁무원이, 웃이, 훠꿔를, 가경이, 땋알, 부모니는, 성물을, 음막이, 진고가, 홍상을
    - NUM: 만, 이, 이홉
    - PRON: 그대, 내, 니는, 자기, 자기를
    - PROPN: 우치
    - SYM: 훠꿔
    - VERB: 유면한, 조세요, 보릅니다, 해어지고, 거옙니다, 건다, 논다, 다서, 도와, 배옵니다
    - X: 하고
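A listing like the one above can be gathered by scanning the FEATS column for `Typo=Yes` and grouping the word forms by UPOS. The sketch below assumes the standard 10-column CoNLL-U layout; `typo_forms` is a hypothetical helper, and the toy rows reuse two forms from the list above with otherwise made-up annotation.

```python
from collections import defaultdict


def typo_forms(conllu_text):
    """Group word forms carrying Typo=Yes by their UPOS tag."""
    forms = defaultdict(list)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed rows, multiword-token ranges, and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if "Typo=Yes" in cols[5].split("|"):
            forms[cols[3]].append(cols[1])
    return dict(forms)


def row(idx, form, upos, feats="_"):
    """Build a toy CoNLL-U row; columns not needed here are left as '_'."""
    return "\t".join([str(idx), form, "_", upos, "_", feats, "_", "_", "_", "_"])


# Toy rows: the forms come from the list above, the rest is illustrative.
sample = "\n".join([
    row(1, "그레서", "ADV", "Typo=Yes"),
    row(2, "학교에", "NOUN"),
    row(3, "됬다", "ADJ", "Typo=Yes"),
    "",
])
print(typo_forms(sample))
```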
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): 이.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: 싶, 하, 있, 않, 이.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (7920)
  - VERB--NOUN-ADP(가) (2)
  - VERB--NOUN-ADP(는) (5)
  - VERB--NOUN-ADP(도) (9)
  - VERB--NOUN-ADP(들+은) (1)
  - VERB--NOUN-ADP(들+이) (1)
  - VERB--NOUN-ADP(등) (5)
  - VERB--NOUN-ADP(등+뿐+만) (1)
  - VERB--NOUN-ADP(등+은) (1)
  - VERB--NOUN-ADP(등+의) (1)
  - VERB--NOUN-ADP(등+이) (1)
  - VERB--NOUN-ADP(따위) (1)
  - VERB--NOUN-ADP(따위+도) (1)
  - VERB--NOUN-ADP(때문+이) (1)
  - VERB--NOUN-ADP(만) (4)
  - VERB--NOUN-ADP(밖에) (2)
  - VERB--NOUN-ADP(반+쯤) (1)
  - VERB--NOUN-ADP(뿐+만) (10)
  - VERB--NOUN-ADP(와) (1)
  - VERB--NOUN-ADP(은) (2)
  - VERB--NOUN-ADP(을) (1)
  - VERB--NOUN-ADP(이) (2)
  - VERB--NOUN-ADP(이상) (1)
  - VERB--NOUN-ADP(쫌) (1)
  - VERB--NOUN-ADP(쯤) (4)
  - VERB--NOUN-ADP(하+고) (1)
  - VERB--NOUN-ADP(하고) (6)
  - VERB--PRON (1668)
  - VERB--PRON-ADP(뿐+만) (5)
- obj
  - VERB--NOUN (8171)
  - VERB--NOUN-ADP(과) (1)
  - VERB--NOUN-ADP(대신+에) (1)
  - VERB--NOUN-ADP(도) (5)
  - VERB--NOUN-ADP(두) (2)
  - VERB--NOUN-ADP(등) (4)
  - VERB--NOUN-ADP(등+을) (7)
  - VERB--NOUN-ADP(라는) (1)
  - VERB--NOUN-ADP(로) (1)
  - VERB--NOUN-ADP(를) (13)
  - VERB--NOUN-ADP(만) (3)
  - VERB--NOUN-ADP(을) (5)
  - VERB--NOUN-ADP(이+을) (1)
  - VERB--NOUN-ADP(정도+로) (1)
  - VERB--NOUN-ADP(쯤) (1)
  - VERB--NOUN-ADP(하고) (10)
  - VERB--NOUN-ADP(학고) (1)
  - VERB--PRON (63)
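The pattern counts above can be approximated by walking each sentence and pairing every nominal dependent with its verbal head. The sketch below is one possible reading of the notation, not the script that produced these statistics: it assumes that `VERB--NOUN-ADP(x)` means the noun has an ADP dependent with lemma `x`, and it joins several ADP lemmas with a comma, which is an invented convention. The token dictionaries and the toy sentence are hypothetical.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj"}


def argument_patterns(sentence):
    """Count (deprel, pattern) pairs for NOUN/PRON dependents of VERB heads.

    `sentence` is a list of dicts with keys id, lemma, upos, head, deprel.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    counts = Counter()
    for tok in sentence:
        parent = by_id.get(tok["head"])  # None for the root (head 0)
        if parent is None or parent["upos"] != "VERB":
            continue
        if tok["deprel"] not in CORE_RELATIONS or tok["upos"] not in {"NOUN", "PRON"}:
            continue
        # Assumption: adpositions attach to the nominal they mark.
        adps = [c["lemma"] for c in sentence
                if c["head"] == tok["id"] and c["upos"] == "ADP"]
        pattern = "VERB--" + tok["upos"]
        if adps:
            pattern += "-ADP(" + ",".join(adps) + ")"
        counts[(tok["deprel"], pattern)] += 1
    return counts


# Toy sentence for illustration; not taken from the treebank.
sentence = [
    {"id": 1, "lemma": "저", "upos": "PRON", "head": 4, "deprel": "nsubj"},
    {"id": 2, "lemma": "밥", "upos": "NOUN", "head": 4, "deprel": "obj"},
    {"id": 3, "lemma": "하고", "upos": "ADP", "head": 2, "deprel": "case"},
    {"id": 4, "lemma": "먹", "upos": "VERB", "head": 0, "deprel": "root"},
]
for (deprel, pattern), n in sorted(argument_patterns(sentence).items()):
    print(deprel, pattern, n)
```

Summing the resulting counters over all sentences would give per-relation totals comparable in shape to the lists above.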