UD Chinese GSD
Language: Chinese (code: zh)
Family: Sino-Tibetan
This treebank has been part of Universal Dependencies since the UD v1.3 release.
The following people have contributed to making this treebank part of UD: Mo Shen, Ryan McDonald, Daniel Zeman, Peng Qi.
Repository: UD_Chinese-GSD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: wiki
Questions, comments? General annotation questions (either Chinese-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [pengqi (æt) cs • stanford • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | assigned by a program, with some manual corrections, but not a full manual verification |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | assigned by a program, with some manual corrections, but not a full manual verification |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
Traditional Chinese Universal Dependencies Treebank annotated and converted by Google.
Acknowledgments
Statistics of UD Chinese GSD
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Aspect – Case – Number – NumType – PartType – Person – Polarity – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – clf – compound – compound:ext – conj – cop – csubj – csubj:pass – det – discourse – discourse:sp – dislocated – flat:foreign – flat:name – iobj – mark – mark:adv – mark:rel – nmod – nmod:tmod – nsubj – nsubj:pass – nummod – obj – obl – obl:patient – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 4997 sentences and 123291 tokens.
- This corpus contains 122962 tokens (99.7%, shown as 100% after rounding) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 41 types of words that contain both letters and punctuation. Examples: #A, DC-10, km/h, #B, #C, #D, #E, #F, #G, -an, A-AVG, AK-47, Arzacq-Arraziguet, Beaune-Sud, Berne-Belp, CI-7957, CRH380B-002, F-15A, F-16A, Frito-Lay, It's, Kink.com, MD-11, Micro-USM, NX-01, Navy's, O., P-700, Pre-rendering, S-IVB, TVS-5, Tu-16, Uhler-Phillips, al-Banna, f(x), g(x), t.163.com, t.qq.com, t.sina.com.cn, t.sohu.com, t.xxxx.com
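The counts above can be reproduced from a CoNLL-U file with a short script. The following is a minimal sketch, not the script used to generate this page: the function name `tokenization_stats` and the sample sentence are hypothetical, and it assumes that "tokens" here means ordinary word lines (integer IDs), which should coincide with surface tokens in a treebank without multiword tokens.

```python
# Hypothetical four-word sentence in CoNLL-U format (not taken from the treebank).
SAMPLE = "\n".join([
    "# text = 我喜歡貓。",
    "1\t我\t我\tPRON\t_\tPerson=1\t2\tnsubj\t_\tSpaceAfter=No",
    "2\t喜歡\t喜歡\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t貓\t貓\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])


def tokenization_stats(conllu_text):
    """Return (sentences, words, words marked SpaceAfter=No)."""
    sentences = 0
    words = 0
    no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence block.
            in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Keep only ordinary word lines; skip ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
        # MISC is the tenth column; attributes are separated by "|".
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return sentences, words, no_space


print(tokenization_stats(SAMPLE))
```

Applied to the figures reported above, 122962 / 123291 comes to about 99.7%, which is consistent with the rounded 100%.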
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus does not use the following tags: INTJ
- This corpus contains 583 word types tagged as particles (PART): 不, 中, 主, 之, 了, 事, 井, 亞, 亭, 人, 今, 代, 令, 位, 低, 佛, 作, 佬, 使, 侯, 俠, 信, 們, 值, 側, 偽, 傳, 僑, 價, 元, 先, 光, 光棍, 內, 公, 兵, 典, 冠, 冢, 冷, 准, 刀, 分, 列, 制, 券, 前, 副, 劇, 劍, 劑, 力, 功, 劣, 包, 化, 區, 半, 卡, 卿, 原, 友, 口, 古, 台, 史, 司, 同, 名, 君, 否, 吧, 呀, 呢, 周, 味, 和美, 品, 哈爾濱, 員, 商, 啊, 單, 嗎, 嘴, 器, 因, 圈, 國, 圍, 園, 圓, 圖, 團, 土, 地, 坊, 坡, 型, 城, 埤, 基, 堂, 堡, 堤, 報, 場, 塔, 塘, 墓, 墟, 墳, 壓, 士, 外, 多, 夜, 夢, 大, 天, 夾, 奏, 套, 女, 奸, 好, 妃, 妹, 始, 娘, 婆, 婦, 子, 孔, 字, 季, 學, 宏, 宗, 官, 客, 室, 宮, 家, 富, 審, 寬, 寺, 將, 對, 小, 尖, 局, 屋, 屍, 展, 層, 屬, 屯, 山, 岩, 岸, 峰, 島, 峽, 崖, 崗, 嶺, 嶼, 川, 州, 工, 巷, 市, 布, 帝, 師, 席, 帶, 帽, 幣, 幫, 年, 床, 底, 店, 府, 度, 座, 庫, 庭, 廟, 廠, 廬, 廳, 廷, 式, 強, 彈, 彎, 形, 後, 徑, 徒, 得, 御, 微, 徽, 心, 志, 快, 性, 怪, 恆, 感, 態, 戀, 戰, 戲, 戶, 房, 所, 手, 打, 拖, 擋, 支, 教, 數, 文, 新, 方, 族, 旗, 日, 星, 晚, 暖, 暗, 曲, 書, 會, 月, 服, 朝, 期, 本, 材, 村, 束, 杯, 板, 林, 架, 校, 株, 核, 格, 案, 桿, 梁, 棍, 棒, 棚, 業, 榜, 槍, 槳, 樂, 樓, 樹, 橋, 橙, 機, 橢, 檔, 櫃, 權, 次, 款, 歌, 正, 死, 段, 殿, 母, 毒, 氏, 氣, 水, 江, 池, 河, 沼, 泉, 法, 波, 洋, 洞, 洲, 派, 浦, 海, 涌, 液, 淡, 深, 混, 淺, 清, 渡, 港, 湖, 準, 溝, 溥儀, 溪, 滿, 滿洲, 潮, 澡, 澳, 濁, 濃, 灘, 灣, 火, 炎, 炮, 烴, 煙, 熱, 營, 爐, 父, 爺, 牆, 片, 版, 牌, 物, 犯, 狀, 狂, 狗, 獅, 獎, 率, 王, 班, 球, 琴, 生, 男, 町, 界, 畔, 畫, 病, 症, 癌, 癖, 的, 的話, 皮, 盃, 目, 省, 眼, 眾, 督, 短, 石, 砲, 硅, 碑, 碼, 礁, 礦, 社, 神, 祠, 禮, 秀, 秋, 科, 秤, 稅, 種, 窟, 窯, 站, 端, 符, 笨, 等, 管, 箱, 節, 篇, 籍, 米, 粉, 精, 系, 紀, 紅, 紋, 純, 紙, 級, 素, 組, 結, 綉, 綜, 綫, 綱, 網, 線, 縣, 總, 罩, 罪, 署, 羊, 美, 群, 翁, 老, 者, 而已, 聖, 肉, 胎, 胚, 能, 腔, 腳, 腿, 膜, 膠, 臉, 臨, 臺, 舊, 舞, 船, 艇, 艙, 艦, 色, 花, 茶, 莊, 菌, 菜, 葉, 藍, 藤, 藥, 藩, 處, 號, 蛙, 行, 術, 街, 衛, 衣, 表, 裔, 裙, 製, 褲, 親, 觀, 角, 記, 詞, 詩, 話, 誌, 語, 說, 課, 論, 證, 譜, 變, 谷, 豆, 象, 貓, 費, 資, 質, 賽, 超, 路, 躁, 身, 車, 軍, 軒, 軟, 軸, 輕, 近, 迷, 通, 週, 道, 遠, 邊, 邦, 邨, 郎, 郡, 部, 都, 鄉, 配, 酒, 酸, 醣, 醫, 里, 重, 量, 金, 針, 銘, 鋼, 錄, 錦, 鍋, 鍵, 鎮, 鏈, 鏡, 鐵, 長, 門, 間, 閣, 關, 院, 陵, 陸, 隊, 階, 際, 集, 電, 非, 面, 音, 頂, 頭, 題, 額, 類, 風, 飯, 餅, 餐, 館, 饃, 馬, 駅, 骨, 體, 高, 鬥, 鬼, 魚, 鮮, 鳥, 鹼, 點, 黨, 齋
- This corpus contains 44 lemmas tagged as pronouns (PRON): 之, 什麼, 他, 他倆, 何, 何方, 你, 個人, 其, 各自, 哪, 哪裡, 大家, 她, 她倆, 如何, 妳, 它, 對方, 彼此, 您, 我, 本人, 本地, 本身, 此, 熟, 牠, 甚麼, 祂, 自家, 自己, 自我, 自身, 誰, 這, 這些, 這兒, 這樣, 這裡, 那, 那樣, 那裏, 那裡
- This corpus contains 137 lemmas tagged as determiners (DET): $5,000, A330, AEG, AK-47, Activision, Advance, Android, CRH380C, Eve, Ghost, Google, JAXA, KKR, Kekal, Kilpatrick, M1, NDS, OROCHI, PSP, Rivers, The, WHO, Wheeler, g(x), iPhone, iPod, km/h, p, 一切, 上, 下, 以上, 以下, 任, 任何, 佔領, 何, 全, 全套, 全部, 全體, 其他, 其它, 其餘, 別, 前, 前任, 另, 另外, 各, 各個, 各州, 各式, 各種, 各種各樣, 各級, 各項, 各類, 同, 同年, 夕拾, 後, 所有, 整, 整個, 整場, 整塊, 整套, 整所, 整架, 整片, 整顆, 是次, 有的, 本, 本屆, 本班, 某, 某些, 某個, 某種, 此, 此套, 此次, 此種, 此等, 此項, 此類, 歷屆, 毎年, 每, 每位, 每個, 每元, 每卡, 每周, 每天, 每年, 每座, 每戶, 每所, 每日, 每枚, 每次, 每段, 每片, 每秒, 每組, 每週, 每邊, 每間, 每隊, 每集, 當屆, 發售, 眾, 該, 該屆, 該批, 該族, 該條, 該段, 該組, 該集, 諸, 近, 這, 這些, 這次, 這種, 那, 那些, 關於, 首, 首任, 首條, 首部
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: 何, 此, 這, 這些, 那
- This corpus contains 29 lemmas tagged as auxiliaries (AUX): 了, 可, 可以, 可能, 得, 必, 必須, 想, 應, 應當, 應該, 敢, 是, 會, 欲, 為, 肯, 能, 能夠, 著, 被, 要, 該, 過, 需, 需要, 須, 願, 願意
- Out of the above, 12 lemmas occurred sometimes as AUX and sometimes as VERB: 了, 得, 必須, 想, 敢, 是, 為, 著, 要, 過, 需, 需要
- This corpus does not use the VerbForm feature.
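The tag inventory and the lemmas shared between two tags can be derived in the same way. This is a sketch under stated assumptions: `tag_report` and `read_words` are hypothetical helper names, the two sample sentences are invented for illustration, and the real statistics may have been computed differently.

```python
from collections import defaultdict

# The 17 universal part-of-speech tags defined by UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Hypothetical sample: the lemma 這 is PRON in one sentence and DET in the other.
SAMPLE = "\n".join([
    "1\t這\t這\tPRON\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\t是\t是\tAUX\t_\t_\t3\tcop\t_\tSpaceAfter=No",
    "3\t書\t書\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
    "1\t這\t這\tDET\t_\t_\t2\tdet\t_\tSpaceAfter=No",
    "2\t書\t書\tNOUN\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "3\t好\t好\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])


def read_words(conllu_text):
    """Yield the ten columns of every ordinary word line."""
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            yield cols


def tag_report(conllu_text, tag_a="PRON", tag_b="DET"):
    """Return (unused UPOS tags, lemmas seen with both tag_a and tag_b)."""
    lemmas_by_tag = defaultdict(set)
    for cols in read_words(conllu_text):
        lemmas_by_tag[cols[3]].add(cols[2])  # UPOS -> set of lemmas
    unused = sorted(UPOS_TAGS - set(lemmas_by_tag))
    shared = sorted(lemmas_by_tag[tag_a] & lemmas_by_tag[tag_b])
    return unused, shared


unused, shared = tag_report(SAMPLE)
print(unused, shared)
```

Calling `tag_report(text, "AUX", "VERB")` on the full treebank would be one way to check the AUX/VERB overlap listed above.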
Nominal Features
- Number=Plur
  - NOUN: 人們
  - PART: 們
  - PRON: 他們, 它們, 我們, 牠們, 她們
- Case=Gen
  - ADP: 之外
  - PART: 的, 之, 地
Degree and Polarity
- Polarity=Neg
  - ADV: 不, 未, 沒, 別, 無
  - AUX: 不是, 不會, 不能, 未能, 不可, 不得, 不應, 不願, 不想, 不需
  - VERB: 沒有, 不是, 不及, 不如, 不敵, 不滿, 沒收, 不停, 不受, 不合
Verbal Features
- Aspect=Perf
  - AUX: 了, 過
  - PART: 了
- Aspect=Prog
  - AUX: 著
- Voice=Cau
  - ADP: 以
  - VERB: 以, 使, 讓, 使得, 令, 導致, 要求, 派, 派遣, 任命
- Voice=Pass
  - AUX: 被, 為
Pronouns, Determiners, Quantifiers
- NumType=Card
  - NUM: 一, 兩, 三, 1, 3, 12, 5, 2, 8, 10
- NumType=Ord
  - ADJ: 第16, 第一
  - NUM: 第一, 第二, 第三, 首次, 第四, 第五, 第1, 第六, 第七, 首位
- Person=1
  - PRON: 我, 我們
- Person=2
  - PRON: 你, 妳, 您
- Person=3
  - PRON: 他, 其, 她, 它, 他們, 它們, 牠們, 她們, 牠, 祂
Other Features
- PartType=Int
  - PART: 呢, 嗎, 啊
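Feature listings like those in the Morphology section can be rebuilt by grouping word forms by feature value and UPOS tag. The sketch below is illustrative only: `forms_by_feature` is a hypothetical name, the sample sentences are invented, and ordering by frequency with a cut-off of ten forms is an assumption about how the examples were chosen, which this page does not state.

```python
from collections import Counter, defaultdict

# Hypothetical sample with a few of the features listed above.
SAMPLE = "\n".join([
    "1\t我們\t我們\tPRON\t_\tNumber=Plur|Person=1\t2\tnsubj\t_\tSpaceAfter=No",
    "2\t吃\t吃\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t了\t了\tAUX\t_\tAspect=Perf\t2\taux\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "1\t他\t他\tPRON\t_\tPerson=3\t3\tnsubj\t_\tSpaceAfter=No",
    "2\t不\t不\tADV\t_\tPolarity=Neg\t3\tadvmod\t_\tSpaceAfter=No",
    "3\t來\t來\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])


def forms_by_feature(conllu_text, top_n=10):
    """Map (Feature=Value, UPOS) to its most frequent word forms."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skip comments, ranges, empty nodes, and words without features.
        if len(cols) != 10 or not cols[0].isdigit() or cols[5] == "_":
            continue
        for pair in cols[5].split("|"):  # FEATS is the sixth column
            counts[(pair, cols[3])][cols[1]] += 1
    return {
        key: [form for form, _ in counter.most_common(top_n)]
        for key, counter in counts.items()
    }


for (feature, upos), forms in sorted(forms_by_feature(SAMPLE).items()):
    print(feature, upos, ", ".join(forms))
```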
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: 是、 為.
- This corpus uses 26 lemmas as auxiliaries (aux). Examples: 了、 會、 可以、 著、 可、 能、 要、 過、 可能、 必須、 能夠、 想、 應、 需、 需要、 得、 須、 應該、 欲、 願、 願意、 必、 肯、 應當、 敢、 該.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: 被、 為.
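The lemma inventories for `cop`, `aux`, and `aux:pass` can be collected by reading the DEPREL column. This is a minimal sketch with a hypothetical function name (`lemmas_by_relation`) and invented sample sentences; it is not the generator behind this page.

```python
# Hypothetical sample: one copular sentence and one passive sentence.
SAMPLE = "\n".join([
    "1\t這\t這\tPRON\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\t是\t是\tAUX\t_\t_\t3\tcop\t_\tSpaceAfter=No",
    "3\t書\t書\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
    "1\t書\t書\tNOUN\t_\t_\t3\tnsubj:pass\t_\tSpaceAfter=No",
    "2\t被\t被\tAUX\t_\tVoice=Pass\t3\taux:pass\t_\tSpaceAfter=No",
    "3\t買\t買\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t了\t了\tAUX\t_\tAspect=Perf\t3\taux\t_\tSpaceAfter=No",
    "5\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])


def lemmas_by_relation(conllu_text, relations=("cop", "aux", "aux:pass")):
    """Return the sorted lemmas attached with each of the given relations."""
    found = {relation: set() for relation in relations}
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        # DEPREL is the eighth column; subtypes such as aux:pass are
        # matched exactly, so "aux" does not include "aux:pass".
        if cols[7] in found:
            found[cols[7]].add(cols[2])
    return {relation: sorted(lemmas) for relation, lemmas in found.items()}


print(lemmas_by_relation(SAMPLE))
```

Matching the label exactly keeps `aux` and `aux:pass` apart, which mirrors how the two are reported separately above.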
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (4044)
- VERB--NOUN-ADP(上) (6)
- VERB--NOUN-ADP(下) (2)
- VERB--NOUN-ADP(不) (1)
- VERB--NOUN-ADP(中) (12)
- VERB--NOUN-ADP(主) (4)
- VERB--NOUN-ADP(之間) (5)
- VERB--NOUN-ADP(以) (1)
- VERB--NOUN-ADP(內) (2)
- VERB--NOUN-ADP(前) (3)
- VERB--NOUN-ADP(前)-ADP(副)-ADP(總) (1)
- VERB--NOUN-ADP(副) (2)
- VERB--NOUN-ADP(原) (5)
- VERB--NOUN-ADP(和)-ADP(的) (1)
- VERB--NOUN-ADP(在)-ADP(的) (6)
- VERB--NOUN-ADP(大) (18)
- VERB--NOUN-ADP(寬) (1)
- VERB--NOUN-ADP(對)-ADP(的) (9)
- VERB--NOUN-ADP(對於)-ADP(的) (2)
- VERB--NOUN-ADP(小) (9)
- VERB--NOUN-ADP(強) (1)
- VERB--NOUN-ADP(彎) (1)
- VERB--NOUN-ADP(微) (1)
- VERB--NOUN-ADP(新) (14)
- VERB--NOUN-ADP(暗) (1)
- VERB--NOUN-ADP(有關)-ADP(的) (3)
- VERB--NOUN-ADP(期間)-ADP(的) (1)
- VERB--NOUN-ADP(清) (1)
- VERB--NOUN-ADP(為) (1)
- VERB--NOUN-ADP(的) (1)
- VERB--NOUN-ADP(經過)-ADP(的) (2)
- VERB--NOUN-ADP(總) (5)
- VERB--NOUN-ADP(老) (5)
- VERB--NOUN-ADP(與)-ADP(的) (4)
- VERB--NOUN-ADP(舊) (2)
- VERB--NOUN-ADP(裡) (1)
- VERB--NOUN-ADP(里) (1)
- VERB--NOUN-ADP(間) (1)
- VERB--NOUN-ADP(關於)-ADP(的) (3)
- VERB--PRON (736)
- VERB--PRON-ADP(之間) (2)
- obj
- VERB--NOUN (5407)
- VERB--NOUN-ADP(上) (7)
- VERB--NOUN-ADP(下) (1)
- VERB--NOUN-ADP(不) (2)
- VERB--NOUN-ADP(中) (13)
- VERB--NOUN-ADP(主) (1)
- VERB--NOUN-ADP(之上) (1)
- VERB--NOUN-ADP(之下) (1)
- VERB--NOUN-ADP(之中) (4)
- VERB--NOUN-ADP(之內) (2)
- VERB--NOUN-ADP(之間) (2)
- VERB--NOUN-ADP(亞) (1)
- VERB--NOUN-ADP(今) (1)
- VERB--NOUN-ADP(代) (1)
- VERB--NOUN-ADP(以上) (1)
- VERB--NOUN-ADP(以下) (1)
- VERB--NOUN-ADP(以外) (1)
- VERB--NOUN-ADP(低) (1)
- VERB--NOUN-ADP(偽) (1)
- VERB--NOUN-ADP(像是)-ADP(的) (1)
- VERB--NOUN-ADP(內) (6)
- VERB--NOUN-ADP(分) (1)
- VERB--NOUN-ADP(副) (8)
- VERB--NOUN-ADP(原) (4)
- VERB--NOUN-ADP(古) (1)
- VERB--NOUN-ADP(向)-ADP(的) (1)
- VERB--NOUN-ADP(在)-ADP(的) (14)
- VERB--NOUN-ADP(堂) (1)
- VERB--NOUN-ADP(外) (1)
- VERB--NOUN-ADP(大) (34)
- VERB--NOUN-ADP(好) (1)
- VERB--NOUN-ADP(始) (1)
- VERB--NOUN-ADP(子) (2)
- VERB--NOUN-ADP(寬) (1)
- VERB--NOUN-ADP(對)-ADP(的) (26)
- VERB--NOUN-ADP(對)-ADP(的)-ADP(上) (1)
- VERB--NOUN-ADP(對於)-ADP(的) (3)
- VERB--NOUN-ADP(小) (14)
- VERB--NOUN-ADP(左右) (2)
- VERB--NOUN-ADP(彎) (1)
- VERB--NOUN-ADP(後) (1)
- VERB--NOUN-ADP(快) (1)
- VERB--NOUN-ADP(新) (21)
- VERB--NOUN-ADP(於)-ADP(的) (3)
- VERB--NOUN-ADP(有關)-ADP(的) (6)
- VERB--NOUN-ADP(毒) (1)
- VERB--NOUN-ADP(濃) (1)
- VERB--NOUN-ADP(熱) (3)
- VERB--NOUN-ADP(直到)-ADP(的) (1)
- VERB--NOUN-ADP(總) (8)
- VERB--NOUN-ADP(與)-ADP(的) (4)
- VERB--NOUN-ADP(與)-ADP(的)-ADP(老) (1)
- VERB--NOUN-ADP(舊) (4)
- VERB--NOUN-ADP(裡) (1)
- VERB--NOUN-ADP(親) (1)
- VERB--NOUN-ADP(超) (2)
- VERB--NOUN-ADP(躁) (1)
- VERB--NOUN-ADP(輕)-ADP(重) (1)
- VERB--NOUN-ADP(里) (2)
- VERB--NOUN-ADP(重) (1)
- VERB--NOUN-ADP(間) (1)
- VERB--NOUN-ADP(關於)-ADP(的) (4)
- VERB--NOUN-ADP(離)-ADP(的) (1)
- VERB--NOUN-ADP(鮮) (1)
- VERB--PRON (98)
- VERB--PRON-ADP(之中) (1)
- iobj
- VERB--NOUN (45)
- VERB--NOUN-ADP(上) (1)
- VERB--NOUN-ADP(中) (2)
- VERB--NOUN-ADP(主) (1)
- VERB--PRON (8)
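Pattern counts of the kind listed above can be approximated by pairing each nominal argument with its verbal head. The sketch below rests on an assumption that this page does not confirm: that the `-ADP(…)` suffixes name dependents attached to the noun or pronoun by the `case` relation. The function names and the sample sentence are hypothetical.

```python
from collections import Counter

# Hypothetical sentence; the annotation is for illustration only.
SAMPLE = "\n".join([
    "1\t房間\t房間\tNOUN\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\t內\t內\tADP\t_\t_\t1\tcase\t_\tSpaceAfter=No",
    "3\t有\t有\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t人\t人\tNOUN\t_\t_\t3\tobj\t_\tSpaceAfter=No",
    "5\t。\t。\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])


def parse_sentences(conllu_text):
    """Yield each sentence as a list of ten-column word lines."""
    sentence = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append(cols)
        elif not line.strip() and sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence


def argument_patterns(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for VERB parents and NOUN/PRON children."""
    patterns = Counter()
    for sentence in parse_sentences(conllu_text):
        by_id = {cols[0]: cols for cols in sentence}
        for child in sentence:
            parent = by_id.get(child[6])  # HEAD is the seventh column
            if parent is None or parent[3] != "VERB":
                continue
            if child[3] not in ("NOUN", "PRON") or child[7] not in relations:
                continue
            # Assumption: markers are the child's own `case` dependents.
            markers = [
                f"-ADP({word[2]})"
                for word in sentence
                if word[6] == child[0] and word[7] == "case"
            ]
            patterns[(child[7], f"VERB--{child[3]}" + "".join(markers))] += 1
    return patterns


for (relation, pattern), count in sorted(argument_patterns(SAMPLE).items()):
    print(relation, pattern, count)
```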
Relations Overview
- This corpus uses 12 relation subtypes: acl:relcl, aux:pass, compound:ext, csubj:pass, discourse:sp, flat:foreign, flat:name, mark:adv, mark:rel, nmod:tmod, nsubj:pass, obl:patient
- The following main type is never used alone; it is always subtyped: flat
- The following 5 relation types are not used in this corpus at all: expl, fixed, list, goeswith, dep
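The three statements above follow from the relation inventory listed earlier on this page. As a cross-check, the sketch below takes that inventory and the 37 universal relations of UD v2 and derives the subtypes, the main types that only occur subtyped, and the unused relations. The function name `relation_overview` is hypothetical.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels as listed in the Relations section of this page.
USED_RELATIONS = """
acl acl:relcl advcl advmod amod appos aux aux:pass case cc ccomp clf
compound compound:ext conj cop csubj csubj:pass det discourse discourse:sp
dislocated flat:foreign flat:name iobj mark mark:adv mark:rel nmod
nmod:tmod nsubj nsubj:pass nummod obj obl obl:patient orphan parataxis
punct reparandum root vocative xcomp
""".split()


def relation_overview(used_labels):
    """Return (subtypes, main types never used alone, unused universal relations)."""
    used = set(used_labels)
    subtypes = sorted(label for label in used if ":" in label)
    # The main type is the part of the label before the colon.
    main_types = {label.split(":")[0] for label in used}
    always_subtyped = sorted(main for main in main_types if main not in used)
    unused = sorted(UNIVERSAL_RELATIONS - main_types)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED_RELATIONS)
print(len(subtypes), always_subtyped, unused)
```

On the inventory above this yields 12 subtypes, `flat` as the only main type that never appears alone, and `dep`, `expl`, `fixed`, `goeswith`, and `list` as unused, in agreement with the overview.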