This page pertains to UD version 2.

Tokenization

UD Chinese

[Description not currently available]

UD Chinese-HK

This treebank (to be released) follows the Chinese Treebank segmentation guidelines, with two notable exceptions: (a) V-R resultative compounds (see compound:vv) and (b) V-得-V and V-不-V compounds (known in the literature as “potential complements”; see compound:dir and compound:vv). The Chinese Treebank treats each such compound either as a single token or as several tokens, depending on factors such as semantic compositionality and multisyllabicity. UD Chinese-HK, by contrast, always splits these compounds into separate tokens.
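
The splitting policy for V-得-V and V-不-V compounds can be illustrated with a minimal sketch. The helper `split_potential_compound` is hypothetical and is not part of any UD or Chinese Treebank tooling. It assumes the caller has already identified the input as a potential-complement compound; it does not try to detect such compounds in running text, and it does not cover V-R resultative compounds, which have no infix to split on.

```python
# Hypothetical helper for illustration only; not part of any UD tooling.
INFIXES = ("得", "不")


def split_potential_compound(token: str) -> list[str]:
    """Split a V-得-V or V-不-V compound into three tokens.

    Assumes `token` is already known to be such a compound. A token
    with no 得 or 不 in a non-initial, non-final position is returned
    unchanged as a one-element list.
    """
    # Only look at internal positions, so the infix has a verb on each side.
    for i in range(1, len(token) - 1):
        if token[i] in INFIXES:
            return [token[:i], token[i], token[i + 1:]]
    return [token]


print(split_potential_compound("看得见"))  # ['看', '得', '见']
print(split_potential_compound("看不见"))  # ['看', '不', '见']
```

Under the UD Chinese-HK policy described above, both outputs are the expected segmentation, whereas the Chinese Treebank might keep either compound as a single token.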