UD for Shanghainese 
Shanghainese is grammatically very similar to Mandarin Chinese (see Chinese UD), with differences being primarily lexical. Therefore, this documentation is not exhaustive; it only highlights the differences from the Chinese UD guidelines. If a topic is not specifically addressed here, please refer to and apply the rules from the Chinese UD.
Because Shanghainese lacks a standardised orthography and is used primarily in colloquial settings, the same word may be written with several different Chinese characters. This documentation lists as many variants as possible to give reasonable coverage.
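The effect of this variation on processing can be sketched with a small reverse index, using only the forms and romanisations given in this documentation. The names `FORMS`, `build_reverse_index`, and `candidates` are hypothetical, and the sketch is an illustration rather than part of the treebank tooling. It shows that a character alone does not always identify a word: 伊, 伐, and 拿 each map to more than one entry.

```python
from collections import defaultdict

# (primary form, romanisation) -> variant spellings, as listed in this documentation.
# Entries with an empty list have no listed variants but share a character
# with another entry.
FORMS = {
    ("个", "eh"): ["呃", "额"],   # multifunctional particle
    ("勿", "veh"): ["伐", "弗"],  # negator
    ("了", "leh"): ["嘞"],        # sentence-final particle
    ("㑚", "na"): ["拿"],         # second person plural
    ("渠", "yi"): ["伊"],         # third person singular
    ("搿", "geh"): ["葛"],        # proximal demonstrative
    ("埃", "i"): ["伊"],          # distal demonstrative
    ("伐", "vah"): [],            # sentence-final particle
    ("拿", "no"): [],             # counterpart of Mandarin 把
}


def build_reverse_index(forms):
    """Map every written form to the list of entries it may stand for."""
    index = defaultdict(list)
    for (primary, roman), variants in forms.items():
        for written in [primary, *variants]:
            index[written].append((primary, roman))
    return dict(index)


REVERSE = build_reverse_index(FORMS)


def candidates(written):
    """Return all (primary form, romanisation) entries for a written form."""
    return REVERSE.get(written, [])


print(candidates("额"))  # one candidate
print(candidates("伊"))  # two candidates: context is needed to choose
```

A character with more than one candidate has to be resolved from context, so a purely character-based normalisation step would not be safe for these forms.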
Tokenization and Word Segmentation
- The principles of the Penn Chinese Treebank are followed.
- Function words are treated as separate tokens, even when phonologically or morphologically attached to verbs. These include items such as 了 leh, marking the perfective aspect, and 勒 leh, indicating the continuation of an action.
- A combination of sentence-final particles is treated as a single token, unless the particles are syntactically different (e.g., one marks the end of a sentence and the other marks a question).
- Negators include 勿 veh (also 伐, 弗) and 没 meh. They are not separated from the token if:
  - They are an integral part of the word, which is very common.
  - It is difficult to find an “original form” of a negated word in Shanghainese (there is not always a corresponding word).
Morphology
Tags
- The Shanghainese annotation uses 15 of the 17 universal POS tags; SYM and X are currently not used.
- Particles (PART, see also PART in Chinese UD):
  - The Mandarin Chinese particles 的 de, 地 de, and 得 de all correspond to a single multifunctional Shanghainese particle, 个 eh (also transcribed as 呃, 额), which functions as a genitive, relativiser, nominaliser, or adverbialiser.
  - The character 个 eh can also be a classifier, used in the same way as in Mandarin; in that case it is not a particle.
  - Shanghainese sentence-final particles include 伐 vah, 了 leh (also 嘞), and 啦 ‘la. Combined use is also very common in Shanghainese, especially in rhetorical questions (such as 伐啦 veh ‘la).
- Nouns (NOUN, see also NOUN in Chinese UD):
  - Words tagged as NOUN include regular nouns, classifiers, temporal nouns, position words, and localisers.
  - Temporal nouns, despite typically being adjuncts of verbs, are always tagged as nouns.
- Pronouns (PRON, see also PRON in Chinese UD):
  - Personal pronouns:
    - There are no polite forms of personal pronouns in Shanghainese.
    - 吾 ngu (first singular)
    - 阿拉 ah ‘la (first plural)
    - 侬 non (second singular)
    - 㑚 (also 拿) na (second plural)
    - 渠 (also 伊) yi (third singular)
    - 渠拉 (also 伊拉) yi ‘la (third plural)
    - The possessive forms of the pronouns are constructed by appending 个 eh (genitive particle).
  - Demonstrative pronouns:
    - There are two demonstrative pronouns in Shanghainese:
      - 搿 (also 葛) geh “this/these” or “here”
      - 埃 (also 伊) i “that/those” or “there”
    - They also have derived forms: 搿搭 geh teh “here”, 埃搭 i teh “there”, and 埃面搭 i mie teh “there”.
- All other tagging rules are the same as Mandarin Chinese.
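The tag inventory described above can be checked mechanically. The following is a minimal sketch, assuming annotations are stored in the standard 10-column CoNLL-U format; the function names and the sample fragment are hypothetical and are not taken from the treebank.

```python
# The 17 universal POS tags of UD v2.
UNIVERSAL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}
# Tags that this documentation says are currently not used.
UNUSED_UPOS = {"SYM", "X"}
ALLOWED_UPOS = UNIVERSAL_UPOS - UNUSED_UPOS


def unexpected_upos(conllu_text):
    """Return (token id, form, upos) for word lines whose UPOS is not allowed."""
    problems = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges, and empty nodes
        if cols[3] not in ALLOWED_UPOS:
            problems.append((int(cols[0]), cols[1], cols[3]))
    return problems


def make_row(idx, form, upos):
    """Build a CoNLL-U word line with all other columns left unspecified."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", "_", "_", "_", "_"])


# Hypothetical fragment for illustration only, not taken from the treebank.
SAMPLE = "\n".join([
    "# text = (made-up example)",
    make_row(1, "阿拉", "PRON"),
    make_row(2, "个", "PART"),
    make_row(3, "%", "SYM"),
])

print(len(ALLOWED_UPOS))        # 15
print(unexpected_upos(SAMPLE))  # [(3, '%', 'SYM')]
```

The check only looks at the UPOS column; it does not verify that a given word received the right tag, for example whether 个 is a particle or a classifier in context.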
Features
Additional features are currently not included.
Syntax
- Shanghainese syntax is essentially the same as Chinese syntax.
- Oblique nominal (obl, see also obl in Chinese UD):
  - Shanghainese 被 be corresponds to Mandarin 被 bei. See obl:agent in Chinese UD.
  - Mandarin 把 ba corresponds to Shanghainese 拿 no and 帮 paon. See obl:patient in Chinese UD.
- Relation subtypes are currently not considered.
- 34 relation types are present, excluding expl, list, and fixed.
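The relation inventory can be written down in the same way. The sketch below starts from the 37 universal relations of UD v2, removes the three that are not present, and rejects subtyped labels; `check_deprel` is a hypothetical helper for illustration, not an existing validator.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}
# Relations that this documentation says are not present.
UNUSED_RELATIONS = {"expl", "list", "fixed"}
USED_RELATIONS = UNIVERSAL_RELATIONS - UNUSED_RELATIONS


def check_deprel(label):
    """Return None if the label fits the inventory above, else a short reason."""
    if ":" in label:
        return "relation subtypes are not used"
    if label in UNUSED_RELATIONS:
        return "relation is not present in this inventory"
    if label not in UNIVERSAL_RELATIONS:
        return "not a universal relation"
    return None


print(len(USED_RELATIONS))        # 34
print(check_deprel("obl"))        # None
print(check_deprel("obl:agent"))  # relation subtypes are not used
```

Under this inventory, a 被 phrase would be labelled plain `obl` rather than `obl:agent`, since subtypes are currently not considered.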
Treebanks
There is only one Shanghainese UD treebank: