
This page pertains to UD version 2.

Treebank Statistics: UD_Cantonese-HK: POS Tags: NOUN

There are 301 NOUN lemmas (28%), 525 NOUN types (30%) and 2088 NOUN tokens (15%). Out of 15 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 3 in number of tokens.
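The counts above can be reproduced from a CoNLL-U file roughly as follows. This is a minimal Python sketch, not the script that generated this page. It assumes that "lemmas" and "types" mean the distinct LEMMA and FORM strings seen with a tag. It computes only the token share (NOUN tokens over all tokens), because the denominators behind the lemma and type percentages are not stated here. The helper names (`row`, `words`, `tag_stats`, `rank`) and the two-sentence sample are hypothetical.

```python
from collections import Counter, defaultdict

def row(i, form, lemma, upos, head="_", deprel="_"):
    """Build one 10-column CoNLL-U line (XPOS, FEATS, DEPS, MISC left as _)."""
    return "\t".join([str(i), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])

# Hypothetical two-sentence sample, not taken from UD_Cantonese-HK.
SAMPLE = "\n".join([
    "# text = 主席問問題",
    row(1, "主席", "主席", "NOUN", 2, "nsubj"),
    row(2, "問", "問", "VERB", 0, "root"),
    row(3, "問題", "問題", "NOUN", 2, "obj"),
    "",
    "# text = 議員問問題",
    row(1, "議員", "議員", "NOUN", 2, "nsubj"),
    row(2, "問", "問", "VERB", 0, "root"),
    row(3, "問題", "問題", "NOUN", 2, "obj"),
    "",
])

def words(conllu):
    """Yield (form, lemma, upos) for each syntactic word, skipping comments,
    blank lines, multiword-token ranges (1-2) and empty nodes (1.1)."""
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[2], cols[3]

def tag_stats(conllu):
    """Map each UPOS to (distinct lemmas, distinct forms, tokens)."""
    lemmas, forms, tokens = defaultdict(set), defaultdict(set), Counter()
    for form, lemma, upos in words(conllu):
        lemmas[upos].add(lemma)
        forms[upos].add(form)
        tokens[upos] += 1
    return {t: (len(lemmas[t]), len(forms[t]), tokens[t]) for t in tokens}

def rank(stats, tag, col):
    """1-based rank of `tag` by column `col` (0 lemmas, 1 types, 2 tokens);
    ties keep insertion order, an arbitrary choice in this sketch."""
    order = sorted(stats, key=lambda t: stats[t][col], reverse=True)
    return order.index(tag) + 1

stats = tag_stats(SAMPLE)
noun = stats["NOUN"]
share = noun[2] / sum(s[2] for s in stats.values())
print(noun, f"{share:.0%}", [rank(stats, "NOUN", c) for c in range(3)])
# (3, 3, 4) 67% [1, 1, 1]
```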

The 10 most frequent NOUN lemmas: _、 個、 啲、 人、 歌、 爺爺、 緡、 阿哥、 嘢、 年

The 10 most frequent NOUN types: 個、 主席、 議員、 啲、 人、 問題、 會議、 而家、 規則、 嘢

The 10 most frequent ambiguous lemmas: _ (PUNCT 1377, VERB 1352, NOUN 1283, ADV 853, PART 764, PRON 662, AUX 335, DET 217, ADJ 209, ADP 140, NUM 124, SCONJ 101, CCONJ 93, INTJ 92, PROPN 52), 個 (NOUN 78, PART 2), 啲 (NOUN 56, ADV 26, DET 1, PART 1), 而家 (NOUN 10, CCONJ 3), 嗰陣時 (NOUN 8, ADP 4), 分 (NOUN 5, VERB 1), mean (ADJ 1, NOUN 1), 份 (NOUN 2, PART 1), 帶 (VERB 3, NOUN 2), 行 (VERB 8, NOUN 2)

The 10 most frequent ambiguous types: 個 (NOUN 205, PART 2), 啲 (NOUN 62, ADV 30, DET 2, PART 1), 而家 (NOUN 38, CCONJ 3), 選舉 (NOUN 24, VERB 4), 決定 (NOUN 23, VERB 10), 宣誓 (VERB 16, NOUN 13), 點 (NOUN 12, ADV 11), 嗰陣時 (NOUN 9, ADP 4, SCONJ 1), 份 (NOUN 8, PART 1), 分 (NOUN 7, VERB 1)
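A lemma or type counts as ambiguous when it occurs with more than one UPOS. The sketch below, built around a hypothetical helper named `ambiguous`, orders candidates by how often they occur as NOUN. That ordering is an assumption, although it is consistent with the type list above (個 205, 啲 62, 而家 38 and so on as NOUN). The token sample is hypothetical.

```python
from collections import Counter, defaultdict

def row(i, form, lemma, upos):
    """One CoNLL-U line; only FORM, LEMMA and UPOS matter for this sketch."""
    return "\t".join([str(i), form, lemma, upos] + ["_"] * 6)

# Hypothetical token list (heads omitted), not taken from the treebank.
SAMPLE = "\n".join([
    row(1, "個", "個", "NOUN"), row(2, "個", "個", "PART"),
    row(3, "個", "個", "NOUN"), row(4, "啲", "啲", "ADV"),
    row(5, "啲", "啲", "ADV"), row(6, "啲", "啲", "NOUN"),
    row(7, "主席", "主席", "NOUN"), "",
])

def ambiguous(conllu, tag="NOUN", field="lemma", top=10):
    """Values of `field` ('lemma' or 'form') seen with `tag` and at least one
    other UPOS, ordered by how often they occur as `tag`."""
    col = {"form": 1, "lemma": 2}[field]
    seen = defaultdict(Counter)
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        seen[cols[col]][cols[3]] += 1
    hits = [(v, c) for v, c in seen.items() if tag in c and len(c) > 1]
    hits.sort(key=lambda vc: vc[1][tag], reverse=True)
    return hits[:top]

for value, counts in ambiguous(SAMPLE):
    detail = ", ".join(f"{t} {n}" for t, n in counts.most_common())
    print(f"{value} ({detail})")
# 個 (NOUN 2, PART 1)
# 啲 (ADV 2, NOUN 1)
```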

Morphology

The form / lemma ratio of NOUN is 1.744186 (the average over all parts of speech is 1.624294).

The 1st highest number of forms (259) was observed with the lemma “_”: Declaration_of_Renunciation_of_UK_citizenship, arbitrary_use_of_power, copy, power, s, sh, 上面, 上高, 主, 主席, 主持, 之下, 之前, 之後, 事, 事情, 交代, 交待, 人, 人情, 今天, 今日, 代表性, 以前, 件, 任期, 份, 休會, 位, 位置, 依家, 信, 信件, 信任, 個, 候選, 候選人, 做法, 傳媒, 儀式, 入便, 內容, 全港, 公義, 出便, 出面, 分, 分鐘, 判斷, 利益, 刻, 則, 剛才, 副本, 勸喻, 危機, 原來, 原因, 句, 可能, 司法, 同事, 名單, 呢, 問題, 啲, 嗰陣時, 嘢, 回答, 國家, 國籍, 國籍法, 地方, 基本法, 基礎, 場合, 壓力, 外國, 大學生, 大會, 大眾, 嫌疑, 學校, 學生, 學生會, 學生證, 宗教, 宣佈, 宣誓, 封, 居留權, 屆, 工作, 市民, 席位, 底下, 座位, 廢票, 式, 後便, 後果, 後面, 情況, 意見, 態度, 憲制, 憲政, 懷疑, 手, 手續, 投票, 投票權, 投票箱, 括弧, 提問, 擉, 政府, 政治, 效力, 教授, 文件, 文書, 新聞, 方式, 日後, 時候, 時間, 會, 會眾, 會議, 會議室, 會議廳, 未來, 條, 條文, 概念, 標準, 樣, 機會, 檯, 權, 權利, 權力, 次, 次序, 正本, 正話, 步, 段, 民意, 決定, 法例, 法庭, 法律, 海, 澄清, 爭議, 片段, 現在, 理據, 理由, 理解, 琴日, 番, 當中, 當事人, 發言, 監誓, 監誓員, 確認, 社會, 神仙, 秘書, 秘書處, 秘書長, 秩序, 程序, 程度, 稍後, 種, 立, 立場, 立法會, 答覆, 範疇, 精神, 結果, 義務, 考慮, 而家, 耐心, 聲, 聲明, 背景, 能力, 表決, 裁決, 裏便, 裏面, 裹面, 要求, 規則, 規定, 規矩, 規程, 視聽, 觀點, 解釋, 討論, 誓詞, 語氣, 說話, 論壇, 論點, 諮詢, 證據, 證書, 議, 議員, 議會, 議會場, 議程, 責任, 資料, 資格, 質疑, 身份, 身後便, 辦法, 辯論, 途徑, 進程, 過往, 過程, 選擧, 選民, 選舉, 邏輯, 部份, 醜人, 錯, 鐘頭, 關係, 附表, 限制, 陣, 陣間, 階段, 階程, 隔籬, 障礙, 電郵, 面前, 項, 領事館, 頭先, 風險, 首, 點.

The 2nd highest number of forms (1) was observed with the lemma “CD”: CD.

The 3rd highest number of forms (1) was observed with the lemma “Mean”: Mean.
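The form/lemma ratio is the number of distinct NOUN forms divided by the number of distinct NOUN lemmas, and 525 / 301 gives the 1.744186 reported above. Because the CoNLL-U placeholder “_” is counted as an ordinary lemma, every noun left unlemmatized appears to be pooled under it, which would explain its 259 forms. A minimal sketch with a hypothetical sample and a hypothetical helper `forms_by_lemma`:

```python
from collections import defaultdict

def row(i, form, lemma, upos):
    """One CoNLL-U line; only FORM, LEMMA and UPOS matter for this sketch."""
    return "\t".join([str(i), form, lemma, upos] + ["_"] * 6)

# Hypothetical sample: three unlemmatized nouns share the placeholder lemma "_".
SAMPLE = "\n".join([
    row(1, "而家", "_", "NOUN"), row(2, "依家", "_", "NOUN"),
    row(3, "主席", "_", "NOUN"), row(4, "CD", "CD", "NOUN"),
    row(5, "CD", "CD", "NOUN"), row(6, "問", "問", "VERB"), "",
])

def forms_by_lemma(conllu, tag="NOUN"):
    """Map each lemma seen with `tag` to the set of its distinct forms."""
    forms = defaultdict(set)
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == tag:
            forms[cols[2]].add(cols[1])
    return forms

forms = forms_by_lemma(SAMPLE)
n_types = len(set().union(*forms.values()))   # distinct NOUN forms
print(n_types / len(forms))                    # 4 types / 2 lemmas = 2.0
for lemma, fs in sorted(forms.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(lemma, len(fs), sorted(fs))          # lemmas with the most forms first
print(round(525 / 301, 6))                     # figures from this page: 1.744186
```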

NOUN occurs with 1 feature: NounType (473; 23% instances)

NOUN occurs with 1 feature-value pair: NounType=Clf

NOUN occurs with 2 feature combinations. The most frequent feature combination is _, i.e. no features (1615 tokens). Examples: 主席、 議員、 人、 問題、 會議、 而家、 規則、 嘢、 選舉、 今日
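Features are read from the FEATS column, where pairs are separated by `|` and “_” means no features. The two combinations on this page cover every NOUN token (1615 + 473 = 2088), so the second one is presumably NounType=Clf on its own. A sketch with a hypothetical helper `feature_stats` and a hypothetical sample:

```python
from collections import Counter

def row(i, form, upos, feats="_"):
    """One CoNLL-U line; only FORM, UPOS and FEATS are filled in."""
    return "\t".join([str(i), form, "_", upos, "_", feats, "_", "_", "_", "_"])

# Hypothetical sample: one classifier noun carries NounType=Clf.
SAMPLE = "\n".join([
    row(1, "個", "NOUN", "NounType=Clf"), row(2, "主席", "NOUN"),
    row(3, "議員", "NOUN"), row(4, "問", "VERB"), "",
])

def feature_stats(conllu, tag="NOUN"):
    """Count feature names, Feature=Value pairs and whole FEATS strings
    (feature combinations) for words tagged `tag`."""
    names, pairs, combos = Counter(), Counter(), Counter()
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0] or cols[3] != tag:
            continue
        combos[cols[5]] += 1          # "_" means no features at all
        if cols[5] == "_":
            continue
        for pair in cols[5].split("|"):
            names[pair.split("=", 1)[0]] += 1
            pairs[pair] += 1
    return names, pairs, combos

names, pairs, combos = feature_stats(SAMPLE)
print(combos.most_common(1))  # [('_', 2)]
```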

Relations

NOUN nodes are attached to their parents using 33 different relations: obj (648; 31% instances), nsubj (223; 11% instances), clf (212; 10% instances), obl:tmod (132; 6% instances), compound (117; 6% instances), det (110; 5% instances), obl (106; 5% instances), nmod (100; 5% instances), conj (78; 4% instances), root (75; 4% instances), vocative (55; 3% instances), case:loc (47; 2% instances), flat (26; 1% instances), reparandum (22; 1% instances), appos (20; 1% instances), obj:periph (19; 1% instances), advmod (16; 1% instances), ccomp (14; 1% instances), compound:vo (14; 1% instances), advmod:df (9; 0% instances), parataxis (9; 0% instances), dislocated (8; 0% instances), nsubj:periph (5; 0% instances), xcomp (5; 0% instances), amod (3; 0% instances), mark (3; 0% instances), advcl (2; 0% instances), case (2; 0% instances), iobj (2; 0% instances), obl:agent (2; 0% instances), obl:patient (2; 0% instances), acl (1; 0% instances), discourse:sp (1; 0% instances)

Parents of NOUN nodes belong to 12 different parts of speech, counting the artificial root of the tree as one (listed here as ROOT; its 75 instances are the NOUN nodes attached with the root relation): VERB (1171; 56% instances), NOUN (483; 23% instances), NUM (122; 6% instances), DET (84; 4% instances), ROOT (75; 4% instances), ADJ (57; 3% instances), PROPN (41; 2% instances), PRON (24; 1% instances), AUX (13; 1% instances), ADP (12; 1% instances), ADV (5; 0% instances), PART (1; 0% instances)
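Parent statistics need whole sentences, because HEAD is a sentence-internal ID and HEAD 0 denotes the artificial root. The sketch below labels that root `ROOT`, an arbitrary choice made only for this sketch, and prints shares rounded to the nearest whole percent, which appears to be why rare relations show 0% in the lists above. The helper names (`sentences`, `parent_stats`) and the sample are hypothetical.

```python
from collections import Counter

def row(i, form, upos, head, deprel):
    """One CoNLL-U line; LEMMA, XPOS, FEATS, DEPS and MISC are left as _."""
    return "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])

# Hypothetical sample: 主席問問題 and 呢個問題？ (個 as a classifier noun).
SAMPLE = "\n".join([
    row(1, "主席", "NOUN", 2, "nsubj"), row(2, "問", "VERB", 0, "root"),
    row(3, "問題", "NOUN", 2, "obj"), "",
    row(1, "呢", "DET", 3, "det"), row(2, "個", "NOUN", 3, "clf"),
    row(3, "問題", "NOUN", 0, "root"), row(4, "？", "PUNCT", 3, "punct"), "",
])

def sentences(conllu):
    """Yield each sentence as a list of column lists, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in conllu.splitlines() + [""]:
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)

def parent_stats(conllu, tag="NOUN"):
    """Count DEPREL values of `tag` words and the UPOS of their heads."""
    rels, parents = Counter(), Counter()
    for sent in sentences(conllu):
        upos = {cols[0]: cols[3] for cols in sent}
        for cols in sent:
            if cols[3] == tag:
                rels[cols[7]] += 1
                # HEAD 0 is the artificial root, which has no UPOS of its own.
                parents[upos.get(cols[6], "ROOT")] += 1
    return rels, parents

rels, parents = parent_stats(SAMPLE)
total = sum(rels.values())
for rel, n in rels.most_common():
    print(f"{rel} ({n}; {n / total:.0%} instances)")
```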

863 (41%) NOUN nodes are leaves.

559 (27%) NOUN nodes have one child.

320 (15%) NOUN nodes have two children.

346 (17%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 14.
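Leaves, one-child, two-child and three-or-more nodes partition all NOUN tokens (863 + 559 + 320 + 346 = 2088). The sketch below assumes that every dependent in the basic tree counts as a child, punctuation included, as the punct entry among the child relations suggests. The helper names (`sentences`, `degree_stats`) and the sample are hypothetical.

```python
from collections import Counter

def row(i, form, upos, head, deprel):
    """One CoNLL-U line; LEMMA, XPOS, FEATS, DEPS and MISC are left as _."""
    return "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])

# Hypothetical sample: 主席問問題 and 呢個問題？
SAMPLE = "\n".join([
    row(1, "主席", "NOUN", 2, "nsubj"), row(2, "問", "VERB", 0, "root"),
    row(3, "問題", "NOUN", 2, "obj"), "",
    row(1, "呢", "DET", 3, "det"), row(2, "個", "NOUN", 3, "clf"),
    row(3, "問題", "NOUN", 0, "root"), row(4, "？", "PUNCT", 3, "punct"), "",
])

def sentences(conllu):
    """Yield each sentence as a list of column lists, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in conllu.splitlines() + [""]:
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)

def degree_stats(conllu, tag="NOUN"):
    """Bucket `tag` words by number of dependents (0, 1, 2, 3+) and
    report the largest number of dependents seen."""
    buckets, highest = Counter(), 0
    for sent in sentences(conllu):
        n_children = Counter(cols[6] for cols in sent)  # HEAD id -> dependents
        for cols in sent:
            if cols[3] == tag:
                d = n_children[cols[0]]
                buckets[min(d, 3)] += 1   # bucket 3 stands for "three or more"
                highest = max(highest, d)
    return buckets, highest

buckets, highest = degree_stats(SAMPLE)
print(dict(sorted(buckets.items())), highest)  # {0: 3, 3: 1} 3
```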

Children of NOUN nodes are attached using 33 different relations: det (425; 16% instances), punct (408; 16% instances), nmod (222; 9% instances), case (208; 8% instances), nummod (201; 8% instances), discourse:sp (192; 7% instances), acl (161; 6% instances), compound (149; 6% instances), amod (127; 5% instances), advmod (73; 3% instances), conj (73; 3% instances), case:loc (61; 2% instances), cop (56; 2% instances), nsubj (49; 2% instances), appos (34; 1% instances), discourse (31; 1% instances), reparandum (26; 1% instances), cc (23; 1% instances), obl:tmod (15; 1% instances), obl (9; 0% instances), parataxis (9; 0% instances), vocative (9; 0% instances), advcl (5; 0% instances), dislocated (4; 0% instances), mark (4; 0% instances), mark:rel (4; 0% instances), ccomp (2; 0% instances), clf (2; 0% instances), flat (2; 0% instances), advcl:coverb (1; 0% instances), aux (1; 0% instances), csubj (1; 0% instances), nsubj:periph (1; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: NOUN (483; 19% instances), PUNCT (408; 16% instances), DET (308; 12% instances), PART (304; 12% instances), NUM (208; 8% instances), PRON (207; 8% instances), VERB (187; 7% instances), ADJ (130; 5% instances), ADP (116; 4% instances), ADV (83; 3% instances), AUX (60; 2% instances), PROPN (38; 1% instances), INTJ (27; 1% instances), CCONJ (23; 1% instances), SCONJ (6; 0% instances)
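Child statistics invert the parent lookup: a word is counted when its head is tagged NOUN. Each NOUN-to-NOUN edge is therefore seen once from each side, which is why NOUN shows 483 instances both among the children and among the parents of NOUN nodes. A sketch with hypothetical helper names (`sentences`, `child_stats`) and a hypothetical sample:

```python
from collections import Counter

def row(i, form, upos, head, deprel):
    """One CoNLL-U line; LEMMA, XPOS, FEATS, DEPS and MISC are left as _."""
    return "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])

# Hypothetical sample: 主席問問題 and 呢個問題？
SAMPLE = "\n".join([
    row(1, "主席", "NOUN", 2, "nsubj"), row(2, "問", "VERB", 0, "root"),
    row(3, "問題", "NOUN", 2, "obj"), "",
    row(1, "呢", "DET", 3, "det"), row(2, "個", "NOUN", 3, "clf"),
    row(3, "問題", "NOUN", 0, "root"), row(4, "？", "PUNCT", 3, "punct"), "",
])

def sentences(conllu):
    """Yield each sentence as a list of column lists, skipping comments,
    multiword-token ranges and empty nodes."""
    sent = []
    for line in conllu.splitlines() + [""]:
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)

def child_stats(conllu, tag="NOUN"):
    """Count DEPREL and UPOS of every word whose head is tagged `tag`."""
    rels, kids = Counter(), Counter()
    for sent in sentences(conllu):
        upos = {cols[0]: cols[3] for cols in sent}
        for cols in sent:
            if upos.get(cols[6]) == tag:   # HEAD 0 maps to None, never matches
                rels[cols[7]] += 1
                kids[cols[3]] += 1
    return rels, kids

rels, kids = child_stats(SAMPLE)
print(rels.most_common(), kids.most_common())
```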