This page pertains to UD version 2.

Treebank Statistics: UD_Chinese-PUD: POS Tags: X

There is 1 X lemma (7%); there are 275 X types (5%) and 306 X tokens (1%). Out of 15 observed tags, X ranks 15th in number of lemmas, 5th in number of types and 13th in number of tokens.
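Below is a minimal Python sketch of how such counts can be obtained from a CoNLL-U file. It is not the script that generated this page. The input is a hypothetical one-sentence toy sample, and the helper names (`read_tokens`, `tag_stats`) are illustrative. The sketch assumes that the lemma and type percentages are taken over distinct (lemma, UPOS) and (form, UPOS) pairs. That assumption fits the figures above, since 1 X lemma out of 15 tag-lemma pairs rounds to 7%.

```python
# Hypothetical toy sentence in CoNLL-U format; it is NOT from UD_Chinese-PUD.
SAMPLE = """\
# sent_id = toy-1
1\tBBC\t_\tX\t_\tForeign=Yes\t2\tnsubj\t_\t_
2\t報導\t_\tVERB\t_\t_\t0\troot\t_\t_
3\t了\t_\tAUX\t_\t_\t2\taux\t_\t_
4\tMartin\t_\tX\t_\t_\t7\tnmod\t_\t_
5\tLuther\t_\tX\t_\tForeign=Yes\t4\tflat\t_\t_
6\t的\t_\tPART\t_\t_\t4\tcase\t_\t_
7\t演講\t_\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def read_tokens(text):
    """Yield the 10 CoNLL-U columns of every basic (word-level) token line."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols


def tag_stats(text, upos):
    """Count lemmas, types and tokens of one UPOS tag, with their shares.

    Assumption: lemma and type shares are taken over distinct
    (lemma, UPOS) and (form, UPOS) pairs of the whole corpus.
    """
    tokens = list(read_tokens(text))
    lemma_pairs = {(c[2], c[3]) for c in tokens}  # columns LEMMA, UPOS
    type_pairs = {(c[1], c[3]) for c in tokens}   # columns FORM, UPOS
    n_lemmas = sum(1 for _, tag in lemma_pairs if tag == upos)
    n_types = sum(1 for _, tag in type_pairs if tag == upos)
    n_tokens = sum(1 for c in tokens if c[3] == upos)
    return {
        "lemmas": n_lemmas, "lemma_share": n_lemmas / len(lemma_pairs),
        "types": n_types, "type_share": n_types / len(type_pairs),
        "tokens": n_tokens, "token_share": n_tokens / len(tokens),
    }


stats = tag_stats(SAMPLE, "X")
print(stats)
```

On the toy sample this gives X 1 lemma, 3 types (BBC, Martin, Luther) and 3 of the 7 tokens.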

The only X lemma is “_”, the CoNLL-U placeholder for an unspecified value.

The 10 most frequent X types: BBC、 CNN、 the、 Martin、 Anaya、 Andy、 B.C.、 Barrosos、 Catalano、 DNA

The 10 most frequent ambiguous lemmas: _ (NOUN 5410, VERB 3467, PUNCT 2902, PART 1881, PROPN 1361, ADP 1288, ADV 1283, NUM 873, PRON 710, ADJ 650, AUX 618, DET 355, X 306, CCONJ 283, SCONJ 28)

The 10 most frequent ambiguous types: 中 (ADP 113, NOUN 6, X 1), 的 (PART 1361, X 1), 而 (ADV 47, CCONJ 1, X 1), 被 (AUX 79, ADP 22, X 1)
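Ambiguous types are forms that occur with more than one UPOS tag. The sketch below lists the forms that occur at least once as X, together with their per-tag counts. As before, the input is a hypothetical toy sample and the helper names are illustrative. Only the FORM and UPOS columns are read, so HEAD and DEPREL are placeholders. Sorting by X frequency and then by code point is an assumption. It happens to match the order of the list above, where each form occurs once as X and 中, 的, 而, 被 appear in code-point order.

```python
from collections import Counter, defaultdict

# Hypothetical toy input (NOT from UD_Chinese-PUD); HEAD/DEPREL are placeholders.
SAMPLE = """\
# sent_id = toy-a
1\t的\t_\tPART\t_\t_\t0\troot\t_\t_
2\t的\t_\tPART\t_\t_\t1\tdep\t_\t_
3\t中\t_\tADP\t_\t_\t1\tdep\t_\t_
4\t中\t_\tADP\t_\t_\t1\tdep\t_\t_
5\tShow\t_\tX\t_\t_\t1\tdep\t_\t_

# sent_id = toy-b
1\t的\t_\tX\t_\t_\t0\troot\t_\t_
2\t中\t_\tX\t_\t_\t1\tdep\t_\t_
3\t中\t_\tNOUN\t_\t_\t1\tdep\t_\t_
"""


def read_tokens(text):
    """Yield the 10 CoNLL-U columns of every basic (word-level) token line."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        yield cols


def ambiguous_types(text, upos, limit=10):
    """Forms seen with `upos` and with at least one other UPOS tag."""
    tags_by_form = defaultdict(Counter)
    for cols in read_tokens(text):
        tags_by_form[cols[1]][cols[3]] += 1  # FORM -> UPOS counts
    hits = [(form, tags) for form, tags in tags_by_form.items()
            if upos in tags and len(tags) > 1]
    # Assumed ordering: frequency as `upos` (descending), then code point.
    hits.sort(key=lambda item: (-item[1][upos], item[0]))
    return hits[:limit]


def describe(form, tags):
    """Render one entry in the style of the page, e.g. '的 (PART 2, X 1)'."""
    inner = ", ".join(f"{tag} {count}" for tag, count in tags.most_common())
    return f"{form} ({inner})"


result = ambiguous_types(SAMPLE, "X")
print(", ".join(describe(form, tags) for form, tags in result))
```

“Show” is excluded because the toy sample only ever tags it as X.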

Morphology

The form/lemma ratio of X is 275 (the average over all parts of speech is 388.47).
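Every part of speech in this treebank occurs with the lemma “_” (see the ambiguous-lemma list above), so the form/lemma ratio reduces to the number of types. Assuming the reported average is an unweighted mean over the 15 observed tags, the figures can be cross-checked as follows. The total of 5827 types is inferred from the average and is not stated on this page.

```latex
\[
\frac{\text{forms}(\mathrm{X})}{\text{lemmas}(\mathrm{X})} = \frac{275}{1} = 275,
\qquad
\frac{1}{15}\sum_{t=1}^{15}\text{types}(t) = 388.4\overline{6}
\;\Longrightarrow\;
\sum_{t=1}^{15}\text{types}(t) = 5827,
\qquad
\frac{275}{5827} \approx 0.047 .
\]
```

The last value rounds to the 5% share of X types given at the top of the page.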

The highest number of forms (275) was observed with the lemma “_”: Addenbrooke, Adnan, Agora, Alejandra, Amin, Anaya, Andes, Andrew, Andy, Antillas, Antilles, Aoun, Asty, Athina, Atkinson, Avery, Aviva, Avro, B.C., B29, BBC, Barratt, Barrosos, Bass, Benoît, Beust, Blindleia, Boemer, Bono, Brant, Bruno, Bruyn, Buck, Buena, Báñez, CBC, CBS, CGI, CNN, CRTC, Carlo, Carlos, Carolina, Carson, Catalano, Chiliaarm, Chris, Cifuentes, Ciscaucasus, Claret, Clinton, Conte, Cranach, Cristina, Crouch, Cup, Curio, DFB, DNA, DPA, David, Davis, Dean, Dee, Di, Die, Diess, Dietrich, Domenico, Donald, Doss, Durán, Dündar, EMicro, ETA, Eibingen, Elder, Energy, Ennio, Eon, Erdogan, F1, FSLN, Facebook, Film, Frank, Freeman, Fátima, G.D.P, GCHQ, GEMA, Garden, Gay, Georg, Georges, Germaine, Geronimo, Glenda, Goffredo, González, Guilbeault, Günter, H, Hariri, Herbert, Hillary, Hispania, Hopley, If, Income, Investors, Isner, Javier, Jeff, Jeffrey, John, Johnny, Joseph, Jr., Juan, Jutting, Karel, Kerber, Khanzir, Kiki, King, King,, Knott, Knuck, Krätschmer, Kven, Kühn, La, Larry, Leive, Lindsay, Lucas, Luther, MLA, MahaNakhon, Mailis, Marat, MarcRich, Mare, Margaret, Mark, Maroto, Marr, Martin, Mash, Mate, Mestre, Metti, Meänkieli, Miami, Mildred, Millican, Mohammed, Monster, Monte, Morgan, Morricone, Multi, Nectar卡, Negan, Nicolai, NineNews, Norman, North, Nostrum, Obermarsberg, Odi, Olivia, Packham, Paire, Papworth, Pedro, Pelucca, Petrassi, Pintado, Plano, Pugh, Punta, RECO, RHS, RSPB, Rachel, Rai, Rasa, Rastislav, Rastiz, Reddit, Reichenbach, Remy, Return, Richard, Rocco, Rogers, Rosane, Rupertsberg, SPIEGEL, Sade, Salaman, Salas, Sallyanne, SaskTel, Sea, Select, Serena, Sheppard, Show, Simon, Simple, Siri, Slack, Slate, Spotify, St., Stephen, Strategy, Style, Sánchez, Target, Tarlo, Transcaucasus, Traum,, Trump, Twitter, T型車, Uber, Vance, Viguier區, Villa, Vine, Von, Walter, Weiss, Wi-Fi, Williams, Winham, Winterkorn, Woods, Yerba, You, YouTube, Z., ZAY, ZEIT, Zayion, Zettel’s, Zimmer, Záhorie, al-Jadaan, and, andino, bjórr酒, de, funds, of, the, tipo, volcanology, vulcanology, Ángel, Évole, Ötzi, α, 中, 乎, 嘿, 底, 徹, 的, 而, 被.

X occurs with 1 feature: Foreign (91; 30% instances)

X occurs with 1 feature-value pair: Foreign=Yes

X occurs with 2 feature combinations. The most frequent combination is _, i.e. no features at all (215 tokens). Examples: BBC、 CNN、 Martin、 Andy、 B.C.、 Barrosos、 DNA、 Dündar、 Facebook、 Leive
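The feature tallies can be sketched in the same way. In this hypothetical sketch, which uses a toy sample and the illustrative function name `feature_stats`, a FEATS value of `_` counts as the empty combination, and each `Name=Value` pair separated by `|` is counted individually. In the real data the two combinations cover 215 + 91 = 306 tokens, which is every X token.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format; it is NOT from UD_Chinese-PUD.
SAMPLE = """\
# sent_id = toy-1
1\tBBC\t_\tX\t_\tForeign=Yes\t2\tnsubj\t_\t_
2\t報導\t_\tVERB\t_\t_\t0\troot\t_\t_
3\t了\t_\tAUX\t_\t_\t2\taux\t_\t_
4\tMartin\t_\tX\t_\t_\t7\tnmod\t_\t_
5\tLuther\t_\tX\t_\tForeign=Yes\t4\tflat\t_\t_
6\t的\t_\tPART\t_\t_\t4\tcase\t_\t_
7\t演講\t_\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def read_tokens(text):
    """Yield the 10 CoNLL-U columns of every basic (word-level) token line."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        yield cols


def feature_stats(text, upos):
    """Tally the FEATS column (index 5) of all tokens with the given UPOS."""
    n_tokens = 0
    combos = Counter()    # whole FEATS strings; "_" means no features
    features = Counter()  # feature names, e.g. "Foreign"
    pairs = Counter()     # feature=value pairs, e.g. "Foreign=Yes"
    for cols in read_tokens(text):
        if cols[3] != upos:
            continue
        n_tokens += 1
        feats = cols[5]
        combos[feats] += 1
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name, _value = pair.split("=", 1)
            features[name] += 1
            pairs[pair] += 1
    return n_tokens, combos, features, pairs


n, combos, features, pairs = feature_stats(SAMPLE, "X")
for name, count in features.items():
    print(f"{name} ({count}; {round(100 * count / n)}% instances)")
print(combos.most_common())
```

On the toy sample, 2 of the 3 X tokens carry Foreign=Yes and the third has no features.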

Relations

X nodes are attached to their parents using 13 different relations: appos (104; 34% instances), flat (91; 30% instances), nsubj (32; 10% instances), compound (25; 8% instances), nmod (12; 4% instances), obj (11; 4% instances), obl (10; 3% instances), conj (8; 3% instances), dep (8; 3% instances), nsubj:pass (2; 1% instances), acl:relcl (1; 0% instances), discourse (1; 0% instances), root (1; 0% instances)

Parents of X nodes belong to 8 different parts of speech, plus the artificial root node: NOUN (90; 29% instances), X (87; 28% instances), VERB (66; 22% instances), PROPN (55; 18% instances), ADJ (2; 1% instances), NUM (2; 1% instances), PART (2; 1% instances), PRON (1; 0% instances), root (1; 0% instances)
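The attachment figures must be computed sentence by sentence, because HEAD refers to an ID within the same sentence. The sketch below uses a hypothetical toy sentence and illustrative helper names (`read_sentences`, `attachment_stats`). A HEAD of 0, the artificial root, is reported as `(root)`. In the real data, both the relation list and the parent list sum to 306, one entry per X token.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format; it is NOT from UD_Chinese-PUD.
SAMPLE = """\
# sent_id = toy-1
1\tBBC\t_\tX\t_\tForeign=Yes\t2\tnsubj\t_\t_
2\t報導\t_\tVERB\t_\t_\t0\troot\t_\t_
3\t了\t_\tAUX\t_\t_\t2\taux\t_\t_
4\tMartin\t_\tX\t_\t_\t7\tnmod\t_\t_
5\tLuther\t_\tX\t_\tForeign=Yes\t4\tflat\t_\t_
6\t的\t_\tPART\t_\t_\t4\tcase\t_\t_
7\t演講\t_\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of 10-column basic token rows."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        sentence.append(cols)
    if sentence:
        yield sentence


def attachment_stats(text, upos):
    """Count DEPREL (index 7) and parent UPOS of every `upos` token."""
    relations = Counter()
    parent_tags = Counter()
    for sentence in read_sentences(text):
        by_id = {cols[0]: cols for cols in sentence}
        for cols in sentence:
            if cols[3] != upos:
                continue
            relations[cols[7]] += 1
            head = cols[6]
            parent_tags["(root)" if head == "0" else by_id[head][3]] += 1
    return relations, parent_tags


relations, parent_tags = attachment_stats(SAMPLE, "X")
print(relations.most_common(), parent_tags.most_common())
```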

152 (50%) X nodes are leaves.

41 (13%) X nodes have one child.

63 (21%) X nodes have two children.

50 (16%) X nodes have three or more children.

The highest child degree of an X node is 7.
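The degree distribution counts, for every X node, how many tokens name that node as their HEAD. In the real data the four buckets add up to 152 + 41 + 63 + 50 = 306, i.e. every X token. The sketch below is hypothetical, with toy data and illustrative names. In it, bucket 3 stands for "three or more".

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format; it is NOT from UD_Chinese-PUD.
SAMPLE = """\
# sent_id = toy-1
1\tBBC\t_\tX\t_\tForeign=Yes\t2\tnsubj\t_\t_
2\t報導\t_\tVERB\t_\t_\t0\troot\t_\t_
3\t了\t_\tAUX\t_\t_\t2\taux\t_\t_
4\tMartin\t_\tX\t_\t_\t7\tnmod\t_\t_
5\tLuther\t_\tX\t_\tForeign=Yes\t4\tflat\t_\t_
6\t的\t_\tPART\t_\t_\t4\tcase\t_\t_
7\t演講\t_\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of 10-column basic token rows."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        sentence.append(cols)
    if sentence:
        yield sentence


def degree_stats(text, upos):
    """Bucket `upos` nodes by number of children (0, 1, 2, 3+) and find the max."""
    buckets = Counter()
    max_degree = 0
    for sentence in read_sentences(text):
        # Number of dependents attached to each head ID in this sentence.
        child_counts = Counter(cols[6] for cols in sentence)
        for cols in sentence:
            if cols[3] != upos:
                continue
            degree = child_counts[cols[0]]
            buckets[min(degree, 3)] += 1
            max_degree = max(max_degree, degree)
    return buckets, max_degree


buckets, max_degree = degree_stats(SAMPLE, "X")
print(dict(buckets), max_degree)
```

In the toy sentence, BBC and Luther are leaves and Martin has two children (Luther and 的).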

Children of X nodes are attached using 15 different relations: punct (196; 56% instances), flat (82; 24% instances), case (22; 6% instances), conj (9; 3% instances), cc (7; 2% instances), acl:relcl (6; 2% instances), nmod (6; 2% instances), case:loc (4; 1% instances), compound (4; 1% instances), appos (3; 1% instances), nummod (3; 1% instances), cop (2; 1% instances), nsubj (2; 1% instances), amod (1; 0% instances), mark:relcl (1; 0% instances)

Children of X nodes belong to 12 different parts of speech: PUNCT (196; 56% instances), X (87; 25% instances), PART (15; 4% instances), ADP (13; 4% instances), NOUN (9; 3% instances), CCONJ (7; 2% instances), PROPN (7; 2% instances), VERB (7; 2% instances), NUM (3; 1% instances), AUX (2; 1% instances), ADJ (1; 0% instances), PRON (1; 0% instances)
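The children statistics look at every token whose HEAD is an X node. Both lists above sum to 348 children. The 87 X children match the 87 X parents reported earlier, since both figures count the same X-to-X edges. The following is a hypothetical sketch with toy data and illustrative helper names.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U format; it is NOT from UD_Chinese-PUD.
SAMPLE = """\
# sent_id = toy-1
1\tBBC\t_\tX\t_\tForeign=Yes\t2\tnsubj\t_\t_
2\t報導\t_\tVERB\t_\t_\t0\troot\t_\t_
3\t了\t_\tAUX\t_\t_\t2\taux\t_\t_
4\tMartin\t_\tX\t_\t_\t7\tnmod\t_\t_
5\tLuther\t_\tX\t_\tForeign=Yes\t4\tflat\t_\t_
6\t的\t_\tPART\t_\t_\t4\tcase\t_\t_
7\t演講\t_\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of 10-column basic token rows."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        sentence.append(cols)
    if sentence:
        yield sentence


def child_stats(text, upos):
    """Count DEPREL and UPOS of every token whose parent has tag `upos`."""
    relations = Counter()
    child_tags = Counter()
    for sentence in read_sentences(text):
        by_id = {cols[0]: cols for cols in sentence}
        for cols in sentence:
            head = cols[6]
            if head != "0" and by_id[head][3] == upos:
                relations[cols[7]] += 1
                child_tags[cols[3]] += 1
    return relations, child_tags


relations, child_tags = child_stats(SAMPLE, "X")
print(relations.most_common(), child_tags.most_common())
```

In the toy sentence, only Martin has children: Luther (flat, X) and 的 (case, PART).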