
This page pertains to UD version 2.

Treebank Statistics: UD_Finnish-TDT: POS Tags: NOUN

There are 13573 NOUN lemmas (51%), 27225 NOUN types (49%) and 56477 NOUN tokens (28%). Out of 15 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: vuosi, aika, päivä, osa, komissio, maa, asia, mies, artikla, ihminen

The 10 most frequent NOUN types: vuonna, vuoden, yhteisön, komission, päivänä, huomioon, mies, yliopiston, prosenttia, aikana

The 10 most frequent ambiguous lemmas: aika (NOUN 499, ADV 43), kohta (NOUN 156, ADV 8, SCONJ 1), maailma (NOUN 138, PROPN 1), kerta (NOUN 93, ADV 2), puoli (NOUN 81, NUM 25), a (NOUN 33, X 4, PROPN 1), suomalainen (ADJ 92, NOUN 55), sim (NOUN 38, PROPN 1), Jussi (NOUN 32, PROPN 1), valkea (NOUN 30, ADJ 6)

The 10 most frequent ambiguous types: ajan (NOUN 91, VERB 1), aikaa (NOUN 76, ADV 4), aikaan (NOUN 57, ADV 8), aika (ADV 41, NOUN 41), a (NOUN 33, X 4, PROPN 1), toimia (NOUN 29, VERB 22), b (NOUN 24, PROPN 1), puolella (NOUN 27, NUM 1), alusta (NOUN 18, VERB 1), asiassa (ADV 21, NOUN 18)

Morphology

The form / lemma ratio of NOUN is 2.005820 (the average of all parts of speech is 2.067586).
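The reported ratio matches the number of NOUN types divided by the number of NOUN lemmas given at the top of the page (27225 / 13573). A minimal sketch of how such a figure could be recomputed from a CoNLL-U file follows. It assumes that forms are compared case-sensitively (the form lists on this page contain both "Lapsina" and "lapsi") and that multiword-token ranges and empty nodes are skipped; the function name and the toy sample are hypothetical and are not taken from the treebank tooling.

```python
def form_lemma_ratio(conllu_text, upos="NOUN"):
    """Return (distinct forms, distinct lemmas, forms/lemmas) for one UPOS tag."""
    forms, lemmas = set(), set()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:
            forms.add(cols[1])   # FORM column, case-sensitive (assumption)
            lemmas.add(cols[2])  # LEMMA column
    ratio = len(forms) / len(lemmas) if lemmas else 0.0
    return len(forms), len(lemmas), ratio

# Toy input invented for illustration; it is not a sentence from the treebank.
SAMPLE = (
    "# text = lapsi lapset aika\n"
    "1\tlapsi\tlapsi\tNOUN\t_\tCase=Nom|Number=Sing\t0\troot\t_\t_\n"
    "2\tlapset\tlapsi\tNOUN\t_\tCase=Nom|Number=Plur\t1\tconj\t_\t_\n"
    "3\taika\taika\tADV\t_\t_\t1\tadvmod\t_\t_\n"
)

print(form_lemma_ratio(SAMPLE))   # two NOUN forms share one lemma
print(f"{27225 / 13573:.6f}")     # ratio recomputed from the counts on this page
```

The second print reproduces the value 2.005820 from the type and lemma counts alone, which suggests (but does not prove) that the page computes the ratio this way.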

The 1st highest number of forms (32) was observed with the lemma “lapsi”: -lasten, Lapsina, lapsella, lapselle, lapselleni, lapsemme, lapsen, lapsena, lapseni, lapsenne, lapsensa, lapsessa, lapsesta, lapset, lapsi, lapsia, lapsiin, lapsille, lapsilleen, lapsillenne, lapsillesi, lapsilta, lapsista, lapsistaan, lapsistani, lasta, lastaan, lastamme, lasten, lasten-, lasteni, lastensa.

The 2nd highest number of forms (31) was observed with the lemma “aika”: Aikamme, aika, aikaa, aikaakaan, aikaan, aikaani, aikaansa, aikana, aikanaan, aikanamme, aikani, aikansa, aikasi, aikoihin, aikoina, aikoinaan, aikoja, aikojen, aikona, ajaksi, ajalla, ajallaan, ajalta, ajan, ajas, ajassa, ajasta, ajastaan, ajat, ajoiksi, ajoista.

The 3rd highest number of forms (31) was observed with the lemma “mies”: Mieskin, mieheen, mieheensä, mieheksi, miehelle, miehelleen, miehellä, mieheltä, mieheltäni, miehen, mieheni, miehenikin, miehensä, miehensäkään, miehenä, miehestä, miehestäni, miehestään, miehet, miehille, miehistä, miehiä, miehiään, mies, mieshän, miesten, miesten-, miestenkin, miestä, miestäni, miestään.

NOUN occurs with 10 features: Case (56415; 100% instances), Number (56379; 100% instances), Derivation (7488; 13% instances), Person[psor] (2662; 5% instances), Number[psor] (953; 2% instances), Abbr (575; 1% instances), Clitic (248; 0% instances), Typo (176; 0% instances), Style (157; 0% instances), Degree (6; 0% instances)

NOUN occurs with 44 feature-value pairs: Abbr=Yes, Case=Abe, Case=Abl, Case=Ade, Case=All, Case=Com, Case=Ela, Case=Ess, Case=Gen, Case=Ill, Case=Ine, Case=Ins, Case=Nom, Case=Par, Case=Tra, Clitic=Han, Clitic=Kaan, Clitic=Kin, Clitic=Ko, Degree=Pos, Derivation=Inen, Derivation=Inen,Vs, Derivation=Ja, Derivation=Ja,Tar, Derivation=Lainen, Derivation=Lainen,Vs, Derivation=Llinen, Derivation=Llinen,Vs, Derivation=Minen, Derivation=Tar, Derivation=Ton, Derivation=Ton,Vs, Derivation=U, Derivation=Vs, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person[psor]=1, Person[psor]=2, Person[psor]=3, Style=Arch, Style=Coll, Typo=Yes

NOUN occurs with 566 feature combinations. The most frequent feature combination is Case=Gen|Number=Sing (9173 tokens). Examples: vuoden, yhteisön, komission, yliopiston, artiklan, asetuksen, neuvoston, ajan, maan, direktiivin
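The three feature counts above (features, feature-value pairs, feature combinations) can all be derived from the FEATS column. The sketch below shows one way to do it, assuming that a multi-valued entry such as Derivation=Inen,Vs counts as a single feature-value pair, as it does in the list on this page. Whether the page counts the empty value "_" as a combination is not stated; this sketch does count it. The function name and the sample rows are hypothetical, and the tags in the sample are illustrative only.

```python
from collections import Counter

def noun_feature_stats(rows, upos="NOUN"):
    """rows: iterable of (form, upos, feats) triples taken from CoNLL-U columns."""
    features, pairs, combos = Counter(), Counter(), Counter()
    for _form, tag, feats in rows:
        if tag != upos:
            continue
        combos[feats] += 1            # the whole FEATS string is one combination
        if feats == "_":
            continue                  # no individual features to split
        for pair in feats.split("|"):
            name, _, _value = pair.partition("=")
            features[name] += 1       # e.g. Case
            pairs[pair] += 1          # e.g. Case=Gen
    return features, pairs, combos

# Illustrative rows; not real treebank lines.
ROWS = [
    ("vuoden", "NOUN", "Case=Gen|Number=Sing"),
    ("yhteisön", "NOUN", "Case=Gen|Number=Sing"),
    ("vuonna", "NOUN", "Case=Ess|Number=Sing"),
    ("aika", "ADV", "_"),
]

features, pairs, combos = noun_feature_stats(ROWS)
print(len(features), len(pairs), len(combos))
print(combos.most_common(1))
```

Dividing each entry of `features` by the NOUN token count gives the per-feature percentages, for example 7488 / 56477 for Derivation, which rounds to 13%.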

Relations

NOUN nodes are attached to their parents using 33 different relations: obl (13850; 25% instances), obj (9026; 16% instances), nmod:poss (7172; 13% instances), nsubj (6458; 11% instances), conj (4564; 8% instances), nmod (3917; 7% instances), nsubj:cop (3332; 6% instances), root (2699; 5% instances), nmod:gobj (1414; 3% instances), compound:nn (906; 2% instances), appos (526; 1% instances), flat (517; 1% instances), advcl (417; 1% instances), nmod:gsubj (409; 1% instances), ccomp (213; 0% instances), xcomp:ds (210; 0% instances), orphan (181; 0% instances), acl:relcl (175; 0% instances), xcomp (158; 0% instances), vocative (89; 0% instances), parataxis (84; 0% instances), flat:name (47; 0% instances), goeswith (33; 0% instances), amod (20; 0% instances), flat:foreign (13; 0% instances), csubj:cop (11; 0% instances), case (10; 0% instances), compound (9; 0% instances), compound:prt (6; 0% instances), csubj (4; 0% instances), fixed (4; 0% instances), discourse (2; 0% instances), acl (1; 0% instances)

Parents of NOUN nodes belong to 14 different categories (13 parts of speech plus the empty category of root nodes, which have no parent): VERB (29566; 52% instances), NOUN (18351; 32% instances), [no parent] (2699; 5% instances), ADJ (2303; 4% instances), PROPN (1360; 2% instances), ADV (839; 1% instances), PRON (810; 1% instances), NUM (406; 1% instances), SYM (67; 0% instances), X (59; 0% instances), ADP (7; 0% instances), AUX (4; 0% instances), CCONJ (3; 0% instances), INTJ (3; 0% instances)
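A NOUN node attached as root has no parent, which explains why one entry in the list carries the same count (2699) as the root relation rather than a regular tag. A minimal sketch of the parent lookup follows, assuming each sentence is available as (ID, UPOS, HEAD) triples and that HEAD 0 marks the root; the function name and the toy trees are hypothetical.

```python
from collections import Counter

def parent_upos_counts(sentence, upos="NOUN"):
    """sentence: list of (id, upos, head) triples for one dependency tree."""
    tag_of = {node_id: tag for node_id, tag, _head in sentence}
    counts = Counter()
    for _node_id, tag, head in sentence:
        if tag == upos:
            # HEAD 0 is not a real node, so a root gets the empty string.
            counts[tag_of.get(head, "")] += 1
    return counts

# Toy trees invented for illustration.
TREE_A = [(1, "ADJ", 2), (2, "NOUN", 3), (3, "VERB", 0), (4, "NOUN", 3)]
TREE_B = [(1, "NOUN", 0), (2, "NOUN", 1)]

total = parent_upos_counts(TREE_A) + parent_upos_counts(TREE_B)
print(total)
```

Summing the per-sentence counters over a whole treebank would give a distribution of the kind listed above, with the empty key holding the root nodes.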

19334 (34%) NOUN nodes are leaves.

22424 (40%) NOUN nodes have one child.

8463 (15%) NOUN nodes have two children.

6256 (11%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 22.
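The four child-count figures above partition the NOUN tokens: 19334 + 22424 + 8463 + 6256 = 56477, the total NOUN token count. A sketch of how such buckets could be computed is shown below, again assuming (ID, UPOS, HEAD) triples per sentence; the function name and the toy tree are hypothetical.

```python
from collections import Counter

def child_degree_buckets(sentence, upos="NOUN"):
    """Bucket nodes of one tree by number of children: 0, 1, 2, or 3 (meaning 3+)."""
    n_children = Counter(head for _node_id, _tag, head in sentence)
    buckets = Counter()
    for node_id, tag, _head in sentence:
        if tag == upos:
            # A Counter returns 0 for nodes that never occur as a head (leaves).
            buckets[min(n_children[node_id], 3)] += 1
    return buckets

# Toy tree invented for illustration: node 2 has one child, node 4 is a leaf.
TREE = [(1, "ADJ", 2), (2, "NOUN", 3), (3, "VERB", 0), (4, "NOUN", 3)]

print(child_degree_buckets(TREE))
print(19334 + 22424 + 8463 + 6256 == 56477)  # consistency check on the page's figures
```

The highest child degree would be the maximum of `n_children[node_id]` over all NOUN nodes, taken before the values are capped at 3.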

Children of NOUN nodes are attached using 42 different relations: nmod:poss (10055; 15% instances), amod (9892; 15% instances), punct (5752; 9% instances), conj (4531; 7% instances), nmod (3854; 6% instances), cc (3475; 5% instances), acl (3387; 5% instances), det (3198; 5% instances), nummod (3168; 5% instances), advmod (2820; 4% instances), cop (2426; 4% instances), nsubj:cop (2296; 4% instances), case (2174; 3% instances), acl:relcl (1610; 2% instances), nmod:gobj (1553; 2% instances), appos (976; 1% instances), obl (589; 1% instances), mark (556; 1% instances), nmod:gsubj (551; 1% instances), aux (435; 1% instances), compound:nn (428; 1% instances), advcl (288; 0% instances), orphan (221; 0% instances), parataxis (216; 0% instances), ccomp (141; 0% instances), xcomp:ds (99; 0% instances), cc:preconj (88; 0% instances), cop:own (84; 0% instances), discourse (74; 0% instances), flat (71; 0% instances), csubj:cop (67; 0% instances), flat:name (33; 0% instances), xcomp (30; 0% instances), compound (24; 0% instances), goeswith (18; 0% instances), vocative (13; 0% instances), compound:prt (12; 0% instances), obj (10; 0% instances), nsubj (3; 0% instances), csubj (2; 0% instances), fixed (2; 0% instances), flat:foreign (1; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: NOUN (18351; 28% instances), ADJ (10255; 16% instances), PUNCT (5752; 9% instances), VERB (5747; 9% instances), PRON (5047; 8% instances), PROPN (4212; 6% instances), CCONJ (3533; 5% instances), NUM (3422; 5% instances), ADV (3002; 5% instances), AUX (2947; 5% instances), ADP (2138; 3% instances), SCONJ (550; 1% instances), SYM (157; 0% instances), X (81; 0% instances), INTJ (29; 0% instances)