This page pertains to UD version 2.

Treebank Statistics: UD_Finnish-TDT: POS Tags: NOUN

There are 13565 NOUN lemmas (51%), 27216 NOUN types (49%) and 56454 NOUN tokens (28%). Out of 15 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.
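As a rough illustration of how counts like these can be derived, the sketch below tallies distinct lemmas, distinct word forms (types) and tokens per UPOS tag from CoNLL-U text, then ranks a tag among all observed tags. It assumes that "types" means distinct word forms as written and that multiword-token ranges and empty nodes are skipped; the function names `pos_counts` and `rank` and the toy sample are hypothetical and are not the script that generated this page.

```python
from collections import defaultdict

def pos_counts(conllu_text):
    """Return {upos: (n_lemmas, n_types, n_tokens)} for CoNLL-U text."""
    lemmas = defaultdict(set)
    types = defaultdict(set)
    tokens = defaultdict(int)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and a plain integer ID.
        # This skips multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        types[upos].add(form)
        tokens[upos] += 1
    return {p: (len(lemmas[p]), len(types[p]), tokens[p]) for p in tokens}

def rank(counts, upos, index):
    """Rank of `upos` by column `index` (0 lemmas, 1 types, 2 tokens); 1 is highest."""
    ordered = sorted(counts, key=lambda p: counts[p][index], reverse=True)
    return ordered.index(upos) + 1

# Toy sample invented for illustration; it is not taken from UD_Finnish-TDT.
SAMPLE = "\n".join([
    "# text = toy example",
    "1\tvuonna\tvuosi\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tvuoden\tvuosi\tNOUN\t_\t_\t1\tnmod\t_\t_",
    "3\tmies\tmies\tNOUN\t_\t_\t1\tconj\t_\t_",
    "4\taika\taika\tADV\t_\t_\t1\tadvmod\t_\t_",
])

counts = pos_counts(SAMPLE)
print(counts["NOUN"])            # (lemmas, types, tokens) for NOUN
print(rank(counts, "NOUN", 2))   # rank of NOUN by token count
```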

The 10 most frequent NOUN lemmas: vuosi, aika, päivä, osa, komissio, maa, asia, mies, artikla, ihminen

The 10 most frequent NOUN types: vuonna, vuoden, yhteisön, komission, päivänä, huomioon, mies, yliopiston, prosenttia, aikana

The 10 most frequent ambiguous lemmas: aika (NOUN 499, ADV 43), kohta (NOUN 156, ADV 8, SCONJ 1), maailma (NOUN 138, PROPN 1), kerta (NOUN 93, ADV 2), puoli (NOUN 81, NUM 25), a (NOUN 33, X 4, PROPN 1), suomalainen (ADJ 92, NOUN 55), sim (NOUN 38, PROPN 1), Jussi (NOUN 32, PROPN 1), valkea (NOUN 30, ADJ 6)

The 10 most frequent ambiguous types: ajan (NOUN 91, VERB 1), aikaa (NOUN 76, ADV 4), aikaan (NOUN 57, ADV 8), aika (ADV 41, NOUN 41), a (NOUN 33, X 4, PROPN 1), toimia (NOUN 29, VERB 22), b (NOUN 24, PROPN 1), puolella (NOUN 27, NUM 1), alusta (NOUN 18, VERB 1), asiassa (ADV 21, NOUN 18)

Morphology

The form / lemma ratio of NOUN is 2.006340 (the average of all parts of speech is 2.067974).
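The reported ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas given earlier on this page. The short check below assumes that this is how the ratio is defined; the variable names are illustrative only.

```python
# Counts taken from the summary line of this page.
noun_types = 27216
noun_lemmas = 13565

# Assumed definition: distinct word forms per distinct lemma.
ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 2.006340
```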

The 1st highest number of forms (32) was observed with the lemma “lapsi”: -lasten, Lapsina, lapsella, lapselle, lapselleni, lapsemme, lapsen, lapsena, lapseni, lapsenne, lapsensa, lapsessa, lapsesta, lapset, lapsi, lapsia, lapsiin, lapsille, lapsilleen, lapsillenne, lapsillesi, lapsilta, lapsista, lapsistaan, lapsistani, lasta, lastaan, lastamme, lasten, lasten-, lasteni, lastensa.

The 2nd highest number of forms (31) was observed with the lemma “aika”: Aikamme, aika, aikaa, aikaakaan, aikaan, aikaani, aikaansa, aikana, aikanaan, aikanamme, aikani, aikansa, aikasi, aikoihin, aikoina, aikoinaan, aikoja, aikojen, aikona, ajaksi, ajalla, ajallaan, ajalta, ajan, ajas, ajassa, ajasta, ajastaan, ajat, ajoiksi, ajoista.

The 3rd highest number of forms (31) was observed with the lemma “mies”: Mieskin, mieheen, mieheensä, mieheksi, miehelle, miehelleen, miehellä, mieheltä, mieheltäni, miehen, mieheni, miehenikin, miehensä, miehensäkään, miehenä, miehestä, miehestäni, miehestään, miehet, miehille, miehistä, miehiä, miehiään, mies, mieshän, miesten, miesten-, miestenkin, miestä, miestäni, miestään.

NOUN occurs with 9 features: Case (56392; 100% instances), Number (56356; 100% instances), Derivation (7485; 13% instances), Person[psor] (2661; 5% instances), Number[psor] (952; 2% instances), Abbr (575; 1% instances), Clitic (247; 0% instances), Typo (161; 0% instances), Style (156; 0% instances)

NOUN occurs with 43 feature-value pairs: Abbr=Yes, Case=Abe, Case=Abl, Case=Ade, Case=All, Case=Com, Case=Ela, Case=Ess, Case=Gen, Case=Ill, Case=Ine, Case=Ins, Case=Nom, Case=Par, Case=Tra, Clitic=Han, Clitic=Kaan, Clitic=Kin, Clitic=Ko, Derivation=Inen, Derivation=Inen,Vs, Derivation=Ja, Derivation=Ja,Tar, Derivation=Lainen, Derivation=Lainen,Vs, Derivation=Llinen, Derivation=Llinen,Vs, Derivation=Minen, Derivation=Tar, Derivation=Ton, Derivation=Ton,Vs, Derivation=U, Derivation=Vs, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person[psor]=1, Person[psor]=2, Person[psor]=3, Style=Arch, Style=Coll, Typo=Yes

NOUN occurs with 562 feature combinations. The most frequent feature combination is Case=Gen|Number=Sing (9173 tokens). Examples: vuoden, yhteisön, komission, yliopiston, artiklan, asetuksen, neuvoston, ajan, maan, direktiivin
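The three feature statistics above (features, feature-value pairs, feature combinations) can be sketched as three counters over the FEATS column of the NOUN tokens. The sketch assumes that a comma-joined value such as `Derivation=Inen,Vs` counts as a single feature-value pair, as in the list above, and that the whole FEATS string counts as one combination; `feature_stats` and the toy input are hypothetical.

```python
from collections import Counter

def feature_stats(feats_column):
    """Count features, feature-value pairs and whole combinations.

    `feats_column` is an iterable of FEATS strings, one per token,
    e.g. "Case=Gen|Number=Sing" or "_" for a token without features.
    """
    feature_counts = Counter()
    pair_counts = Counter()
    combo_counts = Counter()
    for feats in feats_column:
        combo_counts[feats] += 1
        if feats == "_":
            continue  # no features on this token
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            feature_counts[name] += 1
            # Multi-values such as "Inen,Vs" are kept together as one value.
            pair_counts[(name, value)] += 1
    return feature_counts, pair_counts, combo_counts

# Toy input invented for illustration; not taken from the treebank.
toy_feats = [
    "Case=Gen|Number=Sing",
    "Case=Gen|Number=Sing",
    "Case=Nom|Number=Plur|Person[psor]=3",
    "Case=Nom|Derivation=Inen,Vs|Number=Sing",
    "_",
]
features, pairs, combos = feature_stats(toy_feats)
print(features["Case"], len(pairs), combos.most_common(1))
```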

Relations

NOUN nodes are attached to their parents using 34 different relations: obl (13852; 25% instances), obj (9026; 16% instances), nmod:poss (7170; 13% instances), nsubj (6454; 11% instances), conj (4567; 8% instances), nmod (3917; 7% instances), nsubj:cop (3307; 6% instances), root (2700; 5% instances), nmod:gobj (1415; 3% instances), compound:nn (915; 2% instances), appos (526; 1% instances), flat (517; 1% instances), advcl (416; 1% instances), nmod:gsubj (409; 1% instances), ccomp (213; 0% instances), xcomp:ds (210; 0% instances), orphan (181; 0% instances), acl:relcl (176; 0% instances), xcomp (158; 0% instances), vocative (89; 0% instances), parataxis (85; 0% instances), flat:name (47; 0% instances), amod (20; 0% instances), nsubj:outer (16; 0% instances), flat:foreign (13; 0% instances), csubj:cop (11; 0% instances), case (10; 0% instances), compound (10; 0% instances), dislocated (7; 0% instances), compound:prt (6; 0% instances), csubj (4; 0% instances), fixed (4; 0% instances), discourse (2; 0% instances), acl (1; 0% instances)

Parents of NOUN nodes belong to 14 different parts of speech: VERB (29566; 52% instances), NOUN (18346; 32% instances), (2700; 5% instances), ADJ (2303; 4% instances), PROPN (1348; 2% instances), ADV (836; 1% instances), PRON (810; 1% instances), NUM (405; 1% instances), SYM (64; 0% instances), X (59; 0% instances), ADP (7; 0% instances), AUX (4; 0% instances), CCONJ (3; 0% instances), INTJ (3; 0% instances)
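Tokens whose HEAD is 0 have no parent word, so a tally of parent tags needs some placeholder for them. The 2700 parents listed without a tag above equal the count of the root relation, which is consistent with that reading. The sketch below shows one way such a tally could be made; the function `parent_pos_counts`, the placeholder label `ROOT` and the toy sentences are assumptions for illustration.

```python
from collections import Counter

def parent_pos_counts(sentence, upos="NOUN", root_label="ROOT"):
    """Count the UPOS tags of the parents of all `upos` tokens.

    `sentence` is a list of (id, upos, head) tuples. A head of 0 has no
    entry in the lookup table, so it is counted under `root_label`.
    """
    pos_by_id = {tok_id: tok_upos for tok_id, tok_upos, _ in sentence}
    counts = Counter()
    for tok_id, tok_upos, head in sentence:
        if tok_upos == upos:
            counts[pos_by_id.get(head, root_label)] += 1
    return counts

# Toy sentences invented for illustration; not taken from the treebank.
sent_a = [(1, "ADJ", 2), (2, "NOUN", 3), (3, "VERB", 0), (4, "NOUN", 3), (5, "NOUN", 4)]
sent_b = [(1, "NOUN", 0), (2, "PUNCT", 1)]

total = parent_pos_counts(sent_a) + parent_pos_counts(sent_b)
print(total)
```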

19308 (34%) NOUN nodes are leaves.

22421 (40%) NOUN nodes have one child.

8469 (15%) NOUN nodes have two children.

6256 (11%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 22.
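The four child-count buckets above add up to the 56454 NOUN tokens reported at the top of the page. A minimal sketch of how such a histogram could be computed from the ID and HEAD columns follows; `child_degree_histogram` and the toy sentence are hypothetical, and the bucket for "three or more" is stored under the key 3.

```python
from collections import Counter

def child_degree_histogram(sentence, upos="NOUN"):
    """Histogram of child counts for `upos` tokens; 3 means three or more.

    `sentence` is a list of (id, upos, head) tuples.
    """
    children = Counter(head for _, _, head in sentence)
    hist = Counter()
    for tok_id, tok_upos, _ in sentence:
        if tok_upos == upos:
            hist[min(children[tok_id], 3)] += 1
    return hist

# Toy sentence invented for illustration; not taken from the treebank.
toy = [(1, "ADJ", 2), (2, "NOUN", 3), (3, "VERB", 0), (4, "NOUN", 3), (5, "NOUN", 4)]
hist = child_degree_histogram(toy)
print(hist[0], hist[1], hist[2], hist[3])

# Consistency check on the figures reported on this page.
reported = {0: 19308, 1: 22421, 2: 8469, 3: 6256}
total = sum(reported.values())
print(total)  # 56454
```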

Children of NOUN nodes are attached using 44 different relations: nmod:poss (10054; 15% instances), amod (9886; 15% instances), punct (5756; 9% instances), conj (4532; 7% instances), nmod (3852; 6% instances), cc (3478; 5% instances), acl (3394; 5% instances), det (3197; 5% instances), nummod (3170; 5% instances), advmod (2820; 4% instances), cop (2425; 4% instances), nsubj:cop (2290; 4% instances), case (2174; 3% instances), acl:relcl (1611; 2% instances), nmod:gobj (1554; 2% instances), appos (974; 1% instances), obl (589; 1% instances), mark (555; 1% instances), nmod:gsubj (551; 1% instances), compound:nn (436; 1% instances), aux (434; 1% instances), advcl (290; 0% instances), orphan (222; 0% instances), parataxis (216; 0% instances), ccomp (141; 0% instances), xcomp:ds (99; 0% instances), cc:preconj (88; 0% instances), cop:own (84; 0% instances), discourse (74; 0% instances), flat (71; 0% instances), csubj:cop (68; 0% instances), flat:name (33; 0% instances), xcomp (30; 0% instances), compound (27; 0% instances), vocative (13; 0% instances), compound:prt (12; 0% instances), obj (10; 0% instances), dislocated (6; 0% instances), goeswith (5; 0% instances), nsubj (3; 0% instances), csubj (2; 0% instances), fixed (2; 0% instances), nsubj:outer (2; 0% instances), flat:foreign (1; 0% instances)

Children of NOUN nodes belong to 15 different parts of speech: NOUN (18346; 28% instances), ADJ (10283; 16% instances), PUNCT (5756; 9% instances), VERB (5725; 9% instances), PRON (5045; 8% instances), PROPN (4211; 6% instances), CCONJ (3534; 5% instances), NUM (3423; 5% instances), ADV (3004; 5% instances), AUX (2947; 5% instances), ADP (2138; 3% instances), SCONJ (548; 1% instances), SYM (157; 0% instances), X (85; 0% instances), INTJ (29; 0% instances)