
PROPN: proper noun

Description

A proper noun is a noun that is the name of a specific individual, place, object or organisation. In Irish, proper nouns always have initial capitalisation.

Personal names are treated as a sequence of proper nouns. Some Irish names include name particles, such as Mac, Ó, etc., that form part of this sequence (e.g. Anne-Marie Nic Dhonncha).

Similarly, placenames can occur as a string of proper nouns (e.g. Baile Átha Cliath “Dublin”), as can names of organisations (e.g. an iris Irish Computer “the Irish Computer magazine”). Sometimes these strings have an internal structure containing other parts of speech, such as determiners (e.g. Parlaimint na hEorpa “the European Parliament”).

When initial mutation occurs with proper nouns in Irish, the letters added by the mutation are lowercase, while the base form retains its initial capitalisation (e.g. i mBaile Átha Cliath “in Dublin”). Similarly, some titles can have lower-case prefixes (e.g. an t-iar-Ghobharnóir “the former Governor”).
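The casing pattern described above can be checked mechanically. The following is a minimal sketch, assuming the lowercase prefix is simply everything before the first capital letter; `split_mutation_prefix` is a hypothetical helper name, not part of any UD tooling. Lenition (e.g. Bhaile) places its h after the capital, so this approach does not detect it.

```python
from typing import Tuple


def split_mutation_prefix(form: str) -> Tuple[str, str]:
    """Split a word form into (lowercase prefix, capitalised remainder).

    Assumption: the prefix is everything before the first uppercase
    letter. Returns ("", form) when the form starts with a capital or
    contains no capital at all.
    """
    for i, ch in enumerate(form):
        if ch.isupper():
            return form[:i], form[i:]
    return "", form


# Forms taken from the examples and statistics on this page.
for word in ["mBaile", "hÉireann", "t-iar-Ghobharnóir", "Baile"]:
    print(word, split_mutation_prefix(word))
```

Under this assumption, an eclipsed or h-prefixed form such as mBaile or hÉireann splits into the added letter and the capitalised base form, while an unmutated form such as Baile yields an empty prefix.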

Note that days of the week and months of the year in Irish, while capitalised, are marked as common nouns rather than proper nouns.

Examples


Treebank Statistics (UD_Irish)

There are 618 PROPN lemmas (15%), 659 PROPN types (11%) and 905 PROPN tokens (4%). Out of 16 observed tags, the rank of PROPN is: 2 in number of lemmas, 2 in number of types and 9 in number of tokens.

The 10 most frequent PROPN lemmas: Gaeilge, Éire, Baile, Átha_Cliath, Seán, Máire, Pádraig, Frainc, Eoraip, Gaillimh

The 10 most frequent PROPN types: Gaeilge, Átha_Cliath, Bhaile, Seán, hÉireann, mBaile, Éirinn, Ghaeilge, Mháire, Fraince

The 10 most frequent ambiguous lemmas: Éireannach (PROPN 4, ADJ 2), Eaglais (PROPN 3, NOUN 1), Eorpach (PROPN 3, ADJ 1), Gearmáin (PROPN 3, NOUN 1), Bean (PROPN 2, NOUN 1), Béal_Feirste (NOUN 2, PROPN 2), Muire (PROPN 2, NOUN 1), Mór (PROPN 2, ADJ 2), Rua (PROPN 2, ADJ 1), (NOUN 13, PROPN 2)

The 10 most frequent ambiguous types: Eaglais (PROPN 3, NOUN 1), Bean (PROPN 2, NOUN 1), Bhéal_Feirste (PROPN 2, NOUN 1), Rua (PROPN 2, ADJ 1), The (PROPN 2, X 1), (NOUN 4, PROPN 1), Cailín (NOUN 1, PROPN 1), Duine (NOUN 1, PROPN 1), Iúil (NOUN 1, PROPN 1), Muire (NOUN 1, PROPN 1)

Morphology

The form / lemma ratio of PROPN is 1.066343 (the average of all parts of speech is 1.449988).
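The reported ratio appears to be consistent with dividing the number of PROPN types by the number of PROPN lemmas given in the statistics above (659 / 618); this is an inference from the figures, not a documented definition. A minimal sketch, with `form_lemma_ratio` as a hypothetical helper name:

```python
def form_lemma_ratio(pairs):
    """Return (number of distinct forms) / (number of distinct lemmas).

    `pairs` is an iterable of (form, lemma) tuples for one part of speech.
    """
    pairs = list(pairs)
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    if not lemmas:
        raise ValueError("no (form, lemma) pairs given")
    return len(forms) / len(lemmas)


# A small sample built from forms listed on this page.
sample = [
    ("Éire", "Éire"), ("hÉireann", "Éire"), ("Éirinn", "Éire"),
    ("Gaeilge", "Gaeilge"), ("Ghaeilge", "Gaeilge"),
]
sample_ratio = form_lemma_ratio(sample)  # 5 forms over 2 lemmas

# Check against the page's own counts: 659 types, 618 lemmas.
reported = round(659 / 618, 6)
print(sample_ratio, reported)
```

A ratio close to 1 means that most proper nouns were observed in only one form, which fits the low figure for PROPN compared with the average over all parts of speech.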

The 1st highest number of forms (6) was observed with the lemma “Éire”: hÉire, hÉireann, hÉirinn, Éire, Éireann, Éirinn.

The 2nd highest number of forms (4) was observed with the lemma “Gaeilge”: GHAEILGE, Gaeilge, Ghaeilge, nGaeilge.

The 3rd highest number of forms (4) was observed with the lemma “Gaillimh”: Gaillimh, Gaillimhe, Ghaillimh, nGaillimh.

PROPN occurs with 6 features: ga-feat/Gender (887; 98% instances), ga-feat/Case (877; 97% instances), ga-feat/Number (848; 94% instances), ga-feat/Form (121; 13% instances), ga-feat/Definite (45; 5% instances), ga-feat/NounType (4; 0% instances)

PROPN occurs with 14 feature-value pairs: Case=Com, Case=Dat, Case=Gen, Case=Voc, Definite=Def, Form=Ecl, Form=HPref, Form=Len, Gender=Fem, Gender=Masc, NounType=Strong, NounType=Weak, Number=Plur, Number=Sing

PROPN occurs with 35 feature combinations. The most frequent feature combination is Case=Com|Gender=Masc|Number=Sing (519 tokens). Examples: Seán, Chorcaí, Fianna_Fáil, John, Pádraig, Baile, Briain, Dochartaigh, Eoin, Euro
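Feature combinations such as the one above are written in the CoNLL-U FEATS notation, with `|` separating pairs and `=` separating a feature from its value. A minimal sketch of reading such a string into a dictionary; `parse_feats` is a hypothetical helper name, and the underscore check reflects the CoNLL-U convention for an empty field:

```python
def parse_feats(feats: str) -> dict:
    """Parse a FEATS string like 'Case=Com|Gender=Masc' into a dict."""
    if not feats or feats == "_":
        # "_" marks an empty FEATS field in CoNLL-U.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


most_frequent = parse_feats("Case=Com|Gender=Masc|Number=Sing")
print(most_frequent)
```

Once parsed, individual features can be tallied separately (for example, counting how many PROPN tokens carry a Form value) or the full string can be kept as a key to count whole combinations.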

Relations

PROPN nodes are attached to their parents using 16 different relations: ga-dep/compound (314; 35% instances), ga-dep/nmod (218; 24% instances), ga-dep/nsubj (117; 13% instances), ga-dep/name (101; 11% instances), ga-dep/conj (49; 5% instances), ga-dep/root (33; 4% instances), ga-dep/appos (29; 3% instances), ga-dep/dobj (16; 2% instances), ga-dep/vocative (12; 1% instances), ga-dep/xcomp:pred (8; 1% instances), ga-dep/advmod (2; 0% instances), ga-dep/amod (2; 0% instances), ga-dep/advcl (1; 0% instances), ga-dep/case (1; 0% instances), ga-dep/ccomp (1; 0% instances), ga-dep/det (1; 0% instances)
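The percentages in this list can be reproduced from the raw counts, which also confirms that the 16 relations account for all 905 PROPN tokens. A minimal sketch, assuming the percentages are rounded to the nearest whole number; `relation_shares` is a hypothetical helper name:

```python
# Counts copied from the relation list above (the ga-dep/ prefix is dropped).
relation_counts = {
    "compound": 314, "nmod": 218, "nsubj": 117, "name": 101,
    "conj": 49, "root": 33, "appos": 29, "dobj": 16,
    "vocative": 12, "xcomp:pred": 8, "advmod": 2, "amod": 2,
    "advcl": 1, "case": 1, "ccomp": 1, "det": 1,
}


def relation_shares(counts):
    """Map each relation to its share of all instances, in whole percent."""
    total = sum(counts.values())
    return {rel: round(100 * n / total) for rel, n in counts.items()}


total = sum(relation_counts.values())
shares = relation_shares(relation_counts)
print(total, shares["compound"], shares["nmod"])
```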

Parents of PROPN nodes belong to 12 different parts of speech: NOUN (360; 40% instances), PROPN (250; 28% instances), VERB (216; 24% instances), ROOT (33; 4% instances), ADJ (17; 2% instances), ADP (11; 1% instances), X (9; 1% instances), PRON (4; 0% instances), SCONJ (2; 0% instances), CONJ (1; 0% instances), NUM (1; 0% instances), PUNCT (1; 0% instances)

368 (41%) PROPN nodes are leaves.

262 (29%) PROPN nodes have one child.

147 (16%) PROPN nodes have two children.

128 (14%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 9.
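The child-degree figures above count, for each PROPN node, how many other nodes name it as their head; the four groups sum to the 905 PROPN tokens (368 + 262 + 147 + 128). A minimal sketch of the computation for one sentence, assuming heads are given as in the CoNLL-U HEAD column (1-based token IDs, 0 for the root). The function names and the toy head list are hypothetical and do not come from the treebank:

```python
from collections import Counter


def child_degrees(heads):
    """Return the number of children of each token.

    heads[i] is the head ID of token i + 1; 0 marks the root.
    """
    counts = Counter(h for h in heads if h != 0)
    return [counts.get(i, 0) for i in range(1, len(heads) + 1)]


def bucket(degrees):
    """Group degrees into the classes used above: 0, 1, 2 and 3+."""
    labels = Counter()
    for d in degrees:
        labels["3+" if d >= 3 else str(d)] += 1
    return dict(labels)


# Toy sentence of four tokens: token 2 is the root and heads the rest.
toy_heads = [2, 0, 2, 2]
degrees = child_degrees(toy_heads)
print(degrees, bucket(degrees))

# Consistency check on the page's own figures.
groups_total = 368 + 262 + 147 + 128
```

To reproduce the page's figures, the degrees would be collected only for tokens tagged PROPN and accumulated over every sentence in the treebank.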

Children of PROPN nodes are attached using 28 different relations: ga-dep/case (186; 18% instances), ga-dep/name (153; 15% instances), ga-dep/compound (152; 15% instances), ga-dep/punct (139; 14% instances), ga-dep/det (126; 12% instances), ga-dep/conj (49; 5% instances), ga-dep/nmod (44; 4% instances), ga-dep/cc (33; 3% instances), ga-dep/appos (24; 2% instances), ga-dep/amod (21; 2% instances), ga-dep/nsubj (11; 1% instances), ga-dep/nummod (11; 1% instances), ga-dep/case:voc (10; 1% instances), ga-dep/cop (10; 1% instances), ga-dep/advmod (8; 1% instances), ga-dep/csubj:cleft (7; 1% instances), ga-dep/ccomp (6; 1% instances), ga-dep/mark (6; 1% instances), ga-dep/xcomp (6; 1% instances), ga-dep/acl:relcl (4; 0% instances), ga-dep/advcl (4; 0% instances), ga-dep/dobj (3; 0% instances), ga-dep/nmod:prep (3; 0% instances), ga-dep/xcomp:pred (3; 0% instances), ga-dep/parataxis (2; 0% instances), ga-dep/mark:prt (1; 0% instances), ga-dep/nmod:tmod (1; 0% instances), ga-dep/vocative (1; 0% instances)

Children of PROPN nodes belong to 14 different parts of speech: PROPN (250; 24% instances), ADP (193; 19% instances), PUNCT (139; 14% instances), DET (124; 12% instances), NOUN (114; 11% instances), PART (63; 6% instances), CONJ (37; 4% instances), VERB (28; 3% instances), ADJ (24; 2% instances), PRON (19; 2% instances), NUM (10; 1% instances), X (10; 1% instances), ADV (9; 1% instances), SCONJ (4; 0% instances)

