
Layered universal features

In some languages, some features are marked more than once on the same word. We say that there are several layers of the feature. The exact meaning of individual layers is language-dependent.

For example, possessive adjectives, determiners and pronouns may have two different values of Gender and two of Number. One value is determined by agreement with the modified (possessed) noun, just as other (non-possessive) adjectives and determiners agree in gender and number with the nouns they modify. The other value is determined lexically because it is a property of the possessor. The following table shows three patterns. English distinguishes only the possessor’s gender and number. Hindi distinguishes gender in agreement and number both in agreement and of the possessor (there is no neuter gender in Hindi). German distinguishes both features in both dimensions; more differences would be seen if we also showed German dative and accusative forms, not just nominatives.

Possessor / Agreement | Sing Masc | Sing Fem | Sing Neut | Plur Masc | Plur Fem
Sing Masc [en] | his son | his daughter | his house | his sons | his daughters
Sing Masc [de] | sein Sohn | seine Tochter | sein Haus | seine Söhne | seine Töchter
Sing Masc [hi] | usakā bēṭā | usakī bēṭī | | usakē bēṭē | usakī bēṭiyām̐
Sing Fem [en] | her son | her daughter | her house | her sons | her daughters
Sing Fem [de] | ihr Sohn | ihre Tochter | ihr Haus | ihre Söhne | ihre Töchter
Sing Fem [hi] | usakā bēṭā | usakī bēṭī | | usakē bēṭē | usakī bēṭiyām̐
Sing Neut [en] | its son | its daughter | its house | its sons | its daughters
Sing Neut [de] | sein Sohn | seine Tochter | sein Haus | seine Söhne | seine Töchter
Plur [en] | their son | their daughter | their house | their sons | their daughters
Plur [de] | ihr Sohn | ihre Tochter | ihr Haus | ihre Söhne | ihre Töchter
Plur [hi] | unakā bēṭā | unakī bēṭī | | unakē bēṭē | unakī bēṭiyām̐

If a feature is (can be) layered in a language, the name of the feature must indicate the layer. An additional identifier in square brackets is used to distinguish layers, e.g. Gender[psor] for the possessor’s gender. We recommend that the layer identifiers consist of lowercase English letters [a-z] and/or digits [0-9]. The layers, their meaning and their identifiers must be defined in a language-specific extension to this documentation. For each layered feature, one layer may be defined as default and the corresponding features then appear without identifier, e.g. Gender=Masc|Gender[psor]=Fem.
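The naming convention can be illustrated with a small parser. This is only a sketch: the function name parse_feats and the regular expression are assumptions made for illustration, and the restriction of layer identifiers to [a-z0-9] follows the recommendation above rather than any official validator.

```python
import re

# One feature name: a base name, optionally followed by a layer identifier
# in square brackets. The [a-z0-9]+ restriction mirrors the recommendation
# in the text; it is an assumption of this sketch, not an official rule set.
FEATURE_NAME = re.compile(r"^([A-Za-z0-9]+)(?:\[([a-z0-9]+)\])?$")


def parse_feats(feats):
    """Return {(name, layer): value}; layer is None for the default layer."""
    parsed = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        match = FEATURE_NAME.match(name)
        if match is None:
            raise ValueError(f"malformed feature name: {name!r}")
        parsed[(match.group(1), match.group(2))] = value
    return parsed


feats = parse_feats("Gender=Masc|Gender[psor]=Fem")
print(feats[("Gender", None)])    # value on the default layer
print(feats[("Gender", "psor")])  # value on the possessor layer
```

Keying the result by (name, layer) keeps the default layer and the named layers of the same feature side by side without one overwriting the other.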

In the following, we list some examples of layered features attested in existing corpora. These may be used as inspiration or they may be used as-is in treebanks for which they are found appropriate. Note that even if a treebank uses a layered feature from this section, it should still be described in the language-specific documentation.

Gender[psor]

Possessive adjectives and pronouns may have two different genders: that of the possessed object (gender agreement with modified noun) and that of the possessor (lexical feature, inherent gender).

The Gender[psor] feature captures the possessor’s gender.

In the Czech examples below, the masculine Gender[psor] implies using one of the suffixes -ův, -ova, -ovo, and the feminine Gender[psor] implies using one of -in, -ina, -ino.

Masc: masculine possessor

Examples: [cs] otcův syn (father’s son; Gender=Masc|Gender[psor]=Masc); otcova dcera (father’s daughter; Gender=Fem|Gender[psor]=Masc); otcovo dítě (father’s child; Gender=Neut|Gender[psor]=Masc).

Fem: feminine possessor

Examples: [cs] matčin syn (mother’s son; Gender=Masc|Gender[psor]=Fem); matčina dcera (mother’s daughter; Gender=Fem|Gender[psor]=Fem); matčino dítě (mother’s child; Gender=Neut|Gender[psor]=Fem).
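As a toy illustration of the suffix pattern, the six Czech forms above can be mapped back to their two gender layers. The sketch assumes the input is already known to be a possessive adjective in one of the quoted nominative singular forms; it is not a morphological analyzer, and the names SUFFIXES and possessive_feats are hypothetical.

```python
# Suffix -> (Gender of the possessed noun, Gender[psor]) for the six suffixes
# quoted in the text. Three-letter suffixes are listed before "in" so that
# "-ina" and "-ino" are tried before the shorter "-in".
SUFFIXES = [
    ("ova", "Fem", "Masc"),
    ("ovo", "Neut", "Masc"),
    ("ův", "Masc", "Masc"),
    ("ina", "Fem", "Fem"),
    ("ino", "Neut", "Fem"),
    ("in", "Masc", "Fem"),
]


def possessive_feats(form):
    """Return the layered Gender features for a listed form, else None."""
    for suffix, gender, psor_gender in SUFFIXES:
        if form.endswith(suffix):
            return f"Gender={gender}|Gender[psor]={psor_gender}"
    return None


for form in ("otcův", "otcova", "matčin", "matčino"):
    print(form, possessive_feats(form))
```

Many ordinary Czech words also end in these letter sequences, so a suffix match alone says nothing about words that are not possessive adjectives.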

In other languages (Hebrew, Arabic), the possessor’s gender and number are agreement rather than lexical features:

Examples: [he] HKPH FL HARC (perimeter of country). Features of the two nouns are as follows:

perimeter: Gender=Masc|Gender[psor]=Fem|Number=Sing|Number[psor]=Sing
country: Definite=Def|Gender=Fem|Number=Sing

The [psor] features of perimeter are dictated by agreement with the possessor, country.

(This is a partial description of this example. HKPH has many morphological analyses: some are masculine single-layered, some are feminine single-layered. The right morphosyntactic analysis can only be found by detecting the two layers of agreement features and identifying this specific agreement pattern.)
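The agreement pattern in the Hebrew example can be checked mechanically: every [psor] feature on the head noun should match the same feature on the default layer of the possessor. The following is a minimal sketch under that assumption; layer_values and psor_agrees are hypothetical helper names, not part of any existing tool.

```python
def layer_values(feats, layer=None):
    """Collect {base_name: value} for one layer of a feature string.

    layer=None selects the default layer (names without brackets).
    """
    suffix = f"[{layer}]" if layer else ""
    values = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        if layer is None:
            if "[" not in name:
                values[name] = value
        elif name.endswith(suffix):
            values[name[: -len(suffix)]] = value
    return values


def psor_agrees(head_feats, possessor_feats):
    """True if each [psor] feature of the head equals the possessor's own value."""
    psor = layer_values(head_feats, "psor")
    own = layer_values(possessor_feats)
    return all(own.get(name) == value for name, value in psor.items())


perimeter = "Gender=Masc|Gender[psor]=Fem|Number=Sing|Number[psor]=Sing"
country = "Definite=Def|Gender=Fem|Number=Sing"
print(psor_agrees(perimeter, country))
```

A head with no [psor] features passes trivially, so a check like this is only informative for analyses that actually carry the possessor layer.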

Number[psor]

Possessives may have two different numbers: that of the possessed object (number agreement with modified noun) and that of the possessor. The Number[psor] feature captures the possessor’s number.

Sing: singular possessor

Examples: [en] my, his, her, its; [cs] můj pes (my dog; Number=Sing|Number[psor]=Sing); moji psi (my dogs; Number=Plur|Number[psor]=Sing).

Plur: plural possessor

Examples: [en] our, their; [cs] náš pes (our dog; Number=Sing|Number[psor]=Plur); naši psi (our dogs; Number=Plur|Number[psor]=Plur).

Person[psor]

The possessor’s person is marked e.g. on Hungarian nouns. These noun forms would be translated to English as possessive pronoun + noun.

Note that it is reasonable to make this a layered feature even though the default Person is normally not marked on nouns. In relation to verbs (which may have to mark person agreement with nouns), a noun is almost always in the third person. So even if this default person is not explicitly marked morphologically, and probably the default Person does not appear among features of the noun, we should not use the default layer of persons to mark the possessor. If we abused the default layer, the annotation would no longer be parallel to personal pronouns that could be substituted for the noun.

On the other hand, we probably do not want a separate [psor] layer for the person of possessive determiners / pronouns. They usually modify nouns, not verbs, so agreement with verbs does not play any role. Arguably they have only one Person feature and it is lexical (while for the Hungarian nouns, Person[psor] is inflectional). Moreover, in some languages possessive pronouns are actually identical to personal pronouns in the genitive case, and it is logical that they have the same Person as in the nominative.

1: first person possessor

Examples: [hu] kutya = dog; kutyám = my dog; kutyánk = our dog.

2: second person possessor

Examples: [hu] kutya = dog; kutyád = your.Sing dog; kutyátok = your.Plur dog.

3: third person possessor

Examples: [hu] kutya = dog; kutyája = his/her/its dog; kutyájuk = their dog.

János csontja
lit. John his-bone
John’s bone

János csontjai
lit. John his-bones
John’s bones

Péternek sok pénze van.
lit. to-Peter much his-money there-is
Peter has a lot of money.
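The Hungarian forms of kutya listed above can be collected into a small lookup that pairs each form with its possessor features. This is a sketch only: combining Person[psor] with Number[psor] for these forms is an inference from the English glosses (my vs. our, and so on), the noun's own features are left out, and the names KUTYA_FORMS and possessor_feats are hypothetical.

```python
# Possessed forms of "kutya" (dog) quoted in the text, paired with the
# person and number of the possessor as implied by the English glosses.
KUTYA_FORMS = {
    "kutyám": (1, "Sing"),    # my dog
    "kutyánk": (1, "Plur"),   # our dog
    "kutyád": (2, "Sing"),    # your.Sing dog
    "kutyátok": (2, "Plur"),  # your.Plur dog
    "kutyája": (3, "Sing"),   # his/her/its dog
    "kutyájuk": (3, "Plur"),  # their dog
}


def possessor_feats(form):
    """Return the [psor] features of a listed form, or None for other forms."""
    if form not in KUTYA_FORMS:
        return None
    person, number = KUTYA_FORMS[form]
    return f"Number[psor]={number}|Person[psor]={person}"


for form in ("kutya", "kutyám", "kutyájuk"):
    print(form, possessor_feats(form))
```

The bare form kutya returns None because it carries no possessor marking, which is exactly why the possessor's person is kept off the default Person layer.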

Number[psee]

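This feature seems to be very specific to Hungarian. It denotes the possessee’s (possessed, owned noun phrase’s) number. Hungarian has three types of number in the nominal inflection: the number of the noun itself (Number), the number of its possessor (Number[psor]) and the number of its possession (Number[psee]).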

Examples from the Multext-East Hungarian lexicon:

Words marked for plural possessions are very rare, though. Note that in the following example from Multext-East, Columbus is marked for plural possession, but not for his own owner.

See also Éva Dékány (2014), "The syntax of anaphoric possessives in Hungarian": in anaphoric possessives the possessed noun, the head of the whole nominal phrase, is not pronounced, and its reference has to be recovered from the context. The possessor in Hungarian anaphoric possessives has to bear the suffix -é.

Since Number[psee]=Plur is extremely rare, this feature is not so important for distinguishing singular and plural possessions. However, the mere presence of Number[psee]=Sing signals that the -é suffix is present and thus that there is an unpronounced possession.

Layered verb agreement in Basque

Verbs in many Indo-European languages must agree in person and number with their subject. This is what the Person and Number features of verbs typically denote.

Some verbs in Basque must agree in person and number with up to three arguments: the absolutive argument (subject of intransitive verbs and object of transitive verbs), the ergative argument (subject of transitive verbs) and the dative argument (indirect object).

We could make the absolutive agreement the default, thus using Person and Number without layer identifiers. If there is also one of the other two arguments, we will have Person[erg], Number[erg] and Person[dat], Number[dat], respectively.

Example: nahi dizkiegu, lemma = nahi_izan, feats = Number=Plur|Number[dat]=Plur|Number[erg]=Plur|Person=3|Person[dat]=3|Person[erg]=1 (we want them to them).
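The layering scheme proposed for Basque can be sketched as a function that builds the feature string from up to three agreeing arguments. The function name agreement_feats and the (person, number) tuple representation are assumptions for illustration; features are sorted alphabetically, ignoring case, which reproduces the order used in the example above.

```python
def agreement_feats(absolutive, ergative=None, dative=None):
    """Build a layered feature string for a verb.

    Each argument is a (person, number) tuple, or None if the verb does not
    agree with such an argument. The absolutive uses the default layer.
    """
    features = {}
    layers = (("", absolutive), ("[erg]", ergative), ("[dat]", dative))
    for layer, argument in layers:
        if argument is None:
            continue
        person, number = argument
        features[f"Person{layer}"] = str(person)
        features[f"Number{layer}"] = number
    names = sorted(features, key=str.lower)
    return "|".join(f"{name}={features[name]}" for name in names)


# nahi dizkiegu: absolutive 3rd plural, ergative 1st plural, dative 3rd plural
print(agreement_feats((3, "Plur"), ergative=(1, "Plur"), dative=(3, "Plur")))
```

An intransitive verb with only an absolutive argument yields plain Person and Number, so such verbs look the same as verbs in languages with single-argument agreement.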