Specific constructions
Please note: this language-specific overview of guidelines for specific constructions is a work in progress.
Subjects and objects
Finnish subjects and objects are straightforward to recognize in their prototypical cases, but both phenomena also have some difficult cases, which are discussed here.
The subject is the primary complement of the verb, usually denoting the entity doing something. In addition to the basic subject (see ISK §910), existential subjects (eksistentiaalisubjekti, e-subjekti) are also considered subjects in UD Finnish.
Possessive clauses (omistuslause) are considered a subtype of existential clauses, and analyzed similarly. The owner in possessive clauses is marked using the type nmod:own. The owner must be an animate being or a group of animate beings.
The genitive subject in, for instance, necessive structures is also annotated as nsubj. (This is not to be confused with the genitive subject of a noun, nmod:gsubj.)
In UD Finnish, subjects are allowed to be in the nominative, genitive and partitive cases, and in addition, also an accusative subject is possible (the accusative case only exists for certain pronouns). Two notable situations where a complement in the accusative form is analyzed as the subject are:
- Nonfinite clausal complements (Sain hänet itkemään. “I made him cry.”)
- Possessive clauses (Minulla on sinut. “I have you.”)
The same cases are allowed for objects as for subjects: the nominative, the partitive, the genitive and the accusative. Nominal and adjectival complements (other than predicatives), however, can be in other cases as well.
Object-cased amount adverbials (objektin sijainen määrän adverbiaali, OSMA; ISK §972), which, as the name implies, use the same cases as objects, are analyzed as nominal modifiers. However, certain verbs are considered able to take as their object an expression that would otherwise be considered an amount adverbial. Examples where an amount is considered the object include:
Examples
- [fi] Juoksin kilometrin. “I ran a kilometer.”
- [fi] Moottori pyöri kymmenen kierrosta. “The motor ran ten rounds.”
- [fi] Maitotölkki painaa kilon. “A milk jar weighs a kilogram.”
Passive verb forms take a direct object, not a passive subject as they do in, for instance, English.
However, there are certain verbs, so-called derived passives (ISK §336), which may resemble passive verb forms in meaning, but which in fact take a subject, not an object. (In English, the Finnish derived passives generally correspond to intransitive uses of a verb, such as the door opens, sometimes termed inchoative.)
References
- http://scripta.kotus.fi/visk/sisallys.php?p=910 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=972 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=336 (in Finnish)
Copulas
This section discusses first defining copular verbs and predicatives, then copulas in combination with auxiliaries, and finally the distinction between the subject and the predicative in copular clauses.
What can be a predicative?
In the UD scheme, the head of a copular clause is the predicative, not the verb (copula), unlike in other clauses. The Finnish language only has one copular verb, olla “to be” (ISK §891), and in order to avoid marking other verbs as copular and to prevent copular clauses from having multiple head words, rules are needed to define what is accepted as a predicative.
The basic alternatives for predicatives are nominals (nouns, adjectives, pronouns and numerals). Words of these parts of speech are required to be in nominative, partitive or genitive case to be accepted as predicatives.
Varpunen on pieni lintu .
Sparrow is small bird(nom.) .
nsubj:cop(lintu-4, Varpunen-1)
cop(lintu-4, on-2)
amod(lintu-4, pieni-3)
punct(lintu-4, .-5)
Nominals in any other case are not marked as predicatives, even if they are associated with the verb olla. They, similarly to adpositional phrases, are marked as nominal modifiers (nmod) in case of modifiers and one of the clausal complement types (xcomp, xcomp:ds) in case of complements including secondary predication, and the verb is marked as the head of the clause, even if it is olla “to be”.
This restriction is to prevent a clause from having two predicatives and hence two heads, which would be the case in a sentence such as the following:
Examples
- [fi] Paketti on Oulusta ystävältäni. “The package is from Oulu from my friend.”
Here both Oulusta “from Oulu” and ystävältäni “from my friend” could be interpreted as predicatives, resulting in a clause with two heads, or alternatively, a decision between two equally likely head-candidates. Therefore, only nominative, genitive and partitive are allowed as cases for predicatives.
Note that cases not allowed for predicatives include the essive case; this is to avoid marking verbs other than olla as copulas.
In addition to nominals, also adverbs can act as predicatives, given that they do not express location or time. Note that with adverbs, there is no restriction with regard to case, only that they are not locational or temporal. As a result, adverbs such as täällä “here” or huomenna “tomorrow” can not act as predicatives, but others, such as naimisissa “married” (inessive adverb) and raskaana “pregnant” (essive adverb) can, regardless of their case.
In UD Finnish, also a full clause can act as a predicative, in addition to nominals and adverbs. In these cases, the head of the clause acting as the predicative becomes also the head of the main clause. (If the clause acting as the predicative is also a copular clause, this results in the predicative clause seemingly having two copula subjects and copulas. However, this is not how the analysis should be interpreted.)
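The eligibility rules above for nominal and adverbial predicatives can be summarized as a small check. This is only a sketch: the function name and the `locational_or_temporal` flag are hypothetical, the part-of-speech and case labels assume standard UD tags and `Case` feature values, treating `PROPN` as a nominal is an assumption, and clausal predicatives are not modeled.

```python
# Cases accepted for nominal predicatives, per the guideline text.
PREDICATIVE_CASES = {"Nom", "Par", "Gen"}

# Nouns, adjectives, pronouns and numerals; PROPN is included as an assumption.
NOMINAL_UPOS = {"NOUN", "PROPN", "ADJ", "PRON", "NUM"}


def can_be_predicative(upos, case=None, locational_or_temporal=False):
    """Return True if a word used with olla may be analyzed as the predicative.

    `locational_or_temporal` is a hypothetical flag that an annotator
    (or a lexicon) would have to supply for adverbs.
    """
    if upos in NOMINAL_UPOS:
        # Nominals are restricted by case.
        return case in PREDICATIVE_CASES
    if upos == "ADV":
        # Adverbs are restricted by meaning only, not by case.
        return not locational_or_temporal
    return False


print(can_be_predicative("NOUN", "Nom"))  # lintu
print(can_be_predicative("NOUN", "Ela"))  # Oulusta
print(can_be_predicative("ADV", "Ine"))   # naimisissa
print(can_be_predicative("ADV", locational_or_temporal=True))  # täällä
```

A word that fails this check is attached to olla as a modifier or complement instead, and olla becomes the head of the clause.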
Copulas and auxiliaries
In the Finnish-specific version of the UD scheme, copular verbs and auxiliaries take no dependents of their own. In cases of two auxiliaries or an auxiliary of a copular verb, all auxiliaries as well as the copular verb are attached to the main predicate or the predicative. The same principle applies also to negation verbs.
The distinction between the predicative and the subject
Distinguishing the subject from the predicative in copular clauses can be difficult, as it would often be possible to invert the word-order and thus swap the positions of the two elements. For instance in the following sentences, either kirahvit “giraffes” or eläimiä “animals” could be the subject and the other the predicative.
Examples
- [fi] Kirahvit ovat mielenkiintoisimpia eläimiä. “Giraffes are the most interesting animals.”
- [fi] Mielenkiintoisimpia eläimiä ovat kirahvit. “The most interesting animals are the giraffes.”
In UD Finnish, the main rule in annotating copular structures is that the leftmost element is the subject and the rightmost one the predicative. Hence, in the first sentence above kirahvit is the subject and eläimiä the predicative, whereas in the second sentence eläimiä is the subject and kirahvit the predicative.
Semantic considerations such as which concept is a subconcept of the other are not taken into account in the annotation. However, it is possible to mark the leftmost element the predicative in cases where the word order is clearly inverted. This occurs for instance in (indirect) questions and sometimes relative clauses. Note that especially in questions, several different word orders are possible.
Also, if the leftmost element of the copular clause is an adjective rather than a noun or pronoun, it is considered that the word order is inverted, and thus the adjective is marked as the predicative, not the subject.
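The word-order rule and its two exceptions can be sketched as follows. The function and its `inverted` flag are hypothetical; deciding whether the order of a question or relative clause is "clearly inverted" remains a judgment call that the caller must make.

```python
def assign_copular_roles(first, second, inverted=False):
    """Return (subject, predicative) for the two elements of a copular clause.

    `first` and `second` are dicts with 'form' and 'upos' for the head words
    of the leftmost and rightmost element, in surface order.
    `inverted` is set by the annotator for clearly inverted word orders.
    """
    # An adjective in leftmost position is treated as inverted word order.
    if inverted or first["upos"] == "ADJ":
        return second["form"], first["form"]
    # Default: leftmost element is the subject, rightmost the predicative.
    return first["form"], second["form"]


giraffes = {"form": "kirahvit", "upos": "NOUN"}
animals = {"form": "eläimiä", "upos": "NOUN"}

print(assign_copular_roles(giraffes, animals))
print(assign_copular_roles(animals, giraffes))
```

Note that the function never looks at meaning, which mirrors the guideline that semantic considerations are not taken into account.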
References
- http://scripta.kotus.fi/visk/sisallys.php?p=891 (in Finnish)
External subjects
Open clausal complements share their subject with another verb (see also the documentation for xcomp). The fact that the subject of the main verb is also the subject of the complement cannot be annotated using basic dependencies, as this would violate the treeness restriction. Therefore, in UD Finnish these subjects are marked on the second layer of annotation (DEPS field) using the standard dependency types nsubj and nsubj:cop. Note that an open clausal complement may not always have a subject, for instance in passive constructions.
Note that while some related schemes such as SD and TDT differentiate second-layer (or “additional”) external subject dependencies by applying a specific type such as xsubj, nsubj is used on both the basic and second layer in UD Finnish.
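As an illustration of how the second layer relates to the basic tree, the sketch below adds an external nsubj for every xcomp whose governor has a subject. The token dictionaries and both function names are hypothetical, and the sketch handles only plain nsubj (not nsubj:cop or xcomp:ds); the `head:deprel` pairs joined by `|` follow the CoNLL-U convention for the DEPS field.

```python
def add_external_subjects(tokens):
    """Return {token id: [(head id, relation), ...]} for the second layer.

    Every token keeps its basic dependency; the subject of a verb that
    governs an xcomp additionally becomes the nsubj of that xcomp.
    """
    deps = {t["id"]: [(t["head"], t["deprel"])] for t in tokens}
    for comp in tokens:
        if comp["deprel"] != "xcomp":
            continue
        for t in tokens:
            # Look for the subject of the xcomp's governor.
            if t["head"] == comp["head"] and t["deprel"] == "nsubj":
                deps[t["id"]].append((comp["id"], "nsubj"))
    return deps


def format_deps(pairs):
    """Render head/relation pairs as a DEPS-style string."""
    return "|".join(f"{head}:{rel}" for head, rel in sorted(pairs))


# Hän alkoi hakata halkoja. "He started chopping the wood."
sentence = [
    {"id": 1, "form": "Hän", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "alkoi", "head": 0, "deprel": "root"},
    {"id": 3, "form": "hakata", "head": 2, "deprel": "xcomp"},
    {"id": 4, "form": "halkoja", "head": 3, "deprel": "dobj"},
]

deps = add_external_subjects(sentence)
print(format_deps(deps[1]))  # 2:nsubj|3:nsubj
```

If the governor has no subject, as in a passive construction, the inner loop simply finds nothing and no external subject is added.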
Appositions and appellation modifiers
The Finnish Grammar (see ISK §1059, §1062) distinguishes between three similar phenomena: the apposition, the appellation modifier (nimikemäärite) and the supporting noun (tukisubstantiivi). Out of these, the apposition and the appellation modifier (compound:nn) are distinguished in TDT, and supporting noun structures are considered appositions.
All of these structures have in common that they all include two (usually adjacent) elements, most often noun phrases, which refer to the same entity or entities and have the same function in the sentence. Thus, in order to be considered an apposition, an appellation modifier or a supporting noun structure, a structure has to fulfill the following criteria (the same as in the Finnish grammar §1059):
1. Both elements of the structure must refer to the same entity or group of entities.
2. Both elements of the structure must have the same function in the sentence (for instance, the subject).
These criteria are interpreted rather loosely, and there are no restrictions on the part of speech of the elements involved. Most appositions (and appellation modifiers) in TDT consist of noun phrases, but there are occurrences of different parts of speech as appositions; notably the fiction section of the treebank contains a few examples of verbal appositions.
Among the expressions that fulfill criteria 1 and 2, six common cases can be distinguished according to inflection and punctuation.
1. singular, both elements in nominative, no punctuation: professori Matti Tamminen “professor Matti Tamminen”
2. singular, first element in nominative, second element inflected: professori Matti Tammisen mukaan “according to professor Matti Tamminen”
3. singular, both elements in nominative, punctuation in between: professori, Matti Tamminen “the professor, Matti Tamminen”
4. singular, first element inflected, second element in nominative: romaanissa Putkinotko “in the novel Putkinotko”
5. singular, both elements inflected: professorin, Matti Tammisen, mukaan “according to the professor, Matti Tamminen”
6. plural, elements either in nominative or inflected: professorit Matti Tamminen ja Erkki Koivunen “the professors Matti Tamminen and Erkki Koivunen” or professoreiden, Matti Tammisen ja Erkki Koivusen, mukaan “according to the professors, Matti Tamminen and Erkki Koivunen” or professoreiden Matti Tamminen ja Erkki Koivunen mukaan “according to the professors Matti Tamminen and Erkki Koivunen”
Out of these six cases, the first two are considered appellation modifiers, and thus marked with the dependency type compound:nn. Note that the governor of the dependency in appellation modifiers is the latter of the two words.
The remaining four cases are all considered appositions and marked with the type appos. Contrary to appellation modifiers, in apposition structures the first word is considered the governor.
It should be noted that case number 4 is in fact an example of a supporting noun structure, but in TDT, these are marked as appositions. In plural (case number 6), all possible case combinations are considered appositions.
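The six cases and their analyses can be condensed into a small decision function. This is a sketch with hypothetical names; "inflected" here means any case other than the nominative, and the label compound:nn is the one given for appellation modifiers at the start of this section.

```python
def classify_pair(plural, first_inflected, second_inflected, punctuation):
    """Classify a two-element structure that already meets both criteria.

    Returns (relation, governor), where governor is 'first' or 'second'.
    """
    if plural:
        # Plural: every case combination is an apposition.
        return "appos", "first"
    if not first_inflected and not punctuation:
        # Singular, first element nominative, no punctuation:
        # appellation modifier, governed by the latter word.
        # The case of the second element does not matter here.
        return "compound:nn", "second"
    # Remaining singular cases: punctuation present or first element inflected.
    return "appos", "first"


# professori Matti Tammisen mukaan
print(classify_pair(False, False, True, False))
# romaanissa Putkinotko
print(classify_pair(False, True, False, False))
```

The function does not reject the ungrammatical punctuation variants; those are expected to be resolved case by case rather than by rule.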
The only difference between the cases 1 and 3 is the presence or absence of punctuation. Often, said punctuation is a comma, but also parentheses, a dash or a colon are possible. As can be seen from the examples above, the punctuation produces a semantic difference, which is taken into account in the annotation. Punctuation variations of the cases 2, 4, and 5 need not be considered, as these variations are ungrammatical. (Naturally, ungrammatical phenomena can and do occur in a corpus of actual language, but these cases are resolved on a case-by-case basis.)
Examples
- [fi] *professori, Matti Tammisen mukaan “according to professor, Matti Tamminen”
- [fi] *romaanissa, Putkinotko “in the novel, Putkinotko”
- [fi] *professorin Matti Tammisen mukaan “according to the professor’s Matti Tamminen” (unless a possessive reading is intended)
References
- http://scripta.kotus.fi/visk/sisallys.php?p=567 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1059 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1062 (in Finnish)
Verbal dependents: Clauses, non-clauses, complements and modifiers
One particularly challenging task in annotating in the UD Finnish scheme is selecting the correct dependency type for dependents that are verbal. Verbal dependents include different kinds of subordinate clauses as well as infinitive and participial complements and modifiers. A simplified description of the decision procedure for verbal dependents is given in Table 1, and the full details are given below.
TABLE 1 OMITTED
Some basic cases are relatively easy to decide. If the dependent is a regular subordinate clause, the choices are clear. For relative clauses the type to be used is acl:relcl, and as indirect questions are clausal complements, the correct type for them is ccomp.
If the subordinate clause is a conjunction clause, it can be either a complement or a modifier. Complement clauses are marked with ccomp and modifier ones with advcl. In the majority of cases, conjunction clauses starting with the conjunction että are complements and clauses starting with any other conjunction are modifiers. However, it should be noted that the conjunction että can be used instead of the conjunction jotta, and conversely, jotta can also (especially in spoken language) be used instead of että.
Examples
- [fi] Minun täytyy nyt mennä, että en myöhästy. / ~jotta en myöhästy. “I have to go now so that I won’t be late.”
- [fi] Hän sanoi, jotta tulee vasta illalla. / ~että tulee vasta illalla. “He said that he will only come in the evening.”
In these cases, a clause starting with että is a modifier, and a clause starting with jotta is a complement.
If the dependent is not a subordinate clause, the next deciding factor is the part of speech of the governor. If the governor is a noun, the dependent can be an infinitive modifier or a participle modifier, both marked with acl.
If, in turn, the governor is a verb, then the dependent can be either a complement or a modifier. With complements, there are three alternative dependency types available: xcomp, ccomp, and xcomp:ds.
If the subject of the dependent is shared with the governor (subject control), the correct type to use is xcomp. If any other sentence element is inherited from the higher clause (for example a dobj), the correct type is xcomp:ds, and otherwise ccomp.
Examples
- [fi] Hän alkoi hakata halkoja. “He started chopping the wood.”
- [fi] Sain hänet itkemään. “I made him cry.”
The dependent can also be a participial complement that resembles adjectival complements. The above-mentioned three clausal complement types should be used in these cases as well.
Examples
- [fi] Poika vei kotitehtävän opettajan tarkastettavaksi. “The boy took the homework to be inspected by the teacher.”
If the dependent is not a complement but a modifier, then the correct dependency type is advcl. These cases are usually recognized as lauseenvastike (“substitute of a clause”) or non-complement participles.
Examples
- [fi] Pyyhittyään pölyt hän imuroi. “After dusting, he hoovered.”
- [fi] Huolestuneena seurasin tilanteen kehittymistä. “Worried, I followed the development of the situation.”
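The decision procedure described in this section can be sketched as a single function. The parameter names and their string values are hypothetical; in particular, whether a dependent is a complement or a modifier, and which element (if any) it shares with the governor, are judgments the annotator supplies.

```python
def verbal_dependent_type(kind, governor_upos=None, is_complement=False,
                          shared="none"):
    """Pick the dependency type for a verbal dependent.

    kind: 'relative', 'indirect_question', 'conjunction_clause' or 'nonfinite'
    governor_upos: 'NOUN' or 'VERB' (only consulted for nonfinite dependents)
    shared: 'subject', 'other' or 'none' (element inherited from the governor)
    """
    # Regular subordinate clauses first.
    if kind == "relative":
        return "acl:relcl"
    if kind == "indirect_question":
        return "ccomp"
    if kind == "conjunction_clause":
        return "ccomp" if is_complement else "advcl"

    # Not a subordinate clause: the governor's part of speech decides next.
    if governor_upos == "NOUN":
        return "acl"
    if governor_upos == "VERB":
        if not is_complement:
            return "advcl"
        if shared == "subject":
            return "xcomp"
        if shared == "other":
            return "xcomp:ds"
        return "ccomp"
    raise ValueError("case not covered by this simplified procedure")


# Hän alkoi hakata halkoja: subject control.
print(verbal_dependent_type("nonfinite", "VERB", True, "subject"))
# Sain hänet itkemään: the object of the governor is shared.
print(verbal_dependent_type("nonfinite", "VERB", True, "other"))
# Pyyhittyään pölyt hän imuroi: a modifier, not a complement.
print(verbal_dependent_type("nonfinite", "VERB", False))
```

The conjunction itself is deliberately not an input: as the että and jotta examples show, it is only a default hint for the complement or modifier decision.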
References
- http://scripta.kotus.fi/visk/sisallys.php?p=938 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1452 (in Finnish)
Attachment issues: word-order-dependent structures and ambiguity
Occasionally determining the correct head word for a dependency may be difficult. Some structures are inherently ambiguous, and with some structures, often ones involving nominal modifiers, the dependent is most naturally seen to modify different sentence elements depending on the word-order. The following classic example is ambiguous:
Examples
- [fi] Ammuin elefantin pyjamassani. “I shot an elephant in my pajamas.”
In this example, it is possible that the shooting happened while wearing the pajamas, in which case pyjamassani “in my pajamas” is attached to the verb ammuin “I shot”.
On the other hand, it is also possible that the elephant wore the pajamas, in which case pyjamassani is attached to the noun elefantin “elephant”.
Ambiguities such as this one are resolved as far as possible, and also context is used to determine the correct reading where applicable. That is, if in the context there exists another sentence which makes it clear whether the shooter or the elephant wore the pajamas, then that sentence is used to disambiguate the structure.
If, however, the ambiguity cannot be resolved even given context, or if an element seems to modify two or more elements simultaneously, then the attachment higher in the tree is chosen. In the case of the previous example, this would be the reading in which the shooting happens wearing the pajamas.
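The fallback of choosing the attachment higher in the tree can be expressed as picking the candidate head with the smallest depth. This is a minimal sketch with hypothetical function names; `heads` maps each token index to the index of its head, with 0 standing for the root.

```python
def depth(token_id, heads):
    """Number of steps from a token up to the root of the tree."""
    steps = 0
    while heads[token_id] != 0:
        token_id = heads[token_id]
        steps += 1
    return steps


def choose_attachment(candidates, heads):
    """Among candidate heads, return the one highest in the tree."""
    return min(candidates, key=lambda c: depth(c, heads))


# Ammuin(1) elefantin(2) pyjamassani(3): the verb is the root and
# the object depends on it; token 3 is the element to be attached.
heads = {1: 0, 2: 1}
print(choose_attachment([2, 1], heads))  # 1, i.e. the verb
```

This applies only after context has failed to disambiguate; it is not a replacement for reading the surrounding sentences.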
In some structures, the most natural analysis may be word order dependent. Consider the following two examples.
Examples
- [fi] Mies ruskeassa takissa tuli junaan. “A man in a brown coat came into the train.”
- [fi] Mies tuli junaan ruskeassa takissa. “A man came into the train in a brown coat.”
In the former example, there is clearly a man in a brown coat, whereas in the latter case, the coming into the train happened while wearing a brown coat. Therefore, the correct analyses for these examples differ in their attachment of the phrase in a brown coat. These attachment rules are akin to those used in the Prague Dependency Treebank.
Relative clauses
Relative clauses most often modify noun phrases, but it is also possible for them to modify a whole clause. From a prescriptive perspective, the relativizer that should be used in relative clauses that modify noun phrases is joka, and the relative clause should always modify the word directly before it. The relativizer that should be used in relative clauses modifying full clauses is mikä. However, in real, especially spoken, language, the use of the two relativizers is mixed, and not every joka clause actually refers to the word adjacent to it. In UD Finnish, the actual reference for the relative clause is chosen as the head of the acl:relcl dependency wherever possible. For this reason, the head of the acl:relcl relation can occasionally be a verb.
The relativizer is annotated with the standard syntactic role that it plays in the relative clause, such as nsubj or dobj. (Note that this treatment differs from the annotation of relative clauses in previously proposed related schemes, which used specific dependency types (e.g. rel) to mark the relativizer. In particular, in the TDT corpus the basic dependency layer used rel and the second annotation layer identified the actual syntactic role.)
Note also that the dependent of this dependency is always the head of the relative phrase, which may or may not be the relative word itself.
Units, measures and amounts
There are several ways to express amounts. The simplest case is expressing amount with numbers: three apples, sixteen litres.
The semantic head, litraa “litres” in the above example, is selected as the head, and the number is marked as a numeral modifier, nummod. (Morpho-syntactically, the number kolme “three” could also be considered the head, as it determines the case used for the word litra “litre”.) For more information on the internal structure of numerical expressions, see Section 5.12.
Amount can also be expressed with adverbs. This, too, is handled by selecting the semantic head as the head of the structure, that is, the noun.
In addition, amount can be expressed using a nominal, often in expressions such as kuppi kahvia “a cup of coffee” or joku pojista lit. someone from the boys “one of the boys”. In these cases, the first nominal is marked as the head.
These structures are considered different from the amount expressions with numerals or adverbs, as their inflection behaves differently. Consider the following examples.
Examples
- [fi] Kieltäydyin kolmesta donitsista. “I refused three doughnuts.”
- [fi] Kieltäydyin kupista kahvia. “I refused a cup of coffee.”
In the first example, both parts of the amount expression inflect as required by the verb kieltäytyä “to refuse”, whereas in the latter case, only the first nominal inflects, signaling that the head, the thing refused in this expression, is the cup. The structure joku pojista behaves and is annotated similarly.
Two things should be noted about the above analysis of joku pojista lit. someone from the boys “one of the boys”. First, this analysis leads to yksi pojista “one of the boys” being analyzed similarly to joku pojista rather than yksi poika “one boy”.
Second, this analysis allows a structure like joku pojista to act as a predicative, as the head of the expression is in nominative.
Noun phrases without nouns
In UD Finnish, it is possible for a phrase with a head word other than a noun (or pronoun) to act as a noun phrase. Typical cases of this include adjective-headed and participle-headed noun phrases.
Examples
- [fi] Ikkunan takana oli jotain sinistä. “There was something blue behind the window”.
- [fi] Kukista kaunein oli punainen ruusu. “The most beautiful of the flowers was a red rose.”
- [fi] Kirjaa kirjoittavat sanoivat samaa. “The (ones) writing a book said the same.”
- [fi] Onnettomuudessa olleille suositeltiin terapiaa. “Therapy was recommended for the (ones) been in the accident.”
These structures are analyzed as standard noun phrases. For instance, they can be marked as the subject of a clause, or a nominal modifier, regardless of the part of speech of the head word.
Comparatives and superlatives
This section describes the annotation of comparative and superlative structures, which, in UD Finnish, are considered to include also certain similar structures that do not contain a comparative or superlative wordform.
Comparatives
Structures with comparative adjectives and adverbs may be difficult to annotate: they are often elliptical, and it may be difficult to tell what is being compared with what. To annotate comparative constructions, dependency types advcl and mark are used.
The basic usage of these two types is as follows. The comparative adjective or adverb acts as the head for an advcl dependency, and the element being compared is its dependent. The element being compared also acts as the head for a mark dependency, the dependent of which is a comparative conjunction, nearly always kuin.
Note that the comparative adjective or adverb remains the head of the advcl dependency even if the word order is such that the dependency becomes non-projective.
From the previous example it can also be seen that comparative structures are often elliptical in some way. Strictly speaking, the example does not compare Matti and Pekka, but rather their cars, and the car owned by Pekka is not explicitly present in the sentence. As a general rule of thumb, the different kinds of ellipsis present in comparative structures are not marked with null tokens, but rather the available elements are used wherever possible.
It is also possible to make comparisons without the comparative conjunction kuin. In these cases, only the dependency type advcl is used, marking the comparative adjective or adverb as the head, and the element compared as the dependent, just as in the case with the comparative conjunction present.
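The two configurations, with and without kuin, differ only in whether the mark dependency is present. The sketch below makes that explicit; the function name is hypothetical, and the sample words (nopeampi “faster”, Pekan “Pekka’s”) are an invented illustration rather than an example taken from the treebank.

```python
def comparative_edges(comparative, compared, conjunction=None):
    """Return (relation, head, dependent) triples for a comparative structure.

    comparative: the comparative adjective or adverb (always the head)
    compared:    the element being compared
    conjunction: the comparative conjunction, or None if there is none
    """
    edges = [("advcl", comparative, compared)]
    if conjunction is not None:
        # The conjunction attaches to the compared element, not to the head.
        edges.append(("mark", compared, conjunction))
    return edges


for rel, head, dep in comparative_edges("nopeampi", "Pekan", "kuin"):
    print(f"{rel}({head}, {dep})")
```

Word order plays no role in the function, in line with the note that the comparative word stays the head even when the dependency becomes non-projective.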
Also some structures not involving a comparative adjective or adverb can be marked as comparatives. In order to qualify as a comparative construction, a structure has to contain either a comparative word form or a word form that otherwise semantically entails comparison, such as samanlainen “similar”, sama “same”, erilainen “different” or eri “differing, separate”. (Note that for example the word sama “same” is in fact a pronoun in Finnish.)
An additional difficulty is posed by the fact that in Finnish, the comparative conjunction kuin can also appear as a subordinating conjunction as well as an adverb. Borderline situations are resolved on a case-by-case basis, considering whether or not there is a comparison involved in the structure and, secondarily, whether the dependent structure is a clause. (Comparative structures can also occasionally be full clauses.)
Superlatives
Superlatives are less problematic than comparatives but deserve some attention nevertheless. The basic case with superlatives is simple: a lone superlative modifying a noun. The superlative form in this case is not marked in any particular way in the syntax annotation, but the structure is annotated similarly to any adjective modifying a noun. The same strategy of not marking the superlative in any particular way is also used in cases where the superlative acts as a predicative.
Often a superlative is modified by a nominal in some manner. A very common phenomenon is a genitive modifier modifying a superlative. For instance, in an expression such as Suomen paras kokki “the best cook in Finland”, the cook is the best of those in/of Finland and thus the correct head word for the genitive modifier is paras “best”. Similarly, an ordinal number can act as the head of a genitive modifier. For example, in Virtasen kuudes mestaruus “Virtanen’s sixth championship”, the championship is the sixth out of those of Virtanen, and thus the genitive modifier should modify the ordinal number.
However, it is still possible for the noun to act as the head word in some cases. For instance, in rusakon pahin vihollinen “the hare’s worst enemy”, the enemy is not the worst of the hare, but rather it is an enemy of the hare, and it is the worst enemy. Thus, the head word of the genitive modifier should be the noun vihollinen “enemy”.
As a rule of thumb, if the noun phrase containing the genitive modifier can be turned into a copular clause in the following fashion, then the genitive modifier should modify the superlative or ordinal number.
Examples
- [fi] Kokki on Suomen paras “The cook is the best in Finland”
- [fi] Mestaruus on Virtasen kuudes “The championship is the sixth for Virtanen”
are perfectly valid, but
Examples
- [fi] ?Vihollinen on rusakon pahin ?”The enemy is the worst of the hare”
is questionable at best. Thus, in Suomen paras kokki and Virtasen kuudes mestaruus, the genitive modifier is considered to modify the superlative adjective, but in rusakon pahin vihollinen, it is considered to modify the noun directly.
In this context, it should also be noted that in addition to superlatives, certain other adjectives can also act as the head of a genitive modifier. These adjectives can be semantically superlative-like, such as viimeinen “last”, but there are also many others, such as oma “own”, kaltainen “-like”, välinen “between (adj.)”, and vastainen “against (adj.)”.
Also other nominal modifiers are possible, expressing the set of beings from which the objects are drawn when making the comparison. These are treated similarly to the genitive modifiers, making the superlative wordform the head of the modifier if the modifier expresses the set of beings to draw from.
Note how in the previous example the phrase kukista kaunein can act as a noun phrase (it is the subject of the clause), even though its head word is an adjective.
Subordinate clauses and expressions of time
Many subordinate clauses, especially ones starting with the conjunction kun “when”, come with an adverbial, usually expressing time. Consider the following examples.
Examples
- [fi] Tulen sinne heti, kun olen imuroinut. “I’ll come there right away, when I have hoovered.”
- [fi] Tapasin hänet sen jälkeen kun olin tullut kaupasta. “I met him after I had come from the store.”
It is often unclear where these time adverbials should be attached. On the one hand, they seem to modify the main clause, expressing when the action of the main clause takes place. On the other hand, they could also modify the subordinate clause, being a part of the time condition given in the subordinate clause. A third option would be to make the time adverbial depend on the subordinating conjunction, becoming either multi-part conjunctions or conjunctions with adverbial modifiers.
In UD Finnish, a very limited number of these cases are considered especially tightly bound with the subordinating conjunction. These cases are considered multi-part subordinating conjunctions and listed as such in the documentation for mark. Otherwise, these adverbials are consistently made dependents of the subordinate conjunctions.
However, it should be noted that not all subordinate clauses are themselves dependents of the main verb. As discussed in the documentation for ccomp, clausal complements can depend on nouns, pronouns or adverbs. Similar situations can occur with subordinate clauses that are modifiers, and they are also analyzed similarly. Most commonly this occurs with the pronoun se “it”.
Subjects and objects of a noun
In Finnish, it is possible for certain nouns which either are direct derivations of a verb or otherwise have a verb counterpart (verbivastineellinen substantiivi; see ISK §560, in Finnish) to take a subject- or object-like complement. Both of these are identical in form to more general genitive modifiers of a noun, marked with the dependency type nmod:poss in the UD Finnish scheme.
Genitive objects of a noun are marked with the type nmod:gobj, which is a subtype of the more general genitive-modifier type nmod:poss. Both nominal derivations and other nouns with verb counterparts can take a genitive object, with the exception of JA-derivations, the genitive modifier of which is never considered an object in UD Finnish (talon rakentaja “the builder of the house”).
Genitive subjects, in turn, are marked using the nmod:gsubj dependency type, also a subtype of nmod:poss. Only nouns that are marked as derivations of a verb in the morphological tagging receive an nmod:gsubj dependent.
References
- http://scripta.kotus.fi/visk/sisallys.php?p=560 (in Finnish)
Numerical expressions
The dependency type compound is used for numerical expressions. Generally, with multi-token numerical expressions, the rightmost token of the expression is considered the head and the dependencies are chained.
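One way to picture the default analysis is sketched below. It rests on an assumption about what "chained" means here, namely that each token depends on the token immediately to its right, so that the rightmost token ends up heading the whole expression; the function name is hypothetical.

```python
def chain_numeric(token_ids):
    """Return (dependent, head) pairs for a flat numerical expression.

    Assumed reading of "chained": each token attaches to its right-hand
    neighbour with the type compound; the rightmost token is the head of
    the expression and gets no head from within it.
    """
    return [(token_ids[i], token_ids[i + 1])
            for i in range(len(token_ids) - 1)]


# A three-token numerical expression occupying token positions 4, 5 and 6.
for dependent, head in chain_numeric([4, 5, 6]):
    print(f"compound({head}, {dependent})")
```

Expressions with a clear internal structure are annotated according to that structure instead of this default.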
However, it is possible that rather complex expressions are considered numerical, and in these cases the structure of the expression is also marked, showing the parts of which the expression consists. Often these complex expressions involve dates, which are also considered numerical expressions in UD Finnish.
Dates can be expressed using many different forms, and all full dates are considered numerical expressions in UD Finnish, also those where some or all parts of the date are written with characters. Even partial dates such as
are considered numerical expressions. However, year expressions such as the following are not considered dates in UD Finnish, and thus not complex numerical expressions.
If a date expression has a clear internal syntactic structure, this structure is annotated instead of the default chain of compound dependencies.
If a date has a more specific time (such as kello kuudelta “at six o’clock”) attached to it, the date is considered the head of the expression, and the more specific time depends on it. Clock expressions, alone or in conjunction with a date, are not considered dates or numerical expressions in UD Finnish.
In addition to dates, there is one more case of numerical expressions that deserves attention: numerical expressions with multiple units. If a single amount expression involves multiple units, the units are considered a compound unit, so to speak, and combined using the dependency type compound:nn.
In rare cases, however, the previous situation may occur with the rightmost part of the expression lacking the unit. These cases are annotated flatly as numerical expressions, with no compound units.
Participial modifiers and predicatives
In connection with participial modifiers, predicatives are given a slightly different treatment than in other contexts. In a regular copular clause, the analysis is as follows.
However, if the same analysis were applied in a situation where olla acts as a participial modifier, this would result in a non-tree structure:
Therefore, in conjunction with participial modifiers, copular verbs are analyzed similarly to regular verbs, in order to avoid non-tree structures.
The same rule is applied to certain special constructions that are normally considered passive structures but can also appear in conjunction with participial modifiers. Here the application of the rule results in two chained participial modifiers.
Necessive structures and clausal subjects
A clause can act as the subject of another clause (as well as an object,
but these are marked as clausal complements, ccomp), in which case it
should be marked as a clausal subject, csubj, or, if the main clause is
copular, a clausal copular subject, csubj:cop. However, in the case of a
clausal copular subject, it may be difficult to determine whether a
clause is, in fact, the subject of another clause, as the construction
is similar to a necessive structure. Consider the following example.
Examples
- [fi] On tärkeää syödä hyvin. “It is important to eat well.”
At first glance, it seems that the clause syödä hyvin is the subject of on tärkeää. However, in UD Finnish, this is not considered a clausal subject. Instead, it is considered a necessive structure, as on tärkeää can be given a subject in the genitive form:
Examples
- [fi] Hänen on tärkeää syödä hyvin. “It is important for him to eat well.”
The whole structure is considered a single unit, and the genitive subject is considered the subject of the latter verb (which expresses what it is that is necessary).
The name necessive structure comes from the fact that these structures often express the necessity of doing something, but it does not mean that all of these structures would have such a meaning; for example, on vaikea(a) “it is difficult” is a necessive structure the meaning of which does not express necessity. Common necessive structures include expressions such as on pakko, on tärkeää, on oleellista and on välttämätöntä. They usually, but not always, involve the verb olla and an adjective. There are also some verbs, such as kannattaa “be worth it” and kuulua “be supposed to”, that are analyzed in a necessive manner.
FIGURE MISSING
If it is not possible to insert a genitive subject into the clause, then the structure is considered a clausal subject case.
Examples
- [fi] *Hänen on mahtavaa käydä ulkona. “It is splendid for him to go out.” (the Finnish sentence is ungrammatical.)
Note that due to the copular nature of the main clause, the clausal subjects in these clauses which resemble necessive structures are in fact clausal copular subjects. There are also other clausal subjects which cannot be confused with necessive structures.
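The genitive-insertion test described in this section can be summarized as a small decision function. This is only a sketch: whether a genitive subject can be inserted is a grammaticality judgement made by the annotator, so it is passed in as a boolean, and the function and parameter names are hypothetical.

```python
def classify_clausal_argument(allows_genitive_subject, main_clause_is_copular):
    """Decide between a necessive structure and a clausal subject.

    Both arguments are annotator judgements, not computed values.
    """
    if allows_genitive_subject:
        # e.g. "Hänen on tärkeää syödä hyvin."
        return "necessive structure"
    # No genitive subject possible: the clause is a clausal subject.
    return "csubj:cop" if main_clause_is_copular else "csubj"


# "On tärkeää syödä hyvin." accepts a genitive subject.
print(classify_clausal_argument(True, True))
# "On mahtavaa käydä ulkona." does not, and the main clause is copular.
print(classify_clausal_argument(False, True))
```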
Passive structures and zeroth person constructions
The Finnish language has two notable cases of subjectless expressions: the passive voice and the zeroth person. In most cases, distinguishing these two is rather simple, as the zeroth person uses the same verb forms as the third person, whereas there is a morphological passive form that is used in constructions considered passive. However, there are at least two particular phenomena that deserve special attention. First, the on tehtävä -structure is worth examining:
Examples
- [fi] Tämä työ on tehtävä tänään. “This work has to be done today.”
The form tehtävä is morphologically a passive participle of the verb tehdä “to do”. Still, on tehtävä can take a subject, which could perhaps suggest that the subjectless version is zeroth person after all.
Examples
- [fi] Matin on tehtävä työ tänään. “Matti has to do the work today.”
In UD Finnish, we use the presence or absence of a subject as a cue to whether the structure is passive or not. If a subject is present, the structure is marked as an active construction, and if not, it is assumed to be passive.
Second, the on tehtävissä structure deserves a mention. Similarly to tehtävä, tehtävissä is a passive verb participle - in fact, the difference between the two forms is only that tehtävissä is the plural inessive form of the base participle tehtävä. The annotation of on tehtävissä follows a strategy similar to the previous one. In general, it is assumed that the structure is passive.
FIGURE MISSING
Unlike on tehtävä, on tehtävissä cannot take a genitive form subject:
Examples
- [fi] *Minun on tehtävissä tämä. “*I this is doable.”
However, in some cases it is possible to attach a possessive suffix to the participle and use a corresponding personal pronoun as a nominal modifier (this is a rare phenomenon and not seen with many verbs). This case is analyzed as an active structure.
FIGURE MISSING
However, as can be seen from the example, no subject is marked, but rather an object. It is still understood that means are the object of using in this example.
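The cues used for these two constructions can be collected into one hedged sketch. The function name, the parameter names and the string labels are hypothetical. The inputs are annotator observations about the clause rather than anything computed from the text.

```python
def construction_voice(construction, has_subject=False, has_possessive_suffix=False):
    """Return "active" or "passive" for the two constructions discussed above."""
    if construction == "on tehtävä":
        # The presence of a subject is the cue for an active reading.
        return "active" if has_subject else "passive"
    if construction == "on tehtävissä":
        # Passive by default; the rare possessive-suffix variant is active.
        return "active" if has_possessive_suffix else "passive"
    raise ValueError(f"unknown construction: {construction}")


# "Tämä työ on tehtävä tänään." (no subject present)
print(construction_voice("on tehtävä"))
# "Matin on tehtävä työ tänään." (genitive subject present)
print(construction_voice("on tehtävä", has_subject=True))
```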
Morphological distinctions
Distinctions between certain dependency types, most commonly between
participial modifiers (acl) and adjectival modifiers (amod) as well as
adverbial modifiers (advmod) and nominal modifiers (nmod), are based on
the corresponding morphological distinction, which can sometimes be
rather difficult. This section describes heuristics used to make these
two most common morphology-based distinctions. Some of these heuristics
resemble those used in the Penn Treebank.
Participles versus adjectives
The distinction between verb participles and adjectives is difficult
in several languages, and Finnish is no exception. In UD Finnish, this
distinction affects the syntax annotation of mainly two kinds of
structures. First, it affects the choice between the dependency types
acl (participial modifier) and amod (adjectival modifier).
Second, it affects whether certain structures should be marked as copular clauses, or alternatively, as passive clauses in the present or past perfect form (perfekti and pluskvamperfekti in Finnish grammar). The same structure can be considered copular if the head word is an adjective, or a passive clause if the head word is considered a passive participle.
Some words have several possible readings, and it is fairly common that a word can be given either a participial reading or an adjectival one. The following heuristics are used when deciding whether a word is an adjective or a participle.
If a word can receive comparative and superlative forms, it is likely to be an adjective. For instance, the word tunnettu “well-known”, which has both an adjectival and a participial reading, inflects in these forms: tunnettu, tunnetumpi, tunnetuin.
If, on the other hand, the word is modified by for instance a nominal or adverbial modifier, it is likely to be a verb participle. For instance, with the word tunnettu, the following contexts would be possible:
Examples
- [fi] laajalti tunnettu näyttelijä “widely known actor”
- [fi] kalliista autoistaan tunnettu näyttelijä “actor known for his expensive cars”
Thus, the same word can act both as an adjective and as a verbal participle, depending on context, and the decisions are made on a case-by-case basis. As a third heuristic, the annotators are asked to consider whether someone is actively doing something in the example under consideration. If so, the word is likely a verbal participle; otherwise it is an adjective. Consider the following examples:
Examples
- [fi] Maijan tuleva aviomies lit. Maija’s coming husband “Maija’s future husband”
- [fi] Maijan Turusta tuleva aviomies “Maija’s husband coming from Turku”
In the first example, the husband is not actively doing anything, he simply is going to be Maija’s husband in the future. Thus tuleva in this example would be considered an adjective. In the second example, he is actively coming from the direction of Turku, and thus tuleva here would be a verbal participle.
As a rule of thumb, if an adjectival reading is possible in a given context, it is generally preferred. For instance, in tunnettu näyttelijä “well-known actor”, if it were not specified by whom or for what the actor is known, it would be assumed that the adjectival reading is intended. Similarly, in uiminen on kielletty “swimming is forbidden”, if the context does not reveal that there has been active forbidding of the swimming (the example is genuinely ambiguous), then it is assumed that it is a property of the swimming that it is forbidden.
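The heuristics of this section can be combined into a minimal sketch for the modifier case. The way the cues are combined here is an assumption: the guidelines describe case-by-case decisions, not a fixed procedure, and the comparison-form test is treated only as evidence that an adjectival reading exists at all. The function name, the parameter names and the DEPREL mapping are hypothetical helpers.

```python
# Dependency type for each reading when the word modifies a noun.
DEPREL = {"adjective": "amod", "participle": "acl"}


def adjective_or_participle(has_nominal_or_adverbial_modifier, denotes_active_doing):
    """Pick a reading for a word that allows both; adjective is the default."""
    if has_nominal_or_adverbial_modifier or denotes_active_doing:
        return "participle"
    # Rule of thumb: prefer the adjectival reading when it is possible.
    return "adjective"


examples = {
    "tunnettu näyttelijä": adjective_or_participle(False, False),
    "laajalti tunnettu näyttelijä": adjective_or_participle(True, False),
    "Maijan tuleva aviomies": adjective_or_participle(False, False),
    "Maijan Turusta tuleva aviomies": adjective_or_participle(True, True),
}
for phrase, reading in examples.items():
    print(f"{phrase}: {reading} ({DEPREL[reading]})")
```

The mapping to acl and amod applies only when the word modifies a noun. In predicative position the same decision instead determines whether the clause is annotated as copular or as passive.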
Adverbs versus nouns
Due to the fact that certain Finnish adverbs have a partial case inflection, it is sometimes difficult to decide whether a word is an inflected form of a noun (or adjective), or rather an adverb. For instance, the word pääasiassa “mainly” could be analyzed as an adverb, or alternatively, as an inflected form of the noun pääasia “the main thing”.
This distinction affects the choice between the dependency types
advmod (adverbial modifier) and nmod (nominal modifier).
Additionally, it can affect whether a word can be marked as a
predicative (if it is an adverb) and thus as the head of the clause, or
whether it should be marked as a nominal modifier of the verb olla. In
the latter case, the structure of the whole clause is affected by the
decision.
Again, the main source of information while annotating is the morphological analysis of the word, but occasionally the syntactic annotation uses a reading that the morphological analysis has omitted. It is less common for both an adverb and a noun reading to be available. Decision heuristics are needed here as well.
The main deciding factor between a noun and an adverb reading is whether there exists a corresponding noun in its base form, and whether and to what degree the word in question is related to that noun. For example, in the case of pääasiassa “mainly” there exists a corresponding noun pääasia “main thing”, but in the case of naimisissa “married” the only candidate for such a noun would be naiminen, which could technically be translated as “marrying”, but is in fact more often used (usually in spoken language) in the meaning “having sex”. As for humalassa “drunk”, there is a candidate noun, humala, which can be used to refer to the state of being drunk.
To test whether the candidate noun is closely enough related to the word in question, annotators are asked to reflect on the hypothetical base form of the noun reading and on whether it could be imagined to be involved in the current sentence. For instance, is there a main thing (pääasia) in which the interest rate is affected? Is there a state of being married (“naimiset”) in which Elisa and Elias are? Is there a state of being drunk (humala) in which Matti is? The answer to the first two questions is no, and thus pääasiassa and naimisissa are considered adverbs. The answer to the third question, however, is yes, and therefore the word humalassa is analyzed as an inflected form of the noun humala.
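The noun test can be written down as a short sketch. Both inputs are annotator judgements, passed in as booleans. The function name and the MODIFIER_DEPREL mapping are hypothetical, and the mapping covers only the modifier case, not the predicative one.

```python
# Dependency type for each reading when the word acts as a modifier.
MODIFIER_DEPREL = {"adverb": "advmod", "noun": "nmod"}


def adverb_or_noun(candidate_noun_exists, noun_involved_in_sentence):
    """Choose the noun reading only if a related noun is really involved."""
    if candidate_noun_exists and noun_involved_in_sentence:
        return "noun"
    return "adverb"


judgements = {
    # word: (candidate noun exists, noun is involved in the sentence)
    "pääasiassa": (True, False),
    "naimisissa": (True, False),
    "humalassa": (True, True),
}
for word, (exists, involved) in judgements.items():
    reading = adverb_or_noun(exists, involved)
    print(f"{word}: {reading} ({MODIFIER_DEPREL[reading]})")
```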
References
- Marcus et al. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.
Attaching punctuation
Dependencies signaling punctuation are labeled with the dependency
type punct, and the main rule is that the dependency should be
attached to that element which it delimits. Thus, sentence-delimiting
punctuation, such as “.”, “!” or “?” should be attached to the main
verb (or predicative) of the sentence.
According to the same rule, the comma delimiting a subordinate clause should be attached to the head word of said clause.
If there are several subordinate clauses within each other and the punctuation could delimit any of them, the shortest-spanning (closest) clause is selected.
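The choice among nested clauses can be sketched as follows. The sketch assumes that the candidate clauses (those the punctuation symbol could delimit) have already been identified, and that "shortest-spanning" is measured in tokens. The (head, start, end) tuple layout and the function name are hypothetical.

```python
def punct_head(candidate_clauses):
    """Return the head of the shortest-spanning candidate clause.

    Each candidate is (head_id, start, end), with start and end as
    inclusive token positions of the clause.
    """
    head_id, _start, _end = min(candidate_clauses, key=lambda c: c[2] - c[1])
    return head_id


# Three nested clauses that all end just before the same comma.
candidates = [
    (2, 1, 10),   # outermost clause, headed by token 2
    (6, 4, 10),   # clause nested inside it, headed by token 6
    (9, 8, 10),   # innermost clause, headed by token 9
]
print(punct_head(candidates))
```

With these spans the comma attaches to token 9, the head of the innermost clause.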
In coordinations, the punctuation symbols (usually commas) are treated similarly to the coordinating conjunction and attached to the head of the coordination, which is the first coordinated element.
Punctuation related to coordination-like parataxis, that is, parataxis used in connection with a semicolon, colon or dash, is attached as in coordinations.
Punctuation with parataxis of the direct speech type is attached to the first element.
Single and double quotes as well as parentheses are attached to the head of the quoted/parenthetical clause or phrase. Dashes signifying quotes are also attached to the head of the quote.
If the quotes or parentheses contain two or more items, such as parts of a coordination, then the punctuation is attached to the closest enclosed element, so as to avoid unnecessary non-projectivity.
Punctuation can also delimit short additions, such as nominal modifiers or appositions, and in such cases, the punctuation should be attached to the head of the addition.
Finally, list item markers such as bullets of a bulleted list are marked as punctuation attached to the head of the list item.