Please note: this language-specific overview of guidelines for specific constructions is a work in progress.
Subjects and objects
Finnish subjects and objects are straightforward to recognize in their prototypical cases, but both phenomena also have some difficult cases, which are discussed here.
The subject is the primary complement of the verb, usually denoting the entity doing something. In addition to the basic subject (see ISK §910), existential subjects (eksistentiaalisubjekti, e-subjekti) are also considered subjects in UD Finnish.
Tien vieressä on talo . \n Road beside is house . case(Tien-1, vieressä-2) nmod(on-3, Tien-1) nsubj(on-3, talo-4) punct(on-3, .-5)
Possessive clauses (omistuslause) are considered a subtype of existential clauses and are analyzed similarly. The owner in a possessive clause is marked using the type nmod:own. The owner must be an animate being or a group of animate beings.
Hänellä on oma asunto . \n At_him is own apartment . nmod:own(on-2, Hänellä-1) nsubj(on-2, asunto-4) amod(asunto-4, oma-3) punct(on-2, .-5)
Minun on pakko mennä kotiin . \n I(gen.) is obligation go home . nsubj(mennä-4, Minun-1) cop(pakko-3, on-2) xcomp:ds(pakko-3, mennä-4) nmod(mennä-4, kotiin-5) punct(pakko-3, .-6)
In UD Finnish, subjects can be in the nominative, genitive or partitive case; an accusative subject is also possible (the accusative case only exists for certain pronouns). Two notable situations where a complement in the accusative form is analyzed as the subject are:
- Nonfinite clausal complements (Sain hänet itkemään. “I made him cry.”)
- Possessive clauses (Minulla on sinut. “I have you.”)
The same cases are allowed for objects as for subjects: the nominative, the partitive, the genitive and the accusative. Nominal and adjectival complements (other than predicatives), however, can be in other cases as well.
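The case restriction on subjects and objects can be expressed as a simple consistency check. The sketch below is a hypothetical helper, not part of any UD tool; it assumes each token carries one of the relation names used in this document and a case value written with the UD `Case` feature names (`Nom`, `Gen`, `Par`, `Acc`, ...).

```python
# Hypothetical validator for the case restriction on subjects and objects.
# Case values are assumed to use the UD "Case" feature names (Nom, Gen, Par, Acc, ...).
CORE_ARGUMENT_CASES = frozenset({"Nom", "Gen", "Par", "Acc"})
CORE_RELATIONS = frozenset({"nsubj", "nsubj:cop", "dobj"})


def core_argument_case_ok(deprel: str, case: str) -> bool:
    """Return True if a dependent with this deprel may appear in this case.

    Only subjects and objects are restricted; any other relation passes,
    since e.g. nominal complements may be in other cases as well.
    """
    if deprel in CORE_RELATIONS:
        return case in CORE_ARGUMENT_CASES
    return True


if __name__ == "__main__":
    print(core_argument_case_ok("nsubj", "Gen"))  # True: Minun on pakko mennä kotiin.
    print(core_argument_case_ok("dobj", "Ine"))   # False: inessive is not an object case
```

The check deliberately ignores everything except case, so it can only flag impossible analyses, not confirm correct ones.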
Object-cased amount adverbials (objektin sijainen määrän adverbiaali, OSMA; ISK §972), which, as the name implies, take the same cases as objects, are analyzed as nominal modifiers. However, certain verbs are considered to take as their object an expression that would otherwise be analyzed as an amount adverbial. Examples where the amount is considered the object include:
- [fi] Juoksin kilometrin. “I ran a kilometer.”
- [fi] Moottori pyöri kymmenen kierrosta. “The motor turned ten revolutions.”
- [fi] Maitotölkki painaa kilon. “A milk carton weighs a kilogram.”
Passive verb forms take a direct object rather than a passive subject (unlike, for instance, in English).
Oppitunti valmisteltiin huolellisesti . \n Lesson was_prepared carefully . dobj(valmisteltiin-2, Oppitunti-1) advmod(valmisteltiin-2, huolellisesti-3) punct(valmisteltiin-2, .-4)
However, there are certain verbs, so-called derived passives (ISK §336), which may resemble passive verb forms in meaning but in fact take a subject, not an object. (In English, Finnish derived passives generally correspond to intransitive uses of a verb, such as the door opens, sometimes termed inchoative.)
Minä avasin oven . \n I opened the_door . nsubj(avasin-2, Minä-1) dobj(avasin-2, oven-3) punct(avasin-2, .-4)
Ovi aukeaa . \n The_door opens . nsubj(aukeaa-2, Ovi-1) punct(aukeaa-2, .-3)
- http://scripta.kotus.fi/visk/sisallys.php?p=910 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=972 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=336 (in Finnish)
This section first discusses how copular verbs and predicatives are defined, then copulas in combination with auxiliaries, and finally the distinction between the subject and the predicative in copular clauses.
What can be a predicative?
In the UD scheme, unlike in other clauses, the head of a copular clause is the predicative, not the verb (copula). Finnish has only one copular verb, olla “to be” (ISK §891), and rules are needed to define what is accepted as a predicative, both to avoid marking other verbs as copular and to prevent copular clauses from having multiple head words.
The basic alternatives for predicatives are nominals (nouns, adjectives, pronouns and numerals). To be accepted as predicatives, words of these parts of speech must be in the nominative, partitive or genitive case.
Varpunen on pieni lintu . \n Sparrow is small bird(nom.) . nsubj:cop(lintu-4, Varpunen-1) cop(lintu-4, on-2) amod(lintu-4, pieni-3) punct(lintu-4, .-5)
Maali oli valkoista . \n Paint was white(part.) . nsubj:cop(valkoista-3, Maali-1) cop(valkoista-3, oli-2) punct(valkoista-3, .-4)
Tämä kirja on minun . \n This book is mine(gen.) . det(kirja-2, Tämä-1) nsubj:cop(minun-4, kirja-2) cop(minun-4, on-3) punct(minun-4, .-5)
Nominals in any other case are not marked as predicatives, even if they are associated with the verb olla. Like adpositional phrases, they are marked as nominal modifiers (nmod) when they are modifiers, and with one of the clausal complement types (xcomp:ds) when they are complements, including secondary predication. In both cases the verb is marked as the head of the clause, even if it is olla.
Lapset olivat pihalla . \n Children were on_yard . nsubj(olivat-2, Lapset-1) nmod(olivat-2, pihalla-3) punct(olivat-2, .-4)
Lapset olivat talon takana . \n Children were behind house . nsubj(olivat-2, Lapset-1) nmod(olivat-2, talon-3) case(talon-3, takana-4) punct(olivat-2, .-5)
This restriction is to prevent a clause from having two predicatives and hence two heads, which would be the case in a sentence such as the following:
- [fi] Paketti on Oulusta ystävältäni. “The package is from Oulu from my friend.”
Here both Oulusta “from Oulu” and ystävältäni “from my friend” could be interpreted as predicatives, resulting either in a clause with two heads or in a forced choice between two equally likely head candidates. Therefore, only the nominative, genitive and partitive are allowed as cases for predicatives.
Note that cases not allowed for predicatives include the essive case; this is to avoid marking verbs other than olla as copulas.
Mies oli portsarina baarissa . \n Man was doorman(essive) in_bar . nsubj(oli-2, Mies-1) nmod(oli-2, portsarina-3) nmod(oli-2, baarissa-4) punct(oli-2, .-5)
Mies toimi portsarina baarissa . \n Man worked doorman(essive) in_bar . nsubj(toimi-2, Mies-1) nmod(toimi-2, portsarina-3) nmod(toimi-2, baarissa-4) punct(toimi-2, .-5)
In addition to nominals, adverbs can also act as predicatives, provided that they do not express location or time. With adverbs there is no restriction on case; the only requirement is that they are not locational or temporal. As a result, adverbs such as täällä “here” or huomenna “tomorrow” cannot act as predicatives, but others, such as naimisissa “married” (inessive adverb) and raskaana “pregnant” (essive adverb), can, regardless of their case.
In UD Finnish, a full clause can also act as a predicative, in addition to nominals and adverbs. In these cases, the head of the predicative clause also becomes the head of the main clause. (If the predicative clause is itself a copular clause, it will seemingly have two copula subjects and two copulas. However, the analysis should not be interpreted this way.)
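The predicative rule for nominals and adverbs can be sketched as a single decision function. This is a hypothetical encoding, assuming UD part-of-speech tags (`NOUN`, `PROPN`, `ADJ`, `PRON`, `NUM`, `ADV`) and UD `Case` values; whether an adverb is locational or temporal is a semantic judgement, so it is passed in by the caller. Clausal predicatives and the broader FI_FTB definition are not modelled.

```python
from typing import Optional

# Hypothetical encoding of the UD Finnish (TDT-style) predicative rule.
NOMINAL_UPOS = frozenset({"NOUN", "PROPN", "ADJ", "PRON", "NUM"})
PREDICATIVE_CASES = frozenset({"Nom", "Par", "Gen"})


def can_be_predicative(upos: str, case: Optional[str] = None,
                       locative_or_temporal: bool = False) -> bool:
    """Return True if a word may head a copular clause as its predicative."""
    if upos in NOMINAL_UPOS:
        # Nominals: only nominative, partitive or genitive (not essive etc.).
        return case in PREDICATIVE_CASES
    if upos == "ADV":
        # Adverbs: any case, but not if they express location or time.
        return not locative_or_temporal
    return False


if __name__ == "__main__":
    print(can_be_predicative("ADJ", "Par"))                        # valkoista
    print(can_be_predicative("NOUN", "Ess"))                       # portsarina
    print(can_be_predicative("ADV", "Ine"))                        # naimisissa
    print(can_be_predicative("ADV", locative_or_temporal=True))    # täällä
```

When the function returns False for a word next to olla, the verb itself becomes the clause head, as described above.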
Tarkoitus on järjestää lopuksi juhlat . \n The_meaning is to_arrange in_the_end a_party . nsubj:cop(järjestää-3, Tarkoitus-1) cop(järjestää-3, on-2) dobj(järjestää-3, juhlat-5) advmod(järjestää-3, lopuksi-4) punct(järjestää-3, .-6)
In FinnTreeBank (FI_FTB), in addition to copular clauses, state clauses and result clauses (ISK §891) also contain predicatives. As a result, a larger group of verbs is accepted as copular, e.g. tulla “to become”, muuttua “to turn” and tehdä “to make”. (See FinnTreeBank Annotation Manual: 16.9 Predicative.)
In FI_FTB, adverbs cannot act as predicatives at all (e.g. naimisissa “married” or raskaana “pregnant”).
Copulas and auxiliaries
In the Finnish-specific version of the UD scheme, copular verbs and auxiliaries take no dependents of their own. When a clause has two auxiliaries, or an auxiliary together with a copular verb, all auxiliaries as well as the copular verb are attached directly to the main predicate or the predicative. The same principle also applies to negation verbs.
Hänkin on joskus ollut nuori . \n He_too has some_time been young . nsubj:cop(nuori-5, Hänkin-1) aux(nuori-5, on-2) advmod(nuori-5, joskus-3) cop(nuori-5, ollut-4) punct(nuori-5, .-6)
Minun ei ehkä olisi pitänyt sanoa niin . \n I not maybe have should said so . nsubj(sanoa-6, Minun-1) neg(sanoa-6, ei-2) advmod(sanoa-6, ehkä-3) aux(sanoa-6, olisi-4) aux(sanoa-6, pitänyt-5) advmod(sanoa-6, niin-7) punct(sanoa-6, .-8)
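Because auxiliaries, copulas and negation verbs may not govern anything, a violation is easy to detect mechanically. The sketch below is a hypothetical checker that parses relations written in the notation used in the examples of this document (`deprel(head-i, dependent-j)`) and reports any aux, cop or neg token that has dependents of its own.

```python
import re

# Relations in this document's notation, e.g. "aux(sanoa-6, olisi-4)".
RELATION = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")
FUNCTION_RELATIONS = frozenset({"aux", "cop", "neg"})


def function_words_with_dependents(relations: str) -> list[int]:
    """Return indices of aux/cop/neg tokens that govern other tokens."""
    arcs = [(m.group(1), int(m.group(3)), int(m.group(5)))
            for m in RELATION.finditer(relations)]
    # Relation of each token to its own head, keyed by token index.
    role = {dep: deprel for deprel, _head, dep in arcs}
    return sorted({head for _deprel, head, _dep in arcs
                   if role.get(head) in FUNCTION_RELATIONS})


if __name__ == "__main__":
    flat = ("nsubj(sanoa-6, Minun-1) neg(sanoa-6, ei-2) advmod(sanoa-6, ehkä-3) "
            "aux(sanoa-6, olisi-4) aux(sanoa-6, pitänyt-5) advmod(sanoa-6, niin-7) "
            "punct(sanoa-6, .-8)")
    chained = "aux(sanoa-6, olisi-4) aux(olisi-4, pitänyt-5)"
    print(function_words_with_dependents(flat))     # [] : all attached to sanoa
    print(function_words_with_dependents(chained))  # [4] : olisi wrongly governs pitänyt
```

The chained analysis in the usage example is the kind of structure the flat attachment rule forbids.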
The distinction between the predicative and the subject
Distinguishing the subject from the predicative in copular clauses can be difficult, as it is often possible to invert the word order and thus swap the positions of the two elements. For instance, in the following sentences, either kirahvit “giraffes” or eläimiä “animals” could be the subject and the other the predicative.
- [fi] Kirahvit ovat mielenkiintoisimpia eläimiä. “Giraffes are the most interesting animals.”
- [fi] Mielenkiintoisimpia eläimiä ovat kirahvit. “The most interesting animals are the giraffes.”
In UD Finnish, the main rule in annotating copular structures is that the leftmost element is the subject and the rightmost one the predicative. Hence, the above sentences would be annotated in the following manner:
Kirahvit ovat mielenkiintoisimpia eläimiä . \n Giraffes are the_most_interesting animals . nsubj:cop(eläimiä-4, Kirahvit-1) cop(eläimiä-4, ovat-2) amod(eläimiä-4, mielenkiintoisimpia-3) punct(eläimiä-4, .-5)
Mielenkiintoisimpia eläimiä ovat kirahvit . \n The_most_interesting animals are giraffes . amod(eläimiä-2, Mielenkiintoisimpia-1) nsubj:cop(kirahvit-4, eläimiä-2) cop(kirahvit-4, ovat-3) punct(kirahvit-4, .-5)
Semantic considerations, such as which concept is a subconcept of the other, are not taken into account in the annotation. However, the leftmost element may be marked as the predicative when the word order is clearly inverted. This occurs, for instance, in (indirect) questions and sometimes in relative clauses. Note that especially in questions, several different word orders are possible.
Millainen matka oli ? \n What_like trip was ? nsubj:cop(Millainen-1, matka-2) cop(Millainen-1, oli-3) punct(Millainen-1, ?-4)
Kysyin , oliko matka mukava . \n I_asked , whether_was trip nice . ccomp(Kysyin-1, mukava-5) punct(Kysyin-1, .-6) punct(mukava-5, ,-2) cop(mukava-5, oliko-3) nsubj:cop(mukava-5, matka-4)
yhdistys , jonka puheenjohtaja Matikainen on \n association , of_which chairman Matikainen is acl:relcl(yhdistys-1, puheenjohtaja-4) punct(puheenjohtaja-4, ,-2) nmod:poss(puheenjohtaja-4, jonka-3) nsubj:cop(puheenjohtaja-4, Matikainen-5) cop(puheenjohtaja-4, on-6)
Also, if the leftmost element of the copular clause is an adjective rather than a noun or pronoun, it is considered that the word order is inverted, and thus the adjective is marked as the predicative, not the subject.
Kaunishan tämä talo on . \n Beautiful this house is . nsubj:cop(Kaunishan-1, talo-3) det(talo-3, tämä-2) cop(Kaunishan-1, on-4) punct(Kaunishan-1, .-5)
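The subject/predicative decision for the two elements of a copular clause can be summarized as follows. This is a minimal sketch, assuming each element is reduced to its text and the UD part of speech of its head word; the `inverted` flag stands for the annotator's judgement that the word order is clearly inverted (questions, some relative clauses), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str
    upos: str  # UD part of speech of the phrase's head word


def copular_roles(left: Phrase, right: Phrase,
                  inverted: bool = False) -> tuple[Phrase, Phrase]:
    """Return (subject, predicative); left and right are in surface order."""
    # A leftmost adjective is also taken to signal inverted word order.
    if inverted or left.upos == "ADJ":
        return right, left
    # Default rule: leftmost element is the subject, rightmost the predicative.
    return left, right


if __name__ == "__main__":
    subj, pred = copular_roles(Phrase("Kaunishan", "ADJ"), Phrase("tämä talo", "NOUN"))
    print(subj.text, "/", pred.text)  # tämä talo / Kaunishan
```

Semantic relations between the two concepts play no part in the function, mirroring the rule that they are not taken into account.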
- http://scripta.kotus.fi/visk/sisallys.php?p=891 (in Finnish)
Open clausal complements share their subject with another verb (see also the documentation for xcomp). The fact that the subject of the main verb is also the subject of the complement cannot be annotated using basic dependencies, as this would violate the treeness restriction. Therefore, in UD Finnish these subjects are marked on the second layer of annotation (the DEPS field) using the standard dependency types nsubj and nsubj:cop. Note that an open clausal complement may not always have a subject, as for instance in
Note that while some related schemes such as SD and TDT differentiate second-layer (or “additional”) external subject dependencies by applying a specific type such as xsubj, nsubj is used on both the basic and the second layer in UD Finnish.
Matti ryhtyi lukemaan . \n Matti started_to read . nsubj(ryhtyi-2, Matti-1) xcomp(ryhtyi-2, lukemaan-3) punct(ryhtyi-2, .-4) nsubj(lukemaan-3, Matti-1)
Hän vaikutti olevan hiljainen . \n He appeared to_be silent . nsubj(vaikutti-2, Hän-1) xcomp(vaikutti-2, hiljainen-4) cop(hiljainen-4, olevan-3) punct(vaikutti-2, .-5) nsubj:cop(hiljainen-4, Hän-1)
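In a CoNLL-U file, the second-layer subject arc ends up in the DEPS column of the subject token. The sketch below assumes the UD v2 convention in which DEPS repeats the basic arc and lists `head:deprel` pairs ordered by head index, separated by `|`; whether a given UD Finnish release follows exactly this convention is an assumption, and the helper name is hypothetical.

```python
def deps_field(basic_head: int, basic_deprel: str,
               extra: list[tuple[int, str]]) -> str:
    """Build a DEPS value from the basic arc plus second-layer arcs."""
    arcs = sorted([(basic_head, basic_deprel), *extra])  # ordered by head index
    return "|".join(f"{head}:{deprel}" for head, deprel in arcs)


# Matti ryhtyi lukemaan . : Matti is also the external subject of lukemaan (3).
ROWS = [
    (1, "Matti", 2, "nsubj", [(3, "nsubj")]),
    (2, "ryhtyi", 0, "root", []),
    (3, "lukemaan", 2, "xcomp", []),
    (4, ".", 2, "punct", []),
]

if __name__ == "__main__":
    for idx, form, head, deprel, extra in ROWS:
        # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC; unused columns as "_"
        print("\t".join([str(idx), form, "_", "_", "_", "_", str(head),
                         deprel, deps_field(head, deprel, extra), "_"]))
```

Token 1 then carries `2:nsubj|3:nsubj`, keeping the basic tree intact while recording the shared subject.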
Appositions and appellation modifiers
The Finnish Grammar (see ISK §1059, §1062) distinguishes between three similar phenomena: the apposition, the appellation modifier (nimikemäärite) and the supporting noun (tukisubstantiivi). Of these, the apposition and the appellation modifier (compound:nn) are distinguished in TDT, and supporting noun structures are considered appositions.
All of these structures include two (usually adjacent) elements, most often noun phrases, which refer to the same entity or entities and have the same function in the sentence. Thus, to be considered an apposition, an appellation modifier or a supporting noun structure, a structure has to fulfill the following criteria (the same as in ISK §1059):
1. Both elements of the structure must refer to the same entity or group of entities.
2. Both elements of the structure must have the same function in the sentence (for instance, the subject).
These criteria are interpreted rather loosely, and there are no restrictions on the part of speech of the elements involved. Most appositions (and appellation modifiers) in TDT consist of noun phrases, but other parts of speech also occur as appositions; notably, the fiction section of the treebank contains a few examples of verbal appositions.
Among the expressions that fulfill criteria 1 and 2, six common cases can be distinguished according to inflection and punctuation.
1. singular, both elements in nominative, no punctuation: professori Matti Tamminen “professor Matti Tamminen”
2. singular, first element in nominative, second element inflected: professori Matti Tammisen mukaan “according to professor Matti Tamminen”
3. singular, both elements in nominative, punctuation in between: professori, Matti Tamminen “the professor, Matti Tamminen”
4. singular, first element inflected, second element in nominative: romaanissa Putkinotko “in the novel Putkinotko”
5. singular, both elements inflected: professorin, Matti Tammisen, mukaan “according to the professor, Matti Tamminen”
6. plural, elements either in nominative or inflected: professorit Matti Tamminen ja Erkki Koivunen “the professors Matti Tamminen and Erkki Koivunen” or professoreiden, Matti Tammisen ja Erkki Koivusen, mukaan “according to the professors, Matti Tamminen and Erkki Koivunen” or professoreiden Matti Tamminen ja Erkki Koivunen mukaan “according to the professors Matti Tamminen and Erkki Koivunen”
Of these six cases, the first two are considered appellation modifiers and are thus marked with the dependency type compound:nn. Note that in appellation modifiers the governor of the dependency is the latter of the two words.
Professori Matti Tamminen pitää puheen . \n Professor Matti Tamminen gives a_speech . compound:nn(Matti-2, Professori-1) name(Matti-2, Tamminen-3) nsubj(pitää-4, Matti-2) dobj(pitää-4, puheen-5) punct(pitää-4, .-6)
The remaining four cases are all considered appositions and marked with the type appos. Unlike in appellation modifiers, in apposition structures the first word is considered the governor.
Professori , Matti Tamminen , luennoi tänään . \n The_professor , Matti Tamminen , lectures today . appos(Professori-1, Matti-3) punct(Matti-3, ,-2) punct(Matti-3, ,-5) name(Matti-3, Tamminen-4) nsubj(luennoi-6, Professori-1) advmod(luennoi-6, tänään-7) punct(luennoi-6, .-8)
Note that case 4 is in fact an example of a supporting noun structure, but in TDT such structures are marked as appositions. In the plural (case 6), all possible case combinations are considered appositions.
The only difference between cases 1 and 3 is the presence or absence of punctuation. This punctuation is often a comma, but parentheses, a dash or a colon are also possible. As the examples above show, the punctuation produces a semantic difference, which is taken into account in the annotation. Punctuation variants of cases 2, 4 and 5 need not be considered, as they are ungrammatical. (Naturally, ungrammatical phenomena can and do occur in a corpus of actual language, but such cases are resolved on a case-by-case basis.)
- [fi] *professori, Matti Tammisen mukaan “according to professor, Matti Tamminen”
- [fi] *romaanissa, Putkinotko “in the novel, Putkinotko”
- [fi] *professorin Matti Tammisen mukaan “according to the professor’s Matti Tamminen” (unless a possessive reading is intended)
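The six cases listed above amount to a small lookup table. The sketch below is a hypothetical helper, assuming the annotator has already established that both criteria hold; "inflected" means any case other than the nominative. It returns the case number, the dependency type and which element is the governor, and rejects the ungrammatical punctuation variants so that they are resolved by hand.

```python
# Singular cases 1-5 from the list above, keyed by
# (first element inflected, second element inflected, punctuation in between).
SINGULAR_CASES = {
    (False, False, False): (1, "compound:nn", "second"),
    (False, True, False): (2, "compound:nn", "second"),
    (False, False, True): (3, "appos", "first"),
    (True, False, False): (4, "appos", "first"),
    (True, True, True): (5, "appos", "first"),
}


def classify_pair(plural: bool, first_inflected: bool,
                  second_inflected: bool, punct_between: bool) -> tuple[int, str, str]:
    """Return (case number, deprel, governor) for two coreferent elements."""
    if plural:
        # Case 6: every plural combination is an apposition.
        return 6, "appos", "first"
    key = (first_inflected, second_inflected, punct_between)
    if key not in SINGULAR_CASES:
        # E.g. *professori, Matti Tammisen mukaan: left to case-by-case resolution.
        raise ValueError(f"ungrammatical combination {key}")
    return SINGULAR_CASES[key]


if __name__ == "__main__":
    print(classify_pair(False, True, False, False))  # romaanissa Putkinotko
```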
- http://scripta.kotus.fi/visk/sisallys.php?p=567 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1059 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1062 (in Finnish)
Verbal dependents: Clauses, non-clauses, complements and modifiers
One particularly challenging task in annotation under the UD Finnish scheme is selecting the correct dependency type for verbal dependents. Verbal dependents include different kinds of subordinate clauses as well as infinitive and participial complements and modifiers. A simplified description of the decision procedure for verbal dependents is given in Table 1, and the full details are given below.
TABLE 1 OMITTED
Some basic cases are relatively easy to decide. If the dependent is a regular subordinate clause, the choices are clear. For relative clauses, the type to be used is acl:relcl, and as indirect questions are clausal complements, the correct type for them is ccomp.
If the subordinate clause is a conjunction clause, it can be either a complement or a modifier. Complement clauses are marked with ccomp and modifier clauses with advcl. In the majority of cases, conjunction clauses starting with the conjunction että are complements and clauses starting with any other conjunction are modifiers. However, it should be noted that the conjunction että can be used instead of the conjunction jotta, and conversely, jotta can (especially in spoken language) be used instead of että.
- [fi] Minun täytyy nyt mennä, että en myöhästy. / ~jotta en myöhästy. “I have to go now so that I won’t be late.”
- [fi] Hän sanoi, jotta tulee vasta illalla. / ~että tulee vasta illalla. “He said that he will only come in the evening.”
In these cases, a clause starting with että is a modifier, and a clause starting with jotta is a complement.
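The choices for finite subordinate clauses can be sketched as a small decision function. This is a hypothetical helper: the clause kind and, for conjunction clauses, the complement/modifier judgement are assumed to come from the annotator. The conjunction is only used as the majority-case default when no judgement is given, which is exactly why the että/jotta swaps above need an explicit `is_complement`.

```python
from typing import Optional

# Clause kinds whose type does not depend on further judgement.
FIXED_TYPES = {"relative": "acl:relcl", "indirect_question": "ccomp"}


def subordinate_clause_deprel(kind: str, conjunction: Optional[str] = None,
                              is_complement: Optional[bool] = None) -> str:
    """Pick a deprel for a finite subordinate clause.

    kind: "relative", "indirect_question" or "conjunction".
    is_complement: annotator's judgement; None falls back to "että -> complement".
    """
    if kind in FIXED_TYPES:
        return FIXED_TYPES[kind]
    if kind != "conjunction":
        raise ValueError(f"unknown clause kind: {kind}")
    if is_complement is None:
        is_complement = conjunction == "että"
    return "ccomp" if is_complement else "advcl"


if __name__ == "__main__":
    # Minun täytyy nyt mennä, että en myöhästy. (purpose reading: a modifier)
    print(subordinate_clause_deprel("conjunction", "että", is_complement=False))
```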
If the dependent is not a subordinate clause, the next deciding factor is the part of speech of the governor. If the governor is a noun, the dependent can be an infinitive modifier or a participle modifier, both marked with acl.
If the subject of the dependent is shared with the governor (subject control), the correct type to use is xcomp. If any other sentence element is inherited from the higher clause (for example a dobj), the correct type is xcomp:ds, and otherwise
- [fi] Hän alkoi hakata halkoja. “He started chopping the wood.”
- [fi] Sain hänet itkemään. “I made him cry.”
The dependent can also be a participial complement that resembles adjectival complements. The above-mentioned three clausal complement types should be used in these cases as well.
- [fi] Poika vei kotitehtävän opettajan tarkastettavaksi. “The boy took the homework to be inspected by the teacher.”
If the dependent is not a complement but a modifier, the correct dependency type is advcl. These cases are usually recognized as lauseenvastike (“substitute of a clause”) or non-complement participles.
- [fi] Pyyhittyään pölyt hän imuroi. “After dusting, he hoovered.”
- [fi] Huolestuneena seurasin tilanteen kehittymistä. “Worried, I followed the development of the situation.”
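For non-clausal verbal dependents of a verb, the choices described above can be sketched as follows. This hypothetical helper assumes a verbal governor and takes the three judgements (modifier or complement, shared subject, other inherited element) as flags; the remaining complement case is returned as `None` rather than assigned a label.

```python
from typing import Optional


def nonfinite_dependent_deprel(is_modifier: bool, shares_subject: bool = False,
                               inherits_other_element: bool = False) -> Optional[str]:
    """Pick a deprel for an infinitive or participle dependent of a verb."""
    if is_modifier:
        return "advcl"      # lauseenvastike or non-complement participle
    if shares_subject:
        return "xcomp"      # subject control: Hän alkoi hakata halkoja.
    if inherits_other_element:
        return "xcomp:ds"   # e.g. Sain hänet itkemään.
    return None             # remaining complement case: not covered by this sketch


if __name__ == "__main__":
    print(nonfinite_dependent_deprel(True))                          # Pyyhittyään pölyt ...
    print(nonfinite_dependent_deprel(False, shares_subject=True))    # alkoi hakata
```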
- http://scripta.kotus.fi/visk/sisallys.php?p=938 (in Finnish)
- http://scripta.kotus.fi/visk/sisallys.php?p=1452 (in Finnish)
Attachment issues: word-order-dependent structures and ambiguity
Occasionally, determining the correct head word for a dependency may be difficult. Some structures are inherently ambiguous, and with some structures, often ones involving nominal modifiers, the dependent is most naturally seen to modify different sentence elements depending on the word order. The following classic example is ambiguous:
- [fi] Ammuin elefantin pyjamassani. “I shot an elephant in my pajamas.”
In this example, it is possible that the shooting happened while wearing the pajamas, in which case the correct syntax tree would be as follows:
Ammuin elefantin pyjamassani . \n I_shot an_elephant in_my_pajamas . dobj(Ammuin-1, elefantin-2) nmod(Ammuin-1, pyjamassani-3) punct(Ammuin-1, .-4)
On the other hand, it is also possible that the elephant wore the pajamas, in which case the correct analysis is:
Ammuin elefantin pyjamassani . \n I_shot an_elephant in_my_pajamas . dobj(Ammuin-1, elefantin-2) nmod(elefantin-2, pyjamassani-3) punct(Ammuin-1, .-4)
Ambiguities such as this one are resolved as far as possible, and context is also used to determine the correct reading where applicable. That is, if the context contains another sentence that makes it clear whether the shooter or the elephant wore the pajamas, that sentence is used to disambiguate the structure.
If, however, the ambiguity cannot be resolved even given the context, or if an element seems to modify two or more elements simultaneously, the attachment higher in the tree is chosen. In the previous example, this would be the reading in which the shooter wears the pajamas.
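The fallback rule (choose the higher attachment) can be expressed as picking the candidate head closest to the root. The sketch below assumes the rest of the tree is already available as a mapping from token index to head index, with 0 for the root; the function names are hypothetical.

```python
def depth(head_of: dict[int, int], token: int) -> int:
    """Number of arcs from token up to the root; assumes a well-formed tree."""
    steps = 0
    while head_of[token] != 0:
        token = head_of[token]
        steps += 1
    return steps


def highest_attachment(head_of: dict[int, int], candidates: list[int]) -> int:
    """Among equally plausible heads, return the one closest to the root."""
    return min(candidates, key=lambda t: depth(head_of, t))


# Ammuin(1) elefantin(2) pyjamassani(3) .(4): pyjamassani is not yet attached.
HEAD_OF = {1: 0, 2: 1, 4: 1}

if __name__ == "__main__":
    print(highest_attachment(HEAD_OF, [2, 1]))  # 1: attach to Ammuin
```

Candidates are limited to the heads the annotator found plausible; the depth comparison only breaks the tie.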
In some structures, the most natural analysis may be word order dependent. Consider the following two examples.
- [fi] Mies ruskeassa takissa tuli junaan. “A man in a brown coat came into the train.”
- [fi] Mies tuli junaan ruskeassa takissa. “A man came into the train in a brown coat.”
In the former example, there is clearly a man in a brown coat, whereas in the latter case, the coming into the train happened while wearing a brown coat. Therefore, the correct analyses for these examples differ in their attachment of the phrase in a brown coat. These attachment rules are akin to those used in the Prague Dependency Treebank.
Mies ruskeassa takissa tuli junaan . \n Man brown in_coat came into_train . nmod(Mies-1, takissa-3) amod(takissa-3, ruskeassa-2) nsubj(tuli-4, Mies-1) nmod(tuli-4, junaan-5) punct(tuli-4, .-6)
Mies tuli junaan ruskeassa takissa . \n Man came into_train brown in_coat . nsubj(tuli-2, Mies-1) nmod(tuli-2, junaan-3) nmod(tuli-2, takissa-5) amod(takissa-5, ruskeassa-4) punct(tuli-2, .-6)
Relative clauses most often modify noun phrases, but they can also modify a whole clause. From a prescriptive perspective, the relativizer in relative clauses that modify noun phrases should be joka, and the relative clause should always modify the word directly before it. The relativizer in relative clauses modifying full clauses should be mikä. However, in real, especially spoken, language, the use of the two relativizers is mixed, and not every joka clause actually refers to the word adjacent to it. In UD Finnish, the actual referent of the relative clause is chosen as the head of the acl:relcl relation wherever possible. For this reason, the head of the acl:relcl relation can occasionally be a verb.
Annoin hänelle kirjan , joka sitä oli pyytänyt . \n I_gave him the_book , who it had asked_for . nmod(Annoin-1, hänelle-2) dobj(Annoin-1, kirjan-3) acl:relcl(hänelle-2, pyytänyt-8) punct(pyytänyt-8, ,-4) nsubj(pyytänyt-8, joka-5) dobj(pyytänyt-8, sitä-6) aux(pyytänyt-8, oli-7) punct(Annoin-1, .-9)
The relativizer is annotated with the standard syntactic role that it plays in the relative clause, such as nsubj or dobj. (Note that this treatment differs from the annotation of relative clauses in previously proposed related schemes, which used a specific dependency type (rel) to mark the relativizer. In particular, in the TDT corpus the basic dependency layer used rel and the second annotation layer identified the actual syntactic role.)
Lapsi , jonka hän sai itkemään , parkui yhä surkeasti . \n The_child , whom he made cry , wailed still miserably . acl:relcl(Lapsi-1, sai-5) punct(sai-5, ,-2) nsubj(itkemään-6, jonka-3) nsubj(sai-5, hän-4) xcomp:ds(sai-5, itkemään-6) punct(sai-5, ,-7) nsubj(parkui-8, Lapsi-1) advmod(parkui-8, yhä-9) advmod(parkui-8, surkeasti-10) punct(parkui-8, .-11)
Tuon lapsen hän sai itkemään . \n That child he made cry . det(lapsen-2, Tuon-1) nsubj(sai-4, hän-3) xcomp:ds(sai-4, itkemään-5) nsubj(itkemään-5, lapsen-2) punct(sai-4, .-6)
Note also that the dependent of this dependency is always the head of the relative phrase, which may or may not be the relative word itself.
Nainen , jonka auto hajosi , seisoo tuolla . \n Lady , whose car broke , stands there . acl:relcl(Nainen-1, hajosi-5) punct(hajosi-5, ,-2) punct(hajosi-5, ,-6) nmod:poss(auto-4, jonka-3) nsubj(hajosi-5, auto-4) nsubj(seisoo-7, Nainen-1) advmod(seisoo-7, tuolla-8) punct(seisoo-7, .-9)
Units, measures and amounts
There are several ways to express amounts. The simplest case is expressing an amount with numbers: three apples, sixteen litres.
kolme litraa \n three litres nummod(litraa-2, kolme-1)
The semantic head, litraa “litres” in the above example, is selected as the head, and the number is marked as a numeral modifier, nummod. (Morpho-syntactically, the number kolme “three” could also be considered the head, as it determines the case used for the word litra “litre”.) For more information on the internal structure of numerical expressions, see Section 5.12.
Amount can also be expressed with adverbs. This, too, is handled by selecting the semantic head as the head of the structure, that is, the noun.
paljon maitoa \n a_lot_of milk advmod(maitoa-2, paljon-1)
In addition, amount can be expressed using a nominal, often in expressions such as kuppi kahvia “a cup of coffee” or joku pojista lit. someone from the boys “one of the boys”. In these cases, the first nominal is marked as the head.
Hän joi kupin kahvia . \n He drank a_cup_of coffee . nsubj(joi-2, Hän-1) dobj(joi-2, kupin-3) nmod(kupin-3, kahvia-4) punct(joi-2, .-5)
Joku pojista voisi auttaa minua . \n Someone from_boys could help me . nmod(Joku-1, pojista-2) nsubj(auttaa-4, Joku-1) aux(auttaa-4, voisi-3) dobj(auttaa-4, minua-5) punct(auttaa-4, .-6)
These structures are considered different from the amount expressions with numerals or adverbs, as their inflection behaves differently. Consider the following examples.
- [fi] Kieltäydyin kolmesta donitsista. “I refused three doughnuts.”
- [fi] Kieltäydyin kupista kahvia. “I refused a cup of coffee.”
In the first example, both parts of the amount expression inflect as required by the verb kieltäytyä “to refuse”, whereas in the latter case, only the first nominal inflects, signaling that the head, the thing refused in this expression, is the cup. The structure Joku pojista behaves and is annotated similarly.
Two things should be noted about the above analysis of joku pojista lit. someone from the boys “one of the boys”. First, this analysis leads to yksi pojista “one of the boys” being analyzed similarly to joku pojista rather than yksi poika “one boy”.
Yksi pojista juoksi ulos . \n One from_boys ran out . nsubj(juoksi-3, Yksi-1) nmod(Yksi-1, pojista-2) advmod(juoksi-3, ulos-4) punct(juoksi-3, .-5)
Second, this analysis allows a structure like joku pojista to act as a predicative, as the head of the expression is in the nominative.
Se oli joku pojista . \n It was someone from_boys . nsubj:cop(joku-3, Se-1) cop(joku-3, oli-2) nmod(joku-3, pojista-4) punct(joku-3, .-5)
Contrary to the special cases described above, in FI_FTB (FinnTreeBank) amounts expressed using a nominal are treated similarly to amounts expressed with a number or an adverb. This means that the semantic nucleus of the phrase is marked as the head regardless of its case (often the partitive or elative), as in kuppi kahvia “a cup of coffee” or joku pojista “one of the boys”.
Noun phrases without nouns
In UD Finnish, it is possible for a phrase with a head word other than a noun (or pronoun) to act as a noun phrase. Typical cases of this include adjective-headed and participle-headed noun phrases.
- [fi] Ikkunan takana oli jotain sinistä. “There was something blue behind the window.”
- [fi] Kukista kaunein oli punainen ruusu. “The most beautiful of the flowers was a red rose.”
- [fi] Kirjaa kirjoittavat sanoivat samaa. “The (ones) writing a book said the same.”
- [fi] Onnettomuudessa olleille suositeltiin terapiaa. “Therapy was recommended for the (ones who had) been in the accident.”
These structures are analyzed as standard noun phrases. For instance, they can be marked as the subject of a clause, or a nominal modifier, regardless of the part of speech of the head word.
Ikkunan takana oli jotain sinistä . \n Window behind was something blue . case(Ikkunan-1, takana-2) nmod(oli-3, Ikkunan-1) nsubj(oli-3, sinistä-5) det(sinistä-5, jotain-4) punct(oli-3, .-6)
Onnettomuudessa olleille suositeltiin terapiaa . \n In_accident been(_ones) was_recommended therapy . nmod(olleille-2, Onnettomuudessa-1) nmod(suositeltiin-3, olleille-2) dobj(suositeltiin-3, terapiaa-4) punct(suositeltiin-3, .-5)
Comparatives and superlatives
This section describes the annotation of comparative and superlative structures, which, in UD Finnish, are considered to include also certain similar structures that do not contain a comparative or superlative wordform.
Structures with comparative adjectives and adverbs may be difficult to annotate: they are often elliptical, and it may be difficult to tell what is being compared with what. To annotate comparative constructions, the dependency types advcl and mark are used.
The basic usage of these two types is as follows. The comparative adjective or adverb acts as the head of an advcl dependency, and the element being compared is its dependent. The element being compared also acts as the head of a mark dependency, the dependent of which is a comparative conjunction, nearly always kuin.
Keittiö on pienempi kuin olohuone . \n Kitchen is smaller than livingroom . nsubj:cop(pienempi-3, Keittiö-1) cop(pienempi-3, on-2) advcl(pienempi-3, olohuone-5) mark(olohuone-5, kuin-4) punct(pienempi-3, .-6)
Note that the comparative adjective or adverb remains the head of the advcl dependency even if the word order is such that the dependency becomes non-projective.
Matilla on isompi auto kuin Pekalla . \n At_Matti is bigger car than Pekka . nmod:own(on-2, Matilla-1) nsubj(on-2, auto-4) amod(auto-4, isompi-3) advcl(isompi-3, Pekalla-6) mark(Pekalla-6, kuin-5) punct(on-2, .-7)
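An arc is non-projective when some token between its head and its dependent is not dominated by the head. The hypothetical checker below parses relations in the notation used in this document's examples and confirms that, in the example above, only advcl(isompi-3, Pekalla-6) is non-projective: the intervening token auto-4 is the head of isompi, not its descendant.

```python
import re

# Relations in this document's notation, e.g. "advcl(isompi-3, Pekalla-6)".
RELATION = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def heads_from(relations: str) -> dict[int, int]:
    """Map each dependent index to its head index."""
    return {int(m.group(5)): int(m.group(3)) for m in RELATION.finditer(relations)}


def dominates(heads: dict[int, int], ancestor: int, node: int) -> bool:
    """True if ancestor lies on the path from node up to the root."""
    while node in heads:
        node = heads[node]
        if node == ancestor:
            return True
    return False


def nonprojective_arcs(relations: str) -> list[tuple[int, int]]:
    """Return (head, dependent) pairs spanning a token the head does not dominate."""
    heads = heads_from(relations)
    found = []
    for dep, head in heads.items():
        lo, hi = sorted((dep, head))
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            found.append((head, dep))
    return sorted(found)


MATILLA = ("nmod:own(on-2, Matilla-1) nsubj(on-2, auto-4) amod(auto-4, isompi-3) "
           "advcl(isompi-3, Pekalla-6) mark(Pekalla-6, kuin-5) punct(on-2, .-7)")

if __name__ == "__main__":
    print(nonprojective_arcs(MATILLA))  # [(3, 6)]
```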
From the previous example it can also be seen that comparative structures are often elliptical in some way. Strictly speaking, the example does not compare Matti and Pekka, but rather their cars, and the car owned by Pekka is not explicitly present in the sentence. As a general rule of thumb, the different kinds of ellipsis present in comparative structures are not marked with null tokens, but rather the available elements are used wherever possible.
It is also possible to make comparisons without the comparative conjunction kuin. In these cases, only the dependency type advcl is used, marking the comparative adjective or adverb as the head and the element compared as the dependent, just as when the comparative conjunction is present.
Olohuone on keittiötä suurempi . \n Livingroom is (than_)kitchen bigger . nsubj:cop(suurempi-4, Olohuone-1) cop(suurempi-4, on-2) advcl(suurempi-4, keittiötä-3) punct(suurempi-4, .-5)
Some structures not involving a comparative adjective or adverb can also be marked as comparatives. To qualify as a comparative construction, a structure has to contain either a comparative word form or a word form that otherwise semantically entails comparison, such as samanlainen “similar”, sama “same”, erilainen “different” or eri “differing, separate”. (Note that for example the word sama “same” is in fact a pronoun in Finnish.)
Luin saman kirjan kuin Pekka . \n I_read same book as Pekka . dobj(Luin-1, kirjan-3) det(kirjan-3, saman-2) advcl(saman-2, Pekka-5) mark(Pekka-5, kuin-4) punct(Luin-1, .-6)
An additional difficulty is posed by the fact that in Finnish, the comparative conjunction kuin can also appear as a subordinating conjunction or as an adverb. Borderline situations are resolved on a case-by-case basis, considering whether or not there is a comparison involved in the structure and, secondarily, whether the dependent structure is a clause. (Comparative structures can also occasionally be full clauses.)
Superlatives are less problematic than comparatives but deserve some attention nevertheless. The basic case with superlatives is simple: a lone superlative modifying a noun. The superlative form in this case is not marked in any particular way in the syntax annotation, but the structure is annotated similarly to any adjective modifying a noun. The same strategy of not marking the superlative in any particular way is also used in cases where the superlative acts as a predicative.
Suurin paketti oli muiden takana . \n Biggest package was others behind . amod(paketti-2, Suurin-1) nsubj(oli-3, paketti-2) nmod(oli-3, muiden-4) case(muiden-4, takana-5) punct(oli-3, .-6)
Often a superlative is modified by a nominal in some manner. A very common case is a genitive modifier of a superlative. For instance, in an expression such as
Suomen paras kokki \n Finland's best cook nmod:poss(paras-2, Suomen-1) amod(kokki-3, paras-2)
the cook is the best of those in/of Finland and thus the correct head word for the genitive modifier is paras “best”. Similarly, an ordinal number can act as the head of a genitive modifier. For example, in
Virtasen kuudes mestaruus \n Virtanen's sixth championship nmod:poss(kuudes-2, Virtasen-1) nummod(mestaruus-3, kuudes-2)
the championship is the sixth out of those of Virtanen, and thus the genitive modifier should modify the ordinal number.
However, it is still possible for the noun to act as the head word in some cases. For instance, in
Rusakon pahin vihollinen \n The_hare's worst enemy nmod:poss(vihollinen-3, Rusakon-1) amod(vihollinen-3, pahin-2)
the enemy is not the worst of the hare, but rather it is an enemy of the hare, and it is the worst enemy. Thus, the head word of the genitive modifier should be vihollinen “enemy”.
As a rule of thumb, if the noun phrase containing the genitive modifier can be turned into a copular clause in the following fashion, then the genitive modifier should modify the superlative or ordinal number.
- [fi] Kokki on Suomen paras “The cook is the best in Finland”
- [fi] Mestaruus on Virtasen kuudes “The championship is the sixth for Virtanen”
are perfectly valid, but
- [fi] ?Vihollinen on rusakon pahin ?”The enemy is the worst of the hare”
is questionable at best. Thus, in Suomen paras kokki and Virtasen kuudes mestaruus, the genitive modifier is considered to modify the superlative adjective, but in rusakon pahin vihollinen, it is considered to modify the noun directly.
In this context, it should also be noted that in addition to superlatives, certain other adjectives can act as the head of a genitive modifier. Some of these adjectives are semantically superlative-like, such as viimeinen “last”, but there are also many others, such as oma “own”, kaltainen “-like”, välinen “between (adj.)”, and vastainen “against (adj.)”.
Also other nominal modifiers are possible, expressing the set of beings from which the objects are drawn when making the comparison. These are treated similarly to the genitive modifiers, making the superlative wordform the head of the modifier if the modifier expresses the set of beings to draw from.
Kukista kaunein oli ikkunalaudalla . \n From_the_flowers most_beautiful was on_windowsill . nmod(kaunein-2, Kukista-1) nsubj(oli-3, kaunein-2) nmod(oli-3, ikkunalaudalla-4) punct(oli-3, .-5)
Note how in the previous example the phrase kukista kaunein can act as a noun phrase (it is the subject of the clause), even though its head word is an adjective.
Subordinate clauses and expressions of time
Many subordinate clauses, especially ones starting with the conjunction kun “when”, come with an adverbial, usually expressing time. Consider the following examples.
- [fi] Tulen sinne heti, kun olen imuroinut. “I’ll come there right away, when I have hoovered.”
- [fi] Tapasin hänet sen jälkeen kun olin tullut kaupasta. “I met him after I had come from the store.”
It is often unclear where these time adverbials should be attached. On the one hand, they seem to modify the main clause, expressing when the action of the main clause takes place. On the other hand, they could also modify the subordinate clause, being a part of the time condition given in the subordinate clause. A third option would be to make the time adverbial depend on the subordinating conjunction, becoming either multi-part conjunctions or conjunctions with adverbial modifiers.
In UD Finnish, a very limited number of these cases are considered especially tightly bound with the subordinating conjunction. These cases are considered multi-part subordinating conjunctions and listed as such in the documentation for mark. Otherwise, these adverbials are consistently made dependents of the subordinating conjunctions.
Tulen sinne heti , kun pääsen . \n I_will_come there right_away , when I_can . advmod(Tulen-1, sinne-2) advcl(Tulen-1, pääsen-6) advmod(kun-5, heti-3) punct(kun-5, ,-4) mark(pääsen-6, kun-5) punct(Tulen-1, .-7)
However, it should be noted that not all subordinate clauses are themselves dependents of the main verb. As discussed in the documentation for ccomp, clausal complements can depend on nouns, pronouns or adverbs. Similar situations can occur with subordinate clauses that are modifiers, and they are also analyzed similarly. Most commonly this occurs with the pronoun se “it”.
Hänet säikäytti se , kun poika putosi hevosen selästä . \n Him scared it , when boy fell horse's from_back . dobj(säikäytti-2, Hänet-1) nsubj(säikäytti-2, se-3) advcl(se-3, putosi-7) punct(putosi-7, ,-4) mark(putosi-7, kun-5) nsubj(putosi-7, poika-6) nmod(putosi-7, selästä-9) nmod:poss(selästä-9, hevosen-8) punct(säikäytti-2, .-10)
To prevent pure function words from having dependents where possible, the first of the three options has been chosen in FinnTreeBank (FI_FTB): the time adverbial modifies the main clause, and the following subordinate clause modifies the adverbial. If the time adverbial cannot stand on its own, a multi-part subordinating conjunction is considered (e.g. ennen kuin “before”).
Subjects and objects of a noun
In Finnish, it is possible for certain nouns which either are direct
derivations of a verb or otherwise have a verb counterpart (see
ISK §560, in Finnish) to take a subject- or object-like complement.
Both of these are identical in form to more general genitive modifiers
of a noun, marked with the dependency type
nmod:poss in UD Finnish.
talon katto \n house(gen.) roof(N) nmod:poss(katto-2, talon-1)
Genitive objects of a noun are marked with the dependency type
nmod:gobj, which is a
subtype of the more general genitive-modifier type nmod:poss.
Both nominal derivations and other nouns with verb counterparts can
take a genitive object, with the exception of JA- derivations, the
genitive modifier of which is never considered an object in UD Finnish
(talon rakentaja “the builder of the house”).
talon rakentaminen \n house(gen.) building(N+deriv.) nmod:gobj(rakentaminen-2, talon-1)
Genitive subjects, in turn, are marked using the
nmod:gsubj dependency type, also a subtype of
nmod:poss. Only nouns that
are marked as derivations of a verb in the morphological tagging
can take a genitive subject.
maljakon putoaminen \n vase(gen.) falling(N+deriv.) nmod:gsubj(putoaminen-2, maljakon-1)
- http://scripta.kotus.fi/visk/sisallys.php?p=560 (in Finnish)
In the current release of FinnTreeBank (FI_FTB), only nouns derived with -minen can take a genitive object or subject. Information on whether a noun is a verb-derived nominal is not included in the morphological tagging of these nouns.
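The UD Finnish rules for genitive modifiers of nouns given above can be summarised as a small decision function. These are the general rules, not the FI_FTB variant just described. In this Python sketch the function name and parameters are hypothetical. The `role` argument stands in for the annotator's judgement of whether the genitive is semantically an object or a subject.

```python
def genitive_label(role, verb_derived, has_verb_counterpart, ja_derivation):
    """Label for a genitive modifier of a noun under the UD Finnish rules above.

    role: "object", "subject" or "other" (the annotator's judgement).
    verb_derived: the noun is tagged as a verb derivation in the morphology.
    has_verb_counterpart: the noun has a verb counterpart (ISK §560).
    ja_derivation: the noun is a JA-derivation such as rakentaja.
    """
    # Objects: nominal derivations and other nouns with verb counterparts,
    # except JA-derivations, whose genitive is never an object.
    if role == "object" and (verb_derived or has_verb_counterpart) and not ja_derivation:
        return "nmod:gobj"
    # Subjects: only nouns tagged as verb derivations.
    if role == "subject" and verb_derived:
        return "nmod:gsubj"
    return "nmod:poss"


print(genitive_label("object", True, True, False))    # talon rakentaminen -> nmod:gobj
print(genitive_label("subject", True, True, False))   # maljakon putoaminen -> nmod:gsubj
print(genitive_label("object", True, True, True))     # talon rakentaja -> nmod:poss
print(genitive_label("other", False, False, False))   # talon katto -> nmod:poss
```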
The dependency type
compound is used for numerical
expressions. Generally, with multi-token numerical expressions, the
rightmost token of the expression is considered the head and the
dependencies are chained.
Poikasia on yleensä 3 - 5 . \n Youngsters are usually 3 to 5 . nsubj:cop(5-6, Poikasia-1) cop(5-6, on-2) advmod(5-6, yleensä-3) compound(--5, 3-4) compound(5-6, --5) punct(5-6, .-7)
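The default chain can be sketched as a small Python function. The function name and the (form, index) input format are illustrative assumptions. The function reproduces the compound arcs of the example above. It deliberately ignores the internal structure that is annotated for more complex expressions such as dates.

```python
def chain_compounds(tokens):
    """Chain a multi-token numerical expression with `compound`.

    Each token depends on the token to its right, so the rightmost token is
    the head. `tokens` is a list of (form, index) pairs in surface order.
    """
    return [f"compound({h_form}-{h_idx}, {d_form}-{d_idx})"
            for (d_form, d_idx), (h_form, h_idx) in zip(tokens, tokens[1:])]


print(chain_compounds([("3", 4), ("-", 5), ("5", 6)]))
# ['compound(--5, 3-4)', 'compound(5-6, --5)']
```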
However, it is possible that rather complex expressions are considered numerical, and in these cases the structure of the expression is also marked, showing the parts of which the expression consists. Often these complex expressions involve dates, which are also considered numerical expressions in UD Finnish.
3. joulukuuta 1510 - 15. kesäkuuta 1579 \n 3rd December 1510 to 15th June 1579 compound(joulukuuta-2, 3.-1) compound(1510-3, joulukuuta-2) compound(--4, 1510-3) compound(1579-7, --4) compound(kesäkuuta-6, 15.-5) compound(1579-7, kesäkuuta-6)
Dates can be expressed in many different forms, and all full dates are considered numerical expressions in UD Finnish, including those where some or all parts of the date are written out in words. Even partial dates such as
3. joulukuuta \n 3rd December compound(joulukuuta-2, 3.-1)
are considered numerical expressions. However, year expressions such as the following are not considered dates in UD Finnish, and thus not complex numerical expressions.
sanoi vuonna 1996 \n said in_the_year 1996 nmod(sanoi-1, vuonna-2) nummod(vuonna-2, 1996-3)
tapahtui kesällä 1972 \n happened in_the_summer 1972 nmod(tapahtui-1, kesällä-2) nummod(kesällä-2, 1972-3)
If a date expression has a clear internal syntactic structure, this
structure is annotated instead of the default chain of compound dependencies.
syyskuun 3. ja 4. päivä \n September's 3rd and 4th day nmod:poss(3.-2, syyskuun-1) cc(3.-2, ja-3) conj(3.-2, 4.-4) nummod(päivä-5, 3.-2)
If a date has a more specific time (such as kello kuudelta “at six o’clock”) attached to it, the date is considered the head of the expression, and the more specific time depends on it. Clock expressions, alone or in conjunction with a date, are not considered dates or numerical expressions in UD Finnish.
6. joulukuuta kello 18 \n 6th December o'clock 18 compound(joulukuuta-2, 6.-1) nmod(joulukuuta-2, kello-3) nummod(kello-3, 18-4)
In addition to dates, there is one more case of numerical expressions
that deserves attention: numerical expressions with multiple units. If
a single amount expression involves multiple units, the units are
considered a compound unit, so to speak, and combined using the
dependency type compound:nn.
2 kg 315 g nummod(kg-2, 2-1) compound:nn(g-4, kg-2) nummod(g-4, 315-3)
In rare cases, however, the previous situation may occur with the rightmost part of the expression lacking the unit. These cases are annotated flatly as numerical expressions, with no compound units.
2 kg 315 compound(kg-2, 2-1) compound(315-3, kg-2)
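The multi-unit rule and its unit-less fallback can be combined in one Python sketch. The function name is hypothetical. The sketch makes three assumptions:
- numbers are plain digit strings;
- a well-formed amount strictly alternates number and unit;
- with three or more units, each unit is a compound:nn dependent of the next one. This is an extrapolation, since the guidelines only show a two-unit example.

```python
def annotate_amount(tokens):
    """Sketch of the multi-unit amount rule.

    `tokens` is a list of (form, index) pairs in surface order.
    """
    def arc(label, head, dep):
        return f"{label}({head[0]}-{head[1]}, {dep[0]}-{dep[1]})"

    numbers, units = tokens[0::2], tokens[1::2]
    well_formed = (len(numbers) == len(units)
                   and all(form.isdigit() for form, _ in numbers)
                   and not any(form.isdigit() for form, _ in units))
    if not well_formed:
        # E.g. the last number lacks a unit: flat compound chain, rightmost head.
        return [arc("compound", h, d) for d, h in zip(tokens, tokens[1:])]
    # Each number modifies its own unit ...
    arcs = [arc("nummod", unit, num) for num, unit in zip(numbers, units)]
    # ... and each unit attaches to the next unit as compound:nn (assumed chaining).
    arcs += [arc("compound:nn", nxt, unit) for unit, nxt in zip(units, units[1:])]
    return arcs


print(annotate_amount([("2", 1), ("kg", 2), ("315", 3), ("g", 4)]))
# ['nummod(kg-2, 2-1)', 'nummod(g-4, 315-3)', 'compound:nn(g-4, kg-2)']
print(annotate_amount([("2", 1), ("kg", 2), ("315", 3)]))
# ['compound(kg-2, 2-1)', 'compound(315-3, kg-2)']
```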
In FinnTreeBank (FI_FTB), the dependency type
compound is not
used for numerical expressions. If any clear internal syntactic
structure is not noticeable in a numerical expression, the
rightmost token of the expression is considered the head of a
chain consisting of
Respectively, numerical expressions with multiple units are
annotated using a
Participial modifiers and predicatives
In connection with participial modifiers, predicatives are given a slightly different treatment than in other contexts. In a regular copular clause, the analysis is as follows.
Eeva on raskaana . \n Eeva is pregnant . nsubj:cop(raskaana-3, Eeva-1) cop(raskaana-3, on-2) punct(raskaana-3, .-4)
However, if the same analysis were applied in a situation where olla acts as a participial modifier, this would result in a non-tree structure:
Raskaana oleva nainen on nälkäinen . \n Pregnant being woman is hungry . cop(Raskaana-1, oleva-2) nsubj:cop(Raskaana-1, nainen-3) nsubj:cop(nälkäinen-5, nainen-3) cop(nälkäinen-5, on-4) punct(nälkäinen-5, .-6)
Therefore, in conjunction with participial modifiers, copular verbs are analyzed similarly to regular verbs, in order to avoid non-tree structures.
Raskaana oleva nainen on nälkäinen . \n Pregnant being woman is hungry . advmod(oleva-2, Raskaana-1) acl(nainen-3, oleva-2) nsubj:cop(nälkäinen-5, nainen-3) cop(nälkäinen-5, on-4) punct(nälkäinen-5, .-6)
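A dependency tree requires every token to have exactly one head, with a single token left without a head as the root. The following Python sketch illustrates why the first analysis of Raskaana oleva nainen on nälkäinen is rejected. The helper name `tree_violations` is hypothetical. It reads the relation(head-i, dependent-j) notation used on this page and only considers tokens that occur in some arc.

```python
import re
from collections import defaultdict

DEP_RE = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def tree_violations(notation):
    """Return ({token: [heads]} for tokens with several heads, [tokens with no head])."""
    heads = defaultdict(set)
    tokens = set()
    for _, _, head, _, dep in DEP_RE.findall(notation):
        heads[int(dep)].add(int(head))
        tokens.update((int(head), int(dep)))
    multi = {tok: sorted(hs) for tok, hs in heads.items() if len(hs) > 1}
    headless = sorted(tok for tok in tokens if tok not in heads)
    return multi, headless


copular = ("cop(Raskaana-1, oleva-2) nsubj:cop(Raskaana-1, nainen-3) "
           "nsubj:cop(nälkäinen-5, nainen-3) cop(nälkäinen-5, on-4) punct(nälkäinen-5, .-6)")
verbal = ("advmod(oleva-2, Raskaana-1) acl(nainen-3, oleva-2) "
          "nsubj:cop(nälkäinen-5, nainen-3) cop(nälkäinen-5, on-4) punct(nälkäinen-5, .-6)")
print(tree_violations(copular))  # ({3: [1, 5]}, [1, 5])
print(tree_violations(verbal))   # ({}, [5])
```

In the copular analysis, nainen has two heads and there are two headless tokens. The verbal analysis yields a single root, nälkäinen.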
The same rule is applied to certain special constructions that are normally considered passive structures but can also appear in conjunction with participial modifiers. Here the application of the rule results in two chained participial modifiers.
Resurssit ovat käytettävissä . \n Resources are usable . dobj(käytettävissä-3, Resurssit-1) auxpass(käytettävissä-3, ovat-2) punct(käytettävissä-3, .-4)
Käytettävissä olevat resurssit ovat rajalliset . \n Usable being resources are limited . xcomp(olevat-2, Käytettävissä-1) acl(resurssit-3, olevat-2) nsubj:cop(rajalliset-5, resurssit-3) cop(rajalliset-5, ovat-4) punct(rajalliset-5, .-6)
As the passive-verb-derived, idiomatic structures
olla tehtävissä / tehtävillä (“to be doable”) are
root (or other) +
in FinnTreeBank (FI_FTB), the rule relating
to certain passive structures does not
apply to FinnTreeBank.
Necessive structures and clausal subjects
A clause can act as a subject to another clause (as well as an object,
but these are marked as clausal complements,
ccomp), in which
case it should be marked as a clausal subject,
csubj, or, if the
main clause is copular, a clausal copular subject,
csubj:cop. However, in the case of clausal copular subjects, it
may be difficult to determine whether a clause is, in fact, the
subject of another clause, as the construct is similar to that of a
necessive structure. Consider the following example.
- [fi] On tärkeää syödä hyvin. “It is important to eat well.”
At first glance, it seems that the clause syödä hyvin is the subject of on tärkeää. However, in UD Finnish, this is not considered a clausal subject. Instead, it is considered a necessive structure, as on tärkeää can be given a subject in the genitive form:
- [fi] Hänen on tärkeää syödä hyvin. “It is important for him to eat well.”
The whole structure is considered a single unit, and the genitive subject is considered the subject of the latter verb (which expresses what it is that is necessary).
Hänen on pakko mennä kotiin . \n He has to go home . nsubj(mennä-4, Hänen-1) cop(pakko-3, on-2) xcomp:ds(pakko-3, mennä-4) nmod(mennä-4, kotiin-5) punct(pakko-3, .-6)
The name necessive structure comes from the fact that these structures often express the necessity of doing something, but it does not mean that all of these structures would have such a meaning; for example, on vaikea(a) “it is difficult” is a necessive structure the meaning of which does not express necessity. Common necessive structures include expressions such as on pakko, on tärkeää, on oleellista and on välttämätöntä. They usually, but not always, involve the verb olla and an adjective. There are also some verbs, such as kannattaa “be worth it” and kuulua “be supposed to”, that are analyzed in a necessive manner.
If it is not possible to insert a genitive subject into the clause, then the structure is considered a clausal subject case.
- [fi] *Hänen on mahtavaa käydä ulkona. “It is splendid for him to go out.” (the Finnish sentence is ungrammatical.)
On mahtavaa mennä ulos . \n (it)_is splendid to_go out . cop(mahtavaa-2, On-1) csubj:cop(mahtavaa-2, mennä-3) advmod(mennä-3, ulos-4) punct(mahtavaa-2, .-5)
Note that due to the copular nature of the main clause, the clausal subjects in these clauses which resemble necessive structures are in fact clausal copular subjects. There are also other clausal subjects which cannot be confused with necessive structures.
Hänen aikomuksenaan oli mennä ulos . \n His intention(essive) was to_go out . nmod:poss(aikomuksenaan-2, Hänen-1) nmod(oli-3, aikomuksenaan-2) csubj(oli-3, mennä-4) advmod(mennä-4, ulos-5) punct(oli-3, .-6)
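The genitive-subject test can be written down as a small decision function. In this Python sketch the function name is hypothetical. The first argument stands in for the annotator's grammaticality judgement, which no code can make automatically. The sketch only decides between a necessive analysis and the two clausal subject labels. It does not spell out the internal annotation of the necessive structure.

```python
def classify_infinitival_subject(genitive_subject_possible, main_clause_copular):
    """Apply the genitive-subject test described above.

    genitive_subject_possible: whether a genitive subject (e.g. Hänen) can be
        inserted into the clause, as judged by the annotator.
    main_clause_copular: whether the main clause is copular.
    """
    if genitive_subject_possible:
        return "necessive structure"
    return "csubj:cop" if main_clause_copular else "csubj"


# On tärkeää syödä hyvin: "Hänen on tärkeää syödä hyvin" is grammatical.
print(classify_infinitival_subject(True, True))    # necessive structure
# On mahtavaa mennä ulos: "*Hänen on mahtavaa ..." is ungrammatical.
print(classify_infinitival_subject(False, True))   # csubj:cop
# Hänen aikomuksenaan oli mennä ulos: non-copular main clause.
print(classify_infinitival_subject(False, False))  # csubj
```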
Passive structures and zeroth person constructions
The Finnish language has two notable cases of subjectless expressions: the passive voice and the zeroth person. In most cases, distinguishing these two is rather simple, as the zeroth person uses the same verb forms as the third person, whereas there is a morphological passive form that is used in constructions considered passive. However, there are at least two particular phenomena that deserve special attention. First, the on tehtävä -structure is worth examining:
- [fi] Tämä työ on tehtävä tänään. “This work has to be done today.”
The form tehtävä is morphologically a passive participle of the verb tehdä “to do”. Still, on tehtävä can take a subject, which might suggest that the subjectless version is zeroth person after all.
- [fi] Matin on tehtävä työ tänään. “Matti has to do the work today.”
In UD Finnish, we use the presence or absence of a subject as a cue to whether the structure is passive or not. If a subject is present, the structure is marked as an active construction, and if not, it is assumed to be passive.
Tämä työ on tehtävä tänään . \n This work has_to_be done today . det(työ-2, Tämä-1) dobj(tehtävä-4, työ-2) auxpass(tehtävä-4, on-3) advmod(tehtävä-4, tänään-5) punct(tehtävä-4, .-6)
Matin on tehtävä työ tänään . \n Matti has_to do work today . nsubj(tehtävä-3, Matin-1) aux(tehtävä-3, on-2) dobj(tehtävä-3, työ-4) advmod(tehtävä-3, tänään-5) punct(tehtävä-3, .-6)
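The subject-presence rule for on tehtävä can be phrased as a consistency check over an annotated clause. The rule is that aux occurs together with an nsubj, and auxpass occurs without one. This Python sketch makes two simplifying assumptions:
- The relevant participles are recognised by a crude surface test for the -tava/-tävä ending. A real check would rely on the morphological analysis.
- The check is meant for this construction only. In zeroth-person clauses, an aux without a subject is perfectly normal.

The helper name is hypothetical.

```python
import re
from collections import defaultdict

DEP_RE = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")
# Assumption: a crude surface test for -tava/-tävä passive participles.
PARTICIPLE_RE = re.compile(r"t[aä]v[aä]$")


def voice_is_consistent(notation):
    """Check the on tehtävä rule: `aux` requires a subject, `auxpass` excludes one."""
    labels_by_head = defaultdict(set)
    for label, head_form, head_idx, _, _ in DEP_RE.findall(notation):
        if PARTICIPLE_RE.search(head_form.lower()):
            labels_by_head[(head_form, head_idx)].add(label)
    for labels in labels_by_head.values():
        has_subject = "nsubj" in labels
        if "aux" in labels and not has_subject:
            return False
        if "auxpass" in labels and has_subject:
            return False
    return True


passive = ("det(työ-2, Tämä-1) dobj(tehtävä-4, työ-2) auxpass(tehtävä-4, on-3) "
           "advmod(tehtävä-4, tänään-5) punct(tehtävä-4, .-6)")
active = ("nsubj(tehtävä-3, Matin-1) aux(tehtävä-3, on-2) dobj(tehtävä-3, työ-4) "
          "advmod(tehtävä-3, tänään-5) punct(tehtävä-3, .-6)")
print(voice_is_consistent(passive), voice_is_consistent(active))  # True True
```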
Second, the on tehtävissä structure deserves a mention. Like tehtävä, tehtävissä is a passive participle; in fact, the only difference between the two forms is that tehtävissä is the plural inessive of the base participle tehtävä. The annotation of on tehtävissä follows a strategy similar to the previous one: in general, the structure is assumed to be passive.
Unlike on tehtävä, on tehtävissä cannot take a genitive form subject:
- [fi] *Minun on tehtävissä tämä. “*I this is doable.”
However, in some cases it is possible to attach a possessive suffix to the participle and use a corresponding personal pronoun as a nominal modifier (this is a rare phenomenon and not seen with many verbs). This case is analyzed as an active structure.
However, as can be seen from the example, no subject is marked, but rather an object. It is still understood that means are the object of using in this example.
Distinctions between certain dependency types, most commonly between
participial modifiers (acl) and adjectival modifiers
(amod) as well as between adverbial modifiers (advmod) and nominal
modifiers (nmod), are based on the corresponding morphological
distinction, which can sometimes be difficult to make. This section
describes heuristics used to make these two most common
morphology-based distinctions. Some of these heuristics resemble those
used in the Penn Treebank (Marcus et al. 1993).
Participles versus adjectives
The distinction between verb participles and adjectives is difficult
in several languages, and Finnish is no exception. In UD Finnish, this
distinction mainly affects the syntax annotation of two kinds of
structures. First, it affects the choice between the dependency types
acl (participial modifier) and amod (adjectival modifier).
Tunnettu näyttelijä John Travolta \n Well-known actor John Travolta amod/acl?(näyttelijä-2, Tunnettu-1) compound:nn(John-3, näyttelijä-2) name(John-3, Travolta-4)
Second, it affects whether certain structures should be marked as copular clauses, or alternatively, as passive clauses in the present or past perfect form (perfekti and pluskvamperfekti in Finnish grammar). The same structure can be considered copular if the head word is an adjective, or a passive clause if the head word is considered a passive participle.
Uiminen järvessä on kielletty . \n Swimming in_lake is/has_been forbidden . nsubj:cop/dobj?(kielletty-4, Uiminen-1) nmod(Uiminen-1, järvessä-2) cop/auxpass?(kielletty-4, on-3) punct(kielletty-4, .-5)
Some words have several possible readings, and it is fairly common that a word can be given either a participial reading or an adjectival one. The following heuristics are used when deciding whether a word is an adjective or a participle.
If a word has comparative and superlative forms, it is likely to be an adjective. For instance, the word tunnettu “well-known”, which has both an adjectival and a participial reading, has the forms tunnettu, tunnetumpi, tunnetuin.
If, on the other hand, the word is modified by for instance a nominal or adverbial modifier, it is likely to be a verb participle. For instance, with the word tunnettu, the following contexts would be possible:
- [fi] laajalti tunnettu näyttelijä “widely known actor”
- [fi] kalliista autoistaan tunnettu näyttelijä “actor known for his expensive cars”
Thus, it is the case that the same word can act both as an adjective and as a verbal participle, depending on context, and the decisions are made on a case-by-case basis. As a third heuristic used in the decision, the annotators are asked to consider whether someone is actively doing something in the example under consideration. If so, then the word is likely a verbal participle, otherwise it is an adjective. Consider the following examples:
- [fi] Maijan tuleva aviomies lit. Maija’s coming husband “Maija’s future husband”
- [fi] Maijan Turusta tuleva aviomies “Maija’s husband coming from Turku”
In the first example, the husband is not actively doing anything, he simply is going to be Maija’s husband in the future. Thus tuleva in this example would be considered an adjective. In the second example, he is actively coming from the direction of Turku, and thus tuleva here would be a verbal participle.
As a rule of thumb, if an adjectival reading is possible in a given context, it is generally preferred. For instance, in tunnettu näyttelijä “well-known actor”, if it were not specified by whom or for what the actor is known, the adjectival reading would be assumed. Similarly, in uiminen on kielletty “swimming is forbidden”, if the context does not reveal that anyone has actively forbidden the swimming (the example is genuinely ambiguous), then it is assumed that being forbidden is a property of the swimming.
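The heuristics above can be combined into a single decision in the modifier case, acl versus amod. The Python sketch below assumes one particular ordering, which the guidelines do not fix explicitly. Signs of a participle (the word's own nominal or adverbial modifier, or someone actively doing something) are checked first. Otherwise the default preference for an adjectival reading applies. The function name and parameters are hypothetical, and each argument stands for an annotator's judgement.

```python
def participle_or_adjective(has_own_modifier, active_doing, adjectival_reading_possible):
    """Return "acl" for a participial reading, "amod" for an adjectival one.

    has_own_modifier: the word has its own nominal or adverbial modifier.
    active_doing: someone is actively doing something in the context.
    adjectival_reading_possible: the word fits the context as an adjective
        (comparative forms such as tunnetumpi are one cue).
    """
    if has_own_modifier or active_doing:
        return "acl"
    # Rule of thumb: prefer the adjectival reading whenever it is possible.
    return "amod" if adjectival_reading_possible else "acl"


print(participle_or_adjective(False, False, True))  # tunnettu näyttelijä -> amod
print(participle_or_adjective(True, False, True))   # laajalti tunnettu näyttelijä -> acl
print(participle_or_adjective(False, False, True))  # Maijan tuleva aviomies -> amod
print(participle_or_adjective(True, True, True))    # Maijan Turusta tuleva aviomies -> acl
```

In predicative position the same decision instead selects between the copular analysis (adjective) and the passive analysis (participle).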
Adverbs versus nouns
Due to the fact that certain Finnish adverbs have a partial case inflection, it is sometimes difficult to decide whether a word is an inflected form of a noun (or adjective), or rather an adverb. For instance, the word pääasiassa “mainly” could be analyzed as an adverb, or alternatively, as an inflected form of the noun pääasia “the main thing”.
This distinction affects the choice between the dependency types
advmod (adverb modifier) and
nmod (nominal modifier).
Additionally, it can affect whether a word can be marked
as a predicative (if it is an adverb) and thus as the head of the clause, or
whether it should be marked as a nominal modifier of the verb olla. In the
latter case, the structure of the whole clause is affected by the decision.
Pääasiassa tämä vaikuttaa koron suuruuteen . \n Mainly this affects interest's level . advmod/nmod?(vaikuttaa-3, Pääasiassa-1) nsubj(vaikuttaa-3, tämä-2) nmod(vaikuttaa-3, suuruuteen-5) nmod:poss(suuruuteen-5, koron-4) punct(vaikuttaa-3, .-6)
Elisa ja Elias ovat naimisissa . \n Elisa and Elias are married . cc(Elisa-1, ja-2) conj(Elisa-1, Elias-3) nsubj:cop?(naimisissa-5, Elisa-1) cop?(naimisissa-5, ovat-4) punct(naimisissa-5, .-6)
Matti oli humalassa . \n Matti was drunk . nsubj?(oli-2, Matti-1) nmod?(oli-2, humalassa-3) punct(oli-2, .-4)
Again, the main source of information while annotating is the morphological analysis of the word, but occasionally the syntactic annotation may use a reading that the morphological analysis has omitted. It is less common for both an adverb and a noun reading to be available. Decision heuristics are needed here as well.
The main deciding factor between a noun and an adverb reading is whether there exists a corresponding noun in its baseform and whether and to what degree the word under question is related to that noun. For example, in the case of pääasiassa “mainly” there exists a corresponding noun pääasia “main thing”, but in the case of naimisissa “married” the only candidate for such a noun would be naiminen, which could technically be translated as “marrying”, but is in fact more often used (usually in spoken language) in the meaning “having sex”. As for humalassa “drunk”, there is a candidate noun, humala, which can be used to refer to the state of being drunk.
As a test used to see whether the possible candidate noun is closely (enough) related to the word under question, annotators are asked to reflect on the hypothetical baseform of the noun reading and on whether it could be imagined to be involved in the current sentence. For instance, is there a main thing (pääasia) in which the interest rate is affected? Is there a state of being married (“naimiset”) in which Elisa and Elias are? Is there a state of being drunk (humala) in which Matti is? The answer to the first two questions is no, and thus pääasiassa and naimisissa are considered adverbs. The answer to the third question, however, is yes, and therefore the word humalassa is analyzed as an inflected form of the noun humala.
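The noun-versus-adverb test amounts to two questions: is there a candidate noun, and could it be imagined as involved in the sentence? The Python sketch below encodes these questions. The function name is hypothetical. The second question is an annotator's judgement supplied as input, here filled in with the answers given above.

```python
def adverb_or_noun(candidate_noun, noun_plausible_in_sentence):
    """Return "noun" or "adverb" for a case-inflected word such as pääasiassa.

    candidate_noun: baseform of a corresponding noun, or None if there is none.
    noun_plausible_in_sentence: whether that noun could be imagined as
        involved in the current sentence (annotator's judgement).
    """
    if candidate_noun is not None and noun_plausible_in_sentence:
        return "noun"
    return "adverb"


judgements = [
    ("pääasiassa", "pääasia", False),   # no "main thing" in which the rate is affected
    ("naimisissa", "naiminen", False),  # naiminen does not denote being married
    ("humalassa", "humala", True),      # Matti is in a state of being drunk
]
for word, noun, plausible in judgements:
    print(word, adverb_or_noun(noun, plausible))
# pääasiassa adverb
# naimisissa adverb
# humalassa noun
```

An adverb reading leads to advmod, or to the word acting as a predicative head. A noun reading leads to nmod.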
- Marcus et al. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.
Dependencies signaling punctuation are labeled with the dependency
punct, and the main rule is that the dependency should be
attached to that element which it delimits. Thus, sentence-delimiting
punctuation, such as “.”, “!” or “?” should be attached to the main
verb (or predicative) of the sentence.
Söin jäätelöä . \n I_ate ice-cream . dobj(Söin-1, jäätelöä-2) punct(Söin-1, .-3)
According to the same rule, the comma delimiting a subordinate clause should be attached to the head word of said clause.
Jos sataa , menen sisälle . \n If it_rains , I_go inside . mark(sataa-2, Jos-1) punct(sataa-2, ,-3) advcl(menen-4, sataa-2) advmod(menen-4, sisälle-5) punct(menen-4, .-6)
If there are several subordinate clauses within each other and the punctuation could delimit any of them, the shortest-spanning (closest) clause is selected.
Jos syöt sieniä , jotka ovat myrkyllisiä , kuolet . \n If you_eat mushrooms , that are poisonous , you_die . mark(syöt-2, Jos-1) dobj(syöt-2, sieniä-3) acl:relcl(sieniä-3, myrkyllisiä-7) nsubj:cop(myrkyllisiä-7, jotka-5) punct(myrkyllisiä-7, ,-4) cop(myrkyllisiä-7, ovat-6) punct(myrkyllisiä-7, ,-8) advcl(kuolet-9, syöt-2) punct(kuolet-9, .-10)
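The shortest-spanning rule for clause-delimiting punctuation can be approximated mechanically. The following Python sketch uses hypothetical helper names and takes the relation(head-i, dependent-j) notation of this page as input. It works as follows:
- A clause is the subtree of the root, or of a dependent attached with advcl, acl:relcl, ccomp, csubj or csubj:cop.
- A clause's span is computed while ignoring punctuation arcs.
- A punctuation token counts as delimiting a clause when it is directly adjacent to one of the clause's edges. Among those clauses, the shortest one wins.

This covers only the subordinate-clause cases described so far. Coordination, parataxis and quotation marks follow the separate rules given in this section.

```python
import re
from collections import defaultdict

DEP_RE = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")
# Assumption: these labels introduce subordinate clauses; the root also counts.
CLAUSAL = {"advcl", "acl:relcl", "ccomp", "csubj", "csubj:cop"}


def clause_punct_heads(notation):
    """Map each punct token to the head of the shortest clause it delimits."""
    arcs = [(lab, int(h), int(d)) for lab, _, h, _, d in DEP_RE.findall(notation)]
    content = [arc for arc in arcs if arc[0] != "punct"]
    children = defaultdict(list)
    for _, h, d in content:
        children[h].append(d)
    dependents = {d for _, _, d in content}
    roots = {h for _, h, _ in content} - dependents
    clause_heads = roots | {d for lab, _, d in content if lab in CLAUSAL}

    def span(node):
        # Leftmost and rightmost non-punctuation token in the subtree of node.
        seen, stack = {node}, [node]
        while stack:
            for child in children[stack.pop()]:
                seen.add(child)
                stack.append(child)
        return min(seen), max(seen)

    spans = {h: span(h) for h in clause_heads}
    result = {}
    for p in sorted(d for lab, _, d in arcs if lab == "punct"):
        # Clauses whose left or right edge is directly adjacent to the token.
        candidates = [(hi - lo, h) for h, (lo, hi) in spans.items() if p in (lo - 1, hi + 1)]
        if candidates:
            result[p] = min(candidates)[1]
    return result


sentence = ("mark(syöt-2, Jos-1) dobj(syöt-2, sieniä-3) acl:relcl(sieniä-3, myrkyllisiä-7) "
            "nsubj:cop(myrkyllisiä-7, jotka-5) punct(myrkyllisiä-7, ,-4) cop(myrkyllisiä-7, ovat-6) "
            "punct(myrkyllisiä-7, ,-8) advcl(kuolet-9, syöt-2) punct(kuolet-9, .-10)")
print(clause_punct_heads(sentence))  # {4: 7, 8: 7, 10: 9}
```

The second comma borders both the relative clause and the jos clause, and the shorter relative clause is selected, as in the annotation above.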
In coordinations, the punctuation symbols (usually commas) are treated similarly to the coordinating conjunction and attached to the head of the coordination, which is the first coordinated element.
kivet , kannot ja männynkävyt \n rocks , stumps and pinecones punct(kivet-1, ,-2) conj(kivet-1, kannot-3) cc(kivet-1, ja-4) conj(kivet-1, männynkävyt-5)
Punctuation related to coordination-like parataxis, that is, parataxis used in connection with a semicolon, colon or dash, is attached as in coordinations.
Matti tuli töistä ; Maija oli jo kotona . \n Matti came from_work ; Maija was already home . nsubj(tuli-2, Matti-1) nmod(tuli-2, töistä-3) punct(tuli-2, ;-4) parataxis(tuli-2, oli-6) nsubj(oli-6, Maija-5) advmod(oli-6, jo-7) advmod(oli-6, kotona-8) punct(tuli-2, .-9)
Punctuation related to direct-speech parataxis is attached to the first element.
" Älä sotke itseäsi " , äiti sanoi . \n " Don't mess yourself " , mother said . neg(sotke-3, Älä-2) dobj(sotke-3, itseäsi-4) punct(sotke-3, "-1) punct(sotke-3, "-5) punct(sotke-3, ,-6) parataxis(sotke-3, sanoi-8) nsubj(sanoi-8, äiti-7) punct(sotke-3, .-9)
Single and double quotes as well as parentheses are attached to the head of the quoted/parenthetical clause or phrase. Dashes signifying quotes are also attached to the head of the quote.
Illan elokuva on " Kuninkaan puhe " . \n Tonight's movie is " The_King's speech " . nmod:poss(elokuva-2, Illan-1) nsubj:cop(puhe-6, elokuva-2) cop(puhe-6, on-3) punct(puhe-6, "-4) nmod:poss(puhe-6, Kuninkaan-5) name(Kuninkaan-5, puhe-6) punct(puhe-6, "-7) punct(puhe-6, .-8)
Matikainen ( s. 1943 ) on ammatiltaan kirjailija . \n Matikainen ( born 1943 ) is by_profession author . nsubj:cop(kirjailija-8, Matikainen-1) acl(Matikainen-1, s.-3) punct(s.-3, (-2) nmod(s.-3, 1943-4) punct(s.-3, )-5) cop(kirjailija-8, on-6) nmod(kirjailija-8, ammatiltaan-7) punct(kirjailija-8, .-9)
- Älä sotke itseäsi , sanoi äiti . \n - Don't mess yourself , said mother . punct(sotke-3, --1) neg(sotke-3, Älä-2) dobj(sotke-3, itseäsi-4) punct(sotke-3, ,-5) parataxis(sotke-3, sanoi-6) nsubj(sanoi-6, äiti-7) punct(sotke-3, .-8)
If the quotes or parentheses contain two or more items, such as parts of a coordination, then the punctuation is attached to the closest enclosed element, so as to avoid unnecessary non-projectivity.
Hän pitää kirjoista ( ja näytelmistä ) . \n He likes books ( and plays ) . nsubj(pitää-2, Hän-1) dobj(pitää-2, kirjoista-3) cc(kirjoista-3, ja-5) conj(kirjoista-3, näytelmistä-6) punct(pitää-2, .-8) punct(ja-5, (-4) punct(näytelmistä-6, )-7)
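The choice between the head of a parenthetical and the closest enclosed element can be sketched in Python as follows. The function name and arguments are hypothetical. The sketch makes three assumptions:
- the bracket positions are already known;
- punctuation arcs are ignored when looking for heads;
- the enclosed material counts as a single item exactly when one enclosed token has its head outside the brackets.

```python
import re

DEP_RE = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def bracket_heads(notation, open_idx, close_idx):
    """Return the tokens the opening and closing bracket attach to."""
    heads = {int(dep): int(head)
             for label, _, head, _, dep in DEP_RE.findall(notation)
             if label != "punct"}
    enclosed = range(open_idx + 1, close_idx)
    # Enclosed tokens whose head lies outside the brackets (0 = no head).
    tops = [t for t in enclosed if not (open_idx < heads.get(t, 0) < close_idx)]
    if len(tops) == 1:
        # A single enclosed phrase: both brackets go to its head.
        return tops[0], tops[0]
    # Several items: attach to the closest enclosed token on each side.
    return open_idx + 1, close_idx - 1


coordination = ("nsubj(pitää-2, Hän-1) dobj(pitää-2, kirjoista-3) cc(kirjoista-3, ja-5) "
                "conj(kirjoista-3, näytelmistä-6) punct(pitää-2, .-8) "
                "punct(ja-5, (-4) punct(näytelmistä-6, )-7)")
birth_year = ("nsubj:cop(kirjailija-8, Matikainen-1) acl(Matikainen-1, s.-3) punct(s.-3, (-2) "
              "nmod(s.-3, 1943-4) punct(s.-3, )-5) cop(kirjailija-8, on-6) "
              "nmod(kirjailija-8, ammatiltaan-7) punct(kirjailija-8, .-9)")
print(bracket_heads(coordination, 4, 7))  # (5, 6)
print(bracket_heads(birth_year, 2, 5))    # (3, 3)
```

Both results match the annotations of the Hän pitää kirjoista and Matikainen examples in this section.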
Punctuation can also delimit short additions, such as nominal modifiers or appositions, and in such cases, the punctuation should be attached to the head of the addition.
Matti Tamminen , professori \n Matti Tamminen , the_professor name(Matti-1, Tamminen-2) appos(Matti-1, professori-4) punct(professori-4, ,-3)
Lähden matkalle , ainakin viikoksi . \n I_am_going to_trip , at_least for_a_week . nmod(Lähden-1, matkalle-2) nmod(Lähden-1, viikoksi-5) punct(Lähden-1, .-6) punct(viikoksi-5, ,-3) advmod(viikoksi-5, ainakin-4)
Finally, list item markers such as bullets of a bulleted list are marked as punctuation attached to the head of the list item.
* Käy kaupassa . \n * Visit store . punct(Käy-2, *-1) punct(Käy-2, .-4) nmod(Käy-2, kaupassa-3)