UD for Classical Armenian
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation. No tokens in the UD Classical Armenian treebank contain whitespace.
- Most punctuation marks are written attached to the preceding word but are tokenized as separate tokens.
- Words containing “infixed” punctuation (e.g. question, exclamation, emphasis and abbreviation marks), such as զիա՞րդ = զիարդ/ziard + ՞ “why?”, are treated as multiword tokens and are segmented into individual syntactic words.
- According to typographical rules, the following words are spelled together with a neighbouring word:
- proclitic adpositions յ=/y=, ց=/cʽ= and զ=/z=
- proclitic negation particle չ=/čʽ=
- enclitic determiners =ս/=s, =դ/=d, =ն/=n
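The treatment of infixed punctuation described above can be sketched in a few lines of Python. This is only an illustration, not the treebank's actual preprocessing: the function name `split_infixed_punctuation` is hypothetical, and the set of marks is an assumption based on the four mark types named above and their Unicode code points.

```python
# Armenian marks that can appear inside a word (assumed set, based on the
# mark types listed in the guidelines): emphasis (U+055B), exclamation
# (U+055C), question (U+055E) and abbreviation (U+055F).
INFIXED_MARKS = {"\u055B", "\u055C", "\u055E", "\u055F"}


def split_infixed_punctuation(surface):
    """Split a surface token into (word, marks).

    `word` is the surface form with all infixed marks removed;
    `marks` lists the removed marks in their original order.
    """
    word = "".join(ch for ch in surface if ch not in INFIXED_MARKS)
    marks = [ch for ch in surface if ch in INFIXED_MARKS]
    return word, marks


# The example from the guidelines: զիա՞րդ = զիարդ + ՞
print(split_infixed_punctuation("զիա՞րդ"))
```

In a CoNLL-U file, the original surface form would be kept as the multiword token and the returned word and marks would become its syntactic words.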
Sentence splitting
- A full sentence is usually concluded by the punctuation sign verjaket [ ։ ], corresponding to the English period. In the case of longer sentences, the editor of a digital text may decide to split a sentence after the punctuation signs mijaket [ . ], boot [ ՝ ] or storaket [ , ], corresponding to the English colon, semicolon, and comma, respectively.
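A minimal sketch of the default splitting rule, assuming plain-text input: it splits after every verjaket (U+0589) and keeps the mark attached to its sentence. The function name `split_sentences` and the optional `marks` parameter are hypothetical; the parameter only stands in for the editorial decision to split at further signs as well.

```python
import re

VERJAKET = "\u0589"  # Armenian full stop


def split_sentences(text, marks=(VERJAKET,)):
    """Split text after each of the given sentence-final marks.

    The mark stays attached to the sentence it closes; surrounding
    whitespace is stripped and empty pieces are dropped.
    """
    char_class = "[" + "".join(re.escape(m) for m in marks) + "]"
    # Lookbehind: split at the position right after a mark,
    # consuming any whitespace that follows it.
    pattern = "(?<=" + char_class + r")\s*"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

An editor who also wants to split long sentences at another sign can pass it explicitly, for example `split_sentences(text, marks=(VERJAKET, "՝"))`.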
Morphology
Tags
This is an overview only. For more detailed discussion and examples, follow the links below.
- UD_Classical_Armenian-CAVaL currently uses 16 UPOS tags (the tag SYM does not occur in the treebank): ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X.
- The complete list of Classical Armenian words that must be tagged PART in UD has yet to be worked out. At present, the tag is used restrictively and is applied to the following lexemes:
- contrasting particle: իսկ/isk “but”
- dubitative particles: ապաքէն/apakʽēn “indeed”, գուցէ/gowcʽē “perhaps”
- negation particles: ոչ/očʽ (with its proclitic variant չ=/čʽ=) and մի/mi
- The tag DET is used for articles and adjectival pronouns with a determiner function. Pronominal quantifiers (which the traditional grammar includes in pronouns) are DET as well. The tag PRON is reserved for pronouns occurring as the head of a noun phrase.
- The Classical Armenian auxiliaries (tagged AUX) include: եմ/em “to be”, its perfective counterpart լինիմ/linim “to become”, չիք/čʽikʽ “there is no”, and տամ/tam “to give”.
The auxiliaries եմ and լինիմ are used in the following constructions:
- The copula with non-verbal predicates, including predicates of location.
- Periphrastic past tenses: present form of եմ + past participle; imperfect form of եմ + past participle; aorist form of լինիմ + past participle of the main verb.
- Periphrastic future/subjunctive tenses: present subjunctive form of եմ + past participle; present subjunctive form of լինիմ + past participle; aorist subjunctive form of լինիմ + past participle of the main verb.
- The auxiliary չիք is used as a negated copula.
- The auxiliary տամ is used to form the periphrastic causative:
- Periphrastic causative: any form of տամ, including periphrastic forms, + infinitive of the main verb.
Nominal Features
- Number has two values: Sing and Plur. The following parts of speech inflect for number: ADJ, DET, NOUN, NUM, PROPN, PRON, as well as the finite forms of VERB and AUX.
- Classical Armenian has numerous pluralia tantum nouns, the plural form of which expresses a single entity or abstract notion, cf. ապարանք/aparankʽ “palace”, երեսք/ereskʽ “face”, բարիք/barikʽ “goodness”, etc.
- Case has seven values: Nom, Acc, Gen, Dat, Abl, Ins, Loc. It occurs with ADJ, DET, NOUN, NUM, PROPN, PRON, as well as with participles and verbal nouns, tagged VERB or AUX.
- Deixis has three values: Prox (proximal), Med (medial), Remt (remote). It occurs with PRON, DET, ADP, ADV, INTJ.
- NumType is used with NUM (Card, Sets), ADJ (Ord) and ADV (Mult).
- Animacy (Anim, Inan) and Definite (Def, Ind, Spec) can be lexically expressed in PRON and DET. A semi-grammaticalized adposition զ=/z= also takes Def, when it expresses the referentially prominent direct object.
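The value inventories listed in this section can be checked mechanically against the FEATS column of a CoNLL-U file. The sketch below is an illustration only: the names `NOMINAL_FEATURES`, `parse_feats` and `unexpected_values` are hypothetical, the table copies just the values named above, and features with several comma-separated values are not handled.

```python
# Value inventories as listed in the guidelines above (nominal features only).
NOMINAL_FEATURES = {
    "Number": {"Sing", "Plur"},
    "Case": {"Nom", "Acc", "Gen", "Dat", "Abl", "Ins", "Loc"},
    "Deixis": {"Prox", "Med", "Remt"},
    "NumType": {"Card", "Sets", "Ord", "Mult"},
    "Animacy": {"Anim", "Inan"},
    "Definite": {"Def", "Ind", "Spec"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Plur' into a dict."""
    if feats == "_":  # "_" marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unexpected_values(feats):
    """Return the features whose value is not in the inventory above.

    Features that are not listed in NOMINAL_FEATURES are ignored.
    """
    return {
        name: value
        for name, value in parse_feats(feats).items()
        if name in NOMINAL_FEATURES and value not in NOMINAL_FEATURES[name]
    }
```

For example, `unexpected_values("Case=Voc|Number=Sing")` reports the `Case` value, since no vocative is among the seven cases.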
Pronouns, Determiners, Quantifiers
- Different values of PronType (Art, Dem, Ind, Int, Prs, Rcp, Rel, Tot) are used with PRON, DET, ADV and deictic interjections (INTJ).
- Poss marks possessive personal determiners (e.g. իմ/im “my”, իւր/iwr “his/her own”).
- Reflex marks the reflexive pronoun իւր/iwr (gen.sg.) “of him/her-self” and the determiner իւր/iwr (nom.sg.), իւրոյ/iwroy “his/her own”.
- Person is lexically expressed in personal pronouns and determiners. Only the first and second person pronouns are marked with the values 1 and 2, respectively. The third person pronoun նա/na “(s)he, it” coincides with the demonstrative նա/na “that” and is left unmarked. The same applies to the possessive determiners.
Verbal Features
- VerbForm distinguishes five main (de)verbal forms. Although the verbal noun functions as a nominal and the past participle can be used adjectivally, they are consistently tagged VERB or AUX.
- Person has three values (1, 2, 3), which mark the person of the subject on verbs. Classical Armenian is a pro-drop language, and a personal pronoun as subject is often omitted.
- Aspect has two values, Imp and Perf. Aspect is defined in purely morphological terms, based on the type of verb stem from which a verb form is derived. The aspectual semantics expressed by either of the two types of forms may not match the formal aspect.
- Finite verbs always have one of three values of Mood: Ind, Sub, or Imp.
- In the indicative mood, verbs always have one of the two values of Tense: Pres or Past, which, in combination with the aforementioned aspectual values, define the three synthetic tenses: the Present, the Aorist, and the Imperfect.
- Sub defines the subjunctive mood, which is also used to express the Future and combines with the two aspectual values.
- Imp defines the imperative, derived from a perfective stem, and the prohibitive, derived from an imperfective stem and obligatorily combined with the prohibitive particle մի/mi; the prohibitive forms are tagged with the feature Connegative.
- Voice has two values, Act and Pass. It characterises the oppositional inflectional voice, which is expressed only in part of the verbal paradigm. Some forms, such as the present indicative forms of the a-conjugation (գնամ/gnam “I go”) and the first plural form of the aorist indicative (լուաք/luakʽ “we heard”), are underspecified for voice. The Pass value covers a wide range of valency-decreasing alternations, including the passive, middle, reflexive, etc. The morphological causative is a derivational category; the active and labile forms of the causatives are tagged as Cau, whereas the mediopassive forms have CauPass. Note that the active and labile forms currently remain underspecified, pending an improvement in the UD guidelines with respect to systems that combine derivational and inflectional voice marking.
- The Polarity feature, with its value Polarity=Neg, applies to the negation particles ոչ/očʽ (with its proclitic variant չ=/čʽ=) and մի/mi, as well as to the negated copula չիք/čʽikʽ “there is no”, which is tagged AUX.
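How Tense and Aspect combine into the three synthetic indicative tenses can be illustrated as a small lookup. The pairing used here (Present = Pres + Imp, Imperfect = Past + Imp, Aorist = Past + Perf) is an assumption based on the standard description of the Classical Armenian verb; the overview above names the three tenses but does not spell out the combinations. The function name `synthetic_tense` is hypothetical.

```python
# Assumed pairing of (Tense, Aspect) values with the traditional tense names.
SYNTHETIC_TENSES = {
    ("Pres", "Imp"): "Present",
    ("Past", "Imp"): "Imperfect",
    ("Past", "Perf"): "Aorist",
}


def synthetic_tense(feats):
    """Return the traditional tense name for an indicative form, else None.

    `feats` is a dict of UD features, e.g.
    {"Mood": "Ind", "Tense": "Past", "Aspect": "Perf"}.
    """
    if feats.get("Mood") != "Ind":
        return None  # only indicative forms carry Tense in this scheme
    return SYNTHETIC_TENSES.get((feats.get("Tense"), feats.get("Aspect")))
```

Subjunctive and imperative forms return `None` here, because they are distinguished by Mood and Aspect rather than by Tense.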
Other Features
- The feature Foreign is applied to foreign words that are not loanwords.
- The following universal features are not used in Classical Armenian: Clusivity, Evident, Gender, NounClass, Polite.
Syntax
This is an overview only. For more detailed discussion and examples, follow the links below.
Core Arguments, Oblique Arguments and Adjuncts
- The nominal subject (nsubj) is a noun phrase (possibly headed by a deverbal nominal), typically in the nominative case and without a preposition.
- Objects (obj) are noun phrases in the accusative, which can take the proclitic determinate object marker զ=/z=.
- Secondary objects (iobj) are expressed by bare noun phrases in the dative. A functionally similar prepositional construction ց=/cʽ= + dative is tagged as obl:arg.
- Clausal complements are labelled ccomp, whereas open clausal complements are linked with the relation xcomp.
- In passive clauses:
- the subject is labelled either nsubj:pass or csubj:pass;
- if the agent is present, it is typically expressed by an adpositional ablative noun phrase and is labelled obl:agent.
- In causative clauses (both bare and periphrastic causative), the subject is labelled with nsubj:caus or csubj:caus.
- Other non-core arguments and adjuncts are linked to the head word by the following relations: oblique (obl), vocative, dislocated, and advcl for clausal dependents. Arguments in the accusative that express spatial or temporal meanings are tagged as obl as well.
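The subject labels for plain, passive and causative clauses described above follow a regular pattern, which the following sketch makes explicit. The function name `subject_relation` and the `clause_type` codes are hypothetical, and the sketch does not cover clauses that are both causative and passive, which this overview does not address.

```python
def subject_relation(clausal, clause_type="plain"):
    """Return the UD relation for a subject.

    clausal: True for a clausal subject (csubj), False for a nominal one (nsubj).
    clause_type: "plain", "pass" (passive clause) or "caus" (causative clause).
    """
    base = "csubj" if clausal else "nsubj"
    if clause_type == "plain":
        return base
    if clause_type in ("pass", "caus"):
        # The subtype is appended after a colon, e.g. nsubj:pass.
        return base + ":" + clause_type
    raise ValueError("unknown clause type: " + repr(clause_type))
```

For instance, `subject_relation(False, "caus")` gives `nsubj:caus`, the label used for the nominal subject of both bare and periphrastic causatives.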
Nominal dependents
- Modifier words are linked to their heads with the relations advmod, amod, and discourse.
- According to UD guidelines, determiners are attached to their head with the det relation.
Function words
- The analytic verb forms are built with the help of auxiliaries aux. In the case of causative clauses, the auxiliary is labelled aux:caus.
- The copula cop is used in the following non-verbal clauses: equational, attributional, locative, possessive, benefactory, existential.
- Subordinate clauses are introduced by marker words mark.
- Adpositions can function as case markers and are then linked to nominals by the case relation.
Other relations
- Other relation tags used in the UD_Classical_Armenian-CAVaL treebank include: conj, cc, compound and its specialized subtype compound:redup, fixed, flat, orphan, parataxis, punct, root.
Treebanks
There is one Classical Armenian UD treebank: