UD for Classical Armenian
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation. No token in the UD Classical Armenian treebank contains whitespace.
- Most punctuation marks are attached to the preceding word and are tokenized as separate tokens.
- Words containing “infixed” punctuation (e.g. question, exclamation, emphasis and abbreviation marks), such as զիա՞րդ = զիարդ/ziard + ՞ ‘why?’, are treated as multiword tokens and segmented into individual syntactic words.
- According to typographical rules, the following words are attached to a neighbouring word:
- proclitic prepositions յ=/y=, ց=/cʽ= and զ=/z=;
- a proclitic determinative particle զ=/z=;
- a proclitic negation particle չ=/čʽ=;
- enclitic determinative particles =ս/=s, =դ/=d, =ն/=n.
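The multiword-token treatment of infixed punctuation can be sketched roughly as follows. This is a minimal illustration, not the treebank's actual preprocessing: the function names, the chosen set of code points, the simplified two-column output (ID and FORM only), and the placement of the mark after the base word (following the զիարդ + ՞ example above) are all assumptions.

```python
# Armenian marks that may be written inside a word (Unicode code points):
# emphasis U+055B, exclamation U+055C, question U+055E, abbreviation U+055F.
INFIXED_MARKS = {"\u055b", "\u055c", "\u055e", "\u055f"}


def split_infixed(surface):
    """Return (base_word, marks) with infixed marks pulled out of the word."""
    base = "".join(ch for ch in surface if ch not in INFIXED_MARKS)
    marks = [ch for ch in surface if ch in INFIXED_MARKS]
    return base, marks


def to_conllu_rows(surface, start_id=1):
    """Build simplified CoNLL-U rows (ID and FORM only) for one surface token."""
    base, marks = split_infixed(surface)
    if not marks or not base:
        # Nothing infixed (or a bare punctuation mark): one ordinary token.
        return [f"{start_id}\t{surface}"]
    words = [base] + marks  # assumed order: base word first, then the mark(s)
    end_id = start_id + len(words) - 1
    rows = [f"{start_id}-{end_id}\t{surface}"]  # multiword token range line
    rows += [f"{start_id + i}\t{word}" for i, word in enumerate(words)]
    return rows


print("\n".join(to_conllu_rows("\u0566\u056b\u0561\u055e\u0580\u0564")))  # զիա՞րդ
```

The range line keeps the original spelling, while the numbered lines carry the syntactic words that receive tags and dependency relations.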
Sentence splitting
- A full sentence is usually concluded by the punctuation sign verjaket [ ։ ] corresponding to the English period. In case of longer sentences, the editor of a digital text may decide to split a sentence after the punctuation signs mijaket [ . ], boot [ ՝ ] or storaket [ , ], corresponding to the English colon, semicolon, and comma, respectively.
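A minimal sketch of the default splitting rule, assuming that sentences are split only after the verjaket (U+0589). The optional editorial splits after mijaket, boot or storaket are deliberately not modelled, and the function name is hypothetical.

```python
import re

VERJAKET = "\u0589"  # Armenian full stop ։


def split_sentences(text):
    """Split text after every verjaket, keeping the mark with its sentence."""
    pieces = re.split(f"(?<={VERJAKET})\\s*", text)
    return [piece.strip() for piece in pieces if piece.strip()]


# Demo input: two placeholder "sentences", each closed by a verjaket.
for sentence in split_sentences("A B C\u0589 D E\u0589"):
    print(sentence)
```

Splitting on a lookbehind keeps the punctuation sign attached to the sentence it closes, so it can still be tokenized as a separate token afterwards.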
Morphology
Tags
This is an overview only. For more detailed discussion and examples, see the list of Classical Armenian POS tags.
- Classical Armenian currently uses 16 UPOS tags; the tag SYM does not occur in the UD_Classical_Armenian-CAVaL treebank.
- The complete list of Classical Armenian words that must be tagged PART in UD has yet to be worked out. At present, the tag is used restrictively and is applied to four lexemes:
- contrasting particle: իսկ/isk;
- dubitation particle: գուցէ/gowcʽē;
- negation particles: ոչ/očʽ (with its proclitic variant չ=/čʽ=) and մի/mi.
- The tag DET is used for articles and adjectival pronouns with a determiner function. Pronominal quantifiers (which the traditional grammar includes in pronouns) are DET as well. The tag PRON is reserved for pronouns occurring as the head of a noun phrase.
- The Classical Armenian auxiliaries (tagged AUX) include: եմ/em (‘to be’), its perfective counterpart լինիմ/linim (‘to become’), չիք/čʽikʽ (‘there is no’), and տամ (‘to give’).
The auxiliaries եմ and լինիմ are used in the following constructions:
- The copula with non-verbal predicates, including predicates of location.
- Periphrastic past tenses (present form of եմ + past participle, imperfect form of եմ + past participle, aorist form of լինիմ + past participle of the main verb).
- Periphrastic future/subjunctive tenses (present subjunctive form of եմ + past participle, present subjunctive form of լինիմ + past participle, aorist subjunctive form of լինիմ + past participle of the main verb).
The auxiliary չիք is used as a negated copula.
The auxiliary տամ is used to form the periphrastic causative:
- Periphrastic causative (any form of տամ, including periphrastic forms, + infinitive of the main verb).
- Besides եմ, լինիմ and տամ, the verbs կամ (‘to stand, exist’) and ունիմ (‘to have’) occasionally function as auxiliaries.
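Because PART and AUX are closed classes here, a simple consistency check is possible. The sketch below is an illustration only: the function name is hypothetical, and it assumes that the proclitic չ= is lemmatized as ոչ, which the overview suggests but does not state outright.

```python
# Lemmas taken from the lists above.
PART_LEMMAS = {"իսկ", "գուցէ", "ոչ", "մի"}
CORE_AUX_LEMMAS = {"եմ", "լինիմ", "չիք", "տամ"}
OCCASIONAL_AUX_LEMMAS = {"կամ", "ունիմ"}


def check_closed_class(lemma, upos):
    """Return a warning string, or None if the (lemma, UPOS) pair looks fine."""
    if upos == "PART" and lemma not in PART_LEMMAS:
        return f"unexpected PART lemma: {lemma}"
    if upos == "AUX" and lemma not in CORE_AUX_LEMMAS | OCCASIONAL_AUX_LEMMAS:
        return f"unexpected AUX lemma: {lemma}"
    return None  # other tags are not restricted by this check


print(check_closed_class("իսկ", "PART"))   # a listed particle
print(check_closed_class("իսկ", "AUX"))    # not a listed auxiliary
```

A warning from such a check does not necessarily indicate an error, since the PART inventory is still being worked out.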
Nominal Features
- Number has two values: Sing and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, as well as the finite forms of VERB and AUX.
- Classical Armenian has numerous pluralia tantum nouns, the plural form of which expresses a single entity or abstract notion, cf. ապարանք/aparankʽ ‘palace’, երեսք/ereskʽ ‘face’, բարիք/barikʽ ‘goodness’, etc.
- Case has seven values: Nom, Acc, Gen, Dat, Abl, Ins, Loc. It occurs with NOUN, PROPN, NUM, PRON, DET, ADJ, as well as with participles and verbal nouns, tagged VERB or AUX.
- NumType is used with numerals (NUM) and adjectives (ADJ).
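The value inventories for Number and Case can be checked against a CoNLL-U FEATS string with a few lines of Python. This is a hedged sketch: the helper names are hypothetical, and only the two features whose values are enumerated above are checked.

```python
# Value inventories as listed above for Classical Armenian.
ALLOWED = {
    "Number": {"Sing", "Plur"},
    "Case": {"Nom", "Acc", "Gen", "Dat", "Abl", "Ins", "Loc"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Plur' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def invalid_nominal_feats(feats):
    """Return {feature: [bad values]} for Number and Case values not listed above."""
    problems = {}
    for name, value in parse_feats(feats).items():
        if name in ALLOWED:
            # CoNLL-U allows comma-separated alternatives, e.g. Case=Acc,Nom.
            bad = [v for v in value.split(",") if v not in ALLOWED[name]]
            if bad:
                problems[name] = bad
    return problems


print(invalid_nominal_feats("Case=Nom|Number=Plur"))  # {}
print(invalid_nominal_feats("Case=Voc|Number=Sing"))  # {'Case': ['Voc']}
```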
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET), adverbs (ADV) and deictic interjections (INTJ).
- Poss marks possessive personal determiners (e.g. իմ/im ‘my’, իւր/iwr ‘his/her own’).
- Reflex marks the reflexive pronoun իւր/iwr (gen.sg.) ‘of him/herself’ and the determiner իւր/iwr (nom.sg.), իւրոյ/iwroy ‘his/her own’.
- Animacy has two values, Anim and Inan, and can be lexically expressed in PRON, e.g. ոմն/omn ‘somebody’, զինչ/zinčʽ ‘what’.
- Person is lexically expressed in personal pronouns (PRON). Only the first and second person pronouns are marked with the values 1 and 2, respectively. The third person pronoun նա/na ‘(s)he, it’ coincides with the demonstrative նա/na ‘that’ and is left unmarked. The same applies to the possessive determiners.
- Definite has two values, Def and Spec, which can be lexically expressed in DET, e.g. զ=/z= (nota accusativi), մի/mi ‘a certain’.
Verbal Features
- VerbForm distinguishes five main (de)verbal forms. Although the verbal noun functions as a nominal and the past participle can be used adjectivally, they are consistently tagged VERB or AUX.
- Person has three values, which mark the person of the verb’s subject on verbs. Classical Armenian is a pro-drop language and a personal pronoun as subject is often omitted.
- Aspect has two values, Imp and Perf. Aspect is defined in purely morphological terms, based on the type of verb stem from which a verb form is derived. The aspectual semantics expressed by either of the two types of forms may not match the formal aspect.
- Finite verbs always have one of three values of Mood: Ind, Sub, or Imp.
- In the indicative mood, verbs always have one of the two values of Tense, Pres or Past, which, in combination with the aforementioned aspectual values, define the three synthetic tenses: the Present, the Aorist, and the Imperfect.
- Sub defines the Subjunctive mood, which is also used to express the Future and combines with the two aspectual values.
- Imp defines the imperative, derived from a perfective stem, and the prohibitive, derived from an imperfective stem and obligatorily combined with the prohibitive particle մի/mi.
- Voice has four values, Act, Pass, Cau, and CauPass. These values can be expressed by the inflections and a causative suffix. The values Act and Pass characterise the oppositional inflectional voice, which is expressed only in part of the verbal paradigm. Some forms, such as the present indicative forms of the a-conjugation (գնամ/gnam ‘I go’), are underspecified for oppositional voice. The Pass value covers a wide range of valency-decreasing alternations including the passive, middle, reflexive, etc. The values Cau and CauPass are reserved for derived causatives and correspond to the Act and Pass values of base verbs, respectively.
- The Polarity feature, with its Polarity=Neg value, applies primarily to verbs (VERB, AUX) that can be negated using ոչ/očʽ (with its proclitic variant չ=/čʽ=) or the prohibitive particle մի/mi. The particle ոչ can also modify pronouns.
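How Tense and Aspect jointly name the three synthetic indicative tenses can be sketched as a small lookup. The overview names the tenses but does not spell out the pairing, so the table below is an assumption based on the usual stem-based reading (imperfective stem for the Present and Imperfect, perfective stem for the Aorist); the function name is hypothetical.

```python
# Assumed pairing of (Tense, Aspect) with the traditional tense names.
SYNTHETIC_TENSES = {
    ("Pres", "Imp"): "Present",
    ("Past", "Imp"): "Imperfect",
    ("Past", "Perf"): "Aorist",
}


def synthetic_tense(feats):
    """Name the synthetic tense of an indicative form, or None if not applicable.

    `feats` is a dict of UD features, e.g. {"Mood": "Ind", "Tense": "Past"}.
    """
    if feats.get("Mood") != "Ind":
        return None  # Tense is only assigned in the indicative
    return SYNTHETIC_TENSES.get((feats.get("Tense"), feats.get("Aspect")))


print(synthetic_tense({"Mood": "Ind", "Tense": "Past", "Aspect": "Perf"}))  # Aorist
print(synthetic_tense({"Mood": "Sub", "Aspect": "Perf"}))                   # None
```

Note that Imp is a value of both Aspect (imperfective) and Mood (imperative), so the feature name must always be read together with the value.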
Other Features
- The following universal features are not used in Classical Armenian: Clusivity, Evident, Gender, NounClass, Polite.
Syntax
This is an overview only.
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase (possibly headed by a deverbal nominal) typically in the nominative case, without preposition.
- In the periphrastic past tenses, the subject of transitive verbs is typically coded by the genitive case.
- Clausal subjects (csubj) are typically expressed by finite clauses, and clauses headed by infinitives or nonverbal predicates.
- Objects (obj) are noun phrases in the accusative, which can take the proclitic determinate object marker զ=/z=.
- Secondary objects (iobj) are expressed by bare noun phrases in the dative.
- All other arguments and adjuncts are oblique (obl). Arguments in the accusative that express spatial or temporal meanings are tagged as obl as well.
- The infinitive complement is typically labeled xcomp.
- In passive clauses:
- the subject is labeled nsubj:pass.
- if the agent is present, it is typically expressed by an adpositional ablative noun phrase and is labeled obl:agent.
- In causative clauses (both bare and periphrastic causative):
- the subject is labeled with nsubj:caus or csubj:caus.
- the auxiliary verb in the periphrastic causative is labeled aux:caus.
Non-verbal Clauses
- The copula (եմ/լինիմ) and its negated variant (չիք) are used in the following non-verbal clauses:
- equational (ես եմ Գաբրիէղ / es em Gabriēł “I am Gabriel”, Luke 1:19)
- attributional (նա քաղցր է ի վերայ չարաց / na k῾ałc῾r ē i veray č῾arac῾ “he is kind unto the evil”, Luke 6:35; չիք ինչ ծածուկ. որ ոչ յայտ լիցի / č῾ik῾ inč῾ cacowk, or oč῾ yayt lic῾i “for nothing is secret that shall not be made manifest”, Luke 8:17)
- locative (եւ էր յանապատս մինչեւ յաւր երեւելոյն նորա Իսրայեղի / ew ēr yanapats minč῾ew yawr ereweloyn nora Israyełi “and was in the deserts till the day of his shewing unto Israel”, Luke 1:80)
- possessive (եւ նորա էր քոյր մի որում անուն էր Մարիամ / ew nora ēr k῾oyr mi orowm anown ēr Mariam “and she had a sister called Mary”, Luke 10:39)
- benefactory (եւ սպանցէ զնոխազն զվասն մեղաց, որ վասն ժողովրդեանն իցէ / ew spanc῾ē znoxazn zvasn mełac῾, or vasn žołovrdeann ic῾ē “then he shall slaughter the goat of the sin offering, which is for the people”, Lev. 16:15)
- existential (եւ անդ էր Աննա մարգարէ / ew and ēr Anna margarē “and there was one Anna, a prophetess”, Luke 2:36)
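As an illustration of how such a clause is structured, the equational example ես եմ Գաբրիէղ can be rendered as a simplified CoNLL-U fragment. This sketch follows the general UD convention that the non-verbal predicate is the head and the copula attaches to it as cop; it is not copied from the treebank, and the lemma, feature and other columns are left unspecified.

```python
# (ID, FORM, UPOS, HEAD, DEPREL) for "ես եմ Գաբրիէղ" 'I am Gabriel' (Luke 1:19).
# Assumed analysis: the predicate nominal is the root, the copula is `cop`.
tokens = [
    (1, "ես", "PRON", 3, "nsubj"),
    (2, "եմ", "AUX", 3, "cop"),
    (3, "Գաբրիէղ", "PROPN", 0, "root"),
]


def to_conllu(rows):
    """Render rows as 10-column CoNLL-U lines, with unknown columns set to '_'."""
    lines = []
    for tid, form, upos, head, deprel in rows:
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(tid), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines)


print(to_conllu(tokens))
```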
Relations Overview
- The following relation subtypes are used in Classical Armenian:
- nsubj:pass for nominal subjects of passive verbs
- nsubj:caus for nominal subjects of causative verbs
- csubj:caus for clausal subjects of causative verbs
- obl:agent for agents of passive verbs
- aux:caus for auxiliaries of periphrastic causatives
- acl:relcl for relative clauses
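The subtype inventory above lends itself to a quick sanity check over the DEPREL column. The following is a minimal sketch with a hypothetical function name; it flags any subtyped relation that is not in the list and leaves plain universal relations alone.

```python
# Relation subtypes listed above for Classical Armenian.
ALLOWED_SUBTYPES = {
    "nsubj:pass", "nsubj:caus", "csubj:caus",
    "obl:agent", "aux:caus", "acl:relcl",
}


def unknown_subtypes(deprels):
    """Return the sorted subtyped relations that are not in the inventory."""
    return sorted({d for d in deprels if ":" in d and d not in ALLOWED_SUBTYPES})


print(unknown_subtypes(["nsubj", "obl:agent", "obl:tmod", "acl:relcl"]))
# ['obl:tmod']
```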
Treebanks
There is one Classical Armenian UD treebank: UD_Classical_Armenian-CAVaL.