UD for Naga 
Tokenization and Word Segmentation
- Words in Suansu are generally delimited by whitespace or punctuation. Exceptions:
  - Multiword tokens occur in the case of clitics. The nominalizer di is written after the verb complex without whitespace, for example mazokwoan ngammedi (“think.arrive + be able to + NMLZ = memory”).
  - No words with internal whitespace appear in the current data.
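The multiword-token convention above can be illustrated with a small sketch that writes a CoNLL-U token range followed by its syntactic words. The function name `multiword_token_lines` is hypothetical, and the split of ngammedi into ngamme + di is an assumption made for illustration; the treebank's actual segmentation may differ.

```python
def multiword_token_lines(surface, parts, start_id=1):
    """Return CoNLL-U lines for one multiword token and its syntactic words."""
    if "".join(parts) != surface:
        raise ValueError("parts must concatenate to the surface form")
    end_id = start_id + len(parts) - 1
    blank = ["_"] * 8  # LEMMA through MISC are left unspecified in this sketch
    # Range line (e.g. "2-3") carries the surface form as written in the text.
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + blank)]
    # One line per syntactic word, numbered consecutively.
    for offset, part in enumerate(parts):
        lines.append("\t".join([str(start_id + offset), part] + blank))
    return lines


# Assumed segmentation (illustration only): verb complex + nominalizer di.
for line in multiword_token_lines("ngammedi", ["ngamme", "di"], start_id=2):
    print(line)
```

Each emitted line has the ten tab-separated CoNLL-U columns; only ID and FORM are filled in here.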
Morphology
Tags
- Suansu employs 16 universal POS tags. The SYM category does not appear in the current dataset.
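As a quick check of the tag inventory, the sketch below derives the 16 tags from the 17 universal POS tags by removing SYM. The names `SUANSU_UPOS` and `is_valid_upos` are hypothetical helpers, not part of any UD tool.

```python
# The 17 universal POS tags defined by UD.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags attested in the current data: everything except SYM.
SUANSU_UPOS = UD_UPOS - {"SYM"}


def is_valid_upos(tag):
    """Return True if the tag is one of the tags used in the current data."""
    return tag in SUANSU_UPOS


print(len(SUANSU_UPOS))
print(is_valid_upos("SYM"))
```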
Adjectives
Definition: Adjectives are words that typically modify nouns and specify their properties or attributes. In Suansu, most adjectives are verbs. They follow the noun they modify, for example Peter za tra mohn mule (“Peter likes hot tea”), and they can be used predicatively, where they behave like verbs and carry tense and aspect values (za trale, “the tea is hot”).
Suansu has no gender (see Features); case and number markers occur at the end of the NP.
Adverbs
Definition: Adverbs are words that typically modify verbs for such categories as place, time, or manner.
Suansu examples:
- hano “here”
- athwenan “now”
- daichu “only”
Interjections
Definition: An interjection is a word that is used most often as an exclamation or part of an exclamation. It typically expresses an emotional reaction, is not syntactically related to other accompanying expressions, and may include a combination of sounds not otherwise found in the language.
Examples include borrowings like ok “ok”; ay “yes”; wi and dinan, which express frustration in the context of their respective clauses; and ugh “ugh”.
Nouns
Definition: Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
The NOUN tag is intended for common nouns only. See PROPN for proper nouns and PRON for pronouns.
Gender is not a category in Suansu. There are remnants of gender specification on two nouns, baneo “boy” and leneo “girl”. Given the exceptionality of these forms, we code the tokens as two distinct lemmas without including gender marking.
Nouns inflect for Number (zero marked singular and overt plural) and Case.
Nominalization is a very productive process in Suansu, cf. runghaphadi “say.PST-PL-NMLZ” (“the said things”).
Verbs
Definition: A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause.
In the Suansu treebank we distinguish between content verbs (VERB) and auxiliaries (AUX). Suansu verbs take neither person nor number agreement: A/Bu thale “I/They know”.
See Verbal Features for detailed information on verbal categories.
Particles (PART)
- Negative particles: khama (emphatic negation) and garhe (tag questions and negative answers)
- Discourse particle: lagu (marks pragmatic force)
- Reportative particles: re, reha (used after quotes and labels)
Auxiliaries (AUX)
Suansu has eight non-verbal auxiliaries:
- Evidential markers: gu (first-hand), ga (non-first-hand)
- Imperative markers: dai, ra
- Hortative marker: diga
- Interrogative markers: dima, la
- Obligatory modality marker: geraha
The only verbal auxiliary is la “be”, which is used with simultaneous converbs to express the progressive:
- Peter Mariadi nungganan lale “Peter is fighting with Maria” – [progressive]
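The auxiliary inventory above can be written down as a small lookup table. This is a sketch only: the names `NONVERBAL_AUX`, `VERBAL_AUX`, and `aux_readings` are hypothetical, and pairing the imperative, hortative, and interrogative markers with Mood=Imp, Mood=Hort, and Mood=Int is an assumption based on the feature values listed under Verbal Features. The Evident and Modal values are the ones stated there.

```python
# Non-verbal auxiliaries and the feature each is assumed to carry.
NONVERBAL_AUX = {
    "gu": {"Evident": "Fh"},      # first-hand evidential
    "ga": {"Evident": "Nfh"},     # non-first-hand evidential
    "dai": {"Mood": "Imp"},       # imperative (assumed value)
    "ra": {"Mood": "Imp"},        # imperative (assumed value)
    "diga": {"Mood": "Hort"},     # hortative (assumed value)
    "dima": {"Mood": "Int"},      # interrogative (assumed value)
    "la": {"Mood": "Int"},        # interrogative (assumed value)
    "geraha": {"Modal": "Obl"},   # obligatory modality
}

# The single verbal auxiliary, la "be".
VERBAL_AUX = {"la"}


def aux_readings(lemma):
    """Return every auxiliary reading listed for a lemma."""
    readings = []
    if lemma in NONVERBAL_AUX:
        readings.append(("non-verbal", NONVERBAL_AUX[lemma]))
    if lemma in VERBAL_AUX:
        # Features of the verbal auxiliary depend on its inflected form.
        readings.append(("verbal", {}))
    return readings


print(aux_readings("la"))
```

The lemma la returns two readings, since it appears both as an interrogative marker and as the verbal auxiliary; choosing between them requires context that this table does not model.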
Determiners vs. Pronouns
- The DET tag applies to words functioning as determiners, including demonstratives (hai, tye), indefinites (kwehn, za), total quantifiers (mazohm), reflexives (drekhalai, khalailehnda), and interrogatives (mwe, gare).
- The PRON tag is reserved for words serving as the head of a noun phrase, including personal, demonstrative, indefinite, total, and interrogative pronouns.
(De)verbal Forms
Suansu distinguishes four main (de)verbal forms based on the VerbForm feature:
- Finite verbs (Fin), tagged as VERB or AUX
- Infinitives (Inf), tagged as VERB or AUX
- Converbs (Conv), tagged as VERB or AUX
- Verbal nouns (Vnoun), tagged as NOUN
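The pairing of VerbForm values with POS tags listed above lends itself to a simple consistency check. The sketch below uses hypothetical names (`VERBFORM_UPOS`, `verbform_matches_upos`) and encodes only what the list states.

```python
# Allowed POS tags for each VerbForm value, as listed above.
VERBFORM_UPOS = {
    "Fin": {"VERB", "AUX"},
    "Inf": {"VERB", "AUX"},
    "Conv": {"VERB", "AUX"},
    "Vnoun": {"NOUN"},
}


def verbform_matches_upos(verbform, upos):
    """Return True if the VerbForm value is compatible with the POS tag."""
    return upos in VERBFORM_UPOS.get(verbform, set())


print(verbform_matches_upos("Conv", "VERB"))
print(verbform_matches_upos("Vnoun", "VERB"))
```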
Features
Nominal Features
- Nominals (NOUN, PRON, PROPN) carry Number (Sing, Plur), and can carry Case (Abl, Agn, Dat, DatAgn, Gen, GenAbl, Loc, Top) and Definite (Def).
- ADJ and NUM inherit nominal features of the whole noun phrase.
- DET gets nominal features when it is the final element in a noun phrase.
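To make the nominal feature inventory concrete, the following sketch parses a CoNLL-U FEATS string and reports nominal features whose values fall outside the sets listed above. The helper names `parse_feats` and `invalid_nominal_feats` are hypothetical, and the example feature strings are constructed for illustration rather than taken from the treebank.

```python
# Nominal features and their values, as listed above.
NOMINAL_FEATS = {
    "Number": {"Sing", "Plur"},
    "Case": {"Abl", "Agn", "Dat", "DatAgn", "Gen", "GenAbl", "Loc", "Top"},
    "Definite": {"Def"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Case=Agn|Number=Sing'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def invalid_nominal_feats(feats):
    """Return nominal features whose value is not in the listed inventory."""
    parsed = parse_feats(feats)
    return {
        name: value
        for name, value in parsed.items()
        if name in NOMINAL_FEATS and value not in NOMINAL_FEATS[name]
    }


print(invalid_nominal_feats("Case=Agn|Number=Sing"))
print(invalid_nominal_feats("Case=Acc|Number=Plur"))
```

Features outside the nominal set (for example Tense) are ignored by this check rather than flagged.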
Verbal Features
- VERB and verbal auxiliary la “be” (AUX) may have features such as Aspect (Imp, Perf, Sim), Modal (Abil, Perm, Poss), Mood (Imp, Ind), and Tense (Past, Pres).
- All verbal features also appear on ADP and ADV in phrasal verb particles.
- Mood auxiliaries have the Mood (Des, Jus, Hort, Imp, Int, Irr) feature.
- The evidential auxiliaries gu and ga use the Evident (Fh, Nfh) feature.
- The obligatory modal auxiliary geraha has the Modal (Obl) feature.
Pronouns, Determiners, and Quantifiers
- PronType (Dem, Ind, Int, Prs, Tot) is used with PRON, DET, and ADV.
- Personal pronouns have the Person feature in addition to Case, Definite, and Number.
- Demonstratives have Deixis (Prox, Remt).
- Reflex is used with reflexive DET.
Other Features
- Degree (Cmp, Pos) is used with ADJ.
- Polarity (Neg) is used on negative PART and INTJ and on the last word in the clause (VERB, AUX, etc.). Polarity (Pos) is used on positive INTJ (e.g., ay “yes”).
- The Foreign feature is applied to foreign words tagged as X.
- The following universal features are currently not used in Suansu: Animacy, Clusivity, DeixisRef, ExtPos, Gender, NounClass, Polite, Poss, Typo, Voice.
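The list of unused features can serve as a sanity check on annotated data. The sketch below, with the hypothetical helper `unused_features_in`, looks only at feature names, so a value such as Poss under Modal is not confused with the unused Poss feature.

```python
# Universal features that are currently not used, as listed above.
UNUSED_FEATURES = {
    "Animacy", "Clusivity", "DeixisRef", "ExtPos", "Gender",
    "NounClass", "Polite", "Poss", "Typo", "Voice",
}


def unused_features_in(feats):
    """Return the names of unused features found in a CoNLL-U FEATS string."""
    if feats == "_":
        return []
    # Compare feature names only; values are not inspected.
    names = [pair.split("=", 1)[0] for pair in feats.split("|")]
    return sorted(name for name in names if name in UNUSED_FEATURES)


print(unused_features_in("Modal=Poss|Tense=Past"))
print(unused_features_in("Gender=Masc|Number=Sing"))
```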
Syntax
Core Arguments, Oblique Arguments, and Adjuncts
- A nominal subject (nsubj) is a noun phrase, typically in the first position of a clause. It can have an Agent case (-nan), a Topic case (-di), or no case marking, and it does not have a postposition.
- Intransitive predicates usually have subjects with the Topic case, though the case marker can be omitted.
- Transitive predicates usually have subjects with the Agent case, though the case marker can be omitted.
- A finite subordinate clause can serve as the subject and is labeled csubj.
- For transitive predicates, the other argument (the one that is not the subject) is the direct object (obj). It is usually in the second position of the clause, has a Topic case or no case marking, and does not have a postposition.
- Indirect nominal objects (iobj) of ditransitive predicates usually have the Dative case (-la), though the case marker can be omitted.
- Adjuncts are either postpositional phrases or bare nominals with cases other than Agent, Topic, or Dative, or with omitted case marking. They are labeled obl.
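The case-based criteria above narrow down, but do not always determine, the relation of a nominal dependent. The sketch below returns the set of relations compatible with a given Case value; the function `candidate_relations` is hypothetical. Treating combined values such as DatAgn as "other" cases is an assumption of this sketch, and clause position and predicate valency, which the guidelines also mention, are not modeled.

```python
def candidate_relations(case=None, has_postposition=False):
    """Return relations compatible with a nominal's case marking."""
    if has_postposition:
        # Postpositional phrases are adjuncts.
        return {"obl"}
    if case == "Agn":
        return {"nsubj"}
    if case == "Top":
        return {"nsubj", "obj"}
    if case == "Dat":
        return {"iobj"}
    if case is None:
        # Case marking can be omitted on any of these, so all remain possible.
        return {"nsubj", "obj", "iobj", "obl"}
    # Any other case (assumed to include DatAgn, GenAbl, etc.) marks an adjunct.
    return {"obl"}


print(sorted(candidate_relations("Top")))
print(sorted(candidate_relations("Loc")))
```

A bare nominal with no case marking remains fully ambiguous under these rules, which is why the function returns a set rather than a single label.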
Non-verbal Clauses
Nominalized clauses
Relations Overview
- The following relation subtypes are used in Suansu:
  - acl:relc - relative clause modifier
  - advmod:emph - emphasizing word, intensifier
  - compound:prt - phrasal verb particle
  - compound:svc - serial verb compounds
  - csubj:outer - outer clause clausal subject
  - flat:foreign - foreign words
  - flat:name - multiword names
  - nmod:poss - possessive nominal modifier
  - nsubj:outer - outer clause nominal subject
- The following relation types are not currently used in Suansu: clf, cop, dep, dislocated, expl, fixed, goeswith, list.
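The two lists above can be combined into a simple label check. In this sketch the names `SUBTYPES`, `UNUSED_RELATIONS`, and `check_deprel` are hypothetical, and the subtype labels are copied exactly as they are spelled in the list.

```python
# Relation subtypes listed as used (spelled as in the list above).
SUBTYPES = {
    "acl:relc", "advmod:emph", "compound:prt", "compound:svc",
    "csubj:outer", "flat:foreign", "flat:name", "nmod:poss", "nsubj:outer",
}

# Universal relations listed as not currently used.
UNUSED_RELATIONS = {
    "clf", "cop", "dep", "dislocated", "expl", "fixed", "goeswith", "list",
}


def check_deprel(deprel):
    """Classify a relation label as 'ok', 'unused', or 'unknown-subtype'."""
    base = deprel.split(":", 1)[0]
    if base in UNUSED_RELATIONS:
        return "unused"
    if ":" in deprel and deprel not in SUBTYPES:
        return "unknown-subtype"
    return "ok"


print(check_deprel("nsubj:outer"))
print(check_deprel("cop"))
```

Labels without a subtype are accepted as long as their base relation is not in the unused list; the check does not verify that the base is a valid universal relation.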
Treebanks
There is one Naga UD treebank: