UD Naga Suansu
Language: Naga (code: nmf)
Family: Sino-Tibetan
This treebank has been part of Universal Dependencies since the UD v2.16 release.
The following people have contributed to making this treebank part of UD: Jessica K. Ivani, Kira Tulchynska.
Repository: UD_Naga-Suansu
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.17
License: CC BY-SA 4.0
Genre: fiction, grammar-examples
Questions, comments? General annotation questions (either Naga-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [jessica • ivani (æt) uzh • ch; kira • tulchynska (æt) mail • huji • ac • il]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually |
| UPOS | annotated manually, natively in UD style |
| XPOS | not available |
| Features | annotated manually, natively in UD style |
| Relations | annotated manually, natively in UD style |
Description
UD_Naga-Suansu is a Universal Dependencies (UD) treebank for Suansu (Glottocode: suan1234), an endangered Tibeto-Burman language spoken on the Indo-Myanmar border. The annotation was done manually on the basis of glosses. The treebank covers two genres, fiction and grammar examples, and contains 3.1k tokens, distributed as follows:
- Training set: 2945 tokens
- Test set: 157 tokens
The UD_Naga-Suansu treebank consists of various texts translated into Suansu by native speakers, then glossed and annotated. The included texts are:
- grammar_Cairo: 20 examples from the [Cairo CICLing Corpus](https://github.com/UniversalDependencies/cairo/blob/master/translations.txt)
- grammar_BivalTyp: BivalTyp dataset translated into Suansu.
- grammar_ValPal: ValPal dataset translated into Suansu.
- film_Bridge: Beginning of the subtitles from Bridge of Spies (2015).
Genre Classification
- Fiction: Sentence IDs start with film.
- Grammar: Sentence IDs start with grammar.
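The prefix convention above can be applied mechanically. The following is a minimal sketch; the function name `genre_of` and the example IDs are hypothetical, and the exact shape of the IDs beyond their `film` / `grammar` prefix is an assumption.

```python
def genre_of(sent_id: str) -> str:
    """Map a sentence ID to its genre label using only the ID prefix."""
    if sent_id.startswith("film"):
        return "fiction"
    if sent_id.startswith("grammar"):
        return "grammar-examples"
    raise ValueError(f"unrecognized sentence ID prefix: {sent_id!r}")


# Hypothetical IDs modelled on the source names listed above;
# the real treebank IDs may be formatted differently after the prefix.
for sid in ["film_Bridge_1", "grammar_Cairo_1"]:
    print(sid, "->", genre_of(sid))
```

The genre labels returned here are the ones given in the Genre field of this page.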
Data Splits
- Training set: Full grammar_Cairo
- Test set: Full grammar_BivalTyp, grammar_ValPal, and film_Bridge
Acknowledgments
This work was supported by the University of Zurich Global Strategy and Partnerships Funding Scheme (Project Fund Level 3) (https://www.global.uzh.ch). We gratefully acknowledge the Suansu-speaking community for their continuous support. We also thank Jason M. Vashum for his generous assistance with translation and annotation.
References
- Say, Sergey (ed.). 2020-. BivalTyp: Typological database of bivalent verbs and their encoding frames. (Available online at https://www.bivaltyp.info, Accessed on 1 April 2025.)
- Hartmann, Iren & Haspelmath, Martin & Taylor, Bradley (eds.). 2013. Valency Patterns Leipzig. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://valpal.info, Accessed on 1 April 2025.)
Statistics of UD Naga Suansu
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Abbr – Aspect – Case – Degree – Deixis – Evident – ExtPos – Foreign – Modal – Mood – Number – NumForm – NumType – Person – Polarity – PronType – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – advmod:emph – amod – appos – aux – case – cc – ccomp – compound – compound:prt – compound:svc – conj – cop – csubj – csubj:outer – det – discourse – fixed – flat – flat:foreign – flat:name – iobj – mark – nmod – nmod:poss – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 584 sentences, 3098 tokens and 3123 syntactic words.
- This corpus contains 387 tokens (12%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: Mr., Mmm-mmm, a.m.-va
- This corpus contains 25 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 16 types of multi-word tokens. Examples: ladi, redi, ngammedi, lamiszudi, mathammedi, mazohndi, mokrwedi, nuedi, phanungedi, puimadi, redime, runghaphadi, thedi, theszyuiamadi, thungmididi, wienahndi.
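The figures above are related: each multi-word token counts once as a token but contributes its parts as syntactic words, so 3098 + 25 × (2.00 − 1) = 3123. The sketch below shows one way such counts could be derived from a CoNLL-U file. It is not the script that produced the statistics on this page; the helper names and the toy sentence (placeholder forms, not treebank data) are assumptions for illustration.

```python
def row(tok_id, form, misc="_"):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", "_", "_", "_", misc])


# Toy sentence with placeholder forms (NOT taken from the treebank):
# the surface token "ab" is a multi-word token covering the words "a" and "b".
SAMPLE = "\n".join([
    "# sent_id = toy_1",
    row("1-2", "ab"),
    row("1", "a"),
    row("2", "b"),
    row("3", "c", "SpaceAfter=No"),
    row("4", "."),
    "",
])


def count_units(conllu):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0,
              "multiword_tokens": 0, "no_space_after": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    new_sentence = True
    for line in conllu.splitlines():
        if not line.strip():               # a blank line ends a sentence
            new_sentence, covered_until = True, 0
            continue
        if line.startswith("#"):           # sentence-level comment
            continue
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9].split("|")
        if "." in tok_id:                  # empty node (enhanced graph only)
            continue
        if new_sentence:
            counts["sentences"] += 1
            new_sentence = False
        if "-" in tok_id:                  # multi-word token range, e.g. "1-2"
            counts["multiword_tokens"] += 1
            counts["tokens"] += 1
            counts["no_space_after"] += "SpaceAfter=No" in misc
            covered_until = int(tok_id.split("-")[1])
            continue
        counts["words"] += 1
        if int(tok_id) > covered_until:    # word that is also a surface token
            counts["tokens"] += 1
            counts["no_space_after"] += "SpaceAfter=No" in misc
    return counts


print(count_units(SAMPLE))
```

On the toy sentence this yields 3 tokens, 4 syntactic words and 1 multi-word token, mirroring the tokens + extra words = syntactic words relation in the corpus figures.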
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 14 word types tagged as particles (PART): dinan, gala, garhe, garhema, khama, la, lagu, re, reganan, reha, rehate, rema, remale, zahai
- This corpus contains 14 lemmas tagged as pronouns (PRON): a, ba, bu, chatha, didi, hadi, hai, mazohn, mwe, mwethe, na, nahn, thuza, tye
- This corpus contains 7 lemmas tagged as determiners (DET): gare, hai, mazohn, mwe, rigatratrahn, tye, za
- Out of the above, 4 lemmas occurred sometimes as PRON and sometimes as DET: hai, mazohn, mwe, tye
- This corpus contains 13 lemmas tagged as auxiliaries (AUX): dai, diga, dila, dima, e, ga, geraha, gu, la, ra, raga, rahn, tha
- Out of the above, 3 lemmas occurred sometimes as AUX and sometimes as VERB: la, ra, tha
- There are 4 (de)verbal forms:
- Conv
- PART: reganan
- VERB: rungganan, theganan, yoanganan, bahnganan, laganan, lwaganan, malungphzganan, nyamganan, phanungganan, ruganan
- Fin
- ADP: dhohnte, zwehnne
- ADV: lhia
- AUX: lale, laha, lala, lalate, laia, lalama, late
- PART: remale
- VERB: dhohnte, nue, reha, wile, kanale, manungle, rae, rue, runge, rungha
- Inf
- ADP: thohn
- ADV: rai, kai, chaszuma
- AUX: la, lama
- VERB: la, the, rung, yoan, chari, mu, thungmi, dhohn, rike, ru
- Vnoun
- NOUN: runge, theyikke, phehnne, rungedi, runghapha, Tramahnne, dhohnphala, huppe, kathamme, laithiedi
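The unused-tag and PRON/DET overlap figures reported under Tags can be re-derived with plain set operations. The sketch below is an illustration only: the 17-tag inventory is the standard UD UPOS set, the other sets are copied from the lists above, and the variable names are hypothetical.

```python
# The 17 universal POS tags defined by UD.
UPOS_ALL = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags reported as used in this corpus.
upos_used = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X",
}

# Lemma lists copied from the PRON and DET bullets above.
pron_lemmas = {
    "a", "ba", "bu", "chatha", "didi", "hadi", "hai", "mazohn",
    "mwe", "mwethe", "na", "nahn", "thuza", "tye",
}
det_lemmas = {"gare", "hai", "mazohn", "mwe", "rigatratrahn", "tye", "za"}

unused_tags = sorted(UPOS_ALL - upos_used)
pron_and_det = sorted(pron_lemmas & det_lemmas)

print(unused_tags)    # ['SYM']
print(pron_and_det)   # ['hai', 'mazohn', 'mwe', 'tye']
```

The AUX/VERB overlap cannot be checked the same way from this page alone, because the full list of VERB lemmas is not given here.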
Nominal Features
- Plur
- NOUN: trahnpha, baneopha, neopha, thathokpha, Duhpha, Katrahnpha, Offerpha, Russianla, bikpha, chokeypha
- NOUN-Vnoun: runghapha, dhohnphala
- PRON: hai, bu, Bunan, Na, banan, haidi, hainan, Budi, buva, haibyahn
- PROPN: Rosenbergwidi
- Sing
- ADJ: canned
- NOUN: miszu, neo, baneo, insurance, leneo, rhui, puirawi, Szukhyate, bya, sir
- NOUN-Vnoun: runge, theyikke, phehnne, rungedi, Tramahnne, huppe, kathamme, laithiedi, matrahnne, nungae
- PRON: a, nahn, ba, bava, ava, Anan, badi, didi, adi, nahndi
- PROPN: Peter, Mariadi, Jim, Maria, Peternan, Donovan, Mary, Bar, Bob, Doug
- Abl
- ADJ: mayingeda
- NOUN: lehnda, thungda, rhuida, anohnda, kumda, makhwada, Ada, actsda, blanketda, capitalda
- NOUN-Vnoun: rueda, ruweda
- NUM: 1941da
- PRON: haida, Mwetheda
- PROPN: Nurembergda, Parisda
- Ben
- ADJ: kaphebyahn
- NOUN: governmentbyahn, lhaibyahn, rungebyahn, yurbyahn
- NOUN-Vnoun: rungebyahn
- PRON: babyahn, haibyahn, nahnbyahn
- PROPN: Abelbyahn
- Dat
- ADJ: puidungrela
- NOUN: spyla, baszuela, dhohnphala, neola, neolewila, neophala, puirawila, wizala
- NOUN-Vnoun: dhohnphala
- PRON: Ala, nahnla, Thuzala
- PROPN: Mariala, Alisonla, Unionla
- DatErg
- PRON: Alanan
- Erg
- NOUN: leneonan, neonan, Nahnnan, baneonan, miszunan, Ainnan, Anan, Huinan, Hwehnnan, Thamoknan
- NUM: skanan
- PRON: Anan, nahnnan, Bunan, banan, mazohnnan, thuzanan, hainan, didinan
- PROPN: Peternan, Associationnan, Committeenan, Jessicanan, Marianan, Natalienan
- ErgTop
- NOUN: baneonandi, neonandi
- Gen
- NOUN: miszuva, theva, a.m.-va, atomicva, caseva, companyva, courtva, huiva, neobeva, neova
- NUM: skava
- PRON: bava, ava, nahnva, haiva, nava, Nahnvai, buva, haivai
- PROPN: Donovanva, Franceva, Investigationva, Peterva
- GenAbl
- NOUN: baswevada, clientvada, thaivada
- PRON: nahnvada
- PROPN: Petervada
- GenTop
- NOUN: suivadi
- PRON: Avadi, nahnvadi
- Loc
- ADJ: criminalnahn
- ADP: athoongenahn, mathaknahn, rinahn, thrinahn
- NOUN: clientnahn, rhuinahn, gaenahn, maganahn, maramnahn, marketnahn, Airportnahn, Biknahn, CIAnahn, Dukannahn
- NOUN-Vnoun: phethenahn, wienahn
- NUM: phangenahn, 19nahn
- PRON: anahn, Nahnnahn, hainahn, nahn, nahnla, thuzanahn
- PROPN: Earlnahn, CIAnahn, Jasonnahn, Marianahn, Predentialnahn, Unionnahn
- LocTop
- NOUN: desknahndi, rungenahndi
- NOUN-Vnoun: rungenahndi
- NUM: phangenahndi
- Top
- ADJ: kathadi, makhadi, mazohndi, yahndi, zazudi
- NOUN: neodi, baneodi, kelasdi, rhuidi, miszudi, rungedi, badi, jehndi, letterdi, lhaidi
- NOUN-Vnoun: rungedi, laithiedi
- NUM: skadi
- PRON: badi, adi, nahndi, thuzadi, haidi, Budi, Chathadi, mazohndi
- PROPN: Mariadi, Abeldi, Browndi, Iguazudi, Lynndi, Peterdi, Rosenbergwidi, Shinndi, Smithdi, Streetdi
- SCONJ: didi
Degree and Polarity
- Cmp
- ADV: szumahnnan
- VERB-Fin: szumahnle
- VERB-Inf: szumahn
- Pos
- ADJ: Szuate, katha, next, thokke, American, Atra, criminal, drungle, gaswe, gaswwe
- VERB-Conv: nungaikhama
- VERB-Inf: phehnszuma
- Neg
- ADJ: nungaima, rungma
- ADV: prye, rema, chaszuma
- ADV-Inf: chaszuma
- AUX: gama, lama, lalama
- AUX-Fin: lalama
- AUX-Inf: lama
- INTJ: garhe, ma, me
- NOUN: miszuma
- NUM: Phangema
- PART: garhe, rema, khama, garhema
- SCONJ: dime
- VERB-Conv: ngammaganan, nungaikhama, rehama, szammaganan, thengammaganan, yinglamaganan
- VERB-Fin: themate, katomszumale, keikapmate, mumate, nungaimate, rumale, theate, thekhamate, thelate, thema
- VERB-Inf: phabtama, thaima, thema, lama, thaithema, muma, rema, rungma, Kajahnma, Thekhamzama
- Pos
- INTJ: ame, ay
Verbal Features
- Imp
- ADP: thai, zwehnne
- ADP-Fin: zwehnne
- ADV-Fin: lhia
- AUX: lale, rahn, laha, laia
- AUX-Fin: lale, laha, laia
- NOUN: runge, theyikke, rungedi, phehnne, runghapha, Tramahnne, dhohnphala, huppe, kathamme, laithiedi
- NOUN-Vnoun: runge, theyikke, phehnne, rungedi, runghapha, Tramahnne, dhohnphala, huppe, kathamme, laithiedi
- PART-Fin: remale
- VERB-Fin: nue, reha, wile, kanale, manungle, rae, rue, runge, rungha, thaile
- Perf
- ADP-Fin: dhohnte
- AUX: rahnte, lalate, late
- AUX-Fin: lalate, late
- NOUN-Vnoun: szumahnne
- PART: rehate
- VERB-Conv: nungaikhama, rehama
- VERB-Fin: dhohnte, themate, theate, mathilate, mathunglate, myate, puiate, ruate, sahnte, thate
- Prog
- VERB-Conv: rungganan, theganan, yoanganan, bahnganan, malungphzganan, nyamganan, phanungganan, ruganan, thyiganan, Nueganan
- Des
- AUX: tha
- Hort
- AUX: ga, diga, raga
- Imp
- AUX: ra
- VERB-Fin: mia, shamma
- Ind
- ADP-Fin: dhohnte, zwehnne
- ADV-Fin: lhia
- AUX-Fin: lale, laha, lalate, laia, late
- PART-Fin: remale
- VERB-Fin: dhohnte, nue, reha, wile, kanale, manungle, rae, rue, runge, rungha
- Int
- AUX: la, dila, lala, dima, lalama
- AUX-Fin: lala, lalama
- Irr
- AUX: rahn, rahnte
- Jus
- AUX: dai, gama
- Past
- ADP: dhohnte, thai, zwehnne
- ADP-Fin: dhohnte, zwehnne
- ADV-Fin: lhia
- AUX-Fin: lalate, laia, late
- VERB-Fin: dhohnte, nue, rae, rue, runge, themate, thea, theate, bahnne, makhe
- Pqp
- AUX-Fin: laha
- NOUN-Vnoun: runghapha
- PART: reha, rehate
- VERB-Fin: reha, rungha, choklaha, chaha, jaazakha, kaprahate, khayeaha, khurumha, lahate, lailehnmaha
- Pres
- AUX-Fin: lale
- PART-Fin: remale
- VERB-Fin: wile, kanale, manungle, thaile, chule, matrehnle, phabtale, rele, thele, woanle
- Fh
- AUX: gu
- Nfh
- AUX: ga
Pronouns, Determiners, Quantifiers
- Dem
- ADV: hano, dino
- DET: tye, hai
- PRON: didi, tye, hadi, Avadi, Haida, didinan, nahnvadi
- Ind
- DET: rigatratrahn, za
- PRON: Chathadi
- Int
- ADV: gare, mwetheda, Reda, kanahn, kukunahn, zapzare
- DET: Gare, mwe
- PRON: mwe, thuzadi, thuzanan, thuza, Mwetheda, Thuzala, thuzanahn
- Prs
- PRON: a, nahn, ba, bava, ava, Anan, badi, hai, adi, nahndi
- Tot
- DET: mazohn
- PRON: mazohn, mazohnnan, mazohndi
- Card
- NUM: ska, phange, phangenahn, $100,000, 19, 1941da, 19nahn, 3, 9:00, Phangema
- 1
- PRON: a, ava, Anan, hai, adi, bava, anahn, haidi, Ala, banan
- 2
- PRON: nahn, nahndi, nahnnan, nahnla, Na, nahnbyahn, nahnva, Nahnnahn, Nahnvai, nahnvada
- 3
- PRON: ba, bava, badi, bu, Bunan, babyahn, nahnva, Banan, Budi, buva
Other Features
- Abbr
- Yes
- NOUN: CIAnahn
- PROPN: CIAnahn
- Deixis
- Prox
- ADV: hano
- DET: hai
- PRON: hadi, Haida
- Remt
- ADV: dino
- DET: tye
- PRON: didi, tye, didinan
- ExtPos
- ADV
- NUM: ska
- VERB
- INTJ: ay
- Foreign
- Yes
- X: ,, Cowan, Donovan, Visitors, Watters, and
- Modal
- Abil
- NOUN-Vnoun: themingame, thengamme
- VERB-Inf: rungamma, woanngam
- Obl
- AUX: geraha
- Perm
- ADV-Inf: chaszuma
- VERB-Inf: kapralaszuma, phehnszuma
- NumForm
- Digit
- NUM: 19, 1941da, 19nahn
- Word
- NUM: ska, phange, phangenahn, $100,000, 3, 9:00, khanika, lhohnphange, phangenahndi, skadi
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: e.
- This corpus uses 12 lemmas as auxiliaries (aux). Examples: la, rahn, dai, ga, geraha, gu, dila, ra, tha, diga, dima, raga.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Conv--NOUN (7)
- VERB-Conv--NOUN-Erg (3)
- VERB-Conv--NOUN-Top (3)
- VERB-Conv--PRON (9)
- VERB-Conv--PRON-Erg (3)
- VERB-Conv--PRON-Top (1)
- VERB-Fin--NOUN (59)
- VERB-Fin--NOUN-Erg (19)
- VERB-Fin--NOUN-ErgTop (2)
- VERB-Fin--NOUN-Top (16)
- VERB-Fin--PRON (36)
- VERB-Fin--PRON-Erg (9)
- VERB-Fin--PRON-Gen (1)
- VERB-Fin--PRON-Top (3)
- VERB-Inf--NOUN (12)
- VERB-Inf--NOUN-Erg (3)
- VERB-Inf--NOUN-Top (1)
- VERB-Inf--PRON (43)
- VERB-Inf--PRON-Erg (8)
- VERB-Inf--PRON-Top (5)
- obj
- VERB-Conv--NOUN (14)
- VERB-Conv--NOUN-Top (6)
- VERB-Conv--PRON (2)
- VERB-Conv--PRON-Top (1)
- VERB-Fin--NOUN (75)
- VERB-Fin--NOUN-Abl (1)
- VERB-Fin--NOUN-Top (56)
- VERB-Fin--PRON (6)
- VERB-Fin--PRON-Erg (3)
- VERB-Fin--PRON-Top (2)
- VERB-Inf--NOUN (32)
- VERB-Inf--NOUN-ADP(re) (1)
- VERB-Inf--NOUN-ADP(thri) (1)
- VERB-Inf--NOUN-Abl (1)
- VERB-Inf--NOUN-Top (17)
- VERB-Inf--PRON (7)
- VERB-Inf--PRON-GenTop (2)
- VERB-Inf--PRON-Top (8)
- iobj
- VERB-Fin--NOUN (3)
- VERB-Fin--NOUN-Dat (5)
- VERB-Fin--NOUN-Loc (1)
- VERB-Fin--PRON-Dat (1)
- VERB-Fin--PRON-Top (1)
- VERB-Inf--NOUN (1)
- VERB-Inf--NOUN-Dat (3)
- VERB-Inf--PRON-Dat (3)
- VERB-Inf--PRON-Loc (1)
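Pattern labels such as `VERB-Fin--NOUN-Erg` combine the parent's VerbForm with the child's UPOS and Case. The sketch below shows how a tally of this kind could be computed for one sentence. It is an approximation under stated assumptions, not the procedure used to generate the counts above: it matches only the exact relations nsubj, obj and iobj, ignores adposition-marked children such as `NOUN-ADP(re)`, and uses the raw `Case` value as the label suffix. The function names and the toy rows (placeholder forms, not treebank data) are hypothetical.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Case=Erg|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def argument_patterns(rows):
    """Tally (relation, 'PARENT--CHILD') patterns for verb-headed core arguments.

    `rows` holds the 10-column word lines of a single sentence.
    """
    by_id = {r[0]: r for r in rows}
    tally = Counter()
    for r in rows:
        upos, feats, head, deprel = r[3], parse_feats(r[5]), r[6], r[7]
        if deprel not in {"nsubj", "obj", "iobj"} or upos not in {"NOUN", "PRON"}:
            continue
        parent = by_id.get(head)
        if parent is None or parent[3] != "VERB":
            continue
        pfeats = parse_feats(parent[5])
        parent_label = "VERB" + ("-" + pfeats["VerbForm"] if "VerbForm" in pfeats else "")
        child_label = upos + ("-" + feats["Case"] if "Case" in feats else "")
        tally[(deprel, parent_label + "--" + child_label)] += 1
    return tally


# Toy sentence with placeholder forms (NOT taken from the treebank).
TOY_ROWS = [
    ["1", "w1", "w1", "NOUN", "_", "Case=Erg", "3", "nsubj", "_", "_"],
    ["2", "w2", "w2", "NOUN", "_", "_", "3", "obj", "_", "_"],
    ["3", "w3", "w3", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"],
]

for (rel, pattern), n in sorted(argument_patterns(TOY_ROWS).items()):
    print(rel, pattern, n)
```

Summing such per-sentence counters over a whole file would give totals in the same shape as the list above.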
Relations Overview
- This corpus uses 10 relation subtypes: acl:relcl, advmod:emph, compound:prt, compound:svc, csubj:outer, flat:foreign, flat:name, nmod:poss, nsubj:outer, nsubj:pass
- The following 6 relation types are not used in this corpus at all: expl, dislocated, clf, list, goeswith, dep
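Both figures in this overview follow from the relation inventory listed under Relations. As a small consistency check (variable names are illustrative), the subtypes are the labels containing a colon, and the unused types are the 37 universal UD relations minus the base relations that occur:

```python
# The 37 universal dependency relations defined by UD.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed for this corpus in the Relations section above.
used = """
acl acl:relcl advcl advmod advmod:emph amod appos aux case cc ccomp compound
compound:prt compound:svc conj cop csubj csubj:outer det discourse fixed flat
flat:foreign flat:name iobj mark nmod nmod:poss nsubj nsubj:outer nsubj:pass
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split()

subtypes = sorted(r for r in used if ":" in r)   # language-specific subtypes
base_used = {r.split(":")[0] for r in used}      # universal part of each label
unused = sorted(UNIVERSAL - base_used)

print(len(subtypes), subtypes)
print(len(unused), unused)
```

This gives 10 subtypes and the 6 unused relation types named above.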