This page pertains to UD version 2.

UD for Nepali

Tokenization and Word Segmentation

Morphology

Tags

The Nepali UD treebank uses the following universal POS tags:

The tags SCONJ and SYM do not occur in the current data.
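
A statement like the one above can be checked against a CoNLL-U file by counting the UPOS column. The following is a minimal sketch, not a tool shipped with the treebank: the function names `upos_counts` and `unused_tags` are hypothetical, and the sample sentence is a toy built for illustration rather than a sentence taken from the data.

```python
from collections import Counter

# The 17 universal POS tags defined by UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def upos_counts(conllu_text):
    """Count the UPOS column (4th field) over all regular token lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        fields = line.split("\t")
        if not fields[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        counts[fields[3]] += 1
    return counts


def unused_tags(counts):
    """Return the universal tags that never occur in the counted data."""
    return sorted(UPOS_TAGS - set(counts))


# Hypothetical toy sentence, not taken from the treebank.
ROWS = [
    ("1", "मान्छेले", "NOUN", "3", "nsubj"),
    ("2", "काम", "NOUN", "3", "obj"),
    ("3", "गर्छन्", "VERB", "0", "root"),
    ("4", "।", "PUNCT", "3", "punct"),
]
SAMPLE = "\n".join(
    "\t".join([i, form, "_", upos, "_", "_", head, rel, "_", "_"])
    for i, form, upos, head, rel in ROWS
)

counts = upos_counts(SAMPLE)
print(counts)
print(unused_tags(counts))
```

Run over the full treebank file instead of `SAMPLE`, the second call would be expected to report SCONJ and SYM if the statement above still holds for the data.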

Tag usage

AUX vs. VERB

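The distinction between AUX and VERB is based on grammatical function: a form that serves as an auxiliary or copula (attached with aux or cop) is tagged AUX, while a form that serves as the lexical predicate is tagged VERB.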

DET vs. PRON

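The distinction between DET and PRON is syntactic: a form that modifies a noun is tagged DET, while a form that stands on its own as a nominal is tagged PRON.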

Deverbal forms

Participial and converbal forms are normally tagged as VERB when they preserve verbal syntax.

Features

The current Nepali treebank uses the following morphological features:

Only features attested in the current data are listed here.

Nominal and pronominal features

| Value | Meaning | Examples |
|---|---|---|
| Nom | nominative | कर्तव्य, खुशी |
| Acc | accusative/direct object, often morphologically unmarked | काम, घ्यू, कर्तव्य, कर्म, सोच (could be marked with लाई in certain situations) |
| Dat | dative | मानिसलाई, कसैलाई |
| Gen | genitive | कर्तव्यको, हाम्रो, उसको |
| Loc | locative | ग्रन्थमा, जंगलमा, गोठमा |
| Abl | ablative/source/comparative | अधिकारभन्दा, धनबाट |
| Ins | instrumental | विचारले, तरिकाले |
| Erg | ergative/instrumental subject marking | मान्छेले, चिन्तकले |
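
The examples in the table suggest a rough mapping from surface markers to candidate values. The sketch below is a heuristic for illustration only and is not how the treebank was annotated: `SUFFIX_CASES` and `candidate_cases` are hypothetical names, the suffix list is read off the examples above, and fused forms such as हाम्रो or nouns that merely happen to end in one of these strings are not handled.

```python
# Suffix -> candidate values, based only on the examples in the table above.
# Longer suffixes come first so that they are tried before shorter ones.
SUFFIX_CASES = [
    ("भन्दा", {"Abl"}),
    ("लाई", {"Dat", "Acc"}),  # लाई may also mark a direct object
    ("बाट", {"Abl"}),
    ("को", {"Gen"}),
    ("मा", {"Loc"}),
    ("ले", {"Ins", "Erg"}),   # the same marker is listed under both values
]


def candidate_cases(form):
    """Return the set of values compatible with the final marker of `form`."""
    for suffix, cases in SUFFIX_CASES:
        # Require some stem material before the marker.
        if form.endswith(suffix) and len(form) > len(suffix):
            return set(cases)
    # No marker found: unmarked forms are nominative or (unmarked) accusative.
    return {"Nom", "Acc"}


for form in ["मानिसलाई", "कर्तव्यको", "जंगलमा", "धनबाट", "विचारले", "काम"]:
    print(form, sorted(candidate_cases(form)))
```

The function returns a set rather than a single value because the surface form alone cannot separate Ins from Erg or Nom from unmarked Acc; that choice depends on the syntactic context.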

Verbal features

| Value | Meaning | Examples |
|---|---|---|
| Fin | finite verb | गर्छन्, हुन्छ |
| Part | participle | गरेको, गरिएको, भएको |
| Conv | converb | फर्केर, आएर, खर्चेर |
| Inf | infinitive | गर्न, चराउन |
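
In CoNLL-U files, values like these appear in the FEATS column as `Name=Value` pairs joined by `|`. A minimal sketch of reading that column follows; it assumes the feature names `Case` and `VerbForm` from the UD inventory for the two tables above, and the helper names `parse_feats` and `has_value` are hypothetical.

```python
def parse_feats(feats):
    """Parse a FEATS string such as 'VerbForm=Conv' into a dict.

    An empty column is written as '_' in CoNLL-U and yields an empty dict.
    """
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def has_value(feats, name, value):
    """True if feature `name` includes `value` (values may be comma-joined)."""
    return value in parse_feats(feats).get(name, "").split(",")


print(parse_feats("VerbForm=Conv"))
print(has_value("Case=Loc", "Case", "Loc"))
print(has_value("_", "VerbForm", "Fin"))
```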

Other features

Syntax

Basic clause structure

Nepali is a head-final language. The normal constituent order is SOV, and the finite verbal predicate or verbal complex normally occurs at the end of the clause.
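
Head-finality can be made measurable on annotated data: in a head-final clause most dependents precede their heads, and the root is the last non-punctuation token. The sketch below illustrates this on a schematic SOV clause; the function names and the toy token list are hypothetical, and punctuation is excluded by assumption because it normally follows the predicate.

```python
def head_final_ratio(tokens):
    """Share of non-root, non-punctuation dependents that precede their head.

    tokens: list of (id, head, deprel) tuples for one sentence.
    """
    pairs = [(i, head) for i, head, rel in tokens if head != 0 and rel != "punct"]
    if not pairs:
        return 0.0
    return sum(1 for i, head in pairs if i < head) / len(pairs)


def root_is_clause_final(tokens):
    """True if the last non-punctuation token is the root."""
    non_punct = [(i, head) for i, head, rel in tokens if rel != "punct"]
    return bool(non_punct) and non_punct[-1][1] == 0


# Schematic SOV clause: subject, object, verb, final punctuation (toy data).
TOY = [(1, 3, "nsubj"), (2, 3, "obj"), (3, 0, "root"), (4, 3, "punct")]

print(head_final_ratio(TOY))
print(root_is_clause_final(TOY))
```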

Core arguments

Subjects (nsubj)

Canonical subjects are annotated as nsubj. They may be unmarked nominative NPs or case-marked NPs, such as ergative subjects marked with ले (e.g. मान्छेले).

Direct objects (obj)

Direct objects are annotated as obj when they are the core patient/theme argument of a lexical verb.

Indirect objects (iobj)

Dative or recipient-like arguments are annotated as iobj when they are core arguments of the predicate.

Obliques (obl)

Non-core arguments are annotated as obl. These include locative, ablative, instrumental, temporal, source, manner and other adverbial nominals.
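
The split between core arguments and obliques described in this section can be sketched as a small grouping function. This is an illustration under stated assumptions: `argument_frame` is a hypothetical helper, only the relations named in this section are considered, and the toy sentence is assembled from forms listed on this page rather than taken from the treebank.

```python
CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def argument_frame(tokens, predicate_id):
    """Split the dependents of one predicate into core arguments and obliques.

    tokens: list of (id, form, head, deprel) tuples for one sentence.
    """
    frame = {"core": [], "oblique": []}
    for _id, form, head, deprel in tokens:
        if head != predicate_id:
            continue
        base = deprel.split(":")[0]  # ignore any relation subtype
        if base in CORE_RELATIONS:
            frame["core"].append((base, form))
        elif base == "obl":
            frame["oblique"].append((base, form))
    return frame


# Hypothetical toy sentence, not taken from the treebank.
TOY = [
    (1, "मान्छेले", 4, "nsubj"),
    (2, "जंगलमा", 4, "obl"),
    (3, "काम", 4, "obj"),
    (4, "गर्छन्", 0, "root"),
]

frame = argument_frame(TOY, predicate_id=4)
print(frame)
```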

Copula and nonverbal clauses

In nonverbal clauses with nominal or adjectival predicates, the lexical predicate is the syntactic head and the copular auxiliary is attached with cop.

In verbal predicates, auxiliary forms such as छन् attach to the lexical verb with aux.
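
The two attachment patterns above can be told apart mechanically from the relation on the AUX token. The sketch below uses schematic token lists without word forms; the function names are hypothetical, and the ADJ and VERB tags on the roots are illustrative choices.

```python
def is_copular_clause(tokens):
    """True if the root of the sentence has a dependent attached with cop.

    tokens: list of (id, upos, head, deprel) tuples for one sentence.
    """
    root_id = next(i for i, _upos, head, _rel in tokens if head == 0)
    return any(head == root_id and rel == "cop" for _i, _u, head, rel in tokens)


def auxiliaries_of(tokens, predicate_id):
    """Ids of tokens attached to the given predicate with aux."""
    return [i for i, _u, head, rel in tokens if head == predicate_id and rel == "aux"]


# Schematic nonverbal clause: the adjectival predicate is the root (toy data).
NONVERBAL = [(1, "NOUN", 2, "nsubj"), (2, "ADJ", 0, "root"), (3, "AUX", 2, "cop")]
# Schematic verbal clause: the lexical verb is the root (toy data).
VERBAL = [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "AUX", 2, "aux")]

print(is_copular_clause(NONVERBAL), auxiliaries_of(NONVERBAL, 2))
print(is_copular_clause(VERBAL), auxiliaries_of(VERBAL, 2))
```

In both cases the AUX token is a dependent, never the head, which is the property the two paragraphs above describe.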

Relational constructions

Participial and relative clause modifiers

Participial clauses modifying a noun are attached with acl.

Converbs and clause chaining

Converb forms are annotated as adverbial clauses with advcl.
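
Taken together, the rules for participial modifiers and converbs can be written as a lookup from verb form and head category to the expected relation. This is a simplified sketch: `expected_clause_relation` is a hypothetical helper, the `VerbForm` values are those listed under the verbal features, treating PROPN heads like NOUN heads is an assumption, and participles in other functions (for example inside a verbal complex) are deliberately left undecided.

```python
def expected_clause_relation(verbform, head_upos):
    """Return the relation suggested by the rules above, or None if none applies."""
    if verbform == "Conv":
        return "advcl"  # converbs are annotated as adverbial clauses
    if verbform == "Part" and head_upos in {"NOUN", "PROPN"}:  # PROPN is an assumption
        return "acl"    # participial clause modifying a noun
    return None         # no rule stated for this combination


print(expected_clause_relation("Conv", "VERB"))
print(expected_clause_relation("Part", "NOUN"))
print(expected_clause_relation("Part", "VERB"))
```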

Coordination and parataxis

Coordinating conjunctions such as वा and अनि are annotated as cc, and the non-initial conjunct is attached with conj.

Loosely sequenced main clauses, especially in narrative passages, may be connected with parataxis when they are not clear cases of coordination or subordination.
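
For the coordination pattern described in this section, conjuncts can be recovered by grouping conj dependents under their head. The sketch below assumes the general UD v2 convention that every non-initial conjunct attaches to the first conjunct and that cc attaches to the conjunct that follows it; `conjunct_groups` and the schematic token list are hypothetical.

```python
from collections import defaultdict


def conjunct_groups(tokens):
    """Map each first conjunct to the ordered ids of all conjuncts in its group.

    tokens: list of (id, head, deprel) tuples for one sentence.
    """
    groups = defaultdict(list)
    for i, head, rel in tokens:
        if rel.split(":")[0] == "conj":
            groups[head].append(i)
    return {head: [head] + sorted(ids) for head, ids in groups.items()}


# Schematic three-way coordination "X cc Y cc Z" (toy data, no word forms).
TOY = [(1, 0, "root"), (2, 3, "cc"), (3, 1, "conj"), (4, 5, "cc"), (5, 1, "conj")]

print(conjunct_groups(TOY))
```

Clauses linked with parataxis are not grouped by this function, since they attach with a different relation.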

Discourse and special relations

Discourse particles such as नै, रे, केरे, चाहिँ, and पनि are attached with discourse.

The following subtyped relations occur in the current Nepali data:

Other important relations used in the data include advcl, advmod, amod, appos, aux, case, cc, ccomp, compound, conj, cop, dep, det, discourse, dislocated, iobj, nmod, nsubj, nummod, obj, obl, parataxis, punct, reparandum, root, and xcomp.
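
The relation list above can be turned into a quick inventory check. The sketch below is illustrative: `LISTED_RELATIONS` copies the list from the preceding paragraph, `unlisted_relations` is a hypothetical helper, and `obl:tmod` appears only as an example of subtype syntax, not as a claim about the Nepali data. Because the paragraph says the relations "include" these labels, a label reported by the function is not necessarily an error; acl, described earlier on this page, is one such case.

```python
LISTED_RELATIONS = {
    "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "compound", "conj", "cop", "dep", "det", "discourse", "dislocated",
    "iobj", "nmod", "nsubj", "nummod", "obj", "obl", "parataxis", "punct",
    "reparandum", "root", "xcomp",
}


def unlisted_relations(deprels):
    """Return base relations (subtype stripped) missing from the list, in order seen."""
    seen = []
    for rel in deprels:
        base = rel.split(":")[0]
        if base not in LISTED_RELATIONS and base not in seen:
            seen.append(base)
    return seen


print(unlisted_relations(["nsubj", "obl:tmod", "acl", "nsubj", "acl"]))
```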

Treebanks

There is 1 Nepali UD treebank: