
This page pertains to UD version 2.

UD for Gujarati

Gujarati is an Indo-Aryan language originating in the western Indian state of Gujarat. It has over 56 million speakers and is one of the 22 languages with official status in India. Nevertheless, the Gujarati computational linguistics community is still in its infancy. Earlier literature places Gujarati in the “Scraping-Bys” category (category 1) of its language taxonomy, which indicates scant availability of labeled datasets.

Tokenization and Word Segmentation

Morphology

Gujarati morphology is agglutinative, with rich inflectional and derivational systems. The language has complex verb conjugation, noun declension, and postpositions.

Tags

Features

Syntax

The standard UD dependency relations are used, except for clf, which is not used in Gujarati.
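The claim about clf can be checked mechanically against any CoNLL-U file. The following is a minimal sketch, assuming the standard ten-column, tab-separated CoNLL-U layout with DEPREL in the eighth column. The function names `deprel_counts` and `uses_relation` and the sample sentence are hypothetical and are not taken from the Gujarati treebank.

```python
from collections import Counter


def deprel_counts(conllu_text):
    """Count universal dependency relations (subtypes stripped) in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, deprel = cols[0], cols[7]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token ranges and empty nodes have no basic DEPREL
        # "aux:pass" is counted under its universal relation "aux"
        counts[deprel.split(":")[0]] += 1
    return counts


def uses_relation(conllu_text, relation="clf"):
    """Return True if the given universal relation occurs at least once."""
    return deprel_counts(conllu_text)[relation] > 0


# Hypothetical sample with placeholder word forms, for illustration only.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "1\tw1\tw1\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tw3\tw3\tAUX\t_\t_\t2\taux:pass\t_\t_",
    "",
])

print(uses_relation(SAMPLE))
```

To run the same check on a real treebank, read the file contents into a string and pass it to `uses_relation`; the file path is left to the reader, since none is given here.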

Treebanks

There is 1 Gujarati UD treebank: