This page pertains to UD version 2.

UD for Naija

This UD treebank is built from transcriptions of audio recordings made in 2017 for the ANR project NaijaSyncor. The oral corpus contains occasional code-switching to English and to native Nigerian languages. Sections code-switched to English are annotated following the UD English conventions. The notes below concern the Naija sections.

Tokenization and Word Segmentation

Morphology

Tags

Features

Syntax

The annotation is done in SUD (Surface-Syntactic Universal Dependencies) and converted automatically into UD. SUD uses the same tagset and features as UD but builds a different dependency structure, based on distributional criteria.
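
As a rough illustration of what "same tagset, different structure" can mean, the sketch below compares two hand-written CoNLL-U fragments for the English phrase "in the house". The fragments are hypothetical and are not taken from the Naija treebank; the helper names `read_tokens` and `changed_attachments` are likewise assumptions made for this example. The SUD fragment assumes the adposition heads the noun (relation `comp:obj`), while the UD fragment attaches the adposition to the noun with `case`.

```python
# Hypothetical, hand-written CoNLL-U fragments for "in the house".
# They are NOT taken from the Naija treebank.
SUD_FRAGMENT = """\
1\tin\tin\tADP\t_\t_\t0\troot\t_\t_
2\tthe\tthe\tDET\t_\t_\t3\tdet\t_\t_
3\thouse\thouse\tNOUN\t_\t_\t1\tcomp:obj\t_\t_
"""

UD_FRAGMENT = """\
1\tin\tin\tADP\t_\t_\t3\tcase\t_\t_
2\tthe\tthe\tDET\t_\t_\t3\tdet\t_\t_
3\thouse\thouse\tNOUN\t_\t_\t0\troot\t_\t_
"""


def read_tokens(conllu):
    """Parse CoNLL-U token lines into dicts, skipping comments and blanks."""
    tokens = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes ("1.1").
            continue
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def changed_attachments(sud_tokens, ud_tokens):
    """List (form, SUD relation, UD relation) where head or relation differs."""
    changed = []
    for sud, ud in zip(sud_tokens, ud_tokens):
        if (sud["head"], sud["deprel"]) != (ud["head"], ud["deprel"]):
            changed.append((sud["form"], sud["deprel"], ud["deprel"]))
    return changed


if __name__ == "__main__":
    sud = read_tokens(SUD_FRAGMENT)
    ud = read_tokens(UD_FRAGMENT)
    for form, sud_rel, ud_rel in changed_attachments(sud, ud):
        print(f"{form}: SUD {sud_rel} -> UD {ud_rel}")
```

In this toy example the part-of-speech tags are identical in both fragments and only the attachments of "in" and "house" differ; the determiner keeps the same head and relation.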