
This page pertains to UD version 2.

UD for Naija

This UD treebank is built from transcriptions of audio recordings made in 2017 for the ANR project NaijaSyncor. The oral corpus contains occasional codeswitching to English and to several native Nigerian languages, including Yoruba, Hausa, and Igbo. Sections codeswitched to English are annotated following the UD English conventions. The rest of this page describes the annotation of the Naija sections.

Tokenization and Word Segmentation





The annotation is carried out in SUD (Surface-Syntactic Universal Dependencies) and then converted automatically to UD. SUD uses the same tagset and features as UD, but its dependency structure is different and is based on distributional criteria. For more information, see the SUD guidelines and the Naija SUD page.
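The relation between the two schemes can be illustrated with a minimal Python sketch. It parses two CoNLL-U versions of the same sentence and checks that the UPOS tags are identical while the heads differ. The sentence is an invented English toy example, not taken from the treebank, and the helper names (`make_row`, `parse_conllu_sentence`) are hypothetical. The SUD analysis shown (auxiliary as head, `subj` and `comp:aux` relations) follows the general SUD convention and is not the output of the actual converter.

```python
# Standard 10-column CoNLL-U layout.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def make_row(idx, form, upos, head, deprel):
    """Build one tab-separated CoNLL-U token line (unused columns set to '_')."""
    return "\t".join([str(idx), form, "_", upos, "_", "_",
                      str(head), deprel, "_", "_"])


def parse_conllu_sentence(block):
    """Parse one sentence block into a list of dicts, skipping comment lines."""
    tokens = []
    for line in block.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    return tokens


# Invented toy sentence "She can sleep" (illustration only).
# SUD: the auxiliary heads the clause and governs the lexical verb.
sud_block = "\n".join([
    make_row(1, "She", "PRON", 2, "subj"),
    make_row(2, "can", "AUX", 0, "root"),
    make_row(3, "sleep", "VERB", 2, "comp:aux"),
])
# UD: the lexical verb heads the clause and the auxiliary depends on it.
ud_block = "\n".join([
    make_row(1, "She", "PRON", 3, "nsubj"),
    make_row(2, "can", "AUX", 3, "aux"),
    make_row(3, "sleep", "VERB", 0, "root"),
])

sud = parse_conllu_sentence(sud_block)
ud = parse_conllu_sentence(ud_block)

# Same tagset: the UPOS sequence is unchanged by the conversion.
same_upos = [t["upos"] for t in sud] == [t["upos"] for t in ud]

# Different structure: collect tokens whose head differs between the schemes.
differing = [(s["form"], s["head"], u["head"])
             for s, u in zip(sud, ud) if s["head"] != u["head"]]

print("UPOS identical:", same_upos)
for form, sud_head, ud_head in differing:
    print(f"{form}: SUD head={sud_head}, UD head={ud_head}")
```

In this toy case every token changes head, because SUD treats the auxiliary as the head of the clause whereas UD attaches it to the lexical verb.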

In UD data: