home edit page issue tracker

This page pertains to UD version 2.

UD Hebrew IAHLTwiki

Language: Hebrew (code: he)
Family: Afro-Asiatic

This treebank has been part of Universal Dependencies since the UD v2.10 release.

The following people have contributed to making this treebank part of UD: Amir Zeldes, Avner Algom, Noam Ordan, Yifat Ben Moshe, Shira Wigderson.

Repository: UD_Hebrew-IAHLTwiki
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: wiki

Questions, comments? General annotation questions (either Hebrew-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [amir • zeldes (æt) georgetown • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

Publicly available subset of the IAHLT UD Hebrew Treebank’s Wikipedia section (https://www.iahlt.org/)

The UD Hebrew-IAHLTWiki treebank consists of 5,000 contemporary Hebrew sentences representing a variety of texts originating from Wikipedia entries, compiled by the Israeli Association of Human Language Technology. It includes various text domains, such as: biography, law, finance, health, places, events and miscellaneous. The schema for the UD Hebrew-IAHLT treebank, from which the publicly available UD Hebrew-IAHLTWiki subset is derived, is based on the conversion of the Hebrew Treebank (HTB) into the latest UD V2 and is checked against the Universal Dependencies validator as of UD release V2.10, in addition to a range of additional validations using the grewv tool.

Compatible datasets

The HTB version used in the project was initially converted automatically, then a subset of the converted data was manually validated and adopted as a gold standard for training the model for UD parsing used in Hebrew-IAHLT. The entire parsed data has been manually edited to correct parsing errors, and was automatically QA’ed to apply corrections following updates in the schema. For a fork of UD_Hebrew-HTB (Ha’aretz newswire data) using the same annotation scheme, see:

https://github.com/IAHLT/UD_Hebrew

For an additional UD_Hebrew corpus with the same annotation scheme (spoken parliament proceedings), see:

https://github.com/UniversalDependencies/UD_Hebrew-IAHLTknesset

NER annotations

The data additionally contains nested Named Entity annotations in the IAHLT scheme in the MISC annotation Entity=, illustrated in the following excerpt:

## Acknowledgments

We would like to thank all the people who contributed to this corpus: Amir Zeldes, Hilla Merhav, Israel Landau, Netanel Dahan, Nick Howell, Noam Ordan, Omer Strass, Shira Wigderson, Yael Minerbi, Yifat Ben Moshe

## References

To cite this dataset please refer to the following paper:

Zeldes, Amir, Nick Howell, Noam Ordan and Yifat Ben Moshe (2022) [A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing](https://arxiv.org/abs/2210.07873). In: *Proceedings of EMNLP 2022*. Abu Dhabi, UAE, 4331-4344.

```bibtex
@InProceedings{ZeldesHowellOrdanBenMoshe2022,
author = {Amir Zeldes and Nick Howell and Noam Ordan and Yifat Ben Moshe},
booktitle = {Proceedings of {EMNLP} 2022},
title = {A Second Wave of {UD} {H}ebrew Treebanking and Cross-Domain Parsing},
year = {2022},
pages = {4331--4344},
address = {Abu Dhabi, UAE},
url = {https://aclanthology.org/2022.emnlp-main.292/},
}

Statistics of UD Hebrew IAHLTwiki

POS Tags

ADJADPADVAUXCCONJDETINTJNOUNNUMPRONPROPNPUNCTSCONJSYMVERBX

Features

AbbrAspectCaseDefiniteForeignGenderHebBinyanMoodNumberNumTypePersonPolarityPossPrefixPronTypeReflexTenseTypoVerbFormVerbTypeVoice

Relations

aclacl:relcladvcladvmodamodapposauxcaseccccompcompoundcompound:affixconjcopcsubjcsubj:outercsubj:passdepdetdiscoursedislocatedexplfixedflatgoeswithlistmarknmodnmod:possnmod:unmarkednsubjnsubj:outernsubj:passnummodobjoblobl:unmarkedorphanparataxispunctreparandumrootvocativexcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Verbs with Reflexive Core Objects

Relations Overview