UD Ligurian GLT
Language: Ligurian (code: lij)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.9 release.
The following people have contributed to making this treebank part of UD: Stefano Lusito, Jean Maillard.
Repository: UD_Ligurian-GLT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: C-UDA 1.0
Genre: nonfiction, fiction, news, wiki, bible, spoken, grammar-examples
Questions, comments? General annotation questions (either Ligurian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [stefano • lusito (æt) uibk • ac • at]. Development of the treebank happens outside the UD repository: if there are bugs, either the original data source or the conversion procedure must be fixed, so do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
The Genoese Ligurian Treebank is a small, manually annotated collection of contemporary Ligurian prose. The focus of the treebank is written Genoese, the koiné variety of Ligurian which is associated with today’s literary, journalistic and academic ligurophone sphere.
This dataset is the first dependency treebank of Ligurian. The materials span several genres and were drawn from a wide range of sources in order to reflect variation in syntax and register.
The largest source of material is the fiction domain, represented by excerpts from three novels by contemporary authors or translators. We also include a news article, a current affairs article, a passage from the Bible, two entries from the Ligurian Wikipedia, a number of example sentences from a grammar book, and the transcript of a short radio broadcast. All these documents make up the test split of the dataset. The training split consists of translations of the sentences from the Cairo CICLing Corpus.
Acknowledgments
We are deeply grateful to the publisher De Ferrari Editore and the editor of the newspaper O Stafî for allowing the computational use of some of their written materials for this treebank.
References
- (citation)
Statistics of UD Ligurian GLT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Definite – Degree – Foreign – Gender – Mood – Number – NumType – Person – Poss – PronType – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – conj – cop – csubj – dep – det – discourse – dislocated – expl – expl:impers – expl:pv – fixed – flat – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 316 sentences, 6568 tokens and 6928 syntactic words.
- This corpus contains 1239 tokens (19%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 26 types of words that contain both letters and punctuation. Examples: l', ch', gh', d', m', s', n', t', comm', v', quand', quell', unn', G.B., bell', cös', dell', franco-belga, int', novant', qd., quattr', quest', sott', trent', tutt'
- This corpus contains 355 multi-word tokens. On average, one multi-word token consists of 2.01 syntactic words.
- There are 80 types of multi-word tokens. Examples: do, da, a-o, di, de, into, a-a, inta, co-o, da-o, pe-o, sciô, a-i, da-a, pe-a, co-a, sciâ, co-e, co-i, pe-i, a-e, da-i, inti, scî, dâse, fâghe, inte, Andemmosene, Levite, Vegnîme, allargâve, ammonîlo, anâlo, ascoacciâme, assettilo, aveighe, aveilo, beivime, contentâve, convertîve, convinsive, da-e, desligâghe, desperdilo, deuviâse, dâghe, dâve, dîve, dîvene, figuite.
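The counts above follow the CoNLL-U distinction between surface tokens and syntactic words: a multi-word token is a range line (for example `1-2`) followed by its component words. The following is a minimal sketch of how such counts could be derived, not the script that generated this page. The `row` helper and the sample are hypothetical; the sample is a toy fragment, not a treebank sentence, and the split of `do` into `de` + `o` is an assumption made for illustration.

```python
def row(token_id, form):
    """Build a 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([token_id, form] + ["_"] * 8)

# Toy fragment assembled for illustration; it is NOT a treebank sentence,
# and splitting "do" into "de" + "o" is an assumption.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "do"),
    row("1", "de"),
    row("2", "o"),
    row("3", "giorno"),
    row("4", "."),
    "",
])

def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "mwt_words": 0}
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line.strip():           # a blank line ends the sentence
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        token_id = line.split("\t")[0]
        if "-" in token_id:            # multi-word token range, e.g. "1-2"
            start, end = (int(part) for part in token_id.split("-"))
            counts["mwts"] += 1
            counts["tokens"] += 1
            counts["mwt_words"] += end - start + 1
            covered_until = end
        elif "." in token_id:          # empty node, neither token nor word
            continue
        else:                          # ordinary syntactic word
            counts["words"] += 1
            if int(token_id) > covered_until:
                counts["tokens"] += 1  # word that is also its own token
    return counts

print(count_units(SAMPLE))
```

Under this definition the reported figures are mutually consistent: 6568 tokens minus 355 multi-word tokens plus their 715 component words gives 6928 syntactic words, 715 / 355 is about 2.01 words per multi-word token, and 1239 / 6568 is about 19%.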
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 1 word type tagged as particle (PART): l'
- This corpus contains 44 lemmas tagged as pronouns (PRON): atro, che, chi, cöse, donde, ghe, guæi, liatri, lo, lê, lô, me, mi, mæximo, ne, niatri, ninte, nisciun, o, pægio, quante, quarchedun, quarcösa, quello, questo, quæ, sciâ, se, seu, sto, tanto, te, teu, ti, tutto, tò, un, uña, ve, voiatri, voscià, voî, vòstro, ô
- This corpus contains 22 lemmas tagged as determiners (DET): atro, che, ciaschedun, mæ, nisciun, nisciuña, nòstro, o, quarche, quello, questo, seu, sto, sò, tanto, tròppo, tutto, tò, un, vòstro, çerto, ò
- Out of the above, 13 lemmas occurred sometimes as PRON and sometimes as DET: atro, che, nisciun, o, quello, questo, seu, sto, tanto, tutto, tò, un, vòstro
- This corpus contains 8 lemmas tagged as auxiliaries (AUX): avei, dovei, poei, savei, stâ, vegnî, voei, ëse
- Out of the above, 8 lemmas occurred sometimes as AUX and sometimes as VERB: avei, dovei, poei, savei, stâ, vegnî, voei, ëse
- There are 4 (de)verbal forms:
  - Fin
    - AUX: é, à, ea, son, ò, en, peu, an, ê, poeiva
    - VERB: é, ea, à, ò, diva, pâ, aiva, an, fa, mia
  - Ger
    - VERB: piggiando
  - Inf
    - AUX: ëse, poei, stâ, avei
    - VERB: fâ, dâ, anâ, parlâ, stâ, vedde, beive, dî, mette, pensâ
  - Part
    - AUX: stæto, dovuo, posciuo, stæti, stæta, vosciuo, avuo, stæte
    - VERB: fæto, dæto, dito, andæto, anæto, misso, fæta, stæto, vegnuo, bevuo
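The PRON/DET overlap reported in the Tags list can be re-derived from the two lemma lists themselves. The sketch below copies the lemmas as printed on this page and intersects them; the variable names are illustrative only.

```python
# Lemma lists copied from the PRON and DET entries of the Tags list.
PRON = set((
    "atro, che, chi, cöse, donde, ghe, guæi, liatri, lo, lê, lô, me, mi, "
    "mæximo, ne, niatri, ninte, nisciun, o, pægio, quante, quarchedun, "
    "quarcösa, quello, questo, quæ, sciâ, se, seu, sto, tanto, te, teu, ti, "
    "tutto, tò, un, uña, ve, voiatri, voscià, voî, vòstro, ô"
).split(", "))

DET = set((
    "atro, che, ciaschedun, mæ, nisciun, nisciuña, nòstro, o, quarche, "
    "quello, questo, seu, sto, sò, tanto, tròppo, tutto, tò, un, vòstro, "
    "çerto, ò"
).split(", "))

# Lemmas that occur with both tags.
both = PRON & DET

print(len(PRON), len(DET), len(both))
print(", ".join(sorted(both)))
```

The same set intersection applies to the AUX/VERB comparison, where all 8 auxiliary lemmas also occur as VERB.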
Nominal Features
- Gender
  - Fem
    - ADJ: antiga, bella, belle, disegnæ, avvoxæ, fresche, lesta, mæxima, primma, Sovietica
    - AUX-Part: stæta, stæte
    - DET: a, e, unna, l', sta, vòstra, quella, tutta, quelle, tutte
    - NOUN: parte, figgia, çittæ, vòtte, cà, avventue, cöse, raxon, arte, coæ
    - NUM: doe
    - PRON: a, â, uña, quella, Sta, atra, l', quelle, vòstra, le
    - PROPN: Zena, Arbâ, Galilea, Milan, Nervi, Foxe, Liguria, Tëxinin, Euröpa, Fransa
    - VERB-Part: fæta, vegnua, Interpretâ, ambientâ, anæta, averta, basâ, cegâ, compreisa, consegnâ
  - Masc
    - ADJ: bello, antigo, cao, mæximo, San, Basso, neuvo, piccin, seguo, Santo
    - AUX-Part: stæto, dovuo, posciuo, stæti, vosciuo, avuo
    - DET: o, i, un, l', quello, quelli, tutto, tutti, sti, sto
    - NOUN: giorno, cheu, gio, paise, òmmo, caxo, inno, mezo, mâ, paixi
    - NUM: doî, un
    - PRON: o, quello, ô, un, l', quelli, lo, tutti, atri, atro
    - PROPN: Gexù, Segnô, Zane, Peter, Scimon, Andria, Aostin, Belgio, Besagno, Françesco
    - VERB-Part: fæto, dæto, dito, andæto, anæto, misso, stæto, vegnuo, bevuo, comensou
- Number
  - Plur
    - ADJ: avvoxæ, belle, disegnæ, fresche, pin, pægi, sossi, abbandonæ, antighi, appægiæ
    - AUX-Fin: son, en, an, sei, aggian, ean, ei, semmo, Emmo, Saian
    - AUX-Part: stæti, stæte
    - DET: i, e, quelli, tutti, quelle, tutte, sti, tante, atre, ste
    - NOUN: vòtte, avventue, cöse, paixi, anni, gente, giorni, pê, stöie, tempi
    - PRON: ve, quelli, voî, tutti, atri, se, v', î, tanti, niatri
    - PROPN: Trilli, Ferrari
    - VERB-Fin: an, van, fan, contëgnan, diei, divan, emmo, stæ, Arrivan, abergan
    - VERB-Part: andæti, sciortii, abbrassæ, afferræ, arrivæ, attrovæ, ciantæ, construti, contestualizzæ, decoræ
  - Sing
    - ADJ: bello, antigo, cao, mæximo, San, Basso, antiga, bella, neuvo, piccin
    - AUX-Fin: é, à, ea, ò, peu, son, ê, poeiva, agge, segge
    - AUX-Part: stæto, dovuo, posciuo, stæta, vosciuo, avuo
    - DET: o, a, un, l', unna, quello, sta, tutto, vòstra, quella
    - NOUN: figgia, giorno, parte, çittæ, cheu, cà, gio, paise, òmmo, caxo
    - PRON: o, a, ti, me, m', quello, ô, lê, mi, l'
    - PROPN: Zena, Gexù, Segnô, Zane, Arbâ, Galilea, Peter, Scimon, Milan, Nervi
    - VERB-Fin: é, ea, à, ò, diva, pâ, aiva, fa, mia, penso
    - VERB-Part: fæto, dæto, dito, andæto, anæto, misso, fæta, stæto, vegnuo, bevuo
- Definite
  - Def
    - DET: o, a, i, e, l'
  - Ind
    - DET: un, unna, unn', Çerti
Degree and Polarity
- Degree
  - Abs
    - ADJ: affeçionatiscimo, braviscima
  - Cmp
    - ADJ: megio, pezo
    - ADV: megio, pezo
Verbal Features
- Mood
  - Cnd
    - AUX-Fin: aviæ, porriæ, saieiva, saiæ, dovieiva, porrieivan, porriësci, saieivan, saiva, saviæ
    - VERB-Fin: aniësci, arriviësci, aviæ, daiæ, faiæ, frustieiva, saviæ, serviæ, vorriæ, öriæ
  - Imp
    - AUX-Fin: stanni
    - VERB-Fin: mia, ammia, danni, vanni, Appægia, Leva, Metti, Monda, Remescia, Sciuscia
  - Ind
    - AUX-Fin: é, à, ea, son, ò, en, peu, an, ê, poeiva
    - VERB-Fin: é, ea, à, ò, diva, pâ, aiva, an, fa, penso
  - Sub
    - AUX-Fin: agge, aggian, segge, aggiæ, avesse, poësci, pòsse, sacce, seggian, vegnisse
    - VERB-Fin: agge, aise, arreste, ceuve, creddesci, fesse, mange, pigge, piggesse, portesse
- Tense
  - Fut
    - AUX-Fin: saià, Saian, aviemo, doviæ, porriò
    - VERB-Fin: avià, diei, pensià, aviei, battezzià, dian, mantegnià, mettià, Çerchiò
  - Imp
    - AUX-Fin: ea, poeiva, ean, saieiva, stava, aiva, aveiva, aveivo, avesse, avieivan
    - VERB-Fin: ea, diva, aiva, anava, arriesciva, arrivava, divan, gïava, mostrava, predicava
  - Past
    - AUX-Part: stæto, dovuo, posciuo, stæti, stæta, vosciuo, avuo, stæte
    - VERB-Part: fæto, dæto, dito, andæto, anæto, misso, fæta, stæto, vegnuo, bevuo
  - Pres
    - AUX-Fin: é, à, son, ò, en, peu, an, ê, sei, agge
    - AUX-Inf: ëse, poei, stâ, avei
    - VERB-Fin: é, à, ò, pâ, an, fa, mia, penso, sa, sò
    - VERB-Ger: piggiando
    - VERB-Inf: fâ, dâ, anâ, parlâ, stâ, vedde, beive, dî, mette, pensâ
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: o, a, i, un, e, l', unna, unn'
  - Dem
    - DET: quello, sta, quelli, quella, quelle, quell', sti, sto, atre, ste
    - PRON: quello, gh', ghe, ne, n', quelli, quella, Sta, quelle, questo
  - Exc
    - DET: che
  - Ind
    - DET: quarche, tante, atra, atro, tanti, atri, i, nisciuña, tròppa, tutti
    - PRON: un, uña, atri, atro, quarcösa, atra, guæi, tanti, mæxima, pægi
  - Int
    - DET: che
    - PRON: chi, cöse, cös'
  - Neg
    - DET: nisciun, nisciuña
    - PRON: ninte, nisciun
  - Prs
    - DET: mæ, seu, vòstra, sò, vòstro, nòstra, nòstro, tò, vòstri, ò
    - PRON: o, a, se, ti, me, m', ghe, ve, ô, lê
  - Rel
    - PRON: che, ch', quæ, chi, quante, donde
  - Tot
    - DET: tutto, tutta, tutte, tutti, ciascheduña
    - PRON: tutti, tutto, tutt', tutte
- NumType
  - Card
    - NUM: eutto, 1929, quaranta, doe, doî, 1746, 1815, 1847, 1874, 1892
  - Ord
    - ADJ: primma, primmo, vinteximo, primme, quarto, segonda
- Poss
  - Yes
    - DET: mæ, seu, vòstra, sò, vòstro, nòstra, nòstro, tò, vòstri, ò
    - PRON: vòstra, seu, teu, tò
- Person
  - 1
    - AUX-Fin: ò, son, agge, ea, semmo, Emmo, aveivo, avesse, aviemo, aviæ
    - PRON: me, m', mi, se, niatri, n', ne
    - VERB-Fin: ò, penso, sò, veuggio, aiva, arriesciva, conoscio, daggo, diggo, emmo
  - 2
    - AUX-Fin: ê, sei, ei, æ, aggiæ, doviæ, peu, porriësci, poësci, stanni
    - PRON: ti, ve, t', te, voî, v'
    - VERB-Fin: mia, ammia, fæ, danni, diei, stæ, vanni, veu, æ, Appægia
  - 3
    - AUX-Fin: é, à, ea, son, en, an, peu, poeiva, aggian, ean
    - PRON: o, a, se, ghe, ô, lê, s', l', â, gh'
    - VERB-Fin: é, ea, à, diva, pâ, an, fa, sa, van, vedde
Other Features
- Foreign
  - Yes
    - X: Aventures, Les, Tintin, de, del, Canto, Italiani, Strade, claire, degli
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: ëse, stâ.
- This corpus uses 8 lemmas as auxiliaries (aux). Examples: avei, ëse, poei, stâ, voei, dovei, savei, vegnî.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (43)
- VERB-Fin--NOUN-ADP(de) (3)
- VERB-Fin--PRON (163)
- VERB-Fin--PRON-ADP(de) (1)
- VERB-Inf--NOUN (2)
- VERB-Inf--PRON (16)
- VERB-Part--NOUN (24)
- VERB-Part--PRON (68)
- obj
- VERB-Fin--NOUN (75)
- VERB-Fin--NOUN-ADP(de) (2)
- VERB-Fin--PRON (66)
- VERB-Fin--PRON-ADP(de) (1)
- VERB-Fin--PRON-ADP(insemme) (1)
- VERB-Ger--NOUN (1)
- VERB-Inf--NOUN (52)
- VERB-Inf--NOUN-ADP(de) (1)
- VERB-Inf--PRON (24)
- VERB-Part--NOUN (25)
- VERB-Part--NOUN-ADP(pe) (1)
- VERB-Part--PRON (25)
- iobj
- VERB-Fin--PRON (52)
- VERB-Inf--PRON (17)
- VERB-Part--PRON (29)
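The pattern labels above combine the parent verb's `VerbForm`, the child's UPOS tag and, where present, an adposition attached to the child as `case`. The sketch below shows one way such counts could be produced from CoNLL-U data. It is an assumption about the procedure, not the script behind this page; the `row` helper is hypothetical, and the sample rows are toy data with placeholder lemmas, not a treebank sentence.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}

def row(token_id, form, lemma, upos, feats, head, deprel):
    """Build a 10-column CoNLL-U line (XPOS, DEPS and MISC left as '_')."""
    return "\t".join([token_id, form, lemma, upos, "_", feats, head, deprel, "_", "_"])

# Toy rows for illustration only: NOT a treebank sentence, lemmas are placeholders.
SAMPLE = "\n".join([
    row("1", "o", "o", "PRON", "_", "2", "nsubj"),
    row("2", "fa", "fa", "VERB", "VerbForm=Fin", "0", "root"),
    row("3", "cöse", "cöse", "NOUN", "_", "2", "obj"),
    row("4", "de", "de", "ADP", "_", "5", "case"),
    row("5", "giorno", "giorno", "NOUN", "_", "2", "obj"),
    "",
])

def sentences(text):
    """Yield each sentence as a list of column lists (plain integer IDs only)."""
    current = []
    for line in text.splitlines() + [""]:
        if not line.strip():
            if current:
                yield current
            current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():      # skip multi-word ranges and empty nodes
                current.append(cols)

def feature(feats, name):
    """Return the value of one feature from a FEATS string, or None."""
    for pair in feats.split("|"):
        key, _, value = pair.partition("=")
        if key == name:
            return value
    return None

def argument_patterns(text):
    """Count (relation, pattern) pairs for verb parents with NOUN/PRON children."""
    counts = Counter()
    for sent in sentences(text):
        by_id = {cols[0]: cols for cols in sent}
        # Lemma of the first ADP attached as `case` to each head ID.
        case_of = {}
        for cols in sent:
            if cols[7] == "case" and cols[3] == "ADP":
                case_of.setdefault(cols[6], cols[2])
        for cols in sent:
            parent = by_id.get(cols[6])
            if cols[7] in CORE and cols[3] in {"NOUN", "PRON"} \
                    and parent is not None and parent[3] == "VERB":
                form = feature(parent[5], "VerbForm") or "?"
                pattern = f"VERB-{form}--{cols[3]}"
                if cols[0] in case_of:
                    pattern += f"-ADP({case_of[cols[0]]})"
                counts[(cols[7], pattern)] += 1
    return counts

for (relation, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```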
Reflexive Verbs
- This corpus contains 75 lemmas that occur at least once with an expl:pv child. Examples: ëse gh', avei gh', avei ghe, mette se, mette s', pensâ ghe, andâ n' se, anâ gh', beive me, dâ se, fâ ghe, mette me, piggiâ se, stâ gh', stâ ne se, veuâ se, accapî s', accattâ m', accòrze s', ammuggiâ s', andemmo ne se, andâ ne, anâ n' ve, anâ ne se, anâ ne te, appensâ gh' ve, arregordâ s', arretiâ s', arrivâ se, arvî s', assomeggiâ gh' se, attento ghe, attrovâ se, avei n', compiaxei me, credde se, destaccâ ve, dovei se, dâ s', dî ne, fermâ me, fiâ ve, fâ m', fâ ne, fâ s', fâ se, gödî me, identificâ s', imaginâ se, imbarcâ s'
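Each example above pairs a head lemma with the forms of its `expl:pv` children. A minimal sketch of how such pairs could be collected from CoNLL-U data follows. The `row` helper is hypothetical, and the two toy rows are modelled on the pairs `mette se` and `pensâ ghe` listed above; they are not treebank sentences.

```python
from collections import defaultdict

def row(token_id, form, lemma, upos, head, deprel):
    """Build a 10-column CoNLL-U line (XPOS, FEATS, DEPS and MISC left as '_')."""
    return "\t".join([token_id, form, lemma, upos, "_", "_", head, deprel, "_", "_"])

# Toy data modelled on the listed pairs "mette se" and "pensâ ghe";
# these are NOT treebank sentences.
SAMPLE = "\n".join([
    row("1", "se", "se", "PRON", "2", "expl:pv"),
    row("2", "mette", "mette", "VERB", "0", "root"),
    "",
    row("1", "ghe", "ghe", "PRON", "2", "expl:pv"),
    row("2", "pensâ", "pensâ", "VERB", "0", "root"),
    "",
])

def expl_pv_lemmas(text):
    """Map each head lemma to the set of forms attached to it as expl:pv."""
    found = defaultdict(set)
    sentence = []
    for line in text.splitlines() + [""]:   # trailing "" flushes the last sentence
        if not line.strip():
            lemma_of = {cols[0]: cols[2] for cols in sentence}
            for cols in sentence:
                if cols[7] == "expl:pv" and cols[6] in lemma_of:
                    found[lemma_of[cols[6]]].add(cols[1])
            sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():           # skip multi-word ranges and empty nodes
                sentence.append(cols)
    return dict(found)

print(expl_pv_lemmas(SAMPLE))
```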
Relations Overview
- This corpus uses 3 relation subtypes: acl:relcl, expl:impers, expl:pv
- The following 5 relation types are not used in this corpus at all: clf, compound, list, goeswith, reparandum
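Both figures in this overview can be checked against the Relations list near the top of the page. The sketch below compares that list with the 37 universal relations of UD v2; the variable names are illustrative only.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relations listed for this treebank in the Relations section of this page.
USED = (
    "acl acl:relcl advcl advmod amod appos aux case cc ccomp conj cop csubj "
    "dep det discourse dislocated expl expl:impers expl:pv fixed flat iobj "
    "mark nmod nsubj nummod obj obl orphan parataxis punct root vocative "
    "xcomp".split()
)

# Subtypes carry a colon; the part before it is the universal relation.
subtypes = sorted(rel for rel in USED if ":" in rel)
unused = sorted(UNIVERSAL - {rel.split(":")[0] for rel in USED})

print(subtypes)  # ['acl:relcl', 'expl:impers', 'expl:pv']
print(unused)    # ['clf', 'compound', 'goeswith', 'list', 'reparandum']
```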