UD Faroese OFT
Language: Faroese (code: fo)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Daniel Zeman, Bjartur Mortensen, Francis Tyers.
Repository: UD_Faroese-OFT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: wiki
Questions, comments? General annotation questions (either Faroese-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ftyers (æt) hse • ru]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually, natively in UD style |
Description
This is a treebank of Faroese based on the Faroese Wikipedia.
The whole Faroese Wikipedia was analysed using Trond Trosterud’s tools for Faroese.[1] We kept only the sentences in which every word was recognised, discarding any sentence that contained an unknown word.
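The filtering step can be pictured roughly as follows. This is only a sketch: the `lexicon` set is a hypothetical stand-in for the morphological analyser, and the second sentence is invented to show a rejected case.

```python
def keep_fully_analysed(sentences, lexicon):
    """Keep only sentences in which every token is known to the lexicon.

    `lexicon` is a hypothetical stand-in for the real analyser: a set of
    lower-cased word forms that would receive at least one analysis.
    """
    return [s for s in sentences if all(tok.lower() in lexicon for tok in s)]


# Toy data for illustration; "xyzzy" plays the role of an unknown word.
lexicon = {"á", "tunguni", "eru", "smáar", "tenn", "."}
sentences = [
    ["Á", "tunguni", "eru", "smáar", "tenn", "."],
    ["Á", "tunguni", "eru", "xyzzy", "."],
]
kept = keep_fully_analysed(sentences, lexicon)
print(len(kept))
```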
The remaining sentences were manually annotated for Universal Dependencies and the morphology and POS tags were converted deterministically using a lookup table. Errors in the original morphology and disambiguation were corrected where found.
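A deterministic lookup-table conversion of this kind might look like the sketch below. The source tag strings and the fallback to `X` are assumptions made for illustration; they are not the actual table used for this treebank.

```python
# Hypothetical source-tag -> (UPOS, FEATS) lookup table.
# The source tag strings are invented for this example.
LOOKUP = {
    "N Msc Sg Nom Indef": ("NOUN", "Case=Nom|Definite=Ind|Gender=Masc|Number=Sing"),
    "V Ind Prs Sg 3": ("VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres"),
    "Pr": ("ADP", "_"),
}


def convert(source_tag):
    """Map one source analysis to (UPOS, FEATS).

    The same input always yields the same output; tags missing from the
    table fall back to ("X", "_") so that gaps are easy to spot.
    """
    return LOOKUP.get(source_tag, ("X", "_"))


print(convert("Pr"))
print(convert("unknown tag"))
```

Because the mapping is a plain dictionary, an error in the conversion is fixed by correcting one table entry and re-running the conversion.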
The treebank contains many copula sentences and very few first- or second-person forms, as can be expected from Wikipedia texts.
[1] http://gtweb.uit.no/cgi-bin/smi/smi.cgi?text=%C3%81+tunguni+eru+sm%C3%A1ar+tenn.&action=analyze&lang=fao&plang=eng
Acknowledgments
The morphology and preliminary disambiguation were done by Trond Trosterud’s finite-state morphology and constraint grammar for Faroese.
If you use this treebank in your work, please cite:
@inproceedings{tyersetal18-faroese,
author = {Francis M. Tyers and Mariya Sheyanova and Alexandra Martynova and Pavel Stepachev and Konstantin Vinogradovsky},
title = {Multi-source synthetic treebank creation for improved cross-lingual dependency parsing},
booktitle = {Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)},
pages = {144--150},
year = 2018
}
Statistics of UD Faroese OFT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Case – Definite – Degree – Gender – Mood – Number – NumType – Person – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – compound – conj – cop – csubj – dep – det – discourse – expl – fixed – flat – iobj – mark – nmod – nmod:poss – nsubj – nsubj:pass – nummod – obj – obl – orphan – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 1208 sentences and 10002 tokens.
- This corpus contains 1567 tokens (16%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 22 types of words that contain both letters and punctuation. Examples: t.d., uml., handils-, á.Kr., ABC-samgonguni, Baden-Württemberg, KT-tænastum, Krýn., NATO-Ráðið, Nakin?, St., Sør-Trøndelag, búskapar-, cand., dr., fíggjar-, km., margarin-, mentanar-, mió., róma-, ídnaðar-
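Counts like the ones above can be recomputed from a CoNLL-U file. The following is a minimal sketch that assumes the standard ten-column CoNLL-U layout and uses a toy sentence written for this example (it is not taken from the treebank).

```python
# Toy CoNLL-U input written for this example (not taken from the treebank).
SAMPLE = (
    "# text = Hann býr í Havn.\n"
    "1\tHann\thann\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbýr\tbúgva\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tí\tí\tADP\t_\t_\t4\tcase\t_\t_\n"
    "4\tHavn\tHavn\tPROPN\t_\t_\t2\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def token_stats(conllu):
    """Return (sentences, tokens, tokens not followed by a space)."""
    sentences = tokens = no_space = 0
    for block in conllu.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep plain integer IDs only; this skips multiword-token ranges
        # (e.g. 1-2) and empty nodes (e.g. 1.1).
        rows = [r for r in rows if r[0].isdigit()]
        if not rows:
            continue
        sentences += 1
        tokens += len(rows)
        no_space += sum("SpaceAfter=No" in r[9].split("|") for r in rows)
    return sentences, tokens, no_space


print(token_stats(SAMPLE))
```

On the full treebank the third number divided by the second would give the share of tokens not followed by a space (1567 / 10002, about 16%).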
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 1 word type tagged as a particle (PART): at
- This corpus contains 20 lemmas tagged as pronouns (PRON): allur, báðir, eg, eingin, hann, hasin, hesin, hon, hvør, ið, mann, nakar, onkur, seg, sum, summur, tað, teir, tú, vit
- This corpus contains 6 lemmas tagged as determiners (DET): allur, annar, ein, mín, summur, sín
- Out of the above, 2 lemmas occurred sometimes as PRON and sometimes as DET: allur, summur
- This corpus contains 6 lemmas tagged as auxiliaries (AUX): hava, kunna, mega, skula, vera, verða
- Out of the above, 4 lemmas occurred sometimes as AUX and sometimes as VERB: hava, skula, vera, verða
- There are 3 (de)verbal forms:
- VerbForm=Inf
- AUX: vera
- VERB: síggja, fáa, gera, koma, byggja, eta, hava, ganga, kasta, styrkja
- VerbForm=Part
- VERB: nevndur, Sameindu, fingin, flettir, fluttur, framdur, gjørdur, hóskandi, kendastur, keyptir
- VerbForm=Sup
- AUX: verið
- VERB: gjørt, kent, lagt, sæst, dyrkað, endurreist, friðað, funnið, havt, loyvt
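The overlaps reported in this list (lemmas tagged sometimes as PRON and sometimes as DET, or sometimes as AUX and sometimes as VERB) amount to a set intersection over (lemma, UPOS) pairs. A small sketch, using a hand-picked subset of the lemmas listed above rather than the full corpus:

```python
from collections import defaultdict


def lemmas_by_upos(pairs):
    """Group lemmas by UPOS tag; `pairs` is an iterable of (lemma, upos)."""
    table = defaultdict(set)
    for lemma, upos in pairs:
        table[upos].add(lemma)
    return table


# Hand-picked subset of the lemmas listed above, for illustration only.
pairs = [
    ("allur", "PRON"), ("allur", "DET"),
    ("summur", "PRON"), ("summur", "DET"),
    ("hann", "PRON"), ("ein", "DET"),
]
table = lemmas_by_upos(pairs)
both = sorted(table["PRON"] & table["DET"])
print(both)
```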
Nominal Features
- Gender=Fem
- ADJ: størsta, fleiri, nógvar, stór, turr, aðrar, føroysku, nógv, onnur, somu
- DET: ein, eina, øll, Allar, eini, sína, Summi, allari, ei, einari
- NOUN: kommuna, kommunur, kommunu, ár, oyggin, oynni, øld, bygdini, kommununi, ferðavinna
- NUM: ein, tvær, trimum, tríggjar
- PRON: hon, henni, hennara, hana, onga, tær
- PROPN: Føroyum, Føroya, Føroyar, Danmark, Kina, Keypmannahavn, Florida, Tórshavnar, Tórshavn, Bergtóra
- VERB-Part: Sameindu, nevndar
- Gender=Masc
- ADJ: størsti, stórur, stóran, nógvur, stórir, føroyskur, aðrir, mangir, amerikanska, einasti
- ADV: vanliga
- DET: ein, einum, allir, Summir, allan, allur, sínum, mínir
- NOUN: býur, høvuðsstaður, býurin, høvuðsstaðurin, landslutinum, partur, týdning, Meginparturin, limur, landslutur
- NUM: tveir
- PRON: hann, teir, hansara, honum, nakrir, Allir, Báðir, Summir, hesir, nakar
- PROPN: Kalifornia, Tróndur, Jákupsson, Bergur, Dávid, Gásadali, Hanus, Jóannes, Jógvan, Magnus
- VERB-Part: flettir, kendastur, keyptir, prentaðir
- Gender=Neut
- ADJ: nógv, mong, stórt, Flestu, stór, sama, ymisk, annað, fleiri, føroyskt
- ADV: størsta, vanliga, veldiga
- DET: eitt, einum, annað, síni, sínum, Øll
- NOUN: fólkinum, fólk, landinum, landi, landið, grundarlagið, mál, Endamálið, fólkatalið, lýðveldi
- NUM: trý, tveimum, tvey
- PRON: hetta, Hatta, Hettar, hvat, okkurt
- PROPN: Noregi, Fraklandi, Niðurlondum, Noregs, Grønlandi, Hordalandi, Island, Russlandi, Estlandi, Grønland
- VERB-Part: samlaða
- Number=Plur
- ADJ: nógv, fleiri, nógvar, mong, Flestu, stórir, aðrar, aðrir, mangir, stór
- ADV: vanliga, størsta, veldiga
- AUX: eru, vóru, hava, kunnu, skulu, máttu, mugu, verða
- DET: allir, Summir, Allar, mínir, síni, Øll
- NOUN: fólk, kommunur, býnum, døgum, ferðir, Føturnir, býir, indiánar, minuttir, muslimar
- PRON: teir, tey, vit, nakrir, Allir, Báðir, Summir, hesir, okkara, okkum
- PROPN: Føroyum, Føroya, Føroyar, Niðurlondum, Niðurlond, Hellurnar
- VERB: eru, búgva, doyðu, búsettust, búðu, hava, hjálpa, tala, vórðu, Drívið
- VERB-Part: Sameindu, flettir, keyptir, nevndar, prentaðir
- Number=Sing
- ADJ: størsti, størsta, stórur, stóran, nógvur, stórt, føroyskur, sama, amerikanska, stór
- AUX: er, var, hevur, verður, varð, kann, skal, skuldi, havi, hevði
- DET: ein, eitt, einum, eina, allan, sínum, øll, allur, eini, sína
- NOUN: býur, høvuðsstaður, býurin, høvuðsstaðurin, kommuna, fólkinum, landslutinum, partur, týdning, Meginparturin
- NUM: ein, tveir, trý, tvær, trimum, tríggjar, tveimum, tvey
- PRON: hon, hann, tað, hetta, hansara, henni, honum, eg, hennara, Hatta
- PROPN: Noregi, Danmark, Kanada, Amerika, Kina, Fraklandi, Italia, Keypmannahavn, New, Nigeria
- VERB: býr, hevur, kom, liggur, Sí, fer, varð, fór, er, stendur
- VERB-Part: kendastur, samlaða
- Case=Acc
- ADJ: stóran, aðrar, nógv, nógvar, búskaparligan, mong, Føroysk, arábiskt, fá, føroyskan
- DET: ein, eina, allan, eitt, sína, allir, síni
- NOUN: týdning, dag, fólk, íbúgvar, hátt, Styrkin, USA, ampa, bygdina, búskapin
- NUM: 2, 500, 7, 718.646, FM08, tvær, tríggjar, trý, tveir
- PRON: seg, hetta, hann, tað, hana, okkurt, onga, teir, tey
- PROPN: New, York, Jákupsson, Pakistan, West, Butan, Colorado, Eyguni, Føroyar, Island
- Case=Dat
- ADJ: stórum, amerikanska, sama, mongum, bestu, bretskum, danska, gamlari, hvítum, høgum
- DET: einum, sínum, eini, Summi, allari, einari
- NOUN: USA, fólkinum, landslutinum, landinum, kommunu, býnum, ár, ES, døgum, oynni
- NUM: 2005, 2010, 2011, 000, 1931, 2000, 2008, 2009, 2014, 10
- PRON: henni, honum, sær, okkum
- PROPN: Føroyum, Noregi, Danmark, Fraklandi, Niðurlondum, Grønlandi, Hordalandi, Kalifornia, Mississippi, Russlandi
- VERB-Part: Sameindu, samlaða
- Case=Gen
- ADJ: arbeiðsleys
- NOUN: dømis, landsins, felagsins, handils-, Fólkaháskúla, Islams, Læraraskúla, Rithøvundafelagsins, altars, arbeiðis
- NUM: 1930
- PRON: hansara, hennara, mín, okkara
- PROPN: Føroya, Noregs, Tórshavnar, Finsens, Sandavágs, Tvøroyrar, Bergens, Bretlands, Fraklands, Fuglafjarðar
- Case=Nom
- ADJ: størsti, stórur, størsta, nógv, fleiri, nógvur, stór, Flestu, stórir, føroyskur
- DET: ein, eitt, øll, Summir, allir, Allar, allur, annað, ei, mínir
- NOUN: býur, høvuðsstaður, býurin, høvuðsstaðurin, kommuna, partur, Meginparturin, kommunur, limur, landslutur
- NUM: %, ein, 10, 26, 4, tveir, 13, 14, 18, 1917
- PRON: hon, hann, tað, hetta, teir, tey, vit, eg, nakrir, Allir
- PROPN: Føroyar, Kanada, Amerika, Kina, Italia, Nigeria, Asia, Florida, Jackson, Norra
- VERB-Part: flettir, kendastur, keyptir, nevndar, prentaðir
- Definite=Def
- ADJ: størsti, størsta, Flestu, sama, amerikanska, einasti, somu, føroysku, hægsti, størstu
- NOUN: býurin, høvuðsstaðurin, fólkinum, landslutinum, Meginparturin, landinum, býnum, oyggin, oynni, bygdini
- PROPN: Sprotin, Stiðin, Arbeiðaraflokkurin, Framburðsflokkin, Framburðsflokkurin, Høgra, Norðlandinum, Norðurlandinum, Suðurlandinum
- VERB-Part: Sameindu, samlaða
- Definite=Ind
- ADJ: nógv, fleiri, stórur, stóran, stór, nógvar, nógvur, mong, stórir, stórt
- DET: allir, øll, allan, Allar, allur, allari, annað
- NOUN: býur, høvuðsstaður, kommuna, partur, týdning, fólk, kommunur, limur, ár, kommunu
- PRON: nakrir, Allir, Báðir, nakar, okkurt, onga
- PROPN: Føroyum, Føroya, Føroyar, Noregi, Danmark, Fraklandi, Keypmannahavn, Niðurlondum, Noregs, Grønlandi
- VERB-Part: flettir, kendastur, keyptir, nevndar, prentaðir
Degree and Polarity
- Degree=Sup
- ADJ: størsti, størsta, Flestu, hægsti, størstu, bestu, minsta, minsti, besta, besti
- ADV: best, størsta
- VERB-Part: kendastur
Verbal Features
- Mood=Ind
- AUX: er, eru, var, vóru, hevur, verður, varð, kann, skal, hava
- VERB: býr, hevur, kom, liggur, eru, fer, varð, fór, er, stendur
- VERB-Part: nevndur, Sameindu, fingin, flettir, fluttur, framdur, gjørdur, hóskandi, kendastur, keyptir
- Mood=Imp
- VERB: Sí, Drívið, Les, end
- Tense=Past
- AUX: var, vóru, varð, skuldi, máttu, hevði
- VERB: kom, varð, fór, tók, gjørdist, hevði, vann, bleiv, spældi, byrjaði
- VERB-Part: nevndur, Sameindu, fingin, flettir, fluttur, framdur, gjørdur, kendastur, keyptir, nevndar
- Tense=Pres
- AUX: er, eru, hevur, verður, kann, skal, hava, kunnu, skulu, havi
- VERB: býr, hevur, liggur, eru, fer, er, stendur, eitur, fæst, nevnist
- VERB-Part: hóskandi
- Voice=Pass
- VERB: gjørdist, sæst, fæst, nevnist, búsettust, gerast, Andaðist, berast, berjast, boksast
- VERB-Inf: berast, berjast, gerast, gevast, giftast, kappast, klekjast, mannast, mennast, miðlast
- VERB-Sup: sæst, staðist
Pronouns, Determiners, Quantifiers
- PronType=Dem
- PRON: hetta, Hettar, hesir
- PronType=Int
- PRON: hvat
- PronType=Prs
- PRON: hon, hann, tað, teir, hansara, tey, henni, honum, vit, eg
- PronType=Rel
- PRON: sum, ið
- NumType=Ord
- ADJ: 2., 1., 18., 19., 11., 12., 16., 17., 29., 3.
- Reflex=Yes
- PRON: seg, sær
- Person=1
- AUX: havi
- PRON: vit, eg, mín, okkara, okkum
- VERB: taki
- Person=2
- PRON: tú
- VERB: sært
- Person=3
- AUX: er, var, hevur, verður, varð, kann, skal, varir
- PRON: hon, hann, tað, teir, hansara, tey, henni, honum, hennara, hana
- VERB: býr, hevur, kom, liggur, fer, varð, fór, er, stendur, tók
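The per-feature listings above group word forms by feature value and part of speech, with non-finite verb forms labelled as `VERB-Part`, `VERB-Inf` and so on. A sketch of how such an index could be built from the FEATS column follows; the three tokens are hand-written for illustration, and appending the `VerbForm` value to the UPOS label is an assumption about how the labels were derived.

```python
from collections import defaultdict

# Hand-written (form, UPOS, FEATS) triples for illustration only.
TOKENS = [
    ("býur", "NOUN", "Case=Nom|Definite=Ind|Gender=Masc|Number=Sing"),
    ("kendastur", "VERB", "Case=Nom|Degree=Sup|Gender=Masc|Number=Sing|VerbForm=Part"),
    ("er", "AUX", "Mood=Ind|Number=Sing|Person=3|Tense=Pres"),
]


def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    return {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))


def forms_by_feature(tokens):
    """Index word forms by (feature, value, POS label)."""
    index = defaultdict(set)
    for form, upos, feats in tokens:
        fs = parse_feats(feats)
        # Assumption: a VerbForm value, when present, is appended to the UPOS.
        verb_form = fs.pop("VerbForm", None)
        label = f"{upos}-{verb_form}" if verb_form else upos
        for name, value in fs.items():
            index[(name, value, label)].add(form)
    return index


index = forms_by_feature(TOKENS)
print(sorted(index[("Gender", "Masc", "NOUN")]))
```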
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): vera.
- This corpus uses 6 lemmas as auxiliaries (aux). Examples: hava, kunna, skula, verða, vera, mega.
- This corpus uses 2 lemmas as passive auxiliaries (aux:pass). Examples: verða, vera.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (1)
- VERB--NOUN-Acc (4)
- VERB--NOUN-Dat (1)
- VERB--NOUN-Nom (208)
- VERB--NOUN-Nom-ADP(um) (1)
- VERB--PRON (21)
- VERB--PRON-Nom (50)
- VERB-Inf--NOUN-Dat (1)
- VERB-Inf--NOUN-Nom (18)
- VERB-Inf--PRON (1)
- VERB-Inf--PRON-Nom (3)
- VERB-Part--NOUN-Nom (9)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Nom (2)
- VERB-Sup--NOUN-Nom (30)
- VERB-Sup--PRON-Nom (15)
- obj
- VERB--NOUN (2)
- VERB--NOUN-Acc (54)
- VERB--NOUN-Acc-ADP(á) (1)
- VERB--NOUN-Dat (9)
- VERB--NOUN-Nom (4)
- VERB--PRON (1)
- VERB--PRON-Acc (12)
- VERB--PRON-Dat (3)
- VERB-Inf--NOUN-Acc (33)
- VERB-Inf--NOUN-Dat (3)
- VERB-Inf--NOUN-Nom (1)
- VERB-Inf--PRON-Acc (6)
- VERB-Inf--PRON-Dat (1)
- VERB-Part--NOUN-Acc (1)
- VERB-Sup--NOUN-Acc (4)
- VERB-Sup--NOUN-Dat (2)
- VERB-Sup--NOUN-Nom (2)
- VERB-Sup--PRON (1)
- iobj
- VERB-Sup--NOUN-Dat (1)
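Pattern counts of the form `VERB--NOUN-Nom` can be derived by pairing each dependent with its head. The sketch below handles a single sentence, uses a toy annotation written for this example, and leaves out the adposition suffix (as in `ADP(um)`) for brevity.

```python
from collections import Counter

# Toy single-sentence CoNLL-U input written for this example.
SAMPLE = (
    "1\tHann\thann\tPRON\t_\tCase=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs\t2\tnsubj\t_\t_\n"
    "2\tbýr\tbúgva\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres\t0\troot\t_\t_\n"
    "3\tí\tí\tADP\t_\t_\t4\tcase\t_\t_\n"
    "4\tHavn\tHavn\tPROPN\t_\tCase=Dat|Number=Sing\t2\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def feats(column):
    """Parse a FEATS column into a dict ('_' means no features)."""
    return {} if column == "_" else dict(p.split("=", 1) for p in column.split("|"))


def patterns(conllu, relation):
    """Count parent--child patterns for one relation in one sentence."""
    rows = [line.split("\t") for line in conllu.splitlines()
            if line and not line.startswith("#")]
    by_id = {r[0]: r for r in rows}
    counts = Counter()
    for child in rows:
        parent = by_id.get(child[6])
        if child[7] != relation or parent is None:
            continue
        # Only verb parents with noun or pronoun children are considered.
        if parent[3] != "VERB" or child[3] not in ("NOUN", "PRON"):
            continue
        verb_form = feats(parent[5]).get("VerbForm")
        case = feats(child[5]).get("Case")
        left = "VERB" + (f"-{verb_form}" if verb_form else "")
        right = child[3] + (f"-{case}" if case else "")
        counts[f"{left}--{right}"] += 1
    return counts


print(dict(patterns(SAMPLE, "nsubj")))
```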
Verbs with Reflexive Core Objects
- This corpus contains 11 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: taka seg, breiða seg, búseta seg, halda sær, játta seg, krúpa sær, laga seg, lata seg, leggja seg, sita sær, venja seg
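A list like this one can be collected by looking for `obj` or `iobj` dependents that carry `Reflex=Yes` and pairing them with the lemma of their head. A single-sentence sketch with a toy annotation written for this example:

```python
# Toy single-sentence CoNLL-U input written for this example.
SAMPLE = (
    "1\tHann\thann\tPRON\t_\tCase=Nom|PronType=Prs\t2\tnsubj\t_\t_\n"
    "2\tleggur\tleggja\tVERB\t_\tMood=Ind|Tense=Pres\t0\troot\t_\t_\n"
    "3\tseg\tseg\tPRON\t_\tCase=Acc|Reflex=Yes\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def reflexive_verbs(conllu):
    """Return sorted 'verb-lemma reflexive-form' pairs for core objects."""
    rows = [line.split("\t") for line in conllu.splitlines()
            if line and not line.startswith("#")]
    by_id = {r[0]: r for r in rows}
    found = set()
    for r in rows:
        if r[7] in ("obj", "iobj") and "Reflex=Yes" in r[5].split("|"):
            parent = by_id.get(r[6])
            if parent is not None:
                found.add(f"{parent[2]} {r[1].lower()}")
    return sorted(found)


print(reflexive_verbs(SAMPLE))
```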
Relations Overview
- This corpus uses 5 relation subtypes: acl:relcl, aux:pass, cc:preconj, nmod:poss, nsubj:pass
- The following 6 relation types are not used in this corpus at all: vocative, dislocated, clf, list, goeswith, reparandum
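Both figures follow from the relation inventory listed earlier on this page: subtypes are the labels containing a colon, and the unused relations are the universal relations whose base label never occurs. A short sketch that recomputes them, with the 37 universal relations of UD v2 written out by hand:

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = (
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp"
).split()

# Relations used in this treebank, copied from the Relations list above.
USED = (
    "acl acl:relcl advcl advmod amod appos aux aux:pass case cc cc:preconj "
    "ccomp compound conj cop csubj dep det discourse expl fixed flat iobj "
    "mark nmod nmod:poss nsubj nsubj:pass nummod obj obl orphan parataxis "
    "punct root xcomp"
).split()

subtypes = sorted(r for r in USED if ":" in r)
unused = sorted(set(UNIVERSAL) - {r.split(":")[0] for r in USED})

print(subtypes)
print(unused)
```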