UD Galician CTG
Language: Galician (code: gl)
Family: IE
This treebank has been part of Universal Dependencies since the UD v1.3 release.
The following people have contributed to making this treebank part of UD: Xavier Gómez Guinovart.
Repository: UD_Galician-CTG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.18
License: CC BY-NC-SA 3.0
Genre: medical, legal, nonfiction, news
Questions, comments? General annotation questions (either Galician-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [xgg (æt) uvigo • es].
Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
| Annotation | Source |
|---|---|
| Lemmas | annotated manually in non-UD style, automatically converted to UD |
| UPOS | annotated manually in non-UD style, automatically converted to UD |
| XPOS | annotated manually |
| Features | not available |
| Relations | annotated manually in non-UD style, automatically converted to UD |
Description
The Galician UD treebank is based on the automatic parsing of the Galician Technical Corpus (http://sli.uvigo.gal/CTG) created at the University of Vigo by the TALG NLP research group.
Original corpus sentences were selected and shuffled at random, then divided into 60-20-20 splits for the train, dev and test files, respectively.
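A minimal sketch of a shuffled 60-20-20 split, for illustration only. The function name `split_sentences`, the `seed` parameter and the rounding rule are assumptions; the actual procedure used for the treebank is not described beyond the sentence above.

```python
import random


def split_sentences(sentences, seed=0, ratios=(0.6, 0.2)):
    """Shuffle sentences and cut them into train/dev/test parts.

    `ratios` gives the train and dev shares; test receives the remainder.
    (Hypothetical helper, not the treebank's own script.)
    """
    shuffled = list(sentences)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_dev = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test
```

Fixing the seed makes the partition repeatable, which matters if the split has to be regenerated after the source data is corrected.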
The bootstrap version of the CTG UD annotated corpus was obtained by running the FreeLing 4.0 parser with the Treeler library and adapting the POS and dependency relation tags to the CoNLL-U format. Subsequent versions of the corpus involve a review of the results of this initial version.
The Galician UD treebank mainly covers technical texts from the fields of medicine, sociology, ecology, economics and law.
Acknowledgments
- Special thanks to Martin Popel and Dan Zeman for their invaluable help
Statistics of UD Galician CTG
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Definite – Foreign – Gender – Number – Polarity – Poss – PronType – Reflex – Typo
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – csubj:outer – dep – det – discourse – expl:pass – flat – goeswith – iobj – list – mark – nmod – nsubj – nsubj:outer – nummod – obj – obl – orphan – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 3993 sentences, 126011 tokens and 139122 syntactic words.
- This corpus contains 13812 tokens (11%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 242 types of words that contain both letters and punctuation. Examples: etc., AA., CC., C., S.A., a., art., pp., d/105, L., 1.ª, 2.a, 2.º, 5.º, 80/68/CEE, 92/43/CEE, B., C.H., Castela-A, E., EE.UU., I., J., Timbre-Real, contencioso-administrativo, gr., m/105h, marítimo-terrestre, ptas., varianzas-covarianzas, -n/2, 08.05.432A.740.0, 1%dos, 1.1.-España, 1.Programa, 1.a, 101/97/CE, 12.1.e, 123.A, 149.1.21.ª, 1999/519/EC, 2.o, 2000/76/CE, 2005,mais, 21.Un, 3.º, 35.3.n, 4.º, 620.1.º, 76/464/CEE
- This corpus contains 13035 multi-word tokens. On average, one multi-word token consists of 2.01 syntactic words.
- There are 1211 types of multi-word tokens. Examples: do, da, no, dos, na, das, á, ao, nos, ó, polo, nas, co, pola, ás, dun, coa, dunha, neste, aos, ós, nun, deste, desta, nunha, polos, cos, cunha, nesta, coas, cun, termos, polas, tódolos, destes, destas, nestes, deles, pódese, noutros, tódalas, delas, doutros, doutras, trátase, del, nese, débese, modifícase, nalgúns.
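The figures above are mutually consistent: 139122 words minus 126011 tokens leaves 13111 extra words, and spreading them over 13035 multi-word tokens gives 1 + 13111/13035 ≈ 2.01 words per multi-word token. The sketch below shows one way such counts could be derived from CoNLL-U text. The function name `conllu_stats` is hypothetical, and this is not the script that produced the official statistics.

```python
def conllu_stats(text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():           # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:              # empty node such as 8.1: not a word
            continue
        no_space = "SpaceAfter=No" in misc.split("|")
        if "-" in tok_id:              # range line such as 3-4: one surface token
            covered_until = int(tok_id.split("-")[1])
            stats["mwt"] += 1
            stats["tokens"] += 1
            stats["no_space"] += int(no_space)
        else:
            stats["words"] += 1
            if int(tok_id) > covered_until:   # word that is its own token
                stats["tokens"] += 1
                stats["no_space"] += int(no_space)
    return stats
```

Words that fall inside a multi-word token range are counted as syntactic words but not as surface tokens, which is why the two totals differ.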
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 4 word types tagged as particles (PART): Ln, föhn, non, senon
- This corpus contains 35 lemmas tagged as pronouns (PRON): algo, alguén, aquel, aquilo, cal, cando, canto, che, como, consigo, cuxo, el, este, eu, iso, isto, lle, me, min, nada, ninguén, nos, nós, o, onde, que, quen, quén, se, si, te, ti, un, vos, vostede
- This corpus contains 33 lemmas tagged as determiners (DET): a, algún, ambos, aquel, as, cada, calquera, canto, certo, demais, el, entrambos, ese, este, la, mesmo, meu, moito, ningún, noso, o, outro, pouco, propio, que, senllos, seu, tal, tanto, teu, todo, un, varios
- Out of the above, 7 lemmas occurred sometimes as PRON and sometimes as DET: aquel, canto, el, este, o, que, un
- This corpus contains 12 lemmas tagged as auxiliaries (AUX): acabar, deber, deixar, estar, haber, ir, levar, poder, seguir, ser, ter, vir
- Out of the above, 12 lemmas occurred sometimes as AUX and sometimes as VERB: acabar, deber, deixar, estar, haber, ir, levar, poder, seguir, ser, ter, vir
- This corpus does not use the VerbForm feature.
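The PRON/DET overlap reported above can be checked directly from the two lemma lists. The snippet below is only a worked illustration using the lists as printed on this page; the variable names are arbitrary.

```python
# Lemma lists copied from the PRON and DET bullets above.
PRON_LEMMAS = (
    "algo alguén aquel aquilo cal cando canto che como consigo cuxo el este "
    "eu iso isto lle me min nada ninguén nos nós o onde que quen quén se si "
    "te ti un vos vostede"
).split()
DET_LEMMAS = (
    "a algún ambos aquel as cada calquera canto certo demais el entrambos "
    "ese este la mesmo meu moito ningún noso o outro pouco propio que "
    "senllos seu tal tanto teu todo un varios"
).split()

# Lemmas that occur under both tags.
both = sorted(set(PRON_LEMMAS) & set(DET_LEMMAS))
print(len(PRON_LEMMAS), len(DET_LEMMAS), both)
# prints: 35 33 ['aquel', 'canto', 'el', 'este', 'o', 'que', 'un']
```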
Nominal Features
- Gender
  - Fem
    - DET: as, a
- Number
  - Plur
    - DET: as
  - Sing
    - DET: a
- Definite
  - Def
    - DET: a, o, os, as, la, los
  - Ind
    - DET: un, unha, unhas
Degree and Polarity
- Polarity
  - Neg
    - PART: non
Verbal Features
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: a, o, os, as, un, unha, unhas, la, el, los
  - Dem
    - DET: este, esta, estes, estas, ese, aqueles, tal, esa, aquelas, tales
    - PRON: isto, iso, aqueles, aquilo, esta
  - Ind
    - DET: outros, mesmo, outra, outras, outro, calquera, algúns, mesma, uns, moitas
    - PRON: un, unha, algo, alguén
  - Int
    - DET: qué
  - Neg
    - DET: ningún, ningunha
    - PRON: nada, ninguén
  - Prs
    - DET: súa, seu, seus, súas, nosa, noso, nosos, nosas, meu, miña
    - PRON: se, o, os, lle, me, nos, lles, eles, a, el
  - Rel
    - DET: que, canto, cantas
    - PRON: que, como, cando, onde, cal, quen, canto, cales, cantos, cuxa
  - Tot
    - DET: cada, todo, todos, todas, toda, ambos, ambas, entrambos
- Poss
  - Yes
    - DET: súa, seu, seus, súas, nosa, noso, nosos, nosas, meu, miña
- Reflex
  - Yes
    - PRON: se, si, consigo
Other Features
- Foreign
  - Yes
    - ADJ: obstante, Alternative, antimonopólico, apreciable, efectivo, funcional, gráficos, mayor, mediático, menor
    - ADP: de, al, en, a, on
    - ADV: concretamente
    - CCONJ: o, y
    - DET: la, el, los, un, Ese, a
    - NOUN: táboa, Capítulo, cadro, figura, figuras, Califf, Cantábrica-Rías, Energy, Lei, Parque
    - NUM: 1, 10, 20, 23, 70, -11,86, -36,03, -42,43, 105, 11
    - PART: non
    - PRON: como, que
    - PROPN: 1979a, BOE, Benson, Brassington, CV, Covarrubias, Exponse, Hannan, Isla, Lim
    - PUNCT: ,, .
    - SCONJ: que
    - SYM: +
    - VERB: di, Brooman, FA+U, IPa, MECACAR, Pasa, atraviesa, autoinmune, cae, chega
- Typo
  - Yes
    - NOUN: te-rra
    - VERB: anali
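In CoNLL-U files, feature values like the ones listed in this section are stored in the FEATS column as `Name=Value` pairs joined by `|`, with `_` for an empty set. A minimal parsing sketch follows. The helper name `parse_feats` is hypothetical, and the example string is assembled from the lists above (the determiner "a" appears under Fem, Sing, Def and Art) rather than copied from the treebank.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Plausible FEATS value for the feminine singular definite article "a".
article = parse_feats("Definite=Def|Gender=Fem|Number=Sing|PronType=Art")
print(article["Gender"], article["PronType"])
# prints: Fem Art
```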
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: ser, estar.
- This corpus uses 12 lemmas as auxiliaries (aux). Examples: poder, deber, ser, haber, estar, ter, ir, seguir, deixar, vir, acabar, levar.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (2831)
  - VERB--NOUN-ADP(a) (10)
  - VERB--NOUN-ADP(ata) (1)
  - VERB--NOUN-ADP(de) (12)
  - VERB--NOUN-ADP(de)-ADP(punto) (1)
  - VERB--NOUN-ADP(en) (1)
  - VERB--NOUN-ADP(entre) (1)
  - VERB--NOUN-ADP(por) (1)
  - VERB--NOUN-ADP(sobre) (1)
  - VERB--NOUN-ADP(xa) (1)
  - VERB--PRON (1427)
  - VERB--PRON-ADP(de) (2)
  - VERB--PRON-ADP(de)-ADP(a) (1)
- obj
  - VERB--NOUN (4227)
  - VERB--NOUN-ADP(a) (295)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(abandono) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(calidade) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(configuración) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(descoñecemento) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(déficit) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(referente) (1)
  - VERB--NOUN-ADP(a)-ADP(con)-ADP(horizonte) (1)
  - VERB--NOUN-ADP(a)-ADP(de) (3)
  - VERB--NOUN-ADP(con) (2)
  - VERB--NOUN-ADP(de) (27)
  - VERB--NOUN-ADP(desde) (1)
  - VERB--NOUN-ADP(en) (4)
  - VERB--NOUN-ADP(mentres) (1)
  - VERB--NOUN-ADP(precisamente) (1)
  - VERB--NOUN-ADP(segundo) (1)
  - VERB--NOUN-ADP(sobre) (1)
  - VERB--PRON (1590)
  - VERB--PRON-ADP(a) (6)
  - VERB--PRON-ADP(a)-ADP(procurar) (1)
  - VERB--PRON-ADP(para) (1)
- iobj
  - VERB--NOUN-ADP(a) (190)
  - VERB--NOUN-ADP(a)-ADP(a) (2)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(c.h.) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(coidado) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(consello) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(emprego) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(lingua) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(mantemento) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(percebe)-ADP(a)-ADP(mariñeiro) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(reciclaxe) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(sistema) (1)
  - VERB--NOUN-ADP(a)-ADP(a)-ADP(tempo) (1)
  - VERB--NOUN-ADP(a)-ADP(de) (2)
  - VERB--NOUN-ADP(para) (4)
  - VERB--PRON (270)
  - VERB--PRON-ADP(a) (7)
  - VERB--PRON-ADP(a)-ADP(caber) (1)
  - VERB--PRON-ADP(para) (1)
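The pattern labels in the list above can be read as parent UPOS, child UPOS, and then one `ADP(lemma)` element per case-marking dependent of the child. The sketch below is a guess at how such patterns might be computed; it assumes that the `ADP(...)` elements come from children attached with the `case` relation, which this page does not confirm. A sentence is represented as a list of dicts with the hypothetical keys `id`, `lemma`, `upos`, `head` and `deprel`.

```python
from collections import Counter

CORE_RELATIONS = ("nsubj", "obj", "iobj")


def core_argument_patterns(sentence):
    """Count (relation, pattern) pairs for NOUN/PRON arguments of verbs."""
    by_id = {word["id"]: word for word in sentence}
    patterns = Counter()
    for word in sentence:
        if word["deprel"] not in CORE_RELATIONS:
            continue
        if word["upos"] not in ("NOUN", "PRON"):
            continue
        parent = by_id.get(word["head"])
        if parent is None or parent["upos"] != "VERB":
            continue
        # Assumption: case markers are the argument's children with deprel "case".
        cases = [c for c in sentence
                 if c["head"] == word["id"] and c["deprel"] == "case"]
        pattern = "VERB--" + word["upos"]
        pattern += "".join(f"-ADP({c['lemma']})" for c in cases)
        patterns[(word["deprel"], pattern)] += 1
    return patterns
```

Under this reading, odd entries such as `ADP(abandono)` would correspond to non-adposition words attached as `case`, which points to parser errors in the automatic annotation rather than to genuine case markers.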
Reflexive Passive
- This corpus contains 54 lemmas that occur at least once with an expl:pass child. Examples: facer se, observar se, realizar se, comprobar se, desenvolver se, incorporar se, manter se, aclarar se, almacenar se, aplicar se, apreciar se, bo se, calcular se, caracterizar se, consolidar se, corresponder se, dar se, debater se, definir se, detallar se, diferenciar se, engader se, entender se, estar se, estimar se, facilitar se, impulsar se, indicar se, informar se, introducir se, invitar se, levar se, modificar se, necesario se, notificar se, obter se, papel se, participar se, posible se, poñer se, preciso se, preguntar se, prever se, proceder se, recomendar se, reintegrar se, remitir se, representar se, reunir se, ser se
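A list like the one above could be collected by pairing every `expl:pass` dependent with the lemma of its head. The following is a sketch under assumptions: sentences are lists of dicts with the hypothetical keys `id`, `lemma`, `head` and `deprel`, and no part-of-speech filter is applied, since examples such as "bo se" and "necesario se" show that the head is not always a verb.

```python
def lemmas_with_expl_pass(sentences):
    """Return sorted 'head-lemma child-lemma' strings for expl:pass children."""
    pairs = set()
    for sentence in sentences:
        by_id = {word["id"]: word for word in sentence}
        for word in sentence:
            if word["deprel"] == "expl:pass" and word["head"] in by_id:
                head = by_id[word["head"]]
                pairs.add(f"{head['lemma']} {word['lemma']}")
    return sorted(pairs)
```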
Verbs with Reflexive Core Objects
- This corpus contains 369 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: producir se, facer se, atopar se, referir se, realizar se, considerar se, tratar se, observar se, ter se, presentar se, empregar se, establecer se, manter se, aprobar se, encontrar se, incluír se, situar se, utilizar se, desenvolver se, efectuar se, ver se, dar se, determinar se, indicar se, pretender se, analizar se, obter se, prever se, crear se, enfrontar se, esperar se, incrementar se, modificar se, reducir se, apreciar se, basear se, chegar se, comentar se, coñecer se, dicir se, haber se, incorporar se, integrar se, regular se, relacionar se, xerar se, acadar se, aplicar se, asignar se, axustar se
- Out of those, 4 lemmas occurred more than once, but never without a reflexive dependent. Examples: confundir, meter, opor, reiniciar
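The "never without a reflexive dependent" criterion amounts to comparing two counts per lemma: total occurrences and occurrences with a reflexive core object. The sketch below assumes that a reflexive core object is an `obj` or `iobj` child carrying `Reflex=Yes` and that only words tagged VERB are considered; both are assumptions, as are the dict keys `id`, `lemma`, `upos`, `head`, `deprel` and `feats`.

```python
from collections import Counter


def always_reflexive(sentences):
    """Verb lemmas seen more than once and always with a reflexive obj/iobj."""
    total = Counter()
    with_reflexive = Counter()
    for sentence in sentences:
        # IDs of heads that govern a reflexive core object in this sentence.
        reflexive_heads = {
            word["head"] for word in sentence
            if word["deprel"] in ("obj", "iobj")
            and word["feats"].get("Reflex") == "Yes"
        }
        for word in sentence:
            if word["upos"] != "VERB":
                continue
            total[word["lemma"]] += 1
            if word["id"] in reflexive_heads:
                with_reflexive[word["lemma"]] += 1
    return sorted(lemma for lemma, n in total.items()
                  if n > 1 and with_reflexive[lemma] == n)
```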
Relations Overview
- This corpus uses 4 relation subtypes: acl:relcl, csubj:outer, expl:pass, nsubj:outer
- The following main type is never used alone; it is always subtyped: expl
- The following 5 relation types are not used in this corpus at all: vocative, dislocated, clf, fixed, reparandum
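The three statements above follow from comparing the relation labels listed under "Relations" on this page with the 37 universal relations defined by UD v2. A small worked check, with arbitrary variable names:

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relation labels listed for this treebank in the Relations section.
USED = set(
    "acl acl:relcl advcl advmod amod appos aux case cc ccomp compound conj "
    "cop csubj csubj:outer dep det discourse expl:pass flat goeswith iobj "
    "list mark nmod nsubj nsubj:outer nummod obj obl orphan parataxis punct "
    "root xcomp".split()
)

subtypes = sorted(label for label in USED if ":" in label)
main_used = {label.split(":")[0] for label in USED}
# Main types that never appear without a subtype.
only_subtyped = sorted(main_used - USED)
# Universal relations that do not appear in any form.
unused = sorted(UNIVERSAL - main_used)

print(subtypes)       # ['acl:relcl', 'csubj:outer', 'expl:pass', 'nsubj:outer']
print(only_subtyped)  # ['expl']
print(unused)         # ['clf', 'dislocated', 'fixed', 'reparandum', 'vocative']
```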