UD_Alemannic-DIVITAL
|
UD_Alemannic-UZH
|
Tokenization and Word Segmentation
|
Tokenization and Word Segmentation
|
- This corpus contains 977 sentences, 19334 tokens and 19743 syntactic words.
|
- This corpus contains 100 sentences and 1444 tokens.
|
- This corpus contains 3376 tokens (17%) that are not followed by a space.
|
- This corpus contains 176 tokens (12%) that are not followed by a space.
|
- This corpus does not contain words with spaces.
|
- This corpus does not contain words with spaces.
|
- This corpus contains 206 types of words that contain both letters and punctuation. Examples: d’, d', 's, ’s, s', m'r, d'r, 'm, 'r, ’ne, g’sinn, g’sààt, l', z', frz., g’komme, so-n-, 'ne, Nàtionàl-, g’hett, wisse-n-, ’m, 'em, -ed-, ABC-Buech, Diwan-Netzwerk, Diwan-Schuele, Regional-, biss'l, d'ran, d'rvon, de⸗n⸗, g'säit, g’funde, g’schlàcht, g’sindigt, kumme-n-, mi', numme-n-, od'r, wid'r, worre-n-, wùrre-n-, z’, àng’fànge, ⸗i, 'rem, 'rüs, -ere, -ewer-
|
- This corpus contains 12 types of words that contain both letters and punctuation. Examples: Baguetteschliff-Diamante, Chaux-de-Fonds, Informations-, Marie-Claire, Mercury-atlas-8-flug, Mont-pèlerin, Möhli-basel, Natsi-spiler, PowerPoint-Präsentation, Schloss-heer, Scientology-Chilä, YB-Fans
|
- This corpus contains 409 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 74 types of multi-word tokens. Examples: im, vùm, ìm, àm, vum, zuem, des, vùme, bim, zum, am, ime, mìtem, ìme, ùffem, ins, vùnere, ìnere, dùrichs, mit'm, mìteme, mìtere, noochem, sowie, àme, àns, ìns, ùffs, üs'm, em, l'abbé, mit'em, mìme, noocheme, uf'm, voreme, zem, züem, ànere, ém, ùntereme, aux, du, fers, fonana, foum, hìnterem, i's, jedesmol, l’Homme.
|
|
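The counts in this section can be recomputed from a CoNLL-U file roughly as follows. This is a minimal sketch, not the script that generated these statistics. The function name `tokenization_stats` and the toy `SAMPLE` sentence are illustrative assumptions (the sample is made up and not taken from either treebank). It relies only on standard CoNLL-U conventions: a multi-word token has a range ID such as `1-2`, and `SpaceAfter=No` in the MISC column marks a token that is not followed by a space.

```python
def tokenization_stats(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwt": 0, "no_space_after": 0}
    covered_until = 0   # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():            # blank line ends a sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        if not in_sentence:
            in_sentence = True
            stats["sentences"] += 1
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:               # empty node (enhanced graph only), skip
            continue
        no_space = "SpaceAfter=No" in misc.split("|")
        if "-" in tok_id:               # multi-word token line, e.g. "1-2"
            covered_until = int(tok_id.split("-")[1])
            stats["mwt"] += 1
            stats["tokens"] += 1
            stats["no_space_after"] += int(no_space)
        else:                           # syntactic word
            stats["words"] += 1
            if int(tok_id) > covered_until:   # not part of a multi-word token
                stats["tokens"] += 1
                stats["no_space_after"] += int(no_space)
    return stats


# Hypothetical toy sentence; the split of "im" into two words is for illustration only.
SAMPLE = (
    "# text = im Hüs.\n"
    "1-2\tim\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tin\t_\tADP\t_\t_\t3\tcase\t_\t_\n"
    "2\tem\t_\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tHüs\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

print(tokenization_stats(SAMPLE))
```

The reported figures are internally consistent under this counting scheme: for UD_Alemannic-DIVITAL, 19743 words minus 19334 tokens is 409, which matches 409 multi-word tokens of 2.00 words each, and 3376 / 19334 is about 17%. For UD_Alemannic-UZH, 176 / 1444 is about 12%.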
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
|
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: INTJ, SYM
|
- This corpus contains 22 word types tagged as particles (PART): am, fer, im, in, ne, nem, nemm, nemmi, net, nimmeh, nimmi, nit, nitt, nét, nëm, nìmm, nìmmi, nìt, ze, zu, z’, àm
|
- This corpus contains 12 word types tagged as particles (PART): am, go, hi, los, nid, nöd, nümm, ume, use, uuf, uus, z
|
- This corpus contains 1 lemma tagged as a pronoun (PRON): _
|
- This corpus contains 1 lemma tagged as a pronoun (PRON): _
|
- This corpus contains 1 lemma tagged as a determiner (DET): _
|
- This corpus contains 1 lemma tagged as a determiner (DET): _
|
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
|
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
|
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
|
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
|
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
|
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
|
- This corpus does not use the VerbForm feature.
|
- This corpus does not use the VerbForm feature.
|
Nominal Features
|
Nominal Features
|
Degree and Polarity
|
Degree and Polarity
|
Verbal Features
|
Verbal Features
|
Pronouns, Determiners, Quantifiers
|
Pronouns, Determiners, Quantifiers
|
Other Features
|
Other Features
|
- Epenthesis
- Yes
- ADP: gajen'
- ADV: so-n-, numme⸗n⸗, o, so
- AUX: worre-n-, wùrre-n-, hàn, welle⸗n⸗
- DET: eso-n-
- SCONJ: wo
- VERB: wisse-n-, wissen, Wissen', Wissen-, Wisse⸗n, bekùmme-n-, frschiasan', gangen, geh'n, gelaje-n-
|
|
- Foreign
- Yes
- ADJ: constitutionnel, européenne, international, nationale, régional, supérieur, Alsacienne, Basque, Culturelle, Législatives
- ADP: de, d', d’, pour, en, an, du, à
- ADV: enfin, bien, ex, finalement, merci, également
- AUX: hàn, sommes
- CCONJ: et
- DET: les, la, l', de, le, ma, Das, dem, den, des
- INTJ: Bravo, Eh, allez, Oui, Salut, Sapristi, bien
- NOUN: Conseil, Rapport, République, droits, Bretzel, Institut, Or, article, bilinguisme, langue
- NUM: IIIe
- PRON: -toi, Toi, ich, je, nous
- PROPN: Alsace, France, ONU, Europe, Gascogne, IPA, La, Moselle, oc, AMCT
- SCONJ: que
- VERB: coûte, Foie, VIVE, aussuchen, cherche, choisir, matar, parlez, passant, suche
- X: bon, Alsace, BA, BE, BI, BO, BU, Little, Pace, Texas
|
|
- Typo
- Yes
- ADV: o, so
- PRON: sin
- SCONJ: wo
|
|
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: _.
|
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: _.
|
- This corpus uses 1 lemma as an auxiliary (aux). Example: _.
- This corpus uses 1 lemma as a passive auxiliary (aux:pass). Example: _.
|
- This corpus uses 1 lemma as an auxiliary (aux). Example: _.
- This corpus uses 1 lemma as a passive auxiliary (aux:pass). Example: _.
|
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (322)
- VERB--NOUN-ADP(_) (2)
- VERB--PRON (680)
|
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (24)
- VERB--PRON (44)
|
- obj
- VERB--NOUN (516)
- VERB--NOUN-ADP(_) (4)
- VERB--PRON (187)
|
- obj
- VERB--NOUN (30)
- VERB--PRON (22)
|
|
|
- iobj
- VERB--NOUN (1)
- VERB--PRON (6)
|
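The parent-child counts listed for nsubj, obj and iobj can be approximated with a short pass over the CoNLL-U data. The sketch below is an assumption about how such counts could be derived, not the generator's actual code. The function name `core_argument_counts` and the placeholder sample are hypothetical, only exact relation labels are matched (subtypes such as `nsubj:pass` are not folded in), and the adposition variant `VERB--NOUN-ADP(_)` is not distinguished.

```python
from collections import Counter


def core_argument_counts(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, 'VERB--<child UPOS>') pairs with NOUN or PRON children."""
    counts = Counter()
    upos_by_id = {}   # word ID -> UPOS for the current sentence
    pending = []      # (head ID, child UPOS, relation) awaiting the head's UPOS

    def flush():
        # Resolve heads once the whole sentence has been read.
        for head, child_upos, deprel in pending:
            if upos_by_id.get(head) == "VERB":
                counts[(deprel, "VERB--" + child_upos)] += 1
        upos_by_id.clear()
        pending.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multi-word token ranges and empty nodes
        tok_id, upos, head, deprel = cols[0], cols[3], cols[6], cols[7]
        upos_by_id[tok_id] = upos
        if deprel in relations and upos in ("NOUN", "PRON"):
            pending.append((head, upos, deprel))
    flush()  # the last sentence may lack a trailing blank line
    return counts


# Placeholder word forms (w1..w4); only UPOS, HEAD and DEPREL matter here.
SAMPLE = (
    "1\tw1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tw3\t_\tPRON\t_\t_\t2\tiobj\t_\t_\n"
    "4\tw4\t_\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)

for (deprel, pattern), n in sorted(core_argument_counts(SAMPLE).items()):
    print(f"{deprel}: {pattern} ({n})")
```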
Reflexive Verbs
- This corpus contains 11 lemmas that occur at least once with an expl:pv child. Examples: _ sich, _ sìch, _ mi, _ mich, _ éich, _ anànder, _ di, _ eich, _ eijch, _ enànder, _ sin
|
Relations Overview
- This corpus uses 22 relation subtypes: acl:relcl, advcl:relcl, advmod:emph, advmod:lmod, advmod:tmod, aux:pass, cc:preconj, compound:prt, csubj:outer, det:poss, det:predet, expl:pv, flat:name, nmod:lmod, nmod:poss, nmod:tmod, nsubj:outer, nsubj:pass, obl:agent, obl:arg, obl:lmod, obl:tmod
- The following 4 relation types are not used in this corpus at all: iobj, clf, list, dep
|
Relations Overview
- This corpus uses 5 relation subtypes: acl:relcl, aux:pass, compound:prt, nmod:poss, nsubj:pass
- The following 8 relation types are not used in this corpus at all: vocative, dislocated, discourse, clf, list, orphan, goeswith, reparandum
|