This page pertains to UD version 2.

Treebanks compared on this page: UD_Alemannic-DIVITAL and UD_Alemannic-UZH.

Tokenization and Word Segmentation

UD_Alemannic-DIVITAL
  • This corpus contains 977 sentences, 19334 tokens and 19743 syntactic words.
  • This corpus contains 3376 tokens (17%) that are not followed by a space.
  • This corpus does not contain words with spaces.
  • This corpus contains 206 types of words that contain both letters and punctuation. Examples: d’, d', 's, ’s, s', m'r, d'r, 'm, 'r, ’ne, g’sinn, g’sààt, l', z', frz., g’komme, so-n-, 'ne, Nàtionàl-, g’hett, wisse-n-, ’m, 'em, -ed-, ABC-Buech, Diwan-Netzwerk, Diwan-Schuele, Regional-, biss'l, d'ran, d'rvon, de⸗n⸗, g'säit, g’funde, g’schlàcht, g’sindigt, kumme-n-, mi', numme-n-, od'r, wid'r, worre-n-, wùrre-n-, z’, àng’fànge, ⸗i, 'rem, 'rüs, -ere, -ewer-
  • This corpus contains 409 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
  • There are 74 types of multi-word tokens. Examples: im, vùm, ìm, àm, vum, zuem, des, vùme, bim, zum, am, ime, mìtem, ìme, ùffem, ins, vùnere, ìnere, dùrichs, mit'm, mìteme, mìtere, noochem, sowie, àme, àns, ìns, ùffs, üs'm, em, l'abbé, mit'em, mìme, noocheme, uf'm, voreme, zem, züem, ànere, ém, ùntereme, aux, du, fers, fonana, foum, hìnterem, i's, jedesmol, l’Homme.

UD_Alemannic-UZH
  • This corpus contains 100 sentences and 1444 tokens.
  • This corpus contains 176 tokens (12%) that are not followed by a space.
  • This corpus does not contain words with spaces.
  • This corpus contains 12 types of words that contain both letters and punctuation: Baguetteschliff-Diamante, Chaux-de-Fonds, Informations-, Marie-Claire, Mercury-atlas-8-flug, Mont-pèlerin, Möhli-basel, Natsi-spiler, PowerPoint-Präsentation, Schloss-heer, Scientology-Chilä, YB-Fans
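These counts follow from the CoNLL-U ID column: a range ID such as `1-2` marks a multi-word token, so syntactic words = tokens + (for each multi-word token, its number of words minus 1). For DIVITAL this gives 19334 + 409 × (2 - 1) = 19743, matching the figure above. The sketch below is a minimal illustration, assuming well-formed 10-column CoNLL-U; the function name `tokenization_stats` and the toy sentence are hypothetical, and this is not the script that generated this page.

```python
def tokenization_stats(conllu: str) -> dict:
    """Count sentences, surface tokens, syntactic words and multi-word tokens (MWTs).

    Minimal sketch; assumes 10 tab-separated columns on every word line.
    """
    stats = {"sentences": 0, "tokens": 0, "words": 0,
             "mwts": 0, "mwt_words": 0, "no_space": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest MWT range
    for line in conllu.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends a sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
            covered_until = 0
        cols = line.split("\t")
        word_id, misc = cols[0], cols[9]
        no_space = "SpaceAfter=No" in misc.split("|")
        if "." in word_id:
            continue  # empty node: not a surface token
        if "-" in word_id:
            # MWT line, e.g. "1-2": one surface token spanning several words
            start, end = (int(x) for x in word_id.split("-"))
            stats["tokens"] += 1
            stats["mwts"] += 1
            stats["mwt_words"] += end - start + 1
            stats["no_space"] += no_space
            covered_until = end
            continue
        stats["words"] += 1
        if int(word_id) > covered_until:
            # a word outside any MWT is also a surface token of its own
            stats["tokens"] += 1
            stats["no_space"] += no_space
    return stats


# Hypothetical toy sentence: the MWT "im" is split into "in" + "dem".
SAMPLE = "\n".join([
    "# text = im Hüs.",
    "1-2\tim\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tin\t_\tADP\t_\t_\t3\tcase\t_\t_",
    "2\tdem\t_\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tHüs\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
]) + "\n\n"

s = tokenization_stats(SAMPLE)
print(s)
print(f"{s['no_space'] / s['tokens']:.0%} of tokens not followed by a space")
print(f"{s['mwt_words'] / s['mwts']:.2f} syntactic words per multi-word token")
```

Note that `SpaceAfter=No` is read from the range line for a multi-word token, since the space belongs to the surface token rather than to its parts.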

Morphology

Tags

In both treebanks the lemma column is left unspecified (_), so every lemma-based count below is 1 and its only example is _.

UD_Alemannic-DIVITAL
  • This corpus contains 22 word types tagged as particles (PART): am, fer, im, in, ne, nem, nemm, nemmi, net, nimmeh, nimmi, nit, nitt, nét, nëm, nìmm, nìmmi, nìt, ze, zu, z’, àm
  • This corpus contains 1 lemma tagged as pronoun (PRON): _
  • This corpus contains 1 lemma tagged as determiner (DET): _
  • Of these, 1 lemma occurred sometimes as PRON and sometimes as DET: _
  • This corpus contains 1 lemma tagged as auxiliary (AUX): _
  • Of these, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
  • This corpus does not use the VerbForm feature.

UD_Alemannic-UZH
  • This corpus contains 12 word types tagged as particles (PART): am, go, hi, los, nid, nöd, nümm, ume, use, uuf, uus, z
  • This corpus contains 1 lemma tagged as pronoun (PRON): _
  • This corpus contains 1 lemma tagged as determiner (DET): _
  • Of these, 1 lemma occurred sometimes as PRON and sometimes as DET: _
  • This corpus contains 1 lemma tagged as auxiliary (AUX): _
  • Of these, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
  • This corpus does not use the VerbForm feature.
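Why does each lemma-based count come out as 1? If every LEMMA cell holds _, all parts of speech share the single lemma _, which is then trivially "both PRON and DET" and "both AUX and VERB". The sketch below, a hypothetical helper run on a made-up two-sentence sample in well-formed CoNLL-U, collects word types and lemmas per UPOS and reproduces that effect.

```python
from collections import defaultdict


def pos_inventory(conllu: str):
    """Return two dicts: UPOS -> set of word forms, and UPOS -> set of lemmas."""
    forms = defaultdict(set)
    lemmas = defaultdict(set)
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip MWT ranges ("1-2") and empty nodes ("3.1")
        form, lemma, upos = cols[1], cols[2], cols[3]
        forms[upos].add(form)
        lemmas[upos].add(lemma)
    return forms, lemmas


# Hypothetical toy sample; as in both treebanks, every LEMMA cell is "_".
SAMPLE = "\n".join([
    "1\ter\t_\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "2\thet\t_\tAUX\t_\t_\t4\taux\t_\t_",
    "3\tnit\t_\tPART\t_\t_\t4\tadvmod\t_\t_",
    "4\tgsäit\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "1\tde\t_\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tMàà\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tkùmmt\t_\tVERB\t_\t_\t0\troot\t_\t_",
]) + "\n\n"

forms, lemmas = pos_inventory(SAMPLE)
print(sorted(forms["PART"]))           # word types tagged PART
print(lemmas["PRON"] & lemmas["DET"])  # lemmas seen both as PRON and as DET
print(lemmas["AUX"] & lemmas["VERB"])  # lemmas seen both as AUX and as VERB
```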

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

  • Epenthesis
    • Yes
      • ADP: gajen'
      • ADV: so-n-, numme⸗n⸗, o, so
      • AUX: worre-n-, wùrre-n-, hàn, welle⸗n⸗
      • DET: eso-n-
      • SCONJ: wo
      • VERB: wisse-n-, wissen, Wissen', Wissen-, Wisse⸗n, bekùmme-n-, frschiasan', gangen, geh'n, gelaje-n-
  • Foreign
    • Yes
      • ADJ: constitutionnel, européenne, international, nationale, régional, supérieur, Alsacienne, Basque, Culturelle, Législatives
      • ADP: de, d', d’, pour, en, an, du, à
      • ADV: enfin, bien, ex, finalement, merci, également
      • AUX: hàn, sommes
      • CCONJ: et
      • DET: les, la, l', de, le, ma, Das, dem, den, des
      • INTJ: Bravo, Eh, allez, Oui, Salut, Sapristi, bien
      • NOUN: Conseil, Rapport, République, droits, Bretzel, Institut, Or, article, bilinguisme, langue
      • NUM: IIIe
      • PRON: -toi, Toi, ich, je, nous
      • PROPN: Alsace, France, ONU, Europe, Gascogne, IPA, La, Moselle, oc, AMCT
      • SCONJ: que
      • VERB: coûte, Foie, VIVE, aussuchen, cherche, choisir, matar, parlez, passant, suche
      • X: bon, Alsace, BA, BE, BI, BO, BU, Little, Pace, Texas
  • Typo
    • Yes
      • ADV: o, so
      • PRON: sin
      • SCONJ: wo
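Each entry above lists, for one Feature=Value pair, the word forms that carry it, grouped by UPOS. A minimal sketch of that grouping, assuming these values come from the standard CoNLL-U FEATS column (Feature=Value pairs joined by |); the function name is hypothetical, and the toy rows reuse forms from the lists above but are not real corpus sentences:

```python
from collections import defaultdict


def feature_examples(conllu: str):
    """Map (feature, value) -> UPOS -> distinct forms, in order of first occurrence."""
    table = defaultdict(lambda: defaultdict(list))
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip MWT ranges and empty nodes
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":
            continue  # this word has no features
        for pair in feats.split("|"):
            feature, value = pair.split("=", 1)
            forms = table[(feature, value)][upos]
            if form not in forms:
                forms.append(form)
    return table


# Toy rows (not a real sentence); forms taken from the lists above.
SAMPLE = "\n".join([
    "1\tso-n-\t_\tADV\t_\tEpenthesis=Yes\t0\troot\t_\t_",
    "2\two\t_\tSCONJ\t_\tEpenthesis=Yes|Typo=Yes\t1\tmark\t_\t_",
    "3\tmerci\t_\tADV\t_\tForeign=Yes\t1\tadvmod\t_\t_",
]) + "\n\n"

for (feature, value), by_upos in sorted(feature_examples(SAMPLE).items()):
    print(f"{feature}={value}")
    for upos, forms in sorted(by_upos.items()):
        print(f"  {upos}: {', '.join(forms)}")
```

As with "wo" in the lists above, one word can appear under several features when its FEATS cell holds more than one pair.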

Syntax

Auxiliary Verbs and Copula

UD_Alemannic-DIVITAL
  • This corpus uses 1 lemma as copula (cop). Example: _.
  • This corpus uses 1 lemma as auxiliary (aux). Example: _.
  • This corpus uses 1 lemma as passive auxiliary (aux:pass). Example: _.

UD_Alemannic-UZH
  • This corpus uses 1 lemma as copula (cop). Example: _.
  • This corpus uses 1 lemma as auxiliary (aux). Example: _.
  • This corpus uses 1 lemma as passive auxiliary (aux:pass). Example: _.
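These figures count distinct lemmas among words attached with a given relation. When the lemma column is _ throughout, as it appears to be here, any number of different copula or auxiliary forms collapses to one "lemma". A sketch under that assumption, with a hypothetical helper name and a made-up sample in which two different copula forms share the lemma _:

```python
def lemmas_by_deprel(conllu: str, deprels=("cop", "aux", "aux:pass")):
    """Return (deprel -> set of lemmas, deprel -> set of forms) for the given relations."""
    lemmas = {d: set() for d in deprels}
    forms = {d: set() for d in deprels}
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip MWT ranges and empty nodes
        form, lemma, deprel = cols[1], cols[2], cols[7]
        if deprel in lemmas:
            lemmas[deprel].add(lemma)
            forms[deprel].add(form)
    return lemmas, forms


# Hypothetical toy sample: two copula forms, both with lemma "_".
SAMPLE = "\n".join([
    "1\ter\t_\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tisch\t_\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tgross\t_\tADJ\t_\t_\t0\troot\t_\t_",
    "",
    "1\tmir\t_\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tsìn\t_\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tmüed\t_\tADJ\t_\t_\t0\troot\t_\t_",
]) + "\n\n"

lemmas, forms = lemmas_by_deprel(SAMPLE)
for deprel in lemmas:
    print(f"{deprel}: {len(lemmas[deprel])} lemma(s), {len(forms[deprel])} form(s)")
```

Counting forms alongside lemmas is a design choice for this sketch: it shows how much variation a lemma count of 1 can hide when lemmas are unspecified.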

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

UD_Alemannic-DIVITAL
  • nsubj
    • VERB--NOUN (322)
    • VERB--NOUN-ADP(_) (2)
    • VERB--PRON (680)
  • obj
    • VERB--NOUN (516)
    • VERB--NOUN-ADP(_) (4)
    • VERB--PRON (187)

UD_Alemannic-UZH
  • nsubj
    • VERB--NOUN (24)
    • VERB--PRON (44)
  • obj
    • VERB--NOUN (30)
    • VERB--PRON (22)
  • iobj
    • VERB--NOUN (1)
    • VERB--PRON (6)
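Each line above counts NOUN or PRON children with a given relation whose parent is a VERB. The sketch below reproduces that kind of tally under one labelled assumption: that the `-ADP(lemma)` suffix marks a nominal with an ADP dependent, the parenthesised value being that ADP's lemma (here always _). The helper names and toy sentences are hypothetical.

```python
from collections import Counter


def read_sentences(conllu: str):
    """Yield one dict per sentence mapping word ID -> word fields (syntactic words only)."""
    sentence = {}
    for line in conllu.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
            sentence = {}
            continue
        cols = line.split("\t")
        if line.startswith("#") or not cols[0].isdigit():
            continue  # comments, MWT ranges, empty nodes
        sentence[int(cols[0])] = {"form": cols[1], "lemma": cols[2], "upos": cols[3],
                                  "head": int(cols[6]), "deprel": cols[7]}
    if sentence:
        yield sentence


def argument_patterns(conllu: str, deprels=("nsubj", "obj", "iobj")) -> Counter:
    """Count (deprel, 'VERB--CHILD') pairs for NOUN/PRON children of VERB parents."""
    counts = Counter()
    for sentence in read_sentences(conllu):
        for word_id, word in sentence.items():
            if word["deprel"] not in deprels or word["upos"] not in ("NOUN", "PRON"):
                continue
            parent = sentence.get(word["head"])  # head 0 (root) gives None
            if parent is None or parent["upos"] != "VERB":
                continue
            label = word["upos"]
            # Assumption: "-ADP(lemma)" marks a nominal that has an ADP dependent.
            adps = [w["lemma"] for w in sentence.values()
                    if w["head"] == word_id and w["upos"] == "ADP"]
            if adps:
                label += f"-ADP({adps[0]})"
            counts[(word["deprel"], f"VERB--{label}")] += 1
    return counts


# Hypothetical toy sentences with unspecified lemmas.
SAMPLE = "\n".join([
    "1\ter\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tìsst\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tvùn\t_\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tBrot\t_\tNOUN\t_\t_\t2\tobj\t_\t_",
    "",
    "1\tìch\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tgìb\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tìhm\t_\tPRON\t_\t_\t2\tiobj\t_\t_",
    "4\tBrot\t_\tNOUN\t_\t_\t2\tobj\t_\t_",
]) + "\n\n"

for (deprel, pattern), n in sorted(argument_patterns(SAMPLE).items()):
    print(f"{deprel}: {pattern} ({n})")
```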

Reflexive Verbs

  • This corpus contains 11 lemmas that occur at least once with an expl:pv child. Examples: _ sich, _ sìch, _ mi, _ mich, _ éich, _ anànder, _ di, _ eich, _ eijch, _ enànder, _ sin
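Each example pairs a parent lemma (here always _) with a reflexive dependent attached as expl:pv; since lemmas are unspecified, the second element appears to be the reflexive's word form. A hedged sketch of collecting such pairs, assuming well-formed CoNLL-U, with a hypothetical helper name and toy data:

```python
from collections import Counter


def expl_pv_pairs(conllu: str) -> Counter:
    """Count (parent lemma, reflexive form) pairs for words attached as expl:pv."""
    pairs = Counter()
    sentence = {}
    # The extra "" guarantees that the last sentence is processed.
    for line in conllu.splitlines() + [""]:
        if not line.strip():
            for word in sentence.values():
                if word["deprel"] == "expl:pv" and word["head"] in sentence:
                    parent = sentence[word["head"]]
                    pairs[(parent["lemma"], word["form"])] += 1
            sentence = {}
            continue
        cols = line.split("\t")
        if line.startswith("#") or not cols[0].isdigit():
            continue  # comments, MWT ranges, empty nodes
        sentence[int(cols[0])] = {"form": cols[1], "lemma": cols[2],
                                  "head": int(cols[6]), "deprel": cols[7]}
    return pairs


# Hypothetical toy sentences with unspecified lemmas.
SAMPLE = "\n".join([
    "1\ter\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tfreijt\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tsich\t_\tPRON\t_\t_\t2\texpl:pv\t_\t_",
    "",
    "1\tìch\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tfreij\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tmi\t_\tPRON\t_\t_\t2\texpl:pv\t_\t_",
]) + "\n\n"

pairs = expl_pv_pairs(SAMPLE)
print(", ".join(f"{lemma} {form}" for lemma, form in pairs))
```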

Relations Overview