
PUNCT: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text. They are tagged PUNCT regardless of their function.


Treebank Statistics (UD_French)

There is 1 PUNCT lemma (6%), and there are 40 PUNCT types (0%) and 44312 PUNCT tokens (11%). Out of 17 observed tags, PUNCT ranks 13th in number of lemmas, 13th in number of types and 4th in number of tokens.

The 10 most frequent PUNCT lemmas: _

The 10 most frequent PUNCT types: ,, ., (, ), “, «, », :, !, ?

The 10 most frequent ambiguous lemmas: _ (NOUN 73641, ADP 64129, DET 61780, PUNCT 44312, VERB 36183, PROPN 31663, ADJ 22616, PRON 17750, ADV 13108, NUM 10834, CONJ 10138, AUX 8952, SCONJ 2908, PART 1668, X 1056, SYM 486, INTJ 267)
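The 11% token share and the token rank of 4 can be reproduced from the per-tag counts listed for the lemma _. The sketch below assumes that those counts are the complete per-tag token totals of the treebank (plausible, since _ is the only lemma reported, but still an assumption); the names `tag_counts`, `punct_share` and `punct_rank` are illustrative only.

```python
# Per-tag token counts copied from the "ambiguous lemmas" line for the lemma "_".
# Assumption: these are the complete per-tag token totals of the treebank.
tag_counts = {
    "NOUN": 73641, "ADP": 64129, "DET": 61780, "PUNCT": 44312,
    "VERB": 36183, "PROPN": 31663, "ADJ": 22616, "PRON": 17750,
    "ADV": 13108, "NUM": 10834, "CONJ": 10138, "AUX": 8952,
    "SCONJ": 2908, "PART": 1668, "X": 1056, "SYM": 486, "INTJ": 267,
}

total_tokens = sum(tag_counts.values())
punct_share = tag_counts["PUNCT"] / total_tokens

# Rank 1 is the tag with the most tokens.
ranked_tags = sorted(tag_counts, key=tag_counts.get, reverse=True)
punct_rank = ranked_tags.index("PUNCT") + 1

print(f"total tokens: {total_tokens}")
print(f"PUNCT share: {punct_share:.0%}")
print(f"PUNCT rank by tokens: {punct_rank}")
```

Under that assumption the 17 counts sum to 401491 tokens, of which PUNCT accounts for about 11%, behind NOUN, ADP and DET.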

The 10 most frequent ambiguous types: (PUNCT 952, SYM 4), / (PUNCT 130, SYM 1), (PUNCT 33, SYM 2), + (SYM 14, PUNCT 6, PROPN 1, ADP 1), = (SYM 7, PUNCT 6), x (PUNCT 3, NOUN 1, SYM 1), { (PUNCT 3, X 1), * (SYM 3, PUNCT 2), } (PUNCT 2, NOUN 1), extraits (NOUN 4, VERB 1, PUNCT 1)

Morphology

The form / lemma ratio of PUNCT is 40.000000 (the average of all parts of speech is 2777.470588).
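As a small worked example, the ratio for PUNCT follows directly from the counts given earlier on this page (40 types, 1 lemma). The helper name `form_lemma_ratio` is hypothetical, and the sketch assumes the ratio is simply distinct forms divided by distinct lemmas.

```python
def form_lemma_ratio(n_forms: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma (assumed definition)."""
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_forms / n_lemmas


# 40 PUNCT types and 1 PUNCT lemma, as reported in the statistics above.
punct_ratio = form_lemma_ratio(40, 1)
print(f"{punct_ratio:.6f}")
```

Because every PUNCT token carries the same lemma, the ratio equals the number of PUNCT types.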

The highest number of forms (40) was observed with the lemma “_”: !, “, ‘, ‘’, (, ), *, +, ,, -, -), –, ., .., …, …., ….., ……, …….., /, :, ;, =, ?, Chapitre, [, ], ^, _, `, extraits, trois, x, {, |, }, «, ·, », —.

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 10 different relations: fr-dep/punct (44283; 100% instances), fr-dep/cc (9; 0% instances), fr-dep/det (9; 0% instances), fr-dep/conj (4; 0% instances), fr-dep/nmod (2; 0% instances), fr-dep/acl (1; 0% instances), fr-dep/appos (1; 0% instances), fr-dep/case (1; 0% instances), fr-dep/compound (1; 0% instances), fr-dep/nummod (1; 0% instances)
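The percentages in this list are rounded to whole numbers, which is why every relation other than punct shows 0%. The following sketch recomputes the shares from the counts above; the dictionary name `relation_counts` is illustrative, and the `fr-dep/` prefix is dropped for brevity.

```python
# Counts of the relations that attach PUNCT nodes to their parents,
# copied from the list above (the "fr-dep/" prefix is omitted).
relation_counts = {
    "punct": 44283, "cc": 9, "det": 9, "conj": 4, "nmod": 2,
    "acl": 1, "appos": 1, "case": 1, "compound": 1, "nummod": 1,
}

total = sum(relation_counts.values())
non_punct = total - relation_counts["punct"]

# Whole-number percentages, as displayed on the page.
shares = {rel: round(100 * n / total) for rel, n in relation_counts.items()}

print(f"total PUNCT nodes: {total}")
print(f"attached with a relation other than punct: {non_punct}")
print(f"exact punct share: {100 * relation_counts['punct'] / total:.2f}%")
```

The counts sum to the 44312 PUNCT tokens reported earlier, and only 29 of them use a relation other than punct.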

Parents of PUNCT nodes belong to 17 different parts of speech: VERB (21353; 48% instances), NOUN (12967; 29% instances), PROPN (5050; 11% instances), ADJ (2361; 5% instances), NUM (1102; 2% instances), PRON (428; 1% instances), X (390; 1% instances), ADV (280; 1% instances), SYM (133; 0% instances), CONJ (74; 0% instances), ADP (63; 0% instances), INTJ (42; 0% instances), PUNCT (32; 0% instances), DET (14; 0% instances), AUX (13; 0% instances), SCONJ (7; 0% instances), PART (3; 0% instances)

44272 (100%) PUNCT nodes are leaves.

25 (0%) PUNCT nodes have one child.

15 (0%) PUNCT nodes have two children.

The highest child degree of a PUNCT node is 2.

Children of PUNCT nodes are attached using 10 different relations: fr-dep/punct (32; 58% instances), fr-dep/nmod (8; 15% instances), fr-dep/case (5; 9% instances), fr-dep/conj (3; 5% instances), fr-dep/name (2; 4% instances), fr-dep/advmod (1; 2% instances), fr-dep/compound (1; 2% instances), fr-dep/det (1; 2% instances), fr-dep/nsubj (1; 2% instances), fr-dep/nummod (1; 2% instances)

Children of PUNCT nodes belong to 9 different parts of speech: PUNCT (32; 58% instances), PROPN (7; 13% instances), NOUN (5; 9% instances), ADV (4; 7% instances), ADP (2; 4% instances), NUM (2; 4% instances), ADJ (1; 2% instances), DET (1; 2% instances), PRON (1; 2% instances)
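The leaf, one-child and two-child counts can be cross-checked against the two child breakdowns above: the degree distribution implies a total number of children that should equal the sum of each breakdown. This is a minimal consistency check with illustrative variable names, not part of the tooling that generated the page.

```python
# Number of PUNCT nodes by number of children, from the statements above.
degree_counts = {0: 44272, 1: 25, 2: 15}

# Children of PUNCT nodes by relation ("fr-dep/" prefix omitted) and by POS.
child_relations = {
    "punct": 32, "nmod": 8, "case": 5, "conj": 3, "name": 2,
    "advmod": 1, "compound": 1, "det": 1, "nsubj": 1, "nummod": 1,
}
child_pos = {
    "PUNCT": 32, "PROPN": 7, "NOUN": 5, "ADV": 4, "ADP": 2,
    "NUM": 2, "ADJ": 1, "DET": 1, "PRON": 1,
}

total_nodes = sum(degree_counts.values())
total_children = sum(degree * n for degree, n in degree_counts.items())
max_degree = max(degree for degree, n in degree_counts.items() if n > 0)

print(f"PUNCT nodes: {total_nodes}")
print(f"children implied by degrees: {total_children}")
print(f"children by relation: {sum(child_relations.values())}")
print(f"children by POS: {sum(child_pos.values())}")
print(f"highest child degree: {max_degree}")
```

All three totals agree at 55 children, so the 58% share of punct corresponds to 32 of 55.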


PUNCT in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]