This is part of archived UD v1 documentation. See http://universaldependencies.org/ for the current version.

DET: determiner

Definition

Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

In Portuguese corpora, numerals are not tagged as DET. In a noun phrase such as “os cinco mortos” (the five dead [people]), only “os” is tagged as DET.
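The rule can be illustrated with a small Python sketch. Only the DET tag on “os” is stated in the text; the NUM tag on “cinco” and the NOUN tag on “mortos” are assumptions made for this illustration, and the variable names are hypothetical.

```python
# Tagged tokens for "os cinco mortos" (the five dead [people]).
# "os" = DET comes from the text; NUM for "cinco" and NOUN for "mortos"
# are assumptions made for this illustration.
phrase = [("os", "DET"), ("cinco", "NUM"), ("mortos", "NOUN")]

# Keep only the word forms tagged as determiners.
determiners = [form for form, upos in phrase if upos == "DET"]
print(determiners)  # ['os']
```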

Examples


Treebank Statistics (UD_Portuguese)

There are 61 DET lemmas (0%), 115 DET types (0%) and 35660 DET tokens (16%). Out of 17 observed tags, the rank of DET is: 8 in number of lemmas, 8 in number of types and 3 in number of tokens.
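Counts of this kind could be reproduced from a CoNLL-U file roughly as follows. This is a minimal sketch, not the script that generated this page: `upos_summary` and `SAMPLE` are hypothetical names, the sample sentences are invented, and comparing word forms case-sensitively is an assumption.

```python
# Tiny hand-made CoNLL-U fragment (hypothetical data, not from the treebank).
# Columns: ID, FORM, LEMMA, UPOSTAG, XPOSTAG, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = """\
1\tos\to\tDET\t_\t_\t3\tdet\t_\t_
2\tcinco\tcinco\tNUM\t_\t_\t3\tnummod\t_\t_
3\tmortos\tmorto\tNOUN\t_\t_\t0\troot\t_\t_

1\tuma\tum\tDET\t_\t_\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\t_\t0\troot\t_\t_
"""


def upos_summary(conllu_text, upos):
    """Return (number of lemmas, number of types, number of tokens) for one tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges such as "1-2"
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM, compared case-sensitively (assumption)
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


print(upos_summary(SAMPLE, "DET"))  # (2, 2, 2)
```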

The 10 most frequent DET lemmas: o, um, seu, este, a, outro, esse, todo, algum, muito

The 10 most frequent DET types: o, a, os, as, um, uma, sua, seu, este, esta

The 10 most frequent ambiguous lemmas: o (DET 28065, PRON 220, ADP 3, NOUN 1), um (DET 3202, NUM 371, PRON 7, NOUN 1), seu (DET 1079, PRON 8), este (DET 597, PRON 83), a (ADP 3789, DET 315, PRON 36, ADV 1, PROPN 1), outro (DET 266, PRON 85, ADJ 20), esse (DET 265, PRON 31), todo (DET 265, PRON 63, ADV 5, NOUN 2), algum (DET 174, PRON 29, NOUN 1), muito (ADV 192, DET 136, PRON 70, NOUN 3)

The 10 most frequent ambiguous types: o (DET 10652, PRON 222, NOUN 1), a (DET 9811, ADP 3672, PRON 91, PROPN 1, ADV 1), os (DET 3342, PRON 64), as (DET 2225, PRON 37), um (DET 1615, NUM 233, PRON 2), uma (DET 1432, NUM 136, PRON 2), sua (DET 433, PRON 7), seu (DET 353, PRON 5), este (DET 254, PRON 27), esta (DET 207, PRON 17)

Morphology

The form / lemma ratio of DET is 1.885246 (the average of all parts of speech is 1.432674).
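The reported ratio is consistent with dividing the number of DET types by the number of DET lemmas given earlier for this treebank (115 and 61). The sketch below works under that assumption; `form_lemma_ratio` is a hypothetical helper, not part of any UD tool.

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Average number of distinct word forms per lemma."""
    return n_types / n_lemmas


# 115 DET types and 61 DET lemmas, as reported for UD_Portuguese.
print(f"{form_lemma_ratio(115, 61):.6f}")  # 1.885246
```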

The highest number of forms (9) was observed with the lemma “meu”: meu, meus, minha, minhas, nossos, seu, seus, sua, suas.

The second highest number of forms (5) was observed with the lemma “muito”: mais, muita, muitas, muito, muitos.

The third highest number of forms (5) was observed with the lemma “o”: a, as, o, o(s), os.

DET occurs with 11 features: Number (35547; 100% instances), PronType (34984; 98% instances), Gender (34335; 96% instances), Definite (31631; 89% instances), Person (1283; 4% instances), Poss (1283; 4% instances), pt-feat/Number[psor] (1276; 4% instances), NumType (1012; 3% instances), Reflex (559; 2% instances), Degree (411; 1% instances), pt-feat/Typo (3; 0% instances)

DET occurs with 24 feature-value pairs: Definite=Def, Definite=Ind, Degree=Cmp, Degree=Sup, Gender=Fem, Gender=Masc, NumType=Card, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Art, PronType=Dem, PronType=Ind,Neg,Tot, PronType=Int, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Typo=Yes

DET occurs with 93 feature combinations. The most frequent feature combination is Definite=Def|Gender=Masc|Number=Sing|PronType=Art (11674 tokens). Examples: o, El, Os, a, um

Relations

DET nodes are attached to their parents using 20 different relations: det (34932; 98% instances), mwe (241; 1% instances), nsubj (119; 0% instances), advmod (97; 0% instances), nmod (70; 0% instances), dobj (60; 0% instances), compound (51; 0% instances), mark (24; 0% instances), acl (14; 0% instances), conj (9; 0% instances), root (8; 0% instances), case (6; 0% instances), cop (6; 0% instances), nsubjpass (6; 0% instances), pt-dep/advmod:emph (5; 0% instances), xcomp (4; 0% instances), ccomp (3; 0% instances), appos (2; 0% instances), iobj (2; 0% instances), name (1; 0% instances)

Parents of DET nodes belong to 14 different parts of speech: NOUN (28257; 79% instances), PROPN (5608; 16% instances), ADJ (665; 2% instances), VERB (594; 2% instances), PRON (192; 1% instances), ADV (99; 0% instances), ADP (86; 0% instances), NUM (79; 0% instances), DET (57; 0% instances), SYM (12; 0% instances), ROOT (8; 0% instances), CONJ (1; 0% instances), PART (1; 0% instances), SCONJ (1; 0% instances)
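Relation and parent-tag counts like these could be gathered by following the HEAD column of each DET node. The following is a minimal sketch on invented data: `det_attachments` and `SAMPLE` are hypothetical names, and labelling head 0 as ROOT mirrors the ROOT entry in the list above rather than any documented tool behaviour.

```python
from collections import Counter

# Hand-made CoNLL-U fragment (hypothetical data, not from the treebank).
SAMPLE = """\
1\tos\to\tDET\t_\t_\t3\tdet\t_\t_
2\tcinco\tcinco\tNUM\t_\t_\t3\tnummod\t_\t_
3\tmortos\tmorto\tNOUN\t_\t_\t0\troot\t_\t_

1\tuma\tum\tDET\t_\t_\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\t_\t0\troot\t_\t_

1\ttodos\ttodo\tDET\t_\t_\t0\troot\t_\t_
"""


def det_attachments(conllu_text):
    """Count the relations of DET nodes and the tags of their parents."""
    relations, parent_pos = Counter(), Counter()
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if r[0].isdigit()]   # plain word lines only
        upos_by_id = {r[0]: r[3] for r in rows}
        upos_by_id["0"] = "ROOT"                     # artificial root node
        for r in rows:
            if r[3] == "DET":
                relations[r[7]] += 1                 # DEPREL column
                parent_pos[upos_by_id.get(r[6], "UNKNOWN")] += 1
    return relations, parent_pos


relations, parent_pos = det_attachments(SAMPLE)
print(dict(relations))   # {'det': 2, 'root': 1}
print(dict(parent_pos))  # {'NOUN': 2, 'ROOT': 1}
```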

34786 (98%) DET nodes are leaves.

519 (1%) DET nodes have one child.

217 (1%) DET nodes have two children.

138 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 9.
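The four child-count figures above add up to the 35660 DET tokens reported for this treebank, and the percentages follow by rounding each share to a whole number. A short check, with hypothetical variable names:

```python
# Number of DET nodes by number of children, as reported for UD_Portuguese.
child_counts = {"0": 34786, "1": 519, "2": 217, "3+": 138}

total = sum(child_counts.values())
print(total)  # 35660

# Share of each group, rounded to a whole percentage.
shares = {k: round(100 * v / total) for k, v in child_counts.items()}
print(shares)  # {'0': 98, '1': 1, '2': 1, '3+': 0}
```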

Children of DET nodes are attached using 25 different relations: mwe (313; 21% instances), compound (260; 17% instances), case (242; 16% instances), nmod (141; 9% instances), punct (127; 9% instances), acl (66; 4% instances), nsubj (57; 4% instances), det (44; 3% instances), name (44; 3% instances), advmod (41; 3% instances), cop (30; 2% instances), nummod (27; 2% instances), conj (24; 2% instances), cc (21; 1% instances), amod (16; 1% instances), dobj (10; 1% instances), advcl (7; 0% instances), mark (7; 0% instances), appos (5; 0% instances), ccomp (3; 0% instances), dep (2; 0% instances), neg (2; 0% instances), pt-dep/advmod:emph (1; 0% instances), csubj (1; 0% instances), xcomp (1; 0% instances)

Children of DET nodes belong to 14 different parts of speech: ADP (347; 23% instances), NOUN (307; 21% instances), SCONJ (224; 15% instances), PUNCT (127; 9% instances), VERB (101; 7% instances), PROPN (88; 6% instances), PRON (85; 6% instances), ADV (69; 5% instances), DET (57; 4% instances), NUM (31; 2% instances), CONJ (25; 2% instances), ADJ (24; 2% instances), PART (5; 0% instances), SYM (2; 0% instances)


Treebank Statistics (UD_Portuguese-Bosque)

There are 53 DET lemmas (0%), 111 DET types (0%) and 35236 DET tokens (15%). Out of 17 observed tags, the rank of DET is: 9 in number of lemmas, 10 in number of types and 3 in number of tokens.

The 10 most frequent DET lemmas: o, um, seu, este, todo, outro, esse, muito, algum, mesmo

The 10 most frequent DET types: o, a, os, as, um, uma, sua, seu, este, esta

The 10 most frequent ambiguous lemmas: o (DET 27984, PRON 325, PROPN 21, NOUN 4, ADP 3), um (DET 3196, NUM 371, PRON 12, PROPN 3, ADP 2, NOUN 1), seu (DET 1084, PRON 3, ADP 1, PROPN 1), este (DET 574, PRON 88), todo (DET 283, PRON 60, ADV 5, NOUN 3), outro (DET 279, PRON 97, NOUN 1), esse (DET 265, PRON 28), muito (ADV 181, DET 174, PRON 52, NOUN 3), algum (DET 169, PRON 34, NOUN 1), mesmo (DET 151, ADV 82, PRON 29, NOUN 9, ADP 2)

The 10 most frequent ambiguous types: o (DET 10520, PRON 344, PROPN 21, NOUN 4), a (DET 9579, ADP 4007, PRON 89, PROPN 27, NOUN 4, ADV 2), os (DET 3324, PRON 76, ADP 5, PROPN 4), as (DET 2233, PRON 43, PROPN 1), um (DET 1616, NUM 233, PROPN 3, PRON 3, ADP 2, NOUN 1), uma (DET 1407, NUM 138, ADP 12, PRON 3, NOUN 3, PROPN 1, ADJ 1), sua (DET 433, ADP 6, PRON 1), seu (DET 355, PRON 2, ADP 1, PROPN 1), este (DET 252, PRON 29), esta (DET 201, PRON 17)

Morphology

The form / lemma ratio of DET is 2.094340 (the average of all parts of speech is 1.449059).

The highest number of forms (9) was observed with the lemma “meu”: meu, meus, minha, minhas, nossos, seu, seus, sua, suas.

The second highest number of forms (5) was observed with the lemma “muito”: mais, muita, muitas, muito, muitos.

The third highest number of forms (5) was observed with the lemma “o”: a, as, o, o(s), os.

DET occurs with 4 features: Gender (34025; 97% instances), Number (34024; 97% instances), PronType (31966; 91% instances), Definite (30856; 88% instances)

DET occurs with 12 feature-value pairs: Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Gender=Unsp, Number=Plur, Number=Sing, Number=Unsp, PronType=Art, PronType=Dem, PronType=Int, PronType=Rel

DET occurs with 37 feature combinations. The most frequent feature combination is Definite=Def|Gender=Masc|Number=Sing|PronType=Art (11358 tokens). Examples: o, os, a, um

Relations

DET nodes are attached to their parents using 13 different relations: det (34832; 99% instances), mwe (174; 0% instances), advmod (88; 0% instances), appos (53; 0% instances), dep (32; 0% instances), conj (24; 0% instances), root (9; 0% instances), mark (7; 0% instances), parataxis (7; 0% instances), nsubj (6; 0% instances), dislocated (2; 0% instances), compound (1; 0% instances), dobj (1; 0% instances)

Parents of DET nodes belong to 12 different parts of speech: NOUN (28202; 80% instances), PROPN (5499; 16% instances), ADJ (554; 2% instances), PRON (442; 1% instances), VERB (237; 1% instances), DET (174; 0% instances), NUM (77; 0% instances), ADV (28; 0% instances), ROOT (9; 0% instances), SYM (7; 0% instances), ADP (6; 0% instances), PART (1; 0% instances)

34929 (99%) DET nodes are leaves.

209 (1%) DET nodes have one child.

56 (0%) DET nodes have two children.

42 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 7.

Children of DET nodes are attached using 24 different relations: mwe (145; 30% instances), punct (88; 18% instances), nmod (71; 15% instances), det (34; 7% instances), advmod (25; 5% instances), conj (18; 4% instances), acl:relcl (17; 4% instances), case (15; 3% instances), acl (13; 3% instances), cc (9; 2% instances), nsubj (8; 2% instances), amod (6; 1% instances), mark (6; 1% instances), xcomp (5; 1% instances), dep (3; 1% instances), pt-dep/nmod:npmod (3; 1% instances), appos (2; 0% instances), ccomp (2; 0% instances), cop (2; 0% instances), dobj (2; 0% instances), advcl (1; 0% instances), aux (1; 0% instances), nummod (1; 0% instances), parataxis (1; 0% instances)

Children of DET nodes belong to 13 different parts of speech: DET (174; 36% instances), PUNCT (88; 18% instances), NOUN (63; 13% instances), VERB (33; 7% instances), ADV (24; 5% instances), PROPN (22; 5% instances), ADP (21; 4% instances), ADJ (18; 4% instances), PRON (15; 3% instances), CONJ (9; 2% instances), NUM (5; 1% instances), SCONJ (5; 1% instances), AUX (1; 0% instances)


Treebank Statistics (UD_Portuguese-BR)

There is 1 DET lemma (7%), 126 DET types (0%) and 26122 DET tokens (9%). Out of 14 observed tags, the rank of DET is: 6 in number of lemmas, 11 in number of types and 6 in number of tokens.

The 10 most frequent DET lemmas: _

The 10 most frequent DET types: o, a, os, um, uma, as, sua, seu, seus, cada

The 10 most frequent ambiguous lemmas: _ (NOUN 57316, ADP 51928, PUNCT 42033, PROPN 32948, VERB 29700, DET 26122, ADJ 15107, CONJ 10984, ADV 9773, NUM 8491, PRON 7392, AUX 5242, PART 748, X 539)

The 10 most frequent ambiguous types: o (DET 6544, PRON 226, ADP 1, PROPN 1, X 1), a (DET 5338, ADP 1658, PRON 64, VERB 5, X 2, PROPN 1, CONJ 1), os (DET 1765, PRON 36, PROPN 1, X 1), um (DET 1704, PRON 176, NUM 121, NOUN 1), uma (DET 1631, NUM 89, PRON 87), as (DET 1097, PRON 15, ADP 13), sua (DET 516, PRON 2), seu (DET 423, PRON 1), cada (DET 135, PRON 2), outros (DET 116, PRON 40)

Morphology

The form / lemma ratio of DET is 126.000000 (the average of all parts of speech is 2514.000000).

The 1st highest number of forms (126) was observed with the lemma “_”: Duas, Imensas, This, Três, Tua, WesleyA, a, aas, algum, alguma, algumas, alguns, ambas, ambos, aquela, aquele, aqueles, as, bastante, cada, casa, certa, certas, certo, certos, cuja, cujas, cujo, cujos, dado, de, demais, determinada, determinadas, determinado, determinados, diferente, diferentes, diversas, diversos, e, el, essa, essas, esse, esses, esta, estas, este, estes, flagrante, la, le, les, los, mais, menos, meu, meus, minha, minhas, muita, muitas, muito, muitos, múltiplos, nenhum, nenhuma, nossa, nossas, nosso, nossos, numerosas, nível, o, oa, onze, os, ourtos, outra, outras, outro, outros, pouca, poucas, pouco, poucos, pouquíssimos, quais, quaisquer, qual, qualquer, quantas, quantos, que, quão, seu, seus, sua, suas, sui, tais, tal, tanta, tantas, tanto, tantos, the, toda, todas, todo, todos, um, uma, umas, uns, varias, varios, vossa, vosso, várias, vários, your, à, às, águaA.

DET does not occur with any features.

Relations

DET nodes are attached to their parents using 13 different relations: det (24425; 94% instances), det:poss (1612; 6% instances), mark (34; 0% instances), dep (15; 0% instances), mwe (14; 0% instances), case (8; 0% instances), conj (5; 0% instances), dobj (2; 0% instances), nmod (2; 0% instances), nsubj (2; 0% instances), advmod (1; 0% instances), cc (1; 0% instances), root (1; 0% instances)

Parents of DET nodes belong to 13 different parts of speech: NOUN (22674; 87% instances), PROPN (2845; 11% instances), PRON (247; 1% instances), PART (120; 0% instances), NUM (100; 0% instances), VERB (48; 0% instances), ADV (25; 0% instances), ADJ (24; 0% instances), ADP (17; 0% instances), X (11; 0% instances), DET (9; 0% instances), CONJ (1; 0% instances), ROOT (1; 0% instances)

26076 (100%) DET nodes are leaves.

2 (0%) DET nodes have one child.

41 (0%) DET nodes have two children.

3 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 6.

Children of DET nodes are attached using 9 different relations: mwe (66; 69% instances), cc (8; 8% instances), conj (7; 7% instances), det (4; 4% instances), punct (4; 4% instances), nmod (3; 3% instances), case (2; 2% instances), cop (1; 1% instances), parataxis (1; 1% instances)

Children of DET nodes belong to 11 different parts of speech: NOUN (37; 39% instances), CONJ (34; 35% instances), DET (9; 9% instances), ADP (4; 4% instances), PUNCT (4; 4% instances), ADJ (3; 3% instances), ADV (1; 1% instances), NUM (1; 1% instances), PRON (1; 1% instances), PROPN (1; 1% instances), VERB (1; 1% instances)

