Foreign: is this a foreign word?
Boolean feature. It marks a genuinely foreign word, not a loan word, that appears inside native text, e.g. in direct speech or in the titles of books.
Note that Czech data (especially those from the PDT) often indicate the original part of speech of foreign words. Thus this feature may occur with any POS tag. If the original part of speech is not known, the feature will accompany the cs-pos/X tag.
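As a minimal sketch of how this feature can be read from annotated data, the following Python snippet parses the FEATS column of one CoNLL-U token line and returns the POS tag together with the value of Foreign. The helper names `parse_feats` and `foreign_value` and the sample line are illustrative assumptions; the sample annotation is made up and not taken from the treebank.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Nom|Foreign=Foreign' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def foreign_value(conllu_line):
    """Return (POS tag, value of Foreign or None) for one tab-separated token line."""
    cols = conllu_line.rstrip("\n").split("\t")
    upos = cols[3]
    feats = parse_feats(cols[5])
    return upos, feats.get("Foreign")


# Hypothetical 10-column token line; the annotation is invented for illustration.
sample = "3\tVLIW\tVLIW\tPROPN\t_\tForeign=Foreign\t2\tnmod\t_\t_"
print(foreign_value(sample))
```

Because the feature may accompany any POS tag, the sketch returns the tag unchanged rather than filtering on it.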
Foreign: it is foreign
Examples
- … nese jméno VLIW (Very Long Instruction Word – velmi dlouhé instrukční slovo) “… bears the name VLIW (Very Long Instruction Word).”
Fscript: it is foreign and written in a foreign script
Examples
- V nepálštině se hora jmenuje सगरमाथा. “In Nepali, the mountain is called सगरमाथा.”
Tscript: it is foreign and transcribed from a foreign script
Examples
- Výše uvedené nepálské slovo lze přepsat jako Sagaramāthā. “The above Nepali word can be transcribed Sagaramāthā.”
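The difference between a word in a foreign script and a word in the Latin script can be approximated mechanically. The sketch below is only a heuristic based on Unicode character names, assuming that "foreign script" means "non-Latin script" for Czech text; it is not how the annotation was produced, and the function name `uses_non_latin_script` is hypothetical. It cannot tell a transcription (Tscript) from an ordinary foreign word, since both are written in Latin letters.

```python
import unicodedata


def uses_non_latin_script(word):
    """Heuristic: True if any alphabetic character is not a Latin letter."""
    for ch in word:
        # unicodedata.name() returns e.g. 'DEVANAGARI LETTER SA' or
        # 'LATIN SMALL LETTER A WITH MACRON'; the default "" covers unnamed characters.
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False


for word in ["सगरमाथा", "Sagaramāthā", "VLIW"]:
    print(word, uses_non_latin_script(word))
```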
Diffs
Prague Dependency Treebank
PDT does not contain words in foreign scripts (it does contain foreign letters based on the Latin script), and transcriptions from foreign scripts are not explicitly marked. Hence the values Fscript and Tscript do not appear in the converted PDT data.
For proper nouns the borderline between foreign words and loan words is somewhat fuzzy, so e.g. the English personal name George is marked as foreign even though it would not normally be translated (except for names of rulers and saints, which would become Jiří).
Articles in foreign names (the, die, le) are tagged cs-pos/ADJ, not cs-pos/DET.
Treebank Statistics (UD_Czech)
This feature is language-specific.
It occurs with 1 value: Foreign.
9317 tokens (1%) have a non-empty value of Foreign.
3670 types (3%) occur at least once with a non-empty value of Foreign.
3490 lemmas (6%) occur at least once with a non-empty value of Foreign.
The feature is used with 13 part-of-speech tags: cs-pos/PROPN (3684; 0% instances), cs-pos/ADJ (2669; 0% instances), cs-pos/NOUN (1813; 0% instances), cs-pos/ADP (592; 0% instances), cs-pos/PART (120; 0% instances), cs-pos/VERB (120; 0% instances), cs-pos/ADV (116; 0% instances), cs-pos/CONJ (80; 0% instances), cs-pos/PRON (79; 0% instances), cs-pos/NUM (29; 0% instances), cs-pos/SCONJ (8; 0% instances), cs-pos/INTJ (6; 0% instances), cs-pos/DET (1; 0% instances).
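Counts of this kind can be reproduced from a CoNLL-U file with a few lines of Python. The following is a minimal sketch, not the script that generated these statistics; the function name `count_foreign_by_upos` and the three-token sample are assumptions made for illustration.

```python
from collections import Counter


def count_foreign_by_upos(conllu_text):
    """Count tokens with a non-empty Foreign feature, grouped by POS tag."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and malformed lines
        if any(f.startswith("Foreign=") for f in cols[5].split("|")):
            counts[cols[3]] += 1
    return counts


# Tiny invented sample, not real treebank data.
sample = "\n".join([
    "# text = New York Times",
    "1\tNew\tNew\tADJ\t_\tForeign=Foreign\t3\tforeign\t_\t_",
    "2\tYork\tYork\tPROPN\t_\tForeign=Foreign\t3\tforeign\t_\t_",
    "3\tTimes\tTimes\tPROPN\t_\tForeign=Foreign\t0\troot\t_\t_",
])
print(count_foreign_by_upos(sample).most_common())
```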
PROPN
3684 cs-pos/PROPN tokens (4% of all PROPN tokens) have a non-empty value of Foreign.
The most frequent other feature values with which PROPN and Foreign co-occurred: Negative=Pos (3684; 100%), Case=EMPTY (2905; 79%), Abbr=EMPTY (2670; 72%), NameType=Com (2512; 68%), Animacy=EMPTY (2259; 61%), Number=EMPTY (2177; 59%).
PROPN tokens may have the following values of Foreign:
Foreign (3684; 100% of non-empty Foreign): HZDS, IRA, Floyd, Nature, International, Science, Sinn, Fein, Times, Cup
Foreign seems to be a lexical feature of PROPN. 100% of lemmas (1422) occur with only one value of Foreign.
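The "lexical feature" remark can be read as a check that every lemma always carries the same value of Foreign. The sketch below shows one plausible way to compute such a share from (lemma, value) pairs. Whether the original statistics also treat the empty value as a value is not stated here, so the input format and the function name `single_value_lemma_share` are assumptions.

```python
from collections import defaultdict


def single_value_lemma_share(pairs):
    """Return (share, count) of lemmas seen with exactly one value of Foreign."""
    values_by_lemma = defaultdict(set)
    for lemma, value in pairs:
        values_by_lemma[lemma].add(value)
    if not values_by_lemma:
        return 0.0, 0  # no lemmas observed
    single = sum(1 for vals in values_by_lemma.values() if len(vals) == 1)
    return single / len(values_by_lemma), single


# Invented observations: each pair is (lemma, value of Foreign on one token).
observations = [("IRA", "Foreign"), ("IRA", "Foreign"), ("Times", "Foreign")]
print(single_value_lemma_share(observations))
```

When the feature has only one value in the data, as it does here, every lemma trivially passes this check, which is consistent with the 100% figures reported.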
ADJ
2669 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADJ and Foreign co-occurred: Negative=Pos (2665; 100%), Degree=Pos (2655; 99%), Animacy=EMPTY (2570; 96%), Case=EMPTY (2545; 95%), Number=EMPTY (2447; 92%), Gender=EMPTY (2439; 91%).
ADJ tokens may have the following values of Foreign:
Foreign (2669; 100% of non-empty Foreign): New, the, open, US, Pink, la, Le, Deutsche, die, United
Foreign seems to be a lexical feature of ADJ. 100% of lemmas (1003) occur with only one value of Foreign.
NOUN
1813 cs-pos/NOUN tokens (0% of all NOUN tokens) have a non-empty value of Foreign.
The most frequent other feature values with which NOUN and Foreign co-occurred: Negative=Pos (1812; 100%), Case=EMPTY (1250; 69%), Animacy=EMPTY (1015; 56%), Number=EMPTY (975; 54%).
NOUN tokens may have the following values of Foreign:
Foreign (1813; 100% of non-empty Foreign): play, managementu, management, CD, s, facto, st, o, homo, neem
Foreign seems to be a lexical feature of NOUN. 100% of lemmas (946) occur with only one value of Foreign.
ADP
592 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADP and Foreign co-occurred: AdpType=Prep (592; 100%), Case=EMPTY (353; 60%).
ADP tokens may have the following values of Foreign:
Foreign (592; 100% of non-empty Foreign): de, of, di, van, in, von, versus, ad, Pro, to
Foreign seems to be a lexical feature of ADP. 100% of lemmas (56) occur with only one value of Foreign.
PART
120 cs-pos/PART tokens (1% of all PART tokens) have a non-empty value of Foreign.
PART tokens may have the following values of Foreign:
Foreign (120; 100% of non-empty Foreign): off, džambo, not, t, oui, Bienvenue, So, ne, sorry, viva
Foreign seems to be a lexical feature of PART. 100% of lemmas (28) occur with only one value of Foreign.
VERB
120 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of Foreign.
The most frequent other feature values with which VERB and Foreign co-occurred: Aspect=EMPTY (120; 100%), Negative=Pos (114; 95%), Gender=EMPTY (112; 93%), Person=EMPTY (66; 55%), Tense=EMPTY (63; 53%), Voice=EMPTY (62; 52%), Mood=EMPTY (61; 51%).
VERB tokens may have the following values of Foreign:
Foreign (120; 100% of non-empty Foreign): is, Be, can, est, transit, Check, Come, Habent, Keep, Love
Foreign seems to be a lexical feature of VERB. 100% of lemmas (86) occur with only one value of Foreign.
ADV
116 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADV and Foreign co-occurred: Negative=EMPTY (107; 92%), Degree=EMPTY (107; 92%).
ADV tokens may have the following values of Foreign:
Foreign (116; 100% of non-empty Foreign): cca, priori, Today, live, Here, Only, Sic, Very, dove, echt
Foreign seems to be a lexical feature of ADV. 100% of lemmas (71) occur with only one value of Foreign.
CONJ
80 cs-pos/CONJ tokens (0% of all CONJ tokens) have a non-empty value of Foreign.
CONJ tokens may have the following values of Foreign:
Foreign (80; 100% of non-empty Foreign): and, et, und, As, or, ma, So, e, n
PRON
79 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of Foreign.
The most frequent other feature values with which PRON and Foreign co-occurred: Variant=EMPTY (78; 99%), Reflex=EMPTY (76; 96%), Gender=EMPTY (58; 73%), PronType=Prs (49; 62%), Case=EMPTY (42; 53%).
PRON tokens may have the following values of Foreign:
Foreign (79; 100% of non-empty Foreign): it, All, you, I, Me, We, Some, Us, My, She
Foreign seems to be a lexical feature of PRON. 100% of lemmas (34) occur with only one value of Foreign.
NUM
29 cs-pos/NUM tokens (0% of all NUM tokens) have a non-empty value of Foreign.
The most frequent other feature values with which NUM and Foreign co-occurred: NumForm=Word (29; 100%), Gender=EMPTY (29; 100%), NumType=Card (29; 100%), Case=EMPTY (26; 90%), NumValue=1,2,3 (24; 83%), Number=Plur (22; 76%).
NUM tokens may have the following values of Foreign:
Foreign (29; 100% of non-empty Foreign): Four, Twenty, Seven, Six, one, Five, Three, Tre, Tri, seděm
Foreign seems to be a lexical feature of NUM. 100% of lemmas (12) occur with only one value of Foreign.
SCONJ
8 cs-pos/SCONJ tokens (0% of all SCONJ tokens) have a non-empty value of Foreign.
SCONJ tokens may have the following values of Foreign:
Foreign (8; 100% of non-empty Foreign): as, If, When, ak, ako, gdyž, kak
INTJ
6 cs-pos/INTJ tokens (5% of all INTJ tokens) have a non-empty value of Foreign.
INTJ tokens may have the following values of Foreign:
Foreign (6; 100% of non-empty Foreign): O, propos, Bang, Boom, Crash
DET
1 cs-pos/DET token (0% of all DET tokens) has a non-empty value of Foreign.
The most frequent other feature values with which DET and Foreign co-occurred: Person=1 (1; 100%), Gender[psor]=EMPTY (1; 100%), PronType=Prs (1; 100%), Gender=Fem (1; 100%), Case=EMPTY (1; 100%), Reflex=EMPTY (1; 100%), Number[psor]=Plur (1; 100%), Number=Sing (1; 100%), Poss=Yes (1; 100%).
DET tokens may have the following values of Foreign:
Foreign (1; 100% of non-empty Foreign): Notre
Relations with Agreement in Foreign
The 10 most frequent relations where parent and child node agree in Foreign:
PROPN –[foreign]–> ADJ (920; 100%),
NOUN –[foreign]–> ADJ (594; 100%),
PROPN –[foreign]–> PROPN (286; 99%),
NOUN –[foreign]–> NOUN (163; 99%),
ADJ –[foreign]–> ADJ (139; 100%),
NOUN –[foreign]–> ADP (127; 100%),
ADJ –[foreign]–> PROPN (96; 100%),
NOUN –[foreign]–> PART (51; 100%),
ADJ –[foreign]–> NOUN (40; 100%),
NOUN –[foreign]–> PROPN (27; 87%).
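A list like the one above can be approximated by walking the dependency tree and comparing the Foreign value of each child with that of its parent. The following is a minimal sketch under assumed input: each token is a dict with the hypothetical keys `id`, `head`, `upos`, `deprel` and `foreign` (`None` when the feature is absent). It only counts agreeing pairs; how the percentages in the list were computed is not stated here, so they are not reproduced.

```python
from collections import Counter


def foreign_agreement(tokens):
    """Count relations whose parent and child share the same non-empty Foreign value."""
    by_id = {t["id"]: t for t in tokens}
    agree = Counter()
    for child in tokens:
        parent = by_id.get(child["head"])
        if parent is None:
            continue  # the root has head 0 and therefore no parent token
        if child["foreign"] and child["foreign"] == parent["foreign"]:
            agree[(parent["upos"], child["deprel"], child["upos"])] += 1
    return agree


# Invented one-sentence sample, not real treebank data.
sentence = [
    {"id": 1, "head": 3, "upos": "ADJ", "deprel": "foreign", "foreign": "Foreign"},
    {"id": 2, "head": 3, "upos": "PROPN", "deprel": "foreign", "foreign": "Foreign"},
    {"id": 3, "head": 0, "upos": "PROPN", "deprel": "root", "foreign": "Foreign"},
]
for (parent, rel, child), n in foreign_agreement(sentence).items():
    print(f"{parent} -[{rel}]-> {child} ({n})")
```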
Treebank Statistics (UD_Czech-CAC)
This feature is language-specific.
It occurs with 1 value: Foreign.
525 tokens (0%) have a non-empty value of Foreign.
386 types (1%) occur at least once with a non-empty value of Foreign.
375 lemmas (1%) occur at least once with a non-empty value of Foreign.
The feature is used with 9 part-of-speech tags: cs-pos/NOUN (257; 0% instances), cs-pos/ADJ (116; 0% instances), cs-pos/ADP (64; 0% instances), cs-pos/PROPN (37; 0% instances), cs-pos/PART (14; 0% instances), cs-pos/ADV (13; 0% instances), cs-pos/PRON (13; 0% instances), cs-pos/VERB (8; 0% instances), cs-pos/CONJ (3; 0% instances).
NOUN
257 cs-pos/NOUN tokens (0% of all NOUN tokens) have a non-empty value of Foreign.
The most frequent other feature values with which NOUN and Foreign co-occurred: Negative=Pos (257; 100%), Animacy=EMPTY (178; 69%).
NOUN tokens may have the following values of Foreign:
Foreign (257; 100% of non-empty Foreign): luxe, vitro, generis, nepusto, pusto, excellence, homo, lege, peeling, Buch
Foreign seems to be a lexical feature of NOUN. 100% of lemmas (204) occur with only one value of Foreign.
ADJ
116 cs-pos/ADJ tokens (0% of all ADJ tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADJ and Foreign co-occurred: Negative=Pos (116; 100%), Degree=Pos (113; 97%), Animacy=EMPTY (103; 89%), Case=EMPTY (82; 71%), Number=EMPTY (80; 69%), Gender=EMPTY (77; 66%).
ADJ tokens may have the following values of Foreign:
Foreign (116; 100% of non-empty Foreign): online, signifiant, super, la, Jazykovedným, New, Telephone, Tonkünstler, ferenda, fit
Foreign seems to be a lexical feature of ADJ. 100% of lemmas (95) occur with only one value of Foreign.
ADP
64 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADP and Foreign co-occurred: AdpType=Prep (64; 100%).
ADP tokens may have the following values of Foreign:
Foreign (64; 100% of non-empty Foreign): de, in, a, ad, cross, of, par, Pro, ante, aus
Foreign seems to be a lexical feature of ADP. 100% of lemmas (21) occur with only one value of Foreign.
PROPN
37 cs-pos/PROPN tokens (0% of all PROPN tokens) have a non-empty value of Foreign.
The most frequent other feature values with which PROPN and Foreign co-occurred: Negative=Pos (37; 100%), Abbr=EMPTY (36; 97%), Case=EMPTY (30; 81%), Number=EMPTY (24; 65%), Animacy=EMPTY (24; 65%).
PROPN tokens may have the following values of Foreign:
Foreign (37; 100% of non-empty Foreign): Combi, Kombi, Manche, Orchester, Bell, Böhmen, Corriere, Fruit, Gaudeamus, George
Foreign seems to be a lexical feature of PROPN. 100% of lemmas (32) occur with only one value of Foreign.
PART
14 cs-pos/PART tokens (0% of all PART tokens) have a non-empty value of Foreign.
PART tokens may have the following values of Foreign:
Foreign (14; 100% of non-empty Foreign): La, das, des, non, Le, el, quo, Al
ADV
13 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Foreign.
The most frequent other feature values with which ADV and Foreign co-occurred: Negative=EMPTY (13; 100%), Degree=EMPTY (13; 100%).
ADV tokens may have the following values of Foreign:
Foreign (13; 100% of non-empty Foreign): priori, explicite, quo, defacto, expost, innuce, ipsofacto, memoriam, theory
PRON
13 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of Foreign.
The most frequent other feature values with which PRON and Foreign co-occurred: Variant=EMPTY (13; 100%), Number=Sing (10; 77%), Reflex=EMPTY (10; 77%), Person=EMPTY (9; 69%), PronType=Prs (7; 54%).
PRON tokens may have the following values of Foreign:
Foreign (13; 100% of non-empty Foreign): sui, eo, ipso, Tous, er, hoc, qua, quem, they
VERB
8 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of Foreign.
The most frequent other feature values with which VERB and Foreign co-occurred: Aspect=EMPTY (8; 100%), Negative=Pos (8; 100%), Gender=EMPTY (7; 88%), Person=EMPTY (5; 63%), Mood=EMPTY (5; 63%), Tense=EMPTY (5; 63%).
VERB tokens may have the following values of Foreign:
Foreign (8; 100% of non-empty Foreign): are, data, formo, movere, savoir, singen, singt, vivre
CONJ
3 cs-pos/CONJ tokens (0% of all CONJ tokens) have a non-empty value of Foreign.
CONJ tokens may have the following values of Foreign:
Foreign (3; 100% of non-empty Foreign): et, and
Relations with Agreement in Foreign
The 10 most frequent relations where parent and child node agree in Foreign:
NOUN –[foreign]–> ADJ (35; 100%),
NOUN –[conj]–> NOUN (23; 57%),
NOUN –[foreign]–> ADP (18; 100%),
NOUN –[foreign]–> NOUN (14; 82%),
ADJ –[foreign]–> NOUN (13; 100%),
NOUN –[case]–> ADP (11; 52%),
ADJ –[conj]–> ADJ (7; 88%),
PROPN –[foreign]–> ADJ (7; 100%),
NOUN –[foreign]–> PART (6; 100%),
PROPN –[foreign]–> PROPN (5; 100%).