LAS Ranking
1. HIT-SCIR (Harbin) 75.84 ± 0.14 [OK] (p<0.001)
2. TurkuNLP (Turku) 73.28 ± 0.14 [OK] (p=0.039)
3-5. UDPipe Future (Praha) 73.11 ± 0.13 [OK] (p=0.221)
3-5. LATTICE (Paris) 73.02 ± 0.14 [OK] (p=0.461)
3-5. ICS PAS (Warszawa) 73.02 ± 0.14 [OK] (p<0.001)
6. CEA LIST (Paris) 72.56 ± 0.14 [OK] (p=0.036)
7-8. Uppsala (Uppsala) 72.37 ± 0.15 [OK] (p=0.191)
7-8. Stanford (Stanford) 72.29 ± 0.14 [OK] (p<0.001)
9-10. AntNLP (Shanghai) 70.90 ± 0.15 [OK] (p=0.242)
9-10. NLP-Cube (București) 70.82 ± 0.14 [OK] (p=0.032)
11. ParisNLP (Paris) 70.64 ± 0.14 [OK] (p<0.001)
12. SLT-Interactions (Bengaluru) 69.98 ± 0.14 [OK] (p<0.001)
13. IBM NY (Yorktown Heights) 69.11 ± 0.16 [OK] (p<0.001)
14. UniMelb (Melbourne) 68.66 ± 0.15 [OK] (p=0.002)
15. LeisureX (Shanghai) 68.31 ± 0.16 [OK] (p<0.001)
16. KParse (İstanbul) 66.58 ± 0.16 [OK] (p=0.015)
17. Fudan (Shanghai) 66.34 ± 0.15 [OK] (p<0.001)
18. BASELINE UDPipe 1.2 (Praha) 65.80 ± 0.15 [OK] (p=0.048)
19. Phoenix (Shanghai) 65.61 ± 0.16 [OK] (p<0.001)
20. CUNI x-ling (Praha) 64.87 ± 0.16 [OK] (p<0.001)
21. BOUN (İstanbul) 63.54 ± 0.15 [OK] (p<0.001)
22. ONLP lab (Ra'anana) 58.35 ± 0.15 [81] (p<0.001)
23. iParse (Pittsburgh) 55.83 ± 0.11 [65] (p<0.001)
24. HUJI (Yerushalayim) 53.69 ± 0.15 [80] (p<0.001)
25. ArmParser (Yerevan) 47.02 ± 0.11 [66] (p<0.001)
26. SParse (İstanbul) 1.95 ± 0.00 [2]
MLAS Ranking
1. UDPipe Future (Praha) 61.25 ± 0.13 (p=0.007)
2-3. TurkuNLP (Turku) 60.99 ± 0.14 (p=0.254)
2-3. Stanford (Stanford) 60.92 ± 0.13 (p<0.001)
4. ICS PAS (Warszawa) 60.25 ± 0.13 (p<0.001)
5. CEA LIST (Paris) 59.92 ± 0.14 (p<0.001)
6. HIT-SCIR (Harbin) 59.78 ± 0.14 (p<0.001)
7. Uppsala (Uppsala) 59.20 ± 0.15 (p<0.001)
8. NLP-Cube (București) 57.32 ± 0.14 (p<0.001)
9. LATTICE (Paris) 57.01 ± 0.14 (p<0.001)
10. AntNLP (Shanghai) 55.92 ± 0.13 (p=0.034)
11. ParisNLP (Paris) 55.74 ± 0.14 (p<0.001)
12. SLT-Interactions (Bengaluru) 54.52 ± 0.13 (p<0.001)
13-14. LeisureX (Shanghai) 53.70 ± 0.14 (p=0.239)
13-14. UniMelb (Melbourne) 53.62 ± 0.14 (p<0.001)
15. KParse (İstanbul) 53.25 ± 0.15 (p<0.001)
16. Fudan (Shanghai) 52.69 ± 0.15 (p=0.005)
17-18. BASELINE UDPipe 1.2 (Praha) 52.42 ± 0.14 (p=0.066)
17-18. Phoenix (Shanghai) 52.26 ± 0.15 (p<0.001)
19-20. BOUN (İstanbul) 50.40 ± 0.15 (p=0.494)
19-20. CUNI x-ling (Praha) 50.35 ± 0.15 (p<0.001)
21. ONLP lab (Ra'anana) 46.09 ± 0.15 (p<0.001)
22. iParse (Pittsburgh) 45.65 ± 0.12 (p<0.001)
23. HUJI (Yerushalayim) 44.60 ± 0.14 (p<0.001)
24. IBM NY (Yorktown Heights) 40.61 ± 0.13 (p<0.001)
25. ArmParser (Yerevan) 36.28 ± 0.12 (p<0.001)
26. SParse (İstanbul) 1.68 ± 0.00
BLEX Ranking
1. TurkuNLP (Turku) 66.09 ± 0.13 (p<0.001)
2. HIT-SCIR (Harbin) 65.33 ± 0.13 (p<0.001)
3-4. UDPipe Future (Praha) 64.49 ± 0.14 (p=0.301)
3-4. ICS PAS (Warszawa) 64.44 ± 0.14 (p<0.001)
5. Stanford (Stanford) 64.04 ± 0.13 (p<0.001)
6-7. LATTICE (Paris) 62.39 ± 0.14 (p=0.071)
6-7. CEA LIST (Paris) 62.23 ± 0.15 (p<0.001)
8. AntNLP (Shanghai) 60.91 ± 0.14 (p=0.017)
9. ParisNLP (Paris) 60.70 ± 0.14 (p<0.001)
10. SLT-Interactions (Bengaluru) 59.68 ± 0.14 (p<0.001)
11. UniMelb (Melbourne) 58.67 ± 0.14 (p=0.009)
12. LeisureX (Shanghai) 58.42 ± 0.14 (p<0.001)
13-14. BASELINE UDPipe 1.2 (Praha) 55.80 ± 0.15 (p=0.218)
13-14. Phoenix (Shanghai) 55.71 ± 0.15 (p=0.044)
15. NLP-Cube (București) 55.52 ± 0.14 (p=0.007)
16. KParse (İstanbul) 55.26 ± 0.15 (p<0.001)
17-18. CUNI x-ling (Praha) 54.07 ± 0.15 (p=0.360)
17-18. Fudan (Shanghai) 54.03 ± 0.15 (p<0.001)
19. BOUN (İstanbul) 53.45 ± 0.15 (p<0.001)
20. iParse (Pittsburgh) 48.71 ± 0.11 (p<0.001)
21. HUJI (Yerushalayim) 48.05 ± 0.15 (p<0.001)
22. ArmParser (Yerevan) 39.18 ± 0.12 (p<0.001)
23. IBM NY (Yorktown Heights) 32.55 ± 0.13 (p<0.001)
24. Uppsala (Uppsala) 32.09 ± 0.13 (p<0.001)
25. ONLP lab (Ra'anana) 28.29 ± 0.12 (p<0.001)
26. SParse (İstanbul) 1.71 ± 0.00
Other rankings
- LAS per treebank
- MLAS per treebank
- BLEX per treebank
- UAS per treebank
- CLAS per treebank
- UPOS tagging
- XPOS tagging
- Morphological features
- All morphological tags
- Lemmatization
- Sentence segmentation
- Word segmentation
- Tokenization
- LAS/MLAS/BLEX including unofficial runs
- LAS/MLAS/BLEX of the 2017 systems on 2018 data
- Treebanks ranked by best LAS
- Treebanks ranked by best MLAS
- Treebanks ranked by best BLEX
- Treebanks ranked by best word segmentation
- Treebanks ranked by best sentence segmentation
- EPE results
All scores were computed by the official evaluation script. The 95% confidence intervals and p-values were computed by Udapi using gold re-segmentation and bootstrap resampling. Each p-value comes from a paired bootstrap test between the system on that line and the system on the following line. Pairs with p<0.05 are considered significantly different; pairs with p≥0.05 are assigned the same range of ranks.
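The two bootstrap computations mentioned above (a confidence interval for one system, and a paired test between two systems) can be sketched in Python as follows. This is not Udapi's implementation. The per-sentence counts (`SentenceCounts`), the pooled F1 formula, the 1000 resamples, the percentile interval and the one-sided p-value (the fraction of resamples in which the first system does not score higher) are all assumptions chosen for illustration, on a single test set.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SentenceCounts:
    """Hypothetical per-sentence counts for one system on one sentence."""
    correct: int  # words with correct head and label (for LAS)
    gold: int     # words in the gold-standard sentence
    system: int   # words in the system output


def f1(sample: Sequence[SentenceCounts]) -> float:
    """F1 over pooled counts: 2 * correct / (gold + system)."""
    correct = sum(s.correct for s in sample)
    total = sum(s.gold + s.system for s in sample)
    return 2 * correct / total if total else 0.0


def resample(n: int, rng: random.Random) -> List[int]:
    """Draw n sentence indices uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def bootstrap_ci(sents: Sequence[SentenceCounts], n_samples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval of the F1 score."""
    rng = random.Random(seed)
    scores = sorted(
        f1([sents[i] for i in resample(len(sents), rng)])
        for _ in range(n_samples)
    )
    low = scores[int(alpha / 2 * n_samples)]
    high = scores[int((1 - alpha / 2) * n_samples) - 1]
    return low, high


def paired_bootstrap_p(a: Sequence[SentenceCounts],
                       b: Sequence[SentenceCounts],
                       n_samples: int = 1000, seed: int = 0) -> float:
    """One-sided p-value for "system a scores higher than system b".

    a[i] and b[i] describe the same sentence, so each resample draws the
    same sentence indices for both systems (this is what makes it paired).
    """
    if len(a) != len(b):
        raise ValueError("systems must be scored on the same sentences")
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n_samples):
        idx = resample(len(a), rng)
        if f1([a[i] for i in idx]) <= f1([b[i] for i in idx]):
            not_better += 1
    return not_better / n_samples
```

With toy data `a = [SentenceCounts(10, 10, 10)] * 50` (every word correct) and `b = [SentenceCounts(5, 10, 10)] * 50`, `paired_bootstrap_p(a, b)` returns 0.0, because `a` wins in every resample. Pairing matters: drawing the same sentences for both systems cancels out sentence difficulty, so the test measures only the difference between the systems.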
Outputs of system runs are available from http://hdl.handle.net/11234/1-2885.
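The rank ranges in the tables above can be reproduced from the listed p-values with a simple rule: walk down a ranking and close the current group whenever the p-value to the next system is below 0.05. The Python sketch below assumes that reading of the rule. The function name `rank_ranges` is hypothetical, and `p<0.001` is encoded as any number below 0.001. It is not the script used to produce this page.

```python
from typing import List, Optional, Sequence


def rank_ranges(p_next: Sequence[Optional[float]],
                alpha: float = 0.05) -> List[str]:
    """Assign shared rank ranges to systems sorted by score, best first.

    p_next[i] is the p-value of the paired test between system i and
    system i + 1; use None for the last system, which has no successor.
    Adjacent systems with p >= alpha stay in the same group.
    """
    labels: List[str] = []
    start = 0  # index of the first system in the current group
    for i, p in enumerate(p_next):
        # Close the group at a significant gap or at the end of the list.
        if p is None or p < alpha or i == len(p_next) - 1:
            label = str(start + 1) if start == i else f"{start + 1}-{i + 1}"
            labels.extend([label] * (i - start + 1))
            start = i + 1
    return labels
```

For the first eleven LAS rows (p-values <0.001, 0.039, 0.221, 0.461, <0.001, 0.036, 0.191, <0.001, 0.242, 0.032, <0.001) this yields 1, 2, 3-5 (three times), 6, 7-8 (twice), 9-10 (twice), 11, matching the table.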