LAS Ranking
1. HIT-SCIR (Harbin) 75.84 ± 0.14 [OK] (p<0.001)
2. TurkuNLP (Turku) 73.28 ± 0.14 [OK] (p=0.039)
3-5. UDPipe Future (Praha) 73.11 ± 0.13 [OK] (p=0.221)
3-5. LATTICE (Paris) 73.02 ± 0.14 [OK] (p=0.461)
3-5. ICS PAS (Warszawa) 73.02 ± 0.14 [OK] (p<0.001)
6. CEA LIST (Paris) 72.56 ± 0.14 [OK] (p=0.036)
7-8. Uppsala (Uppsala) 72.37 ± 0.15 [OK] (p=0.191)
7-8. Stanford (Stanford) 72.29 ± 0.14 [OK] (p<0.001)
9-10. AntNLP (Shanghai) 70.90 ± 0.15 [OK] (p=0.242)
9-10. NLP-Cube (București) 70.82 ± 0.14 [OK] (p=0.032)
11. ParisNLP (Paris) 70.64 ± 0.14 [OK] (p<0.001)
12. SLT-Interactions (Bengaluru) 69.98 ± 0.14 [OK] (p<0.001)
13. IBM NY (Yorktown Heights) 69.11 ± 0.16 [OK] (p<0.001)
14. UniMelb (Melbourne) 68.66 ± 0.15 [OK] (p=0.002)
15. LeisureX (Shanghai) 68.31 ± 0.16 [OK] (p<0.001)
16. KParse (İstanbul) 66.58 ± 0.16 [OK] (p=0.015)
17. Fudan (Shanghai) 66.34 ± 0.15 [OK] (p<0.001)
18. BASELINE UDPipe 1.2 (Praha) 65.80 ± 0.15 [OK] (p=0.048)
19. Phoenix (Shanghai) 65.61 ± 0.16 [OK] (p<0.001)
20. CUNI x-ling (Praha) 64.87 ± 0.16 [OK] (p<0.001)
21. BOUN (İstanbul) 63.54 ± 0.15 [OK] (p<0.001)
22. ONLP lab (Ra'anana) 58.35 ± 0.15 [81] (p<0.001)
23. iParse (Pittsburgh) 55.83 ± 0.11 [65] (p<0.001)
24. HUJI (Yerushalayim) 53.69 ± 0.15 [80] (p<0.001)
25. ArmParser (Yerevan) 47.02 ± 0.11 [66] (p<0.001)
26. SParse (İstanbul) 1.95 ± 0.00 [2]
MLAS Ranking
1. UDPipe Future (Praha) 61.25 ± 0.13 (p=0.007)
2-3. TurkuNLP (Turku) 60.99 ± 0.14 (p=0.254)
2-3. Stanford (Stanford) 60.92 ± 0.13 (p<0.001)
4. ICS PAS (Warszawa) 60.25 ± 0.13 (p<0.001)
5. CEA LIST (Paris) 59.92 ± 0.14 (p<0.001)
6. HIT-SCIR (Harbin) 59.78 ± 0.14 (p<0.001)
7. Uppsala (Uppsala) 59.20 ± 0.15 (p<0.001)
8. NLP-Cube (București) 57.32 ± 0.14 (p<0.001)
9. LATTICE (Paris) 57.01 ± 0.14 (p<0.001)
10. AntNLP (Shanghai) 55.92 ± 0.13 (p=0.034)
11. ParisNLP (Paris) 55.74 ± 0.14 (p<0.001)
12. SLT-Interactions (Bengaluru) 54.52 ± 0.13 (p<0.001)
13-14. LeisureX (Shanghai) 53.70 ± 0.14 (p=0.239)
13-14. UniMelb (Melbourne) 53.62 ± 0.14 (p<0.001)
15. KParse (İstanbul) 53.25 ± 0.15 (p<0.001)
16. Fudan (Shanghai) 52.69 ± 0.15 (p=0.005)
17-18. BASELINE UDPipe 1.2 (Praha) 52.42 ± 0.14 (p=0.066)
17-18. Phoenix (Shanghai) 52.26 ± 0.15 (p<0.001)
19-20. BOUN (İstanbul) 50.40 ± 0.15 (p=0.494)
19-20. CUNI x-ling (Praha) 50.35 ± 0.15 (p<0.001)
21. ONLP lab (Ra'anana) 46.09 ± 0.15 (p<0.001)
22. iParse (Pittsburgh) 45.65 ± 0.12 (p<0.001)
23. HUJI (Yerushalayim) 44.60 ± 0.14 (p<0.001)
24. IBM NY (Yorktown Heights) 40.61 ± 0.13 (p<0.001)
25. ArmParser (Yerevan) 36.28 ± 0.12 (p<0.001)
26. SParse (İstanbul) 1.68 ± 0.00
BLEX Ranking
1. TurkuNLP (Turku) 66.09 ± 0.13 (p<0.001)
2. HIT-SCIR (Harbin) 65.33 ± 0.13 (p<0.001)
3-4. UDPipe Future (Praha) 64.49 ± 0.14 (p=0.301)
3-4. ICS PAS (Warszawa) 64.44 ± 0.14 (p<0.001)
5. Stanford (Stanford) 64.04 ± 0.13 (p<0.001)
6-7. LATTICE (Paris) 62.39 ± 0.14 (p=0.071)
6-7. CEA LIST (Paris) 62.23 ± 0.15 (p<0.001)
8. AntNLP (Shanghai) 60.91 ± 0.14 (p=0.017)
9. ParisNLP (Paris) 60.70 ± 0.14 (p<0.001)
10. SLT-Interactions (Bengaluru) 59.68 ± 0.14 (p<0.001)
11. UniMelb (Melbourne) 58.67 ± 0.14 (p=0.009)
12. LeisureX (Shanghai) 58.42 ± 0.14 (p<0.001)
13-14. BASELINE UDPipe 1.2 (Praha) 55.80 ± 0.15 (p=0.218)
13-14. Phoenix (Shanghai) 55.71 ± 0.15 (p=0.044)
15. NLP-Cube (București) 55.52 ± 0.14 (p=0.007)
16. KParse (İstanbul) 55.26 ± 0.15 (p<0.001)
17-18. CUNI x-ling (Praha) 54.07 ± 0.15 (p=0.360)
17-18. Fudan (Shanghai) 54.03 ± 0.15 (p<0.001)
19. BOUN (İstanbul) 53.45 ± 0.15 (p<0.001)
20. iParse (Pittsburgh) 48.71 ± 0.11 (p<0.001)
21. HUJI (Yerushalayim) 48.05 ± 0.15 (p<0.001)
22. ArmParser (Yerevan) 39.18 ± 0.12 (p<0.001)
23. IBM NY (Yorktown Heights) 32.55 ± 0.13 (p<0.001)
24. Uppsala (Uppsala) 32.09 ± 0.13 (p<0.001)
25. ONLP lab (Ra'anana) 28.29 ± 0.12 (p<0.001)
26. SParse (İstanbul) 1.71 ± 0.00
Other rankings
- LAS per treebank
- MLAS per treebank
- BLEX per treebank
- UAS per treebank
- CLAS per treebank
- UPOS tagging
- XPOS tagging
- Morphological features
- All morphological tags
- Lemmatization
- Sentence segmentation
- Word segmentation
- Tokenization
- LAS/MLAS/BLEX including unofficial runs
- LAS/MLAS/BLEX of the 2017 systems on 2018 data
- Treebanks ranked by best LAS
- Treebanks ranked by best MLAS
- Treebanks ranked by best BLEX
- Treebanks ranked by best word segmentation
- Treebanks ranked by best sentence segmentation
- EPE results
All scores were computed by the official evaluation script. The 95% confidence intervals and p-values were computed by Udapi using gold re-segmentation and bootstrap resampling. Each p-value comes from a paired bootstrap test comparing the given system with the system ranked immediately below it. System pairs with p<0.05 are considered significantly different; all other adjacent pairs share the same range of ranks.
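The procedure described above can be illustrated with a minimal sketch. It is not Udapi's implementation: the function names `paired_bootstrap_p` and `rank_ranges`, the per-sentence input format, the number of resamples, and the fixed seed are all assumptions made for illustration. The sketch assumes both systems are scored on the same sentences with the same number of gold words per sentence (as gold re-segmentation would provide), so comparing summed counts of correct words is equivalent to comparing scores.

```python
import random


def paired_bootstrap_p(correct_a, correct_b, n_resamples=1000, seed=0):
    """Estimate how often system A fails to beat system B under resampling.

    correct_a[i] and correct_b[i] are the numbers of correctly attached
    words in sentence i for systems A and B (hypothetical input format).
    Both lists must describe the same sentences in the same order.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement; the same indices are
        # used for both systems, which is what makes the test "paired".
        sample = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in sample)
        if diff <= 0:
            not_better += 1
    return not_better / n_resamples


def rank_ranges(p_values, alpha=0.05):
    """Turn adjacent-pair p-values into rank labels such as "3-5".

    p_values[i] compares system i with system i + 1, where systems are
    already sorted by score; the last system has no p-value.
    """
    n = len(p_values) + 1
    labels = []
    start = 0
    for i in range(n):
        # A group of tied systems ends at the last system or at a
        # significant difference from the next system.
        if i == n - 1 or p_values[i] < alpha:
            label = str(start + 1) if start == i else f"{start + 1}-{i + 1}"
            labels.extend([label] * (i - start + 1))
            start = i + 1
    return labels


# Adjacent-pair p-values of the top eight LAS systems listed above;
# 0.0005 is a stand-in for entries reported only as "p<0.001".
top8_p = [0.0005, 0.039, 0.221, 0.461, 0.0005, 0.036, 0.191]
print(rank_ranges(top8_p))
```

With these inputs, `rank_ranges` groups the third to fifth systems and the seventh and eighth systems into shared ranges, matching the rank labels in the LAS ranking.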
Outputs of system runs are available from http://hdl.handle.net/11234/1-2885.