Apertium

Script to test coverage over a Wikipedia corpus

Write a script (in Python or Ruby) with two modes. In the first mode, it checks out a specified language module to a given directory and compiles it (or updates it if it already exists), then fetches the most recent nightly Wikipedia archive for that language and runs coverage over it, keeping as much as possible in RAM. In the second mode, it compiles the language pair in a Docker instance, which it disposes of after successfully running coverage. Apertium already has scripts for finding where a Wikipedia is, extracting a Wikipedia archive into a text file, and running coverage.
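The coverage step itself can be sketched as follows. This is a minimal illustration, not the existing Apertium coverage script: it assumes the corpus has already been run through a morphological analyser that emits the Apertium stream format, where each token appears as ^surface/analysis$ and an unknown token has an analysis beginning with an asterisk. The function names count_coverage and coverage_percent are hypothetical, and escaped delimiters inside tokens are ignored for simplicity.

```python
import re
from typing import Iterable, Tuple

# Assumed input: Apertium stream format, e.g. ^cat/cat<n><sg>$ for a known
# word and ^zzyx/*zzyx$ for an unknown one. Escaped delimiters are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def count_coverage(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (known, total) token counts, consuming the input line by line."""
    known = 0
    total = 0
    for line in lines:
        for match in LEXICAL_UNIT.finditer(line):
            total += 1
            # An analysis starting with '*' marks a token the analyser did not know.
            if not match.group(2).startswith("*"):
                known += 1
    return known, total


def coverage_percent(known: int, total: int) -> float:
    """Coverage as a percentage; an empty corpus is reported as 0.0."""
    return 100.0 * known / total if total else 0.0


if __name__ == "__main__":
    sample = ["^The/the<det><def><sp>$ ^cat/cat<n><sg>$ ^zzyx/*zzyx$^./.<sent>$"]
    known, total = count_coverage(sample)
    print(f"{known}/{total} tokens known ({coverage_percent(known, total):.1f}%)")
```

Because count_coverage accepts any iterable of lines, it can be fed an open file or the standard output of an analyser process, so the corpus never has to be held in memory all at once.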

Task tags

  • python
  • ruby
  • wikipedia

Students who completed this task

Grzegorz Stark

Task type

  • Code

2017