Software for Google Web 1T 5-Grams and other N-Gram Databases
Web1T5-Easy (version 1.1)
Web1T5-Easy is a collection of Perl scripts for indexing and querying the Google Web 1T 5-Grams database with the open-source database engine SQLite. This package offers a quick and convenient way to build an interactively searchable version of the Web1T5 database, including a full collocation analysis and a simple but powerful Web interface. It is not designed as a high-performance Web service, and it requires a considerable amount of disk space (approx. 220 GiB) as well as patience (indexing may take up to two weeks on a state-of-the-art server).
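To give a rough idea of what indexing and querying n-grams with SQLite can look like, here is a minimal sketch in Python (not Perl, and not taken from the Web1T5-Easy scripts). The table layout, the names `vocabulary`, `bigrams`, `build_toy_index` and `query_second_word`, and the frequency counts are all hypothetical and chosen for illustration only; the actual package may organise its database quite differently.

```python
import sqlite3

def build_toy_index(rows):
    """Create an in-memory SQLite database holding bigram counts.

    Hypothetical layout: words are stored once in `vocabulary`, and
    `bigrams` refers to them by integer id to keep the table compact.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE vocabulary (id INTEGER PRIMARY KEY, w TEXT UNIQUE)")
    con.execute("CREATE TABLE bigrams (w1 INTEGER, w2 INTEGER, f INTEGER)")
    for first, second, freq in rows:
        ids = []
        for word in (first, second):
            # Insert the word only if it is not already in the vocabulary.
            con.execute("INSERT OR IGNORE INTO vocabulary (w) VALUES (?)", (word,))
            cur = con.execute("SELECT id FROM vocabulary WHERE w = ?", (word,))
            ids.append(cur.fetchone()[0])
        con.execute("INSERT INTO bigrams VALUES (?, ?, ?)", (ids[0], ids[1], freq))
    # An index on the first word makes lookups by that position fast.
    con.execute("CREATE INDEX bigrams_w1 ON bigrams (w1)")
    con.commit()
    return con

def query_second_word(con, first):
    """Return (second word, frequency) pairs for bigrams that start
    with `first`, most frequent first."""
    sql = """
        SELECT v2.w, b.f
        FROM bigrams AS b
        JOIN vocabulary AS v1 ON v1.id = b.w1
        JOIN vocabulary AS v2 ON v2.id = b.w2
        WHERE v1.w = ?
        ORDER BY b.f DESC
    """
    return con.execute(sql, (first,)).fetchall()

# Made-up toy data, not real Web 1T counts.
TOY_ROWS = [
    ("web", "interface", 120),
    ("web", "service", 300),
    ("database", "engine", 80),
]

con = build_toy_index(TOY_ROWS)
print(query_second_word(con, "web"))  # [('service', 300), ('interface', 120)]
```

Storing integer ids instead of repeated word strings is one common way to keep a large n-gram table small; whether it is appropriate depends on the size of the data and on the kinds of queries to be supported.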
You will soon be able to check out cutting-edge source code from the sf.net repository with the following command:
svn co svn://svn.code.sf.net/p/webascorpus/code/ngrams/Web1T5-Easy/trunk Web1T5-Easy

Web1T5-Easy was written by Stefan Evert. An online demonstration is available here:
http://linglit193.linglit.tu-darmstadt.de/cgi-bin/Web1T5/Web1T5_freq.perl (temporary, please do not run automated queries!).