Other Web as Corpus sites
- The Special Interest Group on the Web as Corpus (SIGWAC) of the Association for Computational Linguistics (ACL) is the main professional organisation for Web as Corpus researchers.
- The WACwiki is currently the main place for sharing ideas and information on the Web as Corpus. Bill Fletcher also offers some large data sets for download from this site.
- The WaCky wiki documents the groundbreaking efforts of the WaCky initiative to build a tightly knit Web as Corpus community and develop basic software for collecting and annotating linguistic data from the Web.
- The LiMiNe project (Linguistic Mining of the Net) offers several large Web corpora (each one containing more than a billion words of text) with linguistic annotations.