Pattern is a web mining module for the Python programming language.
It has tools for data mining (Google, Twitter and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and visualization.
It is well documented and bundled with 30+ examples and 350+ unit tests. The documentation assumes no prior knowledge ‘except for a background in Python programming’. The source code is released under a BSD license, so it can be incorporated into proprietary products or used in combination with other open source packages such as SCRAPY (web mining), NLTK (natural language processing), PYBRAIN and PYML (machine learning) and NETWORKX (network analysis).
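
To give a feel for what the "vector space model" listed under machine learning refers to, here is a minimal sketch in plain Python. It does not use Pattern's own API, and the function names (tf_vector, cosine_similarity) are hypothetical, chosen only for this illustration. It represents each text as a bag-of-words vector of relative term frequencies and compares two texts by cosine similarity.

```python
import math
from collections import Counter


def tf_vector(text):
    """Return a bag-of-words vector: word -> relative term frequency.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, far simpler than a real tokenizer.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(v1, v2):
    """Return the cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        # An empty text has no direction in the vector space.
        return 0.0
    return dot / (norm1 * norm2)


# Three tiny example documents (made up for this sketch).
cats = tf_vector("the cat sat on the mat")
more_cats = tf_vector("the cat lay on the mat")
stocks = tf_vector("markets fell sharply today")

# Texts sharing most of their words score close to 1,
# texts sharing no words score 0.
similar = cosine_similarity(cats, more_cats)
unrelated = cosine_similarity(cats, stocks)
print(similar > unrelated)
```

A library implementation would typically add proper tokenization, stop word filtering and a weighting scheme such as TF-IDF on top of this idea; the sketch only shows the core representation and comparison step.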
Authorship attribution and verification with many authors and little data: http://dl.acm.org/citation.cfm?id=1599146
Measuring the complexity of writing systems: http://www.tandfonline.com/doi/abs/10.1080/09296179408590015#.VB2yLK3l2Bs
Predicting age and gender in online social networks: http://dl.acm.org/citation.cfm?id=2065035
Personae, a corpus for author and personality prediction from text: http://www.clips.uantwerpen.be/sites/default/files/LD08lrec.pdf