Pattern for Python

Pattern is a web mining module for the Python programming language.

It has tools for data mining (Google, Twitter and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and visualization.
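To make the "vector space model" item concrete, here is a minimal sketch of the idea using only the Python standard library. It does not use Pattern's own API: the function names (bag_of_words, cosine_similarity) and the sample documents are hypothetical and chosen purely for illustration. Each document becomes a vector of word counts, and two documents are compared by the cosine of the angle between their vectors.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
docs = {
    "cats": "the cat sat on the mat",
    "dogs": "the dog sat on the log",
    "stocks": "shares fell sharply on monday",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

# Rank every other document by its similarity to "cats".
ranking = sorted(
    (name for name in vectors if name != "cats"),
    key=lambda name: cosine_similarity(vectors["cats"], vectors[name]),
    reverse=True,
)
print(ranking)
```

A real vector space model would typically weight the counts (for example with TF-IDF) and filter out stop words before comparing documents. Raw counts are used here only to keep the sketch short.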

It is well documented and bundled with 30+ examples and 350+ unit tests. The documentation assumes no prior knowledge ‘except for a background in Python programming’. The source code is released under a BSD license, so it can be incorporated into proprietary products or used in combination with other open source packages such as SCRAPY (web mining), NLTK (natural language processing), PYBRAIN and PYML (machine learning) and NETWORKX (network analysis).

http://www.clips.ua.ac.be/pattern

Case studies:

Authorship attribution and verification with many authors and little data: http://dl.acm.org/citation.cfm?id=1599146

Measuring the complexity of writing systems: http://www.tandfonline.com/doi/abs/10.1080/09296179408590015#.VB2yLK3l2Bs

Predicting age and gender in online social networks: http://dl.acm.org/citation.cfm?id=2065035

Personae, a corpus for author and personality prediction from text: http://www.clips.uantwerpen.be/sites/default/files/LD08lrec.pdf