wikifoo

/dev/random
As part of the 2011 Wikimedia Summer of Research, we uncovered a possible correlation between the decline in new active editors that began in 2007 and the rise of warnings issued to new users by bots and automated tools, which started in 2006.
http://blog.wikimedia.org/2012/03/27/analysis-of-the-quality-of-newcomers-in-wikipedia-over-time/

L=A=N=G=U=A=G=E
James Joyce dictionary
deception
detection of Alzheimer's = Memoirs of My Nervous Illness by Daniel Paul Schreber
http://en.wikipedia.org/wiki/Daniel_Paul_Schreber
http://www.luftgangster.de/schreber/start.html

http://en.wikipedia.org/wiki/War_on_Terror

Wikipedia / MediaWiki API: http://www.mediawiki.org/wiki/API:Main_page

Documentation for pattern.en (the English module of Pattern): http://www.clips.ua.ac.be/pages/pattern-en

Tasks!

=> build wiki parsing tool
=> collate data in csv (necessary?)
=> build pattern parser (checking modality and sentiment) for output. What should the output be: text-based, graph, web application?
=> choose and download taxonomy datasets

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Main%20Page

Getting the history of a Wikipedia page.

Revision properties:
http://www.mediawiki.org/wiki/API:Properties#revisions_.2F_rv

Useful wiki API URLs:

get revisions and the diff between them => http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvstartid=642867209&rvdiffto=642867209&rvlimit=10&titles=War_on_Terror&rvprop=timestamp|user|comment|content|ids&format=json
get 10 revisions of the article on War on Terror => http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=10&titles=War_on_Terror&rvprop=timestamp|user|comment|content|ids
Add &format=json to the URL when querying from a script!
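
A minimal sketch of building the revision query in Python instead of pasting the URL by hand. The helper name build_revisions_url is hypothetical; the parameters are the ones used in the URLs in this list. It only builds the string and does not contact the API, and it is written to run under Python 2 or 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

API = 'http://en.wikipedia.org/w/api.php'


def build_revisions_url(title, limit=10):
    """Return a query URL for the latest `limit` revisions of `title`."""
    params = [
        ('action', 'query'),
        ('prop', 'revisions'),
        ('rvlimit', str(limit)),
        ('titles', title),
        ('rvprop', 'timestamp|user|comment|content|ids'),
        ('format', 'json'),  # always ask for JSON when querying from a script
    ]
    # urlencode escapes '|' as %7C, which the API accepts.
    return API + '?' + urlencode(params)


print(build_revisions_url('War_on_Terror'))
```

Keeping the parameters in a list makes it harder to forget format=json than when editing the query string by hand.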

Python diffing
http://localhost/doc/python2.7/html/library/difflib.html?highlight=diff#difflib
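
A small sketch of what diffing two revisions with difflib could look like. The helper name revision_diff and the two sample strings are made up for illustration; real input would be the 'content' of two revisions returned by the API.

```python
import difflib


def revision_diff(old_text, new_text, old_label='old', new_label='new'):
    """Return a unified diff between two revision texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm='',  # lines are already split, so add no newline to headers
    ))


# Invented sample text, standing in for two revisions of one article.
old = "First sentence.\nSecond sentence, old wording."
new = "First sentence.\nSecond sentence, new wording."
for line in revision_diff(old, new):
    print(line)
```

Lines starting with '-' were removed and lines starting with '+' were added, so the added lines are the ones to feed into the sentiment check.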

How subjective is the Wikipedia article on Neutrality?

Wikipedia article: Neutrality

'Neutral' is 0.39375 subjective.
'Politics and social science' is 0.195238095238 subjective.
'Mathematics and natural science' is 0.338293650794 subjective.
'Geographic locations' is 0.0 subjective.
'Other and related senses' is 0.3875 subjective.

Wikipedia article: Subjectivity

'Subjectivity' is 0.405013736264 subjective and -0.00144230769231 positive.
'Society' is 0.414166666667 subjective and 0.015 positive.
'Self' is 0.37 subjective and -0.0388888888889 positive.
'See also' is 0.333333333333 subjective and -0.166666666667 positive.
'References' is 0.0 subjective and 0.0 positive.
'Further reading' is 0.176136363636 subjective and 0.0340909090909 positive.
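
A sketch of how a per-section report like the one above could be produced, assuming the scores come from pattern.en's sentiment(), which returns a (polarity, subjectivity) pair for a string. The function name section_report and the placeholder scorer are hypothetical; the scorer is passed in as an argument so the formatting can be tried without Pattern installed.

```python
def section_report(sections, score):
    """Return one report line per (title, text) pair.

    `score` is any callable mapping a string to (polarity, subjectivity),
    for example pattern.en's sentiment().
    """
    lines = []
    for title, text in sections:
        polarity, subjectivity = score(text)
        lines.append("'%s' is %s subjective and %s positive."
                     % (title, subjectivity, polarity))
    return lines


def placeholder_score(text):
    # Hypothetical stand-in for a real scorer: rates everything as neutral.
    return (0.0, 0.0)


for line in section_report([('References', '')], placeholder_score):
    print(line)
```

With Pattern available, the call would presumably be section_report(sections, sentiment) after `from pattern.en import sentiment`, with sections taken from the parsed article.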

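# ;) / tourette.py
# Reads every word of pattern.en's PROFANITY list aloud with festival.
from pattern.en.wordlist import PROFANITY
import subprocess

for word in PROFANITY:
    print word
    # Pipe the word to festival directly instead of building a shell string,
    # so quotes or '$' in a word cannot break the command.
    p = subprocess.Popen(['festival', '--tts', '--pipe'], stdin=subprocess.PIPE)
    p.communicate(word + '\n')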

#!/usr/bin/python
#getting: time|user|content
import urllib
import json
from csv import writer

pagetitle = 'War_on_Terror'
#baseq = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvli
baseq = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlim
q=baseq
count = 0
csvfile = open('revisions.csv', 'wb')
w = writer(csvfile, dialect='excel')
while True:
    results = json.load(urllib.urlopen(q))
    p = results['query']['pages']
    for key in p: pass
    revs = p[key]['revisions']
    count += len(revs)
    print revs[-1]['timestamp']
    print len(revs)
    for r in revs:
        w.writerow((r['revid'], r['timestamp'], r['user'].encode('utf-8'), r['co
    rvcontinue = None
    if 'query-continue' in results:
        if 'revisions' in results['query-continue']:
            if 'rvcontinue' in results['query-continue']['revisions']:
                rvcontinue = results['query-continue']['revisions']['rvcontinue']
                q = baseq+"&rvcontinue="+str(rvcontinue)
    if rvcontinue==None:
        break
    break
print "done"
print count, "total revs"
csvfile.close()
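
The continuation handling in the script above can be isolated into a generator, sketched here under some assumptions. The name iter_revisions is hypothetical, the response shape ('query-continue' / 'revisions' / 'rvcontinue') is the one the script reads, and the two canned responses are made-up stand-ins so the loop can be run without network access. Newer versions of the API may return a top-level 'continue' block instead, which this sketch does not handle.

```python
def iter_revisions(fetch, base_url):
    """Yield every revision dict, following 'rvcontinue' across result pages.

    `fetch` is any callable that takes a URL and returns the decoded JSON dict.
    """
    url = base_url
    while True:
        results = fetch(url)
        for page in results['query']['pages'].values():
            for rev in page.get('revisions', []):
                yield rev
        cont = (results.get('query-continue', {})
                       .get('revisions', {})
                       .get('rvcontinue'))
        if cont is None:
            break  # no continuation token: this was the last page
        url = base_url + '&rvcontinue=' + str(cont)


# Two canned result pages keyed by URL (invented data, not real API output).
FAKE = {
    'base': {
        'query': {'pages': {'1': {'revisions': [{'revid': 3}, {'revid': 2}]}}},
        'query-continue': {'revisions': {'rvcontinue': 1}},
    },
    'base&rvcontinue=1': {
        'query': {'pages': {'1': {'revisions': [{'revid': 1}]}}},
    },
}

print([r['revid'] for r in iter_revisions(FAKE.get, 'base')])
```

For real use, fetch could be something like `lambda u: json.load(urllib.urlopen(u))` in Python 2, with each yielded revision written as one CSV row.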

# GETTING DATASETS FOR USING TAXONOMIES

# DBpedia is used as the source for generating taxonomies
# http://wiki.dbpedia.org/Downloads2014?v=jd3#persondata
# In this instance the 'person data' set, for identifying politicians
# After extracting the tar I check that the data is available:
grep politician persondata_en.nt | head -n 100
# Output looks like this:
<http://dbpedia.org/resource/Urho_Kekkonen> <http://purl.org/dc/elements/1.1/description> "Finnish politician, Prime Minister and President"@en .
<http://dbpedia.org/resource/William_Allen_(governor)> <http://purl.org/dc/elements/1.1/description> "American politician"@en .
<http://dbpedia.org/resource/Winnie_Madikizela-Mandela> <http://purl.org/dc/elements/1.1/description> "South African politician"@en .
<http://dbpedia.org/resource/William_Jardine_(merchant)> <http://purl.org/dc/elements/1.1/description> "British politician"@en .
<http://dbpedia.org/resource/Wim_Kok> <http://purl.org/dc/elements/1.1/description> "Dutch politician"@en .
<http://dbpedia.org/resource/William_Ewart_Gladstone> <http://purl.org/dc/elements/1.1/description> "British politician and prime minister"@en .
<http://dbpedia.org/resource/Ilona_Staller> <http://purl.org/dc/elements/1.1/description> "Hungarian-born Italian porn actress and politician"@en .
<http://dbpedia.org/resource/Marcus_Junius_Brutus_the_Younger> <http://purl.org/dc/elements/1.1/description> "Roman politician"@en .
<http://dbpedia.org/resource/Fran%C3%A7ois_Mitterrand> <http://purl.org/dc/elements/1.1/description> "French politician"@en .
<http://dbpedia.org/resource/Gerry_Adams> <http://purl.org/dc/elements/1.1/description> "British politician"@en .
<http://dbpedia.org/resource/Enoch_Powell> <http://purl.org/dc/elements/1.1/description> "British politician"@en .
<http://dbpedia.org/resource/Michael_Bloomberg> <http://purl.org/dc/elements/1.1/description> "American businessman, philanthropist, politician"@en .
# I think I will try pyparsing to extract this info
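
As a possible alternative to pyparsing, a rough sketch using a plain regular expression from the standard library. It assumes every line has the shape shown in the output above (subject URI, predicate URI, quoted literal with a language tag, closing dot) and is not a full N-Triples parser. The names TRIPLE, parse_triple and politicians are hypothetical.

```python
import re

# <subject> <predicate> "literal"@lang .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"(.*)"@([A-Za-z-]+)\s*\.\s*$')


def parse_triple(line):
    """Return (subject, predicate, literal, lang), or None if no match."""
    m = TRIPLE.match(line.strip())
    return m.groups() if m else None


def politicians(lines):
    """Yield (resource URI, description) for descriptions naming a politician."""
    for line in lines:
        triple = parse_triple(line)
        if triple and 'politician' in triple[2].lower():
            yield triple[0], triple[2]


sample = ('<http://dbpedia.org/resource/Wim_Kok> '
          '<http://purl.org/dc/elements/1.1/description> '
          '"Dutch politician"@en .')
for uri, description in politicians([sample]):
    print(uri + ' => ' + description)
```

In practice the lines would come from iterating over persondata_en.nt rather than from a hard-coded sample.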