Final presentations

The Annotator

[[The Annotator]]


Response to pattern package
paradox: the annotator is key to creating reference data, yet is made invisible
the algorithm is trained on human scoring, but this process is hidden

group w/ different experiments
which feature would be interesting to develop?
How could we play out the problems?


choosing your classifier is close to choosing your sources
choose something interesting to annotate
make a proposal for the pattern software: naming 
from pattern.en import revolution, sentiment etc

issue of nudging(?) (economic behaviourism: billing people with a comparison of their energy bill with their neighbour's). The hope is that through the actions of humans, the computer can "learn" and improve.
discussion on paternalism reflected discussion on big data
autonomy of the subject to decide

sources: a corpus of (non)patternalist statements, also latent paternalism:
    Gutenberg project - positively quot for patternalism
    Wikipedia - can have latent patternalism

3 groups of 3 annotators looked at the same data
Looking at definitions of paternalism
Find agreement on what it is, adding comments on the score
take quantity into account
what to do w/ disagreement? In general it would be removed. Decided to mark the results with D, as in disagreement
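
The choice to keep disagreement visible rather than remove it could be sketched as follows. This is a minimal sketch, not the group's actual script: the labels "P" (paternalist) and "N" (not paternalist), the statement ids and the function `merge_annotations` are all assumptions made for illustration.

```python
from collections import Counter

def merge_annotations(labels):
    """Merge the labels several annotators gave to one statement.

    Returns the shared label when all annotators agree, and "D"
    (disagreement) otherwise, instead of dropping the statement.
    """
    counts = Counter(labels)
    if len(counts) == 1:
        return labels[0]
    return "D"

# Hypothetical annotations: statement id -> labels from three annotators.
annotations = {
    "s1": ["P", "P", "P"],
    "s2": ["P", "N", "P"],
    "s3": ["N", "N", "N"],
}

merged = {sid: merge_annotations(labels) for sid, labels in annotations.items()}

# The list of disagreements can then be mined in its own right.
disagreements = [sid for sid, label in merged.items() if label == "D"]
```

A majority vote would hide "s2" behind the label "P"; marking it "D" keeps the friction in the data.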

Meta mining: disagreements are listed
different styles of annotating

Not frictionless, Gijs spent a lot of time handling it

Train the algorithm to differentiate between paternalism and other things
lots of data, asking the classifier what the features of paternalism are.
Disappointing results: hard to connect to any understanding of paternalism + very few features
testing the algorithm against the annotations

Managed to insert disagreement into the algorithm

Wiki History


Looking at wikipedia history
a script that dumps the history of an article
limit of the pattern API: it doesn't understand that an article is in constant flux, it only gives the last version
dump a JSON file and visualize the raw data:
    ../share/wiki_history/terror.html
    explore the history and see when things get removed or added, possibility to see where a term is mentioned
same work on the entry on svw
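
Seeing when a term gets added or removed could be sketched like this. It is a minimal sketch under assumptions: the structure of the dumped JSON is not given in these notes, so the `timestamp` and `text` keys, the sample revisions and the function `term_timeline` are hypothetical.

```python
def term_timeline(revisions, term):
    """List the revisions in which `term` appears or disappears.

    Each revision is compared with the one before it, in
    chronological order. Returns (timestamp, event) pairs.
    """
    events = []
    present = False
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        now = term.lower() in rev["text"].lower()
        if now and not present:
            events.append((rev["timestamp"], "added"))
        elif present and not now:
            events.append((rev["timestamp"], "removed"))
        present = now
    return events

# Invented revisions standing in for a dumped article history.
revisions = [
    {"timestamp": "2014-01-01T10:00:00Z", "text": "An article about fear."},
    {"timestamp": "2014-01-02T10:00:00Z", "text": "An article about fear and violence."},
    {"timestamp": "2014-01-03T10:00:00Z", "text": "An article about fear."},
]

timeline = term_timeline(revisions, "violence")
```

The ISO timestamps sort correctly as plain strings, which is why no date parsing is needed here.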
q: if the feature would be used in another context could it reveal the comments that lead to that decision?
Democratic: it is the opinion of all annotators that is taken into account

Pattern writing coach

if somebody transcribes what I did it will start to criticize me
Characters use the criteria: positiveness, objectivity and modality
The Love Coach is a guy ... the coaches have been swapped to counter gender stereotyping
Make more visible how it affects the way somebody speaks
You have the judgement but are not forced to respond back

Was there too little content maybe for the algorithm to be effective?
"We did not want to hide the crudeness"

"i don't believe the algorithms very much"
write a script where other modules could be plugged in.. like patternalism

paternalism, patternalism, paternity, patternity

what is a bag of words?
feed the algorithm into itself: feed the bag of words into the classifier and see how it understands itself
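
As a short answer to the question of what a bag of words is: a text reduced to its word counts, with word order thrown away. A minimal sketch, where the sample sentence and the function name `bag_of_words` are made up for illustration:

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercased word occurs, ignoring word order."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

bag = bag_of_words("The coach judges the writer; the writer judges back.")
```

Who judges whom is lost in the bag; only the counts remain, and those counts are what a classifier gets to see.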

Correlative 1 / Small Data


critical, having studied probability and models...
criticize machine learning
no big data, make a survey w/ very little data
small data
Trace incertaine, aire incertaine (uncertain trace, uncertain area)
a population of (a small number of) anonymous individuals provided a very specific set of data on the adjective 'correlative'
where to place the adjective 'correlative' between objective and subjective
Graph displays area covered by the adjective correlative
We cannot compare the results obtained through interview and obtained through text mining

If the word covers more than half the area, it means that the term (or its classification?) has no meaning
"it is crazy"
it is us
it is unbelievable!
https://s-media-cache-ak0.pinimg.com/236x/77/f9/24/77f924cc7208df7b1444d0ea36128d3b.jpg
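
The half-the-area rule noted above could be sketched as follows. How the survey answers were actually recorded is not described in these notes, so the 10 x 10 grid, the representation of an answer as a set of cells, and the helpers `block` and `coverage` are all assumptions.

```python
GRID_SIZE = 10  # hypothetical 10 x 10 plane between "objective" and "subjective"

def block(x0, y0, width, height):
    """Cells of a rectangle whose lower-left cell is (x0, y0)."""
    return {(x, y)
            for x in range(x0, x0 + width)
            for y in range(y0, y0 + height)}

def coverage(answers, grid_size=GRID_SIZE):
    """Share of the plane covered by the union of all respondents' cells."""
    covered = set()
    for cells in answers:
        covered |= cells
    return len(covered) / (grid_size * grid_size)

# Two invented respondents who place the adjective in overlapping regions.
answers = [block(0, 0, 5, 6), block(3, 4, 5, 6)]

share = coverage(answers)
has_no_meaning = share > 0.5  # the "more than half the area" rule
```

With only two respondents the union already passes the threshold, which is the kind of result a very small sample can produce.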

Naive Bayes network:
Probability

sunny weather and/or salary raise => happiness

Two independent variables suggest a correlation through the result:
    
    The weather is nice because I got a raise
    I did not get a raise because the weather was nice

independent values / dependent values
if they are independent, you can just multiply their probabilities

Question is always, are variables really independent or not

What is the probability that you are happy, how does it relate to the probability of you being happy because of a raise?
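
The question can be worked through with invented numbers. A minimal sketch, assuming that sun and raise are independent and that you are happy exactly when at least one of them occurs; the probabilities 1/2 and 1/10 are made up.

```python
from fractions import Fraction

# Made-up probabilities, chosen only for illustration.
p_sunny = Fraction(1, 2)
p_raise = Fraction(1, 10)

# Independence: the joint probability is the product.
p_sunny_and_raise = p_sunny * p_raise

# Assumed rule: happy if sunny and/or raise.
p_happy = p_sunny + p_raise - p_sunny_and_raise

# A raise always implies happiness here, so P(raise and happy) = P(raise).
p_raise_given_happy = p_raise / p_happy

# Probability of sun given happiness, with and without knowing about the raise.
p_sunny_given_happy = p_sunny / p_happy
p_sunny_given_happy_and_raise = p_sunny_and_raise / p_raise
```

Under these assumptions you are happy with probability 11/20, and a raise is behind it with probability 2/11. Once happiness is known, the two causes stop looking independent: sun has probability 10/11 given happiness, but falls back to 1/2 as soon as the raise is also known. This is how two independent variables can suggest a correlation through their shared result.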

Correlative 2 / Small Data


started from the idea that representing a word by a point is too deterministic
the scale is one of total uncertainty, a measure of non-sense
at 100% the criteria no longer mean anything
the area of the surface is not the only indication, the shape matters too
the more branched it is, the more it shows the complexity in the meaning of the word
complexity, by showing the possible configurations of meaning
from the total number of rectangles in this shape we would get an idea of the complexity of the word
so it makes less sense to transcribe a word as a point

the deterministic idea of classifying a word by a point does not make any sense, it is a measure of non-sense
at some point criteria do not matter anymore

I realised that not only the surface is important, but also the form

the more it's ramified, the more it shows the complexity of a word
all the possible configurations of sense

we looked at all possible meanings, we're imprecise in defining words

by counting the number of rectangles: the more rectangles, the more complex, and the less possible it is to transcribe a word as a dot like CLIPS does
the more rectangular the surface is at the base, the simpler it would be
it doesn't make sense to transcribe a word by a dot, like CLIPS does

it still makes no sense to me to translate a word by a point, as happens in the pattern library
words are not unidirectional; by analysing text they try to reduce blurriness, defining their results by the 40% that is not blurry... word after word it is reduced
we manage to communicate even if there is no intersection; confusion in relationships all the time
-> big data can also reflect this
is the word the same going from one person to another, are we measuring the same thing?
looks like the question of the background in debate of last night
"the sample of the sample"

Does the uncertainty decrease? Are we measuring the same thing?
do we measure the same thing each time? it seems more complex than a measurement in physics, for instance

Is this the question of the background? What space are we in? What plane?

it's beautiful as an exercise: it shows a certain type of question. shadiness of meanings, words... it shows that human beings are adaptable despite/because we have different meanings for words.
we can live together despite/thanks to this fuzziness (it would be horrible if we all understood the same thing)
bug reports of conversations?

KAFKA// Mining the trial

../share/the_kafka_trial/
sentiment analysis in pattern analyses the adjectives
Analyzing all the adjectives of the trial: values for positive/negative
the presence of the word wrong in a sentence means the sentence will be classified as negative
The trial is then reordered according to its index of positivity/negativity
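
The reordering could be sketched as below. In the pattern library, `sentiment()` from `pattern.en` returns a (polarity, subjectivity) pair; to keep this example runnable without the library, a toy adjective lexicon with made-up values stands in for it, and the sentences are invented rather than quoted from the novel.

```python
import re

# Toy adjective lexicon with made-up polarity values (a stand-in only).
POLARITY = {"wrong": -0.5, "terrible": -1.0, "calm": 0.3, "good": 0.7}

def polarity(sentence):
    """Average polarity of the known adjectives in a sentence (0.0 if none)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = [POLARITY[w] for w in words if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

def reorder(sentences):
    """Sort sentences from most negative to most positive."""
    return sorted(sentences, key=polarity)

sentences = [
    "The morning was calm.",
    "Something was wrong.",
    "He walked to the office.",
    "It was a good and calm evening.",
]

ordered = reorder(sentences)
```

The single word "wrong" is enough to push its sentence to the negative end, whatever the rest of the sentence says.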
Poem:
    Country of the mountains, country of the river
Poem created by navigating through a dictionary - Oulipian s+1
A text-mining inspector!
No irony is permitted, no nuance. Use the irony symbol to signal that irony is in use.

It could be used as a study aid. Text mining inspector, inspects the result of the algorithm.

-> text mining inspector (look at where text mining is used and what results it gives/is based on)

Speech recognition


using sphinx
[[talk2201 speechRec]]
proposes combinations by proximity of words
each sentence produces a series of hypotheses
-> all the doubts of the software are displayed; normally hidden in a black box, now shown

Latent semantic analysis

Confront two models between two texts and see what words connect the two texts
jump between meanings of words


Newspaper style

wanted to look at style but, to make it easier, looked at content.
Sources from various news sites
crawler -> classifier -> webpage
how news items are structured
trained with different newspapers
correlates newspaper and political parties
in the future, use css styles as classifiers