ICWSM 2010 Liveblog, Day 2

Fourth International AAAI Conference on Weblogs and Social Media (ICWSM-10)

***Microblogging 2***

Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment (Tumasjan et al.)

Successful use of social media in the last presidential campaign has established Twitter as an integral part of the political campaign toolbox

Goal: analyze three aspects of Twitter use: 1. Deliberation, 2. Sentiment, 3. Prediction

Previous work:

Deliberation: Honeycutt and Herring – Twitter is not only used for one-way communication; 31% of all tweets are directed at a specific addressee. Koop and Jansen – political internet discussion boards are dominated by a small number of heavy users

Sentiment: How accurately can Twitter inform us about the electorate’s political sentiment?

Prediction: can Twitter serve as a predictor of the election result?

Data: examined more than 100k tweets and extracted their sentiment using LIWC

Target: German federal election 2009


1. While Twitter is used as a forum for political deliberation on substantive issues, this forum is dominated by heavy users

Two widely accepted indicators of blog-based deliberation:

-The exchange of substantive issues (31% of all messages contain “@”),

-Equality of participation: While the distribution of users across groups is almost identical to the one found on internet message boards, we find even less equality of participation in the political debate on Twitter. Additional analyses have shown users to exhibit a party bias in the volume and sentiment of messages.

2. The online sentiment in tweets reflects nuanced offline differences between the politicians in our sample.

LIWC profiles:

-Leading candidates: Very similar profiles for all leading candidates; only polarizing political characters, such as the liberal leader and the socialist, deviate, in line with their roles as opposition leaders. Messages mentioning Steinmeier (coalition leader) are the most tentative

3. Similarity of profiles is a plausible reflection of the political proximity between the parties

Key findings: high convergence of leading candidates, more divergence among politicians of the governing grand coalition than among those of a potential right-wing coalition

4. Activity on Twitter prior to election seems to validly reflect the election outcome (MAE 1.65%), and joint party mentions accurately reflect the political ties between parties.
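The MAE figure above is a mean absolute error between each party's share of Twitter mentions and its share of the vote. Here is a minimal sketch of that arithmetic. The party labels, mention counts, and vote shares are made up for illustration and are not the numbers from the talk, and the helper names `shares` and `mean_absolute_error` are my own.

```python
def shares(counts):
    """Convert raw mention counts into percentage shares."""
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, across parties."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical data, for illustration only (not from the paper).
mention_counts = {"A": 300, "B": 250, "C": 200, "D": 150, "E": 100}
vote_share = {"A": 32.0, "B": 24.0, "C": 19.0, "D": 14.0, "E": 11.0}

predicted = shares(mention_counts)
mae = mean_absolute_error(predicted, vote_share)
print(f"MAE: {mae:.2f} percentage points")
```

A lower MAE means the mention shares track the vote shares more closely; a pollster's final forecast can be scored with the same function for comparison.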

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series (Brendan O’Connor)

RWTH Aachen – Browse ACM conference networks over the web

There are hundreds of conferences sponsored by the ACM on almost every topic related to computing.  In some cases the same person will publish a paper in more than one conference, creating a tie between them.  Below is a network map application that displays a collection of ACM conferences connected by this authorship tie: http://bosch.informatik.rwth-aachen.de:5080/AERCS/Networks.jsp

The application is a project created by Manh Cuong Pham, a graduate student at RWTH Aachen University, Dept. of Databases and Information Systems, working with Prof. Ralf Klamma.

2009 - December - RWTH Aachen - AERCS Screenshot

This image displays the isolated component that is composed of the “social” conferences in the ACM schedule: CHI, CSCW, DIS, UIST, GROUP, ECSCW, and Interact.  The overview illustrates the macro structure of the graph, with the prominent giant cluster of core computer science topics like algorithms, machine learning, and logic.  The rows below this cluster are populated by an archipelago of conferences, a few composed of ten to twenty conferences, but most made up of two to five conferences.  These are the more marginal topics in the ACM world, in contrast to the conferences at the cores of the giant component.
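To make the "giant cluster plus archipelago" structure concrete, here is a rough sketch of how such a map could be derived: link two conferences whenever one author has published in both, then split the graph into connected components. This is not the AERCS implementation, which I have not seen. The author names, the made-up venues ConfX, ConfY, and ConfZ, and the author-to-conference assignments are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_conference_graph(author_venues):
    """Link two conferences when at least one author published in both."""
    graph = defaultdict(set)
    for venues in author_venues.values():
        for venue in venues:
            graph[venue]  # touch the key so isolated conferences become nodes
        for a, b in combinations(sorted(venues), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the components of the graph, largest first."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical authorship records, for illustration only.
author_venues = {
    "author_1": {"CHI", "CSCW"},
    "author_2": {"CSCW", "GROUP"},
    "author_3": {"ConfX", "ConfY"},
    "author_4": {"ConfZ"},
}

components = connected_components(build_conference_graph(author_venues))
for component in components:
    print(sorted(component))
```

On real data the first component would be the giant cluster and the small ones would be the islands in the rows below it.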

It would be nice to see the application add network display attributes such as size, color, shape, and edge thickness to indicate conference attributes such as papers published, citations, attendees, and sponsors. It is a nice example of the insights network visualizations can bring to a data set and the value of an interactive interface (and a web interface at that!) for investigating complex graphs.

Call for Papers – ICWSM 2010 – Washington, D.C. May 23-26

Here is the Call for Papers for the

Fourth International AAAI Conference on Weblogs and Social Media (ICWSM-10)
May 23-26, 2010
George Washington University, Washington, DC

Sponsored by the Association for the Advancement of Artificial Intelligence

Tutorial Proposals: December 1, 2009
Paper Submission: January 8, 2010
Poster/Demo Submission: January 8, 2010

Paper Acceptance: March 3, 2010
Poster/Demo Acceptance: March 3, 2010
Workshop Submission: March 1, 2010
Camera Ready Copies: March 12, 2010

Featuring keynotes by:
Professor Bob Kraut, CMU,
on “Designing Online Communities from Theory”

Professor Michael Kearns, Computer and Information Science,
Univ. of Pennsylvania,
on “Behavioral Experiments in Strategic Networks”

Speakers in Special Sessions:
– Nicole Ellison, Dept. of Telecommunication,
Information Studies and Media, Michigan State Univ.
– James Pennebaker, Dept. of Psychology, Univ. of Texas, Austin
– S. Craig Watkins, Dept. of Radio, TV and Film, Univ. of Texas, Austin
– Don Burke, CIA Directorate of Science and Technology, Intellipedia
– Haym Hirsh, National Science Foundation IIS Division Director
– Macon Phillips, U.S. White House, Head of New Media

Tutorial Speakers will include:
– Jake Hofman, Yahoo! Research,
“Large-scale social media analytics with Hadoop”

– Cindy Chung and James Pennebaker, Univ. Texas,
“Using LIWC to uncover social psychology in social media”

Liveblogging ICWSM 2009 – Day 1

2009 ICWSM in San Jose

[Vladimir Barash is liveblogging the ICWSM conference]
9-10AM: A Tempest: Or, on the Flood of Interest in Sentiment Analysis, Opinion Mining, and the Computational Treatment of Subjective Language (Lillian Lee)

-Sentiment analysis using discussion structure: classify speeches in US congressional floor debates as supporting or opposing proposed legislation
-Individual doc classifier
-Agreement (degree) classifier for pairs of speeches

-Agreement info allows COLLECTIVE CLASSIFICATION – “agreeing speeches should get the same label”

-ECON: debate about effect of sentiment on sales
-comScore (users willing to pay 20-99% more for 5 star item vs. 4 star item)
-Jury is still out

-SOC: What opinions are influential? (Danescu-Niculescu-Mizil et al.)
-Prior work has focused on features of text and has not been in context of sociological aspects of reviews
-look at helpfulness scores

Princeton – Studying Society In A Digital World – Conference Slides and photos

Princeton Center for Information Technology Policy

I attended the “Studying Society In A Digital World” conference at the Center for Information Technology Policy at Princeton University. They just posted most of the conference slides. I took some pictures and have inserted them next to the link for slides where I had a picture (or a good one!). The conference was very useful and informative: there is a great trend towards sensor-driven data sets that, in aggregate, illuminate large complex systems in detailed and surprising ways.

Talks from SenseNetworks and MIT made the vision of a continuous “trail” document assembled by location and biological sensors from every human on earth seem not so outlandish. Samuel Madden from MIT spoke about opportunistic mobile wifi connectivity in moving vehicles.  MIT rebuilt the WiFi stack to enable 13ms associations instead of 13 second associations with an access point.  The result is that a car with such a WiFi card can drive along Boston city streets and exchange about 200KB a minute with open unsecured access points along the way.  Free bandwidth in the city.  What do they do with it?  They stream live telemetry of a fleet of cabs.  The cabs have accelerometers on them and GPS which is reported in almost real time back to a server.  Along with the engine computer’s data, they collect a ton of data about traffic and road surface quality.  They can see changing patterns in the activity levels of the cabs and infer changing activity at businesses.

A major theme of several presentations was crowdsourcing for science, with talks about ebird.org and galaxyzoo highlighting a distinction between sites that enable a group to collect data (ebird) – with the associated issues of data validity – and those sites that enable a group to annotate data (galaxyzoo) that has already been expertly collected.

Lada Adamic: Social Influence and the Diffusion of User-Generated Content

Chris Barrett: Co-evolution of Sociotechnical Networks and Individual Behavior

Princeton: Studying Society in a Digital World

Kathleen M. Carley: Information & Belief Diffusion Through Social Networks: Empirically Grounded Simulation

Damon Centola: New Theory and Experiments on Diffusion in Social Networks

Pablo Chavez: The Current Policy Debates Over Online Information Practices: Implications for Research in the Digital Age

Nosh Contractor: Digital Traces: An Exploratorium for Understanding & Enabling Social Networks

Nathan Eagle, Michael Macy: Scaling of Sociodynamics

Scott Golder at Princeton

Scott Golder: Temporal Rhythms in Electronic Society: Examples from Facebook and Elsewhere

Eric Horvitz: Through the Lens of a Large Instant-Messaging Network: Planetary-Scale Views on Behavior

Tony Jebara at Princeton

Tony Jebara: Learning Networks of Places and People from Location Data

Steve Kelling: eBird: The Long Tail of Community Engagement in the Scientific Process

Jon Kleinberg at Princeton

Jon Kleinberg: Spatial Signatures of On-line Behavior

Robert Kraut: Theory-Based Design of On-Line Worlds

W. Russell Neuman: Social Science and Policy Praxis

Jukka-Pekka Onnela: Using Cell Phones to Study the Large-Scale Structure of Social Networks

Paul Resnick: Understanding Opinion Diversity Preferences Through Field Experiments

Matthew Salganik: Community-Generated and Community-Sorted Information

In his presentation Matt made the remarkable connection between deliberative democracy and the cat comparison site Kitten Wars. His talk introduced a model for a kind of Am I Hot or Not for political discussions. His group built a web site that helped the student community at Princeton set its priorities for student government. The work has significant implications for deliberation tools for organizations and enterprises. Unlike systems that simply encourage users to contribute ideas to a potentially long and never acted upon list, this system forces a comparison task that can be performed in one click but demands implicit contrasts and estimation of value. The almost addictive “hot or not” style interface (or, more accurately, kittenwars) lets users decide between, for example, longer hours for the student cafeteria or expanded video rental services. They are then shown their choice in the context of others’ choices and given the opportunity to choose between two things again. After a population has run through a set of pair-wise contrasts, a broader sense of the priorities of the community can be calculated.
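As a rough illustration of how community priorities could be calculated from pair-wise contrasts, here is a minimal sketch that ranks items by the fraction of comparisons they win. The talk did not spell out the estimator the Princeton site uses, so the win-rate scoring, the function name `rank_by_win_rate`, the "more bike racks" option, and the votes themselves are all my assumptions.

```python
from collections import Counter


def rank_by_win_rate(votes):
    """Rank items by the fraction of their pairwise contests they won.

    `votes` is a list of (winner, loser) tuples, one per click.
    """
    wins, appearances = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    rates = {item: wins[item] / appearances[item] for item in appearances}
    # Highest win rate first; ties broken alphabetically for stable output.
    return sorted(rates.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical clicks, for illustration only.
votes = [
    ("longer cafeteria hours", "expanded video rentals"),
    ("longer cafeteria hours", "more bike racks"),
    ("more bike racks", "expanded video rentals"),
    ("longer cafeteria hours", "expanded video rentals"),
]

ranking = rank_by_win_rate(votes)
for item, rate in ranking:
    print(f"{rate:.2f}  {item}")
```

Raw win rates are sensitive to which pairs happen to be shown, so a real deployment would likely want a statistical model of item strength; the sketch only shows that one-click contrasts are enough to produce an ordering.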

In my talk, I focused on the idea that information wants to be neither free nor expensive; rather, information wants to be copied. Like DNA, the goal of any string of bits is to make a duplicate of itself. Several technical realities mean that while information may exist on a spectrum from private to public, it only moves in one direction (public) and almost never back. Once made public on the Internet, even if only for a moment, a photo, document, or other digital object is almost certain to have been copied, indexed, backed up, or replicated. Any effort to delete a digital object once it is widely distributed is like trying to take wine out of water. This is because all cryptography becomes brittle over time, most bits end up exposed after they get distributed, and more events trigger widespread distribution of bits than expected (for example, linking a photo and a location to a tweet that gets copied to LinkedIn and Facebook, then appears in an RSS feed and is copied from there to FriendFeed). As it travels, information loses more of the access controls that initially made it relatively private until it is effectively public.

Marc Smith talks about information at Princeton CITP "Studying Society in a Digital World" Conference - Photo Credit: Scott Golder

Marc Smith: Autobiography, Mobile Social Life-Logging and the Transition from Ephemeral to Archival Society

Joshua Tauberer: Watching the Watchers: Government Oversight with Civic Hacking

Marshall Van Alstyne: Information, Social Networks and Productivity

Sadly, no picture for Luis von Ahn’s talk; however, this presentation was a fascinating review of the CAPTCHA and reCAPTCHA services and the new direction of providing translation services as language learning games. Luis von Ahn invented CAPTCHA, felt bad about the cumulative human time wasted by filling out those squiggle word puzzles to get on a web site, and decided to harness CAPTCHAs to a useful task: text recognition for books. To transcribe words from bad scans of books that the OCR software fails to recognize correctly, the garbled data is presented to humans, who, collectively, have transcribed millions of previously unintelligible words. Now, his new project is to expand the small user population of bilingual or multilingual speakers who can translate between languages. The approach applies the “Mechanical Turk” “human intelligence task” concept to language translation. His language translation service presents foreign language sentences to users with all dictionary words from a simple translation listed below. Users click on the best word selection beneath each foreign word. The surprising results: pretty good translations AND users start learning a foreign language!

Luis von Ahn: Human Computation