ICWSM 2011 Liveblog: Day One

The 5th Int’l AAAI Conference on Weblogs and Social Media
19-21 July 2011, Barcelona, Spain
Sponsored by the Association for the Advancement of Artificial Intelligence.

The fifth International Conference on Weblogs and Social Media brings together researchers from the disciplines of NLP, Social Psychology, Data Mining, Sociology and Visualization to increase our understanding of social media in all its incarnations. Research that blends social science and technology is especially encouraged.

Vladimir Barash is attending the 2011 ICWSM and reports from the sessions.


These are rough notes from Day One of ICWSM.

Manuel Castells – Keynote. Social Media and Wiki-Revolutions
–social media and social change
–what makes humans human = meaningful communication
—changes in the nature of communication process => changes everywhere in society
—-more specifically, changes in the nature of communication process => changes in power relationships

—(mass-self communication) = social media
—-organized by large corporations but driven by desires of communicators (people)
—-this principle is embedded in the technical structure of the internet
—–entrepreneurial character, hacker culture

–culture of autonomy
–individuation != individualism
—referential point for society in what groups of individuals decide is important for them (as opposed to cultural / societal institutions)

–Castells study of internet usage in Catalonia
—empirical test of Internet as a platform of autonomy
—results: top 20% of population ranking high on autonomy according to several personality-based characteristics use the internet the most
—suggests there is a synergistic interaction between autonomy and use of the Internet(?)

–power and counter-power = dynamics of society

–growth of social media and increasing impact on social change over last ten years:
—Korea, Ukraine, Spain, US (Obama campaign)
—key is interaction of local / grassroots and the internet
—Arab spring
—-Note the incredible power of social movement e.g. Syria where people continue to demonstrate despite (because of?) hundreds of deaths

–Economic conditions NOT a sufficient cause of social change
—the sufficient cause = feeling of powerlessness to change things (including economic conditions)
—this powerlessness is transformed into empowerment via a feeling of togetherness
—always linked to a key event that brings the people together (e.g. Mohamed Bouazizi's self-immolation)
—this event needs to be communicated
—in the case of the Arab spring, communication = social media + mass media (Al-Jazeera)
—-Al-Jazeera worked to integrate news with mobile phones, mass media with social media
—once the event is communicated, the population needs to be organized
—-organizing force = young people. Both because the young are more often agents of social change and because this generation grew up with the transformation of communication
—once organization emerges in social media, it then moves “to the streets” and reaches individuals who don’t have regular access to social media

–Example: Egypt
—key event = Tunisian demonstration.
—First slogan = “Tunisia is the solution” (vs. “Islam is the solution”)
—But also events before-hand, such as 2008 massacre at factory where an anti-government protest took place
—this is key: revolutions do not happen merely because a younger generation wills them, they come out of existing sociopolitical pressures
—-broader point: the critical thing is not the technology itself, but the kind of message the technology makes it possible to communicate: conditions of struggle, humiliation, etc.
—-As a particular example, the Arab spring would not have happened without connections in social media that then moved to urban networks. Previously, revolutionary movements in the area tried strikes, etc. and were unsuccessful
—-these revolutions were not exclusively caused by social media – they were caused by injustice and humiliation. But they were *enabled* by social media

—Egypt also an example of attempt by government to shut off the internet, ultimately unsuccessful.
—-Huge economic losses
—-Couldn’t shut off everything: can’t control small-scale ISPs
—-Workarounds: speak2tweet, distribution of information via phone and fax networks (cooperation of hacker communities, Google, etc.)
—-Most importantly, the movement was already underway
—–A government that wants to control social movements that are enabled via social media needs to cut off social media, on a regular basis

—People say that the Arab spring is unique because these societies are highly repressive, but over 60% of people worldwide feel un-represented by their government

–Example: Indignation movement in Spain
—started with a manifesto saying that the government is essentially out of touch with the people, and the economic situation is highly unfair
—manifesto led to peaceful demonstrations a week before election
—then self-organized debates and discussions in the cities
—integration of the internet and offline communication
—keys: 1. Anti-organization, anti-leadership. 2. Anti-violence. 3. No deliberate short-term goal. “We are slow, because we go far.” (vs. constitution, etc.) Long time to create true democracy.

Session: Location, Emergency, Health
Extracting “Situational Awareness” Tweets during Mass Emergency
–idea: extract clear actionable tweets related to emergency events
–look for “situational awareness”
–procedure:
1. collect data
2. annotate for situational awareness
3. annotate for linguistic features: subjectivity, formal/informal, personal dimension
4. classify for SA, linguistic features
Feature space:
–n-grams
–POS
–linguistic features

Results: classification outperforms naive baseline
linguistic features improve accuracy over unigrams but not necessarily POS
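
A minimal sketch of the classification step, assuming a unigram Naive Bayes classifier with add-one smoothing. The tweets and labels below are invented for illustration, and the classifier and features actually used in the talk (POS tags, subjectivity, register) are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer; yields unigram features only."""
    return text.lower().split()

def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data: "SA" = situational awareness, "other" = everything else.
TRAIN = [
    ("road closed near river bridge flooding", "SA"),
    ("shelter open at high school evacuation", "SA"),
    ("flooding reported downtown road closed", "SA"),
    ("praying for everyone stay safe", "other"),
    ("so scared right now omg", "other"),
    ("thoughts with everyone tonight", "other"),
]
model = train(TRAIN)
print(predict(model, "bridge closed due to flooding"))
```

Linguistic annotations such as subjectivity or formality could be appended to the token list as extra features in the same way.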

You are What You Tweet: Analyzing Twitter for Public Health
-identifying health-related keywords in Twitter

.96 correlation between flu-related tweet rate and CDC flu rate (consistent with prediction)
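
For reference, a correlation like this is typically a Pearson coefficient between two weekly series. A minimal sketch; the numbers below are invented placeholders, not data from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly rates (invented values, for illustration only).
flu_tweet_rate = [0.8, 1.1, 1.9, 2.6, 2.2, 1.4]
cdc_flu_rate = [1.0, 1.3, 2.1, 3.0, 2.4, 1.5]
print(round(pearson(flu_tweet_rate, cdc_flu_rate), 3))
```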

-used location, other tweet properties

-allergy map consistent with a priori assumptions

-self-medication data

-model not amenable to per-user analyses

Foster Provost Keynote: Social Targeting and Privacy
-Brief history of targeting:
–Contextual -> Demographic -> Psychographic -> Social
–Social targeting: choosing consumers for offers based on connections to other, specific consumers. Social network targeting is a special case of social targeting.

–Develop and expand the notion of social targeting
–present social targeting design

2 different goals for (display) advertising
–Direct response (short-term focus)
–Brand advertising (long-term focus)

In both cases we’d like to identify some “brand action” of interest, and induce consumers to take that action at a higher rate
–purchasing a product, downloading trial version, downloading product info, entering a brand contest, joining a loyalty club, visiting the brand’s website, etc.
–beware of using clicks as a brand action!
–EXTREMELY low correlation between clicking and purchasing

Social targeting will use brand actions to identify “seed” consumers and also for forward-looking and/or holdout evaluation of targeting effectiveness

Social network targeting
–cross between viral marketing and traditional marketing
—target “network neighbors” of existing customers
—based on direct communication between consumers
—this could expand “virally” through the network without any word-of-mouth advocacy, or could take advantage of it
—this results in a 3x-5x lift in response rate

Privacy-related backlash from consumers
-are there points that give us acceptable tradeoffs between “privacy” and efficacy?
One option: doubly-anonymized bipartite content-affinity network. Bipartite network between browsers and content visited, anonymize both browsers and content. (KDD 2009)
-mode transform to create content-affinity network among browsers
-target the strong social neighbors (e.g. in ad exchanges)

Are the social neighbors “like” the seed brand actors? Some data showing yes, as would be expected by homophily

-Some brand proximity measures: number of unique content pieces connecting browser to action taker, maximum number of distinct paths, minimum Euclidean distance of normalized content vector to seed node, etc.

-Make each proximity measure a feature in a predictive model, where you’re trying to predict likelihood of taking a brand action
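
A rough sketch of the mode transform and two proximity features, under stated assumptions: the IDs are opaque anonymized strings, `project` and `seed_proximity` are hypothetical names, and the two features are simplified stand-ins for the measures listed above, not the authors' exact definitions.

```python
from collections import defaultdict
from itertools import combinations

def project(visits):
    """One-mode projection of the bipartite browser-content network.

    visits maps browser_id -> set of content_ids. Returns a dict mapping
    each browser pair (a, b), with a < b, to the number of content pieces
    both visited.
    """
    by_content = defaultdict(set)
    for browser, contents in visits.items():
        for content in contents:
            by_content[content].add(browser)
    weights = defaultdict(int)
    for browsers in by_content.values():
        for a, b in combinations(sorted(browsers), 2):
            weights[(a, b)] += 1
    return dict(weights)

def seed_proximity(visits, seeds):
    """Proximity features of every non-seed browser relative to the seeds."""
    seed_content = set()
    for s in seeds:
        seed_content |= visits[s]
    features = {}
    for browser, contents in visits.items():
        if browser in seeds:
            continue
        features[browser] = {
            # unique content pieces shared with any seed
            "shared_content": len(contents & seed_content),
            # largest overlap with a single seed
            "max_seed_overlap": max(len(contents & visits[s]) for s in seeds),
        }
    return features

# Invented toy data: "s" is a seed (a browser that took the brand action).
visits = {
    "b1": {"c1", "c2"},
    "b2": {"c2", "c3"},
    "b3": {"c4"},
    "s": {"c1", "c2", "c3"},
}
print(project(visits))
print(seed_proximity(visits, {"s"}))
```

Each feature dictionary could then be one row of input to whatever predictive model estimates the likelihood of a brand action.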

Data:
~10 million anonymized browsers, all of their observed visits to social media content over 90 days (here: from several of the largest SN sites)
–10^7 browsers, 10^8 social media pages, 15 (mostly) well-known brands

-Turns out anonymized network embeds social network

Fast forward 1.5 years, “in vivo” performance by a marketing company. Many brands, all experienced positive lift from social network targeting, median lift was 5x. (not really comparable to lab results though)

Results improve significantly with more data

Main Points Revisited:
1. Social targeting goes beyond just social-network targeting
–define connections in various ways, such as visiting same social media pages
–find people “close to” a selected set of “seed” consumers
–find people “similar to” the seeds, based on fine-grained action data

2. Social targeting applies to brand advertising and to direct-marketing advertising

3. Surprisingly, perhaps, social targeting can be quite privacy friendly (via predictive modeling over doubly anonymized data)

What Stops Social Epidemics
-Empirical dataset = digg
-most cascades fall within the critical region for transmissibility, which is a tiny band of values – why?
-mean field model does not account for graph structure (only degree distribution), and predicts a strange value for transmissibility
-adding clustering does not significantly improve prediction

Look at cascade model: relax assumption that infections are independent. what about repeat exposures?
-Independent model shows a strong relationship between multiple exposures and P(adoption)
-Actual relationship between multiple exposures and P(adoption) is much less strong
-changing assumption of infection to cap effect of multiple exposures at 1 creates simulations of cascade size that reflect empirical data (digg)
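
A small sketch contrasting the two assumptions. It reflects one reading of the notes, namely that under the capped model only the first exposure gives a user a chance to adopt. The graph, the function names and the transmissibility value `lam` are illustrative assumptions, not taken from the paper.

```python
import random

def p_adopt_independent(k, lam):
    """P(adoption) after k exposures if every exposure is an independent trial."""
    return 1.0 - (1.0 - lam) ** k

def p_adopt_capped(k, lam):
    """P(adoption) after k exposures if repeat exposures add nothing."""
    return lam if k >= 1 else 0.0

def simulate(followers, seed, lam, capped, rng):
    """Run one cascade from `seed` and return its final size.

    followers maps a node to the nodes that see its activity.
    """
    adopted = {seed}
    exposed = set()
    frontier = [seed]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in followers.get(u, ()):
                if v in adopted:
                    continue
                if capped and v in exposed:
                    continue  # repeat exposure has no further effect
                exposed.add(v)
                if rng.random() < lam:
                    adopted.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(adopted)

# Invented toy graph: node 3 can be exposed by both node 1 and node 2.
followers = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
rng = random.Random(42)
runs = 2000
for capped in (False, True):
    mean_size = sum(simulate(followers, 0, 0.3, capped, rng) for _ in range(runs)) / runs
    print("capped" if capped else "independent", round(mean_size, 2))
```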

-observation: number of new fans falls off exponentially over time
–cascades peter out because of lack of exposure to fresh faces!
–alternative explanation to novelty turnover models

-mechanisms of propagation can affect behavioral properties such as effect of repeated exposure
–E.g. on Digg additional recommendations of a story don’t bring it up to the top of a page, whereas on Twitter RTs (and all tweets) always go directly to the top

data available: http://www.isi.edu/~lerman/downloads/digg2009.html

Differential Adaptive Diffusion: Understanding Diversity and Learning Whom to Trust in Viral Marketing
–Flickr is similar to Twitter here, as you get “latest from your friends” updates

Viral marketing: testing assumptions of independence
–Selective recommendations lead to trust imbalances between peers

Assumptions:
-Peer influence is dynamic (dependent on previous user interactions)
-Viral marketing strategies have an implicit effect on the underlying social network (sometimes changing the structure of the underlying network altogether)
-Peer influence dependent on product being spread

In existing models, influence probabilities assumed to be static, insensitive to product type, known in advance -> assumptions need to be challenged!

Case study: Digg
Following links define social network
–12K users
–1.3M follow edges
–50K news stories
–2M digg edges

-Observation 1a: most users digg stories that they are interested in / match their submission profile
-Observation 1b: a few users digg stories that are different from submissions
-Observation 1c: three kinds of digg users:
–focused on a few topics
–biased towards a few topics
–balanced (least populous group)

-Observation 2: Effect of Homophily on Adoption
–Peers with similar topic preferences gain confidence in each other’s recommendations over time
–Peers with dissimilar topic preferences lose confidence in each other’s recommendations over time

These observations are clearly at odds with assumptions of current diffusion models

Differential adaptive diffusion:
-Influence probability between u,v for category c:
p(u,v) = w_i(v,u) x F(v,c)
–confidence of v in u at campaign i
–focus of v on category c
–Use Linear Kernel functions
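
A hedged sketch of how the formula might be computed. The definition of focus as a simple fraction of past adoptions and the additive, clipped confidence update are assumptions made for illustration; the notes do not record the exact kernel functions used.

```python
def focus(history, category):
    """F(v,c): share of v's past adoptions that fall in category c (assumed form)."""
    if not history:
        return 0.0
    return history.count(category) / len(history)

def influence_probability(confidence, history, category):
    """p(u,v) = w_i(v,u) x F(v,c), with confidence = w_i(v,u)."""
    return confidence * focus(history, category)

def update_confidence(confidence, accepted, step=0.1):
    """Assumed linear update between campaigns, clipped to [0, 1]."""
    new_value = confidence + step if accepted else confidence - step
    return min(1.0, max(0.0, new_value))

# Invented example: v mostly adopts "tech" stories and trusts u at 0.8.
history_v = ["tech", "tech", "sports", "tech"]
w = 0.8
print(influence_probability(w, history_v, "tech"))
print(influence_probability(w, history_v, "sports"))
w = update_confidence(w, accepted=False)  # v ignored u's recommendation
print(influence_probability(w, history_v, "tech"))
```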

Experimental evaluation:
–predict future adoptions (split dataset into halves temporally)

Baselines:
-Bernoulli: each product rec = Bernoulli Trial
-Influence probabilities using MLE over given contagion time for each user

-Bernoulli-PC:
-as Bernoulli, but each peer gets partial credit for making recommendation
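
One plausible reading of the two baselines, sketched as maximum-likelihood ratios of credit to trials. The event format is hypothetical, and the handling of contagion time mentioned above is left out.

```python
from collections import defaultdict

def learn_influence(events, partial_credit=False):
    """Estimate p(u -> v) as credit / trials.

    events: list of (target, recommenders, adopted) tuples, one per item
    recommended to `target` by the listed peers. With partial_credit=True
    a successful adoption is split equally among the recommenders.
    """
    trials = defaultdict(int)
    credit = defaultdict(float)
    for target, recommenders, adopted in events:
        for u in recommenders:
            trials[(u, target)] += 1
            if adopted:
                share = 1.0 / len(recommenders) if partial_credit else 1.0
                credit[(u, target)] += share
    return {pair: credit[pair] / n for pair, n in trials.items()}

# Invented toy log: peers "a" and "b" recommend stories to "v".
events = [
    ("v", ["a", "b"], True),
    ("v", ["a"], False),
    ("v", ["b"], True),
]
print(learn_influence(events))
print(learn_influence(events, partial_credit=True))
```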

Implications for viral marketing:
-Selective behavior: users will listen more to recommendations that are successful

-Introduce reward for accepted recommendations / penalty for rejected ones
–Use ABM to test
—results: an intermediate level of reward leads to a consistently high adoption rate & confidence over a set of marketing campaigns

-Tested for presence of spammers and found ABM adapts to presence of spammers and lowers their confidence

Participation Maximization Based on Social Influence in Online Discussion Forums
-Look at influence on user posting
-create edges u->v (interpersonal influence) and u->t (thread influence ~ topic interest)

Problem Statement:
-allocate exactly B threads to each user (e.g. sidebar)
-user will look at B threads with higher probability due to prominence
-want to optimize total number of participants

This can be interpreted as a max social welfare problem
-consider graph of users u -> threads t, want to max the number of users u who participate across all threads t

-existing approaches offer good approximation (1-1/e assuming submodularity which can be shown for this problem) but either work only for new threads or require (mn)^7 computations

-new approach
additional influence that is brought by displaying thread t in user v’s sidebar = p(v is activated) * (number of users activated)
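
A simplified heuristic in the spirit of that gain formula. It is not the algorithm from the paper: the gain here is the probability that v posts in t, times one plus the expected number of v's followers who would then post in t, and each user is considered separately. All names and numbers are hypothetical.

```python
def gain(v, t, p_post, influence):
    """Estimated participation gain from showing thread t to user v."""
    followers = influence.get(v, {})
    expected_followers = sum(
        weight * p_post.get(u, {}).get(t, 0.0) for u, weight in followers.items()
    )
    return p_post[v].get(t, 0.0) * (1.0 + expected_followers)

def allocate(p_post, influence, budget):
    """Give each user the `budget` threads with the highest estimated gain."""
    allocation = {}
    for v, thread_probs in p_post.items():
        # Sort by descending gain; break ties by thread id for determinism.
        ranked = sorted(thread_probs, key=lambda t: (-gain(v, t, p_post, influence), t))
        allocation[v] = ranked[:budget]
    return allocation

# Invented toy data: p_post[v][t] = P(v posts in t if shown);
# influence[v][u] = strength with which u follows v into threads.
p_post = {
    "v": {"t1": 0.5, "t2": 0.4, "t3": 0.1},
    "u": {"t1": 0.0, "t2": 0.9, "t3": 0.2},
}
influence = {"v": {"u": 1.0}}
print(allocate(p_post, influence, budget=2))
```

Because the users are treated independently, this ignores the overlap between allocations that the submodular formulation accounts for.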

Data: TripAdvisor
-Implicit Influence network
–keep edge u->v iff v follows u to post in at least 2 threads
–EM-algorithm to learn edge weight w_{u,v}
-Visit probabilities
–Note Random, TEABIF, TABI (algorithms used) don’t rely on visit probabilities
–delta_r, drawn from empirical distribution (thread rank)

-Show that TABI performs consistently better under different numbers of threads (30, 40, 50), different topics, different original visit probabilities, allocating in different time slots

Panel: Is Social Media making Media Consumption and Production Better?
Krishna Gummadi – need to understand social media-based news dissemination better

1. Who is spreading news to whom? Are grassroots really active?
2. Why are users spreading news? What incentives drive their behavior? How do media sites select stories to propagate, what drives social marketers to propagate news?
3. How is users’ exposure to news changing?

Marcus Mabry

Catherine Quayle: What about news quality? Great leveling of information brings some good things, but more bad things. New model = user receives incoming information from other sources, incl. other users. But now user has no context for importance / relevance / veracity

Catherine Quayle: is curating more information better?

Mor Naaman – News for the People?
Zuckerberg quote (squirrel dying in front of your house may be more important than people starving in Africa)

Does social media help news generation & consumption in a way that is an improvement to society?

“Real news – reporting done for citizens vs. consumers – is a public good” –Clay Shirky

Need an intermediate between the recluse typing on his typewriter and the bystander taking video on her phone

Tools for journalism (detecting and covering breaking events, filtering, validating information, finding and evaluating sources, help dissemination and discussion)
-existing tools (Rutgers, MIT)
-*Physical World* Events (#in10years vs. #jan25 vs. #rain)

eyewitness id and verify

“we shape our tools, and thereafter our tools shape us” –Marshall McLuhan
“Everyone can speak, but no one can be heard” -Yochai Benkler

Gilad Lotan

Attention is the Bottleneck

What is produced vs. what is consumed? (click-cloud viz)

Keith Urbahn tweet that triggered OBL (Osama bin Laden) death tweet cascade
-gaining your network’s trust

live tweeting of the OBL operation
–Pakistani twitter user linked to by Forbes blogger who was searching for information about raid after Obama announcement

–what about humans translating comments on forums?

–is it more important for producers to know what to put out vs. consumers to know what to write?

–what about Wikipedia and news event coverage?
—Is the success of WP at covering the news pointing to a failure of MSM at covering it? Vis-a-vis obsession with latest news?

-NYT teaches online course about how to be a “news gatherer for your community”

-what is the ideal business model for news? IS there an ideal business model for news?