Liveblogging ICWSM 2009 – Day 2

ICWSM 2009 in San Jose

[Vladimir Barash is liveblogging the ICWSM conference]

10.30am A categorical model for discovering latent structure in social annotations (Said Kashoob)
Given a collection of web objects, users and tags, can we model the underlying tag generation process?

-Discover implicit communities of interest?

-Categories of related tags?

-For a given category, identify the most relevant objects

-Compare categories

Initial thoughts: content-based topic modeling (Latent Dirichlet Allocation, LSA). Recent work applying LDA models to tags (Wu 2006, Zhou 2008)
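These notes don't describe Kashoob's categorical model itself. For orientation, here is a minimal collapsed Gibbs sampler for plain LDA in which each web object's tag list plays the role of a document, roughly the baseline the "initial thoughts" refer to. The function name, the hyperparameters `alpha` and `beta`, the topic count, and the toy tag lists are assumptions for illustration; this is not the speaker's model.

```python
import random
from collections import defaultdict


def lda_gibbs(tag_docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA over tag lists.

    tag_docs: list of tag lists, one per web object (each treated as a "document").
    Returns (topic_tags, doc_topics):
      topic_tags[k][w] = occurrences of tag w currently assigned to topic k
      doc_topics[d][k] = tags in object d currently assigned to topic k
    """
    rng = random.Random(seed)
    vocab_size = len({t for doc in tag_docs for t in doc})
    doc_topics = [[0] * num_topics for _ in tag_docs]
    topic_tags = [defaultdict(int) for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start from a random topic for every tag occurrence.
    for d, doc in enumerate(tag_docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topics[d][k] += 1
            topic_tags[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(tag_docs):
            for i, w in enumerate(doc):
                # Take this occurrence out of all counts before resampling it.
                k = assignments[d][i]
                doc_topics[d][k] -= 1
                topic_tags[k][w] -= 1
                topic_totals[k] -= 1
                # Unnormalised full conditional p(z = j | all other assignments).
                weights = [
                    (doc_topics[d][j] + alpha)
                    * (topic_tags[j][w] + beta)
                    / (topic_totals[j] + vocab_size * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topics[d][k] += 1
                topic_tags[k][w] += 1
                topic_totals[k] += 1
    return topic_tags, doc_topics


if __name__ == "__main__":
    # Hypothetical tag lists for four web objects.
    docs = [["python", "code", "java"], ["python", "code"],
            ["wedding", "marriage", "love"], ["marriage", "love"]]
    topic_tags, doc_topics = lda_gibbs(docs, num_topics=2, iters=100)
    for k, counts in enumerate(topic_tags):
        top = sorted(counts, key=counts.get, reverse=True)[:3]
        print(f"topic {k}: {top}")
```

Sampling is stochastic, so the topics found depend on the seed and iteration count; on real data you would average over several samples after a burn-in period.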


Liveblogging ICWSM 2009 – Day 1


[Vladimir Barash is liveblogging the ICWSM conference]
9-10AM: A Tempest: Or, on the Flood of Interest in Sentiment Analysis, Opinion Mining, and the Computational Treatment of Subjective Language (Lillian Lee)

-Sentiment analysis using discussion structure: classify speeches in US congressional floor debates as supporting or opposing proposed legislation
-Individual document classifier
-Agreement (degree) classifier for pairs of speeches

-Agreement info allows COLLECTIVE CLASSIFICATION – “agreeing speeches should get the same label”
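The notes don't say how the two classifiers are combined. One standard way to enforce "agreeing speeches should get the same label" is a minimum s-t cut: each speech's individual scores become edges to a "support" source and an "oppose" sink, agreement scores become edges between speeches, and the cheapest cut trades individual preferences against agreement. A minimal sketch using Edmonds-Karp max flow; the function name, the score dictionaries, and the toy numbers are hypothetical.

```python
from collections import defaultdict, deque


def collective_labels(support, oppose, agreement, eps=1e-12):
    """Label speeches 'support'/'oppose' with a minimum s-t cut.

    support[x], oppose[x]: non-negative individual-classifier scores for speech x.
    agreement[(x, y)]: non-negative score that x and y agree (same label).
    The cut minimises: individual scores of the label NOT chosen
    + agreement scores between speeches that receive different labels.
    """
    source, sink = object(), object()   # sentinels that cannot clash with speech ids
    cap = defaultdict(lambda: defaultdict(float))
    for x in support:
        cap[source][x] += support[x]   # paid if x ends up labelled 'oppose'
        cap[x][sink] += oppose[x]      # paid if x ends up labelled 'support'
    for (x, y), w in agreement.items():
        cap[x][y] += w                 # paid if x and y are split
        cap[y][x] += w

    def reachable_from_source():
        """BFS over edges with remaining capacity; returns a parent map."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow          # residual (reverse) capacity

    # Speeches still reachable from the source sit on the 'support' side of the cut.
    side = reachable_from_source()
    return {x: "support" if x in side else "oppose" for x in support}
```

For example, with `support = {"A": 0.9, "B": 0.4, "C": 0.1}` and `oppose = {"A": 0.1, "B": 0.6, "C": 0.9}`, speech B leans "oppose" on its own; adding an agreement link `{("A", "B"): 0.5}` to the strongly supportive speech A pulls B to "support", because splitting A and B would cost more than overriding B's weak individual preference.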

-ECON: debate about effect of sentiment on sales
-comScore (users willing to pay 20-99% more for a 5-star item vs. a 4-star item)
-Jury is still out

-SOC: What opinions are influential? (Danescu-Niculescu-Mizil et al.)
-Prior work has focused on features of the text and has not considered the sociological context of reviews
-look at helpfulness scores


PAPER: ICWSM 2009 – Distinguishing Knowledge vs Social Capital in Social Media with Roles and Context


Our poster paper (Vladimir D. Barash, Marc Smith, Lise Getoor, Howard T. Welser), Distinguishing Knowledge vs Social Capital in Social Media with Roles and Context, has been accepted and published at the 2009 International Conference on Weblogs and Social Media (ICWSM), which will be held in San Jose, California, May 17–20, 2009.

Social media communities (e.g. Wikipedia, Flickr, Live Q&A) give rise to distinct types of content, foremost among which are relational content (discussion, chat) and factual content (answering questions, problem-solving). Both users and researchers are increasingly interested in developing strategies that can rapidly distinguish these types of content. While many text-based and structural strategies are possible, we extend two bodies of research that show how social context and the social roles of answerers can predict content type. We test our framework on a dataset of manually labeled contributions to Microsoft’s Live Q&A and find that it reliably extracts factual and relational messages from the data.

Full text (PDF): 2009 ICWSM Distinguishing knowledge versus social capital

Best paper at HICSS-42! A Conceptual and Operational Definition of “Social Role” in Online Community

A shout out to my co-authors Eric Gleave, Howard (“Ted”) Welser, and Tom Lento: our paper “A Conceptual and Operational Definition of ‘Social Role’ in Online Community” got the best paper award at HICSS-42! The Hawaii International Conference on System Sciences has featured a great series of minitracks over the years. The Persistent Conversations minitrack has presented strong work on threaded conversations, blogs, chats, wikis, and social media for more than a decade. This year our paper appeared in the Digital Media: Content and Communication track.


The award came with a very nice letter that puts it in some context:


Ten papers out of 515 at the conference were selected for Best Paper Awards.  Many thanks to track organizers Karrie Karahalios and Fernanda Viegas.

A previous paper in 2006 also won a best paper award: You Are Who You Talk To: Detecting Roles in Usenet Newsgroups, by Danyel Fisher, Marc Smith, and Howard T. Welser.

Two years before that, in 2004, Fernanda Viégas and I also published a HICSS paper that won a best paper award: Newsgroup Crowds and AuthorLines: Visualizing the Activity of Individuals in Conversational Cyberspaces, by Fernanda B. Viégas and Marc Smith.

Tom Erickson maintains a great listing of many years of HICSS papers.

Here is Tom Lento receiving the award at the conference in Hawaii earlier this month:


More Best Papers from this year…


A great paper and network structure visualization of social roles in Yahoo Answers

I love this paper from Lada A. Adamic, Jun Zhang, Eytan Bakshy and Mark Ackerman at WWW2008:

Knowledge sharing and Yahoo Answers: Everyone knows something

In particular, this image (figure 4) is a great use of an innovative way of handling large network graphs: chop them into a matrix of ego-net thumbnails.

Adamic et al. WWW 2008 Yahoo Answers Roles and Tag Ecologies

This is a neat way to sidestep the “blob” problem of many directed graph visualizations: too many nodes and too many links make the image impossible to understand.

Each grid of images represents a collection of authors who contribute to questions with the same tag, in this case the programming, wrestling, and marriage tags. Each grid is a collection of ego-centric network diagrams: each author is displayed with their “1.5 degree” connections, that is, their links to friends and their friends’ links to one another.
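The “1.5 degree” network is easy to compute from an edge list: take the author, their direct neighbours, and only those links whose two ends both fall inside that set. A minimal sketch, assuming an undirected reply graph given as pairs of author names (the figure itself comes from directed data, and the function names here are hypothetical):

```python
from collections import defaultdict


def build_undirected(edges):
    """Adjacency sets for an undirected graph; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def egonet(adj, ego):
    """Return (members, edges) of ego's '1.5 degree' network: ego, its
    neighbours, ego's links to them, and links among those neighbours,
    but no links to anyone further out."""
    members = {ego} | adj[ego]
    edges = set()
    for u in members:
        for v in adj[u]:
            if v in members:
                edges.add(tuple(sorted((u, v))))
    return members, edges
```

With the edges `ann-bob`, `ann-cat`, `bob-cat`, `cat-dan`, `dan-eve`, the egonet of `ann` contains `ann`, `bob`, and `cat` plus the three links among them; `dan` is excluded because he is only a friend of a friend.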

Displaying a collection of authors, and contrasting multiple collections, makes several interesting observations possible. First, not everyone who contributes to a tag is the same: a few highly active people make significant contributions, while most people are lightly connected and make modest contributions. Second, not everyone who contributes heavily does so in the same way. In the upper left-hand corner of each grid is the “top” person in that sample of the tag population. Each is highly active, but each creates a different pattern of connection through that activity. In “Programming”, the top person is a classic “answer person”: high out-degree, low in-degree, connected to isolates, and consequently a low clustering coefficient. The top contributor in the “Marriage” tag is different, however: most of the people they connect to are connected to the other people they connect to; their “friends are friends”. Their clustering coefficient is comparatively high in contrast with the top contributor to the “Programming” tag. The top contributor to questions with the “Wrestling” tag is a hybrid: the author maintains a cluster of highly interconnected repeat discussion partners while replying to a population of question askers like a classic “answer person”.
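The three measures used above to contrast the top contributors (out-degree, in-degree, and clustering coefficient) can be computed directly from a reply network. A minimal sketch, assuming a hypothetical input format in which each pair `(u, v)` means u replied to v; clustering is computed on the undirected version of the graph, and degrees count distinct partners rather than messages:

```python
from collections import defaultdict
from itertools import combinations


def role_metrics(replies, person):
    """Out-degree, in-degree and local clustering coefficient for one author.

    replies: iterable of (replier, asker) pairs; (u, v) means u replied to v.
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in replies:
        if u != v:
            out_nbrs[u].add(v)
            in_nbrs[v].add(u)

    def neighbours(x):
        # Undirected neighbourhood: anyone x replied to or who replied to x.
        return out_nbrs[x] | in_nbrs[x]

    nbrs = neighbours(person)
    k = len(nbrs)
    if k < 2:
        clustering = 0.0
    else:
        # Fraction of neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours(a))
        clustering = links / (k * (k - 1) / 2)
    return {
        "out_degree": len(out_nbrs[person]),
        "in_degree": len(in_nbrs[person]),
        "clustering": clustering,
    }
```

An “answer person” who replies to four askers who never talk to each other scores out-degree 4, in-degree 0, and clustering 0.0, while an author whose three partners all reply to one another scores clustering 1.0.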

It is worth noting that the “Marriage” grid resembles the “Wrestling” grid more than the “Programming” grid.

It is also worth noting that this visualization approach, while not perfect, is a nice step forward for information visualization of complex graphs. Graph visualization has been stuck for many years: complex graphs are hard to draw in meaningful ways, let alone automatically. This approach sidesteps many of the obstacles facing whole-graph visualization by focusing on the attributes of individuals and the distribution of network variation.

The NodeXL add-in for Excel can generate these subgraphs for any network: select “Insert subgraph images”. A thumbnail of each node’s “egonet” is inserted into the spreadsheet; the thumbnails can also be written to a local directory and later stitched into an array.
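Stitching the thumbnails into an array is left to the user. The ordering used in the Yahoo Answers figure, with the most active author in the upper-left cell, can be sketched as a layout function that sorts authors by activity and fills cells row by row. This is a hypothetical helper, not part of NodeXL; “activity” stands in for whatever contribution count you choose, and the returned pixel offsets could be handed to an imaging library (for example Pillow’s `Image.paste`) to build the mosaic.

```python
import math


def grid_layout(activity, thumb_w, thumb_h, columns=None):
    """Place one thumbnail per author in a grid, most active author upper-left.

    activity: dict mapping author -> activity count (e.g. number of posts).
    Returns {author: (x, y)}: pixel offset of each thumbnail's top-left corner.
    """
    # Most active first; ties broken by name so the layout is deterministic.
    authors = sorted(activity, key=lambda a: (-activity[a], a))
    if columns is None:
        # Default to a roughly square grid.
        columns = max(1, math.ceil(math.sqrt(len(authors))))
    return {
        author: ((i % columns) * thumb_w, (i // columns) * thumb_h)
        for i, author in enumerate(authors)
    }
```

With four authors and 100×80 thumbnails, the default is a 2×2 grid: the most active author lands at `(0, 0)` and the least active at `(100, 80)`.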

NodeXL Subgraph images