The Communities and Technologies conference is holding its 4th meeting at Penn State, June 24-27. This conference gathers a range of scholars interested in online communities, social media, social networks, and mobile social software. Our paper “Analyzing (Social Media) Networks with NodeXL” has been accepted for publication at the conference! Congrats to my co-authors!
Abstract: In this paper we present NodeXL, an extendible toolkit for network data analysis and visualization, implemented as an add-in to the Microsoft Excel 2007 spreadsheet software. We demonstrate NodeXL features through analysis of a data sample drawn from an enterprise intranet social network, discussion, and wiki. Through a sequence of steps we show how NodeXL leverages and extends the broadly used spreadsheet paradigm to support common operations in network analysis. This ranges from data import to computation of network statistics and refinement of network visualization through a selection of ready-to-use sorting, filtering, and clustering functions.
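NodeXL performs these steps inside Excel. As a rough Python analog of the import, statistics, and sort/filter sequence the abstract describes, here is a minimal sketch that assumes the edge list is available as (source, target) pairs, one per spreadsheet row. The function names and sample edges are hypothetical and are not part of NodeXL's API.

```python
from collections import Counter


def degree_table(edges):
    """Map each vertex to (in_degree, out_degree) for a directed edge list.

    `edges` is any iterable of (source, target) pairs, e.g. two vertex
    columns read row by row (an assumption for this sketch).
    """
    edges = list(edges)  # we traverse the edges twice
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    vertices = set(out_deg) | set(in_deg)
    return {v: (in_deg[v], out_deg[v]) for v in vertices}


def sort_and_filter(table, min_out=1):
    """Spreadsheet-style step: keep rows with out-degree >= min_out, sorted
    by out-degree (descending), then by vertex name to break ties."""
    rows = [(v, i, o) for v, (i, o) in table.items() if o >= min_out]
    return sorted(rows, key=lambda r: (-r[2], r[0]))


if __name__ == "__main__":
    # Hypothetical reply edges: (replier, person replied to)
    edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
             ("bob", "ann"), ("cat", "bob")]
    for vertex, indeg, outdeg in sort_and_filter(degree_table(edges)):
        print(vertex, indeg, outdeg)
```

Filtering on out-degree here mirrors hiding rows in a worksheet; any other column (in-degree, or a computed metric) could be used the same way.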
[Vladimir Barash is liveblogging the ICWSM conference]
10:30am: A categorical model for discovering latent structure in social annotations (Said Kashoob). Given a collection of web objects, users, and tags, can we model the underlying tag generation process?
-Discover implicit communities of interest?
-Categories of related tags?
-For a given category, identify the most relevant objects
-compare categories
Initial thoughts: content-based topic modeling (Latent Dirichlet Allocation, Latent Semantic Analysis). Recent work has applied LDA models to tags (Wu 2006, Zhou 2008).
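For readers unfamiliar with the LDA baseline mentioned above, here is a minimal collapsed Gibbs sampler that treats each web object's tag list as a "document". This is standard LDA, not the speaker's categorical model; the hyperparameters (alpha, beta, iteration count), function names, and sample tags are assumptions for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tag lists.

    docs: list of documents, each a list of tag strings.
    Returns (ndk, nkw, z): per-document topic counts, per-topic tag counts,
    and the current topic assignment of every tag occurrence.
    """
    rng = random.Random(seed)
    V = len({t for d in docs for t in d})                 # tag vocabulary size
    ndk = [[0] * num_topics for _ in docs]                # doc d, topic k
    nkw = [defaultdict(int) for _ in range(num_topics)]   # topic k, tag w
    nk = [0] * num_topics                                 # tags in topic k
    z = []
    for d, doc in enumerate(docs):                        # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1                            # remove current assignment
                nkw[k][w] -= 1
                nk[k] -= 1
                # full conditional p(z = j | all other assignments), unnormalized
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                               # record the new assignment
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw, z


def top_tags(nkw, k, n=3):
    """Most frequent tags currently assigned to topic k."""
    counts = [(w, c) for w, c in nkw[k].items() if c > 0]
    return [w for w, _ in sorted(counts, key=lambda wc: (-wc[1], wc[0]))[:n]]


if __name__ == "__main__":
    docs = [["python", "code", "bug"], ["code", "python", "compiler"],
            ["wedding", "ring", "marriage"], ["marriage", "wedding", "dress"]]
    ndk, nkw, z = lda_gibbs(docs, num_topics=2)
    for k in range(2):
        print(k, top_tags(nkw, k))
```

Each topic's top tags play the role of a "category of related tags"; the per-document counts `ndk` indicate which objects are most associated with each category.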
9-10AM: A Tempest: Or, on the Flood of Interest in Sentiment Analysis, Opinion Mining, and the Computational Treatment of Subjective Language (Lillian Lee)
-Sentiment analysis using discussion structure: classify speeches in US congressional floor debates as supporting or opposing proposed legislation
-Individual document classifier
-Agreement (degree) classifier for pairs of speeches
-Agreement info allows COLLECTIVE CLASSIFICATION – “agreeing speeches should get the same label”
-ECON: debate about effect of sentiment on sales
-comScore (users willing to pay 20-99% more for a 5-star item vs. a 4-star item)
-Jury is still out
-SOC: What opinions are influential? (Danescu-Niculescu-Mizil et al.)
-Prior work has focused on features of the text and has not considered the sociological aspects of reviews
-look at helpfulness scores
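The collective-classification idea in these notes ("agreeing speeches should get the same label") is commonly framed as an s-t minimum cut: speeches left on the source side are labeled "support", the rest "oppose", each speech pays its individual penalty for the side it lands on, and separating two agreeing speeches costs their agreement weight. The sketch below assumes the individual and agreement classifiers have already produced scores; the input format, names, and scores are hypothetical, and the max-flow step is plain Edmonds-Karp.

```python
from collections import deque


def min_cut_labels(support, agree):
    """Collective classification as an s-t minimum cut.

    support: {speech_id: p}, p in [0, 1] = individual confidence in "support".
    agree:   {(i, j): w}, w >= 0 = confidence that speeches i and j agree.
    Returns {speech_id: "support" | "oppose"}.
    """
    S, T = "_source", "_sink"
    cap = {S: {}, T: {}}

    def add(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)  # residual back edge
        cap[u][v] += c

    for i, p in support.items():
        add(S, i, p)       # cut (pay p) if i ends up "oppose"
        add(i, T, 1 - p)   # cut (pay 1 - p) if i ends up "support"
    for (i, j), w in agree.items():
        add(i, j, w)       # undirected agreement: capacity both ways
        add(j, i, w)

    def bfs():
        """BFS over residual edges; returns the parent map of reached nodes."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if T not in parent:
            break                     # no augmenting path: flow is maximal
        flow, v = float("inf"), T     # bottleneck capacity on the path
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T                         # push flow along the path
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    source_side = bfs()  # nodes still reachable from the source
    return {i: ("support" if i in source_side else "oppose") for i in support}
```

For example, with individual scores a=0.9, b=0.4, c=0.2, speech b alone leans "oppose", but a strong agreement edge between a and b (weight 1.0) pulls b to "support", because splitting them would cost more than overriding b's weak individual score.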
Abstract
Social media communities (e.g. Wikipedia, Flickr, Live Q&A) give rise to distinct types of content, foremost among which are relational content (discussion, chat) and factual content (answering questions, problem-solving). Both users and researchers are increasingly interested in developing strategies that can rapidly distinguish these types of content. While many text-based and structural strategies are possible, we extend two bodies of research that show how social context and the social roles of answerers can predict content type. We test our framework on a dataset of manually labeled contributions to Microsoft’s Live Q&A and find that it reliably extracts factual and relational messages from the data.
A shout-out to my co-authors Eric Gleave, Howard (“Ted”) Welser, and Tom Lento: our paper “A conceptual and operational definition of ‘Social Role’ in Online Community” won the best paper award at HICSS-42! The Hawaii International Conference on System Sciences has featured a great series of minitracks over the years. The Persistent Conversations minitrack has featured great work on threaded conversations, blogs, chats, wikis, and social media for more than a decade. This year our paper appeared in the Digital Media: Content and Communication track.
The award came with a very nice letter that puts it in some context:
Ten papers out of 515 at the conference were selected for Best Paper Awards. Many thanks to track organizers Karrie Karahalios and Fernanda Viegas.
In particular, this image (Figure 4) is a great example of an innovative way to handle large network graphs: chop them into a matrix of ego-net thumbnails.
Adamic et al. WWW 2008 Yahoo Answers Roles and Tag Ecologies
This is a neat way to sidestep the “blob” problem of many directed-graph visualizations: with too many nodes and too many links, the image becomes impossible to understand.
Each grid of images represents a collection of authors who contribute to questions with the same tag, in this case the programming, wrestling, and marriage tags. Each grid is a collection of ego-centric network diagrams: each author is displayed with their “1.5 degree” connections, that is, their links to friends and their friends’ links to one another.
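The “1.5 degree” egonet described above can be sketched in a few lines of Python. This sketch treats the reply network as undirected and assumes edges arrive as (author, author) pairs; the function names are hypothetical, and this is not necessarily how NodeXL builds its subgraphs.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def egonet_1_5(adj, ego):
    """Edge set of ego's 1.5-degree network: ego's links to its neighbors
    plus the links among those neighbors (no friends-of-friends beyond that)."""
    alters = adj.get(ego, set())
    edges = {frozenset((ego, a)) for a in alters}
    for a in alters:
        for b in adj[a] & alters:   # neighbor-to-neighbor links only
            edges.add(frozenset((a, b)))
    return edges


if __name__ == "__main__":
    adj = build_adjacency([("ego", "x"), ("ego", "y"), ("x", "y"), ("y", "z")])
    print(sorted(tuple(sorted(e)) for e in egonet_1_5(adj, "ego")))
```

Here the link between y and z is left out because z is not one of the ego's direct neighbors; only links among the ego's own contacts survive.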
Displaying a collection of authors, and contrasting multiple collections, makes several interesting observations possible. First, not everyone who contributes to a tag is the same: a few highly active people make significant contributions, while most people are lightly connected and make modest contributions. Second, not everyone who contributes heavily does so in the same way. In the upper left-hand corner of each grid is the “top” person in that sample of the tag population. Each is highly active, but each creates a different pattern of connection through that activity. In programming, the top person is a classic “answer person”: high out-degree, low in-degree, connections to isolates, and a resulting low clustering coefficient. The top contributor to the Marriage tag is different, however: most of the people they connect to are also connected to one another, so their “friends are friends”. Their clustering coefficient is comparatively high in contrast with the top contributor to the “Programming” tag. The top contributor to questions with the “Wrestling” tag is a hybrid: the author maintains a cluster of highly inter-connected repeat discussion partners while replying to a population of question askers like a classic “answer person”.
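To make the contrast between an “answer person” and a “friends are friends” contributor concrete, the sketch below computes in-degree, out-degree, and the local clustering coefficient (links among a node's neighbors divided by the k(k-1)/2 possible links). It assumes reply edges are (replier, asker) pairs and computes clustering on the undirected version of the graph; both choices, and the function name, are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def role_metrics(reply_edges):
    """Per-author in-degree, out-degree and local clustering coefficient."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    nbrs = defaultdict(set)
    seen = set()
    for u, v in reply_edges:
        if u == v or (u, v) in seen:
            continue  # ignore self-replies and repeated replies
        seen.add((u, v))
        out_deg[u] += 1
        in_deg[v] += 1
        nbrs[u].add(v)   # undirected neighborhood for clustering
        nbrs[v].add(u)
    metrics = {}
    for node, ns in nbrs.items():
        k = len(ns)
        # count pairs of neighbors that are themselves linked
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        cc = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        metrics[node] = {"in": in_deg[node], "out": out_deg[node], "clustering": cc}
    return metrics


if __name__ == "__main__":
    answer_person = [("ans", "q1"), ("ans", "q2"), ("ans", "q3")]
    friends = [("f", "a"), ("f", "b"), ("f", "c"), ("a", "b"), ("b", "c"), ("c", "a")]
    print(role_metrics(answer_person)["ans"])
    print(role_metrics(friends)["f"])
```

An author who replies to three otherwise unconnected askers gets out-degree 3, in-degree 0, and clustering 0.0; an author whose three partners all reply to one another gets clustering 1.0.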
It is worth noting that the marriage grid resembles the wrestling grid more than the programming grid.
It is also worth noting that this visualization approach, while not perfect, is a nice step forward for information visualization of complex graphs. Graph visualization has been stuck for many years: complex graphs are hard to draw in meaningful ways, let alone automatically. This approach sidesteps many of the obstacles facing whole-graph visualization by focusing instead on the attributes of individuals and the distributions of network variation.
The NodeXL add-in for Excel can generate these subgraphs for any network: select “Insert subgraph images”. A thumbnail of each node’s “egonet” is inserted into the spreadsheet and can be written to a local directory and later stitched into an array.
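The stitching step can be planned with a small layout helper. This sketch only computes where each thumbnail goes: authors are ordered by an activity score (descending) so the most active author lands in the upper-left cell, as in the Yahoo Answers grids. The activity measure, tile size, and function name are hypothetical assumptions, not NodeXL features.

```python
import math


def grid_layout(authors, activity, tile_w, tile_h, columns=None):
    """Assign each author's thumbnail a pixel offset in a row-major grid.

    Returns (offsets, canvas_size), where offsets maps author -> (x, y) of the
    tile's upper-left corner and canvas_size is (width, height).
    """
    ordered = sorted(authors, key=lambda a: (-activity.get(a, 0), a))
    n = len(ordered)
    cols = columns or max(1, math.ceil(math.sqrt(n)))  # roughly square by default
    rows = math.ceil(n / cols) if n else 0
    offsets = {a: ((i % cols) * tile_w, (i // cols) * tile_h)
               for i, a in enumerate(ordered)}
    return offsets, (cols * tile_w, rows * tile_h)


if __name__ == "__main__":
    activity = {"a": 1, "b": 5, "c": 3, "d": 2, "e": 4}  # hypothetical post counts
    offsets, size = grid_layout(list(activity), activity, tile_w=100, tile_h=80)
    print(size, offsets)
```

With Pillow installed, the thumbnails written by NodeXL could then be pasted onto a blank canvas created with `Image.new("RGB", size, "white")` using `canvas.paste(Image.open(path), offset)`; that part is left out so the sketch runs without third-party packages.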