A great paper and network structure visualization of social roles in Yahoo Answers

I love this paper from Lada A. Adamic, Jun Zhang, Eytan Bakshy and Mark Ackerman at WWW2008:

Knowledge sharing and Yahoo Answers: Everyone knows something

In particular, this image (Figure 4) makes great use of an innovative way to handle large network graphs: chop them into a matrix of ego-net thumbnails.

Adamic et al. WWW 2008 Yahoo Answers Roles and Tag Ecologies

This is a neat way to sidestep the “blob” problem of many directed graph visualizations: too many nodes and too many links make the image impossible to understand.

Each grid of images represents a collection of authors who contributed to questions with the same tag, in this case the Programming, Wrestling, and Marriage tags.  Each grid is a collection of ego-centric network diagrams: each author is displayed with their “1.5 degree” connections, that is, their links to friends and their friends’ links to one another.
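To make the “1.5 degree” idea concrete, here is a minimal sketch in plain Python. It is not the code used in the paper or in NodeXL; the function name `ego_net_1_5` and the sample edge list are made up for illustration, and it assumes the network is given as a list of (source, target) pairs.

```python
from collections import defaultdict


def ego_net_1_5(edges, ego):
    """Return (nodes, edges) of the 1.5-degree ego network around `ego`.

    `edges` is an iterable of (source, target) pairs. Direction is ignored
    when deciding who counts as a neighbor, but kept in the returned edges.
    """
    edges = list(edges)
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-loops
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    # The ego plus everyone one step away from it.
    nodes = {ego} | neighbors[ego]
    # Keep only links whose two ends are both inside that set: the ego's own
    # links plus the links among the ego's friends.
    kept = [(s, t) for s, t in edges if s in nodes and t in nodes and s != t]
    return nodes, kept


# Hypothetical sample data, not taken from the paper.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("dan", "eve"),
]
nodes, sub_edges = ego_net_1_5(edges, "ann")
```

In this toy example the link between bob and cat is kept because both are friends of ann, while cat's link to dan is dropped because dan is two steps away from ann.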

Displaying a collection of authors and contrasting multiple collections makes several interesting observations possible.  First, not everyone who contributes to a tag is the same: a few highly active people make significant contributions, while most people are lightly connected and make modest contributions.  Second, not everyone who contributes heavily does so in the same way.

In the upper left-hand corner of each grid is the “top” person in that sample of the tag population.  Each is highly active, but each creates a different pattern of connection through that activity.  In Programming, the top person is a classic “answer person”: high out-degree, low in-degree, connected to isolates, and with a resulting low clustering coefficient.  The top contributor in the Marriage tag is different, however: most of the people they connect to are also connected to one another, so their “friends are friends”.  Their clustering coefficient is comparatively high in contrast with the top contributor to the Programming tag.  The top contributor of questions with the Wrestling tag is a hybrid: the author maintains a cluster of highly inter-connected repeat discussion partners while replying to a population of question people like a classic “answer person”.
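The measures behind these role descriptions are simple to compute. The sketch below is an illustration only, not the paper's method: the function name `ego_profile` and both toy networks are hypothetical, and it assumes an edge (a, b) means “a replied to b”, so that an answer person has high out-degree. Clustering here is the undirected local clustering coefficient, the share of an author's contacts that are also linked to each other.

```python
from itertools import combinations


def ego_profile(edges, ego):
    """Return (out_degree, in_degree, clustering) for `ego`.

    Assumes edge (a, b) means "a replied to b" (an assumption of this sketch).
    Clustering is the fraction of pairs of the ego's neighbors that are
    themselves connected, ignoring edge direction.
    """
    edges = {(s, t) for s, t in edges if s != t}
    out_nbrs = {t for s, t in edges if s == ego}
    in_nbrs = {s for s, t in edges if t == ego}
    nbrs = out_nbrs | in_nbrs

    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if (a, b) in edges or (b, a) in edges)
    clustering = linked / len(pairs) if pairs else 0.0
    return len(out_nbrs), len(in_nbrs), clustering


# Hypothetical "answer person": replies to four people who never interact.
star = [("guru", x) for x in "abcd"]
# Hypothetical "friends are friends" author: every contact talks to the others.
clique = list(combinations(["pal", "a", "b", "c"], 2))

star_profile = ego_profile(star, "guru")
clique_profile = ego_profile(clique, "pal")
```

The star-shaped author ends up with a clustering coefficient of 0, and the clique-shaped author with a coefficient of 1; real contributors fall somewhere in between, which is what the thumbnails make visible at a glance.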

It is worth noting that Marriage resembles Wrestling more than Programming.

It is also worth noting that this visualization approach, while not perfect, is a nice step forward for information visualizations of complex graphs.  Graph visualization has been stuck for many years: complex graphs are hard to draw in meaningful ways, let alone automatically.  This approach sidesteps many of the obstacles facing the dominant approach, whole-graph visualization, and focuses instead on attributes of individuals and distributions of network variation.

The NodeXL add-in for Excel can generate these sub-graphs for any network: select “Insert subgraph images”.  The thumbnail of each node’s “egonet” is inserted in the spreadsheet and can be written to a local directory and later stitched into an array.

NodeXL Subgraph images