ThreadMill 0.1: Social Accounting for Message Thread Collections

The Social Media Research Foundation is pleased to announce the immediate availability of ThreadMill 0.1.  ThreadMill is a free and open application that consumes message thread data and produces reports about each author, thread, forum, and board along with visualizations of the patterns of connection and activity.  ThreadMill is written in Ruby, and depends on MongoDB, SinatraRB, HAML, and Flash to collect, analyze, and report data about collections of conversation threads.

Threaded conversations are a major form of social media.  Message boards, email and email lists, Twitter, blog comments, text messages, and discussion forums are all social media systems built around the message thread data structure.  In these systems, some messages are sent as replies to particular earlier messages, and as replies accumulate, chains of messages form.  Message chains come in two major forms: branching and non-branching.  Branching threads allow more than one message to reply to the same prior message.  Non-branching threads are single chains, like a string of pearls, in which only one message can reply to any prior message.  Many web-based message boards are non-branching; many email systems and discussion forums are branching.
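To make the distinction concrete: if each message records the ID of the message it replies to, a thread is branching exactly when some message has two or more direct replies.  The Python sketch below assumes a hypothetical mapping from message IDs to parent IDs; it illustrates the idea and is not taken from ThreadMill, which is written in Ruby.

```python
from collections import Counter
from typing import Dict, Optional


def is_branching(reply_to: Dict[str, Optional[str]]) -> bool:
    """Return True if any message in the thread has more than one direct reply.

    `reply_to` (hypothetical format) maps each message ID to the ID of the
    message it replies to, or None for the thread's root message.
    """
    replies_per_parent = Counter(p for p in reply_to.values() if p is not None)
    return any(n > 1 for n in replies_per_parent.values())


# Non-branching "string of pearls": m1 <- m2 <- m3
chain = {"m1": None, "m2": "m1", "m3": "m2"}
# Branching: m2 and m3 both reply to m1
tree = {"m1": None, "m2": "m1", "m3": "m1"}

print(is_branching(chain))  # False
print(is_branching(tree))   # True
```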

ThreadMill requires a minimal set of data elements to generate its reports.  At a minimum, the data table must contain one row per message, with columns for the name of the message board, the forum, the thread, and the author, along with a unique identifier for the message and the date and time it was posted.  Optional data elements include the unique identifier of the message being replied to, the URL of the message, and the URL of the author's profile photo.
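A minimal sketch of such a record, assuming each row arrives as a Python dict with an ISO 8601 timestamp.  The field names (`board`, `message_id`, `in_reply_to`, and so on) are hypothetical and are not ThreadMill's actual column names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    # Required elements (hypothetical field names)
    board: str
    forum: str
    thread: str
    author: str
    message_id: str
    posted_at: datetime
    # Optional elements
    in_reply_to: Optional[str] = None
    url: Optional[str] = None
    photo_url: Optional[str] = None


REQUIRED = ("board", "forum", "thread", "author", "message_id", "posted_at")


def parse_row(row: dict) -> Message:
    """Validate one row and convert it to a Message; empty optionals become None."""
    missing = [field for field in REQUIRED if not row.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Message(
        board=row["board"], forum=row["forum"], thread=row["thread"],
        author=row["author"], message_id=row["message_id"],
        posted_at=datetime.fromisoformat(row["posted_at"]),
        in_reply_to=row.get("in_reply_to") or None,
        url=row.get("url") or None,
        photo_url=row.get("photo_url") or None,
    )


row = {"board": "b1", "forum": "f1", "thread": "t1", "author": "alice",
       "message_id": "m1", "posted_at": "2011-10-10T14:30:00"}
msg = parse_row(row)
print(msg.author, msg.posted_at.date())  # alice 2011-10-10
```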

All forms of threaded message exchange can be measured.  Simple measures, such as the number of messages or the number of authors, are obvious and useful.  More sophisticated analysis yields other measures.  For example, the network of connections that forms as authors reply to one another can be extracted and analyzed with network analysis methods, and metrics calculated from this reply network describe the location of each person in the graph.
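As an illustration of both the simple counts and the reply network, the sketch below assumes each message is a hypothetical `(message_id, author, in_reply_to)` tuple and builds a weighted, directed edge from each replying author to the author being replied to.

```python
from collections import Counter

# Hypothetical sample: (message_id, author, in_reply_to)
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "bob", "m4"),
]

author_of = {mid: author for mid, author, _ in messages}

# Simple measures: counts of messages and of distinct authors
n_messages = len(messages)
n_authors = len(set(author_of.values()))

# Reply network: directed edge (replier, replied-to author), weighted by count.
# Replies to messages missing from the data set are skipped.
reply_edges = Counter(
    (author, author_of[parent])
    for _, author, parent in messages
    if parent is not None and parent in author_of
)

print(n_messages, n_authors)  # 5 3
print(dict(reply_edges))
# {('bob', 'alice'): 2, ('carol', 'alice'): 1, ('alice', 'bob'): 1}
```

Because edges connect authors rather than messages, an author replying to their own message would produce a self-loop; whether to keep such loops is an analysis choice.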

ThreadMill generates several data sets that can be used to create visualizations of the activity and structure of a message board collection.

A Treemap data set can illustrate the nested hierarchy of authors within threads, threads within fora, fora within boards, and boards within collections.  Treemap visualizations of collections of threaded conversations can quickly highlight the most active or most populous discussions.
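The nesting behind a treemap can be sketched as a recursive aggregation in which each rectangle's size is the number of messages beneath it.  This assumes hypothetical `(board, forum, thread, author)` tuples and sizes by message count; other size measures, such as author counts, are equally possible.

```python
from collections import defaultdict


def nested_counts(rows):
    """Aggregate message counts into board -> forum -> thread -> author."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))
    for board, forum, thread, author in rows:
        tree[board][forum][thread][author] += 1
    return tree


def size(node):
    """Treemap rectangle size: total messages beneath this node."""
    if isinstance(node, int):
        return node
    return sum(size(child) for child in node.values())


# Hypothetical sample rows: one (board, forum, thread, author) tuple per message
rows = [
    ("b1", "f1", "t1", "alice"),
    ("b1", "f1", "t1", "bob"),
    ("b1", "f1", "t2", "alice"),
    ("b1", "f2", "t3", "carol"),
    ("b1", "f2", "t3", "carol"),
]
tree = nested_counts(rows)
print(size(tree), size(tree["b1"]["f1"]), size(tree["b1"]["f2"]["t3"]))  # 5 3 2
```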

An AuthorLine visualization takes the form of a double histogram.  For each time period, every thread the author was active in is drawn as a bubble sized by the number of messages the author contributed to it, and the bubbles are sorted by size.  Threads initiated by the author are drawn as bubbles above the center line; threads started by other authors, to which the author contributed messages, are drawn as bubbles stacked below the center line.  AuthorLines quickly reveal an author's pattern of activity and identify which of several types of contributor the author is.
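A sketch of the data behind one author's AuthorLine, under assumptions not stated above: time periods are ISO weeks, a thread's starter is the author of its earliest message, and messages are hypothetical `(thread, author, timestamp)` tuples.  For each period it returns the author's threads, sorted by message count and split into those drawn above the center line (threads the author started) and below it (threads others started).

```python
from collections import defaultdict
from datetime import datetime


def authorline(messages, author):
    """Per ISO week, the author's threads split into above/below the center line."""
    # Assumption: a thread's starter is the author of its earliest message.
    starter = {}
    for thread, who, when in sorted(messages, key=lambda m: m[2]):
        starter.setdefault(thread, who)

    # period -> thread -> number of messages this author posted there
    counts = defaultdict(lambda: defaultdict(int))
    for thread, who, when in messages:
        if who == author:
            counts[when.isocalendar()[:2]][thread] += 1  # key: (ISO year, ISO week)

    line = {}
    for period, threads in sorted(counts.items()):
        ranked = sorted(threads.items(), key=lambda kv: kv[1], reverse=True)
        line[period] = {
            "above": [(t, n) for t, n in ranked if starter[t] == author],
            "below": [(t, n) for t, n in ranked if starter[t] != author],
        }
    return line


# Hypothetical sample: (thread, author, timestamp)
messages = [
    ("t1", "alice", datetime(2011, 10, 3, 9)),
    ("t1", "bob", datetime(2011, 10, 3, 10)),
    ("t1", "alice", datetime(2011, 10, 4, 9)),
    ("t2", "bob", datetime(2011, 10, 4, 12)),
    ("t2", "alice", datetime(2011, 10, 5, 8)),
    ("t3", "alice", datetime(2011, 10, 11, 9)),
]
print(authorline(messages, "alice"))
```

For this sample, alice's entry for ISO week 40 of 2011 has t1 (2 messages) above the line and t2 (1 message, in a thread bob started) below it.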

A scatter plot visualization represents each author as a bubble in an X-Y space defined by the number of distinct days on which the author was active and the average number of messages the author contributes to the threads in which they participate.
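The two scatter-plot coordinates can be computed per author as sketched below, again assuming hypothetical `(thread, author, timestamp)` tuples and counting active days by calendar date.

```python
from collections import defaultdict
from datetime import datetime


def scatter_points(messages):
    """Map author -> (distinct active days, mean messages per thread joined)."""
    active_days = defaultdict(set)
    per_thread = defaultdict(lambda: defaultdict(int))
    for thread, author, when in messages:
        active_days[author].add(when.date())
        per_thread[author][thread] += 1
    return {
        author: (
            len(active_days[author]),
            sum(per_thread[author].values()) / len(per_thread[author]),
        )
        for author in active_days
    }


# Hypothetical sample: (thread, author, timestamp)
messages = [
    ("t1", "alice", datetime(2011, 10, 3, 9)),
    ("t1", "bob", datetime(2011, 10, 3, 10)),
    ("t1", "alice", datetime(2011, 10, 4, 9)),
    ("t2", "bob", datetime(2011, 10, 4, 12)),
    ("t2", "alice", datetime(2011, 10, 5, 8)),
    ("t3", "alice", datetime(2011, 10, 11, 9)),
]
points = scatter_points(messages)
print(points["bob"])  # (2, 1.0)
```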

A time series line chart reveals the days of maximum and minimum activity along with trends.
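A daily time series can be sketched by counting messages per calendar day and filling quiet days with zero, so that minimum-activity days are not silently skipped.  The trailing moving average is one simple, assumed way to expose trends; the window size is an arbitrary choice.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_series(timestamps):
    """(date, message count) for every day from first to last, zero-filled."""
    counts = Counter(ts.date() for ts in timestamps)
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return series


def moving_average(values, window=3):
    """Trailing moving average; the first window-1 days have no value."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]


# Hypothetical message timestamps
timestamps = [datetime(2011, 10, 3, 9), datetime(2011, 10, 3, 10),
              datetime(2011, 10, 4, 9), datetime(2011, 10, 6, 14)]
series = daily_series(timestamps)
busiest = max(series, key=lambda p: p[1])   # first day with the most messages
quietest = min(series, key=lambda p: p[1])  # first day with the fewest messages
trend = moving_average([n for _, n in series])
print(busiest, quietest, trend)
```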

A network diagram reveals the overall structure of the discussion space and the people who occupy strategic locations within the network graph.
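One common measure of a strategic location is betweenness centrality: how often a vertex lies on shortest paths between other vertices.  The sketch below is Brandes' algorithm for unweighted, directed graphs given as an adjacency mapping of sets; ThreadMill's and NodeXL's own implementations, normalization, and treatment of edge direction may differ.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted).

    `graph` maps each vertex to a set of its out-neighbours. Self-loops are
    harmless: they never lie on a shortest path.
    """
    vertices = set(graph) | {w for nbrs in graph.values() for w in nbrs}
    bc = dict.fromkeys(vertices, 0.0)
    for s in vertices:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in vertices}
        sigma = dict.fromkeys(vertices, 0)
        dist = dict.fromkeys(vertices, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(vertices, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# b lies on the only shortest path from a to c
graph = {"a": {"b"}, "b": {"c"}, "c": set()}
print(betweenness(graph)["b"])  # 1.0
```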

ThreadMill has received generous assistance from Morningside Analytics.  Bruce Woodson implemented ThreadMill.

October 9-11, 2011: IEEE 2011 Social Computing, Boston: NodeXL Paper on “Group-in-a-box” layouts

This year the IEEE Social Computing conference is being held in Boston, October 9-11, 2011.

The NodeXL team from the Social Media Research Foundation has a paper on our newest layout feature in NodeXL: Group-in-a-box.

Abstract: Communities in social networks emerge from interactions among individuals and can be analyzed through a combination of clustering and graph layout algorithms. These approaches result in 2D or 3D visualizations of clustered graphs, with groups of vertices representing individuals that form a community. However, in many instances the vertices have attributes that divide individuals into distinct categories such as gender, profession, geographic location, and similar. It is often important to investigate what categories of individuals comprise each community and vice-versa, how the community structures associate the individuals from the same category. Currently, there are no effective methods for analyzing both the community structure and the category-based partitions of social graphs. We propose Group-In-a-Box (GIB), a metalayout for clustered graphs that enables multi-faceted analysis of networks. It uses the treemap space filling technique to display each graph cluster or category group within its own box, sized according to the number of vertices therein. GIB optimizes visualization of the network sub-graphs, providing a semantic substrate for category-based and cluster-based partitions of social graphs. We illustrate the application of GIB to multi-faceted analysis of real social networks and discuss desirable properties of GIB using synthetic datasets.

The paper is authored by:

Eduarda Mendes Rodrigues*, Natasa Milic-Frayling†, Marc Smith‡, Ben Shneiderman§, Derek Hansen¶
* Dept. of Informatics Engineering, Faculty of Engineering, University of Porto, Portugal – eduardamr @ acm.org
† Microsoft Research, Cambridge, UK – natasamf @ microsoft.com
‡ Connected Action Consulting Group, Belmont, California, USA – marc @ connectedaction.net
§ Dept. of Computer Science & Human-Computer Interaction Lab, University of Maryland, College Park, Maryland, USA – ben @ cs.umd.edu
¶ College of Information Studies, University of Maryland, College Park, Maryland – dlhansen @ umd.edu
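The abstract describes giving each cluster or category group its own box, sized by its number of vertices, using a treemap.  As a hedged illustration of that sizing step only, the sketch below uses a simple recursive binary-split treemap (not necessarily the treemap algorithm the paper uses) to give each group a rectangle whose area is proportional to its size.  The group names and counts are hypothetical.

```python
def split_treemap(groups, x, y, w, h):
    """Recursive binary-split treemap.

    `groups` is a non-empty list of (name, size) pairs with positive sizes.
    Returns name -> (x, y, width, height), each area proportional to size.
    """
    if len(groups) == 1:
        return {groups[0][0]: (x, y, w, h)}
    total = sum(s for _, s in groups)
    # Choose the split point whose left-hand total is closest to half.
    best_k, best_diff, running = 1, float("inf"), 0
    for k in range(1, len(groups)):
        running += groups[k - 1][1]
        if abs(total / 2 - running) < best_diff:
            best_k, best_diff = k, abs(total / 2 - running)
    left, right = groups[:best_k], groups[best_k:]
    frac = sum(s for _, s in left) / total
    # Cutting across the longer side helps avoid long, thin boxes.
    if w >= h:
        boxes = split_treemap(left, x, y, w * frac, h)
        boxes.update(split_treemap(right, x + w * frac, y, w * (1 - frac), h))
    else:
        boxes = split_treemap(left, x, y, w, h * frac)
        boxes.update(split_treemap(right, x, y + h * frac, w, h * (1 - frac)))
    return boxes


# Hypothetical groups with vertex counts, sorted largest first
groups = sorted([("C", 20), ("A", 50), ("B", 30)], key=lambda g: g[1], reverse=True)
boxes = split_treemap(groups, 0, 0, 100, 100)
print(boxes["A"])  # (0, 0, 50.0, 100)
```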

A map of the connections among the people who recently tweeted #SocialCom2011:

Connections among the Twitter users who recently tweeted the hashtag #socialcom2011, queried on October 10, 2011, with vertices scaled by number of followers (outliers thresholded). An edge is created when one user replies to, mentions, or follows another.

Laid out with the “Group Layout”, which places each cluster in its own tiled, bounded region. Clusters, calculated by the Clauset-Newman-Moore algorithm, are also encoded by color.

A larger version of the image is here: www.flickr.com/photos/marc_smith/6232130442/sizes/l/in/ph…
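The Clauset-Newman-Moore algorithm is a greedy agglomerative method that repeatedly merges the pair of communities giving the largest increase in modularity Q.  The sketch below only scores a given partition of an undirected graph with Q; it does not implement CNM's merge procedure, and the sample graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    community: dict mapping vertex -> community label
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ], where
    L_c = edges inside c, d_c = total degree of c, m = number of edges.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge (c, d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, split))  # 5/14, about 0.357
```

Putting every vertex in one community scores Q = 0, so a positive Q indicates more within-community edges than the degree-based null model would predict.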

Top users by betweenness centrality:
@danielequercia
@gadgetman4u
@bkeegan
@shaunlawson
@maryheston
@mmiiina
@ronaldomenezes
@theshadowhost
@fergal_reid
@cosleydr

Graph Metric: Value
Graph Type: Directed
Vertices: 36
Unique Edges: 119
Edges With Duplicates: 155
Total Edges: 274
Self-Loops: 105
Connected Components: 2
Single-Vertex Connected Components: 1
Maximum Vertices in a Connected Component: 35
Maximum Edges in a Connected Component: 273
Maximum Geodesic Distance (Diameter): 5
Average Geodesic Distance: 2.174551
Graph Density: 0.107936508
NodeXL Version: 1.0.1.179
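The distance metrics in the table can be sketched with a breadth-first search from every vertex.  This version takes the diameter as the longest shortest path and averages over ordered pairs of distinct vertices that can reach each other; density is edges divided by possible ordered vertex pairs.  NodeXL's exact conventions (for example, how it counts self-loops and duplicate edges) are not stated here, so this sketch may not reproduce the reported values.

```python
from collections import deque


def geodesic_stats(graph):
    """(diameter, mean shortest-path length) over reachable ordered pairs u != v.

    `graph` maps each vertex to its out-neighbours (directed, unweighted).
    Raises ValueError if no vertex can reach any other.
    """
    vertices = set(graph) | {w for nbrs in graph.values() for w in nbrs}
    lengths = []
    for s in vertices:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        lengths.extend(d for t, d in dist.items() if t != s)
    return max(lengths), sum(lengths) / len(lengths)


def density(n_vertices, n_edges):
    """Directed density: edges present / possible ordered pairs (no self-loops)."""
    return n_edges / (n_vertices * (n_vertices - 1))


graph = {"a": {"b"}, "b": {"c"}, "c": set()}
print(geodesic_stats(graph))  # (2, 1.3333333333333333)
print(density(3, 2))
```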

More NodeXL network visualizations are here: www.flickr.com/photos/marc_smith/sets/72157622437066929/