ThreadMill 0.1: Social Accounting for Message Thread Collections

The Social Media Research Foundation is pleased to announce the immediate availability of ThreadMill 0.1.  ThreadMill is a free and open application that consumes message thread data and produces reports about each author, thread, forum, and board along with visualizations of the patterns of connection and activity.  ThreadMill is written in Ruby, and depends on MongoDB, SinatraRB, HAML, and Flash to collect, analyze, and report data about collections of conversation threads.

Threaded conversations are a major form of social media.  Message boards, email and email lists, Twitter, blog comments, text messages, and discussion forums are all social media systems built around the message thread data structure.  As messages are exchanged through these systems, some messages are sent as a reply to a particular previous message.  As messages are sent in reply to prior messages, chains of messages form.  Message chains come in two major forms: branching and non-branching.  Branching threads are those that allow more than one message to reply to a prior message.  Non-branching threads are single chains, like a string of pearls, that allow only one message to reply to a prior message.  Many web-based message boards are non-branching.  Many email systems and discussion forums are branching.
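
To make the branching distinction concrete, here is a minimal Python sketch.  It is not ThreadMill code; the `(message_id, parent_id)` pair layout and the function name are assumptions made for illustration.  A thread counts as branching when any message has more than one direct reply.

```python
from collections import Counter

def is_branching(messages):
    """Return True if any message in the thread has more than one direct reply.

    `messages` is a list of (message_id, parent_id) pairs (a hypothetical
    layout); parent_id is None for the message that starts the thread.
    """
    reply_counts = Counter(parent for _, parent in messages if parent is not None)
    return any(count > 1 for count in reply_counts.values())

# A single chain: each message has at most one reply.
chain = [("m1", None), ("m2", "m1"), ("m3", "m2")]
# A tree: two messages reply to the same parent.
tree = [("m1", None), ("m2", "m1"), ("m3", "m1")]

print(is_branching(chain))  # False
print(is_branching(tree))   # True
```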

ThreadMill requires a minimal set of data elements to generate its reports.  At minimum, the data table must include, for each message, the name of the message board, the forum, the thread, and the author, along with a unique identifier for the message and the date and time it was posted.  Optional data elements include the unique identifier of the message being replied to, the URL of the message, and the URL for a profile photo.
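
One way to picture these required and optional elements is as a record type with a small validation step.  The sketch below is in Python rather than ThreadMill's Ruby, and every field name in it is an assumption; ThreadMill's actual column names are not given here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical names for the six required data elements.
REQUIRED = ("board", "forum", "thread", "author", "message_id", "posted_at")

@dataclass
class Message:
    board: str
    forum: str
    thread: str
    author: str
    message_id: str
    posted_at: datetime
    # Optional data elements.
    reply_to_id: Optional[str] = None
    message_url: Optional[str] = None
    photo_url: Optional[str] = None

def parse_row(row):
    """Build a Message from a dict of strings, rejecting incomplete rows."""
    missing = [name for name in REQUIRED if not row.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Message(
        board=row["board"],
        forum=row["forum"],
        thread=row["thread"],
        author=row["author"],
        message_id=row["message_id"],
        posted_at=datetime.fromisoformat(row["posted_at"]),
        reply_to_id=row.get("reply_to_id") or None,
        message_url=row.get("message_url") or None,
        photo_url=row.get("photo_url") or None,
    )

example = parse_row({
    "board": "ExampleBoard", "forum": "General", "thread": "Welcome",
    "author": "alice", "message_id": "m1", "posted_at": "2010-01-05T09:30:00",
})
print(example.author, example.posted_at.year)  # alice 2010
```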

All forms of threaded message exchange can be measured.  Simple measures like the count of the number of messages or the number of authors are obvious and useful.  Other measures can be created from more sophisticated analysis.  For example, the network of connections that forms as different authors reply to one another can be extracted and analyzed using network analysis methods.  It is possible to calculate metrics from these networks of reply that describe the location of each person in the graph.
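
As a rough illustration of how a reply network can be extracted, the Python sketch below builds directed author-to-author edges from reply links and counts each author's distinct reply partners.  The dict keys and function names are assumptions, and in-degree and out-degree are only the simplest of the metrics a network analysis could produce.

```python
from collections import Counter

def reply_edges(messages):
    """Count directed edges (replier, replied_to) between authors.

    `messages` is a list of dicts with hypothetical keys "message_id",
    "author", and "reply_to_id" (None when the message is not a reply).
    """
    author_of = {m["message_id"]: m["author"] for m in messages}
    edges = Counter()
    for m in messages:
        parent = m.get("reply_to_id")
        if parent in author_of:
            edges[(m["author"], author_of[parent])] += 1
    return edges

def degrees(edges):
    """Return (in_degree, out_degree) counters over distinct edges."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

messages = [
    {"message_id": "m1", "author": "alice", "reply_to_id": None},
    {"message_id": "m2", "author": "bob", "reply_to_id": "m1"},
    {"message_id": "m3", "author": "carol", "reply_to_id": "m1"},
    {"message_id": "m4", "author": "alice", "reply_to_id": "m2"},
]
edges = reply_edges(messages)
in_deg, out_deg = degrees(edges)
print(in_deg["alice"], out_deg["alice"])  # 2 1
```

In this toy data, alice is replied to by two different authors and replies to one, which is the kind of positional difference the network metrics are meant to surface.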

ThreadMill generates several data sets that can be used to create visualizations of the activity and structure of a message board collection.

A Treemap data set can illustrate the hierarchy of encapsulated authors within threads, threads within fora, fora within boards, and boards within collections.  Treemap visualizations of collections of threaded conversations can quickly highlight the most active or populous discussions.
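
The nesting described above can be sketched as a dictionary of dictionaries whose leaves are message counts.  This Python example is an illustration under assumed field names, not the format ThreadMill actually emits.

```python
def build_treemap(messages):
    """Nest message counts as board -> forum -> thread -> author -> count."""
    tree = {}
    for m in messages:
        authors = (tree.setdefault(m["board"], {})
                       .setdefault(m["forum"], {})
                       .setdefault(m["thread"], {}))
        authors[m["author"]] = authors.get(m["author"], 0) + 1
    return tree

def size(node):
    """Total messages under a node; a treemap would size rectangles by this."""
    if isinstance(node, int):
        return node
    return sum(size(child) for child in node.values())

messages = [
    {"board": "B", "forum": "F1", "thread": "T1", "author": "alice"},
    {"board": "B", "forum": "F1", "thread": "T1", "author": "alice"},
    {"board": "B", "forum": "F1", "thread": "T2", "author": "bob"},
    {"board": "B", "forum": "F2", "thread": "T3", "author": "carol"},
]
tree = build_treemap(messages)
print(size(tree["B"]["F1"]))  # 3
print(size(tree["B"]))        # 4
```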

An AuthorLine visualization takes the form of a double histogram, with bubbles representing each thread active in each time period sized by the volume of messages the author contributed, sorted by size.  Threads that have been initiated by the author are represented as bubbles above the center line.  Messages that the author contributes to threads started by other authors are represented as bubbles stacked below the center line.  AuthorLines quickly reveal the pattern of activity an author displays and identify which of several types of contributors the author is.

A scatter plot visualization represents each author as a bubble in an X-Y space defined by the number of different days the author was active against the average number of messages the author contributes to the threads in which they participate.
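
The two coordinates of each bubble can be computed directly from the message table.  The following Python sketch assumes each message is a dict with "author", "thread", and "posted_at" keys (hypothetical names) and returns one (days active, average messages per thread) point per author.

```python
from collections import defaultdict
from datetime import datetime

def scatter_points(messages):
    """Map each author to (distinct days active, mean messages per thread)."""
    days = defaultdict(set)
    per_thread = defaultdict(lambda: defaultdict(int))
    for m in messages:
        days[m["author"]].add(m["posted_at"].date())
        per_thread[m["author"]][m["thread"]] += 1
    points = {}
    for author, threads in per_thread.items():
        average = sum(threads.values()) / len(threads)
        points[author] = (len(days[author]), average)
    return points

messages = [
    {"author": "alice", "thread": "T1", "posted_at": datetime(2010, 1, 5, 9)},
    {"author": "alice", "thread": "T1", "posted_at": datetime(2010, 1, 5, 10)},
    {"author": "alice", "thread": "T1", "posted_at": datetime(2010, 1, 5, 11)},
    {"author": "alice", "thread": "T2", "posted_at": datetime(2010, 1, 6, 9)},
    {"author": "bob", "thread": "T1", "posted_at": datetime(2010, 1, 5, 12)},
]
points = scatter_points(messages)
print(points["alice"])  # (2, 2.0)
print(points["bob"])    # (1, 1.0)
```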

A time series line chart reveals the days of maximum and minimum activity along with trends.
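
A daily series of this kind can be sketched in a few lines of Python.  The function name is an assumption, and the choice to fill days with no messages as zero is mine, made so that the quietest day is not hidden by gaps in the data.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(post_dates):
    """Return [(day, message_count)] for every day from first to last post."""
    counts = Counter(post_dates)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts.get(day, 0)))  # zero for silent days
        day += timedelta(days=1)
    return series

post_dates = [date(2010, 1, 1), date(2010, 1, 1), date(2010, 1, 3)]
series = daily_counts(post_dates)
busiest = max(series, key=lambda point: point[1])
quietest = min(series, key=lambda point: point[1])
print(busiest)   # (datetime.date(2010, 1, 1), 2)
print(quietest)  # (datetime.date(2010, 1, 2), 0)
```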

A network diagram reveals the overall structure of the discussion space and the people who occupy strategic locations within the network graph.

ThreadMill has received generous assistance from Morningside Analytics.  Bruce Woodson implemented ThreadMill.

BoardTracker adds AuthorLine visualizations to threaded discussion search service

BoardTracker is a search engine and reporting service for threaded discussions.  Recently, the BoardTracker folks implemented a visualization of author activity over time that was inspired by work Fernanda Viegas and I did in 2003/2004 called “AuthorLines”.

AuthorLine visualizations represent the weekly rates and nature of contribution from a single author over the span of a year.  Each vertical strip of bubbles represents the activity of that author in a week.  The 52 strips of bubbles in these images represent a year of contribution history.  If users reply to threads other users initiate, they get blue bubbles that sit below a middle dividing line.  Each thread to which they contribute in that week is represented as a separate bubble.  Each additional message contributed to that thread in that week adds to the size and transparency of the bubble.  If the author initiates new threads, they get a red bubble that sits above the line.
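
The data behind such a chart can be sketched as two weekly tallies per author.  This Python example is not BoardTracker's or ThreadMill's implementation; the dict keys are assumptions, the thread initiator is taken to be the author of the earliest message in the thread, and weeks are ISO weeks.

```python
from collections import defaultdict
from datetime import datetime

def authorline(messages, author):
    """Return (above, below): {(iso_year, iso_week): {thread: message_count}}.

    `above` holds threads the author started, `below` holds threads started
    by others.  Each inner count would set the size of one bubble.
    """
    starter = {}
    for m in sorted(messages, key=lambda m: m["posted_at"]):
        starter.setdefault(m["thread"], m["author"])  # earliest poster wins
    above = defaultdict(lambda: defaultdict(int))
    below = defaultdict(lambda: defaultdict(int))
    for m in messages:
        if m["author"] != author:
            continue
        iso = m["posted_at"].isocalendar()
        week = (iso[0], iso[1])
        side = above if starter[m["thread"]] == author else below
        side[week][m["thread"]] += 1
    as_dict = lambda side: {week: dict(threads) for week, threads in side.items()}
    return as_dict(above), as_dict(below)

messages = [
    {"author": "alice", "thread": "T1", "posted_at": datetime(2009, 12, 7, 9)},
    {"author": "bob", "thread": "T2", "posted_at": datetime(2009, 12, 7, 10)},
    {"author": "bob", "thread": "T1", "posted_at": datetime(2009, 12, 8, 9)},
    {"author": "alice", "thread": "T1", "posted_at": datetime(2009, 12, 8, 10)},
    {"author": "alice", "thread": "T2", "posted_at": datetime(2009, 12, 9, 9)},
    {"author": "alice", "thread": "T2", "posted_at": datetime(2009, 12, 9, 10)},
]
above, below = authorline(messages, "alice")
print(above)  # {(2009, 50): {'T1': 2}}
print(below)  # {(2009, 50): {'T2': 2}}
```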

The result is an “at-a-glance” view of the pattern of participation for a year in the life of a threaded conversation message contributor.  The two images below illustrate just how different contributors can be in their patterns of engagement in a threaded conversation environment.  This image represents a contributor of mostly reply messages who begins to initiate a few threads at the end of the year but who contributes significantly more to threads started by others.  They have a habit of engaging in lengthy discussions, adding dozens of messages to one or more threads almost every week of the year.

Contrast this image with the distinct pattern created by this author, who heavily initiates new threads while responding more modestly to those started by others, until the pattern inverts at the end of the year, when they shift toward replying more than initiating.

The paper that described the early work on these visualizations can be found in the HICSS 2004 conference:

Viégas, Fernanda B., Marc Smith. “Newsgroup Crowds and AuthorLines: Visualizing the Activity of Individuals in Conversational Cyberspaces”, Proceedings of the Hawaii International Conference on System Sciences (HICSS) 2004. [Best Paper: Persistent Conversation Minitrack]

A related paper applies these visualizations to document the range of variation found in social media spaces built around threaded discussions.

Tammara Turner, Marc Smith, Danyel Fisher and Howard Ted Welser. “Picturing Usenet: Mapping computer-mediated collective action”, Journal of Computer-Mediated Communication, 2005.

[Thanks to BoardTracker CEO Ron Kass for these images!]

RWTH Aachen – Browse ACM conference networks over the web

There are hundreds of conferences sponsored by the ACM on almost every topic related to computing.  In some cases the same person will publish a paper in more than one conference, creating a tie between them.  Below is a network map application that displays a collection of ACM conferences connected by this authorship tie:
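
The authorship tie can be illustrated with a short Python sketch.  It is not the RWTH Aachen application's method; the `(author, conference)` input layout and the author names are made up for illustration.  Two conferences are tied once for every author who has published in both.

```python
from collections import defaultdict
from itertools import combinations

def conference_ties(papers):
    """Count shared authors for each pair of conferences.

    `papers` is an iterable of (author, conference) pairs (hypothetical
    layout).  Returns {(conf_a, conf_b): shared_author_count} with each
    pair in alphabetical order.
    """
    venues = defaultdict(set)
    for author, conference in papers:
        venues[author].add(conference)
    ties = defaultdict(int)
    for conferences in venues.values():
        for a, b in combinations(sorted(conferences), 2):
            ties[(a, b)] += 1
    return dict(ties)

papers = [
    ("smith", "CHI"), ("smith", "CSCW"),
    ("jones", "CHI"), ("jones", "CSCW"), ("jones", "UIST"),
    ("lee", "UIST"),
]
ties = conference_ties(papers)
print(ties[("CHI", "CSCW")])  # 2
```

The shared-author count is a natural candidate for edge weight in a network map of this kind.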

The application is a project created by Manh Cuong Pham, a graduate student at RWTH Aachen University, Dept. of Databases and Information Systems, working with Prof. Ralf Klamma.

2009 - December - RWTH Aachen - AERCS Screenshot

This image displays the isolated component that is composed of the “social” conferences in the ACM schedule: CHI, CSCW, DIS, UIST, GROUP, ECSCW, and Interact.  The overview illustrates the macro structure of the graph, with the prominent giant cluster of core computer science topics like algorithms, machine learning, and logic.  The rows below this cluster are populated by an archipelago of conferences, a few composed of ten to twenty conferences, but most made up of two to five conferences.  These are the more marginal topics in the ACM world, in contrast to the conferences at the core of the giant component.

It would be nice to see the application add network display attributes like size, color, shape, and edge thickness to indicate conference attributes like papers published, citations, attendees, and sponsors.  It is a nice example of the insights network visualizations can bring to a data set and the value of an interactive interface (and a web interface at that!) for investigating complex graphs.