Over the edge: Twitter API 1.1 makes “Follows” edges hard to get

The long-awaited (and delayed) change to the Twitter API is now here: API 1.1 is now the only service available, and the long-used API 1.0 is gone.

[Image: 20130611 - End of Twitter API tweet]

This has an impact on people who have been collecting and analyzing data from Twitter.  Twitter has given and taken away with the new 1.1 API.  Mostly taken away.  More Tweets are sometimes available from the new API, up to 18,000 rather than the old 1,500 tweet limit.  This is a big change, but normal users often do not get much benefit from the limit increase if the topic they are interested in has fewer tweets.  The length of time tweets are retained and served is not much longer than it was.

The big change is the effective loss of the “Follows” edge.  Some users of the 1.0 API were able to run a significant number of queries asking whom each user followed.  These queries generated data that allowed a network to be created based on which users followed which other users.  The “Follows” network in Twitter has been very informative, pointing to the key people and groups in social media discussions.  But now the “Follows” edge will be effectively impossible to use.

Twitter API 1.1 limits the number of queries about who follows whom in Twitter to 60 per hour.  In practice, a network may have several hundred or several thousand people in it, making a query for each person’s network of followers impractical.  With the “Follows” edge effectively gone, the remaining edges, “reply” and “mention”, become more important.  These edges are far less common than the “Follows” edge.  Many people follow lots of other people but mention or directly reply to very few.  With the loss of the “Follows” edge, Twitter networks can become very sparse, with few connections remaining.  Dense structures give way to confetti.
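To put the 60-queries-per-hour limit in concrete terms, here is a small back-of-the-envelope sketch.  It assumes one query per user, which is a lower bound; the function name and the queries_per_user parameter are illustrative, not part of any Twitter or NodeXL API.

```python
# Rate limit stated in the text: 60 "who follows whom" queries per hour.
QUERIES_PER_HOUR = 60


def hours_to_collect(num_users, queries_per_user=1):
    """Hours needed to query every user in a network at the stated limit.

    queries_per_user is an assumption: one query per user is the best case,
    and some accounts may need more than one query to list everyone they follow.
    """
    total_queries = num_users * queries_per_user
    return total_queries / QUERIES_PER_HOUR


# Networks of "several hundred or several thousand" people.
for users in (300, 1000, 5000):
    print(users, "users:", round(hours_to_collect(users), 1), "hours")
```

Even in the best case, a network of a few thousand people takes days of continuous querying to collect.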

Here is a map of the topic #scaladays with the “Follows” edges compared to the same map with no “Follows” edges:

[Image: #Scaladays with Follows Edges]

[Image: #Scaladays with no Follows Edges]

With the “Follows” edges gone, the loss of insight into the nature of the network is profound, but not fatal.  The reply and mention network does have some density in many discussions, allowing many kinds of network positions and structures to be observed.  Edges can also be synthesized from other evidence.  For example, a link could be created when two people use words in common that are not commonly used by others.

The NodeXL project has released a version that connects to the new Twitter API 1.1, and we will be releasing additional edge types that link people when they share content such as hashtags, URLs, words, and word pairs with other people.  These shared content edges are based on a presumption that when people use similar content that is rarely used by others, they are likely to have an underlying connection.  The assumption that shared content use is a surrogate for the “Follows” relationship requires additional testing (which will be difficult without access to the data that Twitter just removed).  For now, these connections do return density to networks that have been shattered by the loss of the visibility of the “Follows” connection, and they can indicate common interests among Twitter users.
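The idea of a shared content edge can be sketched in a few lines.  This is only an illustration of the general technique, not NodeXL's implementation: the function name, the input shape, and the max_users_per_term threshold for what counts as "rarely used" are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def shared_content_edges(user_terms, max_users_per_term=2):
    """Link users who share terms that few other users use.

    user_terms maps a user name to the set of terms (hashtags, URLs, words)
    that user posted.  A term counts as "rare" (an assumed definition) when
    no more than max_users_per_term users used it.
    Returns {(user_a, user_b): [shared rare terms]}.
    """
    users_by_term = defaultdict(set)
    for user, terms in user_terms.items():
        for term in terms:
            users_by_term[term].add(user)

    edges = defaultdict(list)
    for term, users in users_by_term.items():
        # At least two users are needed for an edge; common terms are skipped.
        if 2 <= len(users) <= max_users_per_term:
            for a, b in combinations(sorted(users), 2):
                edges[(a, b)].append(term)
    return {pair: sorted(terms) for pair, terms in edges.items()}


sample = {
    "alice": {"#scaladays", "akka", "typeclass"},
    "bob": {"#scaladays", "typeclass"},
    "carol": {"#scaladays", "akka"},
    "dave": {"#scaladays"},
}
edges = shared_content_edges(sample)
print(edges)
```

In this sample everyone uses #scaladays, so it creates no edges, while the two rarer terms each link one pair of users.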

Encyclopedia of Social Network Analysis


My colleague George Barnett has edited the Encyclopedia of Social Network Analysis.

I contributed four entries with co-authors:

WWW Hyperlink Networks

with Robert Ackland, Australian National University

Email networks

with Derek Hansen, Brigham Young University

Blog networks

with John Kelly, Morningside Analytics, Harvard Berkman Center

Facebook networks

with Bernie Hogan, Oxford Internet Institute

Description:

This two-volume encyclopedia provides a thorough introduction to the wide-ranging, fast-developing field of social networking, a much-needed resource at a time when new social networks or “communities” seem to spring up on the internet every day. Social networks, or groupings of individuals tied by one or more specific types of interests or interdependencies ranging from likes and dislikes, or disease transmission to the “old boy” network or overlapping circles of friends, have been in existence for longer than services such as Facebook or YouTube; analysis of these networks emphasizes the relationships within the network. The Encyclopedia of Social Networks offers comprehensive coverage of the theory and research within the social sciences that has sprung from the analysis of such groupings, with accompanying definitions, measures, and research.

Featuring approximately 350 signed entries, along with approximately 40 media clips, organized alphabetically and offering cross-references and suggestions for further readings, this encyclopedia opens with a thematic reader’s guide in the front that groups related entries by topics. A chronology offers the reader historical perspective on the study of social networks. This two-volume reference work is a must-have resource for libraries serving researchers interested in the various fields related to social networks, including sociology, social psychology and communication and media studies.

ThreadMill 0.1: Social Accounting for Message Thread Collections

The Social Media Research Foundation is pleased to announce the immediate availability of ThreadMill 0.1.  ThreadMill is a free and open application that consumes message thread data and produces reports about each author, thread, forum, and board along with visualizations of the patterns of connection and activity.  ThreadMill is written in Ruby, and depends on MongoDB, SinatraRB, HAML, and Flash to collect, analyze, and report data about collections of conversation threads.

Threaded conversations are a major form of social media.  Message boards, email and email lists, Twitter, blog comments, text messages, and discussion forums are all social media systems built around the message thread data structure.  As messages are exchanged through these systems, some messages are sent as a reply to a particular previous message.  As messages are sent in reply to prior messages, chains of messages form.  Message chains come in two major forms: branching and non-branching.  Branching threads are those that allow more than one message to reply to a prior message.  Non-branching threads are single chains, like a string of pearls, that allow only one message to reply to a prior message.  Many web-based message boards are non-branching.  Many email systems and discussion forums are branching.
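The difference between the two forms can be checked directly from the reply pointers.  The sketch below assumes a thread is given as a mapping from each message id to the id of the message it replies to; that representation and the function name are illustrative.

```python
from collections import Counter


def is_branching(reply_to):
    """Return True if any message in the thread has more than one direct reply.

    reply_to maps a message id to the id of the message it replies to,
    or None for the message that starts the thread.
    """
    reply_counts = Counter(
        parent for parent in reply_to.values() if parent is not None
    )
    return any(count > 1 for count in reply_counts.values())


# A single chain, like a string of pearls: m1 <- m2 <- m3
chain = {"m1": None, "m2": "m1", "m3": "m2"}
# Two messages reply to the same prior message: m1 <- m2 and m1 <- m3
tree = {"m1": None, "m2": "m1", "m3": "m1"}

print(is_branching(chain), is_branching(tree))
```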

ThreadMill requires a minimal set of data elements to generate its reports.  At a minimum, a data table must record for each message the name of the message board, the forum, the thread, and the author, along with a unique identifier for the message and the date and time it was posted.  Optional data elements include the unique identifier of the message being replied to, the URL of the message, and the URL for a profile photo.
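As a rough illustration, the required and optional elements could be modeled as a record like the one below.  The field names are hypothetical and are not ThreadMill's actual column names; ThreadMill itself is written in Ruby, and Python is used here only for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    # Required elements named in the text (field names are assumptions).
    board: str
    forum: str
    thread: str
    author: str
    message_id: str
    posted_at: datetime
    # Optional elements named in the text.
    reply_to_id: Optional[str] = None
    message_url: Optional[str] = None
    profile_photo_url: Optional[str] = None


REQUIRED = ("board", "forum", "thread", "author", "message_id", "posted_at")


def missing_required(row):
    """Return the required field names that are absent or empty in a row dict."""
    return [name for name in REQUIRED if not row.get(name)]


row = {
    "board": "Example Board",
    "forum": "General",
    "thread": "Welcome",
    "author": "ann",
    "message_id": "m1",
    "posted_at": datetime(2013, 6, 11, 9, 30),
}
print(missing_required(row))
message = Message(**row)
```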

All forms of threaded message exchange can be measured.  Simple measures like the count of the number of messages or the number of authors are obvious and useful.  Other measures can be created from more sophisticated analysis.  For example, the network of connections that forms as different authors reply to one another can be extracted and analyzed using network analysis methods.  It is possible to calculate metrics from these networks of reply that describe the location of each person in the graph.
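A minimal sketch of extracting a reply network and two simple position measures follows.  It assumes each message is a dict with "id", "author", and an optional "reply_to" key; these names, and the choice of in-degree and out-degree as the metrics, are illustrative rather than a description of ThreadMill's own calculations.

```python
from collections import Counter


def reply_edges(messages):
    """Count directed edges (replying author -> author replied to)."""
    author_of = {m["id"]: m["author"] for m in messages}
    edges = Counter()
    for m in messages:
        parent = m.get("reply_to")
        # Skip thread starters and replies to messages outside the data set.
        if parent in author_of:
            edges[(m["author"], author_of[parent])] += 1
    return edges


def degree_counts(edges):
    """In-degree: distinct authors replying to a person.
    Out-degree: distinct authors a person replied to."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


messages = [
    {"id": 1, "author": "ann", "reply_to": None},
    {"id": 2, "author": "ben", "reply_to": 1},
    {"id": 3, "author": "cat", "reply_to": 1},
    {"id": 4, "author": "ann", "reply_to": 2},
    {"id": 5, "author": "ben", "reply_to": 1},
]
edges = reply_edges(messages)
in_degree, out_degree = degree_counts(edges)
print(dict(edges))
print(dict(in_degree), dict(out_degree))
```

Note that this sketch keeps self-replies as edges; whether to drop them is a design choice that depends on the analysis.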

ThreadMill generates several data sets that can be used to create visualizations of the activity and structure of a message board collection.

A Treemap data set can illustrate the hierarchy of encapsulated authors within threads, threads within fora, fora within boards, and boards within collections.  Treemap visualizations of collections of threaded conversations can quickly highlight the most active or populous discussions.
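The nesting described above can be sketched as a tree of message counts, which is the kind of input a treemap layout typically expects.  The dict keys and function names are assumptions for illustration, not ThreadMill's actual output format.

```python
def treemap_counts(messages):
    """Nest message counts as board -> forum -> thread -> author."""
    tree = {}
    for m in messages:
        forums = tree.setdefault(m["board"], {})
        threads = forums.setdefault(m["forum"], {})
        authors = threads.setdefault(m["thread"], {})
        authors[m["author"]] = authors.get(m["author"], 0) + 1
    return tree


def node_size(node):
    """Total number of messages under a node (the area of its rectangle)."""
    if isinstance(node, int):
        return node
    return sum(node_size(child) for child in node.values())


messages = [
    {"board": "b1", "forum": "f1", "thread": "t1", "author": "ann"},
    {"board": "b1", "forum": "f1", "thread": "t1", "author": "ann"},
    {"board": "b1", "forum": "f1", "thread": "t1", "author": "ben"},
    {"board": "b1", "forum": "f2", "thread": "t2", "author": "cat"},
]
tree = treemap_counts(messages)
print(tree)
print(node_size(tree["b1"]["f1"]), node_size(tree))
```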

An AuthorLine visualization takes the form of a double histogram, with bubbles representing each thread active in each time period, sized by the volume of messages the author contributed and sorted by size.  Threads that have been initiated by the author are represented as bubbles above the center line.  Messages that the author contributes to threads started by other authors are represented as bubbles stacked below the center line.  AuthorLines quickly reveal the pattern of activity an author displays and identify which of several types of contributors the author is.
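The data behind an AuthorLine can be sketched as follows.  Two assumptions are made here that the text does not specify: the time period is an ISO week, and a thread's initiator is the author of its earliest message.  The dict keys and function name are also illustrative.

```python
from collections import Counter
from datetime import date


def author_line(messages, author):
    """Per period, list (thread, message_count) bubbles for one author.

    Bubbles are split into threads the author initiated (above the center
    line) and threads started by others (below), largest first.
    """
    # Assumption: the initiator is the author of the earliest message.
    first = {}
    for m in messages:
        t = m["thread"]
        if t not in first or m["posted"] < first[t]["posted"]:
            first[t] = m

    counts = Counter()
    for m in messages:
        if m["author"] == author:
            year, week, _ = m["posted"].isocalendar()
            counts[((year, week), m["thread"])] += 1

    result = {}
    for (period, thread), n in counts.items():
        side = "initiated" if first[thread]["author"] == author else "joined"
        sides = result.setdefault(period, {"initiated": [], "joined": []})
        sides[side].append((thread, n))
    for sides in result.values():
        for bubbles in sides.values():
            bubbles.sort(key=lambda item: (-item[1], item[0]))
    return result


messages = [
    {"thread": "t1", "author": "ann", "posted": date(2013, 6, 10)},
    {"thread": "t1", "author": "ben", "posted": date(2013, 6, 11)},
    {"thread": "t1", "author": "ann", "posted": date(2013, 6, 12)},
    {"thread": "t2", "author": "ben", "posted": date(2013, 6, 10)},
    {"thread": "t2", "author": "ann", "posted": date(2013, 6, 13)},
]
line = author_line(messages, "ann")
print(line)
```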

A scatter plot visualization represents each author as a bubble in an X-Y space defined by the number of different days the author was active against the average number of messages the author contributes to the threads in which they participate.
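The two coordinates of each bubble can be computed with a short sketch like this one.  It assumes each message is a dict with "author", "thread", and a "posted" value that is a calendar date (a full timestamp would first need to be reduced to its date); the names are illustrative.

```python
from collections import defaultdict
from datetime import date


def scatter_points(messages):
    """Map each author to (days_active, average_messages_per_thread)."""
    days = defaultdict(set)
    per_thread = defaultdict(lambda: defaultdict(int))
    for m in messages:
        days[m["author"]].add(m["posted"])
        per_thread[m["author"]][m["thread"]] += 1

    points = {}
    for author, threads in per_thread.items():
        total_messages = sum(threads.values())
        points[author] = (len(days[author]), total_messages / len(threads))
    return points


messages = [
    {"thread": "t1", "author": "ann", "posted": date(2013, 6, 10)},
    {"thread": "t1", "author": "ben", "posted": date(2013, 6, 11)},
    {"thread": "t1", "author": "ann", "posted": date(2013, 6, 12)},
    {"thread": "t2", "author": "ben", "posted": date(2013, 6, 10)},
    {"thread": "t2", "author": "ann", "posted": date(2013, 6, 13)},
]
points = scatter_points(messages)
print(points)
```

In this sample, ann is active on three different days and averages 1.5 messages per thread, while ben is active on two days and averages one message per thread.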

A time series line chart reveals the days of maximum and minimum activity along with trends.

A network diagram reveals the overall structure of the discussion space and the people who occupy strategic locations within the network graph.

ThreadMill has received generous assistance from Morningside Analytics.  Bruce Woodson implemented ThreadMill.