
by Dru!
Can useful observations be made by studying the social media sea one bucket at a time?
NodeXL has data import “spigots” for pulling social networks out of several social media systems, including Twitter, YouTube, Flickr, and email. Twitter networks of follow, reply, and mention relationships can be extracted starting from either a user name or a search string “seed”. There are additional networks inside Twitter: a tie is created whenever two people tweet the same URL, for example, or tweet from the same general location. For now, the NodeXL Twitter Data Importer starts with these three initial Twitter “tie-types”: follows, replies, and mentions.
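To make the idea of a “tie-type” concrete, here is a minimal sketch of how reply, mention, and shared-URL ties could be derived from a handful of tweets. This is not how NodeXL itself does it: the tweet format (a dict with `user` and `text` keys) and the function names `extract_ties` and `shared_url_ties` are assumptions made up for illustration, and the regular expressions are deliberately crude. Follow ties are left out because they cannot be read from tweet text; they have to come from the service.

```python
import re
from itertools import combinations

MENTION = re.compile(r"@(\w+)")          # any @name in the text
REPLY = re.compile(r"\s*@(\w+)")         # an @name at the very start of the text
URL = re.compile(r"https?://\S+")        # crude URL matcher, an assumption


def extract_ties(tweets):
    """Return (source, target, tie_type) edges from hypothetical tweet dicts."""
    edges = []
    for tweet in tweets:
        author = tweet["user"].lower()
        text = tweet["text"]
        opening = REPLY.match(text)
        reply_to = opening.group(1).lower() if opening else None
        names = [name.lower() for name in MENTION.findall(text)]
        for index, name in enumerate(names):
            if name == author:
                continue  # ignore people mentioning themselves
            # A tweet that opens with @name is treated as a reply to that user;
            # every other @name counts as a mention.
            is_reply = index == 0 and name == reply_to
            edges.append((author, name, "reply" if is_reply else "mention"))
    return edges


def shared_url_ties(tweets):
    """Return undirected ties between users who tweeted the same URL."""
    users_by_url = {}
    for tweet in tweets:
        for url in URL.findall(tweet["text"]):
            users_by_url.setdefault(url, set()).add(tweet["user"].lower())
    ties = set()
    for users in users_by_url.values():
        for a, b in combinations(sorted(users), 2):
            ties.add((a, b))
    return ties


sample = [
    {"user": "alice", "text": "@bob nice chart http://example.com/a"},
    {"user": "carol", "text": "see http://example.com/a via @alice"},
]
reply_and_mention_edges = extract_ties(sample)
url_ties = shared_url_ties(sample)
```

With the two sample tweets, alice replies to bob, carol mentions alice, and alice and carol are tied by tweeting the same URL.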
NodeXL queries are not exhaustive collections of Twitter data. We take a more modest approach: grab a slice of recent content and analyze that. Twitter is a sea of data, and what NodeXL imports is something like buckets of ocean water drawn for study. A recent scientific voyage to the Great Pacific Garbage Gyre, for example, collected hundreds of samples of ocean water while sailing to the center of the gyre. Each bucket revealed details about the larger state of the ocean (which does not look good). Similarly, NodeXL is pulling buckets of social media network data from the ocean of Twitter and, despite the lack of scale, can do some useful science. In part this is a virtue born of necessity: constraints set by Twitter (even with a rate-limit-lifted “whitelisted” account) put significant limits on what can be squeezed out of the Twitter API. For those who lack access to large data center resources, the capacity of a desktop or laptop machine sets a further limit on scale.
Access to large data sets is certainly a hallmark of the “new era of science” that generates observations not from samples but from exhaustive surveys of data terrains. Small samples, it is argued, miss important phenomena. The counterargument is that many important phenomena appear in most samples, even small ones.
Using the existing features in NodeXL, I can extract the Twitter social network for a small group of user accounts. I can provide the names myself or ask Twitter search to deliver them. Alternatively, a keyword can be used to collect all the users who recently tweeted that term, along with their connections. From such a selected sample, several observations can be made:
> Not every keyword is equally connected
> Not every Twitter user is equally connected, nor are their neighbors
> Selected data extractions can be useful in the absence of a global view
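
As a rough illustration of what “equally connected” could mean in practice, the sketch below compares one small sampled network by two simple measures: the number of distinct neighbors each user has, and the share of possible ties that actually exist (density). The helper names `degree_table` and `density` and the sample data are hypothetical, chosen for this example and not taken from NodeXL.

```python
from collections import defaultdict


def degree_table(edges):
    """Count distinct neighbors per user, ignoring direction and self-ties."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {user: len(others) for user, others in neighbors.items()}


def density(edges, users):
    """Share of possible undirected ties among `users` that are present."""
    n = len(users)
    if n < 2:
        return 0.0
    distinct = {frozenset((a, b)) for a, b in edges if a != b}
    return len(distinct) / (n * (n - 1) / 2)


# Hypothetical bucket: four users tweeted the keyword, one of them unconnected.
users = {"alice", "bob", "carol", "dave"}
edges = [("alice", "bob"), ("bob", "carol"), ("bob", "alice")]

degrees = degree_table(edges)
bucket_density = density(edges, users)
```

Running the same two functions over buckets drawn for different keywords gives a simple way to see that some keywords, and some users, sit in much denser neighborhoods than others.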