This two-volume encyclopedia provides a thorough introduction to the wide-ranging, fast-developing field of social networking, a much-needed resource at a time when new social networks or “communities” seem to spring up on the internet every day. Social networks, or groupings of individuals tied by one or more specific types of interests or interdependencies, ranging from likes and dislikes or disease transmission to the “old boy” network or overlapping circles of friends, have been in existence for longer than services such as Facebook or YouTube; analysis of these networks emphasizes the relationships within the network. The Encyclopedia of Social Networks offers comprehensive coverage of the theory and research within the social sciences that has sprung from the analysis of such groupings, with accompanying definitions, measures, and research.
Featuring approximately 350 signed entries, along with approximately 40 media clips, organized alphabetically and offering cross-references and suggestions for further reading, this encyclopedia opens with a thematic reader’s guide that groups related entries by topic. A chronology offers the reader historical perspective on the study of social networks. This two-volume reference work is a must-have resource for libraries serving researchers interested in the various fields related to social networks, including sociology, social psychology, and communication and media studies.
NodeXL allows collections of vertices in a network to be gathered together into a “Group”. Groups have several properties:
groups can be selected
vertices in selected groups can be operated on as a set
groups can be collapsed or expanded
network metrics can be calculated for each group
groups can be plotted within bounded regions
NodeXL supports creating clusters or groups of vertices in several ways: by attribute or manually, by component, or algorithmically.
Group menu commands are located in the Groups Menu in the NodeXL>Analysis Menu section.
Group menu commands include:
Group by Vertex Attribute
Users can assign vertices to groups based on any attribute in the Vertices worksheet.
The NodeXL>Analysis>Groups>Group by Vertex Attribute command allows groups of vertices to be defined by any attribute.
These attributes can be numeric or categorical.
Groups can also be authored manually. A group is created whenever a new row is populated in the Groups worksheet. A vertex is assigned to a group when it is named with its group in the Group Vertices worksheet.
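The idea behind grouping by attribute can be sketched outside NodeXL in a few lines of Python. This is only an illustration, not NodeXL's implementation: the function name, the dictionary layout, the sample vertices, and the `bin_width` parameter are all assumptions made for the example. Categorical values become group labels directly, while numeric values are bucketed into ranges.

```python
from collections import defaultdict

def group_by_attribute(vertices, attribute, bin_width=None):
    """Return {group label: [vertex names]} for one vertex attribute.

    vertices:  dict of vertex name -> dict of attribute values (hypothetical layout).
    bin_width: if given, numeric values are bucketed into ranges of this width;
               otherwise each distinct value becomes its own group.
    """
    groups = defaultdict(list)
    for name, attrs in vertices.items():
        value = attrs.get(attribute)
        if value is None:
            label = "(missing)"
        elif bin_width is not None:
            low = (value // bin_width) * bin_width
            label = f"{low}-{low + bin_width}"
        else:
            label = str(value)
        groups[label].append(name)
    return dict(groups)

# Invented sample vertices, for illustration only.
vertices = {
    "alice": {"followers": 120, "language": "en"},
    "bob": {"followers": 980, "language": "en"},
    "carol": {"followers": 150, "language": "fr"},
}
by_language = group_by_attribute(vertices, "language")                  # categorical
by_followers = group_by_attribute(vertices, "followers", bin_width=500)  # numeric
```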
Find Connected Components:
Each component can be assigned to its own group using the NodeXL>Analysis>Groups>Find Connected Components option.
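As a rough sketch of what assigning each component to its own group means, the Python below labels every vertex with a group number using a breadth-first search. It treats edges as undirected, and the function name, the group numbering, and the sample data are assumptions for illustration rather than a description of how NodeXL does it.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Return {vertex: group number}, one group per connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    group_of = {}
    next_group = 1
    for start in vertices:
        if start in group_of:
            continue  # already reached from an earlier vertex
        group_of[start] = next_group
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in group_of:
                    group_of[neighbor] = next_group
                    queue.append(neighbor)
        next_group += 1
    return group_of

# Invented example: one triad, one dyad, and one isolate.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
groups = connected_components(vertices, edges)
```

A vertex with no edges, like "f" here, ends up in a group of its own.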
Find Clusters – Automated Group Assignment Algorithms:
A single workbook may contain data from a single NodeXL data collection, run on a particular day and reaching back a few hours or days from that moment (depending on factors like the volume of activity around the selected keyword and the depth of the Twitter search catalog, which is often not more than a week or two long and much shorter for active topics). An example of a single network slice is this recent map of the connections among people who mentioned “microsoft research” in Twitter on a single day (December 18th, 2010):
This is a single slice of the network, a day out of months of activity. A still frame can tell a rich story: this is a picture of a crowd that has gathered to discuss a topic of common interest, “microsoft research”. It illustrates a structure common to many large discussions of popular topics: a large set of isolates (the rows at the bottom) who were not observed to have a follows, mentions, or replies relationship with anyone else who tweeted the same term. These are casual mentioners of the topic. At the end of these rows are a small number of dyads, triads, and small components of a handful of people who link to one another but not to the largest connected component. These are pairs or small groups discussing the topic among themselves.
Above these rows is the “giant component”: the blob of people who have a connection to someone else who also tweeted a message containing the same term, and who in turn has connections that lead to a large number of others. The giant component is itself composed of several densely connected sub-groups, or clusters. At the center of each cluster are the core users, the people who often hold their cluster together. Between these clusters are the bridges, the people who link otherwise disconnected sub-groups. At the edges are the peripheral people who have just taken the first step up from being an isolate and have formed a single reply, mention, or follows relationship with someone else who also tweeted the search keyword and who can bridge them back to the core of the giant component.
This is a large and active network with hybrid qualities. There is a “brand” or broadcast element in it: the yellow cluster is a hub-and-spoke structure centered on the Microsoft Research Twitter account. These people retweet what this account publishes but do not connect to one another. Just a few of these people set off second and third waves of retweets.
Elsewhere in the graph there are other network structures present; for example, the green and blue clusters feature people centered around their own discussions of the term “microsoft research”.
If you collect many still frames of slices of network activity there is great value in exploring the way the network graph changes over time. In the most recent release NodeXL provides the first step in a series of features related to time and graph comparison. You can now create a workbook that aggregates the overall metrics (edge counts, vertex counts, connected component counts, etc.) for a folder full of NodeXL workbooks. In NodeXL follow the menu path: NodeXL>Analysis>Graph Metrics>Aggregate Overall Metrics to get this:
The result of this feature is a workbook with a row containing the summary data from each of the workbooks in the target folder. Any arbitrary collection of network workbooks can be aggregated but this is particularly useful when the workbooks are sequential time slices.
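The shape of that aggregated result, one row per workbook and one column per overall metric, can be sketched in Python. This is not how NodeXL reads its workbooks: the function name, the file names, and the metric values below are invented for illustration, and the metrics are assumed to have already been extracted into dictionaries.

```python
def aggregate_overall_metrics(workbooks):
    """Build a summary table from per-workbook overall metrics.

    workbooks: dict of workbook name -> {metric name: value} (hypothetical layout).
    Returns (header, rows), with rows sorted by workbook name so that
    date-stamped file names come out in time order.
    """
    metric_names = []
    for metrics in workbooks.values():
        for name in metrics:
            if name not in metric_names:
                metric_names.append(name)

    header = ["Workbook"] + metric_names
    rows = [
        [name] + [workbooks[name].get(metric, "") for metric in metric_names]
        for name in sorted(workbooks)
    ]
    return header, rows

# Invented values, for illustration only.
workbooks = {
    "2010-12-18.xlsx": {"Vertices": 410, "Unique Edges": 655},
    "2010-12-17.xlsx": {"Vertices": 380},
}
header, rows = aggregate_overall_metrics(workbooks)
```

A metric missing from a workbook is left blank in that workbook's row rather than treated as zero.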
An example is the time series plot below tracking the rise and fall of several Twitter volume and network measures for the “microsoft research” search term over several months:
This chart tracks the number of vertices (each vertex in this case is a person our data collector saw tweet about the search term “microsoft research”) in each (almost) daily network snapshot. In addition, the unique edges or connections between these Twitter users are plotted along with the number of people with no connections (“Single-Vertex Connected Components”). The size of the largest component in the network (“Maximum vertices in a connected component”) is a measure of the changing size of the core community of discussion participants. Measures like the maximum and average “geodesic” distance provide a rough measure of whether a particular network is long and thin (high values) or generally spherical (low values). A “geodesic” is the shortest path between two vertices; the maximum geodesic distance is the longest of these shortest paths found in the network. Long skinny networks may indicate the presence of loosely connected smaller groups that have a few people who act as bridges. Low geodesic values suggest dense networks in which people are connected to many others, with few isolates and sub-groups.
I find the ratios between measures of the size of the large network component and the population of isolates to be interesting. As events go on over a period of days more people connect with others who are talking about the same topic, growing the size of the large connected component. But often the isolate population also grows during this time as people at the periphery of the topic network catch sight of mentions of the event and tweet about it. I could imagine one goal of social media management to be the conversion of isolates to connected component members. Those who follow, reply to, or mention even a single other person also talking about a topic are more likely to return and engage than those who have zero connections. It is not clear whether more connections provide a linear increase in continued engagement. I suspect that the main effect is at the zero/one divide and drops off after the first dozen or so connections. Encouraging cohesion and network density by replying to isolates and encouraging others to do so may help keep a social media population focused and growing.
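To make the geodesic measures concrete, here is a minimal Python sketch that finds the shortest path length between every pair of connected vertices with a breadth-first search, then reports the maximum and the average. It treats edges as undirected and averages over connected pairs only; those conventions, the function name, and the two toy networks are assumptions for illustration and may differ from how NodeXL computes its figures.

```python
from collections import defaultdict, deque

def geodesic_stats(edges):
    """Return (maximum, average) shortest-path length over connected vertex pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distances = []
    for source in list(adjacency):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen[neighbor] = seen[current] + 1
                    queue.append(neighbor)
        # Keep the distance to every other vertex reachable from this source.
        distances.extend(d for vertex, d in seen.items() if vertex != source)

    if not distances:
        return 0, 0.0
    return max(distances), sum(distances) / len(distances)

# A long thin chain versus a compact hub-and-spoke network of the same size.
chain_max, chain_avg = geodesic_stats([("a", "b"), ("b", "c"), ("c", "d")])
star_max, star_avg = geodesic_stats([("hub", "x"), ("hub", "y"), ("hub", "z")])
```

Both toy networks have four vertices, but the chain yields the higher maximum and average, which is the "long and thin" signal described above.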
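The ratios mentioned here are simple to compute once the sizes of the connected components are known. The Python sketch below is one possible formulation; the function name, the particular ratios, and the sample sizes are assumptions for illustration.

```python
def isolate_summary(component_sizes):
    """Summarize isolates against the largest component.

    component_sizes: list with the number of vertices in each connected component.
    """
    isolates = sum(1 for size in component_sizes if size == 1)
    largest = max(component_sizes, default=0)
    total = sum(component_sizes)
    return {
        "isolates": isolates,
        "largest_component": largest,
        # Share of all participants who have no connections at all.
        "isolate_share": isolates / total if total else 0.0,
        # Isolates per member of the largest component.
        "isolates_per_core_member": isolates / largest if largest else 0.0,
    }

# Invented snapshot: one large component, a few small ones, three isolates.
summary = isolate_summary([40, 3, 2, 2, 1, 1, 1])
```

Tracking these values across daily snapshots would show whether isolates are being converted into connected participants or are accumulating faster than the core grows.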
As mobile devices become a major method for authoring and consuming social media, location data is increasingly a part of many posts, tweets, check-ins, and messages. Many Twitter clients, for example, can add the user’s current latitude and longitude to the metadata associated with a tweet. Other systems like Facebook Places, Google Latitude and Foursquare encourage users to declare where they are to the world, often passing the information to other social media sites.
Using this location data in network analysis opens up a range of new opportunities. Instead of a person-to-person social network, location data allows people to be linked to places and, by extension, places can be linked to other places based on the patterns of connection people create when located in a particular place. A convergence of network analysis and Geographic Information Systems is underway. A great example of this can be found in this wonderful video from the BBC, which demonstrates the idea by mapping the flow of telephone calls, texts, and data around the UK and the wider world.
Now, NodeXL (v.156) has the first of a series of features that will start to approximate the experience displayed in the video by supporting the import of location data about networks and plotting networks onto maps.
For now, we have started importing latitude and longitude data that Twitter makes available. If you check “Add a Tweet column to the Vertices worksheet” in NodeXL, Data, Import, From Twitter Search Network or From Twitter User Network, the Twitter user’s geographical coordinates will be added to the Vertices worksheet when they are available.
Note that not every tweet has a latitude and longitude; in fact, many do not (yet). Further, note that not every latitude and longitude is accurate; many are not.
We need to implement more features for better location data support in a NodeXL workbook, but this is a start. We are interested in exploring geospatial networks and Twitter is a great data source. With this data in place we may look into features that emit KML files for exploration in other packages like Google Earth. A nifty Google Earth/Spreadsheet importer can take small sets (400) of location data points in a spreadsheet and export them to a KML file, something we could implement in the future as well. In addition we may be able to connect directly with services like Bing Maps and Google Maps to display connections between nodes with known locations. Metrics that calculate the distance between nodes seem sensible as well.
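A distance metric between nodes could look something like the following Python sketch. It uses the haversine great-circle formula with a mean Earth radius of 6371 km, and it skips any edge where either vertex lacks coordinates. The function names, the dictionary layout, and the sample accounts are hypothetical; this is not a NodeXL feature.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def edge_distances(coordinates, edges):
    """Return {(a, b): km} for edges where both vertices have known coordinates."""
    result = {}
    for a, b in edges:
        if a in coordinates and b in coordinates:
            result[(a, b)] = haversine_km(*coordinates[a], *coordinates[b])
    return result

# Hypothetical accounts; coordinates are approximate city centers.
coordinates = {
    "user_seattle": (47.6062, -122.3321),
    "user_london": (51.5074, -0.1278),
}
edges = [("user_seattle", "user_london"), ("user_seattle", "user_unknown")]
distances = edge_distances(coordinates, edges)
```

Only the first edge receives a distance; the second is dropped because "user_unknown" has no location, which reflects how sparse the coordinate data can be.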
Location coordinates are the key to a cornucopia of related data about a place. Given a latitude and longitude it is possible to find the name of the city it is located in and to look up data about that location in official records as well as in resources like Wikipedia. Income, education, property values, weather, photos, and more can be pulled together from just a simple lat/long. All of these attributes could be used to cluster or illustrate the network visualization.
In this map nodes represent the major feature groups and functions in the NodeXL application.
This map will become the default file that will open when you run NodeXL for the first time. You will see a dialog like this:
Select Yes to have the graph above imported into the workbook. You can then display the graph using the Show Graph button in the NodeXL menu ribbon.
After that, it will be available via the help menus. When you import the file, all of the data is also available in the spreadsheet part of NodeXL so that you can experiment with changing values there to see the impact in the graph display after you hit the “refresh graphs” button.
Over the coming weeks we plan to release additional sample network data sets that illustrate key concepts and methods in network science. Suggestions for sample networks are welcome!
These are the connections among the people who tweeted the term “wikileaks” on 11 December 2010. The “wikileaks” account is at the center of the inner circle. It is surrounded by a collection of very lightly interconnected people (there is a low clustering coefficient) who retweet messages from the wikileaks account. This ring is surrounded by a thinner ring of 2nd order retweets. This is a very common pattern for a successful “brand” or broadcast account, getting lots of retweets and, even more impressive, retweets of retweets.
These are the connections among people who tweeted “pdfleaks” in Twitter on 11 December 2010. It shares the brand/broadcast structure of the wikileaks account shown above. After the wikileaks account itself, the participant with the second-highest betweenness is @jeffjarvis, who uniquely bridged to many people who would otherwise not be connected in this graph.
The people with the highest betweenness who tweeted the term “pdfleaks” on 11 December 2010.
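Betweenness measures how often a person sits on the shortest paths between other people, which is what makes a bridge stand out. The Python sketch below computes it for an unweighted, undirected graph using Brandes' algorithm. The function name and the toy network of two triangles joined by one bridging vertex are assumptions for illustration, and the scores are left unnormalized, so they may be scaled differently from the values NodeXL reports.

```python
from collections import defaultdict, deque

def betweenness_centrality(edges):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}

    for source in nodes:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in nodes}
        path_count = {v: 0 for v in nodes}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Walk back from the farthest vertices, accumulating dependencies.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]

    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

# Two triangles joined through a single bridging vertex.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "bridge"),
         ("bridge", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
scores = betweenness_centrality(edges)
```

In this toy network the bridging vertex scores highest, followed by the two vertices it attaches to, while the remaining members of each triangle score zero.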
I have a MacBook Pro. I can run Windows very nicely using various virtual machine products like VMware’s Fusion or the free and open VirtualBox. In these virtual machines I can run Windows and Office and then NodeXL. It’s pretty neat to see Windows inside the Mac OS window. I am always amazed that it works at all.
But I try, at almost any cost, to avoid making this the main way I run NodeXL. My Mac session slows to a crawl, my Windows session is slow, and the overall usability of the system degrades too much. If you focus on JUST NodeXL in the VM it works well, but context switching is too demanding. So I mostly focus on NodeXL on a Windows machine. When I travel I usually just have my Mac laptop and miss having a zippy version of NodeXL at hand.
Recently, I discovered that Amazon EC2 offers a remarkable way to have my cake and eat it too: I created a modestly powered .micro instance of Windows XP and Office 2010 and installed NodeXL. I then use the Microsoft Remote Desktop Client for Mac OS X to get a window into the remote virtual instance. This system can be a bit slow (it responds a bit like a netbook), but it does not slow down my Mac OS X instance and can be reached from any machine with the Remote Desktop client and internet access. For those who need a more speedy response or to handle larger data sets, Amazon is happy to sell more powerful instances of Windows for modestly more pennies per hour. For example, the .small instance is merely $0.13/hour or $3.12/day and offers much faster responses. Of course, if you turn the instance off when you are not using it, Amazon does not charge anything.
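The arithmetic behind those figures is easy to check and to adapt to your own usage. The Python sketch below uses the $0.13/hour rate quoted above and assumes, as this paragraph does, that charges accrue only for the hours the instance runs; storage, bandwidth, and any other fees are ignored. The function name and the usage pattern are made up for the example.

```python
HOURS_PER_DAY = 24
SMALL_HOURLY_RATE = 0.13  # dollars per hour, the .small rate quoted in the text

def instance_cost(hourly_rate, hours):
    """Cost in dollars of running an instance for a number of hours (compute only)."""
    return round(hourly_rate * hours, 2)

# Running around the clock for one day.
full_day = instance_cost(SMALL_HOURLY_RATE, HOURS_PER_DAY)

# Hypothetical usage: three hours a day across a five-day trip.
trip = instance_cost(SMALL_HOURLY_RATE, 3 * 5)
```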
Recently Adam Fields, my colleague at Morningside-Analytics, did me the great favor of recording the process of creating a new .micro instance of an Amazon EC2 Windows system.
Following each step will lead you to the creation of your own virtual machine into which you can install Office and NodeXL. You can then access this image, which runs continuously until you Terminate it, even when you close the computer you use for remote desktop. This is both a bug and a feature: the system runs no matter the state of the computer you use to access it. Just remember to really turn off the instance if you do not want to pay a recurring fee! You have to actually _Terminate_ the instance to avoid paying fees; shutting it down isn’t enough.
Also note that the AMI Adam used in the video was a 64-bit one, which doesn’t have a “small” option; it jumps straight from micro to large. If you want the small option, you have to use the 32-bit AMI, which is right above the one he picked.
The event focused on social media technologies and practices in the business to business space.
I spoke on a panel about the application of social network analysis to social media to illustrate the ways SNA can illuminate the shape of, and key contributors to, a social media crowd inside the firewall or on the public Internet.
It remains early in this cycle of the use of the socialtech tag. As the conference begins, the volume will ramp up and new people will enter the graph, bringing with them their unique patterns of connection and interaction. Over the coming days we may see more clusters emerge and new connections form.
As of Monday evening, the socialtech Twitter graph is small but densely connected with few isolates:
Unique Edges: 224
Edges With Duplicates: 88
Total Edges: 312
Graph Density: 0.035017375
Single-Vertex Connected Components: 6
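Graph density is the share of possible connections that are actually present. The Python sketch below uses the standard textbook definitions for directed and undirected graphs; whether NodeXL applies exactly this formula to this workbook is an assumption, and because the vertex count is not listed above the sketch uses invented numbers rather than trying to reproduce the reported figure.

```python
def graph_density(vertex_count, unique_edge_count, directed=True):
    """Fraction of possible edges that are present (self-loops excluded)."""
    if vertex_count < 2:
        return 0.0
    possible = vertex_count * (vertex_count - 1)  # ordered pairs of distinct vertices
    if not directed:
        possible //= 2  # unordered pairs
    return unique_edge_count / possible

# Invented example: 10 vertices joined by 9 unique edges.
directed_density = graph_density(10, 9, directed=True)
undirected_density = graph_density(10, 9, directed=False)
```

The same edge count gives twice the density when the graph is treated as undirected, since only half as many connections are possible.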
By Tuesday morning, the event is gaining activity and density.
The members of this graph with the highest betweenness are the accounts: