Clusters are now groups in NodeXL. Recently, the NodeXL team has been focused on a set of new features for grouping sets of vertices together. In the previous version we released a feature that allowed all sorts of groupings to be recorded in the worksheet. What’s new is that the three clustering algorithms we already provide are now just one way to form a group: components (connected sets of vertices) are another, and user-labeled sets will be a third (this last feature is still pending). This release adds the ability to add vertices to a group and then collapse all of the vertices in that group into a metanode, a composite of all the nodes in that group. The collapsed vertices can later be expanded back into the graph.
These features are part of a larger effort to support time, in which “time is but a group”: a set of nodes and edges present in a time slice. We are working on designs in which some groups are sequenced, allowing the user to move forward and back through collections of vertices that may appear or disappear across different time slices (groups).
Here are the most recent features, in version 220.127.116.11 (2010-09-06):
After you group the graph’s vertices (NodeXL, Analysis, Groups), you can now select all the vertices in a group. Go to the Groups worksheet and click on a group name.
Once a group is selected, you can collapse it into a single vertex. Go to NodeXL, Analysis, Groups, Collapse Group. You can expand it again using Expand Group.
The Groups worksheet now includes a column that tells you how many vertices are in the group.
Bug fix: The NodeXL, Help, Check for Updates feature stopped working in version 18.104.22.168.
Bug fix: If you clicked NodeXL, Graph, Show Graph while editing a worksheet cell, you would get a message that started with “Unable to set the Hidden property of the Range class.”
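To make the collapse operation concrete, here is a minimal sketch of what collapsing a group into a metanode can involve, assuming a graph stored as a plain list of undirected (u, v) edges. This is not NodeXL’s implementation; the function name `collapse_group` and the choice to keep a count of redirected edges as a weight are illustrative assumptions.

```python
from collections import Counter


def collapse_group(edges, group, metanode):
    """Hypothetical helper (not NodeXL's code): replace every vertex in
    `group` with the single vertex `metanode`.

    Edges with both ends inside the group disappear into the metanode.
    Edges from a group member to an outside vertex are redirected to the
    metanode, and parallel redirected edges are merged into one edge
    whose value is how many original edges it stands for.
    """
    weights = Counter()
    for u, v in edges:
        u2 = metanode if u in group else u
        v2 = metanode if v in group else v
        if u2 == v2 == metanode:
            continue  # internal edge: hidden inside the metanode
        # Store undirected edges in a canonical order so duplicates merge.
        key = tuple(sorted((u2, v2)))
        weights[key] += 1
    return dict(weights)


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
collapsed = collapse_group(edges, {"a", "b"}, "G1")
print(collapsed)  # {('G1', 'c'): 1, ('c', 'd'): 1, ('G1', 'd'): 1}
```

Because the original edge list is left untouched, expanding the group again amounts to going back to that list.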
This version introduces the concept of “vertex groups,” or “groups” for short. A group is a set of related vertices. All vertices in a group are shown with the same shape and color. Clusters are an example of groups.
The worksheets that used to be called “Clusters” and “Cluster Vertices” are now called “Groups” and “Group Vertices.”
The NodeXL, Analysis, Find Clusters button in the ribbon has been moved to a new NodeXL, Analysis, Groups menu.
You can now group vertices by connected components, meaning that each group of interconnected vertices will have the same shape and color. Go to NodeXL, Analysis, Groups, Find Connected Components.
You can now group vertices using the values in a column on the Vertices worksheet: for example, all vertices with degree greater than 100 in one group and all vertices with degree greater than 50 but no more than 100 in another.
If you open an older NodeXL workbook in this new version of NodeXL, the Clusters and Cluster Vertices worksheets will be automatically renamed.
You cannot open a new NodeXL workbook in an older version of NodeXL. If you attempt to do so, you will get a message that starts with “This document might not function as expected because the following control is missing: Clusters.”
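Both new grouping methods rest on standard techniques. The sketch below is a minimal Python illustration, not NodeXL’s code: `connected_components` uses union-find to put each set of interconnected vertices into one group, and `group_by_ranges` buckets vertices by a numeric column value (such as degree) using sorted break points. The function names and the input format (vertex names, edges as pairs, column values as a dict) are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def connected_components(vertices, edges):
    """Group vertices so that each group is one connected component.

    Uses union-find (disjoint sets) with path halving.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    groups = defaultdict(list)
    for v in vertices:
        groups[find(v)].append(v)
    return list(groups.values())


def group_by_ranges(values, break_points):
    """Group vertices by the range their column value falls into.

    `values` maps vertex -> numeric value (for example, degree).
    With sorted break points [50, 100], group 0 holds values <= 50,
    group 1 values in (50, 100], and group 2 values > 100.
    """
    groups = defaultdict(list)
    for vertex, value in values.items():
        groups[bisect_left(break_points, value)].append(vertex)
    return dict(groups)


print(connected_components(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]))
print(group_by_ranges({"x": 10, "y": 50, "z": 75, "w": 100, "q": 150}, [50, 100]))
```

With break points `[50, 100]`, degrees of 10 and 50 land in group 0, 75 and 100 in group 1, and 150 in group 2.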
These maps illustrate the connections among people who tweeted the term “#ecomm2010”, with each person scaled by their number of followers.
Abstract: Social network analysis (SNA) is a powerful method for gaining insight into the massive collections of connections created when many people connect to one another through mobile devices. SNA has been widely applied to desktop social media and is moving into the mobile world. Prominent studies of the “call graph” have been produced at national scales.
Mobile providers are applying SNA to identify key subscribers who can reduce churn and help drive adoption of new services and products. Network analysis has historically had a steep learning curve, but new tools are making SNA easier for less technical users. This talk will describe social network concepts and their application to mobile data sets. NodeXL (http://www.codeplex.com/nodexl), a free and open add-in for the popular Excel 2007 spreadsheet, can perform many complex SNA tasks such as data import, scrubbing, metrics calculation, clustering, and visualization. Applying this tool to call graph and subscriber data sets can reveal subscribers in key network positions who can attract and hold other subscribers in the system.
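As a small illustration of the kind of analysis this abstract describes, the sketch below ranks subscribers in a call graph by their number of distinct contacts, that is, their degree. Degree is only the simplest of the relevant network measures, and the input format, (caller, callee) pairs drawn from call records, is a hypothetical assumption rather than any provider’s actual schema.

```python
from collections import defaultdict


def rank_by_contacts(call_records, top=3):
    """Rank subscribers by how many distinct people they call or are called by.

    call_records: iterable of (caller, callee) pairs (assumed format).
    Repeated calls between the same pair count once, so the score is the
    subscriber's degree in the undirected call graph.
    """
    contacts = defaultdict(set)
    for caller, callee in call_records:
        if caller == callee:
            continue  # ignore self-calls
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    # Highest degree first; ties broken alphabetically for a stable order.
    ranked = sorted(contacts, key=lambda s: (-len(contacts[s]), s))
    return [(s, len(contacts[s])) for s in ranked[:top]]


records = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "C"), ("D", "A")]
print(rank_by_contacts(records, top=2))  # [('A', 3), ('B', 2)]
```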
NodeXL has a number of data importers that can create a network of connections from social media data sources like Twitter, YouTube, flickr, email, and the WWW (along with other import formats such as GraphML, UCINet, CSV, and data in other Excel workbooks).
To create a network, just choose a data source from the NodeXL>Data>Import menu and enter the search terms and settings you want.
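GraphML, one of the import formats listed above, is an XML-based graph format. A minimal sketch using Python’s standard-library ElementTree shows the core structure a GraphML importer reads: `node` elements with `id` attributes and `edge` elements with `source` and `target` attributes, all in the GraphML namespace. The helper `read_graphml_edges` is hypothetical, ignores attribute data (`key`/`data` elements), and is not how NodeXL’s importer is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents, in ElementTree's {uri}tag form.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def read_graphml_edges(text):
    """Return (vertices, edges) from the first graph in a GraphML string."""
    root = ET.fromstring(text)
    graph = root.find(f"{GRAPHML_NS}graph")
    vertices = [n.get("id") for n in graph.findall(f"{GRAPHML_NS}node")]
    edges = [(e.get("source"), e.get("target"))
             for e in graph.findall(f"{GRAPHML_NS}edge")]
    return vertices, edges


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"/>
    <edge source="bob" target="carol"/>
  </graph>
</graphml>"""

print(read_graphml_edges(SAMPLE))
```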
If you want to create the same network every day (or on any other schedule), a recent feature of NodeXL (available since version .125) can help. NodeXLNetworkServer.exe is an application that ships with NodeXL, along with a sample configuration file called SampleNetworkConfiguration.xml. The configuration file is meant to let you collect anything available from the Import menu in Excel, but so far we have exposed only the two Twitter data collectors (more are on the way). The configuration file therefore asks you to specify a search term or a user’s name, the size of the network, the details you want reported, and the location and name of the destination file that NodeXL will create. Answer these questions by editing the configuration file, then save it under a useful name that includes the search term.
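Saving the configuration under a name that includes the search term, and running collections on a schedule, can both be scripted. The sketch below assumes that NodeXLNetworkServer.exe accepts the path of a configuration file as its only command-line argument and that it is installed at the path shown; both are assumptions to verify against the documentation that ships with your copy of NodeXL. The helper names and the `NetworkConfiguration_` file-name prefix are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# ASSUMPTIONS: install location and command-line form are hypothetical;
# check them against the NodeXL documentation before relying on this.
SERVER_EXE = Path(r"C:\Program Files\NodeXL\NodeXLNetworkServer.exe")


def config_name_for(search_term):
    """Build a configuration file name that includes the search term."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", search_term).strip("_")
    return f"NetworkConfiguration_{safe}.xml"


def build_command(config_dir, search_term):
    """Return the (assumed) command line for one collection run."""
    config_path = Path(config_dir) / config_name_for(search_term)
    return [str(SERVER_EXE), str(config_path)]


def run_collection(config_dir, search_term):
    """Run one collection; raises CalledProcessError if the run fails."""
    subprocess.run(build_command(config_dir, search_term), check=True)


print(config_name_for("#ecomm2010"))  # NetworkConfiguration_ecomm2010.xml
```

On Windows, a script like this (or the executable itself) could then be run daily with Task Scheduler.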
This is a “pinwheel” diagram using the author’s Facebook personal network (captured July 15, 2009).
Nodes represent the author’s friends and links represent friendships among them. The author is not shown. Each ‘wing’ radiating outwards is a partition found using a greedy community detection algorithm (Wakita and Tsurumi, 2007). Wings are manually labelled. Node ordering within each wing is based on degree, and node color and size are also based on degree. Node position is based on a polar coordinate system: the n nodes are spaced at equal angles of 360º/n, and each node’s radius is a log-scaled measure of its betweenness. Nodes with higher betweenness are closer to the center, indicating a sort of cross-partition ‘gravity’.
This layout has several notable features:
– The angle of each wing is proportional to its share of the network. Thus a wing containing 25 percent of the nodes spans 90º.
– Partitions are distinguished by their position rather than a node’s color or shape.
– The tail indicates the periphery of each partition. A wing with many tail nodes indicates many people who are only tied to other group members.
– Edges crossing the center show between-partition connections. Since nodes are sorted by degree, it is easy to see whether edges originate from the most highly connected nodes or from across the entire partition.
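The pinwheel layout rules can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the diagram: partitions are supplied ready-made (for example, from a Wakita-Tsurumi run), wings are drawn in the order given, and log-scaled betweenness is mapped linearly onto a radius between hypothetical bounds `r_min` and `r_max`, so that higher betweenness pulls a node toward the center.

```python
import math


def pinwheel_layout(partitions, degree, betweenness, r_min=0.1, r_max=1.0):
    """Polar 'pinwheel' positions for partitioned vertices (a sketch).

    partitions: list of lists of vertices, one list per wing.
    degree, betweenness: dicts mapping vertex -> value.
    Returns vertex -> (angle_degrees, radius, x, y).
    """
    n = sum(len(p) for p in partitions)
    step = 360.0 / n  # every vertex gets the same angular slot
    b_max = max(betweenness.values())
    positions = {}
    i = 0
    for wing in partitions:
        # Highest-degree vertices first, so low-degree 'tail' vertices
        # end up at the far end of each wing.
        for v in sorted(wing, key=lambda u: -degree[u]):
            angle = i * step
            # Log-scaled betweenness in [0, 1]; higher -> smaller radius.
            scaled = math.log1p(betweenness[v]) / math.log1p(b_max) if b_max > 0 else 0.0
            radius = r_max - scaled * (r_max - r_min)
            rad = math.radians(angle)
            positions[v] = (angle, radius, radius * math.cos(rad), radius * math.sin(rad))
            i += 1
    return positions


pos = pinwheel_layout(
    [["a"], ["b", "c", "d"]],
    degree={"a": 1, "b": 1, "c": 3, "d": 2},
    betweenness={"a": 0, "b": 0, "c": 10, "d": 0},
)
print({v: (p[0], round(p[1], 2)) for v, p in pos.items()})
```

With four nodes each step is 90º, so a wing holding one of the four nodes (25 percent) occupies the arc from 0 to 90º, matching the proportionality described above.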
Bring a laptop (running Windows and Office 2007 or 2010) to this workshop and you can be analyzing a social media network from systems like Twitter, flickr, YouTube, and your own email by the end of the day. If you can make a pie chart in Excel, you can now use the free and open NodeXL (http://nodexl.codeplex.com) to make a rich network graph from data extracted from social media systems and other common formats. If you have a network, bring it; if not, bring a suggested topic that we can map during the course of the day.
Even if you leave your laptop behind or have a Mac (sorry, no version is yet available for Mac OS, unless you run Windows and Office in a virtual machine), this workshop will introduce the core concepts of network science with application to social networks in general and social media networks in particular. Applied to a range of topics and services, social media network maps can illuminate a variety of “publics”: populations who share a common interest and may share connections. Maps of topics like “oil spill”, “global warming” and other issue- and event-related keywords can reveal the groups and factions that cluster around different concepts and terms. Key contributors in these maps can be identified by applying network measurements that capture various aspects of a person’s position in a network graph.
Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft’s NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representations of the relationships in complex networked data. But it goes further than other SNA tools: NodeXL was developed by a multidisciplinary team of experts who bring together information studies, computer science, sociology, human-computer interaction, and over 20 years of visual analytic theory and information visualization into a simple tool anyone can use. This makes NodeXL of interest not only to end users but also to researchers and students studying visual and network analytics and their real-world applications. NodeXL is unique in that it imports networks from Outlook email, Twitter, flickr, YouTube, the WWW, and other sources, and it offers a rich set of metrics, layouts, and clustering algorithms. This talk will describe NodeXL and our efforts to start the Social Media Research Foundation.
The NodeXL team has just released a new version (v.22.214.171.124) with a new “Automation” feature that lets users define a collection of operations to perform on their network graphs, invoke the complete set with a single click, AND reuse that configuration on graphs in other workbooks. In fact, the feature will apply the configuration you define to every file you specify, making it easy to process large collections of network data sets.
This week the feature is partially complete. Users can automate the merge duplicate edges, calculate graph metrics, autofill columns, create subgraph images, find clusters, and show graph operations. Performed manually, these operations can require dozens of clicks. If you have dozens or hundreds of network data sets, the result is a daunting case of repetitive strain injury and carpal tunnel syndrome. With automation, these operations can instead be repeated across orders of magnitude more data sets without much pain!
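A minimal sketch of the automation idea: a configuration is an ordered list of steps, and the same list is applied to every data set. The two steps here are simplified, hypothetical stand-ins for NodeXL’s merge duplicate edges and calculate graph metrics operations (degree is the only metric computed), and each “workbook” is just a Python dict holding an `edges` list.

```python
from collections import Counter


def merge_duplicate_edges(workbook):
    """Replace the edge list with {(u, v): count}, merging duplicates.

    (u, v) and (v, u) are treated as different edges, as in a directed graph.
    """
    workbook["edges"] = dict(Counter(workbook["edges"]))
    return workbook


def calculate_degree(workbook):
    """Add a simple metric: each vertex's degree over the current edges."""
    degree = Counter()
    for u, v in workbook["edges"]:
        degree[u] += 1
        degree[v] += 1
    workbook["degree"] = dict(degree)
    return workbook


def run_automation(workbooks, steps):
    """Apply the same configured steps, in order, to every workbook."""
    results = {}
    for name, wb in workbooks.items():
        current = dict(wb)  # shallow copy so the input is not modified
        for step in steps:
            current = step(current)
        results[name] = current
    return results


data = {
    "day1": {"edges": [("a", "b"), ("a", "b"), ("b", "c")]},
    "day2": {"edges": [("x", "y")]},
}
out = run_automation(data, [merge_duplicate_edges, calculate_degree])
print(out["day1"])
```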
The next release will feature the complete package, which will add control over the layout and graph options. As a result, network visualizations can be generated automatically in a pipeline: users will be able to specify a query using the NodeXL desktop network data collector and then automate the processing of large collections of data sets.
The result should be better analysis of time-series data sets that have many “slices”. The feature also points the way to further development work to support comparing networks and evaluating how they evolve.
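One simple way to start comparing networks across time slices is to diff their edge sets. The sketch below reports, for each pair of consecutive slices, which edges appeared, which disappeared, and the Jaccard similarity of the two edge sets (shared edges divided by all edges in either slice). Jaccard similarity is an illustrative choice, not a measure this post commits to, and `compare_slices` is a hypothetical helper.

```python
def compare_slices(slices):
    """Describe how a network changes between consecutive time slices.

    slices: list of edge sets, one per time slice, in time order.
    Returns one dict per transition with the added and removed edges
    and the Jaccard similarity of the two edge sets.
    """
    changes = []
    for before, after in zip(slices, slices[1:]):
        union = before | after
        changes.append({
            "added": after - before,
            "removed": before - after,
            # Two empty slices are treated as identical.
            "jaccard": len(before & after) / len(union) if union else 1.0,
        })
    return changes


slices = [{("a", "b"), ("b", "c")}, {("b", "c"), ("c", "d")}, {("c", "d")}]
for change in compare_slices(slices):
    print(change)
```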
A new paper on visualizing social media has been released on the University of Maryland, Human Computer Interaction Laboratory tech report archive. Co-authored by Derek Hansen, myself, and Ben Shneiderman, the paper describes and visualizes the patterns of connections formed when people tweet about events like conferences and news stories.
Hansen, D., Smith, M., Shneiderman, B. EventGraphs: Charting Collections of Conference Connections
EventGraphs are social media network diagrams constructed from content selected by its association with time-bounded events, such as conferences. Many conferences now communicate a common “hashtag” or keyword to identify messages related to the event. EventGraphs help make sense of the collections of connections that form when people follow, reply or mention one another and a keyword. This paper defines EventGraphs, characterizes different types, and shows how the social media network analysis add-in NodeXL supports their creation and analysis. The paper also identifies the structural and conversational patterns to look for and highlight in EventGraphs and provides design ideas for their improvement.
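The edge-building step behind an EventGraph can be sketched for replies and mentions; follow relationships need separate follower data and are left out. The sketch assumes tweets arrive as (author, text) pairs, keeps only tweets that contain the event hashtag as a whole tag (so “#sunbelt2010” does not count as “#sunbelt”), and treats a tweet that begins with @name as a reply. The function name and these rules are simplifying assumptions, not the paper’s or NodeXL’s exact procedure.

```python
import re

MENTION = re.compile(r"@(\w+)")


def event_graph_edges(tweets, hashtag):
    """Build reply/mention edges among authors of tweets using `hashtag`.

    tweets: iterable of (author, text) pairs (assumed format).
    Returns (author, target, kind) tuples, kind being "reply" or "mention".
    """
    # Case-insensitive whole-tag match: "#sunbelt" but not "#sunbelt2010".
    tag = re.compile(re.escape(hashtag) + r"\b", re.IGNORECASE)
    edges = []
    for author, text in tweets:
        if not tag.search(text):
            continue  # not part of this event
        for match in MENTION.finditer(text):
            kind = "reply" if match.start() == 0 else "mention"
            edges.append((author, match.group(1), kind))
    return edges


tweets = [
    ("ann", "@bob great talk at #sunbelt"),
    ("bob", "Thanks! cc @cat #Sunbelt"),
    ("cat", "Off topic @ann"),
    ("dan", "#sunbelt2010 is different @ann"),
]
print(event_graph_edges(tweets, "#sunbelt"))
```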
This is the NodeXL map of connections among people who tweeted the hashtag used for the conference “#sunbelt”.
Having now seen several of these maps for other topics and events (see: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/), I can place this map in context. It is a small group, but it has a high density of connections. It lacks isolates, the people who use the term but do not connect to anyone else who uses it. This means that this is a very “in-group” population: if you know to use the #sunbelt hashtag, you probably connect to someone else who uses it. The graph is a single major cluster of connected people; no obvious sub-graphs or clusters are visible. Not everyone is central in the graph, and those who are have a prominent role in the network science community. Here is the top-ten list of Twitter users who mentioned #sunbelt, ranked by betweenness centrality.
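The measures used to read this map can be computed directly. The sketch below lists isolates, computes density for an undirected graph (edges present divided by edges possible), and ranks vertices by betweenness centrality using Brandes’ algorithm for unweighted graphs. It is a self-contained illustration, not NodeXL’s implementation; NodeXL may normalize or scale betweenness differently, and `top_by_betweenness` is a hypothetical helper.

```python
from collections import deque

# Assumed representation: dict mapping each vertex to the set of its
# neighbours in an undirected graph.


def isolates(adj):
    """Vertices with no connections at all."""
    return [v for v, nbrs in adj.items() if not nbrs]


def density(adj):
    """Edges present divided by edges possible, n * (n - 1) / 2."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return m / (n * (n - 1) / 2) if n > 1 else 0.0


def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def top_by_betweenness(adj, k=10):
    """The k highest-betweenness vertices, ties broken by name."""
    bc = betweenness(adj)
    ranked = sorted(bc, key=lambda v: (-bc[v], v))
    return [(v, bc[v]) for v in ranked[:k]]


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "z": set()}
print(isolates(adj), density(adj), top_by_betweenness(adj, 2))
```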