If you are attending the CHI 2010 conference in Atlanta and are interested in social media network analysis, consider attending this tutorial:
CN03: Introduction to Social Network Analysis
Time: Monday, 12 April 2010, 11:30 to 18:00
Organizers: Marc A. Smith, Panayiotis Zaphiris, C.S. Ang, Derek Hansen
Benefits
This course provides an overview of Social Network Analysis (SNA) and demonstrates through theory and practical case studies how it can be used in HCI (especially computer-mediated communication and CSCW) research and practise. This topic is of particular importance due to the popularity of social networking websites (e.g. YouTube, Facebook, MySpace etc.) and social computing. As people increasingly use online communities for social interaction, new methods are needed to study these phenomena. SNA is a valuable contribution to HCI research as it gives an opportunity to rigorously study the complex patterns of online communication.
Social network theory views a network as a group of actors who are connected by a set of relationships. Actors are often people, but can also be nations, organizations, objects etc. Social Network Analysis (SNA) focuses on patterns of relations between these actors. It seeks to describe networks of relations as fully as possible. This includes teasing out the prominent patterns in such networks, tracing the flow of information through them, and discovering what effects these relations and networks have on people and organizations. It can therefore be used to study network patterns of organizations, ideas, and people that are connected via various means in an online environment.
“The Symposium aims to generate discussion on cutting-edge ideas in political communication, encourage international cooperation and unite scholars and practitioners.”
Organizer and founding Dean of IE School of Communication, Samuel Martín-Barbero notes that the event will gather:
“More than forty international panelists, moderators and speakers (who) will not only reflect on the state of the field, but will also discuss cutting-edge advances in theory, research and practice.”
I will attend along with my colleague John Kelly, from the Berkman Center for Internet & Society, Harvard University and Founder of Morningside Analytics.
The Israel Internet Association is the official Israeli Chapter of the Internet Society. Their annual meeting is a central event of academics (sociologists, psychologists, business and law) as well as industry participants from sectors including mobile cellular companies and internet service suppliers.
My talk title: Analyzing Internet social media: visualizing social networks in (mobile) computer networks
Abstract: Social media systems on the Internet are sociologically interesting: why do some online groups succeed where others fail? How do different collections of online media and populations of authors differ from one another? How do patterns of contribution vary and how do these differences illustrate the roles people play within their communities? Several visualizations of patterns of contribution and connection in a range of Internet social media including web boards, enterprise social network services, and personal email are presented to illustrate the range of variation among social media repositories and between types of contributors. These images suggest that a more comprehensive overview of social media can generate sociologically relevant findings, improve community management tasks, and provide features that can improve search and ranking of user generated content. A freely available tool, NodeXL, will be demonstrated to perform basic social media analysis tasks. Extending these tools to include mobile social software (“mososo”) data sets is a major new direction. In the not too distant future, mobile devices will possess a range of sensors and become more “socially aware”. When phones routinely notice each other, the nature of social interaction will change dramatically. How will places and locations change when machines become socially aware? In this talk, sociologist Marc Smith, Chief Social Scientist for Connected Action Consulting Group, a provider of social media analysis platforms and services, will describe these new technologies and some ways of thinking about their implications.
Title: Visualizing collections of social media connections: using social network analysis to assess, evaluate and measure social media engagement
Abstract: Social networks are created whenever people interact. These networks become more visible when interactions take place through social media. Social networks form when people link, reply, comment, edit, tag, and friend one another. Sub-populations are formed whenever people mention the same company, product, event, topic, or personality. Using social network analysis on collections of social media connections reveals important patterns: how are people clustered and grouped, where are the gaps, who plays the roles of bridge, hub, and isolate? In this talk I will display maps of Twitter, YouTube, Flickr, and enterprise email systems and demonstrate several tools that can be used to collect, analyze, map and monitor social media, including the free and open NodeXL (network overview, discovery and exploration) add-in for Excel 2007.
Here, for example, is a map of the connections among people who recently mentioned “haifa” in twitter sized by number of followers:
From family photographs and personal papers to health and financial information, vital personal records are becoming digital. At the same time, creation and capture of new digital information has become a part of the daily routine for hundreds of millions of people. But what are the long term prospects for this data?
The combination of new capture devices (more than 1 billion camera phones will be sold in 2010) with the move from older forms of media is reshaping both our personal and collective memories. The size and complexity of personal collections are growing, these collections are spread across different media (including film and paper!), and the lines between personal and professional, published and unpublished are being redrawn.
Whether these issues are described as personal archiving, lifestreams, personal digital heritage, preserving digital lives, scrapbooking, or managing intellectual estates, they present major challenges for both individuals and institutions: data loss is a nearly universal experience, whether it is due to hardware failure, obsolescence, user error, lack of institutional support, or any one of many other reasons. Some of these losses may not matter; but the early work of the Nobel prize winners of the 2030s is likely to be digital today, and therefore at risk in ways that previous scientific and literary creations were not. And it isn’t just Nobel winners that matter: the lives of all of us will be preserved in ways not previously possible.
On Tuesday, February 16, the Internet Archive will host a small conference for practitioners in personal digital archiving.
Time for another NodeXL update: sparklines! Sparklines are a nifty and compact way of displaying a line chart in a small area.
Setting dynamic filters in NodeXL has been somewhat like rummaging around in the dark: without a way to see the distribution beneath a filter, the user only knows the max and min values, not where the bulk of the observed data is located. This version of NodeXL (1.0.1.111) features an improvement to the Dynamic Filters feature used to limit the nodes and edges displayed in the network visualization pane. Earlier versions of Dynamic Filters allowed users to select a range for each attribute associated with the Edges and Vertices worksheets. In the last version of NodeXL we added an automated feature for creating distribution histograms and placing them in a stack on the Overall Metrics worksheet after the user runs the “Graph Metrics” feature. That is helpful, but the worksheet is far from the user when they are setting the ranges within the Dynamic Filters dialog. Now, the current release adds “sparklines” to the Dynamic Filters dialog box: as you set the upper and lower bounds for any network edge or vertex attribute, you can see how much of the distribution is included and excluded in the display.
This is one of several features added to NodeXL to make it easier for users to explore their networks and find actionable insights. We have added sparklines in the dynamic filters interface in NodeXL so that you can now see the shape of a value’s distribution as you set the maximum and minimum values to be included in the filter. Histograms also now appear in the Overall Metrics tab of the NodeXL worksheet where they convey the distribution of the major network attributes in the graph.
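To make the idea concrete, here is a minimal sketch in Python of what a filter range does to a distribution. It is not NodeXL code: the function names `filter_coverage` and `text_sparkline` and the sample degree values are invented for illustration, and the "sparkline" here is a coarse text histogram standing in for the real chart.

```python
def filter_coverage(values, lower, upper):
    """Return the fraction of values inside the inclusive range [lower, upper]."""
    if not values:
        return 0.0
    kept = [v for v in values if lower <= v <= upper]
    return len(kept) / len(values)


def text_sparkline(values, bins=8):
    """Render a coarse histogram of a non-empty list as one line of block characters."""
    blocks = "▁▂▃▄▅▆▇█"
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    peak = max(counts)
    # Scale each bin count to one of the eight block heights.
    return "".join(blocks[(c * (len(blocks) - 1)) // peak] for c in counts)


# Hypothetical vertex degrees: most are small, one is very large.
degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
print(text_sparkline(degrees))
print(filter_coverage(degrees, 1, 5))
```

With a skewed distribution like this one, the picture shows at a glance that a filter of 1 to 5 keeps most of the vertices, something the minimum and maximum alone would not reveal.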
My good friend and colleague Jeff Ubois recently edited and released a volume entitled Conversations on Innovation, Power and Responsibility for the Fondazione Giannino Bassetti. Some of my comments on the topic of innovation from a conversation with Jeff are included in the volume which collects a wide range of thoughts about the nature and consequence of technical change.
Table Of Contents
Foreword
Introduction
About the Question
Related Concepts
Choosing Subjects: Where Does Responsibility Matter Now?
Genetics And Healthcare
Thomas Murray, The Hastings Center
Ignacio Chapela: Drawing a Boundary Around the Lab
Arthur Caplan: Innovation as Politics
David Magnus & Mildred Cho: True Fictions
Nanotechnology
Christine Peterson: Nanotechnology and Enhancement
Lawrence Gasman: Nanomarkets
Robotics And Computing
Ronald Arkin: Embedding Values in Machines
Jeff Jonas: Applying the UN declaration of human rights
Marc Smith: Invention, mitigation, accounting and externalities
Mikko Ahonen: Open Innovation … and Radiation Safety
Design
Roberto Verganti: Varieties of Design Innovation
Michael Twidale: IRBs, Design, Empowerment, Accountability, Sustainability
In the most recent prior release of NodeXL we added new metrics that describe networks in terms of their number of components and the length of paths in those networks. In this release we automate creation of histograms of network metrics. It is useful to see the distribution of attributes like in-degree or betweenness to get a feel for the nature of a network. Building a histogram in Excel is easy, but building seven (one for each of the metrics we create: degree, in-degree, out-degree, betweenness, closeness, eigenvector centrality, and clustering coefficient) is a chore. Doing this repeatedly for several networks is too much work! Now, when you calculate metrics in NodeXL we will create these charts for you and place them on the Overall metrics worksheet.
We will add axis markings and titles soon, making these charts ready to use in a variety of network reports. These histograms will also appear in the Dynamic Filters dialog to guide users as they select segments of the distribution to include or filter out of the displayed network graph.
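For readers who want to see what such a summary involves outside of Excel, here is a small Python sketch. It is an illustration only, not how NodeXL computes its charts: the `summarize` function, the equal-width binning rule, and the sample in-degree values are all assumptions.

```python
from statistics import mean, median


def summarize(values, bins=5):
    """Return minimum, maximum, average, median and equal-width histogram counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return {
        "minimum": lo,
        "maximum": hi,
        "average": mean(values),
        "median": median(values),
        "histogram": counts,
    }


# Hypothetical in-degree values for a nine-vertex network.
in_degree = [0, 1, 1, 2, 2, 2, 3, 4, 10]
summary = summarize(in_degree)
for name, value in summary.items():
    print(name, value)
```

Running the same function over each metric column in turn gives one summary per metric, which is the repetitive part that is tedious to do by hand.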
Other updates:
1.0.1.110 (2010-02-03)
The Overall Metrics worksheet now includes more information about the degree, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, and clustering coefficient metrics when those metrics are computed. The additional information includes the minimum, maximum, average, and median metric values, and a histogram showing the metric value distribution.
The “Convert Old Workbook” item on the NodeXL, Data, Import menu in the Ribbon is now called “Import from NodeXL Workbook Created on Another Computer.” This menu item can be used to work around the following problem: NodeXL workbooks created on a 64-bit Windows computer cannot be opened directly in Excel on a 32-bit Windows computer, and vice-versa. (If you attempt to do so, you will get an error message whose details include “could not find a part of the path.”)
A Clear All Worksheet Columns Now button has been added to the Autofill Columns dialog box (NodeXL, Visual Properties, Autofill). Also, you can now clear an individual worksheet column by clicking a button in the dialog box’s Options column.
Bug fix: On large-font machines, the buttons at the bottom of the Autofill Columns dialog box didn’t fit within the dialog box.
Bug fix: In some circumstances, vertices were drawn below the bottom of the graph pane and were impossible to see. One such circumstance was when the selection was exported to a new workbook (NodeXL, Data, Export, Selection to New NodeXL Workbook). The graph pane in the new workbook acted as if it were taller than its real height, leading to vertices dropping off the bottom.
I am pleased to announce that we have signed with Elsevier/Morgan Kaufmann to produce a book, Analyzing Social Media Networks with NodeXL: Insights from a Connected World, for a Summer 2010 delivery!
A map of the relationships among the population of people who all tweet a particular keyword can lead to the discovery of the key hubs and influential people in the network. A social network analysis of reply patterns in email collections displays clusters around projects and highlights key people and relationships. Visualizing the connections among your friends in Facebook can reveal the various life stages and communities in which you have participated. When you chart the links between videos and users in YouTube, content with interesting network properties is exposed based on well connected content creators and influential commentators. A graph of the individual connections between flickr users illustrates the emergent formation of groups around social networks, locations, and topics.
These kinds of social media network data collection, scrubbing, analysis, and display tasks have historically required a remarkable collection of tools and skills. A great example of the variety of tools that can be used in concert to extract, analyze and display social media networks can be found on Drew Conway’s blog. This is a powerful set of tools for those who can master the demands of python and API interfaces. In contrast, the approach the NodeXL project has taken is to provide an end-user GUI application environment built within the framework of Excel 2007 for performing basic social media network analysis and visualization for non-programmers. The python path is certainly the high road for experts and those with demanding volumes or esoteric data requirements. But for the non-coding user, NodeXL may be one of the easiest ways to both manipulate network graphs and get graphs from a variety of social media sources.
There are already some materials available to guide new users interested in learning about NodeXL, social networks, and social media. A video tutorial for NodeXL demonstrates the extraction of the network of people in twitter who mentioned the term “digg”. A tutorial guide to NodeXL offers a step by step guide to features in the NodeXL toolkit (with supporting data sets). But the book will capture the theory, history, domain and process of social media network analysis in a single volume.
The volume contains a broad introduction to social media, social networks and the operation of the NodeXL application and then features a series of chapters from leading researchers that focus on a particular social media system (email, Facebook, Twitter, YouTube, flickr, Wikis, the WWW hyperlink network) and the networks each contains (replies, friends, follows, subscribes, comments, favorites, edits, links, etc). A final chapter outlines a programmer’s view of the NodeXL code, in contrast to the code-free approach of the remainder of the book.
Our intended audience is the mostly non-programming population that is interested in social media and the techniques of social network analysis. The volume is largely in the form of a how-to guide that readers can follow to replicate all examples. Using your own free and open copy of NodeXL, you will be able to use sample data sets or create similar live queries that map relationships in social media systems.
We have an ambitious production schedule so the book may be on a book store shelf or online retailer search result in summer 2010.
NodeXL has been updated again (v.1.0.1.109) with new network metrics. The application now calculates path length data for your network, reporting the Maximum Geodesic Distance and the Average Geodesic Distance. The list of overall metrics NodeXL creates includes: Vertices (the number of nodes in the graph), Unique Edges, Edges With Duplicates, Total Edges, Self-Loops (edges that point back at the node from which they originate), Connected Components (each set of connected nodes that are not connected to another set of nodes), Single-Vertex Connected Components (all the “singletons” of just one node in a component), Maximum Vertices in a Connected Component (the size of the “Giant” component), Maximum Edges in a Connected Component (the density of the “Giant” component), Maximum Geodesic Distance (Diameter) (the longest of the shortest paths between any two connected nodes in the graph), Average Geodesic Distance (the average distance between two nodes in the graph; compare this to the “six degrees” standard), and Graph Density (the density of the complete network).
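Here is a rough Python sketch of how several of these metrics can be computed for a small undirected graph. It is not NodeXL's implementation, and the exact definitions are assumptions: self-loops are counted but left out of paths and density, the average geodesic distance is taken over pairs of distinct vertices that can reach each other, and density is edges divided by possible edges. The function name and sample graph are made up for the example.

```python
from collections import deque


def overall_metrics(vertices, edges):
    """Compute a few whole-graph metrics for an undirected graph."""
    neighbors = {v: set() for v in vertices}
    self_loops = 0
    for a, b in edges:
        if a == b:
            self_loops += 1  # counted, but ignored for paths and density
            continue
        neighbors[a].add(b)  # sets collapse duplicate edges
        neighbors[b].add(a)

    def distances_from(source):
        """Breadth-first search: hop count to every vertex reachable from source."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        return dist

    components, seen, path_lengths = [], set(), []
    for v in vertices:
        dist = distances_from(v)
        path_lengths.extend(d for d in dist.values() if d > 0)
        if v not in seen:  # first visit to this component
            components.append(set(dist))
            seen.update(dist)

    n = len(vertices)
    unique_edges = sum(len(s) for s in neighbors.values()) // 2
    return {
        "vertices": n,
        "unique_edges": unique_edges,
        "self_loops": self_loops,
        "connected_components": len(components),
        "single_vertex_components": sum(1 for c in components if len(c) == 1),
        "max_vertices_in_component": max((len(c) for c in components), default=0),
        "max_geodesic_distance": max(path_lengths, default=0),
        "average_geodesic_distance":
            sum(path_lengths) / len(path_lengths) if path_lengths else 0.0,
        "graph_density": 2 * unique_edges / (n * (n - 1)) if n > 1 else 0.0,
    }


# A four-vertex chain, one vertex with a self-loop, and one isolate.
metrics = overall_metrics(
    list("abcdef"),
    [("a", "b"), ("b", "c"), ("c", "d"), ("e", "e")],
)
for name, value in metrics.items():
    print(name, value)
```

Running a breadth-first search from every vertex is fine for small graphs but gets slow on large ones.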
More metrics and details on existing metrics are on the way!
Starting with version 1.05, NodeXL includes features that make it fairly easy to create basic “Venn Diagrams”. A Venn diagram is a familiar way to illustrate the overlap (or lack thereof) of two or more “sets” of things.
There are some very amusing Venn diagrams out there! This one in particular made me laugh but I may be dating myself.
A Venn is related to but different from an Euler diagram. An “n-Venn” diagram is a collection of closed curves (“circles”) on a plane where all the circles intersect. A “simple” Venn diagram has just two circles but complex diagrams can have more. A 2 circle Venn diagram has 3 regions (A, B, A+B) and a 3 circle Venn diagram has 7 regions (A, B, C, AB, AC, BC, ABC).
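The region counts follow from taking every non-empty combination of the sets, which gives 2^n - 1 regions for n circles. A quick Python sketch (the function name `venn_regions` is just for illustration) lists them:

```python
from itertools import combinations


def venn_regions(set_names):
    """List every non-empty combination of sets, one per Venn region."""
    regions = []
    for size in range(1, len(set_names) + 1):
        for combo in combinations(set_names, size):
            regions.append("".join(combo))
    return regions


print(venn_regions(["A", "B"]))
print(venn_regions(["A", "B", "C"]))
print(len(venn_regions(["A", "B", "C", "D"])))
```

The same count for four sets is 15, which is one reason diagrams with more than three circles get hard to draw and read.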
Our implementation is a bit of a hack: we basically let you define the X/Y location of 3 circles. A richer Venn tool would make it easy to take set data and define these circles. We may get that implemented in the coming months.
Facebook recently announced its fellowship program for graduate students pursuing a PhD. Winners will receive tuition, fees, and a stipend for the 2010-2011 academic year. Anyone interested in applying should move quickly, as the deadlines are quite tight in order to ensure that we can provide funding for the upcoming year.
The fellowship is designed to support research in the following areas:
Internet Economics: auction theory and algorithmic game theory relevant to online advertising auctions.
Cloud Computing: storage, databases, and optimization for computing in a massively distributed environment.
Social Computing: models, algorithms and systems around social networks, social media, social search and collaborative environments.
Data Mining and Machine Learning: learning algorithms, feature generation, and evaluation methods to produce effective online and offline models of behavioral signals.
Systems: hardware, operating system, runtime, and language support for fast, scalable, efficient data centers.
Information Retrieval: search algorithms, information extraction, question answering, cross-lingual retrieval and multimedia retrieval
Applications are due on February 15th. For full details, check out the Facebook Fellowship page.
Please pass this information on to any PhD students you think might be interested in applying.
Network visualizations can be very compelling but they are often a smear of unintelligible nodes and edges without refinement and filtering. Creating an automated layout for a complex graph is a challenging area of mathematics and computer science. Several layouts are available and are widely implemented, including the Fruchterman-Reingold layout, the Harel-Koren fast multilevel layout, and a number of geometric designs like circles, grids and trees that can be useful for some data sets.
Improvements to these layouts have been slow in coming: the math behind these layout algorithms has no simple or even best solutions.
Recently, a simple technique has done a great deal to improve many complex network graph layouts by arranging each component in the graph in a grid. Components are pieces of a network that are not connected to any other component. These islands come in various sizes; often there is one large or giant component and many smaller “isolates”. In many layout algorithms these isolates are a problem and are either pushed to the edges of the graph into a circle or ellipse that resembles an “asteroid belt” (Fruchterman-Reingold) or are overlaid on top of all the other isolates (Harel-Koren). A solution is to collect all the “isolates” and organize them sensibly within a grid such that each component is laid out within its own territory or cell.
*BEFORE*
*AFTER*
The result is a step towards what Ben Shneiderman refers to as “NetViz Nirvana” – a state in which network graphs are more visually intelligible. When isolates are binned in a grid, two graphs can be visually contrasted far more easily than when they each have a smeared “asteroid belt” of nodes.
We have implemented an initial binning layout method in the latest version of NodeXL that simply breaks out each component and places it within a grid based on the number of nodes and edges in that sub-graph. I can imagine more sophisticated approaches would locate each sub-graph based on a range of attributes.
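The basic idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the NodeXL code: the function names, the ordering rule (largest components first, by vertex count and then edge count) and the roughly square grid are assumptions.

```python
import math


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as sets of vertices."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        components.append(component)
    return components


def bin_components(vertices, edges):
    """Assign each component its own (row, column) cell, largest components first."""
    def edge_count(component):
        return sum(1 for a, b in edges if a in component and b in component)

    components = sorted(
        connected_components(vertices, edges),
        key=lambda c: (-len(c), -edge_count(c), min(c)),
    )
    columns = math.ceil(math.sqrt(len(components)))  # roughly square grid
    return [(divmod(i, columns), c) for i, c in enumerate(components)]


# A triangle, a pair, and two single-vertex isolates.
layout = bin_components(
    list("abcdefg"),
    [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")],
)
for cell, component in layout:
    print(cell, sorted(component))
```

Each component would then be laid out with any ordinary algorithm inside its own cell, so the isolates never compete for space with the giant component.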
I think the improvement in network visualization is significant. Isolates no longer impose a big effect on the giant component which often was compacted and compressed as a result of even a single isolate.
There are hundreds of conferences sponsored by the ACM on almost every topic related to computing. In some cases the same person will publish a paper in more than one conference, creating a tie between them. Below is a network map application that displays a collection of ACM conferences connected by this authorship tie: http://bosch.informatik.rwth-aachen.de:5080/AERCS/Networks.jsp
This image displays the isolated component that is composed of the “social” conferences in the ACM schedule: CHI, CSCW, DIS, UIST, GROUP, ECSCW, and Interact. The overview illustrates the macro structure of the graph, with the prominent giant cluster of core computer science topics like algorithms, machine learning, and logic. The rows below this cluster are populated by an archipelago of conferences, a few composed of ten to twenty conferences, but most made up of two to five conferences. These are the more marginal topics in the ACM world, in contrast to the conferences at the cores of the giant component.
It would be nice to see the application add additional network display attributes like size, color, shape, edge thickness to indicate conference attributes like papers published, cited, attendees, and sponsors. It is a nice example of the insights network visualizations can bring to a data set and the value of an interactive interface (and a web interface at that!) for investigating complex graphs.
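The authorship tie itself is simple to construct. As a sketch in Python (with invented author names and publication data, and no claim about how the application above builds its map), every author who published in two conferences adds one to the weight of the edge between them:

```python
from collections import Counter
from itertools import combinations


def conference_ties(authorship):
    """Count, for each pair of conferences, the authors who published in both.

    authorship maps an author name to the set of conferences they published in.
    """
    ties = Counter()
    for conferences in authorship.values():
        # Sorting makes ("CHI", "CSCW") and ("CSCW", "CHI") the same edge.
        for pair in combinations(sorted(conferences), 2):
            ties[pair] += 1
    return ties


# Invented example data, not taken from any real publication record.
authorship = {
    "author_1": {"CHI", "CSCW"},
    "author_2": {"CHI", "CSCW", "UIST"},
    "author_3": {"GROUP"},
}
ties = conference_ties(authorship)
for (first, second), weight in sorted(ties.items()):
    print(first, second, weight)
```

Conferences such as GROUP in this toy data, whose authors published nowhere else, end up with no ties at all, which is how the small islands in the overview arise.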
The deck illustrates the use of NodeXL to extract several social media networks from systems like twitter and facebook to generate maps of communities and identify people and objects in key locations.
Steve Ballmer was right when he told the Search Marketing Expo that Google’s biggest advantage is having been good enough before anyone else. But I think there’s something Microsoft doesn’t get about Google cul...