NodeXL has new updates to its importers for Twitter users and lists.
We have released an updated version of NodeXL that simplifies and merges the previously separate User and List importers.
The new, streamlined importer treats an individual user as a list of one.
Lists can be defined by pointing to an existing Twitter List or simply entering a list of delimited user names into the text box.
The updated importer now collects many more tweets per person and parses these messages to generate reply and mention edges.
You can now define a group of Twitter users and find out how much they reply and mention one another.
You can even pull in the followers of each person, to see if they reply or mention people they also follow.
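The reply and mention parsing that the importer performs can be sketched in a few lines. This is an illustrative Python sketch, not NodeXL's actual implementation; the tweet records and field names (`author`, `text`, `is_reply_to`) are hypothetical:

```python
import re

# Hypothetical, simplified tweet records: author, text, and (optionally)
# the screen name the tweet replies to.
tweets = [
    {"author": "alice", "text": "@bob thanks for the link!", "is_reply_to": "bob"},
    {"author": "bob",   "text": "Great talk by @carol at #scaladays"},
    {"author": "carol", "text": "@alice @bob see you both there"},
]

MENTION = re.compile(r"@(\w+)")

def extract_edges(tweets):
    """Turn each tweet into directed (source, target, kind) edges."""
    edges = []
    for t in tweets:
        reply_to = t.get("is_reply_to")
        if reply_to:
            edges.append((t["author"], reply_to, "reply"))
        for name in MENTION.findall(t["text"]):
            if name != reply_to:  # a reply target is not double-counted
                edges.append((t["author"], name, "mention"))
    return edges

for src, dst, kind in extract_edges(tweets):
    print(f"{src} -> {dst} ({kind})")
```

Every tweet a person writes thus contributes edges to the network, which is why collecting more tweets per person yields a denser picture of who talks to whom.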
But ever since June 11, 2013, Twitter has made access to the "follows" edge data very difficult (it is not gone, just very slow). The original NodeXL Twitter list importers were designed and implemented before the update that restricted access to the follower network, and they relied mostly on queries that are now impractically slow for all but the smallest lists of users with small collections of followers.
The update to the User and List importers is partly an adaptation to these changes. The importer shifts away from the follower network to focus on the communication interaction data in the content of tweets. Since Twitter offers more generous access to tweets than to information about who follows whom, we are obliged to make better use of what it does offer.
Networks are everywhere, but collecting, analyzing, visualizing, and gaining insights into connected structures can require advanced technical skills. This session presents NodeXL, a free, easy-to-use tool for network analysis that builds on the familiar Excel spreadsheet. If you can make a pie chart, you can get insights into networks. The tool makes it easy to collect data from a range of social media services (Twitter, Facebook, YouTube, etc.), quickly create visualizations and reports on the shape of connected groups, and identify the key people, groups, and topics in a community. Network analysis can reveal the hidden structures in streams of interactions.
Date and Time: Sunday, March 16, 2014, full day: 9:00am – 4:30pm
Intended Audience: Managers, decision makers, practitioners, and professionals interested in a broad overview and introduction
Knowledge Level: All levels
Attendees will receive an electronic copy of the course notes and materials.
“Big Data” is everywhere. The topic is affecting every industry and institution. The big excitement about big data comes from the intersection of dramatic increases in computing power and data storage with growing streams of data coming from almost every person and process on Earth. The pressing question is how best to extract value from all this data – what should we do with it?
Working with big data effectively depends on understanding the sources of data and the issues in storing and analyzing it:
Where does big data come from?
How do you manage, store, and compute on big data?
What qualifies as “big”?
This one-day workshop reviews major big data success stories that have transformed businesses and created new markets.
Dr. Smith will cover these revealing stories in order to illustrate the key concepts, tools, and value-proven applications driving the big data revolution.
“Big data” is an open buzzword – it could be defined as any amount of data you can’t afford to handle – but the big, newfound value achieved by computing at scale is no fad.
What you will learn:
Where does big data come from: Common sources of big data.
What makes data big: Velocity, Variety, and Volume!
How can we leverage it: Open tools and platforms for storing and analyzing big data.
The new paradigm: Today’s shift from hypothesis testing to a broad exploration for correlations is a revolutionary change in the way data is explored.
Best practices for analyzing big data: Key methods in data science, predictive analytics, and text analytics to analytically learn from data.
Social Data: Finding key connections in webs of people and events.
Applications of big data insights to business.
Future directions in big data: bigger, bolder, and better.
Workshop starts at 9:00am
First AM Break from 10:00 – 10:15am
Second AM Break from 11:15 – 11:30am
Lunch from 12:30 – 1:15pm
First PM Break: 2:00 – 2:15pm
Second PM Break: 3:15 – 3:30pm
Workshop ends at 4:30pm
Network analysis is a way of looking at the world that focuses on the shape and structure of collections of relationships.
In a network perspective the world is not primarily composed of individuals (“nodes”, “vertices”, “entities”). Instead, a network approach focuses on relationships between individuals (“edges”, “ties”, “connections”, “links”).
When collections of connections are analyzed, network patterns emerge. Networks take a variety of shapes, and people occupy a variety of locations within each network. Some people are highly connected, for example, while most people have just a few connections.
Network theory provides a large body of mathematics that enables the measurement of these shapes and structures.
Using these measures, network analysis can identify key people in important locations in the network (for example: hubs, bridges, and islands). Network metrics also allow the network as a whole to be measured in terms of size and shape. Networks take many basic shapes, and we have found six to be common in internet and enterprise social media: divided, unified, fragmented, clustered, outward hub-and-spoke, and inward hub-and-spoke. These shapes are created when people make individual decisions about whom to reply to, link to, and like.
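The hub and bridge measures mentioned above are easy to experiment with outside NodeXL as well. Below is a minimal sketch using the networkx Python library (not part of NodeXL; the toy network and names are invented) that finds the person who bridges two otherwise disconnected clusters:

```python
import networkx as nx

# A toy network: two tight clusters joined by a single person, "dana".
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),    # cluster 1
    ("erin", "frank"), ("frank", "grace"), ("grace", "erin"),  # cluster 2
    ("carol", "dana"), ("dana", "erin"),                       # the bridge
])

degree = nx.degree_centrality(G)            # hubs: many direct ties
betweenness = nx.betweenness_centrality(G)  # bridges: on many shortest paths

broker = max(betweenness, key=betweenness.get)
print("top broker:", broker)  # dana spans the structural hole between clusters
```

High betweenness with modest degree is the classic signature of a broker: dana has only two ties, but every path between the two clusters runs through her.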
Divided networks are created when two groups of people talk about a controversial topic but do not connect to people in the “other” group. Unified networks are formed by small to medium-sized groups discussing obscure or professional topics; conference hashtags are a good example. Fragmented networks have few connections among the people in them: these are often people talking about a brand or a popular topic or event. Clusters sometimes grow among the people talking about a brand, indicating the existence of a brand “community”. Broadcast networks are formed when a prominent media figure is widely repeated by many audience members, forming a hub-and-spoke pattern with the spokes pointing inward at the hub. The final pattern is the opposite: a hub-and-spoke pattern in which the hub links out to a number of spokes. This pattern is generated by technical and customer support accounts, like those of computer and airline companies. Additional patterns may exist, but these patterns are prominent in many social media network data sets.
When applied to external conversations, social media networks help identify the “mayor” of a hashtag or topic: these are the people at the center of the network. Network maps can be compared to the six basic types of networks to understand the nature of the topic community. We can look for examples of successful social media efforts and map those topic networks. Social media managers can contrast their topics with those of their aspirational targets and measure the difference between where they are and where they want to be.
When applied to enterprise conversations and connections, network analysis can reveal the experts who answer many people’s questions and “brokers” who bridge otherwise disconnected groups as well as the “structural holes” that show where a bridge or link is needed.
These insights can be useful in mergers, HR evaluation of group and manager performance, and identifying internal subject matter experts.
Research performed using NodeXL shows that work teams that have higher levels of internal connection (which is called “network density”) have higher levels of performance and profit. See:
Wise, Sean Evan (2013). The impact of intragroup social network topology on group performance: understanding intra-organizational knowledge transfer through a social capital framework. PhD thesis, University of Glasgow. Full text available as PDF (2,499 KB): http://theses.gla.ac.uk/3793/
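Network density itself is a simple ratio: the number of ties present divided by the number of ties possible, which for an undirected network is 2E / (N(N-1)). A minimal sketch with invented data:

```python
def density(nodes, edges):
    """Fraction of possible undirected ties that are present: 2E / (N(N-1))."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0

# A four-person team with three of its six possible ties in place.
team = ["ann", "ben", "cal", "deb"]
ties = [("ann", "ben"), ("ben", "cal"), ("ann", "cal")]
print(density(team, ties))  # 0.5
```

A fully connected team scores 1.0; a team whose members never interact scores 0.0, which makes density a convenient single number for comparing work groups.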
I spend a lot of my time studying social media and the networks that form in them. But I have growing doubts about the time I spend on commercial services. Despite seeming like public spaces, these services are really not public.
Social media is increasingly the space in which public life takes place. News, debates, and discussions are now more likely to take place in Facebook, Twitter, and other social media services than in public squares, civic buildings, or community centers. Virtual public spaces fill the void created by the lack of public spaces and places in our cities and towns that allow for public mixing and interaction. But virtual public spaces are just that: virtual. They are not real public spaces, and the “virtual” public space they provide is neither “as good as” nor better than the real thing. Virtual public space lacks many of the features of real public space; it is not an upgrade.
Virtual public spaces try to seem like public spaces, but they are like shopping malls: commercial spaces that encourage only a subset of public behaviors. Raised in commercial spaces that have replaced public spaces, many people no longer even imagine behaviors that are not welcome in a mall. Protest, petitions, organizing, and protected speech have no place in a shopping mall. Some property owners allow some forms of speech, but no one but the owners has a “right” to speech in a mall. Shoppers, consumers, guests, customers, and visitors are not citizens while they are in a commercial space.
Virtual public spaces are not public spaces, but as we spend our public time in them, we drain the life from alternative public spaces. Our collective chatter in social media becomes the intellectual property of a company not a commonly owned public asset. Our history is not our history.
Social media services vary in how openly or restrictively they provide access to the data generated by their users.
Some services, like Wikipedia, are very open, offering many methods to access large and small amounts of data from recent or historical times.
Some services, like LinkedIn, are very closed, offering almost no access to any data from their service.
For many services, the lack of access to data is not an ideological choice; rather, it is a practical issue related to the costs of storing and serving large volumes of data. These companies are well within their rights to do as they like with their data and business plans.
However, their data is actually my data (and your data). We may soon realize that we prefer to commit our bits to repositories that hold and redistribute our content on terms that support civic goals of open access. What we need are credible alternatives to these services, with alternative funding models: perhaps a “Public Bit Service” or “National Public Retweet”?
The long-awaited (and delayed) change to the Twitter API is now here: API 1.1 is now the only service available, and the long-used API 1.0 is gone.
This has an impact on people who have been collecting and analyzing data from Twitter. Twitter has given and taken away with the new 1.1 API – mostly taken away. More tweets are sometimes available from the new API: up to 18,000, rather than the old 1,500-tweet limit. This is a big change, but ordinary users often see little benefit from the higher limit if the topic they are interested in has fewer tweets. The length of time tweets are retained and served is not much longer than it was.
The big change is the effective loss of the “Follows” edge. Some users of the 1.0 API used to be able to issue a significant number of queries asking whom each user followed. These queries generated data that allowed a network to be created based on which users followed which other users. The “Follows” network in Twitter has been very informative, pointing to the key people and groups in social media discussions. But now the “Follows” edge is effectively impossible to use.
Twitter API 1.1 limits queries about who follows whom to 60 per hour. In practice, a network may have several hundred or several thousand people in it, making a query for each person’s network of followers impractical. With the follows edge effectively gone, the remaining edges, “reply” and “mention”, become more important. These edges are far less common than the “Follows” edge: many people follow lots of other people but mention or directly reply to very few. With the loss of the Followers edge, Twitter networks can become very sparse, with few connections remaining. Dense structures give way to confetti.
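The arithmetic behind “impractical” is easy to check. Here is a back-of-the-envelope sketch; the 60 queries per hour figure comes from the limit described above, while the per-query page size and the average follower count are illustrative assumptions:

```python
import math

def crawl_hours(num_users, queries_per_hour=60, ids_per_query=5000,
                avg_followers=1000):
    """Rough hours needed to fetch every user's follower list.

    queries_per_hour reflects the 60-per-hour limit described above; the
    page size and average follower count are illustrative assumptions.
    """
    pages_per_user = max(1, math.ceil(avg_followers / ids_per_query))
    return num_users * pages_per_user / queries_per_hour

print(crawl_hours(1000))  # a 1,000-person list: roughly 17 hours
print(crawl_hours(100))   # even 100 people: well over an hour
```

And these are best-case numbers: any user with more followers than fit in a single query multiplies the page count, stretching a large crawl into days.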
Here is a map of the topic #scaladays with the Followers edges compared to the same map with no Follower edges:
With the “Follows” edges gone, the loss of insight into the nature of the network is profound, but not fatal. The reply and mention network does have some density in many discussions, allowing many kinds of network positions and structures to be observed. Edges can also be synthesized from other evidence: for example, a link could be created when two people use words in common that are not commonly used by others.
The NodeXL project has released a version that connects to the new Twitter API 1.1, and we will be releasing additional edge types that link people when they share content – hashtags, URLs, words, and word pairs – with other people. These shared-content edges are based on the presumption that when people use similar content that is rarely used by others, they are likely to have an underlying connection. The assumption that shared content use is a surrogate for the “follows” relationship requires additional testing (which will be difficult without access to the data that Twitter just removed). For now, these connections do restore density to networks that have been shattered by the loss of visibility into the Follows connection, and they can indicate common interests among Twitter users.
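One way such shared-content edges might be synthesized is to link users who share rare hashtags, weighting each shared tag by its rarity (an IDF-style weight, so tags everyone uses carry no signal). This is an illustrative sketch with invented data, not the forthcoming NodeXL feature:

```python
import math
from collections import defaultdict
from itertools import combinations

# Invented data: the set of hashtags each user has tweeted.
user_tags = {
    "alice": {"nodexl", "sna"},
    "bob":   {"nodexl", "bigdata"},
    "carol": {"bigdata"},
    "dana":  {"sna", "nodexl"},
}

# Invert: which users employ each tag. Rare tags carry the most signal.
tag_users = defaultdict(set)
for user, tags in user_tags.items():
    for tag in tags:
        tag_users[tag].add(user)

n = len(user_tags)
edges = defaultdict(float)
for tag, users in tag_users.items():
    if len(users) == n:  # used by everyone: no signal
        continue
    weight = math.log(n / len(users))  # IDF-style rarity weight
    for u, v in combinations(sorted(users), 2):
        edges[(u, v)] += weight

for pair, w in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(pair, round(w, 2))
```

In this toy data the strongest edge joins the two users who share the rarest tag, which is exactly the intuition behind treating shared content as a surrogate for a social tie.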
Bits exist along a gradient from private to public. But in practice they only move in one direction.
Thus, there are two destinies for information: public or oblivion.
Information wants to be copied.
This is not the same as information wanting to be free (or expensive), or information wanting *you* to be free. Information probably prefers to be free because it may increase the rate at which it is copied, not because it is inherently liberating to the user. In fact, the “free” quality of some information is probably not liberating at all. Copying and liberty are orthogonal.
Information diffuses over time: access rights to information can expand over time, but only rarely (ever?) does data become less available, and once available publicly, information is almost never entirely private again.
With enough copies on enough devices, information becomes essentially public. The state of being public comes in degrees: some things are more public than others. Much information is public in principle but enjoys security by obscurity. Obscurity is eroded by the increasing availability of computing resources that make collection and machine analysis affordable at large scales. The banality of data is no protection. “No one cares what I think/do/say/click” is not a valid assumption. In aggregate, the banal is data and fuel for many business models. Maybe no one *cares* what you tweet, click, buy, or search for, but many businesses make it their business to aggregate these scattered faint signals and build detailed profiles to drive commerce and customized views of data.
Some information is destroyed, never to be recovered. This is the only way information can avoid eventually (potentially) becoming public. But less and less data now meets this fate. Delete is a declining feature of many systems.
Information that is not public and has not yet been destroyed is just waiting to change to either state.
Despite security systems, many private bits are eventually exposed: people pass material to someone else who then accidentally makes it public, or they do so unintentionally themselves by leaving files in publicly accessible locations that are visited by search engine spiders and other web crawlers. Even professionally managed private data repositories are subject to subsequent distribution, infiltration, or error. Data spills are becoming more common. Billions of records are hemorrhaged into the public regularly. If well-funded organizations cannot secure their information, the rest of us should take note.
It may not be possible for big organizations, or any organization, to secure their networks, or even to do so effectively enough to give users a practical period of privacy, however short. Eventually, private bits, even when encrypted (no matter how well), become public, because the march of computing power makes their encryption increasingly trivial to break, and their exchange over networks (no matter how well secured) is subject to leaking, intentional and otherwise. Private bits may have only a “half-life” during which they retain their non-public existence. The length of this half-life may itself be getting shorter. Mary Branscome suggests that there could be a physical law in operation: the natural entropy of access control lists?
All bits that persist are destined to be public, and once public never to be private again. Unless they are destroyed.
I argue that the only bits that you cannot find are the ones you need right now. The only bits you cannot get rid of are the ones that are most embarrassing to you right now. Just because you cannot find the bits you want does not mean that no one else can find those bits.
This issue is becoming more important as we are invited to use systems that promise selective sharing of data, while other tools generate ever more data to potentially share. Anything that puts your bits into the cloud promises selective sharing. I believe and hope my much-beloved Dropbox account is separate from all the others, except for the ones I choose to share. And I think it is, except for that glitch they had, the details of which elude me (but I think we’re good now, and I so depend on Dropbox that I do not know what I would do without it). But all these walls are just made out of a few lines of business logic and an access control list. ACLs rule our access to digital objects with an iron fist, until they don’t, for the many human and technical reasons mentioned. Like most human infrastructures, these selective sharing mechanisms are subject to failure and attack.
Now new sources of data, captured from the details of everyday life by sensors and services, are increasingly recorded by external systems and by people themselves, generating new streams of archival material richer than all but the most obsessively observed biographies.
Some steps are still in progress: when my phone notices your phone, a new set of mobile social software applications becomes possible, as whole populations capture data about other people while beaconing their identities to one another. Additional sensors will collect ever more medical data with the intent of improving our health and safety, as early adopters in the “Quantified Self” movement make clear.
But the consequences of data diffusion are becoming difficult to predict. Social media systems are being linked to one another to enable cascades of events to be triggered from a single message as status updates are passed among Facebook, LinkedIn, Twitter, and blogs. Tools now automatically aggregate the results of searches and post articles that themselves may trigger other events. Taking a photo or updating a status message can now set off a series of unpredictable events.
Add potential improvements in audio and facial recognition and a new world of continuous observation and publication emerges. Some benefits, like those displayed by the Google Flu tracking system, illustrate the potential for insight from aggregated sensor data. More exploitative applications are also likely.
Therefore, all services that promote the idea of “selective sharing” are selling a myth. The more you trust that information you generate can be contained, the more potential there is for an “explosive decompression” as data intended for an individual or a small group becomes suddenly available to a large group or a complete population. Private bits are in a state of high potential energy, always poised to become public.
Engineering is the science, art, and practice of containing and directing forces. Information system engineers might be up to the challenge of delivering selective sharing. Combined with law, regulation, and social practices, technology could make selective sharing real, the way engineers manage powerful but dangerous flows of high-pressure steam through power plants. However, even high-pressure steam engineers working with nuclear fuels have recently faced very bad failure conditions beyond their predicted scope. Information technologists may face analogous issues when managing high-pressure containers of selectively shared information.
My policy is not to give up all forms of privacy: I still keep my email and other data behind passwords that I do not (knowingly) share. I share lots of pictures on Flickr, but not all of them are public. I would prefer to keep lots of financial, medical, and personal material selectively shared. I’d like these features to work.
But I have started to understand that my data is likely to be open to others, if not now then some day — and probably sooner than I expect. The net/cloud holds a good sized and growing chunk of my digital life and I would like selective sharing features (if I could handle the cognitive tax of managing them). I just do not believe it is a reasonable expectation. In a world of increasing interconnection and unifying name and search spaces, data may not be something you can keep local for long.
Tools that suggest that we can reliably segregate content and limit its diffusion are suggesting that water does not roll down hill. Those who believe that are likely to get wet.