The Myth of Selective Sharing: Why all bits will eventually be public (or be destroyed)

One Way

Bits exist along a gradient from private to public.  But in practice they only move in one direction.

Thus, there are two destinies for information: public or oblivion.

Information wants to be copied.

This is not the same as information wanting to be free (or expensive), or information wanting *you* to be free.  Information probably prefers to be free because being free may increase the rate at which it is copied, not because it is inherently liberating to the user.  In fact, the “free” quality of some information is probably not liberating at all.  Copying and liberty are orthogonal.

Information diffuses over time: access rights to information can expand over time, but only rarely (ever?) does data become less available, and once available publicly, information is almost never entirely private again.

With enough copies on enough devices, information becomes essentially public. The state of being public comes in degrees: some things are more public than others.  Much information is public in principle but enjoys security by obscurity. Obscurity is eroded by the increasing availability of computing resources that make collection and machine analysis affordable at large scales.  The banality of data is no protection.  “No one cares what I think/do/say/click” is not a valid assumption.  In aggregate the banal is data and fuel to many business models.  Maybe no one *cares* what you tweet, click, buy or search for, but many businesses make it their business to aggregate these scattered faint signals and build detailed profiles to drive commerce and customized views of data.

Some information is destroyed, never to be recovered.  This is the only way information can avoid eventually (potentially) becoming public. But less and less data now meets this fate.  Delete is a declining feature of many systems.

Information that is not public and has not yet been destroyed is just waiting to change to either state.

Despite security systems, many private bits are eventually exposed when people pass material to someone else who then accidentally makes it public, or when they expose it themselves by leaving files in publicly accessible locations that are visited by search engine spiders and other web crawlers.  Even professionally managed private data repositories are subject to subsequent distribution, infiltration or error. Data spills are becoming more common. Billions of records are hemorrhaged into the public regularly.  If well-funded organizations cannot secure their information, the rest of us should take note.

It may not be possible for big organizations, or any organization, to secure their networks, or even to secure them well enough to give users a practical period of privacy, however short.  Eventually private bits, even when encrypted (no matter how well), become public because the march of computing power makes their encryption increasingly trivial to break and their exchange over networks (no matter how well secured) is subject to leaking, intentional and otherwise.  Private bits may only have a “half-life” during which they retain their non-public existence.  The length of this half-life may itself be getting shorter.   Mary Branscome suggests that there could be a physical law in operation: the natural entropy of access control lists?
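
To make the “half-life” metaphor concrete, here is the standard decay formula applied to private bits. This is only an illustration: the symbols $P(t)$ (the fraction of bits still private after time $t$) and $T$ (the half-life) are assumptions introduced here, and nothing in this post measures either one.

```latex
% Assumed model: a constant half-life T for private bits.
% P(t) is the fraction of bits that are still private after time t.
P(t) = \left(\tfrac{1}{2}\right)^{t/T}
% Example: after three half-lives, t = 3T, so P = (1/2)^3 = 1/8 = 12.5\%.
```

If the half-life is itself shrinking, as suggested above, the real curve would fall faster than this constant-$T$ version.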

All bits that persist are destined to be public, and once public never to be private again. Unless they are destroyed.

I argue that the only bits that you cannot find are the ones you need right now. The only bits you cannot get rid of are the ones that are most embarrassing to you right now.  Just because you cannot find the bits you want does not mean that no one else can find those bits.

All your bits are belong to us.

This issue is getting more important as we are invited to use systems that promise selective sharing of data, while other tools generate ever more data to potentially share.  Anything that puts your bits into the cloud promises selective sharing.  I believe and hope my much beloved Dropbox account is separate from all the others, except for the ones I choose to share with. And I think it is, except for that glitch they had, the details of which elude me (but I think we’re good now, and I so depend on Dropbox I do not know what I would do without it). But all these walls are just made out of a few lines of business logic and an Access Control List. ACLs rule our access to digital objects with an iron fist until they don’t, for the many human and technical reasons mentioned.  Like most human infrastructures these selective sharing mechanisms are subject to failure and attack.
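
To show how thin such a wall can be, here is a minimal sketch of an access control list in Python. It is not how Dropbox or any real service works; the class `ACL`, its methods, and the file and user names are all hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: ACL, share, can_read and all names below are
# illustrative assumptions, not any real service's API.
from dataclasses import dataclass, field


@dataclass
class ACL:
    """Maps each object name to the set of users allowed to read it."""
    readers: dict[str, set[str]] = field(default_factory=dict)

    def share(self, obj: str, user: str) -> None:
        # Grant read access to one more user.
        self.readers.setdefault(obj, set()).add(user)

    def can_read(self, obj: str, user: str) -> bool:
        # Deny by default: unknown objects and unlisted users get nothing.
        return user in self.readers.get(obj, set())


acl = ACL()
acl.share("taxes.pdf", "me")
acl.share("taxes.pdf", "accountant")

print(acl.can_read("taxes.pdf", "accountant"))  # True
print(acl.can_read("taxes.pdf", "stranger"))    # False

# One mistaken call is all it takes to widen the audience.
acl.share("taxes.pdf", "everyone@example.com")
```

The whole wall is one set-membership test. A single wrong entry, whether from a bug, a misclick, or an attacker, changes who is inside it, and nothing in the list can recall copies already made.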

Now the details of everyday life, captured by sensors and services, are increasingly recorded by external systems and by people themselves, generating new streams of archival material that is richer than all but the most obsessively observed biographies.

Many organizations are adopting social media and creating data sets that can map their internal social network structure as an accidental by-product of their communication practices.  Studying these data sets is a focus of growing interest.  Research projects like SenseCam are now becoming products and existing services like MingleSticks, Poken, FourSquare, and Google Latitude already deliver many of these features. Devices like iPhone and Android phones are weaving location information into every application.

Some steps are still in progress: when my phone notices your phone, a new set of mobile social software applications becomes possible as whole populations capture data about other people as they beacon their identities to one another. Additional sensors will collect ever more medical data with the intent of improving our health and safety, as early adopters in the “Quantified Self” movement make clear.

But the consequences of data diffusion are becoming difficult to predict.  Social media systems are being linked to one another to enable cascades of events to be triggered from a single message as status updates are passed among Facebook, LinkedIn, Twitter, and blogs.  Tools now automatically aggregate the results of searches and post articles that themselves may trigger other events.  Taking a photo or updating a status message can now set off a series of unpredictable events.

Add potential improvements in audio and facial recognition and a new world of continuous observation and publication emerges.  Some benefits, like those displayed by the Google Flu tracking system, illustrate the potential for insight from aggregated sensor data.  More exploitative applications are also likely.

The result will be lives that are more publicly displayed than ever before.  The collapse of roles, as the sociologist Erving Goffman described them, into the “lowest common denominator culture” described by Bernie Hogan (listen starting at about 40 minutes, but the entire talk is good and worth a listen) may be one consequence: we are interacting with everyone when we interact with anyone.  Secret shared meanings may still be possible, but selectively shared bits are not, at least not very reliably so in the short term and almost certainly not in the medium term.

Therefore, all services that promote the idea of “selective sharing” are selling a myth.  The more you trust that information you generate can be contained, the more potential there is for an “explosive decompression” as data intended for an individual or a small group becomes suddenly available to a large group or a complete population. Private bits are in a state of high potential energy, always poised to become public.

Engineering is the science, art and practice of containing and directing forces. Information system engineers might be up to the challenge of delivering selective sharing.  And when combined with law, regulation and social practices, technology could make selective sharing real the way that engineers manage powerful but dangerous flows of high-pressure steam through power plants.  However, recently even high-pressure steam engineers working with nuclear fuels have faced some very bad failure conditions beyond their predicted scope.  Information technologists may face analogous issues when managing high-pressure containers of selectively shared information.

My policy is not to give up all forms of privacy. I still keep my email and other data behind passwords that I do not (knowingly) share.  I share lots of pictures on Flickr but not all of them are public.  I would prefer to keep lots of financial, medical, and personal stuff selectively shared.  I’d like these features to work.

But I have started to understand that my data is likely to be open to others, if not now then some day, and probably sooner than I expect. The net/cloud holds a good-sized and growing chunk of my digital life and I would like selective sharing features (if I could handle the cognitive tax of managing them).  I just do not believe that reliable selective sharing is a reasonable expectation.  In a world of increasing interconnection and unifying name and search spaces, data may not be something you can keep local for long.

Tools that suggest that we can reliably segregate content and limit its diffusion are suggesting that water does not roll downhill.  Those who believe that are likely to get wet.

11 thoughts on “The Myth of Selective Sharing: Why all bits will eventually be public (or be destroyed)”

  1. Besides data that is made public or destroyed, there is data that is constantly being updated. Old snapshots of that data will never be accessible unless someone created a snapshot of them … and because modifications are so numerous and frequent, it is hard to snapshot data at all interesting points.

    If I want to see, for example, what location a business was at five years ago, the easiest way to do that is to find a 5-year-old (paper) phone book, not to search the Web. Web sites that provide information are almost entirely oriented to providing current information. While they may provide aggregate data from the past, they do not provide any way to retrieve an individual data item from any point in the past.

  2. Old snapshots of data are available to some, so maybe not public, but still not private.

    Examples: Internet Archive generates a public set of time based snapshots of the web:

    Google and others certainly retain time stamped views of the web – the web history feature stretches back for me to 2006 – See:

    Of course, the United States Library of Congress has its own copy of a selected catalog of public tweets, making them more public:

  3. i am not in disagreement with the fundamental point of this post, that all bits can potentially make their way to potentially anyone. however, this MUST be couched in the reality that the vast, vast majority of data remains obscure and unnoticed. in fact, it is the very process that makes all data public that creates the massive data smog that makes each individual data point largely hidden. as baudrillard might say, if everything is public, then nothing is public (though, i wouldn’t go that far).

    i very recently wrote an essay warning of just this one-sided view in the “everything is public now” articles that are coming by the day recently. i think it sensationalizes the risk of sharing information online, and that this sensationalism furthers the stigma of digital imperfections (a stigma that we know more greatly affects those who are not white males). read the post here:

  4. I suggest that obscurity and scale are no longer meaningful obstacles. The web is big, but not that big. Capturing lots of data is no longer as capital intensive or demanding as it was. There are lots of people copying large amounts of social media for all sorts of purposes.

    I argue not that “everything is public now”, rather that: “(almost) everything will be more public soon”. This is simply because information tends to become more public over time and rarely becomes inaccessible once widely accessible.

  5. Mark, Good article.
    Indeed sounds like second law of thermodynamics for bits. Even though they are logical entities, they obey physical properties. Who would have thunk it?
    Still, you rely on DropBox, Google Docs, and host all your email online, and many other online services whom you sort-of trust with your data (in near future, your medical records) to really store data you would otherwise store on your computer (or in your home safe) in a way that only you and those you authorize could have access to such data. Right?
    Yes, human engineering is what protects those systems (implemented by computer/software/network systems designed by imperfect humans). Nevertheless, the exception is that this data becomes public – those are notable and terrifying exceptions, but the exceptions nonetheless.
    One way to think about it is the risk of flying… we all do it every day despite the risk that some of us might fall out of the sky. It might be that our bits really are kept private on these online services with as much (or even better) certainty than our safety in the sky. It’s like the irony that more people die from accidents on the way to/from an airport than on the actual flights they take. Yet people fear the flight more than the drive.

    So perhaps we are overstating things? (as I suggest on my blog post: ) I think not. It’s important for the public to know where the risks are, and to learn how to minimize, not eliminate, these risks. The more people understand the risks, the flow of data, and the consequences, the more they can reduce their risk in this department.

    Nice post.
