This first appeared in the Internet Monitor project's second annual report, Internet Monitor 2014: Reflections on the Digital World. Check it out for more from my Berkman colleagues on the interplay between technological platforms and policy; growing tensions between protecting personal privacy and using big data for social good; the implications of digital communications tools for public discourse and collective action; and current debates around the future of Internet governance.
What would it take to map the Internet? Not just the links connecting the web of sites to each other, or some map of the network of networks. That’s hard enough in itself.
What if we were to map the flows of data around the Internet? Not just the delivery of packets, but what those packets contain, where they propagate, how they are passed on, and to what ends they are used.
Between our browser history, cookies, social platforms, sensors, brokers, and beyond, there are myriad parties with economic interests in our data. How those parties interconnect and trade in our data is, for the most part, opaque to us.
The data ecosystem mirrors the structure of the Internet. No single body has dominion or a totalizing view over the flows of information. That also means that no one body is accountable for data’s quality, or for keeping track of it as it changes hands and contexts.
Data-driven companies like Facebook, Google, Acxiom, and others are building out their proprietary walled gardens of data. They do everything they can to protect privacy and security while keeping control over their greatest assets. Still, they aren’t held accountable for the ads individuals purchase and target on their platforms, or for tertiary uses of data once it leaves their kingdom.
Complexity obscures causality. So many variables are fed into the algorithm and spit back out on a personalized, transient platform that no one can tell you exactly why you saw one post over another in the feed, or that retargeted ad over this one. We conjure up plausible explanations and grasp at the folk theories engineers offer to explain their outputs.
We have given data so much authority without any of the accountability we need to have confidence in its legitimacy to govern our lives.
As everything, refrigerators and crockpots included, joins the Internet and expands the ecosystem of data that runs on top of it, everything will leave a data trail. Going forward, we have to assume that whatever can be codified and digitized will become data. What matters is how that data will be used, now and in the future.
The potential harms are hard to pin down, primarily because we won’t know when they are happening. We can’t investigate discrimination that replaces pre-digital prejudice markers like race and sex with proxies correlated from behavioral data. And we run into invisible walls based on statistical assumptions that anticipate our needs but get us wrong if we fall outside the curve. It’s nearly impossible to catch these slights and even harder to develop normative stances on grounds we cannot see.
Before we can start to discuss normative judgments about the appropriate uses of data, we have to understand the extent of what is technically possible. We cannot hope to regulate the misuse of data without the means to hold all interconnected parties accountable for its uses and flows.
We need to map these relationships and data patterns. Who are the parties involved? How are they collecting, cleansing, inferring and interpreting data? To what ends is the data being used?
Linked Data is one technical solution to this problem. Standards make data flows both machine readable and human legible. Policies that travel as metadata are another approach to distributed accountability. We can also hold some of the largest brokers and users of data to higher standards of ethics. But markets of users won’t move against these systems until we have a better map of the ecosystem.
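To make the “policies that travel as metadata” idea a bit more concrete, here is a minimal sketch in Python. It is illustrative only, not any existing standard or product: all of the class names, policy fields, and actors are hypothetical. The point is simply that a record can carry its own machine-readable usage terms, and that each downstream party can be expected to check those terms and log its hand-off before using the data.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePolicy:
    """Machine-readable terms that travel with the data they describe."""
    allowed_purposes: set[str]                  # e.g. {"analytics"} but not "ad-targeting"
    may_share_with_third_parties: bool
    provenance: list[str] = field(default_factory=list)  # who has handled the record


@dataclass
class DataRecord:
    payload: dict
    policy: UsagePolicy


def use_record(record: DataRecord, purpose: str, actor: str) -> dict:
    """A downstream consumer checks the attached policy before using the data."""
    if purpose not in record.policy.allowed_purposes:
        raise PermissionError(f"{actor} may not use this record for {purpose!r}")
    record.policy.provenance.append(actor)  # accountability: log every hand-off
    return record.payload


# Example: a record whose policy permits analytics but not ad targeting.
record = DataRecord(
    payload={"user": "alice", "pages_viewed": 42},
    policy=UsagePolicy(allowed_purposes={"analytics"},
                       may_share_with_third_parties=False),
)

use_record(record, purpose="analytics", actor="first-party-site")   # allowed
# use_record(record, purpose="ad-targeting", actor="data-broker")   # would raise
```

Of course, a scheme like this only works if every party in the chain actually honors the metadata it receives, which is exactly the distributed-accountability gap described above.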
Read more in the Berkman Center’s Internet Monitor 2014: Reflections on the Digital World.