Legible Data

[Edited 8/20 to add my talk crib notes and video below.]

I had the honor of speaking in a session about a Crash Course in Digital Literacy at The Conference in Malmö this week. It got written up in Wired UK, which provides a really good summary of my talk.

I argued that in order to develop digital literacy, we need to take the first step towards making data legible. To do so, I outlined how platforms, plugins, and personal interventions are allowing us to read the data and algorithms around us, and are teaching us how to interrogate our digital environments. 

The session was intended to be practical and grounded in applied tools, so I put together a brief list of resources for those listening in and interested in following up. My crib notes and the video are below.  Also, it's totally worth checking out my fellow session presenters Ingrid Burrington on the infrastructure of the internet (with gifs!) and Mattan Griffel on learning to code. There's also a joint Q&A from our session. And there were a lot of other great talks throughout the conference that are archived and worth checking out.

I've had a great time at The Conference, and I'm so thankful for the invitation and to the conference organizers and wonderful people I've met in Malmö this week. And I'm so impressed with how quickly their team got this slick video up. They are machines!

LEGIBLE DATA: TOWARDS DIGITAL LITERACY

We’re here to talk about digital literacy today, which is really about developing skills for reading and interpreting the digital world around us. Increasingly, our lives are made up of data, and processed by algorithms. Sensors and computing cycles are turning our activities and behaviors into data.

The most basic unit of the digital is rendered as ones and zeroes; it’s how computers read data. But it’s usually not human-legible. The way we encounter most of the algorithms and data in our everyday lives tends to be obscured.

The idea behind this session, digital literacy, implies a certain kind of comprehension. But what is digital literacy if the data is hidden? Does anyone here recognize what this says?

Translated from binary, these ones and zeros turn into the most fundamental elements of human literacy—the alphabet. But I had to run them through a processor to make them human-legible. Maybe a few of you recognized the binary, but even those who code don’t spend a lot of time parsing ones and zeros so close to the metal.
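
To make that translation concrete, here is the same trick in a few lines of Python. The bytes below are my own example, not the ones from the slide:

```python
# Decoding 8-bit binary into human-legible letters, one byte at a time.
bits = "01000001 01000010 01000011"  # three example bytes
text = "".join(chr(int(byte, 2)) for byte in bits.split())
print(text)  # -> ABC
```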

People often say “I have nothing to hide.” But I’m afraid that’s only because we’re not familiar enough with the ways data is being used around us. Technology is digitally codifying our behaviors for those who have the power to collect and process the data, often leaving us in the dark outside the black box.

In order to work towards digital literacy, I argue that the data first needs to be clear enough for us to read. We need to learn our digital ABC’s before we can move on to words and phrases and complex stories. Only then can we work on developing new digital literacies to understand what the data is telling us, and how others see us.

So today, I’m going to walk through how internet platforms, browser plugins, and individual people are making data and algorithms more legible to human readers.

As this session is intended to be practical and grounded, I’ve put together a bunch of resources and links, which are listed above.

PLATFORMS

The places with the greatest consolidation of our data also have the most power over our digital experiences. Some of them have begun to acknowledge their responsibility to consumers by introducing new transparency tools. Google, Facebook, and even data brokers like Acxiom have recently developed dashboards that reflect back to us the data they have collected about our behaviors and interests.

Google offers a detailed list of all the things it thinks we might be interested in, based on our search and browser history. We can take a look by going to the Google Ads Settings page in our profile.

When I looked at my own profile recently, I saw Google thought I was interested in home furnishings, and I opted to remove it from my profile. Of course, keeping track of these settings is time-consuming for the average user, but at least Google offers an opportunity to explore what our accumulated online habits say about us.

This summer, Facebook introduced a new interactive feature that allows us to click on any ad in our feed. I have the option to tell Facebook that I don’t want to see a particular ad. And more importantly, it also gives me a new explanation about the profile information that the ad might be based on. In this example, an ad for home theater equipment is targeted to people who express interest in television shows on Facebook.

Facebook is taking important steps to draw a direct line between the type of ads we see and the personal data on which they are based. But Facebook reminds us that this is only “one of the reasons” we see an ad. This explanation risks oversimplifying the complexity that goes into displaying any given ad.

Acxiom is one of the largest data brokers in the world, aggregating information from online behaviors, surveys, and even public records. They have responded to consumer protection concerns about their powerful database by developing a personal portal view into our data profiles. Aboutthedata.com shows us everything from household purchasing history, to education, to inferred marital status. When I looked at my own profile, for some reason they associated me with my parents’ driving registration records, so they thought I might be interested in purchasing a truck soon. I have the opportunity to correct that data point through the portal.

But aboutthedata.com doesn't tell me anything about how Acxiom classifies me when it bundles me with other consumers to sell me to potential advertisers. I don't know how that truck detail affects my consumer segmentation. And I certainly don't get to see how Acxiom's data broker customers intend to use my data. There's no link into the larger ecosystem to follow where my data trail ends up.

Each of these platforms has taken important steps towards becoming more transparent and making data at least partially legible to consumers. But in each case, the data they provide offers only a partial view. We get to see what the data is, but not necessarily how it’s being used.

We are also confronted with a catch-22. We're invited to correct faulty assumptions and correlations by changing our profiles and updating our details to better reflect our intentions. But that also requires offering up more data to these companies. So before I'm convinced to do that, I think we need more assurances about how the data is being used. I want to be able to follow the impacts of those alterations.

PLUGINS

Going beyond what internet companies tell us about our profiles, some developers and researchers have started to build simple tools and interfaces to make data more legible. Sometimes, all it takes is installing a browser plugin, or granting an application API access, to make hidden data visible to us. These are some of the best tools we have as users to begin to see our data in the same way that companies, third parties, and governments might see it.

After the Snowden revelations introduced us all to the concept of metadata, we started to wonder how much meaning could be uncovered from information about our communications habits. A group of MIT researchers released their tool, Immersion, which taps into our email metadata from Gmail and a few other providers. Interactive visualizations allow us to examine our network and personal timeline.

When I explored my own history, Immersion resurfaced a network of old roommates, and I was reminded of the history of important relationships as I moved cities and changed jobs. Seeing so much personal narrative in the interactive experience makes it all the more clear how meaningful this metadata might be to someone else.
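
You don’t need Immersion itself to get a feel for this. Here is a minimal sketch of the same idea, counting correspondents from nothing but message headers. It assumes a local mail export in mbox format; "inbox.mbox" is a placeholder path, not a real file:

```python
# A minimal sketch of Immersion-style metadata analysis: count who we
# correspond with using only the From/To headers, never the bodies.
import mailbox
from collections import Counter
from email.utils import getaddresses

contacts = Counter()
for msg in mailbox.mbox("inbox.mbox"):  # placeholder path to a mail export
    headers = msg.get_all("From", []) + msg.get_all("To", [])
    for _name, addr in getaddresses(headers):
        if addr:
            contacts[addr.lower()] += 1

# The ten people the metadata alone says matter most.
for addr, count in contacts.most_common(10):
    print(f"{count:5d}  {addr}")
```

Even this crude count starts to reconstruct relationships; Immersion layers a network visualization and a timeline on top of the same headers.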

Each time we visit a website, cookies are dropped in our browser that report our activity back to advertisers and other data broker networks. Ghostery is a browser plugin that shows us just how many third parties are listening in as a website loads. In this example, cookies from seven different companies were dropped just by visiting the front page of the New York Times. The plugin takes what’s hidden behind the browser and makes it visible. Ghostery allows us to start questioning the business practices of companies like ScoreCard and WebTrends. Every time a page loads, we are made more conscious of the broader data ecosystem.
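
For a rough sense of what Ghostery surfaces, a few lines of standard-library Python can list the third-party hosts a page references in its HTML. This is only an approximation: many trackers are injected by scripts after the page loads, some sites block scripted requests, and the URL here is just an illustration:

```python
# A rough approximation of Ghostery's view: list the third-party hosts
# referenced in a page's HTML. Undercounts trackers added after load.
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen

class ResourceParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http"):
                self.hosts.add(urlparse(value).netloc)

url = "https://www.nytimes.com/"  # illustrative; any page works
first_party = urlparse(url).netloc
parser = ResourceParser()
parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))

site_domain = ".".join(first_party.split(".")[-2:])  # crude eTLD+1 match
third_parties = {h for h in parser.hosts if not h.endswith(site_domain)}
print(f"{len(third_parties)} third-party hosts referenced by {first_party}:")
for host in sorted(third_parties):
    print(" ", host)
```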

These cookies follow us everywhere we go on the web, and they are part of a larger network. Collusion is another browser plugin that displays the connections between the websites and the third parties listening in. By mapping out the network, we get to see where the data about our behaviors intersect and possibly influence each other.
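
The Collusion-style view is essentially the same picture turned inside out: index by tracker instead of by site, and see which third parties span our browsing. A toy sketch, with made-up observations standing in for real browsing data:

```python
# Toy Collusion-style inversion: which trackers appear across sites?
# The observations below are invented purely for illustration.
from collections import defaultdict

observations = {
    "news-site.example": {"adnet.example", "scorecard.example"},
    "shopping-site.example": {"adnet.example", "pixel.example"},
    "social-site.example": {"adnet.example", "scorecard.example"},
}

tracker_to_sites = defaultdict(set)
for site, trackers in observations.items():
    for tracker in trackers:
        tracker_to_sites[tracker].add(site)

# Trackers that follow us across the most sites come first.
for tracker, sites in sorted(tracker_to_sites.items(), key=lambda kv: -len(kv[1])):
    print(f"{tracker}: seen on {len(sites)} sites ({', '.join(sorted(sites))})")
```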

With each of these plugins, I have to trust that their creators won’t misuse my data after I grant them access. We may learn from making our data legible in new ways, but it also introduces a new point of exposure of our digital activities.

PERSONAL INTERVENTIONS

We don’t always need special tools like plugins to begin to experiment with our digital environments. We can start by manipulating the inputs and watching the outputs of our data. I think of these as personal data interventions: small-scale experiments that expose the seams of the algorithm. At the more advanced end of the spectrum, some researchers and journalists are pulling stunts with their personal data, and in the process they teach us how to critically interrogate the system for ourselves.

Are we human-readable, or machine-readable? Most of our friends know who we are on Facebook. So it’s easy to play with sneaky ways to throw advertisers off our scent by introducing false information into our digital profiles. Think of it as a digital version of the game of two truths and a lie. I have seen friends declare themselves the opposite sex, visit random sites that don’t match their hobbies, or plant humorous interests in their profiles to see how these planted data points affect their feeds or ads. It’s easier to catch the effects that trickle down from a little data lie, because a planted falsehood grabs our attention more than content that subtly matches our authentic history and behavior. Still, it’s hard to know where the harms might be in playing with our data profiles like this.

Mat Honan, a writer for Wired, recently took his Facebook experimentation a step further. He wanted to see what would happen if he liked literally everything that came through his Facebook feed for 48 hours. He wrote, “After checking in and liking a bunch of stuff over the course of an hour, there were no human beings in my feed anymore.” When his friends started to message him, worried that his account had been hacked, he also discovered the network impacts of his activity on his friends’ feeds. Honan’s experiment essentially broke the algorithm, rendering Facebook almost unusable. This scale of intervention isn’t practical for most; it was a stunt. But we can all play with being more conscious of what we choose to like and not like, and see how it affects our own feeds.

Knowing how important key life events are to advertisers, Janet Vertesi wanted to see if she could successfully hide her pregnancy from the internet. In a talk at Theorizing the Web this spring, she detailed her efforts to mask any behavior that suggested the coming big change in her life, using everything from Tor to hide her browsing history, to paying cash for gift cards to avoid using her credit cards. She describes this project as an infrastructural inversion. It’s another extreme case, but it shows us just how far we might have to go to address the extent of our digital exposure.

With platforms, plugins, and personal data interventions, we are just beginning to scratch the surface. This serves as a brief introduction to the tools and techniques available to us for making data legible. Platforms are starting to change their posture towards us and to take our concerns seriously. Plugins are starting to help us explore the hidden ecosystem of data. And even at a small scale, we’re individually beginning to intervene in our own digital profiles and experiences, and learning about the systems along the way.

LEGIBILITY -> LITERACY -> INTERPRETATION

We are learning to become fluent in digital. Legibility is the first step towards clearly seeing the data and algorithms. We can start to build our vocabulary, sounding out the letters and phrases as we go. But learning to read isn’t the end goal.

Eventually we will need to develop even more advanced skills of interpretation. Only once we have those skills can we truly comprehend and understand the text of our digital lives.

COMPUTATIONAL THINKING

Moving on from reading, we can learn the form by trying to write, too. Learning to code is a step towards correcting the imbalance of power in our relationship with data. Some of us here will develop these skills in statistics, mapping networks, and manipulating APIs (and I believe Mattan will be talking more about that later in this session).

Just because we can all read doesn’t mean that we’re all cut out to write the next great digital novel. We won’t all learn to read binary, and we may not all need to learn JavaScript or Python either. But we can develop our skills in computational thinking. As a recent Mother Jones piece argues: “The greatest contribution the young programmers bring isn't the software they write. It's the way they think.”

Computational thinking empowers us to interrogate the data and algorithms that govern our lives. By poking and prodding data inputs and outputs, we are learning to reverse engineer these systems, or at the very least we’re practicing how to form hypotheses about how data is being used. The more theories we explore, the closer we get to holding others accountable for the uses of our data.
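
As a toy illustration of that poking and prodding: treat a scoring system as a black box, flip one input at a time, and watch what moves the output. The black_box function below is invented for the example; the point is the probing pattern, not the model:

```python
# Probing a black box by perturbing one input at a time. black_box() is
# a made-up stand-in for any opaque system (an ad feed, a recommender)
# that we can only observe from the outside.
def black_box(profile):
    score = 0.0
    if profile["likes_trucks"]:
        score += 0.6
    if profile["lives_in_city"]:
        score -= 0.2
    return score

baseline = {"likes_trucks": False, "lives_in_city": True}
base_score = black_box(baseline)

for feature in baseline:
    probe = dict(baseline)
    probe[feature] = not probe[feature]  # flip exactly one data point
    delta = black_box(probe) - base_score
    print(f"Flipping {feature!r} moves the score by {delta:+.2f}")
```

The output suggests hypotheses about which data points the system weighs most, which is exactly the habit of mind this kind of experimentation builds.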

CRITICAL DIGITAL LITERACY

I want to leave you all with a few priorities we can focus on to get us closer to developing critical digital literacy.

First, we need to demand more of the digital platforms that manage our data. We have to keep asking Facebook, Google, and everyone else to provide the tools to tweak our personalized feeds and experiment with our own digital experiences. Right now they are offering us only a meager vocabulary list to learn from. We need them to put the pieces of our data story together to form a narrative.

And we still need more tools to help inform our choices about our digital engagement. Designers and engineers in the audience, I encourage you all to find new ways of incorporating digital legibility and literacy into the things that you build. Open up the black box and allow your users to play with features and filters. Make the causal connections more clear. If you have the coding skills, build tools that empower others to intervene in and interpret their own data, as well.

And as individuals, we can all become a little more curious. Think critically about how that next click might influence our future experience. Question the underlying premise of the default settings, and the business models that they support. We don’t have to take everything for granted; we can develop digital literacy just by being curious.

Thank you!