I had the privilege of speaking at the Berkman Lunch Series this week and talked about my ideas for telling more personal stories about our relationship to data to ground our understanding in more practical, everyday lived experience. The way I see the problem is that right now we don’t understand the causal relationship between our data and its uses in the world. My talk sets up a few examples that I’ve seen recently that both exposed and obscured what my data says about me and how it’s being used. I talk about why understanding data in our everyday lives matters more than ever, and I set up what personal stories can do to help us. I walk through a few canonical examples, and then end with a pitch for a column to tell these stories on a regular basis. Please send me your ideas, strange screengrabs, and questions—this is just the beginning of an effort to make data and its uses more legible to us.
The video is embedded here, and I’ve also posted my crib notes with links below if you’d prefer to read or want to follow up on some of the examples.
It is also on Soundcloud, in case you’d prefer to podcast the talk.
This talk is a reflection of a lot of the work I’ve been thinking about here as a fellow, but it’s also a kind of proposal for future work, so I’m very much interested in feedback from the braintrust here in the room and watching on the web.
The main idea is that we need more stories that ground data in personal, everyday experience. We need personal data stories to make data uses intelligible and impacts personal.
I wanted to start off by talking about what I do and what I do not know about myself as other entities see me through my data.
Facebook’s advertising engine seems to think I like cheese boards. Even when advertisers aren’t selling cheese, or boards, cheese boards show up in my advertisements.
But I don’t know if it is because I talked about my love of cheese boards, or if it is based on image recognition, or some combination of the two. I can’t tell if Facebook thinks I’m demographically bougie, or if it really knows I’m obsessed with cheese.
About the Data, Acxiom’s consumer portal into our data broker data, tells me it thinks that I am a truck owner and that I intend to purchase a vehicle. I am not. I’m assuming this is based on my father’s truck registration (the last time he drove a truck was in the early 1990s).
But About the Data doesn’t tell me whether Acxiom thinks I’m a “Truckin’ and Stylin’” or “Outward Bound” consumer, two of the many consumer segmentation profiles that might link to that truck data point. Acxiom shows us the inferred demographic information of behavioral targeting, but it doesn’t show us how it is being used by its third-party customers, who very well could be insurance companies or loan underwriters, not just marketers.
When I start to worry about the traces of my connections to friends from my time abroad in the UK and in China, I can use Facebook’s graph search to query how many people in my network I know in China that show up in my “buddy list,” as the PRISM documents put it.
But I don’t have any confidence that I don’t meet the threshold for confidence-based citizenship. I don’t know what it means to be a person on a “buddy list” “associated with a foreign power.” Nor do I know whether my use of a VPN would contribute to my score. My algorithmically determined citizenship is completely opaque to me.
These are just some personal encounters I’ve had recently in my daily life, from the trivial in the commercial to the consequential in talking about my shifting sense of citizenship. The concerns I raise point to an asymmetry that obscures what’s going on behind the scenes in interactions in my daily life.
The crux of the problem is that right now we don’t understand the causal relationship between our data and its uses in the world.
Joanne McNeil has described this as reading the algorithmic tea leaves—it’s a dark art. We don’t understand the how and the why of data’s uses, let alone what our data forecasts about us.
I like to think of it as a kind of uncanny valley of personalization. When we try to understand creepy ads that follow us around or are strangely personal, we can’t figure out if it’s just coarse demographics or hyper-targeted machine learning that generates the ads we see, and that leaves us with this sense of the uncanny.
So while data is making our behaviors, habits, and interests more legible to firms and governments, as consumers we haven’t yet developed the critical literacies to understand what our data is saying about us and more importantly how it is shaping our experience.
The other day a medical professional said to me “I have nothing to hide. If they profile so that a terrorist doesn’t blow up the plane that I’m taking to Disney with my kids, I’m okay with that.”
But he was only talking about one use—one that he thought was justified. Disney would be tracking him with their new MagicBands when his family gets there. “I have nothing to hide,” but I don’t know what I’m hiding from.
Right now, big data is a big black box. It’s hard to develop opinions and feelings about what we think should happen with data when most of what is happening right now is obscured and opaque. The flows of data and its uses are hidden.
When I started worrying about personal data while writing about it from the CIO’s perspective, I thought we had an awareness problem. People didn’t understand that by using free services they were paying with data, as it were. I think we’ve moved past that, and Snowden has heightened awareness even further. Right now, we are primed to have a discussion about how we want our data environment to look, but we have only scratched the surface about how our data is actually being used.
I think this is a particularly important moment because we’re moving from a time where data existed about our browsing habits, and about our mobile presence, to a time where more of the physical world is being tracked and measured and becoming data. Our cities, our cars, our homes, our bodies are all extending our data profile. Anything with a sensor becomes fodder for this larger sociotechnical system that we’re building.
We’re also transitioning from a time when we intentionally searched for things we want, and search interfaces clearly delineated paid advertisements, to an interface that anticipates our needs and gives us small bits of information, as in the early iterations of Google Now. Our choice architectures fall away as interfaces become more embedded and anticipatory.
We are learning to live with data, as more of our domestic life becomes subject to digital scrutiny. But the way we interpret influence in the uses of data is also about to shift dramatically.
PERSONAL DATA STORIES
So, my proposal is: we need stories that make data uses more intelligible and its impacts more personal. We need new tools for thinking about data’s role in our everyday lives.
We need stories to be relatable. They need to go beyond the “I have nothing to hide” mentality to illustrate the ways our environments are shaped and influence us.
We need more personal stories to make the uses of data more intelligible and more practical. And we need stories that bring data back from a big data scale, back down to a human scale.
In order to have better conversations about evolving norms for appropriate uses of data, we need to make the uses of data more legible to consumers. That’s the way we’ll be able to hold governments and corporations accountable for their data practices.
I’m going to walk through a few canonical personal data stories that do the work of opening the black box and make the personal effects of data practices legible.
By now you’ve all heard of this Target example. The New York Times profiled the algorithms that looked at purchasing patterns to identify early pregnancy indicators. It also included a story about how the pregnancy coupons reached one family in particular. The father of the household brought the coupons back to Target, inquiring as to why they would send pregnancy-related coupons to his teenage daughter, and it turned out that she was in fact pregnant.
“It’s amazing how widely known the Target/pregnancy story is. It comes up in every meeting on data. Go @nytimes” (danah boyd, @zephoria, January 23, 2014)
This story has become canonical, because it did a lot to educate us about what was going on behind the scenes in the uses of data in this advanced case, but it also made the impacts of that practice concrete by detailing the social impacts on this given family.
More recently, a Mike Seay received a direct mail envelope from OfficeMax that included “Daughter Killed in Car Crash” in the address. This failure exposed just how egregious the market segmentations from these brokers could be. It exposed the kinds of lists data brokers are keeping on us, and the sorts of information they think is relevant. How might that information be used, and more importantly, how should it be used? This story connected the personal effect of an insensitive reminder of the loss of a child in a traumatic event, and implicated OfficeMax for its use, as well as the data broker for its database categorization. We began to understand how something like this could happen, and now it’s an example of a data use failure.
This last example is from a story I published in The Atlantic. I had deliberately chosen not to update my Facebook status when Nick and I got engaged because I didn’t want to show up in the database. But then Facebook asked me how well I knew him and displayed an ad for a custom engagement ring. It turned out that it was a coincidence that the service-enhancing survey to improve the relevance of my newsfeed happened to match up with a demographically-determined ad. But the coincidence didn’t lessen the effect of feeling as though Facebook had intruded on my personal life.
And even after talking with Facebook to confirm what was going on, I still had no answer as to what factors went into the algorithm that asked about Nick as opposed to any of my other friends as a person of interest. Was it the sheer number of images we were tagged in together, or our increasingly overlapping networks?
I also still don’t know if I was getting this engagement ring ad because I was a female between the ages of 18 and 35 without a relationship status, or if it was because a more complex series of behaviors across the site alerted Facebook that it seemed like Nick and I were getting more serious. My Facebook story showed that even though the ad and the user survey were coincidentally displayed together, its effect on me was not incidental.
WHAT DATA STORIES DO
So what is it about personal data stories? They detail the effects of data and algorithms on our everyday lives. They aren’t about data breaches where we have no idea if we are affected or should be worried.
Data stories explain what’s going on behind the scenes. They give us more information about how these black boxes are working. And they give us a framework and vocabulary to begin to interrogate other data environments. They expose the logic of the engineers building these systems, their data science practices, the reasons for their data interventions. They detail the consequences of design decisions and power structures.
Data stories are also concrete. They happen to real people. They are not obscured behind big data rhetoric. They are grounded in individual experience. They give us a sense of what it means to be a digital person today. They describe the dynamics as our roles change as consumers, citizens, and individuals.
In my research on the Quantified Self community, I found that individuals were using numbers as storytelling devices. The show and tell format is quite literally a narrative using data. These data stories are full of thick description, and leave room for discussion about the individual, their feelings, interpretation, and sense of self. Like the personal stories in Quantified Self show and tell presentations, the personal data stories I’m interested in are about identifying personal meaning, or effects on the individual through understanding the uses of data.
Personal data stories have the potential to restore the subjectivity of individuals to an otherwise “objective” medium of data.
But personal data stories are hard to tell.
This is a Reddit comment (I know I shouldn’t read them) in response to my Atlantic article, and it indicates the trouble with telling personal stories, and the subtlety of talking about privacy from the database, rather than privacy from other people. But it’s not just the internet trolls that make personal data stories challenging to tell.
Data stories are hard to discover. Individuals aren’t necessarily primed to be critical of these patterns. And strange things happen when there is a coincidence, a fluke, or a change in the design that exposes something interesting. These rifts reveal the seams of the system.
Personal data stories are also anecdotes. Sometimes the effects are technically repeatable, but often not. They are exceptional and so by big data standards they are not statistically significant.
Data stories also need resources to reverse engineer what’s going on. Or you need the skills to be able to sandbox and build out hypothetical digital profiles to compare and contrast outcomes. Or you need the journalistic clout to get a response from Facebook to figure out if what you see is related or intentional or not. And so in that sense these stories can be taken out of the voice of the individual affected and end up appropriated by journalists.
And it’s challenging to tell data stories with nuance. There is risk in sensationalizing the concerns, and the Target story has been criticized for that. There’s a delicate balance in highlighting these exceptional cases and grounding it in the effects on our everyday lives.
Personal data stories also risk the personal privacy of the individuals involved by heightening their profile and their plight. There is also a danger of personal attacks in response to these stories.
But the stories are made all the more compelling if they come from consumers. If we can answer the questions they have, we can get at the core normative concerns of a conscientious but not necessarily technically savvy individual.
Data stories will inform future design choices and policy positions. They serve to educate publics and representatives about the stakes at play. And where individuals are still not sufficiently protected, we’ll start to see where the regulatory holes lie.
I want to see more data stories because I think they change the nature of the conversation we can have as a society. They even the playing field between all interested parties, and ground digital practices in human-scale effects.
Personal data stories will help us uncover the politics, epistemologies, economies, and ecologies of the sociotechnical system for which data is becoming the primary substrate.
I think of this personal stories work as fitting into a larger emerging suite of tools and practices that expose the seams of the data uses and algorithmic design of our built digital environment.
Lots of people are creating technological interventions and building tools to make data more legible. Tools like Immersion take your Gmail metadata and, by exposing it, allow people to comprehend the stories they can see in their own data.
Ben Grosser’s Demetricator is a browser plugin that hides Facebook’s quantifications of likes, friends, and time. He calls it critical software, meant to reveal how Facebook structures use, and possibly addiction, through quantification.
Another class of interventions is personal, but more performative.
Janet Vertesi presented this past weekend at Theorizing the Web on her infrastructure inversion project: hiding her pregnancy from the internet by using cash, browsing maternity websites with Tor, and asking her family members and friends not to write about her pregnancy even in private Facebook messages.
In her recent book, Dragnet Nation, Julia Angwin takes extreme measures to prevent tracking and protect her privacy over the course of a year. She used a Faraday cage for her mobile phone, and she even created a fake identity to separate out some of her commercial online activity.
These examples are as much a performance as they are an experiment. But these performance pieces demonstrate the futility of perfect privacy as a goal. They don’t depict the practicalities of everyday life except in the ways privacy protection hampers life. In contrast, personal stories from average consumers help ground these trade-offs and better inform everyday practical decisions.
My interest in personal data stories is grounded in a larger vein of technological criticism. In much the same way that cultural and film critics discuss what is important and interesting about a cultural artifact, technology critics could uncover both the artistic and cultural importance of technologies as media, as well as the power dynamics inherent in technologies as political artifacts. Technology criticism should explore our relationship to firms and governments, as individuals and as societies. And so I’m advocating for a technology criticism with an anthropological flavor.
PITCH FOR A COLUMN
So to that end, I have a pitch for you today. I want to build a column for telling personal data stories. It would look something like “The Haggler” or “The Consumerist,” but for data and algorithms. There needs to be a platform to tell personal data stories with regularity. The format would be similar: investigate a particular case to solve a personal problem while exposing the larger systemic issue at hand. The column would be a means to surface more of these stories, explain them for an individual, describe their case and its impact on that person, and reveal what’s going on for the rest of us. Data stories will also develop our attention to notice and scrutinize when we come across something in the course of our digital lives.
I think of this as a regular column in a popular publication, largely for a lay rather than technical audience. At the very least it could be a single purpose website to collect and share data stories. So I’m open to suggestions.
I want to make a call today for more personal stories. I need your help and I’m looking for participation. Share your questions and personal encounters with data. Do you have screen captures of weird ads or algorithmic flukes?
Or what are some compelling example stories in this framework that changed the way you think about data and its uses?
I’d love to get your feedback, and hear your thoughts. This is a work in progress and just getting off the ground.