Dada Data and the Internet of Paternalistic Things

This piece of speculative fiction exploring a possible data-driven future first appeared in the Internet Monitor project's second annual report, Internet Monitor 2014: Reflections on the Digital World. Check it out for more from my Berkman colleagues on the interplay between technological platforms and policy; growing tensions between protecting personal privacy and using big data for social good; the implications of digital communications tools for public discourse and collective action; and current debates around the future of Internet governance.



My stupid refrigerator thinks I’m pregnant.

I reached for my favorite IPA, but the refrigerator wouldn’t let me take one from the biometrically authenticated alcohol bin. 

Our latest auto-delivery from peaPod included pickles, orange juice, and prenatal vitamins. We’ve never had orange juice in the house before because I find it too acidic. What machine-learning magic produced this produce?

And I noticed the other day that my water target had changed on my Vessyl, and I wasn’t sure why. I figured I must have just been particularly dehydrated. 

I guess I should have seen it coming. Our Fountain tracking toilet noticed when I got off hormonal birth control and got an IUD instead. But I thought our toilet data was only shared between Nest and our doctors? What tipped off our Samsung fridge? 

I got a Now notification that I was ovulating a few weeks ago. I didn’t even know it had been tracking my cycle, let alone by basal body temperature through my wearable iRing. I certainly hadn’t turned that feature on. We’re not even trying to have a baby right now. Or maybe my Aria scale picked up on some subtle change in my body fat? 

Or maybe it was ComWarner? All our appliances are hooked up through one @HomeHub. I didn’t think twice about it because it just worked—every time we upgraded the dishwasher, the thermostat. Could it be that the @HomeHub is sharing data between the toilet and our refrigerator? 

I went into our @HomeHub interface. It showed a bunch of usage graphs (we’ve been watching a “below average” amount of TV lately), but I couldn’t find anything that looked like a pregnancy notification. Where was this bogus conception data coming from? 

My iWatch pinged me. The lights in the room dimmed, and a connected aromatherapy candle lit up. The heart monitor on my bra alerted me that my heart rate and breathing were irregular, and that I should stop for some meditative breathing. I sat down on my posture-tracking floor pillow, and tried to sink in.

But I couldn’t keep my mind from wandering. Was it something in the water? Something in my Snap-Texts with Kathryn? If it was true, why hadn’t my doctor called yet? Could I actually be pregnant? 

I turned on the TVTab to distract me, but I was bombarded with sponsored ads for “What to Expect When You’re Expecting 9.0” and domain squatter sites that search for a unique baby name. 

I searched for similar incidents on the Quorums: “pregnancy Samsung refrigerator,” “pregnancy Fountain toilet.” Nothing. I really wanted to talk to someone, but I couldn’t call Google because they don’t have customer service for @HomeHub products. I tried ComWarner. After waiting for 37 minutes to speak with a representative, I was told that he couldn’t give out any personal data correlations over the phone. What bureaucratic bullshit!

It can’t be true. Russell has been away in Addis Ababa on business for the past three weeks. And I’ve still got the IUD. We aren’t even trying yet. This would have to be a bio-correlative immaculate conception.

I tapped Russell on his iWatch three times, our signal to call me when he is done with his meeting. I was freaking out. 

I could have really used that beer. But the fridge still wouldn’t let me take it. What if I am really pregnant? I opened up Taskr to see if I could get an old-fashioned pregnancy test delivered, but the price was three times what it normally would be. I considered CVS, but I thought better of it since you can’t go in there anymore without a loyalty card. It was far, but I skipped the self-driving Uber shuttle and walked the extra mile to the place that accepts crypto, where I wouldn’t be tracked. I think. And that’s when I got the notification that my funding interview for my new project the following morning had been canceled.


Read more in the Berkman Center’s Internet Monitor 2014: Reflections on the Digital World.

Mapping the Data Ecosystem

This first appeared in the Internet Monitor project's second annual report, Internet Monitor 2014: Reflections on the Digital World. Check it out for more from my Berkman colleagues on the interplay between technological platforms and policy; growing tensions between protecting personal privacy and using big data for social good; the implications of digital communications tools for public discourse and collective action; and current debates around the future of Internet governance.


What would it take to map the Internet? Not just the links connecting the web of sites to each other, or some map of the network of networks. That’s hard enough in itself.

What if we were to map the flows of data around the Internet? Not just delivering packets, but what those packets contain, where they propagate, how they are passed on, and to what ends they are used. 

Between our browser history, cookies, social platforms, sensors, brokers, and beyond, there are myriad parties with economic interests in our data. How those parties interconnect and trade in our data is, for the most part, opaque to us. 

The data ecosystem mirrors the structure of the Internet. No single body has dominion or a totalizing view over the flows of information. That also means that no one body is accountable for quality or keeping track of data as it changes hands and contexts. 

Data-driven companies like Facebook, Google, Acxiom, and others are building out their proprietary walled gardens of data. They are doing everything they can to control for privacy and security while also keeping control over their greatest assets. Still, they aren’t held accountable for the ads individuals purchase and target on their platforms, or for tertiary uses of data once it leaves their kingdom. 

Complexity obscures causality. So many variables are fed into the algorithm and spit back out on a personalized, transient platform that no one can tell you exactly why you saw one post over another one in the feed or that retargeted ad over this one. We conjure up plausible explanations and grasp at folk theories that engineers offer up to explain their outputs. 

We have given data so much authority without any of the accountability we need to have confidence in its legitimacy to govern our lives. 

As everything, refrigerators and crockpots included, expands the Internet and the ecosystem of data that runs on top of it, everything will leave a data trail. Going forward we have to assume that what can be codified and digitized will become data. What matters is how that data will be used, now and in the future.

The potential harms are hard to pin down, primarily because we won’t know when they are happening. We can’t investigate discrimination that replaces pre-digital prejudice markers like race and sex with proxies correlated from behavioral data. And we run into invisible walls based on statistical assumptions that anticipate our needs but get us wrong if we fall outside the curve. It’s nearly impossible to catch these slights and even harder to develop normative stances on grounds we cannot see. 

Before we can start to discuss normative judgments about the appropriate uses of data, we have to understand the extent of what is technically possible. We cannot hope to regulate the misuse of data without means to hold all interconnected parties accountable for the uses and flows of data.

We need to map these relationships and data patterns. Who are the parties involved? How are they collecting, cleansing, inferring and interpreting data? To what ends is the data being used? 

Linked Data is one technical solution to this problem. Standards make data flows both machine readable and human legible. Policies that travel as metadata are another approach to distributed accountability. We can also hold some of the largest brokers and users of data to higher standards of ethics. But markets of users won’t move against these systems until we have a better map of the ecosystem. 
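To make the idea of a policy that travels as metadata a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `TaggedRecord` class, the `allowed_uses` field, and the parties and purposes are assumptions, not part of any existing Linked Data standard or library.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedRecord:
    """A piece of data bundled with the policy that governs it (hypothetical)."""
    value: str
    allowed_uses: frozenset  # policy attached by the original collector
    trail: list = field(default_factory=list)  # every requested use, permitted or not

    def use(self, party: str, purpose: str) -> bool:
        """Log the requested use and report whether the policy permits it."""
        permitted = purpose in self.allowed_uses
        self.trail.append((party, purpose, permitted))
        return permitted


# Illustrative only: the parties and purposes below are made up.
record = TaggedRecord("purchase history", frozenset({"marketing"}))
record.use("data broker", "marketing")             # permitted by the policy
record.use("loan underwriter", "credit scoring")   # refused, but still logged

for party, purpose, permitted in record.trail:
    print(party, purpose, "ok" if permitted else "refused")
```

The point of the sketch is only that the policy and the trail stay attached to the data as it changes hands, so a later audit could in principle trace each use back to the party that made it.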



In Good Company

I got pretty excited when people who I admire and respect cited my recent articles about data science, Facebook, and the uncanny this week. Beyond the not-so-humble brag, I'm more excited by the mounting accumulation of voices calling for accountability and ethical approaches to our data and its uses. And I was even more excited to overhear older family members at a friend's wedding this past weekend discussing the Facebook study over breakfast. I think we're starting to get somewhere.

Om Malik, who has supported some of the most extensive industry coverage of data on GigaOM, wrote this week about Silicon Valley's collective responsibility to use its power wisely:

While many of the technologies will indeed make it easier for us to live in the future, but what about the side effects and the impacts of these technologies on our society, it’s fabric and the economy at large. It is rather irresponsible that we are not pushing back by asking tougher questions from companies that are likely to dominate our future, because if we don’t, we will fail to have a proper public discourse, and will deserve the bleak future we fear the most...Silicon Valley and the companies that control the future need to step back and become self accountable, and develop a moral imperative. My good friend and a Stanford D.School professor Reilly Brennan points out that it is all about consumer trust. Read more.

And danah boyd, who is starting up the Data & Society research institute, summed up what we've learned from the Facebook emotional contagion study, echoing my point that it's not just about the Facebook study, it's about the data practices:

This paper provided ammunition for people’s anger because it’s so hard to talk about harm in the abstract...I’m glad this study has prompted an intense debate among scholars and the public, but I fear it’s turned into a simplistic attack on Facebook over this particular study, rather than a nuanced debate over how we create meaningful ethical oversight in research and practice. The lines between research and practice are always blurred and information companies like Facebook make this increasingly salient. No one benefits by drawing lines in the sand. We need to address the problem more holistically. And, in the meantime, we need to hold companies accountable for how they manipulate people across the board, regardless of whether or not it’s couched as research. If we focus too much on this study, we’ll lose track of the broader issues at stake. Read more.

Both are great reads, and align with a lot of the things I've been exploring in my own work. I'm honored to be in such good company.

[VIDEO] Living with Data: Stories that Make Data More Personal

I had the privilege of speaking at the Berkman Lunch Series this week and talked about my ideas for telling more personal stories about our relationship to data to ground our understanding in more practical, everyday lived experience. The way I see the problem is that right now we don’t understand the causal relationship between our data and its uses in the world. My talk sets up a few examples that I’ve seen recently that both exposed and obscured what my data says about me and how it’s being used. I talk about why understanding data in our everyday lives matters more than ever, and I set up what personal stories can do to help us. I walk through a few canonical examples, and then end with a pitch for a column to tell these stories on a regular basis. Please send me your ideas, strange screengrabs, and questions—this is just the beginning of an effort to make data and its uses more legible to us. 

The video is embedded here, and I’ve also posted my crib notes with links below if you’d prefer to read or want to follow up on some of the examples.

It is also on Soundcloud, in case you’d prefer to podcast the talk. 


This talk is a reflection of a lot of the work I’ve been thinking about here as a fellow, but it’s also a kind of proposal for future work, so I’m very much interested in feedback from the braintrust here in the room and watching on the web.

The main idea is that we need more stories that ground data in personal, everyday experience. We need personal data stories to make data uses intelligible and their impacts personal.

I wanted to start off by talking about what I do and what I do not know about myself as other entities see me through my data.


Facebook’s advertising engine seems to think I like cheese boards. Even when the ads aren’t selling cheese, or boards, cheese boards are part of my advertisements.

But I don’t know if it is because I talked about my love of cheese boards, or if it is based on image recognition, or some combination of the two. I can’t tell if Facebook thinks I’m demographically bougie, or if it really knows I’m obsessed with cheese.

About the Data, Acxiom’s consumer portal into our data broker data, tells me it thinks I am a truck owner and intending to purchase a vehicle. I am not. I’m assuming this is based on my father’s truck registration (the last time he drove a truck was in the early 1990s).

But About the Data doesn’t tell me whether Acxiom thinks I’m a “Truckin’ and Stylin’” or “Outward Bound” consumer, one of the many consumer segmentation profiles that might link to that truck data point. Acxiom shows us the inferred demographic information of behavioral targeting, but it doesn’t show us how it is being used by its third-party customers, who very well could be insurance companies or loan underwriters, not just marketers.

When I start to worry about the traces of my connections to friends from my time abroad in the UK and in China, I can use Facebook’s graph search to query how many people I know in China would show up in my “buddy list,” as the PRISM documents call it.

But I don’t have any confidence that I don’t meet the threshold for confidence-based citizenship. I don’t know what it means to be a person on a “buddy list” “associated with a foreign power.” Nor do I know whether my use of a VPN would contribute to my score. My algorithmically determined citizenship is completely opaque to me.

These are just some personal encounters I’ve had recently in my daily life, from the trivial in the commercial to the consequential in talking about my shifting sense of citizenship. The concerns I raise point to an asymmetry that obscures what’s going on behind the scenes in interactions in my daily life.

The crux of the problem is that right now we don’t understand the causal relationship between our data and its uses in the world.

Joanne McNeil has described this as reading the algorithmic tea leaves—it’s a dark art. We don’t understand the how and the why of data’s uses, let alone what our data forecasts about us.

I like to think of it as a kind of uncanny valley of personalization. When we try to understand creepy ads that follow us around or are strangely personal, we can’t figure out if it’s just coarse demographics or hyper-targeted machine learning that generates the ads we see, and that leaves us with this sense of the uncanny.

So while data is making our behaviors, habits, and interests more legible to firms and governments, as consumers we haven’t yet developed the critical literacies to understand what our data is saying about us and more importantly how it is shaping our experience.

The other day a medical professional said to me “I have nothing to hide. If they profile so that a terrorist doesn’t blow up the plane that I’m taking to Disney with my kids, I’m okay with that.”

But he was only talking about one use—one that he thought was justified. Disney would be tracking him with their new MagicBands when his family gets there. “I have nothing to hide,” but I don’t know what I’m hiding from.

Right now, big data is a big black box. It’s hard to develop opinions and feelings about what we think should happen with data when most of what is happening right now is obscured and opaque. The flows of data and its uses are hidden.

When I started worrying about personal data while writing about it from the CIO’s perspective, I thought we had an awareness problem. People didn’t understand that by using free services they were paying with data, as it were. I think we’ve moved past that, and Snowden has heightened awareness even further. Right now, we are primed to have a discussion about how we want our data environment to look, but we have only scratched the surface about how our data is actually being used.


I think this is a particularly important moment because we’re moving from a time when data existed about our browsing habits and about our mobile presence to a time when more of the physical world is being tracked and measured and becoming data. Our cities, our cars, our homes, our bodies are all extending our data profile. Anything with a sensor becomes fodder for this larger sociotechnical system that we’re building.

We’re also transitioning from a time when we intentionally searched for things we want, and search interfaces clearly delineated paid advertisements, to an interface that anticipates our needs and gives us small bits of information, as in the early iterations of Google Now. Our choice architectures fall away as interfaces become more embedded and anticipatory.

We are learning to live with data, as more of our domestic life becomes subject to digital scrutiny. But the way we interpret influence in the uses of data is also about to shift dramatically.


So, my proposal is: we need stories that make data uses more intelligible and its impacts more personal. We need new tools for thinking about data’s role in our everyday lives.

We need stories to be relatable. They need to go beyond the “I have nothing to hide” mentality to illustrate the ways our environments are shaped and influence us.

We need more personal stories to make the uses of data more intelligible and more practical. And we need stories that bring data back from a big data scale, back down to a human scale.

In order to have better conversations about evolving norms for appropriate uses of data, we need to make the uses of data more legible to consumers. That’s the way we’ll be able to hold governments and corporations accountable for their data practices.


I’m going to walk through a few canonical personal data stories that do the work of opening the black box and making the personal effects of data practices legible.

By now you’ve all heard of this Target example. The New York Times profiled the algorithms that looked at purchasing patterns to identify early pregnancy indicators. It also included a story about how the pregnancy coupons reached one family in particular. The father of the household brought them back to Target, inquiring as to why they would send pregnancy-related coupons to his teenage daughter, and it turned out that she was in fact pregnant.

This story has become canonical, because it did a lot to educate us about what was going on behind the scenes in the uses of data in this advanced case, but it also made the impacts of that practice concrete by detailing the social impacts on this given family.


More recently, a Mike Seay received a direct mail envelope from OfficeMax that included “Daughter Killed in Car Crash” in the address. This failure exposed just how egregious the market segmentations from these brokers could be. It exposed the kinds of lists data brokers are keeping on us, and the sorts of information they think are relevant. How might that information be used, and more importantly, how should it be used? This story connected the personal effect of an insensitive reminder of the loss of a child in a traumatic event, and implicated OfficeMax for its use, as well as the data broker for its database categorization. We began to understand how something like this could happen, and now it’s an example of a data use failure.


This last example is from a story I published in The Atlantic. I had deliberately chosen not to update my Facebook status when Nick and I got engaged because I didn’t want to show up in the database. But then Facebook asked me how well I knew him and displayed an ad for a custom engagement ring. It turned out that it was a coincidence that the service-enhancing survey to improve the relevance of my newsfeed happened to match up with a demographically-determined ad. But the coincidence didn’t lessen the effect of feeling as though Facebook had intruded on my personal life.

And even after talking with Facebook to confirm what was going on, I still had no answer as to what factors went into the algorithm that asked about Nick as opposed to any of my other friends as a person of interest. Was it the sheer number of images we were tagged in together, our increasingly overlapping networks?

I also still don’t know if I was getting this engagement ring ad because I was a female between the ages of 18 and 35 without a relationship status, or if it was because a more complex series of behaviors across the site alerted Facebook that it seemed like Nick and I were getting more serious. My Facebook story showed that even though the ad and the user survey were coincidentally displayed together, its effect on me was not incidental.


So what is it about personal data stories? They detail the effects of data and algorithms on our everyday lives. They aren’t about data breaches where we have no idea if we are affected or should be worried.

Data stories explain what’s going on behind the scenes. They give us more information about how these black boxes are working. And they give us a framework and vocabulary to begin to interrogate other data environments. They expose the logic of the engineers building these systems, their data science practices, the reasons for their data interventions. They detail the consequences of design decisions and power structures.

Data stories are also concrete. They happen to real people. They are not obscured behind big data rhetoric. They are grounded in individual experience. They give us a sense of what it means to be a digital person today. They describe the dynamics as our roles change as consumers, citizens, and individuals.

In my research on the Quantified Self community, I found that individuals were using numbers as storytelling devices: the show and tell format is quite literally a narrative using data. These data stories are full of thick description, and leave room for discussion about the individual, their feelings, interpretation, and sense of self. Like the personal stories in Quantified Self show and tell presentations, the personal data stories I’m interested in are about identifying personal meaning, or effects on the individual, through understanding the uses of data.

Personal data stories have the potential to restore the subjectivity of individuals to an otherwise “objective” medium of data.

But personal data stories are hard to tell.

This is a Reddit comment (I know I shouldn’t read them) in response to my Atlantic article, and it indicates the trouble with telling personal stories, and the subtlety of talking about privacy from the database, rather than privacy from other people. But it’s not just the internet trolls that make personal data stories challenging to tell.

Data stories are hard to discover. Individuals aren’t necessarily primed to be critical of these patterns. And strange things happen when there is a coincidence or a fluke or a change in the design that exposes something interesting. These rifts reveal the seams of the system.

Personal data stories are also anecdotes. Sometimes the effects are technically repeatable, but often not. They are exceptional and so by big data standards they are not statistically significant.

Data stories also need resources to reverse engineer what’s going on. Or you need the skills to be able to sandbox and build out hypothetical digital profiles to compare and contrast outcomes. Or you need the journalistic clout to get a response from Facebook to figure out if what you see is related or intentional or not. And so in that sense these stories can be taken out of the voice of the individual affected and end up appropriated by journalists.

And it’s challenging to tell data stories with nuance. There is risk in sensationalizing the concerns, and the Target story has been criticized for that. There’s a delicate balance in highlighting these exceptional cases and grounding them in the effects on our everyday lives.

Personal data stories also risk the personal privacy of the individuals involved by heightening their profile and their plight. There is also a danger of personal attacks in response to these stories.


But the stories are made all the more compelling if they come from consumers. If we can answer the questions they have, we can get at the core normative concerns of a conscientious but not necessarily technically savvy individual.

Data stories will inform future design choices and policy positions. They serve to educate publics and representatives about the stakes at play. And where individuals are still not sufficiently protected, we’ll start to see where the regulatory holes lie.

I want to see more data stories because I think they change the nature of the conversation we can have as a society. They even the playing field between all interested parties, and ground digital practices in human-scale effects.

Personal data stories will help us uncover the politics, epistemologies, economies, and ecologies of the sociotechnical system for which data is becoming the primary substrate.


I think of this personal stories work as fitting into a larger emerging suite of tools and practices that expose the seams of the data uses and algorithmic design of our built digital environment.

Lots of people are creating technological interventions, building tools to make data more legible. Tools like Immersion take your Gmail metadata and, by exposing it, allow people to comprehend the stories they can see in their own data.

Ben Grosser’s Demetricator is a browser plugin that hides the Facebook quantifications of likes, friends, and time. It is what he calls critical software, meant to reveal how Facebook structures use, and possibly addiction, with quantification.

Another class of interventions is personal, but more performative.

Janet Vertesi presented this past weekend at Theorizing the Web on her infrastructure inversion project: hiding her pregnancy from the internet by using cash, browsing maternity websites with Tor, and asking her family members and friends not to write about her pregnancy even in private Facebook messages.

In her recent book, Dragnet Nation, Julia Angwin takes extreme measures to prevent tracking and protect her privacy over the course of a year. She used a Faraday case for her mobile phone, and she even created a fake identity to separate out some of her commercial online activity.

These examples are as much a performance as they are an experiment. But these performance pieces demonstrate the futility of perfect privacy as a goal. They don’t depict the practicalities of everyday life except in the ways privacy protection hampers life. In contrast, personal stories from average consumers help ground these trade offs and better inform everyday practical decisions.


My interest in personal data stories is grounded in a larger vein of technological criticism. In much the same way that cultural and film critics discuss what is important and interesting about a cultural artifact, technology critics could uncover both the artistic and cultural importance of technologies as media, as well as the power dynamics inherent in technologies as political artifacts. Technology criticism should explore our relationship to firms and governments, as individuals and as societies. And so I’m advocating for a technology criticism with an anthropological flavor.


So to that end, I have a pitch for you today. I want to build a column for telling personal data stories. It would look something like “The Haggler” or “The Consumerist,” but for data and algorithms. There needs to be a platform to tell personal data stories with regularity. The format would be similar: investigate a particular case to solve a personal problem while exposing the larger systemic issue at hand. The column would be a means to surface more of these stories, explain them for an individual, describe their case and its impact on that person, and reveal what’s going on for the rest of us. Data stories will also develop our attention to notice and scrutinize when we come across something in the course of our digital lives.

I think of this as a regular column in a popular publication, largely for a lay rather than technical audience. At the very least it could be a single purpose website to collect and share data stories. So I’m open to suggestions.

I want to make a call today for more personal stories. I need your help and I’m looking for participation. Share your questions and personal encounters with data. Do you have screencaptures of weird ads or algorithmic flukes?

Or what are some compelling example stories in this framework that changed the way you think about data and its uses?

I’d love to get your feedback, and hear your thoughts. This is a work in progress and just getting off the ground. 

Berkman Talk—Living with Data: Stories that Make Data More Personal

My Berkman lunch talk is coming up soon! Join in person if you can make it. The talk will also be webcast live and archived on the website shortly after. 

Living with Data: Stories that Make Data More Personal
with Berkman Fellow, Sara Watson

April 29, 2014 at 12:30pm ET
Berkman Center for Internet & Society, 23 Everett St, 2nd Floor
RSVP required for those attending in person via the form below
This event will be webcast live (on this page) at 12:30pm ET.

We are becoming data. Between our mobile phones, browser history, wearable sensors, and connected devices in our homes, there’s more data about us than ever before. So how are we learning to live with all this data?

Inspired by her ethnographic interview work with members of the quantified self community, Sara hopes to make these larger systemic shifts more relatable and concrete with personal narratives. This talk will share some examples of how we find clues, investigate, and reverse engineer what’s going on with our data, and call for more stories to help personalize our evolving relationship to data and the algorithms that govern it.