Things Facebook Thinks I Care About, Ranked

20. Cats
19. Millennials
18. Adventure
17. Fatherhood
16. Renminbi
15. Cloud computing
14. Orange (fruit)
13. Gratitude
12. Bag
11. Fluid dynamics
10. Edible mushroom
9. Laser
8. Company
7. Pressure
6. Cervical vertebrae
5. Self-esteem
4. Life
3. Water
2. Year
1. Human skin color

Sourced from Facebook Ad Preferences. This post is also published on Medium.

Big Data, So What?

I got the chance to interview friend and fellow Bruce Schneier for Radio Berkman 218: The Threats and Tradeoffs of Big Data, talking about his latest book Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World. I had the pleasure of seeing Bruce walking around Berkman with his book-in-process binder at all times over the last year and got a chance to read some drafts, so I focused on some of the nuts and bolts of the book writing process, his audience, and his aims for impact. I highly recommend both Bruce's book and the rest of the Radio Berkman podcast.

Big Data with Human Characteristics

Due to sudden illness this past week, I am not able to travel to Beijing to attend the Tencent Internet and Society Institute Summit on Big Data on July 25, 2014. I'm grateful for the invitation from the Oxford Internet Institute, Renmin University, and Tencent, and I am sorry to miss what promises to be a fascinating event. I recorded this brief video to share my work and thoughts on "Big Data with Human Characteristics" and the importance of maintaining the subjectivity of individuals when we look at humanity through an otherwise "objective" big data lens. I'm including the full text of the script below. And huge thanks to my husband, Nick, for helping me with the Chinese. It's been a while!


I am thankful still to be able to join you by video today, and I wanted to share some of my work on big data.

Big data is becoming a dominant paradigm for making sense of the world around us. It promises novel insights and knowledge at scale. But the power dynamics of big data privilege those with the consolidated information and with the tools to analyze and interpret at that scale. This power has the potential to go unchecked and unquestioned because of its reliance on the authority and objectivity associated with "data" and because of the black box processes that obscure big data practices.

Of course, we came together today to discuss the potential power of big data because we are interested in what it might tell us about humanity. To be sure, there is great potential. But there is also something about operating at this scale that makes us susceptible to forgetting the individuals that collectively make up big data sets. We risk missing the trees for the forest.

We have to remember that big data is always made up of individuals. It might be our personal purchasing habits, our interest profile, our friends list, the collection of our published thoughts, or perhaps all of the above. On a macro scale, each of those data points allows researchers and firms to categorize populations or segment markets. But it takes work at the micro scale to grasp a contextual view of the individual. Research efforts and funding support must keep this in mind: big data methods can answer some questions, but certainly not all.

My work has focused on the lived experience of data, using qualitative interview methods to understand approaches to thinking about data. I look at data as a medium for personal knowledge creation, interpretation, and meaning making. I have closely studied the Quantified Self, a community in which individuals use mobile applications and wearable sensors to create data about our bodies and behaviors. The technology companies building these tools have interests in aggregate insights, but there is much to be learned from individuals about what our data means to us at a personal scale. 

So what do we need to do to avoid the potential biases and dehumanizing effects of looking at individuals through a big data lens? As those who are developing and supporting big data methods, we need to actively seek out means of preserving the subjectivity of the humans to whom this data refers. I urge researchers, designers of internet platforms, and those in the business of data to keep the humanity of data in mind. Remember to look at a small scale alongside large scale interpretations. This approach is sure to keep us in touch with what big data means across scales of humanity: from the globe, to national populations, to the user base of large internet platforms, to local communities, right down to the individual.

I realize that Western thinking tends to privilege the position of the individual above the collective. And in turn, Eastern traditions tend to privilege the collective over the individual. But we all share a common interest in humanizing our policies and interventions based on big data. In China this is embodied in 以人为本, as a principle of human-centered policy. I argue that we need Big Data with Human Characteristics. Figuring out what that means across cultures will be hard work, but uncovering and engaging with the commonalities and differences in the way we think about humans in big data will be revealing. This is an important step in what is sure to be a fruitful collaboration, as we work together to grapple with what big data means to us all.


What’s Meaningful Transparency for Consumers?

I wrote a piece for AdExchanger on the FTC Data Broker report on consumer data transparency. I argue that both the FTC and industry leaders don’t go far enough by exposing data. Consumers need to understand how data is being used.

The FTC’s definition of “sensitive information” is far too vague. So is the definition that industry leaders use.

“Data regarding personal information that pertains to employment or insurability decisions, or that relates to sensitive health-related issues or confidential matters deserves much different treatment than data that would indicate that I am a sports fan,” Scott Howe of Acxiom writes.

Some information is intuitively sensitive. But is driving a truck sensitive? I can imagine that my proclivity for truck driving could be used both to target advertising to me and as a factor in risk-based calculations or inferred political leanings. Big data methods promise to surface novel correlations from matching up disparate data. That means innocuous-seeming data like sports fandom quickly becomes sensitive when used as a proxy and towards unintuitive ends. It is unfair to expect consumers to understand the far-reaching effects of potential correlations.


[VIDEO] Living with Data: Stories that Make Data More Personal

I had the privilege of speaking at the Berkman Lunch Series this week and talked about my ideas for telling more personal stories about our relationship to data to ground our understanding in more practical, everyday lived experience. The way I see the problem is that right now we don’t understand the causal relationship between our data and its uses in the world. My talk sets up a few examples that I’ve seen recently that both exposed and obscured what my data says about me and how it’s being used. I talk about why understanding data in our everyday lives matters more than ever, and I set up what personal stories can do to help us. I walk through a few canonical examples, and then end with a pitch for a column to tell these stories on a regular basis. Please send me your ideas, strange screengrabs, and questions—this is just the beginning of an effort to make data and its uses more legible to us. 

The video is embedded here, and I’ve also posted my crib notes with links below if you’d prefer to read or want to follow up on some of the examples.

It is also on Soundcloud, in case you’d prefer to podcast the talk. 


This talk is a reflection of a lot of the work I’ve been thinking about here as a fellow, but it’s also a kind of proposal for future work, so I’m very much interested in feedback from the braintrust here in the room and watching on the web.

The main idea is that we need more stories that ground data in personal, everyday experience. We need personal data stories that make data uses intelligible and impacts personal.

I wanted to start off by talking about what I do and what I do not know about myself as other entities see me through my data.


Facebook’s advertising engine seems to think I like cheese boards. Even when they aren’t selling cheese, or boards, they are part of my advertisements.

But I don’t know if it is because I talked about my love of cheese boards, or if it is based on image recognition, or some combination of the two. I can’t tell if Facebook thinks I’m demographically bougie, or if it really knows I’m obsessed with cheese.

About the Data, Acxiom’s consumer portal into our data broker data, tells me it thinks I am a truck owner and intending to purchase a vehicle. I am not. I’m assuming this is based on my father’s truck registration (the last time he drove a truck was in the early 1990s).

But About the Data doesn’t tell me whether Acxiom thinks I’m a “Truckin’ and Stylin’” or “Outward Bound” consumer, one of the many consumer segmentation profiles that might link to that truck data point. Acxiom shows us the inferred demographic information of behavioral targeting, but it doesn’t show us how it is being used by its third-party customers, who very well could be insurance companies or loan underwriters, not just marketers.

When I start to worry about the traces of my connections to friends from my time abroad in the UK and in China, I can use Facebook’s graph search to query how many people in my network I know in China who would show up in my “buddy list,” as the PRISM documents put it.

But I don’t have any confidence that I don’t meet the threshold for confidence-based citizenship. I don’t know what it means to be a person on a “buddy list” “associated with a foreign power.” Nor do I know whether my use of VPN would contribute to my score. My algorithmically-determined citizenship is completely opaque to me.

These are just some personal encounters I’ve had recently in my daily life, from the trivial in the commercial, to the consequential in talking about my shifting sense of citizenship. The concerns I raise point to an asymmetry that obscures what’s going on behind the scenes in interactions in my daily life.

The crux of the problem is that right now we don’t understand the causal relationship between our data and its uses in the world.

Joanne McNeil has described this as reading the algorithmic tea leaves—it’s a dark art. We don’t understand the how and the why of data’s uses, let alone what our data forecasts about us.

I like to think of it as a kind of uncanny valley of personalization. When we try to understand creepy ads that follow us around or are strangely personal, we can’t figure out if it’s just coarse demographics or hyper-targeted machine learning that generates the ads we see and that leaves us with this sense of the uncanny.

So while data is making our behaviors, habits, and interests more legible to firms and governments, as consumers we haven’t yet developed the critical literacies to understand what our data is saying about us and more importantly how it is shaping our experience.

The other day a medical professional said to me “I have nothing to hide. If they profile so that a terrorist doesn’t blow up the plane that I’m taking to Disney with my kids, I’m okay with that.”

But he was only talking about one use—one that he thought was justified. Disney would be tracking him with their new MagicBands when his family gets there. “I have nothing to hide,” but I don’t know what I’m hiding from.

Right now, big data is a big black box. It’s hard to develop opinions and feelings about what we think should happen with data when most of what is happening right now is obscured and opaque. The flows of data and its uses are hidden.

When I started worrying about personal data while writing about it from the CIO’s perspective, I thought we had an awareness problem. People didn’t understand that by using free services they were paying with data, as it were. I think we’ve moved past that, and Snowden has heightened awareness even further. Right now, we are primed to have a discussion about how we want our data environment to look, but we have only scratched the surface about how our data is actually being used.


I think this is a particularly important moment because we’re moving from a time when data existed about our browsing habits and our mobile presence to a time when more of the physical world is being tracked and measured and becoming data. Our cities, our cars, our homes, our bodies are all extending our data profile. Anything with a sensor becomes fodder for this larger sociotechnical system that we’re building.

We’re also transitioning from a time when we intentionally searched for things we want, and search interfaces clearly delineated paid advertisements, to an interface that anticipates our needs and gives us small bits of information, as in the early iterations of Google Now. Our choice architectures fall away as interfaces become more embedded and anticipatory.

We are learning to live with data, as more of our domestic life becomes subject to digital scrutiny. But the way we interpret influence in the uses of data is also about to shift dramatically.


So, my proposal is: we need stories that make data uses more intelligible and its impacts more personal. We need new tools for thinking about data’s role in our everyday lives.

We need stories to be relatable. They need to go beyond the “I have nothing to hide” mentality to illustrate the ways our environments are shaped and influence us.

We need more personal stories to make the uses of data more intelligible and more practical. And we need stories that bring data back from a big data scale, back down to a human scale.

In order to have better conversations about evolving norms for appropriate uses of data, we need to make the uses of data more legible to consumers. That’s the way we’ll be able to hold governments and corporations accountable for their data practices.


I’m going to walk through a few canonical personal data stories that do the work of opening the black box and making the personal effects of data practices legible.

By now you’ve all heard of this Target example. The New York Times profiled the algorithms that looked at purchasing patterns to identify early pregnancy indicators. It also included a story about how the pregnancy coupons reached one family in particular. The father of the household brought the coupons back to Target, inquiring as to why they would send pregnancy-related coupons to his teenage daughter, and it turned out that she was in fact pregnant.

This story has become canonical, because it did a lot to educate us about what was going on behind the scenes in the uses of data in this advanced case, but it also made the impacts of that practice concrete by detailing the social impacts on this given family.


More recently, Mike Seay received a direct mail envelope from OfficeMax that included “Daughter Killed in Car Crash” in the address. This failure exposed just how egregious the market segmentations from these brokers could be. It also exposed the kinds of lists data brokers are keeping on us, and the sorts of information they think is relevant. How might that information be used, and more importantly, how should it be used? This story connected the personal effect of an insensitive reminder of the loss of a child in a traumatic event, and implicated OfficeMax for its use, as well as the data broker for its database categorization. We began to understand how something like this could happen, and now it’s an example of a data use failure.


This last example is from a story I published in The Atlantic. I had deliberately chosen not to update my Facebook status when Nick and I got engaged because I didn’t want to show up in the database. But then Facebook asked me how well I knew him and displayed an ad for a custom engagement ring. It turned out that it was a coincidence that the service-enhancing survey to improve the relevance of my newsfeed happened to match up with a demographically-determined ad. But the coincidence didn’t lessen the effect of feeling as though Facebook had intruded on my personal life.

And even after talking with Facebook to confirm what was going on, I still had no answer as to what factors went into the algorithm that asked about Nick as opposed to any of my other friends as a person of interest. Was it the sheer number of images we were tagged in together, or our increasingly overlapping networks?

I also still don’t know if I was getting this engagement ring ad because I was a female between the ages of 18 and 35 without a relationship status, or if it was because a more complex series of behaviors across the site alerted Facebook that it seemed like Nick and I were getting more serious. My Facebook story showed that even though the ad and the user survey were coincidentally displayed together, its effect on me was not incidental.


So what is it about personal data stories? They detail the effects of data and algorithms on our everyday lives. They aren’t about data breaches where we have no idea if we are affected or should be worried.

Data stories explain what’s going on behind the scenes. They give us more information about how these black boxes are working. And they give us a framework and vocabulary to begin to interrogate other data environments. They expose the logic of the engineers building these systems, their data science practices, the reasons for their data interventions. They detail the consequences of design decisions and power structures.

Data stories are also concrete. They happen to real people. They are not obscured behind big data rhetoric. They are grounded in individual experience. They give us a sense of what it means to be a digital person today. They describe the dynamics as our roles change as consumers, citizens, and individuals.

In my research on the Quantified Self community, I found that individuals were using numbers as storytelling devices: the show and tell format is quite literally a narrative using data. These data stories are full of thick description, and leave room for discussion about the individual, their feelings, interpretation, and sense of self. Like the personal stories in Quantified Self show and tell presentations, the personal data stories I’m interested in are about identifying personal meaning, or effects on the individual through understanding the uses of data.

Personal data stories have the potential to restore the subjectivity of individuals to an otherwise “objective” medium of data.

But personal data stories are hard to tell.

This is a Reddit comment (I know I shouldn’t read them) in response to my Atlantic article, and it indicates the trouble with telling personal stories, and the subtlety of talking about privacy from the database, rather than privacy from other people. But it’s not just the internet trolls that make personal data stories challenging to tell.

Data stories are hard to discover. Individuals aren’t necessarily primed to be critical of these patterns. And strange things happen when there is a coincidence or a fluke or a change in the design that exposes something interesting. These rifts reveal the seams of the system.

Personal data stories are also anecdotes. Sometimes the effects are technically repeatable, but often not. They are exceptional and so by big data standards they are not statistically significant.

Data stories also need resources to reverse engineer what’s going on. Or you need the skills to be able to sandbox and build out hypothetical digital profiles to compare and contrast outcomes. Or you need the journalistic clout to get a response from Facebook to figure out if what you see is related or intentional or not. And so in that sense these stories can be taken out of the voice of the individual affected and end up appropriated by journalists.

And it’s challenging to tell data stories with nuance. There is risk in sensationalizing the concerns, and the Target story has been criticized for that. There’s a delicate balance in highlighting these exceptional cases and grounding it in the effects on our everyday lives.

Personal data stories also risk the personal privacy of the individuals involved by heightening their profile and their plight. There is also a danger of personal attacks on these stories.


But the stories are made all the more compelling if they come from consumers. If we can answer the questions they have, we can get at the core normative concerns of a conscientious but not necessarily technically savvy individual.

Data stories will inform future design choices and policy positions. They serve to educate publics and representatives about the stakes at play. And where individuals are still not sufficiently protected, we’ll start to see where the regulatory holes lie.

I want to see more data stories because I think they change the nature of the conversation we can have as a society. They even the playing field between all interested parties, and ground digital practices in human-scale effects.

Personal data stories will help us uncover the politics, epistemologies, economies, and ecologies of the sociotechnical system for which data is becoming the primary substrate.


I think of this personal stories work as fitting into a larger emerging suite of tools and practices that expose the seams of the data uses and algorithmic design of our built digital environment.

Lots of people are creating technological interventions, building tools to make data more legible. Tools like Immersion take your Gmail metadata and, by exposing it, allow people to comprehend the stories they can see in their own data.

Ben Grosser’s Demetricator is a browser plugin that hides the Facebook quantifications of likes, friends, and time, and is what he calls critical software, to reveal how Facebook structures use and possibly addiction with quantification.

Another class of interventions is personal, but more performative.

Janet Vertesi presented this past weekend at Theorizing the Web on her infrastructure inversion project: hiding her pregnancy from the internet by using cash, browsing maternity websites with Tor, and asking her family members and friends not to write about her pregnancy even in private Facebook messages.

In her recent book, Dragnet Nation, Julia Angwin takes extreme measures to prevent tracking and protect her privacy over the course of a year. She used a Faraday case for her mobile phone and she even created a fake identity to separate out some of her commercial online activity.

These examples are as much a performance as they are an experiment. But these performance pieces demonstrate the futility of perfect privacy as a goal. They don’t depict the practicalities of everyday life except in the ways privacy protection hampers life. In contrast, personal stories from average consumers help ground these trade-offs and better inform everyday practical decisions.


My interest in personal data stories is grounded in a larger vein of technological criticism. In much the same way that cultural and film critics discuss what is important and interesting about a cultural artifact, technology critics could uncover both the artistic and cultural importance of technologies as media, as well as power dynamics inherent in technologies as political artifacts. Technology criticism should explore our relationship to the firms and the governments as individuals and as societies. And so I’m advocating for a technology criticism with an anthropological flavor.


So to that end, I have a pitch for you today. I want to build a column for telling personal data stories. It would look something like “The Haggler” or “The Consumerist,” but for data and algorithms. There needs to be a platform to tell personal data stories with regularity. The format would be similar: investigate a particular case to solve a personal problem while exposing the larger systemic issue at hand. The column would be a means to surface more of these stories, explain them for an individual, describe their case and its impact on that person, and reveal what’s going on for the rest of us. Data stories will also develop our attention to notice and scrutinize when we come across something in the course of our digital lives.

I think of this as a regular column in a popular publication, largely for a lay rather than technical audience. At the very least it could be a single purpose website to collect and share data stories. So I’m open to suggestions.

I want to make a call today for more personal stories. I need your help and I’m looking for participation. Share your questions and personal encounters with data. Do you have screen captures of weird ads or algorithmic flukes?

Or what are some compelling example stories in this framework that changed the way you think about data and its uses?

I’d love to get your feedback, and hear your thoughts. This is a work in progress and just getting off the ground.