Things Facebook Thinks I Care About, Ranked

20. Cats
19. Millennials
18. Adventure
17. Fatherhood
16. Renminbi
15. Cloud computing
14. Orange (fruit)
13. Gratitude
12. Bag
11. Fluid dynamics
10. Edible mushroom
9. Laser
8. Company
7. Pressure
6. Cervical vertebrae
5. Self-esteem
4. Life
3. Water
2. Year
1. Human skin color

Sourced from Facebook Ad Preferences. This post is also published on Medium.

“But Ferguson was Trending in my Feed”

This essay appeared in the Berkman Center's Youth and Media essay collection, "Youth and Online News: Reflections and Perspectives," which is available for download through SSRN. My contribution is among a great set of pieces offering insightful, thought-provoking, and out-of-the-box reflections at the intersection of news, digital media, and youth.

I was at a journalism conference recently where the topic of algorithmic curation came up. One of the speakers cited the contrast between Ferguson trending on Twitter while the Ice Bucket Challenge was all the rage on Facebook. It was held up as an example of how platforms influence and shape the news and the sharing behaviors of their users.

One student in the audience raised her hand, contesting the premise that Ferguson hadn't trended on Facebook. She was originally from St. Louis, and all her friends from home had been talking about it: about race, about police violence, about protests. Ferguson was all over her Facebook newsfeed.

The discrepancy provided an illustrative moment. On the one hand, opinion and data had made claims about how the algorithmic filtering practices of platforms affect access to news on Facebook. On the other hand, a personal experience of the same news event differed drastically from the larger collective narrative about how news spreads online, and how politically sensitive topics are discussed within youth peer networks on Facebook.

That one student, away from home at school in Milwaukee, hadn't felt distant from the events in Ferguson. She was deep in it in her feed. The news was blowing up within her situated sphere of influence. This is how she experienced Ferguson.

Still, she had a hard time conceiving how Ferguson hadn't made it into the feeds of others on Facebook. She contested the speaker's claim with her own situated, personal experience of algorithmic curation.

Digital Literacy in Context

The greatest challenge we face in addressing the technical platforms that shape our information experiences is demonstrating the relationship between inputs and outputs in the system. Just as news literacy aims to develop skills to "understand a source's agendas, motivations and backgrounds," digital literacy needs to do the same for the platforms, their business models, and their motivations for providing value to consumers. We need tools that not only build diversity and solve for homophily problems, but also introduce us to the underlying editorial structures of these novel information platforms.

Digital news literacy ought to be taught by example and in context. Youth need to understand how algorithms affect their unique experience, not just how they influence everyone’s experience abstractly and in principle. We need more tools that allow youth to interact with the algorithm and see the micro effects of subtle changes from various inputs, like who you follow, what posts you comment on or re-share, and what things you like and click through.
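
To make that concrete, here is a minimal sketch, in Python, of the kind of toy ranking model such a teaching tool might expose. Everything here is invented for illustration: the weights, the inputs, the posts. Real feed-ranking systems are far more complex and not public, but even a toy like this lets a student change one input and watch the ordering of a feed shift.

```python
# A toy feed-ranking sketch. The weights and fields are hypothetical,
# invented for teaching; this is not Facebook's actual algorithm.

from dataclasses import dataclass

@dataclass
class Post:
    topic: str
    likes_from_me: int       # times I've liked this author's posts
    comments_from_me: int    # times I've commented on them
    follows_author: bool     # whether I follow the author

def score(post: Post) -> float:
    """Hypothetical affinity score: each interaction type gets a weight."""
    s = 1.0
    s += 0.5 * post.likes_from_me
    s += 1.5 * post.comments_from_me          # comments weigh more than likes
    s += 2.0 if post.follows_author else 0.0
    return s

feed = [
    Post("Ice Bucket Challenge", likes_from_me=1, comments_from_me=0, follows_author=True),
    Post("Ferguson protests", likes_from_me=2, comments_from_me=3, follows_author=True),
]

# Rank the feed; change one input above and re-run to see the order shift.
for post in sorted(feed, key=score, reverse=True):
    print(f"{score(post):4.1f}  {post.topic}")
```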

Tools like Floodwatch’s ad tracking database allow us to compare our personal experience to that of others in a shared demographic profile. We could use still more technical interventions to help show variation in personalization.

What can youth learn about the way technical platforms work by comparing and contrasting the trending topics they see on Facebook and Twitter with peers in their network, and with others outside it? What will they learn about what newsworthiness means in these personalized contexts?

If we take into account the personal, contextual experience of youth in teaching news literacy, we can help them to understand their place in a larger civic discourse around news and access to information by making it grounded, personal and real in the contexts where they get information today.

Ethnography in Youth and Media Research

News literacy goes beyond the sources youth get information from and how social media shapes their filter bubble. It's also about developing algorithmic literacy: understanding the curatorial and editorial role of the platforms they interact with in their media environments.

Ethnographic interview work has vastly expanded our understanding of youth media practices by meeting them where they are and elevating their voices and concerns. Youth news experiences are inherently personalized now, and research methods for understanding those technical experiences must be as well.

Ethnography in Technology Journalism

Ethnographic approaches to knowledge and experience of algorithms should also extend to the media outlets covering our evolving relationship to technology. Journalists can play a role in developing their audiences' digital literacy around access to information by paying attention to, and covering, grounded, individual interactions with these systems.

That has been my methodological approach to "Living with Data," the series I developed for Al Jazeera America. In it I examine encounters that illustrate our personal, situated experience of these tools, following reader submissions that probe the gap between how we expect these systems to work, or think they should work, and what is actually happening technically. The series aims to teach critical digital literacy through examples.

In part, the series was designed to refute the common argument that "I have nothing to hide," and the notion that privacy concerns are too abstract for people to grasp. My aim is to illustrate through real experiences how autonomy and privacy are influenced by the sociotechnical systems that govern our access to information. A mission to develop critical digital literacies becomes especially important for a generation that takes Facebook and other social media platforms for granted.

This grounded approach makes the harms, or the surprises, of data more personal and more relatable. So while your experience may be very different from mine, I can begin to understand the inner workings of these algorithmic curatorial decisions because I can grasp their effects at a personal scale. I can compare my experience of Ferguson on Facebook against everyone else's experience of the Ice Bucket Challenge.

Grounding coverage of these stories makes technical subjects more accessible, but it also helps make the individual stakes more present and clear.

In Good Company

I got pretty excited when people who I admire and respect cited my recent articles about data science, Facebook, and the uncanny this week. Beyond the not-so-humble brag, I'm more excited by the mounting accumulation of voices calling for accountability and ethical approaches to our data and its uses. And I was even more excited to overhear older family members at a friend's wedding this past weekend discussing the Facebook study over breakfast. I think we're starting to get somewhere.

Om Malik, who has supported some of the most extensive industry coverage of data on GigaOM, wrote this week about Silicon Valley's collective responsibility to use its power wisely:

While many of the technologies will indeed make it easier for us to live in the future, but what about the side effects and the impacts of these technologies on our society, it’s fabric and the economy at large. It is rather irresponsible that we are not pushing back by asking tougher questions from companies that are likely to dominate our future, because if we don’t, we will fail to have a proper public discourse, and will deserve the bleak future we fear the most...Silicon Valley and the companies that control the future need to step back and become self accountable, and develop a moral imperative. My good friend and a Stanford D.School professor Reilly Brennan points out that it is all about consumer trust. Read more.

And danah boyd, who is starting up the Data & Society research institute, summed up what we've learned from the Facebook emotional contagion study, echoing my point that it's not just about the Facebook study, it's about the data practices:

This paper provided ammunition for people’s anger because it’s so hard to talk about harm in the abstract...I’m glad this study has prompted an intense debate among scholars and the public, but I fear it’s turned into a simplistic attack on Facebook over this particular study, rather than a nuanced debate over how we create meaningful ethical oversight in research and practice. The lines between research and practice are always blurred and information companies like Facebook make this increasingly salient. No one benefits by drawing lines in the sand. We need to address the problem more holistically. And, in the meantime, we need to hold companies accountable for how they manipulate people across the board, regardless of whether or not it’s couched as research. If we focus too much on this study, we’ll lose track of the broader issues at stake. Read more.

Both are great reads, and align with a lot of the things I've been exploring in my own work. I'm honored to be in such good company.

Adjectives to Describe Me

In high school, I used to ask friends for three adjectives to describe me. I cringe a little now reflecting on that teen navel-gazing activity of self-definition, but it was an interesting exercise. It told me as much about how others saw me as it did about my friends themselves. I judged responses on how well I thought they suited me, and on their creativity with vocabulary (once a word nerd, always a word nerd).

Lately, I’ve been asking what my data says about me to other people, firms, and governments, and more importantly, to systems that interpret and target me as a data subject. 

Five is a new program that allows us to see how others might analyze our status updates and profile information by reflecting back a personality assessment, complete with spectrums of behavior and five isolated adjectives to describe us.

Five suggests that I'm "inventive, assertive, restless, friendly, and efficient." The adjectives aren't spot on, but they aren't terribly off the mark either. Still, they aren't really meaningful without any explanation of why Five came up with these particular descriptors for me. There's no causal explanation.

The five-point personality chart does offer some explanation when you click through on the scale. For example, I score 87% on the openness scale, based on words like "writing," "book," and "i've been" that "have high correlation with this trait." Based on my Facebook profile, I exhibit an interest in "art…unusual ideas, creativity," which all sounds right to me. My lowest-scored trait is conscientiousness, which seems to be based on a lack of updates featuring the word "work." This says more to me about the correlations (people like to write about how they are "back at work" while they are on Facebook first thing Monday morning?) than it does about my conscientiousness. But what am I supposed to do with the suggestion that I'm 76% neurotic, except be comforted by the fact that Mark Zuckerberg (81%) and Lil Wayne (85%) are both more neurotic than I am?
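
Five doesn't publish its model, but its explanations suggest simple correlations between words and traits. Here is a minimal sketch of that kind of keyword-weighting approach, assuming a logistic squash to produce a percentage; the word lists and weights are invented for illustration, not taken from Five.

```python
# A guess at the mechanics behind a Five-style trait score: sum the weights
# of trait-correlated words in your statuses, then squash the total into a
# percentage. All weights here are made up for illustration.

import math
import re

OPENNESS_WEIGHTS = {
    "writing": 1.0, "book": 0.8, "art": 0.9, "creativity": 1.0,
    "work": -0.5,  # negatively correlated in this toy model
}

def trait_score(statuses, weights):
    """Score a blob of status text against one trait's word weights."""
    words = re.findall(r"[a-z']+", statuses.lower())
    raw = sum(weights.get(word, 0.0) for word in words)
    return 1 / (1 + math.exp(-raw))  # logistic squash to 0..1

sample = "Writing a book about art, unusual ideas, and creativity."
print(f"openness: {trait_score(sample, OPENNESS_WEIGHTS):.0%}")
```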

For now, Five is of limited use because it analyzes only Facebook statuses. For my sample, that's a paltry 1,113 words. I don't post to Facebook all that much, so it's not as interesting as, say, a Twitter analysis would be for me (my voice is decidedly different on each platform, too).

But it’s an important proof of concept towards building up the cadre of tools we have that make data and its uses more legible to us. Others include Immersion, Demetricator, and Collusion, all of which allow us to visualize and play with our data that is otherwise hidden. Sometimes they even share some of the interpretive analysis.

Sure, Five comes off about as meaningful as a multiple-choice personality test in the latest issue of Seventeen. But that comparison trivializes its importance as a tool for reflecting back to you what your data says about you. As I've argued, we need more opportunities and tools like this, whether they are browser and API plugins, or integrated as features of the system, like Facebook's new advertising explanations. Adding it to the toolbox…

[VIDEO] Living with Data: Stories that Make Data More Personal

I had the privilege of speaking at the Berkman Lunch Series this week and talked about my ideas for telling more personal stories about our relationship to data to ground our understanding in more practical, everyday lived experience. The way I see the problem is that right now we don’t understand the causal relationship between our data and its uses in the world. My talk sets up a few examples that I’ve seen recently that both exposed and obscured what my data says about me and how it’s being used. I talk about why understanding data in our everyday lives matters more than ever, and I set up what personal stories can do to help us. I walk through a few canonical examples, and then end with a pitch for a column to tell these stories on a regular basis. Please send me your ideas, strange screengrabs, and questions—this is just the beginning of an effort to make data and its uses more legible to us. 

The video is embedded here, and I’ve also posted my crib notes with links below if you’d prefer to read or want to follow up on some of the examples.

It is also on SoundCloud, in case you'd prefer to listen to the talk as a podcast.

CRIB NOTES

This talk is a reflection of a lot of the work I’ve been thinking about here as a fellow, but it’s also a kind of proposal for future work, so I’m very much interested in feedback from the braintrust here in the room and watching on the web.

The main idea is that we need more stories that ground data in personal, everyday experience. We need personal data stories to make data uses intelligible and their impacts personal.

I wanted to start off by talking about what I do and what I do not know about myself as other entities see me through my data.

Facebook's advertising engine seems to think I like cheese boards. Even when the ads aren't selling cheese, or boards, cheese boards are part of my advertisements.

But I don’t know if it is because I talked about my love of cheese boards, or if it is based on image recognition, or some combination of the two. I can’t tell if Facebook thinks I’m demographically bougie, or if it really knows I’m obsessed with cheese.

About the Data, Acxiom’s consumer portal into our data broker data tells me it thinks I am a truck owner and intending to purchase a vehicle. I am not. I’m assuming this is based on my Father’s truck registration (the last time he drove a truck was in the early 1990s).

But About the Data doesn't tell me whether Acxiom thinks I'm a "Truckin' and Stylin'" or "Outward Bound" consumer, one of the many consumer segmentation profiles that might link to that truck data point. Acxiom shows us the inferred demographic information behind behavioral targeting, but it doesn't show us how it is being used by its third-party customers, who could very well be insurance companies or loan underwriters, not just marketers.

When I start to worry about the traces of my connections to friends from my time abroad in the UK and in China, I can use Facebook's graph search to query how many people I know in China who might show up in my "buddy list," as described in the PRISM documents.

But I don’t have any confidence that I don’t meet the threshold for confidence-based citizenship. I don’t know what it means to be a person on a “buddy list” “associated with a foreign power.” Nor do I know whether my use of VPN would contribute to my score. My algorithmically-determined citizenship is completely opaque to me.

These are just some personal encounters I've had recently in my daily life, from the trivial in the commercial realm to the consequential in my shifting sense of citizenship. The concerns they raise point to an asymmetry that obscures what's going on behind the scenes in the interactions of my daily life.

The crux of the problem is that right now we don’t understand the causal relationship between our data and its uses in the world.

Joanne McNeil has described this as reading the algorithmic tea leaves—it’s a dark art. We don’t understand the how and the why of data’s uses, let alone what our data forecasts about us.

I like to think of it as a kind of uncanny valley of personalization. When we try to understand creepy ads that follow us around or are strangely personal, we can’t figure out if it’s just coarse demographics or hyper-targeted machine learning that generates the ads we see and that leaves us with this sense of the uncanny.

So while data is making our behaviors, habits, and interests more legible to firms and governments, as consumers we haven’t yet developed the critical literacies to understand what our data is saying about us and more importantly how it is shaping our experience.

The other day a medical professional said to me “I have nothing to hide. If they profile so that a terrorist doesn’t blow up the plane that I’m taking to Disney with my kids, I’m okay with that.”

But he was only talking about one use—one that he thought was justified. Disney would be tracking him with its new MagicBands when his family got there. "I have nothing to hide," but I don't know what I'm hiding from.

Right now, big data is a big black box. It’s hard to develop opinions and feelings about what we think should happen with data when most of what is happening right now is obscured and opaque. The flows of data and its uses are hidden.

When I started worrying about personal data while writing about it from the CIO’s perspective, I thought we had an awareness problem. People didn’t understand that by using free services they were paying with data, as it were. I think we’ve moved past that, and Snowden has heightened awareness even further. Right now, we are primed to have a discussion about how we want our data environment to look, but we have only scratched the surface about how our data is actually being used.

DATA PROLIFERATION

I think this is a particularly important moment because we're moving from a time when data existed about our browsing habits and our mobile presence to a time when more of the physical world is being tracked, measured, and turned into data. Our cities, our cars, our homes, our bodies are all extending our data profiles. Anything with a sensor becomes fodder for this larger sociotechnical system that we're building.

We’re also transitioning from a time when we intentionally searched for things we want, and search interfaces clearly delineated paid advertisements, to an interface that anticipates our needs and gives us small bits information, in the early iterations of Google Now. Our choice architectures fall away as interfaces become more embedded and anticipatory.

We are learning to live with data, as more of our domestic life becomes subject to digital scrutiny. But the way we interpret influence in the uses of data is also about to shift dramatically.

PERSONAL DATA STORIES

So, my proposal is: we need stories that make data uses more intelligible and its impacts more personal. We need new tools for thinking about data’s role in our everyday lives.

We need stories to be relatable. They need to go beyond the "I have nothing to hide" mentality to illustrate the ways our environments are shaped and how they influence us.

We need more personal stories to make the uses of data more intelligible and more practical. And we need stories that bring data back from a big data scale, back down to a human scale.

In order to have better conversations about evolving norms for appropriate uses of data, we need to make the uses of data more legible to consumers. That’s the way we’ll be able to hold governments and corporations accountable for their data practices.

EXAMPLE STORIES

I'm going to walk through a few canonical personal data stories that do the work of opening the black box and making the personal effects of data practices legible.

By now you've all heard of the Target example. The New York Times profiled the algorithms that looked at purchasing patterns to identify early pregnancy indicators. The profile also included a story about how the pregnancy coupons reached one family in particular. The father of the household brought the coupons back to Target, inquiring as to why they would send pregnancy-related coupons to his teenage daughter. It turned out that she was in fact pregnant.

This story has become canonical, because it did a lot to educate us about what was going on behind the scenes in the uses of data in this advanced case, but it also made the impacts of that practice concrete by detailing the social impacts on this given family.

More recently, Mike Seay received a direct mail envelope from OfficeMax that included "Daughter Killed in Car Crash" in the address line. This failure exposed just how egregious these brokers' market segmentations can be. It exposed the kinds of lists data brokers are keeping on us, and the sorts of information they think is relevant. How might that information be used, and more importantly, how should it be used? This story connected the personal effect, an insensitive reminder of the loss of a child in a traumatic event, and implicated OfficeMax for its use, as well as the data broker for its database categorization. We began to understand how something like this could happen, and now it's an example of a data use failure.

This last example is from a story I published in The Atlantic. I had deliberately chosen not to update my Facebook status when Nick and I got engaged because I didn't want to show up in the database. But then Facebook asked me how well I knew him and displayed an ad for a custom engagement ring. It turned out to be a coincidence: the service-enhancing survey meant to improve the relevance of my newsfeed just happened to appear alongside a demographically determined ad. But the coincidence didn't lessen the feeling that Facebook had intruded on my personal life.

And even after talking with Facebook to confirm what was going on, I still had no answer as to what factors went into the algorithm that asked about Nick, as opposed to any of my other friends, as a person of interest. Was it the sheer number of images we were tagged in together? Our increasingly overlapping networks?

I also still don't know if I was getting the engagement ring ad because I was a female between the ages of 18 and 35 without a relationship status, or because a more complex series of behaviors across the site alerted Facebook that Nick and I seemed to be getting more serious. My Facebook story showed that even though the ad and the user survey were coincidentally displayed together, the effect on me was not incidental.

WHAT DATA STORIES DO

So what is it about personal data stories? They detail the effects of data and algorithms on our everyday lives. They aren’t about data breaches where we have no idea if we are affected or should be worried.

Data stories explain what’s going on behind the scenes. They give us more information about how these black boxes are working. And they give us a framework and vocabulary to begin to interrogate other data environments. They expose the logic of the engineers building these systems, their data science practices, the reasons for their data interventions. They detail the consequences of design decisions and power structures.

Data stories are also concrete. They happen to real people. They are not obscured behind big data rhetoric. They are grounded in individual experience. They give us a sense of what it means to be a digital person today. They describe the dynamics as our roles change as consumers, citizens, and individuals.

In my research on the Quantified Self community, I found that individuals were using numbers as storytelling devices—the show-and-tell format is quite literally a narrative using data. These data stories are full of thick description, and they leave room for discussion about the individual, their feelings, interpretation, and sense of self. Like the personal stories in Quantified Self show-and-tell presentations, the personal data stories I'm interested in are about identifying personal meaning, or effects on the individual, through understanding the uses of data.

Personal data stories have the potential to restore the subjectivity of individuals to an otherwise "objective" medium of data.

But personal data stories are hard to tell.

This is a Reddit comment (I know I shouldn't read them) in response to my Atlantic article, and it indicates the trouble with telling personal stories, and the subtlety of talking about privacy from the database rather than privacy from other people. But it's not just the internet trolls that make personal data stories challenging to tell.

Data stories are hard to discover. Individuals aren't necessarily primed to be critical of these patterns. And the strange things tend to surface only when a coincidence, a fluke, or a change in the design exposes something interesting. These rifts reveal the seams of the system.

Personal data stories are also anecdotes. Sometimes the effects are technically repeatable, but often not. They are exceptional and so by big data standards they are not statistically significant.

Data stories also need resources to reverse-engineer what's going on. You need the skills to sandbox and build out hypothetical digital profiles to compare and contrast outcomes, or the journalistic clout to get a response from Facebook to figure out whether what you see is related or intentional. In that sense, these stories can be taken out of the voice of the individual affected and end up appropriated by journalists.

And it's challenging to tell data stories with nuance. There is risk in sensationalizing the concerns, and the Target story has been criticized for that. There's a delicate balance between highlighting these exceptional cases and grounding them in the effects on our everyday lives.

Personal data stories also risk the personal privacy of the individuals involved by heightening their profile and their plight. There is also the danger of personal attacks on these stories.

IMPACTS

But the stories are made all the more compelling if they come from consumers themselves. If we can answer the questions they have, we can get at the core normative concerns of a conscientious but not necessarily technically savvy individual.

Data stories will inform future design choices and policy positions. They serve to educate publics and representatives about the stakes at play. And where individuals are still not sufficiently protected, we’ll start to see where the regulatory holes lie.

I want to see more data stories because I think they change the nature of the conversation we can have as a society. They even the playing field between all interested parties, and ground digital practices in human-scale effects.

Personal data stories will help us uncover the politics, epistemologies, economies, and ecologies of the sociotechnical system for which data is becoming the primary substrate.

INTERVENTIONS

I think of this personal-stories work as fitting into a larger emerging suite of tools and practices that expose the seams of the data uses and algorithmic design of our built digital environment.

Lots of people are creating technological interventions, building tools to make data more legible. Tools like Immersion take your Gmail metadata and, by exposing it, allow people to comprehend the stories they can see in their own data.
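
As a sketch of what an Immersion-style tool actually touches, here is a minimal Python example that reads only the metadata of a local mailbox export and surfaces the shape of your correspondence without reading a single message body. The filename inbox.mbox is a hypothetical local export, not anything Immersion itself requires.

```python
# A minimal Immersion-style sketch: read only From/To/Cc headers from a
# local mbox export and count correspondents. No message bodies are read.
# "inbox.mbox" is a hypothetical local export, e.g. from Google Takeout.

import mailbox
from collections import Counter

contacts = Counter()
for message in mailbox.mbox("inbox.mbox"):
    for header in ("From", "To", "Cc"):
        value = message.get(header)
        if value:
            contacts.update(addr.strip() for addr in str(value).split(","))

# The ten addresses you exchange the most email with, from metadata alone.
for address, count in contacts.most_common(10):
    print(f"{count:5d}  {address}")
```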

Ben Grosser's Demetricator is a browser plugin that hides Facebook's quantifications of likes, friends, and time. It is what he calls critical software, built to reveal how Facebook structures use, and possibly addiction, through quantification.

Another class of interventions is personal, but more performative.

Janet Vertesi presented this past weekend at Theorizing the Web on her infrastructure inversion project—hiding her pregnancy from the internet by using cash, browsing maternity websites with Tor, and asking her family members and friends not to write about her pregnancy even in private Facebook messages.

In her recent book, Dragnet Nation, Julia Angwin takes extreme measures to prevent tracking and protect her privacy over the course of a year. She used a Faraday case for her mobile phone, and she even created a fake identity to separate out some of her commercial online activity.

These examples are as much performance as experiment. But these performance pieces demonstrate the futility of perfect privacy as a goal. They don't depict the practicalities of everyday life, except in the ways privacy protection hampers it. In contrast, personal stories from average consumers help ground these trade-offs and better inform everyday practical decisions.

TECHNOLOGY CRITICISM

My interest in personal data stories is grounded in a larger vein of technology criticism. In much the same way that cultural and film critics discuss what is important and interesting about a cultural artifact, technology critics could uncover both the artistic and cultural importance of technologies as media, and the power dynamics inherent in technologies as political artifacts. Technology criticism should explore our relationship to firms and governments, as individuals and as societies. And so I'm advocating for a technology criticism with an anthropological flavor.

PITCH FOR A COLUMN

So to that end, I have a pitch for you today. I want to build a column for telling personal data stories. It would look something like "The Haggler" or "The Consumerist," but for data and algorithms. There needs to be a platform for telling personal data stories with regularity. The format would be similar: investigate a particular case to solve a personal problem while exposing the larger systemic issue at hand. The column would be a means to surface more of these stories, explain them for the individual, describe the case and its impact on that person, and reveal what's going on for the rest of us. Data stories will also train our attention, so that we notice and scrutinize when we come across something strange in the course of our digital lives.

I think of this as a regular column in a popular publication, largely for a lay rather than technical audience. At the very least, it could be a single-purpose website to collect and share data stories. So I'm open to suggestions.

I want to make a call today for more personal stories. I need your help, and I'm looking for participation. Share your questions and your personal encounters with data. Do you have screen captures of weird ads or algorithmic flukes?

Or what are some compelling example stories in this framework that changed the way you think about data and its uses?

I’d love to get your feedback, and hear your thoughts. This is a work in progress and just getting off the ground.