Paying with Data

Given the increasing adoption of services like Facebook and the ubiquity of Google’s multi-service platform, a lot of energy and thinking has already been devoted to data, privacy, and the internet. The discussion has centered on privacy concerns around exposure of PII (personally identifiable information), with work such as Alessandro Acquisti’s privacy nudges and danah boyd’s talk at SXSW last year on system defaults and their relation to privacy and publicity. More recently the spotlight has moved to exposing the extent of online marketers’ use of cookies and tracking, as covered in the WSJ series What They Know. That interest has bubbled up into the pending FTC plan around Do Not Track mechanisms. Meanwhile, Viktor Mayer-Schoenberger proposed an expiration date for data in his book, Delete. And on the enterprise IT side of things, everyone’s been talking a lot about Big Data and the budding opportunity it presents (i.e., how older companies can remodel themselves to make data-driven decisions more like Google does).

Still, little work has been done so far to help users better understand their relationship to the free services they so happily feed information to on a daily basis in the form of Facebook status updates and photos, Google searches and Gmail accounts, exposed Twitter APIs, etc. Aza Raskin is working on exposing privacy policies with easier-to-understand privacy icons. Chris Anderson wrote on the economics of free and the internet, focusing more on how products themselves were getting cheaper, how traditional advertising models for free content continue to work, and how labor exchanges like captchas make things free. What he did not discuss in his 2008 work was the idea that data is behind every one of those free models, subsidizing services by creating aggregate value at scale.

I’d like to propose new ways of thinking about the interaction between a user and a free internet service in terms of exchanges and transactions. It’s worth considering these interactions (clicks, statuses, profile fields) in terms of micro-payments of data in order to help the general public better understand the aggregate burden of those accumulating micro data points - in terms of both data ownership and exposure.

We need to get beyond a basic understanding of ad-supported freeness to explain how the underlying transaction behind ads is only made possible and lucrative with data. Moreover, such a framework would better incorporate the idea that services improve as they begin to incorporate feedback from user behaviors as expressed through clicks and usage patterns.

Thinking about the economic value of data in the relationship between users and free services is helpful both at the individual level and the macro-system level: the more Google and Facebook can pull together about your profile, the better they can serve up more relevant ads to you as an individual. And the more these services know about the system at large through aggregate data sets, the more they can do to improve the system overall (that is, making it more profitable by either encouraging more lucrative behaviors or by increasing engagement and making people want to stay there longer). 

Framing these interactions in transactional terms is not meant to be alarmist, as much privacy-focused rhetoric has been. There is certainly mutual value in greater relevance and improved services, but have we ever recognized that we’re paying for those improvements with our data? And unlike proposed behavioral privacy nudges that assume user irrationality, framing data in economic terms might help users think of themselves as complicit agents in the transaction with free services, rather than as targets prone to lucrative/exploitative default settings.

I began thinking about this back in July when pulling together the SXSW panel proposal, and a lot of interesting work has been done since then to begin to expose the general public to the extent of tracking that’s going on today. The pending Do Not Track legislation is a good indication of how much awareness has been raised. I’m also excited to see people like Ben McAllister exploring this idea in a post that suggests there are hidden costs of exposure and explores the problems of a priceless market in the “personal data economy.”

I’ll have the opportunity to hash out these ideas further with some of the leading thinkers who are playing with these kinds of data issues at the upcoming SXSW panel set for March 13, Paying with Data: How Free Services on the Internet Are Not Free.


Arvind Narayanan, Postdoctoral Researcher, Stanford University
His research has exposed flaws in the purported privacy of public APIs, suggesting that it only takes 33 bits of data to identify an individual. He is also one of the researchers behind the “Do Not Track” proposal.

Julia Angwin, The Wall Street Journal
Her work in the “What They Know” series in the WSJ has led the way in exposing the extent of internet tracking, primarily for advertisers and profilers like RapLeaf and others.

Sam Yagan, CEO, OkCupid
OkCupid is all about data, and has always been up front about their use of it in their common-language privacy policy. They’ve also generated a lot of interest in their company by provoking conversations about data for marketing in the OkTrends blog.

Moderator: Sara Watson, Web Ecology Project
My thoughts on the topic began while researching enterprise data use and business intelligence, and grew out of my interest in exploring how people understand the technology they use. I’m hoping to present a framework that gives more weight to the idea that nothing on the internet is ever truly free.