Protecting Privacy With Data Promiscuity

The Badness of Dave’s Paranoid Brain led him to sound the alarm: get off Facebook now! Get yr email onto a small ISP! Keep your data away from big companies that hold profits or subpoenas above their users’ interests!

The concerns are valid, but the remedy doesn’t scale. It’s hard enough to keep the bits flowing without having to worry about the moral fiber of your service providers. There’s another way to play this: instead of taking on the task of policing everyone who has access to your data, let’s build systems to obfuscate our data. I’m thinking of systems modeled on Tor and spam. Yes, spam. More after the jump.

First, the spam. About a year ago I started noticing snippets of classical English lit (Shakespeare, Faulkner, whatever) mixed into spam, right after the v14gr@ pitches. Spammers were seeding real, human-generated text into their pitches in an effort to defeat spam filters, which are more likely to wave through a message that reads like human-written text.

That’s part one. Part two?

Tor is a software project that helps you defend against traffic analysis…Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location.

The best analogy I’ve heard (can’t remember where, sorry) is a football huddle. If you want the sender of a message to be anonymous, take the message, pass it amongst a circle of volunteers, and then randomly have one of them pass the message outside the circle along to the next node.
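The huddle can be sketched as a toy in a few lines of Python. This models only the analogy, not Tor itself, which relies on layered encryption between relays. The names `huddle_relay`, `circle`, and `passes` are made up for illustration.

```python
import random

def huddle_relay(message, sender, circle, passes, rng=None):
    """Pass a message around a circle of volunteers, then hand it outward.

    Returns (last_holder, message). Whoever receives the message next
    learns only who the last holder was, not who the sender was.
    """
    if passes < 1:
        raise ValueError("the message must change hands at least once")
    rng = rng or random.Random()
    holder = sender
    for _ in range(passes):
        # Each pass goes to a randomly chosen member of the huddle.
        holder = rng.choice(circle)
    return holder, message

circle = ["alice", "bob", "carol", "dave"]
last_holder, msg = huddle_relay("hello", "alice", circle, passes=5)
print(last_holder, "passes along:", msg)
```

The more volunteers in the circle, the less the last holder tells you about where the message started.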

So how does this come together? Facebook users volunteer their content (or it’s scraped), which is pooled into a bank of human-generated FB data. When you want to obfuscate your profile, inject random data from that bank into it. This kind of pollution of the demographic data is almost impossible for FB (or Google, or whoever) to defend against, and it scales nicely. Ready to leave FB? How about a tool that auto-generates plausible but false data about your tastes, activities, and interactions, and inserts it automatically at random intervals over a period of time? Want to mask the timing? Have a steady trickle of false data pumped in, even while you’re still using FB.
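Here is a rough sketch of the scheduling half of that idea, assuming the pooled data is just a list of strings. It only builds a plan of what to post and when; actually posting anything is left out, and `schedule_decoys` and its parameters are hypothetical names, not part of any real tool or FB API.

```python
import random
from datetime import datetime, timedelta

def schedule_decoys(pool, count, start, span_days, rng=None):
    """Pick `count` items from the pool and give each a random posting time.

    Times are spread uniformly over `span_days` starting at `start`.
    Returns a list of (when, item) pairs sorted by time.
    """
    rng = rng or random.Random()
    span_seconds = span_days * 24 * 60 * 60
    plan = []
    for _ in range(count):
        item = rng.choice(pool)  # borrow someone else's real, human-written data
        offset = timedelta(seconds=rng.uniform(0, span_seconds))
        plan.append((start + offset, item))
    plan.sort(key=lambda entry: entry[0])
    return plan

# Illustrative pool; a real one would come from volunteered profiles.
pool = ["likes: bluegrass", "likes: competitive chess", "attended: a boat show"]
for when, item in schedule_decoys(pool, count=5, start=datetime(2010, 1, 1), span_days=30):
    print(when.isoformat(timespec="minutes"), item)
```

Drawing the offsets uniformly is the simplest choice. A trickle that mimics your real posting rhythm would be harder to separate from genuine activity.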

While Dave’s recommendations are sound:

control your own data. Use client-side applications as much of the time as possible (i.e. read your mail with Thunderbird on your own machine with mail hosted by a smaller ISP vs. logging into a web site…sign up for as few memberships as possible online…don’t sign into any Google services when you are searching, don’t sign up for any social networking sites, chatting on major services is problematic…don’t trust big corporations (or small ones) to do the right thing with your data. They don’t love you, they just want money.

they’re not practical for a large number of users, either because they’re too technically difficult or just too inconvenient. If the Internet is a copying machine, our strategy should play to its advantages, not its weaknesses. As Cory Doctorow says, it will never be harder to copy bits than it is today. Our defense should take advantage of data promiscuity.

The problem is real, but the solution isn’t less data, it’s more.

