A friend sent me an email this morning with this subject line: “This is Amazing.”
The message said:
Check this metadata app (you can only use it if you use a Gmail account): immersion.media.mit.edu
I wasn’t the only one to learn about this new creation from the MIT Media Lab. A lot of people wanted to try it out. So it took a long time to get through. But eventually I did.
I gave the Media Lab permission to see the metadata from my Gmail account. Yes, you have to surrender your privacy to see what surrendering your privacy could be like. But what the hell. It’s only metadata. Metadata’s innocuous.
If you’d like to try Immersion but either don’t use Gmail or don’t want to share your account with MIT, here’s a link to an Immersion demonstration: https://immersion.media.mit.edu/demo
And here is a link to a seven-minute video explaining Immersion: https://vimeo.com/69464265
Here’s what the Media Lab’s Immersion Project showed me about my Gmail metadata, covering 2004 through July 2013 (names removed):
Interesting, but what could it mean?
I found James Vincent’s description of the Immersion Project in The Independent:
Plugging your Gmail address into MIT’s Immersion allows the system to scrape your email account for its metadata, and produces a complex bubble map showing who you talk to, how much you talk to them, and what your relationships with your contacts are.
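To make “who you talk to, and how much” concrete, here is a minimal Python sketch of the kind of tally a tool like Immersion could compute from message headers alone, with no subject lines or message bodies. It is not Immersion’s actual code. The addresses are hypothetical, and drawing a line between two contacts whenever they appear on the same message is an assumption about how such a map might be built.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations

# Hypothetical header-only messages: addresses only, no subject or body text.
RAW_MESSAGES = [
    "From: me@example.com\nTo: alice@example.com\n\n",
    "From: alice@example.com\nTo: me@example.com\nCc: bob@example.com\n\n",
    "From: me@example.com\nTo: alice@example.com, bob@example.com\n\n",
    "From: carol@example.com\nTo: me@example.com\n\n",
]


def contact_graph(raw_messages, me):
    """Return (node_weight, edge_weight) built purely from From/To/Cc headers.

    node_weight[contact] counts messages shared with `me` (bubble size).
    edge_weight[(a, b)] counts messages on which contacts a and b both appear
    alongside `me` (line thickness). This linking rule is an assumption.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        fields = (msg.get_all("From", []) + msg.get_all("To", [])
                  + msg.get_all("Cc", []))
        people = {addr.lower() for _, addr in getaddresses(fields) if addr}
        if me not in people:
            continue  # skip mail that does not involve the account owner
        others = sorted(people - {me})
        node_weight.update(others)
        for a, b in combinations(others, 2):
            edge_weight[(a, b)] += 1
    return node_weight, edge_weight


if __name__ == "__main__":
    nodes, edges = contact_graph(RAW_MESSAGES, "me@example.com")
    for contact, count in nodes.most_common():
        print(contact, count)
    for (a, b), count in edges.items():
        print(a, "<->", b, count)
```

With this sample data, alice@example.com is the biggest bubble (3 messages), and the line between alice and bob has weight 2, all without reading a word of content.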
Vincent’s article led me to a blog post by Ethan Zuckerman describing how he used the tool.
Among his observations:
The Obama administration and supporters have responded to criticism of these programs [identified by Snowden] by assuring Americans that the information collected is “metadata”, information on who is talking to whom, not the substance of conversations. As Senator Dianne Feinstein put it, “This is just metadata. There is no content involved.” By analyzing the metadata, officials claim, they can identify potential suspects then seek judicial permission to access the content directly. Nothing to worry about. You’re not being spied on by your government – they’re just monitoring the metadata.
Sociologist Kieran Healy shows another set of applications of these techniques, using a much smaller, historical data set. He looks at a small number of 18th century colonists and the societies in Boston they were members of to identify Paul Revere as a key bridge tie between different organizations. In Healy’s brilliant piece, he writes in the voice of a junior analyst reporting his findings to superiors in the British government, and suggests that his superiors consider investigating Revere as a traitor. He closes with this winning line: “…if a mere scribe such as I — one who knows nearly nothing — can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.”
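Healy’s starting point is nothing more than a table of people and the groups they belong to. A minimal Python sketch of the general idea, under stated assumptions: link two people whenever they share a group (the person-by-person projection, equivalent to the nonzero off-diagonal entries of the membership matrix times its transpose), then score each person by betweenness centrality, a standard measure of how often someone sits on the shortest paths between everyone else. The group and member names are hypothetical toy data, not Healy’s colonial records, and betweenness is one reasonable choice of bridging measure, not necessarily the one Healy used.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy membership lists (not Healy's historical data).
MEMBERSHIPS = {
    "Lodge": ["A", "B", "X"],
    "Caucus": ["C", "D", "X"],
    "Club": ["E", "F", "X"],
}


def project_to_people(memberships):
    """Person-by-person graph: link two people if they share any group."""
    adj = {}
    for members in memberships.values():
        for person in members:
            adj.setdefault(person, set())
        for a, b in combinations(members, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm for betweenness centrality on an unweighted,
    undirected graph. Returns raw (unnormalized) scores."""
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}


if __name__ == "__main__":
    scores = betweenness(project_to_people(MEMBERSHIPS))
    for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(person, score)
```

On this toy data, X, the only person in all three groups, scores 12 (each of the 12 cross-group pairs among the other six people must route through X), and everyone else scores 0.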
Zuckerman published the Immersion Project’s image of his Gmail account, along with an analysis.
The largest node in the graph, the person I exchange the most email with, is my wife, Rachel. I find this reassuring, but [two people involved with Immersion] have told me that people’s romantic partners are rarely their largest node. Because I travel a lot, Rachel and I have a heavily email-dependent relationship, but many people’s romantic relationships are conducted mostly face to face and don’t show up clearly in metadata. But the prominence of Rachel in the graph is, for me, a reminder that one of the reasons we might be concerned about metadata is that it shows strong relationships, whether those relationships are widely known or are secret.
The Immersion image of my emails allowed me to identify people who are key in my network. Here’s an image of one of them; again, I have removed the names:
I am also able to see, based on the thickness of the connecting lines, who in my network has the strongest ties to this central person. And that’s just scratching the metadata surface.
Back to Zuckerman’s blog. After describing some additional implications of his Immersion-generated social network image, he writes:
My point here isn’t to elucidate all the peculiarities of my social network (indeed, analyzing these diagrams is a bit like analyzing your dreams – fascinating to you, but off-putting to everyone else). It’s to make the case that this metadata paints a very revealing portrait of oneself. And while there’s currently a waiting list to use Immersion, this is data that’s accessible to NSA analysts and to the marketing teams at Google. [my emphasis] That makes me uncomfortable, and it makes me want to have a public conversation about what’s okay and what’s not okay to track.
Jonathan O’Donnell commented on Zuckerman’s post with a brief literature review about the consequences of data tracking (see the original posting for links to the cited research):
For me, the classic paper in this area is Paul Ohm’s analysis of why anonymization doesn’t work. He shows that small amounts of metadata, and a modicum of known facts, will reveal big amounts of private information (Ohm, 2010).
In 2007, two students at the Massachusetts Institute of Technology (MIT) analyzed the Facebook profiles of 6,000 past and present MIT students. They demonstrated that they were able to predict, with a very high degree of certainty, whether someone was gay or not, based on their friendship group (Jernigan & Mistree, 2009).
In 2009, Acquisti and Gross demonstrated that they could ‘guess’ a large number of American social security numbers using just the birth date and place of a person (Acquisti and Gross, 2009).
In 2009, Zheleva and Getoor demonstrated that friendship and group affiliation on social networks could be used to recover the information of private-profile users. They found that they could predict (with reasonable degrees of success) country of residence (Flickr), gender (Facebook), breed of dog (Dogster) and whether someone was a spammer (BibSonomy), even when 50% of the sample group were private-profile users (Zheleva and Getoor, 2009).
In 2011, Calandrino and others demonstrated that you could use the “You might also like” feature on Hunch, Last.fm, LibraryThing, and Amazon to predict individual purchasing, listening and reading habits of users of these systems. As long as you knew a small number of items that were true about a person, you could use the system to investigate their private behaviour on these sites (Calandrino et al., 2011).
…I’m pretty sure that these techniques can be chained, so that if you are a prolific user of social networks, people can tell your gender, sexual orientation, country of residence, breed of dog, purchasing, listening, reading and spamming activities, your social security number and your name, even if you were anonymous.
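The friendship-based results above rest on homophily: people’s friends tend to resemble them. A deliberately simple Python sketch of that idea guesses a user’s hidden attribute by majority vote over the friends who publish theirs. This is a baseline for illustration, not the models used in the cited papers, and the users and countries are hypothetical.

```python
from collections import Counter

# Hypothetical data: who is friends with whom, and which users list a
# country publicly. "private_user" and "loner" hide theirs.
FRIENDS = {
    "private_user": ["a", "b", "c", "d"],
    "loner": ["d"],
}
PUBLIC_COUNTRY = {"a": "Canada", "b": "Canada", "c": "France"}


def infer_attribute(friends, public, person):
    """Guess `person`'s hidden attribute as the most common value among
    friends who share it publicly; None if no friend's value is known."""
    votes = Counter(public[f] for f in friends.get(person, []) if f in public)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(infer_attribute(FRIENDS, PUBLIC_COUNTRY, "private_user"))
    print(infer_attribute(FRIENDS, PUBLIC_COUNTRY, "loner"))
```

Here the private user is guessed to live in Canada purely from two of their friends’ public profiles; the user whose only friend is also private stays unguessable.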
But so what, if you’ve done nothing wrong? Why be concerned?
Some of my colleagues ask me that.
I know of at least one major police department concerned that the ease of social network tracking is making life more dangerous for its undercover officers. The officers practice safe social networking. But they have little control over the social networking practices of other people in their professional and social networks, let alone over the networks of their friends’ friends. It gets megacomplex really quickly.
A few months ago, Bruce Schneier wrote that it’s too late to talk about control. The Internet won, he says. Privacy lost.
The Internet is a surveillance state. Whether we admit it to ourselves or not, and whether we like it or not, we’re being tracked all the time. … [It] is ubiquitous surveillance: All of us being watched, all the time, and that data being stored forever. This is what a surveillance state looks like, and it’s efficient beyond the wildest dreams of George Orwell.
Sure, we can take measures to prevent this. We can limit what we search on Google from our iPhones, and instead use computer web browsers that allow us to delete cookies. We can use an alias on Facebook. We can turn our cell phones off and spend cash. But increasingly, none of it matters.
There are simply too many ways to be tracked. The Internet, e-mail, cell phones, web browsers, social networking sites, search engines: these have become necessities, and it’s fanciful to expect people to simply refuse to use them just because they don’t like the spying, especially since the full extent of such spying is deliberately hidden from us and there are few alternatives being marketed by companies that don’t spy.
So, we’re done. Welcome to a world where Google knows exactly what sort of porn you all like, and more about your interests than your spouse does. Welcome to a world where your cell phone company knows exactly where you are all the time. Welcome to the end of private conversations, because increasingly your conversations are conducted by e-mail, text, or social networking sites.
And welcome to a world where all of this, and everything else that you do or is done on a computer, is saved, correlated, studied, passed around from company to company without your knowledge or consent; and where the government accesses it at will without a warrant.
Welcome to an Internet without privacy, and we’ve ended up here with hardly a fight.
Oh well, there’s always Pong. Pong’s innocuous.