Two very detailed and interesting stories on the role of data analysis in homeland security and intelligence were posted online on Wednesday. When read together, they reveal the paradox of the current debate about the role of data analysis: are the government’s analytical activities encroaching on personal freedoms, or are they not nearly effective enough today?
The first story, from the Christian Science Monitor, looks at apparent data analysis activities within DHS’s Science and Technology directorate:
The US government is developing a massive computer system that can collect huge amounts of data and, by linking far-flung information from blogs and e-mail to government records and intelligence reports, search for patterns of terrorist activity.
The system – parts of which are operational, parts of which are still under development – is already credited with helping to foil some plots. It is the federal government’s latest attempt to use broad data-collection and powerful analysis in the fight against terrorism. But by delving deeply into the digital minutiae of American life, the program is also raising concerns that the government is intruding too deeply into citizens’ privacy….
The core of this effort is a little-known system called Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE). Only a few public documents mention it. ADVISE is a research and development program within the Department of Homeland Security (DHS), part of its three-year-old “Threat and Vulnerability, Testing and Assessment” portfolio. The TVTA received nearly $50 million in federal funding this year….
A major part of ADVISE involves data-mining – or “dataveillance,” as some call it. It means sifting through data to look for patterns. If a supermarket finds that customers who buy cider also tend to buy fresh-baked bread, it might group the two together. To prevent fraud, credit-card issuers use data-mining to look for patterns of suspicious activity.
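The supermarket example above is classic association-rule mining. A minimal sketch of the idea, using a tiny set of hypothetical transactions (all data here is invented for illustration): an item pairing is "interesting" when buyers of one item also tend to buy the other, which is measured by support and confidence.

```python
# Toy association-rule mining: the "cider and bread" pattern.
# The transactions below are hypothetical, for illustration only.
transactions = [
    {"cider", "bread", "cheese"},
    {"cider", "bread"},
    {"cider", "milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often buyers of the antecedent also buy the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# 2 of the 3 baskets containing cider also contain bread
print(confidence({"cider"}, {"bread"}, transactions))
```

Credit-card fraud detection applies the same machinery in reverse: instead of flagging frequent patterns, it flags transactions that deviate sharply from a customer's usual patterns.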
What sets ADVISE apart is its scope. It would collect a vast array of corporate and public online information – from financial records to CNN news stories – and cross-reference it against US intelligence and law-enforcement records. The system would then store it as “entities” – linked data about people, places, things, organizations, and events, according to a report summarizing a 2004 DHS conference in Alexandria, Va. The storage requirements alone are huge – enough to retain information about 1 quadrillion entities, the report estimated.
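The report's "entities" description amounts to a graph: nodes for people, places, things, organizations, and events, with typed links between them. A rough sketch of such a store, with entirely hypothetical names and relations (the actual ADVISE design is not public):

```python
from collections import defaultdict

class EntityGraph:
    """Toy linked-entity store: nodes carry attributes, edges carry relation labels."""

    def __init__(self):
        self.attrs = {}                # entity id -> attribute dict
        self.edges = defaultdict(set)  # entity id -> {(relation, other id)}

    def add_entity(self, eid, **attrs):
        self.attrs[eid] = attrs

    def link(self, a, relation, b):
        # Store the link from both ends so cross-referencing works either way.
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))

    def neighbors(self, eid):
        """All entities directly linked to the given one."""
        return {other for _, other in self.edges[eid]}

g = EntityGraph()
g.add_entity("person:1", kind="person")
g.add_entity("event:1", kind="event")
g.add_entity("place:1", kind="place")
g.link("person:1", "attended", "event:1")
g.link("event:1", "located_at", "place:1")
print(g.neighbors("event:1"))  # everything one hop from the event
```

The quadrillion-entity figure in the report suggests why the storage requirements dwarf this sketch: at that scale the graph cannot fit in memory and must be sharded across a distributed store.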
The article raises concerns about ADVISE and its potential impact on privacy. But the story also indicates that this has essentially been a research project to date, and I’m inclined to withhold judgment until I know more about its potential effectiveness. My general opinion is that we need to be pursuing this kind of data analysis research, rather than shutting down research projects before they come to fruition. That’s what happened with DARPA’s Total Information Awareness project – Congress cut off its funding in 2003 after the program came to light.
Or was it shut down? The second big story today, by Newsweek’s Michael Hirsh, notes how TIA never really went away:
Ironically, one of the most hopeful new intelligence surveillance programs is one that is still demonized in the media and on Capitol Hill. This is the Pentagon’s Total Information Awareness (TIA) project, which was canceled after the last big civil-liberties scandal in late 2002. TIA was the creation of Adm. John Poindexter, the Iran-contra figure who was brought in to run the new program but was cashiered after it was uncovered by The New York Times. TIA was an effort to vacuum up as much U.S. transactions information as possible, such as the purchase of plane tickets or, say, large amounts of fertilizer as a way of anticipating terror plots. But the program was dropped after several senators blasted some of Poindexter’s odder suggestions, like creating a “futures market” in which terror experts could bet on likely terror events and thereby add to the government’s knowledge base.
Yet today, very quietly, the core of TIA survives with a new codename of Topsail (minus the futures market), two officials privy to the intelligence tell NEWSWEEK. It is in programs like these that real data mining is going on and – considering the furor over TIA – with fewer intrusions on civil liberties than occur under the NSA surveillance program. “It’s the best thing to come out of American intelligence in decades,” says John Arquilla, an intelligence expert at the Naval Postgraduate School in Monterey, Calif. “It is truly Poindexter’s brainchild. Of all the people in the intelligence business, he has the keenest appreciation of using advanced information technology for intelligence gathering.” Poindexter, who lives just outside Washington in Rockville, Md., could not be reached for comment on whether he is still involved with Topsail.
Elsewhere in the story, Hirsh contrasts TIA with the prevailing “cold-war mentality” that hampers innovation at the NSA and elsewhere in the national security apparatus:
The legal controversy over the NSA surveillance program has obscured an intelligence issue that is at least as important to the nation’s future: sheer competence. Do we have any idea what we’re doing? One reason the NSA is listening in on so many domestic conversations fruitlessly – few of the thousands of tips panned out, according to The Washington Post – is that the agency barely has a clue as to who, or what, it is supposed to be monitoring.
While soaking up the lion’s share of the $40 billion annual intel budget, the NSA continues to preside over an antiquated cold-war apparatus, one designed to listen in on official communications pipelines in nation-states. Today it is overwhelmed by cell-phone and Internet traffic….
Reading these two stories, I’m struck by how widely opinion diverges about what’s wrong with our nation’s data analysis capabilities for intelligence and homeland security. Are they dangerous and conspiratorial? Or are they actually insufficient, far from where they need to be in order to improve our security? I’m inclined to think it’s the latter, but I also think there needs to be far more internal oversight of these activities, consistent with national values and constitutional norms.
For example, there probably needs to be a new, government-wide process that oversees the transition from “research” to “operation” for data analysis technologies. There should be relatively few constraints on data analysis research: we need free and open innovation from many sources if we’re ever going to solve the “connect the dots” problems highlighted in the 9/11 Commission report. But there needs to be a new government-driven process that examines this research when it’s ready to be used in an operational setting.
This process shouldn’t be legalistic, full of absolute, bright-line rules; instead, it should be driven by an expert and apolitical analysis of the potential security benefits and the privacy costs. This validation at the transition point from research to implementation could help to assuage fears that new technologies are being adopted without oversight, and to ensure that only technologies delivering real security value are adopted.
This type of oversight won’t satisfy everyone, but it will help to restore trust in the nation’s intelligence and homeland security activities, and to ensure that researchers still have the incentives to pursue leading-edge data analysis technologies.
Update (2/9): W. David Stephenson has a thoughtful commentary on my post on his site.
Update 2 (2/9): Pages 7-9 of this PDF document from a 2004 workshop (referenced in the CS Monitor story) provide an overview of ADVISE.