BY AYNA AGARWAL
About three years ago, a little-known researcher named Aleksandr Kogan began a social science experiment at Cambridge University. Nothing unusual here. But just a few years later, he became embroiled in a Silicon Valley scandal of epic proportions.
Kogan scraped over 80 million raw user profiles from Facebook, including users’ friends, activity, and private information, to run an experiment that determined personality profiles from digital footprints. He overstepped his bounds by collecting more information than initially agreed upon. He then turned around and sold that data to Cambridge Analytica, in violation of the Terms of Service agreement between him and Facebook.
“Everyone sees Facebook or the government as being at fault,” says Jim Waldo, Professor of Computer Science and Public Policy at Harvard University. “But the actor really at fault here is the researcher.” Facebook routinely allows researchers to access its data for academic purposes, and users consent to share their information when they set up a Facebook account.
We can blame Facebook for creating a technical architecture that allowed additional data to be scraped, for not aggressively requiring the researcher to delete data collected beyond what its Terms of Service permitted, or for not developing stronger auditing mechanisms. While it is fashionable to blame Silicon Valley companies, the real question is: How do we prevent data breaches from happening again?
One might assert that in the wake of this scandal, Facebook should remove data access from academic researchers. But limiting access to human data for research would be a terrible misstep for human understanding and progress.
A huge amount of data resides online with aggregators like Facebook, Google, Snapchat, and others. This is a valuable trove of information for social scientists, one that can help us make sense of the world’s toughest issues. Cass Sunstein, a professor at Harvard Law School, describes how social science can yield crucial insights for policy and public service on a range of issues. Simpler financial aid forms may increase the likelihood that a student attends college. Design nudges can encourage people to eat healthier, improving overall public health. Automatically enrolling citizens in savings accounts encourages better long-term financial planning than offering tax benefits. These are just a few of the myriad ways in which social science research can improve our society.
On the other hand, we risk the privacy of potentially millions of individuals by letting researchers access user data (perhaps enriching themselves in the process) with few to no audits. Imagine if you consented to provide your genome data to a doctor for research, and the doctor sold it to fitness companies to better market their services to you.
These debates often center on the benefits and risks of research based on user data. What we tend to forget are the risks of not doing this research.
The Cambridge Analytica controversy has drawn attention to data privacy regulations in the United States, but companies and regulators should not punish all researchers because of this one case. Instead, regulators should establish better safeguards and due diligence mechanisms.
Facebook has already made a number of changes to how it manages its users’ data and communicates its privacy policies. Last week, Facebook launched a new research initiative with the Social Science Research Council with improved methods for reviewing, engaging, and monitoring researchers who gain access to Facebook data.
Data stewards within corporations and academic institutions alone, however, are not enough. Government regulators must require more of these organizations and hold them accountable. It is government’s role to protect its citizens—including their privacy.
Unfortunately, privacy laws in the US have historically been designed to protect individuals from the government itself, rather than from corporations or academia. Existing privacy requirements for corporations are weak, and there is not a single federal law that requires or governs corporate privacy.
The regulatory body that most closely oversees data privacy is the Federal Trade Commission (FTC), which has a legal obligation to protect American consumers from “anticompetitive, deceptive, and unfair business practices, enhancing informed consumer choice.” The FTC’s Fair Information Practice principles are the closest the government has come to recommending best practices for corporate privacy. But this is not nearly enough.
While the US does not need to follow the European Union’s forward-thinking privacy regulations in lockstep, the government must impose stricter requirements on companies. Otherwise, the risks of both data leaks and unfair limitations on researchers will remain high.
The hidden nature of data sharing has, in part, fueled the public’s recent outcry. Regulators could help social media companies become more transparent about data sharing with academic researchers. They could also require more stringent auditing and reporting from organizations that share data with external parties, and demand evidence of compliance beyond verbal statements. And they could help companies better screen for trusted researchers.
The US must find a balance between privacy safeguards and data governance. Data is a tool. We can reap the many benefits of informing research and decisions with better data while still protecting ourselves from its harmful impacts. For, as we have seen, data can be used for both good and bad.
Ayna Agarwal is an MPA Candidate at Harvard’s Kennedy School. She graduated with a B.S. from Stanford University and worked at Palantir Technologies. She is also the Co-Founder of she++.