“I would be surprised if there’s even a single person that can answer that narrow question conclusively,” the engineer said, in an exchange from court testimony that was first reported by the Intercept. Facebook provided the court with a list of 55 systems and databases where user data might be stored.
Tech giants like Google, Facebook and Twitter were founded more than 15 years ago, and they developed freewheeling cultures in which individual engineers and teams could build databases, algorithms and other software independently of one another. Speed was prioritized over security measures that could slow things down. This was before years of privacy lawsuits and legislation pushed the companies to tighten up their data practices.
But experts said that companies are still struggling to pay off years of technical debt as regulators and consumers demand more from tech companies, such as the ability to delete data or to know what exactly is being gathered about a person. And some of those practices that prioritized speed have not changed.
“Many engineers at Twitter had a stance that security measures made their lives difficult and slowed people down,” said Edwin Chen, who has held engineering roles at Twitter, Google and Facebook and is now CEO of the content-moderation start-up Surge AI. “And this is definitely a bigger problem than just Twitter.”
Some of these systems are black boxes even to the people who built them, said Katie Harbath, a former policy director at Facebook (which changed its name to Meta last year) and CEO of the consultancy Anchor Change. Even if the correct policies are in place, they can be tough to implement when the underlying databases were not built to answer questions such as where a person’s location or profile might have been stored.
“It’s hard to start from scratch, particularly the bigger you get,” she said. “The way these platforms were originally set up, every team had a huge amount of autonomy.”
In Meta’s court case, a class action in Northern California relating to the Cambridge Analytica privacy scandal that the company settled last month, plaintiffs asked the company to show them the entirety of the information it collects and stores about them. That could include people’s precise locations throughout the day, health conditions they have searched for or groups that they have joined, and inferences such as how likely a person is to be married.
Facebook initially offered up data from the company’s “Download Your Information” tool, but a judge found in 2020 that the information Facebook provided was too limited. Yet Facebook’s response, recorded in a deposition this summer, was essentially that even the company’s own engineers aren’t sure where all the data lives.
Dina El-Kassaby, a spokeswoman for Meta, Facebook’s parent company, said that the deposition did not mean that the company was failing at security or data access issues. “Our systems are sophisticated and it shouldn’t be a surprise that no single company engineer can answer every question about where each piece of user information is stored,” she said. “We’ve built one of the most comprehensive privacy programs to oversee data use across our operations and to carefully manage and protect people’s data. We have made — and continue making — significant investments to meet our privacy commitments and obligations, including extensive data controls.”
In Tuesday’s Senate hearing with Zatko, the whistleblower and former security chief made similar comments about Twitter. He noted that in a recent data breach, Twitter had accidentally leaked the personal information of 50 million employees (Zatko’s lawyer later issued a correction statement saying Zatko meant to say 20,000).
Zatko noted in the hearing that Twitter doesn’t have anything approaching that many employees — the current number is 7,000 — and pointed out that Twitter is keeping too much information on former employees and contractors that it fails to delete.
He repeatedly asserted that the company had up to 4,000 engineers, more than half of all employees at the company, with broad access to internal systems, and few ways to formally track who accessed what. This was a dangerous situation, he said, because an individual employee could take over a Twitter account and impersonate its owner.
If that employee were secretly working for a foreign government, the risks from giving employees wide latitude to access user data would be far greater. Zatko has alleged that Twitter knowingly had employees who worked for both the Indian and Chinese governments but has not provided proof to back up these allegations.
And in a separate report on the company’s ability to tackle misinformation that was included in the trove Zatko provided to Congress, an independent auditor noted that Twitter lacked a formal system to track cases of users who had broken the company’s rules.
Twitter has repeatedly pushed back against Zatko’s arguments. A spokeswoman, Rebecca Hahn, previously told The Washington Post that Twitter had tightened up security extensively since 2020, that its security practices are within industry standards and that it had specific rules about who can access company systems. In response to Tuesday’s hearing, Hahn reiterated that Zatko’s arguments were “riddled with inconsistencies and inaccuracies” but declined to specify any details.
David Thiel, chief technical officer at the Stanford Internet Observatory at Stanford University and a former Facebook security engineer, said that after reading Zatko’s disclosures, he had the impression that Twitter’s security processes were years behind Facebook’s. He noted that Facebook tightened up access significantly in response to various controversies over the years, including the allegation that it had allowed Cambridge Analytica access to user data, to the point that if an engineer accessed a system they didn’t have permission to access, “someone will come after you and you will get fired.”
But he said that it’s still common in Silicon Valley to give engineers broad access so that they can “build interesting products quickly.”
“The emphasis,” he said, “is still on speed and access.”
He said that sometimes companies, including Facebook, truly cannot know everything that’s inside their systems.
For example, machine learning systems and software algorithms are made up of tens of thousands of data points, often calculated instantaneously. While it’s possible to put data points into the system, one cannot then work backward to retrieve the original inputs. He drew a food analogy, noting that it would be impossible to turn soup back into its original ingredients.
But other data, he said, is merely complex, and companies are resistant to the extensive work it could take to track it all down — and would probably do so only if compelled by new laws or court rulings.
It’s not “so complicated that it’s not doable,” he said.