AI didn't invent privacy abuse - how the history of data privacy informs our future

By Neil Raden, May 24, 2021
Summary:
There is no AI Ethics without data privacy. But how do we account for new privacy legislation like GDPR and CPRA - and also the resistance to it? Here's how the regulation of privacy evolved.

Privacy threat at work - laptop with eyes in screen © Juergen Faelchle - shutterstock

In the expanding universe of AI Ethics discourse, the three most central concepts discussed are bias, discrimination, and privacy.

Bias is the pre-condition that leads to discrimination, so we can say there are just two. The first piece was about bias and algorithmic discrimination (Statistical bias in context - AI didn't invent quantitative methods of bias). This article is about privacy.

AI does not own these problems. Professional mathematicians, statisticians, social scientists and actuaries have employed detailed models to justify discrimination and exclusion, principally of the poor and racial minorities, for almost 150 years. Attempts to find solutions by crafting ethically-driven organizations are not sufficient.

It is not AI itself that is flawed. It is us. In this essay, we will trace the origins of quantitative methods and privacy. 

There is a tendency to think of privacy as something that doesn't need to be explained. As Sarah Igo wrote in The Known Citizen: A History of Privacy in Modern America: "If we want to understand how Americans in varied contexts and times understood privacy, we need to abandon the notion of it having a static definition." This is a little unnerving. Privacy is a valuable concept derived from the liberal tradition of philosophers from John Locke and John Stuart Mill to John Rawls. To consider it fungible and context-specific is disturbing.

We can trace the jurisprudential definition of privacy in the US to an 1890 article, "The Right to Privacy," published in the Harvard Law Review by two young lawyers, Samuel Warren and Louis Brandeis (Brandeis was an associate justice on the Supreme Court of the United States from 1916 to 1939). It defined privacy as, essentially, the right to be left alone. What's significant about this is that before the Civil War, privacy was equated with land owned exclusively by men of means. Women were not endowed with this right. On the contrary, their "privacy" took the form of seclusion and isolation, among other unsavory practices.

Privacy in the 20th century - technology intervened

Technology intervened at the beginning of the twentieth century. Cameras came first, followed by fingerprinting, as the police and an emerging national security apparatus collected information at an alarming rate, directed especially against minorities, the poor and criminals. During the First World War, the fear of German spies and collaborators accelerated US surveillance of public and private institutions - aided and abetted by insurance companies, building inspectors, and credit bureaus. By the 1920s, the elite and workers alike were invoking the right to privacy in court cases, "anti-fingerprinting" picket lines and ACLU campaigns, pushing back against the mounting scrutiny from above.

Americans' attention was diverted during the FDR administration by the New Deal, Social Security and the war effort. But my father, who was born in 1913, told me that when Social Security was instituted and identified you by a number, there was some suspicion. FDR promised it would be a "Sacred Secret" between you and the Social Security Administration; obviously, that promise was not kept. However, under the circumstances of 1929-1945, people recognized their employers as a more significant threat to their privacy than their government. As Igo writes, "they wanted to conceal their union affiliation, religion, or, for women, their age and marital status."

From the 1920s to the 1940s, two people stand out as the creators of market research: Daniel Starch and George Gallup. By the 1940s, a growing corps of professionals pioneered new advertising techniques, marketing, surveys, and motivation research. Consumers were surveyed, students were measured. Psychological testing for personality traits was meant to pinpoint the "ideal, productive man and screen out the abnormal, the sexually deviant, and the neurotic."

The 1960s and privacy - the Supreme Court weighs in

Privacy finally received legal status in a series of Supreme Court decisions in the 1960s, "as a right to be free from government intrusion." Griswold v. Connecticut was widely interpreted as joining privacy and reproductive rights, but Igo points out that it established a right to "marital privacy," designating the marital bed a "timeless pre-constitutional privileged state."

Absent completely was a right to privacy of the individual. In 1964, Lyndon Johnson began the "War on Poverty," with the noble goals of "not only to relieve the symptom of poverty, but to cure it and, above all, to prevent it." The War on Poverty fell far short of those goals. Still, in the process, it "instituted a bureaucratic jumble of treatment, counseling, training, and rehabilitation programs," adding unprecedented scrutiny to those enrolled in welfare and other forms of public assistance.

The procession of Supreme Court rulings refined and broadened the definition of privacy. Some significant cases are:

Olmstead v. United States 1928. The Court ruled in favor of wiretapping absent any motivation or reason because the Constitution does not expressly prohibit it (how could it, drafted in 1789?). Justice Brandeis wrote the dissent so powerfully that it, not the majority decision, set the stage for subsequent rulings on privacy.

Skinner v. Oklahoma 1942. In one of many forced sterilization laws by states, Oklahoma called for the sterilization of "habitual criminals," a blatantly racist attempt. The Court struck it down, deciding that people have a fundamental right to choose marriage and procreation, even though no such right is explicitly written in the Constitution.

Griswold v. Connecticut 1965. As recently as 1965, Connecticut still made it a crime to distribute contraceptives and contraceptive information to married couples. The Court clearly saw this as a gross invasion of privacy, especially as it concerned decisions about families and procreation, where the government has no business interfering.

Roe v. Wade 1973. It may come as a surprise that Roe v. Wade was less about abortion as a central issue than it was a furtherance of the right to privacy. The landmark decision stunned conservatives by deciding that women have a fundamental right to have an abortion. It followed earlier decisions where the Supreme Court gradually evolved the doctrine that privacy is protected by the Constitution, particularly when it comes to matters involving children and procreation.

The emergence of technology - challenging privacy

Getting closer to our situation today, in the 1970s, a raft of legislation put some guardrails around personal information gathering: the Fair Credit Reporting Act, the Family Educational Rights and Privacy Act, and the Privacy Act itself. As encouraging as that sounds, the Privacy Act only applied to the public sector. Virtually no restrictions, and no new regulations, were applied to businesses.

The FTC's rules under the FCRA allowed consumer reporting agencies to use collected data for decisions about insurance, housing, employment, and credit. They did not, however, prohibit the sale of consumer data for marketing and other purposes.

Continued progress in some areas

The Supreme Court ruled in 2012 in United States v. Jones that attaching a GPS device to a vehicle without a warrant was a violation of the Fourth Amendment. Notice that this case involves the right of privacy from federal agents. This is a recurring theme - with a few exceptions, privacy protections carry little legal weight in the private sector.

In Carpenter v. United States (2018), the Supreme Court ruled that Fourth Amendment protections from "unreasonable searches and seizures" apply to cell phone location data. Specifically, police need a warrant to electronically retrace a cell phone owner's steps in a criminal investigation.

In the ruling, Chief Justice John Roberts, writing for the majority, stated: "We decline to grant the state unrestricted access to a wireless carrier's database of physical location information." He cited the "deeply revealing nature of cell-site location information, its depth, breadth, and comprehensive reach, and the inescapable and automatic nature of its collection." He continued: "Here, the progress of science has afforded law enforcement a powerful new tool to carry out its important responsibilities. At the same time, this tool risks Government encroachment of the sort the Framers, 'after consulting the lessons of history,' drafted the Fourth Amendment to prevent." The ruling certainly implied a measure of personal privacy but, once again, as a limitation on government and law enforcement. It was silent on the private sector.

At current valuations, the personal data holdings of Google, Facebook and Amazon total a staggering $140 trillion, and those of data brokers $20 trillion. If you believe that data has gravity, it's unlikely that this will change soon. GDPR was only able to take root in the EU because, at its inception, the Internet megaliths held two or three orders of magnitude less data in Europe. Google has already been fined $50 million, and that is just the start. But as you can see from the judicial record above, there has never been a concept of businesses owing consumers privacy.

In the near term, given the Court's composition, it is unlikely we'll see rulings that restrict the activities of non-opt-in data brokers or curb the aching desire of AI modelers to gain an advantage by incorporating that data. The simple ethical prescription, whether or not there is regulation, is: don't do it.

What about HIPAA?

Your medical data is supposed to be private and secured. What could be a greater invasion of privacy than a complete record of your health and medical issues? However, if you need medical help at a clinic or a hospital, the questionnaire will ask you to "waive your HIPAA rights." If you decline, they will show you the door. The assumption is that the more data they have, the better they can serve you and aggregate that data across facilities for research. That sounds compelling, but it is mostly untrue. Healthcare data is stored in silos, meaning each hospital or hospital organization creates its own standards for capturing and modeling data, making it incompatible with other facilities. It's balkanized.

Data brokers and many other commercial data companies routinely buy, enhance and sell this information while still marginally in compliance with HIPAA, because the data is stripped of individual identity. Most of these data brokers are unregulated. However, the records still contain identifying details such as age, gender, partial zip code, and a doctor's name. It is simple for any data scientist to match this data with other publicly available data. Typically, voter registration, real estate transactions or, as Netflix discovered, the IMDb database can easily be applied to de-anonymize the HIPAA records, defeating the privacy scheme without breaking the law.
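To make the matching step concrete, here is a minimal sketch in Python of how such a linkage might work. Everything in it is invented for illustration: the records, the names, the field names and the `link` function are assumptions, not a description of any real broker's data or method. It simply joins "de-identified" health records to a public list on age, gender and the first three digits of the zip code, and treats a unique match as a re-identification.

```python
from collections import defaultdict

# Hypothetical "de-identified" health records: names removed,
# but age, gender, partial zip code and doctor are still present.
health_records = [
    {"age": 47, "gender": "F", "zip3": "941", "doctor": "Dr. Lee", "diagnosis": "diabetes"},
    {"age": 33, "gender": "M", "zip3": "100", "doctor": "Dr. Patel", "diagnosis": "asthma"},
    {"age": 47, "gender": "M", "zip3": "941", "doctor": "Dr. Lee", "diagnosis": "hypertension"},
]

# Hypothetical public list (for example, a voter roll) that does carry names.
public_records = [
    {"name": "Alice Example", "age": 47, "gender": "F", "zip": "94110"},
    {"name": "Bob Example", "age": 33, "gender": "M", "zip": "10027"},
    {"name": "Carl Example", "age": 33, "gender": "M", "zip": "10031"},
    {"name": "Dan Example", "age": 47, "gender": "M", "zip": "94117"},
]


def link(health, public):
    """Return (name, diagnosis) pairs where exactly one public record
    shares the same age, gender and three-digit zip prefix."""
    index = defaultdict(list)
    for person in public:
        key = (person["age"], person["gender"], person["zip"][:3])
        index[key].append(person["name"])

    matches = []
    for record in health:
        key = (record["age"], record["gender"], record["zip3"])
        candidates = index.get(key, [])
        # A single candidate means the "anonymous" record now has a name.
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, public_records)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy data, two of the three records map to exactly one named person. The third stays ambiguous only because two people happen to share the same age, gender and zip prefix, which is the thin protection the scheme relies on.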

GDPR and beyond

Where do non-opt-in data and emerging regulations leave us? The European Union enacted privacy regulations (GDPR) with stiff fines for violations. Google was fined $50 million, the largest fine to date, but it is too early to tell whether this will have a substantial impact. In addition, GDPR does not apply to the US, except when those operations land in the EU. California has taken the lead in the US with two laws, the California Data Broker Law and the California Consumer Privacy Act (CCPA). The Data Broker Law is aimed squarely at businesses that sell the personal information of consumers with whom they don't have a direct relationship. The CCPA is too complicated to review here, but a good summary can be found here.

GDPR is mainly concerned with the protection of data and how long an organization can retain it. However, under GDPR, it is explicitly forbidden to use non-opt-in personal data. Therefore, consumers should give explicit consent on using their data in marketing efforts.
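In practice, an opt-in rule like this can be enforced at the point where data enters a marketing or modeling pipeline. The following Python sketch is one way to picture it, under stated assumptions: the `CustomerRecord` type, the `source` and `marketing_opt_in` fields and the `usable_for_marketing` function are all hypothetical names, and passing this filter is not by itself a claim of GDPR compliance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    source: str             # hypothetical values: "first_party" or "data_broker"
    marketing_opt_in: bool  # hypothetical flag recorded when consent was given


def usable_for_marketing(records):
    """Keep only first-party records whose owner explicitly opted in."""
    return [
        r for r in records
        if r.source == "first_party" and r.marketing_opt_in
    ]


records = [
    CustomerRecord("c1", "first_party", True),
    CustomerRecord("c2", "first_party", False),  # never consented
    CustomerRecord("c3", "data_broker", True),   # no direct relationship
]

allowed = usable_for_marketing(records)
print([r.customer_id for r in allowed])
```

The design choice is to default to exclusion: a record is dropped unless both conditions hold, so missing or purchased consent never counts as opt-in.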

How valuable is your personal data, and how far has your privacy diminished due to technology and unwillingness on the part of the US federal government to rein in the major players? In Who Owns Americans' Personal Information and What Is It Worth?, the authors present an alarming table of how just three companies, Google, Facebook, and Amazon, control a staggering $103 trillion in value, 53% of the total estimated market of $197 trillion.

Bob Dylan wisely opined in the sixties, "Money doesn't talk, it swears." It is highly unlikely that the federal government would enact legislation eliminating or even curtailing $200 trillion in private sector assets. What percentage of this data is non-opt-in and covered by GDPR and CCPA I don't know, but it is likely to be significant, and pruning that data will carry a high cost. Google, Facebook and Amazon have opt-in and opt-out features, but they aren't comprehensive, and all three purchase data from data brokers.

My take

  1. No matter your jurisdiction, cease and desist from any practice that is now frowned upon by the FTC, New York, California and the EU. These requirements will expand, and there is no economic or ethical justification for continuing.
  2. Adherence to #1 will set you on your way to Ethical AI with respect to privacy, and will save you hand-wringing and soul searching about ethics in AI. It isn't your problem. Just follow the lead of regulations whether they currently apply to you or not.
  3. It's a good idea to divorce your models from third-party non-opt-in data brokers. Their data is dirty, and its use creates intrusive models on an individual level, which is not acceptable. It's better to know your client by engaging with him/her.
  4. In the US, there is no universal guidance for the privacy of individuals in the private sector. 

The US federal government is less likely to protect consumers than the EU. While certain states, such as California, New York and Washington, have made limited attempts, the lobbying power of the massive data collectors and commercial enterprises that fund much of the election campaigns dampens the will of the government to enact broader privacy legislation.