Funding an Artificial Intelligence (AI) project in an organization requires understanding the company's process. Implementing an AI project in production requires understanding how it will interact with other systems, especially those AI applications that present emergent properties as they “learn.”
AI is being introduced across nearly every kind of business. Companies are exposed to new risks, such as bias in the AI application, fear of job loss due to automation, privacy violations, and discrimination. Many opinions about these risks center on ethical issues, but many problems in AI development are not ethical issues in themselves, though they can cause them:
- Data: Machine Learning (ML) isn't developed in Excel. The volume of data needed for an ML model is vastly more than a human can examine for errors or faults. Data quality tools are helpful to a point, but only for one data source at a time. Merging tables creates hidden problems that even current data management tools sometimes fail to spot.
- Shortcut learning: ML, and even deep learning, can make unpredictable errors when facing situations that differ from the training data. Such systems are susceptible to "shortcut learning": statistical associations in the training data allow the model to produce correct answers for the wrong reasons. Machine learning, neural nets, and deep learning do not learn the underlying concepts; instead, they learn shortcuts that connect inputs to the answers in the training set.
- Adversarial perturbations: Adversarial attacks involve generating slightly perturbed versions of the input data that fool the classifier (i.e., change its output) but stay almost imperceptible to the human eye.
- Immutability: Great care must be taken to ensure the model cannot be tampered with.
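One common way to approach the immutability point above is to record a cryptographic digest of the model artifact when it is approved and to verify that digest before every load. The following is a minimal sketch under stated assumptions, not a complete tamper-proofing scheme: the file name `model.bin`, the helper names `file_sha256`, `is_untampered`, and `demo`, and the stand-in "weights" bytes are all hypothetical, and the recorded digest is assumed to be stored somewhere an attacker cannot also rewrite.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_untampered(path: Path, expected_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at approval."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(file_sha256(path), expected_digest)


def demo() -> tuple[bool, bool]:
    """Check a hypothetical model file before and after a one-byte change."""
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.bin"  # hypothetical artifact name
        model_path.write_bytes(b"pretend these bytes are model weights")
        recorded = file_sha256(model_path)  # digest stored at approval time

        before = is_untampered(model_path, recorded)

        # Simulate tampering by altering a single byte of the artifact.
        model_path.write_bytes(b"pretend these bytes are model weightz")
        after = is_untampered(model_path, recorded)
    return before, after


if __name__ == "__main__":
    print(demo())
```

A digest only detects changes; it does not prevent them. Access controls and signed releases would be separate measures.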
Other forces, such as senior management and the work environment, can override the ethical process. Some of the pressures that come into play:
- The organization pressures you toward development that may not seem ethical
- You adopt an "it's only the math" or "that's how we do it" excuse
- You engage in fairwashing: concocting misleading excuses for the results
- You don't know that you're doing these things
- The whole process is complicated and opaque in operation
- The organization is not used to introspection before you embark on a solution
- There is an "aching desire" to do something cool that obscures your judgment
Four fundamental tensions:
- Deploying models for customer efficiency versus preserving customers' privacy
- Significantly improving the precision of predictions versus maintaining fairness and non-discrimination
- Pressing the boundaries of personalization versus supporting community and citizenship
- Using automation to make life more convenient versus de-humanizing interactions
Often overlooked are the undesirable effects of AI that do not directly involve people: those that promote, excuse, or cause damage to the environment; those that cause loss of property; and those that, when embedded, cause breakdowns in automated processes. They may not be considered unethical, but they are just as dangerous to a company's brand or its ability to fulfill its commitments in a supply chain.
The good news is that some influential actors in the AI Ethics field have understood this and have recently done one of two things (or both):
- Recruited staff with corporate development skills, preferably in AI, to act as traditional consultants who attack a problem professionally, from selecting the problem through data governance to the craft of building and testing a model for, among other things, credible results and fairness.
A mature ethical AI practice operationalizes its principles or values through responsible product development and deployment — uniting disciplines such as product management, data science, engineering, privacy, legal, user research, design, and accessibility — to mitigate the potential harms and maximize the social benefits of AI.
Some comments on LinkedIn about the latter article:
In-app controls like marking specific fields as zero/first party and thus eligible for LLM training = great idea.
Training GPT-3 took 1.287 gigawatt-hours, or about as much electricity as it takes to power 120 U.S. homes for a year, and 700,000 liters of clean freshwater.
One thing I wanted to pick out is your point about companies using "zero-party or first-party data," which I couldn't agree with more. This is one of the reasons that I've spent the last few years focused on building solutions based on self-sovereign identity principles for data exchange.
Using "zero-party or first-party data" is not only a great suggestion; ignoring it is perhaps the most frequent mistake AI developers and data scientists make when they go searching for external data to train their models.
Data brokers are individuals or companies who specialize in the collection of data. They scoop up whatever data they can get their hands on, including any publicly available information. People might assume that their medical records are safe. Still, data brokers can sidestep medical privacy laws, since those laws are designed to cover only certain identifiable information. Medical companies sell massive amounts of data that are supposed to be anonymized, but with enough data, time, and processing power, de-anonymizing that information becomes feasible.
In the book Big Data and Privacy Rights, by M.M. Eboch, a story is shared of a man named Chris Whong who requested information about taxis in New York via the Freedom of Information Act. The data was anonymized, but it took hackers two hours to de-anonymize taxi information containing only pickup and drop-off points and taxi numbers. The hackers did it to prove a point, but medical information is significantly more valuable when attributed to individuals.
In industries that handle private data, companies often sell users' data because it can be worth far more than what they earn from the users themselves, and they will likely keep selling it as long as doing so is legal. The process of re-identifying data can be costly and resource intensive. Experts therefore still say that de-identification can be beneficial if the entities holding the information are committed to protecting their data and make no attempt to re-identify it.
That is a reasonable point insofar as the person attempting to re-identify or de-anonymize the information still needs a body of attributed data to match against. But the process becomes easier as the data collector accumulates more data. A second issue with the argument is that it assumes a profit motive for the person attempting to re-identify the data. That may cover a majority of data collectors, such as Cambridge Analytica, but it omits entities like foreign governments and agents acting on their behalf.
A company called Epsilon refused a congressional committee's request for information, saying that the data was the core of its business and that releasing it would harm that business. This information contained many diagnoses of lifelong medical conditions, the bedrock of a company worth $4.4B. Epsilon is not the only company selling medical information; Optum, a subsidiary of UnitedHealth Group, has medical information for approximately 150 million Americans. For perspective, that is just under half of the U.S. population. The company collected and paired information regarding medical treatments and costs with socioeconomic data, including information about a person's insurance coverage, living situation, employment, and education level. This information is highly susceptible to abuse. Data like this can also help identify abuse and discrimination in a company, but as Epsilon demonstrated, getting that data can be tricky.
It is rather convenient that most people carry a GPS tracking device in their pockets. Cell phone carriers allowed third parties to locate cell phones via their networks. Colin Lecher's article "Sprint, T-Mobile, and AT&T pledge again to close data access after location-tracking scandal" recounts how a reporter paid a bounty hunter as little as $300 to locate a cell phone. All the reporter had to provide was a cellphone number; after a wait of a few minutes, the phone's real-time location came back.
The cell phone carriers claimed it was data misuse, but it is believed the mobile carriers were selling the information to third parties. It is worth mentioning that the carriers have since terminated most of this third-party access, apart from what emergency services need, though they have yet to give an exact number.
It is encouraging that the discussion has turned to risk in AI, especially the risks of GenAI developed in-house.