Among the concerns expressed about AI this year are that generative AI and large language models (LLMs) may be automating bias, racism, and false information, or scraping copyrighted content, malware, and misinformation alongside verified, permissioned, and public-domain sources.
But two fears are less obvious. One is that we may be divulging too much personally identifiable information or data (PII or PID) to the likes of OpenAI, Google, Stability AI, and the rest, when we use their tools. It’s an important issue: a Bloomberg Intelligence survey last week found that 60% of 16-34-year-olds already trust ChatGPT queries more than Google searches.
The other is that by prompting the same handful of popular AIs with me-too concepts, we are already beginning to lose sight of ourselves: of our humanity, skills, experience, and capacity for original thought – and original work.
But guess what? There are AIs for those things too. One is from a start-up called Private AI, which aims to stop PID haemorrhaging out of organizations, by redacting, de-identifying, or replacing personal data with synthetic alternatives. A persuasive idea (more on that later).
And the other is the flipside of that coin.
A new venture wants to link artificial intelligence ever more closely with you. In the long run, it aims to capture the essence of your identity and voice in code: a virtual ‘you’ to send out into the world as your agent or avatar. One day, AIs like it may be all that is left of the real person you once were: a sobering thought.
Suman Kanuganti is co-founder and CEO of New York-headquartered Personal AI, which claims to help users “retain, reinforce, and recall human memory”. The start-up also specializes in capturing and transcribing audio recordings to “deliver the right information from the client's virtual memory bank”.
But what does all that mean? Speaking from his home in San Diego, California, Kanuganti explains the inspiration behind an intriguing idea:
My mentor – my investor and co-founder at my previous company [Aira], the late Larry Bock – created 47 different companies, and was also blind. I learned so much from that inspiring man, but within just 18 months I lost him to cancer.
So, my mantra became, ‘What would Larry do?’ I was constantly stepping into his shoes. But over a period of time, those memories began to fade, to the point where the genesis was, ‘I wish I had a Larry AI, so I could still have conversations with him’. Not just from an emotional perspective, but also an intellectual one.
This moving, personal revelation led to the kernel of an idea that recalls science fiction movies such as Marjorie Prime, or series such as Black Mirror: can technology capture someone’s essence? Their language, voice, knowledge, thought processes, experiences, and unique perspective on the world? A concept that seems both inevitable and unsettling: a merging of machine and human, thanks to reams and reams of data.
What if we each had a personal AI that was trained on the entirety of our individual knowledge, on our style, wisdom, facts, and opinions? One that could not only augment us as individual humans, but also elevate our communications with the people around us.
For people like me, I could still have a conversation with ‘Larry’. At the same time, my colleagues or team members could feel like they are communicating with me. But in reality, it would be my personal AI, augmenting whatever I want to say. Because if I said it once, I don't need to say it again myself, right?
That extraordinary last statement is the essence of Personal AI. But what actually is it? Kanuganti explains:
It’s a small model, rather than a large [language] model, a 20-million-parameter model in contrast to a 70-billion-parameter one. A model that understands you – your facts, opinions, and thought process. One that constructs outputs or responses as if they are coming, uniquely, from you. Or from me. We would each have our own.
In its early stages, therefore, Personal AI is essentially a time-saving personal chatbot that learns its subject’s or owner’s ‘voice’, knowledge, and communication style – a boon for anyone fielding more queries a day than they can deal with themselves, perhaps.
A smart idea and a utility, then; yet also, potentially, a legal, security, and privacy minefield.
Consider this. Many of us use the same devices for work and for private, family, and relationship conversations. What if your unique AI confuses your intimate, personal, and business communications? What if it offends people using your real voice, or divulges confidantes’ secrets? And who would be liable for those problems? (Could you sue your virtual self?)
What’s more, what if your personal AI were to be hacked, then used to spam people or commit crimes – as you? And with this technology, what’s to stop anyone scraping your content from public-domain sources and creating a false ‘you’ that could communicate using your voice and your accumulated knowledge? A new form of deepfake that is, in a sense, also authentic.
Suddenly it doesn’t sound like such an enticing prospect. To his credit, Kanuganti can see the risks of his early-stage venture.
Yes, we should all worry about things like that. It's natural for every consumer to be wary of how to leverage a technology. There will be misuse, just like with any other technology. But I think from my position as the person developing and pushing it, the best thing is to create all the guardrails and controls, and enable people to make their own choices.
You do have control. Whether you want your AI to automatically send messages, or if you prefer to review and control them yourself first, then you have that choice.
Personal AI also offers an accuracy/confidence score in its responses, giving recipients some insight into a message’s trustworthiness.
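To make that choice concrete, here is a minimal sketch of how a confidence score might gate automatic sending. It is an illustration only, not Personal AI's actual code: the `Draft` class, the `route` function, the 0-to-1 score scale, and the 0.8 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def route(draft, auto_send, threshold=0.8):
    """Return 'send' only if the owner allows auto-send and the score clears the bar."""
    if auto_send and draft.confidence >= threshold:
        return "send"
    # Everything else waits for the human owner to review it first.
    return "review"


print(route(Draft("The demo is on Tuesday.", 0.93), auto_send=True))
print(route(Draft("I think the budget was approved?", 0.41), auto_send=True))
print(route(Draft("The demo is on Tuesday.", 0.93), auto_send=False))
```

In this sketch the human stays in the loop by default: a message goes out unreviewed only when the owner has opted in and the score is high.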
But these questions aside, does Kanuganti have grander ambitions in the long run? Does he foresee personal chatbots, over time, evolving to become full recreations of the humans they model and learn from? A form of eternal life in the cloud, perhaps? He says:
Yes. But the goal is to make it look simple. Often, we get too enamoured with technologies, we get scary with them, right? So, how can we keep this comfort, this idea of our own, personal AI, but at the same time, offer a level of control, trust, simplicity, and utility that we can integrate into our day?
But talking of that future vision, I could get there, and I could get other people there too. But not everybody is comfortable with getting there. Yet at the same time, I think everybody deserves to have their own data working in their favour, within their own AI. Rather than just spitting out these large-language models, which are an aggregation of humanity into a single intelligence.
I see a better, alternative path. Not one artificial general intelligence, but billions of artificial personal intelligences. That’s how I believe the history of humanity will be told 100 years from now, from two very different perspectives. One by individuals, and the other by a collective.
A familiar sci-fi concept. Perhaps each of us should copyright ourselves in the meantime?
Integrating privacy into the data process
But another company wants to prevent people from just giving themselves away, as it were; to keep personal data from being violated, stolen, or lifted by accident when we engage with generative AIs. Patricia Thaine is co-founder and CEO of Toronto, Canada-based Private AI. Was there a ‘eureka’ moment for creating the company? She says:
There were several, for integrating privacy into the data process, and for what exactly to tackle.
When I was doing research on acoustic forensics – who's speaking, and recording what kind of educational background they have – such information can be really useful to enhance automatic speech recognition systems. But there are a couple of problems. One, if you do get access to that data, there's a major privacy risk. And two, because of that, you often can't get access to the data. So, both sides of that coin prevent innovation from happening.
Put another way, what allows us to unlock 80 to 90% of the data that companies have is AI. But what prevents us tends to be privacy.
So, I started looking at privacy-enhancing technologies and mixing that with natural-language, and spoken-language, processing. Because the natural language we produce contains some of the most sensitive data. I’m not just talking about credit card information; we may also be sharing insights about our health without realizing it, and so on.
For example, there's research taking place on whether you can tell if somebody has a degenerative neurological disease, based on how they're speaking. So, there's a lot of information that can be tied back to an individual – including information that they don't know they're producing.
In our recent interview with IBM’s Calvin Lawrence, author of an important book on AI and racism, he noted that accent and choice of words are sometimes used to racially profile consumers without their knowledge too, pushing some black Americans to ‘try to sound white’ on the phone to prevent being excluded from services.
Unsurprisingly, Thaine is a supporter of data privacy regulations like Europe’s GDPR, the core ideas within which are being more widely adopted. But there are issues with compliance, she explains.
The GDPR, for example, requires data minimization as a core to business practices. And that means removing personal information you don't need. But it’s hard to remove it if you don't know what you have in the first place.
So, a better solution is to strip it out at source, she believes, before it enters databases, cloud-based systems, and LLMs. Private AI automatically redacts, de-identifies and/or anonymizes PII or PID (in 49 different languages, at present). Where some form of PID is needed in training data, Private AI can replace it with synthetic data to prevent compromising customer privacy. And it can redact PDFs, images, and audio.
Putting yourself at risk
All welcome, logical ideas. But from whom or what do citizens and customers need to be protected in this AI summer? Is the suggestion that the likes of OpenAI, Microsoft, Google, and others are actively using our data to train LLMs, or have accumulated vast amounts of it by scraping the net?
When you’re engaging with any AI company [in a prompt or chat], you're sending your data to a third party, which is storing it for a certain amount of time.
They might, reasonably, look at it to check for misuse and abuse – for example, someone asking how to make a bomb, they would want to prevent that kind of query. So, AI companies have measures in place to block answers to certain types of prompt. But of course, they haven't thought of everything.
So, the goal in that type of data collection is being able to view anything that might be suspicious, and also to further train their models. Some have made that an opt-in process. However, you can opt for them to not collect and store your data, but then you don't have conversation history.
But regardless of the reason for AIs looking at personal data and storing it, you're still putting yourself or your users at risk by sharing it with a third party. There have been data leaks and hacks, and examples of chatbots spewing out personal or offensive information. [In some cases, this suggests developers have illegally scraped private conversations.]
So, using our technology, any such data gets removed or replaced with synthetic data before it is sent. But the conversations are stored in your browser, so we can re-integrate personal data back into the response.
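The round trip Thaine describes (strip personal data out before the prompt leaves, keep the originals locally, then put them back into the response) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Private AI's implementation: the regular expressions, the placeholder format, and the function names `redact` and `reintegrate` are all hypothetical, and simple pattern matching stands in for the context-aware AI model the company says is needed.

```python
import re

# Illustrative patterns only: a production system would use a trained model,
# since regexes cannot follow context or spoken corrections.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text):
    """Swap each detected value for a numbered placeholder; return text and mapping."""
    mapping = {}  # placeholder -> original value, kept on the user's side only
    for label, pattern in PATTERNS.items():
        def swap(match, label=label):
            value = match.group(0)
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder  # same value, same placeholder
            count = sum(key.startswith(f"[{label}_") for key in mapping) + 1
            placeholder = f"[{label}_{count}]"
            mapping[placeholder] = value
            return placeholder
        text = pattern.sub(swap, text)
    return text, mapping


def reintegrate(text, mapping):
    """Put the original values back into a response that uses the placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Email jane@example.com or call +1 416 555 0199 about my order."
safe_prompt, mapping = redact(prompt)
print(safe_prompt)  # this is all the third party would see

# Stand-in for the third party's reply, which only ever contains placeholders.
reply = "Sure, I will write to [EMAIL_1] and ring [PHONE_1]."
print(reintegrate(reply, mapping))
```

The design point is that the mapping never leaves the user's side, so the third party stores only placeholders.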
Of course, Private AI is itself an AI and machine-learning company – one that needs our trust and safeguards to use it, as with any other provider. So, why does its solution also need to be powered by AI? Thaine says:
It’s important, because there are a lot of disfluencies in language, and a lot of unexpected or fragmented information [in conversations]. That might be around something as simple as a credit card number. You might say, ‘Hey, I'm calling to replace the product I bought last week, on card ending 4590. No, sorry, that's not a five, that's a three.’
So, you have to be able to recognize that type of unstructured information. And the only way to do that properly is with AI, because it can capture context.
So, why aren't the big AI companies doing all this themselves? Is the implication that it’s not in their best interests to stop private data from being collected? Thaine says:
They've dipped their toes in the water. But this needs to be the core competency of a company to be done well.
On my recent trip to the US, news channels were running stories about a new type of scam: criminals using the words and AI-sampled voices of people’s friends or relatives to make fake emergency calls to them, asking for money. So, be mindful of what you share online.
This is the world we are already living in: a world of deepfakes and data-scraping on a vast scale, for both good and ill. In such a world, AI could be an authentic ‘you’, a virtual self that stores your thoughts and personality for the future. Or it could protect you from all the other AIs. The choice is yours. Good luck!