GPT-3 demystified - spooky-good AI or overrated text generator?

Neil Raden, January 20, 2022
Summary:
OpenAI's GPT-3 AI-based text generator is one of the most hyped AI developments in recent years. But does it really understand language? Are AI advancements, such as the upcoming GPT-4, really about expanding the number of parameters, or are there better criteria?


GPT-3's ability to write code, letters, even novels, often from a prompt of just one or a few words, is downright spooky. But there are two things you should know:

1. It has no idea of the meaning of what you say (type).

2. It has no idea of the meaning of what it outputs.

It's all math, trained neural networks. 

Before we dig into GPT-3, a quick review of NLP is in order:

By now, everyone is familiar with conversational assistants like Siri, Alexa, or Cortana. They are built on Natural Language Processing (NLP).

There are a few sub-disciplines in NLP, such as:

  • Optical Character Recognition - Converting written or printed text into data.
  • Speech Recognition - Converting spoken words into data or commands to be followed.
  • Machine Translation - Converting your spoken or written language into another person's language and vice versa.
  • Natural Language Generation - The machine producing meaningful speech in your language.
  • Sentiment Analysis - Determining the emotions expressed by language.

For NLP-enhanced business analytics, the conversation may be, "Download the latest pricing analysis to my phone." The critical thing to remember is that the computer does not understand what you are saying. It can process and answer, but make no mistake: it's all done with math.

Organizations that offer NLP capabilities don't have to start from scratch. There are open-source Python libraries that software can integrate with, such as spaCy, textacy, or neuralcoref, and a few in other languages, such as CoreNLP in Java. John Snow Labs develops and maintains an open-source NLP library, Spark NLP.

The steps a natural language processor goes through to satisfy your question:

  1. Sentence segmentation: break the text apart into individual sentences.
  2. Word tokenization: break each sentence into words, or tokens.
  3. Predict the part of speech for each token. Feed the token, with some surrounding tokens for context, into a trained part-of-speech classifier.
  4. Text lemmatization: reduce every word to its most basic form (its lemma), whatever its inflection.
  5. Identify "stop" words (such as a, an, the, …) and filter them out.
  6. Dependency parsing.
  7. Find noun phrases: groups of words that talk about the same things.
  8. NER (Named Entity Recognition): Detect nouns and label them with the real-world concepts they represent. Names of people, companies, geolocation, dates and times, amounts of money, names of events, etc.
  9. Coreference resolution: link words like pronouns (he, she, it) to the entities they refer to.

The above steps are employed to process your written, typed, spoken, or even machine-generated request. The underlying implementation of the technology is machine learning, typically various kinds of neural networks.
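To make the early steps concrete, here is a minimal sketch of sentence segmentation, word tokenization and stop-word filtering in plain Python. It is a toy under stated assumptions, not how spaCy or CoreNLP do it: the regular expressions, the tiny `STOP_WORDS` set and the function names are all made up for illustration, and the later steps (part-of-speech tagging, parsing, NER, coreference) need trained models that are not shown.

```python
import re

# Tiny illustrative stop-word list (an assumption); real libraries ship far longer ones.
STOP_WORDS = {"a", "an", "the", "to", "my", "of", "is"}


def segment_sentences(text):
    """Step 1: split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Step 2: split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Step 5: drop tokens that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Run steps 1, 2 and 5 and return one token list per sentence."""
    return [remove_stop_words(tokenize(s)) for s in segment_sentences(text)]


if __name__ == "__main__":
    request = "Download the latest pricing analysis to my phone. Is it ready?"
    for tokens in preprocess(request):
        print(tokens)
```

Notice that nothing here involves meaning: the request is reduced to lists of strings by pattern matching and set lookups, which is the point made above.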

A little background on NLP models

Google developed the BERT model, which has 340 million parameters and was trained on a large collection of books and Wikipedia. It was designed to handle straightforward Question and Answer queries. The accuracy of the model was pretty good. Facebook and Microsoft developed BERT-based models of their own, RoBERTa and CodeBERT respectively. The industry concluded that larger natural language models improved accuracy. Microsoft's Project Turing then released Turing Natural Language Generation (T-NLG), which, at 17 billion parameters, was the most massive model ever trained as of early 2020.

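A few months later, OpenAI debuted their beta version of GPT-3, based on an unparalleled 175 billion parameters. To put that in perspective, that is roughly ten times the size of T-NLG, and the model was trained on hundreds of billions of words of text, much of it scraped from the public web.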

For any of you, like me, who have always found SQL to be tedious and nuanced (I never get it right on the first try), GPT-3 can accept as input "Display the total number of employees in the HR department," and instantly construct a well-formed SQL statement, "SELECT COUNT(*) FROM Emp WHERE dept_id = 'HR'."
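For readers who want to see that generated statement actually run, here is a small sketch using Python's built-in sqlite3 module. The `Emp` table layout, the sample rows and the helper names are assumptions invented for the demo; only the query itself comes from the example above, with the department passed as a parameter rather than pasted into the string.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical Emp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id TEXT)"
    )
    # Made-up sample rows, purely for illustration.
    conn.executemany(
        "INSERT INTO Emp (name, dept_id) VALUES (?, ?)",
        [("Ana", "HR"), ("Ben", "HR"), ("Cy", "ENG")],
    )
    conn.commit()
    return conn


def count_employees(conn, dept_id):
    """Run the generated query, binding the department as a parameter."""
    cur = conn.execute("SELECT COUNT(*) FROM Emp WHERE dept_id = ?", (dept_id,))
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_employees(conn, "HR"))
    conn.close()
```

Whether the generated SQL is correct still depends on your real schema, which the model has never seen, so a quick check like this against a test database is worth the minute it takes.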

Don't have time to write an email? GPT-3 can help:

"Compose an email from Neil to Mark at abchotels to make a reservation for a suite at Menlo Park location for three nights with my usual preferences"

To Mark at abc hotels, from Neil Raden, subject: request for reservation at Menlo Park location.

Greetings Mark. I am writing to request a reservation for a suite at your Menlo Park location for three nights. I am currently in the Bay Area on business and would like to make the reservation for April 1, 2 and 3. I would also like to request a room with a view of the Golden Gate Bridge (OK, I made this up. You can't see the GGB from Menlo Park, I don't think). Please let me know if you have availability, and I will be happy to provide my credit card information over the phone. Thank you.

The training data for GPT-3 are controversial. A great deal of it is available openly, such as Twitter "tweets," a repository notorious for its violent, racist and misogynist language. Though the model performs impressively, researchers fear it poses a serious disinformation threat: bad actors could use it to create an endless amount of fake news and spread misinformation.

Here is a tweet by Sam Altman, the CEO of OpenAI, creators of GPT-3:

My take

OpenAI is currently training GPT-4, rumored to have ONE HUNDRED TRILLION parameters.

This approach has its critics. Stuart Russell, a computer science professor at Berkeley and AI pioneer, argues that "focusing on raw computing power misses the point entirely […] We don't know how to make a machine really intelligent - even if it were the size of the universe."

There is another element: training GPT-3 cost around $4.6 million in computing. Extrapolating from that would put a price of $8.6 billion on the computing needed to train GPT-4. There is some pushback that these monstrous models are out of control.
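How the $8.6 billion estimate was derived is not stated here, so the sketch below only shows what different scaling assumptions would imply. It assumes, purely for illustration, that cost grows as a power of the parameter count, and it uses the rumored 100 trillion parameter figure, which is unconfirmed. Under plain linear scaling the extrapolation comes to roughly $2.6 billion; the $8.6 billion figure would imply cost growing somewhat faster than linearly.

```python
import math

# Figures quoted in this article; the GPT-4 parameter count is only a rumor.
GPT3_PARAMS = 175e9
GPT3_COST_USD = 4.6e6
GPT4_RUMORED_PARAMS = 100e12
QUOTED_GPT4_COST_USD = 8.6e9


def extrapolated_cost(target_params, exponent=1.0):
    """Scale GPT-3's cost by (size ratio) ** exponent.

    The power-law form is an assumption for illustration, not a known cost model.
    """
    ratio = target_params / GPT3_PARAMS
    return GPT3_COST_USD * ratio ** exponent


def implied_exponent(target_params, target_cost):
    """Exponent that would make the power-law extrapolation hit target_cost."""
    return math.log(target_cost / GPT3_COST_USD) / math.log(target_params / GPT3_PARAMS)


if __name__ == "__main__":
    linear = extrapolated_cost(GPT4_RUMORED_PARAMS)
    k = implied_exponent(GPT4_RUMORED_PARAMS, QUOTED_GPT4_COST_USD)
    print(f"Linear extrapolation: ${linear / 1e9:.2f} billion")
    print(f"Exponent implied by the quoted estimate: {k:.2f}")
```

Either way the number lands in the billions, which is the point: the bill grows at least as fast as the model does.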

There is another issue, too. Sam Altman believes that each iteration of GPT will get closer to the inevitable AGI (Artificial General Intelligence), but that belief may well rest on a fallacy. "Why AI is Harder Than We Think" is the title of a recent paper by Melanie Mitchell at the Santa Fe Institute. Her contention is that the prevailing attitude, and definitely one at OpenAI, is that narrow intelligence is on a continuum with general intelligence. Mitchell, however, argues that advances in narrow AI aren't "first steps" toward AGI because they still lack common-sense knowledge.

The implication is that the path to truly thinking machines is not through ever-more-enormous computers, but through better theories leading to better, more economical algorithms. That fits perfectly with my training in topology, where I had a professor who would not accept a proof longer than two pages.

Image credit - Woman using tablet pc, pressing on virtual screen and selecting time for review. © WrightStudio - Fotolia.com.
