It’s not like I came in on January 1st and said, ‘Let’s start doing press releases’, but it does feel like that.
It does indeed, Satya Nadella, it does indeed. That was the candid confession from the Microsoft CEO about ChatGPT during his keynote address to the firm’s Build developer conference, an event that should leave no-one in any doubt that this is now an AI company.
While ChatGPT has been a seemingly ‘overnight’ phenomenon - hence Nadella’s quip about January press releases! - it’s been a much longer play, he insisted, with some clear precedents:
To just sort of put this in perspective, in fact last summer I was reading Mitchell Waldrop’s Dream Machine, while I was playing with DV3, as GPT-4 was called then, DaVinci 3, and it just brought into perspective what this is all about.
I think that concept of Dream Machine perhaps best communicates what we have really been doing over the last 70 years, right, all the way starting with what Vannevar Bush wrote in his most seminal paper, "As We May Think," where he had all these concepts like associative memory, or Licklider, who was the first one who even conceptualized the human computer symbiosis, the mother of all demos that came in ’68 to the Xerox Alto, and then, of course, the PDC that I attended, which was the PC Server 1 in ’91.
In ’93 is when we had the Mosaic moment, and there was the iPhone and the cloud, and all of these will be one continuous journey. And then, in fact, the other thing I’ve always loved is [Apple co-founder Steve] Jobs’ description of computers as ‘bicycles for the mind’. It’s sort of a beautiful metaphor, and I think it captures the essence of what computing is. And then, last November, we got an upgrade, right? We went from the bicycle to the steam engine with the launch of ChatGPT. It was like the Mosaic moment for this generation of the AI platform.
Full steam ahead
With that romp down the by-ways of history out of the way, it’s full speed ahead with AI, according to Microsoft CTO Kevin Scott:
There’s an incredible amount of attention being paid right now to what’s happening with the rapid progress with these AI models, these foundation models as we’re calling them now, and in particular, the rapid pace of innovation that’s being driven by OpenAI in their partnership with Microsoft. We really are setting the pace of innovation in the field of AI right now.
I think even for us, it’s been surprising to see how much of the zeitgeist is being captured by things like ChatGPT, and applications that people are building on top of these large foundation models. The reason that this partnership between OpenAI and Microsoft has been so successful is that we really do have an end-to-end platform for building AI applications.
We build the world’s most powerful supercomputers, we have the world’s most capable foundation models, either hosted that we built ourselves and make available to you all via API, or open source, which run great on Azure. We also have the world’s best AI developer infrastructure. So whether that is using these super powerful computers to train your models from scratch or to build these applications that we’re going to be talking about at Build this year, on top of that infrastructure, like, we have that end-to-end platform.
The story so far
As to how we got here, it was left to Greg Brockman, President and co-founder of OpenAI, to talk about his experience of building ChatGPT and GPT-4:
ChatGPT was a really interesting process, both from an infrastructure perspective and ML [machine learning] perspective. We’d actually been working on the idea of having a chat system for a number of years. We’d even demoed at Build an early version called WebGPT, and it was cool. It was a fun demo. We had a couple hundred contractors, literally people we had to pay to use this system, and they were like, ‘Eh, it’s kind of useful, it can kind of help with coding tasks’.
But for me, the moment that really clicked was when we had GPT-4, and that we had a traditional process GPT-3, where we’d just deployed the base model, so we had just pre-trained it, but we hadn’t really tuned it in any direction, and that was in the API. For 3.5, we’d actually gotten to the point where we’re doing instruction following, where we had contractors who were given, ‘Here’s an instruction and here’s how you’re supposed to complete it’.
That training was done on GPT-4 and threw up some interesting conclusions, said Brockman:
As a little experiment I was like, ‘Well, what happens if you follow up with a second instruction after it already generated something?’. And the model replied with a perfectly good response that incorporated everything from before then. And so you realize that this model was capable enough. It had really generalized this idea of, ‘Well, if you really want me to follow instructions, and you give me a new instruction, maybe you really want me to have a conversation?’.
And so for me, that was the moment that it clicked that, ‘OK, we have this infrastructure that’s already in place with the earlier model’, and this new model, even just using this sort of technology that wasn’t meant for chat, it wants to chat, like it’s going to work. This was a real ‘Aha!’ moment. From there, we just were just like, ‘We’ve got to get this thing out. It’s going to work’.
As for GPT-4, that was very much a labor of love, according to Brockman:
As a company, we had actually, after GPT-3, multiple attempts to surpass the performance of that model. It’s not an easy thing. And what we ended up doing was going back to the drawing board, rebuilding our entire infrastructure. A lot of the approach we took was to get every detail right. I’m sure that there are still bugs, and I’m sure there are still more details to be found, but you know, an analogy from…one of the leads on the project, was that it’s almost like building a rocket, where you want all the engineering tolerances to be incredibly tiny.
The thing to me that is interesting is we’re almost on a bit of a tick-tock cycle, where you come up with an innovation and then you really push it. With GPT-4, we’re in that early stage of really pushing it. We have vision capabilities that have been announced, but we’re still productionizing (sic). And I think that will change how these systems work and how they feel, and applications that can be built on top of them.
If you also look back at the history of it, over the past couple of years, I think we did like a 70% price reduction two years ago, whereas basically this past year we did a 90% cost reduction, like a 10x cost drop, and that’s crazy, right? I think we’re going to be able to do the same thing repeatedly with new models. GPT-4 right now, it’s expensive and it’s not fully available, but that’s one of the things that I think will change.
With generative AI and Microsoft, there hasn’t been a pivotal turnaround moment akin to Bill Gates’ Pauline conversion to the internet more than 25 years ago, which resulted in his famous email telling employees to refocus every part of the business on the internet overnight. But almost every announcement coming out of Build this week has been about AI. Possibly the highest-profile was the rollout of live search results from Bing to ChatGPT, meaning that whereas answers have been limited to information up to 2021, users will now be able to get more up-to-date answers from across the web. But there was an AI thread running through just about everything, for better or worse. The future lies this way.