Not too big, not too small, but just right - size does matter when it comes to gen AI language models

Martin Banks, March 21, 2024
SLMs are likely to be what businesses really need for AI - discuss!


Are Large Language Models (LLMs) too large for most business applications, and too slow and too costly as well? Are they also actually not the right tool for what many business users think they might want to achieve?

There are already suspicions that the past year's emergence of LLMs has created the latest spate of rushed updates to existing applications - just as happened when 'big data' was the buzz-phrase du jour, or when every application rapidly became 'cloud capable'. There are already signs of applications that have had ChatGPT nailed on, a quick polish administered, and been shipped as the latest 'AI Version'. There is likely to be a fair amount of disappointment ahead for many customers.

As highlighted by Maria Markstedter, CEO and founder of Azeria Labs, in my piece about the potential for insecurity that currently exists in LLMs, the combination of AI and automation is likely to be one of the sharpest double-edged swords IT has so far had to deal with. Working together they will certainly produce significant benefits and introduce entirely new ways of doing things. On the other hand, bad things could happen on a grand scale, because LLMs plus automation could allow damage to be done at a speed and volume that is hard to stop.

LLMs are ideal for tasks that are both very big and very complex, and for those where getting the best possible result matters more than when the answer appears or how much it costs. Tasks such as research into new genetics-based medical developments are perhaps the classic applications, where such models are proving to be stunningly successful. But a ChatGPT chatbot is something of an overkill for a real-time customer support voice interface.

But SLMs, new ranges of 'lite' implementations of the leading LLMs, are now appearing from Microsoft, Meta, and now Google with Gemma, the lite version of Gemini. For the majority of business users these are likely to be the tools they will eventually exploit. The two common sizes now are two billion and seven billion parameters, and there are specific reasons for those two sizes.

A seven billion parameter model fits snugly in a node running a single high-end GPU with maximum memory, and therefore offers the best compromise in terms of performance, capabilities and cost. Any more than seven billion parameters and a second GPU node has to be spun up. A two billion parameter model can be run on any high-end PC equivalent with an integral GPU, such as those described in the new Intel reference designs for high-end PCs. These are also the model sizes expected to fit the majority of business-process-related tasks most companies will require most of the time.
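The arithmetic behind that single-GPU boundary is easy to sketch. As a rough, illustrative rule of thumb (the figures below are my own back-of-envelope assumptions, not vendor numbers), a model's weights need about two bytes per parameter when stored in 16-bit precision:

```python
def weight_memory_gb(params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory needed just for model weights, in GiB.

    bytes_per_param=2 assumes 16-bit (fp16/bf16) weights; 4-bit
    quantisation would use roughly 0.5 bytes per parameter.
    Activations and inference overhead come on top of this.
    """
    return params * bytes_per_param / 1024**3

# A 7B-parameter model needs roughly 13 GiB for weights alone, so it
# fits within a single high-end GPU's memory, while a much larger
# model (say 70B parameters, ~130 GiB) forces a multi-GPU setup.
print(round(weight_memory_gb(7e9), 1))   # ~13.0
print(round(weight_memory_gb(70e9), 1))  # ~130.4
```

By the same token, a 2B model at around 4 GiB of weights is small enough for the integrated GPU and shared memory of a high-end PC, which is why those two sizes keep recurring.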

I sought out a couple of specialist outfits focussing on implementing generative AI models for customers to gather opinions on what was best-suited to the real AI needs of businesses when it comes to performance, capabilities, suitability, cost and, of course, security.

Domino Data Lab

First up was Domino Data Lab and its Head of AI Strategy, Kjell Carlsson, who may be known to some through his previous job as Head of AI coverage at analyst company, Forrester Research. The company is a specialist in Machine Learning Operations (MLOps), currently focussing its attention on supporting the data scientists of major companies in researching and developing new technologies, products and services in sectors such as life sciences, insurance, retail and manufacturing. As he pitches it:  

Domino is like the ring of Sauron, the one ring to unify them all. In this case, it’s the data science platform that unifies all of the different parts of the data science ecosystem. So if you want to use open source tools, if you use proprietary ones, that's great. All organizations have this broad range of different data scientists who are working on different projects using different data with different needs and different training. But you want them all to collaborate, you want them to be able to share what they're doing, you want that to be governed so that everybody knows what steps you've taken.

This puts it out on the bleeding edge of working with the largest of LLMs and the issues that come with them, not least of which, for many user companies, is the fact that while they have been analyzing data for many years, this is the first time they have moved into generating it – and many are only just realising they need to understand what they are going to do with it all.

And then there's that minor detail of actually operationalizing it, says Carlsson: 

It’s really challenging. The models are giant and expensive, and they hallucinate. And there are data privacy issues. I, as a proponent of these technologies, think it's wonderful. At the same time I despair because, looking out six months to a year, there's going to be the full-scale Trough of Disillusionment. I speak with companies that are doing incredible things with it, like the bio-pharma companies that are doing really cool stuff with synthetic proteins. It's just not this panacea silver bullet whereby it can do anything and everything.

He does see the introduction of SLMs such as Gemma as being where proper AI usage starts for the majority of business users, because they are easier to train, easier to use and a great deal less expensive than LLMs. The difference between the two and seven billion parameter SLMs and Google’s Gemini LLM, at 1.5 trillion parameters, is broadly mirrored in the training requirements, the time to obtain a result and the cost of operations, he suggests:

LLMs are useful for things like if you are conducting research, if you're a biologist and are looking to go in and summarise research papers, or go in and use this on molecular data to suggest some sort of new molecule, sure, you don't care.

Carlsson admits he knows of companies that are using ChatGPT for customer service automation, but argues that it is just not fast enough for that type of application. What is more, all the signs suggest that its operational costs are greater than using humans, though user companies are not yet willing to admit that:

What we are slowly seeing is that folks are also training more task- and domain-specific models. So, there are models specifically trained on legal contracts, and so they're optimized for lawyers.

This looks like the classic precursor step to the development of a wide range of specialist third party model-building service providers, each with one or two market sectors where they can effectively resell the training and operations expertise they have developed. He sees the potential for the development of such a business model, but not for a while yet:

It's mostly still coming out from the tech giants themselves, or from startups who are going in and creating their own specialised models, only because I think the broader market does not yet have the talent and the data capabilities, or has not amassed the data corpuses, to go in and create their own fine-tuned models.

The more specific nature of the targeted domain and context of the SLMs obviously limits the scope of any model, which in turn allows for greater control over the data sources worked with and the type of data selected for use. This reduces the scope for spurious data and/or malicious code to find an entry point into a model, so model security has a better starting point as it is bounded by the terms of the answer being sought.  For example, a chatbot handling customer service complaints can be run on an LLM, but in Carlsson’s view it is more efficient and accurate to pull that out and have a different traditional machine learning model go in and figure that out.

They should also be less prone to hallucinations if the work is put into fine-tuning a model to a specific task, which then rolls back to having the talent available for the task, which in turn rolls back to the need for, and growth of, specialist third party service providers. He acknowledges that hallucinations are still a major issue, no matter how hard users try to control a model:

You can try and force it to be as consistent as you like. You can give it exactly the same things and lo and behold, they'll still give you something different. And we don't exactly know why. So there's always that long tail whereby there's some risk of your model going and hallucinating something really wacky.

Interestingly, he noted that SLMs are likely to be more prone to hallucinations than LLMs, because there is more chance that users will not put sufficient effort into fine-tuning, either through lack of available talent or just the cost of doing it. Most of the dedicated LLM users have committed to the time and cost such work requires.

Data poisoning of open source data sources is certainly possible, in much the same way that, right now, a reasonably effective way of introducing a vulnerability into software is to post vulnerable code on Stack Overflow. Another entry point is likely to be the data collected by scraper services, which could be one of the core data sources that businesses will use for training, and for the everyday real-time data updates many businesses live on. These could be a good target for making pre-poisoned malicious – or just misleading – data available. Carlsson does see a defence, however:

You can use LLMs on the other side as well, to go in and help sanitise your open source code repositories, to go in and scan for those. So there's an arms race that you can have between leveraging those LLMs for good and evil, and you should already have processes that are going in and vetting your code after it's created. And you can use large language models there in order to help with that.

The second outfit I spoke to has been in the AI modelling sector for some eight years, longer than many would have guessed such a marketplace has existed. Its goal from the beginning was to provide a comprehensive suite for researchers to be able to stay on top of the state of the art, extracting the right knowledge at the right time. Its main platform now is Researcher Workspace, where researchers can search for knowledge, ask questions, and have the data as well. It can aggregate knowledge from different sources, as well as summarise information and extract important facts.

According to the company’s CTO, Victor Botev, the focus has always been on factual correctness, and it moved into open source models as they became available, training them specifically for its particular use case. But that is a step that can come with security issues for the unready:

Our clients are heavy R&D organizations that don't want to share anything with the outside world. So for them, even giving their questions to ChatGPT is risky: a way that some people might actually gain insights on what they're doing and so forth. If you have used the free versions, most probably your data will be there. But then, if you have full control over the system, it comes down to a question of cost. If you want to deploy your own GPT-4 model, then you need a big pile of money.

In addition to its work with major research organizations, it has also built up a significant use case for business users, providing competitive analysis, especially in areas such as product design and use of the latest technologies. This involves the company in an interesting new element when it comes to sources of information: magazines and the wider area of published data are an obvious source of what, and how, a client’s competitors are doing, but the company takes the view that it has to ensure the client has all relevant permissions and licences to use that data. Botev says:

If you use this data directly for training, then it can have serious impacts on the final results. If you use this data to search, get the result and put it into the context of the model, it's not that harmful, though it still can be. And then the third protection is that our machine is able to actually evaluate, or put a ranking on, the sources from which the information comes. There also exist human rankings of these resources, so you're able to use those to some extent. So if there is injected malicious content, it is very likely that when we rank over the objective criteria specified in the query, it will rank lower. I'm not saying it will not creep into the results, but at least it won't be the top results.
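The ranking idea Botev describes can be sketched in a few lines: each retrieved passage carries a relevance score for the query and a trust score for its source, and weighting the two together demotes content from low-trust sources without excluding it outright. The field names and the 0.7/0.3 weights below are illustrative assumptions of mine, not details of the company's actual system:

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    relevance: float     # 0..1, how well the passage matches the query
    source_trust: float  # 0..1, from human or automated source rankings

def rank(passages, w_rel=0.7, w_trust=0.3):
    """Order passages by a weighted blend of relevance and source trust."""
    return sorted(
        passages,
        key=lambda p: w_rel * p.relevance + w_trust * p.source_trust,
        reverse=True,
    )

results = rank([
    Passage("peer-reviewed summary", relevance=0.80, source_trust=0.95),
    Passage("scraped forum post",    relevance=0.85, source_trust=0.20),
])
# The trusted source (score 0.845) outranks the slightly more relevant
# but low-trust one (score 0.655) - poisoned content sinks in the list
# rather than being silently dropped.
```

This matches his caveat exactly: a maliciously injected passage can still appear in the results, but it is unlikely to surface at the top unless it also comes from a well-ranked source.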

There is also the old-technology step of human intervention, where the source data can be examined by those with knowledge of the specific field. But users then need to know in advance that the model they are using will allow them access to the real reference. Some providers charge extra for such access, and some do not allow it at all, but without it, verification of source data may be very difficult:

For the closed models, it's really difficult to trace where the errors are, which makes it difficult to flag this misinformation.

It was Botev who made the observation above about the importance of working with either two or seven billion parameter models, noting that they work well, are cheaper to train, and cheaper to execute. That, in his opinion, makes them good for business applications. He has also noted that models from different sources are suited to different tasks, depending on how they were trained by the providers:

So what we can see, for example, is that the Google models are more heavily trained on mathematical data, or mathematical structure. So they perform better if you ask them about tasks related to mathematics. The Llama models are more trained on social data, which means that they're able to follow conversations better.

My take

With the headlong rush to incorporate Large Language Model generative AI almost everywhere, a significant 'Trough of Disillusionment' is expected in the not too distant future. Fortunately, Small Language Model systems are now appearing that may make far more sense for the majority of business users, not least because they fit into convenient sizes of the system SKUs cloud service providers offer. They are faster, easier to train and far cheaper to use, and look like they will also overcome, or circumvent, many of the security threats that currently hover over the use of LLMs.

