Hello all
I hope this post finds you in good health and spirits. In this post we will talk about Small Language Models (SLMs). Never heard of them? No problem, let’s start from the beginning. 🙂
ChatGPT has been the talk of the town ever since it was introduced. It crossed the million-user mark within five days of release, and within two months it skyrocketed to 100 million active users. For comparison, TikTok took nine months to reach 100 million monthly users, and Instagram about 2.5 years. As ChatGPT made headlines across industries, people started digging into the technology, and that’s how they came across another buzzword: Language Model, or more precisely, Large Language Model (LLM). ChatGPT, Gemini and similar chatbots are all powered by LLMs.
What are language models and LLMs?
A language model is a computer program that has been trained to understand and process human language. It uses Machine Learning (ML) techniques to understand the meaning and context of what a user is saying and then generate text, images, or video as output. For example, when you say “The sky is _______”, it will understand the context and fill in the blank with “blue”. Generating this kind of response is complex and resource-intensive work. It requires the model to be trained on a “large” dataset of books, articles, code, videos, etc. Language models that are trained on large datasets and have billions of parameters are called Large Language Models, or LLMs.
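To make the “fill in the blank” idea concrete, here is a deliberately tiny sketch of next-word prediction using bigram counts. This is not how an LLM works internally (LLMs use neural networks trained on billions of words), and the mini “corpus” below is invented purely for illustration, but it shows the core idea: predict the most likely next word from what the model has seen before.

```python
from collections import Counter, defaultdict

# Toy "training corpus" (invented for illustration) -- real language
# models train on billions of words, not a handful of sentences.
corpus = (
    "the sky is blue . the sky is clear . "
    "the grass is green . the sky is blue ."
).split()

# Count bigrams: how often each word follows another.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

# "The sky is ____" -> the model fills the blank with the
# statistically most common continuation in its training data.
print(predict_next("is"))  # "blue" follows "is" most often in this corpus
```

An LLM does something similar in spirit, but with context windows of thousands of tokens and billions of learned parameters instead of a simple count table.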
So how large is “large”? To give you an idea, ChatGPT is powered by GPT-4, which is reported to have over a trillion parameters. Gemini, Google’s chatbot (formerly known as Bard), is powered by the Gemini family of LLMs, which are also estimated to have hundreds of billions of parameters.
While LLMs have created exciting new opportunities, their size requires significant computing resources to operate. So what about edge devices such as mobile phones, or devices without internet access? Where will the complex computation run for them?
Enter Small Language Models (SLMs)
SLMs are the smaller counterparts of LLMs. They have a few million to a few billion parameters, against LLMs whose parameters range into the trillions, as we discussed earlier. This offers significant advantages:
- They need less computing power, making them suitable for smaller and edge devices
- They can be easily fine-tuned for a specific industry or use case
- They can be deployed in places where LLMs are not feasible due to their computation requirements
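To get a feel for why the parameter count matters so much on edge devices, here is a rough back-of-the-envelope calculation. It assumes 2 bytes per parameter (as with 16-bit weights) and counts only the memory needed to hold the weights; real memory use varies with the number format and runtime overhead, so treat these as order-of-magnitude figures.

```python
def model_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory (in GB) just to hold the model weights,
    assuming 16-bit (2-byte) parameters by default."""
    return num_params * bytes_per_param / 1e9

# A ~3-billion-parameter SLM vs a ~1-trillion-parameter LLM.
slm_gb = model_memory_gb(3e9)   # ~6 GB: conceivable on a high-end phone
llm_gb = model_memory_gb(1e12)  # ~2000 GB: needs a cluster of accelerators
print(f"SLM: ~{slm_gb:.0f} GB, LLM: ~{llm_gb:.0f} GB")
```

Even before considering the compute needed to run inference, the weights alone put trillion-parameter models far beyond what a phone or sensor can store in memory.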
Below is a comparison between SLMs and LLMs:

| Parameter | SLM | LLM |
| --- | --- | --- |
| Size | Millions to a few billions | Billions to trillions |
| Computation requirement | Minimal | Higher |
| Performance | Simple tasks | Complex tasks |
| Cost | Cost-effective | Expensive |
| Customization | Easier | Complex |
SLM use cases
SLMs can be deployed for domain-specific tasks, simple use cases, edge devices, or places that are not connected to the cloud. Here are a few use cases:
- Chatbots to answer customer queries or provide services
- Edge devices such as smartphones, traffic lights, car computers, smart sensors, etc.
- Sentiment analysis, to gain insights from customer feedback and respond to it
- Personal assistants with voice recognition, text prediction, etc.
- Real-time language translation
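To give a feel for the sentiment-analysis use case, here is a deliberately tiny lexicon-based sketch. A real SLM such as DistilBERT learns these word–sentiment associations from labeled data rather than a hand-written word list; the lexicon and example sentences below are invented for illustration only.

```python
# Hand-written word lists (invented for illustration); a real SLM
# learns sentiment from labeled examples rather than a fixed lexicon.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "terrible", "hate", "bad"}

def sentiment(text):
    """Classify text as 'positive', 'negative' or 'neutral' by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The delivery was fast and the support was helpful"))  # positive
print(sentiment("The app is slow and the checkout is broken"))         # negative
```

The appeal of an SLM here is that this kind of classification can run entirely on-device, so customer feedback never has to leave the phone or the store’s local hardware.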
A few popular SLMs
- Phi-3 by Microsoft
- Gemma by Google
- DistilBERT by Hugging Face
- MobileBERT
Final words
So will SLMs replace LLMs? No; both have their advantages, limitations, and unique use cases. While LLMs will still be used for complex tasks, SLMs have their place for computation at the edge and on-device, allowing tasks to be performed without heavy computation. I suggest you check the Microsoft Phi-3 page for more insight into its capabilities: https://azure.microsoft.com/en-us/products/phi-3
Phew! That was all for this post. I will see you soon with some other interesting stuff. Till then, bye-bye 🙂