Artificial Intelligence (AI) is a phrase that has been in the limelight since OpenAI’s launch of ChatGPT in November 2022. Platforms such as OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude are generally thought of as generative AI with text-generation capabilities.
While that description is broadly accurate, a more precise term for such platforms is “Large Language Models,” or LLMs for short. LLMs are a type of artificial intelligence system trained on massive amounts of text data to generate human-like text and hold natural, context-aware conversations.
Although LLMs have been commercially available for less than 16 months, these incredibly complex pieces of software have already proved transformative and are set to reshape nearly every imaginable sphere of the economy and the human condition.
The exciting part about all this is that the fields of LLMs and, more broadly, AI are still in their infancy.
What are LLMs?
As the name suggests, Large Language Models are computer software “models” of natural human language that are extremely large. They are large because they are “trained” on massive amounts of text data, which can run into petabytes (a petabyte is equivalent to one million gigabytes). Training such models on vast quantities of raw text, often scraped or “crawled” from the internet, is a form of machine learning.
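To make this concrete, here is a minimal sketch, using the open-source Hugging Face transformers library and the small, freely available GPT-2 model, of how a trained language model continues a piece of text. The model name and prompt are illustrative choices, not a reference to any particular commercial system.

```python
# Minimal illustration of a trained language model continuing a prompt.
# Assumes `pip install transformers torch`; GPT-2 is a small, older model
# used here purely as an example stand-in for a modern LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The model has learned, from its training text, which words tend to follow others.
inputs = tokenizer("Large Language Models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The model simply predicts, one token at a time, what text is most likely to come next; everything an LLM does builds on that basic capability.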
There are many different LLMs currently on the market and in active development. The most popular is, of course, OpenAI’s GPT, which stands for “Generative Pre-trained Transformer.”
Other popular LLMs include Google’s LaMDA (Language Model for Dialogue Applications), which originally powered the chatbot now known as Gemini, and Meta’s LLaMA (Large Language Model Meta AI).
How LLMs work
Designing and building an LLM is a complex, expensive and challenging task. As previously mentioned, LLMs are trained on extremely large amounts of human- and machine-generated data.
Machine learning, one of the building blocks of artificial intelligence, is typically performed on specialized hardware: clusters of graphics processing units (GPUs), which are expensive to buy and maintain. Training an LLM therefore involves a huge upfront cost.
After the basic training, the models are fine-tuned and “aligned” in accordance with their design goals to minimize inappropriate, offensive or inaccurate outputs.
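As a rough illustration of what fine-tuning can look like in practice, the sketch below uses the Hugging Face Trainer API to continue training a small pre-trained model on a custom text file. The file name, model choice and hyperparameters are placeholder assumptions for the example, not a recipe used by any particular vendor.

```python
# Hedged sketch of supervised fine-tuning a small causal language model.
# Assumes `pip install transformers datasets torch` and a plain-text file
# `domain_corpus.txt` (a placeholder name) containing the fine-tuning examples.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small example model standing in for a much larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The same basic loop, shown here at toy scale, is what makes it possible to adapt a general-purpose model to a specific domain or tone after its initial training.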
After the fine-tuning and alignment phase, the software is released to beta testers and other early adopters to validate the effectiveness and accuracy of its responses. Once the model hits the required accuracy and effectiveness threshold, it becomes a candidate for public release.
The Future of LLMs
There is exciting research and development happening in the field of LLMs. Commercial LLMs (like ChatGPT) have already taken our world by storm, and future developments will likely involve further refinement of a machine learning technique called “Reinforcement Learning from Human Feedback” (RLHF). The goal is to increase accuracy while aligning models to better understand human needs and respond with fewer biases and more diverse perspectives.
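To give a flavour of how RLHF works under the hood, the toy sketch below trains a tiny “reward model” to score human-preferred responses more highly, which is the core idea behind the human-feedback step. The random vectors stand in for real response representations; an actual reward model would itself be a large neural network trained on thousands of human comparisons.

```python
# Toy illustration of the preference-ranking loss used to train a reward model.
# Assumes `pip install torch`; the 8-dimensional random vectors are placeholders
# for real model representations of candidate responses.
import torch
import torch.nn as nn

reward_model = nn.Linear(8, 1)  # tiny stand-in for a large reward network
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend representations of a human-preferred ("chosen") and a less-preferred
# ("rejected") response to the same prompt, in batches of four.
chosen = torch.randn(4, 8)
rejected = torch.randn(4, 8)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Push the preferred response's score above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained on human preferences, a reward model like this can be used to steer the LLM itself toward responses people actually find helpful.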
Some active projects also aim to make the LLM training process more efficient, for example by using lower-power processor architectures such as ARM. This matters given the ever-growing carbon footprint of machine learning and of the technology industry as a whole.
In conclusion, the future of LLMs, and of artificial intelligence in general, looks promising. LLMs and AI will be as revolutionary to the 2020s as the Industrial Revolution was to the 18th and 19th centuries.
The best is yet to come.