AI Large Model Competition: From Academic Hotspots to Engineering Challenges

Last month, a fierce "animal battle" broke out in the AI field.

On one side is Llama, popular among developers for its open-source license; on the other is a large model called Falcon. After its release in May, Falcon-40B overtook Llama to top the open-source LLM leaderboard.

The leaderboard, maintained by the open-source model community, provides a standard for evaluating LLM capabilities, and its top spot has essentially alternated between the two. After Llama 2 launched, the Llama family briefly retook the lead; then, in early September, Falcon released a 180B version and climbed back on top.

Interestingly, Falcon's developer is not a tech company but a technology innovation research institute in the UAE's capital. Officials there have said they entered the field to "disrupt the core players."

Today, AI has entered a stage of diverse development: countries and companies with the resources are all building their own large language models. In the Gulf region alone there is more than one player; in August, Saudi Arabia bought more than 3,000 H100 chips for its domestic universities to train LLMs.

One investor complained: "Back then I looked down on internet business-model innovation for having no barriers to entry. I never expected that hard tech and large-model startups would still end up in a battle of a hundred models..."

How did so-called hard technology, supposedly forbiddingly difficult, turn into a race that anyone can join?

The Rise of Transformers

American startups, Chinese tech giants, and Middle Eastern oil tycoons can all dive into large models thanks to the famous paper "Attention Is All You Need."

In 2017, eight computer scientists introduced the Transformer architecture in that paper. It is now the third most cited paper in the history of AI, and the Transformer's emergence sparked the current wave of AI enthusiasm.

Currently, various large models, including the globally renowned GPT series, are built on the foundation of Transformers.

Before this, "teaching machines to read" had long been recognized as an academic challenge. Unlike image recognition, reading requires attending not just to the current word or sentence but to its context. Early neural networks treated each input independently, so they struggled with long texts and frequently produced translation errors.

In 2014, Google scientist Ilya Sutskever made a breakthrough by applying Recurrent Neural Networks (RNNs) to natural language, significantly improving the quality of Google Translate. The RNN's recurrent design lets each neuron receive both the current input and the output from the previous time step, giving the network the ability to "combine context."
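
To make that recurrent design concrete, here is a minimal sketch of a vanilla RNN cell in Python (an illustrative toy, not Google's actual translation model; the sizes and weights are arbitrary assumptions). The hidden state `h` is what carries context from earlier tokens into the current step, and it is also why the computation has to proceed one token at a time.

```python
import numpy as np

# Minimal sketch of a vanilla recurrent cell (toy sizes, random weights).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden: the recurrence
b_h = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Process a sequence strictly one step at a time."""
    h = np.zeros(hidden_dim)            # context accumulated from earlier tokens
    for x_t in inputs:                  # sequential: step t needs the result of step t-1
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                            # a summary of the whole sequence

sequence = [rng.normal(size=input_dim) for _ in range(5)]
print(rnn_forward(sequence).shape)      # (16,)
```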

The emergence of RNNs ignited research enthusiasm in academia, but developers soon discovered a serious flaw: the computation is strictly sequential, which solves the context problem but is inefficient and scales poorly to large numbers of parameters.

From 2015, Noam Shazeer and seven colleagues worked on alternatives to the RNN, and the final result was the Transformer. Compared to RNNs, the Transformer brought two major innovations: first, it replaced recurrence with positional encoding, enabling parallel computation, dramatically improving training efficiency and pushing AI into the era of large models; second, it further strengthened the ability to understand context.
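
A toy sketch of those two ideas follows, again as an illustrative assumption rather than the paper's full architecture: sinusoidal positional encoding injects word order without recurrence, and self-attention mixes information across all positions in a single matrix operation, with no loop over time.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: injects token order without any recurrence."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X):
    """Scaled dot-product self-attention over the whole sequence at once.
    Q/K/V projections are omitted for brevity; the point is the parallelism."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # every position attends to every other
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X                               # context-mixed representations

seq_len, d_model = 6, 16
tokens = np.random.default_rng(1).normal(size=(seq_len, d_model))
out = self_attention(tokens + positional_encoding(seq_len, d_model))
print(out.shape)  # (6, 16) -- all positions handled in one matrix product, no loop over time
```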

The Transformer addressed several of these defects at once and gradually became the mainstream approach in NLP. It turned large models from a theoretical research problem into a largely engineering one.

In 2019, OpenAI built GPT-2 on the Transformer, astonishing the academic community. Google then launched the more powerful Meena, which surpassed GPT-2 simply by adding parameters and computing power. This deeply shocked Transformer co-author Shazeer, who wrote a memo titled "Meena Devours the World."

After the Transformer's arrival, the pace of algorithmic innovation in academia slowed noticeably, and engineering factors such as data pipelines, computing scale, and model architecture became the keys to AI competition. Any company with a reasonable level of technical strength can now develop a large model.

Computer scientist Andrew Ng pointed out during a speech at Stanford University: "AI is a collection of tools, including supervised learning, unsupervised learning, reinforcement learning, and generative AI. These are all general technologies, similar to electricity and the internet."

Although OpenAI remains the benchmark for LLMs, analysts believe GPT-4's advantage lies mainly in engineering: if it were open-sourced, any competitor could replicate it quickly. They expect other large tech companies to soon produce models comparable to GPT-4 in performance.

Weak Moat

Nowadays, the "battle of a hundred models" is no longer an exaggeration but an objective reality.

According to recent reports, as of July this year the number of large models in China had reached 130, surpassing the 114 in the United States, to the point that Chinese tech companies are running out of myths and legends to name them after.

Beyond China and the United States, a number of wealthier countries have achieved a preliminary version of "one country, one model": Japan and the UAE have their own large models, as do India, with the government-led Bhashini, and South Korea, where the internet company Naver built HyperClova X.

The scene recalls the bubble days of the early internet. As noted above, the Transformer turned large models into a largely engineering problem: as long as someone has money and GPUs, the rest comes down to parameters. But a low barrier to entry does not mean everyone can become a giant of the AI era.

The "Animal Battle" mentioned at the beginning is a typical case: although Falcon has surpassed Llama in rankings, it is hard to say how much impact it has had on Meta.

As we all know, companies open-source their achievements not only to share technological dividends but also to harness collective intelligence. As universities, research institutions, and enterprises keep using and improving Llama, Meta can fold those results back into its own products.

For open-source large models, an active developer community is the core competitiveness.

Meta set its open-source course as early as 2015, when it established its AI lab; Zuckerberg, who got his start in social media, is well practiced at "maintaining good relations with the community."

In October, Meta also launched an "AI Creator Incentive" program: developers who use Llama 2 to tackle social issues such as education and the environment have a chance to receive a $500,000 grant.

Today Meta's Llama series is the benchmark for open-source LLMs. As of early October, 8 of the top 10 open-source LLMs on one leaderboard were built on Llama 2, and on that platform alone more than 1,500 LLMs use the Llama 2 open-source license.

Of course, it is possible to beat Llama on raw performance, as Falcon did, but most LLMs on the market today still lag far behind GPT-4.

For example, GPT-4 recently topped the AgentBench test with a score of 4.41. AgentBench, launched jointly by Tsinghua University and several prominent American universities, evaluates LLMs' reasoning and decision-making abilities in open, multidimensional environments across eight scenarios, including operating systems, databases, knowledge graphs, and card games.

Second-place Claude scored only 2.77, a sizable gap, and the much-hyped open-source LLMs mostly score around 1 point, less than a quarter of GPT-4's score.

Bear in mind that GPT-4 was released in March this year, so these results come after more than half a year of the rest of the world trying to catch up. The gap comes down to OpenAI's team of scientists and the experience it has accumulated from years of LLM research, which keeps it consistently ahead.

In other words, the core advantage of a large model is not its parameter count, but either ecosystem building (for open source) or sheer reasoning capability (for closed source).

As the open-source community grows more active, the performance of different LLMs may well converge, since everyone is using similar architectures and similar datasets.

A more immediate challenge is that, apart from Midjourney, hardly any large model company has managed to turn a profit.

Where Is the Value Anchor?

In August this year, an article titled "OpenAI may go bankrupt by the end of 2024" drew attention. Its argument can be summed up in one sentence: OpenAI is burning cash too fast.

The article notes that since building ChatGPT, OpenAI's losses have ballooned, reaching roughly $540 million in 2022 alone, and that it has stayed afloat only on Microsoft's investment.

The headline may be exaggerated, but it reflects the reality for many large model providers: costs and revenues are badly out of balance.

High costs mean that, for now, only Nvidia is making serious money from AI, with Broadcom perhaps next in line.

According to the consulting firm Omdia, Nvidia sold more than 300,000 H100s in the second quarter of this year. The H100 is exceptionally efficient for AI training, and tech companies and research institutions around the world are scrambling to buy it; stacked together, those 300,000 chips would weigh as much as 4.5 Boeing 747s.

Nvidia's results have soared, with year-on-year revenue growth of 854% that stunned Wall Street. On the secondary market the H100 now trades for $40,000 to $50,000, against a bill of materials of only about $3,000.

The high cost of computing power has become a drag on the industry's development. Sequoia Capital has estimated that global tech companies will spend $200 billion a year on large-model infrastructure, while large models can generate at most $75 billion a year in revenue, leaving a gap of at least $125 billion.

Moreover, with a few exceptions such as Midjourney, most software companies have poured in money without finding a clear path to profit. Even the industry leaders Microsoft and Adobe have stumbled in their attempts.

GitHub Copilot, the AI code-generation tool Microsoft built with OpenAI, charges $10 a month, but because of infrastructure costs Microsoft actually loses about $20 per user on average, and heavy users cost it as much as $80 a month. By the same logic, the $30-a-month Microsoft 365 Copilot may lose even more.
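
A back-of-the-envelope reading of those figures (the reported losses are taken at face value here; the implied serving costs are an inference, not Microsoft disclosures):

```python
# Rough unit economics implied by the reported Copilot figures.
# Assumption: monthly loss = serving cost - subscription price.
price = 10                      # monthly subscription, USD
avg_loss, heavy_loss = 20, 80   # reported monthly loss per average / heavy user

avg_cost = price + avg_loss     # implied cost to serve an average user: $30/month
heavy_cost = price + heavy_loss # implied cost to serve a heavy user:   $90/month

print(f"Average user: pays ${price}, costs ~${avg_cost}, margin ${price - avg_cost}/month")
print(f"Heavy user:   pays ${price}, costs ~${heavy_cost}, margin ${price - heavy_cost}/month")
```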

Similarly, Adobe, which has just released its Firefly AI tools, moved quickly to introduce a credits system to stop heavy usage from pushing it into losses: once users exceed their monthly credit allotment, Adobe slows down the service.

Bear in mind that Microsoft and Adobe are software giants with clear business scenarios and huge paying user bases, whereas most of the parameter-laden large models still have chat as their main application.

It is undeniable that without OpenAI and ChatGPT this AI revolution might never have happened; but the value created by training ever more large models remains an open question.

And as homogenized competition intensifies and open-source models multiply, pure large model providers may face even greater pressure.

After all, the iPhone 4 did not succeed because of its 45nm A4 processor, but because it could run Plants vs. Zombies and Angry Birds.
