Why Is Artificial Intelligence Progressing Slowly? The Advancement of AI Needs to Accelerate
The turmoil around Altman's brief departure brought the Q* algorithm to light, which seemed to suggest a conclusion: strong artificial intelligence is on its way. The reality may be the opposite. AI has indeed made progress and carries immense potential and disruptive power, but overall its progress is not too fast; it is too slow.
In 1950, Alan Turing proposed the Turing Test: if a person conversing with an unseen interlocutor cannot tell whether it is a human or a machine, the machine is considered to have passed.
Now, in certain scenarios, large models can indeed pass the Turing Test. Therefore, this version of the Turing Test has become outdated and holds little significance.
However, the fundamental core of the Turing Test is valuable. It defines a scenario where artificial intelligence completes a task, and the external perception of whether it can be distinguished as intelligent remains relevant—in fact, it has become even more critical.
Expanding on the Turing Test, we can define a position or scenario in economic activities and examine whether AI can accomplish it, while the service recipient remains unaware of whether it was provided by a human or a machine. If AI succeeds, it passes Turing Test 2.0; otherwise, it does not.
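To make Turing Test 2.0 concrete, here is a minimal sketch of how it could be run as a blind service evaluation. Everything in it is an assumption for illustration: the task list, the worker stand-ins, the reviewer function, and the acceptance criterion are hypothetical placeholders, not an established protocol.

```python
# Hypothetical sketch of a "Turing Test 2.0" style evaluation: the same real-world
# task is served sometimes by a human and sometimes by an AI, and recipients judge
# the work without knowing who produced it.
import random

def run_blind_trial(tasks, human_worker, ai_worker, reviewer):
    """Return acceptance rates for human- and AI-produced work under blind review."""
    accepted = {"human": 0, "ai": 0}
    counts = {"human": 0, "ai": 0}
    for task in tasks:
        provider = random.choice(["human", "ai"])  # blind assignment of each task
        output = (human_worker if provider == "human" else ai_worker)(task)
        counts[provider] += 1
        if reviewer(task, output):  # the reviewer never sees who produced the output
            accepted[provider] += 1
    return {p: accepted[p] / counts[p] for p in counts if counts[p]}

if __name__ == "__main__":
    # Toy stand-ins: a real test would use an actual position in economic activity.
    tasks = [f"write product copy #{i}" for i in range(100)]
    human = lambda t: f"human draft for {t}"
    ai = lambda t: f"ai draft for {t}"
    reviewer = lambda task, output: len(output) > 10  # placeholder acceptance rule
    print(run_blind_trial(tasks, human, ai, reviewer))
```

The details are not the point; what matters is that "passing" is judged inside a real scenario, by a recipient who cannot tell whether the work came from a person or a machine.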
The original Turing Test, after all, tested an intelligent entity only within a virtual space: it never needed to distinguish real from fake, and as long as logical consistency was maintained, the test could be passed; within that space, confident nonsense did no harm. That is a purely technical perspective.
There’s an unassuming yet classic sci-fi movie called The Man from Earth, where a man claims to be a 14,000-year-old caveman who has witnessed human history and civilization’s evolution, even interacting with Buddha and Jesus.
The scientists in the same room tried to use logic to verify whether he was talking nonsense, but they found that by merely sitting in the room, it was impossible to determine the truth. From an abstract perspective within the room, as long as a person has sufficient knowledge and can maintain logical consistency, it's fundamentally impossible to discern the truth. However, stepping outside the room changes everything—other facts and feedback can quickly determine what's true or false.
Similarly, whether artificial intelligence is truly intelligent is both an academic and technical question and a commercial one. It must therefore step out and be tested in broader scenarios rather than remain merely a conversational language model. At this point it becomes essential to return to the core of the Turing Test and upgrade it in the same spirit.
As explored in Can AI Make Money?, this method is referred to as the full-scenario coverage approach. With attention on artificial intelligence rising, this perspective seems ever more important. Our entire civilization is built upon intelligence, so there are countless angles from which to view AI. One is anchorless fantasy, in which AI can do anything, like an imagined superhuman; that is useful for writing fiction. Another is the purely technical perspective, which tends toward extremes of optimism or pessimism: either dismissing AI's usefulness (despite its current popularity, most AI researchers were pessimistic over the past decade) or fearing world domination with every new breakthrough.
Without an anchor and a yardstick, it is easy to swing from one extreme to the other, yet the yardstick itself is the essence.
If we compare within the technology circle, the progress is indeed significant, whether in earlier recognition rates or in today's content generation; large models have made considerable advances. But measured against the Turing Test 2.0 described above, they still cannot pass even now, like a curve that approaches the bar asymptotically yet never crosses it.
We can further illustrate this with the division of labor within enterprises. A typical company has a set of standard positions, each further subdivided vertically and horizontally: vertical refers to hierarchy, commonly known as the reporting line, while horizontal refers to divisions of responsibility such as front end, back end, and mobile app.
In a product company with 100-200 employees, there are typically various similar positions. From the perspective of the Turing Test 2.0, which parts can current artificial intelligence pass?
Probably none, even in programming where the most progress has been made.
In programming, current AI cannot complete the mapping from a requirement model to a development model (though it can indeed let one person do the work of two), so someone still has to abstract the requirements and turn them into prompts. Moreover, once problems arise, corrections become even harder, because fixing them requires a holistic understanding of the entire program, and the model's grasp of that whole is likely inaccurate. Modifying existing code therefore becomes more laborious and still needs someone with a comprehensive view; otherwise the corrections will be wrong.
Therefore, artificial intelligence built on large models cannot yet pass Turing Test 2.0, and the commercial value it consequently fails to unlock is the significant issue (passing would not mean there are no problems, either).
From a scenario perspective, despite over a decade of effort, progress has been far slower than anticipated. OpenAI is currently mobilizing all possible resources for a breakthrough, and we should genuinely hope they succeed rather than face setbacks.
The outcome presents two scenarios: failure would relegate AI to occasional utility like a water reservoir, while success would unleash its pent-up potential like a flood.
The first step of this industry transformation is more likely to be extreme compression, a "folding" driven by intense competition, before rebirth emerges.
The concept of 'folding' might be difficult to understand at first. Let's take the evolution of e-commerce as an example:
E-commerce undoubtedly disrupted traditional department stores and gave rise to new industries such as food delivery and live-stream shopping. But the first step was the "folding" (replacement) of traditional retail, which only gradually led to today's landscape in which everyone does live-stream selling.
If artificial intelligence passes Turing Test 2.0, the impact would be similar. For instance, if AI completely takes over routine copywriting, the same work delivered through APIs might be priced at mere thousandths of what it used to command. The first effect would be to make the copywriting profession obsolete; only after that could new roles and positions emerge.
Herein lies the second challenge of this transformation: many existing positions can be folded away, yet that does not necessarily produce a sustainable model that keeps developing. (If things stop there, it amounts to harming others without benefiting oneself.)
During Altman's brief departure, one piece of news surfaced: every call to OpenAI loses money. This suggests that OpenAI operates on a fragile balance, relying on global attention to attract massive capital. Structurally, the pattern resembles a Ponzi scheme (not to imply that AI is a scam, only that the operating dynamics are similar; cryptocurrencies share the same trait). On this trajectory, the critical question is whether genuine commercial value can ultimately be delivered to sustain the next cycle. Ponzi schemes never fail for lack of interim returns; they fail because they ultimately cannot meet expectations, and then collapse rapidly.
From this perspective, for AI to establish a virtuous cycle it must first pass Turing Test 2.0 and then set new positive feedback loops in motion. Only then can it mirror the internet after 2000, with native AI applications emerging; otherwise everything so far is merely a prelude. Viewed this way, the notion that AI is advancing too rapidly at this stage seems almost absurd.
Of course, this is not just a problem for a few companies like OpenAI; it also concerns the large number of startup projects in this wave.
If the general-purpose large models that play the role of engines cannot pass Turing Test 2.0, the various attempts built on top of them will not end well.
Recently, I came across introductions to many startup projects from this wave, and the feeling after reading them was: If the peak intelligence of large models ultimately cannot pass Turing Test 2.0, these projects will slowly die out, like fish in a drying lake.
It's difficult to comment on specific projects, but let's take an abstract example. For instance, I might notice that a company spends a lot of manual effort aligning data across multiple platforms, and this could be improved using RPA combined with models. Does this have value? Yes, but if the level of intelligence isn't sufficient, the value created may not justify the cost, making it commercially unviable.
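To make this abstract example slightly more concrete, the sketch below shows, under invented assumptions, what "RPA plus a model" for aligning records across two platforms might reduce to. The platform data, field names, and the string-similarity heuristic are all made up; in practice a model call could replace the heuristic for messier fields, and the commercial question is whether the manual review it removes is worth more than it costs to run.

```python
# Hypothetical sketch: aligning customer records pulled from two platforms.
# In a real RPA pipeline the records would come from scripted exports; here
# they are hard-coded so the example is self-contained.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip punctuation so near-duplicate names compare well."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a model call could slot in here instead."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def align(records_a, records_b, threshold=0.85):
    """Pair each record from platform A with its best match on platform B."""
    matches, needs_review = [], []
    for ra in records_a:
        best = max(records_b, key=lambda rb: similarity(ra["name"], rb["name"]))
        score = similarity(ra["name"], best["name"])
        (matches if score >= threshold else needs_review).append((ra, best, round(score, 2)))
    return matches, needs_review

if __name__ == "__main__":
    platform_a = [{"name": "Acme Trading Co., Ltd."}, {"name": "Globex Corp"}]
    platform_b = [{"name": "ACME Trading Co Ltd"}, {"name": "Initech LLC"}]
    matched, review = align(platform_a, platform_b)
    print("auto-matched:", matched)
    print("needs human review:", review)
```

Whether the residual "needs human review" pile shrinks enough to pay for the system is exactly the intelligence-versus-cost question raised above.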
Another example: many people dislike doing household chores, so having a robot would be valuable. But if the intelligence isn't adequate, it won't result in a truly useful product.
By the way, over the weekend, I attended an event and saw a few robots being used there. It was almost heartbreaking. These so-called embodied robots haven't made any fundamental progress compared to a decade ago—they're still just a chassis with a tablet attached. The real progress has been in smart speakers, where significant time has been spent refining features like voice recognition, which now works reasonably well even in noisy environments.
There are many similar projects, including supply-chain players providing chips and data to AI companies. Everyone wants to be the next Nvidia, but unless Turing Test 2.0 is eventually passed, there may be room for one more success story of that kind, not many.
If the ceiling of intelligence cannot be raised further, these products will all be stuck below a certain line: the necessary costs remain, yet no new value is created.
From this perspective, it becomes clearer that the real issue with AI development is not that it's too fast, but rather too slow. The question is: how much stamina does each player have, and what's the total stamina pool available?
In the current economic system, humans essentially function as tools at scale, and this role consumes time that could be spent on family and personal life. Only a very small fraction of people enjoy this instrumental role, while the vast majority do not; yet everyone needs to work. This is what we previously referred to as alienation, differing only in degree from the era of Modern Times.
Humans, tools, and organizational models collectively form an upper limit of capability. As the pursuit of this limit intensifies, the conveyor belt beneath people's feet moves faster, manifesting as increasing busyness for certain individuals.
When people consider losing a role they dislike, they often feel even more anxious, as it feels like severing an economic lifeline.
This is the most intriguing part: how can one find joy in losing something they dislike?
Artificial intelligence is one of the elements of civilization, providing the power to restructure past social frameworks, though it is not the entirety. Based on its advancements, we may be able to solve currently unsolvable problems at a lower cost, such as poverty and hunger. It will enhance societal freedom, giving people more space to address issues and enabling a more advanced synthesis.
On this point, I agree with Kevin Kelly: Technology always brings both good and bad, but always a little more good. It at least expands the realm of possibilities.
From this perspective, the development of artificial intelligence is also slow.
In the field of artificial intelligence, technology, social imagination, and commercial judgment have now converged, and various viewpoints surface constantly. At this stage, however, purely technical or purely social readings of AI are not very meaningful. Only from a commercial perspective can one see clearly its paradoxical state: seemingly on the verge of death yet full of vitality. That is why returning to Turing Test 2.0 should be meaningful.