Fully Aligning with OpenAI, Zhipu AI Wants Developers

Posted by baoshi.rao
    From its inception, Zhipu AI has been benchmarking against OpenAI and is also known as 'China's OpenAI.' The company has now released its new-generation foundational large model GLM-4, with performance significantly enhanced compared to the previous generation, approaching GPT-4. Additionally, much like Satya Nadella's approach years ago in leading Microsoft to fully embrace developers and open-source, Zhipu AI has directly adopted the slogans 'GLM ❤️ Open Source' and 'GLM ❤️ Developers.'

    "We are striving to catch up with OpenAI's full-stack large model ecosystem," said Zhang Peng, CEO of Zhipu AI. On January 16th Beijing time, two months after OpenAI's first DevDay, where GPT-4 Turbo and the GPTs App Store were announced, Zhipu AI held its first technology open day (DevDay) in Beijing, introducing the new-generation foundational model GLM-4, which shows substantial overall performance improvements over its predecessor, approaching GPT-4. It supports longer context, stronger multimodal capabilities, faster inference, better high-concurrency response, further reduced inference costs, and significantly enhanced agent capabilities.

    In addition to the model itself, Zhipu AI introduced the GLM-4 All Tools suite, comparable to OpenAI's GPT-4 All Tools, which enables automatic multi-tool invocation. It can autonomously understand and plan complex instructions based on user intent, freely calling web browsers, code interpreters, and multimodal text-to-image models to accomplish complex tasks. The most significant change for users is that we no longer need carefully crafted prompts or programming languages to invoke the various capabilities of large models, such as Q&A interactions, drawing, programming, data analysis, and processing various files.

    Drawing is easy to grasp: through direct contextual drawing instructions, images can be iterated continuously. What developers should pay closer attention to is the coding capability. For problems like polynomial solving, GLM-4 automatically calls the Python interpreter, writes the solving code, and executes it to obtain the solution.
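For illustration only (Zhipu has not published the code its interpreter generates), the tool-call step for a cubic such as x^3 - 6x^2 + 11x - 6 = 0 might emit a script along these lines:

```python
# Hypothetical sketch of code a model's Python-interpreter tool might generate
# when asked to solve the polynomial x^3 - 6x^2 + 11x - 6 = 0.
import numpy as np

coeffs = [1, -6, 11, -6]   # coefficients, highest degree first
roots = np.roots(coeffs)   # roots via the companion-matrix eigenvalue method
print(sorted(round(r.real, 6) for r in roots))  # → [1.0, 2.0, 3.0]
```

The model's role is to translate the natural-language request into such a script, run it in a sandbox, and fold the printed result back into its answer.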

    What developers should focus on most: just as OpenAI introduced GPTs and an app store, allowing everyone to build their own GPTs and even monetize them by uploading them to the store, Zhipu AI is following suit.

    GLMs and their app store are here! No programming skills are required to create your own GLM agent—just use simple prompt instructions. Additionally, users can share their created agents through the newly launched Agent Center. At the event, Zhang Peng also previewed that the GLMs Model App Store Developer Revenue Sharing Program will be announced soon.

    Zhipu AI is truly "down-to-earth," avoiding direct claims of "surpassing" competitors. Instead, it clearly presents its performance across multiple benchmarks, specifying where it exceeds, matches, or falls short. It benchmarks against OpenAI point for point: nothing less, nothing more.

    What lies behind this thinking of Zhang Peng and Zhipu AI? When Sam Altman, in his conversation with Bill Gates, mentioned that multimodal AI is a milestone for the next two years and predicted that AI will experience a steep growth curve over the next five to ten years, stating that "current models will become the dumbest," where does Zhipu AI see itself in this landscape? At the DevDay event, CSDN exclusively interviewed Zhipu AI CEO Zhang Peng to hear his thoughts and insights.

    The Battle of Models

    CSDN: Today, it appears that Zhipu AI's releases closely follow the path already taken by OpenAI. Many say this is "fully benchmarking against OpenAI—nothing less, nothing more." Why did you choose this approach?

    Zhang Peng: Actually, we've had internal discussions about this issue before—whether to do something a bit different. The conclusion we reached was that instead of spending energy on flashy ideas, it's better to honestly share what we've done. We've accomplished a lot, but organizing events specifically for developers is a first for us. I think it's necessary to do this at this moment.

    Of course, we also considered OpenAI's approach, but indeed, Zhipu is more down-to-earth and unafraid of comparisons. This reflects a kind of courage—openly acknowledging where we fall short and directly sharing our successes for everyone to evaluate, as well as accepting questions and challenges. However, while OpenAI's DevDay and ours are two months apart, the gap between us isn't simply two months. From an academic perspective, the growth curve shows that the later stages require higher costs and longer timeframes.

    CSDN: GLM-4's performance has significantly improved compared to its predecessor, approaching GPT-4 levels. Have you considered that GPT-5 might bring exponential improvements? We've seen indications that GPT-5 might be released in 2024, and OpenAI CEO Sam Altman has previously suggested that entrepreneurs should develop based on GPT-5 rather than GPT-4.

    Zhang Peng: Altman's advice is directed at developers and users, not model manufacturers. For model manufacturers, it currently seems impossible to leap directly to version 5 without fully realizing version 4. For users, however, it's different—they don't need to understand the energy consumption, processes, or principles; they can simply use the technology. Altman's advocacy is more about focusing on how to utilize the technology rather than creating "shell" applications. I agree with this perspective. If you're not a model development company, there's indeed no need to engage in such efforts—just choose the best option available, whether it's GPT-4, GLM-4, or others.

    CSDN: Has there been any internal discussion at Zhipu AI regarding the direction of GPT-5? What perspectives can you share?

    Zhang Peng: Indeed, there has. Regarding future directions, Academician Zhang Bo's presentation today provided a thorough perspective, highlighting the evolution of multimodal models, intelligent agents, and embodied intelligence. It's easy to reach a consensus on the general direction, as the industry's understanding and judgment don't differ significantly.

    CSDN: Was Academician Zhang Bo the most influential figure in your career?

    Zhang Peng: Yes. We studied and worked in the Department of Computer Science and Technology at Tsinghua University, where Academician Zhang Bo was a faculty member. He frequently organized lectures and discussions on the future development and planning of artificial intelligence, generously sharing his insights. Each time I listened to him, I gained new inspiration. As one of the world's top scientists, every step he takes advances into uncharted territory, which is incredibly valuable and has provided me with profound enlightenment. In 2020, when our company celebrated its first anniversary coinciding with GPT-3's release, I consulted Academician Zhang Bo whose response left a profound impression. He acknowledged the technology's novelty but clearly pointed out its existing issues, notably foreseeing what later became widely discussed as the 'hallucination' problem.

    CSDN: Of the three directions Academician Zhang Bo mentioned, multimodal models and agents are already being developed. Could embodied intelligence be the next one? Will Zhipu AI attempt it?

    Zhang Peng: It could be a potential direction. Actually, whether it's agents, embodied intelligence, or even multimodal approaches, these concepts aren't new in the AI field; they have long, traceable histories. We need to focus on the scientific principles and fundamental theories behind embodied intelligence: examining the completeness of the theoretical systems, the existing research, and the successful or failed attempts so far.

    CSDN: I noticed your presentation slides directly feature "GLM ❤️ Open Source" and "GLM ❤️ Developers"—is this inspired by Microsoft?

    Zhang Peng: I actually wasn't aware of Microsoft's story in this regard. This has been our team's consistent tradition in open source and supporting developer communities, dating back to our academic research days. We strongly believe in collective wisdom - individual intelligence has relatively limited power, so building a good ecosystem is essential.

    CSDN: So this is a case of great minds thinking alike?

    Zhang Peng: That's my understanding, yes.

    Native large model applications may determine whether generative AI avoids a collapse

    CSDN: Today we see that Zhipu AI has a very systematic offering for developers, including GLM agents, an app store, and a developer revenue-sharing plan. What are Zhipu AI's plans for developers in the coming years?

    Zhang Peng: It will likely continue for a long time. I have strong feelings about this. Over the past year, I've been on the front lines and learned that the market has a very urgent demand for the implementation of large model applications, especially hoping for a breakthrough. But I'm surprised that people's expectations for the rapid emergence of intelligent applications aren't higher.

    CSDN: Yes. We all say that LLMs will reshape all industries and applications, but there are many points of confusion. On one hand, as you mentioned before, native large model applications are expected to be something entirely new, not just an upgrade of existing applications. For developers and companies specializing in applications, the challenge is how to break out of the traditional mindset to see new opportunities for integrating LLMs with applications and creating new products. It's full of uncertainties. What are your thoughts on this?

    Zhang Peng: This can be viewed through Gartner's prediction of the technology cycle, which holds that generative AI is currently at the 'Peak of Inflated Expectations' (i.e., the bubble phase) and will subsequently enter the 'Trough of Disillusionment'. We can examine this from two perspectives: first, whether that conclusion is accurate, and second, if it is, what methods we can use to break the pattern. For everyone, can we accept AI going through another cycle of ups and downs into a winter? Do we have ways to make the trajectory more stable rather than falling into a trough?

    CSDN: As a practitioner who has personally participated in this field, do you have a clear direction on this matter? How can we ensure it doesn't enter the 'Trough of Disillusionment'?

    Zhang Peng: Actually, you've already mentioned it: the key to this problem might be 'large model native applications'. Its essence relies on innovation, combining external and internal forces, similar to nuclear fission and fusion.

    [Image generated by CogView: a futuristic cityscape in which large holographic screens display generative AI applications such as virtual reality, robots, and autonomous vehicles, all driven by large model native applications.]

    CSDN: What might the external and internal forces be, respectively?

    Zhang Peng: I am in the middle of the situation, and I am still unaware of the external forces. The only thing I can be certain of is the internal strength: we can hammer out all the 'nails' inside, which is predictable.

    CSDN: The cost of trying is still quite high.

    Zhang Peng: Exactly, that's the price we have to pay.

    CSDN: Will this involve a wave of elimination?

    Zhang Peng: There are two possibilities when it comes to applications. One is to use AI as a 'hammer' to rework existing applications; the other is innovation, seeking new scenarios to create incremental value—this is the part most likely to survive in the long run. Therefore, I believe Altman's emphasis is on not getting bogged down in tasks like model fine-tuning but instead focusing on what more creative things can be done with the best models available.

    CSDN: From my understanding, the model handles tasks at various levels, allowing developers to dedicate more energy and resources to innovation.

    Zhang Peng: Exactly. As I mentioned, we've provided everyone with the 'hammer' of large models. Beyond just hitting existing 'nails', we should explore more ways to utilize this tool; that's what I strongly recommend trying.

    CSDN: Today you also shared the achievements of ChatGLM's open source. Some in the community lament that ChatGLM's open-source move was too slow, allowing Llama to establish its ecosystem first. What are your thoughts on this?

    Zhang Peng: Supporting the open-source community and contributing to open-source technology is fundamentally about advancing technological progress, not based on commercial considerations. Of course, there are commercial benefits, but the primary goal is to push forward the evolution of technology itself and attract all developers to explore its theory and practice. The purpose of the open-source community is to maintain technological innovation and diversity. We aim to take from open source and give back to open source. Our essence is to promote the prosperity of the community. The greater the impact, the more meaningful our efforts become.

    CSDN: Since its establishment in 2019, Zhipu AI has been positioned as China's counterpart to OpenAI. With the emergence of GPT-3 and developments up to now, how do you view being consistently referred to as 'China's OpenAI'? What are the key elements required to embody this role?

    Zhang Peng: We deeply admire OpenAI's foresight and perseverance. They've been at this for nearly nine years, starting remarkably early and maintaining a remarkably straight path—steadfastly pursuing what now clearly appears to be the right direction.

    Secondly, while we share similar goals and philosophies, we acknowledge there's still a gap. They're advancing faster and achieving more. Our priority is to learn from them while maintaining independent thinking throughout this learning process. Ultimately, we don't care too much about how others judge us. What truly matters is what we aim to achieve.

    CSDN: The vision has never changed - to create AGI and make machines think like humans.

    Zhang Peng: So we share the same ultimate goal, and our paths have been quite similar so far. But when you examine the core technologies, many aspects are actually quite different. After GPT-3, everyone has been developing independently, and we've been exploring many things on our own.

    CSDN: Looking at the present, on one hand, at the model level, Altman says current models will eventually become the dumbest models in the future; on the other hand, at the application level, intelligent applications still have vast unknown potential, with more and more users without programming backgrounds joining the developer community. For current professional developers, what thoughts and suggestions can you share?

    Zhang Peng: I have one suggestion. For existing professional developers, understanding the essence may be most crucial. The current situation isn't just about models growing in scale - the underlying principles have undergone significant changes. As Academician Zhang Bo proposed, 'next token prediction' is actually a brilliant concept that might truly help humanity solve all known problems. If we remain stuck in traditional ways of thinking, we might not even know where to swing our 'hammers'.

    CSDN: So the core is a shift in thinking?

    Zhang Peng: This is the most difficult part, which can be referred to as a cognitive shift.

    CSDN: To summarize, when it comes to cognitive intelligence, developers need to undergo a cognitive shift.

    Zhang Peng: That's right, because this generation is about the revolution of cognition. Only by making fundamental changes in cognition can we keep up with this era.
