The Era of AI Large Models: Background and Current Development Challenges

baoshi.rao (#1) wrote:

    Background Analysis and Prospects of Large Models
    In recent years, artificial intelligence has advanced rapidly; progress in machine learning and deep learning in particular has ushered in the era of large models. Driven by the development of big data and computational power, large models have quickly permeated various technical fields. These models require massive amounts of data for training, and the volume of available data resources has grown rapidly, nearly doubling each year. Compared to 5–10 years ago, the number of organizations worldwide capable of developing large models has increased significantly. Today, not only tech companies, academic institutions, and research teams but even student groups can release their own large models. This abundance of big data resources has laid the foundation for the development of large models.

    Computational Power and AI Applications
    Large models also require robust computational support, particularly in the field of artificial intelligence applications. In 2021, China's intelligent computing capacity reached 202 EFLOPS, ranking second globally with a growth rate of 85%. General computing, supercomputing, and edge computing have also seen rapid development. Strong computational resources provide solid support for large models. Large models represent the pinnacle of AI technology, featuring vast parameters and computational resources, supported by big data and powerful computing capabilities. The emergence of large models signifies that AI has entered the intelligent era, with profound impacts that will reshape our lives and work. In the future, large models will continue to evolve.

    Future Trends and Challenges
    Model scales will expand further, with parameter counts reaching hundreds of billions or more. Generalization and robustness will improve, and multimodal models will become more widespread, enabling multi-sensory fusion. Interpretability will also become a key research focus. The era of large models is still in its infancy, presenting both opportunities and challenges. How to better utilize large models for the benefit of humanity is a question we must consider. In the field of natural language processing, deep learning models like Google's BERT, OpenAI's GPT, and Baidu's ERNIE have advanced rapidly, with parameter scales expanding to the billions and data usage increasing significantly, greatly enhancing model comprehension and generation capabilities.

    Training and Application Scenarios
    These models typically employ pre-training and fine-tuning methods, first undergoing self-supervised learning on vast unlabeled data, followed by fine-tuning with small datasets for specific tasks to achieve superior recognition, understanding, decision-making, and generation results. Deep learning models show immense potential in industrial intelligence upgrades, applicable in search, recommendation, intelligent interaction, and production process transformation. However, challenges remain, such as high computational costs, data quality and security issues, and model interpretability and trustworthiness. The massive parameter count allows models to better capture complex relationships and patterns in data, delivering outstanding performance across tasks.
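
    To make the pre-train-then-fine-tune workflow concrete, here is a minimal schematic sketch in PyTorch. A small randomly initialized network stands in for the pretrained encoder (a real system would load weights learned through self-supervised pre-training on unlabeled data); its parameters are frozen, and only a lightweight task head is trained on a small labeled batch. All module names, sizes, and data here are illustrative assumptions, not the API or configuration of any particular model.

```python
# Schematic pre-train/fine-tune sketch. Assumptions: the "pretrained" encoder is a
# toy stand-in with random weights, and the labeled data is random; sizes are illustrative.
import torch
import torch.nn as nn

encoder = nn.Sequential(            # stand-in for a pretrained backbone
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
head = nn.Linear(256, 2)            # small task-specific head (e.g. a 2-class task)

for p in encoder.parameters():      # freeze the "pretrained" weights
    p.requires_grad = False

x = torch.randn(32, 128)            # toy labeled batch for the downstream task
y = torch.randint(0, 2, (32,))

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):             # fine-tune only the head on the small dataset
    logits = head(encoder(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final fine-tuning loss: {loss.item():.4f}")
```

    In practice some or all encoder layers are often unfrozen with a lower learning rate; the point of the sketch is only the division of labor between broad pre-training and narrow, task-specific fine-tuning.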

    Model Architecture and Computational Dependencies
    Models often feature deeper neural network structures, including multiple layers and sub-networks, enabling multi-level feature extraction and abstraction. They are typically pre-trained on large-scale datasets to acquire broad knowledge, then fine-tuned for specific tasks to achieve better performance. The pre-training-fine-tuning strategy has been highly successful in natural language processing. Due to their scale and complexity, large models require substantial computational resources for training and inference, often relying on high-performance units like GPUs or TPUs. Computational power, data, and algorithms are interdependent, collectively building the model application ecosystem. Computational power affects training speed and model scale, with stronger computing supporting larger models, longer training times, and higher precision.
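
    The dependence on computing hardware can be made tangible with a rough estimate. A commonly used approximation puts total training compute at about 6 floating-point operations per parameter per training token; the sketch below applies it to illustrative figures. The model size, token count, and accelerator throughput are assumptions chosen for demonstration, not measurements of any real system.

```python
# Back-of-the-envelope training-compute estimate using the common approximation
# C ≈ 6 * N * D, where N is the parameter count and D the number of training tokens.
# All figures below are illustrative assumptions.

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

def accelerator_days(total_flops: float, peak_flops_per_s: float, utilization: float = 0.4) -> float:
    # Effective throughput = peak * utilization; the utilization factor is an assumption.
    return total_flops / (peak_flops_per_s * utilization) / 86_400

N = 7e9        # hypothetical 7-billion-parameter model
D = 1e12       # hypothetical 1 trillion training tokens
peak = 300e12  # hypothetical accelerator with 300 TFLOP/s peak throughput

C = training_flops(N, D)
print(f"total training compute ≈ {C:.2e} FLOPs")
print(f"≈ {accelerator_days(C, peak):,.0f} accelerator-days on a single device")
```

    At these illustrative numbers the single-device estimate runs to several thousand accelerator-days, which is why training such models is spread across large clusters and why computational power directly bounds feasible model scale.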

    Data and Algorithmic Improvements
    The diversity, quality, and scale of data significantly impact model performance and generalization. Rich data helps models better understand different contexts and problems, improving performance. Algorithmic advancements can reduce reliance on computational power and data, enabling more efficient training or better performance with limited data. The core capabilities of deep learning models include transfer learning, expressive power, and creative learning. Models possess strong knowledge and memory, learning rich semantic and knowledge representations from vast corpora for downstream task transfer. They exhibit human-like comprehension, generating new information and understanding complex linguistic phenomena.

    Challenges and Ethical Considerations
    Models also have reasoning, planning, and learning abilities, enabling goal-oriented decision-making and interaction with the environment. They can predict future scenarios based on existing data and create works that match specific descriptions. However, deep learning models face issues such as hallucination: lacking common sense and ethical judgment, they can produce fictional, incorrect, or harmful outputs. Repetition and bias are also problems: models sometimes lack creativity and imagination, produce repetitive or prejudiced content, and are criticized for lacking humanity and empathy. As technology advances, these issues will gradually improve, and future AI agents will become more intelligent and human-like.

    Deep learning models are becoming increasingly significant in corporate competition, and this rivalry is expected to intensify next year, especially among large enterprises. Their applications are gradually permeating industries such as finance, healthcare, retail, and manufacturing. Internet companies, which are especially sensitive to shifts in user behavior, are particularly affected by these models. In summary, the rapid development and widespread application of deep learning models make them a topic worth continuous attention, and the future direction and pace of AI development remain subjects of great interest.

    In April 2023, the Central Politburo meeting explicitly emphasized the importance of developing large AI models. Currently, large models are widely used in both civilian and military sectors. However, their development presents both opportunities and challenges. The opportunities lie in their ability to solve multiple problems and generate significant economic benefits. The challenges include the vast number of parameters involved, making it difficult to achieve professional-level performance across all fields. Large internet companies can no longer ignore the impact of large models. The lack of a robust ecosystem for these models means that businesses may find their growth constrained. Smaller companies, in particular, struggle to compete with larger firms and often rely on the ecosystems of these giants to develop their operations.

    The development of large models has become a necessity for major internet companies—either embrace them actively or risk being left behind. Compared to corporate competition, governments are more focused on providing financial and policy support for large model development. As of August 2023, cities like Beijing, Shanghai, Hangzhou, and Suzhou have introduced policies offering subsidies of up to 50 million yuan for large model research and development. For instance, Shanghai launched the 'Model Capital Initiative' to create an open-source AI ecosystem and support the clustering of large model enterprises. Hangzhou selects 10 large model application projects annually, providing subsidies of up to 5 million yuan each. Suzhou has set a goal for intelligent computing by 2025, offering up to 10 million yuan in funding for large model companies.

    Wuhan's 'Optics Valley Software Ten Measures' encourages open-source projects for large models, with subsidies of up to 30 million yuan. Beijing has formulated the Zhongguancun Science City Computing Power Subsidy Plan, offering up to 10 million yuan for large model R&D. Currently, general-purpose models dominate large model applications. These models, such as ChatGPT and Wenxin Yiyan, can address multiple tasks, including AI dialogue, article writing, mathematical calculations, and code programming. However, their weakness lies in their excessive parameters, which often result in subpar performance in specialized fields. This is due to the difficulty in preparing comprehensive training data across all domains and the challenge of having experts verify the accuracy of model outputs.

    In summary, the future of large models is promising, but they face both opportunities and challenges. Achieving AI proficiency in specialized fields will require more data and expert support. Greater progress in civilian and military applications will depend on collaborative efforts between governments and businesses. It is hoped that large models can benefit human society while ensuring national security. Despite challenges like parameter inflation and deployment constraints, technological advancements will likely make large models more accessible. In recent years, the capabilities of large models have improved significantly, but the number of parameters has also increased dramatically. Excessive parameters can lead to overfitting and prolonged training times, placing high demands on hardware.

    Theoretically, increasing parameters can enhance model performance, but in practice, it is not the only option. Reducing parameters to enable deployment on various devices, improving efficiency, and saving computing power are emerging trends in large model development. For example, Microsoft's Phi model and Shanghai AI Lab's InternLM model have successfully reduced parameters while maintaining performance, making them more practical for real-world applications. With technological progress, large models may no longer be 'large,' evolving from room-sized systems to handheld tools, much like the first computers. Currently, specialized large models outperform general-purpose ones. With the same number of parameters, specialized models excel in specific tasks and hold greater practical value for future applications.
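
    The deployment argument reduces to simple arithmetic: the memory needed just to hold the weights is roughly the parameter count times the bytes per parameter, before any activation or cache overhead. The sketch below compares a few illustrative parameter counts and numeric precisions; the sizes are assumptions for demonstration and are not the actual configurations of Phi or InternLM.

```python
# Rough weight-memory estimate: parameters * bytes per parameter.
# Activation and cache overhead are ignored; model sizes are illustrative assumptions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for n_params in (1.3e9, 7e9, 70e9):  # hypothetical small, mid, and large models
    row = ", ".join(
        f"{p}: {weight_memory_gb(n_params, p):6.1f} GB" for p in BYTES_PER_PARAM
    )
    print(f"{n_params / 1e9:5.1f}B params -> {row}")
```

    By this arithmetic, a few-billion-parameter model with 4-bit weights fits in a few gigabytes and can plausibly run on a laptop or phone, whereas a 70-billion-parameter model at full precision cannot, which is why trimming parameter counts and precision matters for putting large models in users' hands.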

    Models like MathGPT for mathematical problems and CodeGPT for programming demonstrate that specialized large models far surpass general-purpose models in their respective fields, offering higher commercial value. The development of large models will likely progress from general-purpose to specialized models and eventually to world models. Creating autonomous AI requires predictive world models, which demand multimodal capabilities. The evolution of large models is not only critical for corporate growth but also for national competitiveness and the journey toward artificial general intelligence (AGI). Despite the challenges of parameter inflation and deployment, the future of large models is undeniably bright.

    What opportunities and challenges does large model development face? What distinguishes specialized large models from general-purpose ones? What kind of large models are needed to construct autonomous AI? These questions are worth deep consideration.

    The path of large model development is long and arduous, requiring collective exploration and effort.
