2024 AI Industry Predictions: Open Source Models Defeat GPT-4, Agent Explosion

baoshi.rao

Generative AI dominated many headlines in 2023, and 2024 will likely be no different.

With the continuous advancement of large models, many players are discussing: What direction will AI take? Will the AI industry cool down in 2024? Or will there be new breakthroughs and broader applications? How will regulators and the public respond?

"Top AI Players" has compiled and summarized recent key perspectives from major AI companies, researchers, venture capitalists, and tech media worldwide. They shared their predictions on topics like the future of generative AI, AI Agents, multimodal systems, the debate between open-source and proprietary models, and AI safety. While opinions vary, one thing is certain: 2024 is poised to be a decisive year for generative AI.

@OpenAI co-founder Greg Brockman

In terms of AI's capabilities, safety, and potential for positive impact, 2024 will be a groundbreaking year. From a longer-term perspective, this is just another year of exponential development that will make everyone's life better than it is today.

@BillGates

Bill Gates believes that AI, as the most far-reaching innovative technology currently on Earth, will completely sweep the globe within 3 years.

"If I had to make a prediction, in high-income countries like the United States, I estimate we're about 18 to 24 months away from widespread use of AI by the general public.

"In African countries, I expect to see similar levels of adoption in about three years. There will still be a gap, but it's much shorter than the lag times we've seen with other innovations."

@NVIDIA Senior Scientist Jim Fan

2024 will be the year of video. While robotics and embodied agents are just getting started, I believe AI video will see breakthrough developments in the next 12 months, on both the input and the output side.

"I": Video input. GPT-4V's understanding of video is still quite primitive, as it treats videos as a series of discrete images. What is the smartest way to reduce information redundancy? What should the learning objectives be? Next-frame prediction has a clear analogy to next-word prediction, but is it the best approach? How to interleave with language? How to guide video learning for robots and AI? The industry has not yet reached a consensus.

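The analogy between next-word and next-frame prediction mentioned above can be made concrete with a minimal sketch. It is only an illustration, not a method described by Jim Fan: the helper name next_step_pairs, the context length, and the tiny 2x2 "frames" are all assumptions chosen for brevity. The point is that the same sliding-window setup yields training pairs whether the items are words or video frames.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def next_step_pairs(sequence: Sequence[T], context: int) -> List[Tuple[List[T], T]]:
    """Slide a window over the sequence; each window is paired with the item that follows it."""
    if context < 1:
        raise ValueError("context must be at least 1")
    return [
        (list(sequence[i:i + context]), sequence[i + context])
        for i in range(len(sequence) - context)
    ]


# Next-word prediction: the items are word tokens.
words = "the cat sat on the mat".split()
word_pairs = next_step_pairs(words, context=2)

# Next-frame prediction: the items are frames
# (hypothetical 2x2 grayscale grids, one per time step).
frames = [[[t, t], [t, t]] for t in range(4)]
frame_pairs = next_step_pairs(frames, context=2)
```

Treating a video as a list of independent frames is exactly the "series of discrete images" view criticized above; the sketch shows the analogy, not a way to remove the redundancy between neighboring frames.
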
"O": Video output. In 2023, we witnessed a wave of text-to-video synthesis: WALT (Google), EmuVideo (Meta), Align Your Latents (NVIDIA), Pika, and many more. However, most generated clips remain short. I consider them as AI video's "System 1"—"unconscious" local pixel movements.

In 2024, we will see video generation with high resolution and long-term coherence. This will require more 'thinking,' specifically System 2 reasoning and long-term planning.

Open-source models surpass GPT-4, smaller models gain popularity

@Meta researcher Martin Signoux

Here are eight AI predictions for 2024:

AI smart glasses become a reality.

ChatGPT's position in the realm of AI assistants will not mirror Google's dominance in search.

Large Multimodal Models (LMMs) will continue to emerge and eventually replace Large Language Models (LLMs) amid ongoing debates.

GPT-5 will not be a major breakthrough, but it will bring improvements on many fronts.

Small Language Models (SLMs) will rise, with cost-effectiveness and sustainability considerations accelerating the trend.

Open models outperform GPT-4, blurring the lines between open and closed models.

No single benchmark, ranking, or evaluation tool can serve as a one-stop solution for model evaluation.

Potential future risks will attract far less discussion than risks that already exist.

1B models will surpass 70B models.

Models will be deployed on CPUs at almost no cost, rather than accessed as API services.

Data quality will improve performance by 10 times.

Combinations of open-source models will outperform the best proprietary models.

Compilers will speed up models (both training and inference) by at least 80%.

Legislation will support content creators, rather than model developers.

@Slater Stich, Partner at Bain Capital Ventures

2024 will be the year of real-time diffusion applications.

In 2023, we saw major theoretical improvements in the inference speed of diffusion models, such as the original consistency models paper by Song et al., the more recent Latent Consistency Models (LCM), and adversarial diffusion distillation. Projects are already building on these ideas, like Dan Wood's Art Spew (77 images per second at 512×512 on a single RTX 4090), Modal's Turbo.art (based on SDXL Turbo), and fal.ai's 30fps face swapping.

In 2024, we'll see many more real-time diffusion applications for image, audio, and video generation.

@LlamaIndex founder Jerry Liu

RAG will still be a major focus (we haven't solved it yet).

Every AI engineer still needs strong software engineering fundamentals. Shipping LLM applications to production is software engineering.

Vector databases are beginning to develop SQL-like interfaces and support multimodality.

Multimodal models will be increasingly used for document processing (but first, cost/latency needs to be reduced).

Fully functional GPT-4-like models will become open-source, faster, and cheaper. This excites me as much as GPT-5.

If that happens, agent development will flourish once again: agents that can automate workflows, interact with other agents, and improve over time.

Prompting will remain as important as ever, but prompt engineering in the sense of token-level hacks will matter less.

@DingTalk

On January 3, DingTalk, in collaboration with the internationally renowned consulting firm IDC, released the first '2024 AIGC Application Layer Top Ten Trends White Paper.' IDC predicts that by 2024, over 500 million new applications will emerge globally, equivalent to the total number of applications developed in the past 40 years.

'AIGC will accelerate the formation of super portals'—minimalist interactions based on natural language will replace some traditional graphical interfaces, and the 'no App' concept will reshape the portals and user landscape formed during the mobile internet era. Application functionalities will be fragmented and integrated into certain super apps, allowing users to directly access and use various tools through conversations within a single app.

IDC's research shows that 97% of enterprises acknowledge that super portals will become the mainstream application form in the future (survey respondents: 100 large enterprises in manufacturing, healthcare, internet, finance, and retail industries with annual revenues exceeding 500 million).

Prediction: Microsoft and Amazon will introduce artificial intelligence hardware devices.

For years, Apple and Samsung have collectively held 70% to 80% of the US smartphone market share, forming a duopoly in hardware. I predict that next year, we will see new types of AI devices that could pose a serious threat to existing players in the smartphone space, particularly Apple, the current market leader in the US.

The initial AI devices may not be smartphones. We've already seen examples of novel mobile devices that make AI the core feature rather than treating it as an afterthought, as is the case with today's smartphones. Humane's Ai Pin, a $699 device, can engage in conversations and perform real-time language translation. There are reports that former Apple designer Jony Ive, OpenAI's Sam Altman, and SoftBank's Masayoshi Son have joined forces to discuss creating some form of AI device.

Microsoft is set to launch a device built around AI integration, as it has already begun incorporating AI companions into software products like Office.

At the same time, Apple's efforts to improve AI features in products like the iPhone have been disappointing. Compared to its peers in the big tech arena, Apple has been slower to respond to the popularity of AI products like ChatGPT, despite the company actively developing a range of generative AI products. However, Apple will struggle to keep up, partly due to its aggressive stance on privacy, which will prevent it from fully leveraging the most advanced forms of AI running in the cloud.

@TechCrunch author Devin Coldewey

2024 will be a critical moment for AI technology to shift from hype to reality. Here are some possible trends:

OpenAI will become a product company, focusing on market share and customers.

Agent-based models and generative multimedia will lead to more experimental applications.

The limitations of single large language models will become more apparent, prompting a shift towards smaller, more specialized models.

AI marketing claims will face real-world tests, potentially leading to customer churn and legal disputes.

Apple may enter the AI market, launching optimized and practical products or services.

Legal cases related to AI misuse will increase, and an AI compliance industry will emerge.

Early adopters will proactively adapt to new AI regulations, such as the EU's AI Act.

The 2024 U.S. presidential election may be influenced by AI-generated content, potentially increasing chaos and distrust.

@Tech Blogger Matthew Berman

1. Meta will release LLaMA3 in Q1 2024.
2. OpenAI will launch GPT-4.5 in the first half of the year. It will be better, faster and cheaper, but still based on GPT-4.
3. Google's Gemini Ultra will compete with GPT-4 and provide a strong alternative. However, it will face immediate issues after launch: hallucinations, errors and unreliability.
4. Robotics development will accelerate. Optimus will make significant progress, and many other robotics companies will also release updates.
5. The gap between open-source LLMs and GPT-4 will narrow. I believe that in 2024, we will finally see an open-source model that can rival GPT-4.
6. AI Agents will improve. Agents will not only become mainstream and find use cases in the real world, but will also begin to exhibit human-like behaviors. We will use Agents in fields such as botany, marketing, and game theory to help us predict human behavior.
7. No AGI. Sam Altman's view on AGI still seems like a distant dream. The debate about AGI's definition and timeline continues, but we won't see AGI in 2024.
8. Surge in synthetic data. Synthetic data is becoming crucial in the AI field, especially in sensitive areas like healthcare and finance (as a solution for privacy and bias). If we can solve synthetic data issues, it will benefit open source, as purchasing massive datasets isn't an option in the open-source world.
9. Multimodal AI will become the new norm. Apple's Ferret and Tesla's FSD are leading the trend. But challenges will follow. Vision, hearing, and even touch?
10. Bots will be indistinguishable from humans. In 2024, we will no longer be able to tell the difference between bots and humans online. The internet will be hit hard: spam, deepfakes, scams, and more. Be cautious.

@Jukedeck founder Ed Newton-Rex

At least one major court case involving AI and the creative industry will conclude with a victory for the creators or a significant settlement.

A highly impressive video generation model trained exclusively on licensed data will be launched.

A major AI company will underperform in a funding round (or similar), partly due to investor concerns about copyright infringement.

More renowned creators across industries will stand up against generative AI models trained without the creators' consent.

@Radical Ventures partner Rob Toews

1. Nvidia will intensify efforts to become a cloud provider, making its relationship with Amazon, Microsoft, and Google increasingly complex.
2. Stability AI will shut down. The recent brain drain and persistently high burn rate have put it in trouble.
3. Terms like "large language model" and "LLM" will become less common, and the terminology used to describe models will become increasingly multidimensional.
4. The most advanced closed models will continue to significantly outperform the most advanced open models. We suspect that the enormous costs of open-sourcing cutting-edge models without corresponding revenue returns may lead companies like Mistral to retain proprietary control over their most advanced models in order to monetize them.
5. Multiple Fortune 500 companies will establish a new executive position: Chief AI Officer.
6. Alternative architectures to Transformers will gain genuine adoption.
7. Strategic investments by cloud providers in AI startups, and the related accounting impacts, will face regulatory challenges.
8. Tensions will emerge between Microsoft and OpenAI. As OpenAI aggressively expands its enterprise business, it will increasingly find itself competing directly with Microsoft for customers. Microsoft, for its part, has compelling reasons to diversify beyond OpenAI as its supplier of cutting-edge AI models.
9. Venture capital may return to the crypto space in 2024. Some of the excessive AI hype will shift elsewhere.
10. Current leading generative AI models have been trained on vast amounts of copyrighted content, which may trigger massive liabilities and alter the industry's economic landscape. At least one U.S. court will rule that training generative AI models on internet content constitutes copyright infringement. This issue will begin moving toward the U.S. Supreme Court.

@Intel 471 Chief Intelligence Officer Michael DeBolt

While there doesn't seem to be a killer AI application for cybercriminals yet, its powerful capabilities could assist with some mundane backend tasks performed by cybercriminals.

For example, using LLMs to sort through massive amounts of stolen data to identify the most critical information needed for corporate extortion. Or employing chatbots for preliminary ransom negotiations.

Another hypothetical innovation could be an AI tool that calculates the maximum ransom an organization would pay based on stolen data. We reported some examples in Q2 2023 where threat actors implemented AI in their products, including initial access brokers (IABs) using AI to provide free translation services. In May 2023, we reported a threat actor offering a tool allegedly capable of bypassing ChatGPT restrictions.

AI and machine learning tools can impersonate people through video and audio, posing a threat to identity and access management. AI-rendered videos are currently relatively easy to detect, but synthetic voice cloning remains a significant threat for organizations using voice biometrics as part of their authentication process.

We still believe that AI cannot be fully relied upon to carry out more complex cybercrimes, as doing so in its current form may produce flawed results. However, the field is developing so rapidly that it's difficult to foresee what's coming next.

The proliferation of open-source LLMs and services—some intentionally built without security safeguards to prevent malicious use—means this area remains an unknown variable.

Three things that won't change

@AI Scholar Andrew Ng

Prominent AI scholar Andrew Ng has published a new article predicting AI trends on the official website of DeepLearning.AI, the AI education technology company he founded. He believes there are three aspects in the AI field that will remain unchanged over the next decade:

We need an AI community. People with friends and allies perform better than those without. Even as the AI world seems to bring breakthroughs every week, you are better off with friends who help you separate what's real from hype, validate ideas, offer mutual support, and build things together.

Those who know how to use AI tools work more efficiently. Individuals and businesses skilled in working with data can better understand the truth, make better decisions, and achieve more. This will only become more true as AI continues to advance.

AI needs good data to function properly. Just as humans require good data to make decisions—from determining marketing strategies to deciding what to feed their children—AI also needs good data, even as our algorithms continue to expand, evolve, and improve.