Apple's $10 Billion Car Dream Shattered: A Roadmap of Apple's AI Strategy

    Apple's electric vehicle project, commonly known as 'Project Titan,' was launched in 2014 with billions of dollars in investment. 'Titan' originates from Greek mythology, symbolizing creativity and immense divine power. This highly anticipated 'divine project' within Apple is about to be halted, with the team partially transitioning to generative artificial intelligence projects.

    Apple's 'divine power' has been redirected to generative AI, which has become an increasingly important strategic focus for the company. The transitioning team will report to John Giannandrea, who joined Apple in 2018 as Senior Vice President of Machine Learning and AI Strategy; his responsibilities include leading the company's AI teams and driving the development of Siri and other AI projects. On September 8, 2023, The Information revealed the core members of Apple's AI model development team: John Giannandrea, alongside Arthur van Hoff, one of the creators of Java, and neural network expert Ruoming Pang - an all-star lineup driving Apple's AI ambitions.

    All of these personnel changes became public only through media leaks. While other U.S. tech giants loudly announce their AI strategies, the famously secretive company has remained conspicuously silent, making no official disclosures through press events or other channels.

    Longtime Apple reporter Mark Gurman indicated in his Power On newsletter that Apple plans to introduce a suite of generative AI tools at the upcoming Worldwide Developers Conference (WWDC), potentially including Siri enhancements. These new features, expected as part of iOS 18, promise more natural conversational abilities and personalized user experiences. Some media speculate Apple's foundational AI model may officially debut at WWDC 2024. In this wave of generative AI, has Apple really fallen behind? This article attempts to trace Apple's ongoing investments in the AI field, starting from the introduction of its voice assistant Siri in 2011.

    Apple was once an undisputed pioneer in the AI revolution.

    Thirteen years ago, when Siri debuted at the iPhone 4S launch event, this fluent conversational voice assistant looked like cutting-edge technology from the future: the pinnacle of artificial intelligence at the time. Even Scott Forstall, Apple's software chief, who had years of AI research experience, found it hard to believe the product could be realized. He remarked, "I've worked in artificial intelligence for a long time, but this still blows me away."

    Yet unlike the deep learning that GPT relies on, the technological marvel of that era was natural language understanding (NLU) achieved through a hard-coded 'command and control' system. Siri could only understand pre-programmed question templates and requests, plus a set of isolated phrases such as 'Check today's humidity in Beijing' or 'Call mom.' Once a request fell outside the programmed system, Siri immediately became powerless. As a result, Apple kept a 20-person team dedicated to anticipating user questions and adding them to the system.

    Improvement was also extremely cumbersome. In an interview with The New York Times, a former project leader said that Siri's database was so vast and complex that even a simple update required rebuilding the entire database, taking over six weeks; adding complex new features could take nearly a year. This is the main reason why, over the past 13 years, your Siri has become only slightly smarter, but not by much.
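    To make the limitation concrete, here is a minimal, hypothetical sketch - not Apple's actual implementation - of what a hard-coded 'command and control' assistant looks like: every utterance must match a pre-written template, and anything else falls through to a canned apology.

```swift
import Foundation

// Toy sketch of a rule-based "command and control" assistant.
// Every capability is a hand-written pattern; the templates below are
// illustrative, not Apple's actual grammar.
struct CommandAssistant {
    // Each rule pairs a regular expression with a handler for its captures.
    let rules: [(pattern: NSRegularExpression, handler: ([String]) -> String)]

    func respond(to utterance: String) -> String {
        let range = NSRange(utterance.startIndex..., in: utterance)
        for rule in rules {
            if let match = rule.pattern.firstMatch(in: utterance, range: range) {
                // Collect the capture groups and hand them to the rule's handler.
                let captures = (1..<match.numberOfRanges).compactMap { i -> String? in
                    guard let r = Range(match.range(at: i), in: utterance) else { return nil }
                    return String(utterance[r])
                }
                return rule.handler(captures)
            }
        }
        // Anything outside the pre-programmed templates is unanswerable.
        return "Sorry, I don't understand."
    }
}

let assistant = CommandAssistant(rules: [
    (pattern: try! NSRegularExpression(pattern: "(?i)check today's humidity in (\\w+)"),
     handler: { "Looking up today's humidity in \($0[0])..." }),
    (pattern: try! NSRegularExpression(pattern: "(?i)call (\\w+)"),
     handler: { "Calling \($0[0])..." }),
])

print(assistant.respond(to: "Check today's humidity in Beijing")) // matches a template
print(assistant.respond(to: "Why is the sky blue?"))              // falls outside the system
```

    Every new capability means another hand-written rule, which is why a dedicated team had to keep anticipating questions and patching them in.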

    Because the market lacked strong competitors - neither Amazon's Alexa nor Google's voice assistant offered a significantly better user experience - Apple was lackadaisical about updating this capability, and corporate inertia accumulated. The lack of iteration made Siri disliked even within Apple itself: according to The Information, the Vision Pro team once considered replacing the voice assistant that controls the XR system because of Siri's poor performance. The same report on Siri's chaotic situation noted that before John Giannandrea joined from Google in 2018, the Siri team was plagued by infighting among senior employees, wavering technical direction, and nearly stagnant development. Giannandrea integrated the team and accelerated Siri's evolution. In 2019, the team from Laserlike, a search startup Apple had acquired, shipped the first Siri feature built on the Transformer architecture, allowing Siri to synthesize web information to answer user questions - technology Google had already been applying for years.

    Although Siri had started to make progress, it still faced three major obstacles: bureaucracy, top-down micromanagement, and conservatism from the design team. Bureaucracy consumed significant human resources. For example, the stillborn Blackbird project aimed to rewrite and simplify Siri's architecture to make it more responsive, let app developers build features on it, and enable on-device operation on iPhones. But personnel were pulled away to staff senior employees' project marking Siri's tenth anniversary, and in the end only the on-device operation feature shipped.

    Another major issue is the leadership's obsession with reputation and its excessive micromanagement, which has blocked nearly every major technical innovation. Because of potential PR risks from Siri's responses - such as when a 13-year-old boy in Indiana tried using Siri to find methods for a school shooting in early 2019 - Apple executives, including Tim Cook, frequently scrutinize what Siri says. That makes it difficult for the team to adopt less accurate but more innovative technologies such as deep learning. The design team's supreme status and stringent requirements within Apple's system have made functional changes extremely difficult to land. For instance, before a new search feature launched, engineers and designers clashed over accuracy standards: the Siri design team demanded near-perfect responses, while engineers advocated an 80% accuracy threshold. It took the engineers months to convince the designers that not every answer could be manually verified - that requirement would have prevented Siri from scaling to the vast volume of user queries - and the design team only relinquished it a year later.

    Even a simple user feedback button for reporting Siri flaws was vetoed by designers because "they wanted Siri to appear omniscient."

    These issues came to a head in late 2022, when three key members of Laserlike - Apple's most innovative AI team - became disillusioned and returned to Google. The other consequence: Apple, the company that had pioneered consumer AI 13 years earlier, fell far behind in the technological wave OpenAI unleashed at the end of 2022.

    Finally, ChatGPT's disruptive impact shattered Apple's complacency. Throughout 2023's earnings calls, shareholders' questions about Apple's AI development became media focal points, yet CEO Tim Cook consistently avoided disclosing any details of the company's AI plans. Behind the scenes, however, Apple has been operating at full throttle over the past year: clarifying its business path, making massive financial investments, adjusting teams, and fostering cross-departmental collaboration. After a quiet year, it is preparing to make a comeback in 2024.

    Although the results are not yet significant, by observing team changes and technical papers, we can still piece together what Apple has done in 2023 and what it plans to deliver in 2024.

    According to several engineers who have worked on machine learning at Apple, the company's leadership places greater emphasis on 'edge AI' - running AI models on the device itself rather than on cloud servers. Apple's AI strategy has never been about the 'bigger and stronger' foundational models that other tech giants are racing to release. Unlike large language models that rely on cloud computing, edge AI runs on local devices, eliminating the need for cloud servers or internet connections and thereby delivering faster, more secure, and more reliable AI computing. According to IDC data, iPhone led the market in 2023 with a 20.1% share, and statistics suggest Apple's installed base is around one billion devices worldwide. Integrating new AI features into iOS would therefore reach that enormous installed base almost at once, putting the features in front of hundreds of millions of users - a market-access advantage no other company can match.
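    For a sense of what on-device inference looks like in practice, below is a minimal Core ML sketch. The model name ("SentimentClassifier") and its feature names ("text", "label") are hypothetical placeholders; the point is that a compiled model ships inside the app bundle and runs entirely locally, with no network round-trip.

```swift
import CoreML

// A minimal sketch of edge inference with Core ML: the model runs entirely
// on the device, so no cloud server or network connection is involved.
// "SentimentClassifier" and the feature names "text"/"label" are hypothetical.
func classifyOnDevice(_ text: String) throws -> String {
    guard let url = Bundle.main.url(forResource: "SentimentClassifier",
                                    withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile) // the model must ship inside the app bundle
    }
    let config = MLModelConfiguration()
    config.computeUnits = .all // let Core ML use CPU, GPU, and the Neural Engine
    let model = try MLModel(contentsOf: url, configuration: config)

    // Input/output feature names depend entirely on how the model was authored.
    let input = try MLDictionaryFeatureProvider(dictionary: ["text": text])
    let output = try model.prediction(from: input)
    return output.featureValue(for: "label")?.stringValue ?? "unknown"
}
```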

    1. A Team Effort to Catch Up

    Apple has no shortage of talent; what's needed is empowering that talent to drive change. Although Giannandrea had long been skeptical of large language models, various chatbot demonstrations in early 2023 thoroughly convinced him.

    He had, in fact, already prepared for large language models: four years earlier he formed the Foundational Models team to develop the new technology. The original team grew out of the Blackbird project mentioned above, which had attempted to reinvent Siri, and was led by van Hoff, one of the creators of Java; after Blackbird failed, he turned to exploring large language models. In 2021, Ruoming Pang, a neural network expert with 15 years at Google, joined Apple and became the team's new leader. By 2023, the AXLearn training framework he developed gave Apple the foundation for building large models. These three individuals are the core trio of Apple's AI model effort.

    This previously non-core team finally gained the resources to realize its ambitions. When The Information reported on it on September 6, 2023, the team had only 16 members yet could already mobilize millions of dollars in daily training funds. Over the year, starting with Ajax GPT, they steadily enhanced their capabilities, culminating by year-end in a multimodal large model compact enough to fit on a smartphone - keeping pace with the field's latest developments. The Information also published a diagram of the core architecture behind Apple's large language model work, featuring all three core members.

    2. The Three Musketeers—John Giannandrea, Craig Federighi, and John Ternus—Redefine Apple's AI Roadmap

    The goal is AI that can run on the iPhone, which makes the technical approach relatively clear: train a new on-device model, use it to reinvent Siri, and position it as the new "brain" for Apple products in the AI era. The work of all three musketeers converges on this direction: they collaborate on Apple's AI across foundational models, software engineering, and hardware.

    John Giannandrea's foundational model team is responsible for several key technologies: foundational models, multimodal capabilities, on-device models, and spatial computing. Its visible achievements so far include Ajax GPT, a model reportedly trained with over 200 billion parameters that may functionally surpass GPT-3.5, the industry benchmark at the time.

    In late 2023, the team quietly released Ferret, a multimodal large language model that takes image and text inputs. Through its hybrid region representation technique, it can identify and describe complex spatial relationships within images, and it is relatively efficient at language model reasoning tasks. Unlike Apple's traditionally closed approach - possibly reflecting John Giannandrea's philosophy - Apple open-sourced Ferret's code along with the extensive GRIT dataset, further demonstrating its potential in multimodal understanding and generation tasks.

    In LLM mobile optimization, on January 14, 2024, Apple updated a research paper (https://arxiv.org/pdf/2312.11514.pdf) addressing how to run large language model (LLM) inference efficiently on memory-constrained devices. Apple's researchers propose storing model parameters in flash memory and dynamically loading them into dynamic random-access memory (DRAM) as needed, solving the challenge of running LLMs on resource-limited hardware.

    To explain in simpler but less precise terms:

    These language models typically require substantial memory to operate, but encounter problems when device memory is limited. Apple came up with a new idea: they store the language model's parameters (like the model's brain memory) in flash storage, which offers larger capacity but slower speeds. Then when needed, they temporarily move these parameters to DRAM, which is faster but has limited space. The advantage of this approach is that it only transfers necessary parameters on demand rather than moving everything back and forth, saving both time and memory. They used two techniques to achieve this idea:

    Windowing: like reading a book where you don't restart from page one each time but focus only on the part you currently need. The model loads only the parameters for neurons that were recently active, rather than repeatedly reloading the same information.

    Row-column bundling: this exploits flash memory's higher speed when reading large contiguous chunks. Imagine moving a pile of bricks - carrying the whole stack at once is much faster than carrying bricks one by one. Related rows and columns of the weight matrices are stored together so they can be read in a single, larger access, significantly improving efficiency.
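    As a rough illustration of the windowing idea - a toy sketch, not the paper's implementation - the cache below keeps only recently used parameter rows in fast memory and fetches missing rows from slow storage on demand, evicting the least recently used row when the window is full:

```swift
import Foundation

// Toy sketch of windowed parameter loading: a small "DRAM" cache in front of
// a large, slow "flash" store. Row indices, sizes, and the loader closure are
// all illustrative; the real system works at the level of neuron activity.
final class ParameterWindow {
    private let capacity: Int                 // how many rows fit in fast memory
    private var cache: [Int: [Float]] = [:]   // rows currently resident in "DRAM"
    private var lru: [Int] = []               // least recently used order
    private let loadFromFlash: (Int) -> [Float]

    init(capacity: Int, loadFromFlash: @escaping (Int) -> [Float]) {
        self.capacity = capacity
        self.loadFromFlash = loadFromFlash
    }

    func row(_ index: Int) -> [Float] {
        if let hit = cache[index] {
            // Cache hit: refresh recency and skip the slow flash read.
            lru.removeAll { $0 == index }
            lru.append(index)
            return hit
        }
        // Cache miss: evict the least recently used row if the window is full.
        if cache.count >= capacity, let victim = lru.first {
            lru.removeFirst()
            cache[victim] = nil
        }
        let row = loadFromFlash(index)        // the expensive transfer
        cache[index] = row
        lru.append(index)
        return row
    }
}

// Usage: only rows the current inference step touches are ever transferred.
let window = ParameterWindow(capacity: 3) { index in
    print("flash read: row \(index)")        // stands in for a slow I/O
    return [Float](repeating: Float(index), count: 4)
}
_ = window.row(0); _ = window.row(1); _ = window.row(0) // second access of row 0 is a hit
```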

    Through these methods, even a device with limited memory can run models larger than its available memory, and much faster than before - like hosting a large party in a small room by arranging everything cleverly. Such research is what lets memory-constrained devices like smartphones and tablets use advanced language models.

    Beyond the foundational model team, the software engineering department led by Craig Federighi aims to expand Siri's core functionality to better fit user habits, such as having Siri and the Messages app auto-complete sentences. It will also integrate large language models into development tools like Xcode so iOS developers can write new applications more efficiently, similar to Microsoft's Copilot. Meanwhile, the services department led by Eddy Cue will explore applying the latest AI technologies across other Apple ecosystem software, such as automatically generating presentations in Keynote.

    The third prong is the hardware team led by John Ternus. For edge AI to run fast and accurately, two prerequisites must be met: powerful hardware (above all, the chip) and tight software-hardware integration. Take Apple's latest A17 Pro chip as an example: it is often compared with Qualcomm's Snapdragon 8 Gen 3, and the two demonstrate comparable capability. Qualcomm has announced that the Snapdragon 8 Gen 3 can run large models with billions of parameters on mobile edge devices; Apple has not specified what AI models the A17 Pro can support, but based on the chips' specifications this seems well within reach. Beyond mobile, PC-side AI capability is also a market focus, served by the M-series chips designed for the Mac lineup. It will be worth watching whether Apple's hardware upgrades this year explicitly target AI functionality. When foundational model capability, software prowess, and hardware compute converge and advance in tandem, one can't help but wonder: could this herald another Apple-style revolution?

    Starting from the much-criticized Siri, Apple has never ceased investing in machine learning, publishing research across computer vision, natural language processing, and multimodal learning.

    Contrary to its perceived 'passive' image, Apple is actually the most aggressive tech giant in AI acquisitions. According to Stocklytics.com data, by 2023 Apple had acquired 32 AI startups - more than any other tech titan (Google: 21, Meta: 18, Microsoft: 17). Stocklytics financial analyst Edith Reads commented on the data:

    In the ongoing AI arms race, Apple is engaging in large-scale deals with numerous AI startups, primarily to secure a favorable position for future development. By acquiring promising AI startups, Apple gains access to top-tier talent and core innovative technologies, consolidating its position in key AI domains and ensuring a competitive edge in the rapidly evolving tech landscape.

    Apple's investment strategy reflects its focus areas in AI: talent, critical technologies, and intellectual property. As early as 2020, Apple acquired Voysis, an AI startup building digital voice assistants that understand natural language; the acquisition was aimed at improving Siri across Apple's devices.

    Apple also acquired WaveOne in March 2023, whose technology enables large-scale video compression. Other acquisitions include Emotient, Laserlike, Drive.ai, and AI Music, some of whose technology has already been integrated into iPhones, Apple Watches, and Macs.

    A notable feature of Apple's acquisition spree is its emphasis on early-stage startups, indicating an aggressive strategy of identifying and investing in AI trends and technologies before they reach mainstream adoption.

    Last year, apart from large models and their applications, one of the most discussed topics in the AI industry was: will smartphones remain the intelligent terminal of the future? And if not, what form will AI-native smart devices take?

    Humane's Ai Pin and the Rabbit R1 suggest they need not be smartphones: these devices are miniaturized, with simplified interactions, in some cases eliminating the screen entirely.

    However, humans are visual creatures. Until screen technology is replaced by another visual technology, regardless of how interactions evolve or applications develop, we will still need screens. Therefore, the form of AI smart devices may not differ significantly from smartphones that balance portability and user experience, with hardware foundations remaining largely similar except for chips. Among display technologies with the potential to replace traditional screens, XR stands out, and Apple has already established its ecosystem niche with Vision Pro.

    If we're talking about smartphones, what truly defines an AI phone? After discussions with the AI teams of major device manufacturers, the consensus is this: breaking down application barriers, vertically integrating on-device apps, and letting users mobilize multiple apps with a single command that automatically fulfills their intent. That requires robust ecosystem integration and real influence over every app on the device - and here Apple holds a crushing advantage. Strategically, Apple can still reign supreme.
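    Apple already ships a building block for this kind of cross-app orchestration: the App Intents framework, which lets Siri and Shortcuts invoke app functionality directly. Below is a minimal sketch; the intent, its parameter, and the ride-booking logic are hypothetical examples, not a shipping Apple feature.

```swift
import AppIntents

// A minimal App Intents sketch: exposing one app action so a system-level
// assistant can invoke it from a single spoken command. The intent name,
// parameter, and booking logic are hypothetical.
struct BookRideIntent: AppIntent {
    static var title: LocalizedStringResource = "Book a Ride"

    @Parameter(title: "Destination")
    var destination: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // In a real app this would call the app's own booking service.
        return .result(dialog: "Booking a ride to \(destination).")
    }
}
```

    An assistant that can chain many such intents across apps - find a restaurant, book the ride, message a friend - is, by this definition, what would separate an 'AI phone' from a phone with AI features bolted on.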

    In terms of timing, shipping the first generation of a GenAI-integrated system in 2024 is not slow. The application of generative AI on devices is far from mature. Apple's main competitor, Google, has built many AI features into the Pixel, but most are minor additions - one-tap AI background removal, automatic message replies - far from products that fundamentally transform the user experience. Its flagship model, Gemini, didn't even get a standalone app until months after its debut. AI on smartphones remains underdeveloped, partly because usage models are still being explored and partly because current on-device compute can only support limited generative AI capabilities. Both constraints are expected to see preliminary solutions in 2024. Now that Apple has entered the game, the real competition begins.
