Apple Boosts Conversational AI with Daily Budget Surpassing Millions of Dollars
On September 7, media reports citing insiders disclosed that Apple has steadily expanded its budget for AI computing, which now exceeds millions of dollars per day. One of the company's goals is to develop functionality that lets iPhone users automate complex, multi-step tasks through simple voice commands. For instance, this technology could enable users to instruct Siri to create a GIF from their five most recent photos and send it to a friend—a process that currently must be done manually.
According to sources familiar with the team, Apple's AI chief John Giannandrea was authorized four years ago to assemble a team dedicated to developing conversational AI, specifically large language models (LLMs), long before the technology became a software industry focal point. This foresight became evident after OpenAI's ChatGPT launch last year, which catalyzed the LLM boom.
Several Apple employees noted that while Giannandrea has repeatedly expressed skepticism about AI-powered chatbots' potential applications, the company was not entirely unprepared for the LLM surge, thanks to his efforts in reshaping Apple's software research culture.
These changes are now paying off as LLMs promise to revolutionize creation methods—from coding and slide presentations to books—while automating tedious, text-based tasks like summarization, corporate IT ticketing, or customer service inquiries.
The conversational AI team, named "Foundational Models," comprises engineers with extensive Google experience, among them Ruoming Pang, who joined Apple in 2021 after 15 years at Google, part of it spent under Giannandrea's leadership in AI research.
Apple's Multiple LLM Teams
Insiders reveal the "Foundational Models" team remains small (roughly 16 members), but its budget for training cutting-edge models has grown to millions of dollars daily. For comparison, OpenAI CEO Sam Altman has said GPT-4's training cost exceeded $100 million over several months.
Apple's team mirrors AI units at Google and Meta, where researchers build models later integrated into products. Additionally, Apple reportedly has at least two newer teams: a "Visual Intelligence" group developing image/video/3D generation tools (per research papers and LinkedIn profiles) and a multimodal AI team (formerly led by ex-Google AI researcher Jon Shlens) working on models processing text, images, and video—paralleling Google's upcoming Gemini model.
Apple's "Foundational Models" team has developed several advanced models and is conducting internal testing. According to a source familiar with Apple's chatbot development, a large language model-powered chatbot may eventually interact with AppleCare customers for warranty and technical support services.
Siri Upgrade
The Siri team also plans to integrate large language models to enable the voice assistant to automatically complete complex tasks that are currently impossible, such as creating and sending GIFs with simple commands. The feature, which builds on Apple's Shortcuts app, is expected to launch next year with a new version of the iPhone's operating system. Google is reportedly working on similar integration for its voice assistant.
Apple's AJAX GPT model is believed by team members to surpass OpenAI's GPT-3.5 in capability, though OpenAI has since released more powerful models. Details remain unclear about how Apple will implement these models in products. Former machine learning engineers note the company's preference for on-device processing for privacy and performance, though this presents challenges given AJAX GPT's more than 200 billion parameters.
There are precedents for scaling down large language models, such as Google's PaLM 2 which comes in four sizes including one for on-device use. Apple declined to comment.
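To see why a 200-billion-parameter model is hard to run on a phone, it helps to do the back-of-envelope memory arithmetic. The sketch below is a generic illustration (the parameter count comes from the reporting above; the precisions are common industry choices, not Apple-specific figures):

```python
def model_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

params = 200_000_000_000  # AJAX GPT reportedly has 200+ billion parameters

# 16-bit floating-point weights: 2 bytes per parameter
print(model_memory_gb(params, 2))    # 400.0 GB
# Aggressive 4-bit quantization: 0.5 bytes per parameter
print(model_memory_gb(params, 0.5))  # 100.0 GB
```

Even with aggressive quantization, the weights alone would dwarf a smartphone's RAM, which is why smaller model tiers like PaLM 2's on-device size matter for Apple's strategy.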
Google's Influence
John Giannandrea (known as J.G.), who joined Apple to improve Siri and strengthen its machine learning capabilities, was initially skeptical about chatbot applications but has recently acknowledged the technology's potential after seeing internal demonstrations. The formation of the Foundational Models team reflects Giannandrea's effort to make Apple's research culture more like that of his former employer Google, allowing greater research flexibility and addressing recruitment challenges the company had faced despite its successful use of earlier AI technologies.
After joining Apple in 2018, Giannandrea helped recruit key engineers and researchers from Google. He also advocated for increased use of Google's cloud services, including servers equipped with Google-developed AI chips (Tensor Processing Units) to train Apple's machine learning models, thereby enhancing Siri and other product features.
According to those familiar with Ruoming Pang, his published research on neural networks attracted a significant following. Neural networks, a subset of machine learning, involve training software to recognize patterns and relationships in data, similar to how the human brain works. Some of Pang's notable research focused on how neural networks interact with mobile processors and the use of parallel computing to train neural networks—a process that breaks down larger problems into smaller tasks that multiple processors can compute simultaneously.
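The parallel-computing idea described above, splitting a large training computation into smaller tasks that multiple processors handle at once, can be sketched in miniature with data-parallel gradient averaging. This is a generic toy illustration in plain Python, not Pang's actual method or any Apple code:

```python
from concurrent.futures import ThreadPoolExecutor

def shard_grad(w, shard):
    """Gradient of squared error for the model y = w*x, summed over one data shard."""
    return sum(2 * x * (w * x - y) for x, y in shard), len(shard)

def parallel_grad(w, data, n_workers=4):
    """Split the data across workers, compute partial gradients, then average."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda s: shard_grad(w, s), shards))
    total = sum(g for g, _ in parts)
    count = sum(n for _, n in parts)
    return total / count

# Toy data generated by y = 3x; gradient descent should recover w close to 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w -= 0.01 * parallel_grad(w, data)
print(round(w, 3))  # 3.0
```

Each worker touches only its shard, and the combined gradient equals the gradient over the full dataset, which is the property that lets real training frameworks scale the same loop across many accelerators.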
Open-Source Movement
Pang's influence at Apple is evident in AXLearn, internal software his team has developed over the past year to train AJAX GPT. AXLearn is a machine learning framework designed for rapid model training; parts of it are based on Pang's research and optimized for Google's Tensor Processing Units. AXLearn is built on JAX, an open-source framework developed by Google researchers. If Apple's AJAX GPT is likened to a house, AXLearn serves as the blueprint, while JAX is the pen and paper used to draw it. The data Apple uses to train its large language models has not been publicly disclosed.
In July of this year, Apple's "Foundational Models" team quietly uploaded AXLearn's code to GitHub, allowing outside developers to use it to train their own large language models without starting from scratch. The reason behind Apple's decision to open-source AXLearn remains unclear, but such moves typically aim to encourage improvements from outside engineers. Releasing commercially usable source code would have been unusual for Apple, known for its secrecy, before Giannandrea's arrival.
Team Leadership
The team, initially led by Dutch computer scientist Arthur van Hoff, later became the core of Apple's "Foundational Models" team. Those familiar with van Hoff note that he was an early member of Sun Microsystems' Java development team in the 1990s and later became a prominent entrepreneur. Van Hoff joined Apple in 2019, initially working on a new version of Siri codenamed Blackbird, which Apple eventually abandoned. His team then shifted focus to building large language models, aiming to integrate them into Blackbird's foundational version. The team started with just a few members, notably two British researchers specializing in natural language processing: Tom Gunter and Thomas Nickson. Both earned advanced degrees from Oxford University and joined Apple in 2016 to work on Siri.
In 2021, Pang joined Apple to help train cutting-edge large language models. Unlike other Apple researchers, he was granted permission to remain in New York and establish a new outpost for the company's machine learning team there. Months later, Apple hired former Google AI executive Daphne Luong to oversee van Hoff's team and brought on Google researcher Samy Bengio to lead a parallel team focused on long-term machine learning research.
Ruoming Pang has now taken over the "Foundational Models" team, while van Hoff began an indefinite leave earlier this year. According to informed sources, several members of Pang's team are currently based in New York.
Google Cloud Deal
Pang's recruitment came as Apple increasingly recognized the growing importance of large language models in machine learning. Sources reveal that OpenAI's GPT-3 release in June 2020 prompted Apple's machine learning team to request additional funding to train their own models.
Two individuals with direct knowledge disclosed that to cut costs, Apple management has historically encouraged machine learning engineers to use Google's cloud computing services over Amazon's similar offerings due to Google's lower pricing.
A former Apple executive familiar with the discussions revealed that Google executives previously told Apple their discounted cloud pricing partly acknowledged the extensive commercial partnership between the companies. Under their agreement, Google Search remains Safari's default provider. Apple has long been one of the world's largest cloud server lessees and became a major Google Cloud client while maintaining significant business with Amazon.
One source noted Apple's active recruitment from Google's and Meta's AI teams. Since the AXLearn code was uploaded to GitHub in July, 18 contributors have improved it; at least 12 of them joined Apple's machine learning team in the past two years, and 7 previously worked at Google or Meta.