Alibaba Open-Sources Mobile-Agent-v3: A Powerful GUI Agent Family
-
Today, the X-PLUG team officially released its latest project, Mobile-Agent-v3, on GitHub. Mobile-Agent-v3 is a cross-platform multi-agent framework built on GUI-Owl. It combines planning, progress management, reflection, and memory capabilities, and is aimed at making GUI automation more reliable for users.
GUI-Owl, the foundation model of Mobile-Agent-v3, integrates perception, reasoning, planning, and execution in a single model, making it a native end-to-end multimodal agent. It is designed for cross-platform interaction and multi-turn decision-making with explicit intermediate reasoning, which is intended to give users more stable performance on multi-task operations.
According to the X-PLUG team, Mobile-Agent-v3 also improves exception handling and reflection, so that it can keep working efficiently when interrupted by pop-ups and ads. In addition, its key-information recording feature makes it easier to carry out tasks that span multiple applications.
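To make the ideas of planning, progress tracking, reflection, and key-information recording more concrete, here is a minimal sketch of such a loop in Python. It is not taken from the Mobile-Agent-v3 repository and does not use GUI-Owl: `AgentState`, `FakeScreen`, `reflect`, and `run` are hypothetical names invented for illustration, and the "screen" is a toy stand-in that shows a single pop-up.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical agent state; not the project's actual data structure."""
    plan: list[str]                                        # remaining subgoals (planning)
    done: list[str] = field(default_factory=list)          # completed subgoals (progress)
    notes: dict[str, str] = field(default_factory=dict)    # recorded key information (memory)


class FakeScreen:
    """Toy GUI stand-in: a pop-up blocks every action until it is dismissed."""

    def __init__(self) -> None:
        self.popup_pending = True

    def act(self, action: str) -> str:
        if action == "dismiss_popup":
            self.popup_pending = False
            return "popup closed"
        if self.popup_pending:
            return "blocked by popup"
        return f"ok: {action}"


def reflect(observation: str) -> bool:
    """Reflection step: judge whether the last action had the intended effect."""
    return observation.startswith("ok")


def run(state: AgentState, screen: FakeScreen, max_steps: int = 10) -> AgentState:
    for _ in range(max_steps):
        if not state.plan:
            break  # every subgoal is finished
        subgoal = state.plan[0]
        observation = screen.act(subgoal)
        if reflect(observation):
            # Success: move the subgoal from the plan to the progress list.
            state.done.append(state.plan.pop(0))
            # Record information that a later app or step may need.
            if subgoal.startswith("read_"):
                state.notes[subgoal] = observation
        elif "popup" in observation:
            # Exception handling: clear the interruption, then retry next turn.
            screen.act("dismiss_popup")
    return state


state = run(AgentState(plan=["open_app", "read_price", "open_notes"]), FakeScreen())
print(state.done)
print(state.notes)
```

In this toy run the first attempt at `open_app` is blocked by the pop-up, the loop dismisses it and retries, and the value observed during `read_price` is kept in `notes` so that a later step in another application could use it. A real agent would replace `FakeScreen` with screenshots and device actions, and `reflect` with a model-based judgment.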
Several earlier projects in the Mobile-Agent line, such as Mobile-Agent-v2 and PC-Agent, were accepted at the NeurIPS 2024 and ICLR 2025 conferences, reflecting the project's influence in academic research.
The X-PLUG team also provides supporting resources, including technical reports, demo videos, and code repositories, so that developers and researchers can explore Mobile-Agent in more depth. With these resources, users can try out Mobile-Agent's capabilities and contribute to its further development and optimization.