Google DeepMind Releases Gemini 2.5 Computer Use Model

In a significant leap for artificial intelligence, Google DeepMind has announced the Gemini 2.5 Computer Use model, a specialised version of Gemini 2.5 Pro designed to interact directly with graphical user interfaces (GUIs). Available in preview through the Gemini API in Google AI Studio and Vertex AI, the model gives AI agents a new level of autonomy, allowing them to carry out real-world computer tasks through the same interfaces people use.
A New Dimension of AI Interaction
Unlike AI systems limited to text- or code-based interactions, the Gemini 2.5 Computer Use model enables agents to fill out forms, click buttons, scroll through content, manipulate dropdowns, and even operate behind logins. This capability is a step toward general-purpose AI agents that can handle dynamic, real-time digital environments.
DeepMind explained that “the ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents.”
How the Model Works
Developers access this functionality through the computer-use tool, which operates in an interactive loop. Each cycle takes three inputs: the user's request, a screenshot of the current environment, and a record of recent actions. The model responds with a proposed UI action, such as a click or a text entry, which the developer's client-side code then executes. The loop repeats with an updated screenshot and context until the task is complete or the user stops it.
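The loop described above can be sketched as a small client-side driver. This is a minimal illustration under stated assumptions, not the Gemini API: `UIAction`, `run_agent_loop`, `propose_action`, `capture_screenshot` and `execute_action` are hypothetical names, and the model call is replaced by a scripted stand-in so the example runs offline. It shows only the shape of the cycle (observe, propose, execute, repeat) and where a user stop or step limit ends it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UIAction:
    """Hypothetical UI action proposed by the model (not the real API schema)."""
    kind: str         # e.g. "click", "type", "scroll", or "done" to finish
    target: str = ""  # description of the element to act on
    text: str = ""    # text to enter for "type" actions


def run_agent_loop(
    request: str,
    capture_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, list], UIAction],
    execute_action: Callable[[UIAction], None],
    max_steps: int = 20,
    should_stop: Callable[[], bool] = lambda: False,
) -> list:
    """Repeat observe -> propose -> execute until done, stopped, or out of steps."""
    history: list = []
    for _ in range(max_steps):
        if should_stop():  # the user interrupted the task
            break
        screenshot = capture_screenshot()
        # The (hypothetical) model sees the request, the screen, and recent actions.
        action = propose_action(request, screenshot, history[-5:])
        if action.kind == "done":
            break
        execute_action(action)  # client-side code performs the click/type/scroll
        history.append(action)
    return history


# Usage with a scripted stand-in for the model.
scripted = iter([
    UIAction("click", "Pet name field"),
    UIAction("type", "Pet name field", "Rex"),
    UIAction("done"),
])
executed: list = []
history = run_agent_loop(
    "Enter the pet's name into the CRM form",
    capture_screenshot=lambda: b"",
    propose_action=lambda req, shot, recent: next(scripted),
    execute_action=executed.append,
)
print([a.kind for a in history])  # ['click', 'type']
```

In a real integration, `propose_action` would wrap a call to the model and `execute_action` would drive a browser automation library; keeping both as injected functions makes the loop itself easy to test.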
The model is currently optimised for web browsers. DeepMind notes that it also shows promise for mobile user interface (UI) control, but it is not yet optimised for desktop operating-system-level control.
Real-World Applications and Demonstrations
Early demonstrations showcase the model’s versatility, including transferring pet-care data into a CRM system and organising digital sticky notes into categories, activities that previously required human input. The model has also shown strong performance on major benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld, reportedly achieving over 70% accuracy at a relatively low latency of around 225 seconds.
Focus on Safety and Responsible Use
DeepMind has placed a strong emphasis on safety and ethical deployment, recognising the risks of AI agents operating computers autonomously. The company says it has worked to address concerns such as misuse, unexpected behaviour, and exposure to online scams.
The company has integrated multiple safety features within the model, providing developers with control settings that can restrict or require confirmation for high-stakes actions. “Developers can further specify that the agent either refuses or asks for user confirmation before it takes specific kinds of high-stakes actions,” DeepMind stated.
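To make the refuse-or-confirm idea concrete, here is a minimal client-side sketch of how a developer might gate high-stakes actions before executing them. It is an illustration only and does not show how the Gemini API actually expresses these controls: the action categories, the `POLICY` table, `Decision` and `gate_action` are all hypothetical. The policy maps each category to allow, confirm, or refuse, and a callback stands in for asking the user.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    REFUSE = "refuse"


# Hypothetical developer-defined policy; categories not listed are allowed.
POLICY = {
    "purchase": Decision.CONFIRM,
    "send_message": Decision.CONFIRM,
    "delete_account": Decision.REFUSE,
}


def gate_action(category: str, ask_user: Callable[[str], bool]) -> bool:
    """Return True if an action of this category may run under POLICY."""
    decision = POLICY.get(category, Decision.ALLOW)
    if decision is Decision.REFUSE:
        return False  # never executed, whatever the user says
    if decision is Decision.CONFIRM:
        return ask_user(f"The agent wants to perform '{category}'. Proceed?")
    return True


# Usage: one user who declines every prompt, one who approves every prompt.
def decline(prompt: str) -> bool:
    return False


def approve(prompt: str) -> bool:
    return True


print(gate_action("scroll", decline))          # True
print(gate_action("purchase", decline))        # False
print(gate_action("purchase", approve))        # True
print(gate_action("delete_account", approve))  # False
```

Defaulting unlisted categories to allow keeps the sketch short; a more cautious deployment might default to confirmation instead.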
Towards the Future of AI Autonomy
The release of the Gemini 2.5 Computer Use model marks an important step toward AI-driven task automation. By bridging the gap between text-based reasoning and hands-on interaction, DeepMind is laying the groundwork for a future where AI systems can function as digital collaborators, not just assistants.
As AI becomes more deeply integrated into digital workflows, Gemini 2.5 Computer Use could redefine how humans and machines interact, marking a new era of intelligent, secure, and adaptive computing.