The Computer Use model is available for preview through the API. This specialized model is based on Gemini 2.5 Pro, and it can help agents interact with user interfaces.
Earlier this year, we announced that we would be adding computer use to the Gemini API so that developers can build agents that use computers. Today, we are introducing the Gemini 2.5 Computer Use model, a specialized model built on Gemini 2.5 Pro’s advanced visual understanding and reasoning abilities to power agents that can understand and act within user interfaces (UIs). It sets new state-of-the-art performance on several web and mobile control benchmarks while also exhibiting lower latency. This model will be available through the Gemini API on Google AI Studio and Vertex AI.
While a number of AI models already work with software through structured APIs, much of the world’s digital workloads require reasoning within graphical user interfaces (GUIs) — for example, filling out forms and then submitting them. For these tasks, agents will need to read and act on web pages and applications as a human would: by clicking, typing, and scrolling.
How it works
The model’s primary capabilities are available through the new computer_use tool in the Gemini API, which is designed to be run in a loop. The inputs to this tool are the user’s request, a screenshot of the interface, and recent action history. You can also pass in a list of UI actions to exclude, or you can provide additional custom functions to include.
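As a rough illustration of how a client might assemble those inputs for one turn, here is a minimal sketch. It does not use the Gemini SDK: the action names in SUPPORTED_ACTIONS, the build_turn_input helper, and the dictionary keys are all assumptions made up for this example, not the real request schema.

```python
from typing import Iterable, Sequence

# Hypothetical action names for illustration only; the real tool defines its own list.
SUPPORTED_ACTIONS = ["click", "type_text", "scroll", "drag_and_drop"]


def build_turn_input(
    request: str,
    screenshot: bytes,
    history: Sequence[str],
    excluded: Iterable[str] = (),
    history_limit: int = 5,
) -> dict:
    """Bundle the user's request, a screenshot, and recent history for one turn."""
    excluded_set = set(excluded)
    # Drop any UI actions the caller asked to exclude.
    allowed = [name for name in SUPPORTED_ACTIONS if name not in excluded_set]
    # Keep only the most recent actions; a non-positive limit keeps none.
    recent = list(history)[-history_limit:] if history_limit > 0 else []
    return {
        "request": request,
        "screenshot": screenshot,
        "recent_history": recent,
        "allowed_actions": allowed,
    }
```

Trimming the history to a fixed window is one possible design choice for keeping each turn's input small; the source only says that "recent" history is sent.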
The model processes the inputs and outputs a response, typically as a function call for a particular UI action like clicking or typing. For some actions, like making a purchase, the client might have a separate step for end-user confirmation. When the client-side code has executed the action, a new screenshot and the current URL are sent to the Computer Use model to restart the loop. This process repeats for a series of actions until the task is complete, an error is encountered, or the interaction is terminated by a safety response or user choice.
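The loop described above can be sketched in a few lines. This is a simplified outline under stated assumptions, not the actual client code: the model call, the browser driver, and the confirmation prompt are passed in as plain callables (propose, execute, observe, confirm), and the Action, Stop, and Observation types are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union


@dataclass
class Action:
    name: str                          # illustrative, e.g. "click" or "type_text"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # e.g. for a purchase


@dataclass
class Stop:
    reason: str                        # e.g. "complete" or "safety"


@dataclass
class Observation:
    screenshot: bytes
    url: str


def run_agent_loop(
    request: str,
    propose: Callable[[str, Observation, List[Action]], Union[Action, Stop]],
    execute: Callable[[Action], None],
    observe: Callable[[], Observation],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
) -> Tuple[str, List[Action]]:
    """Repeat propose -> (confirm) -> execute -> observe until a stop condition."""
    history: List[Action] = []
    obs = observe()
    for _ in range(max_steps):
        step = propose(request, obs, history)
        if isinstance(step, Stop):
            # Task finished or the model returned a safety response.
            return step.reason, history
        if step.needs_confirmation and not confirm(step):
            return "stopped_by_user", history
        try:
            execute(step)
        except Exception:
            return "error", history
        history.append(step)
        # A fresh screenshot and the current URL restart the loop.
        obs = observe()
    return "max_steps_reached", history
```

The max_steps cap is an addition of this sketch rather than something the source describes; it simply guards against a loop that never reaches a stop condition.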