ukkin vs ChatGPT Operator vs Anthropic Computer Use: On-Device vs Cloud AI Agents

A practical comparison of ukkin, ChatGPT Operator, and Anthropic Computer Use for autonomous mobile and desktop AI agents — privacy, autonomy, and the on-device vs cloud trade-off.

The question

Should I build my AI agent with ukkin, ChatGPT Operator, or Anthropic Computer Use?

The autonomous-agent space has split into two distinct camps: cloud agents (Operator, Computer Use) and on-device agents (ukkin, plus various mobile-RPA tools). The trade-off is fundamental: capability vs privacy, latency vs autonomy. This post is the comparison we wish we had when we started ukkin.

The 60-second version: Operator and Computer Use are cloud-based agents. Operator drives a browser hosted on OpenAI’s servers; Computer Use sends screenshots of the user’s desktop to Anthropic’s servers. ukkin is an on-device agent that sees the screen, reasons, and acts locally, so by default no data leaves the phone.

What each project is

ukkin is a mobile-first, on-device AI agent framework built with Flutter. It integrates with llamafu for local LLM inference, uses accessibility APIs for screen understanding, and acts through Android’s AccessibilityService (iOS is more constrained). All reasoning and action happen on the device.

ChatGPT Operator is OpenAI’s cloud-based agent for desktop browsers. It takes control of a hosted browser, navigates web pages, and completes tasks. The browser is rendered on OpenAI’s servers; the user provides credentials; the agent operates the browser in the cloud.

Anthropic Computer Use is Anthropic’s cloud-based agent for desktop. It is similar to Operator, but it is exposed via the Claude API and works on the user’s own machine rather than a hosted browser. Screenshots of the user’s actual desktop are sent to Claude, and Claude returns mouse and keyboard actions to execute locally.
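The screenshot-and-action loop described above can be sketched in a few lines. This is an illustration only, not Anthropic’s SDK: `Action`, `run_agent_loop`, and the three callables are hypothetical names, and the model call is a stub that a real integration would replace with an API request.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step returned by the model (hypothetical shape)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, str], Action],
    perform: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model returns "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = take_screenshot()        # captured locally
        action = ask_model(frame, goal)  # in a cloud agent, the pixels leave the machine here
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                  # executed on the user's machine
    return history
```

The point of the sketch is the data flow: every iteration ships a full frame to whatever sits behind `ask_model`, which is why the privacy story depends on where that model runs.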

The key dimensions

Dimension | ukkin | ChatGPT Operator | Anthropic Computer Use
Where it runs | On the user’s device | OpenAI’s cloud | Anthropic’s cloud
Screen capture | Accessibility APIs (no pixel capture) | Hosted browser (pixel capture) | User’s actual desktop (pixel capture)
LLM | On-device (llamafu) | OpenAI frontier model | Anthropic frontier model (Claude)
Data sent to cloud | None by default | Browser content + actions | Screen content + actions
Autonomy tier | Tiered (observe / reversible / irreversible) | User-approved per session | User-approved per session
Reasoning quality | Lower (smaller model) | Highest | Highest
Latency | Low (local inference) | Higher (round trip) | Higher (round trip)
Privacy | Strong (nothing leaves device) | Weak (browser content sent to OpenAI) | Weakest (full screen sent to Anthropic)
Compliance | Strong (no data leaves device, so GDPR-friendly) | Medium (OpenAI’s data handling) | Medium (Anthropic’s data handling)
Mobile support | First-class | Browser only | Desktop only
Custom actions | Bring your own (the agent API) | Limited (browser actions) | Limited (OS actions)
License | MIT | Proprietary | Proprietary
Cost | Free, runs on device | $200/month (Pro plan) | Pay-per-token

When to use which

Use ukkin when:

  • The data the agent sees is sensitive (personal messages, banking, medical, corporate). Anything that sends screen contents to a cloud server is a privacy non-starter.
  • The agent runs on a mobile device and the user expects the agent to work offline.
  • You need a tiered autonomy model: the user can choose how much the agent can do without confirmation.
  • You are building an internal agent for an enterprise with data-residency or data-protection requirements (for example GDPR or HIPAA).
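The tiered autonomy model mentioned in the list above (observe / reversible / irreversible) can be sketched as a simple policy check. This is an assumption about how such a gate might look, not ukkin’s actual API: `Tier` and `is_allowed` are hypothetical names.

```python
from enum import IntEnum
from typing import Callable


class Tier(IntEnum):
    OBSERVE = 0       # read-only: inspect the screen, no side effects
    REVERSIBLE = 1    # can be undone, e.g. drafting a message
    IRREVERSIBLE = 2  # cannot be undone, e.g. sending a payment


def is_allowed(
    action_tier: Tier,
    user_max_auto_tier: Tier,
    confirm: Callable[[], bool],
) -> bool:
    """Run automatically up to the tier the user chose; ask for anything above it."""
    if action_tier <= user_max_auto_tier:
        return True
    return confirm()  # hypothetical confirmation prompt shown to the user
```

For example, with `user_max_auto_tier=Tier.REVERSIBLE`, reading the screen and drafting a reply proceed without a prompt, while sending the reply goes through `confirm`.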

Use ChatGPT Operator when:

  • The task is web-only and the data is not sensitive.
  • The user is already an OpenAI customer and the monthly fee is acceptable.
  • The task requires frontier-model reasoning that on-device LLMs cannot match.

Use Anthropic Computer Use when:

  • The task requires desktop OS actions (not just browser actions).
  • The user is already an Anthropic customer and the per-token pricing is acceptable.
  • The data is not sensitive enough to require on-device processing.
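The three lists above can be condensed into a rough decision helper. It is a deliberate simplification: the function name, the three boolean inputs, and the order of the checks are assumptions made for illustration, and it ignores cost and existing vendor relationships.

```python
def choose_agent(
    sensitive_data: bool,
    needs_mobile_or_offline: bool,
    needs_desktop_os_actions: bool,
) -> str:
    """Map the criteria from the lists above to one of the three agents."""
    # Sensitive data or a mobile/offline requirement rules out the cloud agents.
    if sensitive_data or needs_mobile_or_offline:
        return "ukkin"
    # Desktop OS actions go beyond what a hosted browser can do.
    if needs_desktop_os_actions:
        return "Anthropic Computer Use"
    # Web-only task with non-sensitive data.
    return "ChatGPT Operator"
```

Privacy is checked first on purpose: in this comparison it is the one constraint that cannot be traded away for better reasoning.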

The privacy / capability trade-off

The fundamental trade-off is between capability and privacy:

  • Cloud agents (Operator, Computer Use) have access to frontier reasoning. They can solve complex tasks that on-device LLMs cannot. But they require the user’s data to leave the device.
  • On-device agents (ukkin) have access to whatever quantised model fits on the phone. The reasoning is less sophisticated. But the data never leaves the device.

The middle ground is what ukkin is actually for: the agent keeps the data on-device and can call a cloud LLM for reasoning, but only for the specific task at hand and under a clear contract about what data crosses the boundary. This is the ephemeral credentials pattern from perishable: the LLM API call is short-lived and scoped.
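One way to picture that contract is a short-lived grant that names the task, lists the fields allowed to cross the boundary, and expires. The sketch below is not perishable’s API: `ScopedGrant` and `build_cloud_payload` are hypothetical names, and the field names in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ScopedGrant:
    """A short-lived permission to send specific fields to a cloud LLM."""
    task: str
    allowed_fields: FrozenSet[str]
    expires_at: float  # Unix timestamp, in seconds

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def build_cloud_payload(
    grant: ScopedGrant,
    local_data: Dict[str, str],
    now: float,
) -> Dict[str, str]:
    """Return only the fields the grant allows; refuse once the grant has expired."""
    if not grant.is_valid(now):
        raise PermissionError(f"grant for task {grant.task!r} has expired")
    return {k: v for k, v in local_data.items() if k in grant.allowed_fields}
```

A grant with `allowed_fields=frozenset({"start", "end"})` would let the cloud model see when two meetings clash without ever seeing the subject line or body of the invitations.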

A concrete example: agentic personal assistant

Imagine a personal assistant that reads your email, calendar, and notes, and helps you plan your day. The agent needs to:

  1. Read your screen to know what app you’re in.
  2. Read emails and calendar events to know your commitments.
  3. Reason about conflicts and suggest reschedules.
  4. Optionally send messages or create events.

With ukkin:

  • Steps 1-2 happen entirely on-device. The email content never leaves the phone.
  • Step 3 happens on-device (with the local model).
  • Step 4 happens on-device by default, with user confirmation. Optionally, it can go through a perishable-scoped call to a cloud LLM.

With Operator:

  • Steps 1-2 require the user to log into their email in a hosted browser. OpenAI sees the content.
  • Step 3 happens on OpenAI’s servers.
  • Step 4 happens via the hosted browser.

The privacy story is fundamentally different. ukkin is the choice if you care about the email content not leaving the device.
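Part of step 3 does not need a model at all. Finding the conflicts is plain interval overlap, which runs locally in microseconds; the model is only needed to decide what to reschedule. A minimal sketch, assuming a hypothetical `Event` type with a start and end time:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    title: str
    start: datetime
    end: datetime


def find_conflicts(events: List[Event]) -> List[Tuple[Event, Event]]:
    """Return every pair of events whose time ranges overlap."""
    ordered = sorted(events, key=lambda e: e.start)
    conflicts: List[Tuple[Event, Event]] = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so no later event can overlap `first`
            conflicts.append((first, second))
    return conflicts
```

Back-to-back events, where one ends exactly when the next starts, are not reported as conflicts.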

What ukkin does NOT do

  • No frontier reasoning. ukkin’s on-device model is a quantised 3-7B model. It can plan a day, but it cannot write a novel.
  • No multi-app workflows yet. iOS support is constrained; Android support is broader, but the agent still works in one app at a time.
  • No natural-language UI yet. The user configures goals in code, not in chat.

All three are research projects on the ukkin roadmap.