Architecture
OpenPocket is a local-first phone-use runtime: automation runs against configurable local Agent Phone targets, and state remains auditable on disk.
A single install can host one default agent plus many managed agents. Each agent remains a single-target runtime.
End-to-End Topology
Runtime Planes
- Manager plane: agent registry, model template capture, manager ports, target exclusivity locks, manager dashboard, shared relay hub.
- Agent instance plane: one config, one workspace, one state, one gateway, one dashboard, one selected target.
- Intelligence plane: AgentRuntime + ModelClient for one-step multimodal decisions.
- Prompt/context plane: workspace templates + skills + /context diagnostics.
- Extensibility plane: SkillLoader, ScriptExecutor, CodingExecutor, MemoryExecutor.
- Capability plane: PhoneUseCapabilityProbe for camera/microphone/location/photos/payment signal detection.
- Execution plane: AdbRuntime drives the selected target and captures snapshots.
- Persistence plane: sessions, memory, screenshots, onboarding state, and generated artifacts.
- Human-auth plane: HumanAuthBridge + relay/tunnel for remote approval/delegation handoff.
Multi-Agent Runtime Model
OpenPocket does not implement one agent controlling multiple targets simultaneously.
Instead, it implements:
- one install
- many isolated agent instances
- one selected target per agent instance
This keeps these boundaries simple and auditable:
- workspace and session continuity remain agent-local
- channel state remains agent-local
- target binding remains exclusive
- gateway busy state and locks remain agent-local
Deployment Targets
- emulator: default onboarding path and fully documented
- physical-phone: USB + Wi-Fi ADB path, ready for daily usage
- android-tv: type and baseline flow available, broader hardening in progress
- cloud: type/config placeholder exists, provider integrations in progress
Target rules:
- switching target is explicit via openpocket target set ...
- managed agents use openpocket --agent <id> target set ...
- the selected agent gateway must be stopped before target changes
- two agents cannot bind or run against the same target fingerprint
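
The target rules above can be illustrated with a small in-memory model. This is a minimal sketch and not OpenPocket's implementation: TargetRegistry, TargetBindingError, and the fingerprint strings are hypothetical names chosen for the example, and the real manager presumably persists its locks on disk.

```python
from dataclasses import dataclass, field


class TargetBindingError(Exception):
    """Raised when a target change would violate one of the target rules."""


@dataclass
class TargetRegistry:
    # Hypothetical stand-in for the manager's exclusivity locks:
    # maps a target fingerprint to the agent id that currently owns it.
    bindings: dict = field(default_factory=dict)
    # Agent ids whose gateway is currently running.
    running_gateways: set = field(default_factory=set)

    def set_target(self, agent_id: str, fingerprint: str) -> None:
        # Rule: the agent's gateway must be stopped before its target changes.
        if agent_id in self.running_gateways:
            raise TargetBindingError(f"stop the gateway for {agent_id!r} first")
        # Rule: two agents cannot bind the same target fingerprint.
        owner = self.bindings.get(fingerprint)
        if owner is not None and owner != agent_id:
            raise TargetBindingError(f"{fingerprint!r} is already bound to {owner!r}")
        # One selected target per agent: release any previous binding.
        for bound_fp, bound_agent in list(self.bindings.items()):
            if bound_agent == agent_id:
                del self.bindings[bound_fp]
        self.bindings[fingerprint] = agent_id
```

With this model, binding a second agent to a fingerprint that another agent already owns fails loudly instead of silently sharing the device, which is the property the exclusivity rule is meant to guarantee.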
Primary Task Loop
- Receive task from CLI, channel, or cron.
- Select one agent config/workspace/state context.
- Create session context and resolve model profile/auth.
- Capture screen snapshot and call model for exactly one normalized action.
- Execute action by target executor:
  - AdbRuntime for phone actions
  - ScriptExecutor for run_script
  - CodingExecutor for file/shell/process tools
  - MemoryExecutor for memory tools
- Run capability probe checks around interactive actions and optionally escalate to Human Auth.
- Persist step thought/action/result and optional screenshot.
- Emit selective progress narration through the selected gateway.
- Stop on finish, max steps, error, or explicit stop.
- Finalize session, append daily memory, and generate reusable artifacts on success.
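
The control flow of the loop above can be sketched in a few lines. This is an illustrative model under stated assumptions, not the AgentRuntime code: run_task, pick_executor, and the Action type are hypothetical, and the tool names in CODING_TOOLS and MEMORY_TOOLS are placeholders (only run_script and finish appear in this document). The next_action callback stands in for "capture snapshot and call the model for exactly one action".

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    type: str                      # e.g. "tap", "run_script", "finish"
    args: dict = field(default_factory=dict)


# Placeholder tool names, not OpenPocket's real tool list.
CODING_TOOLS = {"file_read", "shell_exec"}
MEMORY_TOOLS = {"memory_search"}


def pick_executor(action_type: str) -> str:
    """Route an action type to the executor named in the task loop."""
    if action_type == "run_script":
        return "ScriptExecutor"
    if action_type in CODING_TOOLS:
        return "CodingExecutor"
    if action_type in MEMORY_TOOLS:
        return "MemoryExecutor"
    return "AdbRuntime"            # everything else is treated as a phone action


def run_task(next_action, execute, max_steps=50, should_stop=lambda: False):
    """Run one action per step until finish, error, explicit stop, or max steps."""
    history = []                   # (step, action, result) records, one per step
    for step in range(1, max_steps + 1):
        if should_stop():          # explicit stop requested from outside
            return "stopped", history
        action = next_action(history)
        if action.type == "finish":
            return "finished", history
        try:
            result = execute(action)
        except Exception as exc:   # any executor failure ends the session
            history.append((step, action, f"error: {exc}"))
            return "error", history
        history.append((step, action, result))
    return "max_steps", history
```

Each iteration asks for exactly one action and records its result before asking for the next, so the persisted history is a complete step-by-step trace of the session.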
Manager Services
Manager dashboard
The manager dashboard is an install-level overview that shows:
- all registered agents
- per-agent target binding
- configured channels
- gateway running state
- links to each agent dashboard
Shared relay hub
The relay hub is also install-level:
- managed agents register their private local relay endpoints to it
- one optional ngrok public URL can be reused across agents
- requests are routed by /a/<agentId>/...
- actual request state/artifacts stay inside the target agent's state directory
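
The prefix routing described above can be sketched as follows. This is a minimal illustration, assuming the hub keeps a mapping from agent id to that agent's private relay endpoint; route_relay_request and the example endpoint values are hypothetical, and only the /a/<agentId>/... prefix comes from this document.

```python
from typing import Optional


def route_relay_request(path: str, endpoints: dict) -> Optional[tuple]:
    """Map '/a/<agentId>/<rest>' to (agent endpoint, '/<rest>').

    Returns None when the path does not match the prefix or the agent
    has not registered an endpoint with the hub.
    """
    parts = path.split("/", 3)     # ['', 'a', '<agentId>', '<rest>']
    if len(parts) < 3 or parts[0] != "" or parts[1] != "a" or not parts[2]:
        return None
    endpoint = endpoints.get(parts[2])
    if endpoint is None:
        return None
    rest = "/" + (parts[3] if len(parts) > 3 else "")
    return endpoint, rest
```

The hub only forwards the request in this sketch. It keeps no request state of its own, which matches the rule that state and artifacts stay inside the target agent's state directory.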
Permission and Human Auth Boundary
- Android runtime permission dialogs inside Agent Phone are handled locally by policy.
- request_human_auth is for real-world sensitive checkpoints (OTP, camera, microphone, payment, OAuth, delegated files/data).
- In agentic delegation mode, runtime stores/describes artifacts; the agent decides how to apply them using capability skills.
Auto Skill Experience Engine
On successful runs, AutoArtifactBuilder can produce:
- workspace/skills/auto/*.md (behavior fingerprint + semantic UI target traces)
- workspace/scripts/auto/*.sh (replay script from deterministic steps)
At inference time, SkillLoader injects:
- summarized skill catalog with names, descriptions, and file locations
- requirement-gated skill discovery only; the model must read(location) to load a skill body on demand
This remains scoped to one agent workspace. Auto-generated experience is not shared across agents.
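
The catalog-then-read pattern described in this section can be sketched as below. This is an assumption-laden illustration rather than SkillLoader itself: SkillEntry, build_catalog, and the requires tags are hypothetical, and what counts as a "requirement" in OpenPocket is not specified here. Only the idea of a summarized catalog plus read(location) on demand comes from the text.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    location: str                  # file path of the skill body
    requires: tuple = ()           # hypothetical requirement tags


def build_catalog(skills, available) -> str:
    """Summarize only the skills whose requirements are all satisfied.

    The result lists name, description, and location; skill bodies are
    deliberately left out so the prompt stays small.
    """
    available = set(available)
    lines = [
        f"- {s.name}: {s.description} ({s.location})"
        for s in skills
        if set(s.requires) <= available
    ]
    return "\n".join(lines)


def read(location: str) -> str:
    """Load one skill body on demand, after the model picks it from the catalog."""
    return Path(location).read_text(encoding="utf-8")
```

Keeping bodies out of the catalog means the prompt cost grows with the number of skills, not with their total size.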
Model Endpoint Compatibility
Endpoint fallback order:
- task loop (ModelClient): chat -> responses
- chat assistant (ChatAssistant): responses -> chat -> completions
This keeps provider compatibility high without changing user workflow.
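
The fallback order above can be expressed as a small helper. This is a sketch under assumptions: call_with_fallback is a hypothetical name, the call argument stands in for whatever actually issues the request, and treating any exception as a reason to fall back is a simplification, since the document does not say which failures trigger the next endpoint.

```python
# Endpoint orders as listed above.
TASK_LOOP_ORDER = ("chat", "responses")
CHAT_ASSISTANT_ORDER = ("responses", "chat", "completions")


def call_with_fallback(order, call):
    """Try each endpoint style in order and return (endpoint, result).

    Assumption: any exception from `call` means "try the next endpoint".
    Raises RuntimeError with all collected errors if every endpoint fails.
    """
    errors = []
    for endpoint in order:
        try:
            return endpoint, call(endpoint)
        except Exception as exc:
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

For a provider that lacks the responses endpoint, the chat assistant order would fall through to chat on the first call, and the caller sees the same result shape either way.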
Why This Shape
- no hosted cloud phone runtime required
- device control and artifacts stay local
- users can choose emulator for convenience or physical phone for production-like behavior
- multiple isolated agents can coexist without corrupting each other's workspace or target state
- one shared relay entry can still serve many agents when free ngrok limits force a single public URL
