Action and Output Schema
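The OpenPocket action schema is a tagged union: every action is an object whose `type` field selects its variant and determines which other fields are present.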
Model Step Output
```ts
interface ModelStepOutput {
  thought: string;
  action: AgentAction;
  raw: string;
}
```

AgentAction Types
```ts
type AgentAction =
  | { type: "tap"; x: number; y: number; reason?: string }
  | { type: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number; reason?: string }
  | { type: "type"; text: string; reason?: string }
  | { type: "keyevent"; keycode: string; reason?: string }
  | { type: "launch_app"; packageName: string; reason?: string }
  | { type: "shell"; command: string; reason?: string }
  | { type: "run_script"; script: string; timeoutSec?: number; reason?: string }
  | {
      type: "request_human_auth";
      capability:
        | "camera"
        | "sms"
        | "2fa"
        | "location"
        | "biometric"
        | "notification"
        | "contacts"
        | "calendar"
        | "files"
        | "oauth"
        | "payment"
        | "permission"
        | "unknown";
      instruction: string;
      timeoutSec?: number;
      reason?: string;
    }
  | { type: "wait"; durationMs?: number; reason?: string }
  | { type: "finish"; message: string };
```

Normalization Defaults
When fields are missing or invalid:
- `tap`: `x=0`, `y=0`
- `swipe`: coords default `0`, `durationMs=300`
- `type`: `text=""`
- `keyevent`: `keycode="KEYCODE_ENTER"`
- `launch_app`: `packageName=""`
- `shell`: `command=""`
- `run_script`: `script=""`, `timeoutSec=60`
- `request_human_auth`: `capability="unknown"`, `instruction="Human authorization is required to continue."`, `timeoutSec=300`
- `wait`: `durationMs=1000`
- `finish`: `message="Task finished."`
- unknown type -> `wait(durationMs=1000)`
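
The defaults above can be sketched as a single normalization function. This is a minimal illustration, not the OpenPocket implementation: `normalizeAction`, `num`, `str`, and `CAPABILITIES` are hypothetical names, "invalid" is assumed to mean "wrong JavaScript type" (or, for `capability`, a value outside the documented list), and the optional `reason` field is dropped for brevity.

```ts
// Hypothetical sketch of the normalization defaults; names are illustrative only.
type NormalizedAction = { type: string; [key: string]: unknown };

const CAPABILITIES = [
  "camera", "sms", "2fa", "location", "biometric", "notification", "contacts",
  "calendar", "files", "oauth", "payment", "permission", "unknown",
];

// Assumption: a numeric field is "invalid" when it is not a finite number.
const num = (v: unknown, fallback: number): number =>
  typeof v === "number" && isFinite(v) ? v : fallback;

// Assumption: a string field is "invalid" when it is not a string.
const str = (v: unknown, fallback: string): string =>
  typeof v === "string" ? v : fallback;

function normalizeAction(input: unknown): NormalizedAction {
  const a = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  switch (a.type) {
    case "tap":
      return { type: "tap", x: num(a.x, 0), y: num(a.y, 0) };
    case "swipe":
      return {
        type: "swipe",
        x1: num(a.x1, 0), y1: num(a.y1, 0),
        x2: num(a.x2, 0), y2: num(a.y2, 0),
        durationMs: num(a.durationMs, 300),
      };
    case "type":
      return { type: "type", text: str(a.text, "") };
    case "keyevent":
      return { type: "keyevent", keycode: str(a.keycode, "KEYCODE_ENTER") };
    case "launch_app":
      return { type: "launch_app", packageName: str(a.packageName, "") };
    case "shell":
      return { type: "shell", command: str(a.command, "") };
    case "run_script":
      return { type: "run_script", script: str(a.script, ""), timeoutSec: num(a.timeoutSec, 60) };
    case "request_human_auth": {
      const cap = str(a.capability, "unknown");
      return {
        type: "request_human_auth",
        capability: CAPABILITIES.indexOf(cap) !== -1 ? cap : "unknown",
        instruction: str(a.instruction, "Human authorization is required to continue."),
        timeoutSec: num(a.timeoutSec, 300),
      };
    }
    case "wait":
      return { type: "wait", durationMs: num(a.durationMs, 1000) };
    case "finish":
      return { type: "finish", message: str(a.message, "Task finished.") };
    default:
      // Unknown or missing type degrades to a harmless one-second wait.
      return { type: "wait", durationMs: 1000 };
  }
}
```

Under these assumptions, `normalizeAction({ type: "tap", x: "12" })` yields a tap at `(0, 0)`, and any unrecognized `type` becomes a one-second `wait`, so a malformed model response never turns into an unintended device action.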
Execution Semantics
- `tap`: `adb shell input tap <x> <y>`
- `swipe`: `adb shell input swipe <x1> <y1> <x2> <y2> <durationMs>`
- `type`: tries `adb shell input text`; for non-ASCII text or on failure, falls back to clipboard + paste keyevent
- `keyevent`: `adb shell input keyevent <keycode>`
- `launch_app`: `adb shell monkey -p <package> -c android.intent.category.LAUNCHER 1`
- `shell`: executes command tokens after `adb shell`
- `run_script`: handled by `ScriptExecutor` in `AgentRuntime`
- `request_human_auth`: pauses task and waits for human approval through `HumanAuthBridge`
- `wait`: async sleep
- `finish`: marks successful task completion
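
For the actions that map to a single `adb` invocation, the dispatch can be sketched as an exhaustive `switch` over the `type` tag. This is a hedged illustration only: `AdbAction` is a reduced, hypothetical subset of `AgentAction` (with `durationMs` assumed to be already normalized), and `toAdbArgs` is an illustrative helper, not a function from the OpenPocket codebase.

```ts
// Hypothetical subset of AgentAction: only variants that map to one adb call.
type AdbAction =
  | { type: "tap"; x: number; y: number }
  | { type: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs: number }
  | { type: "keyevent"; keycode: string }
  | { type: "launch_app"; packageName: string };

// Illustrative helper: builds the argument list to pass to the `adb` binary.
function toAdbArgs(action: AdbAction): string[] {
  switch (action.type) {
    case "tap":
      return ["shell", "input", "tap", String(action.x), String(action.y)];
    case "swipe":
      return [
        "shell", "input", "swipe",
        String(action.x1), String(action.y1),
        String(action.x2), String(action.y2),
        String(action.durationMs),
      ];
    case "keyevent":
      return ["shell", "input", "keyevent", action.keycode];
    case "launch_app":
      return [
        "shell", "monkey", "-p", action.packageName,
        "-c", "android.intent.category.LAUNCHER", "1",
      ];
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails here.
      const unreachable: never = action;
      throw new Error(`Unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Returning an argument array rather than a single command string lets a caller hand it to something like Node's `child_process.execFile("adb", args)` without passing through a local shell. The `never` assignment in the `default` branch is the usual way to make the compiler flag a tagged-union variant that has no handler.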
Current Screen Snapshot Schema
```ts
interface ScreenSnapshot {
  deviceId: string;
  currentApp: string;
  width: number;
  height: number;
  screenshotBase64: string;
  capturedAt: string;
}
```
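
If a snapshot crosses a process or network boundary, a runtime check against this shape can be useful. The guard below is a minimal sketch: `isScreenSnapshot` is a hypothetical helper, and the requirement that `width` and `height` be positive is an assumption that the schema itself does not state.

```ts
interface ScreenSnapshot {
  deviceId: string;
  currentApp: string;
  width: number;
  height: number;
  screenshotBase64: string;
  capturedAt: string;
}

// Hypothetical runtime guard; not part of the documented schema.
function isScreenSnapshot(value: unknown): value is ScreenSnapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;

  const isStr = (key: string): boolean => typeof v[key] === "string";

  // Assumption: screen dimensions must be finite and positive.
  const isDim = (key: string): boolean => {
    const n = v[key];
    return typeof n === "number" && isFinite(n) && n > 0;
  };

  return (
    ["deviceId", "currentApp", "screenshotBase64", "capturedAt"].every(isStr) &&
    ["width", "height"].every(isDim)
  );
}
```

The guard checks field types only; it does not decode `screenshotBase64` or parse `capturedAt`, since the schema does not specify their encodings beyond `string`.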