Action and Output Schema
The OpenPocket action schema is a tagged union used by the task loop. The model returns one tool call per step, and the runtime normalizes it to an AgentAction.
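Because every variant carries a literal type field, consumers can narrow on it with a switch. The sketch below uses a trimmed three-variant subset of the union; MiniAction and describeAction are hypothetical names for illustration, not part of the OpenPocket source.

```typescript
// Trimmed subset of AgentAction, for illustration only.
type MiniAction =
  | { type: "tap"; x: number; y: number; reason?: string }
  | { type: "wait"; durationMs?: number; reason?: string }
  | { type: "finish"; message: string };

// Hypothetical helper: summarizes an action for logging.
function describeAction(action: MiniAction): string {
  switch (action.type) {
    case "tap":
      // Narrowed to the tap variant, so x and y are available.
      return `tap at (${action.x}, ${action.y})`;
    case "wait":
      // 1000 mirrors the documented normalization default for wait.
      return `wait ${action.durationMs ?? 1000} ms`;
    case "finish":
      return `finish: ${action.message}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is unhandled.
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

The never assignment in the default branch makes the compiler flag any variant added to the union without a matching case.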
Model Step Output
ts
interface ModelStepOutput {
  thought: string;
  action: AgentAction;
  raw: string;
}
AgentAction Types
ts
type AgentAction =
  | { type: "tap"; x: number; y: number; reason?: string }
  | {
      type: "swipe";
      x1: number;
      y1: number;
      x2: number;
      y2: number;
      durationMs?: number;
      reason?: string;
    }
  | {
      type: "drag";
      x1: number;
      y1: number;
      x2: number;
      y2: number;
      durationMs?: number;
      reason?: string;
    }
  | {
      type: "long_press_drag";
      x1: number;
      y1: number;
      x2: number;
      y2: number;
      holdMs?: number;
      durationMs?: number;
      reason?: string;
    }
  | { type: "type"; text: string; reason?: string }
  | { type: "keyevent"; keycode: string; reason?: string }
  | { type: "launch_app"; packageName: string; reason?: string }
  | { type: "shell"; command: string; reason?: string }
  | { type: "run_script"; script: string; timeoutSec?: number; reason?: string }
  | { type: "read"; path: string; from?: number; lines?: number; reason?: string }
  | { type: "write"; path: string; content: string; append?: boolean; reason?: string }
  | { type: "edit"; path: string; find: string; replace: string; replaceAll?: boolean; reason?: string }
  | { type: "apply_patch"; input: string; reason?: string }
  | {
      type: "exec";
      command: string;
      workdir?: string;
      yieldMs?: number;
      background?: boolean;
      timeoutSec?: number;
      reason?: string;
    }
  | {
      type: "process";
      action: "list" | "poll" | "log" | "write" | "kill";
      sessionId?: string;
      input?: string;
      offset?: number;
      limit?: number;
      timeoutMs?: number;
      reason?: string;
    }
  | {
      type: "memory_search";
      query: string;
      maxResults?: number;
      minScore?: number;
      reason?: string;
    }
  | {
      type: "memory_get";
      path: string;
      from?: number;
      lines?: number;
      reason?: string;
    }
  | {
      type: "request_human_auth";
      capability:
        | "camera"
        | "qr"
        | "microphone"
        | "voice"
        | "nfc"
        | "sms"
        | "2fa"
        | "location"
        | "biometric"
        | "notification"
        | "contacts"
        | "calendar"
        | "files"
        | "oauth"
        | "payment"
        | "permission"
        | "unknown";
      instruction: string;
      timeoutSec?: number;
      reason?: string;
      uiTemplate?: {
        templateId?: string;
        title?: string;
        summary?: string;
        capabilityHint?: string;
        artifactKind?: "auto" | "credentials" | "payment_card" | "form";
        requireArtifactOnApprove?: boolean;
        allowTextAttachment?: boolean;
        allowLocationAttachment?: boolean;
        allowPhotoAttachment?: boolean;
        allowAudioAttachment?: boolean;
        allowFileAttachment?: boolean;
        fileAccept?: string;
        middleHtml?: string;
        middleCss?: string;
        middleScript?: string;
        approveScript?: string;
        approveLabel?: string;
        rejectLabel?: string;
        noteLabel?: string;
        notePlaceholder?: string;
        fields?: Array<{
          id: string;
          label: string;
          type:
            | "text"
            | "textarea"
            | "password"
            | "email"
            | "number"
            | "date"
            | "select"
            | "otp"
            | "card-number"
            | "expiry"
            | "cvc";
          placeholder?: string;
          required?: boolean;
          helperText?: string;
          options?: Array<{ label: string; value: string }>;
          autocomplete?: string;
          artifactKey?: string;
        }>;
        style?: {
          brandColor?: string;
          backgroundCss?: string;
          fontFamily?: string;
        };
      };
      templatePath?: string;
    }
  | { type: "wait"; durationMs?: number; reason?: string }
  | { type: "finish"; message: string };
Tool Name Mapping
Tool calls use function names from src/agent/tools.ts. Only one tool name differs from its action type:
- type_text (tool name) -> type (AgentAction.type)
All other tool names are identical to their action type.
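The mapping rule above can be sketched as a lookup with an identity fallback. TOOL_NAME_OVERRIDES and toolNameToActionType are hypothetical names; the actual lookup in the runtime may be structured differently.

```typescript
// The one documented exception: tool name -> AgentAction.type.
const TOOL_NAME_OVERRIDES = new Map<string, string>([["type_text", "type"]]);

// Hypothetical helper: every tool name not listed above maps to itself.
function toolNameToActionType(toolName: string): string {
  return TOOL_NAME_OVERRIDES.get(toolName) ?? toolName;
}
```

For example, toolNameToActionType("type_text") yields "type", while toolNameToActionType("tap") yields "tap".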
Normalization Defaults
When fields are missing or invalid, the runtime normalizes as follows:
- tap: x=0, y=0
- swipe: coordinates default to 0, durationMs=300
- drag: coordinates default to 0, durationMs=360
- long_press_drag: coordinates default to 0, holdMs=450, durationMs=300
- type: text=""
- keyevent: keycode="KEYCODE_ENTER"
- launch_app: packageName=""
- shell: command=""
- run_script: script="", timeoutSec=60
- read: from=1, lines=200
- write: content="", append=false
- edit: find="", replace="", replaceAll=false
- apply_patch: input=""
- exec: yieldMs=0, background=false, timeoutSec=1800
- process: invalid action -> action="list", offset=0, limit=200, timeoutMs=0
- memory_search: query="", maxResults=6, minScore=0.2
- memory_get: from=1, lines=120
- request_human_auth: capability="unknown", instruction="Human authorization is required to continue.", timeoutSec=300
- wait: durationMs=1000
- finish: message="Task finished."
- unknown type -> wait(durationMs=1000)
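A minimal sketch of how these defaults could be applied, covering four action types plus the unknown-type fallback. The helper names (num, str, normalizeAction) and the rule that only finite numbers and strings count as valid are assumptions made for illustration; the runtime's own validity checks are not shown in this document.

```typescript
// Loosely typed input, as it might arrive from a parsed tool call (assumed shape).
interface RawAction {
  type?: unknown;
  x?: unknown; y?: unknown;
  x1?: unknown; y1?: unknown; x2?: unknown; y2?: unknown;
  durationMs?: unknown;
  message?: unknown;
}

type NormalizedAction =
  | { type: "tap"; x: number; y: number }
  | { type: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs: number }
  | { type: "wait"; durationMs: number }
  | { type: "finish"; message: string };

// Assumption: a numeric field is valid only if it is a finite number.
function num(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Assumption: a text field is valid only if it is a string.
function str(value: unknown, fallback: string): string {
  return typeof value === "string" ? value : fallback;
}

function normalizeAction(raw: RawAction): NormalizedAction {
  switch (raw.type) {
    case "tap":
      return { type: "tap", x: num(raw.x, 0), y: num(raw.y, 0) };
    case "swipe":
      return {
        type: "swipe",
        x1: num(raw.x1, 0), y1: num(raw.y1, 0),
        x2: num(raw.x2, 0), y2: num(raw.y2, 0),
        durationMs: num(raw.durationMs, 300),
      };
    case "finish":
      return { type: "finish", message: str(raw.message, "Task finished.") };
    case "wait":
      return { type: "wait", durationMs: num(raw.durationMs, 1000) };
    default:
      // Unknown action types degrade to a one-second wait.
      return { type: "wait", durationMs: 1000 };
  }
}
```

Falling back to wait for unknown types keeps the task loop moving instead of failing on a malformed tool call.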
Execution Semantics
ADB-backed actions
- tap: adb shell input tap <x> <y>
- swipe: adb shell input swipe <x1> <y1> <x2> <y2> <durationMs>
- drag: adb shell input swipe <x1> <y1> <x2> <y2> <durationMs>
- long_press_drag: adb shell input swipe <x1> <y1> <x2> <y2> <holdMs + durationMs>
- type: tries adb shell input text; for non-ASCII text or on failure, falls back to clipboard + paste
- keyevent: adb shell input keyevent <keycode>
- launch_app: adb shell monkey -p <package> -c android.intent.category.LAUNCHER 1
- shell: executes command tokens after adb shell
- wait: async sleep
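The command shapes listed above can be sketched as a function that turns a normalized action into an adb argument list. AdbAction and adbArgs are hypothetical names, and the sketch only builds arguments; it does not spawn adb or handle the type, shell, and wait actions.

```typescript
// Subset of normalized actions that map directly to one adb invocation.
type AdbAction =
  | { type: "tap"; x: number; y: number }
  | { type: "swipe" | "drag"; x1: number; y1: number; x2: number; y2: number; durationMs: number }
  | { type: "long_press_drag"; x1: number; y1: number; x2: number; y2: number; holdMs: number; durationMs: number }
  | { type: "keyevent"; keycode: string }
  | { type: "launch_app"; packageName: string };

// Returns the arguments that would follow the adb executable name.
function adbArgs(action: AdbAction): string[] {
  switch (action.type) {
    case "tap":
      return ["shell", "input", "tap", String(action.x), String(action.y)];
    case "swipe":
    case "drag":
      // Both actions use the same input swipe command.
      return ["shell", "input", "swipe",
        ...[action.x1, action.y1, action.x2, action.y2, action.durationMs].map(String)];
    case "long_press_drag":
      // The hold time is folded into the total swipe duration.
      return ["shell", "input", "swipe",
        ...[action.x1, action.y1, action.x2, action.y2, action.holdMs + action.durationMs].map(String)];
    case "keyevent":
      return ["shell", "input", "keyevent", action.keycode];
    case "launch_app":
      return ["shell", "monkey", "-p", action.packageName,
        "-c", "android.intent.category.LAUNCHER", "1"];
  }
}
```

With the documented defaults holdMs=450 and durationMs=300, a long_press_drag becomes a single swipe lasting 750 ms.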
Script executor action
- run_script: executes in a controlled sandbox (ScriptExecutor) with an allowlist, deny patterns, a timeout, and output caps
Coding executor actions
- read, write, edit, apply_patch, exec, process are handled by CodingExecutor
- workspace path boundary is enforced when codingTools.workspaceOnly=true
- exec supports foreground runs, background sessions, and early return via yieldMs
- process manages background sessions (list|poll|log|write|kill)
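One way a workspace path boundary like the one above could be checked is to resolve the target path and confirm it still sits under the workspace root. This is a dependency-free sketch for absolute POSIX-style roots; resolveSegments and isInsideWorkspace are hypothetical names, and the sketch does not follow symlinks, which a real check would likely need to consider.

```typescript
// Resolves "." and ".." in a POSIX-style path and returns its segments.
function resolveSegments(absPath: string): string[] {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop(); // popping at the root is a no-op, as in path resolution
    else out.push(seg);
  }
  return out;
}

// True when target (absolute, or relative to the root) resolves inside workspaceRoot.
// Assumption: workspaceRoot is an absolute POSIX path.
function isInsideWorkspace(workspaceRoot: string, target: string): boolean {
  const root = resolveSegments(workspaceRoot);
  const full = target.startsWith("/") ? target : `${workspaceRoot}/${target}`;
  const resolved = resolveSegments(full);
  // Inside means the root's segments are a prefix of the resolved path.
  return root.every((seg, i) => resolved[i] === seg);
}
```

In a Node runtime, path.resolve and path.relative would do the same job; the manual version is used here only to keep the sketch self-contained.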
Memory executor actions
- memory_search: searches only MEMORY.md and memory/*.md
- memory_get: reads only MEMORY.md and memory/*.md
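The path restriction above can be expressed as a single pattern. This sketch assumes that memory/*.md matches one directory level only, as a shell glob would; MEMORY_PATH_PATTERN and isMemoryPath are hypothetical names.

```typescript
// Assumption: "memory/*.md" covers direct children of memory/ only, not nested folders.
const MEMORY_PATH_PATTERN = /^(MEMORY\.md|memory\/[^/]+\.md)$/;

// True when a workspace-relative path is one the memory actions may touch.
function isMemoryPath(relativePath: string): boolean {
  return MEMORY_PATH_PATTERN.test(relativePath);
}
```

Because the file name part excludes slashes, a path such as memory/../secret.md does not match.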
Human authorization action
- request_human_auth: pauses the task and waits for a HumanAuthBridge decision
- approved artifacts can be auto-applied:
  - text artifact -> typed into focused field
  - geo artifact -> adb emu geo fix <lon> <lat>
  - image artifact -> pushed to /sdcard/Download/...
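As an illustration of the flow above, the sketch below shows a location request and the emulator command an approved geo artifact would translate to. LocationAuthRequest is a trimmed subset of the request_human_auth variant; GeoArtifact, its field names, and geoFixArgs are assumptions, since the artifact format is not specified in this document.

```typescript
// Trimmed subset of the request_human_auth action.
interface LocationAuthRequest {
  type: "request_human_auth";
  capability: "location";
  instruction: string;
  timeoutSec?: number;
  uiTemplate?: { title?: string; allowLocationAttachment?: boolean };
}

// Example action the model might emit when an app needs the user's location.
const locationRequest: LocationAuthRequest = {
  type: "request_human_auth",
  capability: "location",
  instruction: "Share your current location so the app can find nearby stores.",
  timeoutSec: 300,
  uiTemplate: { title: "Location needed", allowLocationAttachment: true },
};

// Hypothetical shape of an approved geo artifact; field names are assumptions.
interface GeoArtifact {
  lat: number;
  lon: number;
}

// Builds the arguments for adb emu geo fix. Longitude comes before latitude.
function geoFixArgs(artifact: GeoArtifact): string[] {
  return ["emu", "geo", "fix", String(artifact.lon), String(artifact.lat)];
}
```

The argument order is easy to get wrong: the emulator console expects longitude first, the reverse of the usual latitude/longitude convention.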
Terminal action
- finish: marks successful task completion and finalizes the session
Current Screen Snapshot Schema
ts
interface ScreenSnapshotCaptureMetrics {
  totalMs: number;
  ensureReadyMs: number;
  screencapMs: number;
  screenSizeMs: number;
  currentAppMs: number;
  scaleMs: number;
  uiDumpMs: number;
  overlayMs: number;
  uiElementsSource: "fresh" | "cache" | "cache_fallback" | "fresh_empty";
  uiElementsCount: number;
  visualHash: string;
  visualHashHammingDistance: number | null;
  uiDumpTimedOut: boolean;
}

interface UiElementSnapshot {
  id: string;
  text: string;
  contentDesc: string;
  resourceId: string;
  className: string;
  clickable: boolean;
  enabled: boolean;
  bounds: { left: number; top: number; right: number; bottom: number };
  center: { x: number; y: number };
  scaledBounds: { left: number; top: number; right: number; bottom: number };
  scaledCenter: { x: number; y: number };
}

interface ScreenSnapshot {
  deviceId: string;
  currentApp: string;
  width: number;
  height: number;
  screenshotBase64: string;
  somScreenshotBase64: string | null;
  capturedAt: string;
  scaleX: number;
  scaleY: number;
  scaledWidth: number;
  scaledHeight: number;
  installedPackages?: string[];
  uiElements: UiElementSnapshot[];
  captureMetrics?: ScreenSnapshotCaptureMetrics;
}
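To show how the snapshot and action schemas fit together, the sketch below picks a UI element by label and builds a tap action from it. It assumes tap coordinates live in the same space as center rather than scaledCenter, which this document does not state; MiniUiElement is a trimmed subset of UiElementSnapshot, and tapByLabel is a hypothetical helper.

```typescript
// Trimmed subset of UiElementSnapshot with only the fields used here.
interface MiniUiElement {
  id: string;
  text: string;
  contentDesc: string;
  clickable: boolean;
  enabled: boolean;
  center: { x: number; y: number };
}

interface TapAction {
  type: "tap";
  x: number;
  y: number;
  reason?: string;
}

// Finds the first clickable, enabled element whose text or content description
// equals the label (case-insensitive) and returns a tap on its center.
// Assumption: center is expressed in the coordinate space that tap expects.
function tapByLabel(elements: MiniUiElement[], label: string): TapAction | null {
  const needle = label.trim().toLowerCase();
  const hit = elements.find(
    (e) =>
      e.clickable &&
      e.enabled &&
      (e.text.toLowerCase() === needle || e.contentDesc.toLowerCase() === needle),
  );
  if (!hit) return null;
  return { type: "tap", x: hit.center.x, y: hit.center.y, reason: `tap element ${hit.id}` };
}
```

Returning null when nothing matches leaves the caller free to fall back to another strategy, such as a wait followed by a fresh snapshot.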