Introduction
What is Octos?
Octos is an open-source AI agent platform that turns any LLM into a multi-channel, multi-user intelligent assistant. You deploy a single Rust binary, connect your LLM API keys and messaging channels (Telegram, Discord, Slack, WhatsApp, Email, WeChat, and more), and Octos handles everything else: conversation routing, tool execution, memory, provider failover, and multi-tenant isolation.
Think of it as the backend operating system for AI agents. Instead of building a chatbot from scratch for each use case, you configure Octos profiles, each with its own system prompt, model, tools, and channels, and manage them all through a web dashboard or REST API. A small team can run hundreds of specialized AI agents on a single machine.
Octos is built for people who need more than a personal assistant: teams deploying AI for customer support across WhatsApp and Telegram, developers building AI-powered products on top of a REST API, researchers orchestrating multi-step research pipelines with different LLMs at each stage, or families sharing a single AI setup with per-person customization.
Operating Modes
Octos operates in two primary modes:
- Chat mode (octos chat): Interactive multi-turn conversation with tools, or single-message execution via --message.
- Gateway mode (octos gateway): Persistent daemon serving multiple messaging channels simultaneously.
Key Concepts
| Term | Description |
|---|---|
| Agent | AI that executes tasks using tools |
| Tool | A capability (shell, file ops, search, messaging) |
| Provider | LLM API service (Anthropic, OpenAI, etc.) |
| Channel | Messaging platform (CLI, Telegram, Slack, etc.) |
| Session | Conversation history per channel and chat ID |
| Sandbox | Isolated execution environment (bwrap, macOS sandbox-exec, Docker) |
| Tool Policy | Allow/deny rules controlling which tools are available |
| Skill | Reusable instruction template (SKILL.md) |
| Bootstrap | Context files loaded into system prompt (AGENTS.md, SOUL.md, etc.) |
Quick Start
This guide walks you through the essential steps to get Octos running.
1. Initialize Your Workspace
Navigate to your project directory and initialize Octos:
cd your-project
octos init
This creates a .octos/ directory with default configuration, bootstrap files (AGENTS.md, SOUL.md, USER.md), and directories for memory, sessions, and skills.
2. Set Your API Key
Export at least one LLM provider key:
export ANTHROPIC_API_KEY="sk-ant-..."
Add this to your ~/.bashrc or ~/.zshrc for persistence. You can also use octos auth login --provider openai for OAuth-based login.
3. Check Setup
Verify everything is configured correctly:
octos status
This shows your config file location, active provider and model, API key status, and bootstrap file availability.
4. Start Chatting
Launch an interactive multi-turn conversation:
octos chat
Or send a single message and exit:
octos chat --message "Add a hello function to lib.rs"
5. Run the Gateway
To serve multiple messaging channels as a persistent daemon:
octos gateway
This requires a gateway section in your config with at least one channel configured. See the Configuration chapter for details.
6. Launch the Web UI
If you built with the api feature, start the web dashboard:
octos serve
Then open http://localhost:8080 in your browser.
Installation & Deployment
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Rust | 1.85.0+ | Install via rustup.rs |
| macOS | 13+ | Apple Silicon or Intel |
| Linux | glibc 2.31+ | Ubuntu 20.04+, Debian 11+, Fedora 34+ |
| Windows | 10/11 | Native build or WSL2 |
You also need an API key from at least one supported LLM provider.
Optional Dependencies
| Dependency | Used For | Install |
|---|---|---|
| Node.js | WhatsApp bridge, PPTX creation skill | brew install node / apt install nodejs |
| ffmpeg | Media/video skills | brew install ffmpeg / apt install ffmpeg |
| Chrome/Chromium | Browser automation tool | brew install --cask chromium |
| LibreOffice | Office document conversion | brew install --cask libreoffice |
| Poppler | PDF rendering (pdftoppm) | brew install poppler / apt install poppler-utils |
Build from Source
git clone https://github.com/octos-org/octos
cd octos
# Basic (CLI, chat, run, gateway with CLI channel)
cargo install --path crates/octos-cli
# With messaging channels
cargo install --path crates/octos-cli --features telegram,discord,slack,whatsapp,feishu,email,wecom
# With browser automation (requires Chrome/Chromium)
cargo install --path crates/octos-cli --features browser
# With web UI and REST API
cargo install --path crates/octos-cli --features api
# Verify
octos --version
Deploy Script
For a streamlined installation, use the deploy script:
# Minimal install (CLI + chat only)
./scripts/local-deploy.sh --minimal
# Full install (all channels + dashboard + app-skills)
./scripts/local-deploy.sh --full
# Custom channels
./scripts/local-deploy.sh --channels telegram,discord,api
Platform-Specific Instructions
macOS
# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# 2. Install optional deps
brew install node ffmpeg poppler
brew install --cask libreoffice
# 3. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full
# 4. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat
Background service (launchd):
The deploy script creates ~/Library/LaunchAgents/io.octos.octos-serve.plist.
# Start service (survives reboot)
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Stop service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
# View logs
tail -f ~/.octos/serve.log
Linux (Ubuntu/Debian)
# 1. Install system deps
sudo apt update
sudo apt install -y build-essential pkg-config libssl-dev
# 2. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# 3. Install optional deps
sudo apt install -y nodejs npm ffmpeg poppler-utils
# 4. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full
# 5. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat
Background service (systemd user unit):
The deploy script creates ~/.config/systemd/user/octos-serve.service.
# Start service
systemctl --user start octos-serve
# Enable on boot (requires lingering)
loginctl enable-linger $USER
systemctl --user enable octos-serve
# View logs
journalctl --user -u octos-serve -f
# Stop service
systemctl --user stop octos-serve
Linux (Fedora/RHEL)
# System deps
sudo dnf install -y gcc pkg-config openssl-devel
# Then follow Ubuntu steps from step 2 onward
Windows (Native)
Octos builds and runs natively on Windows. Shell commands are executed via cmd /C.
# 1. Install Rust (download rustup-init.exe from https://rustup.rs)
rustup-init.exe
# 2. Clone and build
git clone https://github.com/octos-org/octos.git
cd octos
cargo install --path crates/octos-cli
# 3. Set API key and run
$env:ANTHROPIC_API_KEY = "sk-ant-..."
octos chat
Windows notes:
- Sandbox is disabled on Windows (no bubblewrap/sandbox-exec equivalent); shell commands run without isolation. Docker sandbox mode still works if Docker Desktop is installed.
- API keys are stored via Windows Credential Manager.
- Process management uses taskkill for cleanup.
Windows (WSL2)
Alternatively, use WSL2 for a Linux environment:
# 1. Install WSL2 (PowerShell as admin)
wsl --install -d Ubuntu
# 2. Open Ubuntu terminal, then follow Linux (Ubuntu) steps above
When running octos serve inside WSL2, the dashboard is accessible from your Windows browser at http://localhost:8080 (WSL2 auto-forwards ports).
Docker
docker compose --profile gateway up -d
Deploy Script Reference
./scripts/local-deploy.sh [OPTIONS]
Options:
--minimal CLI + chat only (no channels, no dashboard)
--full All channels + dashboard + app-skills
--channels LIST Comma-separated: telegram,discord,slack,whatsapp,feishu,email,twilio,wecom
--no-skills Skip building app-skills
--no-service Skip launchd/systemd service setup
--uninstall Remove binaries and service files
--debug Build in debug mode (faster compile, larger binary)
--prefix DIR Install prefix (default: ~/.cargo/bin)
On Windows, use .\scripts\local-deploy.ps1 (PowerShell) with the same options.
What the script does:
- Checks prerequisites (Rust, platform deps)
- Builds the octos binary with selected features
- Builds app-skill binaries (unless --no-skills)
- Signs binaries on macOS (ad-hoc codesign)
- Runs octos init if ~/.octos doesn't exist
- Creates background service file (launchd on macOS, systemd on Linux)
Uninstall:
./scripts/local-deploy.sh --uninstall
# Data directory (~/.octos) is NOT removed. Delete manually:
rm -rf ~/.octos
Post-Install Verification
Set API Keys
Set at least one LLM provider key:
# Add to ~/.bashrc, ~/.zshrc, or ~/.profile
export ANTHROPIC_API_KEY=sk-ant-...
# Or
export OPENAI_API_KEY=sk-...
# Or use OAuth login
octos auth login --provider openai
Verify
octos --version # Check binary
octos status # Check config + API keys
octos chat --message "Hello" # Quick test
Upgrading
cd octos
git pull origin main
./scripts/local-deploy.sh --full # Rebuilds and reinstalls
# If running as a service, restart it:
# macOS:
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Linux:
systemctl --user restart octos-serve
Troubleshooting
| Problem | Solution |
|---|---|
octos: command not found | Add ~/.cargo/bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |
| Build fails on Linux | Install build-essential pkg-config libssl-dev |
| macOS codesign warning | Run: codesign -s - ~/.cargo/bin/octos |
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080 |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown then reopen terminal |
| Service won't start | Check logs: tail -f ~/.octos/serve.log or journalctl --user -u octos-serve |
| API key not found | Ensure env var is set in the service environment, not just your shell |
Configuration
Config File Locations
Configuration files are loaded in order (first found wins):
1. .octos/config.json (project-local configuration)
2. ~/.config/octos/config.json (global configuration)
Basic Config
A minimal configuration specifies the LLM provider and model:
{
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"api_key_env": "ANTHROPIC_API_KEY"
}
Gateway Config
To run Octos as a multi-channel daemon, add a gateway section:
{
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"gateway": {
"channels": [
{"type": "cli"},
{"type": "telegram", "allowed_senders": ["123456789"]},
{"type": "discord", "settings": {"token_env": "DISCORD_BOT_TOKEN"}},
{"type": "slack", "settings": {"bot_token_env": "SLACK_BOT_TOKEN", "app_token_env": "SLACK_APP_TOKEN"}},
{"type": "whatsapp", "settings": {"bridge_url": "ws://localhost:3001"}},
{"type": "feishu", "settings": {"app_id_env": "FEISHU_APP_ID", "app_secret_env": "FEISHU_APP_SECRET"}}
],
"max_history": 50,
"system_prompt": "You are a helpful assistant."
}
}
Environment Variable Expansion
Use ${VAR_NAME} syntax anywhere in config values:
{
"base_url": "${ANTHROPIC_BASE_URL}",
"model": "${OCTOS_MODEL}"
}
Full Config Reference
The complete configuration structure with all available fields:
{
"version": 1,
// LLM Provider
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"base_url": null,
"api_key_env": null,
"api_type": null,
// Fallback chain
"fallback_models": [
{
"provider": "deepseek",
"model": "deepseek-chat",
"base_url": null,
"api_key_env": "DEEPSEEK_API_KEY"
}
],
// Adaptive routing
"adaptive_routing": {
"enabled": false,
"latency_threshold_ms": 30000,
"error_rate_threshold": 0.3,
"probe_probability": 0.1,
"probe_interval_secs": 60,
"failure_threshold": 3
},
// Gateway
"gateway": {
"channels": [{"type": "cli"}],
"max_history": 50,
"system_prompt": null,
"queue_mode": "followup",
"max_sessions": 1000,
"max_concurrent_sessions": 10,
"llm_timeout_secs": null,
"llm_connect_timeout_secs": null,
"tool_timeout_secs": null,
"session_timeout_secs": null,
"browser_timeout_secs": null
},
// Tool policies
"tool_policy": {"allow": [], "deny": []},
"tool_policy_by_provider": {},
"context_filter": [],
// Sub-providers (for spawn tool)
"sub_providers": [
{
"key": "cheap",
"provider": "deepseek",
"model": "deepseek-chat",
"description": "Fast model for simple tasks"
}
],
// Agent settings
"max_iterations": 50,
// Embedding (for vector search in memory)
"embedding": {
"provider": "openai",
"api_key_env": "OPENAI_API_KEY",
"base_url": null
},
// Voice
"voice": {
"auto_asr": true,
"auto_tts": false,
"default_voice": "vivian",
"asr_language": null
},
// Hooks
"hooks": [],
// MCP servers
"mcp_servers": [],
// Sandbox
"sandbox": {
"enabled": true,
"mode": "auto",
"allow_network": false
},
// Email (for email channel)
"email": null,
// Dashboard auth (serve mode only)
"dashboard_auth": null,
// Monitor (serve mode only)
"monitor": null
}
Environment Variables
LLM Providers
| Variable | Description |
|---|---|
ANTHROPIC_API_KEY | Anthropic (Claude) API key |
OPENAI_API_KEY | OpenAI API key |
GEMINI_API_KEY | Google Gemini API key |
OPENROUTER_API_KEY | OpenRouter API key |
DEEPSEEK_API_KEY | DeepSeek API key |
GROQ_API_KEY | Groq API key |
MOONSHOT_API_KEY | Moonshot/Kimi API key |
DASHSCOPE_API_KEY | Alibaba DashScope (Qwen) API key |
MINIMAX_API_KEY | MiniMax API key |
ZHIPU_API_KEY | Zhipu (GLM) API key |
ZAI_API_KEY | Z.AI API key |
NVIDIA_API_KEY | Nvidia NIM API key |
Search
| Variable | Description |
|---|---|
BRAVE_API_KEY | Brave Search API key |
PERPLEXITY_API_KEY | Perplexity Sonar API key |
YDC_API_KEY | You.com API key |
Channels
| Variable | Description |
|---|---|
TELEGRAM_BOT_TOKEN | Telegram bot token |
DISCORD_BOT_TOKEN | Discord bot token |
SLACK_BOT_TOKEN | Slack bot token |
SLACK_APP_TOKEN | Slack app-level token |
FEISHU_APP_ID | Feishu/Lark app ID |
FEISHU_APP_SECRET | Feishu/Lark app secret |
WECOM_CORP_ID | WeCom corp ID |
WECOM_AGENT_SECRET | WeCom agent secret |
EMAIL_USERNAME | Email account username |
EMAIL_PASSWORD | Email account password |
Email (send-email skill)
| Variable | Description |
|---|---|
SMTP_HOST | SMTP server hostname |
SMTP_PORT | SMTP server port |
SMTP_USERNAME | SMTP username |
SMTP_PASSWORD | SMTP password |
SMTP_FROM | SMTP from address |
LARK_APP_ID | Feishu mail app ID |
LARK_APP_SECRET | Feishu mail app secret |
LARK_FROM_ADDRESS | Feishu mail from address |
Voice
| Variable | Description |
|---|---|
OMINIX_API_URL | OminiX ASR/TTS API URL |
System
| Variable | Description |
|---|---|
RUST_LOG | Log level (error/warn/info/debug/trace) |
OCTOS_LOG_JSON | Enable JSON-formatted logs (set to any value) |
File Layout
~/.octos/                  # Global config directory
├── auth.json              # Stored API credentials (mode 0600)
├── profiles/              # Profile configs (serve mode)
│   ├── my-bot.json
│   └── work-bot.json
├── skills/                # Global custom skills
└── serve.log              # Serve mode log file

.octos/                    # Project/profile data directory
├── config.json            # Configuration
├── cron.json              # Scheduled jobs
├── AGENTS.md              # Agent instructions
├── SOUL.md                # Personality definition
├── USER.md                # User information
├── HEARTBEAT.md           # Background tasks
├── sessions/              # Chat history (JSONL)
├── memory/                # Memory files
│   ├── MEMORY.md          # Long-term
│   └── 2025-02-10.md      # Daily
├── skills/                # Custom skills
├── episodes.redb          # Episodic memory DB
└── history/
    └── chat_history       # Readline history
LLM Providers & Routing
Octos supports 14 LLM providers out of the box. Each provider needs an API key stored in an environment variable (except local providers like Ollama).
Supported Providers
| Provider | Env Variable | Default Model | API Format | Aliases |
|---|---|---|---|---|
anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 | Native Anthropic | (none) |
openai | OPENAI_API_KEY | gpt-4o | Native OpenAI | (none) |
gemini | GEMINI_API_KEY | gemini-2.0-flash | Native Gemini | (none) |
openrouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4-20250514 | Native OpenRouter | (none) |
deepseek | DEEPSEEK_API_KEY | deepseek-chat | OpenAI-compatible | (none) |
groq | GROQ_API_KEY | llama-3.3-70b-versatile | OpenAI-compatible | (none) |
moonshot | MOONSHOT_API_KEY | kimi-k2.5 | OpenAI-compatible | kimi |
dashscope | DASHSCOPE_API_KEY | qwen-max | OpenAI-compatible | qwen |
minimax | MINIMAX_API_KEY | MiniMax-Text-01 | OpenAI-compatible | (none) |
zhipu | ZHIPU_API_KEY | glm-4-plus | OpenAI-compatible | glm |
zai | ZAI_API_KEY | glm-5 | Anthropic-compatible | z.ai |
nvidia | NVIDIA_API_KEY | meta/llama-3.3-70b-instruct | OpenAI-compatible | nim |
ollama | (none) | llama3.2 | OpenAI-compatible | (none) |
vllm | VLLM_API_KEY | (must specify) | OpenAI-compatible | (none) |
Configuration Methods
Config File
Set provider and model in your config.json:
{
"provider": "moonshot",
"model": "kimi-2.5",
"api_key_env": "KIMI_API_KEY"
}
The api_key_env field overrides the default environment variable name for the provider. For example, Moonshot defaults to MOONSHOT_API_KEY, but you can point it at KIMI_API_KEY instead.
CLI Flags
octos chat --provider deepseek --model deepseek-chat
octos chat --model gpt-4o # auto-detects provider from model name
Auth Store
Instead of environment variables, you can store API keys through the auth CLI:
# OAuth PKCE (OpenAI)
octos auth login --provider openai
# Device code flow (OpenAI)
octos auth login --provider openai --device-code
# Paste-token (all other providers)
octos auth login --provider anthropic
# -> prompts: "Paste your API key:"
# Check stored credentials
octos auth status
# Remove credentials
octos auth logout --provider openai
Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.
Auto-Detection
When --provider is omitted, Octos infers the provider from the model name:
| Model Pattern | Detected Provider |
|---|---|
claude-* | anthropic |
gpt-*, o1-*, o3-*, o4-* | openai |
gemini-* | gemini |
deepseek-* | deepseek |
kimi-*, moonshot-* | moonshot |
qwen-* | dashscope |
glm-* | zhipu |
llama-* | groq |
octos chat --model gpt-4o # -> openai
octos chat --model claude-sonnet-4-20250514 # -> anthropic
octos chat --model deepseek-chat # -> deepseek
octos chat --model glm-4-plus # -> zhipu
octos chat --model qwen-max # -> dashscope
Custom Endpoints
Use base_url to point at self-hosted or proxy endpoints:
{
"provider": "openai",
"model": "gpt-4o",
"base_url": "https://your-azure-endpoint.openai.azure.com/v1"
}
{
"provider": "ollama",
"model": "llama3.2",
"base_url": "http://localhost:11434/v1"
}
{
"provider": "vllm",
"model": "meta-llama/Llama-3-70b",
"base_url": "http://localhost:8000/v1"
}
API Type Override
The api_type field forces a specific wire format when a provider uses a non-standard protocol:
{
"provider": "zai",
"model": "glm-5",
"api_type": "anthropic"
}
"openai"β OpenAI Chat Completions format (default for most providers)"anthropic"β Anthropic Messages format (for Anthropic-compatible proxies)
Fallback Chains
Configure a priority-ordered fallback chain. If the primary provider fails, the next provider in the list is tried automatically:
{
"provider": "moonshot",
"model": "kimi-2.5",
"fallback_models": [
{
"provider": "deepseek",
"model": "deepseek-chat",
"api_key_env": "DEEPSEEK_API_KEY"
},
{
"provider": "gemini",
"model": "gemini-2.0-flash",
"api_key_env": "GEMINI_API_KEY"
}
]
}
Failover rules:
- 401/403 (authentication errors) -> fail over immediately; no retry on the same provider
- 429 (rate limit) / 5xx (server errors) -> retry with exponential backoff, then fail over
- 400 (content-format errors) -> fail over if the error contains "must not be empty", "reasoning_content", "API key not valid", or "invalid_value"
- Timeouts -> fail over immediately, no retry (don't waste 120s × retries on an unresponsive provider)
- Circuit breaker -> 3 consecutive failures mark a provider as degraded
Adaptive Routing
When multiple fallback models are configured, adaptive routing dynamically selects the best provider based on real-time performance metrics instead of following the static priority order. Three mutually exclusive modes are available:
{
"adaptive_routing": {
"mode": "hedge",
"qos_ranking": true,
"latency_threshold_ms": 30000,
"error_rate_threshold": 0.3,
"probe_probability": 0.1,
"probe_interval_secs": 60,
"failure_threshold": 3,
"weight_latency": 0.3,
"weight_error_rate": 0.3,
"weight_priority": 0.2,
"weight_cost": 0.2
}
}
Adaptive Modes
| Mode | Description |
|---|---|
off (default) | Static priority order. Failover only when a provider is circuit-broken (N consecutive failures). No scoring, no racing. |
hedge | Hedged racing: fire each request to 2 providers simultaneously, take the winner, cancel the loser. Both results accumulate QoS metrics. |
lane | Score-based lane changing: dynamically pick the best single provider based on a 4-factor scoring formula. Cheaper than hedge (no duplicate requests). |
QoS Ranking
Setting qos_ranking: true enables quality-of-service ranking using a unified model catalog (model_catalog.json). The catalog provides baseline metrics (stability, latency, output quality) that blend with live traffic data via EMA:
- Cold start: Baseline catalog values are used (10 synthetic samples seeded).
- Warm state: Live metrics gradually replace baselines (weight ramps from 0 to 1 over 10 calls).
- Export: The live catalog is exported to model_catalog.json for observability.
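The baseline-to-live ramp can be sketched like this (illustrative only; the 10-call linear ramp follows the description above, everything else is an assumption):

```python
# Blend a catalog baseline with a live metric. The live weight ramps
# linearly from 0 to 1 over the first 10 calls, matching the
# cold-start / warm-state behavior described above.
def blended_metric(baseline: float, live: float, calls: int) -> float:
    w = min(calls / 10.0, 1.0)           # 0.0 at cold start, 1.0 after 10 calls
    return (1.0 - w) * baseline + w * live
```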
Scoring Formula
Each provider is scored on 4 factors (lower score = better). All weights are configurable via adaptive_routing:
| Factor | Weight key | Default | Description |
|---|---|---|---|
| Stability | weight_error_rate | 0.3 | Blended baseline + live error rate. EMA blend: weight ramps from 0 to 1 over 10 calls. |
| Quality | weight_latency | 0.3 | 60% normalized ds_output quality + 40% normalized throughput (output tokens/sec EMA) |
| Priority | weight_priority | 0.2 | Config-order preference (primary = 0), normalized to [0, 1] |
| Cost | weight_cost | 0.2 | Normalized output cost per million tokens. Unknown cost -> 0 (no penalty) |
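With the default weights, the score computation looks roughly like this (inputs assumed pre-normalized to [0, 1]; inverting quality so that lower is uniformly better is an assumption):

```python
# 4-factor provider score; lower is better. Weights default to the
# values in the table above.
def provider_score(error_rate: float, quality: float, priority: float, cost: float,
                   w_stability: float = 0.3, w_quality: float = 0.3,
                   w_priority: float = 0.2, w_cost: float = 0.2) -> float:
    return (w_stability * error_rate        # blended error rate, already in [0, 1]
            + w_quality * (1.0 - quality)   # invert: higher quality -> lower score
            + w_priority * priority         # primary provider contributes 0
            + w_cost * cost)                # unknown cost should be passed as 0.0
```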
Provider Metadata
| Setting | Default | Description |
|---|---|---|
latency_threshold_ms | 30000 | Providers with average latency above this are penalized |
error_rate_threshold | 0.3 | Providers with error rates above 30% are deprioritized |
probe_probability | 0.1 | Fraction of requests sent to non-primary providers as health probes |
probe_interval_secs | 60 | Minimum seconds between probes to the same provider |
failure_threshold | 3 | Consecutive failures before the circuit breaker opens |
Hedge Mode Details
When Hedge is active:
- The primary provider and the cheapest alternate are raced via tokio::select!.
- The winner's response is returned; the loser is cancelled.
- Both completed requests record metrics (cancelled requests do not).
- If the primary fails, the alternate is tried sequentially (it was cancelled by the race).
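In asyncio terms (the real implementation uses tokio::select! in Rust), the race looks roughly like this:

```python
import asyncio

# Sketch of hedged racing: start both requests, return the first to
# finish, cancel the other. Error handling is omitted for brevity.
async def hedge(primary, alternate):
    tasks = [asyncio.create_task(primary()), asyncio.create_task(alternate())]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                    # loser is cancelled, records no metrics
    return done.pop().result()
```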
Auto-Escalation
When sustained latency degradation is detected (3 consecutive responses exceeding 3× baseline), the session actor auto-activates Hedge mode + Speculative queue. The ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then adapts every 20 samples via an 80/20 EMA blend with the current window median. When the provider recovers (one normal-latency response), both revert to normal.
Provider Wrappers
The routing stack is composed of layered wrappers:
| Wrapper | Purpose |
|---|---|
AdaptiveRouter | Top-level: metrics-driven scoring, Hedge/Lane modes, circuit breaker, probe requests |
ProviderChain | Ordered failover with per-provider circuit breaker (failure count β₯ threshold β degraded) |
FallbackProvider | Primary + QoS-ranked fallbacks with cooldown tracking via ProviderRouter |
RetryProvider | Exponential backoff on 429/5xx. Timeout β no retry (failover instead) |
ProviderRouter | Sub-agent multi-model routing. Prefix-based key resolution, cooldown, QoS-scored fallbacks |
SwappableProvider | Runtime model swap via RwLock (e.g. switch_model tool). Leaks ~50 bytes per swap |
Gateway & Channels
Octos runs as a gateway that bridges messaging platforms to your LLM agent. Each platform connection is called a channel. You can run multiple channels simultaneously (for example, Telegram and Slack in the same gateway process).
Channel Overview
Channels are configured in the gateway.channels array of your config.json. Each entry specifies a type, optional allowed_senders for access control, and platform-specific settings.
Check which channels are compiled and configured:
octos channels status
This shows a table with each channel's compile status (feature flags) and config summary (environment variables set or missing).
Telegram
Requires a bot token from @BotFather.
export TELEGRAM_BOT_TOKEN="123456:ABC..."
{
"type": "telegram",
"allowed_senders": ["your_user_id"],
"settings": {
"token_env": "TELEGRAM_BOT_TOKEN"
}
}
Telegram supports bot commands, inline keyboards, voice messages, images, and files.
Slack
Requires a Socket Mode app with both a bot token and an app-level token.
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
{
"type": "slack",
"settings": {
"bot_token_env": "SLACK_BOT_TOKEN",
"app_token_env": "SLACK_APP_TOKEN"
}
}
Discord
Requires a bot token from the Discord Developer Portal.
export DISCORD_BOT_TOKEN="..."
{
"type": "discord",
"settings": {
"token_env": "DISCORD_BOT_TOKEN"
}
}
WhatsApp
Requires a Node.js bridge (Baileys) running at a WebSocket URL.
{
"type": "whatsapp",
"settings": {
"bridge_url": "ws://localhost:3001"
}
}
Feishu (China)
Feishu uses WebSocket long-connection mode by default (no public URL needed).
export FEISHU_APP_ID="cli_..."
export FEISHU_APP_SECRET="..."
{
"type": "feishu",
"settings": {
"app_id_env": "FEISHU_APP_ID",
"app_secret_env": "FEISHU_APP_SECRET"
}
}
Build with the feishu feature flag:
cargo build --release -p octos-cli --features feishu
Lark (International)
Larksuite (international) does not support WebSocket mode. Use webhook mode instead, where Lark pushes events to your server via HTTP POST.
Lark Cloud --> ngrok --> localhost:9321/webhook/event --> Gateway --> LLM
Developer Console Setup
- Go to open.larksuite.com/app and create (or select) an app
- Add Bot capability under Features
- Configure event subscription:
- Events & Callbacks > Event Configuration > Edit subscription method
- Select "Send events to developer server"
- Set request URL to https://YOUR_NGROK_URL/webhook/event
- Add event: im.message.receive_v1 (Receive Message)
- Enable permissions: im:message, im:message:send_as_bot, im:resource
- Publish the app: App Release > Version Management > Create Version > Apply for Online Release
Config
export LARK_APP_ID="cli_..."
export LARK_APP_SECRET="..."
{
"type": "lark",
"allowed_senders": [],
"settings": {
"app_id_env": "LARK_APP_ID",
"app_secret_env": "LARK_APP_SECRET",
"region": "global",
"mode": "webhook",
"webhook_port": 9321
}
}
Settings Reference
| Setting | Description | Default |
|---|---|---|
app_id_env | Env var name for App ID | FEISHU_APP_ID |
app_secret_env | Env var name for App Secret | FEISHU_APP_SECRET |
region | "cn" (Feishu) or "global" / "lark" (Larksuite) | "cn" |
mode | "ws" (WebSocket) or "webhook" (HTTP) | "ws" |
webhook_port | Port for webhook HTTP server | 9321 |
encrypt_key | Encrypt Key from Lark console (for AES-256-CBC) | none |
verification_token | Verification Token from Lark console | none |
Encryption (Optional)
If you configure an Encrypt Key in the Lark console (Events & Callbacks > Encryption Strategy), add it to your config:
{
"type": "lark",
"settings": {
"app_id_env": "LARK_APP_ID",
"app_secret_env": "LARK_APP_SECRET",
"region": "global",
"mode": "webhook",
"webhook_port": 9321,
"encrypt_key": "your-encrypt-key-here",
"verification_token": "your-verification-token"
}
}
With encryption enabled, Lark sends encrypted POST bodies. The gateway decrypts using AES-256-CBC with SHA-256 key derivation and validates signatures via the X-Lark-Signature header.
Supported Message Types
Inbound: text, images, files (PDF, docs), audio, video, stickers
Outbound: markdown (via interactive cards), image upload, file upload
Running
# Start ngrok tunnel
ngrok http 9321
# Start gateway
LARK_APP_ID="cli_xxxxx" LARK_APP_SECRET="xxxxx" octos gateway --cwd /path/to/workdir
Troubleshooting
| Issue | Solution |
|---|---|
| 404 on WS endpoint | Larksuite international does not support WebSocket. Use "mode": "webhook" |
| Challenge verification fails | Ensure ngrok is running and the URL matches the Lark console |
| No events received | Publish the app version after adding events. Check Event Log in the console |
| Bot does not reply | Verify im:message:send_as_bot permission is granted |
| Ngrok URL changed | Free ngrok URLs change on restart. Update the request URL in Lark console |
Email (IMAP/SMTP)
Polls an IMAP inbox for inbound messages and replies via SMTP. Feature-gated behind email.
export EMAIL_USERNAME="bot@example.com"
export EMAIL_PASSWORD="app-specific-password"
{
"type": "email",
"allowed_senders": ["trusted@example.com"],
"settings": {
"imap_host": "imap.gmail.com",
"imap_port": 993,
"smtp_host": "smtp.gmail.com",
"smtp_port": 465,
"username_env": "EMAIL_USERNAME",
"password_env": "EMAIL_PASSWORD",
"from_address": "bot@example.com",
"poll_interval_secs": 30,
"max_body_chars": 10000
}
}
WeCom (WeChat Work)
Requires a Custom App with a message callback URL. Feature-gated behind wecom.
export WECOM_CORP_ID="ww..."
export WECOM_AGENT_SECRET="..."
{
"type": "wecom",
"settings": {
"corp_id_env": "WECOM_CORP_ID",
"agent_secret_env": "WECOM_AGENT_SECRET",
"agent_id": "1000002",
"verification_token": "...",
"encoding_aes_key": "...",
"webhook_port": 9322
}
}
WeChat (via WorkBuddy Bridge)
Regular WeChat users can connect to your agent through a WorkBuddy desktop bridge. WorkBuddy handles the WeChat transport; Octos handles the AI logic via its WeCom Bot channel.
WeChat (mobile) --> WorkBuddy (desktop) --> WeCom group robot (WSS) --> octos wecom-bot channel
Setup
- Create a WeCom group robot in the WeCom Admin Console under Applications > Group Robot. Note the Bot ID and Secret.
- Configure the wecom-bot channel:
export WECOM_BOT_SECRET="your_robot_secret_here"
{
"type": "wecom-bot",
"allowed_senders": [],
"settings": {
"bot_id": "YOUR_BOT_ID",
"secret_env": "WECOM_BOT_SECRET"
}
}
- Build and start:
cargo build --release -p octos-cli --features "wecom-bot"
octos gateway
- Install the WorkBuddy desktop client, link it to your WeChat via QR scan, and connect it to the same WeCom group robot.
Connection Details
| Property | Value |
|---|---|
| Protocol | WebSocket (WSS) |
| Endpoint | wss://openws.work.weixin.qq.com |
| Heartbeat | Ping/pong every 30 seconds |
| Auto-reconnect | Yes, exponential backoff (5s to 60s) |
| Max message length | 4096 characters |
| Message format | Markdown |
The wecom-bot channel uses an outbound WebSocket connection, so no public URL or port forwarding is required. This makes it suitable for servers behind NAT or firewalls.
Limitations
- Text only: voice and image messages are passed as placeholders
- No message editing: responses are sent as new messages
- One direction: WeChat-to-Octos is automatic; for proactive messages, use cron jobs
Session Control Commands
In any gateway channel, the following commands manage conversation sessions:
| Command | Description |
|---|---|
/new | Create a new session (forks the last 10 messages from the current conversation) |
/new <name> | Create a named session |
/s <name> | Switch to a named session |
/s | Switch to the default session |
/sessions | List all sessions for this chat |
/back | Switch to the previously active session |
/delete | Delete the current session |
Only one session is active at a time per chat. Messages are routed to the active session. Inactive sessions can still run background tasks (deep search, pipelines, etc.). When an inactive session finishes work, you receive a notification; use /s <name> to view the results.
Voice Transcription
Voice and audio messages from channels are automatically transcribed before being sent to the agent. The system tries local ASR first (via the OminiX engine) and falls back to cloud-based Whisper when local ASR is unavailable. The transcription is prepended as [transcription: ...].
# Local ASR (preferred) -- set automatically by octos serve
export OMINIX_API_URL="http://localhost:8080"
# Cloud fallback
export GROQ_API_KEY="gsk_..."
Voice configuration in config.json:
{
"voice": {
"auto_asr": true,
"auto_tts": true,
"default_voice": "vivian",
"asr_language": null
}
}
- auto_asr: automatically transcribe incoming voice/audio messages
- auto_tts: automatically synthesize voice replies when the user sends voice
- default_voice: voice preset for auto-TTS
- asr_language: force a specific language for transcription (null = auto-detect)
Access Control
Use allowed_senders to restrict who can interact with the agent. An empty list allows everyone.
{
"type": "telegram",
"allowed_senders": ["123456", "789012"]
}
Each channel type uses its own sender identifier format (Telegram user IDs, email addresses, WeCom user IDs, etc.).
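The access check behaves like a simple membership test. A minimal Python sketch of the documented semantics (the function name is illustrative, not an Octos API):

```python
def sender_allowed(sender_id: str, allowed_senders: list[str]) -> bool:
    """Empty allowed_senders list permits everyone; otherwise the
    sender's channel-specific ID must appear in the list verbatim."""
    return not allowed_senders or sender_id in allowed_senders

# Empty list: everyone is allowed.
assert sender_allowed("999999", [])
# Non-empty list: only listed IDs pass.
assert sender_allowed("123456", ["123456", "789012"])
assert not sender_allowed("999999", ["123456", "789012"])
```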
Cron Jobs
The agent can schedule recurring tasks that deliver messages through any channel:
octos cron list # List active jobs
octos cron list --all # Include disabled jobs
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"
octos cron remove <job-id>
octos cron enable <job-id> # Enable a job
octos cron enable <job-id> --disable # Disable a job
Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.
Message Coalescing
Long responses are automatically split into channel-safe chunks:
| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Split preference: paragraph boundary > newline > sentence end > space > hard cut.
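The split preference above can be sketched as a small Python function (illustrative only; the actual chunker also handles code blocks and markdown, which this sketch ignores):

```python
def split_point(text: str, limit: int) -> int:
    """Pick where to split an over-long message, preferring (in order)
    a paragraph boundary, a newline, a sentence end, a space, and
    finally a hard cut at the channel limit."""
    window = text[:limit]
    for marker in ("\n\n", "\n", ". ", " "):
        idx = window.rfind(marker)
        if idx > 0:
            return idx + len(marker)
    return limit  # hard cut: no natural boundary found

msg = "First paragraph.\n\nSecond paragraph that keeps going."
cut = split_point(msg, 30)
chunk, rest = msg[:cut], msg[cut:]  # splits at the paragraph boundary
```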
Config Hot-Reload
The gateway detects config file changes automatically:
- Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
- Restart required: provider, model, API keys, channel settings
Changes are detected via SHA-256 hashing with debounce.
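The detection step can be sketched with Python's standard hashlib (illustrative only; the debounce timing and file watching are omitted):

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the config file contents."""
    return hashlib.sha256(data).hexdigest()

# A changed digest means the file really changed on disk, so a reload
# is triggered; an identical digest lets spurious FS events be ignored.
before = file_digest(b'{"provider": "deepseek"}')
after = file_digest(b'{"provider": "openai"}')
reload_needed = before != after  # True
```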
Memory & Skills
Octos has a layered memory system and an extensible skill framework. Memory gives the agent persistent context across sessions. Skills give the agent new tools and capabilities.
Bootstrap Files
These files are loaded into the system prompt at startup. Create them with octos init.
| File | Purpose |
|---|---|
| .octos/AGENTS.md | Agent instructions and guidelines |
| .octos/SOUL.md | Personality and values |
| .octos/USER.md | User information and preferences |
| .octos/TOOLS.md | Tool-specific guidance |
| .octos/IDENTITY.md | Custom identity definition |
Bootstrap files are hot-reloaded: edit them and the agent picks up changes without a restart.
Memory System
Octos uses a 3-layer memory architecture that combines automatic recording with agent-driven knowledge management:
┌──────────────────────────────────────────────────────────────────┐
│ System Prompt (every turn)                                       │
│                                                                  │
│ 1. Episodic Memory   <-- top 6 relevant past task experiences    │
│ 2. Memory Context    <-- MEMORY.md + recent 7 days daily notes   │
│ 3. Entity Bank       <-- one-line abstracts of all known entities│
│                                                                  │
│ Tools: save_memory / recall_memory (entity bank CRUD)            │
└──────────────────────────────────────────────────────────────────┘
Layer 1: Episodic Memory (automatic)
Every completed task is automatically recorded as an episode in episodes.redb, a persistent embedded database. Each episode stores:
- Summary: LLM-generated, truncated to 500 chars
- Outcome: Success, Failure, Blocked, or Cancelled
- Files modified: list of file paths touched during the task
- Key decisions: notable choices made during execution
- Working directory: scope for directory-scoped retrieval
At the start of each new task, the agent queries the episode store for up to 6 relevant past experiences using:
- Hybrid search (default when embedding is configured): combines BM25 keyword matching (30% weight) with HNSW vector similarity (70% weight)
- Keyword search (fallback when no embedder): matches query terms against episode summaries, scoped to the same working directory
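With the default weights, the hybrid ranking can be written out as a small Python function (a sketch of the documented 70/30 blend, assuming both scores are pre-normalized to [0, 1]):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Combine the two retrieval signals with the documented defaults:
    70% HNSW vector similarity, 30% BM25 keyword matching."""
    return vector_weight * vector_score + bm25_weight * bm25_score

# An episode with strong semantic similarity but weak keyword overlap
# still ranks highly, because the vector signal dominates:
score = hybrid_score(vector_score=0.9, bm25_score=0.2)  # 0.69
```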
Embedding configuration (in config.json):
{
"embedding": {
"provider": "openai",
"api_key_env": "OPENAI_API_KEY",
"base_url": null
}
}
When configured, the agent embeds each episode summary in a fire-and-forget background task and stores the vector alongside the episode. At query time, the task instruction is embedded and used for vector search. When omitted, the system falls back to BM25-only keyword matching.
Layer 2: Long-Term Memory & Daily Notes (file-based)
Long-term memory (.octos/memory/MEMORY.md) holds persistent facts and notes that survive across all sessions. Edit this file manually or via the write_file tool; it is injected verbatim into the system prompt on every turn.
Daily notes (.octos/memory/YYYY-MM-DD.md) provide a rolling window of recent activity. The last 7 days of daily notes are automatically included in the agent's context. These files can be created manually or via the write_file tool.
Note: Daily notes are read by the system prompt builder but are not auto-populated. You can populate them manually or instruct the agent to write to them using write_file.
Layer 3: Entity Bank (tool-driven)
The entity bank is a structured knowledge store at .octos/memory/bank/entities/. Each entity is a markdown file containing everything the agent knows about a specific topic.
How it works:
- Abstracts in prompt: the first non-heading line of each entity becomes a one-line abstract. All abstracts are injected into the system prompt, giving the agent a compact index of everything it knows.
- Full pages on demand: the agent uses the recall_memory tool to load the full content of a specific entity when it needs more detail.
- Agent-managed: the agent decides when to create and update entities using the save_memory tool.
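The abstract extraction described above can be sketched in a few lines of Python (illustrative, not the actual implementation):

```python
def entity_abstract(markdown: str) -> str:
    """Return the first non-heading, non-blank line of an entity page:
    the one-line abstract that gets injected into the system prompt."""
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""

page = "# octos\nOpen-source AI agent platform.\n\n## Details\n..."
abstract = entity_abstract(page)  # "Open-source AI agent platform."
```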
Memory tools:
- save_memory: create or update an entity page. The agent is instructed to first recall_memory the existing content, then merge new information before saving (no data loss).
- recall_memory: load the full content of a named entity. If the entity doesn't exist, returns a list of all available entities.
Auto-deferral: When the total tool count exceeds 15, memory tools are moved to the group:memory deferred group. The agent must use activate_tools to enable them before saving or recalling.
File Layout
.octos/
├── config.json          # Configuration (versioned, auto-migrated)
├── cron.json            # Cron job store
├── AGENTS.md            # Agent instructions
├── SOUL.md              # Personality
├── USER.md              # User info
├── HEARTBEAT.md         # Background tasks
├── sessions/            # Chat history (JSONL)
├── memory/              # Memory files
│   ├── MEMORY.md        # Long-term memory (manual or write_file)
│   ├── 2025-02-10.md    # Daily note (manual or write_file)
│   └── bank/
│       └── entities/    # Entity bank (managed by save/recall tools)
│           ├── yuechen.md   # Entity: "who is the user"
│           └── octos.md     # Entity: "what is this project"
├── skills/              # Custom skills
├── episodes.redb        # Episodic memory DB (auto-populated)
└── history/
    └── chat_history     # Readline history
Built-in System Skills
Octos bundles 3 system skills at compile time:
| Skill | Description |
|---|---|
| cron | Cron tool usage examples (always-on) |
| skill-store | Skill installation and management |
| skill-creator | Guide for creating custom skills |
Workspace skills in .octos/skills/ override built-in skills with the same name.
Bundled App Skills
Eight app skills ship as compiled binaries alongside Octos. They are automatically bootstrapped into .octos/skills/ on gateway startup; no installation required.
News Fetch
Tool: news_fetch | Always active: Yes
Fetches headlines and full article content from Google News RSS, Hacker News API, Yahoo News, Substack, and Medium. The agent synthesizes raw data into a formatted digest.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| categories | array | all | News categories to fetch |
| language | "zh" / "en" | "zh" | Output language |
Categories: politics, world, business, technology, science, entertainment, health, sports
Configuration:
/config set news_digest.language en
/config set news_digest.hn_top_stories 50
/config set news_digest.max_deep_fetch_total 30
Deep Search
Tool: deep_search | Timeout: 600 seconds
Multi-round web research tool. Performs iterative searches, parallel page crawling, reference chasing, and generates structured reports saved to ./research/<query-slug>/.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Research topic or question |
| depth | 1-3 | 2 | Research depth level |
| max_results | 1-10 | 8 | Results per search round |
| search_engine | string | auto | perplexity, duckduckgo, brave, you |
Depth levels:
- 1 (Quick): single search round, ~1 minute, up to 10 pages
- 2 (Standard): 3 search rounds + reference chasing, ~3 minutes, up to 30 pages
- 3 (Thorough): 5 search rounds + aggressive link chasing, ~5 minutes, up to 50 pages
Deep Crawl
Tool: deep_crawl | Requires: Chrome/Chromium in PATH
Recursively crawls a website using headless Chrome via CDP. Renders JavaScript, follows same-origin links via BFS, extracts clean text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | (required) | Starting URL |
| max_depth | 1-10 | 3 | Maximum link-following depth |
| max_pages | 1-200 | 50 | Maximum pages to crawl |
| path_prefix | string | none | Only follow links under this path |
Output is saved to crawl-<hostname>/ with numbered markdown files.
Configuration:
/config set deep_crawl.page_settle_ms 5000
/config set deep_crawl.max_output_chars 100000
Send Email
Tool: send_email
Sends emails via SMTP or Feishu/Lark Mail API (auto-detected from available environment variables).
| Parameter | Type | Default | Description |
|---|---|---|---|
| to | string | (required) | Recipient email address |
| subject | string | (required) | Email subject |
| body | string | (required) | Email body (plain text or HTML) |
| html | boolean | false | Treat body as HTML |
| attachments | array | none | File attachments (SMTP only) |
SMTP environment variables:
export SMTP_HOST="smtp.gmail.com"
export SMTP_PORT="465"
export SMTP_USERNAME="your-email@gmail.com"
export SMTP_PASSWORD="your-app-password"
export SMTP_FROM="your-email@gmail.com"
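For reference, a minimal Python sketch of what an SMTP send of these parameters involves, using the standard email library (build_email is a hypothetical helper for illustration; the real send_email tool is a compiled skill):

```python
import os
from email.message import EmailMessage

def build_email(to: str, subject: str, body: str, html: bool = False) -> EmailMessage:
    """Assemble the message an SMTP send would transmit,
    mirroring the documented to/subject/body/html parameters."""
    msg = EmailMessage()
    msg["From"] = os.environ.get("SMTP_FROM", "agent@example.com")
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body, subtype="html" if html else "plain")
    return msg

msg = build_email("ops@example.com", "Daily report", "<b>All green</b>", html=True)
# Sending would use smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) and log in
# with SMTP_USERNAME / SMTP_PASSWORD before calling send_message(msg).
```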
Weather
Tools: get_weather, get_forecast | API: Open-Meteo (free, no key required)
| Parameter | Type | Default | Description |
|---|---|---|---|
| city | string | (required) | City name in English |
| days | 1-16 | 7 | Forecast days (forecast only) |
Clock
Tool: get_time
Returns current date, time, day of week, and UTC offset for any IANA timezone.
| Parameter | Type | Default | Description |
|---|---|---|---|
| timezone | string | server local | IANA timezone name (e.g., Asia/Shanghai, US/Eastern) |
Account Manager
Tool: manage_account
Manages sub-accounts under the current profile. Actions: list, create, update, delete, info, start, stop, restart.
Platform Skills (ASR/TTS)
Platform skills provide on-device voice transcription and synthesis. They require the OminiX backend running on Apple Silicon (M1/M2/M3/M4).
Voice Transcription
Tool: voice_transcribe
| Parameter | Type | Default | Description |
|---|---|---|---|
| audio_path | string | (required) | Path to audio file (WAV, OGG, MP3, FLAC, M4A) |
| language | string | "Chinese" | "Chinese", "English", "Japanese", "Korean", "Cantonese" |
Voice Synthesis
Tool: voice_synthesize
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| output_path | string | auto | Output file path |
| language | string | "chinese" | "chinese", "english", "japanese", "korean" |
| speaker | string | "vivian" | Voice preset |
Available voices: vivian, serena, ryan, aiden, eric, dylan (EN/ZH), uncle_fu (ZH only), ono_anna (JA), sohee (KO)
Voice Cloning
Tool: voice_clone_synthesize
Synthesizes speech using a cloned voice from a 3-10 second reference audio sample.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| reference_audio | string | (required) | Path to reference audio |
| language | string | "chinese" | Target language |
Podcast Generation
Tool: generate_podcast
Creates multi-speaker podcast audio from a script of {speaker, voice, text} objects.
Custom Skill Installation
Installing from GitHub
# Install all skills from a repo
octos skills install user/repo
# Install a specific skill
octos skills install user/repo/skill-name
# Install from a specific branch
octos skills install user/repo --branch develop
# Force overwrite existing
octos skills install user/repo --force
# Install into a specific profile
octos skills install user/repo --profile my-bot
The installer tries to download a pre-built binary from the skill registry (SHA-256 verified), falls back to cargo build --release if a Cargo.toml is present, or runs npm install if a package.json is present.
Managing Skills
octos skills list # List installed skills
octos skills info skill-name # Show detailed info
octos skills update skill-name # Update a specific skill
octos skills update all # Update all skills
octos skills remove skill-name # Remove a skill
octos skills search "web scraping" # Search the online registry
Skill Resolution Order
Skills are loaded from these directories (highest priority first):
- .octos/plugins/ (legacy)
- .octos/skills/ (user-installed custom skills)
- .octos/bundled-app-skills/ (bundled app skills)
- .octos/platform-skills/ (platform: ASR/TTS)
- ~/.octos/plugins/ (global legacy)
- ~/.octos/skills/ (global custom)
User-installed skills override bundled skills with the same name.
Skill Authoring
A custom skill lives in .octos/skills/<name>/ and contains:
.octos/skills/my-skill/
βββ SKILL.md # Required: instructions + frontmatter
βββ manifest.json # Required for tool skills: tool definitions
βββ main # Compiled binary (or script)
βββ .source # Auto-generated: tracks install source
SKILL.md Format
---
name: my-skill
version: 1.0.0
author: Your Name
description: A brief description of what this skill does
always: false
requires_bins: curl,jq
requires_env: MY_API_KEY
---
# My Skill Instructions
Instructions for the agent on how and when to use this skill.
## When to Use
- Use this skill when the user asks about...
## Tool Usage
The `my_tool` tool accepts:
- `query` (required): The search query
- `limit` (optional): Maximum results (default: 10)
Frontmatter fields:
| Field | Description |
|---|---|
| name | Skill identifier (must match directory name) |
| version | Semantic version |
| author | Skill author |
| description | Short description |
| always | If true, included in every system prompt. If false, available on demand. |
| requires_bins | Comma-separated binaries checked via which. Skill is unavailable if any are missing. |
| requires_env | Comma-separated environment variables. Skill is unavailable if any are unset. |
manifest.json Format
For skills that provide executable tools:
{
"name": "my-skill",
"version": "1.0.0",
"description": "My custom skill",
"tools": [
{
"name": "my_tool",
"description": "Does something useful",
"timeout_secs": 60,
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
},
"limit": {
"type": "integer",
"description": "Maximum results",
"default": 10
}
},
"required": ["query"]
}
}
],
"entrypoint": "main"
}
The tool binary receives JSON input on stdin and must output JSON on stdout:
// Input (stdin)
{"query": "test", "limit": 5}
// Output (stdout)
{"output": "Results here...", "success": true}
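A complete Python entrypoint for the my_tool example above might look like this (a hypothetical skill, following the documented stdin/stdout JSON protocol):

```python
import json
import sys

def run(params: dict) -> dict:
    """Handle one tool invocation using the manifest's schema:
    'query' is required, 'limit' defaults to 10."""
    query = params["query"]
    limit = params.get("limit", 10)
    results = [f"{query} result {i}" for i in range(1, limit + 1)]
    return {"output": "\n".join(results), "success": True}

def main() -> None:
    # The gateway writes one JSON object to stdin and reads
    # one JSON object back from stdout.
    print(json.dumps(run(json.load(sys.stdin))))

# main() is what the compiled binary (or script) executes when invoked.
```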
Advanced Features
This chapter covers power-user features: tool management, queue modes, lifecycle hooks, sandboxing, session management, and the web dashboard.
Tools & LRU Deferral
Octos manages a large tool catalog by splitting tools into active and deferred sets. Active tools are sent to the LLM as callable tool specifications. Deferred tools are listed by name in the system prompt but not sent as full specs until needed.
How It Works
- Base tools (never evicted): read_file, write_file, shell, glob, grep, list_dir, run_pipeline, deep_search, and others.
- Dynamic tools: tools like save_memory, web_search, recall_memory that are activated on demand and evicted when idle.
- Deferred tools: browser, manage_skills, spawn, configure_tool, switch_model, and others listed by name only.
Eviction Rules
When the active tool count exceeds 15:
- Tools idle for 5+ agent iterations that are not in the base set become candidates.
- The stalest tool is moved to the deferred list first.
Re-activation
When the LLM needs a deferred tool, it calls activate_tools({"tools": [...]}). This resolves the tool name to its group and activates the entire group.
Tool Configuration
Tools can be configured at runtime using the /config slash command. Settings persist in {data_dir}/tool_config.json.
| Tool | Setting | Type | Default | Description |
|---|---|---|---|---|
| news_digest | language | "zh" / "en" | "zh" | Output language for news digests |
| news_digest | hn_top_stories | 5-100 | 30 | Hacker News stories to fetch |
| news_digest | max_rss_items | 5-100 | 30 | Items per RSS feed |
| news_digest | max_deep_fetch_total | 1-50 | 20 | Total articles to deep-fetch |
| news_digest | max_source_chars | 1000-50000 | 12000 | Per-source HTML char limit |
| news_digest | max_article_chars | 1000-50000 | 8000 | Per-article content limit |
| deep_crawl | page_settle_ms | 500-10000 | 3000 | JS render wait time (ms) |
| deep_crawl | max_output_chars | 10000-200000 | 50000 | Output truncation limit |
| web_search | count | 1-10 | 5 | Default number of search results |
| web_fetch | extract_mode | "markdown" / "text" | "markdown" | Content extraction format |
| web_fetch | max_chars | 1000-200000 | 50000 | Content size limit |
| browser | action_timeout_secs | 30-600 | 300 | Per-action timeout |
| browser | idle_timeout_secs | 60-600 | 300 | Idle session timeout |
In-chat config commands:
/config # Show all tool settings
/config web_search # Show web_search settings
/config set web_search.count 10 # Set default result count to 10
/config set news_digest.language en # Switch news digests to English
/config reset web_search.count # Reset to default
Priority order (highest first):
1. Explicit per-call arguments (tool invocation parameters)
2. /config overrides (stored in tool_config.json)
3. Hardcoded defaults
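The resolution order can be sketched as a simple layered lookup (resolve_setting is illustrative, not an Octos API):

```python
def resolve_setting(name: str, call_args: dict, overrides: dict, defaults: dict):
    """Resolve a tool setting with the documented precedence:
    explicit per-call argument > /config override > hardcoded default."""
    if name in call_args:
        return call_args[name]
    if name in overrides:
        return overrides[name]
    return defaults[name]

defaults = {"count": 5}                       # hardcoded default
overrides = {"count": 10}                     # /config set web_search.count 10
value = resolve_setting("count", {}, overrides, defaults)             # 10
value2 = resolve_setting("count", {"count": 3}, overrides, defaults)  # 3
```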
Tool Policies
Tool policies control which tools the agent can use. They can be set globally, per-provider, or per-context.
Global Policy
{
"tool_policy": {
"allow": ["group:fs", "group:search", "web_search"],
"deny": ["shell", "spawn"]
}
}
- allow: if non-empty, only these tools are permitted. If empty, all tools are allowed.
- deny: these tools are always blocked. Deny wins over allow.
Named Groups
| Group | Expands To |
|---|---|
| group:fs | read_file, write_file, edit_file, diff_edit |
| group:runtime | shell |
| group:web | web_search, web_fetch, browser |
| group:search | glob, grep, list_dir |
| group:sessions | spawn |
Additional tools not in named groups: send_file, switch_model, run_pipeline, configure_tool, cron, message.
Wildcard Matching
A trailing * matches any tool name with that prefix:
{
"tool_policy": {
"deny": ["web_*"]
}
}
This denies web_search, web_fetch, etc.
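Putting the policy rules together, here is a Python sketch of the documented semantics (deny wins over allow, groups expand to their members, a trailing * matches by prefix, and an empty allow list permits everything; tool_allowed is illustrative, not the actual implementation):

```python
from fnmatch import fnmatch

# A subset of the named groups, for illustration.
GROUPS = {
    "group:fs": ["read_file", "write_file", "edit_file", "diff_edit"],
    "group:web": ["web_search", "web_fetch", "browser"],
}

def expand(patterns: list[str]) -> list[str]:
    """Expand named groups into their member tools; pass others through."""
    out: list[str] = []
    for p in patterns:
        out.extend(GROUPS.get(p, [p]))
    return out

def tool_allowed(tool: str, allow: list[str], deny: list[str]) -> bool:
    if any(fnmatch(tool, pat) for pat in expand(deny)):
        return False                       # deny always wins
    allowed = expand(allow)
    return not allowed or any(fnmatch(tool, pat) for pat in allowed)

assert not tool_allowed("web_search", allow=[], deny=["web_*"])
assert tool_allowed("read_file", allow=["group:fs"], deny=["shell"])
assert not tool_allowed("shell", allow=[], deny=["shell"])
```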
Per-Provider Policies
Different tool sets for different LLM models:
{
"tool_policy_by_provider": {
"openai/gpt-4o-mini": {
"deny": ["shell", "write_file"]
},
"gemini": {
"deny": ["diff_edit"]
}
}
}
Queue Modes
Queue modes control how incoming user messages are handled while the agent is busy processing a previous request. Set via /queue <mode> in chat, or queue_mode in profile config.
Followup (default)
Sequential processing. Each message waits its turn.
- Agent processes A, finishes, processes B, finishes, processes C.
- Simple and predictable.
- The user is blocked until the current request completes.
Collect
Batch queued messages into a single combined prompt.
- Agent processes A. User sends B, then C.
- When A finishes, B and C are merged into one prompt: B\n---\nQueued #1: C
- One LLM call for the batch.
- Good for users who send thoughts in multiple short messages (common in chat apps).
Steer
Keep only the newest queued message, discard older ones.
- Agent processes A. User sends B, then C.
- When A finishes, B is discarded; only C is processed.
- Good when the user corrects or refines their question mid-flight.
- Example: "search for X" then "actually search for Y"; only Y is processed.
Interrupt
Keep only the newest queued message and cancel the running agent.
- Agent processes A. User sends B, then C.
- A is cancelled, B is discarded, C is processed immediately.
- Fastest response to course-correction.
- Use when responsiveness matters more than completing the current task.
Note: Currently, Interrupt and Steer share the same drain-and-discard behavior. There is no in-flight agent cancellation; the running agent completes before the newest message is processed. True mid-flight cancellation is planned.
Speculative
Spawn concurrent overflow agents for each new message while the primary runs.
- Agent processes A. User sends B, then C.
- B and C each get their own concurrent agent task (overflow).
- All three run in parallel; no blocking.
- Best for slow LLM providers where users do not want to wait.
- Overflow agents use a snapshot of conversation history from before the primary started.
How overflow works
- Primary agent is spawned for the first message.
- While the primary runs, new messages arrive in the inbox.
- Each new message triggers serve_overflow(), spawning a full agent task with its own streaming bubble.
- Overflow agents use the history snapshot from before the primary to avoid re-answering the primary question.
- All agents run concurrently and save results to session history.
Known limitations
- Interactive prompts break in overflow: if the LLM asks a follow-up question and returns EndTurn, the overflow agent exits. The user's reply spawns a new overflow with no context of the question.
- Short replies misrouted: a "yes" or "2" intended as a continuation may be treated as an independent new query.
Auto-Escalation
The session actor can auto-escalate from Followup to Speculative when sustained latency degradation is detected:
- ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then tracks LLM response times in a 20-sample rolling window. The baseline adapts every 20 samples via an 80/20 EMA blend with the current window median, so gradual drift is tracked.
- If 3 consecutive responses exceed 3x the baseline latency, Speculative queue mode and Hedge racing are auto-activated simultaneously.
- A user notification is sent: "Detected slow responses. Enabling hedge racing + speculative queue."
- When the provider recovers (one normal-latency response), both revert to Followup and static routing.
- Auto-escalation also triggers on the API channel (web client), which always uses the speculative processing path.
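A toy model of the escalation rule (median baseline from the first 5 requests, escalate after 3 consecutive responses slower than 3x baseline; the rolling window and EMA blending of the real observer are omitted):

```python
from statistics import median

class ResponsivenessSketch:
    """Illustrative sketch of the documented escalation rule,
    not the actual ResponsivenessObserver implementation."""

    def __init__(self) -> None:
        self.samples: list[float] = []
        self.baseline: float | None = None
        self.slow_streak = 0

    def record(self, latency_ms: float) -> bool:
        """Record one LLM response latency; return True when
        speculative queue mode + hedge racing should activate."""
        if self.baseline is None:
            self.samples.append(latency_ms)
            if len(self.samples) == 5:
                self.baseline = median(self.samples)
            return False
        if latency_ms > 3 * self.baseline:
            self.slow_streak += 1
        else:
            self.slow_streak = 0   # one normal response recovers
        return self.slow_streak >= 3

obs = ResponsivenessSketch()
for ms in [900, 1000, 1100, 1000, 950]:    # baseline = 1000 ms
    obs.record(ms)
escalations = [obs.record(ms) for ms in [4000, 4200, 5000]]
# escalations -> [False, False, True]
```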
Queue Commands
/queue -- show current mode
/queue followup -- sequential processing
/queue collect -- batch queued messages
/queue steer -- keep newest only
/queue interrupt -- cancel current + keep newest
/queue speculative -- concurrent overflow agents
Hooks
Hooks are the primary extension point for enforcing LLM policies, recording metrics, and auditing agent behavior, per profile and without modifying core code.
Hooks are shell commands that run at agent lifecycle events. Each hook receives a JSON payload on stdin and communicates its decision via exit code.
Exit Codes
| Exit Code | Meaning | Before-events | After-events |
|---|---|---|---|
| 0 | Allow | Operation proceeds | Success logged |
| 1 | Deny | Operation blocked (reason on stdout) | Treated as error |
| 2+ | Error | Logged, operation proceeds | Logged |
Events
Four lifecycle events, each with a specific payload:
before_tool_call
Fires before each tool execution. Can deny (exit 1).
{
"event": "before_tool_call",
"tool_name": "shell",
"arguments": {"command": "ls -la"},
"tool_id": "call_abc123",
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
after_tool_call
Fires after each tool execution. Observe-only.
{
"event": "after_tool_call",
"tool_name": "shell",
"tool_id": "call_abc123",
"result": "file1.txt\nfile2.txt\n...",
"success": true,
"duration_ms": 142,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
Note: result is truncated to 500 characters.
before_llm_call
Fires before each LLM API call. Can deny (exit 1).
{
"event": "before_llm_call",
"model": "deepseek-chat",
"message_count": 12,
"iteration": 3,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
after_llm_call
Fires after each successful LLM response. Observe-only.
{
"event": "after_llm_call",
"model": "deepseek-chat",
"iteration": 3,
"stop_reason": "EndTurn",
"has_tool_calls": false,
"input_tokens": 1200,
"output_tokens": 350,
"provider_name": "deepseek",
"latency_ms": 2340,
"cumulative_input_tokens": 5600,
"cumulative_output_tokens": 1800,
"session_cost": 0.0042,
"response_cost": 0.0012,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
Hook Configuration
In config.json or per-profile JSON:
{
"hooks": [
{
"event": "before_tool_call",
"command": ["python3", "~/.octos/hooks/guard.py"],
"timeout_ms": 3000,
"tool_filter": ["shell", "write_file"]
},
{
"event": "after_llm_call",
"command": ["python3", "~/.octos/hooks/cost-tracker.py"],
"timeout_ms": 5000
}
]
}
| Field | Required | Default | Description |
|---|---|---|---|
| event | yes | - | One of the 4 event types |
| command | yes | - | Argv array (no shell interpretation) |
| timeout_ms | no | 5000 | Kill hook process after this timeout |
| tool_filter | no | all | Only trigger for these tool names (tool events only) |
Multiple hooks can be registered for the same event. They run sequentially; the first deny wins.
Circuit Breaker
Hooks are auto-disabled after 3 consecutive failures (timeout, crash, or exit code 2+). A successful execution (exit 0 or deny exit 1) resets the counter.
Security
- Commands use argv arrays; no shell interpretation.
- 18 dangerous environment variables are removed (LD_PRELOAD, DYLD_*, NODE_OPTIONS, etc.).
- Tilde expansion is supported (~/ and ~username/).
Per-Profile Hooks
Each profile can define its own hooks via the hooks field in profile config. This allows different policy enforcement per channel or bot. Hook changes require a gateway restart.
Backward Compatibility
- New fields may be added to payloads.
- Existing fields will never be removed or renamed.
- Hook scripts should ignore unknown fields (standard JSON practice).
Example: Cost Budget Enforcer
#!/usr/bin/env python3
"""Deny LLM calls when session cost exceeds $1.00."""
import json, sys
payload = json.load(sys.stdin)
if payload.get("event") == "before_llm_call":
try:
with open("/tmp/octos-cost.json") as f:
state = json.load(f)
except FileNotFoundError:
state = {}
sid = payload.get("session_id", "default")
if state.get(sid, 0) > 1.0:
print(f"Session cost exceeded $1.00 (${state[sid]:.4f})")
sys.exit(1)
elif payload.get("event") == "after_llm_call":
cost = payload.get("session_cost")
if cost is not None:
sid = payload.get("session_id", "default")
try:
with open("/tmp/octos-cost.json") as f:
state = json.load(f)
except FileNotFoundError:
state = {}
state[sid] = cost
with open("/tmp/octos-cost.json", "w") as f:
json.dump(state, f)
sys.exit(0)
Example: Audit Logger
#!/usr/bin/env python3
"""Log all tool and LLM calls to a JSONL file."""
import json, sys, datetime
payload = json.load(sys.stdin)
payload["timestamp"] = datetime.datetime.utcnow().isoformat()
with open("/var/log/octos-audit.jsonl", "a") as f:
f.write(json.dumps(payload) + "\n")
sys.exit(0)
Sandbox
Shell commands run inside a sandbox for isolation. Three backends are supported:
| Backend | Platform | Isolation | Network Control |
|---|---|---|---|
| bwrap | Linux | RO bind /usr,/lib,/bin,/sbin,/etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if network denied |
| macOS | macOS | sandbox-exec with SBPL profile: process-exec/fork, file-read*, writes to workdir + /private/tmp | (allow network*) or (deny network*) |
| Docker | Any | --rm --security-opt no-new-privileges --cap-drop ALL | --network none if network denied |
Configure in config.json:
{
"sandbox": {
"enabled": true,
"mode": "auto",
"allow_network": false,
"docker": {
"image": "alpine:3.21",
"mount_mode": "rw",
"cpu_limit": "1.0",
"memory_limit": "512m",
"pids_limit": 100
}
}
}
- Modes: auto (detect best available), bwrap, macos, docker, none.
- Mount modes: rw (read-write), ro (read-only), none (no workspace mount).
- Docker resource limits: --cpus, --memory, --pids-limit.
- Docker bind mount safety: docker.sock, /proc, /sys, /dev, and /etc are blocked as bind mount sources.
- Path validation: Docker rejects :, \0, \n, \r; macOS rejects control chars, (, ), \, ".
- Environment sanitization: 18 dangerous environment variables are automatically cleared in all sandbox backends, MCP server spawning, hooks, and the browser tool: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR.
- Process cleanup: the shell tool sends SIGTERM, waits a grace period, then SIGKILLs child processes on timeout.
Session Management
Session Forking
Send /new to create a branched conversation:
/new
This creates a new session that copies the last 10 messages from the current conversation. The child session has a parent_key reference to the original. Each fork gets a unique key namespaced by sender and timestamp.
Session Persistence
Each channel:chat_id pair maintains its own session (conversation history).
- Storage: JSONL files in .octos/sessions/
- Max history: configurable via gateway.max_history (default: 50 messages)
- Session forking: /new creates a branched conversation with parent_key tracking
Config Hot-Reload
The gateway automatically detects config file changes:
- Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
- Restart required: provider, model, API keys, gateway channels
Changes are detected via SHA-256 hashing with debounce.
Message Coalescing
Long responses are automatically split into channel-safe chunks before sending:
| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Split preference: paragraph boundary > newline > sentence end > space > hard cut. Messages exceeding 50 chunks are truncated with a marker.
Context Compaction
When the conversation exceeds the LLM's context window, older messages are automatically compacted:
- Tool arguments are stripped (replaced with "[stripped]")
- Messages are summarized to first lines
- Recent tool call/result pairs are preserved intact
- The agent continues seamlessly without losing critical context
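A sketch of what compaction does to one older message (the message shape here is hypothetical; the internal representation may differ):

```python
def compact_message(msg: dict, keep_recent: bool) -> dict:
    """Illustrate the documented compaction: recent tool call/result
    pairs stay intact; older messages have tool arguments replaced
    with "[stripped]" and content reduced to its first line."""
    if keep_recent:
        return msg
    out = dict(msg)
    if "tool_arguments" in out:
        out["tool_arguments"] = "[stripped]"
    if "content" in out:
        out["content"] = out["content"].splitlines()[0] if out["content"] else ""
    return out

old = {"content": "long answer\nwith details", "tool_arguments": '{"cmd": "ls"}'}
compacted = compact_message(old, keep_recent=False)
# {"content": "long answer", "tool_arguments": "[stripped]"}
```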
In-Chat Commands
Slash Commands
| Command | Description |
|---|---|
| /new | Fork the conversation (creates a new session copying the last 10 messages) |
| /config | View and modify tool configuration |
| /queue | View or change queue mode |
| /exit, /quit, :q | Exit chat (CLI mode only) |
In-Chat Provider Switching
The switch_model tool allows users to list available LLM providers and switch models at runtime through natural conversation. This tool is only available in gateway mode.
List available providers:
User: What models are available?
Bot: Current model: deepseek/deepseek-chat
Available providers:
- anthropic (default: claude-sonnet-4-20250514) [ready]
- openai (default: gpt-4o) [ready]
- deepseek (default: deepseek-chat) [ready]
- gemini (default: gemini-2.0-flash) [ready]
...
Switch models:
User: Switch to GPT-4o
Bot: Switched to openai/gpt-4o.
Previous model (deepseek/deepseek-chat) is kept as fallback.
When you switch models, the previous model automatically becomes a fallback:
- If the new model fails (rate limit, server error), requests automatically fall back to the original model.
- The fallback uses the circuit breaker (3 consecutive failures triggers failover).
- The chain is always flat: `[new_model, original_model]` — repeated switches do not nest.
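The flat-chain rule can be illustrated with a small sketch (a hypothetical helper, not the real implementation):

```python
def switch_model(chain: list[str], new_model: str) -> list[str]:
    """Build the fallback chain after a switch. The chain stays flat:
    [new_model, original_model] -- repeated switches never nest."""
    original = chain[-1]  # the model the profile started with
    if new_model == original:
        return [original]  # switching back: no fallback needed
    return [new_model, original]
```

However many times you switch, the fallback is always the profile's original model, never an intermediate choice.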
Model switches are persisted to the profile JSON file. On gateway restart, the bot starts with the last-selected model.
Memory System
The agent maintains long-term memory across sessions:
- `MEMORY.md` — Persistent notes, always loaded into context
- Daily notes — `.octos/memory/YYYY-MM-DD.md`, auto-created
- Recent memory — Last 7 days of daily notes included in context
- Episodes — Task completion summaries stored in `episodes.redb`
Hybrid Memory Search
Memory search combines BM25 (keyword) and vector (semantic) scoring:
- Ranking: `vector_weight * vector_score + bm25_weight * bm25_score` (defaults: 0.7 / 0.3)
- Index: HNSW with L2-normalized embeddings
- Fallback: BM25-only when no embedding provider is configured
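The ranking formula as a runnable sketch (the weights are the documented defaults; the `rank` helper is illustrative):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Combined relevance score with the default 0.7 / 0.3 weights."""
    return vector_weight * vector_score + bm25_weight * bm25_score

def rank(candidates):
    """Order memory candidates by combined score, best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c["vec"], c["bm25"]),
                  reverse=True)
```

Note that a strong semantic match can outrank an exact keyword hit, since vector similarity carries more than twice the weight by default.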
Configure an embedding provider to enable vector search:
{
  "embedding": {
    "provider": "openai"
  }
}
The embedding config supports three fields: provider (default: "openai"), api_key_env (optional override), and base_url (optional custom endpoint).
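A fully specified example overriding both optional fields (the values shown are placeholders, not defaults the docs guarantee):

```json
{
  "embedding": {
    "provider": "openai",
    "api_key_env": "MY_EMBEDDING_KEY",
    "base_url": "https://my-proxy.example.com/v1"
  }
}
```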
Cron Jobs (Scheduled Tasks)
The agent can schedule recurring tasks using the cron tool:
User: Schedule a daily news digest at 8am Beijing time
Bot: Created cron job "daily-news" running at 8:00 AM Asia/Shanghai every day.
Expression: 0 0 8 * * * *
Cron jobs can also be managed via CLI:
octos cron list # List active jobs
octos cron list --all # Include disabled
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron remove <job-id>
octos cron enable <job-id>
octos cron enable <job-id> --disable
Web Dashboard
The REST API server includes an embedded web UI:
octos serve # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000 # Accept external connections
# Open http://localhost:8080
Features:
- Session sidebar
- Chat interface
- SSE streaming
- Dark theme
A /metrics endpoint provides Prometheus-format metrics:
- `octos_tool_calls_total`
- `octos_tool_call_duration_seconds`
- `octos_llm_tokens_total`
Operations
This chapter covers day-to-day operational tasks: upgrading, credential management, and service management.
Upgrading
Pull the latest source and rebuild:
cd octos
git pull origin main
./scripts/local-deploy.sh --full # Rebuilds and reinstalls
If running as a service, restart it after the upgrade:
# macOS (launchd):
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Linux (systemd):
systemctl --user restart octos-serve
Keychain Integration
Octos supports storing API keys in the macOS Keychain instead of plaintext in profile JSON files. This provides hardware-backed encryption on Apple Silicon and OS-level access control.
Architecture
+------------------------------+
octos auth set-key | macOS Keychain |
-----------------> | (AES encrypted, per-user) |
| |
| service: "octos" |
| account: "OPENAI_API_KEY" |
| password: "sk-proj-abc..." |
+---------------+--------------+
| get_password()
Profile JSON |
+------------------+ v
| env_vars: { | resolve_env_vars()
| "OPENAI_API_ | if "keychain:" ->
| KEY": | lookup from Keychain
| "keychain:" | else -> use literal
| } |
+------------------+ |
v
Gateway process
Resolution chain: "keychain:" marker in profile config triggers a Keychain lookup (3-second timeout). If the Keychain is unavailable, the key is skipped with a warning.
Backward compatible: Literal values in env_vars pass through unchanged. No migration is required — adopt keychain per-key at your own pace. Mixed plaintext and keychain entries are fully supported.
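The resolution behavior described above can be approximated in a few lines (illustrative Python; the real implementation is Rust and applies a 3-second timeout to each Keychain lookup):

```python
def resolve_env_vars(env_vars: dict, keychain_lookup) -> dict:
    """Resolve profile env_vars: a "keychain:" marker triggers a Keychain
    lookup; literal values pass through unchanged; a marker with no stored
    secret is skipped (the real code also logs a warning)."""
    resolved = {}
    for name, value in env_vars.items():
        if value == "keychain:":
            secret = keychain_lookup(name)
            if secret is not None:
                resolved[name] = secret
            # else: skipped with a warning
        else:
            resolved[name] = value
    return resolved
```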
CLI Commands
# Unlock keychain for SSH sessions (required before set-key via SSH)
octos auth unlock --password <login-password>
octos auth unlock # interactive prompt
# Store a key in Keychain + update profile to use keychain marker
octos auth set-key OPENAI_API_KEY sk-proj-abc123
octos auth set-key OPENAI_API_KEY # interactive prompt
# With specific profile
octos auth set-key GEMINI_API_KEY AIzaSy... -p my-profile
# List all keys and their storage status
octos auth keys
octos auth keys -p my-profile
# Remove from Keychain + clean up profile
octos auth remove-key OPENAI_API_KEY
Keychain Entry Format
- Service: `octos` (constant for all entries)
- Account: The environment variable name (e.g., `OPENAI_API_KEY`)
- Password: The actual secret value
Verify with:
security find-generic-password -s octos -a OPENAI_API_KEY -w
SSH and Headless Server Setup
The macOS Keychain is tied to the GUI login session. SSH sessions cannot access a locked keychain — macOS tries to show a dialog, which hangs on a headless server.
Why SSH fails by default: macOS securityd unlocks the keychain per-session. The GUI session's unlock does not automatically propagate to SSH sessions.
Solution: Unlock the keychain and disable auto-lock. Run once per boot (or add to your deploy script):
ssh user@<host>
# Unlock the keychain (requires login password)
octos auth unlock --password <login-password>
# That's it -- auto-lock is disabled automatically.
# The keychain stays unlocked until reboot.
# Auto-login will re-unlock it on reboot.
Or with raw security commands:
# Unlock
security unlock-keychain -p '<password>' ~/Library/Keychains/login.keychain-db
# Disable auto-lock timer (so it doesn't re-lock after idle)
security set-keychain-settings ~/Library/Keychains/login.keychain-db
Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| "User interaction is not allowed" | Keychain locked (SSH session) | octos auth unlock --password <pw> |
| Keychain lookup timed out (3s) | Keychain locked (LaunchAgent) | Enable auto-login, reboot |
| "keychain marker found but no secret" | Key never stored or wrong keychain | Re-run octos auth set-key after unlock |
| Gateway hangs at startup | Keychain lookup blocking | Update to latest octos binary |
Security Comparison
| Threat | Plaintext JSON | Keychain |
|---|---|---|
| File stolen (backup, git, scp) | All keys exposed | Only "keychain:" markers visible |
| Malware reads disk | Simple file read exposes keys | Must bypass OS Keychain ACL |
| Other user on machine | File permissions help, root can read | Encrypted per-user |
| Process memory dump | Keys in env vars | Keys only briefly in memory |
| Accidental log output | Profile JSON leaks keys | Only reference strings logged |
Server Deployment Recommendations
The macOS Keychain was designed for interactive desktop use. On headless servers, it introduces reliability issues. Choose your credential storage based on deployment type:
| Deployment | Recommended Storage | Reason |
|---|---|---|
| Developer laptop | Keychain ("keychain:") | GUI session keeps keychain unlocked; ACL prompts are fine |
| Mac with auto-login + GUI | Keychain ("keychain:") | Works if ACL dialogs were approved once via screen sharing |
| Headless Mac (SSH only) | Plain text in env_vars or launchd plist | Most reliable; no unlock/ACL dependencies |
| Linux server | Plain text in env vars | No macOS Keychain available |
Why Keychain is unreliable on headless servers:
- Requires the macOS login password — To unlock the keychain via SSH, you need the user's login password stored somewhere, reducing the security benefit.
- Re-locks on reboot/sleep — The LaunchAgent that starts `octos serve` runs before GUI login, so the keychain is locked at that point.
- Re-locks after idle timeout — Even after unlock, macOS may re-lock. The `set-keychain-settings` workaround can be reset by macOS updates.
- ACL prompts block headless access — If the binary was not the one that originally stored the secret, macOS may pop an unanswerable GUI dialog.
- Session isolation — Unlocking from SSH does not unlock for the LaunchAgent session, and vice versa.
Plain text setup for servers:
{
  "env_vars": {
    "OPENAI_API_KEY": "sk-proj-abc123",
    "SMTP_PASSWORD": "xxxx xxxx xxxx xxxx",
    "SMTP_HOST": "smtp.gmail.com",
    "SMTP_PORT": "587",
    "SMTP_USERNAME": "user@gmail.com",
    "SMTP_FROM": "user@gmail.com"
  }
}
Protect the files with filesystem permissions:
chmod 600 ~/.octos/profiles/*.json
chmod 600 ~/Library/LaunchAgents/io.octos.octos-serve.plist
Service Management
macOS (launchd)
Create a LaunchAgent plist to run octos as a persistent service:
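A minimal plist sketch (paths assume a `cargo install` layout under your home directory; adjust to your machine):

```xml
<!-- ~/Library/LaunchAgents/io.octos.octos-serve.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>io.octos.octos-serve</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/you/.cargo/bin/octos</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/you/.octos/serve.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/you/.octos/serve.log</string>
</dict>
</plist>
```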
# Load the service
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Unload the service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Check status
launchctl list | grep octos
If the service needs environment variables (e.g., SMTP credentials), add them to the plist:
<key>EnvironmentVariables</key>
<dict>
    <key>SMTP_PASSWORD</key>
    <string>xxxx xxxx xxxx xxxx</string>
</dict>
Check logs at ~/.octos/serve.log.
Linux (systemd)
Manage the service with systemd user units:
# Start / stop / restart
systemctl --user start octos-serve
systemctl --user stop octos-serve
systemctl --user restart octos-serve
# Enable on boot
systemctl --user enable octos-serve
# Check status and logs
systemctl --user status octos-serve
journalctl --user -u octos-serve
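A minimal unit file sketch to pair with these commands (paths and environment entries are illustrative; `%h` expands to your home directory):

```ini
# ~/.config/systemd/user/octos-serve.service
[Unit]
Description=Octos web UI and REST API
After=network-online.target

[Service]
ExecStart=%h/.cargo/bin/octos serve
Restart=on-failure
Environment=OPENAI_API_KEY=sk-proj-abc123

[Install]
WantedBy=default.target
```

After creating the file, run `systemctl --user daemon-reload` before starting the service.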
Troubleshooting
This chapter covers common issues organized by category, along with environment variable reference.
API & Provider Issues
API Key Not Set
Error: ANTHROPIC_API_KEY environment variable not set
Fix: Export the key in your shell or verify with octos status:
export ANTHROPIC_API_KEY="your-key"
If running as a service, ensure the environment variable is set in the service environment (launchd plist or systemd unit), not just your interactive shell.
Rate Limited (429)
The retry mechanism handles this automatically (3 attempts with exponential backoff). If the error persists:
- Try switching to a different provider via `/queue` or in-chat model switching.
- Wait for the rate limit window to reset.
Debug Logging
Enable detailed logs to diagnose issues:
RUST_LOG=debug octos chat
RUST_LOG=octos_agent=trace octos chat --message "task"
Build Issues
| Problem | Solution |
|---|---|
| Build fails on Linux | Install build dependencies: sudo apt install build-essential pkg-config libssl-dev |
| macOS codesign warning | Sign the binary: codesign -s - ~/.cargo/bin/octos |
| octos: command not found | Add cargo bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |
Channel-Specific Issues
Lark / Feishu
| Issue | Solution |
|---|---|
| 404 on WebSocket endpoint | Larksuite international does not support WebSocket mode. Use "mode": "webhook" in your config |
| Challenge verification fails | Ensure your tunnel (e.g., ngrok) is running and the URL matches the one configured in the Lark console |
| No events received | Publish the app version after adding events. Check Event Log Retrieval in the console |
| Bot does not reply | Check that the im:message:send_as_bot permission is granted |
| Markdown not rendering | Messages are sent as interactive cards; Lark supports a subset of markdown |
| Tunnel URL changed | Free tunnel URLs change on restart. Update the request URL in the Lark console |
WeCom / WeChat
"Environment variable WECOM_BOT_SECRET not set"
Set the secret before starting the gateway:
export WECOM_BOT_SECRET="your_secret"
Connection drops or fails to subscribe
- Verify `bot_id` and the secret are correct.
- Check network connectivity to `wss://openws.work.weixin.qq.com`.
- The channel auto-reconnects up to 100 times with exponential backoff. Check logs for error details.
Messages not arriving
- Confirm the upstream relay service is running and linked to your account.
- Check that the WeCom group robot is the same one configured in octos.
- If using `allowed_senders`, verify the sender's WeCom user ID is in the list.
- Check for duplicate message filtering — the channel deduplicates the last 1000 message IDs.
Long messages are truncated
Messages over 4096 characters are automatically split into multiple chunks by octos. If further truncation occurs, check the relay service's own message length settings.
Platform-Specific Issues
| Problem | Solution |
|---|---|
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080/admin/ |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown then reopen terminal |
| Service will not start | Check logs: tail -f ~/.octos/serve.log (macOS) or journalctl --user -u octos-serve (Linux) |
| Windows: octos not found | Ensure %USERPROFILE%\.cargo\bin is in your PATH |
| Windows: shell commands fail | Commands run via cmd /C; use Windows-compatible syntax |
Environment Variables Reference
| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GEMINI_API_KEY | Gemini API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| GROQ_API_KEY | Groq API key |
| MOONSHOT_API_KEY | Moonshot API key |
| DASHSCOPE_API_KEY | DashScope API key |
| MINIMAX_API_KEY | MiniMax API key |
| ZHIPU_API_KEY | Zhipu API key |
| ZAI_API_KEY | Z.AI API key |
| NVIDIA_API_KEY | Nvidia NIM API key |
| OMINIX_API_URL | Local ASR/TTS API URL |
| RUST_LOG | Log level (error / warn / info / debug / trace) |
| TELEGRAM_BOT_TOKEN | Telegram bot token |
| DISCORD_BOT_TOKEN | Discord bot token |
| SLACK_BOT_TOKEN | Slack bot token |
| SLACK_APP_TOKEN | Slack app-level token |
| FEISHU_APP_ID | Feishu app ID |
| FEISHU_APP_SECRET | Feishu app secret |
| EMAIL_USERNAME | Email account username |
| EMAIL_PASSWORD | Email account password |
| WECOM_CORP_ID | WeCom corp ID |
| WECOM_AGENT_SECRET | WeCom agent secret |
CLI Reference
octos chat
Interactive multi-turn conversation with readline history.
octos chat [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--config <PATH> Config file path
--provider <NAME> LLM provider
--model <NAME> Model name
--base-url <URL> Custom API endpoint
-m, --message <MSG> Single message (non-interactive)
--max-iterations <N> Max tool iterations per message (default: 50)
-v, --verbose Show tool outputs
--no-retry Disable retry
Features:
- Arrow keys and line editing (rustyline)
- Persistent history at `.octos/history/chat_history`
- Exit: `/exit`, `/quit`, `exit`, `quit`, `:q`, Ctrl+C, Ctrl+D
- Full tool access (shell, files, search, web)
Examples:
octos chat # Interactive (default)
octos chat --provider deepseek # Use DeepSeek
octos chat --model glm-4-plus # Auto-detects Zhipu
octos chat --message "Fix auth bug" # Single message, exit
octos gateway
Run as a persistent multi-channel daemon.
octos gateway [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--config <PATH> Config file path
--provider <NAME> Override provider
--model <NAME> Override model
--base-url <URL> Override API endpoint
-v, --verbose Verbose logging
--no-retry Disable retry
Requires a gateway section in config with a channels array. Runs continuously until Ctrl+C.
octos init
Initialize workspace with config and bootstrap files.
octos init [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--defaults Skip prompts, use defaults
Creates:
- `.octos/config.json` — Provider/model config
- `.octos/.gitignore` — Ignores state files
- `.octos/AGENTS.md` — Agent instructions template
- `.octos/SOUL.md` — Personality template
- `.octos/USER.md` — User info template
- `.octos/memory/` — Memory storage directory
- `.octos/sessions/` — Session history directory
- `.octos/skills/` — Custom skills directory
octos status
Show system status.
octos status [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
Example output:
octos Status
──────────────────────────────────────────────────
Config: .octos/config.json (found)
Workspace: .octos/ (found)
Provider: anthropic
Model: claude-sonnet-4-20250514
API Keys
──────────────────────────────────────────────────
Anthropic ANTHROPIC_API_KEY set
OpenAI OPENAI_API_KEY not set
...
Bootstrap Files
──────────────────────────────────────────────────
AGENTS.md found
SOUL.md found
USER.md found
TOOLS.md missing
IDENTITY.md missing
octos serve
Launch the web UI and REST API server. Requires the api feature flag.
cargo install --path crates/octos-cli --features api
octos serve # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000 # Accept external connections
Features: session sidebar, chat interface, SSE streaming, dark theme. A /metrics endpoint provides Prometheus-format metrics (octos_tool_calls_total, octos_tool_call_duration_seconds, octos_llm_tokens_total).
octos clean
Clean database and state files.
octos clean [--all] [--dry-run]
| Flag | Description |
|---|---|
| --all | Remove all state files |
| --dry-run | Show what would be removed without deleting |
octos completions
Generate shell completions.
octos completions <shell>
Supported shells: bash, zsh, fish, powershell.
octos cron
Manage scheduled jobs.
octos cron list [--all] # List active jobs (--all includes disabled)
octos cron add [OPTIONS] # Add a cron job
octos cron remove <job-id> # Remove a cron job
octos cron enable <job-id> # Enable a cron job
octos cron enable <job-id> --disable # Disable a cron job
Adding jobs:
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"
Cron expressions use standard syntax. Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.
octos channels
Manage messaging channels.
octos channels status # Show channel compile/config status
octos channels login # WhatsApp QR code login
The status command shows a table with channel name, compile status (feature flags), and config summary (env vars set/missing).
octos office
Office file manipulation (DOCX/PPTX/XLSX). Native Rust implementation with no external dependencies for basic operations.
octos office extract <file> # Extract text as Markdown
octos office unpack <file> <output-dir> # Unpack into pretty-printed XML
octos office pack <input-dir> <output> # Pack directory into Office file
octos office clean <dir> # Remove orphaned files from unpacked PPTX
octos account
Manage sub-accounts under profiles. Sub-accounts inherit LLM provider config but have their own data directory (memory, sessions, skills) and channels.
octos account list --profile <id> # List sub-accounts
octos account create --profile <id> <name> [OPTIONS] # Create sub-account
octos account update <id> [OPTIONS] # Update sub-account
octos auth
OAuth login and API key management.
octos auth login --provider openai # PKCE browser OAuth
octos auth login --provider openai --device-code # Device code flow
octos auth login --provider anthropic # Paste-token (stdin)
octos auth logout --provider openai # Remove stored credential
octos auth status # Show authenticated providers
Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.
octos skills
Manage skills.
octos skills list # List installed skills
octos skills install user/repo/skill-name # Install from GitHub
octos skills remove skill-name # Remove a skill
Fetches SKILL.md from the GitHub repoβs main branch and installs to .octos/skills/.
Skill Development
This guide covers the full lifecycle of an Octos skill — from development to publication to end-user installation — similar to building an app, submitting it to an app store, and distributing it to users.
The Skill Ecosystem
Developer                  Octos Hub                     User
─────────                  ─────────                     ────
1. Develop skill   ──▶     3. Publish to registry   ──▶  5. Search & discover
2. Test locally            4. Pre-built binaries         6. Install
                                                         7. Update
| Concept | App Store Analogy | Octos Equivalent |
|---|---|---|
| App | iOS/Android app | Skill (binary + manifest + docs) |
| SDK | Xcode / Android Studio | Rust + manifest.json + SKILL.md |
| App Store | Apple App Store | octos-hub registry |
| Distribution | App Store binary delivery | Pre-built binaries in GitHub Releases |
| Install | Tap βGetβ | octos skills install user/repo |
| Sideload | Ad-hoc / TestFlight | Copy to ~/.octos/skills/ directly |
Part 1: Develop
Architecture
A skill is a standalone executable that communicates via stdin/stdout JSON. The gateway spawns it as a child process for each tool call. Skills can be written in any language — Rust, Python, Node.js, shell, etc.
User message → LLM → tool_use("get_weather", {"city": "Paris"})
                         ↓
Gateway spawns: ~/.octos/skills/weather/main get_weather
                         ↓
Stdin:  {"city": "Paris"}
Stdout: {"output": "25°C, sunny", "success": true}
                         ↓
LLM sees result → generates response
Skill Anatomy
Every skill is a directory with three files:
my-skill/
├── manifest.json      # Tool definitions (JSON Schema) — the "API contract"
├── SKILL.md           # Documentation + metadata — the "app description"
├── main               # Executable binary — the "app binary"
└── (optional extras)
    ├── styles/        # Bundled assets
    ├── prompts/*.md   # System prompt fragments
    └── hooks/         # Lifecycle hook scripts
Step 1: Create manifest.json
The manifest declares what tools the skill provides. The LLM reads this to decide when and how to call your skill.
{
  "name": "my-skill",
  "version": "1.0.0",
  "author": "your-name",
  "description": "What this skill does",
  "timeout_secs": 15,
  "requires_network": false,
  "tools": [
    {
      "name": "my_tool",
      "description": "Clear description for the LLM. What does this tool do? When should it be used?",
      "input_schema": {
        "type": "object",
        "properties": {
          "param1": {
            "type": "string",
            "description": "What this parameter means"
          },
          "param2": {
            "type": "integer",
            "description": "Optional numeric parameter (default: 10)"
          }
        },
        "required": ["param1"]
      }
    }
  ]
}
Manifest fields:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| description | No | — | Human-readable description |
| timeout_secs | No | 30 | Max execution time per tool call (1-600) |
| requires_network | No | false | Informational flag |
| sha256 | No | — | Binary integrity check (hex hash) |
| tools | No | [] | Array of tool definitions |
| mcp_servers | No | [] | MCP server declarations |
| hooks | No | [] | Lifecycle hook definitions |
| prompts | No | — | Prompt fragment config |
| binaries | No | {} | Pre-built binaries by {os}-{arch} |
Step 2: Create SKILL.md
Documentation with YAML frontmatter. The LLM reads this to understand context and trigger conditions.
---
name: my-skill
description: Short description. Triggers: keyword1, keyword2, trigger phrase.
version: 1.0.0
author: your-name
always: false
---
# My Skill
Detailed description of what this skill does and when to use it.
## Tools
### my_tool
Explain what this tool does with examples.
**Parameters:**
- `param1` (required): What it means
- `param2` (optional): What it controls. Default: 10
Frontmatter fields:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| description | Yes | — | One-line description with trigger keywords |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| always | No | false | If true, always included in system prompt |
| requires_bins | No | — | Comma-separated binaries that must exist |
| requires_env | No | — | Comma-separated env vars that must be set |
Step 3: Implement the Binary
The binary implements the stdin/stdout JSON protocol.
Protocol:
- `argv[1]` = tool name (e.g., `get_weather`)
- stdin = JSON object matching the tool's `input_schema`
- stdout = JSON with `output` (string) and `success` (bool)
- exit code = 0 for success, non-zero for failure
- stderr = ignored (use for debug logging)
Rust template:
use std::io::Read;
use serde::Deserialize;
use serde_json::json;
#[derive(Deserialize)]
struct MyToolInput {
    param1: String,
    #[serde(default = "default_param2")]
    param2: i32,
}

fn default_param2() -> i32 { 10 }

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let tool_name = args.get(1).map(|s| s.as_str()).unwrap_or("unknown");

    let mut buf = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
        fail(&format!("Failed to read stdin: {e}"));
    }

    match tool_name {
        "my_tool" => handle_my_tool(&buf),
        _ => fail(&format!("Unknown tool '{tool_name}'")),
    }
}

fn fail(msg: &str) -> ! {
    println!("{}", json!({"output": msg, "success": false}));
    std::process::exit(1);
}

fn handle_my_tool(input_json: &str) {
    let input: MyToolInput = match serde_json::from_str(input_json) {
        Ok(v) => v,
        Err(e) => fail(&format!("Invalid input: {e}")),
    };
    let result = format!("Processed {} with param2={}", input.param1, input.param2);
    println!("{}", json!({"output": result, "success": true}));
}
Python template:
#!/usr/bin/env python3
import sys, json
def main():
    tool_name = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    input_data = json.loads(sys.stdin.read())
    if tool_name == "my_tool":
        result = f"Processed {input_data['param1']}"
        print(json.dumps({"output": result, "success": True}))
    else:
        print(json.dumps({"output": f"Unknown tool: {tool_name}", "success": False}))
        sys.exit(1)

if __name__ == "__main__":
    main()
Shell template:
#!/bin/sh
TOOL="$1"
INPUT=$(cat)
if [ "$TOOL" = "my_tool" ]; then
    PARAM1=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin)['param1'])")
    printf '{"output": "Processed %s", "success": true}\n' "$PARAM1"
else
    printf '{"output": "Unknown tool: %s", "success": false}\n' "$TOOL"
    exit 1
fi
Step 4: For Bundled Skills (Rust Crate)
If contributing a skill to the core Octos distribution:
mkdir -p crates/app-skills/my-skill/src
Cargo.toml:
[package]
name = "my-skill"
version = "1.0.0"
edition = "2021"
[[bin]]
name = "my_skill"
path = "src/main.rs"
[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
Add to workspace Cargo.toml:
members = [
    # ...
    "crates/app-skills/my-skill",
]
Register in crates/octos-agent/src/bundled_app_skills.rs:
pub const BUNDLED_APP_SKILLS: &[(&str, &str, &str, &str)] = &[
    // ...
    (
        "my-skill",  // dir_name
        "my_skill",  // binary_name
        include_str!("../../app-skills/my-skill/SKILL.md"),
        include_str!("../../app-skills/my-skill/manifest.json"),
    ),
];
Part 2: Test
Standalone Testing
Test your skill binary directly without the gateway:
# Build (Rust)
cargo build -p my-skill
# Test a tool call
echo '{"param1": "hello", "param2": 5}' | ./target/debug/my_skill my_tool
# Expected: {"output":"Processed hello with param2=5","success":true}
# Test error handling
echo '{}' | ./target/debug/my_skill my_tool
echo '{"param1": "test"}' | ./target/debug/my_skill unknown_tool
For non-Rust skills, make the binary executable and test the same way:
chmod +x my-skill/main
echo '{"param1": "hello"}' | ./my-skill/main my_tool
Gateway Integration Testing
# Build everything
cargo build --release --workspace
# Start the gateway
octos gateway
# Verify skill loaded
ls ~/.octos/skills/my-skill/
# main manifest.json SKILL.md
# Ask the agent to use your skill in conversation
Recommended Timeout Values
| Skill Type | Timeout |
|---|---|
| Local computation | 5s |
| Single API call | 15s |
| Multi-step API calls | 30-60s |
| Long-running research | 300-600s |
Part 3: Publish
Publishing makes your skill discoverable to all Octos users — like submitting an app to the App Store.
Push to GitHub
Organize your repository. A repo can contain a single skill or multiple skills:
Single-skill repo:
my-skill/                ← repo root
├── manifest.json
├── SKILL.md
├── Cargo.toml           (or package.json, requirements.txt, etc.)
└── src/main.rs
Multi-skill repo:
my-skills/               ← repo root
├── skill-a/
│   ├── manifest.json
│   ├── SKILL.md
│   └── src/main.rs
├── skill-b/
│   ├── manifest.json
│   ├── SKILL.md
│   └── main.py
└── shared/              ← shared dependencies (auto-detected)
    └── utils.py
Submit to the Registry
The octos-hub registry is the central catalog for discoverable skills. Submit a PR to add your entry to registry.json:
{
  "name": "my-skills",
  "description": "What your skills do",
  "repo": "your-user/your-repo",
  "version": "1.0.0",
  "author": "your-name",
  "license": "MIT",
  "skills": ["skill-a", "skill-b"],
  "requires": ["git", "cargo"],
  "provides_tools": true,
  "tags": ["keyword1", "keyword2"]
}
Registry entry fields:
| Field | Required | Description |
|---|---|---|
| name | Yes | Package name (can differ from repo name) |
| description | Yes | Searchable description |
| repo | Yes | GitHub user/repo or full URL |
| version | No | Latest version |
| author | No | Author name |
| license | No | License identifier (MIT, Apache-2.0, etc.) |
| skills | No | Individual skill names in the package |
| requires | No | External dependencies (e.g., ["git", "cargo"]) |
| provides_tools | No | Whether skills have manifest.json with tools |
| tags | No | Searchable tags |
| binaries | No | Pre-built binaries (see Distribution below) |
Once the PR is merged, users can discover your skill:
octos skills search keyword1
Part 4: Distribute
Pre-built binaries let users install instantly without compiling — like downloading an app binary from the store.
Add Binaries to manifest.json
In your skillβs manifest.json, add a binaries section keyed by {os}-{arch}:
{
  "name": "my-skill",
  "version": "1.0.0",
  "binaries": {
    "darwin-aarch64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-aarch64.tar.gz",
      "sha256": "abc123..."
    },
    "darwin-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-x86_64.tar.gz",
      "sha256": "def456..."
    },
    "linux-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-linux-x86_64.tar.gz",
      "sha256": "789ghi..."
    }
  },
  "tools": [ ... ]
}
Automate with GitHub Actions
Set up CI to build and publish binaries on each release tag:
name: Release Skill

on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        include:
          - os: macos-latest
            target: aarch64-apple-darwin
            platform: darwin-aarch64
          - os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            platform: linux-x86_64
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo build --release --target ${{ matrix.target }}
      - name: Package
        run: |
          mkdir dist
          cp target/${{ matrix.target }}/release/my_skill dist/main
          cd dist && tar czf my-skill-${{ matrix.platform }}.tar.gz main
          shasum -a 256 my-skill-${{ matrix.platform }}.tar.gz
      - uses: softprops/action-gh-release@v2
        with:
          files: dist/my-skill-*.tar.gz
Install Resolution Order
When a user runs octos skills install, the installer tries these sources in order:
1. manifest.json `binaries` — skill author's own CI/CD builds
2. Registry `binaries` — registry-audited pre-built binaries
3. `cargo build --release` — fallback: compile from source (if `Cargo.toml` exists)
4. `npm install` — fallback: install Node.js dependencies (if `package.json` exists)
Pre-built binaries are verified with SHA-256 before installation.
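The verification step amounts to a hash comparison before unpacking, roughly (illustrative Python):

```python
import hashlib

def verify_archive(data: bytes, expected_sha256: str) -> bool:
    """Compare a downloaded archive's SHA-256 against the hash declared
    in the skill manifest or registry entry."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

If the hash does not match, the download is rejected and the installer falls back to the next source in the list.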
Part 5: Install
For Users: Search and Install
# Search the registry
octos skills search weather
octos skills search "deep research"
# Install from GitHub (all skills in repo)
octos skills install user/repo
# Install a specific skill from a multi-skill repo
octos skills install user/repo/skill-name
# Install with a specific branch
octos skills install user/repo --branch dev
# Force reinstall
octos skills install user/repo --force
Per-Profile Installation
Skills are isolated per profile (like per-user app installs):
# Install to a specific profile
octos skills --profile alice install user/repo/my-skill
# List skills for a profile
octos skills --profile alice list
# Remove from a profile
octos skills --profile alice remove my-skill
In-Chat Installation
Users can manage skills from within a conversation:
/skills install user/repo/my-skill
/skills list
/skills remove my-skill
/skills search comic
Admin API
Programmatic skill management via REST:
# Install
POST /api/admin/profiles/alice/skills {"repo": "user/repo/my-skill"}
# List
GET /api/admin/profiles/alice/skills
# Remove
DELETE /api/admin/profiles/alice/skills/my-skill
Sideloading (Manual Install)
Copy a skill directory directly — like sideloading an app:
# Copy to global skills directory
cp -r my-skill/ ~/.octos/skills/my-skill/
chmod +x ~/.octos/skills/my-skill/main
# Or to a profile-specific directory
cp -r my-skill/ ~/.octos/profiles/alice/data/skills/my-skill/
Installed Skill Layout
~/.octos/skills/my-skill/
├── main            # Executable binary
├── manifest.json   # Tool definitions
├── SKILL.md        # Documentation
├── .source         # Install tracking (repo, branch, date)
└── styles/         # Bundled assets (if any)
The .source file tracks where the skill was installed from:
{
"repo": "user/repo",
"subdir": "my-skill",
"branch": "main",
"installed_at": "2026-03-28T..."
}
Skill Loading Priority
When multiple directories contain a skill with the same name, first match wins:
| Priority | Location | Source |
|---|---|---|
| 1 (highest) | <profile-data>/skills/ | Per-profile install |
| 2 | <project-dir>/skills/ | Project-local |
| 3 | <project-dir>/bundled-skills/ | Bundled app-skills |
| 4 (lowest) | ~/.octos/skills/ | Global install |
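The first-match-wins lookup can be sketched as an ordered scan over the directories in the table. This is a hypothetical illustration, not Octos's actual code; `resolve_skill` is an invented name, and checking for `manifest.json` as the presence marker is an assumption.

```rust
use std::path::PathBuf;

/// Return the first directory in `search_dirs` (highest priority first)
/// that contains a skill named `name` with a manifest.json.
fn resolve_skill(search_dirs: &[PathBuf], name: &str) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.join("manifest.json").is_file())
}
```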
Part 6: Update
# Update a skill from its source repo
octos skills update my-skill
# Update from a specific branch
octos skills update my-skill --branch main
# View skill details (version, source, tools)
octos skills info my-skill
The updater reads the .source file to know where to pull from, then re-runs the install flow (clone → discover → build/download → copy).
Hot-Reload
Skill binaries can be updated without restarting the gateway:
# Build just the skill
cargo build --release -p my-skill
# Replace the binary
cp target/release/my_skill ~/.octos/skills/my-skill/main
# Next tool call automatically uses the new binary
> Note: If you change `SKILL.md` or `manifest.json` for a bundled skill, you must rebuild the `octos` binary too (they're embedded via `include_str!`). External skills reload immediately.
Advanced Topics
Multiple Tools in One Skill
A single binary can serve multiple tools. Route on argv[1]:
#![allow(unused)]
fn main() {
match tool_name {
"get_weather" => handle_get_weather(&buf),
"get_forecast" => handle_get_forecast(&buf),
_ => fail(&format!("Unknown tool '{tool_name}'")),
}
}
Declare all tools in manifest.json:
{
"tools": [
{ "name": "get_weather", "description": "...", "input_schema": { ... } },
{ "name": "get_forecast", "description": "...", "input_schema": { ... } }
]
}
Environment Variables
Skills inherit the gateway's environment (minus blocked security-sensitive vars). Declare requirements in SKILL.md:
---
requires_env: MY_API_KEY,MY_SECRET
---
The gateway auto-injects provider API keys (e.g., DASHSCOPE_API_KEY, OPENAI_API_KEY) plus OCTOS_DATA_DIR and OCTOS_WORK_DIR.
Bundled Assets
Skills with asset files should resolve paths relative to the executable:
#![allow(unused)]
fn main() {
let exe = std::env::current_exe()?;
let skill_dir = exe.parent().unwrap();
let styles_dir = skill_dir.join("styles");
}
Do not use the current working directory; it points to the profile's data dir, not the skill dir.
MCP Servers
A skill can declare MCP servers the gateway auto-starts:
{
"mcp_servers": [
{
"command": "./bin/mcp-server",
"args": ["--port", "0"],
"env": ["DATABASE_URL"]
}
]
}
Or remote MCP servers:
{
"mcp_servers": [
{
"url": "https://mcp.example.com/v1",
"headers": { "Authorization": "Bearer ${API_KEY}" }
}
]
}
Path resolution: ./ and ../ are relative to the skill directory. env lists variable names (not values) to forward.
Lifecycle Hooks
Skills can run commands on agent events:
{
"hooks": [
{
"event": "before_tool_call",
"command": ["./hooks/policy-check.sh"],
"timeout_ms": 3000,
"tool_filter": ["shell", "bash"]
},
{
"event": "after_tool_call",
"command": ["./hooks/audit-log.sh"],
"timeout_ms": 5000
}
]
}
| Event | Can Deny? | When |
|---|---|---|
| before_tool_call | Yes (exit 1) | Before tool execution |
| after_tool_call | No | After tool completes |
| before_llm_call | Yes (exit 1) | Before LLM request |
| after_llm_call | No | After LLM response |
Prompt Fragments
Inject content into the system prompt without writing code:
{
"name": "company-policy",
"version": "1.0.0",
"prompts": {
"include": ["prompts/*.md"]
}
}
Extras-Only Skills
Skills don't need to provide tools. Valid combinations:
- Prompt-only: Teach the agent domain knowledge (no binary needed)
- Hooks-only: Enforce policies across all tool calls
- MCP-only: Expose tools via remote MCP servers
- Combined: Tools + MCP + hooks + prompts in one skill
Security
Binary integrity:
- Symlinks rejected (defense against link-swap attacks)
- SHA-256 verification when `sha256` is set in the manifest
- Size limit: 100 MB max per binary
Environment sanitization: these vars are stripped before spawning skills:
- `LD_PRELOAD`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`
- `NODE_OPTIONS`, `PYTHONPATH`, `PERL5LIB`
- `RUSTFLAGS`, `RUST_LOG`, and 10+ others
Best practices:
- Validate all input (never trust user-provided paths, names, etc.)
- Use timeouts on HTTP requests
- Avoid shell injection
- Set `sha256` in the manifest for release builds
Platform Skills vs App Skills
| | App Skills | Platform Skills |
|---|---|---|
| Location | crates/app-skills/ | crates/platform-skills/ |
| Bootstrap | Every gateway startup | Admin bot only |
| Scope | Per-gateway | Shared across gateways |
| Use when | Self-contained, always available | Requires external service |
Examples
Example 1: Clock (Local, No Network)
crates/app-skills/time/
├── Cargo.toml      # chrono, chrono-tz, serde, serde_json
├── manifest.json   # 1 tool: get_time, timeout_secs: 5
├── SKILL.md        # Triggers: time, clock
└── src/main.rs     # System clock + timezone formatting
Example 2: Weather (Network API)
crates/app-skills/weather/
├── Cargo.toml      # reqwest (blocking, rustls-tls), serde, serde_json
├── manifest.json   # 2 tools: get_weather, get_forecast, timeout_secs: 15
├── SKILL.md        # Triggers: weather, forecast
└── src/main.rs     # Geocode city → Open-Meteo API
Example 3: Email (Environment Credentials)
crates/app-skills/send-email/
├── Cargo.toml      # lettre, serde, serde_json
├── manifest.json   # 1 tool: send_email
├── SKILL.md        # requires_env: SMTP_HOST,SMTP_USERNAME,SMTP_PASSWORD
└── src/main.rs     # SMTP with credential validation
Checklists
Tool Skill (binary + tools)
- Directory has `manifest.json`, `SKILL.md`, and an executable (`main` or binary)
- `manifest.json` has valid JSON Schema for all tool inputs
- `SKILL.md` has frontmatter with trigger keywords
- Binary reads `argv[1]` for the tool name, stdin for JSON
- Binary writes `{"output": "...", "success": true/false}` to stdout
- Error cases return `success: false` with clear messages
- Standalone test passes: `echo '{"param": "val"}' | ./main my_tool`
- Gateway test passes: skill loads and agent can invoke it
Extras Skill (MCP / hooks / prompts)
- `mcp_servers`: `command` or `url` set; `env` lists names only
- `hooks`: valid event name; `command` is an argv array; relative paths resolve
- `prompts`: glob patterns match intended `.md` files
- Extras-only: `tools` is empty or omitted, no binary needed
Publishing
- Repo pushed to GitHub with `manifest.json` and `SKILL.md` at expected paths
- Registry PR submitted to octos-hub
- (Optional) Pre-built binaries for `darwin-aarch64`, `linux-x86_64`
- (Optional) SHA-256 hashes in the `manifest.json` `binaries` section
- (Optional) GitHub Actions workflow for automated binary builds on release tags
Architecture Document: octos
Overview
octos is a 15-member Rust workspace (Edition 2024, rust-version 1.85.0) providing both a coding agent CLI and a multi-channel messaging gateway. Pure Rust TLS via rustls (no OpenSSL). Error handling via eyre/color-eyre.
Workspace members:
- 6 core crates: octos-core, octos-memory, octos-llm, octos-agent, octos-bus, octos-cli
- 1 pipeline crate: octos-pipeline
- 7 app-skill crates: news, deep-search, deep-crawl, send-email, account-manager, time, weather
- 1 platform-skill crate: asr
┌──────────────────────────────────────────────────────────────┐
│                          octos-cli                           │
│              (CLI: chat, gateway, init, status)              │
├────────────────────────────┬─────────────────────────────────┤
│        octos-agent         │            octos-bus            │
│   (Agent, Tools, Skills)   │   (Channels, Sessions, Cron)    │
├────────────┬───────────────┼─────────────────────────────────┤
│octos-memory│   octos-llm   │         octos-pipeline          │
│ (Episodes) │  (Providers)  │    (DOT-based orchestration)    │
├────────────┴───────────────┴─────────────────────────────────┤
│                          octos-core                          │
│             (Types, Messages, Gateway Protocol)              │
└──────────────────────────────────────────────────────────────┘
octos-core – Foundation Types
Shared types with no internal dependencies. Only depends on serde, chrono, uuid, eyre.
MessageRole implements as_str() -> &'static str and Display for consistent string conversion across providers (system/user/assistant/tool).
Task Model
#![allow(unused)]
fn main() {
pub struct Task {
pub id: TaskId, // UUID v7 (temporal ordering)
pub parent_id: Option<TaskId>, // For subtasks
pub status: TaskStatus,
pub kind: TaskKind,
pub context: TaskContext,
pub result: Option<TaskResult>,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
}
}
TaskId: Newtype over Uuid. Generates UUID v7 via Uuid::now_v7(). Implements Display, FromStr, Default.
TaskStatus (tagged enum, "state" discriminant):
- `Pending` → awaiting assignment
- `InProgress { agent_id: AgentId }` → executing
- `Blocked { reason: String }` → waiting for dependency
- `Completed` → success
- `Failed { error: String }` → failure with message
TaskKind (tagged enum, "type" discriminant):
- `Plan { goal: String }`
- `Code { instruction: String, files: Vec<PathBuf> }`
- `Review { diff: String }`
- `Test { command: String }`
- `Custom { name: String, params: serde_json::Value }`
TaskContext:
- `working_dir: PathBuf`
- `git_state: Option<GitState>`
- `working_memory: Vec<Message>`
- `episodic_refs: Vec<EpisodeRef>`
- `files_in_scope: Vec<PathBuf>`
TaskResult:
- `success: bool`
- `output: String`
- `files_modified: Vec<PathBuf>`
- `subtasks: Vec<TaskId>`
- `token_usage: TokenUsage`
TokenUsage: input_tokens: u32, output_tokens: u32 (defaults to 0/0)
Message Types
#![allow(unused)]
fn main() {
pub struct Message {
pub role: MessageRole, // System | User | Assistant | Tool
pub content: String,
pub media: Vec<String>, // File paths (images, audio)
pub tool_calls: Option<Vec<ToolCall>>,
pub tool_call_id: Option<String>,
pub timestamp: DateTime<Utc>,
}
pub struct ToolCall {
pub id: String,
pub name: String,
pub arguments: serde_json::Value,
}
}
Gateway Protocol
#![allow(unused)]
fn main() {
pub struct InboundMessage { // channel → agent
pub channel: String, // "telegram", "cli", "discord", etc.
pub sender_id: String,
pub chat_id: String,
pub content: String,
pub timestamp: DateTime<Utc>,
pub media: Vec<String>,
pub metadata: serde_json::Value,
}
pub struct OutboundMessage { // agent → channel
pub channel: String,
pub chat_id: String,
pub content: String,
pub reply_to: Option<String>,
pub media: Vec<String>,
pub metadata: serde_json::Value,
}
}
InboundMessage::session_key() derives SessionKey::new(channel, chat_id) → format "{channel}:{chat_id}".
Inter-Agent Coordination
#![allow(unused)]
fn main() {
pub enum AgentMessage { // tagged: "type", snake_case
TaskAssign { task: Box<Task> },
TaskUpdate { task_id: TaskId, status: TaskStatus },
TaskComplete { task_id: TaskId, result: TaskResult },
ContextRequest { task_id: TaskId, query: String },
ContextResponse { task_id: TaskId, context: Vec<Message> },
}
}
Error System
#![allow(unused)]
fn main() {
pub struct Error {
pub kind: ErrorKind,
pub context: Option<String>, // Chained context
pub suggestion: Option<String>, // Actionable fix hint
}
}
ErrorKind variants: TaskNotFound, AgentNotFound, InvalidStateTransition, LlmError, ApiError (status-aware: 401 → check key, 429 → rate limit), ToolError, ConfigError, ApiKeyNotSet, UnknownProvider, Timeout, ChannelError, SessionError, IoError, SerializationError, Other(eyre::Report).
Utilities
truncate_utf8(s: &mut String, max_len: usize, suffix: &str) → in-place truncation at UTF-8 char boundaries. Appends suffix after truncation. Used across all tool outputs.
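A minimal std-only sketch of the described behavior, assuming truncation keeps at most `max_len` bytes of the original string (backing up to the nearest char boundary) before appending `suffix`; the real implementation may differ in edge cases.

```rust
/// Truncate `s` in place to at most `max_len` bytes at a UTF-8 char
/// boundary, then append `suffix`. No-op when `s` already fits.
fn truncate_utf8(s: &mut String, max_len: usize, suffix: &str) {
    if s.len() <= max_len {
        return;
    }
    // Walk back from max_len to the nearest char boundary so we never
    // split a multi-byte character.
    let mut cut = max_len;
    while cut > 0 && !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
    s.push_str(suffix);
}
```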
octos-llm – LLM Provider Abstraction
Provider Trait
#![allow(unused)]
fn main() {
#[async_trait]
pub trait LlmProvider: Send + Sync {
async fn chat(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatResponse>;
async fn chat_stream(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatStream>; // default: falls back to chat()
fn context_window(&self) -> u32; // default: context_window_tokens(self.model_id())
fn model_id(&self) -> &str;
fn provider_name(&self) -> &str;
}
}
Configuration
#![allow(unused)]
fn main() {
pub struct ChatConfig {
pub max_tokens: Option<u32>, // default: Some(4096)
pub temperature: Option<f32>, // default: Some(0.0)
pub tool_choice: ToolChoice, // Auto | Required | None | Specific { name }
pub stop_sequences: Vec<String>,
}
}
Response Types
#![allow(unused)]
fn main() {
pub struct ChatResponse {
pub content: Option<String>,
pub tool_calls: Vec<ToolCall>,
pub stop_reason: StopReason, // EndTurn | ToolUse | MaxTokens | StopSequence
pub usage: TokenUsage,
}
pub enum StreamEvent {
TextDelta(String),
ToolCallDelta { index, id, name, arguments_delta },
Usage(TokenUsage),
Done(StopReason),
Error(String),
}
pub type ChatStream = Pin<Box<dyn Stream<Item = StreamEvent> + Send>>;
}
Provider Registry (registry/)
All providers are defined in octos-llm/src/registry/, one file per provider. Each file exports a ProviderEntry with metadata (name, aliases, default model, API key env var, base URL) and a create() factory function. Adding a new provider = one file + one line in mod.rs.
#![allow(unused)]
fn main() {
pub struct ProviderEntry {
pub name: &'static str, // canonical name
pub aliases: &'static [&'static str], // e.g. ["google"] for gemini
pub default_model: Option<&'static str>,
pub api_key_env: Option<&'static str>,
pub default_base_url: Option<&'static str>,
pub requires_api_key: bool,
pub requires_base_url: bool, // true for vllm
pub requires_model: bool, // true for vllm
pub detect_patterns: &'static [&'static str], // model→provider auto-detect
pub create: fn(CreateParams) -> Result<Arc<dyn LlmProvider>>,
}
pub struct CreateParams {
pub api_key: Option<String>,
pub model: Option<String>,
pub base_url: Option<String>,
pub model_hints: Option<ModelHints>, // config-level override
}
}
Lookup: registry::lookup(name) → case-insensitive, matches canonical name or aliases.
Auto-detect: registry::detect_provider(model) → infers provider from model name patterns.
Native Providers (4 protocol implementations)
| Provider | Base URL | Auth Header | Image Format | Default Model |
|---|---|---|---|---|
| Anthropic | api.anthropic.com | x-api-key | Base64 blocks | claude-sonnet-4-20250514 |
| OpenAI | api.openai.com/v1 | Authorization: Bearer | Data URI | gpt-4o |
| Gemini | generativelanguage.googleapis.com/v1beta | x-goog-api-key | Base64 inline | gemini-2.5-flash |
| OpenRouter | openrouter.ai/api/v1 | Authorization: Bearer | Data URI | anthropic/claude-sonnet-4-20250514 |
OpenAI-Compatible Providers (via OpenAIProvider::with_base_url())
| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| DeepSeek | β | api.deepseek.com/v1 | deepseek-chat | DEEPSEEK_API_KEY |
| Groq | β | api.groq.com/openai/v1 | llama-3.3-70b-versatile | GROQ_API_KEY |
| Moonshot | kimi | api.moonshot.ai/v1 | kimi-k2.5 | MOONSHOT_API_KEY |
| DashScope | qwen | dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max | DASHSCOPE_API_KEY |
| MiniMax | β | api.minimax.io/v1 | MiniMax-Text-01 | MINIMAX_API_KEY |
| Zhipu | glm | open.bigmodel.cn/api/paas/v4 | glm-4-plus | ZHIPU_API_KEY |
| Nvidia | nim | integrate.api.nvidia.com/v1 | meta/llama-3.3-70b-instruct | NVIDIA_API_KEY |
| Ollama | β | localhost:11434/v1 | llama3.2 | (none) |
| vLLM | β | (user-provided) | (user-provided) | VLLM_API_KEY |
Anthropic-Compatible Provider
| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| Z.AI | zai, z.ai | api.z.ai/api/anthropic | glm-5 | ZAI_API_KEY |
ModelHints (OpenAI provider)
Auto-detected from model name at construction, overridable via config model_hints:
#![allow(unused)]
fn main() {
pub struct ModelHints {
pub uses_completion_tokens: bool, // o-series, gpt-5, gpt-4.1
pub fixed_temperature: bool, // o-series, kimi-k2.5
pub lacks_vision: bool, // deepseek, minimax, mistral, yi-
pub merge_system_messages: bool, // default: true
}
}
SSE Streaming
parse_sse_response(response) -> impl Stream<Item = SseEvent> → stateful unfold-based parser. Max buffer: 1 MB. Handles `\n\n` and `\r\n\r\n` separators. Each provider maps SSE events to StreamEvent:
- Anthropic: `message_start` → input tokens, `content_block_start/delta` → text/tool chunks, `message_delta` → stop reason. Custom SSE state machine.
- OpenAI/OpenRouter: Standard OpenAI SSE with `[DONE]` sentinel. `delta.content` for text, `delta.tool_calls[]` for tools. Shared parser: `parse_openai_sse_events()`.
- Gemini: `alt=sse` endpoint. `candidates[0].content.parts[]` with function call data.
RetryProvider
Wraps any Arc<dyn LlmProvider> with exponential backoff. Wrapped by ProviderChain for multi-provider failover.
#![allow(unused)]
fn main() {
pub struct RetryConfig {
pub max_retries: u32, // default: 3
pub initial_delay: Duration, // default: 1s
pub max_delay: Duration, // default: 60s
pub backoff_multiplier: f64, // default: 2.0
}
}
Delay formula: initial_delay * backoff_multiplier^attempt, capped at max_delay.
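The delay formula can be sketched directly from the `RetryConfig` defaults above; `retry_delay` is an illustrative helper name, not the crate's API.

```rust
use std::time::Duration;

/// Compute the backoff delay for a given attempt:
/// initial_delay * backoff_multiplier^attempt, capped at max_delay.
fn retry_delay(attempt: u32, initial: Duration, multiplier: f64, max: Duration) -> Duration {
    let scaled = initial.as_secs_f64() * multiplier.powi(attempt as i32);
    Duration::from_secs_f64(scaled.min(max.as_secs_f64()))
}
```

With the defaults (1s initial, 2.0 multiplier, 60s cap), attempts 0, 1, 2, ... yield 1s, 2s, 4s, ... until the cap is reached.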
Retryable errors (three-tier detection):
- HTTP status: 429, 500, 502, 503, 504, 529
- reqwest: `is_connect()` or `is_timeout()`
- String fallback: "connection refused", "timed out", "overloaded"
Provider Failover Chain
ProviderChain wraps multiple Arc<dyn LlmProvider> and transparently fails over on retriable errors. Configured via fallback_models in config.
#![allow(unused)]
fn main() {
pub struct ProviderChain {
slots: Vec<ProviderSlot>, // provider + AtomicU32 failure count
failure_threshold: u32, // default: 3
}
}
Behavior: Tries providers in order, skipping degraded ones (failures >= threshold). On retriable error, moves to the next. On success, resets failure count. If all degraded, picks the one with fewest failures.
Failoverable errors are a broader class than retryable ones: they also include 401/403, timeouts, and content-format 400 errors (e.g. "must not be empty", "reasoning_content", "API key not valid", "invalid_value"). These should not be retried on the same provider, but should fail over to a different one.
AdaptiveRouter (adaptive.rs)
Metrics-driven provider selection with three mutually exclusive modes (Off, Hedge, Lane). Tracks per-provider EMA latency (configurable ema_alpha, default 0.3), p95 latency (64-sample circular buffer), error rates, throughput (output tokens/sec EMA), and cost. Four-factor scoring: stability, quality, priority, cost (all weights configurable). Includes circuit breaker, probe requests, model catalog seeding from model_catalog.json, and QoS ranking. Scoring uses EMA blending: baseline catalog data at cold start, live metrics gradually replace it (weight ramps from 0 to 1 over 10 calls).
#![allow(unused)]
fn main() {
pub struct AdaptiveSlot {
provider: Arc<dyn LlmProvider>,
metrics: ProviderMetrics,
priority: usize,
cost_per_m: f64,
model_type: Mutex<ModelType>, // Strong | Fast
cost_in: AtomicU64,
ds_output: AtomicU64, // deep search output quality
baseline_stability: AtomicU64,
baseline_tool_avg_ms: AtomicU64,
baseline_p95_ms: AtomicU64,
context_window: AtomicU64,
max_output: AtomicU64,
}
}
Hedge mode: Races primary + cheapest alternate via tokio::select!, cancels loser. Only completed requests record metrics (cancelled loser metrics are discarded). If primary fails, alternate is tried sequentially.
Lane mode: Scores all providers, picks single best. Probe requests sent to stale providers (configurable probability, default 0.1; interval, default 60s).
FallbackProvider (fallback.rs)
Wraps primary + QoS-ranked fallbacks. On failure, records cooldown via ProviderRouter. Tries each fallback in order.
SwappableProvider (swappable.rs)
Runtime model switching via RwLock. Leaks ~50 bytes per swap (acceptable for rare user-initiated changes). cached_model_id and cached_provider_name are leaked &'static str to satisfy the &str return type.
ProviderRouter (router.rs)
Sub-agent multi-model routing with prefix-based key resolution. Supports cooldown (60s default), QoS-scored compatible_fallbacks() (sorted by model catalog score), cost info auto-derived from pricing.rs, and metadata for LLM-visible tool schemas.
#![allow(unused)]
fn main() {
pub struct ProviderRouter {
providers: RwLock<HashMap<String, Arc<dyn LlmProvider>>>,
active_key: RwLock<Option<String>>,
metadata: RwLock<HashMap<String, SubProviderMeta>>,
cooldowns: RwLock<HashMap<String, Instant>>,
qos_scores: RwLock<HashMap<String, f64>>,
}
}
OminixClient (ominix.rs)
Client for local ASR/TTS via Ominix runtime.
Token Estimation
#![allow(unused)]
fn main() {
pub fn estimate_tokens(text: &str) -> u32 // ~4 chars/token ASCII, ~1.5 chars/token CJK
pub fn estimate_message_tokens(msg: &Message) -> u32 // content + tool_calls + 4 overhead
}
Context Windows
| Model Family | Tokens |
|---|---|
| Claude 3/4 | 200,000 |
| GPT-4o/4-turbo | 128,000 |
| o1/o3/o4 | 200,000 |
| Gemini 2.0/1.5 | 1,000,000 |
| Default (unknown) | 128,000 |
Pricing
model_pricing(model_id) -> Option<ModelPricing> → case-insensitive substring match. Cost = (input/1M) * input_rate + (output/1M) * output_rate.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4 | 15.00 | 75.00 |
| claude-sonnet-4 | 3.00 | 15.00 |
| claude-haiku | 0.80 | 4.00 |
| gpt-4o | 2.50 | 10.00 |
| gpt-4o-mini | 0.15 | 0.60 |
| o3/o4 | 10.00 | 40.00 |
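The cost formula works out as follows; `cost_usd` is an illustrative helper, with the claude-sonnet-4 rates ($3.00 in / $15.00 out per million tokens) taken from the table above.

```rust
/// Cost in USD: (input/1M) * input_rate + (output/1M) * output_rate.
fn cost_usd(input_tokens: u64, output_tokens: u64, in_rate: f64, out_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * in_rate
        + (output_tokens as f64 / 1_000_000.0) * out_rate
}
```

For example, 500k input + 100k output tokens on claude-sonnet-4 costs 0.5 × $3.00 + 0.1 × $15.00 = $3.00.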
Embedding
#![allow(unused)]
fn main() {
pub trait EmbeddingProvider: Send + Sync {
async fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
fn dimension(&self) -> usize;
}
}
OpenAIEmbedder: Default model text-embedding-3-small (1536 dims). text-embedding-3-large = 3072 dims.
Transcription
GroqTranscriber: Whisper whisper-large-v3 via https://api.groq.com/openai/v1/audio/transcriptions. Multipart form. 60s timeout. MIME detection: ogg/opus → audio/ogg, mp3 → audio/mpeg, m4a → audio/mp4, wav → audio/wav.
Vision
encode_image(path) -> (mime_type, base64_data) → JPEG/PNG/GIF/WebP. is_image(path) -> bool.
Typed Error Hierarchy (error.rs)
LlmError with LlmErrorKind enum: Authentication, RateLimited, ContextOverflow, ModelNotFound, ServerError, Network, Timeout, InvalidRequest, ContentFiltered, StreamError, Provider. is_retryable() returns true for RateLimited, ServerError, Network, Timeout, StreamError. from_status(code, body) maps HTTP status codes to error kinds. Provider response bodies logged at debug level only (not exposed in error messages).
High-Level Client (high_level.rs)
LlmClient wraps Arc<dyn LlmProvider> with ergonomic APIs: generate(prompt), generate_with(messages, tools, config), generate_object(prompt, schema_name, schema), generate_typed<T>(prompt, schema_name, schema), stream(prompt), stream_with(messages, tools, config). Configurable via with_config(ChatConfig).
Middleware Pipeline (middleware.rs)
LlmMiddleware trait with before()/after()/on_error() hooks. MiddlewareStack wraps LlmProvider and runs layers in insertion order. before() can short-circuit with cached responses. Built-in: LoggingMiddleware (tracing), CostTracker (AtomicU64 counters for input/output tokens and request count). Streaming bypasses middleware (logged as debug warning).
Model Catalog (catalog.rs)
ModelCatalog with ModelInfo (id, name, provider, context_window, max_output_tokens, capabilities, cost, aliases). Lookup by ID or alias via HashMap index. with_defaults() pre-registers 4 models (Claude Sonnet 4, Claude Haiku 4.5, GPT-4o, Gemini 2.5 Flash). by_provider() and with_capability() for filtered queries.
octos-memory – Persistence & Search
EpisodeStore
redb database at .octos/episodes.redb with three tables:
| Table | Key | Value | Purpose |
|---|---|---|---|
| episodes | &str (episode_id) | &str (JSON) | Full episode records |
| cwd_index | &str (working_dir) | &str (JSON array of IDs) | Directory-scoped lookup |
| embeddings | &str (episode_id) | &[u8] (bincode Vec<f32>) | Vector embeddings |
#![allow(unused)]
fn main() {
pub struct Episode {
pub id: String, // UUID v7
pub task_id: TaskId,
pub agent_id: AgentId,
pub working_dir: PathBuf,
pub summary: String, // LLM-generated, truncated to 500 chars
pub outcome: EpisodeOutcome, // Success | Failure | Blocked | Cancelled
pub key_decisions: Vec<String>,
pub files_modified: Vec<PathBuf>,
pub created_at: DateTime<Utc>,
}
}
Operations:
- `store(episode)` → serialize to JSON, update cwd_index, insert into in-memory HybridIndex
- `get(id)` → direct lookup by episode_id
- `find_relevant(cwd, query, limit)` → keyword matching scoped to directory
- `recent_for_cwd(cwd, n)` → N most recent by created_at descending
- `store_embedding(id, Vec<f32>)` → bincode serialize, store in embeddings table, update HybridIndex
- `find_relevant_hybrid(query, query_embedding, limit)` → global hybrid search across all episodes
Initialization: On open(), rebuilds in-memory HybridIndex by iterating all episodes and loading embeddings from DB.
MemoryStore
File-based persistent memory at {data_dir}/memory/:
- `MEMORY.md` → long-term memory (full overwrite)
- `YYYY-MM-DD.md` → daily notes (append with date header)
get_memory_context() builds system prompt injection:
- `## Long-term Memory` → full MEMORY.md
- `## Recent Activity` → 7-day rolling window of daily notes
- `## Today's Notes` → current day
HybridIndex – BM25 + Vector Search
#![allow(unused)]
fn main() {
pub struct HybridIndex {
inverted: HashMap<String, Vec<(usize, u32)>>, // term → [(doc_idx, raw_tf_count)]
doc_lengths: Vec<usize>,
total_len: usize, // running total for O(1) avg_dl
avg_dl: f64,
ids: Vec<String>,
hnsw: Option<Hnsw<'static, f32, DistCosine>>,
has_embedding: Vec<bool>,
dimension: usize, // default: 1536
}
}
BM25 scoring (constants: K1=1.2, B=0.75):
- Tokenization: lowercase, split on non-alphanumeric, filter tokens < 2 chars
- IDF: `ln((N - df + 0.5) / (df + 0.5) + 1.0)`
- Score: `IDF * (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * dl/avg_dl))` → uses raw term counts (not normalized)
- Duplicate detection: `ids.contains(episode_id)` skips already-indexed documents (line 76-78)
- Normalized to [0, 1] range (epsilon `1e-10` prevents NaN from near-zero max scores)
HNSW vector index (via hnsw_rs):
- Named constants: `HNSW_MAX_NB_CONNECTION=16`, `HNSW_CAPACITY=10_000`, `HNSW_EF_CONSTRUCTION=200`, `HNSW_MAX_LAYER=16`, `DistCosine`
- L2 normalization before insertion/search; zero vectors rejected (returns `None`)
- Cosine similarity = `1 - distance` (DistCosine returns 1 - cos_sim)
Hybrid ranking (fetches `limit * 4` candidates from each side):
- Configurable weights via `with_weights(vector_weight, bm25_weight)` (defaults: 0.7 / 0.3)
- Without vectors: BM25 only (graceful fallback)
octos-agent – Agent Runtime
Agent Core
#![allow(unused)]
fn main() {
pub struct Agent {
id: AgentId,
llm: Arc<dyn LlmProvider>,
tools: ToolRegistry,
memory: Arc<EpisodeStore>,
embedder: Option<Arc<dyn EmbeddingProvider>>,
system_prompt: RwLock<String>,
config: AgentConfig,
reporter: Arc<dyn ProgressReporter>,
shutdown: Arc<AtomicBool>, // Acquire/Release ordering
}
pub struct AgentConfig {
pub max_iterations: u32, // default: 50 (CLI overrides to 20)
pub max_tokens: Option<u32>, // None = unlimited
pub max_timeout: Option<Duration>,// default: 600s wall-clock timeout
pub save_episodes: bool, // default: true
}
}
Execution Loop (run_task / process_message)
1. Build messages: system prompt + history + memory context + input
2. Loop (up to max_iterations):
   a. Check shutdown flag and token budget
   b. trim_to_context_window() → compact if needed
   c. Call LLM via chat_stream()
   d. Consume stream → accumulate text, tool_calls, tokens
   e. Match stop_reason:
      - EndTurn/StopSequence → save episode, return result
      - ToolUse → execute_tools() → append results → continue
      - MaxTokens → return result
ConversationResponse: content: String, token_usage: TokenUsage, files_modified: Vec<PathBuf>, streamed: bool
Episode saving: After task completion, embedding generation is fired off in the background (fire-and-forget) if an embedder is present.
Wall-clock timeout: Agent aborts after max_timeout (default 600s) regardless of iteration count.
Tool Output Sanitization
Before feeding tool results back to the LLM, sanitize_tool_output() (in sanitize.rs) strips noise:
- Base64 data URIs: `data:...;base64,<payload>` → `[base64-data-redacted]`
- Long hex strings: 64+ contiguous hex chars (SHA-256, raw keys) → `[hex-redacted]`
Context Compaction
Triggered when estimated tokens exceed 80% of context window / 1.2 safety margin.
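One concrete reading of that trigger condition (the grouping of the 80% factor and the 1.2 safety divisor is an assumption of this sketch):

```rust
/// Compact when estimated tokens exceed (context_window * 0.8) / 1.2.
fn should_compact(estimated_tokens: u32, context_window: u32) -> bool {
    (estimated_tokens as f64) > (context_window as f64 * 0.8) / 1.2
}
```

For a 128k-token window this puts the threshold at roughly 85k estimated tokens.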
Algorithm:
- Keep MIN_RECENT_MESSAGES (6) most recent non-system messages
- Don't split inside tool call/result pairs
- Summarize old messages: first line (200 chars), strip tool arguments, drop media
- Budget: 40% of total for summary (BASE_CHUNK_RATIO = 0.4)
- Replace with: `[System, CompactionSummary, Recent1, Recent2, ...]`
Format:
- User: `> User: first line [media omitted]`
- Assistant: `> Assistant: content` or `- Called tool_name`
- Tool: `-> tool_name: ok|error - first 100 chars`
Bundled App Skills (bundled_app_skills.rs)
Compile-time embedded app-skill entries. Each app-skill crate (news, deep-search, deep-crawl, etc.) is registered as a bundled skill available at runtime.
Bootstrap (bootstrap.rs)
Bootstraps bundled skills at gateway startup. Ensures all bundled app-skills are registered and available.
Prompt Guard (prompt_guard.rs)
Prompt injection detection. ThreatKind enum classifies detected threats. Scans user input before passing to the agent.
Tool System
#![allow(unused)]
fn main() {
pub trait Tool: Send + Sync {
fn name(&self) -> &str;
fn description(&self) -> &str;
fn tags(&self) -> &[&str];
fn input_schema(&self) -> serde_json::Value;
async fn execute(&self, args: &serde_json::Value) -> Result<ToolResult>;
}
pub struct ToolResult {
pub output: String,
pub success: bool,
pub file_modified: Option<PathBuf>,
pub tokens_used: Option<TokenUsage>,
}
}
ToolRegistry: HashMap<String, Arc<dyn Tool>> with provider_policy: Option<ToolPolicy> for soft filtering.
Built-in Tools (14)
| Tool | Parameters | Key Behavior |
|---|---|---|
| read_file | path, start_line?, end_line? | Line numbers (NNN|), 100KB truncation, symlink rejection |
| write_file | path, content | Creates parent dirs, returns file_modified |
| edit_file | path, old_string, new_string | Exact match required, error on 0 or >1 occurrences |
| diff_edit | path, diff | Unified diff with fuzzy matching (±3 lines), reverse hunk application |
| glob | pattern, limit=100 | Rejects absolute paths and .., relative results |
| grep | pattern, file_pattern?, limit=50, context=0, ignore_case=false | .gitignore-aware via ignore::WalkBuilder, regex with (?i) flag |
| list_dir | path | Sorted, [dir]/[file] prefix |
| shell | command, timeout_secs=120 | SafePolicy check, 50KB output truncation, sandbox-wrapped, timeout clamped to [1, 600]s |
| web_search | query, count=5 | Brave Search API (BRAVE_API_KEY) |
| web_fetch | url, extract_mode="markdown", max_chars=50000 | SSRF protection, htmd HTML→markdown, 30s timeout |
| message | content, channel?, chat_id? | Cross-channel messaging via OutboundMessage. Gateway-only |
| spawn | task, label?, mode="background", allowed_tools, context? | Subagent with inherited provider policy. sync=inline, background=async. Gateway-only |
| cron | action, message, schedule params | Schedule add/list/remove/enable/disable. Gateway-only |
| browser | action, url?, selector?, text?, expression? | Headless Chrome via CDP (always compiled). Actions: navigate (SSRF + scheme check), get_text, get_html, click, type, screenshot, evaluate, close. 5min idle timeout, env sanitization, 10s JS timeout, early action validation |
Registration: Core tools registered in ToolRegistry::with_builtins() (all modes). Browser is always compiled. Message, spawn, and cron are registered only in gateway mode (gateway.rs).
Tool Policies
#![allow(unused)]
fn main() {
pub struct ToolPolicy {
pub allow: Vec<String>, // empty = allow all
pub deny: Vec<String>, // deny-wins
}
}
Groups: group:fs (read_file, write_file, edit_file, diff_edit), group:runtime (shell), group:web (web_search, web_fetch, browser), group:search (glob, grep, list_dir), group:sessions (spawn).
Wildcards: exec* matches prefix. Provider-specific policies via config tools.byProvider.
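A minimal sketch of how deny-wins evaluation with prefix wildcards could work. This is illustrative only — the helper name `pattern_matches` and the exact matching logic are assumptions, not the Octos source:

```rust
/// Illustrative tool policy with deny-wins semantics.
pub struct ToolPolicy {
    pub allow: Vec<String>, // empty = allow all
    pub deny: Vec<String>,  // deny-wins
}

/// `exec*` style prefix wildcard: a trailing `*` matches any suffix.
fn pattern_matches(pattern: &str, tool: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tool.starts_with(prefix),
        None => pattern == tool,
    }
}

impl ToolPolicy {
    pub fn is_allowed(&self, tool: &str) -> bool {
        // Deny rules win over any allow rule.
        if self.deny.iter().any(|p| pattern_matches(p, tool)) {
            return false;
        }
        // An empty allow list means "allow everything not denied".
        self.allow.is_empty() || self.allow.iter().any(|p| pattern_matches(p, tool))
    }
}

fn main() {
    let policy = ToolPolicy {
        allow: vec![],
        deny: vec!["exec*".into(), "shell".into()],
    };
    assert!(!policy.is_allowed("exec_command")); // wildcard prefix match
    assert!(!policy.is_allowed("shell"));
    assert!(policy.is_allowed("read_file")); // not denied, allow list empty
}
```

Group expansion (`group:fs` → its member tools) would happen before this check in a real implementation.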
Command Policy (ShellTool)
```rust
pub enum Decision { Allow, Deny, Ask }
```
SafePolicy deny patterns: rm -rf /, rm -rf /*, dd if=, mkfs, :(){:|:&};:, chmod -R 777 /. Commands are whitespace-normalized before matching to prevent evasion via extra spaces/tabs.
SafePolicy ask patterns: sudo, rm -rf, git push --force, git reset --hard
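The normalization-then-match flow can be sketched as follows. Substring matching and the `check` function name are simplifications for illustration; the real SafePolicy may match patterns more precisely:

```rust
// Illustrative sketch of whitespace-normalized command checking;
// not the actual Octos SafePolicy implementation.
#[derive(Debug, PartialEq)]
pub enum Decision { Allow, Deny, Ask }

/// Collapse runs of spaces/tabs so `rm   -rf   /` matches `rm -rf /`.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

pub fn check(cmd: &str) -> Decision {
    let cmd = normalize(cmd);
    let deny = ["rm -rf /", "rm -rf /*", "dd if=", "mkfs", ":(){:|:&};:", "chmod -R 777 /"];
    let ask = ["sudo", "rm -rf", "git push --force", "git reset --hard"];
    if deny.iter().any(|p| cmd.contains(p)) {
        Decision::Deny // deny patterns checked first
    } else if ask.iter().any(|p| cmd.contains(p)) {
        Decision::Ask
    } else {
        Decision::Allow
    }
}

fn main() {
    assert_eq!(check("rm    -rf    /"), Decision::Deny); // evasion via extra spaces
    assert_eq!(check("sudo apt update"), Decision::Ask);
    assert_eq!(check("ls -la"), Decision::Allow);
}
```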
Sandbox
```rust
pub enum SandboxMode { Auto, Bwrap, Macos, Docker, None }
```
BLOCKED_ENV_VARS (18 vars, shared across all backends + MCP):
LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR
| Backend | Isolation | Network | Path Validation |
|---|---|---|---|
| Bwrap (Linux) | RO bind /usr,/lib,/bin,/sbin,/etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if !allow_network | N/A |
| Macos (sandbox-exec) | SBPL profile: process-exec/fork, file-read*, writes to workdir+/private/tmp | (allow network*) or (deny network*) | Rejects control chars, (, ), \, " |
| Docker | --rm --security-opt no-new-privileges --cap-drop ALL | --network none | Rejects :, \0, \n, \r |
Docker resource limits: --cpus, --memory, --pids-limit. Mount modes: None (/tmp workdir), ReadOnly, ReadWrite.
Hooks System
Lifecycle hooks run shell commands at agent events. Configured via hooks array in config.
```rust
pub enum HookEvent { BeforeToolCall, AfterToolCall, BeforeLlmCall, AfterLlmCall }

pub struct HookConfig {
    pub event: HookEvent,
    pub command: Vec<String>,     // argv array (no shell interpretation)
    pub timeout_ms: u64,          // default: 5000
    pub tool_filter: Vec<String>, // tool events only; empty = all
}
```
Shell protocol: JSON payload on stdin. Exit code semantics: 0=allow, 1=deny (before-hooks only), 2+=error. Before-hooks can deny operations; after-hook exit codes only count as errors.
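The exit-code semantics can be sketched as a small decision function (names `HookOutcome` and `interpret_exit` are illustrative, not from the Octos source):

```rust
// Sketch of the exit-code rules: 0 = allow, 1 = deny (before-hooks only),
// anything else = error; after-hooks treat any nonzero code as an error.
#[derive(Debug, PartialEq)]
pub enum HookOutcome { Allow, Deny, Error(i32) }

pub fn interpret_exit(code: i32, is_before_hook: bool) -> HookOutcome {
    match code {
        0 => HookOutcome::Allow,
        1 if is_before_hook => HookOutcome::Deny, // only before-hooks can veto
        n => HookOutcome::Error(n),               // after-hook nonzero = error only
    }
}

fn main() {
    assert_eq!(interpret_exit(0, true), HookOutcome::Allow);
    assert_eq!(interpret_exit(1, true), HookOutcome::Deny);
    assert_eq!(interpret_exit(1, false), HookOutcome::Error(1)); // after-hooks cannot deny
    assert_eq!(interpret_exit(2, true), HookOutcome::Error(2));
}
```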
Circuit breaker: HookExecutor auto-disables a hook after 3 consecutive failures (configurable via with_threshold()). Resets on success.
Environment: Commands sanitized via BLOCKED_ENV_VARS. Tilde expansion supports ~/ and ~username/.
Integration: Wired into chat.rs, gateway.rs, serve.rs. Hook config changes trigger restart via config watcher.
MCP Integration
JSON-RPC transport for Model Context Protocol servers. Two transport modes:
Transports:
- Stdio: Spawns server as child process (command + args + env). Line limit: 1MB. Env sanitized via `BLOCKED_ENV_VARS`.
- HTTP/SSE: Connects to remote server via `url` field. POST JSON, SSE response handling.
Lifecycle (stdio):
- Spawn server (command + args + env, filtering `BLOCKED_ENV_VARS`)
- Initialize: `protocolVersion: "2024-11-05"`
- Discover tools: `tools/list` RPC
- Validate input schemas (max depth 10, max size 64KB); reject tools with invalid schemas
- Register `McpTool` wrappers (30s timeout, 1MB max response)
McpTool execution: tools/call with name + arguments. Extracts content[].text from response.
Skills System
Skills are markdown instruction files that extend agent capabilities. Two sources: built-in (compiled into binary) and workspace (user-installed).
Skill File Format (SKILL.md)
```markdown
---
name: skill_name
description: What it does
requires_bins: binary1, binary2   # comma-separated, checked via `which`
requires_env: ENV_VAR1, ENV_VAR2  # comma-separated, checked via std::env::var()
always: true|false                # auto-load into system prompt when available
---
Skill instructions here (markdown). This body is injected into the agent's
system prompt when the skill is activated.
```
Frontmatter parsing: Simple key: value line matching (not full YAML). split_frontmatter() finds content between --- delimiters. strip_frontmatter() returns body only.
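The `---`-delimited split described above can be sketched like this (the function body is illustrative; only the name `split_frontmatter` comes from the text):

```rust
// Sketch: return (frontmatter, body) for content delimited by `---` lines.
// Not full YAML — callers would parse simple `key: value` lines themselves.
fn split_frontmatter(content: &str) -> (Option<&str>, &str) {
    // Frontmatter is the text between a leading `---` line and the next `---`.
    let rest = match content.strip_prefix("---\n") {
        Some(r) => r,
        None => return (None, content),
    };
    match rest.find("\n---\n") {
        Some(end) => (Some(&rest[..end]), &rest[end + 5..]),
        None => (None, content), // unterminated frontmatter: treat as body
    }
}

fn main() {
    let doc = "---\nname: demo\nalways: true\n---\nBody text.";
    let (fm, body) = split_frontmatter(doc);
    assert_eq!(fm, Some("name: demo\nalways: true"));
    assert_eq!(body, "Body text.");
}
```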
SkillInfo
```rust
pub struct SkillInfo {
    pub name: String,
    pub description: String,
    pub path: PathBuf,   // filesystem path or "(built-in)/name/SKILL.md"
    pub available: bool, // bins_ok && env_ok
    pub always: bool,    // auto-load into system prompt
    pub builtin: bool,   // true if from BUILTIN_SKILLS, false if workspace
}
```
Availability check: available = requires_bins all found on PATH AND requires_env all set. Missing requirements make the skill unavailable but still listed.
SkillsLoader
```rust
pub struct SkillsLoader {
    skills_dir: PathBuf, // {data_dir}/skills/
}
```
Methods:
- `list_skills()` — scans workspace dir + built-ins. Workspace skills override built-ins with same name (checked via HashSet). Results sorted alphabetically.
- `load_skill(name)` — returns body (frontmatter stripped). Checks workspace first, falls back to built-in.
- `build_skills_summary()` — generates XML for system prompt injection:

  ```xml
  <skills>
    <skill available="true">
      <name>skill_name</name>
      <description>What it does</description>
      <location>/path/to/SKILL.md</location>
    </skill>
  </skills>
  ```

- `get_always_skills()` — filters skills where `always: true` AND `available: true`.
- `load_skills_for_context(names)` — loads multiple skills, joins with `\n---\n`.
Built-in Skills (compile-time `include_str!()`)
```rust
pub struct BuiltinSkill {
    pub name: &'static str,
    pub content: &'static str, // full SKILL.md including frontmatter
}

pub const BUILTIN_SKILLS: &[BuiltinSkill] = &[...];
```
| Skill | Purpose |
|---|---|
| cron | Task scheduling instructions |
| skill-store | Skill store browsing and installation |
| skill-creator | Create new skills |
| tmux | Terminal multiplexer control |
| weather | Weather information retrieval |
CLI Management (octos skills)
- `list` — shows built-in skills (with override status) + workspace skills
- `install <user/repo/skill-name>` — fetches `SKILL.md` from `https://raw.githubusercontent.com/{repo}/main/SKILL.md` (15s timeout), saves to `.octos/skills/{name}/SKILL.md`. Fails if skill already exists.
- `remove <name>` — deletes `.octos/skills/{name}/` directory
Integration with Gateway
In the gateway command, skills are loaded during system prompt construction:
- `get_always_skills()` — collects auto-load skill names
- `load_skills_for_context(names)` — loads and joins skill bodies
- `build_skills_summary()` — appends XML skill index to system prompt
- Always-on skill content is prepended to the system prompt
Plugin System
Plugins extend the agent with external tools via standalone executables. Each plugin is a directory containing a manifest.json and an executable file.
Directory Layout
```text
.octos/plugins/            # local (project-level)
~/.octos/plugins/          # global (user-level)
└── my-plugin/
    ├── manifest.json      # plugin metadata + tool definitions
    └── my-plugin          # executable (or "main" as fallback)
```
Discovery order: local .octos/plugins/ first, then global ~/.octos/plugins/. Both are scanned by Config::plugin_dirs().
PluginManifest
```rust
pub struct PluginManifest {
    pub name: String,
    pub version: String,
    pub tools: Vec<PluginToolDef>, // default: empty vec
}

pub struct PluginToolDef {
    pub name: String,        // must be unique across all plugins
    pub description: String,
    pub input_schema: serde_json::Value, // default: {"type": "object"}
}
```
Example manifest.json:
```json
{
  "name": "my-plugin",
  "version": "0.1.0",
  "tools": [
    {
      "name": "greet",
      "description": "Greet someone by name",
      "input_schema": {
        "type": "object",
        "properties": { "name": { "type": "string" } }
      }
    }
  ]
}
```
PluginLoader
```rust
pub struct PluginLoader; // stateless, all methods are associated functions
```
`load_into(registry, dirs)`:
- Scan each directory for subdirectories
- For each subdirectory, look for `manifest.json`
- Parse manifest, find executable (try directory name first, then `main`)
- Validate executable permissions (Unix: `mode & 0o111 != 0`; non-Unix: existence check)
- Wrap each tool definition as a `PluginTool` implementing the `Tool` trait
- Register into `ToolRegistry`
- Log warning: `"loaded unverified plugin (no signature check)"`
- Return total tool count. Failed plugins are skipped with warning, not fatal.
PluginTool — Execution Protocol
```rust
pub struct PluginTool {
    plugin_name: String,
    tool_def: PluginToolDef,
    executable: PathBuf,
}
```
Invocation: executable <tool_name> (tool name passed as first argument).
stdin/stdout protocol:
- Spawn executable with tool name as arg, piped stdin/stdout/stderr
- Write JSON-serialized arguments to stdin, close (EOF signals end of input)
- Wait for exit with 30s timeout (`PLUGIN_TIMEOUT`)
- Parse stdout as JSON:
  - Structured: `{"output": "...", "success": true/false}` — use parsed values
  - Fallback: raw stdout + stderr concatenated, success from exit code
- Return `ToolResult` (no `file_modified` tracking for plugins)
Error handling:
- Spawn failure → eyre error with plugin name and executable path
- Timeout → eyre error with plugin name, tool name, and duration
- JSON parse failure → graceful fallback to raw output
Progress Reporting
The agent emits structured events during execution via a trait-based observer pattern. Consumers (CLI, REST API) implement the trait to render progress in their own format.
ProgressReporter Trait
```rust
pub trait ProgressReporter: Send + Sync {
    fn report(&self, event: ProgressEvent);
}
```
Agent holds `reporter: Arc<dyn ProgressReporter>`. Events are fired synchronously during the execution loop, so implementations must not block.
ProgressEvent Enum
```rust
pub enum ProgressEvent {
    TaskStarted { task_id: String },
    Thinking { iteration: u32 },
    Response { content: String, iteration: u32 },
    ToolStarted { name: String, tool_id: String },
    ToolCompleted { name: String, tool_id: String, success: bool,
                    output_preview: String, duration: Duration },
    FileModified { path: String },
    TokenUsage { input_tokens: u32, output_tokens: u32 },
    TaskCompleted { success: bool, iterations: u32, duration: Duration },
    TaskInterrupted { iterations: u32 },
    MaxIterationsReached { limit: u32 },
    TokenBudgetExceeded { used: u32, limit: u32 },
    StreamChunk { text: String, iteration: u32 },
    StreamDone { iteration: u32 },
    CostUpdate { session_input_tokens: u32, session_output_tokens: u32,
                 response_cost: Option<f64>, session_cost: Option<f64> },
}
```
Implementations (3)
SilentReporter — no-op, used as default when no reporter is configured.
ConsoleReporter — CLI output with ANSI colors and streaming support:
```rust
pub struct ConsoleReporter {
    use_colors: bool,
    verbose: bool,
    stdout: Mutex<BufWriter<Stdout>>, // buffered for streaming chunks
}
```
| Event | Output |
|---|---|
| Thinking | `\r⏳ Thinking... (iteration N)` (overwrites line, yellow) |
| Response | `→ first 3 lines...` (cyan, clears Thinking line) |
| ToolStarted | `\r→ Running tool_name...` (overwrites line, yellow) |
| ToolCompleted | `✓ tool_name (duration)` green or `✗ tool_name` red; verbose: 5 lines of output + `...` |
| FileModified | `📝 Modified: path` (green) |
| TokenUsage | `Tokens: N in, N out` (verbose only, dim) |
| TaskCompleted | `✓ Completed N iterations, Xs` or `✗ Failed after N iterations` |
| TaskInterrupted | `⚠ Interrupted after N iterations.` (yellow) |
| MaxIterationsReached | `⚠ Reached max iterations limit (N).` (yellow) |
| TokenBudgetExceeded | `⚠ Token budget exceeded (used, limit).` (yellow) |
| StreamChunk | Write to buffered stdout; flush only on `\n` (reduces syscalls) |
| StreamDone | Flush + newline |
| CostUpdate | `Tokens: N in / N out \| Cost: $X.XXXX` |
| TaskStarted | `▶ Task: id` (verbose only, dim) |
Duration formatting: >1s → `{:.1}s`, ≤1s → `{N}ms`.
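That formatting rule can be sketched as a small helper (the function name `format_duration` is illustrative):

```rust
use std::time::Duration;

// Sketch of the rule: durations over 1s render as "X.Xs", otherwise "Nms".
fn format_duration(d: Duration) -> String {
    if d.as_secs_f64() > 1.0 {
        format!("{:.1}s", d.as_secs_f64())
    } else {
        format!("{}ms", d.as_millis())
    }
}

fn main() {
    assert_eq!(format_duration(Duration::from_millis(250)), "250ms");
    assert_eq!(format_duration(Duration::from_millis(2340)), "2.3s");
}
```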
SseBroadcaster (REST API, feature: api) — converts events to JSON and broadcasts via tokio::sync::broadcast channel:
```rust
pub struct SseBroadcaster {
    tx: broadcast::Sender<String>, // JSON-serialized events
}
```
| ProgressEvent | JSON type field | Additional fields |
|---|---|---|
| ToolStarted | "tool_start" | tool |
| ToolCompleted | "tool_end" | tool, success |
| StreamChunk | "token" | text |
| StreamDone | "stream_end" | — |
| CostUpdate | "cost_update" | input_tokens, output_tokens, session_cost |
| Thinking | "thinking" | iteration |
| Response | "response" | iteration |
| (other) | "other" | — (logged at debug level) |
Subscribers receive events via SseBroadcaster::subscribe() -> broadcast::Receiver<String>. Send errors (no subscribers) are silently ignored.
Execution Environments (exec_env.rs)
ExecEnvironment trait with exec(cmd, args, env), read_file(path), write_file(path, content), file_exists(path), list_dir(path). Two implementations: LocalEnvironment (tokio::process::Command) and DockerEnvironment (docker exec). Environment variables sanitized via shared BLOCKED_ENV_VARS. Docker paths validated against injection characters (\0, \n, \r, :). Docker env vars forwarded via --env flags.
Provider Toolsets (provider_tools.rs)
ToolAdjustment (prefer, demote, aliases, extras) per LLM provider. ProviderToolsets registry with with_defaults() for openai/anthropic/google. Used to optimize tool presentation per provider (e.g., OpenAI prefers shell/read_file, demotes diff_edit).
Typed Turns (turn.rs)
Turn wraps Message with TurnKind (UserInput, AgentReply, ToolCall, ToolResult, System) and iteration number. turns_to_messages() converts back to Vec<Message> for LLM calls. Enables semantic analysis of conversation history.
Event Bus (event_bus.rs)
EventBus with typed EventSubscriber for pub/sub within the agent. Decouples event producers (tool execution, LLM calls) from consumers (logging, metrics, UI updates).
Loop Detection (loop_detect.rs)
Detects repetitive agent behavior (e.g., calling the same tool with same args). Configurable threshold and window. Returns early with diagnostic message when loop detected.
Session State (session.rs)
SessionState with SessionLimits and SessionUsage tracking. SessionStateHandle for thread-safe access. Tracks token usage, iteration count, and wall-clock time against configured limits.
Steering (steering.rs)
SteeringMessage with SteeringSender/SteeringReceiver (mpsc channel). Allows external control of agent behavior mid-conversation (e.g., injecting guidance, changing strategy).
Prompt Layers (prompt_layer.rs)
PromptLayerBuilder for composing system prompts from multiple sources (base prompt, persona, user context, memory, skills). Layers are concatenated in order with configurable separators.
octos-bus — Gateway Infrastructure
Message Bus
create_bus() -> (AgentHandle, BusPublisher) linked by mpsc channels (capacity 256). AgentHandle receives InboundMessages; BusPublisher dispatches OutboundMessages.
Queue Modes (configured via gateway.queue_mode):
- `Followup` (default): FIFO — process queued messages one at a time
- `Collect`: Merge queued messages by session, concatenating content before processing
Channel Trait
```rust
#[async_trait]
pub trait Channel: Send + Sync {
    fn name(&self) -> &str;
    async fn start(&self, inbound_tx: mpsc::Sender<InboundMessage>) -> Result<()>;
    async fn send(&self, msg: &OutboundMessage) -> Result<()>;
    fn is_allowed(&self, sender_id: &str) -> bool;
    async fn stop(&self) -> Result<()>;
}
```
Channel Implementations
| Channel | Transport | Feature Flag | Auth | Dedup |
|---|---|---|---|---|
| CLI | stdin/stdout | (always) | N/A | N/A |
| Telegram | teloxide long-poll | telegram | Bot token (env) | teloxide built-in |
| Discord | serenity gateway | discord | Bot token (env) | serenity built-in |
| Slack | Socket Mode (tokio-tungstenite) | slack | Bot token + App token | message_ts |
| WhatsApp | WebSocket bridge (ws://localhost:3001) | whatsapp | Baileys bridge | HashSet (10K cap, clear on overflow) |
| Feishu | WebSocket (tokio-tungstenite) | feishu | App ID + Secret → tenant token (TTL 6000s) | HashSet (10K cap, clear on overflow) |
| Email | IMAP poll + SMTP send | email | Username/password, rustls TLS | IMAP UNSEEN flag |
| WeCom | WeCom/WeChat Work API | wecom | Corp ID + Agent Secret | message_id |
| Twilio | Twilio SMS/MMS | twilio | Account SID + Auth Token | message SID |
Email specifics: IMAP async-imap with rustls for inbound (poll unseen, mark \Seen). SMTP lettre for outbound (port 465=implicit TLS, other=STARTTLS). mailparse for RFC822 body extraction. Body truncated via truncate_utf8(max_body_chars).
Feishu specifics: Tenant access token with TTL cache (6000s). WebSocket gateway URL from /callback/ws/endpoint. Message type detection via header.event_type == "im.message.receive_v1". Supports oc_* (chat_id) vs ou_* (open_id) routing.
Markdown to HTML: markdown_html.rs converts Markdown to Telegram-compatible HTML for rich message formatting.
Media: download_media() helper downloads photos/voice/audio/documents to .octos/media/.
Transcription: Voice/audio auto-transcribed via GroqTranscriber before agent processing.
Message Coalescing
Splits oversized messages into channel-safe chunks:
| Channel | Max Chars |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Break preference: paragraph (\n\n) > newline (\n) > sentence (. ) > space ( ) > hard cut.
MAX_CHUNKS = 50 (DoS limit). UTF-8 safe boundary detection via char_indices().
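The break-preference search can be sketched as follows. The function name `split_point` and the exact window logic are assumptions for illustration, not the `coalesce.rs` implementation:

```rust
// Sketch: find a split index <= max that prefers paragraph > newline >
// sentence > space breaks, falling back to a hard cut at a char boundary.
fn split_point(text: &str, max: usize) -> usize {
    if text.len() <= max {
        return text.len(); // already fits in one chunk
    }
    // Largest char boundary not exceeding `max` (UTF-8 safety).
    let hard = text
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max)
        .last()
        .unwrap_or(0);
    let window = &text[..hard];
    // Try natural break points in preference order.
    for sep in ["\n\n", "\n", ". ", " "] {
        if let Some(pos) = window.rfind(sep) {
            if pos > 0 {
                return pos + sep.len();
            }
        }
    }
    hard // no natural break: hard cut
}

fn main() {
    let text = "First paragraph.\n\nSecond paragraph that continues.";
    let cut = split_point(text, 30);
    assert_eq!(&text[..cut], "First paragraph.\n\n"); // paragraph break preferred
}
```

A real splitter would loop this over the remainder until all chunks fit, stopping at the MAX_CHUNKS cap.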
Session Manager
JSONL persistence at .octos/sessions/{key}.jsonl.
- In-memory cache: LRU with disk sync on write
- Filenames: Percent-encoded SessionKey, truncated to 183 chars with `_{hash:016X}` suffix on truncation to prevent collisions
- File size limit: 10MB max (`MAX_SESSION_FILE_SIZE`); oversized files skipped on load
- Crash safety: Atomic write-then-rename
- Forking: `fork()` creates child session with `parent_key` tracking, copies last N messages
Cron Service
JSON persistence at .octos/cron.json.
Schedule types:
- `Every { seconds: u64 }` — recurring interval
- `Cron { expr: String }` — cron expression via `cron` crate
- `At { timestamp_ms: i64 }` — one-shot (auto-delete after run)
CronJob fields: id (8-char hex from UUIDv7), name, enabled, schedule, payload (message + deliver flag + channel + chat_id), state (next_run_at_ms, run_count), delete_after_run.
Heartbeat Service
Periodic check of HEARTBEAT.md (default: 30 min interval). Sends content to agent if non-empty.
octos-cli — CLI & Configuration
Commands
| Command | Description |
|---|---|
| `chat` | Interactive multi-turn chat. Readline with history. Exit: `exit`/`quit`/`:q` |
| `gateway` | Persistent multi-channel daemon with session management |
| `init` | Initialize `.octos/` with config, templates, directories |
| `status` | Show config, provider, API keys, bootstrap files |
| `auth login/logout/status` | OAuth PKCE (OpenAI), device code, paste-token |
| `cron list/add/remove/enable` | CLI cron job management |
| `channels status/login` | Channel compilation status, WhatsApp bridge setup |
| `skills list/install/remove` | Skill management, GitHub fetch |
| `office` | Office/workspace management |
| `account` | Account management |
| `clean` | Remove `.redb` files with dry-run support |
| `completions` | Shell completion generation (bash/zsh/fish) |
| `docs` | Generate tool + provider documentation |
| `serve` | REST API server (feature: `api`) — axum on 127.0.0.1:8080 (`--host` to override) |
Configuration
Loaded from .octos/config.json (local) or ~/.config/octos/config.json (global). Local takes precedence.
- `${VAR}` expansion: Environment variable substitution in string values
- Versioned config: Version field with automatic `migrate_config()` framework
- Provider auto-detect (`registry::detect_provider(model)`): claude→anthropic, gpt/o1/o3/o4→openai, gemini→gemini, deepseek→deepseek, kimi/moonshot→moonshot, qwen→dashscope, glm→zhipu, llama/mixtral→groq. Patterns defined per-provider in `registry/`.
API key resolution order: Auth store (`~/.octos/auth.json`) → environment variable.
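The model-prefix detection described above can be sketched as a lookup table. The rules follow the mapping in this section; the function shape is an assumption, as the real registry defines patterns per provider:

```rust
// Sketch of prefix-based provider detection (illustrative only).
fn detect_provider(model: &str) -> Option<&'static str> {
    let rules: &[(&[&str], &str)] = &[
        (&["claude"], "anthropic"),
        (&["gpt", "o1", "o3", "o4"], "openai"),
        (&["gemini"], "gemini"),
        (&["deepseek"], "deepseek"),
        (&["kimi", "moonshot"], "moonshot"),
        (&["qwen"], "dashscope"),
        (&["glm"], "zhipu"),
        (&["llama", "mixtral"], "groq"),
    ];
    rules
        .iter()
        .find(|(prefixes, _)| prefixes.iter().any(|p| model.starts_with(p)))
        .map(|&(_, provider)| provider)
}

fn main() {
    assert_eq!(detect_provider("claude-sonnet-4"), Some("anthropic"));
    assert_eq!(detect_provider("kimi-k2"), Some("moonshot"));
    assert_eq!(detect_provider("unknown-model"), None);
}
```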
Auth Module
OAuth PKCE (OpenAI):
- Generate 64-char verifier (two UUIDv4 hex)
- SHA-256 challenge, base64-URL encode (no padding)
- TCP listener on port 1455
- Browser → `auth.openai.com` with PKCE + state
- Callback validates state (CSRF), exchanges code+verifier for tokens
Device Code Flow (OpenAI): POST deviceauth/usercode, poll deviceauth/token every 5s+.
Paste Token: Prompt for API key from stdin, store as auth_method: "paste_token".
AuthStore: ~/.octos/auth.json (mode 0600). {credentials: {provider: AuthCredential}}.
Config Watcher
Polls every 5 seconds. SHA-256 hash comparison of file contents.
Hot-reloadable: system_prompt, max_history (applied live).
Restart-required: provider, model, base_url, api_key_env, sandbox, mcp_servers, hooks, gateway.queue_mode, channels.
REST API (feature: api)
| Route | Method | Description |
|---|---|---|
| `/api/chat` | POST | Send message → response |
| `/api/chat/stream` | GET | SSE stream of ProgressEvents |
| `/api/sessions` | GET | List all sessions |
| `/api/sessions/{id}/messages` | GET | Paginated history (`?limit=100&offset=0`, max 500) |
| `/api/status` | GET | Version, model, provider, uptime |
| `/metrics` | GET | Prometheus text exposition format (unauthenticated) |
| `/*` (fallback) | GET | Embedded web UI (static files via rust-embed) |
Auth: Optional bearer token with constant-time comparison (API routes only; /metrics and static files are public). CORS: localhost:3000/8080. Max message: 1MB.
Web UI: Embedded SPA via rust-embed served as the fallback handler. Session sidebar, chat interface, SSE streaming, dark theme. Vanilla HTML/CSS/JS (no build tools).
Prometheus Metrics: octos_tool_calls_total (counter, labels: tool, success), octos_tool_call_duration_seconds (histogram, label: tool), octos_llm_tokens_total (counter, label: direction). Powered by metrics + metrics-exporter-prometheus crates.
Session Compaction (Gateway)
Triggered when message count > 40 (threshold). Keeps 10 recent messages. Summarizes older messages via LLM to <500 words. Rewrites JSONL session file.
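The trigger logic can be sketched as follows (constants follow the text; the `compact` function and its summarizer callback are illustrative, with the LLM summarization stubbed out):

```rust
// Sketch: with more than 40 messages, keep the 10 most recent and replace
// the older ones with a single summary message.
const COMPACT_THRESHOLD: usize = 40;
const KEEP_RECENT: usize = 10;

fn compact(messages: Vec<String>, summarize: impl Fn(&[String]) -> String) -> Vec<String> {
    if messages.len() <= COMPACT_THRESHOLD {
        return messages; // under threshold: no compaction
    }
    let split = messages.len() - KEEP_RECENT;
    let summary = summarize(&messages[..split]); // older messages → LLM summary
    let mut out = vec![summary];
    out.extend_from_slice(&messages[split..]); // recent messages kept verbatim
    out
}

fn main() {
    let msgs: Vec<String> = (0..45).map(|i| format!("msg {i}")).collect();
    let compacted = compact(msgs, |older| format!("[summary of {} messages]", older.len()));
    assert_eq!(compacted.len(), 11); // 1 summary + 10 recent
    assert_eq!(compacted[0], "[summary of 35 messages]");
}
```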
octos-pipeline — DOT-based Pipeline Orchestration
DOT-based pipeline orchestration engine for defining and executing multi-step workflows.
- `parser.rs` — DOT graph parser (parses Graphviz DOT format into pipeline definitions)
- `graph.rs` — PipelineGraph with node/edge types
- `executor.rs` — Async pipeline execution engine
- `handler.rs` — Handler types: CodergenHandler, GateHandler, ShellHandler, NoopHandler, DynamicParallel
- `condition.rs` — Conditional edge evaluation (branching logic)
- `tool.rs` — RunPipelineTool integration (exposes pipeline execution as an agent tool)
- `validate.rs` — Graph validation and lint diagnostics
- `human_gate.rs` — Human-in-the-loop gates with `HumanInputProvider` trait, `ChannelInputProvider` (mpsc + oneshot, 5min default timeout), `AutoApproveProvider`. Input types: Approval, FreeText, Choice
- `fidelity.rs` — `FidelityMode` enum (Full, Truncate, Compact, Summary) for context carryover control between nodes. Parse from config strings. Safety caps: 10MB max_chars, 100K max_lines
- `manager.rs` — `PipelineManager` supervisor with `SupervisionStrategy` (AllOrNothing, BestEffort, RetryFailed). Retry capped at 10 with exponential backoff (100ms-5s). `ManagerOutcome` converts to `NodeOutcome`
- `thread.rs` — `ThreadRegistry` for LLM session reuse across pipeline nodes. `Thread` stores model_id + message history. Limits: 1000 threads, 10000 messages per thread
- `server.rs` — `PipelineServer` trait with `SubmitRequest` (validated: 1MB DOT, 256KB input, 64 variables, safe pipeline IDs), `RunStatus` lifecycle (Queued → Running → Completed/Failed/Cancelled)
- `artifact.rs` — Pipeline artifact storage for intermediate outputs
- `checkpoint.rs` — Pipeline checkpoint/resume for crash recovery
- `events.rs` — Pipeline event system for progress tracking
- `run_dir.rs` — Per-run working directories with isolation
- `stylesheet.rs` — Visual styling for pipeline graph rendering
Data Flows
Chat Mode
```text
User Input → readline → Agent.process_message(input, history)
     ↓
     ├─ Build messages (system + history + memory + input)
     ├─ trim_to_context_window() if needed
     ├─ Call LLM via chat_stream() with tool specs
     ├─ Execute tools if ToolUse (loop)
     └─ Return ConversationResponse
     ↓
Print response, append to history
```
Gateway Mode
```text
Channel → InboundMessage → MessageBus → [transcribe audio] → [load session]
                                ↓
                     Agent.process_message()
                                ↓
                         OutboundMessage
                                ↓
                     ChannelManager.dispatch()
                                ↓
                    coalesce() → Channel.send()
```
System messages (cron, heartbeat, spawn results) flow through the same bus with channel: "system" and metadata routing.
Feature Flags
```toml
# octos-bus
telegram = ["teloxide"]
discord = ["serenity"]
slack = ["tokio-tungstenite"]
whatsapp = ["tokio-tungstenite"]
feishu = ["tokio-tungstenite"]
email = ["async-imap", "tokio-rustls", "rustls", "webpki-roots", "lettre", "mailparse"]

# octos-agent (browser is always compiled in, no longer feature-gated)
git = ["gix"]          # git operations via gitoxide
ast = ["tree-sitter"]  # code_structure.rs AST analysis
admin-bot = [...]      # admin/ directory tools

# octos-bus (additional)
wecom = [...]   # WeCom/WeChat Work channel
twilio = [...]  # Twilio SMS/MMS channel

# octos-cli
api = ["axum", "tower-http", "futures"]
telegram = ["octos-bus/telegram"]
discord = ["octos-bus/discord"]
slack = ["octos-bus/slack"]
whatsapp = ["octos-bus/whatsapp"]
feishu = ["octos-bus/feishu"]
email = ["octos-bus/email"]
wecom = ["octos-bus/wecom"]
twilio = ["octos-bus/twilio"]
```
File Layout
```text
crates/
├── octos-core/src/
│   └── lib.rs, task.rs, types.rs, error.rs, gateway.rs, message.rs, utils.rs
├── octos-llm/src/
│   ├── lib.rs, provider.rs, config.rs, types.rs, retry.rs, failover.rs, sse.rs
│   ├── embedding.rs, pricing.rs, context.rs, transcription.rs, vision.rs
│   ├── adaptive.rs, swappable.rs, router.rs, ominix.rs
│   ├── anthropic.rs, openai.rs, gemini.rs, openrouter.rs (protocol impls)
│   └── registry/ (mod.rs + 14 provider entries: anthropic, openai, gemini,
│                  openrouter, deepseek, groq, moonshot, dashscope, minimax,
│                  zhipu, zai, nvidia, ollama, vllm)
├── octos-memory/src/
│   └── lib.rs, episode.rs, store.rs, memory_store.rs, hybrid_search.rs
├── octos-agent/src/
│   ├── lib.rs, agent.rs, progress.rs, policy.rs, compaction.rs, sanitize.rs, hooks.rs
│   ├── sandbox.rs, mcp.rs, skills.rs, builtin_skills.rs
│   ├── bundled_app_skills.rs, bootstrap.rs, prompt_guard.rs
│   ├── plugins/ (mod.rs, loader.rs, manifest.rs, tool.rs)
│   ├── skills/ (cron, skill-store, skill-creator SKILL.md)
│   └── tools/ (mod, policy, shell, read_file, write_file, edit_file, diff_edit,
│               list_dir, glob_tool, grep_tool, web_search, web_fetch,
│               message, spawn, browser, ssrf, tool_config,
│               deep_search, site_crawl, recall_memory, save_memory,
│               send_file, take_photo, code_structure, git,
│               deep_research_pipeline, synthesize_research, research_utils,
│               admin/ (profiles, skills, sub_accounts, system,
│                       platform_skills, update))
├── octos-bus/src/
│   ├── lib.rs, bus.rs, channel.rs, session.rs, coalesce.rs, media.rs
│   ├── cli_channel.rs, telegram_channel.rs, discord_channel.rs
│   ├── slack_channel.rs, whatsapp_channel.rs, feishu_channel.rs, email_channel.rs
│   ├── wecom_channel.rs, twilio_channel.rs, markdown_html.rs
│   └── cron_service.rs, cron_types.rs, heartbeat.rs
├── octos-cli/src/
│   ├── main.rs, config.rs, config_watcher.rs, cron_tool.rs, compaction.rs
│   ├── auth/ (mod.rs, store.rs, oauth.rs, token.rs)
│   ├── api/ (mod.rs, router.rs, handlers.rs, sse.rs, metrics.rs, static_files.rs)
│   └── commands/ (mod, chat, init, status, gateway, clean,
│                  completions, cron, channels, auth, skills, docs, serve,
│                  office, account)
└── octos-pipeline/src/
    ├── lib.rs, parser.rs, graph.rs, executor.rs, handler.rs
    └── condition.rs, tool.rs, validate.rs
```
Security
Workspace-Level Safety
- `#![deny(unsafe_code)]` — workspace-wide lint via `[workspace.lints.rust]`
- `secrecy::SecretString` — all provider API keys are wrapped; prevents accidental logging/display
Authentication & Credentials
- API keys: auth store (`~/.octos/auth.json`, mode 0600) checked before env vars
- OAuth PKCE with SHA-256 challenges, state parameter (CSRF protection)
- Constant-time byte comparison for API bearer tokens (timing attack prevention)
Execution Sandbox
- Three backends: bwrap (Linux), sandbox-exec (macOS), Docker — `SandboxMode::Auto` detection
- 18 `BLOCKED_ENV_VARS` shared across all sandbox backends, MCP server spawning, hooks, and browser tool: `LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `DYLD_FRAMEWORK_PATH`, `DYLD_FALLBACK_LIBRARY_PATH`, `DYLD_VERSIONED_LIBRARY_PATH`, `NODE_OPTIONS`, `PYTHONSTARTUP`, `PYTHONPATH`, `PERL5OPT`, `RUBYOPT`, `RUBYLIB`, `JAVA_TOOL_OPTIONS`, `BASH_ENV`, `ENV`, `ZDOTDIR`
- Path injection prevention per backend (Docker: `:`, `\0`, `\n`, `\r`; macOS: control chars, `(`, `)`, `\`, `"`)
- Docker: `--cap-drop ALL`, `--security-opt no-new-privileges`, `--network none`, blocked bind mount sources (docker.sock, `/proc`, `/sys`, `/dev`, `/etc`)
Tool Safety
- ShellTool SafePolicy: deny `rm -rf /`, `dd`, `mkfs`, fork bombs, `chmod -R 777 /`; ask for `sudo`, `rm -rf`, `git push --force`, `git reset --hard`. Whitespace-normalized before matching. Timeout clamped to [1, 600]s. SIGTERM → grace period → SIGKILL cleanup for child processes.
- Tool policies: allow/deny with deny-wins semantics, 8 named groups (`group:fs`, `group:runtime`, `group:web`, `group:search`, `group:sessions`, etc.), wildcard matching, provider-specific filtering via `tools.byProvider`
- Tool argument size limit: 1MB per invocation (non-allocating `estimate_json_size` with escape char accounting)
- Symlink-safe file I/O via `O_NOFOLLOW` on Unix (atomic kernel-level check, eliminates TOCTOU races); metadata-based symlink check fallback on Windows
- SSRF protection in shared `ssrf.rs` module: DNS resolution with fail-closed behavior (blocks on DNS failure), private IP blocking (10/8, 172.16/12, 192.168/16, 169.254/16), IPv6 coverage (ULA `fc00::/7`, link-local `fe80::/10`, site-local `fec0::/10`, IPv4-mapped `::ffff:0:0/96`, IPv4-compatible `::/96`), loopback blocking. Used by web_fetch and browser.
- Browser: URL scheme allowlist (http/https only), 10s JS execution timeout, zombie process reaping, secure tempfiles for screenshots
- MCP: input schema validation (max depth 10, max size 64KB) prevents malicious tool definitions
- Prompt injection guard (`prompt_guard.rs`): 5 threat categories (SystemOverride, RoleConfusion, ToolCallInjection, SecretExtraction, InstructionInjection), 10 detection patterns. Sanitizes threats by wrapping in `[injection-blocked:...]`.
Data Safety
- Tool output sanitization (`sanitize.rs`): strips base64 data URIs and long hex strings (64+ chars); redacts credentials with 7 regex patterns covering OpenAI (`sk-...`), Anthropic (`sk-ant-...`), AWS (`AKIA...`), GitHub (`ghp_`/`gho_`/`ghs_`/`ghr_`/`github_pat_...`), GitLab (`glpat-...`), Bearer tokens, and generic `password`/`api_key` assignments
- UTF-8 safe truncation via `truncate_utf8()` across all tool outputs and email bodies
- Session file collision prevention via percent-encoded filenames with hash suffix on truncation
- Session file size limit: 10MB max prevents OOM on corrupted files
- Atomic write-then-rename for session persistence (crash safety)
- API server binds to 127.0.0.1 by default (not 0.0.0.0)
- Channel access control via `allowed_senders` lists
- MCP response limit: 1MB per JSON-RPC line (DoS prevention)
- Message coalescing: MAX_CHUNKS=50 DoS limit
- API message limit: 1MB per request
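The UTF-8 safe truncation mentioned above can be sketched like this (the function body is illustrative; only the name `truncate_utf8` comes from the text):

```rust
// Sketch: cut at the largest char boundary not exceeding `max_bytes`,
// so multi-byte characters are never split mid-sequence.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // back up to a valid boundary
    }
    &s[..end]
}

fn main() {
    assert_eq!(truncate_utf8("hello", 10), "hello");
    // 'é' is 2 bytes in UTF-8; cutting at 1 would split it, so we get "".
    assert_eq!(truncate_utf8("é", 1), "");
    assert_eq!(truncate_utf8("abcdef", 3), "abc");
}
```

Slicing a `&str` at a non-boundary index panics in Rust, which is why the boundary walk-back matters.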
Concurrency Model
Why Rust
Octos uses Rust with the tokio async runtime, which provides significant advantages over Python (OpenClaw, etc.) and Node.js (NanoCloud, etc.) agent frameworks for concurrent session handling:
True parallelism — Tokio tasks run across all CPU cores simultaneously. Python has the GIL, so even with asyncio, CPU-bound work (JSON parsing, context compaction, token counting) is single-core. Node.js runs JavaScript on a single thread. In Octos, 10 concurrent sessions doing context compaction actually execute in parallel across cores.
Memory efficiency — No garbage collector, no runtime overhead per object. Agent sessions are compact structs on the heap. A Python agent session carries interpreter overhead, GC metadata on every object, and dict-based attribute lookup. This matters with hundreds of sessions and large conversation histories in memory.
No GC pauses — Python and Node.js GC can cause latency spikes mid-response. Rust has deterministic deallocation: memory is freed exactly when the owning struct drops.
Single binary deployment — No Python/Node runtime to install, no dependency hell, predictable resource usage. The gateway is one static binary.
Tokio Tasks vs OS Threads
All concurrent session processing uses tokio tasks (green threads), not OS threads. A tokio task is a state machine on the heap (~few KB). An OS thread is ~8MB stack. Thousands of tasks multiplex across a handful of OS threads (defaults to CPU core count). Since agent sessions spend most of their time awaiting I/O (LLM API responses), they yield the thread to other tasks efficiently.
Gateway Concurrency
Inbound messages β main loop
β
ββ tokio::spawn() per message
β β
β ββ Semaphore (max_concurrent_sessions, default 10)
β β bounds total concurrent agent runs
β β
β ββ Per-session Mutex
β serializes messages within same session
β
ββ Different sessions run concurrently
Same session queues sequentially
- Cross-session: concurrent, bounded by
max_concurrent_sessionssemaphore (default 10) - Within same session: serialized via per-session mutex β prevents race conditions on conversation history
- Per-session locks: pruned after completion (Arc strong_count == 1) to prevent unbounded HashMap growth
Tool Execution
Within a single agent iteration, all tool calls from one LLM response execute concurrently via join_all():
LLM response: [web_search, read_file, send_email]
β β β
ββββββββββββββΌββββββββββββ
join_all()
ββββββββββββββΌββββββββββββ
β β β
done done done
β
All results appended to messages
β
Next LLM call
Sub-Agent Modes (spawn tool)
| Aspect | Sync | Background |
|---|---|---|
| Parent blocks? | Yes | No (tokio::spawn()) |
| Result delivery | Same conversation turn | New inbound message via gateway |
| Token accounting | Counted toward parent budget | Independent |
| Use case | Sequential pipelines | Fire-and-forget long tasks |
Sub-agents cannot spawn further sub-agents (spawn tool is always denied in sub-agent policy).
Multi-Tenant Dashboard
The dashboard (octos serve) runs each user profile as a separate gateway OS process:
Dashboard (octos serve)
ββ Profile "alice" β octos gateway --config alice.json (deepseek, own semaphore)
ββ Profile "bob" β octos gateway --config bob.json (kimi, own semaphore)
ββ Profile "carol" β octos gateway --config carol.json (openai, own semaphore)
Each profile has its own LLM provider, API keys, channels, data directory, and max_concurrent_sessions semaphore. Profiles are fully isolated β no shared state between gateway processes.
Testing
1300+ tests across all crates. See TESTING.md for the full inventory and CI guide.
- Unit: type serde round-trips, tool arg parsing, config validation, provider detection, tool policies, compaction, coalescing, BM25 scoring, L2 normalization, SSE parsing
- Adaptive routing: Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, provider racing (19 tests)
- Responsiveness: baseline learning, degradation detection, recovery, threshold boundaries (8 tests)
- Queue modes: Followup, Collect, Steer, Speculative overflow, auto-escalation/deescalation (9 tests)
- Session persistence: JSONL storage, LRU eviction, fork, rewrite, timestamp sort, concurrent access (28 tests)
- Integration: CLI commands, file tools, cron jobs, session forking, plugin loading
- Security: sandbox path injection, env sanitization, SSRF blocking, symlink rejection (O_NOFOLLOW), private IP detection, dedup overflow, tool argument size limits, session file size limits, circuit breaker threshold edge cases, MCP schema validation
- Channel: allowed_senders, message parsing, dedup logic, email address extraction
Local CI: ./scripts/ci.sh (mirrors GitHub Actions + focused subsystem tests). See TESTING.md.
Testing Guide
Quick Start
# Full local CI (mirrors GitHub Actions)
./scripts/ci.sh
# Fast iteration (skip clippy)
./scripts/ci.sh --quick
# Auto-fix formatting
./scripts/ci.sh --fix
# Memory-constrained machines
./scripts/ci.sh --serial
CI Pipeline
scripts/ci.sh runs the same checks as .github/workflows/ci.yml plus focused subsystem tests.
Steps
| Step | Command | Flags |
|---|---|---|
| 1. Format | cargo fmt --all -- --check | --fix auto-fixes |
| 2. Clippy | cargo clippy --workspace -- -D warnings | --quick skips |
| 3. Workspace tests | cargo test --workspace | --serial for single-thread |
| 4. Focused groups | Per-subsystem tests (see below) | Always runs |
Focused Test Groups
After the full workspace run, the CI script re-runs critical subsystems individually to surface failures clearly:
| Group | Crate | Test Filter | Count | What It Covers |
|---|---|---|---|---|
| Adaptive routing | octos-llm | adaptive::tests | 19 | Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, racing |
| Responsiveness | octos-llm | responsiveness::tests | 8 | Baseline learning, degradation detection, recovery, threshold boundaries |
| Session actor | octos-cli | session_actor::tests | 9 | Queue modes, speculative overflow, auto-escalation/deescalation |
| Session persistence | octos-bus | session::tests | 28 | JSONL storage, LRU eviction, fork, rewrite, timestamp sort |
Session actor tests always run single-threaded (--test-threads=1) because they spawn full actors with mock providers and can OOM under parallel execution.
Feature Coverage
Adaptive Routing (crates/octos-llm/src/adaptive.rs β 19 tests)
Tests the AdaptiveRouter which manages multiple LLM providers with metrics-driven selection.
Off Mode (static priority)
| Test | What It Verifies |
|---|---|
test_selects_primary_on_cold_start | Priority order on first call (no metrics yet) |
test_lane_changing_off_uses_priority_order | Off mode ignores latency differences |
test_lane_changing_off_skips_circuit_broken | Off mode still respects circuit breaker |
test_hedged_off_uses_single_provider | Off mode uses priority, no racing |
Hedge Mode (provider racing)
| Test | What It Verifies |
|---|---|
test_hedged_racing_picks_faster_provider | Race 2 providers via tokio::select!, faster wins |
test_hedged_racing_survives_one_failure | Falls back to alternate when primary racer fails |
test_hedge_single_provider_falls_through | Hedge with 1 provider uses single-provider path |
Lane Mode (score-based selection)
| Test | What It Verifies |
|---|---|
test_lane_mode_picks_best_by_score | Switches to faster provider after metrics warm-up |
Circuit Breaker and Failover
| Test | What It Verifies |
|---|---|
test_circuit_breaker_skips_degraded | Skips provider after N consecutive failures |
test_failover_on_error | Falls over to next provider when primary fails |
test_all_providers_fail | Returns error when every provider fails |
Scoring and Metrics
| Test | What It Verifies |
|---|---|
test_scoring_cold_start_respects_priority | Cold-start scores follow config priority |
test_latency_samples_p95 | P95 calculation from circular buffer |
test_metrics_snapshot | Latency/success/failure recorded correctly |
test_metrics_export_after_calls | Export includes per-provider metrics |
Runtime Controls
| Test | What It Verifies |
|---|---|
test_mode_switch_at_runtime | Off β Hedge β Lane β Off switching |
test_qos_ranking_toggle | QoS ranking toggle is orthogonal to mode |
test_adaptive_status_reports_correctly | Status struct reflects current mode/count |
test_empty_router_panics | Asserts at least 1 provider required |
Responsiveness Observer (crates/octos-llm/src/responsiveness.rs β 8 tests)
Tests the latency tracker that drives auto-escalation.
Baseline Learning
| Test | What It Verifies |
|---|---|
test_baseline_learning | Baseline established from first 5 samples |
test_sample_count_tracking | sample_count() returns correct value |
Degradation Detection
| Test | What It Verifies |
|---|---|
test_degradation_detection | 3 consecutive slow requests (> 3x baseline) trigger activation |
test_at_threshold_boundary_not_triggered | Latency exactly at threshold is not βslowβ |
test_no_false_trigger_before_baseline | No activation before baseline is learned |
Recovery and Lifecycle
| Test | What It Verifies |
|---|---|
test_recovery_detection | 1 fast request after activation triggers deactivation |
test_multiple_activation_cycles | Activate β deactivate β reactivate works |
test_window_caps_at_max_size | Rolling window stays at 20 entries |
Queue Modes and Session Actor (crates/octos-cli/src/session_actor.rs β 9 tests)
Tests the per-session actor that owns message processing, queue policies, and auto-protection.
Mock infrastructure: DelayedMockProvider β configurable delay + scripted FIFO responses. setup_speculative_actor / setup_actor_with_mode β builds minimal actor with chosen queue mode and optional adaptive router.
Queue Mode: Followup
| Test | What It Verifies |
|---|---|
test_queue_mode_followup_sequential | Each message processed individually β 3 messages produce 3 responses, all appear in session history separately |
Queue Mode: Collect
| Test | What It Verifies |
|---|---|
test_queue_mode_collect_batches | Messages queued during a slow LLM call are batched into a single combined prompt ("msg2\n---\nQueued #1: msg3") |
Queue Mode: Steer
| Test | What It Verifies |
|---|---|
test_queue_mode_steer_keeps_newest | Older queued messages discarded, only newest processed β discarded message absent from session history |
Queue Mode: Speculative
| Test | What It Verifies |
|---|---|
test_speculative_overflow_concurrent | Overflow spawned as full agent task during slow primary (12s > 10s patience); both responses arrive; history sorted by timestamp |
test_speculative_within_patience_drops | Overflow dropped when primary within patience (5s < 10s); only 1 response arrives |
test_speculative_handles_background_result | BackgroundResult messages handled in the speculative select! loop without extra LLM calls |
Auto-Escalation / Deescalation
| Test | What It Verifies |
|---|---|
test_auto_escalation_on_degradation | 5 fast warmups (baseline 100ms) β 3 slow calls (400ms > 3x) β mode switches to Hedge + Speculative, user gets notification |
test_auto_deescalation_on_recovery | 1 fast response after escalation β mode reverts to Off + Followup, router confirms Off |
Utility
| Test | What It Verifies |
|---|---|
test_strip_think_tags | <think>...</think> block removal from LLM output |
Session Persistence (crates/octos-bus/src/session.rs β 28 tests)
Tests JSONL-backed session storage with LRU caching.
CRUD and Persistence
| Test | What It Verifies |
|---|---|
test_session_manager_create_and_retrieve | Create session, add messages, retrieve |
test_session_manager_persistence | Messages survive manager restart (disk reload) |
test_session_manager_clear | Clear deletes from memory and disk |
History and Ordering
| Test | What It Verifies |
|---|---|
test_session_get_history | Tail-slice returns last N messages |
test_session_get_history_all | Returns all when fewer than max |
test_sort_by_timestamp_restores_order | Restores chronological order after concurrent overflow writes |
LRU Cache
| Test | What It Verifies |
|---|---|
test_eviction_keeps_max_sessions | Cache respects capacity limit |
test_evicted_session_reloads_from_disk | Evicted sessions reload on access |
test_with_max_sessions_clamps_zero | Capacity clamped to minimum 1 |
Concurrency
| Test | What It Verifies |
|---|---|
test_concurrent_sessions | Multiple sessions donβt interfere |
test_concurrent_session_processing | 10 parallel tasks donβt corrupt sessions |
Fork and Rewrite
| Test | What It Verifies |
|---|---|
test_fork_creates_child | Fork copies last N messages with parent link |
test_fork_persists_to_disk | Forked session survives restart |
test_session_rewrite | Atomic write-then-rename after mutation |
Multi-Session (Topics)
| Test | What It Verifies |
|---|---|
test_list_sessions_for_chat | Lists all topic sessions for a chat |
test_session_topic_persists | Topic survives restart |
test_update_summary | Summary update persists |
test_active_session_store | Active topic switching and go-back |
test_active_session_store_persistence | Active topic survives restart |
test_validate_topic_name | Rejects invalid characters and lengths |
Filename Encoding
| Test | What It Verifies |
|---|---|
test_truncated_session_keys_no_collision | Long keys with hash suffix donβt collide |
test_decode_filename | Percent-encoded filenames decode correctly |
test_list_sessions_returns_decoded_keys | list_sessions() returns human-readable keys |
test_short_key_no_hash_suffix | Short keys donβt get hash suffix |
Safety Limits
| Test | What It Verifies |
|---|---|
test_load_rejects_oversized_file | Files over 10 MB refused |
test_append_respects_file_size_limit | Append skips when file at 10 MB limit |
test_load_rejects_future_schema_version | Rejects unknown schema versions |
test_purge_stale_sessions | Deletes sessions older than N days |
Known Gaps
| Area | Why Not Tested |
|---|---|
| Interrupt queue mode | Same codepath as Steer β covered by test_queue_mode_steer_keeps_newest |
| Probe/canary requests | Disabled in all tests via probe_probability: 0.0 for determinism |
Streaming (chat_stream) | No mock streaming infrastructure; streaming tested manually |
| Session compaction | Called in actor tests but output not verified (would need LLM mock for summarization) |
| Live provider integration | Requires API keys; 1 test exists but marked #[ignore] |
| Channel-specific routing | Covered by channel crate tests, not part of this subsystem |
| β¬οΈ Earlier task marker | Primary response gets ββ¬οΈ Earlier task completed:β prefix when overflow was served; not directly asserted in tests (would need to inspect outbound content after a slow primary + fast overflow race) |
| Overflow agent tool execution | serve_overflow spawns a full agent.process_message_tracked() with tool access; current tests use DelayedMockProvider which returns canned responses without tool calls |
Running Individual Tests
# Single test
cargo test -p octos-llm --lib adaptive::tests::test_hedged_racing_picks_faster_provider
# One subsystem
cargo test -p octos-llm --lib adaptive::tests
# Session actor (always single-threaded)
cargo test -p octos-cli session_actor::tests -- --test-threads=1
# With output
cargo test -p octos-cli session_actor::tests -- --test-threads=1 --nocapture
GitHub Actions CI
.github/workflows/ci.yml runs on push/PR to main:
cargo fmt --all -- --checkcargo clippy --workspace -- -D warningscargo test --workspace
The local scripts/ci.sh is a superset β it runs the same three steps plus focused subsystem groups. If CI passes locally, it passes on GitHub.
Runner: macos-14 (ARM64). Private repo with 2000 free minutes/month (10x multiplier for macOS runners = ~200 effective minutes).
Files
| File | What |
|---|---|
scripts/ci.sh | Local CI script (this document) |
scripts/pre-release.sh | Full release smoke tests (build, E2E, skill binaries) |
.github/workflows/ci.yml | GitHub Actions CI |
crates/octos-llm/src/adaptive.rs | Adaptive router + 19 tests |
crates/octos-llm/src/responsiveness.rs | Responsiveness observer + 8 tests |
crates/octos-cli/src/session_actor.rs | Session actor + 9 tests |
crates/octos-bus/src/session.rs | Session persistence + 28 tests |