Introduction

What is Octos?

Octos is an open-source AI agent platform that turns any LLM into a multi-channel, multi-user intelligent assistant. You deploy a single Rust binary, connect your LLM API keys and messaging channels (Telegram, Discord, Slack, WhatsApp, Email, WeChat, and more), and Octos handles everything else: conversation routing, tool execution, memory, provider failover, and multi-tenant isolation.

Think of it as the backend operating system for AI agents. Instead of building a chatbot from scratch for each use case, you configure Octos profiles, each with its own system prompt, model, tools, and channels, and manage them all through a web dashboard or REST API. A small team can run hundreds of specialized AI agents on a single machine.

Octos is built for people who need more than a personal assistant: teams deploying AI for customer support across WhatsApp and Telegram, developers building AI-powered products on top of a REST API, researchers orchestrating multi-step research pipelines with different LLMs at each stage, or families sharing a single AI setup with per-person customization.

Operating Modes

Octos operates in two primary modes:

  • Chat mode (octos chat): Interactive multi-turn conversation with tools, or single-message execution via --message.
  • Gateway mode (octos gateway): Persistent daemon serving multiple messaging channels simultaneously.

Key Concepts

| Term | Description |
|------|-------------|
| Agent | AI that executes tasks using tools |
| Tool | A capability (shell, file ops, search, messaging) |
| Provider | LLM API service (Anthropic, OpenAI, etc.) |
| Channel | Messaging platform (CLI, Telegram, Slack, etc.) |
| Session | Conversation history per channel and chat ID |
| Sandbox | Isolated execution environment (bwrap, macOS sandbox-exec, Docker) |
| Tool Policy | Allow/deny rules controlling which tools are available |
| Skill | Reusable instruction template (SKILL.md) |
| Bootstrap | Context files loaded into system prompt (AGENTS.md, SOUL.md, etc.) |

Quick Start

This guide walks you through the essential steps to get Octos running.

1. Initialize Your Workspace

Navigate to your project directory and initialize Octos:

cd your-project
octos init

This creates a .octos/ directory with default configuration, bootstrap files (AGENTS.md, SOUL.md, USER.md), and directories for memory, sessions, and skills.

2. Set Your API Key

Export at least one LLM provider key:

export ANTHROPIC_API_KEY="sk-ant-..."

Add this to your ~/.bashrc or ~/.zshrc for persistence. You can also use octos auth login --provider openai for OAuth-based login.

3. Check Setup

Verify everything is configured correctly:

octos status

This shows your config file location, active provider and model, API key status, and bootstrap file availability.

4. Start Chatting

Launch an interactive multi-turn conversation:

octos chat

Or send a single message and exit:

octos chat --message "Add a hello function to lib.rs"

5. Run the Gateway

To serve multiple messaging channels as a persistent daemon:

octos gateway

This requires a gateway section in your config with at least one channel configured. See the Configuration chapter for details.

6. Launch the Web UI

If you built with the api feature, start the web dashboard:

octos serve

Then open http://localhost:8080 in your browser.

Installation & Deployment

Prerequisites

| Requirement | Version | Notes |
|-------------|---------|-------|
| Rust | 1.85.0+ | Install via rustup.rs |
| macOS | 13+ | Apple Silicon or Intel |
| Linux | glibc 2.31+ | Ubuntu 20.04+, Debian 11+, Fedora 34+ |
| Windows | 10/11 | Native build or WSL2 |

You also need an API key from at least one supported LLM provider.

Optional Dependencies

| Dependency | Used For | Install |
|------------|----------|---------|
| Node.js | WhatsApp bridge, PPTX creation skill | brew install node / apt install nodejs |
| ffmpeg | Media/video skills | brew install ffmpeg / apt install ffmpeg |
| Chrome/Chromium | Browser automation tool | brew install --cask chromium |
| LibreOffice | Office document conversion | brew install --cask libreoffice |
| Poppler | PDF rendering (pdftoppm) | brew install poppler / apt install poppler-utils |

Build from Source

git clone https://github.com/octos-org/octos
cd octos

# Basic (CLI, chat, run, gateway with CLI channel)
cargo install --path crates/octos-cli

# With messaging channels
cargo install --path crates/octos-cli --features telegram,discord,slack,whatsapp,feishu,email,wecom

# With browser automation (requires Chrome/Chromium)
cargo install --path crates/octos-cli --features browser

# With web UI and REST API
cargo install --path crates/octos-cli --features api

# Verify
octos --version

Deploy Script

For a streamlined installation, use the deploy script:

# Minimal install (CLI + chat only)
./scripts/local-deploy.sh --minimal

# Full install (all channels + dashboard + app-skills)
./scripts/local-deploy.sh --full

# Custom channels
./scripts/local-deploy.sh --channels telegram,discord,api

Platform-Specific Instructions

macOS

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

# 2. Install optional deps
brew install node ffmpeg poppler
brew install --cask libreoffice

# 3. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full

# 4. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat

Background service (launchd):

The deploy script creates ~/Library/LaunchAgents/io.octos.octos-serve.plist.

# Start service (survives reboot)
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist

# Stop service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist

# View logs
tail -f ~/.octos/serve.log

Linux (Ubuntu/Debian)

# 1. Install system deps
sudo apt update
sudo apt install -y build-essential pkg-config libssl-dev

# 2. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

# 3. Install optional deps
sudo apt install -y nodejs npm ffmpeg poppler-utils

# 4. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full

# 5. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat

Background service (systemd user unit):

The deploy script creates ~/.config/systemd/user/octos-serve.service.

# Start service
systemctl --user start octos-serve

# Enable on boot (requires lingering)
loginctl enable-linger $USER
systemctl --user enable octos-serve

# View logs
journalctl --user -u octos-serve -f

# Stop service
systemctl --user stop octos-serve

Linux (Fedora/RHEL)

# System deps
sudo dnf install -y gcc pkg-config openssl-devel

# Then follow Ubuntu steps from step 2 onward

Windows (Native)

Octos builds and runs natively on Windows. Shell commands are executed via cmd /C.

# 1. Install Rust (download rustup-init.exe from https://rustup.rs)
rustup-init.exe

# 2. Clone and build
git clone https://github.com/octos-org/octos.git
cd octos
cargo install --path crates/octos-cli

# 3. Set API key and run
$env:ANTHROPIC_API_KEY = "sk-ant-..."
octos chat

Windows notes:

  • Sandbox is disabled on Windows (no bubblewrap/sandbox-exec equivalent); shell commands run without isolation. Docker sandbox mode still works if Docker Desktop is installed.
  • API keys are stored via Windows Credential Manager.
  • Process management uses taskkill for cleanup.

Windows (WSL2)

Alternatively, use WSL2 for a Linux environment:

# 1. Install WSL2 (PowerShell as admin)
wsl --install -d Ubuntu

# 2. Open Ubuntu terminal, then follow Linux (Ubuntu) steps above

When running octos serve inside WSL2, the dashboard is accessible from your Windows browser at http://localhost:8080 (WSL2 auto-forwards ports).

Docker

docker compose --profile gateway up -d

Deploy Script Reference

./scripts/local-deploy.sh [OPTIONS]

Options:
  --minimal          CLI + chat only (no channels, no dashboard)
  --full             All channels + dashboard + app-skills
  --channels LIST    Comma-separated: telegram,discord,slack,whatsapp,feishu,email,twilio,wecom
  --no-skills        Skip building app-skills
  --no-service       Skip launchd/systemd service setup
  --uninstall        Remove binaries and service files
  --debug            Build in debug mode (faster compile, larger binary)
  --prefix DIR       Install prefix (default: ~/.cargo/bin)

On Windows, use .\scripts\local-deploy.ps1 (PowerShell) with the same options.

What the script does:

  1. Checks prerequisites (Rust, platform deps)
  2. Builds the octos binary with selected features
  3. Builds app-skill binaries (unless --no-skills)
  4. Signs binaries on macOS (ad-hoc codesign)
  5. Runs octos init if ~/.octos doesn't exist
  6. Creates background service file (launchd on macOS, systemd on Linux)

Uninstall:

./scripts/local-deploy.sh --uninstall
# Data directory (~/.octos) is NOT removed. Delete manually:
rm -rf ~/.octos

Post-Install Verification

Set API Keys

Set at least one LLM provider key:

# Add to ~/.bashrc, ~/.zshrc, or ~/.profile
export ANTHROPIC_API_KEY=sk-ant-...
# Or
export OPENAI_API_KEY=sk-...
# Or use OAuth login
octos auth login --provider openai

Verify

octos --version              # Check binary
octos status                 # Check config + API keys
octos chat --message "Hello" # Quick test

Upgrading

cd octos
git pull origin main
./scripts/local-deploy.sh --full   # Rebuilds and reinstalls

# If running as a service, restart it:
# macOS:
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Linux:
systemctl --user restart octos-serve

Troubleshooting

| Problem | Solution |
|---------|----------|
| octos: command not found | Add ~/.cargo/bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |
| Build fails on Linux | Install build-essential pkg-config libssl-dev |
| macOS codesign warning | Run: codesign -s - ~/.cargo/bin/octos |
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080 |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown then reopen terminal |
| Service won't start | Check logs: tail -f ~/.octos/serve.log or journalctl --user -u octos-serve |
| API key not found | Ensure env var is set in the service environment, not just your shell |

Configuration

Config File Locations

Configuration files are loaded in order (first found wins):

  1. .octos/config.json – project-local configuration
  2. ~/.config/octos/config.json – global configuration

Basic Config

A minimal configuration specifies the LLM provider and model:

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "api_key_env": "ANTHROPIC_API_KEY"
}

Gateway Config

To run Octos as a multi-channel daemon, add a gateway section:

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "gateway": {
    "channels": [
      {"type": "cli"},
      {"type": "telegram", "allowed_senders": ["123456789"]},
      {"type": "discord", "settings": {"token_env": "DISCORD_BOT_TOKEN"}},
      {"type": "slack", "settings": {"bot_token_env": "SLACK_BOT_TOKEN", "app_token_env": "SLACK_APP_TOKEN"}},
      {"type": "whatsapp", "settings": {"bridge_url": "ws://localhost:3001"}},
      {"type": "feishu", "settings": {"app_id_env": "FEISHU_APP_ID", "app_secret_env": "FEISHU_APP_SECRET"}}
    ],
    "max_history": 50,
    "system_prompt": "You are a helpful assistant."
  }
}

Environment Variable Expansion

Use ${VAR_NAME} syntax anywhere in config values:

{
  "base_url": "${ANTHROPIC_BASE_URL}",
  "model": "${OCTOS_MODEL}"
}
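
The substitution behaves like shell-style interpolation. A minimal Python sketch (the real Rust implementation may treat unset variables differently; here they expand to an empty string):

```python
import os
import re

# Expand ${VAR_NAME} references in a config value. Unset variables
# become empty strings here; the actual behavior may differ.
def expand_env(value: str) -> str:
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ.get(m.group(1), ""),
        value,
    )
```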

Full Config Reference

The complete configuration structure with all available fields:

{
  "version": 1,

  // LLM Provider
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "base_url": null,
  "api_key_env": null,
  "api_type": null,

  // Fallback chain
  "fallback_models": [
    {
      "provider": "deepseek",
      "model": "deepseek-chat",
      "base_url": null,
      "api_key_env": "DEEPSEEK_API_KEY"
    }
  ],

  // Adaptive routing
  "adaptive_routing": {
    "enabled": false,
    "latency_threshold_ms": 30000,
    "error_rate_threshold": 0.3,
    "probe_probability": 0.1,
    "probe_interval_secs": 60,
    "failure_threshold": 3
  },

  // Gateway
  "gateway": {
    "channels": [{"type": "cli"}],
    "max_history": 50,
    "system_prompt": null,
    "queue_mode": "followup",
    "max_sessions": 1000,
    "max_concurrent_sessions": 10,
    "llm_timeout_secs": null,
    "llm_connect_timeout_secs": null,
    "tool_timeout_secs": null,
    "session_timeout_secs": null,
    "browser_timeout_secs": null
  },

  // Tool policies
  "tool_policy": {"allow": [], "deny": []},
  "tool_policy_by_provider": {},
  "context_filter": [],

  // Sub-providers (for spawn tool)
  "sub_providers": [
    {
      "key": "cheap",
      "provider": "deepseek",
      "model": "deepseek-chat",
      "description": "Fast model for simple tasks"
    }
  ],

  // Agent settings
  "max_iterations": 50,

  // Embedding (for vector search in memory)
  "embedding": {
    "provider": "openai",
    "api_key_env": "OPENAI_API_KEY",
    "base_url": null
  },

  // Voice
  "voice": {
    "auto_asr": true,
    "auto_tts": false,
    "default_voice": "vivian",
    "asr_language": null
  },

  // Hooks
  "hooks": [],

  // MCP servers
  "mcp_servers": [],

  // Sandbox
  "sandbox": {
    "enabled": true,
    "mode": "auto",
    "allow_network": false
  },

  // Email (for email channel)
  "email": null,

  // Dashboard auth (serve mode only)
  "dashboard_auth": null,

  // Monitor (serve mode only)
  "monitor": null
}

Environment Variables

LLM Providers

| Variable | Description |
|----------|-------------|
| ANTHROPIC_API_KEY | Anthropic (Claude) API key |
| OPENAI_API_KEY | OpenAI API key |
| GEMINI_API_KEY | Google Gemini API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| GROQ_API_KEY | Groq API key |
| MOONSHOT_API_KEY | Moonshot/Kimi API key |
| DASHSCOPE_API_KEY | Alibaba DashScope (Qwen) API key |
| MINIMAX_API_KEY | MiniMax API key |
| ZHIPU_API_KEY | Zhipu (GLM) API key |
| ZAI_API_KEY | Z.AI API key |
| NVIDIA_API_KEY | Nvidia NIM API key |

Search Providers

| Variable | Description |
|----------|-------------|
| BRAVE_API_KEY | Brave Search API key |
| PERPLEXITY_API_KEY | Perplexity Sonar API key |
| YDC_API_KEY | You.com API key |

Channels

| Variable | Description |
|----------|-------------|
| TELEGRAM_BOT_TOKEN | Telegram bot token |
| DISCORD_BOT_TOKEN | Discord bot token |
| SLACK_BOT_TOKEN | Slack bot token |
| SLACK_APP_TOKEN | Slack app-level token |
| FEISHU_APP_ID | Feishu/Lark app ID |
| FEISHU_APP_SECRET | Feishu/Lark app secret |
| WECOM_CORP_ID | WeCom corp ID |
| WECOM_AGENT_SECRET | WeCom agent secret |
| EMAIL_USERNAME | Email account username |
| EMAIL_PASSWORD | Email account password |

Email (send-email skill)

| Variable | Description |
|----------|-------------|
| SMTP_HOST | SMTP server hostname |
| SMTP_PORT | SMTP server port |
| SMTP_USERNAME | SMTP username |
| SMTP_PASSWORD | SMTP password |
| SMTP_FROM | SMTP from address |
| LARK_APP_ID | Feishu mail app ID |
| LARK_APP_SECRET | Feishu mail app secret |
| LARK_FROM_ADDRESS | Feishu mail from address |

Voice

| Variable | Description |
|----------|-------------|
| OMINIX_API_URL | OminiX ASR/TTS API URL |

System

| Variable | Description |
|----------|-------------|
| RUST_LOG | Log level (error/warn/info/debug/trace) |
| OCTOS_LOG_JSON | Enable JSON-formatted logs (set to any value) |

File Layout

~/.octos/                        # Global config directory
├── auth.json                   # Stored API credentials (mode 0600)
├── profiles/                   # Profile configs (serve mode)
│   ├── my-bot.json
│   └── work-bot.json
├── skills/                     # Global custom skills
└── serve.log                   # Serve mode log file

.octos/                          # Project/profile data directory
├── config.json                 # Configuration
├── cron.json                   # Scheduled jobs
├── AGENTS.md                   # Agent instructions
├── SOUL.md                     # Personality definition
├── USER.md                     # User information
├── HEARTBEAT.md                # Background tasks
├── sessions/                   # Chat history (JSONL)
├── memory/                     # Memory files
│   ├── MEMORY.md               # Long-term
│   └── 2025-02-10.md           # Daily
├── skills/                     # Custom skills
├── episodes.redb               # Episodic memory DB
└── history/
    └── chat_history            # Readline history

LLM Providers & Routing

Octos supports 14 LLM providers out of the box. Each provider needs an API key stored in an environment variable (except local providers like Ollama).

Supported Providers

| Provider | Env Variable | Default Model | API Format | Aliases |
|----------|--------------|---------------|------------|---------|
| anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 | Native Anthropic | – |
| openai | OPENAI_API_KEY | gpt-4o | Native OpenAI | – |
| gemini | GEMINI_API_KEY | gemini-2.0-flash | Native Gemini | – |
| openrouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4-20250514 | Native OpenRouter | – |
| deepseek | DEEPSEEK_API_KEY | deepseek-chat | OpenAI-compatible | – |
| groq | GROQ_API_KEY | llama-3.3-70b-versatile | OpenAI-compatible | – |
| moonshot | MOONSHOT_API_KEY | kimi-k2.5 | OpenAI-compatible | kimi |
| dashscope | DASHSCOPE_API_KEY | qwen-max | OpenAI-compatible | qwen |
| minimax | MINIMAX_API_KEY | MiniMax-Text-01 | OpenAI-compatible | – |
| zhipu | ZHIPU_API_KEY | glm-4-plus | OpenAI-compatible | glm |
| zai | ZAI_API_KEY | glm-5 | Anthropic-compatible | z.ai |
| nvidia | NVIDIA_API_KEY | meta/llama-3.3-70b-instruct | OpenAI-compatible | nim |
| ollama | (none) | llama3.2 | OpenAI-compatible | – |
| vllm | VLLM_API_KEY | (must specify) | OpenAI-compatible | – |

Configuration Methods

Config File

Set provider and model in your config.json:

{
  "provider": "moonshot",
  "model": "kimi-2.5",
  "api_key_env": "KIMI_API_KEY"
}

The api_key_env field overrides the default environment variable name for the provider. For example, Moonshot defaults to MOONSHOT_API_KEY, but you can point it at KIMI_API_KEY instead.

CLI Flags

octos chat --provider deepseek --model deepseek-chat
octos chat --model gpt-4o  # auto-detects provider from model name

Auth Store

Instead of environment variables, you can store API keys through the auth CLI:

# OAuth PKCE (OpenAI)
octos auth login --provider openai

# Device code flow (OpenAI)
octos auth login --provider openai --device-code

# Paste-token (all other providers)
octos auth login --provider anthropic
# -> prompts: "Paste your API key:"

# Check stored credentials
octos auth status

# Remove credentials
octos auth logout --provider openai

Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.

Auto-Detection

When --provider is omitted, Octos infers the provider from the model name:

| Model Pattern | Detected Provider |
|---------------|-------------------|
| claude-* | anthropic |
| gpt-*, o1-*, o3-*, o4-* | openai |
| gemini-* | gemini |
| deepseek-* | deepseek |
| kimi-*, moonshot-* | moonshot |
| qwen-* | dashscope |
| glm-* | zhipu |
| llama-* | groq |

octos chat --model gpt-4o           # -> openai
octos chat --model claude-sonnet-4-20250514  # -> anthropic
octos chat --model deepseek-chat    # -> deepseek
octos chat --model glm-4-plus       # -> zhipu
octos chat --model qwen-max         # -> dashscope

Custom Endpoints

Use base_url to point at self-hosted or proxy endpoints:

{
  "provider": "openai",
  "model": "gpt-4o",
  "base_url": "https://your-azure-endpoint.openai.azure.com/v1"
}
{
  "provider": "ollama",
  "model": "llama3.2",
  "base_url": "http://localhost:11434/v1"
}
{
  "provider": "vllm",
  "model": "meta-llama/Llama-3-70b",
  "base_url": "http://localhost:8000/v1"
}

API Type Override

The api_type field forces a specific wire format when a provider uses a non-standard protocol:

{
  "provider": "zai",
  "model": "glm-5",
  "api_type": "anthropic"
}
  • "openai" – OpenAI Chat Completions format (default for most providers)
  • "anthropic" – Anthropic Messages format (for Anthropic-compatible proxies)

Fallback Chains

Configure a priority-ordered fallback chain. If the primary provider fails, the next provider in the list is tried automatically:

{
  "provider": "moonshot",
  "model": "kimi-2.5",
  "fallback_models": [
    {
      "provider": "deepseek",
      "model": "deepseek-chat",
      "api_key_env": "DEEPSEEK_API_KEY"
    },
    {
      "provider": "gemini",
      "model": "gemini-2.0-flash",
      "api_key_env": "GEMINI_API_KEY"
    }
  ]
}

Failover rules:

  • 401/403 (authentication errors) – failover immediately, no retry on the same provider
  • 429 (rate limit) / 5xx (server errors) – retry with exponential backoff, then failover
  • 400 (content-format errors) – failover if the error contains "must not be empty", "reasoning_content", "API key not valid", or "invalid_value"
  • Timeouts – failover immediately, no retry (don't waste 120s × retries on an unresponsive provider)
  • Circuit breaker – 3 consecutive failures mark a provider as degraded

Adaptive Routing

When multiple fallback models are configured, adaptive routing dynamically selects the best provider based on real-time performance metrics instead of following the static priority order. Three mutually exclusive modes are available:

{
  "adaptive_routing": {
    "mode": "hedge",
    "qos_ranking": true,
    "latency_threshold_ms": 30000,
    "error_rate_threshold": 0.3,
    "probe_probability": 0.1,
    "probe_interval_secs": 60,
    "failure_threshold": 3,
    "weight_latency": 0.3,
    "weight_error_rate": 0.3,
    "weight_priority": 0.2,
    "weight_cost": 0.2
  }
}

Adaptive Modes

| Mode | Description |
|------|-------------|
| off (default) | Static priority order. Failover only when a provider is circuit-broken (N consecutive failures). No scoring, no racing. |
| hedge | Hedged racing: fire each request to 2 providers simultaneously, take the winner, cancel the loser. Both results accumulate QoS metrics. |
| lane | Score-based lane changing: dynamically pick the best single provider based on a 4-factor scoring formula. Cheaper than hedge (no duplicate requests). |

QoS Ranking

Setting qos_ranking: true enables quality-of-service ranking using a unified model catalog (model_catalog.json). The catalog provides baseline metrics (stability, latency, output quality) that blend with live traffic data via EMA:

  • Cold start: Baseline catalog values are used (10 synthetic samples seeded).
  • Warm state: Live metrics gradually replace baselines (weight ramps from 0 to 1 over 10 calls).
  • Export: Live catalog is exported to model_catalog.json for observability.
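
The baseline-to-live blend can be sketched as a linear weight ramp (an illustration under the "0 to 1 over 10 calls" description above; the exact EMA details live in the Rust router):

```python
# Blend a catalog baseline with a live metric; the live weight
# ramps from 0 to 1 over the first `ramp_calls` observed calls.
def blended_metric(baseline: float, live: float, calls: int,
                   ramp_calls: int = 10) -> float:
    w = min(calls / ramp_calls, 1.0)
    return (1.0 - w) * baseline + w * live
```

At call 0 the value is pure baseline; by call 10 live traffic fully replaces it.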

Scoring Formula

Each provider is scored on 4 factors (lower score = better). All weights are configurable via adaptive_routing:

| Factor | Weight key | Default | Description |
|--------|------------|---------|-------------|
| Stability | weight_error_rate | 0.3 | Blended baseline + live error rate. EMA blend: weight ramps from 0 to 1 over 10 calls. |
| Quality | weight_latency | 0.3 | 60% normalized ds_output quality + 40% normalized throughput (output tokens/sec EMA) |
| Priority | weight_priority | 0.2 | Config-order preference (primary = 0), normalized to [0, 1] |
| Cost | weight_cost | 0.2 | Normalized output cost per million tokens. Unknown cost → 0 (no penalty) |
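
Putting the table together, the score is a weighted sum of penalty terms, lower being better (a sketch; each factor is assumed to be normalized to [0, 1] upstream):

```python
DEFAULT_WEIGHTS = {"weight_error_rate": 0.3, "weight_latency": 0.3,
                   "weight_priority": 0.2, "weight_cost": 0.2}

# All four factors are penalties in [0, 1]; lower total is better.
def provider_score(stability: float, quality: float, priority: float,
                   cost: float, weights: dict = DEFAULT_WEIGHTS) -> float:
    return (weights["weight_error_rate"] * stability
            + weights["weight_latency"] * quality
            + weights["weight_priority"] * priority
            + weights["weight_cost"] * cost)
```

With equal metrics elsewhere, the primary (priority 0) always scores lower than a fallback (priority 1).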

Adaptive Routing Settings

| Setting | Default | Description |
|---------|---------|-------------|
| latency_threshold_ms | 30000 | Providers with average latency above this are penalized |
| error_rate_threshold | 0.3 | Providers with error rates above 30% are deprioritized |
| probe_probability | 0.1 | Fraction of requests sent to non-primary providers as health probes |
| probe_interval_secs | 60 | Minimum seconds between probes to the same provider |
| failure_threshold | 3 | Consecutive failures before the circuit breaker opens |

Hedge Mode Details

When Hedge is active:

  1. The primary provider and the cheapest alternate are raced via tokio::select!.
  2. The winner's response is returned; the loser is cancelled.
  3. Both completed requests record metrics (cancelled requests do not).
  4. If the primary fails, the alternate is tried sequentially (it was cancelled by the race).
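
The race can be sketched with asyncio standing in for tokio::select! (stand-in provider callables; error handling and metric recording are omitted):

```python
import asyncio

# Race two providers; the first finisher wins, the loser is cancelled.
async def hedged_request(primary, alternate, prompt: str) -> str:
    tasks = {asyncio.ensure_future(primary(prompt)),
             asyncio.ensure_future(alternate(prompt))}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                 # cancelled request records no metrics
    return done.pop().result()

async def demo() -> str:
    async def fast(p):
        await asyncio.sleep(0.01)
        return "fast:" + p
    async def slow(p):
        await asyncio.sleep(0.5)
        return "slow:" + p
    return await hedged_request(fast, slow, "hi")
```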

Auto-Escalation

When sustained latency degradation is detected (3 consecutive responses exceeding 3× baseline), the session actor auto-activates Hedge mode + Speculative queue. The ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then adapts every 20 samples via an 80/20 EMA blend with the current window median. When the provider recovers (one normal-latency response), both revert to normal.
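
The escalation trigger can be sketched as a small observer (simplified; the periodic 80/20 EMA re-baselining every 20 samples is left out):

```python
import statistics

# Median of the first 5 requests seeds the baseline; 3 consecutive
# responses above 3x baseline escalate, one normal response reverts.
class ResponsivenessObserver:
    def __init__(self, warmup: int = 5, factor: float = 3.0, streak: int = 3):
        self.warmup, self.factor, self.streak = warmup, factor, streak
        self.samples: list[float] = []
        self.baseline: float | None = None
        self.slow_streak = 0
        self.escalated = False

    def record(self, latency_ms: float) -> None:
        if self.baseline is None:
            self.samples.append(latency_ms)
            if len(self.samples) >= self.warmup:
                self.baseline = statistics.median(self.samples)
            return
        if latency_ms > self.factor * self.baseline:
            self.slow_streak += 1
            self.escalated = self.escalated or self.slow_streak >= self.streak
        else:
            self.slow_streak = 0
            self.escalated = False  # recovery: revert to normal mode
```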

Provider Wrappers

The routing stack is composed of layered wrappers:

| Wrapper | Purpose |
|---------|---------|
| AdaptiveRouter | Top-level: metrics-driven scoring, Hedge/Lane modes, circuit breaker, probe requests |
| ProviderChain | Ordered failover with per-provider circuit breaker (failure count ≥ threshold → degraded) |
| FallbackProvider | Primary + QoS-ranked fallbacks with cooldown tracking via ProviderRouter |
| RetryProvider | Exponential backoff on 429/5xx. Timeout → no retry (failover instead) |
| ProviderRouter | Sub-agent multi-model routing. Prefix-based key resolution, cooldown, QoS-scored fallbacks |
| SwappableProvider | Runtime model swap via RwLock (e.g. switch_model tool). Leaks ~50 bytes per swap |

Gateway & Channels

Octos runs as a gateway that bridges messaging platforms to your LLM agent. Each platform connection is called a channel. You can run multiple channels simultaneously (for example, Telegram and Slack in the same gateway process).

Channel Overview

Channels are configured in the gateway.channels array of your config.json. Each entry specifies a type, optional allowed_senders for access control, and platform-specific settings.

Check which channels are compiled and configured:

octos channels status

This shows a table with each channel's compile status (feature flags) and config summary (environment variables set or missing).


Telegram

Requires a bot token from @BotFather.

export TELEGRAM_BOT_TOKEN="123456:ABC..."
{
  "type": "telegram",
  "allowed_senders": ["your_user_id"],
  "settings": {
    "token_env": "TELEGRAM_BOT_TOKEN"
  }
}

Telegram supports bot commands, inline keyboards, voice messages, images, and files.


Slack

Requires a Socket Mode app with both a bot token and an app-level token.

export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
{
  "type": "slack",
  "settings": {
    "bot_token_env": "SLACK_BOT_TOKEN",
    "app_token_env": "SLACK_APP_TOKEN"
  }
}

Discord

Requires a bot token from the Discord Developer Portal.

export DISCORD_BOT_TOKEN="..."
{
  "type": "discord",
  "settings": {
    "token_env": "DISCORD_BOT_TOKEN"
  }
}

WhatsApp

Requires a Node.js bridge (Baileys) running at a WebSocket URL.

{
  "type": "whatsapp",
  "settings": {
    "bridge_url": "ws://localhost:3001"
  }
}

Feishu (China)

Feishu uses WebSocket long-connection mode by default (no public URL needed).

export FEISHU_APP_ID="cli_..."
export FEISHU_APP_SECRET="..."
{
  "type": "feishu",
  "settings": {
    "app_id_env": "FEISHU_APP_ID",
    "app_secret_env": "FEISHU_APP_SECRET"
  }
}

Build with the feishu feature flag:

cargo build --release -p octos-cli --features feishu

Lark (International)

Larksuite (international) does not support WebSocket mode. Use webhook mode instead, where Lark pushes events to your server via HTTP POST.

Lark Cloud --> ngrok --> localhost:9321/webhook/event --> Gateway --> LLM

Developer Console Setup

  1. Go to open.larksuite.com/app and create (or select) an app
  2. Add Bot capability under Features
  3. Configure event subscription:
    • Events & Callbacks > Event Configuration > Edit subscription method
    • Select β€œSend events to developer server”
    • Set request URL to https://YOUR_NGROK_URL/webhook/event
  4. Add event: im.message.receive_v1 (Receive Message)
  5. Enable permissions: im:message, im:message:send_as_bot, im:resource
  6. Publish the app: App Release > Version Management > Create Version > Apply for Online Release

Config

export LARK_APP_ID="cli_..."
export LARK_APP_SECRET="..."
{
  "type": "lark",
  "allowed_senders": [],
  "settings": {
    "app_id_env": "LARK_APP_ID",
    "app_secret_env": "LARK_APP_SECRET",
    "region": "global",
    "mode": "webhook",
    "webhook_port": 9321
  }
}

Settings Reference

| Setting | Description | Default |
|---------|-------------|---------|
| app_id_env | Env var name for App ID | FEISHU_APP_ID |
| app_secret_env | Env var name for App Secret | FEISHU_APP_SECRET |
| region | "cn" (Feishu) or "global" / "lark" (Larksuite) | "cn" |
| mode | "ws" (WebSocket) or "webhook" (HTTP) | "ws" |
| webhook_port | Port for webhook HTTP server | 9321 |
| encrypt_key | Encrypt Key from Lark console (for AES-256-CBC) | none |
| verification_token | Verification Token from Lark console | none |

Encryption (Optional)

If you configure an Encrypt Key in the Lark console (Events & Callbacks > Encryption Strategy), add it to your config:

{
  "type": "lark",
  "settings": {
    "app_id_env": "LARK_APP_ID",
    "app_secret_env": "LARK_APP_SECRET",
    "region": "global",
    "mode": "webhook",
    "webhook_port": 9321,
    "encrypt_key": "your-encrypt-key-here",
    "verification_token": "your-verification-token"
  }
}

With encryption enabled, Lark sends encrypted POST bodies. The gateway decrypts using AES-256-CBC with SHA-256 key derivation and validates signatures via the X-Lark-Signature header.

Supported Message Types

Inbound: text, images, files (PDF, docs), audio, video, stickers

Outbound: markdown (via interactive cards), image upload, file upload

Running

# Start ngrok tunnel
ngrok http 9321

# Start gateway
LARK_APP_ID="cli_xxxxx" LARK_APP_SECRET="xxxxx" octos gateway --cwd /path/to/workdir

Troubleshooting

| Issue | Solution |
|-------|----------|
| 404 on WS endpoint | Larksuite international does not support WebSocket. Use "mode": "webhook" |
| Challenge verification fails | Ensure ngrok is running and the URL matches the Lark console |
| No events received | Publish the app version after adding events. Check Event Log in the console |
| Bot does not reply | Verify im:message:send_as_bot permission is granted |
| Ngrok URL changed | Free ngrok URLs change on restart. Update the request URL in the Lark console |

Email (IMAP/SMTP)

Polls an IMAP inbox for inbound messages and replies via SMTP. Feature-gated behind email.

export EMAIL_USERNAME="bot@example.com"
export EMAIL_PASSWORD="app-specific-password"
{
  "type": "email",
  "allowed_senders": ["trusted@example.com"],
  "settings": {
    "imap_host": "imap.gmail.com",
    "imap_port": 993,
    "smtp_host": "smtp.gmail.com",
    "smtp_port": 465,
    "username_env": "EMAIL_USERNAME",
    "password_env": "EMAIL_PASSWORD",
    "from_address": "bot@example.com",
    "poll_interval_secs": 30,
    "max_body_chars": 10000
  }
}

WeCom (WeChat Work)

Requires a Custom App with a message callback URL. Feature-gated behind wecom.

export WECOM_CORP_ID="ww..."
export WECOM_AGENT_SECRET="..."
{
  "type": "wecom",
  "settings": {
    "corp_id_env": "WECOM_CORP_ID",
    "agent_secret_env": "WECOM_AGENT_SECRET",
    "agent_id": "1000002",
    "verification_token": "...",
    "encoding_aes_key": "...",
    "webhook_port": 9322
  }
}

WeChat (via WorkBuddy Bridge)

Regular WeChat users can connect to your agent through a WorkBuddy desktop bridge. WorkBuddy handles the WeChat transport; Octos handles the AI logic via its WeCom Bot channel.

WeChat (mobile) --> WorkBuddy (desktop) --> WeCom group robot (WSS) --> octos wecom-bot channel

Setup

  1. Create a WeCom group robot in the WeCom Admin Console under Applications > Group Robot. Note the Bot ID and Secret.

  2. Configure the wecom-bot channel:

export WECOM_BOT_SECRET="your_robot_secret_here"
{
  "type": "wecom-bot",
  "allowed_senders": [],
  "settings": {
    "bot_id": "YOUR_BOT_ID",
    "secret_env": "WECOM_BOT_SECRET"
  }
}
  3. Build and start:
cargo build --release -p octos-cli --features "wecom-bot"
octos gateway
  4. Install the WorkBuddy desktop client, link it to your WeChat via QR scan, and connect it to the same WeCom group robot.

Connection Details

| Property | Value |
|----------|-------|
| Protocol | WebSocket (WSS) |
| Endpoint | wss://openws.work.weixin.qq.com |
| Heartbeat | Ping/pong every 30 seconds |
| Auto-reconnect | Yes, exponential backoff (5s–60s) |
| Max message length | 4096 characters |
| Message format | Markdown |

The wecom-bot channel uses an outbound WebSocket connection, so no public URL or port forwarding is required. This makes it suitable for servers behind NAT or firewalls.

Limitations

  • Text only – voice and image messages are passed as placeholders
  • No message editing – responses are sent as new messages
  • One direction – WeChat-to-Octos is automatic; for proactive messages, use cron jobs

Session Control Commands

In any gateway channel, the following commands manage conversation sessions:

| Command | Description |
|---|---|
| /new | Create a new session (forks the last 10 messages from the current conversation) |
| /new <name> | Create a named session |
| /s <name> | Switch to a named session |
| /s | Switch to the default session |
| /sessions | List all sessions for this chat |
| /back | Switch to the previously active session |
| /delete | Delete the current session |

Only one session is active at a time per chat. Messages are routed to the active session. Inactive sessions can still run background tasks (deep search, pipelines, etc.). When an inactive session finishes work, you receive a notification – use /s <name> to view the results.


Voice Transcription

Voice and audio messages from channels are automatically transcribed before being sent to the agent. The system tries local ASR first (via the OminiX engine) and falls back to cloud-based Whisper when local ASR is unavailable. The transcription is prepended as [transcription: ...].

# Local ASR (preferred) -- set automatically by octos serve
export OMINIX_API_URL="http://localhost:8080"

# Cloud fallback
export GROQ_API_KEY="gsk_..."

Voice configuration in config.json:

{
  "voice": {
    "auto_asr": true,
    "auto_tts": true,
    "default_voice": "vivian",
    "asr_language": null
  }
}
  • auto_asr – automatically transcribe incoming voice/audio messages
  • auto_tts – automatically synthesize voice replies when the user sends voice
  • default_voice – voice preset for auto-TTS
  • asr_language – force a specific language for transcription (null = auto-detect)

Access Control

Use allowed_senders to restrict who can interact with the agent. An empty list allows everyone.

{
  "type": "telegram",
  "allowed_senders": ["123456", "789012"]
}

Each channel type uses its own sender identifier format (Telegram user IDs, email addresses, WeCom user IDs, etc.).
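The rule itself is simple; a minimal sketch (empty list means no restriction):

```python
def is_allowed(sender_id: str, allowed_senders: list[str]) -> bool:
    """Empty allowed_senders permits everyone; otherwise the sender must be listed."""
    return not allowed_senders or sender_id in allowed_senders
```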


Cron Jobs

The agent can schedule recurring tasks that deliver messages through any channel:

octos cron list                          # List active jobs
octos cron list --all                    # Include disabled jobs
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"
octos cron remove <job-id>
octos cron enable <job-id>               # Enable a job
octos cron enable <job-id> --disable     # Disable a job

Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.


Message Coalescing

Long responses are automatically split into channel-safe chunks:

| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |

Split preference: paragraph boundary > newline > sentence end > space > hard cut.
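The split preference can be sketched as a greedy scan that tries each boundary type in order before hard-cutting (illustrative only, not the shipped implementation):

```python
def split_message(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` chars, preferring
    paragraph boundary > newline > sentence end > space > hard cut."""
    chunks = []
    while len(text) > limit:
        window = text[:limit]
        cut = -1
        for sep in ("\n\n", "\n", ". ", " "):
            cut = window.rfind(sep)
            if cut > 0:
                cut += len(sep)  # keep the separator with the first chunk
                break
        if cut <= 0:
            cut = limit          # no boundary found: hard cut
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```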


Config Hot-Reload

The gateway detects config file changes automatically:

  • Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
  • Restart required: provider, model, API keys, channel settings

Changes are detected via SHA-256 hashing with debounce.
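A sketch of how hash-plus-debounce detection behaves (the 1-second window is an assumption, not the shipped value):

```python
import hashlib

class ChangeDetector:
    """Detect config changes by content hash, debounced so rapid
    successive writes trigger a single reload."""
    def __init__(self, debounce_secs: float = 1.0):
        self.last_hash: str | None = None
        self.last_change = float("-inf")
        self.debounce_secs = debounce_secs

    def should_reload(self, content: bytes, now: float) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        if digest == self.last_hash:
            return False                       # content unchanged
        if now - self.last_change < self.debounce_secs:
            return False                       # still settling; check again later
        self.last_hash = digest
        self.last_change = now
        return True
```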

Memory & Skills

Octos has a layered memory system and an extensible skill framework. Memory gives the agent persistent context across sessions. Skills give the agent new tools and capabilities.

Bootstrap Files

These files are loaded into the system prompt at startup. Create them with octos init.

| File | Purpose |
|---|---|
| .octos/AGENTS.md | Agent instructions and guidelines |
| .octos/SOUL.md | Personality and values |
| .octos/USER.md | User information and preferences |
| .octos/TOOLS.md | Tool-specific guidance |
| .octos/IDENTITY.md | Custom identity definition |

Bootstrap files are hot-reloaded – edit them and the agent picks up changes without a restart.

Memory System

Octos uses a 3-layer memory architecture that combines automatic recording with agent-driven knowledge management:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     System Prompt (every turn)                    β”‚
β”‚                                                                   β”‚
β”‚  1. Episodic Memory  ─── top 6 relevant past task experiences    β”‚
β”‚  2. Memory Context   ─── MEMORY.md + recent 7 days daily notes   β”‚
β”‚  3. Entity Bank      ─── one-line abstracts of all known entities β”‚
β”‚                                                                   β”‚
β”‚  Tools: save_memory / recall_memory  (entity bank CRUD)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Layer 1: Episodic Memory (automatic)

Every completed task is automatically recorded as an episode in episodes.redb, a persistent embedded database. Each episode stores:

  • Summary β€” LLM-generated, truncated to 500 chars
  • Outcome β€” Success, Failure, Blocked, or Cancelled
  • Files modified β€” list of file paths touched during the task
  • Key decisions β€” notable choices made during execution
  • Working directory β€” scope for directory-scoped retrieval

At the start of each new task, the agent queries the episode store for up to 6 relevant past experiences using:

  • Hybrid search (default when embedding is configured): combines BM25 keyword matching (30% weight) with HNSW vector similarity (70% weight)
  • Keyword search (fallback when no embedder): matches query terms against episode summaries, scoped to the same working directory

Embedding configuration (in config.json):

{
  "embedding": {
    "provider": "openai",
    "api_key_env": "OPENAI_API_KEY",
    "base_url": null
  }
}

When configured, the agent embeds each episode summary in a fire-and-forget background task and stores the vector alongside the episode. At query time, the task instruction is embedded and used for vector search. When omitted, the system falls back to BM25-only keyword matching.
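The 30/70 blend can be illustrated as follows (a sketch; score normalization is assumed to happen upstream):

```python
def hybrid_score(bm25: float, cosine: float,
                 kw_weight: float = 0.3, vec_weight: float = 0.7) -> float:
    """Blend a normalized BM25 keyword score with vector similarity
    using the documented 30/70 weighting."""
    return kw_weight * bm25 + vec_weight * cosine

def rank_episodes(episodes, bm25_scores, cosine_scores, k=6):
    """Return the top-k episode ids by hybrid score."""
    scored = [(hybrid_score(bm25_scores[e], cosine_scores[e]), e) for e in episodes]
    return [e for _, e in sorted(scored, reverse=True)[:k]]
```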

Layer 2: Long-Term Memory & Daily Notes (file-based)

Long-term memory (.octos/memory/MEMORY.md) holds persistent facts and notes that survive across all sessions. Edit this file manually or via the write_file tool β€” it is injected verbatim into the system prompt on every turn.

Daily notes (.octos/memory/YYYY-MM-DD.md) provide a rolling window of recent activity. The last 7 days of daily notes are automatically included in the agent’s context. These files can be created manually or via the write_file tool.

Note: Daily notes are read by the system prompt builder but are not auto-populated. You can populate them manually or instruct the agent to write to them using write_file.

Layer 3: Entity Bank (tool-driven)

The entity bank is a structured knowledge store at .octos/memory/bank/entities/. Each entity is a markdown file containing everything the agent knows about a specific topic.

How it works:

  1. Abstracts in prompt β€” The first non-heading line of each entity becomes a one-line abstract. All abstracts are injected into the system prompt, giving the agent a compact index of everything it knows.
  2. Full pages on demand β€” The agent uses the recall_memory tool to load the full content of a specific entity when it needs more detail.
  3. Agent-managed β€” The agent decides when to create and update entities using the save_memory tool.
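The abstract-extraction rule in step 1 can be sketched as:

```python
def entity_abstract(markdown: str) -> str:
    """The first non-heading, non-blank line of an entity page becomes
    its one-line abstract in the system prompt index."""
    for line in markdown.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return ""
```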

Memory tools:

  • save_memory β€” Create or update an entity page. The agent is instructed to first recall_memory for existing content, then merge new information before saving (no data loss).
  • recall_memory β€” Load the full content of a named entity. If the entity doesn’t exist, returns a list of all available entities.

Auto-deferral: When the total tool count exceeds 15, memory tools are moved to the group:memory deferred group. The agent must use activate_tools to enable them before saving or recalling.

File Layout

.octos/
β”œβ”€β”€ config.json              # Configuration (versioned, auto-migrated)
β”œβ”€β”€ cron.json                # Cron job store
β”œβ”€β”€ AGENTS.md                # Agent instructions
β”œβ”€β”€ SOUL.md                  # Personality
β”œβ”€β”€ USER.md                  # User info
β”œβ”€β”€ HEARTBEAT.md             # Background tasks
β”œβ”€β”€ sessions/                # Chat history (JSONL)
β”œβ”€β”€ memory/                  # Memory files
β”‚   β”œβ”€β”€ MEMORY.md            # Long-term memory (manual or write_file)
β”‚   β”œβ”€β”€ 2025-02-10.md        # Daily note (manual or write_file)
β”‚   └── bank/
β”‚       └── entities/        # Entity bank (managed by save/recall tools)
β”‚           β”œβ”€β”€ yuechen.md   # Entity: "who is the user"
β”‚           └── octos.md     # Entity: "what is this project"
β”œβ”€β”€ skills/                  # Custom skills
β”œβ”€β”€ episodes.redb            # Episodic memory DB (auto-populated)
└── history/
    └── chat_history         # Readline history

Built-in System Skills

Octos bundles 3 system skills at compile time:

| Skill | Description |
|---|---|
| cron | Cron tool usage examples (always-on) |
| skill-store | Skill installation and management |
| skill-creator | Guide for creating custom skills |

Workspace skills in .octos/skills/ override built-in skills with the same name.

Bundled App Skills

Eight app skills ship as compiled binaries alongside Octos. They are automatically bootstrapped into .octos/skills/ on gateway startup – no installation required.

News Fetch

Tool: news_fetch | Always active: Yes

Fetches headlines and full article content from Google News RSS, Hacker News API, Yahoo News, Substack, and Medium. The agent synthesizes raw data into a formatted digest.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| categories | array | all | News categories to fetch |
| language | "zh" / "en" | "zh" | Output language |

Categories: politics, world, business, technology, science, entertainment, health, sports

Configuration:

/config set news_digest.language en
/config set news_digest.hn_top_stories 50
/config set news_digest.max_deep_fetch_total 30

Deep Search

Tool: deep_search | Timeout: 600 seconds

Multi-round web research tool. Performs iterative searches, parallel page crawling, reference chasing, and generates structured reports saved to ./research/<query-slug>/.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Research topic or question |
| depth | 1–3 | 2 | Research depth level |
| max_results | 1–10 | 8 | Results per search round |
| search_engine | string | auto | perplexity, duckduckgo, brave, you |

Depth levels:

  • 1 (Quick): single search round, ~1 minute, up to 10 pages
  • 2 (Standard): 3 search rounds + reference chasing, ~3 minutes, up to 30 pages
  • 3 (Thorough): 5 search rounds + aggressive link chasing, ~5 minutes, up to 50 pages

Deep Crawl

Tool: deep_crawl | Requires: Chrome/Chromium in PATH

Recursively crawls a website using headless Chrome via CDP. Renders JavaScript, follows same-origin links via BFS, extracts clean text.

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | (required) | Starting URL |
| max_depth | 1–10 | 3 | Maximum link-following depth |
| max_pages | 1–200 | 50 | Maximum pages to crawl |
| path_prefix | string | none | Only follow links under this path |

Output is saved to crawl-<hostname>/ with numbered markdown files.

Configuration:

/config set deep_crawl.page_settle_ms 5000
/config set deep_crawl.max_output_chars 100000

Send Email

Tool: send_email

Sends emails via SMTP or Feishu/Lark Mail API (auto-detected from available environment variables).

| Parameter | Type | Default | Description |
|---|---|---|---|
| to | string | (required) | Recipient email address |
| subject | string | (required) | Email subject |
| body | string | (required) | Email body (plain text or HTML) |
| html | boolean | false | Treat body as HTML |
| attachments | array | none | File attachments (SMTP only) |

SMTP environment variables:

export SMTP_HOST="smtp.gmail.com"
export SMTP_PORT="465"
export SMTP_USERNAME="your-email@gmail.com"
export SMTP_PASSWORD="your-app-password"
export SMTP_FROM="your-email@gmail.com"

Weather

Tools: get_weather, get_forecast | API: Open-Meteo (free, no key required)

| Parameter | Type | Default | Description |
|---|---|---|---|
| city | string | (required) | City name in English |
| days | 1–16 | 7 | Forecast days (forecast only) |

Clock

Tool: get_time

Returns current date, time, day of week, and UTC offset for any IANA timezone.

| Parameter | Type | Default | Description |
|---|---|---|---|
| timezone | string | server local | IANA timezone name (e.g., Asia/Shanghai, US/Eastern) |

Account Manager

Tool: manage_account

Manages sub-accounts under the current profile. Actions: list, create, update, delete, info, start, stop, restart.


Platform Skills (ASR/TTS)

Platform skills provide on-device voice transcription and synthesis. They require the OminiX backend running on Apple Silicon (M1/M2/M3/M4).

Voice Transcription

Tool: voice_transcribe

| Parameter | Type | Default | Description |
|---|---|---|---|
| audio_path | string | (required) | Path to audio file (WAV, OGG, MP3, FLAC, M4A) |
| language | string | "Chinese" | "Chinese", "English", "Japanese", "Korean", "Cantonese" |

Voice Synthesis

Tool: voice_synthesize

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| output_path | string | auto | Output file path |
| language | string | "chinese" | "chinese", "english", "japanese", "korean" |
| speaker | string | "vivian" | Voice preset |

Available voices: vivian, serena, ryan, aiden, eric, dylan (EN/ZH), uncle_fu (ZH only), ono_anna (JA), sohee (KO)

Voice Cloning

Tool: voice_clone_synthesize

Synthesizes speech using a cloned voice from a 3–10 second reference audio sample.

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| reference_audio | string | (required) | Path to reference audio |
| language | string | "chinese" | Target language |

Podcast Generation

Tool: generate_podcast

Creates multi-speaker podcast audio from a script of {speaker, voice, text} objects.


Custom Skill Installation

Installing from GitHub

# Install all skills from a repo
octos skills install user/repo

# Install a specific skill
octos skills install user/repo/skill-name

# Install from a specific branch
octos skills install user/repo --branch develop

# Force overwrite existing
octos skills install user/repo --force

# Install into a specific profile
octos skills install user/repo --profile my-bot

The installer tries to download a pre-built binary from the skill registry (SHA-256 verified), falls back to cargo build --release if a Cargo.toml is present, or runs npm install if a package.json is present.

Managing Skills

octos skills list                    # List installed skills
octos skills info skill-name         # Show detailed info
octos skills update skill-name       # Update a specific skill
octos skills update all              # Update all skills
octos skills remove skill-name       # Remove a skill
octos skills search "web scraping"   # Search the online registry

Skill Resolution Order

Skills are loaded from these directories (highest priority first):

  1. .octos/plugins/ (legacy)
  2. .octos/skills/ (user-installed custom skills)
  3. .octos/bundled-app-skills/ (bundled app skills)
  4. .octos/platform-skills/ (platform: ASR/TTS)
  5. ~/.octos/plugins/ (global legacy)
  6. ~/.octos/skills/ (global custom)

User-installed skills override bundled skills with the same name.


Skill Authoring

A custom skill lives in .octos/skills/<name>/ and contains:

.octos/skills/my-skill/
β”œβ”€β”€ SKILL.md         # Required: instructions + frontmatter
β”œβ”€β”€ manifest.json    # Required for tool skills: tool definitions
β”œβ”€β”€ main             # Compiled binary (or script)
└── .source          # Auto-generated: tracks install source

SKILL.md Format

---
name: my-skill
version: 1.0.0
author: Your Name
description: A brief description of what this skill does
always: false
requires_bins: curl,jq
requires_env: MY_API_KEY
---

# My Skill Instructions

Instructions for the agent on how and when to use this skill.

## When to Use
- Use this skill when the user asks about...

## Tool Usage
The `my_tool` tool accepts:
- `query` (required): The search query
- `limit` (optional): Maximum results (default: 10)

Frontmatter fields:

| Field | Description |
|---|---|
| name | Skill identifier (must match directory name) |
| version | Semantic version |
| author | Skill author |
| description | Short description |
| always | If true, included in every system prompt. If false, available on demand. |
| requires_bins | Comma-separated binaries checked via which. Skill is unavailable if any are missing. |
| requires_env | Comma-separated environment variables. Skill is unavailable if any are unset. |
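A sketch of the availability check implied by requires_bins and requires_env (binaries resolved via PATH lookup, variables via the process environment):

```python
import os
import shutil

def skill_available(requires_bins: str, requires_env: str) -> bool:
    """A skill is unavailable if any required binary is missing from PATH
    or any required environment variable is unset."""
    bins = [b.strip() for b in requires_bins.split(",") if b.strip()]
    envs = [e.strip() for e in requires_env.split(",") if e.strip()]
    return (all(shutil.which(b) for b in bins)
            and all(os.environ.get(e) for e in envs))
```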

manifest.json Format

For skills that provide executable tools:

{
  "name": "my-skill",
  "version": "1.0.0",
  "description": "My custom skill",
  "tools": [
    {
      "name": "my_tool",
      "description": "Does something useful",
      "timeout_secs": 60,
      "input_schema": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "The search query"
          },
          "limit": {
            "type": "integer",
            "description": "Maximum results",
            "default": 10
          }
        },
        "required": ["query"]
      }
    }
  ],
  "entrypoint": "main"
}

The tool binary receives JSON input on stdin and must output JSON on stdout:

// Input (stdin)
{"query": "test", "limit": 5}

// Output (stdout)
{"output": "Results here...", "success": true}
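A minimal tool entrypoint honoring this contract might look like the following (a hypothetical handler for the my_tool example above; a real binary would call main() at startup):

```python
"""Minimal skill tool: reads JSON arguments on stdin, writes a JSON result on stdout."""
import json
import sys

def run(args: dict) -> dict:
    query = args["query"]             # required per input_schema
    limit = args.get("limit", 10)     # optional, schema default 10
    results = [f"result {i} for {query}" for i in range(1, limit + 1)]
    return {"output": "\n".join(results), "success": True}

def main() -> None:
    # A real tool binary would invoke this: JSON in on stdin, JSON out on stdout.
    print(json.dumps(run(json.load(sys.stdin))))
```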

Advanced Features

This chapter covers power-user features: tool management, queue modes, lifecycle hooks, sandboxing, session management, and the web dashboard.


Tools & LRU Deferral

Octos manages a large tool catalog by splitting tools into active and deferred sets. Active tools are sent to the LLM as callable tool specifications. Deferred tools are listed by name in the system prompt but not sent as full specs until needed.

How It Works

  • Base tools (never evicted): read_file, write_file, shell, glob, grep, list_dir, run_pipeline, deep_search, and others.
  • Dynamic tools: tools like save_memory, web_search, recall_memory that are activated on demand and evicted when idle.
  • Deferred tools: browser, manage_skills, spawn, configure_tool, switch_model, and others listed by name only.

Eviction Rules

When the active tool count exceeds 15:

  • Tools idle for 5+ agent iterations that are not in the base set become candidates.
  • The stalest tool is moved to the deferred list first.

Re-activation

When the LLM needs a deferred tool, it calls activate_tools({"tools": [...]}). This resolves the tool name to its group and activates the entire group.

Tool Configuration

Tools can be configured at runtime using the /config slash command. Settings persist in {data_dir}/tool_config.json.

| Tool | Setting | Type | Default | Description |
|---|---|---|---|---|
| news_digest | language | "zh" / "en" | "zh" | Output language for news digests |
| news_digest | hn_top_stories | 5-100 | 30 | Hacker News stories to fetch |
| news_digest | max_rss_items | 5-100 | 30 | Items per RSS feed |
| news_digest | max_deep_fetch_total | 1-50 | 20 | Total articles to deep-fetch |
| news_digest | max_source_chars | 1000-50000 | 12000 | Per-source HTML char limit |
| news_digest | max_article_chars | 1000-50000 | 8000 | Per-article content limit |
| deep_crawl | page_settle_ms | 500-10000 | 3000 | JS render wait time (ms) |
| deep_crawl | max_output_chars | 10000-200000 | 50000 | Output truncation limit |
| web_search | count | 1-10 | 5 | Default number of search results |
| web_fetch | extract_mode | "markdown" / "text" | "markdown" | Content extraction format |
| web_fetch | max_chars | 1000-200000 | 50000 | Content size limit |
| browser | action_timeout_secs | 30-600 | 300 | Per-action timeout |
| browser | idle_timeout_secs | 60-600 | 300 | Idle session timeout |

In-chat config commands:

/config                              # Show all tool settings
/config web_search                   # Show web_search settings
/config set web_search.count 10      # Set default result count to 10
/config set news_digest.language en  # Switch news digests to English
/config reset web_search.count       # Reset to default

Priority order (highest first):

  1. Explicit per-call arguments (tool invocation parameters)
  2. /config overrides (stored in tool_config.json)
  3. Hardcoded defaults

Tool Policies

Tool policies control which tools the agent can use. They can be set globally, per-provider, or per-context.

Global Policy

{
  "tool_policy": {
    "allow": ["group:fs", "group:search", "web_search"],
    "deny": ["shell", "spawn"]
  }
}
  • allow – If non-empty, only these tools are permitted. If empty, all tools are allowed.
  • deny – These tools are always blocked. Deny wins over allow.

Named Groups

| Group | Expands To |
|---|---|
| group:fs | read_file, write_file, edit_file, diff_edit |
| group:runtime | shell |
| group:web | web_search, web_fetch, browser |
| group:search | glob, grep, list_dir |
| group:sessions | spawn |

Additional tools not in named groups: send_file, switch_model, run_pipeline, configure_tool, cron, message.

Wildcard Matching

Suffix * matches prefixes:

{
  "tool_policy": {
    "deny": ["web_*"]
  }
}

This denies web_search, web_fetch, etc.
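Combining the allow/deny and wildcard rules, the decision can be sketched as follows (group: expansion omitted for brevity):

```python
def _matches(tool: str, patterns: list[str]) -> bool:
    """Exact match, or prefix match for patterns with a trailing '*'."""
    return any(tool == p or (p.endswith("*") and tool.startswith(p[:-1]))
               for p in patterns)

def tool_permitted(tool: str, allow: list[str], deny: list[str]) -> bool:
    """Deny always wins; an empty allow list permits everything."""
    if _matches(tool, deny):
        return False
    return not allow or _matches(tool, allow)
```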

Per-Provider Policies

Different tool sets for different LLM models:

{
  "tool_policy_by_provider": {
    "openai/gpt-4o-mini": {
      "deny": ["shell", "write_file"]
    },
    "gemini": {
      "deny": ["diff_edit"]
    }
  }
}

Queue Modes

Queue modes control how incoming user messages are handled while the agent is busy processing a previous request. Set via /queue <mode> in chat, or queue_mode in profile config.

Followup (default)

Sequential processing. Each message waits its turn.

  • Agent processes A, finishes, processes B, finishes, processes C.
  • Simple and predictable.
  • The user is blocked until the current request completes.

Collect

Batch queued messages into a single combined prompt.

  • Agent processes A. User sends B, then C.
  • When A finishes, B and C are merged into one prompt: B\n---\nQueued #1: C
  • One LLM call for the batch.
  • Good for users who send thoughts in multiple short messages (common in chat apps).
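A sketch of the merge format (the marker layout for three or more queued messages is extrapolated from the two-message example above):

```python
def collect_batch(queued: list[str]) -> str:
    """Merge queued messages into one prompt: the first message verbatim,
    then each later one under a 'Queued #n:' marker, separated by '---'."""
    parts = [queued[0]]
    for n, msg in enumerate(queued[1:], start=1):
        parts.append(f"Queued #{n}: {msg}")
    return "\n---\n".join(parts)
```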

Steer

Keep only the newest queued message, discard older ones.

  • Agent processes A. User sends B, then C.
  • When A finishes, B is discarded; only C is processed.
  • Good when the user corrects or refines their question mid-flight.
  • Example: β€œsearch for X” then β€œactually search for Y” – only Y is processed.

Interrupt

Keep only the newest queued message and cancel the running agent.

  • Agent processes A. User sends B, then C.
  • A is cancelled, B is discarded, C is processed immediately.
  • Fastest response to course-correction.
  • Use when responsiveness matters more than completing the current task.

Note: Currently, Interrupt and Steer share the same drain-and-discard behavior. There is no in-flight agent cancellation β€” the running agent completes before the newest message is processed. True mid-flight cancellation is planned.

Speculative

Spawn concurrent overflow agents for each new message while the primary runs.

  • Agent processes A. User sends B, then C.
  • B and C each get their own concurrent agent task (overflow).
  • All three run in parallel – no blocking.
  • Best for slow LLM providers where users do not want to wait.
  • Overflow agents use a snapshot of conversation history from before the primary started.

How overflow works

  1. Primary agent is spawned for the first message.
  2. While the primary runs, new messages arrive in the inbox.
  3. Each new message triggers serve_overflow(), spawning a full agent task with its own streaming bubble.
  4. Overflow agents use the history snapshot from before the primary to avoid re-answering the primary question.
  5. All agents run concurrently and save results to session history.

Known limitations

  • Interactive prompts break in overflow: If the LLM asks a follow-up question and returns EndTurn, the overflow agent exits. The user’s reply spawns a new overflow with no context of the question.
  • Short replies misrouted: A β€œyes” or β€œ2” intended as a continuation may be treated as an independent new query.

Auto-Escalation

The session actor can auto-escalate from Followup to Speculative when sustained latency degradation is detected:

  • ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then tracks LLM response times in a 20-sample rolling window. The baseline adapts every 20 samples via 80/20 EMA blend with the current window median, so gradual drift is tracked.
  • If 3 consecutive responses exceed 3Γ— baseline latency, Speculative queue mode and Hedge racing are auto-activated simultaneously.
  • A user notification is sent: β€œDetected slow responses. Enabling hedge racing + speculative queue.”
  • When the provider recovers (one normal-latency response), both revert to Followup and static routing.
  • Auto-escalation also triggers on API channel (web client), which always uses the speculative processing path.
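The escalation trigger can be sketched as follows (the EMA baseline adaptation is omitted; only the warm-up and 3-strikes rule are shown):

```python
from statistics import median

class ResponsivenessObserver:
    """Learn a median baseline from the first 5 requests, then signal
    escalation after 3 consecutive responses slower than 3x baseline."""
    def __init__(self):
        self.warmup: list[float] = []
        self.baseline: float | None = None
        self.slow_streak = 0

    def record(self, latency_ms: float) -> bool:
        """Return True when auto-escalation should trigger."""
        if self.baseline is None:
            self.warmup.append(latency_ms)
            if len(self.warmup) == 5:
                self.baseline = median(self.warmup)  # robust to outliers
            return False
        if latency_ms > 3 * self.baseline:
            self.slow_streak += 1
        else:
            self.slow_streak = 0  # one normal-latency response resets
        return self.slow_streak >= 3
```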

Queue Commands

/queue                  -- show current mode
/queue followup         -- sequential processing
/queue collect          -- batch queued messages
/queue steer            -- keep newest only
/queue interrupt        -- cancel current + keep newest
/queue speculative      -- concurrent overflow agents

Hooks

Hooks are the primary extension point for enforcing LLM policies, recording metrics, and auditing agent behavior – per profile, without modifying core code.

Hooks are shell commands that run at agent lifecycle events. Each hook receives a JSON payload on stdin and communicates its decision via exit code.

Exit Codes

| Exit Code | Meaning | Before-events | After-events |
|---|---|---|---|
| 0 | Allow | Operation proceeds | Success logged |
| 1 | Deny | Operation blocked (reason on stdout) | Treated as error |
| 2+ | Error | Logged, operation proceeds | Logged |

Events

Four lifecycle events, each with a specific payload:

before_tool_call

Fires before each tool execution. Can deny (exit 1).

{
  "event": "before_tool_call",
  "tool_name": "shell",
  "arguments": {"command": "ls -la"},
  "tool_id": "call_abc123",
  "session_id": "telegram:12345",
  "profile_id": "my-bot"
}

after_tool_call

Fires after each tool execution. Observe-only.

{
  "event": "after_tool_call",
  "tool_name": "shell",
  "tool_id": "call_abc123",
  "result": "file1.txt\nfile2.txt\n...",
  "success": true,
  "duration_ms": 142,
  "session_id": "telegram:12345",
  "profile_id": "my-bot"
}

Note: result is truncated to 500 characters.

before_llm_call

Fires before each LLM API call. Can deny (exit 1).

{
  "event": "before_llm_call",
  "model": "deepseek-chat",
  "message_count": 12,
  "iteration": 3,
  "session_id": "telegram:12345",
  "profile_id": "my-bot"
}

after_llm_call

Fires after each successful LLM response. Observe-only.

{
  "event": "after_llm_call",
  "model": "deepseek-chat",
  "iteration": 3,
  "stop_reason": "EndTurn",
  "has_tool_calls": false,
  "input_tokens": 1200,
  "output_tokens": 350,
  "provider_name": "deepseek",
  "latency_ms": 2340,
  "cumulative_input_tokens": 5600,
  "cumulative_output_tokens": 1800,
  "session_cost": 0.0042,
  "response_cost": 0.0012,
  "session_id": "telegram:12345",
  "profile_id": "my-bot"
}

Hook Configuration

In config.json or per-profile JSON:

{
  "hooks": [
    {
      "event": "before_tool_call",
      "command": ["python3", "~/.octos/hooks/guard.py"],
      "timeout_ms": 3000,
      "tool_filter": ["shell", "write_file"]
    },
    {
      "event": "after_llm_call",
      "command": ["python3", "~/.octos/hooks/cost-tracker.py"],
      "timeout_ms": 5000
    }
  ]
}
| Field | Required | Default | Description |
|---|---|---|---|
| event | yes | – | One of the 4 event types |
| command | yes | – | Argv array (no shell interpretation) |
| timeout_ms | no | 5000 | Kill hook process after this timeout |
| tool_filter | no | all | Only trigger for these tool names (tool events only) |

Multiple hooks can be registered for the same event. They run sequentially; the first deny wins.

Circuit Breaker

Hooks are auto-disabled after 3 consecutive failures (timeout, crash, or exit code 2+). A successful execution (exit 0 or deny exit 1) resets the counter.

Security

  • Commands use argv arrays – no shell interpretation.
  • 18 dangerous environment variables are removed (LD_PRELOAD, DYLD_*, NODE_OPTIONS, etc.).
  • Tilde expansion is supported (~/ and ~username/).

Per-Profile Hooks

Each profile can define its own hooks via the hooks field in profile config. This allows different policy enforcement per channel or bot. Hook changes require a gateway restart.

Backward Compatibility

  • New fields may be added to payloads.
  • Existing fields will never be removed or renamed.
  • Hook scripts should ignore unknown fields (standard JSON practice).

Example: Cost Budget Enforcer

#!/usr/bin/env python3
"""Deny LLM calls when session cost exceeds $1.00."""
import json, sys

payload = json.load(sys.stdin)
if payload.get("event") == "before_llm_call":
    try:
        with open("/tmp/octos-cost.json") as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {}
    sid = payload.get("session_id", "default")
    if state.get(sid, 0) > 1.0:
        print(f"Session cost exceeded $1.00 (${state[sid]:.4f})")
        sys.exit(1)

elif payload.get("event") == "after_llm_call":
    cost = payload.get("session_cost")
    if cost is not None:
        sid = payload.get("session_id", "default")
        try:
            with open("/tmp/octos-cost.json") as f:
                state = json.load(f)
        except FileNotFoundError:
            state = {}
        state[sid] = cost
        with open("/tmp/octos-cost.json", "w") as f:
            json.dump(state, f)

sys.exit(0)

Example: Audit Logger

#!/usr/bin/env python3
"""Log all tool and LLM calls to a JSONL file."""
import json, sys, datetime

payload = json.load(sys.stdin)
payload["timestamp"] = datetime.datetime.now(datetime.timezone.utc).isoformat()

with open("/var/log/octos-audit.jsonl", "a") as f:
    f.write(json.dumps(payload) + "\n")

sys.exit(0)

Sandbox

Shell commands run inside a sandbox for isolation. Three backends are supported:

| Backend | Platform | Isolation | Network Control |
|---|---|---|---|
| bwrap | Linux | RO bind /usr, /lib, /bin, /sbin, /etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if network denied |
| macOS | macOS | sandbox-exec with SBPL profile: process-exec/fork, file-read*, writes to workdir + /private/tmp | (allow network*) or (deny network*) |
| Docker | Any | --rm --security-opt no-new-privileges --cap-drop ALL | --network none if network denied |

Configure in config.json:

{
  "sandbox": {
    "enabled": true,
    "mode": "auto",
    "allow_network": false,
    "docker": {
      "image": "alpine:3.21",
      "mount_mode": "rw",
      "cpu_limit": "1.0",
      "memory_limit": "512m",
      "pids_limit": 100
    }
  }
}
  • Modes: auto (detect best available), bwrap, macos, docker, none.
  • Mount modes: rw (read-write), ro (read-only), none (no workspace mount).
  • Docker resource limits: --cpus, --memory, --pids-limit.
  • Docker bind mount safety: docker.sock, /proc, /sys, /dev, and /etc are blocked as bind mount sources.
  • Path validation: Docker rejects :, \0, \n, \r; macOS rejects control chars, (, ), \, ".
  • Environment sanitization: 18 dangerous environment variables are automatically cleared in all sandbox backends, MCP server spawning, hooks, and the browser tool: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR.
  • Process cleanup: Shell tool sends SIGTERM, waits grace period, then SIGKILL to child processes on timeout.

Session Management

Session Forking

Send /new to create a branched conversation:

/new

This creates a new session that copies the last 10 messages from the current conversation. The child session has a parent_key reference to the original. Each fork gets a unique key namespaced by sender and timestamp.

Session Persistence

Each channel:chat_id pair maintains its own session (conversation history).

  • Storage: JSONL files in .octos/sessions/
  • Max history: Configurable via gateway.max_history (default: 50 messages)
  • Session forking: /new creates a branched conversation with parent_key tracking

Config Hot-Reload

The gateway automatically detects config file changes:

  • Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
  • Restart required: provider, model, API keys, gateway channels

Changes are detected via SHA-256 hashing with debounce.
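The detection loop can be sketched as hash-compare-plus-debounce (an illustrative Python sketch; the gateway's actual polling interval and debounce values may differ):

```python
import hashlib
import time

def sha256_digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

class ConfigWatcher:
    """Poll a file and report content changes, debouncing rapid successive edits."""

    def __init__(self, path, debounce_secs=1.0):
        self.path = path
        self.debounce_secs = debounce_secs
        self.last_digest = sha256_digest(path)
        self.last_report = float("-inf")

    def changed(self, now=None):
        now = time.monotonic() if now is None else now
        digest = sha256_digest(self.path)
        if digest == self.last_digest:
            return False
        if now - self.last_report < self.debounce_secs:
            return False  # within the debounce window; report on a later poll
        self.last_digest = digest
        self.last_report = now
        return True
```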

Message Coalescing

Long responses are automatically split into channel-safe chunks before sending:

| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |

Split preference: paragraph boundary > newline > sentence end > space > hard cut. Messages exceeding 50 chunks are truncated with a marker.
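The boundary search can be sketched as follows (illustrative Python; the real splitter is Rust and its exact boundary rules may differ):

```python
def split_point(text, limit):
    """Pick a cut index <= limit, preferring paragraph > newline > sentence > space."""
    if len(text) <= limit:
        return len(text)
    window = text[:limit]
    for sep in ("\n\n", "\n", ". ", " "):
        idx = window.rfind(sep)
        if idx > 0:
            return idx + len(sep)
    return limit  # no boundary found: hard cut

def chunk_message(text, limit, max_chunks=50):
    """Split text into channel-safe chunks; truncate with a marker past max_chunks."""
    chunks = []
    while text and len(chunks) < max_chunks:
        cut = split_point(text, limit)
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks[-1] += " […truncated]"
    return chunks
```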


Context Compaction

When the conversation exceeds the LLM’s context window, older messages are automatically compacted:

  • Tool arguments are stripped (replaced with "[stripped]")
  • Messages are summarized to first lines
  • Recent tool call/result pairs are preserved intact
  • The agent continues seamlessly without losing critical context
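A compaction pass along those lines might look like this (illustrative Python; the message field names tool_args and content are assumptions, not Octos's actual schema):

```python
def compact(messages, keep_recent=4):
    """Strip tool arguments and shorten older messages; keep the newest intact."""
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    compacted = []
    for msg in older:
        msg = dict(msg)  # don't mutate the caller's history
        if "tool_args" in msg:
            msg["tool_args"] = "[stripped]"
        if msg.get("content"):
            msg["content"] = msg["content"].splitlines()[0]  # first line only
        compacted.append(msg)
    return compacted + recent
```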

In-Chat Commands

Slash Commands

| Command | Description |
|---|---|
| /new | Fork the conversation (creates a new session copying the last 10 messages) |
| /config | View and modify tool configuration |
| /queue | View or change queue mode |
| /exit, /quit, :q | Exit chat (CLI mode only) |

In-Chat Provider Switching

The switch_model tool allows users to list available LLM providers and switch models at runtime through natural conversation. This tool is only available in gateway mode.

List available providers:

User: What models are available?

Bot: Current model: deepseek/deepseek-chat

     Available providers:
       - anthropic (default: claude-sonnet-4-20250514) [ready]
       - openai (default: gpt-4o) [ready]
       - deepseek (default: deepseek-chat) [ready]
       - gemini (default: gemini-2.0-flash) [ready]
       ...

Switch models:

User: Switch to GPT-4o

Bot: Switched to openai/gpt-4o.
     Previous model (deepseek/deepseek-chat) is kept as fallback.

When you switch models, the previous model automatically becomes a fallback:

  • If the new model fails (rate limit, server error), requests automatically fall back to the original model.
  • The fallback uses the circuit breaker (3 consecutive failures triggers failover).
  • The chain is always flat: [new_model, original_model] – repeated switches do not nest.

Model switches are persisted to the profile JSON file. On gateway restart, the bot starts with the last-selected model.
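The flat-chain rule can be sketched as follows (illustrative Python, assuming the last element of the chain is the original model):

```python
def switch_model(chain, new_model):
    """Switching keeps the chain flat: [new_model, original_model], never nested."""
    original = chain[-1] if chain else None
    if original is None or new_model == original:
        return [new_model]
    return [new_model, original]
```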

Memory System

The agent maintains long-term memory across sessions:

  • MEMORY.md – Persistent notes, always loaded into context
  • Daily notes – .octos/memory/YYYY-MM-DD.md, auto-created
  • Recent memory – Last 7 days of daily notes included in context
  • Episodes – Task completion summaries stored in episodes.redb

Memory search combines BM25 (keyword) and vector (semantic) scoring:

  • Ranking: vector_weight * vector_score + bm25_weight * bm25_score (defaults: 0.7 / 0.3)
  • Index: HNSW with L2-normalized embeddings
  • Fallback: BM25-only when no embedding provider is configured
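The ranking formula with its BM25-only fallback, transcribed directly from the defaults above:

```python
def hybrid_score(bm25_score, vector_score=None, vector_weight=0.7, bm25_weight=0.3):
    """Combine keyword and semantic relevance; fall back to BM25 alone."""
    if vector_score is None:  # no embedding provider configured
        return bm25_score
    return vector_weight * vector_score + bm25_weight * bm25_score
```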

Configure an embedding provider to enable vector search:

{
  "embedding": {
    "provider": "openai"
  }
}

The embedding config supports three fields: provider (default: "openai"), api_key_env (optional override), and base_url (optional custom endpoint).

Cron Jobs (Scheduled Tasks)

The agent can schedule recurring tasks using the cron tool:

User: Schedule a daily news digest at 8am Beijing time

Bot: Created cron job "daily-news" running at 8:00 AM Asia/Shanghai every day.
     Expression: 0 0 8 * * * *

Cron jobs can also be managed via CLI:

octos cron list                              # List active jobs
octos cron list --all                        # Include disabled
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron remove <job-id>
octos cron enable <job-id>
octos cron enable <job-id> --disable

Web Dashboard

The REST API server includes an embedded web UI:

octos serve                              # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000  # Accept external connections
# Open http://localhost:8080

Features:

  • Session sidebar
  • Chat interface
  • SSE streaming
  • Dark theme

A /metrics endpoint provides Prometheus-format metrics:

  • octos_tool_calls_total
  • octos_tool_call_duration_seconds
  • octos_llm_tokens_total

Operations

This chapter covers day-to-day operational tasks: upgrading, credential management, and service management.


Upgrading

Pull the latest source and rebuild:

cd octos
git pull origin main
./scripts/local-deploy.sh --full   # Rebuilds and reinstalls

If running as a service, restart it after the upgrade:

# macOS (launchd):
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist

# Linux (systemd):
systemctl --user restart octos-serve

Keychain Integration

Octos supports storing API keys in the macOS Keychain instead of plaintext in profile JSON files. This provides hardware-backed encryption on Apple Silicon and OS-level access control.

Architecture

                     +------------------------------+
  octos auth set-key |     macOS Keychain            |
  -----------------> |  (AES encrypted, per-user)    |
                     |                               |
                     |  service: "octos"             |
                     |  account: "OPENAI_API_KEY"    |
                     |  password: "sk-proj-abc..."   |
                     +---------------+--------------+
                                     | get_password()
  Profile JSON                       |
  +------------------+               v
  | env_vars: {      |   resolve_env_vars()
  |   "OPENAI_API_   |   if "keychain:" ->
  |    KEY":          |   lookup from Keychain
  |    "keychain:"   |   else -> use literal
  | }                |
  +------------------+               |
                                     v
                               Gateway process

Resolution chain: "keychain:" marker in profile config triggers a Keychain lookup (3-second timeout). If the Keychain is unavailable, the key is skipped with a warning.

Backward compatible: Literal values in env_vars pass through unchanged. No migration is required – adopt keychain per-key at your own pace. Mixed plaintext and keychain entries are fully supported.
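The resolution chain can be sketched as follows (illustrative Python; keychain_lookup stands in for the actual Keychain call):

```python
def resolve_env_vars(env_vars, keychain_lookup):
    """Resolve a profile's env_vars: 'keychain:' markers trigger a lookup,
    literal values pass through unchanged (mixed entries are supported)."""
    resolved = {}
    for name, value in env_vars.items():
        if value == "keychain:":
            secret = keychain_lookup(name)  # real code: 3-second timeout
            if secret is not None:
                resolved[name] = secret
            # else: key skipped with a warning
        else:
            resolved[name] = value
    return resolved
```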

CLI Commands

# Unlock keychain for SSH sessions (required before set-key via SSH)
octos auth unlock --password <login-password>
octos auth unlock                               # interactive prompt

# Store a key in Keychain + update profile to use keychain marker
octos auth set-key OPENAI_API_KEY sk-proj-abc123
octos auth set-key OPENAI_API_KEY              # interactive prompt

# With specific profile
octos auth set-key GEMINI_API_KEY AIzaSy... -p my-profile

# List all keys and their storage status
octos auth keys
octos auth keys -p my-profile

# Remove from Keychain + clean up profile
octos auth remove-key OPENAI_API_KEY

Keychain Entry Format

  • Service: octos (constant for all entries)
  • Account: The environment variable name (e.g., OPENAI_API_KEY)
  • Password: The actual secret value

Verify with:

security find-generic-password -s octos -a OPENAI_API_KEY -w

SSH and Headless Server Setup

The macOS Keychain is tied to the GUI login session. SSH sessions cannot access a locked keychain – macOS tries to show a dialog, which hangs on a headless server.

Why SSH fails by default: macOS securityd unlocks the keychain per-session. The GUI session’s unlock does not automatically propagate to SSH sessions.

Solution: Unlock the keychain and disable auto-lock. Run once per boot (or add to your deploy script):

ssh user@<host>

# Unlock the keychain (requires login password)
octos auth unlock --password <login-password>

# That's it -- auto-lock is disabled automatically.
# The keychain stays unlocked until reboot.
# Auto-login will re-unlock it on reboot.

Or with raw security commands:

# Unlock
security unlock-keychain -p '<password>' ~/Library/Keychains/login.keychain-db

# Disable auto-lock timer (so it doesn't re-lock after idle)
security set-keychain-settings ~/Library/Keychains/login.keychain-db

Common issues:

| Symptom | Cause | Fix |
|---|---|---|
| “User interaction is not allowed” | Keychain locked (SSH session) | octos auth unlock --password <pw> |
| Keychain lookup timed out (3s) | Keychain locked (LaunchAgent) | Enable auto-login, reboot |
| “keychain marker found but no secret” | Key never stored or wrong keychain | Re-run octos auth set-key after unlock |
| Gateway hangs at startup | Keychain lookup blocking | Update to the latest octos binary |

Security Comparison

| Threat | Plaintext JSON | Keychain |
|---|---|---|
| File stolen (backup, git, scp) | All keys exposed | Only "keychain:" markers visible |
| Malware reads disk | Simple file read exposes keys | Must bypass OS Keychain ACL |
| Other user on machine | File permissions help; root can read | Encrypted per-user |
| Process memory dump | Keys in env vars | Keys only briefly in memory |
| Accidental log output | Profile JSON leaks keys | Only reference strings logged |

Server Deployment Recommendations

The macOS Keychain was designed for interactive desktop use. On headless servers, it introduces reliability issues. Choose your credential storage based on deployment type:

| Deployment | Recommended Storage | Reason |
|---|---|---|
| Developer laptop | Keychain ("keychain:") | GUI session keeps keychain unlocked; ACL prompts are fine |
| Mac with auto-login + GUI | Keychain ("keychain:") | Works if ACL dialogs were approved once via screen sharing |
| Headless Mac (SSH only) | Plain text in env_vars or launchd plist | Most reliable; no unlock/ACL dependencies |
| Linux server | Plain text in env vars | No macOS Keychain available |

Why Keychain is unreliable on headless servers:

  1. Requires the macOS login password – To unlock the keychain via SSH, you need the user’s login password stored somewhere, reducing the security benefit.
  2. Re-locks on reboot/sleep – The LaunchAgent that starts octos serve runs before GUI login, so the keychain is locked at that point.
  3. Re-locks after idle timeout – Even after unlock, macOS may re-lock. The set-keychain-settings workaround can be reset by macOS updates.
  4. ACL prompts block headless access – If the binary was not the one that originally stored the secret, macOS may pop an unanswerable GUI dialog.
  5. Session isolation – Unlocking from SSH does not unlock for the LaunchAgent session, and vice versa.

Plain text setup for servers:

{
  "env_vars": {
    "OPENAI_API_KEY": "sk-proj-abc123",
    "SMTP_PASSWORD": "xxxx xxxx xxxx xxxx",
    "SMTP_HOST": "smtp.gmail.com",
    "SMTP_PORT": "587",
    "SMTP_USERNAME": "user@gmail.com",
    "SMTP_FROM": "user@gmail.com"
  }
}

Protect the files with filesystem permissions:

chmod 600 ~/.octos/profiles/*.json
chmod 600 ~/Library/LaunchAgents/io.octos.octos-serve.plist

Service Management

macOS (launchd)

Create a LaunchAgent plist to run octos as a persistent service:

# Load the service
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist

# Unload the service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist

# Check status
launchctl list | grep octos

If the service needs environment variables (e.g., SMTP credentials), add them to the plist:

<key>EnvironmentVariables</key>
<dict>
    <key>SMTP_PASSWORD</key>
    <string>xxxx xxxx xxxx xxxx</string>
</dict>

Check logs at ~/.octos/serve.log.

Linux (systemd)

Manage the service with systemd user units:

# Start / stop / restart
systemctl --user start octos-serve
systemctl --user stop octos-serve
systemctl --user restart octos-serve

# Enable on boot
systemctl --user enable octos-serve

# Check status and logs
systemctl --user status octos-serve
journalctl --user -u octos-serve

Troubleshooting

This chapter covers common issues organized by category, along with environment variable reference.


API & Provider Issues

API Key Not Set

Error: ANTHROPIC_API_KEY environment variable not set

Fix: Export the key in your shell or verify with octos status:

export ANTHROPIC_API_KEY="your-key"

If running as a service, ensure the environment variable is set in the service environment (launchd plist or systemd unit), not just your interactive shell.

Rate Limited (429)

The retry mechanism handles this automatically (3 attempts with exponential backoff). If the error persists:

  • Try switching to a different provider via in-chat model switching (the switch_model tool).
  • Wait for the rate limit window to reset.

Debug Logging

Enable detailed logs to diagnose issues:

RUST_LOG=debug octos chat
RUST_LOG=octos_agent=trace octos chat --message "task"

Build Issues

| Problem | Solution |
|---|---|
| Build fails on Linux | Install build dependencies: sudo apt install build-essential pkg-config libssl-dev |
| macOS codesign warning | Sign the binary: codesign -s - ~/.cargo/bin/octos |
| octos: command not found | Add cargo bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |

Channel-Specific Issues

Lark / Feishu

| Issue | Solution |
|---|---|
| 404 on WebSocket endpoint | Larksuite international does not support WebSocket mode. Use "mode": "webhook" in your config |
| Challenge verification fails | Ensure your tunnel (e.g., ngrok) is running and the URL matches the one configured in the Lark console |
| No events received | Publish the app version after adding events. Check Event Log Retrieval in the console |
| Bot does not reply | Check that the im:message:send_as_bot permission is granted |
| Markdown not rendering | Messages are sent as interactive cards; Lark supports a subset of markdown |
| Tunnel URL changed | Free tunnel URLs change on restart. Update the request URL in the Lark console |

WeCom / WeChat

“Environment variable WECOM_BOT_SECRET not set”

Set the secret before starting the gateway:

export WECOM_BOT_SECRET="your_secret"

Connection drops or fails to subscribe

  • Verify bot_id and secret are correct.
  • Check network connectivity to wss://openws.work.weixin.qq.com.
  • The channel auto-reconnects up to 100 times with exponential backoff. Check logs for error details.

Messages not arriving

  • Confirm the upstream relay service is running and linked to your account.
  • Check that the WeCom group robot is the same one configured in octos.
  • If using allowed_senders, verify the sender’s WeCom user ID is in the list.
  • Check for duplicate message filtering – the channel deduplicates the last 1000 message IDs.

Long messages are truncated

Messages over 4096 characters are automatically split into multiple chunks by octos. If further truncation occurs, check the relay service’s own message length settings.


Platform-Specific Issues

| Problem | Solution |
|---|---|
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080/admin/ |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown, then reopen the terminal |
| Service will not start | Check logs: tail -f ~/.octos/serve.log (macOS) or journalctl --user -u octos-serve (Linux) |
| Windows: octos not found | Ensure %USERPROFILE%\.cargo\bin is in your PATH |
| Windows: shell commands fail | Commands run via cmd /C; use Windows-compatible syntax |

Environment Variables Reference

| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GEMINI_API_KEY | Gemini API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| GROQ_API_KEY | Groq API key |
| MOONSHOT_API_KEY | Moonshot API key |
| DASHSCOPE_API_KEY | DashScope API key |
| MINIMAX_API_KEY | MiniMax API key |
| ZHIPU_API_KEY | Zhipu API key |
| ZAI_API_KEY | Z.AI API key |
| NVIDIA_API_KEY | Nvidia NIM API key |
| OMINIX_API_URL | Local ASR/TTS API URL |
| RUST_LOG | Log level (error / warn / info / debug / trace) |
| TELEGRAM_BOT_TOKEN | Telegram bot token |
| DISCORD_BOT_TOKEN | Discord bot token |
| SLACK_BOT_TOKEN | Slack bot token |
| SLACK_APP_TOKEN | Slack app-level token |
| FEISHU_APP_ID | Feishu app ID |
| FEISHU_APP_SECRET | Feishu app secret |
| EMAIL_USERNAME | Email account username |
| EMAIL_PASSWORD | Email account password |
| WECOM_CORP_ID | WeCom corp ID |
| WECOM_AGENT_SECRET | WeCom agent secret |

CLI Reference

octos chat

Interactive multi-turn conversation with readline history.

octos chat [OPTIONS]

Options:
  -c, --cwd <PATH>         Working directory
      --config <PATH>      Config file path
      --provider <NAME>    LLM provider
      --model <NAME>       Model name
      --base-url <URL>     Custom API endpoint
  -m, --message <MSG>      Single message (non-interactive)
      --max-iterations <N> Max tool iterations per message (default: 50)
  -v, --verbose            Show tool outputs
      --no-retry           Disable retry

Features:

  • Arrow keys and line editing (rustyline)
  • Persistent history at .octos/history/chat_history
  • Exit: /exit, /quit, exit, quit, :q, Ctrl+C, Ctrl+D
  • Full tool access (shell, files, search, web)

Examples:

octos chat                              # Interactive (default)
octos chat --provider deepseek          # Use DeepSeek
octos chat --model glm-4-plus           # Auto-detects Zhipu
octos chat --message "Fix auth bug"     # Single message, exit

octos gateway

Run as a persistent multi-channel daemon.

octos gateway [OPTIONS]

Options:
  -c, --cwd <PATH>         Working directory
      --config <PATH>      Config file path
      --provider <NAME>    Override provider
      --model <NAME>       Override model
      --base-url <URL>     Override API endpoint
  -v, --verbose            Verbose logging
      --no-retry           Disable retry

Requires a gateway section in config with a channels array. Runs continuously until Ctrl+C.


octos init

Initialize workspace with config and bootstrap files.

octos init [OPTIONS]

Options:
  -c, --cwd <PATH>    Working directory
      --defaults       Skip prompts, use defaults

Creates:

  • .octos/config.json – Provider/model config
  • .octos/.gitignore – Ignores state files
  • .octos/AGENTS.md – Agent instructions template
  • .octos/SOUL.md – Personality template
  • .octos/USER.md – User info template
  • .octos/memory/ – Memory storage directory
  • .octos/sessions/ – Session history directory
  • .octos/skills/ – Custom skills directory

octos status

Show system status.

octos status [OPTIONS]

Options:
  -c, --cwd <PATH>    Working directory

Example output:

octos Status
══════════════════════════════════════════════════

Config:    .octos/config.json (found)
Workspace: .octos/            (found)
Provider:  anthropic
Model:     claude-sonnet-4-20250514

API Keys
──────────────────────────────────────────────────
  Anthropic    ANTHROPIC_API_KEY         set
  OpenAI       OPENAI_API_KEY           not set
  ...

Bootstrap Files
──────────────────────────────────────────────────
  AGENTS.md        found
  SOUL.md          found
  USER.md          found
  TOOLS.md         missing
  IDENTITY.md      missing

octos serve

Launch the web UI and REST API server. Requires the api feature flag.

cargo install --path crates/octos-cli --features api
octos serve                              # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000  # Accept external connections

Features: session sidebar, chat interface, SSE streaming, dark theme. A /metrics endpoint provides Prometheus-format metrics (octos_tool_calls_total, octos_tool_call_duration_seconds, octos_llm_tokens_total).


octos clean

Clean database and state files.

octos clean [--all] [--dry-run]

| Flag | Description |
|---|---|
| --all | Remove all state files |
| --dry-run | Show what would be removed without deleting |

octos completions

Generate shell completions.

octos completions <shell>

Supported shells: bash, zsh, fish, powershell.


octos cron

Manage scheduled jobs.

octos cron list [--all]                  # List active jobs (--all includes disabled)
octos cron add [OPTIONS]                 # Add a cron job
octos cron remove <job-id>               # Remove a cron job
octos cron enable <job-id>               # Enable a cron job
octos cron enable <job-id> --disable     # Disable a cron job

Adding jobs:

octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"

Cron expressions use an extended 7-field syntax (seconds, minutes, hours, day-of-month, month, day-of-week, year), as shown in the examples above. Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.
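The 7-field expressions in the examples above appear to run seconds through year; a tiny helper can label them (the field order here is an assumption for illustration):

```python
def explain_cron(expr):
    """Label the seven fields of an extended cron expression."""
    fields = ("second", "minute", "hour", "day-of-month", "month", "day-of-week", "year")
    return dict(zip(fields, expr.split()))
```

For example, explain_cron("0 0 9 * * * *") maps "hour" to "9", i.e. a daily 9:00 job.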


octos channels

Manage messaging channels.

octos channels status    # Show channel compile/config status
octos channels login     # WhatsApp QR code login

The status command shows a table with channel name, compile status (feature flags), and config summary (env vars set/missing).


octos office

Office file manipulation (DOCX/PPTX/XLSX). Native Rust implementation with no external dependencies for basic operations.

octos office extract <file>               # Extract text as Markdown
octos office unpack <file> <output-dir>   # Unpack into pretty-printed XML
octos office pack <input-dir> <output>    # Pack directory into Office file
octos office clean <dir>                  # Remove orphaned files from unpacked PPTX

octos account

Manage sub-accounts under profiles. Sub-accounts inherit LLM provider config but have their own data directory (memory, sessions, skills) and channels.

octos account list --profile <id>                         # List sub-accounts
octos account create --profile <id> <name> [OPTIONS]      # Create sub-account
octos account update <id> [OPTIONS]                       # Update sub-account

octos auth

OAuth login and API key management.

octos auth login --provider openai           # PKCE browser OAuth
octos auth login --provider openai --device-code  # Device code flow
octos auth login --provider anthropic        # Paste-token (stdin)
octos auth logout --provider openai          # Remove stored credential
octos auth status                            # Show authenticated providers

Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.


octos skills

Manage skills.

octos skills list                            # List installed skills
octos skills install user/repo/skill-name    # Install from GitHub
octos skills remove skill-name               # Remove a skill

Fetches SKILL.md from the GitHub repo’s main branch and installs to .octos/skills/.

Skill Development

This guide covers the full lifecycle of an Octos skill — from development to publication to end-user installation — similar to building an app, submitting it to an app store, and distributing it to users.


The Skill Ecosystem

 Developer                    Octos Hub                     User
 ─────────                    ─────────                     ────
 1. Develop skill        ──▢  3. Publish to registry   ──▢  5. Search & discover
 2. Test locally              4. Pre-built binaries         6. Install
                                                            7. Update

| Concept | App Store Analogy | Octos Equivalent |
|---|---|---|
| App | iOS/Android app | Skill (binary + manifest + docs) |
| SDK | Xcode / Android Studio | Rust + manifest.json + SKILL.md |
| App Store | Apple App Store | octos-hub registry |
| Distribution | App Store binary delivery | Pre-built binaries in GitHub Releases |
| Install | Tap “Get” | octos skills install user/repo |
| Sideload | Ad-hoc / TestFlight | Copy to ~/.octos/skills/ directly |

Part 1: Develop

Architecture

A skill is a standalone executable that communicates via stdin/stdout JSON. The gateway spawns it as a child process for each tool call. Skills can be written in any language — Rust, Python, Node.js, shell, etc.

User message → LLM → tool_use("get_weather", {"city": "Paris"})
                        ↓
             Gateway spawns: ~/.octos/skills/weather/main get_weather
                        ↓
             Stdin:  {"city": "Paris"}
             Stdout: {"output": "25Β°C, sunny", "success": true}
                        ↓
             LLM sees result → generates response

Skill Anatomy

Every skill is a directory with three files:

my-skill/
├── manifest.json       # Tool definitions (JSON Schema) — the "API contract"
├── SKILL.md            # Documentation + metadata — the "app description"
├── main                # Executable binary — the "app binary"
└── (optional extras)
    ├── styles/         # Bundled assets
    ├── prompts/*.md    # System prompt fragments
    └── hooks/          # Lifecycle hook scripts

Step 1: Create manifest.json

The manifest declares what tools the skill provides. The LLM reads this to decide when and how to call your skill.

{
  "name": "my-skill",
  "version": "1.0.0",
  "author": "your-name",
  "description": "What this skill does",
  "timeout_secs": 15,
  "requires_network": false,
  "tools": [
    {
      "name": "my_tool",
      "description": "Clear description for the LLM. What does this tool do? When should it be used?",
      "input_schema": {
        "type": "object",
        "properties": {
          "param1": {
            "type": "string",
            "description": "What this parameter means"
          },
          "param2": {
            "type": "integer",
            "description": "Optional numeric parameter (default: 10)"
          }
        },
        "required": ["param1"]
      }
    }
  ]
}

Manifest fields:

| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| description | No | — | Human-readable description |
| timeout_secs | No | 30 | Max execution time per tool call (1-600) |
| requires_network | No | false | Informational flag |
| sha256 | No | — | Binary integrity check (hex hash) |
| tools | No | [] | Array of tool definitions |
| mcp_servers | No | [] | MCP server declarations |
| hooks | No | [] | Lifecycle hook definitions |
| prompts | No | — | Prompt fragment config |
| binaries | No | {} | Pre-built binaries by {os}-{arch} |

Step 2: Create SKILL.md

Documentation with YAML frontmatter. The LLM reads this to understand context and trigger conditions.

---
name: my-skill
description: Short description. Triggers: keyword1, keyword2, trigger phrase.
version: 1.0.0
author: your-name
always: false
---

# My Skill

Detailed description of what this skill does and when to use it.

## Tools

### my_tool

Explain what this tool does with examples.

**Parameters:**
- `param1` (required): What it means
- `param2` (optional): What it controls. Default: 10

Frontmatter fields:

| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| description | Yes | — | One-line description with trigger keywords |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| always | No | false | If true, always included in system prompt |
| requires_bins | No | — | Comma-separated binaries that must exist |
| requires_env | No | — | Comma-separated env vars that must be set |

Step 3: Implement the Binary

The binary implements the stdin/stdout JSON protocol.

Protocol:

  1. argv[1] = tool name (e.g., get_weather)
  2. stdin = JSON object matching the tool’s input_schema
  3. stdout = JSON with output (string) and success (bool)
  4. exit code = 0 for success, non-zero for failure
  5. stderr = ignored (use for debug logging)

Rust template:

use std::io::Read;
use serde::Deserialize;
use serde_json::json;

#[derive(Deserialize)]
struct MyToolInput {
    param1: String,
    #[serde(default = "default_param2")]
    param2: i32,
}

fn default_param2() -> i32 { 10 }

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let tool_name = args.get(1).map(|s| s.as_str()).unwrap_or("unknown");

    let mut buf = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
        fail(&format!("Failed to read stdin: {e}"));
    }

    match tool_name {
        "my_tool" => handle_my_tool(&buf),
        _ => fail(&format!("Unknown tool '{tool_name}'")),
    }
}

fn fail(msg: &str) -> ! {
    println!("{}", json!({"output": msg, "success": false}));
    std::process::exit(1);
}

fn handle_my_tool(input_json: &str) {
    let input: MyToolInput = match serde_json::from_str(input_json) {
        Ok(v) => v,
        Err(e) => fail(&format!("Invalid input: {e}")),
    };

    let result = format!("Processed {} with param2={}", input.param1, input.param2);
    println!("{}", json!({"output": result, "success": true}));
}

Python template:

#!/usr/bin/env python3
import sys, json

def main():
    tool_name = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    input_data = json.loads(sys.stdin.read())

    if tool_name == "my_tool":
        result = f"Processed {input_data['param1']}"
        print(json.dumps({"output": result, "success": True}))
    else:
        print(json.dumps({"output": f"Unknown tool: {tool_name}", "success": False}))
        sys.exit(1)

if __name__ == "__main__":
    main()

Shell template:

#!/bin/sh
TOOL="$1"
INPUT=$(cat)

if [ "$TOOL" = "my_tool" ]; then
    PARAM1=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin)['param1'])")
    printf '{"output": "Processed %s", "success": true}\n' "$PARAM1"
else
    printf '{"output": "Unknown tool: %s", "success": false}\n' "$TOOL"
    exit 1
fi

Step 4: For Bundled Skills (Rust Crate)

If contributing a skill to the core Octos distribution:

mkdir -p crates/app-skills/my-skill/src

Cargo.toml:

[package]
name = "my-skill"
version = "1.0.0"
edition = "2021"

[[bin]]
name = "my_skill"
path = "src/main.rs"

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Add to workspace Cargo.toml:

members = [
    # ...
    "crates/app-skills/my-skill",
]

Register in crates/octos-agent/src/bundled_app_skills.rs:

pub const BUNDLED_APP_SKILLS: &[(&str, &str, &str, &str)] = &[
    // ...
    (
        "my-skill",                                          // dir_name
        "my_skill",                                          // binary_name
        include_str!("../../app-skills/my-skill/SKILL.md"),
        include_str!("../../app-skills/my-skill/manifest.json"),
    ),
];

Part 2: Test

Standalone Testing

Test your skill binary directly without the gateway:

# Build (Rust)
cargo build -p my-skill

# Test a tool call
echo '{"param1": "hello", "param2": 5}' | ./target/debug/my_skill my_tool
# Expected: {"output":"Processed hello with param2=5","success":true}

# Test error handling
echo '{}' | ./target/debug/my_skill my_tool
echo '{"param1": "test"}' | ./target/debug/my_skill unknown_tool

For non-Rust skills, make the binary executable and test the same way:

chmod +x my-skill/main
echo '{"param1": "hello"}' | ./my-skill/main my_tool

Gateway Integration Testing

# Build everything
cargo build --release --workspace

# Start the gateway
octos gateway

# Verify skill loaded
ls ~/.octos/skills/my-skill/
# main  manifest.json  SKILL.md

# Ask the agent to use your skill in conversation

Choose timeout_secs to match the skill's workload:

| Skill Type | Timeout |
|---|---|
| Local computation | 5s |
| Single API call | 15s |
| Multi-step API calls | 30-60s |
| Long-running research | 300-600s |

Part 3: Publish

Publishing makes your skill discoverable to all Octos users — like submitting an app to the App Store.

Push to GitHub

Organize your repository. A repo can contain a single skill or multiple skills:

Single-skill repo:

my-skill/                    ← repo root
β”œβ”€β”€ manifest.json
β”œβ”€β”€ SKILL.md
β”œβ”€β”€ Cargo.toml               (or package.json, requirements.txt, etc.)
└── src/main.rs

Multi-skill repo:

my-skills/                   ← repo root
β”œβ”€β”€ skill-a/
β”‚   β”œβ”€β”€ manifest.json
β”‚   β”œβ”€β”€ SKILL.md
β”‚   └── src/main.rs
β”œβ”€β”€ skill-b/
β”‚   β”œβ”€β”€ manifest.json
β”‚   β”œβ”€β”€ SKILL.md
β”‚   └── main.py
└── shared/                  ← shared dependencies (auto-detected)
    └── utils.py

Submit to the Registry

The octos-hub registry is the central catalog for discoverable skills. Submit a PR to add your entry to registry.json:

{
  "name": "my-skills",
  "description": "What your skills do",
  "repo": "your-user/your-repo",
  "version": "1.0.0",
  "author": "your-name",
  "license": "MIT",
  "skills": ["skill-a", "skill-b"],
  "requires": ["git", "cargo"],
  "provides_tools": true,
  "tags": ["keyword1", "keyword2"]
}

Registry entry fields:

| Field | Required | Description |
|---|---|---|
| name | Yes | Package name (can differ from repo name) |
| description | Yes | Searchable description |
| repo | Yes | GitHub user/repo or full URL |
| version | No | Latest version |
| author | No | Author name |
| license | No | License identifier (MIT, Apache-2.0, etc.) |
| skills | No | Individual skill names in the package |
| requires | No | External dependencies (e.g., ["git", "cargo"]) |
| provides_tools | No | Whether skills have manifest.json with tools |
| tags | No | Searchable tags |
| binaries | No | Pre-built binaries (see Distribution below) |

Once the PR is merged, users can discover your skill:

octos skills search keyword1

Part 4: Distribute

Pre-built binaries let users install instantly without compiling β€” like downloading an app binary from the store.

Add Binaries to manifest.json

In your skill’s manifest.json, add a binaries section keyed by {os}-{arch}:

{
  "name": "my-skill",
  "version": "1.0.0",
  "binaries": {
    "darwin-aarch64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-aarch64.tar.gz",
      "sha256": "abc123..."
    },
    "darwin-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-x86_64.tar.gz",
      "sha256": "def456..."
    },
    "linux-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-linux-x86_64.tar.gz",
      "sha256": "789ghi..."
    }
  },
  "tools": [ ... ]
}

Automate with GitHub Actions

Set up CI to build and publish binaries on each release tag:

name: Release Skill
on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        include:
          - os: macos-latest
            target: aarch64-apple-darwin
            platform: darwin-aarch64
          - os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            platform: linux-x86_64

    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions-rust-lang/setup-rust-toolchain@v1

      - run: cargo build --release --target ${{ matrix.target }}

      - name: Package
        run: |
          mkdir dist
          cp target/${{ matrix.target }}/release/my_skill dist/main
          cd dist && tar czf my-skill-${{ matrix.platform }}.tar.gz main
          shasum -a 256 my-skill-${{ matrix.platform }}.tar.gz

      - uses: softprops/action-gh-release@v2
        with:
          files: dist/my-skill-*.tar.gz

Install Resolution Order

When a user runs octos skills install, the installer tries these sources in order:

  1. manifest.json binaries β€” skill author’s own CI/CD builds
  2. Registry binaries β€” registry-audited pre-built binaries
  3. cargo build --release β€” fallback: compile from source (if Cargo.toml exists)
  4. npm install β€” fallback: install Node.js dependencies (if package.json exists)

Pre-built binaries are verified with SHA-256 before installation.
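
The resolution order above can be sketched as straight-line logic; the names below are illustrative, not the installer's actual API:

```rust
/// Illustrative model of the documented source-resolution order.
#[derive(Debug, PartialEq)]
enum InstallSource {
    ManifestBinary, // 1. skill author's own CI/CD builds
    RegistryBinary, // 2. registry-audited builds
    CargoBuild,     // 3. compile from source
    NpmInstall,     // 4. Node.js dependencies
}

fn resolve_install(
    manifest_has_binary: bool,
    registry_has_binary: bool,
    has_cargo_toml: bool,
    has_package_json: bool,
) -> Option<InstallSource> {
    if manifest_has_binary {
        Some(InstallSource::ManifestBinary)
    } else if registry_has_binary {
        Some(InstallSource::RegistryBinary)
    } else if has_cargo_toml {
        Some(InstallSource::CargoBuild)
    } else if has_package_json {
        Some(InstallSource::NpmInstall)
    } else {
        None
    }
}
```

The first available source wins; falling through all four means the skill cannot be installed.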


Part 5: Install

For Users: Search and Install

# Search the registry
octos skills search weather
octos skills search "deep research"

# Install from GitHub (all skills in repo)
octos skills install user/repo

# Install a specific skill from a multi-skill repo
octos skills install user/repo/skill-name

# Install with a specific branch
octos skills install user/repo --branch dev

# Force reinstall
octos skills install user/repo --force

Per-Profile Installation

Skills are isolated per profile (like per-user app installs):

# Install to a specific profile
octos skills --profile alice install user/repo/my-skill

# List skills for a profile
octos skills --profile alice list

# Remove from a profile
octos skills --profile alice remove my-skill

In-Chat Installation

Users can manage skills from within a conversation:

/skills install user/repo/my-skill
/skills list
/skills remove my-skill
/skills search comic

Admin API

Programmatic skill management via REST:

# Install
POST /api/admin/profiles/alice/skills     {"repo": "user/repo/my-skill"}

# List
GET  /api/admin/profiles/alice/skills

# Remove
DELETE /api/admin/profiles/alice/skills/my-skill

Sideloading (Manual Install)

Copy a skill directory directly β€” like sideloading an app:

# Copy to global skills directory
cp -r my-skill/ ~/.octos/skills/my-skill/
chmod +x ~/.octos/skills/my-skill/main

# Or to a profile-specific directory
cp -r my-skill/ ~/.octos/profiles/alice/data/skills/my-skill/

Installed Skill Layout

~/.octos/skills/my-skill/
β”œβ”€β”€ main                # Executable binary
β”œβ”€β”€ manifest.json       # Tool definitions
β”œβ”€β”€ SKILL.md            # Documentation
β”œβ”€β”€ .source             # Install tracking (repo, branch, date)
└── styles/             # Bundled assets (if any)

The .source file tracks where the skill was installed from:

{
  "repo": "user/repo",
  "subdir": "my-skill",
  "branch": "main",
  "installed_at": "2026-03-28T..."
}

Skill Loading Priority

When multiple directories contain a skill with the same name, first match wins:

| Priority | Location | Source |
|---|---|---|
| 1 (highest) | <profile-data>/skills/ | Per-profile install |
| 2 | <project-dir>/skills/ | Project-local |
| 3 | <project-dir>/bundled-skills/ | Bundled app-skills |
| 4 (lowest) | ~/.octos/skills/ | Global install |
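
In code, first match wins is just an ordered scan over the candidate directories; a minimal sketch (the real loader's signature is not shown here):

```rust
use std::path::PathBuf;

/// Return the first directory containing the named skill, scanning the
/// search paths in priority order (per-profile first, global last).
fn resolve_skill(name: &str, search_dirs: &[PathBuf]) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.join("SKILL.md").is_file())
}
```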

Part 6: Update

# Update a skill from its source repo
octos skills update my-skill

# Update from a specific branch
octos skills update my-skill --branch main

# View skill details (version, source, tools)
octos skills info my-skill

The updater reads the .source file to know where to pull from, then re-runs the install flow (clone β†’ discover β†’ build/download β†’ copy).

Hot-Reload

Skill binaries can be updated without restarting the gateway:

# Build just the skill
cargo build --release -p my-skill

# Replace the binary
cp target/release/my_skill ~/.octos/skills/my-skill/main

# Next tool call automatically uses the new binary

Note: If you change SKILL.md or manifest.json for a bundled skill, you must rebuild the octos binary too (they’re embedded via include_str!). External skills reload immediately.


Advanced Topics

Multiple Tools in One Skill

A single binary can serve multiple tools. Route on argv[1]:

use std::io::Read;

fn main() {
    // Tool name arrives as argv[1]; the JSON arguments arrive on stdin.
    let tool_name = std::env::args().nth(1).unwrap_or_default();
    let mut buf = String::new();
    std::io::stdin()
        .read_to_string(&mut buf)
        .expect("failed to read stdin");

    match tool_name.as_str() {
        "get_weather" => handle_get_weather(&buf),
        "get_forecast" => handle_get_forecast(&buf),
        _ => fail(&format!("Unknown tool '{tool_name}'")),
    }
}

Declare all tools in manifest.json:

{
  "tools": [
    { "name": "get_weather", "description": "...", "input_schema": { ... } },
    { "name": "get_forecast", "description": "...", "input_schema": { ... } }
  ]
}

Environment Variables

Skills inherit the gateway’s environment (minus blocked security-sensitive vars). Declare requirements in SKILL.md:

---
requires_env: MY_API_KEY,MY_SECRET
---

The gateway auto-injects provider API keys (e.g., DASHSCOPE_API_KEY, OPENAI_API_KEY) plus OCTOS_DATA_DIR and OCTOS_WORK_DIR.
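
Inside a skill, read declared variables defensively and return an actionable message instead of panicking; a hypothetical helper:

```rust
/// Hypothetical helper: resolve a declared environment variable or
/// produce error text suitable for a success-false tool result.
fn require_env(name: &str) -> Result<String, String> {
    std::env::var(name).map_err(|_| {
        format!("{name} is not set; declare it under requires_env and configure the gateway")
    })
}
```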

Bundled Assets

Skills with asset files should resolve paths relative to the executable:

#![allow(unused)]
fn main() {
let exe = std::env::current_exe()?;
let skill_dir = exe.parent().unwrap();
let styles_dir = skill_dir.join("styles");
}

Do not use the current working directory β€” it points to the profile’s data dir, not the skill dir.

MCP Servers

A skill can declare MCP servers the gateway auto-starts:

{
  "mcp_servers": [
    {
      "command": "./bin/mcp-server",
      "args": ["--port", "0"],
      "env": ["DATABASE_URL"]
    }
  ]
}

Or remote MCP servers:

{
  "mcp_servers": [
    {
      "url": "https://mcp.example.com/v1",
      "headers": { "Authorization": "Bearer ${API_KEY}" }
    }
  ]
}

Path resolution: ./ and ../ are relative to the skill directory. env lists variable names (not values) to forward.

Lifecycle Hooks

Skills can run commands on agent events:

{
  "hooks": [
    {
      "event": "before_tool_call",
      "command": ["./hooks/policy-check.sh"],
      "timeout_ms": 3000,
      "tool_filter": ["shell", "bash"]
    },
    {
      "event": "after_tool_call",
      "command": ["./hooks/audit-log.sh"],
      "timeout_ms": 5000
    }
  ]
}

| Event | Can Deny? | When |
|---|---|---|
| before_tool_call | Yes (exit 1) | Before tool execution |
| after_tool_call | No | After tool completes |
| before_llm_call | Yes (exit 1) | Before LLM request |
| after_llm_call | No | After LLM response |
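
A before_tool_call hook receives the pending tool call and denies it by exiting non-zero. The decision logic might look like this sketch (the patterns and function name are illustrative):

```rust
/// Decide whether to deny a pending tool call; the hook binary would
/// call std::process::exit(1) when this returns true.
fn should_deny(tool_call_json: &str) -> bool {
    // Deny obviously destructive shell commands (illustrative patterns).
    const BLOCKED: &[&str] = &["rm -rf /", "mkfs", "> /dev/sda"];
    BLOCKED.iter().any(|pattern| tool_call_json.contains(pattern))
}
```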

Prompt Fragments

Inject content into the system prompt without writing code:

{
  "name": "company-policy",
  "version": "1.0.0",
  "prompts": {
    "include": ["prompts/*.md"]
  }
}

Extras-Only Skills

Skills don’t need to provide tools. Valid combinations:

  • Prompt-only: Teach the agent domain knowledge (no binary needed)
  • Hooks-only: Enforce policies across all tool calls
  • MCP-only: Expose tools via remote MCP servers
  • Combined: Tools + MCP + hooks + prompts in one skill

Security

Binary integrity:

  • Symlinks rejected (defense against link-swap attacks)
  • SHA-256 verification when sha256 is set in manifest
  • Size limit: 100 MB max per binary

Environment sanitization β€” these vars are stripped before spawning skills:

  • LD_PRELOAD, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH
  • NODE_OPTIONS, PYTHONPATH, PERL5LIB
  • RUSTFLAGS, RUST_LOG, and 10+ others

Best practices:

  • Validate all input (never trust user-provided paths, names, etc.)
  • Use timeouts on HTTP requests
  • Avoid shell injection
  • Set sha256 in manifest for release builds

Platform Skills vs App Skills

| | App Skills | Platform Skills |
|---|---|---|
| Location | crates/app-skills/ | crates/platform-skills/ |
| Bootstrap | Every gateway startup | Admin bot only |
| Scope | Per-gateway | Shared across gateways |
| Use when | Self-contained, always available | Requires external service |

Examples

Example 1: Clock (Local, No Network)

crates/app-skills/time/
β”œβ”€β”€ Cargo.toml          # chrono, chrono-tz, serde, serde_json
β”œβ”€β”€ manifest.json       # 1 tool: get_time, timeout_secs: 5
β”œβ”€β”€ SKILL.md            # Triggers: time, clock
└── src/main.rs         # System clock + timezone formatting

Example 2: Weather (Network API)

crates/app-skills/weather/
β”œβ”€β”€ Cargo.toml          # reqwest (blocking, rustls-tls), serde, serde_json
β”œβ”€β”€ manifest.json       # 2 tools: get_weather, get_forecast, timeout_secs: 15
β”œβ”€β”€ SKILL.md            # Triggers: weather, forecast
└── src/main.rs         # Geocode city β†’ Open-Meteo API

Example 3: Email (Environment Credentials)

crates/app-skills/send-email/
β”œβ”€β”€ Cargo.toml          # lettre, serde, serde_json
β”œβ”€β”€ manifest.json       # 1 tool: send_email
β”œβ”€β”€ SKILL.md            # requires_env: SMTP_HOST,SMTP_USERNAME,SMTP_PASSWORD
└── src/main.rs         # SMTP with credential validation

Checklists

Tool Skill (binary + tools)

  • Directory has manifest.json, SKILL.md, and executable (main or binary)
  • manifest.json has valid JSON Schema for all tool inputs
  • SKILL.md has frontmatter with trigger keywords
  • Binary reads argv[1] for tool name, stdin for JSON
  • Binary writes {"output": "...", "success": true/false} to stdout
  • Error cases return success: false with clear messages
  • Standalone test passes: echo '{"param": "val"}' | ./main my_tool
  • Gateway test passes: skill loads and agent can invoke it

Extras Skill (MCP / hooks / prompts)

  • mcp_servers: command or url set; env lists names only
  • hooks: valid event name; command is argv array; relative paths resolve
  • prompts: glob patterns match intended .md files
  • Extras-only: tools is empty or omitted, no binary needed

Publishing

  • Repo pushed to GitHub with manifest.json and SKILL.md at expected paths
  • Registry PR submitted to octos-hub
  • (Optional) Pre-built binaries for darwin-aarch64, linux-x86_64
  • (Optional) SHA-256 hashes in manifest.json binaries section
  • (Optional) GitHub Actions workflow for automated binary builds on release tags

Architecture Document: octos

Overview

octos is a 15-member Rust workspace (Edition 2024, rust-version 1.85.0) providing both a coding agent CLI and a multi-channel messaging gateway. Pure Rust TLS via rustls (no OpenSSL). Error handling via eyre/color-eyre.

Workspace members:

  • 6 core crates: octos-core, octos-memory, octos-llm, octos-agent, octos-bus, octos-cli
  • 1 pipeline crate: octos-pipeline
  • 7 app-skill crates: news, deep-search, deep-crawl, send-email, account-manager, time, weather
  • 1 platform-skill crate: asr
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        octos-cli                             β”‚
β”‚           (CLI: chat, gateway, init, status)                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚       octos-agent         β”‚           octos-bus               β”‚
β”‚  (Agent, Tools, Skills)  β”‚  (Channels, Sessions, Cron)     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚octos-memoryβ”‚  octos-llm    β”‚       octos-pipeline              β”‚
β”‚(Episodes) β”‚ (Providers)  β”‚  (DOT-based orchestration)      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                       octos-core                             β”‚
β”‚            (Types, Messages, Gateway Protocol)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

octos-core β€” Foundation Types

Shared types with no internal dependencies. Only depends on serde, chrono, uuid, eyre.

MessageRole implements as_str() -> &'static str and Display for consistent string conversion across providers (system/user/assistant/tool).

Task Model

#![allow(unused)]
fn main() {
pub struct Task {
    pub id: TaskId,                   // UUID v7 (temporal ordering)
    pub parent_id: Option<TaskId>,    // For subtasks
    pub status: TaskStatus,
    pub kind: TaskKind,
    pub context: TaskContext,
    pub result: Option<TaskResult>,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
}

TaskId: Newtype over Uuid. Generates UUID v7 via Uuid::now_v7(). Implements Display, FromStr, Default.

TaskStatus (tagged enum, "state" discriminant):

  • Pending β€” awaiting assignment
  • InProgress { agent_id: AgentId } β€” executing
  • Blocked { reason: String } β€” waiting for dependency
  • Completed β€” success
  • Failed { error: String } β€” failure with message

TaskKind (tagged enum, "type" discriminant):

  • Plan { goal: String }
  • Code { instruction: String, files: Vec<PathBuf> }
  • Review { diff: String }
  • Test { command: String }
  • Custom { name: String, params: serde_json::Value }

TaskContext:

  • working_dir: PathBuf, git_state: Option<GitState>, working_memory: Vec<Message>, episodic_refs: Vec<EpisodeRef>, files_in_scope: Vec<PathBuf>

TaskResult:

  • success: bool, output: String, files_modified: Vec<PathBuf>, subtasks: Vec<TaskId>, token_usage: TokenUsage

TokenUsage: input_tokens: u32, output_tokens: u32 (defaults to 0/0)

Message Types

#![allow(unused)]
fn main() {
pub struct Message {
    pub role: MessageRole,           // System | User | Assistant | Tool
    pub content: String,
    pub media: Vec<String>,          // File paths (images, audio)
    pub tool_calls: Option<Vec<ToolCall>>,
    pub tool_call_id: Option<String>,
    pub timestamp: DateTime<Utc>,
}

pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: serde_json::Value,
}
}

Gateway Protocol

#![allow(unused)]
fn main() {
pub struct InboundMessage {       // channel β†’ agent
    pub channel: String,          // "telegram", "cli", "discord", etc.
    pub sender_id: String,
    pub chat_id: String,
    pub content: String,
    pub timestamp: DateTime<Utc>,
    pub media: Vec<String>,
    pub metadata: serde_json::Value,
}

pub struct OutboundMessage {      // agent β†’ channel
    pub channel: String,
    pub chat_id: String,
    pub content: String,
    pub reply_to: Option<String>,
    pub media: Vec<String>,
    pub metadata: serde_json::Value,
}
}

InboundMessage::session_key() derives SessionKey::new(channel, chat_id) β€” format "{channel}:{chat_id}".
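
A minimal sketch of that key derivation:

```rust
/// Session keys use the documented "{channel}:{chat_id}" format.
fn session_key(channel: &str, chat_id: &str) -> String {
    format!("{channel}:{chat_id}")
}
```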

Inter-Agent Coordination

#![allow(unused)]
fn main() {
pub enum AgentMessage {           // tagged: "type", snake_case
    TaskAssign { task: Box<Task> },
    TaskUpdate { task_id: TaskId, status: TaskStatus },
    TaskComplete { task_id: TaskId, result: TaskResult },
    ContextRequest { task_id: TaskId, query: String },
    ContextResponse { task_id: TaskId, context: Vec<Message> },
}
}

Error System

#![allow(unused)]
fn main() {
pub struct Error {
    pub kind: ErrorKind,
    pub context: Option<String>,      // Chained context
    pub suggestion: Option<String>,   // Actionable fix hint
}
}

ErrorKind variants: TaskNotFound, AgentNotFound, InvalidStateTransition, LlmError, ApiError (status-aware: 401β†’check key, 429β†’rate limit), ToolError, ConfigError, ApiKeyNotSet, UnknownProvider, Timeout, ChannelError, SessionError, IoError, SerializationError, Other(eyre::Report).

Utilities

truncate_utf8(s: &mut String, max_len: usize, suffix: &str) β€” in-place truncation at UTF-8 char boundaries. Appends suffix after truncation. Used across all tool outputs.


octos-llm β€” LLM Provider Abstraction

Provider Trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn chat(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatResponse>;
    async fn chat_stream(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatStream>;  // default: falls back to chat()
    fn context_window(&self) -> u32;  // default: context_window_tokens(self.model_id())
    fn model_id(&self) -> &str;
    fn provider_name(&self) -> &str;
}
}

Configuration

#![allow(unused)]
fn main() {
pub struct ChatConfig {
    pub max_tokens: Option<u32>,        // default: Some(4096)
    pub temperature: Option<f32>,       // default: Some(0.0)
    pub tool_choice: ToolChoice,        // Auto | Required | None | Specific { name }
    pub stop_sequences: Vec<String>,
}
}

Response Types

#![allow(unused)]
fn main() {
pub struct ChatResponse {
    pub content: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: StopReason,       // EndTurn | ToolUse | MaxTokens | StopSequence
    pub usage: TokenUsage,
}

pub enum StreamEvent {
    TextDelta(String),
    ToolCallDelta { index, id, name, arguments_delta },
    Usage(TokenUsage),
    Done(StopReason),
    Error(String),
}

pub type ChatStream = Pin<Box<dyn Stream<Item = StreamEvent> + Send>>;
}

Provider Registry (registry/)

All providers are defined in octos-llm/src/registry/ β€” one file per provider. Each file exports a ProviderEntry with metadata (name, aliases, default model, API key env var, base URL) and a create() factory function. Adding a new provider = one file + one line in mod.rs.

#![allow(unused)]
fn main() {
pub struct ProviderEntry {
    pub name: &'static str,              // canonical name
    pub aliases: &'static [&'static str], // e.g. ["google"] for gemini
    pub default_model: Option<&'static str>,
    pub api_key_env: Option<&'static str>,
    pub default_base_url: Option<&'static str>,
    pub requires_api_key: bool,
    pub requires_base_url: bool,          // true for vllm
    pub requires_model: bool,             // true for vllm
    pub detect_patterns: &'static [&'static str], // model→provider auto-detect
    pub create: fn(CreateParams) -> Result<Arc<dyn LlmProvider>>,
}

pub struct CreateParams {
    pub api_key: Option<String>,
    pub model: Option<String>,
    pub base_url: Option<String>,
    pub model_hints: Option<ModelHints>,  // config-level override
}
}

Lookup: registry::lookup(name) β€” case-insensitive, matches canonical name or aliases. Auto-detect: registry::detect_provider(model) β€” infers provider from model name patterns.

Native Providers (4 protocol implementations)

| Provider | Base URL | Auth Header | Image Format | Default Model |
|---|---|---|---|---|
| Anthropic | api.anthropic.com | x-api-key | Base64 blocks | claude-sonnet-4-20250514 |
| OpenAI | api.openai.com/v1 | Authorization: Bearer | Data URI | gpt-4o |
| Gemini | generativelanguage.googleapis.com/v1beta | x-goog-api-key | Base64 inline | gemini-2.5-flash |
| OpenRouter | openrouter.ai/api/v1 | Authorization: Bearer | Data URI | anthropic/claude-sonnet-4-20250514 |

OpenAI-Compatible Providers (via OpenAIProvider::with_base_url())

| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| DeepSeek | β€” | api.deepseek.com/v1 | deepseek-chat | DEEPSEEK_API_KEY |
| Groq | β€” | api.groq.com/openai/v1 | llama-3.3-70b-versatile | GROQ_API_KEY |
| Moonshot | kimi | api.moonshot.ai/v1 | kimi-k2.5 | MOONSHOT_API_KEY |
| DashScope | qwen | dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max | DASHSCOPE_API_KEY |
| MiniMax | β€” | api.minimax.io/v1 | MiniMax-Text-01 | MINIMAX_API_KEY |
| Zhipu | glm | open.bigmodel.cn/api/paas/v4 | glm-4-plus | ZHIPU_API_KEY |
| Nvidia | nim | integrate.api.nvidia.com/v1 | meta/llama-3.3-70b-instruct | NVIDIA_API_KEY |
| Ollama | β€” | localhost:11434/v1 | llama3.2 | (none) |
| vLLM | β€” | (user-provided) | (user-provided) | VLLM_API_KEY |

Anthropic-Compatible Provider

| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| Z.AI | zai, z.ai | api.z.ai/api/anthropic | glm-5 | ZAI_API_KEY |

ModelHints (OpenAI provider)

Auto-detected from model name at construction, overridable via config model_hints:

#![allow(unused)]
fn main() {
pub struct ModelHints {
    pub uses_completion_tokens: bool,  // o-series, gpt-5, gpt-4.1
    pub fixed_temperature: bool,       // o-series, kimi-k2.5
    pub lacks_vision: bool,            // deepseek, minimax, mistral, yi-
    pub merge_system_messages: bool,   // default: true
}
}

SSE Streaming

parse_sse_response(response) -> impl Stream<Item = SseEvent> β€” stateful unfold-based parser. Max buffer: 1 MB. Handles \n\n and \r\n\r\n separators. Each provider maps SSE events to StreamEvent:

  • Anthropic: message_start β†’ input tokens, content_block_start/delta β†’ text/tool chunks, message_delta β†’ stop reason. Custom SSE state machine.
  • OpenAI/OpenRouter: Standard OpenAI SSE with [DONE] sentinel. delta.content for text, delta.tool_calls[] for tools. Shared parser: parse_openai_sse_events().
  • Gemini: alt=sse endpoint. candidates[0].content.parts[] with function call data.

RetryProvider

Wraps any Arc<dyn LlmProvider> with exponential backoff. Wrapped by ProviderChain for multi-provider failover.

#![allow(unused)]
fn main() {
pub struct RetryConfig {
    pub max_retries: u32,           // default: 3
    pub initial_delay: Duration,    // default: 1s
    pub max_delay: Duration,        // default: 60s
    pub backoff_multiplier: f64,    // default: 2.0
}
}

Delay formula: initial_delay * backoff_multiplier^attempt, capped at max_delay.
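
The formula as code, a direct restatement of the defaults above:

```rust
use std::time::Duration;

/// initial_delay * backoff_multiplier^attempt, capped at max_delay.
fn retry_delay(attempt: u32, initial: Duration, multiplier: f64, max: Duration) -> Duration {
    let scaled = initial.as_secs_f64() * multiplier.powi(attempt as i32);
    Duration::from_secs_f64(scaled.min(max.as_secs_f64()))
}
```

With the defaults (1s initial, 2.0 multiplier, 60s cap) the sequence runs 1s, 2s, 4s, 8s, and stays pinned at 60s once 2^attempt exceeds 60.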

Retryable errors (three-tier detection):

  1. HTTP status: 429, 500, 502, 503, 504, 529
  2. reqwest: is_connect() or is_timeout()
  3. String fallback: β€œconnection refused”, β€œtimed out”, β€œoverloaded”

Provider Failover Chain

ProviderChain wraps multiple Arc<dyn LlmProvider> and transparently fails over on retriable errors. Configured via fallback_models in config.

#![allow(unused)]
fn main() {
pub struct ProviderChain {
    slots: Vec<ProviderSlot>,       // provider + AtomicU32 failure count
    failure_threshold: u32,         // default: 3
}
}

Behavior: Tries providers in order, skipping degraded ones (failures >= threshold). On retriable error, moves to the next. On success, resets failure count. If all degraded, picks the one with fewest failures.
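
The skip-degraded, advance-on-error loop can be modeled in a few lines. This sketch replaces Arc<dyn LlmProvider> with plain closures and omits the fewest-failures last resort:

```rust
/// Toy model of a failover slot: a fallible call plus a failure count.
struct Slot {
    call: Box<dyn Fn() -> Result<String, String>>,
    failures: u32,
}

fn try_chain(slots: &mut [Slot], threshold: u32) -> Result<String, String> {
    let mut last_err = String::from("all providers degraded");
    for slot in slots.iter_mut() {
        if slot.failures >= threshold {
            continue; // degraded: skip without calling
        }
        match (slot.call)() {
            Ok(resp) => {
                slot.failures = 0; // success resets the failure count
                return Ok(resp);
            }
            Err(e) => {
                slot.failures += 1; // record and move to the next provider
                last_err = e;
            }
        }
    }
    Err(last_err)
}
```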

Failoverable: Broader than retryable β€” includes 401/403, timeouts, and content-format 400 errors (e.g. "must not be empty", "reasoning_content", "API key not valid", "invalid_value"). These should not retry on the same provider but should failover to a different one.

AdaptiveRouter (adaptive.rs)

Metrics-driven provider selection with three mutually exclusive modes (Off, Hedge, Lane). Tracks per-provider EMA latency (configurable ema_alpha, default 0.3), p95 latency (64-sample circular buffer), error rates, throughput (output tokens/sec EMA), and cost. Four-factor scoring: stability, quality, priority, cost (all weights configurable). Includes circuit breaker, probe requests, model catalog seeding from model_catalog.json, and QoS ranking. Scoring uses EMA blending: baseline catalog data at cold start, live metrics gradually replace it (weight ramps from 0 to 1 over 10 calls).

#![allow(unused)]
fn main() {
pub struct AdaptiveSlot {
    provider: Arc<dyn LlmProvider>,
    metrics: ProviderMetrics,
    priority: usize,
    cost_per_m: f64,
    model_type: Mutex<ModelType>,        // Strong | Fast
    cost_in: AtomicU64,
    ds_output: AtomicU64,                // deep search output quality
    baseline_stability: AtomicU64,
    baseline_tool_avg_ms: AtomicU64,
    baseline_p95_ms: AtomicU64,
    context_window: AtomicU64,
    max_output: AtomicU64,
}
}

Hedge mode: Races primary + cheapest alternate via tokio::select!, cancels loser. Only completed requests record metrics (cancelled loser metrics are discarded). If primary fails, alternate is tried sequentially.

Lane mode: Scores all providers, picks single best. Probe requests sent to stale providers (configurable probability, default 0.1; interval, default 60s).

FallbackProvider (fallback.rs)

Wraps primary + QoS-ranked fallbacks. On failure, records cooldown via ProviderRouter. Tries each fallback in order.

SwappableProvider (swappable.rs)

Runtime model switching via RwLock. Leaks ~50 bytes per swap (acceptable for rare user-initiated changes). cached_model_id and cached_provider_name are leaked &'static str to satisfy the &str return type.

ProviderRouter (router.rs)

Sub-agent multi-model routing with prefix-based key resolution. Supports cooldown (60s default), QoS-scored compatible_fallbacks() (sorted by model catalog score), cost info auto-derived from pricing.rs, and metadata for LLM-visible tool schemas.

#![allow(unused)]
fn main() {
pub struct ProviderRouter {
    providers: RwLock<HashMap<String, Arc<dyn LlmProvider>>>,
    active_key: RwLock<Option<String>>,
    metadata: RwLock<HashMap<String, SubProviderMeta>>,
    cooldowns: RwLock<HashMap<String, Instant>>,
    qos_scores: RwLock<HashMap<String, f64>>,
}
}

OminixClient (ominix.rs)

Client for local ASR/TTS via Ominix runtime.

Token Estimation

#![allow(unused)]
fn main() {
pub fn estimate_tokens(text: &str) -> u32  // ~4 chars/token ASCII, ~1.5 chars/token CJK
pub fn estimate_message_tokens(msg: &Message) -> u32  // content + tool_calls + 4 overhead
}

Context Windows

| Model Family | Tokens |
|---|---|
| Claude 3/4 | 200,000 |
| GPT-4o/4-turbo | 128,000 |
| o1/o3/o4 | 200,000 |
| Gemini 2.0/1.5 | 1,000,000 |
| Default (unknown) | 128,000 |

Pricing

model_pricing(model_id) -> Option<ModelPricing> β€” case-insensitive substring match. Cost = (input/1M) * input_rate + (output/1M) * output_rate.

| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4 | 15.00 | 75.00 |
| claude-sonnet-4 | 3.00 | 15.00 |
| claude-haiku | 0.80 | 4.00 |
| gpt-4o | 2.50 | 10.00 |
| gpt-4o-mini | 0.15 | 0.60 |
| o3/o4 | 10.00 | 40.00 |
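
A worked example of the cost formula using the gpt-4o row (the helper name here is assumed):

```rust
/// Cost in USD: (input/1M) * input_rate + (output/1M) * output_rate.
fn cost_usd(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_rate
        + (output_tokens as f64 / 1_000_000.0) * output_rate
}
```

One million input tokens plus one million output tokens on gpt-4o costs 2.50 + 10.00 = 12.50 USD.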

Embedding

#![allow(unused)]
fn main() {
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
    fn dimension(&self) -> usize;
}
}

OpenAIEmbedder: Default model text-embedding-3-small (1536 dims). text-embedding-3-large = 3072 dims.

Transcription

GroqTranscriber: Whisper whisper-large-v3 via https://api.groq.com/openai/v1/audio/transcriptions. Multipart form. 60s timeout. MIME detection: ogg/opus→audio/ogg, mp3→audio/mpeg, m4a→audio/mp4, wav→audio/wav.

Vision

encode_image(path) -> (mime_type, base64_data) β€” JPEG/PNG/GIF/WebP. is_image(path) -> bool.

Typed Error Hierarchy (error.rs)

LlmError with LlmErrorKind enum: Authentication, RateLimited, ContextOverflow, ModelNotFound, ServerError, Network, Timeout, InvalidRequest, ContentFiltered, StreamError, Provider. is_retryable() returns true for RateLimited, ServerError, Network, Timeout, StreamError. from_status(code, body) maps HTTP status codes to error kinds. Provider response bodies logged at debug level only (not exposed in error messages).

High-Level Client (high_level.rs)

LlmClient wraps Arc<dyn LlmProvider> with ergonomic APIs: generate(prompt), generate_with(messages, tools, config), generate_object(prompt, schema_name, schema), generate_typed<T>(prompt, schema_name, schema), stream(prompt), stream_with(messages, tools, config). Configurable via with_config(ChatConfig).

Middleware Pipeline (middleware.rs)

LlmMiddleware trait with before()/after()/on_error() hooks. MiddlewareStack wraps LlmProvider and runs layers in insertion order. before() can short-circuit with cached responses. Built-in: LoggingMiddleware (tracing), CostTracker (AtomicU64 counters for input/output tokens and request count). Streaming bypasses middleware (logged as debug warning).

Model Catalog (catalog.rs)

ModelCatalog with ModelInfo (id, name, provider, context_window, max_output_tokens, capabilities, cost, aliases). Lookup by ID or alias via HashMap index. with_defaults() pre-registers 4 models (Claude Sonnet 4, Claude Haiku 4.5, GPT-4o, Gemini 2.5 Flash). by_provider() and with_capability() for filtered queries.


EpisodeStore

redb database at .octos/episodes.redb with three tables:

| Table | Key | Value | Purpose |
|---|---|---|---|
| episodes | &str (episode_id) | &str (JSON) | Full episode records |
| cwd_index | &str (working_dir) | &str (JSON array of IDs) | Directory-scoped lookup |
| embeddings | &str (episode_id) | &[u8] (bincode Vec) | Vector embeddings |

#![allow(unused)]
fn main() {
pub struct Episode {
    pub id: String,                   // UUID v7
    pub task_id: TaskId,
    pub agent_id: AgentId,
    pub working_dir: PathBuf,
    pub summary: String,              // LLM-generated, truncated to 500 chars
    pub outcome: EpisodeOutcome,      // Success | Failure | Blocked | Cancelled
    pub key_decisions: Vec<String>,
    pub files_modified: Vec<PathBuf>,
    pub created_at: DateTime<Utc>,
}
}

Operations:

  • store(episode) β€” serialize to JSON, update cwd_index, insert into in-memory HybridIndex
  • get(id) β€” direct lookup by episode_id
  • find_relevant(cwd, query, limit) β€” keyword matching scoped to directory
  • recent_for_cwd(cwd, n) β€” N most recent by created_at descending
  • store_embedding(id, Vec<f32>) β€” bincode serialize, store in embeddings table, update HybridIndex
  • find_relevant_hybrid(query, query_embedding, limit) β€” global hybrid search across all episodes

Initialization: On open(), rebuilds in-memory HybridIndex by iterating all episodes and loading embeddings from DB.

MemoryStore

File-based persistent memory at {data_dir}/memory/:

  • MEMORY.md β€” long-term memory (full overwrite)
  • YYYY-MM-DD.md β€” daily notes (append with date header)

get_memory_context() builds system prompt injection:

  1. ## Long-term Memory β€” full MEMORY.md
  2. ## Recent Activity β€” 7-day rolling window of daily notes
  3. ## Today's Notes β€” current day

HybridIndex

#![allow(unused)]
fn main() {
pub struct HybridIndex {
    inverted: HashMap<String, Vec<(usize, u32)>>,  // term β†’ [(doc_idx, raw_tf_count)]
    doc_lengths: Vec<usize>,
    total_len: usize,                         // running total for O(1) avg_dl
    avg_dl: f64,
    ids: Vec<String>,
    hnsw: Option<Hnsw<'static, f32, DistCosine>>,
    has_embedding: Vec<bool>,
    dimension: usize,                               // default: 1536
}
}

BM25 scoring (constants: K1=1.2, B=0.75):

  • Tokenization: lowercase, split on non-alphanumeric, filter tokens < 2 chars
  • IDF: ln((N - df + 0.5) / (df + 0.5) + 1.0)
  • Score: IDF * (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * dl/avg_dl)) β€” uses raw term counts (not normalized)
  • Duplicate detection: ids.contains(episode_id) skips already-indexed documents (line 76-78)
  • Normalized to [0, 1] range (epsilon 1e-10 prevents NaN from near-zero max scores)

HNSW vector index (via hnsw_rs):

  • Named constants: HNSW_MAX_NB_CONNECTION=16, HNSW_CAPACITY=10_000, HNSW_EF_CONSTRUCTION=200, HNSW_MAX_LAYER=16, DistCosine
  • L2 normalization before insertion/search; zero vectors rejected (returns None)
  • Cosine similarity = 1 - distance (DistCosine returns 1-cos_sim)

Hybrid ranking — fetches limit * 4 candidates from each:

  • Configurable weights via with_weights(vector_weight, bm25_weight) (defaults: 0.7 / 0.3)
  • Without vectors: BM25 only (graceful fallback)
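The normalization and fusion steps can be sketched as follows. This is an illustration under stated assumptions: `l2_normalize` and `hybrid_rank` are hypothetical names, and the weighted sum is one plausible way the 0.7/0.3 weights combine the two score lists.

```rust
use std::collections::HashMap;

// L2-normalize an embedding before insertion/search; zero vectors are
// rejected (None), mirroring the behavior described above.
fn l2_normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}

// Weighted fusion of per-document scores (defaults 0.7 vector / 0.3 BM25).
// A document missing from one list contributes only through the other,
// which also gives the BM25-only graceful fallback when vectors are absent.
fn hybrid_rank(
    vector: &HashMap<String, f64>,
    bm25: &HashMap<String, f64>,
    w_vec: f64,
    w_bm25: f64,
) -> Vec<(String, f64)> {
    let mut combined: HashMap<String, f64> = HashMap::new();
    for (id, s) in vector {
        *combined.entry(id.clone()).or_default() += w_vec * s;
    }
    for (id, s) in bm25 {
        *combined.entry(id.clone()).or_default() += w_bm25 * s;
    }
    let mut out: Vec<_> = combined.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```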

octos-agent — Agent Runtime

Agent Core

```rust
pub struct Agent {
    id: AgentId,
    llm: Arc<dyn LlmProvider>,
    tools: ToolRegistry,
    memory: Arc<EpisodeStore>,
    embedder: Option<Arc<dyn EmbeddingProvider>>,
    system_prompt: RwLock<String>,
    config: AgentConfig,
    reporter: Arc<dyn ProgressReporter>,
    shutdown: Arc<AtomicBool>,       // Acquire/Release ordering
}

pub struct AgentConfig {
    pub max_iterations: u32,          // default: 50 (CLI overrides to 20)
    pub max_tokens: Option<u32>,      // None = unlimited
    pub max_timeout: Option<Duration>,// default: 600s wall-clock timeout
    pub save_episodes: bool,          // default: true
}
```

Execution Loop (run_task / process_message)

1. Build messages: system prompt + history + memory context + input
2. Loop (up to max_iterations):
   a. Check shutdown flag and token budget
   b. trim_to_context_window() — compact if needed
   c. Call LLM via chat_stream()
   d. Consume stream → accumulate text, tool_calls, tokens
   e. Match stop_reason:
      - EndTurn/StopSequence → save episode, return result
      - ToolUse → execute_tools() → append results → continue
      - MaxTokens → return result

ConversationResponse: content: String, token_usage: TokenUsage, files_modified: Vec<PathBuf>, streamed: bool

Episode saving: After task completion, embedding generation is fired off in the background (fire-and-forget) if an embedder is present.

Wall-clock timeout: Agent aborts after max_timeout (default 600s) regardless of iteration count.

Tool Output Sanitization

Before feeding tool results back to the LLM, sanitize_tool_output() (in sanitize.rs) strips noise:

  • Base64 data URIs: data:...;base64,<payload> → [base64-data-redacted]
  • Long hex strings: 64+ contiguous hex chars (SHA-256, raw keys) → [hex-redacted]
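The hex rule can be sketched with a simple character scan; this is an illustrative stand-in for the real `sanitize_tool_output()` (the data-URI rule is analogous and omitted here), and `redact_long_hex` is a hypothetical name.

```rust
// Redact runs of 64 or more contiguous hex characters (SHA-256 digests,
// raw keys) while leaving shorter runs and all other text untouched.
fn redact_long_hex(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut run = String::new();
    for c in input.chars() {
        if c.is_ascii_hexdigit() {
            run.push(c);
        } else {
            flush_run(&mut out, &mut run);
            out.push(c);
        }
    }
    flush_run(&mut out, &mut run);
    out
}

// Emit a pending hex run: replaced if long enough, kept verbatim otherwise.
fn flush_run(out: &mut String, run: &mut String) {
    if run.len() >= 64 {
        out.push_str("[hex-redacted]");
    } else {
        out.push_str(run);
    }
    run.clear();
}
```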

Context Compaction

Triggered when estimated tokens exceed 80% of context window / 1.2 safety margin.

Algorithm:

  1. Keep MIN_RECENT_MESSAGES (6) most recent non-system messages
  2. Don't split inside tool call/result pairs
  3. Summarize old messages: first line (200 chars), strip tool arguments, drop media
  4. Budget: 40% of total for summary (BASE_CHUNK_RATIO = 0.4)
  5. Replace: [System, CompactionSummary, Recent1, Recent2, ...]

Format:

  • User: > User: first line [media omitted]
  • Assistant: > Assistant: content or - Called tool_name
  • Tool: -> tool_name: ok|error - first 100 chars
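The trigger condition reads as "compact when the estimate exceeds (80% of the window) / 1.2". A minimal sketch of that check, assuming a rough chars/4 token estimate (the estimator is an assumption for illustration, not the documented one):

```rust
// Very rough token estimate: ~4 characters per token (assumption).
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

// Compaction trigger as described: 80% of the context window,
// divided by a 1.2 safety margin.
fn needs_compaction(estimated_tokens: usize, context_window: usize) -> bool {
    let budget = (context_window as f64 * 0.8) / 1.2;
    estimated_tokens as f64 > budget
}
```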

Bundled App Skills (bundled_app_skills.rs)

Compile-time embedded app-skill entries. Each app-skill crate (news, deep-search, deep-crawl, etc.) is registered as a bundled skill available at runtime.

Bootstrap (bootstrap.rs)

Bootstraps bundled skills at gateway startup. Ensures all bundled app-skills are registered and available.

Prompt Guard (prompt_guard.rs)

Prompt injection detection. ThreatKind enum classifies detected threats. Scans user input before passing to the agent.

Tool System

```rust
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn tags(&self) -> &[&str];
    fn input_schema(&self) -> serde_json::Value;
    async fn execute(&self, args: &serde_json::Value) -> Result<ToolResult>;
}

pub struct ToolResult {
    pub output: String,
    pub success: bool,
    pub file_modified: Option<PathBuf>,
    pub tokens_used: Option<TokenUsage>,
}
```

ToolRegistry: HashMap<String, Arc<dyn Tool>> with provider_policy: Option<ToolPolicy> for soft filtering.

Built-in Tools (14)

| Tool | Parameters | Key Behavior |
|---|---|---|
| read_file | path, start_line?, end_line? | Line numbers (NNN\|), 100KB truncation, symlink rejection |
| write_file | path, content | Creates parent dirs, returns file_modified |
| edit_file | path, old_string, new_string | Exact match required, error on 0 or >1 occurrences |
| diff_edit | path, diff | Unified diff with fuzzy matching (±3 lines), reverse hunk application |
| glob | pattern, limit=100 | Rejects absolute paths and .., relative results |
| grep | pattern, file_pattern?, limit=50, context=0, ignore_case=false | .gitignore-aware via ignore::WalkBuilder, regex with (?i) flag |
| list_dir | path | Sorted, [dir]/[file] prefix |
| shell | command, timeout_secs=120 | SafePolicy check, 50KB output truncation, sandbox-wrapped, timeout clamped to [1, 600]s |
| web_search | query, count=5 | Brave Search API (BRAVE_API_KEY) |
| web_fetch | url, extract_mode="markdown", max_chars=50000 | SSRF protection, htmd HTML→markdown, 30s timeout |
| message | content, channel?, chat_id? | Cross-channel messaging via OutboundMessage. Gateway-only |
| spawn | task, label?, mode="background", allowed_tools, context? | Subagent with inherited provider policy. sync=inline, background=async. Gateway-only |
| cron | action, message, schedule params | Schedule add/list/remove/enable/disable. Gateway-only |
| browser | action, url?, selector?, text?, expression? | Headless Chrome via CDP (always compiled). Actions: navigate (SSRF + scheme check), get_text, get_html, click, type, screenshot, evaluate, close. 5min idle timeout, env sanitization, 10s JS timeout, early action validation |

Registration: Core tools registered in ToolRegistry::with_builtins() (all modes). Browser is always compiled. Message, spawn, and cron are registered only in gateway mode (gateway.rs).

Tool Policies

```rust
pub struct ToolPolicy {
    pub allow: Vec<String>,   // empty = allow all
    pub deny: Vec<String>,    // deny-wins
}
```

Groups: group:fs (read_file, write_file, edit_file, diff_edit), group:runtime (shell), group:web (web_search, web_fetch, browser), group:search (glob, grep, list_dir), group:sessions (spawn).

Wildcards: exec* matches prefix. Provider-specific policies via config tools.byProvider.
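The evaluation rules above (empty allow = allow all, deny wins, trailing `*` prefix match) can be sketched like this; group expansion (`group:fs`, etc.) is omitted, and `is_allowed`/`matches` are illustrative names, not the actual API.

```rust
pub struct ToolPolicy {
    pub allow: Vec<String>, // empty = allow all
    pub deny: Vec<String>,  // deny-wins
}

// A pattern ending in '*' matches by prefix (e.g. "exec*"); otherwise exact.
fn matches(pattern: &str, tool: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tool.starts_with(prefix),
        None => pattern == tool,
    }
}

impl ToolPolicy {
    pub fn is_allowed(&self, tool: &str) -> bool {
        if self.deny.iter().any(|p| matches(p, tool)) {
            return false; // deny always wins over allow
        }
        self.allow.is_empty() || self.allow.iter().any(|p| matches(p, tool))
    }
}
```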

Command Policy (ShellTool)

```rust
pub enum Decision { Allow, Deny, Ask }
```

SafePolicy deny patterns: rm -rf /, rm -rf /*, dd if=, mkfs, :(){:|:&};:, chmod -R 777 /. Commands are whitespace-normalized before matching to prevent evasion via extra spaces/tabs.

SafePolicy ask patterns: sudo, rm -rf, git push --force, git reset --hard
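A sketch of the normalize-then-match flow: runs of whitespace are collapsed before the deny/ask patterns are applied, so `rm   -rf   /` cannot evade the check. The pattern lists here are a subset for illustration, and substring matching is an assumption about how SafePolicy compares commands.

```rust
// Collapse runs of spaces/tabs so pattern matching sees a canonical form.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

#[derive(Debug, PartialEq)]
enum Decision { Allow, Deny, Ask }

fn check(cmd: &str) -> Decision {
    let norm = normalize(cmd);
    // Subsets of the deny/ask lists above, for illustration only.
    const DENY: &[&str] = &["rm -rf /", "dd if=", "mkfs"];
    const ASK: &[&str] = &["sudo", "rm -rf", "git push --force"];
    if DENY.iter().any(|p| norm.contains(p)) {
        Decision::Deny
    } else if ASK.iter().any(|p| norm.contains(p)) {
        Decision::Ask
    } else {
        Decision::Allow
    }
}
```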

Sandbox

```rust
pub enum SandboxMode { Auto, Bwrap, Macos, Docker, None }
```

BLOCKED_ENV_VARS (18 vars, shared across all backends + MCP): LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR

| Backend | Isolation | Network | Path Validation |
|---|---|---|---|
| Bwrap (Linux) | RO bind /usr,/lib,/bin,/sbin,/etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if !allow_network | N/A |
| Macos (sandbox-exec) | SBPL profile: process-exec/fork, file-read*, writes to workdir + /private/tmp | (allow network*) or (deny network*) | Rejects control chars, `(`, `)`, `\`, `"` |
| Docker | --rm --security-opt no-new-privileges --cap-drop ALL | --network none | Rejects `:`, `\0`, `\n`, `\r` |

Docker resource limits: --cpus, --memory, --pids-limit. Mount modes: None (/tmp workdir), ReadOnly, ReadWrite.

Hooks System

Lifecycle hooks run shell commands at agent events. Configured via hooks array in config.

```rust
pub enum HookEvent { BeforeToolCall, AfterToolCall, BeforeLlmCall, AfterLlmCall }

pub struct HookConfig {
    pub event: HookEvent,
    pub command: Vec<String>,       // argv array (no shell interpretation)
    pub timeout_ms: u64,            // default: 5000
    pub tool_filter: Vec<String>,   // tool events only; empty = all
}
```

Shell protocol: JSON payload on stdin. Exit code semantics: 0=allow, 1=deny (before-hooks only), 2+=error. Before-hooks can deny operations; after-hook exit codes only count as errors.
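The exit-code semantics can be sketched as a small decision function; `HookOutcome` and `interpret_exit` are illustrative names for the behavior described above, not the actual API.

```rust
#[derive(Debug, PartialEq)]
enum HookOutcome {
    Allow,
    Deny,       // only honored for before-hooks
    Error(i32),
}

// 0 = allow; 1 = deny (before-hooks only); anything else = error.
// After-hooks can never deny: their non-zero exits only count as errors.
fn interpret_exit(code: i32, is_before_hook: bool) -> HookOutcome {
    match code {
        0 => HookOutcome::Allow,
        1 if is_before_hook => HookOutcome::Deny,
        n => HookOutcome::Error(n),
    }
}
```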

Circuit breaker: HookExecutor auto-disables a hook after 3 consecutive failures (configurable via with_threshold()). Resets on success.

Environment: Commands sanitized via BLOCKED_ENV_VARS. Tilde expansion supports ~/ and ~username/.

Integration: Wired into chat.rs, gateway.rs, serve.rs. Hook config changes trigger restart via config watcher.

MCP Integration

JSON-RPC transport for Model Context Protocol servers. Two transport modes:

  1. Stdio: Spawns server as child process (command + args + env). Line limit: 1MB. Env sanitized via BLOCKED_ENV_VARS.
  2. HTTP/SSE: Connects to remote server via url field. POST JSON, SSE response handling.

Lifecycle (stdio):

  1. Spawn server (command + args + env, filtering BLOCKED_ENV_VARS)
  2. Initialize: protocolVersion: "2024-11-05"
  3. Discover tools: tools/list RPC
  4. Validate input schemas (max depth 10, max size 64KB); reject tools with invalid schemas
  5. Register McpTool wrappers (30s timeout, 1MB max response)

McpTool execution: tools/call with name + arguments. Extracts content[].text from response.

Skills System

Skills are markdown instruction files that extend agent capabilities. Two sources: built-in (compiled into binary) and workspace (user-installed).

Skill File Format (SKILL.md)

```markdown
---
name: skill_name
description: What it does
requires_bins: binary1, binary2    # comma-separated, checked via `which`
requires_env: ENV_VAR1, ENV_VAR2   # comma-separated, checked via std::env::var()
always: true|false                 # auto-load into system prompt when available
---
Skill instructions here (markdown). This body is injected into the agent's
system prompt when the skill is activated.
```

Frontmatter parsing: Simple key: value line matching (not full YAML). split_frontmatter() finds content between --- delimiters. strip_frontmatter() returns body only.
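A minimal sketch of that parsing style: find the content between the leading `---` delimiters, read `key: value` lines, and return the remainder as the body. The return type and exact signature are assumptions; the real `split_frontmatter()`/`strip_frontmatter()` may differ.

```rust
use std::collections::HashMap;

// Split "---\nkey: value...\n---\nbody" into a key/value map and the body.
// Simple line matching, deliberately not full YAML.
fn split_frontmatter(text: &str) -> Option<(HashMap<String, String>, &str)> {
    let rest = text.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    let (front, body) = (&rest[..end], &rest[end + "\n---\n".len()..]);
    let mut kv = HashMap::new();
    for line in front.lines() {
        if let Some((k, v)) = line.split_once(':') {
            kv.insert(k.trim().to_string(), v.trim().to_string());
        }
    }
    Some((kv, body))
}
```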

SkillInfo

```rust
pub struct SkillInfo {
    pub name: String,
    pub description: String,
    pub path: PathBuf,          // filesystem path or "(built-in)/name/SKILL.md"
    pub available: bool,        // bins_ok && env_ok
    pub always: bool,           // auto-load into system prompt
    pub builtin: bool,          // true if from BUILTIN_SKILLS, false if workspace
}
```

Availability check: available = requires_bins all found on PATH AND requires_env all set. Missing requirements make the skill unavailable but still listed.

SkillsLoader

```rust
pub struct SkillsLoader {
    skills_dir: PathBuf,        // {data_dir}/skills/
}
```

Methods:

  • list_skills() — scans workspace dir + built-ins. Workspace skills override built-ins with same name (checked via HashSet). Results sorted alphabetically.
  • load_skill(name) — returns body (frontmatter stripped). Checks workspace first, falls back to built-in.
  • build_skills_summary() — generates XML for system prompt injection:
    <skills>
      <skill available="true">
        <name>skill_name</name>
        <description>What it does</description>
        <location>/path/to/SKILL.md</location>
      </skill>
    </skills>

  • get_always_skills() — filters skills where always: true AND available: true.
  • load_skills_for_context(names) — loads multiple skills, joins with \n---\n.

Built-in Skills (compile-time include_str!())

```rust
pub struct BuiltinSkill {
    pub name: &'static str,
    pub content: &'static str,  // full SKILL.md including frontmatter
}

pub const BUILTIN_SKILLS: &[BuiltinSkill] = &[...];
```

| Skill | Purpose |
|---|---|
| cron | Task scheduling instructions |
| skill-store | Skill store browsing and installation |
| skill-creator | Create new skills |
| tmux | Terminal multiplexer control |
| weather | Weather information retrieval |

CLI Management (octos skills)

  • list — shows built-in skills (with override status) + workspace skills
  • install <user/repo/skill-name> — fetches SKILL.md from https://raw.githubusercontent.com/{repo}/main/SKILL.md (15s timeout), saves to .octos/skills/{name}/SKILL.md. Fails if the skill already exists.
  • remove <name> — deletes the .octos/skills/{name}/ directory

Integration with Gateway

In the gateway command, skills are loaded during system prompt construction:

  1. get_always_skills() — collects auto-load skill names
  2. load_skills_for_context(names) — loads and joins skill bodies
  3. build_skills_summary() — appends XML skill index to system prompt
  4. Always-on skill content is prepended to the system prompt

Plugin System

Plugins extend the agent with external tools via standalone executables. Each plugin is a directory containing a manifest.json and an executable file.

Directory Layout

```
.octos/plugins/           # local (project-level)
~/.octos/plugins/         # global (user-level)
  └── my-plugin/
      ├── manifest.json   # plugin metadata + tool definitions
      └── my-plugin       # executable (or "main" as fallback)
```

Discovery order: local .octos/plugins/ first, then global ~/.octos/plugins/. Both are scanned by Config::plugin_dirs().

PluginManifest

```rust
pub struct PluginManifest {
    pub name: String,
    pub version: String,
    pub tools: Vec<PluginToolDef>,    // default: empty vec
}

pub struct PluginToolDef {
    pub name: String,                 // must be unique across all plugins
    pub description: String,
    pub input_schema: serde_json::Value,  // default: {"type": "object"}
}
```

Example manifest.json:

```json
{
  "name": "my-plugin",
  "version": "0.1.0",
  "tools": [
    {
      "name": "greet",
      "description": "Greet someone by name",
      "input_schema": {
        "type": "object",
        "properties": { "name": { "type": "string" } }
      }
    }
  ]
}
```

PluginLoader

```rust
pub struct PluginLoader;  // stateless, all methods are associated functions
```

load_into(registry, dirs):

  1. Scan each directory for subdirectories
  2. For each subdirectory, look for manifest.json
  3. Parse manifest, find executable (try directory name first, then main)
  4. Validate executable permissions (Unix: mode & 0o111 != 0; non-Unix: existence check)
  5. Wrap each tool definition as a PluginTool implementing the Tool trait
  6. Register into ToolRegistry
  7. Log warning: "loaded unverified plugin (no signature check)"
  8. Return total tool count. Failed plugins are skipped with warning, not fatal.

PluginTool — Execution Protocol

```rust
pub struct PluginTool {
    plugin_name: String,
    tool_def: PluginToolDef,
    executable: PathBuf,
}
```

Invocation: executable <tool_name> (tool name passed as first argument).

stdin/stdout protocol:

  1. Spawn executable with tool name as arg, piped stdin/stdout/stderr
  2. Write JSON-serialized arguments to stdin, close (EOF signals end of input)
  3. Wait for exit with 30s timeout (PLUGIN_TIMEOUT)
  4. Parse stdout as JSON:
    • Structured: {"output": "...", "success": true/false} → use parsed values
    • Fallback: raw stdout + stderr concatenated, success from exit code
  5. Return ToolResult (no file_modified tracking for plugins)

Error handling:

  • Spawn failure → eyre error with plugin name and executable path
  • Timeout → eyre error with plugin name, tool name, and duration
  • JSON parse failure → graceful fallback to raw output

Progress Reporting

The agent emits structured events during execution via a trait-based observer pattern. Consumers (CLI, REST API) implement the trait to render progress in their own format.

ProgressReporter Trait

```rust
pub trait ProgressReporter: Send + Sync {
    fn report(&self, event: ProgressEvent);
}
```

Agent holds reporter: Arc<dyn ProgressReporter>. Events are fired synchronously during the execution loop, so implementations must not block.

ProgressEvent Enum

```rust
pub enum ProgressEvent {
    TaskStarted { task_id: String },
    Thinking { iteration: u32 },
    Response { content: String, iteration: u32 },
    ToolStarted { name: String, tool_id: String },
    ToolCompleted { name: String, tool_id: String, success: bool,
                    output_preview: String, duration: Duration },
    FileModified { path: String },
    TokenUsage { input_tokens: u32, output_tokens: u32 },
    TaskCompleted { success: bool, iterations: u32, duration: Duration },
    TaskInterrupted { iterations: u32 },
    MaxIterationsReached { limit: u32 },
    TokenBudgetExceeded { used: u32, limit: u32 },
    StreamChunk { text: String, iteration: u32 },
    StreamDone { iteration: u32 },
    CostUpdate { session_input_tokens: u32, session_output_tokens: u32,
                 response_cost: Option<f64>, session_cost: Option<f64> },
}
```

Implementations (3)

SilentReporter — no-op, used as default when no reporter is configured.

ConsoleReporter — CLI output with ANSI colors and streaming support:

```rust
pub struct ConsoleReporter {
    use_colors: bool,
    verbose: bool,
    stdout: Mutex<BufWriter<Stdout>>,  // buffered for streaming chunks
}
```

| Event | Output |
|---|---|
| Thinking | `\r⟳ Thinking... (iteration N)` (overwrites line, yellow) |
| Response | `◆ first 3 lines...` (cyan, clears Thinking line) |
| ToolStarted | `\r⚙ Running tool_name...` (overwrites line, yellow) |
| ToolCompleted | `✓ tool_name (duration)` green or `✗ tool_name` red; verbose: 5 lines of output + `...` |
| FileModified | `📝 Modified: path` (green) |
| TokenUsage | `Tokens: N in, N out` (verbose only, dim) |
| TaskCompleted | `✓ Completed N iterations, Xs` or `✗ Failed after N iterations` |
| TaskInterrupted | `⚠ Interrupted after N iterations.` (yellow) |
| MaxIterationsReached | `⚠ Reached max iterations limit (N).` (yellow) |
| TokenBudgetExceeded | `⚠ Token budget exceeded (used, limit).` (yellow) |
| StreamChunk | Write to buffered stdout; flush only on `\n` (reduces syscalls) |
| StreamDone | Flush + newline |
| CostUpdate | `Tokens: N in / N out \| Cost: $X.XXXX` |
| TaskStarted | `▶ Task: id` (verbose only, dim) |

Duration formatting: >1s → {:.1}s, ≤1s → {N}ms.

SseBroadcaster (REST API, feature: api) — converts events to JSON and broadcasts via a tokio::sync::broadcast channel:

```rust
pub struct SseBroadcaster {
    tx: broadcast::Sender<String>,  // JSON-serialized events
}
```

| ProgressEvent | JSON type field | Additional fields |
|---|---|---|
| ToolStarted | "tool_start" | tool |
| ToolCompleted | "tool_end" | tool, success |
| StreamChunk | "token" | text |
| StreamDone | "stream_end" | — |
| CostUpdate | "cost_update" | input_tokens, output_tokens, session_cost |
| Thinking | "thinking" | iteration |
| Response | "response" | iteration |
| (other) | "other" | — (logged at debug level) |

Subscribers receive events via SseBroadcaster::subscribe() -> broadcast::Receiver<String>. Send errors (no subscribers) are silently ignored.

Execution Environments (exec_env.rs)

ExecEnvironment trait with exec(cmd, args, env), read_file(path), write_file(path, content), file_exists(path), list_dir(path). Two implementations: LocalEnvironment (tokio::process::Command) and DockerEnvironment (docker exec). Environment variables sanitized via shared BLOCKED_ENV_VARS. Docker paths validated against injection characters (\0, \n, \r, :). Docker env vars forwarded via --env flags.

Provider Toolsets (provider_tools.rs)

ToolAdjustment (prefer, demote, aliases, extras) per LLM provider. ProviderToolsets registry with with_defaults() for openai/anthropic/google. Used to optimize tool presentation per provider (e.g., OpenAI prefers shell/read_file, demotes diff_edit).

Typed Turns (turn.rs)

Turn wraps Message with TurnKind (UserInput, AgentReply, ToolCall, ToolResult, System) and iteration number. turns_to_messages() converts back to Vec<Message> for LLM calls. Enables semantic analysis of conversation history.

Event Bus (event_bus.rs)

EventBus with typed EventSubscriber for pub/sub within the agent. Decouples event producers (tool execution, LLM calls) from consumers (logging, metrics, UI updates).

Loop Detection (loop_detect.rs)

Detects repetitive agent behavior (e.g., calling the same tool with same args). Configurable threshold and window. Returns early with diagnostic message when loop detected.

Session State (session.rs)

SessionState with SessionLimits and SessionUsage tracking. SessionStateHandle for thread-safe access. Tracks token usage, iteration count, and wall-clock time against configured limits.

Steering (steering.rs)

SteeringMessage with SteeringSender/SteeringReceiver (mpsc channel). Allows external control of agent behavior mid-conversation (e.g., injecting guidance, changing strategy).

Prompt Layers (prompt_layer.rs)

PromptLayerBuilder for composing system prompts from multiple sources (base prompt, persona, user context, memory, skills). Layers are concatenated in order with configurable separators.


octos-bus — Gateway Infrastructure

Message Bus

create_bus() -> (AgentHandle, BusPublisher) linked by mpsc channels (capacity 256). AgentHandle receives InboundMessages; BusPublisher dispatches OutboundMessages.

Queue Modes (configured via gateway.queue_mode):

  • Followup (default): FIFO — process queued messages one at a time
  • Collect: Merge queued messages by session, concatenating content before processing

Channel Trait

```rust
#[async_trait]
pub trait Channel: Send + Sync {
    fn name(&self) -> &str;
    async fn start(&self, inbound_tx: mpsc::Sender<InboundMessage>) -> Result<()>;
    async fn send(&self, msg: &OutboundMessage) -> Result<()>;
    fn is_allowed(&self, sender_id: &str) -> bool;
    async fn stop(&self) -> Result<()>;
}
```

Channel Implementations

| Channel | Transport | Feature Flag | Auth | Dedup |
|---|---|---|---|---|
| CLI | stdin/stdout | (always) | N/A | N/A |
| Telegram | teloxide long-poll | telegram | Bot token (env) | teloxide built-in |
| Discord | serenity gateway | discord | Bot token (env) | serenity built-in |
| Slack | Socket Mode (tokio-tungstenite) | slack | Bot token + App token | message_ts |
| WhatsApp | WebSocket bridge (ws://localhost:3001) | whatsapp | Baileys bridge | HashSet (10K cap, clear on overflow) |
| Feishu | WebSocket (tokio-tungstenite) | feishu | App ID + Secret → tenant token (TTL 6000s) | HashSet (10K cap, clear on overflow) |
| Email | IMAP poll + SMTP send | email | Username/password, rustls TLS | IMAP UNSEEN flag |
| WeCom | WeCom/WeChat Work API | wecom | Corp ID + Agent Secret | message_id |
| Twilio | Twilio SMS/MMS | twilio | Account SID + Auth Token | message SID |

Email specifics: IMAP async-imap with rustls for inbound (poll unseen, mark \Seen). SMTP lettre for outbound (port 465=implicit TLS, other=STARTTLS). mailparse for RFC822 body extraction. Body truncated via truncate_utf8(max_body_chars).

Feishu specifics: Tenant access token with TTL cache (6000s). WebSocket gateway URL from /callback/ws/endpoint. Message type detection via header.event_type == "im.message.receive_v1". Supports oc_* (chat_id) vs ou_* (open_id) routing.

Markdown to HTML: markdown_html.rs converts Markdown to Telegram-compatible HTML for rich message formatting.

Media: download_media() helper downloads photos/voice/audio/documents to .octos/media/.

Transcription: Voice/audio auto-transcribed via GroqTranscriber before agent processing.

Message Coalescing

Splits oversized messages into channel-safe chunks:

| Channel | Max Chars |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |

Break preference: paragraph (\n\n) > newline (\n) > sentence (. ) > space ( ) > hard cut.

MAX_CHUNKS = 50 (DoS limit). UTF-8 safe boundary detection via char_indices().
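The splitting strategy above can be sketched as follows. This is an illustrative reimplementation, not the actual coalescing code; `split_message` is a hypothetical name, and only the break-preference and boundary-safety behavior described above is modeled.

```rust
const MAX_CHUNKS: usize = 50; // DoS limit

// Split `text` into chunks of at most `max_chars` characters, preferring
// paragraph > newline > sentence > space breaks, falling back to a hard
// cut at a char boundary (never inside a multi-byte UTF-8 sequence).
fn split_message(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() && chunks.len() < MAX_CHUNKS {
        if rest.chars().count() <= max_chars {
            chunks.push(rest.to_string());
            break;
        }
        // Byte index of the max_chars-th character boundary.
        let hard = rest.char_indices().nth(max_chars).map(|(i, _)| i).unwrap();
        let window = &rest[..hard];
        // First separator kind that occurs in the window wins; within that
        // kind we cut at its last occurrence to fill the chunk.
        let cut = ["\n\n", "\n", ". ", " "]
            .iter()
            .find_map(|sep| window.rfind(sep).map(|i| i + sep.len()))
            .unwrap_or(hard); // no separator at all: hard cut
        chunks.push(rest[..cut].to_string());
        rest = &rest[cut..];
    }
    chunks
}
```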

Session Manager

JSONL persistence at .octos/sessions/{key}.jsonl.

  • In-memory cache: LRU with disk sync on write
  • Filenames: Percent-encoded SessionKey, truncated to 183 chars with _{hash:016X} suffix on truncation to prevent collisions
  • File size limit: 10MB max (MAX_SESSION_FILE_SIZE); oversized files skipped on load
  • Crash safety: Atomic write-then-rename
  • Forking: fork() creates child session with parent_key tracking, copies last N messages

Cron Service

JSON persistence at .octos/cron.json.

Schedule types:

  • Every { seconds: u64 } — recurring interval
  • Cron { expr: String } — cron expression via the cron crate
  • At { timestamp_ms: i64 } — one-shot (auto-deleted after run)

CronJob fields: id (8-char hex from UUIDv7), name, enabled, schedule, payload (message + deliver flag + channel + chat_id), state (next_run_at_ms, run_count), delete_after_run.

Heartbeat Service

Periodic check of HEARTBEAT.md (default: 30 min interval). Sends content to agent if non-empty.


octos-cli — CLI & Configuration

Commands

| Command | Description |
|---|---|
| chat | Interactive multi-turn chat. Readline with history. Exit: exit/quit/:q |
| gateway | Persistent multi-channel daemon with session management |
| init | Initialize .octos/ with config, templates, directories |
| status | Show config, provider, API keys, bootstrap files |
| auth login/logout/status | OAuth PKCE (OpenAI), device code, paste-token |
| cron list/add/remove/enable | CLI cron job management |
| channels status/login | Channel compilation status, WhatsApp bridge setup |
| skills list/install/remove | Skill management, GitHub fetch |
| office | Office/workspace management |
| account | Account management |
| clean | Remove .redb files with dry-run support |
| completions | Shell completion generation (bash/zsh/fish) |
| docs | Generate tool + provider documentation |
| serve | REST API server (feature: api) — axum on 127.0.0.1:8080 (--host to override) |

Configuration

Loaded from .octos/config.json (local) or ~/.config/octos/config.json (global). Local takes precedence.

  • ${VAR} expansion: Environment variable substitution in string values
  • Versioned config: Version field with automatic migrate_config() framework
  • Provider auto-detect (registry::detect_provider(model)): claude→anthropic, gpt/o1/o3/o4→openai, gemini→gemini, deepseek→deepseek, kimi/moonshot→moonshot, qwen→dashscope, glm→zhipu, llama/mixtral→groq. Patterns defined per-provider in registry/.
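The model-to-provider mapping can be sketched as a prefix lookup. This is an illustration of the mapping listed above; prefix matching is an assumption (the real registry defines its patterns per provider), and the function is a condensed stand-in for `registry::detect_provider`.

```rust
// Condensed prefix table following the mapping above.
fn detect_provider(model: &str) -> Option<&'static str> {
    let m = model.to_lowercase();
    let table: &[(&[&str], &str)] = &[
        (&["claude"], "anthropic"),
        (&["gpt", "o1", "o3", "o4"], "openai"),
        (&["gemini"], "gemini"),
        (&["deepseek"], "deepseek"),
        (&["kimi", "moonshot"], "moonshot"),
        (&["qwen"], "dashscope"),
        (&["glm"], "zhipu"),
        (&["llama", "mixtral"], "groq"),
    ];
    table
        .iter()
        .find(|(pats, _)| pats.iter().any(|p| m.starts_with(p)))
        .map(|(_, provider)| *provider)
}
```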

API key resolution order: Auth store (~/.octos/auth.json) → environment variable.

Auth Module

OAuth PKCE (OpenAI):

  1. Generate 64-char verifier (two UUIDv4 hex)
  2. SHA-256 challenge, base64-URL encode (no padding)
  3. TCP listener on port 1455
  4. Browser → auth.openai.com with PKCE + state
  5. Callback validates state (CSRF), exchanges code+verifier for tokens

Device Code Flow (OpenAI): POST deviceauth/usercode, poll deviceauth/token every 5s+.

Paste Token: Prompt for API key from stdin, store as auth_method: "paste_token".

AuthStore: ~/.octos/auth.json (mode 0600). {credentials: {provider: AuthCredential}}.

Config Watcher

Polls every 5 seconds. SHA-256 hash comparison of file contents.

Hot-reloadable: system_prompt, max_history (applied live).

Restart-required: provider, model, base_url, api_key_env, sandbox, mcp_servers, hooks, gateway.queue_mode, channels.

REST API (feature: api)

| Route | Method | Description |
|---|---|---|
| /api/chat | POST | Send message → response |
| /api/chat/stream | GET | SSE stream of ProgressEvents |
| /api/sessions | GET | List all sessions |
| /api/sessions/{id}/messages | GET | Paginated history (?limit=100&offset=0, max 500) |
| /api/status | GET | Version, model, provider, uptime |
| /metrics | GET | Prometheus text exposition format (unauthenticated) |
| /* (fallback) | GET | Embedded web UI (static files via rust-embed) |

Auth: Optional bearer token with constant-time comparison (API routes only; /metrics and static files are public). CORS: localhost:3000/8080. Max message: 1MB.

Web UI: Embedded SPA via rust-embed served as the fallback handler. Session sidebar, chat interface, SSE streaming, dark theme. Vanilla HTML/CSS/JS (no build tools).

Prometheus Metrics: octos_tool_calls_total (counter, labels: tool, success), octos_tool_call_duration_seconds (histogram, label: tool), octos_llm_tokens_total (counter, label: direction). Powered by metrics + metrics-exporter-prometheus crates.

Session Compaction (Gateway)

Triggered when message count > 40 (threshold). Keeps 10 recent messages. Summarizes older messages via LLM to <500 words. Rewrites JSONL session file.


octos-pipeline — DOT-based Pipeline Orchestration

DOT-based pipeline orchestration engine for defining and executing multi-step workflows.

  • parser.rs — DOT graph parser (parses Graphviz DOT format into pipeline definitions)
  • graph.rs — PipelineGraph with node/edge types
  • executor.rs — Async pipeline execution engine
  • handler.rs — Handler types: CodergenHandler, GateHandler, ShellHandler, NoopHandler, DynamicParallel
  • condition.rs — Conditional edge evaluation (branching logic)
  • tool.rs — RunPipelineTool integration (exposes pipeline execution as an agent tool)
  • validate.rs — Graph validation and lint diagnostics
  • human_gate.rs — Human-in-the-loop gates with HumanInputProvider trait, ChannelInputProvider (mpsc + oneshot, 5min default timeout), AutoApproveProvider. Input types: Approval, FreeText, Choice
  • fidelity.rs — FidelityMode enum (Full, Truncate, Compact, Summary) for context carryover control between nodes. Parsed from config strings. Safety caps: 10MB max_chars, 100K max_lines
  • manager.rs — PipelineManager supervisor with SupervisionStrategy (AllOrNothing, BestEffort, RetryFailed). Retries capped at 10 with exponential backoff (100ms-5s). ManagerOutcome converts to NodeOutcome
  • thread.rs — ThreadRegistry for LLM session reuse across pipeline nodes. A thread stores model_id + message history. Limits: 1000 threads, 10000 messages per thread
  • server.rs — PipelineServer trait with SubmitRequest (validated: 1MB DOT, 256KB input, 64 variables, safe pipeline IDs), RunStatus lifecycle (Queued → Running → Completed/Failed/Cancelled)
  • artifact.rs — Pipeline artifact storage for intermediate outputs
  • checkpoint.rs — Pipeline checkpoint/resume for crash recovery
  • events.rs — Pipeline event system for progress tracking
  • run_dir.rs — Per-run working directories with isolation
  • stylesheet.rs — Visual styling for pipeline graph rendering

Data Flows

Chat Mode

```
User Input → readline → Agent.process_message(input, history)
                              │
                              ├─ Build messages (system + history + memory + input)
                              ├─ trim_to_context_window() if needed
                              ├─ Call LLM via chat_stream() with tool specs
                              ├─ Execute tools if ToolUse (loop)
                              └─ Return ConversationResponse
                                    │
                              Print response, append to history
```

Gateway Mode

```
Channel → InboundMessage → MessageBus → [transcribe audio] → [load session]
                                              │
                                    Agent.process_message()
                                              │
                                        OutboundMessage
                                              │
                                   ChannelManager.dispatch()
                                              │
                                    coalesce() → Channel.send()
```

System messages (cron, heartbeat, spawn results) flow through the same bus with channel: "system" and metadata routing.


Feature Flags

```toml
# octos-bus
telegram = ["teloxide"]
discord  = ["serenity"]
slack    = ["tokio-tungstenite"]
whatsapp = ["tokio-tungstenite"]
feishu   = ["tokio-tungstenite"]
email    = ["async-imap", "tokio-rustls", "rustls", "webpki-roots", "lettre", "mailparse"]

# octos-agent (browser is always compiled in, no longer feature-gated)
git      = ["gix"]                  # git operations via gitoxide
ast      = ["tree-sitter"]          # code_structure.rs AST analysis
admin-bot = [...]                   # admin/ directory tools

# octos-bus (additional)
wecom    = [...]                    # WeCom/WeChat Work channel
twilio   = [...]                    # Twilio SMS/MMS channel

# octos-cli
api      = ["axum", "tower-http", "futures"]
telegram = ["octos-bus/telegram"]
discord  = ["octos-bus/discord"]
slack    = ["octos-bus/slack"]
whatsapp = ["octos-bus/whatsapp"]
feishu   = ["octos-bus/feishu"]
email    = ["octos-bus/email"]
wecom    = ["octos-bus/wecom"]
twilio   = ["octos-bus/twilio"]
```

File Layout

crates/
├── octos-core/src/
│   ├── lib.rs, task.rs, types.rs, error.rs, gateway.rs, message.rs, utils.rs
├── octos-llm/src/
│   ├── lib.rs, provider.rs, config.rs, types.rs, retry.rs, failover.rs, sse.rs
│   ├── embedding.rs, pricing.rs, context.rs, transcription.rs, vision.rs
│   ├── adaptive.rs, swappable.rs, router.rs, ominix.rs
│   ├── anthropic.rs, openai.rs, gemini.rs, openrouter.rs  (protocol impls)
│   └── registry/ (mod.rs + 14 provider entries: anthropic, openai, gemini,
│                   openrouter, deepseek, groq, moonshot, dashscope, minimax,
│                   zhipu, zai, nvidia, ollama, vllm)
├── octos-memory/src/
│   ├── lib.rs, episode.rs, store.rs, memory_store.rs, hybrid_search.rs
├── octos-agent/src/
│   ├── lib.rs, agent.rs, progress.rs, policy.rs, compaction.rs, sanitize.rs, hooks.rs
│   ├── sandbox.rs, mcp.rs, skills.rs, builtin_skills.rs
│   ├── bundled_app_skills.rs, bootstrap.rs, prompt_guard.rs
│   ├── plugins/ (mod.rs, loader.rs, manifest.rs, tool.rs)
│   ├── skills/ (cron, skill-store, skill-creator SKILL.md)
│   └── tools/ (mod, policy, shell, read_file, write_file, edit_file, diff_edit,
│               list_dir, glob_tool, grep_tool, web_search, web_fetch,
│               message, spawn, browser, ssrf, tool_config,
│               deep_search, site_crawl, recall_memory, save_memory,
│               send_file, take_photo, code_structure, git,
│               deep_research_pipeline, synthesize_research, research_utils,
│               admin/ (profiles, skills, sub_accounts, system,
│                       platform_skills, update))
├── octos-bus/src/
│   ├── lib.rs, bus.rs, channel.rs, session.rs, coalesce.rs, media.rs
│   ├── cli_channel.rs, telegram_channel.rs, discord_channel.rs
│   ├── slack_channel.rs, whatsapp_channel.rs, feishu_channel.rs, email_channel.rs
│   ├── wecom_channel.rs, twilio_channel.rs, markdown_html.rs
│   ├── cron_service.rs, cron_types.rs, heartbeat.rs
├── octos-pipeline/src/
│   ├── lib.rs, parser.rs, graph.rs, executor.rs, handler.rs
│   ├── condition.rs, tool.rs, validate.rs
└── octos-cli/src/
    ├── main.rs, config.rs, config_watcher.rs, cron_tool.rs, compaction.rs
    ├── auth/ (mod.rs, store.rs, oauth.rs, token.rs)
    ├── api/ (mod.rs, router.rs, handlers.rs, sse.rs, metrics.rs, static_files.rs)
    └── commands/ (mod, chat, init, status, gateway, clean,
                   completions, cron, channels, auth, skills, docs, serve,
                   office, account)

Security

Workspace-Level Safety

  • #![deny(unsafe_code)] β€” workspace-wide lint via [workspace.lints.rust]
  • secrecy::SecretString β€” all provider API keys are wrapped; prevents accidental logging/display

Authentication & Credentials

  • API keys: auth store (~/.octos/auth.json, mode 0600) checked before env vars
  • OAuth PKCE with SHA-256 challenges, state parameter (CSRF protection)
  • Constant-time byte comparison for API bearer tokens (timing attack prevention)

Execution Sandbox

  • Three backends: bwrap (Linux), sandbox-exec (macOS), Docker β€” SandboxMode::Auto detection
  • 18 BLOCKED_ENV_VARS shared across all sandbox backends, MCP server spawning, hooks, and browser tool: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR
  • Path injection prevention per backend (Docker: :, \0, \n, \r; macOS: control chars, (, ), \, ")
  • Docker: --cap-drop ALL, --security-opt no-new-privileges, --network none, blocked bind mount sources (docker.sock, /proc, /sys, /dev, /etc)

Tool Safety

  • ShellTool SafePolicy: deny rm -rf /, dd, mkfs, fork bombs, chmod -R 777 /; ask for sudo, rm -rf, git push --force, git reset --hard. Whitespace-normalized before matching. Timeout clamped to [1, 600]s. SIGTERMβ†’grace periodβ†’SIGKILL cleanup for child processes.
  • Tool policies: allow/deny with deny-wins semantics, 8 named groups (group:fs, group:runtime, group:web, group:search, group:sessions, etc.), wildcard matching, provider-specific filtering via tools.byProvider
  • Tool argument size limit: 1MB per invocation (non-allocating estimate_json_size with escape char accounting)
  • Symlink-safe file I/O via O_NOFOLLOW on Unix (atomic kernel-level check, eliminates TOCTOU races); metadata-based symlink check fallback on Windows
  • SSRF protection in shared ssrf.rs module: DNS resolution with fail-closed behavior (blocks on DNS failure), private IP blocking (10/8, 172.16/12, 192.168/16, 169.254/16), IPv6 coverage (ULA fc00::/7, link-local fe80::/10, site-local fec0::/10, IPv4-mapped ::ffff:0:0/96, IPv4-compatible ::/96), loopback blocking. Used by web_fetch and browser.
  • Browser: URL scheme allowlist (http/https only), 10s JS execution timeout, zombie process reaping, secure tempfiles for screenshots
  • MCP: input schema validation (max depth 10, max size 64KB) prevents malicious tool definitions
  • Prompt injection guard (prompt_guard.rs): 5 threat categories (SystemOverride, RoleConfusion, ToolCallInjection, SecretExtraction, InstructionInjection), 10 detection patterns. Sanitizes threats by wrapping in [injection-blocked:...].

Data Safety

  • Tool output sanitization (sanitize.rs): strips base64 data URIs, long hex strings (64+ chars), and credential redaction with 7 regex patterns covering OpenAI (sk-...), Anthropic (sk-ant-...), AWS (AKIA...), GitHub (ghp_/gho_/ghs_/ghr_/github_pat_...), GitLab (glpat-...), Bearer tokens, and generic password/api_key assignments
  • UTF-8 safe truncation via truncate_utf8() across all tool outputs and email bodies
  • Session file collision prevention via percent-encoded filenames with hash suffix on truncation
  • Session file size limit: 10MB max prevents OOM on corrupted files
  • Atomic write-then-rename for session persistence (crash safety)
  • API server binds to 127.0.0.1 by default (not 0.0.0.0)
  • Channel access control via allowed_senders lists
  • MCP response limit: 1MB per JSON-RPC line (DoS prevention)
  • Message coalescing: MAX_CHUNKS=50 DoS limit
  • API message limit: 1MB per request

Concurrency Model

Why Rust

Octos uses Rust with the tokio async runtime, which provides significant advantages over Python (OpenClaw, etc.) and Node.js (NanoCloud, etc.) agent frameworks for concurrent session handling:

True parallelism — Tokio tasks run across all CPU cores simultaneously. Python has the GIL, so even with asyncio, CPU-bound work (JSON parsing, context compaction, token counting) runs on a single core. Node.js executes all JavaScript on a single thread. In Octos, 10 concurrent sessions doing context compaction actually execute in parallel across cores.

Memory efficiency — No garbage collector, no per-object runtime overhead. Agent sessions are compact structs on the heap. A Python agent session carries interpreter overhead, GC metadata on every object, and dict-based attribute lookup. This matters with hundreds of sessions and large conversation histories in memory.

No GC pauses — Python and Node.js garbage collection can cause latency spikes mid-response. Rust has deterministic deallocation: memory is freed exactly when the owning struct drops.

Single binary deployment — No Python/Node runtime to install, no dependency hell, predictable resource usage. The gateway is one static binary.

Tokio Tasks vs OS Threads

All concurrent session processing uses tokio tasks (green threads), not OS threads. A tokio task is a state machine on the heap, typically a few KB; an OS thread reserves a stack of roughly 8 MB. Thousands of tasks multiplex across a handful of OS threads (tokio defaults to one worker thread per CPU core). Since agent sessions spend most of their time awaiting I/O (LLM API responses), they yield the thread to other tasks efficiently.

Gateway Concurrency

Inbound messages → main loop
                      │
                      ├─ tokio::spawn() per message
                      │     │
                      │     ├─ Semaphore (max_concurrent_sessions, default 10)
                      │     │     bounds total concurrent agent runs
                      │     │
                      │     └─ Per-session Mutex
                      │           serializes messages within same session
                      │
                      └─ Different sessions run concurrently
                         Same session queues sequentially

  • Cross-session: concurrent, bounded by max_concurrent_sessions semaphore (default 10)
  • Within same session: serialized via per-session mutex — prevents race conditions on conversation history
  • Per-session locks: pruned after completion (Arc strong_count == 1) to prevent unbounded HashMap growth

Tool Execution

Within a single agent iteration, all tool calls from one LLM response execute concurrently via join_all():

LLM response: [web_search, read_file, send_email]
                   │            │           │
                   └────────────┼───────────┘
                          join_all()
                   ┌────────────┼───────────┐
                   │            │           │
                 done         done        done
                          ↓
              All results appended to messages
                          ↓
                    Next LLM call

Sub-Agent Modes (spawn tool)

Aspect           | Sync                         | Background
-----------------|------------------------------|--------------------------------
Parent blocks?   | Yes                          | No (tokio::spawn())
Result delivery  | Same conversation turn       | New inbound message via gateway
Token accounting | Counted toward parent budget | Independent
Use case         | Sequential pipelines         | Fire-and-forget long tasks

Sub-agents cannot spawn further sub-agents (spawn tool is always denied in sub-agent policy).
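The deny-wins semantics described under Tool Safety, which is what makes the spawn denial unconditional, can be sketched like this (function names and the wildcard handling are illustrative; named groups like group:fs are omitted):

```rust
/// Sketch of deny-wins tool-policy matching: a pattern matches on "*",
/// exact name, or a trailing-wildcard prefix such as "web_*".
fn pattern_matches(pattern: &str, tool: &str) -> bool {
    pattern == "*"
        || pattern == tool
        || pattern
            .strip_suffix('*')
            .is_some_and(|prefix| tool.starts_with(prefix))
}

/// A tool is available only if some allow rule matches AND no deny rule
/// matches. Deny always wins over allow.
fn is_allowed(tool: &str, allow: &[&str], deny: &[&str]) -> bool {
    if deny.iter().any(|p| pattern_matches(p, tool)) {
        return false; // deny wins
    }
    allow.iter().any(|p| pattern_matches(p, tool))
}

fn main() {
    // Sub-agent policy: everything the parent allows, minus spawn.
    let allow = ["*"];
    let deny = ["spawn"];
    assert!(is_allowed("web_search", &allow, &deny));
    assert!(!is_allowed("spawn", &allow, &deny)); // recursion blocked
}
```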

Multi-Tenant Dashboard

The dashboard (octos serve) runs each user profile as a separate gateway OS process:

Dashboard (octos serve)
  ├─ Profile "alice" → octos gateway --config alice.json  (deepseek, own semaphore)
  ├─ Profile "bob"   → octos gateway --config bob.json    (kimi, own semaphore)
  └─ Profile "carol" → octos gateway --config carol.json  (openai, own semaphore)

Each profile has its own LLM provider, API keys, channels, data directory, and max_concurrent_sessions semaphore. Profiles are fully isolated — no shared state between gateway processes.


Testing

1300+ tests across all crates. See TESTING.md for the full inventory and CI guide.

  • Unit: type serde round-trips, tool arg parsing, config validation, provider detection, tool policies, compaction, coalescing, BM25 scoring, L2 normalization, SSE parsing
  • Adaptive routing: Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, provider racing (19 tests)
  • Responsiveness: baseline learning, degradation detection, recovery, threshold boundaries (8 tests)
  • Queue modes: Followup, Collect, Steer, Speculative overflow, auto-escalation/deescalation (9 tests)
  • Session persistence: JSONL storage, LRU eviction, fork, rewrite, timestamp sort, concurrent access (28 tests)
  • Integration: CLI commands, file tools, cron jobs, session forking, plugin loading
  • Security: sandbox path injection, env sanitization, SSRF blocking, symlink rejection (O_NOFOLLOW), private IP detection, dedup overflow, tool argument size limits, session file size limits, circuit breaker threshold edge cases, MCP schema validation
  • Channel: allowed_senders, message parsing, dedup logic, email address extraction

Local CI: ./scripts/ci.sh (mirrors GitHub Actions + focused subsystem tests). See TESTING.md.

Testing Guide

Quick Start

# Full local CI (mirrors GitHub Actions)
./scripts/ci.sh

# Fast iteration (skip clippy)
./scripts/ci.sh --quick

# Auto-fix formatting
./scripts/ci.sh --fix

# Memory-constrained machines
./scripts/ci.sh --serial

CI Pipeline

scripts/ci.sh runs the same checks as .github/workflows/ci.yml plus focused subsystem tests.

Steps

Step               | Command                                 | Flags
-------------------|-----------------------------------------|----------------------------
1. Format          | cargo fmt --all -- --check              | --fix auto-fixes
2. Clippy          | cargo clippy --workspace -- -D warnings | --quick skips
3. Workspace tests | cargo test --workspace                  | --serial for single-thread
4. Focused groups  | Per-subsystem tests (see below)         | Always runs

Focused Test Groups

After the full workspace run, the CI script re-runs critical subsystems individually to surface failures clearly:

Group               | Crate     | Test Filter           | Count | What It Covers
--------------------|-----------|-----------------------|-------|---------------
Adaptive routing    | octos-llm | adaptive::tests       | 19    | Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, racing
Responsiveness      | octos-llm | responsiveness::tests | 8     | Baseline learning, degradation detection, recovery, threshold boundaries
Session actor       | octos-cli | session_actor::tests  | 9     | Queue modes, speculative overflow, auto-escalation/deescalation
Session persistence | octos-bus | session::tests        | 28    | JSONL storage, LRU eviction, fork, rewrite, timestamp sort

Session actor tests always run single-threaded (--test-threads=1) because they spawn full actors with mock providers and can OOM under parallel execution.


Feature Coverage

Adaptive Routing (crates/octos-llm/src/adaptive.rs — 19 tests)

Tests the AdaptiveRouter which manages multiple LLM providers with metrics-driven selection.

Off Mode (static priority)

Test | What It Verifies
-----|------------------
test_selects_primary_on_cold_start | Priority order on first call (no metrics yet)
test_lane_changing_off_uses_priority_order | Off mode ignores latency differences
test_lane_changing_off_skips_circuit_broken | Off mode still respects circuit breaker
test_hedged_off_uses_single_provider | Off mode uses priority, no racing

Hedge Mode (provider racing)

Test | What It Verifies
-----|------------------
test_hedged_racing_picks_faster_provider | Race 2 providers via tokio::select!, faster wins
test_hedged_racing_survives_one_failure | Falls back to alternate when primary racer fails
test_hedge_single_provider_falls_through | Hedge with 1 provider uses single-provider path

Lane Mode (score-based selection)

Test | What It Verifies
-----|------------------
test_lane_mode_picks_best_by_score | Switches to faster provider after metrics warm-up

Circuit Breaker and Failover

Test | What It Verifies
-----|------------------
test_circuit_breaker_skips_degraded | Skips provider after N consecutive failures
test_failover_on_error | Fails over to next provider when primary fails
test_all_providers_fail | Returns error when every provider fails

Scoring and Metrics

Test | What It Verifies
-----|------------------
test_scoring_cold_start_respects_priority | Cold-start scores follow config priority
test_latency_samples_p95 | P95 calculation from circular buffer
test_metrics_snapshot | Latency/success/failure recorded correctly
test_metrics_export_after_calls | Export includes per-provider metrics

Runtime Controls

Test | What It Verifies
-----|------------------
test_mode_switch_at_runtime | Off → Hedge → Lane → Off switching
test_qos_ranking_toggle | QoS ranking toggle is orthogonal to mode
test_adaptive_status_reports_correctly | Status struct reflects current mode/count
test_empty_router_panics | Asserts at least 1 provider required

Responsiveness Observer (crates/octos-llm/src/responsiveness.rs — 8 tests)

Tests the latency tracker that drives auto-escalation.
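The behavior exercised by these tests can be condensed into a small sketch; the struct and thresholds below mirror the numbers in the tests (5-sample baseline, 3 consecutive calls over 3x baseline, 1 fast call to recover) but are not the real responsiveness.rs types:

```rust
/// Minimal latency observer sketch: learn a baseline from the first 5
/// samples, activate after 3 consecutive requests slower than 3x baseline,
/// deactivate on the first fast request.
struct Observer {
    warmup: Vec<u64>,       // warm-up latencies in ms
    baseline: Option<u64>,  // mean of first 5 samples
    slow_streak: u32,
    degraded: bool,
}

impl Observer {
    fn new() -> Self {
        Observer { warmup: Vec::new(), baseline: None, slow_streak: 0, degraded: false }
    }

    fn record(&mut self, latency_ms: u64) {
        match self.baseline {
            None => {
                self.warmup.push(latency_ms);
                if self.warmup.len() == 5 {
                    self.baseline = Some(self.warmup.iter().sum::<u64>() / 5);
                }
            }
            Some(base) => {
                // Strictly greater than 3x: exactly at threshold is not "slow".
                if latency_ms > base * 3 {
                    self.slow_streak += 1;
                    if self.slow_streak >= 3 {
                        self.degraded = true; // would trigger auto-escalation
                    }
                } else {
                    self.slow_streak = 0;
                    self.degraded = false; // one fast request recovers
                }
            }
        }
    }
}

fn main() {
    let mut obs = Observer::new();
    for _ in 0..5 { obs.record(100); }   // baseline = 100 ms
    for _ in 0..3 { obs.record(400); }   // 3 slow calls > 3x baseline
    assert!(obs.degraded);
    obs.record(120);                      // 1 fast call recovers
    assert!(!obs.degraded);
}
```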

Baseline Learning

Test | What It Verifies
-----|------------------
test_baseline_learning | Baseline established from first 5 samples
test_sample_count_tracking | sample_count() returns correct value

Degradation Detection

Test | What It Verifies
-----|------------------
test_degradation_detection | 3 consecutive slow requests (> 3x baseline) trigger activation
test_at_threshold_boundary_not_triggered | Latency exactly at threshold is not "slow"
test_no_false_trigger_before_baseline | No activation before baseline is learned

Recovery and Lifecycle

Test | What It Verifies
-----|------------------
test_recovery_detection | 1 fast request after activation triggers deactivation
test_multiple_activation_cycles | Activate → deactivate → reactivate works
test_window_caps_at_max_size | Rolling window stays at 20 entries

Queue Modes and Session Actor (crates/octos-cli/src/session_actor.rs — 9 tests)

Tests the per-session actor that owns message processing, queue policies, and auto-protection.

Mock infrastructure:

  • DelayedMockProvider — configurable delay + scripted FIFO responses
  • setup_speculative_actor / setup_actor_with_mode — build a minimal actor with the chosen queue mode and an optional adaptive router

Queue Mode: Followup

Test | What It Verifies
-----|------------------
test_queue_mode_followup_sequential | Each message processed individually — 3 messages produce 3 responses, all appear in session history separately

Queue Mode: Collect

Test | What It Verifies
-----|------------------
test_queue_mode_collect_batches | Messages queued during a slow LLM call are batched into a single combined prompt ("msg2\n---\nQueued #1: msg3")

Queue Mode: Steer

Test | What It Verifies
-----|------------------
test_queue_mode_steer_keeps_newest | Older queued messages discarded, only newest processed — discarded message absent from session history

Queue Mode: Speculative

Test | What It Verifies
-----|------------------
test_speculative_overflow_concurrent | Overflow spawned as full agent task during slow primary (12s > 10s patience); both responses arrive; history sorted by timestamp
test_speculative_within_patience_drops | Overflow dropped when primary within patience (5s < 10s); only 1 response arrives
test_speculative_handles_background_result | BackgroundResult messages handled in the speculative select! loop without extra LLM calls

Auto-Escalation / Deescalation

Test | What It Verifies
-----|------------------
test_auto_escalation_on_degradation | 5 fast warmups (baseline 100ms) → 3 slow calls (400ms > 3x) → mode switches to Hedge + Speculative, user gets notification
test_auto_deescalation_on_recovery | 1 fast response after escalation → mode reverts to Off + Followup, router confirms Off

Utility

Test | What It Verifies
-----|------------------
test_strip_think_tags | <think>...</think> block removal from LLM output

Session Persistence (crates/octos-bus/src/session.rs — 28 tests)

Tests JSONL-backed session storage with LRU caching.

CRUD and Persistence

Test | What It Verifies
-----|------------------
test_session_manager_create_and_retrieve | Create session, add messages, retrieve
test_session_manager_persistence | Messages survive manager restart (disk reload)
test_session_manager_clear | Clear deletes from memory and disk

History and Ordering

Test | What It Verifies
-----|------------------
test_session_get_history | Tail-slice returns last N messages
test_session_get_history_all | Returns all when fewer than max
test_sort_by_timestamp_restores_order | Restores chronological order after concurrent overflow writes

LRU Cache

Test | What It Verifies
-----|------------------
test_eviction_keeps_max_sessions | Cache respects capacity limit
test_evicted_session_reloads_from_disk | Evicted sessions reload on access
test_with_max_sessions_clamps_zero | Capacity clamped to minimum 1

Concurrency

Test | What It Verifies
-----|------------------
test_concurrent_sessions | Multiple sessions don't interfere
test_concurrent_session_processing | 10 parallel tasks don't corrupt sessions

Fork and Rewrite

Test | What It Verifies
-----|------------------
test_fork_creates_child | Fork copies last N messages with parent link
test_fork_persists_to_disk | Forked session survives restart
test_session_rewrite | Atomic write-then-rename after mutation

Multi-Session (Topics)

Test | What It Verifies
-----|------------------
test_list_sessions_for_chat | Lists all topic sessions for a chat
test_session_topic_persists | Topic survives restart
test_update_summary | Summary update persists
test_active_session_store | Active topic switching and go-back
test_active_session_store_persistence | Active topic survives restart
test_validate_topic_name | Rejects invalid characters and lengths

Filename Encoding

Test | What It Verifies
-----|------------------
test_truncated_session_keys_no_collision | Long keys with hash suffix don't collide
test_decode_filename | Percent-encoded filenames decode correctly
test_list_sessions_returns_decoded_keys | list_sessions() returns human-readable keys
test_short_key_no_hash_suffix | Short keys don't get hash suffix

Safety Limits

Test | What It Verifies
-----|------------------
test_load_rejects_oversized_file | Files over 10 MB refused
test_append_respects_file_size_limit | Append skips when file at 10 MB limit
test_load_rejects_future_schema_version | Rejects unknown schema versions
test_purge_stale_sessions | Deletes sessions older than N days

Known Gaps

Area | Why Not Tested
-----|----------------
Interrupt queue mode | Same codepath as Steer — covered by test_queue_mode_steer_keeps_newest
Probe/canary requests | Disabled in all tests via probe_probability: 0.0 for determinism
Streaming (chat_stream) | No mock streaming infrastructure; streaming tested manually
Session compaction | Called in actor tests but output not verified (would need LLM mock for summarization)
Live provider integration | Requires API keys; 1 test exists but marked #[ignore]
Channel-specific routing | Covered by channel crate tests, not part of this subsystem
⬆️ Earlier task marker | Primary response gets "⬆️ Earlier task completed:" prefix when overflow was served; not directly asserted in tests (would need to inspect outbound content after a slow primary + fast overflow race)
Overflow agent tool execution | serve_overflow spawns a full agent.process_message_tracked() with tool access; current tests use DelayedMockProvider which returns canned responses without tool calls

Running Individual Tests

# Single test
cargo test -p octos-llm --lib adaptive::tests::test_hedged_racing_picks_faster_provider

# One subsystem
cargo test -p octos-llm --lib adaptive::tests

# Session actor (always single-threaded)
cargo test -p octos-cli session_actor::tests -- --test-threads=1

# With output
cargo test -p octos-cli session_actor::tests -- --test-threads=1 --nocapture

GitHub Actions CI

.github/workflows/ci.yml runs on push/PR to main:

  1. cargo fmt --all -- --check
  2. cargo clippy --workspace -- -D warnings
  3. cargo test --workspace

The local scripts/ci.sh is a superset — it runs the same three steps plus the focused subsystem groups. If the local run passes, GitHub CI should pass as well.

Runner: macos-14 (ARM64). Private repo with 2000 free minutes/month (10x multiplier for macOS runners = ~200 effective minutes).


Files

File | What
-----|-----
scripts/ci.sh | Local CI script (this document)
scripts/pre-release.sh | Full release smoke tests (build, E2E, skill binaries)
.github/workflows/ci.yml | GitHub Actions CI
crates/octos-llm/src/adaptive.rs | Adaptive router + 19 tests
crates/octos-llm/src/responsiveness.rs | Responsiveness observer + 8 tests
crates/octos-cli/src/session_actor.rs | Session actor + 9 tests
crates/octos-bus/src/session.rs | Session persistence + 28 tests