Introduction
What is Octos?
Octos is an open-source AI agent platform that turns any LLM into a multi-channel, multi-user intelligent assistant. You deploy a single Rust binary, connect your LLM API keys and messaging channels (Telegram, Discord, Slack, WhatsApp, Email, WeChat, and more), and Octos handles everything else: conversation routing, tool execution, memory, provider failover, and multi-tenant isolation.
Think of it as the backend operating system for AI agents. Instead of building a chatbot from scratch for each use case, you configure Octos profiles, each with its own system prompt, model, tools, and channels, and manage them all through a web dashboard or REST API. A small team can run hundreds of specialized AI agents on a single machine.
Octos is built for people who need more than a personal assistant: teams deploying AI for customer support across WhatsApp and Telegram, developers building AI-powered products on top of a REST API, researchers orchestrating multi-step research pipelines with different LLMs at each stage, or families sharing a single AI setup with per-person customization.
Operating Modes
Octos operates in two primary modes:
- Chat mode (octos chat): Interactive multi-turn conversation with tools, or single-message execution via --message.
- Gateway mode (octos gateway): Persistent daemon serving multiple messaging channels simultaneously.
Key Concepts
| Term | Description |
|---|---|
| Agent | AI that executes tasks using tools |
| Tool | A capability (shell, file ops, search, messaging) |
| Provider | LLM API service (Anthropic, OpenAI, etc.) |
| Channel | Messaging platform (CLI, Telegram, Slack, etc.) |
| Session | Conversation history per channel and chat ID |
| Sandbox | Isolated execution environment (bwrap, macOS sandbox-exec, Docker) |
| Tool Policy | Allow/deny rules controlling which tools are available |
| Skill | Reusable instruction template (SKILL.md) |
| Bootstrap | Context files loaded into system prompt (AGENTS.md, SOUL.md, etc.) |
Quick Start
This guide walks you through the essential steps to get Octos running.
1. Initialize Your Workspace
Navigate to your project directory and initialize Octos:
cd your-project
octos init
This creates a .octos/ directory with default configuration, bootstrap files (AGENTS.md, SOUL.md, USER.md), and directories for memory, sessions, and skills.
2. Set Your API Key
Export at least one LLM provider key:
export ANTHROPIC_API_KEY="sk-ant-..."
Add this to your ~/.bashrc or ~/.zshrc for persistence. You can also use octos auth login --provider openai for OAuth-based login.
3. Check Setup
Verify everything is configured correctly:
octos status
This shows your config file location, active provider and model, API key status, and bootstrap file availability.
4. Start Chatting
Launch an interactive multi-turn conversation:
octos chat
Or send a single message and exit:
octos chat --message "Add a hello function to lib.rs"
5. Run the Gateway
To serve multiple messaging channels as a persistent daemon:
octos gateway
This requires a gateway section in your config with at least one channel configured. See the Configuration chapter for details.
6. Launch the Web UI
If you built with the api feature, start the web dashboard:
octos serve
Then open http://localhost:8080 in your browser.
Installation & Deployment
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Rust | 1.85.0+ | Install via rustup.rs |
| macOS | 13+ | Apple Silicon or Intel |
| Linux | glibc 2.31+ | Ubuntu 20.04+, Debian 11+, Fedora 34+ |
| Windows | 10/11 | Native build or WSL2 |
You also need an API key from at least one supported LLM provider.
Optional Dependencies
| Dependency | Used For | Install |
|---|---|---|
| Node.js | WhatsApp bridge, PPTX creation skill | brew install node / apt install nodejs |
| ffmpeg | Media/video skills | brew install ffmpeg / apt install ffmpeg |
| Chrome/Chromium | Browser automation tool | brew install --cask chromium |
| LibreOffice | Office document conversion | brew install --cask libreoffice |
| Poppler | PDF rendering (pdftoppm) | brew install poppler / apt install poppler-utils |
Build from Source
git clone https://github.com/octos-org/octos
cd octos
# Basic (CLI, chat, run, gateway with CLI channel)
cargo install --path crates/octos-cli
# With messaging channels
cargo install --path crates/octos-cli --features telegram,discord,slack,whatsapp,feishu,email,wecom
# With browser automation (requires Chrome/Chromium)
cargo install --path crates/octos-cli --features browser
# With web UI and REST API
cargo install --path crates/octos-cli --features api
# Verify
octos --version
Deploy Script
For a streamlined installation, use the deploy script:
# Minimal install (CLI + chat only)
./scripts/local-deploy.sh --minimal
# Full install (all channels + dashboard + app-skills)
./scripts/local-deploy.sh --full
# Custom channels
./scripts/local-deploy.sh --channels telegram,discord,api
Platform-Specific Instructions
macOS
# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# 2. Install optional deps
brew install node ffmpeg poppler
brew install --cask libreoffice
# 3. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full
# 4. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat
Background service (launchd):
The deploy script creates ~/Library/LaunchAgents/io.octos.octos-serve.plist.
# Start service (survives reboot)
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Stop service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
# View logs
tail -f ~/.octos/serve.log
Linux (Ubuntu/Debian)
# 1. Install system deps
sudo apt update
sudo apt install -y build-essential pkg-config libssl-dev
# 2. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# 3. Install optional deps
sudo apt install -y nodejs npm ffmpeg poppler-utils
# 4. Clone and deploy
git clone https://github.com/octos-org/octos.git
cd octos
./scripts/local-deploy.sh --full
# 5. Set API key and run
export ANTHROPIC_API_KEY=sk-ant-...
octos chat
Background service (systemd user unit):
The deploy script creates ~/.config/systemd/user/octos-serve.service.
# Start service
systemctl --user start octos-serve
# Enable on boot (requires lingering)
loginctl enable-linger $USER
systemctl --user enable octos-serve
# View logs
journalctl --user -u octos-serve -f
# Stop service
systemctl --user stop octos-serve
Linux (Fedora/RHEL)
# System deps
sudo dnf install -y gcc pkg-config openssl-devel
# Then follow Ubuntu steps from step 2 onward
Windows (Native)
Octos builds and runs natively on Windows. Shell commands are executed via cmd /C.
# 1. Install Rust (download rustup-init.exe from https://rustup.rs)
rustup-init.exe
# 2. Clone and build
git clone https://github.com/octos-org/octos.git
cd octos
cargo install --path crates/octos-cli
# 3. Set API key and run
$env:ANTHROPIC_API_KEY = "sk-ant-..."
octos chat
Windows notes:
- Sandbox is disabled on Windows (no bubblewrap/sandbox-exec equivalent); shell commands run without isolation. Docker sandbox mode still works if Docker Desktop is installed.
- API keys are stored via Windows Credential Manager.
- Process management uses taskkill for cleanup.
Windows (WSL2)
Alternatively, use WSL2 for a Linux environment:
# 1. Install WSL2 (PowerShell as admin)
wsl --install -d Ubuntu
# 2. Open Ubuntu terminal, then follow Linux (Ubuntu) steps above
When running octos serve inside WSL2, the dashboard is accessible from your Windows browser at http://localhost:8080 (WSL2 auto-forwards ports).
Docker
docker compose --profile gateway up -d
Deploy Script Reference
./scripts/local-deploy.sh [OPTIONS]
Options:
--minimal CLI + chat only (no channels, no dashboard)
--full All channels + dashboard + app-skills
--channels LIST Comma-separated: telegram,discord,slack,whatsapp,feishu,email,twilio,wecom
--no-skills Skip building app-skills
--no-service Skip launchd/systemd service setup
--uninstall Remove binaries and service files
--debug Build in debug mode (faster compile, larger binary)
--prefix DIR Install prefix (default: ~/.cargo/bin)
On Windows, use .\scripts\local-deploy.ps1 (PowerShell) with the same options.
What the script does:
- Checks prerequisites (Rust, platform deps)
- Builds the octos binary with selected features
- Builds app-skill binaries (unless --no-skills)
- Signs binaries on macOS (ad-hoc codesign)
- Runs octos init if ~/.octos doesn't exist
- Creates background service file (launchd on macOS, systemd on Linux)
Uninstall:
./scripts/local-deploy.sh --uninstall
# Data directory (~/.octos) is NOT removed. Delete manually:
rm -rf ~/.octos
Post-Install Verification
Set API Keys
Set at least one LLM provider key:
# Add to ~/.bashrc, ~/.zshrc, or ~/.profile
export ANTHROPIC_API_KEY=sk-ant-...
# Or
export OPENAI_API_KEY=sk-...
# Or use OAuth login
octos auth login --provider openai
Verify
octos --version # Check binary
octos status # Check config + API keys
octos chat --message "Hello" # Quick test
Upgrading
cd octos
git pull origin main
./scripts/local-deploy.sh --full # Rebuilds and reinstalls
# If running as a service, restart it:
# macOS:
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Linux:
systemctl --user restart octos-serve
Troubleshooting
| Problem | Solution |
|---|---|
octos: command not found | Add ~/.cargo/bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |
| Build fails on Linux | Install build-essential pkg-config libssl-dev |
| macOS codesign warning | Run: codesign -s - ~/.cargo/bin/octos |
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080 |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown then reopen terminal |
| Service won't start | Check logs: tail -f ~/.octos/serve.log or journalctl --user -u octos-serve |
| API key not found | Ensure env var is set in the service environment, not just your shell |
Configuration
Config File Locations
Configuration files are loaded in order (first found wins):
1. .octos/config.json (project-local configuration)
2. ~/.config/octos/config.json (global configuration)
Basic Config
A minimal configuration specifies the LLM provider and model:
{
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"api_key_env": "ANTHROPIC_API_KEY"
}
Gateway Config
To run Octos as a multi-channel daemon, add a gateway section:
{
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"gateway": {
"channels": [
{"type": "cli"},
{"type": "telegram", "allowed_senders": ["123456789"]},
{"type": "discord", "settings": {"token_env": "DISCORD_BOT_TOKEN"}},
{"type": "slack", "settings": {"bot_token_env": "SLACK_BOT_TOKEN", "app_token_env": "SLACK_APP_TOKEN"}},
{"type": "whatsapp", "settings": {"bridge_url": "ws://localhost:3001"}},
{"type": "feishu", "settings": {"app_id_env": "FEISHU_APP_ID", "app_secret_env": "FEISHU_APP_SECRET"}}
],
"max_history": 50,
"system_prompt": "You are a helpful assistant."
}
}
Environment Variable Expansion
Use ${VAR_NAME} syntax anywhere in config values:
{
"base_url": "${ANTHROPIC_BASE_URL}",
"model": "${OCTOS_MODEL}"
}
Full Config Reference
The complete configuration structure with all available fields:
{
"version": 1,
// LLM Provider
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"base_url": null,
"api_key_env": null,
"api_type": null,
// Fallback chain
"fallback_models": [
{
"provider": "deepseek",
"model": "deepseek-chat",
"base_url": null,
"api_key_env": "DEEPSEEK_API_KEY"
}
],
// Adaptive routing
"adaptive_routing": {
"enabled": false,
"latency_threshold_ms": 30000,
"error_rate_threshold": 0.3,
"probe_probability": 0.1,
"probe_interval_secs": 60,
"failure_threshold": 3
},
// Gateway
"gateway": {
"channels": [{"type": "cli"}],
"max_history": 50,
"system_prompt": null,
"queue_mode": "followup",
"max_sessions": 1000,
"max_concurrent_sessions": 10,
"llm_timeout_secs": null,
"llm_connect_timeout_secs": null,
"tool_timeout_secs": null,
"session_timeout_secs": null,
"browser_timeout_secs": null
},
// Tool policies
"tool_policy": {"allow": [], "deny": []},
"tool_policy_by_provider": {},
"context_filter": [],
// Sub-providers (for spawn tool)
"sub_providers": [
{
"key": "cheap",
"provider": "deepseek",
"model": "deepseek-chat",
"description": "Fast model for simple tasks"
}
],
// Agent settings
"max_iterations": 50,
// Embedding (for vector search in memory)
"embedding": {
"provider": "openai",
"api_key_env": "OPENAI_API_KEY",
"base_url": null
},
// Voice
"voice": {
"auto_asr": true,
"auto_tts": false,
"default_voice": "vivian",
"asr_language": null
},
// Hooks
"hooks": [],
// MCP servers
"mcp_servers": [],
// Sandbox
"sandbox": {
"enabled": true,
"mode": "auto",
"allow_network": false
},
// Email (for email channel)
"email": null,
// Dashboard auth (serve mode only)
"dashboard_auth": null,
// Monitor (serve mode only)
"monitor": null
}
Environment Variables
LLM Providers
| Variable | Description |
|---|---|
ANTHROPIC_API_KEY | Anthropic (Claude) API key |
OPENAI_API_KEY | OpenAI API key |
GEMINI_API_KEY | Google Gemini API key |
OPENROUTER_API_KEY | OpenRouter API key |
DEEPSEEK_API_KEY | DeepSeek API key |
GROQ_API_KEY | Groq API key |
MOONSHOT_API_KEY | Moonshot/Kimi API key |
DASHSCOPE_API_KEY | Alibaba DashScope (Qwen) API key |
MINIMAX_API_KEY | MiniMax API key |
ZHIPU_API_KEY | Zhipu (GLM) API key |
ZAI_API_KEY | Z.AI API key |
NVIDIA_API_KEY | Nvidia NIM API key |
Search
| Variable | Description |
|---|---|
BRAVE_API_KEY | Brave Search API key |
PERPLEXITY_API_KEY | Perplexity Sonar API key |
YDC_API_KEY | You.com API key |
Channels
| Variable | Description |
|---|---|
TELEGRAM_BOT_TOKEN | Telegram bot token |
DISCORD_BOT_TOKEN | Discord bot token |
SLACK_BOT_TOKEN | Slack bot token |
SLACK_APP_TOKEN | Slack app-level token |
FEISHU_APP_ID | Feishu/Lark app ID |
FEISHU_APP_SECRET | Feishu/Lark app secret |
WECOM_CORP_ID | WeCom corp ID |
WECOM_AGENT_SECRET | WeCom agent secret |
EMAIL_USERNAME | Email account username |
EMAIL_PASSWORD | Email account password |
Email (send-email skill)
| Variable | Description |
|---|---|
SMTP_HOST | SMTP server hostname |
SMTP_PORT | SMTP server port |
SMTP_USERNAME | SMTP username |
SMTP_PASSWORD | SMTP password |
SMTP_FROM | SMTP from address |
LARK_APP_ID | Feishu mail app ID |
LARK_APP_SECRET | Feishu mail app secret |
LARK_FROM_ADDRESS | Feishu mail from address |
Voice
| Variable | Description |
|---|---|
OMINIX_API_URL | OminiX ASR/TTS API URL |
System
| Variable | Description |
|---|---|
RUST_LOG | Log level (error/warn/info/debug/trace) |
OCTOS_LOG_JSON | Enable JSON-formatted logs (set to any value) |
File Layout
~/.octos/                  # Global config directory
├── auth.json              # Stored API credentials (mode 0600)
├── profiles/              # Profile configs (serve mode)
│   ├── my-bot.json
│   └── work-bot.json
├── skills/                # Global custom skills
└── serve.log              # Serve mode log file

.octos/                    # Project/profile data directory
├── config.json            # Configuration
├── cron.json              # Scheduled jobs
├── AGENTS.md              # Agent instructions
├── SOUL.md                # Personality definition
├── USER.md                # User information
├── HEARTBEAT.md           # Background tasks
├── sessions/              # Chat history (JSONL)
├── memory/                # Memory files
│   ├── MEMORY.md          # Long-term
│   └── 2025-02-10.md      # Daily
├── skills/                # Custom skills
├── episodes.redb          # Episodic memory DB
└── history/
    └── chat_history       # Readline history
LLM Providers & Routing
Octos supports 14 LLM providers out of the box. Each provider needs an API key stored in an environment variable (except local providers like Ollama).
Supported Providers
| Provider | Env Variable | Default Model | API Format | Aliases |
|---|---|---|---|---|
anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 | Native Anthropic | (none) |
openai | OPENAI_API_KEY | gpt-4o | Native OpenAI | (none) |
gemini | GEMINI_API_KEY | gemini-2.0-flash | Native Gemini | (none) |
openrouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4-20250514 | Native OpenRouter | (none) |
deepseek | DEEPSEEK_API_KEY | deepseek-chat | OpenAI-compatible | (none) |
groq | GROQ_API_KEY | llama-3.3-70b-versatile | OpenAI-compatible | (none) |
moonshot | MOONSHOT_API_KEY | kimi-k2.5 | OpenAI-compatible | kimi |
dashscope | DASHSCOPE_API_KEY | qwen-max | OpenAI-compatible | qwen |
minimax | MINIMAX_API_KEY | MiniMax-Text-01 | OpenAI-compatible | (none) |
zhipu | ZHIPU_API_KEY | glm-4-plus | OpenAI-compatible | glm |
zai | ZAI_API_KEY | glm-5 | Anthropic-compatible | z.ai |
nvidia | NVIDIA_API_KEY | meta/llama-3.3-70b-instruct | OpenAI-compatible | nim |
ollama | (none) | llama3.2 | OpenAI-compatible | (none) |
vllm | VLLM_API_KEY | (must specify) | OpenAI-compatible | (none) |
Configuration Methods
Config File
Set provider and model in your config.json:
{
"provider": "moonshot",
"model": "kimi-2.5",
"api_key_env": "KIMI_API_KEY"
}
The api_key_env field overrides the default environment variable name for the provider. For example, Moonshot defaults to MOONSHOT_API_KEY, but you can point it at KIMI_API_KEY instead.
CLI Flags
octos chat --provider deepseek --model deepseek-chat
octos chat --model gpt-4o # auto-detects provider from model name
Auth Store
Instead of environment variables, you can store API keys through the auth CLI:
# OAuth PKCE (OpenAI)
octos auth login --provider openai
# Device code flow (OpenAI)
octos auth login --provider openai --device-code
# Paste-token (all other providers)
octos auth login --provider anthropic
# -> prompts: "Paste your API key:"
# Check stored credentials
octos auth status
# Remove credentials
octos auth logout --provider openai
Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.
Auto-Detection
When --provider is omitted, Octos infers the provider from the model name:
| Model Pattern | Detected Provider |
|---|---|
claude-* | anthropic |
gpt-*, o1-*, o3-*, o4-* | openai |
gemini-* | gemini |
deepseek-* | deepseek |
kimi-*, moonshot-* | moonshot |
qwen-* | dashscope |
glm-* | zhipu |
llama-* | groq |
octos chat --model gpt-4o # -> openai
octos chat --model claude-sonnet-4-20250514 # -> anthropic
octos chat --model deepseek-chat # -> deepseek
octos chat --model glm-4-plus # -> zhipu
octos chat --model qwen-max # -> dashscope
Custom Endpoints
Use base_url to point at self-hosted or proxy endpoints:
{
"provider": "openai",
"model": "gpt-4o",
"base_url": "https://your-azure-endpoint.openai.azure.com/v1"
}
{
"provider": "ollama",
"model": "llama3.2",
"base_url": "http://localhost:11434/v1"
}
{
"provider": "vllm",
"model": "meta-llama/Llama-3-70b",
"base_url": "http://localhost:8000/v1"
}
API Type Override
The api_type field forces a specific wire format when a provider uses a non-standard protocol:
{
"provider": "zai",
"model": "glm-5",
"api_type": "anthropic"
}
"openai"β OpenAI Chat Completions format (default for most providers)"anthropic"β Anthropic Messages format (for Anthropic-compatible proxies)
Fallback Chains
Configure a priority-ordered fallback chain. If the primary provider fails, the next provider in the list is tried automatically:
{
"provider": "moonshot",
"model": "kimi-2.5",
"fallback_models": [
{
"provider": "deepseek",
"model": "deepseek-chat",
"api_key_env": "DEEPSEEK_API_KEY"
},
{
"provider": "gemini",
"model": "gemini-2.0-flash",
"api_key_env": "GEMINI_API_KEY"
}
]
}
Failover rules:
- 401/403 (authentication errors) -> fail over immediately; no retry on the same provider
- 429 (rate limit) / 5xx (server errors) -> retry with exponential backoff, then fail over
- 400 (content-format errors) -> fail over if the error contains "must not be empty", "reasoning_content", "API key not valid", or "invalid_value"
- Timeouts -> fail over immediately, no retry (don't waste 120s × retries on an unresponsive provider)
- Circuit breaker -> 3 consecutive failures mark a provider as degraded
Adaptive Routing
When multiple fallback models are configured, adaptive routing dynamically selects the best provider based on real-time performance metrics instead of following the static priority order. Three mutually exclusive modes are available:
{
"adaptive_routing": {
"mode": "hedge",
"qos_ranking": true,
"latency_threshold_ms": 30000,
"error_rate_threshold": 0.3,
"probe_probability": 0.1,
"probe_interval_secs": 60,
"failure_threshold": 3,
"weight_latency": 0.3,
"weight_error_rate": 0.3,
"weight_priority": 0.2,
"weight_cost": 0.2
}
}
Adaptive Modes
| Mode | Description |
|---|---|
off (default) | Static priority order. Failover only when a provider is circuit-broken (N consecutive failures). No scoring, no racing. |
hedge | Hedged racing: fire each request to 2 providers simultaneously, take the winner, cancel the loser. Both results accumulate QoS metrics. |
lane | Score-based lane changing: dynamically pick the best single provider based on a 4-factor scoring formula. Cheaper than hedge (no duplicate requests). |
QoS Ranking
Setting qos_ranking: true enables quality-of-service ranking using a unified model catalog (model_catalog.json). The catalog provides baseline metrics (stability, latency, output quality) that blend with live traffic data via EMA:
- Cold start: Baseline catalog values are used (10 synthetic samples seeded).
- Warm state: Live metrics gradually replace baselines (weight ramps from 0 to 1 over 10 calls).
- Export: The live catalog is exported to model_catalog.json for observability.
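The baseline-to-live ramp can be sketched like this (illustrative only; the 10-call linear ramp follows the description above, everything else is an assumption):

```python
# Blend a catalog baseline with a live metric. The live weight ramps
# linearly from 0 to 1 over the first 10 calls, matching the
# cold-start / warm-state behavior described above.
def blended_metric(baseline: float, live: float, calls: int) -> float:
    w = min(calls / 10.0, 1.0)           # 0.0 at cold start, 1.0 after 10 calls
    return (1.0 - w) * baseline + w * live
```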
Scoring Formula
Each provider is scored on 4 factors (lower score = better). All weights are configurable via adaptive_routing:
| Factor | Weight key | Default | Description |
|---|---|---|---|
| Stability | weight_error_rate | 0.3 | Blended baseline + live error rate. EMA blend: weight ramps from 0 to 1 over 10 calls. |
| Quality | weight_latency | 0.3 | 60% normalized ds_output quality + 40% normalized throughput (output tokens/sec EMA) |
| Priority | weight_priority | 0.2 | Config-order preference (primary = 0), normalized to [0, 1] |
| Cost | weight_cost | 0.2 | Normalized output cost per million tokens. Unknown cost -> 0 (no penalty) |
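With the default weights, the score computation looks roughly like this (inputs assumed pre-normalized to [0, 1]; inverting quality so that lower is uniformly better is an assumption):

```python
# 4-factor provider score; lower is better. Weights default to the
# values in the table above.
def provider_score(error_rate: float, quality: float, priority: float, cost: float,
                   w_stability: float = 0.3, w_quality: float = 0.3,
                   w_priority: float = 0.2, w_cost: float = 0.2) -> float:
    return (w_stability * error_rate        # blended error rate, already in [0, 1]
            + w_quality * (1.0 - quality)   # invert: higher quality -> lower score
            + w_priority * priority         # primary provider contributes 0
            + w_cost * cost)                # unknown cost should be passed as 0.0
```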
Provider Metadata
| Setting | Default | Description |
|---|---|---|
latency_threshold_ms | 30000 | Providers with average latency above this are penalized |
error_rate_threshold | 0.3 | Providers with error rates above 30% are deprioritized |
probe_probability | 0.1 | Fraction of requests sent to non-primary providers as health probes |
probe_interval_secs | 60 | Minimum seconds between probes to the same provider |
failure_threshold | 3 | Consecutive failures before the circuit breaker opens |
Hedge Mode Details
When Hedge is active:
- The primary provider and the cheapest alternate are raced via tokio::select!.
- The winner's response is returned; the loser is cancelled.
- Both completed requests record metrics (cancelled requests do not).
- If the primary fails, the alternate is tried sequentially (it was cancelled by the race).
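In asyncio terms (the real implementation uses tokio::select! in Rust), the race looks roughly like this:

```python
import asyncio

# Sketch of hedged racing: start both requests, return the first to
# finish, cancel the other. Error handling is omitted for brevity.
async def hedge(primary, alternate):
    tasks = [asyncio.create_task(primary()), asyncio.create_task(alternate())]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                    # loser is cancelled, records no metrics
    return done.pop().result()
```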
Auto-Escalation
When sustained latency degradation is detected (3 consecutive responses exceeding 3× baseline), the session actor auto-activates Hedge mode + Speculative queue. The ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then adapts every 20 samples via an 80/20 EMA blend with the current window median. When the provider recovers (one normal-latency response), both revert to normal.
Provider Wrappers
The routing stack is composed of layered wrappers:
| Wrapper | Purpose |
|---|---|
AdaptiveRouter | Top-level: metrics-driven scoring, Hedge/Lane modes, circuit breaker, probe requests |
ProviderChain | Ordered failover with per-provider circuit breaker (failure count β₯ threshold β degraded) |
FallbackProvider | Primary + QoS-ranked fallbacks with cooldown tracking via ProviderRouter |
RetryProvider | Exponential backoff on 429/5xx. Timeout β no retry (failover instead) |
ProviderRouter | Sub-agent multi-model routing. Prefix-based key resolution, cooldown, QoS-scored fallbacks |
SwappableProvider | Runtime model swap via RwLock (e.g. switch_model tool). Leaks ~50 bytes per swap |
Gateway & Channels
Octos runs as a gateway that bridges messaging platforms to your LLM agent. Each platform connection is called a channel. You can run multiple channels simultaneously (for example, Telegram and Slack in the same gateway process).
Channel Overview
Channels are configured in the gateway.channels array of your config.json. Each entry specifies a type, optional allowed_senders for access control, and platform-specific settings.
Check which channels are compiled and configured:
octos channels status
This shows a table with each channel's compile status (feature flags) and config summary (environment variables set or missing).
Telegram
Requires a bot token from @BotFather.
export TELEGRAM_BOT_TOKEN="123456:ABC..."
{
"type": "telegram",
"allowed_senders": ["your_user_id"],
"settings": {
"token_env": "TELEGRAM_BOT_TOKEN"
}
}
Telegram supports bot commands, inline keyboards, voice messages, images, and files.
Slack
Requires a Socket Mode app with both a bot token and an app-level token.
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
{
"type": "slack",
"settings": {
"bot_token_env": "SLACK_BOT_TOKEN",
"app_token_env": "SLACK_APP_TOKEN"
}
}
Discord
Requires a bot token from the Discord Developer Portal.
export DISCORD_BOT_TOKEN="..."
{
"type": "discord",
"settings": {
"token_env": "DISCORD_BOT_TOKEN"
}
}
WhatsApp
Requires a Node.js bridge (Baileys) running at a WebSocket URL.
{
"type": "whatsapp",
"settings": {
"bridge_url": "ws://localhost:3001"
}
}
Feishu (China)
Feishu uses WebSocket long-connection mode by default (no public URL needed).
export FEISHU_APP_ID="cli_..."
export FEISHU_APP_SECRET="..."
{
"type": "feishu",
"settings": {
"app_id_env": "FEISHU_APP_ID",
"app_secret_env": "FEISHU_APP_SECRET"
}
}
Build with the feishu feature flag:
cargo build --release -p octos-cli --features feishu
Lark (International)
Larksuite (international) does not support WebSocket mode. Use webhook mode instead, where Lark pushes events to your server via HTTP POST.
Lark Cloud --> ngrok --> localhost:9321/webhook/event --> Gateway --> LLM
Developer Console Setup
- Go to open.larksuite.com/app and create (or select) an app
- Add Bot capability under Features
- Configure event subscription:
- Events & Callbacks > Event Configuration > Edit subscription method
- Select "Send events to developer server"
- Set request URL to https://YOUR_NGROK_URL/webhook/event
- Add event: im.message.receive_v1 (Receive Message)
- Enable permissions: im:message, im:message:send_as_bot, im:resource
- Publish the app: App Release > Version Management > Create Version > Apply for Online Release
Config
export LARK_APP_ID="cli_..."
export LARK_APP_SECRET="..."
{
"type": "lark",
"allowed_senders": [],
"settings": {
"app_id_env": "LARK_APP_ID",
"app_secret_env": "LARK_APP_SECRET",
"region": "global",
"mode": "webhook",
"webhook_port": 9321
}
}
Settings Reference
| Setting | Description | Default |
|---|---|---|
app_id_env | Env var name for App ID | FEISHU_APP_ID |
app_secret_env | Env var name for App Secret | FEISHU_APP_SECRET |
region | "cn" (Feishu) or "global" / "lark" (Larksuite) | "cn" |
mode | "ws" (WebSocket) or "webhook" (HTTP) | "ws" |
webhook_port | Port for webhook HTTP server | 9321 |
encrypt_key | Encrypt Key from Lark console (for AES-256-CBC) | none |
verification_token | Verification Token from Lark console | none |
Encryption (Optional)
If you configure an Encrypt Key in the Lark console (Events & Callbacks > Encryption Strategy), add it to your config:
{
"type": "lark",
"settings": {
"app_id_env": "LARK_APP_ID",
"app_secret_env": "LARK_APP_SECRET",
"region": "global",
"mode": "webhook",
"webhook_port": 9321,
"encrypt_key": "your-encrypt-key-here",
"verification_token": "your-verification-token"
}
}
With encryption enabled, Lark sends encrypted POST bodies. The gateway decrypts using AES-256-CBC with SHA-256 key derivation and validates signatures via the X-Lark-Signature header.
Supported Message Types
Inbound: text, images, files (PDF, docs), audio, video, stickers
Outbound: markdown (via interactive cards), image upload, file upload
Running
# Start ngrok tunnel
ngrok http 9321
# Start gateway
LARK_APP_ID="cli_xxxxx" LARK_APP_SECRET="xxxxx" octos gateway --cwd /path/to/workdir
Troubleshooting
| Issue | Solution |
|---|---|
| 404 on WS endpoint | Larksuite international does not support WebSocket. Use "mode": "webhook" |
| Challenge verification fails | Ensure ngrok is running and the URL matches the Lark console |
| No events received | Publish the app version after adding events. Check Event Log in the console |
| Bot does not reply | Verify im:message:send_as_bot permission is granted |
| Ngrok URL changed | Free ngrok URLs change on restart. Update the request URL in Lark console |
Email (IMAP/SMTP)
Polls an IMAP inbox for inbound messages and replies via SMTP. Feature-gated behind email.
export EMAIL_USERNAME="bot@example.com"
export EMAIL_PASSWORD="app-specific-password"
{
"type": "email",
"allowed_senders": ["trusted@example.com"],
"settings": {
"imap_host": "imap.gmail.com",
"imap_port": 993,
"smtp_host": "smtp.gmail.com",
"smtp_port": 465,
"username_env": "EMAIL_USERNAME",
"password_env": "EMAIL_PASSWORD",
"from_address": "bot@example.com",
"poll_interval_secs": 30,
"max_body_chars": 10000
}
}
WeCom (WeChat Work)
Requires a Custom App with a message callback URL. Feature-gated behind wecom.
export WECOM_CORP_ID="ww..."
export WECOM_AGENT_SECRET="..."
{
"type": "wecom",
"settings": {
"corp_id_env": "WECOM_CORP_ID",
"agent_secret_env": "WECOM_AGENT_SECRET",
"agent_id": "1000002",
"verification_token": "...",
"encoding_aes_key": "...",
"webhook_port": 9322
}
}
WeChat (via WorkBuddy Bridge)
Regular WeChat users can connect to your agent through a WorkBuddy desktop bridge. WorkBuddy handles the WeChat transport; Octos handles the AI logic via its WeCom Bot channel.
WeChat (mobile) --> WorkBuddy (desktop) --> WeCom group robot (WSS) --> octos wecom-bot channel
Setup
- Create a WeCom group robot in the WeCom Admin Console under Applications > Group Robot. Note the Bot ID and Secret.
- Configure the wecom-bot channel:
export WECOM_BOT_SECRET="your_robot_secret_here"
{
"type": "wecom-bot",
"allowed_senders": [],
"settings": {
"bot_id": "YOUR_BOT_ID",
"secret_env": "WECOM_BOT_SECRET"
}
}
- Build and start:
cargo build --release -p octos-cli --features "wecom-bot"
octos gateway
- Install the WorkBuddy desktop client, link it to your WeChat via QR scan, and connect it to the same WeCom group robot.
Connection Details
| Property | Value |
|---|---|
| Protocol | WebSocket (WSS) |
| Endpoint | wss://openws.work.weixin.qq.com |
| Heartbeat | Ping/pong every 30 seconds |
| Auto-reconnect | Yes, exponential backoff (5s to 60s) |
| Max message length | 4096 characters |
| Message format | Markdown |
The wecom-bot channel uses an outbound WebSocket connection, so no public URL or port forwarding is required. This makes it suitable for servers behind NAT or firewalls.
Limitations
- Text only: voice and image messages are passed as placeholders
- No message editing: responses are sent as new messages
- One direction: WeChat-to-Octos is automatic; for proactive messages, use cron jobs
Session Control Commands
In any gateway channel, the following commands manage conversation sessions:
| Command | Description |
|---|---|
/new | Create a new session (forks the last 10 messages from the current conversation) |
/new <name> | Create a named session |
/s <name> | Switch to a named session |
/s | Switch to the default session |
/sessions | List all sessions for this chat |
/back | Switch to the previously active session |
/delete | Delete the current session |
Only one session is active at a time per chat. Messages are routed to the active session. Inactive sessions can still run background tasks (deep search, pipelines, etc.). When an inactive session finishes work, you receive a notification; use /s <name> to view the results.
Voice Transcription
Voice and audio messages from channels are automatically transcribed before being sent to the agent. The system tries local ASR first (via the OminiX engine) and falls back to cloud-based Whisper when local ASR is unavailable. The transcription is prepended as [transcription: ...].
# Local ASR (preferred) -- set automatically by octos serve
export OMINIX_API_URL="http://localhost:8080"
# Cloud fallback
export GROQ_API_KEY="gsk_..."
Voice configuration in config.json:
{
"voice": {
"auto_asr": true,
"auto_tts": true,
"default_voice": "vivian",
"asr_language": null
}
}
- auto_asr: automatically transcribe incoming voice/audio messages
- auto_tts: automatically synthesize voice replies when the user sends voice
- default_voice: voice preset for auto-TTS
- asr_language: force a specific language for transcription (null = auto-detect)
Access Control
Use allowed_senders to restrict who can interact with the agent. An empty list allows everyone.
{
"type": "telegram",
"allowed_senders": ["123456", "789012"]
}
Each channel type uses its own sender identifier format (Telegram user IDs, email addresses, WeCom user IDs, etc.).
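The access check behaves like a simple membership test. A minimal Python sketch of the documented semantics (the function name is illustrative, not an Octos API):

```python
def sender_allowed(sender_id: str, allowed_senders: list[str]) -> bool:
    """Empty allowed_senders list permits everyone; otherwise the
    sender's channel-specific ID must appear in the list verbatim."""
    return not allowed_senders or sender_id in allowed_senders

# Empty list: everyone is allowed.
assert sender_allowed("999999", [])
# Non-empty list: only listed IDs pass.
assert sender_allowed("123456", ["123456", "789012"])
assert not sender_allowed("999999", ["123456", "789012"])
```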
Cron Jobs
The agent can schedule recurring tasks that deliver messages through any channel:
octos cron list # List active jobs
octos cron list --all # Include disabled jobs
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"
octos cron remove <job-id>
octos cron enable <job-id> # Enable a job
octos cron enable <job-id> --disable # Disable a job
Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.
Message Coalescing
Long responses are automatically split into channel-safe chunks:
| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Split preference: paragraph boundary > newline > sentence end > space > hard cut.
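The split preference above can be sketched as a small Python function (illustrative only; the actual chunker also handles code blocks and markdown, which this sketch ignores):

```python
def split_point(text: str, limit: int) -> int:
    """Pick where to split an over-long message, preferring (in order)
    a paragraph boundary, a newline, a sentence end, a space, and
    finally a hard cut at the channel limit."""
    window = text[:limit]
    for marker in ("\n\n", "\n", ". ", " "):
        idx = window.rfind(marker)
        if idx > 0:
            return idx + len(marker)
    return limit  # hard cut: no natural boundary found

msg = "First paragraph.\n\nSecond paragraph that keeps going."
cut = split_point(msg, 30)
chunk, rest = msg[:cut], msg[cut:]  # splits at the paragraph boundary
```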
Config Hot-Reload
The gateway detects config file changes automatically:
- Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
- Restart required: provider, model, API keys, channel settings
Changes are detected via SHA-256 hashing with debounce.
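The detection step can be sketched with Python's standard hashlib (illustrative only; the debounce timing and file watching are omitted):

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the config file contents."""
    return hashlib.sha256(data).hexdigest()

# A changed digest means the file really changed on disk, so a reload
# is triggered; an identical digest lets spurious FS events be ignored.
before = file_digest(b'{"provider": "deepseek"}')
after = file_digest(b'{"provider": "openai"}')
reload_needed = before != after  # True
```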
Memory & Skills
Octos has a layered memory system and an extensible skill framework. Memory gives the agent persistent context across sessions. Skills give the agent new tools and capabilities.
Bootstrap Files
These files are loaded into the system prompt at startup. Create them with octos init.
| File | Purpose |
|---|---|
| .octos/AGENTS.md | Agent instructions and guidelines |
| .octos/SOUL.md | Personality and values |
| .octos/USER.md | User information and preferences |
| .octos/TOOLS.md | Tool-specific guidance |
| .octos/IDENTITY.md | Custom identity definition |
Bootstrap files are hot-reloaded: edit them and the agent picks up changes without a restart.
Memory System
Octos uses a 3-layer memory architecture that combines automatic recording with agent-driven knowledge management:
┌──────────────────────────────────────────────────────────────────┐
│ System Prompt (every turn)                                       │
│                                                                  │
│ 1. Episodic Memory   <-- top 6 relevant past task experiences    │
│ 2. Memory Context    <-- MEMORY.md + recent 7 days daily notes   │
│ 3. Entity Bank       <-- one-line abstracts of all known entities│
│                                                                  │
│ Tools: save_memory / recall_memory (entity bank CRUD)            │
└──────────────────────────────────────────────────────────────────┘
Layer 1: Episodic Memory (automatic)
Every completed task is automatically recorded as an episode in episodes.redb, a persistent embedded database. Each episode stores:
- Summary: LLM-generated, truncated to 500 chars
- Outcome: Success, Failure, Blocked, or Cancelled
- Files modified: list of file paths touched during the task
- Key decisions: notable choices made during execution
- Working directory: scope for directory-scoped retrieval
At the start of each new task, the agent queries the episode store for up to 6 relevant past experiences using:
- Hybrid search (default when embedding is configured): combines BM25 keyword matching (30% weight) with HNSW vector similarity (70% weight)
- Keyword search (fallback when no embedder): matches query terms against episode summaries, scoped to the same working directory
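With the default weights, the hybrid ranking can be written out as a small Python function (a sketch of the documented 70/30 blend, assuming both scores are pre-normalized to [0, 1]):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Combine the two retrieval signals with the documented defaults:
    70% HNSW vector similarity, 30% BM25 keyword matching."""
    return vector_weight * vector_score + bm25_weight * bm25_score

# An episode with strong semantic similarity but weak keyword overlap
# still ranks highly, because the vector signal dominates:
score = hybrid_score(vector_score=0.9, bm25_score=0.2)  # 0.69
```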
Embedding configuration (in config.json):
{
"embedding": {
"provider": "openai",
"api_key_env": "OPENAI_API_KEY",
"base_url": null
}
}
When configured, the agent embeds each episode summary in a fire-and-forget background task and stores the vector alongside the episode. At query time, the task instruction is embedded and used for vector search. When omitted, the system falls back to BM25-only keyword matching.
Layer 2: Long-Term Memory & Daily Notes (file-based)
Long-term memory (.octos/memory/MEMORY.md) holds persistent facts and notes that survive across all sessions. Edit this file manually or via the write_file tool; it is injected verbatim into the system prompt on every turn.
Daily notes (.octos/memory/YYYY-MM-DD.md) provide a rolling window of recent activity. The last 7 days of daily notes are automatically included in the agent's context. These files can be created manually or via the write_file tool.
Note: Daily notes are read by the system prompt builder but are not auto-populated. You can populate them manually or instruct the agent to write to them using write_file.
Layer 3: Entity Bank (tool-driven)
The entity bank is a structured knowledge store at .octos/memory/bank/entities/. Each entity is a markdown file containing everything the agent knows about a specific topic.
How it works:
- Abstracts in prompt: the first non-heading line of each entity becomes a one-line abstract. All abstracts are injected into the system prompt, giving the agent a compact index of everything it knows.
- Full pages on demand: the agent uses the recall_memory tool to load the full content of a specific entity when it needs more detail.
- Agent-managed: the agent decides when to create and update entities using the save_memory tool.
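The abstract extraction described above can be sketched in a few lines of Python (illustrative, not the actual implementation):

```python
def entity_abstract(markdown: str) -> str:
    """Return the first non-heading, non-blank line of an entity page:
    the one-line abstract that gets injected into the system prompt."""
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""

page = "# octos\nOpen-source AI agent platform.\n\n## Details\n..."
abstract = entity_abstract(page)  # "Open-source AI agent platform."
```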
Memory tools:
- save_memory: create or update an entity page. The agent is instructed to first recall_memory the existing content, then merge new information before saving (no data loss).
- recall_memory: load the full content of a named entity. If the entity doesn't exist, returns a list of all available entities.
Auto-deferral: When the total tool count exceeds 15, memory tools are moved to the group:memory deferred group. The agent must use activate_tools to enable them before saving or recalling.
File Layout
.octos/
├── config.json          # Configuration (versioned, auto-migrated)
├── cron.json            # Cron job store
├── AGENTS.md            # Agent instructions
├── SOUL.md              # Personality
├── USER.md              # User info
├── HEARTBEAT.md         # Background tasks
├── sessions/            # Chat history (JSONL)
├── memory/              # Memory files
│   ├── MEMORY.md        # Long-term memory (manual or write_file)
│   ├── 2025-02-10.md    # Daily note (manual or write_file)
│   └── bank/
│       └── entities/    # Entity bank (managed by save/recall tools)
│           ├── yuechen.md   # Entity: "who is the user"
│           └── octos.md     # Entity: "what is this project"
├── skills/              # Custom skills
├── episodes.redb        # Episodic memory DB (auto-populated)
└── history/
    └── chat_history     # Readline history
Built-in System Skills
Octos bundles 3 system skills at compile time:
| Skill | Description |
|---|---|
| cron | Cron tool usage examples (always-on) |
| skill-store | Skill installation and management |
| skill-creator | Guide for creating custom skills |
Workspace skills in .octos/skills/ override built-in skills with the same name.
Bundled App Skills
Eight app skills ship as compiled binaries alongside Octos. They are automatically bootstrapped into .octos/skills/ on gateway startup; no installation required.
News Fetch
Tool: news_fetch | Always active: Yes
Fetches headlines and full article content from Google News RSS, Hacker News API, Yahoo News, Substack, and Medium. The agent synthesizes raw data into a formatted digest.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| categories | array | all | News categories to fetch |
| language | "zh" / "en" | "zh" | Output language |
Categories: politics, world, business, technology, science, entertainment, health, sports
Configuration:
/config set news_digest.language en
/config set news_digest.hn_top_stories 50
/config set news_digest.max_deep_fetch_total 30
Deep Search
Tool: deep_search | Timeout: 600 seconds
Multi-round web research tool. Performs iterative searches, parallel page crawling, reference chasing, and generates structured reports saved to ./research/<query-slug>/.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Research topic or question |
| depth | 1-3 | 2 | Research depth level |
| max_results | 1-10 | 8 | Results per search round |
| search_engine | string | auto | perplexity, duckduckgo, brave, you |
Depth levels:
- 1 (Quick): single search round, ~1 minute, up to 10 pages
- 2 (Standard): 3 search rounds + reference chasing, ~3 minutes, up to 30 pages
- 3 (Thorough): 5 search rounds + aggressive link chasing, ~5 minutes, up to 50 pages
Deep Crawl
Tool: deep_crawl | Requires: Chrome/Chromium in PATH
Recursively crawls a website using headless Chrome via CDP. Renders JavaScript, follows same-origin links via BFS, extracts clean text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | (required) | Starting URL |
| max_depth | 1-10 | 3 | Maximum link-following depth |
| max_pages | 1-200 | 50 | Maximum pages to crawl |
| path_prefix | string | none | Only follow links under this path |
Output is saved to crawl-<hostname>/ with numbered markdown files.
Configuration:
/config set deep_crawl.page_settle_ms 5000
/config set deep_crawl.max_output_chars 100000
Send Email
Tool: send_email
Sends emails via SMTP or Feishu/Lark Mail API (auto-detected from available environment variables).
| Parameter | Type | Default | Description |
|---|---|---|---|
| to | string | (required) | Recipient email address |
| subject | string | (required) | Email subject |
| body | string | (required) | Email body (plain text or HTML) |
| html | boolean | false | Treat body as HTML |
| attachments | array | none | File attachments (SMTP only) |
SMTP environment variables:
export SMTP_HOST="smtp.gmail.com"
export SMTP_PORT="465"
export SMTP_USERNAME="your-email@gmail.com"
export SMTP_PASSWORD="your-app-password"
export SMTP_FROM="your-email@gmail.com"
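For reference, a minimal Python sketch of what an SMTP send of these parameters involves, using the standard email library (build_email is a hypothetical helper for illustration; the real send_email tool is a compiled skill):

```python
import os
from email.message import EmailMessage

def build_email(to: str, subject: str, body: str, html: bool = False) -> EmailMessage:
    """Assemble the message an SMTP send would transmit,
    mirroring the documented to/subject/body/html parameters."""
    msg = EmailMessage()
    msg["From"] = os.environ.get("SMTP_FROM", "agent@example.com")
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body, subtype="html" if html else "plain")
    return msg

msg = build_email("ops@example.com", "Daily report", "<b>All green</b>", html=True)
# Sending would use smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) and log in
# with SMTP_USERNAME / SMTP_PASSWORD before calling send_message(msg).
```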
Weather
Tools: get_weather, get_forecast | API: Open-Meteo (free, no key required)
| Parameter | Type | Default | Description |
|---|---|---|---|
| city | string | (required) | City name in English |
| days | 1-16 | 7 | Forecast days (forecast only) |
Clock
Tool: get_time
Returns current date, time, day of week, and UTC offset for any IANA timezone.
| Parameter | Type | Default | Description |
|---|---|---|---|
| timezone | string | server local | IANA timezone name (e.g., Asia/Shanghai, US/Eastern) |
Account Manager
Tool: manage_account
Manages sub-accounts under the current profile. Actions: list, create, update, delete, info, start, stop, restart.
Platform Skills (ASR/TTS)
Platform skills provide on-device voice transcription and synthesis. They require the OminiX backend running on Apple Silicon (M1/M2/M3/M4).
Voice Transcription
Tool: voice_transcribe
| Parameter | Type | Default | Description |
|---|---|---|---|
| audio_path | string | (required) | Path to audio file (WAV, OGG, MP3, FLAC, M4A) |
| language | string | "Chinese" | "Chinese", "English", "Japanese", "Korean", "Cantonese" |
Voice Synthesis
Tool: voice_synthesize
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| output_path | string | auto | Output file path |
| language | string | "chinese" | "chinese", "english", "japanese", "korean" |
| speaker | string | "vivian" | Voice preset |
Available voices: vivian, serena, ryan, aiden, eric, dylan (EN/ZH), uncle_fu (ZH only), ono_anna (JA), sohee (KO)
Voice Cloning
Tool: voice_clone_synthesize
Synthesizes speech using a cloned voice from a 3-10 second reference audio sample.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Text to synthesize |
| reference_audio | string | (required) | Path to reference audio |
| language | string | "chinese" | Target language |
Podcast Generation
Tool: generate_podcast
Creates multi-speaker podcast audio from a script of {speaker, voice, text} objects.
Custom Skill Installation
Installing from GitHub
# Install all skills from a repo
octos skills install user/repo
# Install a specific skill
octos skills install user/repo/skill-name
# Install from a specific branch
octos skills install user/repo --branch develop
# Force overwrite existing
octos skills install user/repo --force
# Install into a specific profile
octos skills install user/repo --profile my-bot
The installer tries to download a pre-built binary from the skill registry (SHA-256 verified), falls back to cargo build --release if a Cargo.toml is present, or runs npm install if a package.json is present.
Managing Skills
octos skills list # List installed skills
octos skills info skill-name # Show detailed info
octos skills update skill-name # Update a specific skill
octos skills update all # Update all skills
octos skills remove skill-name # Remove a skill
octos skills search "web scraping" # Search the online registry
Skill Resolution Order
Skills are loaded from these directories (highest priority first):
- .octos/plugins/ (legacy)
- .octos/skills/ (user-installed custom skills)
- .octos/bundled-app-skills/ (bundled app skills)
- .octos/platform-skills/ (platform: ASR/TTS)
- ~/.octos/plugins/ (global legacy)
- ~/.octos/skills/ (global custom)
User-installed skills override bundled skills with the same name.
Skill Authoring
A custom skill lives in .octos/skills/<name>/ and contains:
.octos/skills/my-skill/
βββ SKILL.md # Required: instructions + frontmatter
βββ manifest.json # Required for tool skills: tool definitions
βββ main # Compiled binary (or script)
βββ .source # Auto-generated: tracks install source
SKILL.md Format
---
name: my-skill
version: 1.0.0
author: Your Name
description: A brief description of what this skill does
always: false
requires_bins: curl,jq
requires_env: MY_API_KEY
---
# My Skill Instructions
Instructions for the agent on how and when to use this skill.
## When to Use
- Use this skill when the user asks about...
## Tool Usage
The `my_tool` tool accepts:
- `query` (required): The search query
- `limit` (optional): Maximum results (default: 10)
Frontmatter fields:
| Field | Description |
|---|---|
| name | Skill identifier (must match directory name) |
| version | Semantic version |
| author | Skill author |
| description | Short description |
| always | If true, included in every system prompt. If false, available on demand. |
| requires_bins | Comma-separated binaries checked via which. Skill is unavailable if any are missing. |
| requires_env | Comma-separated environment variables. Skill is unavailable if any are unset. |
manifest.json Format
For skills that provide executable tools:
{
"name": "my-skill",
"version": "1.0.0",
"description": "My custom skill",
"tools": [
{
"name": "my_tool",
"description": "Does something useful",
"timeout_secs": 60,
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
},
"limit": {
"type": "integer",
"description": "Maximum results",
"default": 10
}
},
"required": ["query"]
}
}
],
"entrypoint": "main"
}
The tool binary receives JSON input on stdin and must output JSON on stdout:
// Input (stdin)
{"query": "test", "limit": 5}
// Output (stdout)
{"output": "Results here...", "success": true}
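A complete Python entrypoint for the my_tool example above might look like this (a hypothetical skill, following the documented stdin/stdout JSON protocol):

```python
import json
import sys

def run(params: dict) -> dict:
    """Handle one tool invocation using the manifest's schema:
    'query' is required, 'limit' defaults to 10."""
    query = params["query"]
    limit = params.get("limit", 10)
    results = [f"{query} result {i}" for i in range(1, limit + 1)]
    return {"output": "\n".join(results), "success": True}

def main() -> None:
    # The gateway writes one JSON object to stdin and reads
    # one JSON object back from stdout.
    print(json.dumps(run(json.load(sys.stdin))))

# main() is what the compiled binary (or script) executes when invoked.
```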
Advanced Features
This chapter covers power-user features: tool management, queue modes, lifecycle hooks, sandboxing, session management, and the web dashboard.
Tools & LRU Deferral
Octos manages a large tool catalog by splitting tools into active and deferred sets. Active tools are sent to the LLM as callable tool specifications. Deferred tools are listed by name in the system prompt but not sent as full specs until needed.
How It Works
- Base tools (never evicted): read_file, write_file, shell, glob, grep, list_dir, run_pipeline, deep_search, and others.
- Dynamic tools: tools like save_memory, web_search, recall_memory that are activated on demand and evicted when idle.
- Deferred tools: browser, manage_skills, spawn, configure_tool, switch_model, and others listed by name only.
Eviction Rules
When the active tool count exceeds 15:
- Tools idle for 5+ agent iterations that are not in the base set become candidates.
- The stalest tool is moved to the deferred list first.
Re-activation
When the LLM needs a deferred tool, it calls activate_tools({"tools": [...]}). This resolves the tool name to its group and activates the entire group.
Tool Configuration
Tools can be configured at runtime using the /config slash command. Settings persist in {data_dir}/tool_config.json.
| Tool | Setting | Type | Default | Description |
|---|---|---|---|---|
| news_digest | language | "zh" / "en" | "zh" | Output language for news digests |
| news_digest | hn_top_stories | 5-100 | 30 | Hacker News stories to fetch |
| news_digest | max_rss_items | 5-100 | 30 | Items per RSS feed |
| news_digest | max_deep_fetch_total | 1-50 | 20 | Total articles to deep-fetch |
| news_digest | max_source_chars | 1000-50000 | 12000 | Per-source HTML char limit |
| news_digest | max_article_chars | 1000-50000 | 8000 | Per-article content limit |
| deep_crawl | page_settle_ms | 500-10000 | 3000 | JS render wait time (ms) |
| deep_crawl | max_output_chars | 10000-200000 | 50000 | Output truncation limit |
| web_search | count | 1-10 | 5 | Default number of search results |
| web_fetch | extract_mode | "markdown" / "text" | "markdown" | Content extraction format |
| web_fetch | max_chars | 1000-200000 | 50000 | Content size limit |
| browser | action_timeout_secs | 30-600 | 300 | Per-action timeout |
| browser | idle_timeout_secs | 60-600 | 300 | Idle session timeout |
In-chat config commands:
/config # Show all tool settings
/config web_search # Show web_search settings
/config set web_search.count 10 # Set default result count to 10
/config set news_digest.language en # Switch news digests to English
/config reset web_search.count # Reset to default
Priority order (highest first):
1. Explicit per-call arguments (tool invocation parameters)
2. /config overrides (stored in tool_config.json)
3. Hardcoded defaults
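The resolution order can be sketched as a simple layered lookup (resolve_setting is illustrative, not an Octos API):

```python
def resolve_setting(name: str, call_args: dict, overrides: dict, defaults: dict):
    """Resolve a tool setting with the documented precedence:
    explicit per-call argument > /config override > hardcoded default."""
    if name in call_args:
        return call_args[name]
    if name in overrides:
        return overrides[name]
    return defaults[name]

defaults = {"count": 5}                       # hardcoded default
overrides = {"count": 10}                     # /config set web_search.count 10
value = resolve_setting("count", {}, overrides, defaults)             # 10
value2 = resolve_setting("count", {"count": 3}, overrides, defaults)  # 3
```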
Tool Policies
Tool policies control which tools the agent can use. They can be set globally, per-provider, or per-context.
Global Policy
{
"tool_policy": {
"allow": ["group:fs", "group:search", "web_search"],
"deny": ["shell", "spawn"]
}
}
- allow: if non-empty, only these tools are permitted. If empty, all tools are allowed.
- deny: these tools are always blocked. Deny wins over allow.
Named Groups
| Group | Expands To |
|---|---|
| group:fs | read_file, write_file, edit_file, diff_edit |
| group:runtime | shell |
| group:web | web_search, web_fetch, browser |
| group:search | glob, grep, list_dir |
| group:sessions | spawn |
Additional tools not in named groups: send_file, switch_model, run_pipeline, configure_tool, cron, message.
Wildcard Matching
A trailing * matches any tool name with that prefix:
{
"tool_policy": {
"deny": ["web_*"]
}
}
This denies web_search, web_fetch, etc.
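Putting the policy rules together, here is a Python sketch of the documented semantics (deny wins over allow, groups expand to their members, a trailing * matches by prefix, and an empty allow list permits everything; tool_allowed is illustrative, not the actual implementation):

```python
from fnmatch import fnmatch

# A subset of the named groups, for illustration.
GROUPS = {
    "group:fs": ["read_file", "write_file", "edit_file", "diff_edit"],
    "group:web": ["web_search", "web_fetch", "browser"],
}

def expand(patterns: list[str]) -> list[str]:
    """Expand named groups into their member tools; pass others through."""
    out: list[str] = []
    for p in patterns:
        out.extend(GROUPS.get(p, [p]))
    return out

def tool_allowed(tool: str, allow: list[str], deny: list[str]) -> bool:
    if any(fnmatch(tool, pat) for pat in expand(deny)):
        return False                       # deny always wins
    allowed = expand(allow)
    return not allowed or any(fnmatch(tool, pat) for pat in allowed)

assert not tool_allowed("web_search", allow=[], deny=["web_*"])
assert tool_allowed("read_file", allow=["group:fs"], deny=["shell"])
assert not tool_allowed("shell", allow=[], deny=["shell"])
```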
Per-Provider Policies
Different tool sets for different LLM models:
{
"tool_policy_by_provider": {
"openai/gpt-4o-mini": {
"deny": ["shell", "write_file"]
},
"gemini": {
"deny": ["diff_edit"]
}
}
}
Queue Modes
Queue modes control how incoming user messages are handled while the agent is busy processing a previous request. Set via /queue <mode> in chat, or queue_mode in profile config.
Followup (default)
Sequential processing. Each message waits its turn.
- Agent processes A, finishes, processes B, finishes, processes C.
- Simple and predictable.
- The user is blocked until the current request completes.
Collect
Batch queued messages into a single combined prompt.
- Agent processes A. User sends B, then C.
- When A finishes, B and C are merged into one prompt: B\n---\nQueued #1: C
- One LLM call for the batch.
- Good for users who send thoughts in multiple short messages (common in chat apps).
Steer
Keep only the newest queued message, discard older ones.
- Agent processes A. User sends B, then C.
- When A finishes, B is discarded; only C is processed.
- Good when the user corrects or refines their question mid-flight.
- Example: "search for X" then "actually search for Y"; only Y is processed.
Interrupt
Keep only the newest queued message and cancel the running agent.
- Agent processes A. User sends B, then C.
- A is cancelled, B is discarded, C is processed immediately.
- Fastest response to course-correction.
- Use when responsiveness matters more than completing the current task.
Note: Currently, Interrupt and Steer share the same drain-and-discard behavior. There is no in-flight agent cancellation; the running agent completes before the newest message is processed. True mid-flight cancellation is planned.
Speculative
Spawn concurrent overflow agents for each new message while the primary runs.
- Agent processes A. User sends B, then C.
- B and C each get their own concurrent agent task (overflow).
- All three run in parallel; no blocking.
- Best for slow LLM providers where users do not want to wait.
- Overflow agents use a snapshot of conversation history from before the primary started.
How overflow works
- Primary agent is spawned for the first message.
- While the primary runs, new messages arrive in the inbox.
- Each new message triggers serve_overflow(), spawning a full agent task with its own streaming bubble.
- Overflow agents use the history snapshot from before the primary to avoid re-answering the primary question.
- All agents run concurrently and save results to session history.
Known limitations
- Interactive prompts break in overflow: if the LLM asks a follow-up question and returns EndTurn, the overflow agent exits. The user's reply spawns a new overflow with no context of the question.
- Short replies misrouted: a "yes" or "2" intended as a continuation may be treated as an independent new query.
Auto-Escalation
The session actor can auto-escalate from Followup to Speculative when sustained latency degradation is detected:
- ResponsivenessObserver learns a median baseline from the first 5 requests (robust to outliers), then tracks LLM response times in a 20-sample rolling window. The baseline adapts every 20 samples via an 80/20 EMA blend with the current window median, so gradual drift is tracked.
- If 3 consecutive responses exceed 3x the baseline latency, Speculative queue mode and Hedge racing are auto-activated simultaneously.
- A user notification is sent: "Detected slow responses. Enabling hedge racing + speculative queue."
- When the provider recovers (one normal-latency response), both revert to Followup and static routing.
- Auto-escalation also triggers on the API channel (web client), which always uses the speculative processing path.
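A toy model of the escalation rule (median baseline from the first 5 requests, escalate after 3 consecutive responses slower than 3x baseline; the rolling window and EMA blending of the real observer are omitted):

```python
from statistics import median

class ResponsivenessSketch:
    """Illustrative sketch of the documented escalation rule,
    not the actual ResponsivenessObserver implementation."""

    def __init__(self) -> None:
        self.samples: list[float] = []
        self.baseline: float | None = None
        self.slow_streak = 0

    def record(self, latency_ms: float) -> bool:
        """Record one LLM response latency; return True when
        speculative queue mode + hedge racing should activate."""
        if self.baseline is None:
            self.samples.append(latency_ms)
            if len(self.samples) == 5:
                self.baseline = median(self.samples)
            return False
        if latency_ms > 3 * self.baseline:
            self.slow_streak += 1
        else:
            self.slow_streak = 0   # one normal response recovers
        return self.slow_streak >= 3

obs = ResponsivenessSketch()
for ms in [900, 1000, 1100, 1000, 950]:    # baseline = 1000 ms
    obs.record(ms)
escalations = [obs.record(ms) for ms in [4000, 4200, 5000]]
# escalations -> [False, False, True]
```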
Queue Commands
/queue -- show current mode
/queue followup -- sequential processing
/queue collect -- batch queued messages
/queue steer -- keep newest only
/queue interrupt -- cancel current + keep newest
/queue speculative -- concurrent overflow agents
Hooks
Hooks are the primary extension point for enforcing LLM policies, recording metrics, and auditing agent behavior, per profile and without modifying core code.
Hooks are shell commands that run at agent lifecycle events. Each hook receives a JSON payload on stdin and communicates its decision via exit code.
Exit Codes
| Exit Code | Meaning | Before-events | After-events |
|---|---|---|---|
| 0 | Allow | Operation proceeds | Success logged |
| 1 | Deny | Operation blocked (reason on stdout) | Treated as error |
| 2+ | Error | Logged, operation proceeds | Logged |
Events
Four lifecycle events, each with a specific payload:
before_tool_call
Fires before each tool execution. Can deny (exit 1).
{
"event": "before_tool_call",
"tool_name": "shell",
"arguments": {"command": "ls -la"},
"tool_id": "call_abc123",
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
after_tool_call
Fires after each tool execution. Observe-only.
{
"event": "after_tool_call",
"tool_name": "shell",
"tool_id": "call_abc123",
"result": "file1.txt\nfile2.txt\n...",
"success": true,
"duration_ms": 142,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
Note: result is truncated to 500 characters.
before_llm_call
Fires before each LLM API call. Can deny (exit 1).
{
"event": "before_llm_call",
"model": "deepseek-chat",
"message_count": 12,
"iteration": 3,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
after_llm_call
Fires after each successful LLM response. Observe-only.
{
"event": "after_llm_call",
"model": "deepseek-chat",
"iteration": 3,
"stop_reason": "EndTurn",
"has_tool_calls": false,
"input_tokens": 1200,
"output_tokens": 350,
"provider_name": "deepseek",
"latency_ms": 2340,
"cumulative_input_tokens": 5600,
"cumulative_output_tokens": 1800,
"session_cost": 0.0042,
"response_cost": 0.0012,
"session_id": "telegram:12345",
"profile_id": "my-bot"
}
Hook Configuration
In config.json or per-profile JSON:
{
"hooks": [
{
"event": "before_tool_call",
"command": ["python3", "~/.octos/hooks/guard.py"],
"timeout_ms": 3000,
"tool_filter": ["shell", "write_file"]
},
{
"event": "after_llm_call",
"command": ["python3", "~/.octos/hooks/cost-tracker.py"],
"timeout_ms": 5000
}
]
}
| Field | Required | Default | Description |
|---|---|---|---|
| event | yes | - | One of the 4 event types |
| command | yes | - | Argv array (no shell interpretation) |
| timeout_ms | no | 5000 | Kill hook process after this timeout |
| tool_filter | no | all | Only trigger for these tool names (tool events only) |
Multiple hooks can be registered for the same event. They run sequentially; the first deny wins.
Circuit Breaker
Hooks are auto-disabled after 3 consecutive failures (timeout, crash, or exit code 2+). A successful execution (exit 0 or deny exit 1) resets the counter.
Security
- Commands use argv arrays; no shell interpretation.
- 18 dangerous environment variables are removed (LD_PRELOAD, DYLD_*, NODE_OPTIONS, etc.).
- Tilde expansion is supported (~/ and ~username/).
Per-Profile Hooks
Each profile can define its own hooks via the hooks field in profile config. This allows different policy enforcement per channel or bot. Hook changes require a gateway restart.
Backward Compatibility
- New fields may be added to payloads.
- Existing fields will never be removed or renamed.
- Hook scripts should ignore unknown fields (standard JSON practice).
Example: Cost Budget Enforcer
#!/usr/bin/env python3
"""Deny LLM calls when session cost exceeds $1.00."""
import json, sys
payload = json.load(sys.stdin)
if payload.get("event") == "before_llm_call":
try:
with open("/tmp/octos-cost.json") as f:
state = json.load(f)
except FileNotFoundError:
state = {}
sid = payload.get("session_id", "default")
if state.get(sid, 0) > 1.0:
print(f"Session cost exceeded $1.00 (${state[sid]:.4f})")
sys.exit(1)
elif payload.get("event") == "after_llm_call":
cost = payload.get("session_cost")
if cost is not None:
sid = payload.get("session_id", "default")
try:
with open("/tmp/octos-cost.json") as f:
state = json.load(f)
except FileNotFoundError:
state = {}
state[sid] = cost
with open("/tmp/octos-cost.json", "w") as f:
json.dump(state, f)
sys.exit(0)
Example: Audit Logger
#!/usr/bin/env python3
"""Log all tool and LLM calls to a JSONL file."""
import json, sys, datetime
payload = json.load(sys.stdin)
payload["timestamp"] = datetime.datetime.utcnow().isoformat()
with open("/var/log/octos-audit.jsonl", "a") as f:
f.write(json.dumps(payload) + "\n")
sys.exit(0)
Sandbox
Shell commands run inside a sandbox for isolation. Three backends are supported:
| Backend | Platform | Isolation | Network Control |
|---|---|---|---|
| bwrap | Linux | RO bind /usr,/lib,/bin,/sbin,/etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if network denied |
| macOS | macOS | sandbox-exec with SBPL profile: process-exec/fork, file-read*, writes to workdir + /private/tmp | (allow network*) or (deny network*) |
| Docker | Any | --rm --security-opt no-new-privileges --cap-drop ALL | --network none if network denied |
Configure in config.json:
{
"sandbox": {
"enabled": true,
"mode": "auto",
"allow_network": false,
"docker": {
"image": "alpine:3.21",
"mount_mode": "rw",
"cpu_limit": "1.0",
"memory_limit": "512m",
"pids_limit": 100
}
}
}
- Modes: auto (detect best available), bwrap, macos, docker, none.
- Mount modes: rw (read-write), ro (read-only), none (no workspace mount).
- Docker resource limits: --cpus, --memory, --pids-limit.
- Docker bind mount safety: docker.sock, /proc, /sys, /dev, and /etc are blocked as bind mount sources.
- Path validation: Docker rejects :, \0, \n, \r; macOS rejects control chars, (, ), \, ".
- Environment sanitization: 18 dangerous environment variables are automatically cleared in all sandbox backends, MCP server spawning, hooks, and the browser tool: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR.
- Process cleanup: the shell tool sends SIGTERM, waits a grace period, then SIGKILLs child processes on timeout.
Session Management
Session Forking
Send /new to create a branched conversation:
/new
This creates a new session that copies the last 10 messages from the current conversation. The child session has a parent_key reference to the original. Each fork gets a unique key namespaced by sender and timestamp.
Session Persistence
Each channel:chat_id pair maintains its own session (conversation history).
- Storage: JSONL files in .octos/sessions/
- Max history: configurable via gateway.max_history (default: 50 messages)
- Session forking: /new creates a branched conversation with parent_key tracking
Config Hot-Reload
The gateway automatically detects config file changes:
- Hot-reloaded (no restart): system prompt, AGENTS.md, SOUL.md, USER.md
- Restart required: provider, model, API keys, gateway channels
Changes are detected via SHA-256 hashing with debounce.
Message Coalescing
Long responses are automatically split into channel-safe chunks before sending:
| Channel | Max chars per message |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Split preference: paragraph boundary > newline > sentence end > space > hard cut. Messages exceeding 50 chunks are truncated with a marker.
Context Compaction
When the conversation exceeds the LLM's context window, older messages are automatically compacted:
- Tool arguments are stripped (replaced with "[stripped]")
- Messages are summarized to first lines
- Recent tool call/result pairs are preserved intact
- The agent continues seamlessly without losing critical context
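A sketch of what compaction does to one older message (the message shape here is hypothetical; the internal representation may differ):

```python
def compact_message(msg: dict, keep_recent: bool) -> dict:
    """Illustrate the documented compaction: recent tool call/result
    pairs stay intact; older messages have tool arguments replaced
    with "[stripped]" and content reduced to its first line."""
    if keep_recent:
        return msg
    out = dict(msg)
    if "tool_arguments" in out:
        out["tool_arguments"] = "[stripped]"
    if "content" in out:
        out["content"] = out["content"].splitlines()[0] if out["content"] else ""
    return out

old = {"content": "long answer\nwith details", "tool_arguments": '{"cmd": "ls"}'}
compacted = compact_message(old, keep_recent=False)
# {"content": "long answer", "tool_arguments": "[stripped]"}
```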
In-Chat Commands
Slash Commands
| Command | Description |
|---|---|
| /new | Fork the conversation (creates a new session copying the last 10 messages) |
| /config | View and modify tool configuration |
| /queue | View or change queue mode |
| /exit, /quit, :q | Exit chat (CLI mode only) |
In-Chat Provider Switching
The switch_model tool allows users to list available LLM providers and switch models at runtime through natural conversation. This tool is only available in gateway mode.
List available providers:
User: What models are available?
Bot: Current model: deepseek/deepseek-chat
Available providers:
- anthropic (default: claude-sonnet-4-20250514) [ready]
- openai (default: gpt-4o) [ready]
- deepseek (default: deepseek-chat) [ready]
- gemini (default: gemini-2.0-flash) [ready]
...
Switch models:
User: Switch to GPT-4o
Bot: Switched to openai/gpt-4o.
Previous model (deepseek/deepseek-chat) is kept as fallback.
When you switch models, the previous model automatically becomes a fallback:
- If the new model fails (rate limit, server error), requests automatically fall back to the original model.
- The fallback uses the circuit breaker (3 consecutive failures triggers failover).
- The chain is always flat: `[new_model, original_model]` — repeated switches do not nest.
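The flat-chain rule can be illustrated with a small sketch (a hypothetical helper, not the real implementation):

```python
def switch_model(chain: list[str], new_model: str) -> list[str]:
    """Build the fallback chain after a switch. The chain stays flat:
    [new_model, original_model] -- repeated switches never nest."""
    original = chain[-1]  # the model the profile started with
    if new_model == original:
        return [original]  # switching back: no fallback needed
    return [new_model, original]
```

However many times you switch, the fallback is always the profile's original model, never an intermediate choice.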
Model switches are persisted to the profile JSON file. On gateway restart, the bot starts with the last-selected model.
Memory System
The agent maintains long-term memory across sessions:
- `MEMORY.md` — Persistent notes, always loaded into context
- Daily notes — `.octos/memory/YYYY-MM-DD.md`, auto-created
- Recent memory — Last 7 days of daily notes included in context
- Episodes — Task completion summaries stored in `episodes.redb`
Hybrid Memory Search
Memory search combines BM25 (keyword) and vector (semantic) scoring:
- Ranking: `vector_weight * vector_score + bm25_weight * bm25_score` (defaults: 0.7 / 0.3)
- Index: HNSW with L2-normalized embeddings
- Fallback: BM25-only when no embedding provider is configured
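The ranking formula as a runnable sketch (the weights are the documented defaults; the `rank` helper is illustrative):

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Combined relevance score with the default 0.7 / 0.3 weights."""
    return vector_weight * vector_score + bm25_weight * bm25_score

def rank(candidates):
    """Order memory candidates by combined score, best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c["vec"], c["bm25"]),
                  reverse=True)
```

Note that a strong semantic match can outrank an exact keyword hit, since vector similarity carries more than twice the weight by default.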
Configure an embedding provider to enable vector search:
{
  "embedding": {
    "provider": "openai"
  }
}
The embedding config supports three fields: provider (default: "openai"), api_key_env (optional override), and base_url (optional custom endpoint).
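A fully specified example overriding both optional fields (the values shown are placeholders, not defaults the docs guarantee):

```json
{
  "embedding": {
    "provider": "openai",
    "api_key_env": "MY_EMBEDDING_KEY",
    "base_url": "https://my-proxy.example.com/v1"
  }
}
```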
Cron Jobs (Scheduled Tasks)
The agent can schedule recurring tasks using the cron tool:
User: Schedule a daily news digest at 8am Beijing time
Bot: Created cron job "daily-news" running at 8:00 AM Asia/Shanghai every day.
Expression: 0 0 8 * * * *
Cron jobs can also be managed via CLI:
octos cron list # List active jobs
octos cron list --all # Include disabled
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron remove <job-id>
octos cron enable <job-id>
octos cron enable <job-id> --disable
Web Dashboard
The REST API server includes an embedded web UI:
octos serve # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000 # Accept external connections
# Open http://localhost:8080
Features:
- Session sidebar
- Chat interface
- SSE streaming
- Dark theme
A /metrics endpoint provides Prometheus-format metrics:
- `octos_tool_calls_total`
- `octos_tool_call_duration_seconds`
- `octos_llm_tokens_total`
Operations
This chapter covers day-to-day operational tasks: upgrading, credential management, and service management.
Upgrading
Pull the latest source and rebuild:
cd octos
git pull origin main
./scripts/local-deploy.sh --full # Rebuilds and reinstalls
If running as a service, restart it after the upgrade:
# macOS (launchd):
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Linux (systemd):
systemctl --user restart octos-serve
Keychain Integration
Octos supports storing API keys in the macOS Keychain instead of plaintext in profile JSON files. This provides hardware-backed encryption on Apple Silicon and OS-level access control.
Architecture
+------------------------------+
octos auth set-key | macOS Keychain |
-----------------> | (AES encrypted, per-user) |
| |
| service: "octos" |
| account: "OPENAI_API_KEY" |
| password: "sk-proj-abc..." |
+---------------+--------------+
| get_password()
Profile JSON |
+------------------+ v
| env_vars: { | resolve_env_vars()
| "OPENAI_API_ | if "keychain:" ->
| KEY": | lookup from Keychain
| "keychain:" | else -> use literal
| } |
+------------------+ |
v
Gateway process
Resolution chain: "keychain:" marker in profile config triggers a Keychain lookup (3-second timeout). If the Keychain is unavailable, the key is skipped with a warning.
Backward compatible: Literal values in env_vars pass through unchanged. No migration is required — adopt keychain per-key at your own pace. Mixed plaintext and keychain entries are fully supported.
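The resolution behavior described above can be approximated in a few lines (illustrative Python; the real implementation is Rust and applies a 3-second timeout to each Keychain lookup):

```python
def resolve_env_vars(env_vars: dict, keychain_lookup) -> dict:
    """Resolve profile env_vars: a "keychain:" marker triggers a Keychain
    lookup; literal values pass through unchanged; a marker with no stored
    secret is skipped (the real code also logs a warning)."""
    resolved = {}
    for name, value in env_vars.items():
        if value == "keychain:":
            secret = keychain_lookup(name)
            if secret is not None:
                resolved[name] = secret
            # else: skipped with a warning
        else:
            resolved[name] = value
    return resolved
```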
CLI Commands
# Unlock keychain for SSH sessions (required before set-key via SSH)
octos auth unlock --password <login-password>
octos auth unlock # interactive prompt
# Store a key in Keychain + update profile to use keychain marker
octos auth set-key OPENAI_API_KEY sk-proj-abc123
octos auth set-key OPENAI_API_KEY # interactive prompt
# With specific profile
octos auth set-key GEMINI_API_KEY AIzaSy... -p my-profile
# List all keys and their storage status
octos auth keys
octos auth keys -p my-profile
# Remove from Keychain + clean up profile
octos auth remove-key OPENAI_API_KEY
Keychain Entry Format
- Service: `octos` (constant for all entries)
- Account: The environment variable name (e.g., `OPENAI_API_KEY`)
- Password: The actual secret value
Verify with:
security find-generic-password -s octos -a OPENAI_API_KEY -w
SSH and Headless Server Setup
The macOS Keychain is tied to the GUI login session. SSH sessions cannot access a locked keychain — macOS tries to show a dialog, which hangs on a headless server.
Why SSH fails by default: macOS securityd unlocks the keychain per-session. The GUI session's unlock does not automatically propagate to SSH sessions.
Solution: Unlock the keychain and disable auto-lock. Run once per boot (or add to your deploy script):
ssh user@<host>
# Unlock the keychain (requires login password)
octos auth unlock --password <login-password>
# That's it -- auto-lock is disabled automatically.
# The keychain stays unlocked until reboot.
# Auto-login will re-unlock it on reboot.
Or with raw security commands:
# Unlock
security unlock-keychain -p '<password>' ~/Library/Keychains/login.keychain-db
# Disable auto-lock timer (so it doesn't re-lock after idle)
security set-keychain-settings ~/Library/Keychains/login.keychain-db
Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| "User interaction is not allowed" | Keychain locked (SSH session) | octos auth unlock --password <pw> |
| Keychain lookup timed out (3s) | Keychain locked (LaunchAgent) | Enable auto-login, reboot |
| "keychain marker found but no secret" | Key never stored or wrong keychain | Re-run octos auth set-key after unlock |
| Gateway hangs at startup | Keychain lookup blocking | Update to latest octos binary |
Security Comparison
| Threat | Plaintext JSON | Keychain |
|---|---|---|
| File stolen (backup, git, scp) | All keys exposed | Only "keychain:" markers visible |
| Malware reads disk | Simple file read exposes keys | Must bypass OS Keychain ACL |
| Other user on machine | File permissions help, root can read | Encrypted per-user |
| Process memory dump | Keys in env vars | Keys only briefly in memory |
| Accidental log output | Profile JSON leaks keys | Only reference strings logged |
Server Deployment Recommendations
The macOS Keychain was designed for interactive desktop use. On headless servers, it introduces reliability issues. Choose your credential storage based on deployment type:
| Deployment | Recommended Storage | Reason |
|---|---|---|
| Developer laptop | Keychain ("keychain:") | GUI session keeps keychain unlocked; ACL prompts are fine |
| Mac with auto-login + GUI | Keychain ("keychain:") | Works if ACL dialogs were approved once via screen sharing |
| Headless Mac (SSH only) | Plain text in env_vars or launchd plist | Most reliable; no unlock/ACL dependencies |
| Linux server | Plain text in env vars | No macOS Keychain available |
Why Keychain is unreliable on headless servers:
- Requires the macOS login password — To unlock the keychain via SSH, you need the user's login password stored somewhere, reducing the security benefit.
- Re-locks on reboot/sleep — The LaunchAgent that starts `octos serve` runs before GUI login, so the keychain is locked at that point.
- Re-locks after idle timeout — Even after unlock, macOS may re-lock. The `set-keychain-settings` workaround can be reset by macOS updates.
- ACL prompts block headless access — If the binary was not the one that originally stored the secret, macOS may pop an unanswerable GUI dialog.
- Session isolation — Unlocking from SSH does not unlock for the LaunchAgent session, and vice versa.
Plain text setup for servers:
{
  "env_vars": {
    "OPENAI_API_KEY": "sk-proj-abc123",
    "SMTP_PASSWORD": "xxxx xxxx xxxx xxxx",
    "SMTP_HOST": "smtp.gmail.com",
    "SMTP_PORT": "587",
    "SMTP_USERNAME": "user@gmail.com",
    "SMTP_FROM": "user@gmail.com"
  }
}
Protect the files with filesystem permissions:
chmod 600 ~/.octos/profiles/*.json
chmod 600 ~/Library/LaunchAgents/io.octos.octos-serve.plist
Service Management
macOS (launchd)
Create a LaunchAgent plist to run octos as a persistent service:
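A minimal plist sketch (paths assume a `cargo install` layout under your home directory; adjust to your machine):

```xml
<!-- ~/Library/LaunchAgents/io.octos.octos-serve.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>io.octos.octos-serve</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/you/.cargo/bin/octos</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/you/.octos/serve.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/you/.octos/serve.log</string>
</dict>
</plist>
```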
# Load the service
launchctl load ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Unload the service
launchctl unload ~/Library/LaunchAgents/io.octos.octos-serve.plist
# Check status
launchctl list | grep octos
If the service needs environment variables (e.g., SMTP credentials), add them to the plist:
<key>EnvironmentVariables</key>
<dict>
    <key>SMTP_PASSWORD</key>
    <string>xxxx xxxx xxxx xxxx</string>
</dict>
Check logs at ~/.octos/serve.log.
Linux (systemd)
Manage the service with systemd user units:
# Start / stop / restart
systemctl --user start octos-serve
systemctl --user stop octos-serve
systemctl --user restart octos-serve
# Enable on boot
systemctl --user enable octos-serve
# Check status and logs
systemctl --user status octos-serve
journalctl --user -u octos-serve
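A minimal unit file sketch to pair with these commands (paths and environment entries are illustrative; `%h` expands to your home directory):

```ini
# ~/.config/systemd/user/octos-serve.service
[Unit]
Description=Octos web UI and REST API
After=network-online.target

[Service]
ExecStart=%h/.cargo/bin/octos serve
Restart=on-failure
Environment=OPENAI_API_KEY=sk-proj-abc123

[Install]
WantedBy=default.target
```

After creating the file, run `systemctl --user daemon-reload` before starting the service.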
Troubleshooting
This chapter covers common issues organized by category, along with environment variable reference.
API & Provider Issues
API Key Not Set
Error: ANTHROPIC_API_KEY environment variable not set
Fix: Export the key in your shell or verify with octos status:
export ANTHROPIC_API_KEY="your-key"
If running as a service, ensure the environment variable is set in the service environment (launchd plist or systemd unit), not just your interactive shell.
Rate Limited (429)
The retry mechanism handles this automatically (3 attempts with exponential backoff). If the error persists:
- Try switching to a different provider via `/queue` or in-chat model switching.
- Wait for the rate limit window to reset.
Debug Logging
Enable detailed logs to diagnose issues:
RUST_LOG=debug octos chat
RUST_LOG=octos_agent=trace octos chat --message "task"
Build Issues
| Problem | Solution |
|---|---|
| Build fails on Linux | Install build dependencies: sudo apt install build-essential pkg-config libssl-dev |
| macOS codesign warning | Sign the binary: codesign -s - ~/.cargo/bin/octos |
| octos: command not found | Add cargo bin to PATH: export PATH="$HOME/.cargo/bin:$PATH" |
Channel-Specific Issues
Lark / Feishu
| Issue | Solution |
|---|---|
| 404 on WebSocket endpoint | Larksuite international does not support WebSocket mode. Use "mode": "webhook" in your config |
| Challenge verification fails | Ensure your tunnel (e.g., ngrok) is running and the URL matches the one configured in the Lark console |
| No events received | Publish the app version after adding events. Check Event Log Retrieval in the console |
| Bot does not reply | Check that the im:message:send_as_bot permission is granted |
| Markdown not rendering | Messages are sent as interactive cards; Lark supports a subset of markdown |
| Tunnel URL changed | Free tunnel URLs change on restart. Update the request URL in the Lark console |
WeCom / WeChat
"Environment variable WECOM_BOT_SECRET not set"
Set the secret before starting the gateway:
export WECOM_BOT_SECRET="your_secret"
Connection drops or fails to subscribe
- Verify `bot_id` and the secret are correct.
- Check network connectivity to `wss://openws.work.weixin.qq.com`.
- The channel auto-reconnects up to 100 times with exponential backoff. Check logs for error details.
Messages not arriving
- Confirm the upstream relay service is running and linked to your account.
- Check that the WeCom group robot is the same one configured in octos.
- If using `allowed_senders`, verify the sender's WeCom user ID is in the list.
- Check for duplicate message filtering — the channel deduplicates the last 1000 message IDs.
Long messages are truncated
Messages over 4096 characters are automatically split into multiple chunks by octos. If further truncation occurs, check the relay service's own message length settings.
Platform-Specific Issues
| Problem | Solution |
|---|---|
| Dashboard not accessible | Check port: octos serve --port 8080, open http://localhost:8080/admin/ |
| WSL2 port not forwarded | Restart WSL: wsl --shutdown then reopen terminal |
| Service will not start | Check logs: tail -f ~/.octos/serve.log (macOS) or journalctl --user -u octos-serve (Linux) |
| Windows: octos not found | Ensure %USERPROFILE%\.cargo\bin is in your PATH |
| Windows: shell commands fail | Commands run via cmd /C; use Windows-compatible syntax |
Environment Variables Reference
| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GEMINI_API_KEY | Gemini API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| GROQ_API_KEY | Groq API key |
| MOONSHOT_API_KEY | Moonshot API key |
| DASHSCOPE_API_KEY | DashScope API key |
| MINIMAX_API_KEY | MiniMax API key |
| ZHIPU_API_KEY | Zhipu API key |
| ZAI_API_KEY | Z.AI API key |
| NVIDIA_API_KEY | Nvidia NIM API key |
| OMINIX_API_URL | Local ASR/TTS API URL |
| RUST_LOG | Log level (error / warn / info / debug / trace) |
| TELEGRAM_BOT_TOKEN | Telegram bot token |
| DISCORD_BOT_TOKEN | Discord bot token |
| SLACK_BOT_TOKEN | Slack bot token |
| SLACK_APP_TOKEN | Slack app-level token |
| FEISHU_APP_ID | Feishu app ID |
| FEISHU_APP_SECRET | Feishu app secret |
| EMAIL_USERNAME | Email account username |
| EMAIL_PASSWORD | Email account password |
| WECOM_CORP_ID | WeCom corp ID |
| WECOM_AGENT_SECRET | WeCom agent secret |
CLI Reference
octos chat
Interactive multi-turn conversation with readline history.
octos chat [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--config <PATH> Config file path
--provider <NAME> LLM provider
--model <NAME> Model name
--base-url <URL> Custom API endpoint
-m, --message <MSG> Single message (non-interactive)
--max-iterations <N> Max tool iterations per message (default: 50)
-v, --verbose Show tool outputs
--no-retry Disable retry
Features:
- Arrow keys and line editing (rustyline)
- Persistent history at `.octos/history/chat_history`
- Exit: `/exit`, `/quit`, `exit`, `quit`, `:q`, Ctrl+C, Ctrl+D
- Full tool access (shell, files, search, web)
Examples:
octos chat # Interactive (default)
octos chat --provider deepseek # Use DeepSeek
octos chat --model glm-4-plus # Auto-detects Zhipu
octos chat --message "Fix auth bug" # Single message, exit
octos gateway
Run as a persistent multi-channel daemon.
octos gateway [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--config <PATH> Config file path
--provider <NAME> Override provider
--model <NAME> Override model
--base-url <URL> Override API endpoint
-v, --verbose Verbose logging
--no-retry Disable retry
Requires a gateway section in config with a channels array. Runs continuously until Ctrl+C.
octos init
Initialize workspace with config and bootstrap files.
octos init [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
--defaults Skip prompts, use defaults
Creates:
- `.octos/config.json` — Provider/model config
- `.octos/.gitignore` — Ignores state files
- `.octos/AGENTS.md` — Agent instructions template
- `.octos/SOUL.md` — Personality template
- `.octos/USER.md` — User info template
- `.octos/memory/` — Memory storage directory
- `.octos/sessions/` — Session history directory
- `.octos/skills/` — Custom skills directory
octos status
Show system status.
octos status [OPTIONS]
Options:
-c, --cwd <PATH> Working directory
Example output:
octos Status
──────────────────────────────────────────────────
Config: .octos/config.json (found)
Workspace: .octos/ (found)
Provider: anthropic
Model: claude-sonnet-4-20250514
API Keys
──────────────────────────────────────────────────
Anthropic ANTHROPIC_API_KEY set
OpenAI OPENAI_API_KEY not set
...
Bootstrap Files
──────────────────────────────────────────────────
AGENTS.md found
SOUL.md found
USER.md found
TOOLS.md missing
IDENTITY.md missing
octos serve
Launch the web UI and REST API server. Requires the api feature flag.
cargo install --path crates/octos-cli --features api
octos serve # Binds to 127.0.0.1:8080
octos serve --host 0.0.0.0 --port 3000 # Accept external connections
Features: session sidebar, chat interface, SSE streaming, dark theme. A /metrics endpoint provides Prometheus-format metrics (octos_tool_calls_total, octos_tool_call_duration_seconds, octos_llm_tokens_total).
octos clean
Clean database and state files.
octos clean [--all] [--dry-run]
| Flag | Description |
|---|---|
| --all | Remove all state files |
| --dry-run | Show what would be removed without deleting |
octos completions
Generate shell completions.
octos completions <shell>
Supported shells: bash, zsh, fish, powershell.
octos cron
Manage scheduled jobs.
octos cron list [--all] # List active jobs (--all includes disabled)
octos cron add [OPTIONS] # Add a cron job
octos cron remove <job-id> # Remove a cron job
octos cron enable <job-id> # Enable a cron job
octos cron enable <job-id> --disable # Disable a cron job
Adding jobs:
octos cron add --name "report" --message "Generate daily report" --cron "0 0 9 * * * *"
octos cron add --name "check" --message "Check status" --every 3600
octos cron add --name "once" --message "Run migration" --at "2025-03-01T09:00:00Z"
Cron expressions use standard syntax. Jobs support an optional timezone field with IANA timezone names (e.g., "America/New_York", "Asia/Shanghai"). When omitted, UTC is used.
octos channels
Manage messaging channels.
octos channels status # Show channel compile/config status
octos channels login # WhatsApp QR code login
The status command shows a table with channel name, compile status (feature flags), and config summary (env vars set/missing).
octos office
Office file manipulation (DOCX/PPTX/XLSX). Native Rust implementation with no external dependencies for basic operations.
octos office extract <file> # Extract text as Markdown
octos office unpack <file> <output-dir> # Unpack into pretty-printed XML
octos office pack <input-dir> <output> # Pack directory into Office file
octos office clean <dir> # Remove orphaned files from unpacked PPTX
octos account
Manage sub-accounts under profiles. Sub-accounts inherit LLM provider config but have their own data directory (memory, sessions, skills) and channels.
octos account list --profile <id> # List sub-accounts
octos account create --profile <id> <name> [OPTIONS] # Create sub-account
octos account update <id> [OPTIONS] # Update sub-account
octos auth
OAuth login and API key management.
octos auth login --provider openai # PKCE browser OAuth
octos auth login --provider openai --device-code # Device code flow
octos auth login --provider anthropic # Paste-token (stdin)
octos auth logout --provider openai # Remove stored credential
octos auth status # Show authenticated providers
Credentials are stored in ~/.octos/auth.json (file mode 0600). The auth store is checked before environment variables when resolving API keys.
octos skills
Manage skills.
octos skills list # List installed skills
octos skills install user/repo/skill-name # Install from GitHub
octos skills remove skill-name # Remove a skill
Fetches SKILL.md from the GitHub repoβs main branch and installs to .octos/skills/.
Skill Development
This guide covers the full lifecycle of an Octos skill — from development to publication to end-user installation — similar to building an app, submitting it to an app store, and distributing it to users.
The Skill Ecosystem
Developer                  Octos Hub                     User
─────────                  ─────────                     ────
1. Develop skill   ──▶     3. Publish to registry   ──▶  5. Search & discover
2. Test locally            4. Pre-built binaries         6. Install
                                                         7. Update
| Concept | App Store Analogy | Octos Equivalent |
|---|---|---|
| App | iOS/Android app | Skill (binary + manifest + docs) |
| SDK | Xcode / Android Studio | Rust + manifest.json + SKILL.md |
| App Store | Apple App Store | octos-hub registry |
| Distribution | App Store binary delivery | Pre-built binaries in GitHub Releases |
| Install | Tap βGetβ | octos skills install user/repo |
| Sideload | Ad-hoc / TestFlight | Copy to ~/.octos/skills/ directly |
Part 1: Develop
Architecture
A skill is a standalone executable that communicates via stdin/stdout JSON. The gateway spawns it as a child process for each tool call. Skills can be written in any language — Rust, Python, Node.js, shell, etc.
User message → LLM → tool_use("get_weather", {"city": "Paris"})
                         ↓
Gateway spawns: ~/.octos/skills/weather/main get_weather
                         ↓
Stdin:  {"city": "Paris"}
Stdout: {"output": "25°C, sunny", "success": true}
                         ↓
LLM sees result → generates response
Skill Anatomy
Every skill is a directory with three files:
my-skill/
├── manifest.json      # Tool definitions (JSON Schema) — the "API contract"
├── SKILL.md           # Documentation + metadata — the "app description"
├── main               # Executable binary — the "app binary"
└── (optional extras)
    ├── styles/        # Bundled assets
    ├── prompts/*.md   # System prompt fragments
    └── hooks/         # Lifecycle hook scripts
Step 1: Create manifest.json
The manifest declares what tools the skill provides. The LLM reads this to decide when and how to call your skill.
{
  "name": "my-skill",
  "version": "1.0.0",
  "author": "your-name",
  "description": "What this skill does",
  "timeout_secs": 15,
  "requires_network": false,
  "tools": [
    {
      "name": "my_tool",
      "description": "Clear description for the LLM. What does this tool do? When should it be used?",
      "input_schema": {
        "type": "object",
        "properties": {
          "param1": {
            "type": "string",
            "description": "What this parameter means"
          },
          "param2": {
            "type": "integer",
            "description": "Optional numeric parameter (default: 10)"
          }
        },
        "required": ["param1"]
      }
    }
  ]
}
Manifest fields:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| description | No | — | Human-readable description |
| timeout_secs | No | 30 | Max execution time per tool call (1-600) |
| requires_network | No | false | Informational flag |
| sha256 | No | — | Binary integrity check (hex hash) |
| tools | No | [] | Array of tool definitions |
| mcp_servers | No | [] | MCP server declarations |
| hooks | No | [] | Lifecycle hook definitions |
| prompts | No | — | Prompt fragment config |
| binaries | No | {} | Pre-built binaries by {os}-{arch} |
Step 2: Create SKILL.md
Documentation with YAML frontmatter. The LLM reads this to understand context and trigger conditions.
---
name: my-skill
description: Short description. Triggers: keyword1, keyword2, trigger phrase.
version: 1.0.0
author: your-name
always: false
---
# My Skill
Detailed description of what this skill does and when to use it.
## Tools
### my_tool
Explain what this tool does with examples.
**Parameters:**
- `param1` (required): What it means
- `param2` (optional): What it controls. Default: 10
Frontmatter fields:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | — | Skill identifier |
| description | Yes | — | One-line description with trigger keywords |
| version | Yes | — | Semantic version |
| author | No | — | Author name |
| always | No | false | If true, always included in system prompt |
| requires_bins | No | — | Comma-separated binaries that must exist |
| requires_env | No | — | Comma-separated env vars that must be set |
Step 3: Implement the Binary
The binary implements the stdin/stdout JSON protocol.
Protocol:
- `argv[1]` = tool name (e.g., `get_weather`)
- stdin = JSON object matching the tool's `input_schema`
- stdout = JSON with `output` (string) and `success` (bool)
- exit code = 0 for success, non-zero for failure
- stderr = ignored (use for debug logging)
Rust template:
use std::io::Read;
use serde::Deserialize;
use serde_json::json;
#[derive(Deserialize)]
struct MyToolInput {
    param1: String,
    #[serde(default = "default_param2")]
    param2: i32,
}

fn default_param2() -> i32 { 10 }

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let tool_name = args.get(1).map(|s| s.as_str()).unwrap_or("unknown");

    let mut buf = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut buf) {
        fail(&format!("Failed to read stdin: {e}"));
    }

    match tool_name {
        "my_tool" => handle_my_tool(&buf),
        _ => fail(&format!("Unknown tool '{tool_name}'")),
    }
}

fn fail(msg: &str) -> ! {
    println!("{}", json!({"output": msg, "success": false}));
    std::process::exit(1);
}

fn handle_my_tool(input_json: &str) {
    let input: MyToolInput = match serde_json::from_str(input_json) {
        Ok(v) => v,
        Err(e) => fail(&format!("Invalid input: {e}")),
    };
    let result = format!("Processed {} with param2={}", input.param1, input.param2);
    println!("{}", json!({"output": result, "success": true}));
}
Python template:
#!/usr/bin/env python3
import sys, json
def main():
    tool_name = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    input_data = json.loads(sys.stdin.read())
    if tool_name == "my_tool":
        result = f"Processed {input_data['param1']}"
        print(json.dumps({"output": result, "success": True}))
    else:
        print(json.dumps({"output": f"Unknown tool: {tool_name}", "success": False}))
        sys.exit(1)

if __name__ == "__main__":
    main()
Shell template:
#!/bin/sh
TOOL="$1"
INPUT=$(cat)
if [ "$TOOL" = "my_tool" ]; then
    PARAM1=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin)['param1'])")
    printf '{"output": "Processed %s", "success": true}\n' "$PARAM1"
else
    printf '{"output": "Unknown tool: %s", "success": false}\n' "$TOOL"
    exit 1
fi
Step 4: For Bundled Skills (Rust Crate)
If contributing a skill to the core Octos distribution:
mkdir -p crates/app-skills/my-skill/src
Cargo.toml:
[package]
name = "my-skill"
version = "1.0.0"
edition = "2021"
[[bin]]
name = "my_skill"
path = "src/main.rs"
[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
Add to workspace Cargo.toml:
members = [
    # ...
    "crates/app-skills/my-skill",
]
Register in crates/octos-agent/src/bundled_app_skills.rs:
pub const BUNDLED_APP_SKILLS: &[(&str, &str, &str, &str)] = &[
    // ...
    (
        "my-skill",  // dir_name
        "my_skill",  // binary_name
        include_str!("../../app-skills/my-skill/SKILL.md"),
        include_str!("../../app-skills/my-skill/manifest.json"),
    ),
];
Part 2: Test
Standalone Testing
Test your skill binary directly without the gateway:
# Build (Rust)
cargo build -p my-skill
# Test a tool call
echo '{"param1": "hello", "param2": 5}' | ./target/debug/my_skill my_tool
# Expected: {"output":"Processed hello with param2=5","success":true}
# Test error handling
echo '{}' | ./target/debug/my_skill my_tool
echo '{"param1": "test"}' | ./target/debug/my_skill unknown_tool
For non-Rust skills, make the binary executable and test the same way:
chmod +x my-skill/main
echo '{"param1": "hello"}' | ./my-skill/main my_tool
Gateway Integration Testing
# Build everything
cargo build --release --workspace
# Start the gateway
octos gateway
# Verify skill loaded
ls ~/.octos/skills/my-skill/
# main manifest.json SKILL.md
# Ask the agent to use your skill in conversation
Recommended Timeout Values
| Skill Type | Timeout |
|---|---|
| Local computation | 5s |
| Single API call | 15s |
| Multi-step API calls | 30-60s |
| Long-running research | 300-600s |
Part 3: Publish
Publishing makes your skill discoverable to all Octos users — like submitting an app to the App Store.
Push to GitHub
Organize your repository. A repo can contain a single skill or multiple skills:
Single-skill repo:
my-skill/                ← repo root
├── manifest.json
├── SKILL.md
├── Cargo.toml           (or package.json, requirements.txt, etc.)
└── src/main.rs
Multi-skill repo:
my-skills/               ← repo root
├── skill-a/
│   ├── manifest.json
│   ├── SKILL.md
│   └── src/main.rs
├── skill-b/
│   ├── manifest.json
│   ├── SKILL.md
│   └── main.py
└── shared/              ← shared dependencies (auto-detected)
    └── utils.py
Submit to the Registry
The octos-hub registry is the central catalog for discoverable skills. Submit a PR to add your entry to registry.json:
{
  "name": "my-skills",
  "description": "What your skills do",
  "repo": "your-user/your-repo",
  "version": "1.0.0",
  "author": "your-name",
  "license": "MIT",
  "skills": ["skill-a", "skill-b"],
  "requires": ["git", "cargo"],
  "provides_tools": true,
  "tags": ["keyword1", "keyword2"]
}
Registry entry fields:
| Field | Required | Description |
|---|---|---|
| name | Yes | Package name (can differ from repo name) |
| description | Yes | Searchable description |
| repo | Yes | GitHub user/repo or full URL |
| version | No | Latest version |
| author | No | Author name |
| license | No | License identifier (MIT, Apache-2.0, etc.) |
| skills | No | Individual skill names in the package |
| requires | No | External dependencies (e.g., ["git", "cargo"]) |
| provides_tools | No | Whether skills have manifest.json with tools |
| tags | No | Searchable tags |
| binaries | No | Pre-built binaries (see Distribution below) |
Once the PR is merged, users can discover your skill:
octos skills search keyword1
Part 4: Distribute
Pre-built binaries let users install instantly without compiling — like downloading an app binary from the store.
Add Binaries to manifest.json
In your skillβs manifest.json, add a binaries section keyed by {os}-{arch}:
{
  "name": "my-skill",
  "version": "1.0.0",
  "binaries": {
    "darwin-aarch64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-aarch64.tar.gz",
      "sha256": "abc123..."
    },
    "darwin-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-darwin-x86_64.tar.gz",
      "sha256": "def456..."
    },
    "linux-x86_64": {
      "url": "https://github.com/you/repo/releases/download/v1.0.0/my-skill-linux-x86_64.tar.gz",
      "sha256": "789ghi..."
    }
  },
  "tools": [ ... ]
}
Automate with GitHub Actions
Set up CI to build and publish binaries on each release tag:
name: Release Skill

on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        include:
          - os: macos-latest
            target: aarch64-apple-darwin
            platform: darwin-aarch64
          - os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            platform: linux-x86_64
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo build --release --target ${{ matrix.target }}
      - name: Package
        run: |
          mkdir dist
          cp target/${{ matrix.target }}/release/my_skill dist/main
          cd dist && tar czf my-skill-${{ matrix.platform }}.tar.gz main
          shasum -a 256 my-skill-${{ matrix.platform }}.tar.gz
      - uses: softprops/action-gh-release@v2
        with:
          files: dist/my-skill-*.tar.gz
Install Resolution Order
When a user runs octos skills install, the installer tries these sources in order:
1. manifest.json `binaries` — skill author's own CI/CD builds
2. Registry `binaries` — registry-audited pre-built binaries
3. `cargo build --release` — fallback: compile from source (if `Cargo.toml` exists)
4. `npm install` — fallback: install Node.js dependencies (if `package.json` exists)
Pre-built binaries are verified with SHA-256 before installation.
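The verification step amounts to a hash comparison before unpacking, roughly (illustrative Python):

```python
import hashlib

def verify_archive(data: bytes, expected_sha256: str) -> bool:
    """Compare a downloaded archive's SHA-256 against the hash declared
    in the skill manifest or registry entry."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

If the hash does not match, the download is rejected and the installer falls back to the next source in the list.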
Part 5: Install
For Users: Search and Install
# Search the registry
octos skills search weather
octos skills search "deep research"
# Install from GitHub (all skills in repo)
octos skills install user/repo
# Install a specific skill from a multi-skill repo
octos skills install user/repo/skill-name
# Install with a specific branch
octos skills install user/repo --branch dev
# Force reinstall
octos skills install user/repo --force
Per-Profile Installation
Skills are isolated per profile (like per-user app installs):
# Install to a specific profile
octos skills --profile alice install user/repo/my-skill
# List skills for a profile
octos skills --profile alice list
# Remove from a profile
octos skills --profile alice remove my-skill
In-Chat Installation
Users can manage skills from within a conversation:
/skills install user/repo/my-skill
/skills list
/skills remove my-skill
/skills search comic
Admin API
Programmatic skill management via REST:
# Install
POST /api/admin/profiles/alice/skills {"repo": "user/repo/my-skill"}
# List
GET /api/admin/profiles/alice/skills
# Remove
DELETE /api/admin/profiles/alice/skills/my-skill
Sideloading (Manual Install)
Copy a skill directory directly — like sideloading an app:
# Copy to global skills directory
cp -r my-skill/ ~/.octos/skills/my-skill/
chmod +x ~/.octos/skills/my-skill/main
# Or to a profile-specific directory
cp -r my-skill/ ~/.octos/profiles/alice/data/skills/my-skill/
Installed Skill Layout
~/.octos/skills/my-skill/
├── main            # Executable binary
├── manifest.json   # Tool definitions
├── SKILL.md        # Documentation
├── .source         # Install tracking (repo, branch, date)
└── styles/         # Bundled assets (if any)
The .source file tracks where the skill was installed from:
{
"repo": "user/repo",
"subdir": "my-skill",
"branch": "main",
"installed_at": "2026-03-28T..."
}
Skill Loading Priority
When multiple directories contain a skill with the same name, first match wins:
| Priority | Location | Source |
|---|---|---|
| 1 (highest) | <profile-data>/skills/ | Per-profile install |
| 2 | <project-dir>/skills/ | Project-local |
| 3 | <project-dir>/bundled-skills/ | Bundled app-skills |
| 4 (lowest) | ~/.octos/skills/ | Global install |
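The first-match-wins lookup can be sketched as an ordered scan over the directories in the table. This is a hypothetical illustration, not Octos's actual code; `resolve_skill` is an invented name, and checking for `manifest.json` as the presence marker is an assumption.

```rust
use std::path::PathBuf;

/// Return the first directory in `search_dirs` (highest priority first)
/// that contains a skill named `name` with a manifest.json.
fn resolve_skill(search_dirs: &[PathBuf], name: &str) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.join("manifest.json").is_file())
}
```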
Part 6: Update
# Update a skill from its source repo
octos skills update my-skill
# Update from a specific branch
octos skills update my-skill --branch main
# View skill details (version, source, tools)
octos skills info my-skill
The updater reads the .source file to know where to pull from, then re-runs the install flow (clone → discover → build/download → copy).
Hot-Reload
Skill binaries can be updated without restarting the gateway:
# Build just the skill
cargo build --release -p my-skill
# Replace the binary
cp target/release/my_skill ~/.octos/skills/my-skill/main
# Next tool call automatically uses the new binary
> Note: If you change `SKILL.md` or `manifest.json` for a bundled skill, you must rebuild the `octos` binary too (they're embedded via `include_str!`). External skills reload immediately.
Advanced Topics
Multiple Tools in One Skill
A single binary can serve multiple tools. Route on argv[1]:
#![allow(unused)]
fn main() {
match tool_name {
"get_weather" => handle_get_weather(&buf),
"get_forecast" => handle_get_forecast(&buf),
_ => fail(&format!("Unknown tool '{tool_name}'")),
}
}
Declare all tools in manifest.json:
{
"tools": [
{ "name": "get_weather", "description": "...", "input_schema": { ... } },
{ "name": "get_forecast", "description": "...", "input_schema": { ... } }
]
}
Environment Variables
Skills inherit the gateway's environment (minus blocked security-sensitive vars). Declare requirements in SKILL.md:
---
requires_env: MY_API_KEY,MY_SECRET
---
The gateway auto-injects provider API keys (e.g., DASHSCOPE_API_KEY, OPENAI_API_KEY) plus OCTOS_DATA_DIR and OCTOS_WORK_DIR.
Bundled Assets
Skills with asset files should resolve paths relative to the executable:
#![allow(unused)]
fn main() {
let exe = std::env::current_exe()?;
let skill_dir = exe.parent().unwrap();
let styles_dir = skill_dir.join("styles");
}
Do not use the current working directory; it points to the profile's data dir, not the skill dir.
MCP Servers
A skill can declare MCP servers the gateway auto-starts:
{
"mcp_servers": [
{
"command": "./bin/mcp-server",
"args": ["--port", "0"],
"env": ["DATABASE_URL"]
}
]
}
Or remote MCP servers:
{
"mcp_servers": [
{
"url": "https://mcp.example.com/v1",
"headers": { "Authorization": "Bearer ${API_KEY}" }
}
]
}
Path resolution: ./ and ../ are relative to the skill directory. env lists variable names (not values) to forward.
Lifecycle Hooks
Skills can run commands on agent events:
{
"hooks": [
{
"event": "before_tool_call",
"command": ["./hooks/policy-check.sh"],
"timeout_ms": 3000,
"tool_filter": ["shell", "bash"]
},
{
"event": "after_tool_call",
"command": ["./hooks/audit-log.sh"],
"timeout_ms": 5000
}
]
}
| Event | Can Deny? | When |
|---|---|---|
| before_tool_call | Yes (exit 1) | Before tool execution |
| after_tool_call | No | After tool completes |
| before_llm_call | Yes (exit 1) | Before LLM request |
| after_llm_call | No | After LLM response |
Prompt Fragments
Inject content into the system prompt without writing code:
{
"name": "company-policy",
"version": "1.0.0",
"prompts": {
"include": ["prompts/*.md"]
}
}
Extras-Only Skills
Skills don't need to provide tools. Valid combinations:
- Prompt-only: Teach the agent domain knowledge (no binary needed)
- Hooks-only: Enforce policies across all tool calls
- MCP-only: Expose tools via remote MCP servers
- Combined: Tools + MCP + hooks + prompts in one skill
Security
Binary integrity:
- Symlinks rejected (defense against link-swap attacks)
- SHA-256 verification when `sha256` is set in the manifest
- Size limit: 100 MB max per binary
Environment sanitization: these vars are stripped before spawning skills:
- `LD_PRELOAD`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`
- `NODE_OPTIONS`, `PYTHONPATH`, `PERL5LIB`
- `RUSTFLAGS`, `RUST_LOG`, and 10+ others
Best practices:
- Validate all input (never trust user-provided paths, names, etc.)
- Use timeouts on HTTP requests
- Avoid shell injection
- Set `sha256` in the manifest for release builds
Platform Skills vs App Skills
| | App Skills | Platform Skills |
|---|---|---|
| Location | crates/app-skills/ | crates/platform-skills/ |
| Bootstrap | Every gateway startup | Admin bot only |
| Scope | Per-gateway | Shared across gateways |
| Use when | Self-contained, always available | Requires external service |
Examples
Example 1: Clock (Local, No Network)
crates/app-skills/time/
├── Cargo.toml      # chrono, chrono-tz, serde, serde_json
├── manifest.json   # 1 tool: get_time, timeout_secs: 5
├── SKILL.md        # Triggers: time, clock
└── src/main.rs     # System clock + timezone formatting
Example 2: Weather (Network API)
crates/app-skills/weather/
├── Cargo.toml      # reqwest (blocking, rustls-tls), serde, serde_json
├── manifest.json   # 2 tools: get_weather, get_forecast, timeout_secs: 15
├── SKILL.md        # Triggers: weather, forecast
└── src/main.rs     # Geocode city → Open-Meteo API
Example 3: Email (Environment Credentials)
crates/app-skills/send-email/
├── Cargo.toml      # lettre, serde, serde_json
├── manifest.json   # 1 tool: send_email
├── SKILL.md        # requires_env: SMTP_HOST,SMTP_USERNAME,SMTP_PASSWORD
└── src/main.rs     # SMTP with credential validation
Checklists
Tool Skill (binary + tools)
- Directory has `manifest.json`, `SKILL.md`, and an executable (`main` or binary)
- `manifest.json` has valid JSON Schema for all tool inputs
- `SKILL.md` has frontmatter with trigger keywords
- Binary reads `argv[1]` for the tool name, stdin for JSON
- Binary writes `{"output": "...", "success": true/false}` to stdout
- Error cases return `success: false` with clear messages
- Standalone test passes: `echo '{"param": "val"}' | ./main my_tool`
- Gateway test passes: skill loads and agent can invoke it
Extras Skill (MCP / hooks / prompts)
- `mcp_servers`: `command` or `url` set; `env` lists names only
- `hooks`: valid event name; `command` is an argv array; relative paths resolve
- `prompts`: glob patterns match intended `.md` files
- Extras-only: `tools` is empty or omitted, no binary needed
Publishing
- Repo pushed to GitHub with `manifest.json` and `SKILL.md` at expected paths
- Registry PR submitted to octos-hub
- (Optional) Pre-built binaries for `darwin-aarch64`, `linux-x86_64`
- (Optional) SHA-256 hashes in the `manifest.json` `binaries` section
- (Optional) GitHub Actions workflow for automated binary builds on release tags
Architecture Document: octos
Overview
octos is a 15-member Rust workspace (Edition 2024, rust-version 1.85.0) providing both a coding agent CLI and a multi-channel messaging gateway. Pure Rust TLS via rustls (no OpenSSL). Error handling via eyre/color-eyre.
Workspace members:
- 6 core crates: octos-core, octos-memory, octos-llm, octos-agent, octos-bus, octos-cli
- 1 pipeline crate: octos-pipeline
- 7 app-skill crates: news, deep-search, deep-crawl, send-email, account-manager, time, weather
- 1 platform-skill crate: asr
┌──────────────────────────────────────────────────────────────┐
│                          octos-cli                           │
│              (CLI: chat, gateway, init, status)              │
├────────────────────────────┬─────────────────────────────────┤
│        octos-agent         │            octos-bus            │
│   (Agent, Tools, Skills)   │   (Channels, Sessions, Cron)    │
├────────────┬───────────────┼─────────────────────────────────┤
│octos-memory│   octos-llm   │         octos-pipeline          │
│ (Episodes) │  (Providers)  │    (DOT-based orchestration)    │
├────────────┴───────────────┴─────────────────────────────────┤
│                          octos-core                          │
│             (Types, Messages, Gateway Protocol)              │
└──────────────────────────────────────────────────────────────┘
octos-core – Foundation Types
Shared types with no internal dependencies. Only depends on serde, chrono, uuid, eyre.
MessageRole implements as_str() -> &'static str and Display for consistent string conversion across providers (system/user/assistant/tool).
Task Model
#![allow(unused)]
fn main() {
pub struct Task {
pub id: TaskId, // UUID v7 (temporal ordering)
pub parent_id: Option<TaskId>, // For subtasks
pub status: TaskStatus,
pub kind: TaskKind,
pub context: TaskContext,
pub result: Option<TaskResult>,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
}
}
TaskId: Newtype over Uuid. Generates UUID v7 via Uuid::now_v7(). Implements Display, FromStr, Default.
TaskStatus (tagged enum, "state" discriminant):
- `Pending` → awaiting assignment
- `InProgress { agent_id: AgentId }` → executing
- `Blocked { reason: String }` → waiting for dependency
- `Completed` → success
- `Failed { error: String }` → failure with message
TaskKind (tagged enum, "type" discriminant):
- `Plan { goal: String }`
- `Code { instruction: String, files: Vec<PathBuf> }`
- `Review { diff: String }`
- `Test { command: String }`
- `Custom { name: String, params: serde_json::Value }`
TaskContext:
- `working_dir: PathBuf`
- `git_state: Option<GitState>`
- `working_memory: Vec<Message>`
- `episodic_refs: Vec<EpisodeRef>`
- `files_in_scope: Vec<PathBuf>`
TaskResult:
- `success: bool`
- `output: String`
- `files_modified: Vec<PathBuf>`
- `subtasks: Vec<TaskId>`
- `token_usage: TokenUsage`
TokenUsage: input_tokens: u32, output_tokens: u32 (defaults to 0/0)
Message Types
#![allow(unused)]
fn main() {
pub struct Message {
pub role: MessageRole, // System | User | Assistant | Tool
pub content: String,
pub media: Vec<String>, // File paths (images, audio)
pub tool_calls: Option<Vec<ToolCall>>,
pub tool_call_id: Option<String>,
pub timestamp: DateTime<Utc>,
}
pub struct ToolCall {
pub id: String,
pub name: String,
pub arguments: serde_json::Value,
}
}
Gateway Protocol
#![allow(unused)]
fn main() {
pub struct InboundMessage { // channel → agent
pub channel: String, // "telegram", "cli", "discord", etc.
pub sender_id: String,
pub chat_id: String,
pub content: String,
pub timestamp: DateTime<Utc>,
pub media: Vec<String>,
pub metadata: serde_json::Value,
}
pub struct OutboundMessage { // agent → channel
pub channel: String,
pub chat_id: String,
pub content: String,
pub reply_to: Option<String>,
pub media: Vec<String>,
pub metadata: serde_json::Value,
}
}
InboundMessage::session_key() derives SessionKey::new(channel, chat_id) → format "{channel}:{chat_id}".
Inter-Agent Coordination
#![allow(unused)]
fn main() {
pub enum AgentMessage { // tagged: "type", snake_case
TaskAssign { task: Box<Task> },
TaskUpdate { task_id: TaskId, status: TaskStatus },
TaskComplete { task_id: TaskId, result: TaskResult },
ContextRequest { task_id: TaskId, query: String },
ContextResponse { task_id: TaskId, context: Vec<Message> },
}
}
Error System
#![allow(unused)]
fn main() {
pub struct Error {
pub kind: ErrorKind,
pub context: Option<String>, // Chained context
pub suggestion: Option<String>, // Actionable fix hint
}
}
ErrorKind variants: TaskNotFound, AgentNotFound, InvalidStateTransition, LlmError, ApiError (status-aware: 401 → check key, 429 → rate limit), ToolError, ConfigError, ApiKeyNotSet, UnknownProvider, Timeout, ChannelError, SessionError, IoError, SerializationError, Other(eyre::Report).
Utilities
truncate_utf8(s: &mut String, max_len: usize, suffix: &str) → in-place truncation at UTF-8 char boundaries. Appends suffix after truncation. Used across all tool outputs.
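A minimal std-only sketch of the described behavior, assuming truncation keeps at most `max_len` bytes of the original string (backing up to the nearest char boundary) before appending `suffix`; the real implementation may differ in edge cases.

```rust
/// Truncate `s` in place to at most `max_len` bytes at a UTF-8 char
/// boundary, then append `suffix`. No-op when `s` already fits.
fn truncate_utf8(s: &mut String, max_len: usize, suffix: &str) {
    if s.len() <= max_len {
        return;
    }
    // Walk back from max_len to the nearest char boundary so we never
    // split a multi-byte character.
    let mut cut = max_len;
    while cut > 0 && !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
    s.push_str(suffix);
}
```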
octos-llm – LLM Provider Abstraction
Provider Trait
#![allow(unused)]
fn main() {
#[async_trait]
pub trait LlmProvider: Send + Sync {
async fn chat(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatResponse>;
async fn chat_stream(&self, messages: &[Message], tools: &[ToolSpec], config: &ChatConfig) -> Result<ChatStream>; // default: falls back to chat()
fn context_window(&self) -> u32; // default: context_window_tokens(self.model_id())
fn model_id(&self) -> &str;
fn provider_name(&self) -> &str;
}
}
Configuration
#![allow(unused)]
fn main() {
pub struct ChatConfig {
pub max_tokens: Option<u32>, // default: Some(4096)
pub temperature: Option<f32>, // default: Some(0.0)
pub tool_choice: ToolChoice, // Auto | Required | None | Specific { name }
pub stop_sequences: Vec<String>,
}
}
Response Types
#![allow(unused)]
fn main() {
pub struct ChatResponse {
pub content: Option<String>,
pub tool_calls: Vec<ToolCall>,
pub stop_reason: StopReason, // EndTurn | ToolUse | MaxTokens | StopSequence
pub usage: TokenUsage,
}
pub enum StreamEvent {
TextDelta(String),
ToolCallDelta { index, id, name, arguments_delta },
Usage(TokenUsage),
Done(StopReason),
Error(String),
}
pub type ChatStream = Pin<Box<dyn Stream<Item = StreamEvent> + Send>>;
}
Provider Registry (registry/)
All providers are defined in octos-llm/src/registry/, one file per provider. Each file exports a ProviderEntry with metadata (name, aliases, default model, API key env var, base URL) and a create() factory function. Adding a new provider = one file + one line in mod.rs.
#![allow(unused)]
fn main() {
pub struct ProviderEntry {
pub name: &'static str, // canonical name
pub aliases: &'static [&'static str], // e.g. ["google"] for gemini
pub default_model: Option<&'static str>,
pub api_key_env: Option<&'static str>,
pub default_base_url: Option<&'static str>,
pub requires_api_key: bool,
pub requires_base_url: bool, // true for vllm
pub requires_model: bool, // true for vllm
pub detect_patterns: &'static [&'static str], // model→provider auto-detect
pub create: fn(CreateParams) -> Result<Arc<dyn LlmProvider>>,
}
pub struct CreateParams {
pub api_key: Option<String>,
pub model: Option<String>,
pub base_url: Option<String>,
pub model_hints: Option<ModelHints>, // config-level override
}
}
Lookup: registry::lookup(name) → case-insensitive, matches canonical name or aliases.
Auto-detect: registry::detect_provider(model) → infers provider from model name patterns.
Native Providers (4 protocol implementations)
| Provider | Base URL | Auth Header | Image Format | Default Model |
|---|---|---|---|---|
| Anthropic | api.anthropic.com | x-api-key | Base64 blocks | claude-sonnet-4-20250514 |
| OpenAI | api.openai.com/v1 | Authorization: Bearer | Data URI | gpt-4o |
| Gemini | generativelanguage.googleapis.com/v1beta | x-goog-api-key | Base64 inline | gemini-2.5-flash |
| OpenRouter | openrouter.ai/api/v1 | Authorization: Bearer | Data URI | anthropic/claude-sonnet-4-20250514 |
OpenAI-Compatible Providers (via OpenAIProvider::with_base_url())
| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| DeepSeek | β | api.deepseek.com/v1 | deepseek-chat | DEEPSEEK_API_KEY |
| Groq | β | api.groq.com/openai/v1 | llama-3.3-70b-versatile | GROQ_API_KEY |
| Moonshot | kimi | api.moonshot.ai/v1 | kimi-k2.5 | MOONSHOT_API_KEY |
| DashScope | qwen | dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max | DASHSCOPE_API_KEY |
| MiniMax | β | api.minimax.io/v1 | MiniMax-Text-01 | MINIMAX_API_KEY |
| Zhipu | glm | open.bigmodel.cn/api/paas/v4 | glm-4-plus | ZHIPU_API_KEY |
| Nvidia | nim | integrate.api.nvidia.com/v1 | meta/llama-3.3-70b-instruct | NVIDIA_API_KEY |
| Ollama | β | localhost:11434/v1 | llama3.2 | (none) |
| vLLM | β | (user-provided) | (user-provided) | VLLM_API_KEY |
Anthropic-Compatible Provider
| Provider | Aliases | Base URL | Default Model | API Key Env |
|---|---|---|---|---|
| Z.AI | zai, z.ai | api.z.ai/api/anthropic | glm-5 | ZAI_API_KEY |
ModelHints (OpenAI provider)
Auto-detected from model name at construction, overridable via config model_hints:
#![allow(unused)]
fn main() {
pub struct ModelHints {
pub uses_completion_tokens: bool, // o-series, gpt-5, gpt-4.1
pub fixed_temperature: bool, // o-series, kimi-k2.5
pub lacks_vision: bool, // deepseek, minimax, mistral, yi-
pub merge_system_messages: bool, // default: true
}
}
SSE Streaming
parse_sse_response(response) -> impl Stream<Item = SseEvent> → stateful unfold-based parser. Max buffer: 1 MB. Handles `\n\n` and `\r\n\r\n` separators. Each provider maps SSE events to StreamEvent:
- Anthropic: `message_start` → input tokens, `content_block_start/delta` → text/tool chunks, `message_delta` → stop reason. Custom SSE state machine.
- OpenAI/OpenRouter: Standard OpenAI SSE with `[DONE]` sentinel. `delta.content` for text, `delta.tool_calls[]` for tools. Shared parser: `parse_openai_sse_events()`.
- Gemini: `alt=sse` endpoint. `candidates[0].content.parts[]` with function call data.
RetryProvider
Wraps any Arc<dyn LlmProvider> with exponential backoff. Wrapped by ProviderChain for multi-provider failover.
#![allow(unused)]
fn main() {
pub struct RetryConfig {
pub max_retries: u32, // default: 3
pub initial_delay: Duration, // default: 1s
pub max_delay: Duration, // default: 60s
pub backoff_multiplier: f64, // default: 2.0
}
}
Delay formula: initial_delay * backoff_multiplier^attempt, capped at max_delay.
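The delay formula can be sketched directly from the `RetryConfig` defaults above; `retry_delay` is an illustrative helper name, not the crate's API.

```rust
use std::time::Duration;

/// Compute the backoff delay for a given attempt:
/// initial_delay * backoff_multiplier^attempt, capped at max_delay.
fn retry_delay(attempt: u32, initial: Duration, multiplier: f64, max: Duration) -> Duration {
    let scaled = initial.as_secs_f64() * multiplier.powi(attempt as i32);
    Duration::from_secs_f64(scaled.min(max.as_secs_f64()))
}
```

With the defaults (1s initial, 2.0 multiplier, 60s cap), attempts 0, 1, 2, ... yield 1s, 2s, 4s, ... until the cap is reached.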
Retryable errors (three-tier detection):
- HTTP status: 429, 500, 502, 503, 504, 529
- reqwest: `is_connect()` or `is_timeout()`
- String fallback: "connection refused", "timed out", "overloaded"
Provider Failover Chain
ProviderChain wraps multiple Arc<dyn LlmProvider> and transparently fails over on retriable errors. Configured via fallback_models in config.
#![allow(unused)]
fn main() {
pub struct ProviderChain {
slots: Vec<ProviderSlot>, // provider + AtomicU32 failure count
failure_threshold: u32, // default: 3
}
}
Behavior: Tries providers in order, skipping degraded ones (failures >= threshold). On retriable error, moves to the next. On success, resets failure count. If all degraded, picks the one with fewest failures.
Failoverable errors are a broader class than retryable ones: they also include 401/403, timeouts, and content-format 400 errors (e.g. "must not be empty", "reasoning_content", "API key not valid", "invalid_value"). These should not be retried on the same provider, but should fail over to a different one.
AdaptiveRouter (adaptive.rs)
Metrics-driven provider selection with three mutually exclusive modes (Off, Hedge, Lane). Tracks per-provider EMA latency (configurable ema_alpha, default 0.3), p95 latency (64-sample circular buffer), error rates, throughput (output tokens/sec EMA), and cost. Four-factor scoring: stability, quality, priority, cost (all weights configurable). Includes circuit breaker, probe requests, model catalog seeding from model_catalog.json, and QoS ranking. Scoring uses EMA blending: baseline catalog data at cold start, live metrics gradually replace it (weight ramps from 0 to 1 over 10 calls).
#![allow(unused)]
fn main() {
pub struct AdaptiveSlot {
provider: Arc<dyn LlmProvider>,
metrics: ProviderMetrics,
priority: usize,
cost_per_m: f64,
model_type: Mutex<ModelType>, // Strong | Fast
cost_in: AtomicU64,
ds_output: AtomicU64, // deep search output quality
baseline_stability: AtomicU64,
baseline_tool_avg_ms: AtomicU64,
baseline_p95_ms: AtomicU64,
context_window: AtomicU64,
max_output: AtomicU64,
}
}
Hedge mode: Races primary + cheapest alternate via tokio::select!, cancels loser. Only completed requests record metrics (cancelled loser metrics are discarded). If primary fails, alternate is tried sequentially.
Lane mode: Scores all providers, picks single best. Probe requests sent to stale providers (configurable probability, default 0.1; interval, default 60s).
FallbackProvider (fallback.rs)
Wraps primary + QoS-ranked fallbacks. On failure, records cooldown via ProviderRouter. Tries each fallback in order.
SwappableProvider (swappable.rs)
Runtime model switching via RwLock. Leaks ~50 bytes per swap (acceptable for rare user-initiated changes). cached_model_id and cached_provider_name are leaked &'static str to satisfy the &str return type.
ProviderRouter (router.rs)
Sub-agent multi-model routing with prefix-based key resolution. Supports cooldown (60s default), QoS-scored compatible_fallbacks() (sorted by model catalog score), cost info auto-derived from pricing.rs, and metadata for LLM-visible tool schemas.
#![allow(unused)]
fn main() {
pub struct ProviderRouter {
providers: RwLock<HashMap<String, Arc<dyn LlmProvider>>>,
active_key: RwLock<Option<String>>,
metadata: RwLock<HashMap<String, SubProviderMeta>>,
cooldowns: RwLock<HashMap<String, Instant>>,
qos_scores: RwLock<HashMap<String, f64>>,
}
}
OminixClient (ominix.rs)
Client for local ASR/TTS via Ominix runtime.
Token Estimation
#![allow(unused)]
fn main() {
pub fn estimate_tokens(text: &str) -> u32 // ~4 chars/token ASCII, ~1.5 chars/token CJK
pub fn estimate_message_tokens(msg: &Message) -> u32 // content + tool_calls + 4 overhead
}
Context Windows
| Model Family | Tokens |
|---|---|
| Claude 3/4 | 200,000 |
| GPT-4o/4-turbo | 128,000 |
| o1/o3/o4 | 200,000 |
| Gemini 2.0/1.5 | 1,000,000 |
| Default (unknown) | 128,000 |
Pricing
model_pricing(model_id) -> Option<ModelPricing> → case-insensitive substring match. Cost = (input/1M) * input_rate + (output/1M) * output_rate.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| claude-opus-4 | 15.00 | 75.00 |
| claude-sonnet-4 | 3.00 | 15.00 |
| claude-haiku | 0.80 | 4.00 |
| gpt-4o | 2.50 | 10.00 |
| gpt-4o-mini | 0.15 | 0.60 |
| o3/o4 | 10.00 | 40.00 |
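The cost formula works out as follows; `cost_usd` is an illustrative helper, with the claude-sonnet-4 rates ($3.00 in / $15.00 out per million tokens) taken from the table above.

```rust
/// Cost in USD: (input/1M) * input_rate + (output/1M) * output_rate.
fn cost_usd(input_tokens: u64, output_tokens: u64, in_rate: f64, out_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * in_rate
        + (output_tokens as f64 / 1_000_000.0) * out_rate
}
```

For example, 500k input + 100k output tokens on claude-sonnet-4 costs 0.5 × $3.00 + 0.1 × $15.00 = $3.00.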
Embedding
#![allow(unused)]
fn main() {
pub trait EmbeddingProvider: Send + Sync {
async fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>>;
fn dimension(&self) -> usize;
}
}
OpenAIEmbedder: Default model text-embedding-3-small (1536 dims). text-embedding-3-large = 3072 dims.
Transcription
GroqTranscriber: Whisper whisper-large-v3 via https://api.groq.com/openai/v1/audio/transcriptions. Multipart form. 60s timeout. MIME detection: ogg/opus → audio/ogg, mp3 → audio/mpeg, m4a → audio/mp4, wav → audio/wav.
Vision
encode_image(path) -> (mime_type, base64_data) → JPEG/PNG/GIF/WebP. is_image(path) -> bool.
Typed Error Hierarchy (error.rs)
LlmError with LlmErrorKind enum: Authentication, RateLimited, ContextOverflow, ModelNotFound, ServerError, Network, Timeout, InvalidRequest, ContentFiltered, StreamError, Provider. is_retryable() returns true for RateLimited, ServerError, Network, Timeout, StreamError. from_status(code, body) maps HTTP status codes to error kinds. Provider response bodies logged at debug level only (not exposed in error messages).
High-Level Client (high_level.rs)
LlmClient wraps Arc<dyn LlmProvider> with ergonomic APIs: generate(prompt), generate_with(messages, tools, config), generate_object(prompt, schema_name, schema), generate_typed<T>(prompt, schema_name, schema), stream(prompt), stream_with(messages, tools, config). Configurable via with_config(ChatConfig).
Middleware Pipeline (middleware.rs)
LlmMiddleware trait with before()/after()/on_error() hooks. MiddlewareStack wraps LlmProvider and runs layers in insertion order. before() can short-circuit with cached responses. Built-in: LoggingMiddleware (tracing), CostTracker (AtomicU64 counters for input/output tokens and request count). Streaming bypasses middleware (logged as debug warning).
Model Catalog (catalog.rs)
ModelCatalog with ModelInfo (id, name, provider, context_window, max_output_tokens, capabilities, cost, aliases). Lookup by ID or alias via HashMap index. with_defaults() pre-registers 4 models (Claude Sonnet 4, Claude Haiku 4.5, GPT-4o, Gemini 2.5 Flash). by_provider() and with_capability() for filtered queries.
octos-memory – Persistence & Search
EpisodeStore
redb database at .octos/episodes.redb with three tables:
| Table | Key | Value | Purpose |
|---|---|---|---|
| episodes | &str (episode_id) | &str (JSON) | Full episode records |
| cwd_index | &str (working_dir) | &str (JSON array of IDs) | Directory-scoped lookup |
| embeddings | &str (episode_id) | &[u8] (bincode Vec<f32>) | Vector embeddings |
#![allow(unused)]
fn main() {
pub struct Episode {
pub id: String, // UUID v7
pub task_id: TaskId,
pub agent_id: AgentId,
pub working_dir: PathBuf,
pub summary: String, // LLM-generated, truncated to 500 chars
pub outcome: EpisodeOutcome, // Success | Failure | Blocked | Cancelled
pub key_decisions: Vec<String>,
pub files_modified: Vec<PathBuf>,
pub created_at: DateTime<Utc>,
}
}
Operations:
- `store(episode)` → serialize to JSON, update cwd_index, insert into in-memory HybridIndex
- `get(id)` → direct lookup by episode_id
- `find_relevant(cwd, query, limit)` → keyword matching scoped to directory
- `recent_for_cwd(cwd, n)` → N most recent by created_at descending
- `store_embedding(id, Vec<f32>)` → bincode serialize, store in embeddings table, update HybridIndex
- `find_relevant_hybrid(query, query_embedding, limit)` → global hybrid search across all episodes
Initialization: On open(), rebuilds in-memory HybridIndex by iterating all episodes and loading embeddings from DB.
MemoryStore
File-based persistent memory at {data_dir}/memory/:
- `MEMORY.md` → long-term memory (full overwrite)
- `YYYY-MM-DD.md` → daily notes (append with date header)
get_memory_context() builds system prompt injection:
- `## Long-term Memory` → full MEMORY.md
- `## Recent Activity` → 7-day rolling window of daily notes
- `## Today's Notes` → current day
HybridIndex – BM25 + Vector Search
#![allow(unused)]
fn main() {
pub struct HybridIndex {
inverted: HashMap<String, Vec<(usize, u32)>>, // term → [(doc_idx, raw_tf_count)]
doc_lengths: Vec<usize>,
total_len: usize, // running total for O(1) avg_dl
avg_dl: f64,
ids: Vec<String>,
hnsw: Option<Hnsw<'static, f32, DistCosine>>,
has_embedding: Vec<bool>,
dimension: usize, // default: 1536
}
}
BM25 scoring (constants: K1=1.2, B=0.75):
- Tokenization: lowercase, split on non-alphanumeric, filter tokens < 2 chars
- IDF: `ln((N - df + 0.5) / (df + 0.5) + 1.0)`
- Score: `IDF * (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * dl/avg_dl))` → uses raw term counts (not normalized)
- Duplicate detection: `ids.contains(episode_id)` skips already-indexed documents (line 76-78)
- Normalized to [0, 1] range (epsilon `1e-10` prevents NaN from near-zero max scores)
HNSW vector index (via hnsw_rs):
- Named constants: `HNSW_MAX_NB_CONNECTION=16`, `HNSW_CAPACITY=10_000`, `HNSW_EF_CONSTRUCTION=200`, `HNSW_MAX_LAYER=16`, `DistCosine`
- L2 normalization before insertion/search; zero vectors rejected (returns `None`)
- Cosine similarity = `1 - distance` (DistCosine returns 1 - cos_sim)
Hybrid ranking (fetches `limit * 4` candidates from each side):
- Configurable weights via `with_weights(vector_weight, bm25_weight)` (defaults: 0.7 / 0.3)
- Without vectors: BM25 only (graceful fallback)
octos-agent – Agent Runtime
Agent Core
#![allow(unused)]
fn main() {
pub struct Agent {
id: AgentId,
llm: Arc<dyn LlmProvider>,
tools: ToolRegistry,
memory: Arc<EpisodeStore>,
embedder: Option<Arc<dyn EmbeddingProvider>>,
system_prompt: RwLock<String>,
config: AgentConfig,
reporter: Arc<dyn ProgressReporter>,
shutdown: Arc<AtomicBool>, // Acquire/Release ordering
}
pub struct AgentConfig {
pub max_iterations: u32, // default: 50 (CLI overrides to 20)
pub max_tokens: Option<u32>, // None = unlimited
pub max_timeout: Option<Duration>,// default: 600s wall-clock timeout
pub save_episodes: bool, // default: true
}
}
Execution Loop (run_task / process_message)
1. Build messages: system prompt + history + memory context + input
2. Loop (up to max_iterations):
   a. Check shutdown flag and token budget
   b. trim_to_context_window() → compact if needed
   c. Call LLM via chat_stream()
   d. Consume stream → accumulate text, tool_calls, tokens
   e. Match stop_reason:
      - EndTurn/StopSequence → save episode, return result
      - ToolUse → execute_tools() → append results → continue
      - MaxTokens → return result
ConversationResponse: content: String, token_usage: TokenUsage, files_modified: Vec<PathBuf>, streamed: bool
Episode saving: After task completion, embedding generation is fired off in the background (fire-and-forget) if an embedder is present.
Wall-clock timeout: Agent aborts after max_timeout (default 600s) regardless of iteration count.
Tool Output Sanitization
Before feeding tool results back to the LLM, sanitize_tool_output() (in sanitize.rs) strips noise:
- Base64 data URIs: `data:...;base64,<payload>` → `[base64-data-redacted]`
- Long hex strings: 64+ contiguous hex chars (SHA-256, raw keys) → `[hex-redacted]`
Context Compaction
Triggered when estimated tokens exceed 80% of context window / 1.2 safety margin.
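One concrete reading of that trigger condition (the grouping of the 80% factor and the 1.2 safety divisor is an assumption of this sketch):

```rust
/// Compact when estimated tokens exceed (context_window * 0.8) / 1.2.
fn should_compact(estimated_tokens: u32, context_window: u32) -> bool {
    (estimated_tokens as f64) > (context_window as f64 * 0.8) / 1.2
}
```

For a 128k-token window this puts the threshold at roughly 85k estimated tokens.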
Algorithm:
- Keep MIN_RECENT_MESSAGES (6) most recent non-system messages
- Don't split inside tool call/result pairs
- Summarize old messages: first line (200 chars), strip tool arguments, drop media
- Budget: 40% of total for summary (BASE_CHUNK_RATIO = 0.4)
- Replace with: `[System, CompactionSummary, Recent1, Recent2, ...]`
Format:
- User: `> User: first line [media omitted]`
- Assistant: `> Assistant: content` or `- Called tool_name`
- Tool: `-> tool_name: ok|error - first 100 chars`
Bundled App Skills (bundled_app_skills.rs)
Compile-time embedded app-skill entries. Each app-skill crate (news, deep-search, deep-crawl, etc.) is registered as a bundled skill available at runtime.
Bootstrap (bootstrap.rs)
Bootstraps bundled skills at gateway startup. Ensures all bundled app-skills are registered and available.
Prompt Guard (prompt_guard.rs)
Prompt injection detection. ThreatKind enum classifies detected threats. Scans user input before passing to the agent.
Tool System
#![allow(unused)]
fn main() {
pub trait Tool: Send + Sync {
fn name(&self) -> &str;
fn description(&self) -> &str;
fn tags(&self) -> &[&str];
fn input_schema(&self) -> serde_json::Value;
async fn execute(&self, args: &serde_json::Value) -> Result<ToolResult>;
}
pub struct ToolResult {
pub output: String,
pub success: bool,
pub file_modified: Option<PathBuf>,
pub tokens_used: Option<TokenUsage>,
}
}
ToolRegistry: HashMap<String, Arc<dyn Tool>> with provider_policy: Option<ToolPolicy> for soft filtering.
Built-in Tools (14)
| Tool | Parameters | Key Behavior |
|---|---|---|
| read_file | path, start_line?, end_line? | Line numbers (NNN|), 100KB truncation, symlink rejection |
| write_file | path, content | Creates parent dirs, returns file_modified |
| edit_file | path, old_string, new_string | Exact match required, error on 0 or >1 occurrences |
| diff_edit | path, diff | Unified diff with fuzzy matching (±3 lines), reverse hunk application |
| glob | pattern, limit=100 | Rejects absolute paths and .., relative results |
| grep | pattern, file_pattern?, limit=50, context=0, ignore_case=false | .gitignore-aware via ignore::WalkBuilder, regex with (?i) flag |
| list_dir | path | Sorted, [dir]/[file] prefix |
| shell | command, timeout_secs=120 | SafePolicy check, 50KB output truncation, sandbox-wrapped, timeout clamped to [1, 600]s |
| web_search | query, count=5 | Brave Search API (BRAVE_API_KEY) |
| web_fetch | url, extract_mode="markdown", max_chars=50000 | SSRF protection, htmd HTML→markdown, 30s timeout |
| message | content, channel?, chat_id? | Cross-channel messaging via OutboundMessage. Gateway-only |
| spawn | task, label?, mode="background", allowed_tools, context? | Subagent with inherited provider policy. sync=inline, background=async. Gateway-only |
| cron | action, message, schedule params | Schedule add/list/remove/enable/disable. Gateway-only |
| browser | action, url?, selector?, text?, expression? | Headless Chrome via CDP (always compiled). Actions: navigate (SSRF + scheme check), get_text, get_html, click, type, screenshot, evaluate, close. 5min idle timeout, env sanitization, 10s JS timeout, early action validation |
Registration: Core tools registered in ToolRegistry::with_builtins() (all modes). Browser is always compiled. Message, spawn, and cron are registered only in gateway mode (gateway.rs).
Tool Policies
#![allow(unused)]
fn main() {
pub struct ToolPolicy {
pub allow: Vec<String>, // empty = allow all
pub deny: Vec<String>, // deny-wins
}
}
Groups: group:fs (read_file, write_file, edit_file, diff_edit), group:runtime (shell), group:web (web_search, web_fetch, browser), group:search (glob, grep, list_dir), group:sessions (spawn).
Wildcards: exec* matches prefix. Provider-specific policies via config tools.byProvider.
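A minimal sketch of how deny-wins evaluation with prefix wildcards could work. This is illustrative only — the helper name `pattern_matches` and the exact matching logic are assumptions, not the Octos source:

```rust
/// Illustrative tool policy with deny-wins semantics.
pub struct ToolPolicy {
    pub allow: Vec<String>, // empty = allow all
    pub deny: Vec<String>,  // deny-wins
}

/// `exec*` style prefix wildcard: a trailing `*` matches any suffix.
fn pattern_matches(pattern: &str, tool: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tool.starts_with(prefix),
        None => pattern == tool,
    }
}

impl ToolPolicy {
    pub fn is_allowed(&self, tool: &str) -> bool {
        // Deny rules win over any allow rule.
        if self.deny.iter().any(|p| pattern_matches(p, tool)) {
            return false;
        }
        // An empty allow list means "allow everything not denied".
        self.allow.is_empty() || self.allow.iter().any(|p| pattern_matches(p, tool))
    }
}

fn main() {
    let policy = ToolPolicy {
        allow: vec![],
        deny: vec!["exec*".into(), "shell".into()],
    };
    assert!(!policy.is_allowed("exec_command")); // wildcard prefix match
    assert!(!policy.is_allowed("shell"));
    assert!(policy.is_allowed("read_file")); // not denied, allow list empty
}
```

Group expansion (`group:fs` → its member tools) would happen before this check in a real implementation.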
Command Policy (ShellTool)
```rust
pub enum Decision { Allow, Deny, Ask }
```
SafePolicy deny patterns: rm -rf /, rm -rf /*, dd if=, mkfs, :(){:|:&};:, chmod -R 777 /. Commands are whitespace-normalized before matching to prevent evasion via extra spaces/tabs.
SafePolicy ask patterns: sudo, rm -rf, git push --force, git reset --hard
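The normalization-then-match flow can be sketched as follows. Substring matching and the `check` function name are simplifications for illustration; the real SafePolicy may match patterns more precisely:

```rust
// Illustrative sketch of whitespace-normalized command checking;
// not the actual Octos SafePolicy implementation.
#[derive(Debug, PartialEq)]
pub enum Decision { Allow, Deny, Ask }

/// Collapse runs of spaces/tabs so `rm   -rf   /` matches `rm -rf /`.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

pub fn check(cmd: &str) -> Decision {
    let cmd = normalize(cmd);
    let deny = ["rm -rf /", "rm -rf /*", "dd if=", "mkfs", ":(){:|:&};:", "chmod -R 777 /"];
    let ask = ["sudo", "rm -rf", "git push --force", "git reset --hard"];
    if deny.iter().any(|p| cmd.contains(p)) {
        Decision::Deny // deny patterns checked first
    } else if ask.iter().any(|p| cmd.contains(p)) {
        Decision::Ask
    } else {
        Decision::Allow
    }
}

fn main() {
    assert_eq!(check("rm    -rf    /"), Decision::Deny); // evasion via extra spaces
    assert_eq!(check("sudo apt update"), Decision::Ask);
    assert_eq!(check("ls -la"), Decision::Allow);
}
```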
Sandbox
```rust
pub enum SandboxMode { Auto, Bwrap, Macos, Docker, None }
```
BLOCKED_ENV_VARS (18 vars, shared across all backends + MCP):
LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, DYLD_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, DYLD_VERSIONED_LIBRARY_PATH, NODE_OPTIONS, PYTHONSTARTUP, PYTHONPATH, PERL5OPT, RUBYOPT, RUBYLIB, JAVA_TOOL_OPTIONS, BASH_ENV, ENV, ZDOTDIR
| Backend | Isolation | Network | Path Validation |
|---|---|---|---|
| Bwrap (Linux) | RO bind /usr,/lib,/bin,/sbin,/etc; RW bind workdir; tmpfs /tmp; unshare-pid | --unshare-net if !allow_network | N/A |
| Macos (sandbox-exec) | SBPL profile: process-exec/fork, file-read*, writes to workdir+/private/tmp | (allow network*) or (deny network*) | Rejects control chars, (, ), \, " |
| Docker | --rm --security-opt no-new-privileges --cap-drop ALL | --network none | Rejects :, \0, \n, \r |
Docker resource limits: --cpus, --memory, --pids-limit. Mount modes: None (/tmp workdir), ReadOnly, ReadWrite.
Hooks System
Lifecycle hooks run shell commands at agent events. Configured via hooks array in config.
```rust
pub enum HookEvent { BeforeToolCall, AfterToolCall, BeforeLlmCall, AfterLlmCall }

pub struct HookConfig {
    pub event: HookEvent,
    pub command: Vec<String>,     // argv array (no shell interpretation)
    pub timeout_ms: u64,          // default: 5000
    pub tool_filter: Vec<String>, // tool events only; empty = all
}
```
Shell protocol: JSON payload on stdin. Exit code semantics: 0=allow, 1=deny (before-hooks only), 2+=error. Before-hooks can deny operations; after-hook exit codes only count as errors.
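The exit-code semantics can be sketched as a small decision function (names `HookOutcome` and `interpret_exit` are illustrative, not from the Octos source):

```rust
// Sketch of the exit-code rules: 0 = allow, 1 = deny (before-hooks only),
// anything else = error; after-hooks treat any nonzero code as an error.
#[derive(Debug, PartialEq)]
pub enum HookOutcome { Allow, Deny, Error(i32) }

pub fn interpret_exit(code: i32, is_before_hook: bool) -> HookOutcome {
    match code {
        0 => HookOutcome::Allow,
        1 if is_before_hook => HookOutcome::Deny, // only before-hooks can veto
        n => HookOutcome::Error(n),               // after-hook nonzero = error only
    }
}

fn main() {
    assert_eq!(interpret_exit(0, true), HookOutcome::Allow);
    assert_eq!(interpret_exit(1, true), HookOutcome::Deny);
    assert_eq!(interpret_exit(1, false), HookOutcome::Error(1)); // after-hooks cannot deny
    assert_eq!(interpret_exit(2, true), HookOutcome::Error(2));
}
```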
Circuit breaker: HookExecutor auto-disables a hook after 3 consecutive failures (configurable via with_threshold()). Resets on success.
Environment: Commands sanitized via BLOCKED_ENV_VARS. Tilde expansion supports ~/ and ~username/.
Integration: Wired into chat.rs, gateway.rs, serve.rs. Hook config changes trigger restart via config watcher.
MCP Integration
JSON-RPC transport for Model Context Protocol servers. Two transport modes:
Transports:
- Stdio: Spawns server as child process (command + args + env). Line limit: 1MB. Env sanitized via `BLOCKED_ENV_VARS`.
- HTTP/SSE: Connects to remote server via `url` field. POST JSON, SSE response handling.
Lifecycle (stdio):
- Spawn server (command + args + env, filtering `BLOCKED_ENV_VARS`)
- Initialize: `protocolVersion: "2024-11-05"`
- Discover tools: `tools/list` RPC
- Validate input schemas (max depth 10, max size 64KB); reject tools with invalid schemas
- Register `McpTool` wrappers (30s timeout, 1MB max response)
McpTool execution: tools/call with name + arguments. Extracts content[].text from response.
Skills System
Skills are markdown instruction files that extend agent capabilities. Two sources: built-in (compiled into binary) and workspace (user-installed).
Skill File Format (SKILL.md)
```markdown
---
name: skill_name
description: What it does
requires_bins: binary1, binary2   # comma-separated, checked via `which`
requires_env: ENV_VAR1, ENV_VAR2  # comma-separated, checked via std::env::var()
always: true|false                # auto-load into system prompt when available
---
Skill instructions here (markdown). This body is injected into the agent's
system prompt when the skill is activated.
```
Frontmatter parsing: Simple key: value line matching (not full YAML). split_frontmatter() finds content between --- delimiters. strip_frontmatter() returns body only.
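The `---`-delimited split described above can be sketched like this (the function body is illustrative; only the name `split_frontmatter` comes from the text):

```rust
// Sketch: return (frontmatter, body) for content delimited by `---` lines.
// Not full YAML — callers would parse simple `key: value` lines themselves.
fn split_frontmatter(content: &str) -> (Option<&str>, &str) {
    // Frontmatter is the text between a leading `---` line and the next `---`.
    let rest = match content.strip_prefix("---\n") {
        Some(r) => r,
        None => return (None, content),
    };
    match rest.find("\n---\n") {
        Some(end) => (Some(&rest[..end]), &rest[end + 5..]),
        None => (None, content), // unterminated frontmatter: treat as body
    }
}

fn main() {
    let doc = "---\nname: demo\nalways: true\n---\nBody text.";
    let (fm, body) = split_frontmatter(doc);
    assert_eq!(fm, Some("name: demo\nalways: true"));
    assert_eq!(body, "Body text.");
}
```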
SkillInfo
```rust
pub struct SkillInfo {
    pub name: String,
    pub description: String,
    pub path: PathBuf,   // filesystem path or "(built-in)/name/SKILL.md"
    pub available: bool, // bins_ok && env_ok
    pub always: bool,    // auto-load into system prompt
    pub builtin: bool,   // true if from BUILTIN_SKILLS, false if workspace
}
```
Availability check: available = requires_bins all found on PATH AND requires_env all set. Missing requirements make the skill unavailable but still listed.
SkillsLoader
```rust
pub struct SkillsLoader {
    skills_dir: PathBuf, // {data_dir}/skills/
}
```
Methods:
- `list_skills()` — scans workspace dir + built-ins. Workspace skills override built-ins with same name (checked via HashSet). Results sorted alphabetically.
- `load_skill(name)` — returns body (frontmatter stripped). Checks workspace first, falls back to built-in.
- `build_skills_summary()` — generates XML for system prompt injection:

  ```xml
  <skills>
    <skill available="true">
      <name>skill_name</name>
      <description>What it does</description>
      <location>/path/to/SKILL.md</location>
    </skill>
  </skills>
  ```

- `get_always_skills()` — filters skills where `always: true` AND `available: true`.
- `load_skills_for_context(names)` — loads multiple skills, joins with `\n---\n`.
Built-in Skills (compile-time `include_str!()`)
```rust
pub struct BuiltinSkill {
    pub name: &'static str,
    pub content: &'static str, // full SKILL.md including frontmatter
}

pub const BUILTIN_SKILLS: &[BuiltinSkill] = &[...];
```
| Skill | Purpose |
|---|---|
| cron | Task scheduling instructions |
| skill-store | Skill store browsing and installation |
| skill-creator | Create new skills |
| tmux | Terminal multiplexer control |
| weather | Weather information retrieval |
CLI Management (octos skills)
- `list` — shows built-in skills (with override status) + workspace skills
- `install <user/repo/skill-name>` — fetches `SKILL.md` from `https://raw.githubusercontent.com/{repo}/main/SKILL.md` (15s timeout), saves to `.octos/skills/{name}/SKILL.md`. Fails if skill already exists.
- `remove <name>` — deletes `.octos/skills/{name}/` directory
Integration with Gateway
In the gateway command, skills are loaded during system prompt construction:
- `get_always_skills()` — collects auto-load skill names
- `load_skills_for_context(names)` — loads and joins skill bodies
- `build_skills_summary()` — appends XML skill index to system prompt
- Always-on skill content is prepended to the system prompt
Plugin System
Plugins extend the agent with external tools via standalone executables. Each plugin is a directory containing a manifest.json and an executable file.
Directory Layout
```text
.octos/plugins/            # local (project-level)
~/.octos/plugins/          # global (user-level)
└── my-plugin/
    ├── manifest.json      # plugin metadata + tool definitions
    └── my-plugin          # executable (or "main" as fallback)
```
Discovery order: local .octos/plugins/ first, then global ~/.octos/plugins/. Both are scanned by Config::plugin_dirs().
PluginManifest
```rust
pub struct PluginManifest {
    pub name: String,
    pub version: String,
    pub tools: Vec<PluginToolDef>, // default: empty vec
}

pub struct PluginToolDef {
    pub name: String,        // must be unique across all plugins
    pub description: String,
    pub input_schema: serde_json::Value, // default: {"type": "object"}
}
```
Example manifest.json:
```json
{
  "name": "my-plugin",
  "version": "0.1.0",
  "tools": [
    {
      "name": "greet",
      "description": "Greet someone by name",
      "input_schema": {
        "type": "object",
        "properties": { "name": { "type": "string" } }
      }
    }
  ]
}
```
PluginLoader
```rust
pub struct PluginLoader; // stateless, all methods are associated functions
```
`load_into(registry, dirs)`:
- Scan each directory for subdirectories
- For each subdirectory, look for `manifest.json`
- Parse manifest, find executable (try directory name first, then `main`)
- Validate executable permissions (Unix: `mode & 0o111 != 0`; non-Unix: existence check)
- Wrap each tool definition as a `PluginTool` implementing the `Tool` trait
- Register into `ToolRegistry`
- Log warning: `"loaded unverified plugin (no signature check)"`
- Return total tool count. Failed plugins are skipped with warning, not fatal.
PluginTool — Execution Protocol
```rust
pub struct PluginTool {
    plugin_name: String,
    tool_def: PluginToolDef,
    executable: PathBuf,
}
```
Invocation: executable <tool_name> (tool name passed as first argument).
stdin/stdout protocol:
- Spawn executable with tool name as arg, piped stdin/stdout/stderr
- Write JSON-serialized arguments to stdin, close (EOF signals end of input)
- Wait for exit with 30s timeout (`PLUGIN_TIMEOUT`)
- Parse stdout as JSON:
  - Structured: `{"output": "...", "success": true/false}` — use parsed values
  - Fallback: raw stdout + stderr concatenated, success from exit code
- Return `ToolResult` (no `file_modified` tracking for plugins)
Error handling:
- Spawn failure → eyre error with plugin name and executable path
- Timeout → eyre error with plugin name, tool name, and duration
- JSON parse failure → graceful fallback to raw output
Progress Reporting
The agent emits structured events during execution via a trait-based observer pattern. Consumers (CLI, REST API) implement the trait to render progress in their own format.
ProgressReporter Trait
```rust
pub trait ProgressReporter: Send + Sync {
    fn report(&self, event: ProgressEvent);
}
```
Agent holds `reporter: Arc<dyn ProgressReporter>`. Events are fired synchronously during the execution loop, so implementations must not block.
ProgressEvent Enum
```rust
pub enum ProgressEvent {
    TaskStarted { task_id: String },
    Thinking { iteration: u32 },
    Response { content: String, iteration: u32 },
    ToolStarted { name: String, tool_id: String },
    ToolCompleted { name: String, tool_id: String, success: bool,
                    output_preview: String, duration: Duration },
    FileModified { path: String },
    TokenUsage { input_tokens: u32, output_tokens: u32 },
    TaskCompleted { success: bool, iterations: u32, duration: Duration },
    TaskInterrupted { iterations: u32 },
    MaxIterationsReached { limit: u32 },
    TokenBudgetExceeded { used: u32, limit: u32 },
    StreamChunk { text: String, iteration: u32 },
    StreamDone { iteration: u32 },
    CostUpdate { session_input_tokens: u32, session_output_tokens: u32,
                 response_cost: Option<f64>, session_cost: Option<f64> },
}
```
Implementations (3)
SilentReporter — no-op, used as default when no reporter is configured.
ConsoleReporter — CLI output with ANSI colors and streaming support:
```rust
pub struct ConsoleReporter {
    use_colors: bool,
    verbose: bool,
    stdout: Mutex<BufWriter<Stdout>>, // buffered for streaming chunks
}
```
| Event | Output |
|---|---|
| Thinking | `\r⏳ Thinking... (iteration N)` (overwrites line, yellow) |
| Response | `→ first 3 lines...` (cyan, clears Thinking line) |
| ToolStarted | `\r→ Running tool_name...` (overwrites line, yellow) |
| ToolCompleted | `✓ tool_name (duration)` green or `✗ tool_name` red; verbose: 5 lines of output + `...` |
| FileModified | `📝 Modified: path` (green) |
| TokenUsage | `Tokens: N in, N out` (verbose only, dim) |
| TaskCompleted | `✓ Completed N iterations, Xs` or `✗ Failed after N iterations` |
| TaskInterrupted | `⚠ Interrupted after N iterations.` (yellow) |
| MaxIterationsReached | `⚠ Reached max iterations limit (N).` (yellow) |
| TokenBudgetExceeded | `⚠ Token budget exceeded (used, limit).` (yellow) |
| StreamChunk | Write to buffered stdout; flush only on `\n` (reduces syscalls) |
| StreamDone | Flush + newline |
| CostUpdate | `Tokens: N in / N out \| Cost: $X.XXXX` |
| TaskStarted | `▶ Task: id` (verbose only, dim) |
Duration formatting: >1s → `{:.1}s`, ≤1s → `{N}ms`.
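That formatting rule can be sketched as a small helper (the function name `format_duration` is illustrative):

```rust
use std::time::Duration;

// Sketch of the rule: durations over 1s render as "X.Xs", otherwise "Nms".
fn format_duration(d: Duration) -> String {
    if d.as_secs_f64() > 1.0 {
        format!("{:.1}s", d.as_secs_f64())
    } else {
        format!("{}ms", d.as_millis())
    }
}

fn main() {
    assert_eq!(format_duration(Duration::from_millis(250)), "250ms");
    assert_eq!(format_duration(Duration::from_millis(2340)), "2.3s");
}
```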
SseBroadcaster (REST API, feature: api) — converts events to JSON and broadcasts via tokio::sync::broadcast channel:
```rust
pub struct SseBroadcaster {
    tx: broadcast::Sender<String>, // JSON-serialized events
}
```
| ProgressEvent | JSON type field | Additional fields |
|---|---|---|
| ToolStarted | "tool_start" | tool |
| ToolCompleted | "tool_end" | tool, success |
| StreamChunk | "token" | text |
| StreamDone | "stream_end" | — |
| CostUpdate | "cost_update" | input_tokens, output_tokens, session_cost |
| Thinking | "thinking" | iteration |
| Response | "response" | iteration |
| (other) | "other" | — (logged at debug level) |
Subscribers receive events via SseBroadcaster::subscribe() -> broadcast::Receiver<String>. Send errors (no subscribers) are silently ignored.
Execution Environments (exec_env.rs)
ExecEnvironment trait with exec(cmd, args, env), read_file(path), write_file(path, content), file_exists(path), list_dir(path). Two implementations: LocalEnvironment (tokio::process::Command) and DockerEnvironment (docker exec). Environment variables sanitized via shared BLOCKED_ENV_VARS. Docker paths validated against injection characters (\0, \n, \r, :). Docker env vars forwarded via --env flags.
Provider Toolsets (provider_tools.rs)
ToolAdjustment (prefer, demote, aliases, extras) per LLM provider. ProviderToolsets registry with with_defaults() for openai/anthropic/google. Used to optimize tool presentation per provider (e.g., OpenAI prefers shell/read_file, demotes diff_edit).
Typed Turns (turn.rs)
Turn wraps Message with TurnKind (UserInput, AgentReply, ToolCall, ToolResult, System) and iteration number. turns_to_messages() converts back to Vec<Message> for LLM calls. Enables semantic analysis of conversation history.
Event Bus (event_bus.rs)
EventBus with typed EventSubscriber for pub/sub within the agent. Decouples event producers (tool execution, LLM calls) from consumers (logging, metrics, UI updates).
Loop Detection (loop_detect.rs)
Detects repetitive agent behavior (e.g., calling the same tool with same args). Configurable threshold and window. Returns early with diagnostic message when loop detected.
Session State (session.rs)
SessionState with SessionLimits and SessionUsage tracking. SessionStateHandle for thread-safe access. Tracks token usage, iteration count, and wall-clock time against configured limits.
Steering (steering.rs)
SteeringMessage with SteeringSender/SteeringReceiver (mpsc channel). Allows external control of agent behavior mid-conversation (e.g., injecting guidance, changing strategy).
Prompt Layers (prompt_layer.rs)
PromptLayerBuilder for composing system prompts from multiple sources (base prompt, persona, user context, memory, skills). Layers are concatenated in order with configurable separators.
octos-bus — Gateway Infrastructure
Message Bus
create_bus() -> (AgentHandle, BusPublisher) linked by mpsc channels (capacity 256). AgentHandle receives InboundMessages; BusPublisher dispatches OutboundMessages.
Queue Modes (configured via gateway.queue_mode):
- `Followup` (default): FIFO — process queued messages one at a time
- `Collect`: Merge queued messages by session, concatenating content before processing
Channel Trait
```rust
#[async_trait]
pub trait Channel: Send + Sync {
    fn name(&self) -> &str;
    async fn start(&self, inbound_tx: mpsc::Sender<InboundMessage>) -> Result<()>;
    async fn send(&self, msg: &OutboundMessage) -> Result<()>;
    fn is_allowed(&self, sender_id: &str) -> bool;
    async fn stop(&self) -> Result<()>;
}
```
Channel Implementations
| Channel | Transport | Feature Flag | Auth | Dedup |
|---|---|---|---|---|
| CLI | stdin/stdout | (always) | N/A | N/A |
| Telegram | teloxide long-poll | telegram | Bot token (env) | teloxide built-in |
| Discord | serenity gateway | discord | Bot token (env) | serenity built-in |
| Slack | Socket Mode (tokio-tungstenite) | slack | Bot token + App token | message_ts |
| WhatsApp | WebSocket bridge (ws://localhost:3001) | whatsapp | Baileys bridge | HashSet (10K cap, clear on overflow) |
| Feishu | WebSocket (tokio-tungstenite) | feishu | App ID + Secret → tenant token (TTL 6000s) | HashSet (10K cap, clear on overflow) |
| Email | IMAP poll + SMTP send | email | Username/password, rustls TLS | IMAP UNSEEN flag |
| WeCom | WeCom/WeChat Work API | wecom | Corp ID + Agent Secret | message_id |
| Twilio | Twilio SMS/MMS | twilio | Account SID + Auth Token | message SID |
Email specifics: IMAP async-imap with rustls for inbound (poll unseen, mark \Seen). SMTP lettre for outbound (port 465=implicit TLS, other=STARTTLS). mailparse for RFC822 body extraction. Body truncated via truncate_utf8(max_body_chars).
Feishu specifics: Tenant access token with TTL cache (6000s). WebSocket gateway URL from /callback/ws/endpoint. Message type detection via header.event_type == "im.message.receive_v1". Supports oc_* (chat_id) vs ou_* (open_id) routing.
Markdown to HTML: markdown_html.rs converts Markdown to Telegram-compatible HTML for rich message formatting.
Media: download_media() helper downloads photos/voice/audio/documents to .octos/media/.
Transcription: Voice/audio auto-transcribed via GroqTranscriber before agent processing.
Message Coalescing
Splits oversized messages into channel-safe chunks:
| Channel | Max Chars |
|---|---|
| Telegram | 4000 |
| Discord | 1900 |
| Slack | 3900 |
Break preference: paragraph (\n\n) > newline (\n) > sentence (. ) > space ( ) > hard cut.
MAX_CHUNKS = 50 (DoS limit). UTF-8 safe boundary detection via char_indices().
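The break-preference search can be sketched as follows. The function name `split_point` and the exact window logic are assumptions for illustration, not the `coalesce.rs` implementation:

```rust
// Sketch: find a split index <= max that prefers paragraph > newline >
// sentence > space breaks, falling back to a hard cut at a char boundary.
fn split_point(text: &str, max: usize) -> usize {
    if text.len() <= max {
        return text.len(); // already fits in one chunk
    }
    // Largest char boundary not exceeding `max` (UTF-8 safety).
    let hard = text
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max)
        .last()
        .unwrap_or(0);
    let window = &text[..hard];
    // Try natural break points in preference order.
    for sep in ["\n\n", "\n", ". ", " "] {
        if let Some(pos) = window.rfind(sep) {
            if pos > 0 {
                return pos + sep.len();
            }
        }
    }
    hard // no natural break: hard cut
}

fn main() {
    let text = "First paragraph.\n\nSecond paragraph that continues.";
    let cut = split_point(text, 30);
    assert_eq!(&text[..cut], "First paragraph.\n\n"); // paragraph break preferred
}
```

A real splitter would loop this over the remainder until all chunks fit, stopping at the MAX_CHUNKS cap.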
Session Manager
JSONL persistence at .octos/sessions/{key}.jsonl.
- In-memory cache: LRU with disk sync on write
- Filenames: Percent-encoded SessionKey, truncated to 183 chars with `_{hash:016X}` suffix on truncation to prevent collisions
- File size limit: 10MB max (`MAX_SESSION_FILE_SIZE`); oversized files skipped on load
- Crash safety: Atomic write-then-rename
- Forking: `fork()` creates child session with `parent_key` tracking, copies last N messages
Cron Service
JSON persistence at .octos/cron.json.
Schedule types:
- `Every { seconds: u64 }` — recurring interval
- `Cron { expr: String }` — cron expression via `cron` crate
- `At { timestamp_ms: i64 }` — one-shot (auto-delete after run)
CronJob fields: id (8-char hex from UUIDv7), name, enabled, schedule, payload (message + deliver flag + channel + chat_id), state (next_run_at_ms, run_count), delete_after_run.
Heartbeat Service
Periodic check of HEARTBEAT.md (default: 30 min interval). Sends content to agent if non-empty.
octos-cli — CLI & Configuration
Commands
| Command | Description |
|---|---|
| `chat` | Interactive multi-turn chat. Readline with history. Exit: `exit`/`quit`/`:q` |
| `gateway` | Persistent multi-channel daemon with session management |
| `init` | Initialize `.octos/` with config, templates, directories |
| `status` | Show config, provider, API keys, bootstrap files |
| `auth login/logout/status` | OAuth PKCE (OpenAI), device code, paste-token |
| `cron list/add/remove/enable` | CLI cron job management |
| `channels status/login` | Channel compilation status, WhatsApp bridge setup |
| `skills list/install/remove` | Skill management, GitHub fetch |
| `office` | Office/workspace management |
| `account` | Account management |
| `clean` | Remove `.redb` files with dry-run support |
| `completions` | Shell completion generation (bash/zsh/fish) |
| `docs` | Generate tool + provider documentation |
| `serve` | REST API server (feature: `api`) — axum on 127.0.0.1:8080 (`--host` to override) |
Configuration
Loaded from .octos/config.json (local) or ~/.config/octos/config.json (global). Local takes precedence.
- `${VAR}` expansion: Environment variable substitution in string values
- Versioned config: Version field with automatic `migrate_config()` framework
- Provider auto-detect (`registry::detect_provider(model)`): claude→anthropic, gpt/o1/o3/o4→openai, gemini→gemini, deepseek→deepseek, kimi/moonshot→moonshot, qwen→dashscope, glm→zhipu, llama/mixtral→groq. Patterns defined per-provider in `registry/`.
API key resolution order: Auth store (`~/.octos/auth.json`) → environment variable.
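The model-prefix detection described above can be sketched as a lookup table. The rules follow the mapping in this section; the function shape is an assumption, as the real registry defines patterns per provider:

```rust
// Sketch of prefix-based provider detection (illustrative only).
fn detect_provider(model: &str) -> Option<&'static str> {
    let rules: &[(&[&str], &str)] = &[
        (&["claude"], "anthropic"),
        (&["gpt", "o1", "o3", "o4"], "openai"),
        (&["gemini"], "gemini"),
        (&["deepseek"], "deepseek"),
        (&["kimi", "moonshot"], "moonshot"),
        (&["qwen"], "dashscope"),
        (&["glm"], "zhipu"),
        (&["llama", "mixtral"], "groq"),
    ];
    rules
        .iter()
        .find(|(prefixes, _)| prefixes.iter().any(|p| model.starts_with(p)))
        .map(|&(_, provider)| provider)
}

fn main() {
    assert_eq!(detect_provider("claude-sonnet-4"), Some("anthropic"));
    assert_eq!(detect_provider("kimi-k2"), Some("moonshot"));
    assert_eq!(detect_provider("unknown-model"), None);
}
```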
Auth Module
OAuth PKCE (OpenAI):
- Generate 64-char verifier (two UUIDv4 hex)
- SHA-256 challenge, base64-URL encode (no padding)
- TCP listener on port 1455
- Browser → `auth.openai.com` with PKCE + state
- Callback validates state (CSRF), exchanges code+verifier for tokens
Device Code Flow (OpenAI): POST deviceauth/usercode, poll deviceauth/token every 5s+.
Paste Token: Prompt for API key from stdin, store as auth_method: "paste_token".
AuthStore: ~/.octos/auth.json (mode 0600). {credentials: {provider: AuthCredential}}.
Config Watcher
Polls every 5 seconds. SHA-256 hash comparison of file contents.
Hot-reloadable: system_prompt, max_history (applied live).
Restart-required: provider, model, base_url, api_key_env, sandbox, mcp_servers, hooks, gateway.queue_mode, channels.
REST API (feature: api)
| Route | Method | Description |
|---|---|---|
| `/api/chat` | POST | Send message → response |
| `/api/chat/stream` | GET | SSE stream of ProgressEvents |
| `/api/sessions` | GET | List all sessions |
| `/api/sessions/{id}/messages` | GET | Paginated history (`?limit=100&offset=0`, max 500) |
| `/api/status` | GET | Version, model, provider, uptime |
| `/metrics` | GET | Prometheus text exposition format (unauthenticated) |
| `/*` (fallback) | GET | Embedded web UI (static files via rust-embed) |
Auth: Optional bearer token with constant-time comparison (API routes only; /metrics and static files are public). CORS: localhost:3000/8080. Max message: 1MB.
Web UI: Embedded SPA via rust-embed served as the fallback handler. Session sidebar, chat interface, SSE streaming, dark theme. Vanilla HTML/CSS/JS (no build tools).
Prometheus Metrics: octos_tool_calls_total (counter, labels: tool, success), octos_tool_call_duration_seconds (histogram, label: tool), octos_llm_tokens_total (counter, label: direction). Powered by metrics + metrics-exporter-prometheus crates.
Session Compaction (Gateway)
Triggered when message count > 40 (threshold). Keeps 10 recent messages. Summarizes older messages via LLM to <500 words. Rewrites JSONL session file.
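The trigger logic can be sketched as follows (constants follow the text; the `compact` function and its summarizer callback are illustrative, with the LLM summarization stubbed out):

```rust
// Sketch: with more than 40 messages, keep the 10 most recent and replace
// the older ones with a single summary message.
const COMPACT_THRESHOLD: usize = 40;
const KEEP_RECENT: usize = 10;

fn compact(messages: Vec<String>, summarize: impl Fn(&[String]) -> String) -> Vec<String> {
    if messages.len() <= COMPACT_THRESHOLD {
        return messages; // under threshold: no compaction
    }
    let split = messages.len() - KEEP_RECENT;
    let summary = summarize(&messages[..split]); // older messages → LLM summary
    let mut out = vec![summary];
    out.extend_from_slice(&messages[split..]); // recent messages kept verbatim
    out
}

fn main() {
    let msgs: Vec<String> = (0..45).map(|i| format!("msg {i}")).collect();
    let compacted = compact(msgs, |older| format!("[summary of {} messages]", older.len()));
    assert_eq!(compacted.len(), 11); // 1 summary + 10 recent
    assert_eq!(compacted[0], "[summary of 35 messages]");
}
```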
octos-pipeline — DOT-based Pipeline Orchestration
DOT-based pipeline orchestration engine for defining and executing multi-step workflows.
- `parser.rs` — DOT graph parser (parses Graphviz DOT format into pipeline definitions)
- `graph.rs` — PipelineGraph with node/edge types
- `executor.rs` — Async pipeline execution engine
- `handler.rs` — Handler types: CodergenHandler, GateHandler, ShellHandler, NoopHandler, DynamicParallel
- `condition.rs` — Conditional edge evaluation (branching logic)
- `tool.rs` — RunPipelineTool integration (exposes pipeline execution as an agent tool)
- `validate.rs` — Graph validation and lint diagnostics
- `human_gate.rs` — Human-in-the-loop gates with `HumanInputProvider` trait, `ChannelInputProvider` (mpsc + oneshot, 5min default timeout), `AutoApproveProvider`. Input types: Approval, FreeText, Choice
- `fidelity.rs` — `FidelityMode` enum (Full, Truncate, Compact, Summary) for context carryover control between nodes. Parse from config strings. Safety caps: 10MB max_chars, 100K max_lines
- `manager.rs` — `PipelineManager` supervisor with `SupervisionStrategy` (AllOrNothing, BestEffort, RetryFailed). Retry capped at 10 with exponential backoff (100ms-5s). `ManagerOutcome` converts to `NodeOutcome`
- `thread.rs` — `ThreadRegistry` for LLM session reuse across pipeline nodes. `Thread` stores model_id + message history. Limits: 1000 threads, 10000 messages per thread
- `server.rs` — `PipelineServer` trait with `SubmitRequest` (validated: 1MB DOT, 256KB input, 64 variables, safe pipeline IDs), `RunStatus` lifecycle (Queued → Running → Completed/Failed/Cancelled)
- `artifact.rs` — Pipeline artifact storage for intermediate outputs
- `checkpoint.rs` — Pipeline checkpoint/resume for crash recovery
- `events.rs` — Pipeline event system for progress tracking
- `run_dir.rs` — Per-run working directories with isolation
- `stylesheet.rs` — Visual styling for pipeline graph rendering
Data Flows
Chat Mode
```text
User Input → readline → Agent.process_message(input, history)
     ↓
     ├─ Build messages (system + history + memory + input)
     ├─ trim_to_context_window() if needed
     ├─ Call LLM via chat_stream() with tool specs
     ├─ Execute tools if ToolUse (loop)
     └─ Return ConversationResponse
     ↓
Print response, append to history
```
Gateway Mode
```text
Channel → InboundMessage → MessageBus → [transcribe audio] → [load session]
                                ↓
                     Agent.process_message()
                                ↓
                         OutboundMessage
                                ↓
                     ChannelManager.dispatch()
                                ↓
                    coalesce() → Channel.send()
```
System messages (cron, heartbeat, spawn results) flow through the same bus with channel: "system" and metadata routing.
Feature Flags
```toml
# octos-bus
telegram = ["teloxide"]
discord = ["serenity"]
slack = ["tokio-tungstenite"]
whatsapp = ["tokio-tungstenite"]
feishu = ["tokio-tungstenite"]
email = ["async-imap", "tokio-rustls", "rustls", "webpki-roots", "lettre", "mailparse"]

# octos-agent (browser is always compiled in, no longer feature-gated)
git = ["gix"]          # git operations via gitoxide
ast = ["tree-sitter"]  # code_structure.rs AST analysis
admin-bot = [...]      # admin/ directory tools

# octos-bus (additional)
wecom = [...]   # WeCom/WeChat Work channel
twilio = [...]  # Twilio SMS/MMS channel

# octos-cli
api = ["axum", "tower-http", "futures"]
telegram = ["octos-bus/telegram"]
discord = ["octos-bus/discord"]
slack = ["octos-bus/slack"]
whatsapp = ["octos-bus/whatsapp"]
feishu = ["octos-bus/feishu"]
email = ["octos-bus/email"]
wecom = ["octos-bus/wecom"]
twilio = ["octos-bus/twilio"]
```
File Layout
```text
crates/
├── octos-core/src/
│   └── lib.rs, task.rs, types.rs, error.rs, gateway.rs, message.rs, utils.rs
├── octos-llm/src/
│   ├── lib.rs, provider.rs, config.rs, types.rs, retry.rs, failover.rs, sse.rs
│   ├── embedding.rs, pricing.rs, context.rs, transcription.rs, vision.rs
│   ├── adaptive.rs, swappable.rs, router.rs, ominix.rs
│   ├── anthropic.rs, openai.rs, gemini.rs, openrouter.rs (protocol impls)
│   └── registry/ (mod.rs + 14 provider entries: anthropic, openai, gemini,
│                  openrouter, deepseek, groq, moonshot, dashscope, minimax,
│                  zhipu, zai, nvidia, ollama, vllm)
├── octos-memory/src/
│   └── lib.rs, episode.rs, store.rs, memory_store.rs, hybrid_search.rs
├── octos-agent/src/
│   ├── lib.rs, agent.rs, progress.rs, policy.rs, compaction.rs, sanitize.rs, hooks.rs
│   ├── sandbox.rs, mcp.rs, skills.rs, builtin_skills.rs
│   ├── bundled_app_skills.rs, bootstrap.rs, prompt_guard.rs
│   ├── plugins/ (mod.rs, loader.rs, manifest.rs, tool.rs)
│   ├── skills/ (cron, skill-store, skill-creator SKILL.md)
│   └── tools/ (mod, policy, shell, read_file, write_file, edit_file, diff_edit,
│               list_dir, glob_tool, grep_tool, web_search, web_fetch,
│               message, spawn, browser, ssrf, tool_config,
│               deep_search, site_crawl, recall_memory, save_memory,
│               send_file, take_photo, code_structure, git,
│               deep_research_pipeline, synthesize_research, research_utils,
│               admin/ (profiles, skills, sub_accounts, system,
│                       platform_skills, update))
├── octos-bus/src/
│   ├── lib.rs, bus.rs, channel.rs, session.rs, coalesce.rs, media.rs
│   ├── cli_channel.rs, telegram_channel.rs, discord_channel.rs
│   ├── slack_channel.rs, whatsapp_channel.rs, feishu_channel.rs, email_channel.rs
│   ├── wecom_channel.rs, twilio_channel.rs, markdown_html.rs
│   └── cron_service.rs, cron_types.rs, heartbeat.rs
├── octos-cli/src/
│   ├── main.rs, config.rs, config_watcher.rs, cron_tool.rs, compaction.rs
│   ├── auth/ (mod.rs, store.rs, oauth.rs, token.rs)
│   ├── api/ (mod.rs, router.rs, handlers.rs, sse.rs, metrics.rs, static_files.rs)
│   └── commands/ (mod, chat, init, status, gateway, clean,
│                  completions, cron, channels, auth, skills, docs, serve,
│                  office, account)
└── octos-pipeline/src/
    ├── lib.rs, parser.rs, graph.rs, executor.rs, handler.rs
    └── condition.rs, tool.rs, validate.rs
```
Security
Workspace-Level Safety
- `#![deny(unsafe_code)]` — workspace-wide lint via `[workspace.lints.rust]`
- `secrecy::SecretString` — all provider API keys are wrapped; prevents accidental logging/display
Authentication & Credentials
- API keys: auth store (`~/.octos/auth.json`, mode 0600) checked before env vars
- OAuth PKCE with SHA-256 challenges, state parameter (CSRF protection)
- Constant-time byte comparison for API bearer tokens (timing attack prevention)
Execution Sandbox
- Three backends: bwrap (Linux), sandbox-exec (macOS), Docker — `SandboxMode::Auto` detection
- 18 `BLOCKED_ENV_VARS` shared across all sandbox backends, MCP server spawning, hooks, and browser tool: `LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `DYLD_FRAMEWORK_PATH`, `DYLD_FALLBACK_LIBRARY_PATH`, `DYLD_VERSIONED_LIBRARY_PATH`, `NODE_OPTIONS`, `PYTHONSTARTUP`, `PYTHONPATH`, `PERL5OPT`, `RUBYOPT`, `RUBYLIB`, `JAVA_TOOL_OPTIONS`, `BASH_ENV`, `ENV`, `ZDOTDIR`
- Path injection prevention per backend (Docker: `:`, `\0`, `\n`, `\r`; macOS: control chars, `(`, `)`, `\`, `"`)
- Docker: `--cap-drop ALL`, `--security-opt no-new-privileges`, `--network none`, blocked bind mount sources (docker.sock, `/proc`, `/sys`, `/dev`, `/etc`)
Tool Safety
- ShellTool SafePolicy: deny `rm -rf /`, `dd`, `mkfs`, fork bombs, `chmod -R 777 /`; ask for `sudo`, `rm -rf`, `git push --force`, `git reset --hard`. Whitespace-normalized before matching. Timeout clamped to [1, 600]s. SIGTERM → grace period → SIGKILL cleanup for child processes.
- Tool policies: allow/deny with deny-wins semantics, 8 named groups (`group:fs`, `group:runtime`, `group:web`, `group:search`, `group:sessions`, etc.), wildcard matching, provider-specific filtering via `tools.byProvider`
- Tool argument size limit: 1MB per invocation (non-allocating `estimate_json_size` with escape char accounting)
- Symlink-safe file I/O via `O_NOFOLLOW` on Unix (atomic kernel-level check, eliminates TOCTOU races); metadata-based symlink check fallback on Windows
- SSRF protection in shared `ssrf.rs` module: DNS resolution with fail-closed behavior (blocks on DNS failure), private IP blocking (10/8, 172.16/12, 192.168/16, 169.254/16), IPv6 coverage (ULA `fc00::/7`, link-local `fe80::/10`, site-local `fec0::/10`, IPv4-mapped `::ffff:0:0/96`, IPv4-compatible `::/96`), loopback blocking. Used by web_fetch and browser.
- Browser: URL scheme allowlist (http/https only), 10s JS execution timeout, zombie process reaping, secure tempfiles for screenshots
- MCP: input schema validation (max depth 10, max size 64KB) prevents malicious tool definitions
- Prompt injection guard (`prompt_guard.rs`): 5 threat categories (SystemOverride, RoleConfusion, ToolCallInjection, SecretExtraction, InstructionInjection), 10 detection patterns. Sanitizes threats by wrapping in `[injection-blocked:...]`.
Data Safety
- Tool output sanitization (`sanitize.rs`): strips base64 data URIs and long hex strings (64+ chars); redacts credentials with 7 regex patterns covering OpenAI (`sk-...`), Anthropic (`sk-ant-...`), AWS (`AKIA...`), GitHub (`ghp_`/`gho_`/`ghs_`/`ghr_`/`github_pat_...`), GitLab (`glpat-...`), Bearer tokens, and generic `password`/`api_key` assignments
- UTF-8 safe truncation via `truncate_utf8()` across all tool outputs and email bodies
- Session file collision prevention via percent-encoded filenames with hash suffix on truncation
- Session file size limit: 10MB max prevents OOM on corrupted files
- Atomic write-then-rename for session persistence (crash safety)
- API server binds to 127.0.0.1 by default (not 0.0.0.0)
- Channel access control via `allowed_senders` lists
- MCP response limit: 1MB per JSON-RPC line (DoS prevention)
- Message coalescing: MAX_CHUNKS=50 DoS limit
- API message limit: 1MB per request
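The UTF-8 safe truncation mentioned above can be sketched like this (the function body is illustrative; only the name `truncate_utf8` comes from the text):

```rust
// Sketch: cut at the largest char boundary not exceeding `max_bytes`,
// so multi-byte characters are never split mid-sequence.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // back up to a valid boundary
    }
    &s[..end]
}

fn main() {
    assert_eq!(truncate_utf8("hello", 10), "hello");
    // 'é' is 2 bytes in UTF-8; cutting at 1 would split it, so we get "".
    assert_eq!(truncate_utf8("é", 1), "");
    assert_eq!(truncate_utf8("abcdef", 3), "abc");
}
```

Slicing a `&str` at a non-boundary index panics in Rust, which is why the boundary walk-back matters.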
Concurrency Model
Why Rust
Octos uses Rust with the tokio async runtime, which provides significant advantages over Python (OpenClaw, etc.) and Node.js (NanoCloud, etc.) agent frameworks for concurrent session handling:
True parallelism — Tokio tasks run across all CPU cores simultaneously. Python has the GIL, so even with asyncio, CPU-bound work (JSON parsing, context compaction, token counting) is single-core. Node.js runs JavaScript on a single thread. In Octos, 10 concurrent sessions doing context compaction actually execute in parallel across cores.
Memory efficiency — No garbage collector, no runtime overhead per object. Agent sessions are compact structs on the heap. A Python agent session carries interpreter overhead, GC metadata on every object, and dict-based attribute lookup. This matters with hundreds of sessions and large conversation histories in memory.
No GC pauses — Python and Node.js GC can cause latency spikes mid-response. Rust has deterministic deallocation: memory is freed exactly when the owning struct drops.
Single binary deployment — No Python/Node runtime to install, no dependency hell, predictable resource usage. The gateway is one static binary.
Tokio Tasks vs OS Threads
All concurrent session processing uses tokio tasks (green threads), not OS threads. A tokio task is a state machine on the heap (~few KB). An OS thread is ~8MB stack. Thousands of tasks multiplex across a handful of OS threads (defaults to CPU core count). Since agent sessions spend most of their time awaiting I/O (LLM API responses), they yield the thread to other tasks efficiently.
Gateway Concurrency
Inbound messages β main loop
β
ββ tokio::spawn() per message
β β
β ββ Semaphore (max_concurrent_sessions, default 10)
β β bounds total concurrent agent runs
β β
β ββ Per-session Mutex
β serializes messages within same session
β
ββ Different sessions run concurrently
Same session queues sequentially
- Cross-session: concurrent, bounded by
max_concurrent_sessionssemaphore (default 10) - Within same session: serialized via per-session mutex β prevents race conditions on conversation history
- Per-session locks: pruned after completion (Arc strong_count == 1) to prevent unbounded HashMap growth
Tool Execution
Within a single agent iteration, all tool calls from one LLM response execute concurrently via join_all():
LLM response: [web_search, read_file, send_email]
β β β
ββββββββββββββΌββββββββββββ
join_all()
ββββββββββββββΌββββββββββββ
β β β
done done done
β
All results appended to messages
β
Next LLM call
Sub-Agent Modes (spawn tool)
| Aspect | Sync | Background |
|---|---|---|
| Parent blocks? | Yes | No (tokio::spawn()) |
| Result delivery | Same conversation turn | New inbound message via gateway |
| Token accounting | Counted toward parent budget | Independent |
| Use case | Sequential pipelines | Fire-and-forget long tasks |
Sub-agents cannot spawn further sub-agents (spawn tool is always denied in sub-agent policy).
Multi-Tenant Dashboard
The dashboard (octos serve) runs each user profile as a separate gateway OS process:
Dashboard (octos serve)
ββ Profile "alice" β octos gateway --config alice.json (deepseek, own semaphore)
ββ Profile "bob" β octos gateway --config bob.json (kimi, own semaphore)
ββ Profile "carol" β octos gateway --config carol.json (openai, own semaphore)
Each profile has its own LLM provider, API keys, channels, data directory, and max_concurrent_sessions semaphore. Profiles are fully isolated β no shared state between gateway processes.
Testing
1300+ tests across all crates. See TESTING.md for the full inventory and CI guide.
- Unit: type serde round-trips, tool arg parsing, config validation, provider detection, tool policies, compaction, coalescing, BM25 scoring, L2 normalization, SSE parsing
- Adaptive routing: Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, provider racing (19 tests)
- Responsiveness: baseline learning, degradation detection, recovery, threshold boundaries (8 tests)
- Queue modes: Followup, Collect, Steer, Speculative overflow, auto-escalation/deescalation (9 tests)
- Session persistence: JSONL storage, LRU eviction, fork, rewrite, timestamp sort, concurrent access (28 tests)
- Integration: CLI commands, file tools, cron jobs, session forking, plugin loading
- Security: sandbox path injection, env sanitization, SSRF blocking, symlink rejection (O_NOFOLLOW), private IP detection, dedup overflow, tool argument size limits, session file size limits, circuit breaker threshold edge cases, MCP schema validation
- Channel: allowed_senders, message parsing, dedup logic, email address extraction
Local CI: ./scripts/ci.sh (mirrors GitHub Actions + focused subsystem tests). See TESTING.md.
Testing Guide
Quick Start
# Full local CI (mirrors GitHub Actions)
./scripts/ci.sh
# Fast iteration (skip clippy)
./scripts/ci.sh --quick
# Auto-fix formatting
./scripts/ci.sh --fix
# Memory-constrained machines
./scripts/ci.sh --serial
CI Pipeline
scripts/ci.sh runs the same checks as .github/workflows/ci.yml plus focused subsystem tests.
Steps
| Step | Command | Flags |
|---|---|---|
| 1. Format | cargo fmt --all -- --check | --fix auto-fixes |
| 2. Clippy | cargo clippy --workspace -- -D warnings | --quick skips |
| 3. Workspace tests | cargo test --workspace | --serial for single-thread |
| 4. Focused groups | Per-subsystem tests (see below) | Always runs |
Focused Test Groups
After the full workspace run, the CI script re-runs critical subsystems individually to surface failures clearly:
| Group | Crate | Test Filter | Count | What It Covers |
|---|---|---|---|---|
| Adaptive routing | octos-llm | adaptive::tests | 19 | Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, racing |
| Responsiveness | octos-llm | responsiveness::tests | 8 | Baseline learning, degradation detection, recovery, threshold boundaries |
| Session actor | octos-cli | session_actor::tests | 9 | Queue modes, speculative overflow, auto-escalation/deescalation |
| Session persistence | octos-bus | session::tests | 28 | JSONL storage, LRU eviction, fork, rewrite, timestamp sort |
Session actor tests always run single-threaded (--test-threads=1) because they spawn full actors with mock providers and can OOM under parallel execution.
Feature Coverage
Adaptive Routing (crates/octos-llm/src/adaptive.rs β 19 tests)
Tests the AdaptiveRouter which manages multiple LLM providers with metrics-driven selection.
Off Mode (static priority)
| Test | What It Verifies |
|---|---|
test_selects_primary_on_cold_start | Priority order on first call (no metrics yet) |
test_lane_changing_off_uses_priority_order | Off mode ignores latency differences |
test_lane_changing_off_skips_circuit_broken | Off mode still respects circuit breaker |
test_hedged_off_uses_single_provider | Off mode uses priority, no racing |
Hedge Mode (provider racing)
| Test | What It Verifies |
|---|---|
test_hedged_racing_picks_faster_provider | Race 2 providers via tokio::select!, faster wins |
test_hedged_racing_survives_one_failure | Falls back to alternate when primary racer fails |
test_hedge_single_provider_falls_through | Hedge with 1 provider uses single-provider path |
Lane Mode (score-based selection)
| Test | What It Verifies |
|---|---|
test_lane_mode_picks_best_by_score | Switches to faster provider after metrics warm-up |
Circuit Breaker and Failover
| Test | What It Verifies |
|---|---|
test_circuit_breaker_skips_degraded | Skips provider after N consecutive failures |
test_failover_on_error | Falls over to next provider when primary fails |
test_all_providers_fail | Returns error when every provider fails |
Scoring and Metrics
| Test | What It Verifies |
|---|---|
test_scoring_cold_start_respects_priority | Cold-start scores follow config priority |
test_latency_samples_p95 | P95 calculation from circular buffer |
test_metrics_snapshot | Latency/success/failure recorded correctly |
test_metrics_export_after_calls | Export includes per-provider metrics |
Runtime Controls
| Test | What It Verifies |
|---|---|
test_mode_switch_at_runtime | Off β Hedge β Lane β Off switching |
test_qos_ranking_toggle | QoS ranking toggle is orthogonal to mode |
test_adaptive_status_reports_correctly | Status struct reflects current mode/count |
test_empty_router_panics | Asserts at least 1 provider required |
Responsiveness Observer (crates/octos-llm/src/responsiveness.rs β 8 tests)
Tests the latency tracker that drives auto-escalation.
Baseline Learning
| Test | What It Verifies |
|---|---|
test_baseline_learning | Baseline established from first 5 samples |
test_sample_count_tracking | sample_count() returns correct value |
Degradation Detection
| Test | What It Verifies |
|---|---|
test_degradation_detection | 3 consecutive slow requests (> 3x baseline) trigger activation |
test_at_threshold_boundary_not_triggered | Latency exactly at threshold is not βslowβ |
test_no_false_trigger_before_baseline | No activation before baseline is learned |
Recovery and Lifecycle
| Test | What It Verifies |
|---|---|
test_recovery_detection | 1 fast request after activation triggers deactivation |
test_multiple_activation_cycles | Activate β deactivate β reactivate works |
test_window_caps_at_max_size | Rolling window stays at 20 entries |
Queue Modes and Session Actor (crates/octos-cli/src/session_actor.rs β 9 tests)
Tests the per-session actor that owns message processing, queue policies, and auto-protection.
Mock infrastructure: DelayedMockProvider β configurable delay + scripted FIFO responses. setup_speculative_actor / setup_actor_with_mode β builds minimal actor with chosen queue mode and optional adaptive router.
Queue Mode: Followup
| Test | What It Verifies |
|---|---|
test_queue_mode_followup_sequential | Each message processed individually β 3 messages produce 3 responses, all appear in session history separately |
Queue Mode: Collect
| Test | What It Verifies |
|---|---|
test_queue_mode_collect_batches | Messages queued during a slow LLM call are batched into a single combined prompt ("msg2\n---\nQueued #1: msg3") |
Queue Mode: Steer
| Test | What It Verifies |
|---|---|
test_queue_mode_steer_keeps_newest | Older queued messages discarded, only newest processed β discarded message absent from session history |
Queue Mode: Speculative
| Test | What It Verifies |
|---|---|
test_speculative_overflow_concurrent | Overflow spawned as full agent task during slow primary (12s > 10s patience); both responses arrive; history sorted by timestamp |
test_speculative_within_patience_drops | Overflow dropped when primary within patience (5s < 10s); only 1 response arrives |
test_speculative_handles_background_result | BackgroundResult messages handled in the speculative select! loop without extra LLM calls |
Auto-Escalation / Deescalation
| Test | What It Verifies |
|---|---|
test_auto_escalation_on_degradation | 5 fast warmups (baseline 100ms) β 3 slow calls (400ms > 3x) β mode switches to Hedge + Speculative, user gets notification |
test_auto_deescalation_on_recovery | 1 fast response after escalation β mode reverts to Off + Followup, router confirms Off |
Utility
| Test | What It Verifies |
|---|---|
test_strip_think_tags | <think>...</think> block removal from LLM output |
Session Persistence (crates/octos-bus/src/session.rs β 28 tests)
Tests JSONL-backed session storage with LRU caching.
CRUD and Persistence
| Test | What It Verifies |
|---|---|
test_session_manager_create_and_retrieve | Create session, add messages, retrieve |
test_session_manager_persistence | Messages survive manager restart (disk reload) |
test_session_manager_clear | Clear deletes from memory and disk |
History and Ordering
| Test | What It Verifies |
|---|---|
test_session_get_history | Tail-slice returns last N messages |
test_session_get_history_all | Returns all when fewer than max |
test_sort_by_timestamp_restores_order | Restores chronological order after concurrent overflow writes |
LRU Cache
| Test | What It Verifies |
|---|---|
test_eviction_keeps_max_sessions | Cache respects capacity limit |
test_evicted_session_reloads_from_disk | Evicted sessions reload on access |
test_with_max_sessions_clamps_zero | Capacity clamped to minimum 1 |
Concurrency
| Test | What It Verifies |
|---|---|
test_concurrent_sessions | Multiple sessions donβt interfere |
test_concurrent_session_processing | 10 parallel tasks donβt corrupt sessions |
Fork and Rewrite
| Test | What It Verifies |
|---|---|
test_fork_creates_child | Fork copies last N messages with parent link |
test_fork_persists_to_disk | Forked session survives restart |
test_session_rewrite | Atomic write-then-rename after mutation |
Multi-Session (Topics)
| Test | What It Verifies |
|---|---|
test_list_sessions_for_chat | Lists all topic sessions for a chat |
test_session_topic_persists | Topic survives restart |
test_update_summary | Summary update persists |
test_active_session_store | Active topic switching and go-back |
test_active_session_store_persistence | Active topic survives restart |
test_validate_topic_name | Rejects invalid characters and lengths |
Filename Encoding
| Test | What It Verifies |
|---|---|
test_truncated_session_keys_no_collision | Long keys with hash suffix donβt collide |
test_decode_filename | Percent-encoded filenames decode correctly |
test_list_sessions_returns_decoded_keys | list_sessions() returns human-readable keys |
test_short_key_no_hash_suffix | Short keys donβt get hash suffix |
Safety Limits
| Test | What It Verifies |
|---|---|
test_load_rejects_oversized_file | Files over 10 MB refused |
test_append_respects_file_size_limit | Append skips when file at 10 MB limit |
test_load_rejects_future_schema_version | Rejects unknown schema versions |
test_purge_stale_sessions | Deletes sessions older than N days |
Known Gaps
| Area | Why Not Tested |
|---|---|
| Interrupt queue mode | Same codepath as Steer β covered by test_queue_mode_steer_keeps_newest |
| Probe/canary requests | Disabled in all tests via probe_probability: 0.0 for determinism |
Streaming (chat_stream) | No mock streaming infrastructure; streaming tested manually |
| Session compaction | Called in actor tests but output not verified (would need LLM mock for summarization) |
| Live provider integration | Requires API keys; 1 test exists but marked #[ignore] |
| Channel-specific routing | Covered by channel crate tests, not part of this subsystem |
| β¬οΈ Earlier task marker | Primary response gets ββ¬οΈ Earlier task completed:β prefix when overflow was served; not directly asserted in tests (would need to inspect outbound content after a slow primary + fast overflow race) |
| Overflow agent tool execution | serve_overflow spawns a full agent.process_message_tracked() with tool access; current tests use DelayedMockProvider which returns canned responses without tool calls |
Running Individual Tests
# Single test
cargo test -p octos-llm --lib adaptive::tests::test_hedged_racing_picks_faster_provider
# One subsystem
cargo test -p octos-llm --lib adaptive::tests
# Session actor (always single-threaded)
cargo test -p octos-cli session_actor::tests -- --test-threads=1
# With output
cargo test -p octos-cli session_actor::tests -- --test-threads=1 --nocapture
GitHub Actions CI
.github/workflows/ci.yml runs on push/PR to main:
cargo fmt --all -- --checkcargo clippy --workspace -- -D warningscargo test --workspace
The local scripts/ci.sh is a superset β it runs the same three steps plus focused subsystem groups. If CI passes locally, it passes on GitHub.
Runner: macos-14 (ARM64). Private repo with 2000 free minutes/month (10x multiplier for macOS runners = ~200 effective minutes).
Files
| File | What |
|---|---|
scripts/ci.sh | Local CI script (this document) |
scripts/pre-release.sh | Full release smoke tests (build, E2E, skill binaries) |
.github/workflows/ci.yml | GitHub Actions CI |
crates/octos-llm/src/adaptive.rs | Adaptive router + 19 tests |
crates/octos-llm/src/responsiveness.rs | Responsiveness observer + 8 tests |
crates/octos-cli/src/session_actor.rs | Session actor + 9 tests |
crates/octos-bus/src/session.rs | Session persistence + 28 tests |