Testing Guide

Quick Start

# Full local CI (mirrors GitHub Actions)
./scripts/ci.sh

# Fast iteration (skip clippy)
./scripts/ci.sh --quick

# Auto-fix formatting
./scripts/ci.sh --fix

# Memory-constrained machines
./scripts/ci.sh --serial

CI Pipeline

scripts/ci.sh runs the same checks as .github/workflows/ci.yml plus focused subsystem tests.

Steps

| Step | Command | Flags |
| --- | --- | --- |
| 1. Format | cargo fmt --all -- --check | --fix auto-fixes |
| 2. Clippy | cargo clippy --workspace -- -D warnings | --quick skips |
| 3. Workspace tests | cargo test --workspace | --serial for single-thread |
| 4. Focused groups | Per-subsystem tests (see below) | Always runs |

Focused Test Groups

After the full workspace run, the CI script re-runs critical subsystems individually to surface failures clearly:

| Group | Crate | Test Filter | Count | What It Covers |
| --- | --- | --- | --- | --- |
| Adaptive routing | octos-llm | adaptive::tests | 19 | Off/Hedge/Lane modes, circuit breaker, failover, scoring, metrics, racing |
| Responsiveness | octos-llm | responsiveness::tests | 8 | Baseline learning, degradation detection, recovery, threshold boundaries |
| Session actor | octos-cli | session_actor::tests | 9 | Queue modes, speculative overflow, auto-escalation/deescalation |
| Session persistence | octos-bus | session::tests | 28 | JSONL storage, LRU eviction, fork, rewrite, timestamp sort |

Session actor tests always run single-threaded (--test-threads=1) because they spawn full actors with mock providers and can OOM under parallel execution.


Feature Coverage

Adaptive Routing (crates/octos-llm/src/adaptive.rs — 19 tests)

Tests the AdaptiveRouter which manages multiple LLM providers with metrics-driven selection.

Off Mode (static priority)

| Test | What It Verifies |
| --- | --- |
| test_selects_primary_on_cold_start | Priority order on first call (no metrics yet) |
| test_lane_changing_off_uses_priority_order | Off mode ignores latency differences |
| test_lane_changing_off_skips_circuit_broken | Off mode still respects circuit breaker |
| test_hedged_off_uses_single_provider | Off mode uses priority, no racing |

Hedge Mode (provider racing)

| Test | What It Verifies |
| --- | --- |
| test_hedged_racing_picks_faster_provider | Race 2 providers via tokio::select!, faster wins |
| test_hedged_racing_survives_one_failure | Falls back to alternate when primary racer fails |
| test_hedge_single_provider_falls_through | Hedge with 1 provider uses single-provider path |
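
The racing behaviour these tests exercise can be illustrated with a minimal sketch. It is not the AdaptiveRouter code: the real router races via tokio::select!, whereas this version swaps in std threads and an mpsc channel so it runs without dependencies. The names call_provider and race, and the (name, delay, fail) tuples, are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a provider call: sleeps, then succeeds or fails.
fn call_provider(name: &'static str, delay_ms: u64, fail: bool) -> Result<String, String> {
    thread::sleep(Duration::from_millis(delay_ms));
    if fail {
        Err(format!("{} failed", name))
    } else {
        Ok(format!("response from {}", name))
    }
}

/// Race all providers and return the first success.
/// If every racer fails, the last error received is returned.
fn race(providers: Vec<(&'static str, u64, bool)>) -> Result<String, String> {
    let (tx, rx) = mpsc::channel();
    for (name, delay_ms, fail) in providers {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may be gone if another racer already won; ignore that.
            let _ = tx.send(call_provider(name, delay_ms, fail));
        });
    }
    drop(tx); // lets the receive loop end once every racer has reported

    let mut last_err = Err("no providers".to_string());
    for result in rx {
        match result {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = Err(e),
        }
    }
    last_err
}
```

A failed racer does not end the race; the loop keeps waiting for the remaining racers, which is the property test_hedged_racing_survives_one_failure checks in the real router.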

Lane Mode (score-based selection)

| Test | What It Verifies |
| --- | --- |
| test_lane_mode_picks_best_by_score | Switches to faster provider after metrics warm-up |

Circuit Breaker and Failover

| Test | What It Verifies |
| --- | --- |
| test_circuit_breaker_skips_degraded | Skips provider after N consecutive failures |
| test_failover_on_error | Falls over to next provider when primary fails |
| test_all_providers_fail | Returns error when every provider fails |
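
As a rough illustration of "skip a provider after N consecutive failures", the sketch below tracks a failure streak per provider and selects the first provider in priority order whose breaker is still closed. CircuitBreaker and select_provider are hypothetical names, the threshold is a free parameter, and any recovery or probing logic the real router may have is omitted.

```rust
/// Hypothetical per-provider failure tracker; the threshold N is an assumption.
struct CircuitBreaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    /// A success resets the streak, closing the breaker again.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
    }

    /// Open (provider skipped) once the streak reaches the threshold.
    fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }
}

/// Index of the first provider, in priority order, whose breaker is closed.
/// Returns None when every provider is circuit-broken.
fn select_provider(breakers: &[CircuitBreaker]) -> Option<usize> {
    breakers.iter().position(|b| !b.is_open())
}
```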

Scoring and Metrics

| Test | What It Verifies |
| --- | --- |
| test_scoring_cold_start_respects_priority | Cold-start scores follow config priority |
| test_latency_samples_p95 | P95 calculation from circular buffer |
| test_metrics_snapshot | Latency/success/failure recorded correctly |
| test_metrics_export_after_calls | Export includes per-provider metrics |
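
For readers unfamiliar with the P95-from-circular-buffer idea, here is a small sketch. It assumes the nearest-rank percentile definition and millisecond samples; the router's actual buffer size and percentile method are not stated in this guide, and LatencySamples is a hypothetical name.

```rust
/// Fixed-capacity ring buffer of latency samples in milliseconds.
struct LatencySamples {
    buf: Vec<u64>,
    capacity: usize,
    next: usize, // slot the next sample overwrites once the buffer is full
}

impl LatencySamples {
    fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        Self { buf: Vec::with_capacity(capacity), capacity, next: 0 }
    }

    /// Append a sample, overwriting the oldest one when full.
    fn push(&mut self, ms: u64) {
        if self.buf.len() < self.capacity {
            self.buf.push(ms);
        } else {
            self.buf[self.next] = ms;
        }
        self.next = (self.next + 1) % self.capacity;
    }

    /// Nearest-rank P95: the smallest sample with at least 95% of
    /// samples at or below it. None when no samples exist yet.
    fn p95(&self) -> Option<u64> {
        if self.buf.is_empty() {
            return None;
        }
        let mut sorted = self.buf.clone();
        sorted.sort_unstable();
        let rank = (sorted.len() * 95 + 99) / 100; // ceil(0.95 * n), 1-based
        Some(sorted[rank - 1])
    }
}
```

Sorting a copy on each query keeps insertion O(1), which suits a small window that is written far more often than it is read.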

Runtime Controls

| Test | What It Verifies |
| --- | --- |
| test_mode_switch_at_runtime | Off → Hedge → Lane → Off switching |
| test_qos_ranking_toggle | QoS ranking toggle is orthogonal to mode |
| test_adaptive_status_reports_correctly | Status struct reflects current mode/count |
| test_empty_router_panics | Asserts at least 1 provider required |

Responsiveness Observer (crates/octos-llm/src/responsiveness.rs — 8 tests)

Tests the latency tracker that drives auto-escalation.

Baseline Learning

| Test | What It Verifies |
| --- | --- |
| test_baseline_learning | Baseline established from first 5 samples |
| test_sample_count_tracking | sample_count() returns correct value |

Degradation Detection

| Test | What It Verifies |
| --- | --- |
| test_degradation_detection | 3 consecutive slow requests (> 3x baseline) trigger activation |
| test_at_threshold_boundary_not_triggered | Latency exactly at threshold is not “slow” |
| test_no_false_trigger_before_baseline | No activation before baseline is learned |

Recovery and Lifecycle

| Test | What It Verifies |
| --- | --- |
| test_recovery_detection | 1 fast request after activation triggers deactivation |
| test_multiple_activation_cycles | Activate → deactivate → reactivate works |
| test_window_caps_at_max_size | Rolling window stays at 20 entries |
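
The rules these eight tests pin down (5-sample baseline, strictly more than 3x baseline counts as slow, 3 slow in a row activates, 1 fast request deactivates, window capped at 20) fit in a short state machine. The sketch below is an assumption-laden illustration: Observer and its fields are hypothetical, and taking the baseline as the mean of the first five samples is a guess, since this guide does not say how the baseline is derived.

```rust
use std::collections::VecDeque;

const BASELINE_SAMPLES: usize = 5; // samples needed before a baseline exists
const SLOW_FACTOR: u64 = 3;        // slow = strictly more than 3x baseline
const SLOW_STREAK: u32 = 3;        // consecutive slow requests to activate
const WINDOW_MAX: usize = 20;      // rolling window cap

/// Hypothetical latency observer; not the actual responsiveness.rs type.
struct Observer {
    window: VecDeque<u64>,
    baseline_ms: Option<u64>,
    slow_streak: u32,
    active: bool,
}

impl Observer {
    fn new() -> Self {
        Self { window: VecDeque::new(), baseline_ms: None, slow_streak: 0, active: false }
    }

    fn record(&mut self, latency_ms: u64) {
        self.window.push_back(latency_ms);
        if self.window.len() > WINDOW_MAX {
            self.window.pop_front();
        }

        let baseline = match self.baseline_ms {
            Some(b) => b,
            None => {
                // Assumption: baseline is the mean of the first five samples.
                if self.window.len() == BASELINE_SAMPLES {
                    let sum: u64 = self.window.iter().sum();
                    self.baseline_ms = Some(sum / BASELINE_SAMPLES as u64);
                }
                return; // still learning, so never trigger here
            }
        };

        if latency_ms > baseline * SLOW_FACTOR {
            self.slow_streak += 1;
            if self.slow_streak >= SLOW_STREAK {
                self.active = true;
            }
        } else {
            // A single fast request resets the streak and deactivates.
            self.slow_streak = 0;
            self.active = false;
        }
    }
}
```

The strict `>` comparison is what makes a latency of exactly 3x baseline count as fast, matching the boundary test.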

Queue Modes and Session Actor (crates/octos-cli/src/session_actor.rs — 9 tests)

Tests the per-session actor that owns message processing, queue policies, and auto-protection.

Mock infrastructure:

- DelayedMockProvider: configurable delay plus scripted FIFO responses.
- setup_speculative_actor / setup_actor_with_mode: build a minimal actor with the chosen queue mode and an optional adaptive router.

Queue Mode: Followup

| Test | What It Verifies |
| --- | --- |
| test_queue_mode_followup_sequential | Each message processed individually: 3 messages produce 3 responses, all appear in session history separately |

Queue Mode: Collect

| Test | What It Verifies |
| --- | --- |
| test_queue_mode_collect_batches | Messages queued during a slow LLM call are batched into a single combined prompt ("msg2\n---\nQueued #1: msg3") |

Queue Mode: Steer

| Test | What It Verifies |
| --- | --- |
| test_queue_mode_steer_keeps_newest | Older queued messages discarded, only newest processed; discarded message absent from session history |
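
The three policies covered so far differ only in what happens to messages that pile up while an LLM call is in flight. A minimal sketch, assuming the queue is a plain Vec of strings: QueueMode and drain_queue are hypothetical names, and the numbering of "Queued #n" entries beyond the single example in the Collect test is an extrapolation.

```rust
#[derive(Clone, Copy)]
enum QueueMode {
    Followup,
    Collect,
    Steer,
}

/// Turn the messages queued during a slow call into the prompts to run next.
fn drain_queue(mode: QueueMode, queued: Vec<String>) -> Vec<String> {
    match mode {
        // Followup: every message is processed on its own, in order.
        QueueMode::Followup => queued,
        // Collect: fold everything into one combined prompt.
        QueueMode::Collect => {
            let mut iter = queued.into_iter();
            match iter.next() {
                None => Vec::new(),
                Some(first) => {
                    let mut combined = first;
                    for (i, msg) in iter.enumerate() {
                        combined.push_str(&format!("\n---\nQueued #{}: {}", i + 1, msg));
                    }
                    vec![combined]
                }
            }
        }
        // Steer: discard all but the newest message.
        QueueMode::Steer => queued.into_iter().last().into_iter().collect(),
    }
}
```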

Queue Mode: Speculative

| Test | What It Verifies |
| --- | --- |
| test_speculative_overflow_concurrent | Overflow spawned as full agent task during slow primary (12s > 10s patience); both responses arrive; history sorted by timestamp |
| test_speculative_within_patience_drops | Overflow dropped when primary within patience (5s < 10s); only 1 response arrives |
| test_speculative_handles_background_result | BackgroundResult messages handled in the speculative select! loop without extra LLM calls |

Auto-Escalation / Deescalation

| Test | What It Verifies |
| --- | --- |
| test_auto_escalation_on_degradation | 5 fast warmups (baseline 100ms) → 3 slow calls (400ms > 3x baseline) → mode switches to Hedge + Speculative, user gets notification |
| test_auto_deescalation_on_recovery | 1 fast response after escalation → mode reverts to Off + Followup, router confirms Off |

Utility

| Test | What It Verifies |
| --- | --- |
| test_strip_think_tags | <think>...</think> block removal from LLM output |
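
To make the utility concrete, here is one way such a stripper could look. This is an independent sketch rather than the function under test: dropping everything after an unclosed <think> tag is an assumption, and no whitespace trimming is applied.

```rust
/// Remove every `<think>...</think>` block from `input`.
/// Assumption: an unclosed `<think>` discards the rest of the text.
fn strip_think_tags(input: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        // Keep the text before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // Skip past the closing tag and keep scanning.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            // No closing tag: nothing after this point is kept.
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}
```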

Session Persistence (crates/octos-bus/src/session.rs — 28 tests)

Tests JSONL-backed session storage with LRU caching.

CRUD and Persistence

| Test | What It Verifies |
| --- | --- |
| test_session_manager_create_and_retrieve | Create session, add messages, retrieve |
| test_session_manager_persistence | Messages survive manager restart (disk reload) |
| test_session_manager_clear | Clear deletes from memory and disk |

History and Ordering

| Test | What It Verifies |
| --- | --- |
| test_session_get_history | Tail-slice returns last N messages |
| test_session_get_history_all | Returns all when fewer than max |
| test_sort_by_timestamp_restores_order | Restores chronological order after concurrent overflow writes |

LRU Cache

| Test | What It Verifies |
| --- | --- |
| test_eviction_keeps_max_sessions | Cache respects capacity limit |
| test_evicted_session_reloads_from_disk | Evicted sessions reload on access |
| test_with_max_sessions_clamps_zero | Capacity clamped to minimum 1 |
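
The three cache properties above (capacity limit, reload after eviction, zero clamped to 1) can be sketched with a HashMap plus a recency queue. SessionCache and get_or_load are hypothetical, the load closure stands in for reading a JSONL file from disk, and the real session manager may structure its cache quite differently.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache of session histories keyed by session key.
struct SessionCache {
    max_sessions: usize,
    order: VecDeque<String>, // front = least recently used
    sessions: HashMap<String, Vec<String>>,
}

impl SessionCache {
    /// A capacity of 0 is clamped to 1 so the cache can always hold a session.
    fn with_max_sessions(max: usize) -> Self {
        Self { max_sessions: max.max(1), order: VecDeque::new(), sessions: HashMap::new() }
    }

    /// Mark `key` as most recently used.
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k.as_str() != key);
        self.order.push_back(key.to_string());
    }

    /// Return the session, calling `load` (a stand-in for a disk read) on a miss
    /// and evicting least recently used entries while over capacity.
    fn get_or_load(&mut self, key: &str, load: impl FnOnce() -> Vec<String>) -> &Vec<String> {
        if !self.sessions.contains_key(key) {
            self.sessions.insert(key.to_string(), load());
            while self.sessions.len() > self.max_sessions {
                match self.order.pop_front() {
                    Some(oldest) => {
                        self.sessions.remove(&oldest);
                    }
                    None => break,
                }
            }
        }
        self.touch(key);
        &self.sessions[key]
    }
}
```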

Concurrency

| Test | What It Verifies |
| --- | --- |
| test_concurrent_sessions | Multiple sessions don’t interfere |
| test_concurrent_session_processing | 10 parallel tasks don’t corrupt sessions |

Fork and Rewrite

| Test | What It Verifies |
| --- | --- |
| test_fork_creates_child | Fork copies last N messages with parent link |
| test_fork_persists_to_disk | Forked session survives restart |
| test_session_rewrite | Atomic write-then-rename after mutation |
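
The write-then-rename pattern named in test_session_rewrite is a general technique and can be shown with the standard library alone. In this sketch, rewrite_atomic and the .tmp sibling path are hypothetical choices, not the session store's actual names.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `lines` by writing a sibling temp file and renaming it
/// over the original. The `.tmp` extension is an assumption of this sketch.
fn rewrite_atomic(path: &Path, lines: &[String]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        for line in lines {
            writeln!(file, "{}", line)?;
        }
        // Flush to disk before the rename makes the new content visible.
        file.sync_all()?;
    }
    // On POSIX systems the rename is atomic when both paths share a filesystem.
    fs::rename(&tmp, path)
}
```

Because readers only ever see the old file or the fully written new one, a crash mid-write leaves at worst a stray temp file rather than a truncated session.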

Multi-Session (Topics)

| Test | What It Verifies |
| --- | --- |
| test_list_sessions_for_chat | Lists all topic sessions for a chat |
| test_session_topic_persists | Topic survives restart |
| test_update_summary | Summary update persists |
| test_active_session_store | Active topic switching and go-back |
| test_active_session_store_persistence | Active topic survives restart |
| test_validate_topic_name | Rejects invalid characters and lengths |

Filename Encoding

| Test | What It Verifies |
| --- | --- |
| test_truncated_session_keys_no_collision | Long keys with hash suffix don’t collide |
| test_decode_filename | Percent-encoded filenames decode correctly |
| test_list_sessions_returns_decoded_keys | list_sessions() returns human-readable keys |
| test_short_key_no_hash_suffix | Short keys don’t get hash suffix |

Safety Limits

| Test | What It Verifies |
| --- | --- |
| test_load_rejects_oversized_file | Files over 10 MB refused |
| test_append_respects_file_size_limit | Append skips when file at 10 MB limit |
| test_load_rejects_future_schema_version | Rejects unknown schema versions |
| test_purge_stale_sessions | Deletes sessions older than N days |

Known Gaps

| Area | Why Not Tested |
| --- | --- |
| Interrupt queue mode | Same codepath as Steer; covered by test_queue_mode_steer_keeps_newest |
| Probe/canary requests | Disabled in all tests via probe_probability: 0.0 for determinism |
| Streaming (chat_stream) | No mock streaming infrastructure; streaming is tested manually |
| Session compaction | Called in actor tests but output is not verified (would need an LLM mock for summarization) |
| Live provider integration | Requires API keys; 1 test exists but is marked #[ignore] |
| Channel-specific routing | Covered by channel crate tests, not part of this subsystem |
| ⬆️ Earlier task marker | Primary response gets “⬆️ Earlier task completed:” prefix when overflow was served; not directly asserted in tests (would need to inspect outbound content after a slow primary + fast overflow race) |
| Overflow agent tool execution | serve_overflow spawns a full agent.process_message_tracked() with tool access; current tests use DelayedMockProvider, which returns canned responses without tool calls |

Running Individual Tests

# Single test
cargo test -p octos-llm --lib adaptive::tests::test_hedged_racing_picks_faster_provider

# One subsystem
cargo test -p octos-llm --lib adaptive::tests

# Session actor (always single-threaded)
cargo test -p octos-cli session_actor::tests -- --test-threads=1

# With output
cargo test -p octos-cli session_actor::tests -- --test-threads=1 --nocapture

GitHub Actions CI

.github/workflows/ci.yml runs on push/PR to main:

  1. cargo fmt --all -- --check
  2. cargo clippy --workspace -- -D warnings
  3. cargo test --workspace

The local scripts/ci.sh is a superset: it runs the same three steps plus the focused subsystem groups. If it passes locally, the GitHub run should pass too, barring toolchain or platform differences between your machine and the runner.

Runner: macos-14 (ARM64). The repo is private, so it gets 2000 free minutes/month; macOS runners are billed at a 10x multiplier, which leaves roughly 200 effective minutes.


Files

| File | What |
| --- | --- |
| scripts/ci.sh | Local CI script (this document) |
| scripts/pre-release.sh | Full release smoke tests (build, E2E, skill binaries) |
| .github/workflows/ci.yml | GitHub Actions CI |
| crates/octos-llm/src/adaptive.rs | Adaptive router + 19 tests |
| crates/octos-llm/src/responsiveness.rs | Responsiveness observer + 8 tests |
| crates/octos-cli/src/session_actor.rs | Session actor + 9 tests |
| crates/octos-bus/src/session.rs | Session persistence + 28 tests |