Routing System Architecture

Tier 2 | Deep technical documentation for model routing | Hub: README.md | Full Architecture: ARCHITECTURE.md


Overview

The routing system selects a CLI/model for each task through a multi-stage pipeline:

Task → BudgetRouter → ZeroRouter → PreferenceRouter → TopsisRouter → LinUCB → Selected Model
       (filter)       (fallback)   (preference)       (rank)         (learn)

Use CompositeRouter.route(task). Do NOT instantiate the stage routers directly.


CompositeRouter Pipeline

Chains the stage routers in sequence and returns a single decision: the chosen CLI, a confidence score, the remaining alternatives, and the list of stages that ran.

interface ICompositeRouter {
  route(task: CliTask): Promise<Result<CompositeRoutingDecision, CompositeRoutingError>>;
  getStats(): CompositeRouterStats;
  invalidateCaches(): void;
}

interface CompositeRoutingDecision {
  readonly cliName: 'claude' | 'gemini' | 'codex';
  readonly reason: string;
  readonly confidence: number;
  readonly topsisScore?: number;
  readonly linucbExploration?: number;
  readonly alternatives: readonly ('claude' | 'gemini' | 'codex')[];
  readonly stagesExecuted: readonly string[];
}
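
The chaining idea can be illustrated with a small sketch. It assumes a hypothetical `Stage` shape in which each stage narrows or reorders a candidate list; `Stage`, `PipelineDecision`, and `runPipeline` are illustrative names and are not taken from composite-router.ts.

```typescript
type CliName = "claude" | "gemini" | "codex";

// Hypothetical stage shape: each stage narrows or reorders the candidates.
interface Stage {
  readonly name: string;
  apply(candidates: readonly CliName[]): readonly CliName[];
}

// Reduced, illustrative version of CompositeRoutingDecision.
interface PipelineDecision {
  readonly cliName: CliName;
  readonly alternatives: readonly CliName[];
  readonly stagesExecuted: readonly string[];
}

function runPipeline(
  stages: readonly Stage[],
  initial: readonly CliName[]
): PipelineDecision | null {
  let candidates = initial;
  const stagesExecuted: string[] = [];
  for (const stage of stages) {
    candidates = stage.apply(candidates);
    stagesExecuted.push(stage.name);
    // Stop early when a stage filtered out every candidate.
    if (candidates.length === 0) return null;
  }
  const [cliName, ...alternatives] = candidates;
  if (cliName === undefined) return null;
  return { cliName, alternatives, stagesExecuted };
}
```

The first surviving candidate becomes the selection and the rest are reported as alternatives, mirroring the `alternatives` and `stagesExecuted` fields of the decision type above.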

Stage 1: Task Analysis

Profiles tasks before routing:

| Characteristic | Derived From | Impact |
| --- | --- | --- |
| reasoningComplexity | Keywords (“design”, “architect”) | Boosts Claude quality score |
| contextRequired | 0.25 tokens/char + 500 tokens/file | Filters by context window |
| codeGeneration | Keywords (“implement”, “write”) | Boosts Codex score |
| budgetSensitive | Keywords (“quick”, “simple”) | Prioritizes Gemini |
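
As a rough illustration of the contextRequired row, the estimate and the context-window filter might look like the sketch below. The two constants come from the table; the helper names `estimateContextTokens` and `filterByContextWindow` and the shape of the `windows` map are assumptions made for this example.

```typescript
// Constants taken from the table above.
const TOKENS_PER_CHAR = 0.25;
const TOKENS_PER_FILE = 500;

// Hypothetical helper: estimates the context a task needs.
function estimateContextTokens(prompt: string, fileCount: number): number {
  return Math.ceil(prompt.length * TOKENS_PER_CHAR + fileCount * TOKENS_PER_FILE);
}

// Hypothetical helper: keeps only the CLIs whose context window
// (in tokens) can hold the estimate.
function filterByContextWindow(
  windows: Record<string, number>,
  requiredTokens: number
): string[] {
  return Object.keys(windows).filter(
    (name) => (windows[name] ?? 0) >= requiredTokens
  );
}
```

For a 400-character prompt that references two files, the estimate is 400 × 0.25 + 2 × 500 = 1100 tokens.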

Stage 2: Budget Filter

Enforces token/cost/latency constraints:

interface BudgetConstraint {
  readonly maxTokens?: number;
  readonly maxCostUsd?: number;
  readonly maxLatencyMs?: number;
}
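
A minimal sketch of how such a constraint could filter candidates, assuming each candidate carries a per-task estimate. `CandidateEstimate` and `filterByBudget` are hypothetical names; an omitted constraint field is treated as "no limit", which is also an assumption.

```typescript
interface BudgetConstraint {
  readonly maxTokens?: number;
  readonly maxCostUsd?: number;
  readonly maxLatencyMs?: number;
}

// Hypothetical per-CLI estimate for one task.
interface CandidateEstimate {
  readonly cliName: string;
  readonly tokens: number;
  readonly costUsd: number;
  readonly latencyMs: number;
}

// A limit that is not set never rejects a candidate.
function withinBudget(c: CandidateEstimate, b: BudgetConstraint): boolean {
  return (
    (b.maxTokens === undefined || c.tokens <= b.maxTokens) &&
    (b.maxCostUsd === undefined || c.costUsd <= b.maxCostUsd) &&
    (b.maxLatencyMs === undefined || c.latencyMs <= b.maxLatencyMs)
  );
}

function filterByBudget(
  candidates: readonly CandidateEstimate[],
  constraint: BudgetConstraint
): CandidateEstimate[] {
  return candidates.filter((c) => withinBudget(c, constraint));
}
```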

Stage 3: TOPSIS Ranking

Multi-criteria decision for Pareto-optimal selection:

| Criterion | Weight | Direction | Description |
| --- | --- | --- | --- |
| Quality | 50% | Maximize | Reasoning + code generation |
| Cost | 30% | Minimize | $/token estimate |
| Latency | 20% | Minimize | Response time |
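
The ranking step can be sketched with the textbook TOPSIS procedure: vector-normalize each criterion, apply the weights, then score each candidate by its relative closeness to the ideal point. This is an illustration under the weights in the table, not the code in topsis-router.ts; the `Candidate` shape and the `topsisRank` name are assumptions.

```typescript
// Hypothetical candidate shape; real inputs are not specified here.
interface Candidate {
  readonly name: string;
  readonly quality: number; // higher is better
  readonly cost: number; // lower is better
  readonly latency: number; // lower is better
}

// Weights and directions from the table above.
const CRITERIA = [
  { key: "quality", weight: 0.5, maximize: true },
  { key: "cost", weight: 0.3, maximize: false },
  { key: "latency", weight: 0.2, maximize: false },
] as const;

function topsisRank(cands: readonly Candidate[]): { name: string; score: number }[] {
  // 1. Normalize and weight each criterion column, then find its
  //    ideal (best) and anti-ideal (worst) value.
  const columns = CRITERIA.map((crit) => {
    const norm = Math.sqrt(cands.reduce((s, c) => s + c[crit.key] ** 2, 0)) || 1;
    const values = cands.map((c) => (c[crit.key] / norm) * crit.weight);
    return {
      values,
      best: crit.maximize ? Math.max(...values) : Math.min(...values),
      worst: crit.maximize ? Math.min(...values) : Math.max(...values),
    };
  });
  // 2. Closeness = distance to worst / (distance to best + distance to worst).
  return cands
    .map((c, i) => {
      let dBest = 0;
      let dWorst = 0;
      for (const col of columns) {
        const v = col.values[i] ?? 0;
        dBest += (v - col.best) ** 2;
        dWorst += (v - col.worst) ** 2;
      }
      const total = Math.sqrt(dBest) + Math.sqrt(dWorst);
      return { name: c.name, score: total === 0 ? 0.5 : Math.sqrt(dWorst) / total };
    })
    .sort((a, b) => b.score - a.score);
}
```

A candidate that is best on every criterion scores 1 and one that is worst on every criterion scores 0. The 0.5 returned when all candidates are identical is an arbitrary tie value chosen for this sketch.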

Stage 4: LinUCB Learning

Contextual bandit learns from outcomes:

// 6D context vector
const context = {
  taskComplexity: 0.8, // Normalized 0-1
  contextLengthNormalized: 0.3, // Tokens / max context
  isCodeTask: true,
  isReasoningTask: false,
  budgetUtilization: 0.2, // Fraction of budget used (0-1)
  timePressure: 0.0, // Deadline proximity
};

// UCB score calculation (pseudocode, not executable):
//   UCB = E[reward | context] + alpha * sqrt(uncertainty)
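
To make the UCB formula concrete, here is a sketch of one arm of the standard disjoint LinUCB algorithm, where the expected reward is theta · x and the uncertainty is x^T A^-1 x. The `LinUcbArm` class, the `toVector` encoding of booleans as 0/1, and the Sherman-Morrison update are assumptions for illustration; linucb-bandit.ts may be organized differently.

```typescript
interface RoutingContext {
  taskComplexity: number;
  contextLengthNormalized: number;
  isCodeTask: boolean;
  isReasoningTask: boolean;
  budgetUtilization: number;
  timePressure: number;
}

// Assumed encoding: booleans become 0 or 1.
function toVector(c: RoutingContext): number[] {
  return [
    c.taskComplexity,
    c.contextLengthNormalized,
    c.isCodeTask ? 1 : 0,
    c.isReasoningTask ? 1 : 0,
    c.budgetUtilization,
    c.timePressure,
  ];
}

class LinUcbArm {
  private readonly dim: number;
  private readonly aInv: number[][]; // inverse of A = I + sum(x x^T)
  private readonly b: number[]; // sum(reward * x)

  constructor(dim: number) {
    this.dim = dim;
    this.aInv = Array.from({ length: dim }, (_, i) =>
      Array.from({ length: dim }, (_, j) => (i === j ? 1 : 0))
    );
    this.b = new Array<number>(dim).fill(0);
  }

  private matVec(x: readonly number[]): number[] {
    return this.aInv.map((row) => row.reduce((s, v, j) => s + v * x[j], 0));
  }

  // UCB = theta . x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
  ucb(x: readonly number[], alpha: number): number {
    const theta = this.matVec(this.b);
    const aInvX = this.matVec(x);
    const mean = theta.reduce((s, t, i) => s + t * x[i], 0);
    const uncertainty = aInvX.reduce((s, v, i) => s + v * x[i], 0);
    return mean + alpha * Math.sqrt(uncertainty);
  }

  // Adds x x^T to A by updating A^-1 in place (Sherman-Morrison identity).
  update(x: readonly number[], reward: number): void {
    const aInvX = this.matVec(x);
    const denom = 1 + aInvX.reduce((s, v, i) => s + v * x[i], 0);
    for (let i = 0; i < this.dim; i++) {
      for (let j = 0; j < this.dim; j++) {
        this.aInv[i][j] -= (aInvX[i] * aInvX[j]) / denom;
      }
      this.b[i] += reward * x[i];
    }
  }
}
```

In this scheme a router would keep one arm per CLI, score each with `ucb(toVector(context), alpha)`, pick the highest, and call `update` once the outcome reward is known. A larger `alpha` favors arms with little data in the current context direction.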

Task Router Interface

Routes tasks to optimal CLI based on capability matching.

interface ITaskRouter {
  route(task: Task): Promise<Result<ICliAdapter, RoutingError>>;
  routeWithDetails(task: Task): Promise<Result<RoutingDecision, RoutingError>>;
}

interface RoutingDecision {
  readonly adapter: ICliAdapter;
  readonly confidence: number; // 0-1 routing confidence
  readonly reason: string; // Why this CLI was chosen
  readonly alternatives: readonly ICliAdapter[];
  readonly decisionTimeMs: number;
}

type CliName = 'claude' | 'gemini' | 'codex';
type CliTransport = 'mcp' | 'subprocess';

Budget Router (IBudgetRouter)

Budget-constrained routing based on the PILOT pattern (arXiv:2508.21141).

interface IBudgetRouter {
  getSessionBudget(): SessionBudget;
  updateBudget(usage: { tokens?: number; costUsd?: number }): void;
  resetBudget(): void;
  checkBudget(task: CliTask, constraint?: BudgetConstraint): BudgetRoutingResult;
  routeWithBudget(
    task: CliTask,
    budget?: BudgetConstraint
  ): Promise<Result<BudgetRoutingResult, BudgetExceededError>>;
  executeWithBudget(
    task: CliTask,
    budget?: BudgetConstraint
  ): Promise<Result<CliResponse & { budgetAfter: SessionBudget }, CliError>>;
}

Budget Thresholds

| Level | Usage | Action |
| --- | --- | --- |
| Info | 50% | Log usage |
| Warning | 75% | Warn user |
| Critical | 90% | Suggest task simplification |
| Hard | 100% | Reject task |
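
The thresholds map onto a simple classification. In this sketch the `ok` level below 50%, the inclusive lower bounds, and the `classifyBudgetUsage` name are assumptions; the table only fixes the four percentages.

```typescript
type BudgetLevel = "ok" | "info" | "warning" | "critical" | "hard";

// Hypothetical helper: classifies usage against a budget.
// A non-positive budget is treated as already exhausted.
function classifyBudgetUsage(used: number, budget: number): BudgetLevel {
  const ratio = budget > 0 ? used / budget : 1;
  if (ratio >= 1.0) return "hard"; // reject task
  if (ratio >= 0.9) return "critical"; // suggest task simplification
  if (ratio >= 0.75) return "warning"; // warn user
  if (ratio >= 0.5) return "info"; // log usage
  return "ok";
}
```

With the default 1M-token session budget, 750,000 tokens used yields `warning` and 1,000,000 yields `hard`.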

Session Budget

interface SessionBudget {
  readonly tokenBudget: number; // Default: 1M tokens
  readonly costBudgetUsd: number; // Default: $10
  readonly tokensUsed: number;
  readonly costUsed: number;
  readonly resetAt: number; // Epoch ms
}

Circuit Breaker (ICircuitBreaker)

Prevents cascading failures with configurable thresholds.

interface ICircuitBreaker {
  execute<T>(operation: () => Promise<T>): Promise<T>;
  getState(): CircuitState; // 'closed' | 'open' | 'half_open'
  recordFailure(category: FailureCategory): void;
  recordSuccess(): void;
  reset(): void;
  getSnapshot(): CircuitBreakerSnapshot;
}

State Transitions

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: failures >= failureThreshold within rollingWindow
    Open --> HalfOpen: timeout elapsed
    HalfOpen --> Closed: successes >= successThreshold
    HalfOpen --> Open: failure

Configuration

circuitBreaker:
  failureThreshold: 5 # Failures before open
  successThreshold: 2 # Successes to close from half-open
  timeout: 30000 # ms before half-open
  rollingWindow: 60000 # ms for failure counting
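
A compact sketch of the state machine under this configuration. `SimpleCircuitBreaker` is an illustrative class, not the one in circuit-breaker.ts: it omits failure categories, `execute`, and snapshots, and it takes an injectable clock (an assumption) so the timing can be exercised deterministically.

```typescript
type CircuitState = "closed" | "open" | "half_open";

interface BreakerConfig {
  failureThreshold: number; // failures before open
  successThreshold: number; // successes to close from half-open
  timeout: number; // ms before half-open
  rollingWindow: number; // ms for failure counting
}

class SimpleCircuitBreaker {
  private state: CircuitState = "closed";
  private failures: number[] = []; // timestamps of recent failures
  private successes = 0; // successes seen while half-open
  private openedAt = 0;
  private readonly config: BreakerConfig;
  private readonly now: () => number;

  constructor(config: BreakerConfig, now: () => number = () => Date.now()) {
    this.config = config;
    this.now = now;
  }

  getState(): CircuitState {
    // Open -> half-open once the timeout has elapsed.
    if (this.state === "open" && this.now() - this.openedAt >= this.config.timeout) {
      this.state = "half_open";
      this.successes = 0;
    }
    return this.state;
  }

  recordFailure(): void {
    const t = this.now();
    const state = this.getState();
    if (state === "open") return;
    if (state === "half_open") {
      this.trip(t); // any failure while probing reopens the circuit
      return;
    }
    // Count only failures inside the rolling window.
    this.failures = this.failures.filter((f) => t - f < this.config.rollingWindow);
    this.failures.push(t);
    if (this.failures.length >= this.config.failureThreshold) this.trip(t);
  }

  recordSuccess(): void {
    if (this.getState() !== "half_open") return;
    this.successes += 1;
    if (this.successes >= this.config.successThreshold) {
      this.state = "closed";
      this.failures = [];
    }
  }

  private trip(at: number): void {
    this.state = "open";
    this.openedAt = at;
    this.failures = [];
  }
}
```

With the values above, five failures inside 60 seconds open the circuit, it moves to half-open 30 seconds later, and two successes close it again.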

CLI Detection Cache (ICliDetectionCache)

Caches CLI health check results with TTL and invalidation.

interface ICliDetectionCache {
  get(cliName: CliName): Promise<CliHealthResult | undefined>;
  set(cliName: CliName, result: CliHealthResult): Promise<void>;
  invalidate(cliName: CliName): void;
  invalidateAll(): void;
  getStats(): CacheStats;
  onInvalidate(listener: (cliName: CliName) => void): () => void;
}

interface CliHealthResult {
  readonly available: boolean;
  readonly version?: string;
  readonly checkedAt: number;
  readonly error?: string;
}

Cache TTL Strategy

| Scenario | TTL | Rationale |
| --- | --- | --- |
| Available | 5 minutes | Stable, reduce checks |
| Unavailable | 30 seconds | Retry quickly after failure |
| Version change | Immediate | Capabilities may differ |
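
The TTL strategy amounts to a freshness check like the sketch below. `isCacheEntryFresh` is a hypothetical helper, and passing the currently detected version as an optional argument is an assumption about how a version change would be noticed.

```typescript
interface CliHealthResult {
  readonly available: boolean;
  readonly version?: string;
  readonly checkedAt: number; // epoch ms
  readonly error?: string;
}

const AVAILABLE_TTL_MS = 5 * 60 * 1000; // 5 minutes
const UNAVAILABLE_TTL_MS = 30 * 1000; // 30 seconds

// Hypothetical helper: decides whether a cached result can be reused.
function isCacheEntryFresh(
  cached: CliHealthResult,
  now: number,
  currentVersion?: string
): boolean {
  // A version change invalidates the entry immediately.
  if (
    currentVersion !== undefined &&
    cached.version !== undefined &&
    cached.version !== currentVersion
  ) {
    return false;
  }
  const ttl = cached.available ? AVAILABLE_TTL_MS : UNAVAILABLE_TTL_MS;
  return now - cached.checkedAt < ttl;
}
```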

Token Counter (ITokenCounter)

Universal token counting across model providers.

interface ITokenCounter {
  count(text: string): Promise<TokenCountResult>;
  countMessages(messages: Message[]): Promise<TokenCountResult>;
  getMaxTokens(): number;
  getProvider(): TokenCounterProvider;
}

type TokenCounterProvider = 'tiktoken' | 'anthropic' | 'heuristic';

Provider Selection

| Provider | Accuracy | Speed | Use Case |
| --- | --- | --- | --- |
| tiktoken | High | Fast | OpenAI models |
| anthropic | Exact | Medium | Claude models |
| heuristic | ±10% | Instant | Quick estimates |
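
One way to read the table is as a lookup from CLI to counter provider. The mapping below is an assumption: it sends `codex` to tiktoken (OpenAI models) and `claude` to the Anthropic counter as the table suggests, and falls back to the heuristic for `gemini` or whenever a quick estimate is preferred. `selectTokenCounterProvider` is a hypothetical name.

```typescript
type CliName = "claude" | "gemini" | "codex";
type TokenCounterProvider = "tiktoken" | "anthropic" | "heuristic";

// Hypothetical selection rule derived from the table above.
function selectTokenCounterProvider(
  cli: CliName,
  preferSpeed = false
): TokenCounterProvider {
  if (preferSpeed) return "heuristic"; // instant, roughly ±10%
  switch (cli) {
    case "codex":
      return "tiktoken"; // OpenAI models
    case "claude":
      return "anthropic"; // Claude models
    default:
      return "heuristic"; // no dedicated counter listed
  }
}
```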

Capacity Monitor (ICapacityMonitor)

Tracks rate limits across model providers.

interface ICapacityMonitor {
  updateFromHeaders(provider: string, headers: Headers): void;
  getCapacity(provider: string): CapacityInfo | null;
  onLowCapacity(callback: LowCapacityCallback): () => void;
  setLowCapacityThreshold(threshold: number): void;
  getTimeUntilReset(provider: string): number | null;
}

interface CapacityInfo {
  readonly remainingTokens: number;
  readonly remainingRequests: number;
  readonly resetTime: Date | null;
  readonly utilizationPercent: number;
}

Rate Limit Headers

| Provider | Token Header | Request Header |
| --- | --- | --- |
| Anthropic | anthropic-ratelimit-tokens-* | anthropic-ratelimit-requests-* |
| OpenAI | x-ratelimit-*-tokens | x-ratelimit-*-requests |
| Google | x-goog-api-* | x-goog-api-* |
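
As an example of the OpenAI row, the sketch below reads the `x-ratelimit-limit-tokens`, `x-ratelimit-remaining-tokens`, and `x-ratelimit-remaining-requests` response headers. The `HeaderLookup` interface is a structural subset of the Fetch API `Headers` class, so a real `Headers` object can be passed in. The `parseOpenAiRateLimits` name and the utilization formula are assumptions, not the logic in capacity-monitor.ts.

```typescript
// Structural subset of the Fetch API Headers class.
interface HeaderLookup {
  get(name: string): string | null;
}

interface ParsedCapacity {
  remainingTokens: number;
  remainingRequests: number;
  utilizationPercent: number;
}

function parseOpenAiRateLimits(headers: HeaderLookup): ParsedCapacity | null {
  // Reads one header as a finite number, or null if missing or malformed.
  const num = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  const limitTokens = num("x-ratelimit-limit-tokens");
  const remainingTokens = num("x-ratelimit-remaining-tokens");
  const remainingRequests = num("x-ratelimit-remaining-requests");
  if (
    limitTokens === null ||
    remainingTokens === null ||
    remainingRequests === null ||
    limitTokens <= 0
  ) {
    return null;
  }
  return {
    remainingTokens,
    remainingRequests,
    // Assumed definition: share of the token limit already consumed.
    utilizationPercent: (1 - remainingTokens / limitTokens) * 100,
  };
}
```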

Work Balancer (IWorkBalancer)

Distributes parallel tasks across available CLIs.

interface IWorkBalancer {
  balance(tasks: TaskProfile[]): Promise<BalanceResult>;
  queueTask(task: TaskProfile): void;
  getQueueDepth(): number;
  clearQueue(): void;
}

interface BalanceResult {
  assignments: Map<string, CliName>;
  unassigned: string[];
  reasoning: Record<string, ScoreBreakdown>;
}

Balancing Algorithm

  1. Capacity check: Filter CLIs with available capacity
  2. Task match: Score CLI capabilities vs task requirements
  3. Load balance: Distribute evenly with affinity hints
  4. Fallback: Queue tasks if all CLIs at capacity
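
The four steps above can be sketched as a greedy assignment. Everything here is illustrative: `TaskItem` carries precomputed per-CLI match scores, `CliSlot` models capacity as a number of concurrent slots, and the load penalty stands in for "distribute evenly". Affinity hints are not modeled.

```typescript
// Hypothetical shapes for this sketch.
interface TaskItem {
  id: string;
  scores: Record<string, number>; // match score per CLI name, 0-1
}
interface CliSlot {
  name: string;
  capacity: number; // concurrent task slots
}

function balanceTasks(tasks: readonly TaskItem[], clis: readonly CliSlot[]) {
  const load = new Map<string, number>();
  for (const cli of clis) load.set(cli.name, 0);
  const assignments = new Map<string, string>();
  const unassigned: string[] = [];

  for (const task of tasks) {
    let best: CliSlot | undefined;
    let bestScore = -Infinity;
    for (const cli of clis) {
      const used = load.get(cli.name) ?? 0;
      if (used >= cli.capacity) continue; // 1. capacity check
      const match = task.scores[cli.name] ?? 0; // 2. task match
      const score = match - used / cli.capacity; // 3. load balance penalty
      if (score > bestScore) {
        best = cli;
        bestScore = score;
      }
    }
    if (best === undefined) {
      unassigned.push(task.id); // 4. fallback: leave for the queue
      continue;
    }
    assignments.set(task.id, best.name);
    load.set(best.name, (load.get(best.name) ?? 0) + 1);
  }
  return { assignments, unassigned };
}
```

Tasks that land in `unassigned` correspond to the queueing fallback: a caller could hand them to `queueTask` and retry once capacity frees up.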

Feedback Integration (IFeedbackIntegration)

Connects routing decisions to outcomes for closed-loop learning.

interface IFeedbackIntegration {
  recordRoutingDecision(decision: CompositeRoutingDecision): string;
  recordOutcome(routingId: string, outcome: TaskOutcome): void;
  getRoutingStats(cliName: CliName): RoutingOutcomeStats;
  exportFeedback(): FeedbackExport;
}

interface TaskOutcome {
  readonly success: boolean;
  readonly latencyMs: number;
  readonly tokensUsed?: number;
  readonly errorCategory?: string;
}

interface RoutingOutcomeStats {
  readonly totalRoutings: number;
  readonly successRate: number;
  readonly avgLatencyMs: number;
  readonly avgTokens: number;
}

Reward Computation

reward = success * 0.5 + (1 - retries / max) * 0.3 + coherence * 0.2;
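
Written out as a function, the formula might look like this. The `computeReward` name, the clamping to [0, 1], and the handling of `maxRetries <= 0` are assumptions; note that `retries` and `coherence` are not fields of `TaskOutcome` as defined above, so they are passed separately here.

```typescript
// Hypothetical helper implementing the weighted reward above.
function computeReward(
  success: boolean,
  retries: number,
  maxRetries: number,
  coherence: number
): number {
  const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));
  // With no retry budget configured, the retry term is treated as perfect.
  const retryTerm = maxRetries > 0 ? 1 - clamp01(retries / maxRetries) : 1;
  return (success ? 1 : 0) * 0.5 + retryTerm * 0.3 + clamp01(coherence) * 0.2;
}
```

For example, a successful task that used 1 of 2 allowed retries with coherence 0.5 gets 0.5 + 0.15 + 0.1 = 0.75.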

CLI Debugging

# Dry-run routing for a task
nexus-agents routing-audit "Implement a sorting algorithm" --format=json

# Output shows:
# - Task profile analysis
# - Budget filter results
# - TOPSIS scores per CLI
# - LinUCB selection with UCB scores
# - Feature importance analysis

# Show bandit statistics
nexus-agents routing-audit "task" --bandit-stats

Configuration

routing:
  enableBudgetFilter: true # Stage 2 on/off
  enableTopsisRanking: true # Stage 3 on/off
  enableLinUCBSelection: true # Stage 4 on/off

  budget:
    tokenBudget: 1000000 # Session token limit
    costBudgetUsd: 10.0 # Session cost limit
    resetIntervalMs: 3600000 # 1 hour reset

  topsis:
    qualityWeight: 0.5
    costWeight: 0.3
    latencyWeight: 0.2

  linucb:
    alpha: 1.0 # Exploration parameter

DAAO Difficulty Estimator

VAE-inspired difficulty estimation for tier routing (arXiv:2509.11079).

interface IDAAOEstimator {
  encode(task: CliTask): EncodedFeatures;
  estimateDifficulty(task: CliTask): DAAODifficultyEstimate;
  route(task: CliTask, availableClis?: CliName[]): DAAORoutingDecision;
  calibrate(outcome: DAAOOutcome): void;
}

8-Dimensional Feature Encoding

| Feature | Description | Range |
| --- | --- | --- |
| lexicalComplexity | Vocabulary richness, word length | 0-1 |
| syntacticComplexity | Sentence structure, nesting | 0-1 |
| semanticDensity | Domain terms, technical concepts | 0-1 |
| technicalSpecificity | API/framework references | 0-1 |
| taskScope | Multi-step vs single-step | 0-1 |
| constraintComplexity | Requirements, edge cases | 0-1 |
| clarity | Ambiguity level (inverted) | 0-1 |
| outputComplexity | Expected output size/format | 0-1 |

Difficulty → Tier Mapping

| Level | Score Range | Model Tier |
| --- | --- | --- |
| easy | 0.0 - 0.35 | fast |
| medium | 0.35 - 0.65 | balanced |
| hard | 0.65 - 1.0 | powerful |
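
The mapping reduces to two cut-offs. The table's ranges share their boundary values, so this sketch assumes that a score of exactly 0.35 or 0.65 falls into the higher tier; `tierForScore` is a hypothetical helper.

```typescript
type ModelTier = "fast" | "balanced" | "powerful";

// Assumption: boundary scores (0.35, 0.65) go to the higher tier.
function tierForScore(score: number): ModelTier {
  if (score < 0.35) return "fast"; // easy
  if (score < 0.65) return "balanced"; // medium
  return "powerful"; // hard
}
```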

Calibration

The estimator learns from outcomes to adjust difficulty bias:

estimator.calibrate({
  taskId: 'task-123',
  features: encodedFeatures,
  estimatedScore: 0.45,
  actualTier: 'balanced',
  success: true,
  qualityScore: 0.82,
});

Source Files

| File | Purpose |
| --- | --- |
| src/cli-adapters/composite-router.ts | Main routing pipeline |
| src/cli-adapters/budget-router.ts | Budget enforcement |
| src/cli-adapters/topsis-router.ts | Multi-criteria ranking |
| src/cli-adapters/linucb-bandit.ts | Contextual bandit |
| src/cli-adapters/daao-estimator.ts | Difficulty estimation |
| src/cli-adapters/daao-types.ts | DAAO type definitions |
| src/cli-adapters/daao-feature-extraction.ts | Feature extraction |
| src/cli-adapters/circuit-breaker.ts | Fault tolerance |
| src/cli-adapters/cli-detection-cache.ts | Health check caching |
| src/context/token-counter.ts | Token counting |
| src/adapters/capacity-monitor.ts | Rate limit tracking |
| src/learning/feedback-integration.ts | Outcome learning |
| src/cli/routing-audit.ts | Debug CLI command |

Research Sources

| Technique | Paper | Paper-Reported Metrics (not measured on this system) |
| --- | --- | --- |
| DAAO Difficulty | arXiv:2509.11079 | VAE-based estimation |
| PILOT Budget Routing | arXiv:2508.21141 | Budget-constrained routing |
| TOPSIS Multi-Criteria | arXiv:2509.07571 | 31.46% cost reduction (paper benchmark) |
| IPR Quality Routing | arXiv:2509.06274 | 43.9% cost reduction (paper benchmark) |
| RouteLLM Preference | arXiv:2406.18665 | 2x cost reduction (paper benchmark) |
| SATER Confidence | arXiv:2510.05164 | 50%+ cost reduction, 80% latency reduction (paper) |