Nexus-Agents Research Index
Generated: 2026-03-18 (ET)

Total Papers: 176 | Techniques: 43 | Topics: 12
Quick Stats
| Status | Papers | Techniques |
|---|---|---|
| Implemented | - | 42 |
| In Progress | - | 0 |
| Planned | - | 0 |
| Not Started | - | 0 |
| Rejected | - | 1 |
Note: Paper-level status tracking is deprecated. Technique status is the source of truth.
Topics
| Topic | Papers | Techniques | Description |
|---|---|---|---|
| Consensus | 10 | 6 | Multi-agent decision protocols and voting |
| Routing | 19 | 8 | Cost-efficient model routing and selection |
| Memory | 23 | 9 | Context, long-term memory, and compression |
| Code Generation | 21 | 6 | Code generation, repair, and self-improvement |
| CLI Tools | 0 | 0 | External CLI integration and protocols |
| Orchestration | 27 | 12 | Multi-agent coordination and workflows |
| Security | 2 | 2 | Security analysis, prompt injection defense |
| Evaluation | 0 | 0 | Benchmarks, metrics, and testing methodologies (planned) |
| Safety | 0 | 0 | AI safety, alignment, and reward hacking (planned) |
| Planning | 0 | 0 | Task planning, decomposition, and reasoning chains (planned) |
| Tool Use | 0 | 0 | Tool augmentation, function calling, and MCP (planned) |
| Reasoning | 0 | 0 | Reasoning, self-reflection, and search strategies (planned) |
Priority 1 (P1) Techniques
These techniques are high-impact and align well with the current architecture.
| Technique | Topic | Key Metrics | Issue |
|---|---|---|---|
| Aegean Consensus Protocol | consensus | latency_reduction: 1.2x-20x, token_reduction: 4.4x, quality_impact: within 2.5% of baseline | #119 |
| Task-Type Protocol Selection | consensus | reasoning_improvement: +13.2%, knowledge_improvement: +2.8% | #125 |
| Multi-Agent Reflexion (MAR) | consensus | reasoning_improvement: significant across benchmarks | - |
| IPR Quality-Constrained Routing | routing | cost_reduction: 43.9%, latency: sub-150ms | #128 |
| A-MEM Agentic Memory | memory | semantic_organization: Automatic attribute extraction and linking, evolution_detection: Refinement, extension, supersession detection | #122 |
| TRINITY Thinker/Worker/Verifier Roles | orchestration | benchmark_accuracy: 86.2% on LiveCodeBench | #141 |
| Self-Refine Iterative Loop | code-generation | average_improvement: 20% | #126 |
| Reflexion Verbal Reinforcement Learning | code-generation | alfworld_improvement: +22%, hotpotqa_improvement: +20%, humaneval_pass1: 91% | #130 |
| STPA MCP Framework | security | hazard_coverage: Systematic UCA identification, safety_constraints: Auto-generated from analysis | #328 |
| AFlow MCTS Workflow Generation | orchestration | workflow_quality: Improved through search optimization | #329 |
| SEW Self-Evolving Workflows | orchestration | improvement_rate: Continuous through execution feedback | #330 |
| ZeroRouter Universal Difficulty Space | routing | routing_accuracy: Cross-domain difficulty assessment | #338 |
| Context Rot Prevention | memory | - | #1574 |
| Wave-Based Parallel Execution | orchestration | - | - |
Priority 2 (P2) Techniques
These techniques are medium-impact or require moderate changes to the current architecture.
| Technique | Topic | Key Metrics | Issue |
|---|---|---|---|
| CP-WBFT Byzantine Fault Tolerant Consensus | consensus | fault_tolerance: 85.7% | #103 |
| Free-MAD Anti-Conformity Scoring | consensus | robustness: enhanced against attacks | #152 |
| TOPSIS Multi-Criteria Routing | routing | cost_reduction: 31.46% | #146 |
| PILOT Budget-Constrained Routing | routing | adaptive: handles diverse budget requirements | #102 |
| SATER Confidence-Aware Routing | routing | cost_reduction: 50%+, latency_reduction: 80%+ cascade | #99 |
| Agreement-Based Cascading (ABC) | routing | cost_optimization: significant | #121 |
| Preference-Trained Router (RouteLLM) | routing | cost_reduction: 2x | #148 |
| Mem0 Scalable Long-Term Memory | memory | latency_reduction: 91% lower p95, token_savings: 90%, quality_improvement: 26% | #156 |
| MIRIX Six-Type Memory System | memory | accuracy_vs_rag: +35%, storage_reduction: 99.9%, benchmark_accuracy: 85.4% | #157 |
| MobiMem Post-Deployment Evolution | memory | profile_alignment: 83.1%, retrieval_speed: 280x faster than GraphRAG, task_success_improvement: 50.3% | #149 |
| Adaptive Memory | memory | performance_improvement: Configurable priority scoring | #143 |
| Evolving Orchestration Upgrade | orchestration | task_completion_improvement: 15-30% | #335 |
| LATTS Adaptive Test-Time Compute | orchestration | performance_parity: 1B model matches 405B | #153 |
| Voyager Skill Library Pattern | code-generation | discovery_improvement: 3.3x more unique items, speed_improvement: up to 15.3x faster milestone | #150 |
| SICA Self-Improving Agent | code-generation | swebench_improvement: 17% -> 53%, file_editing_improvement: 82% -> 94% | #151 |
| Constitutional AI Self-Critique | code-generation | scales: without human labelers | #147 |
| Higher-Order Voting (OW/ISP) | consensus | correlation_handling: Improved consensus on correlated inputs | #333 |
| Forest-of-Thought Multi-Tree Reasoning | orchestration | reasoning_quality: Improved through diverse exploration, efficiency: Sparse activation reduces compute | #331 |
| Agent-SafetyBench Evaluation Suite | security | coverage: Multi-dimensional safety evaluation | #332 |
| DAAO VAE Difficulty Estimation | routing | routing_accuracy: Improved task-model alignment | #334 |
| Hindsight Belief Memory | memory | reasoning_quality: Improved through belief tracking | #336 |
| Scaling Agent Coordination Predictor | orchestration | allocation_efficiency: Improved agent utilization | #337 |
| Failure Lesson Injection | orchestration | - | #1568 |
| Skill Relevance Matching | orchestration | - | #1569 |
| Write-Time Memory Deduplication | memory | - | #1570 |
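Several routing rows above (SATER, Agreement-Based Cascading) rely on a cascade: try cheap models first and escalate only when they are not trusted. The following is a minimal sketch of an agreement-based cascade, not the implementation from the cited papers or from this repository. The function name `agreement_cascade`, the `tiers` layout, and the `min_agreement` threshold are all assumptions made for illustration, and the "models" are plain callables standing in for LLM calls.

```python
from collections import Counter


def agreement_cascade(prompt, tiers, min_agreement=1.0):
    """Escalate through model tiers until one tier agrees with itself.

    tiers: list of lists of callables, cheapest tier first (hypothetical layout).
    min_agreement: share of a tier's models that must return the same answer.
    Returns (answer, tier_index). The last tier's answer is always accepted.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    answer = None
    for index, models in enumerate(tiers):
        answers = [model(prompt) for model in models]
        # Most frequent answer within this tier and how many models gave it.
        answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= min_agreement:
            return answer, index
    # No tier reached the agreement threshold: fall back to the final tier.
    return answer, len(tiers) - 1


# Stand-in "models": two cheap ones that disagree, one large fallback.
cheap_a = lambda prompt: "4"
cheap_b = lambda prompt: "5"
large = lambda prompt: "4"

answer, tier = agreement_cascade("What is 2 + 2?", [[cheap_a, cheap_b], [large]])
print(answer, tier)  # prints "4 1": the cheap tier disagreed, so tier 1 answered
```

The cost saving comes from how often the cheap tier agrees; the threshold trades extra escalations against the risk of accepting a wrong but unanimous cheap answer.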
Recently Reviewed Papers
| Date | Paper | Topic | Priority |
|---|---|---|---|
| 2026-03-09 | Red-Teaming LLM Multi-Agent Systems via Communication Attacks | inter-agent-security | - |
| 2026-03-09 | MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing | graph-orchestration | - |
| 2026-03-04 | Style Over Substance: Evaluation Biases for Large Language Models | - | - |
| 2026-03-04 | FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | routing | - |
| 2026-03-04 | CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society | orchestration | - |
Papers by Topic
Consensus (10 papers)
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs - Taxonomy of collaboration types (cooperation, competition, coopetition)
- Voting or Consensus? Decision-Making in Multi-Agent Debate - Empirical comparison of 7 decision protocols for multi-agent systems.
- Aegean: Formal Consensus Protocol for Stochastic Reasoning - Formal consensus protocol for stochastic reasoning with
- CP-WBFT: Confidence Probe-based Weighted Byzantine Fault Tolerant - Confidence Probe-based Weighted Byzantine Fault Tolerant consensus.
- Free-MAD: Score-Based Decision with Anti-Conformity - Score-based decision with anti-conformity to prevent majority
- MAR: Multi-Agent Reflexion Improves Reasoning Abilities - Multiple agents reflect and critique each other’s outputs.
- Higher-Order Voting: Optimal Weighting and Information Score Prior - Bayesian-optimal aggregation methods (Optimal Weighting and Information
- arXiv:2602.03474 (title not recorded)
- arXiv:2505.21503 (title not recorded)
- arXiv:2502.16565 (title not recorded)
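Several entries in this topic weight agent votes rather than counting them equally. As a rough illustration, the sketch below uses the classical log-odds weighting for independent voters, where an agent with estimated accuracy p gets weight log(p / (1 - p)). This is a textbook rule, not the Optimal Weighting or ISP method from the Higher-Order Voting paper, and the names `weighted_vote` and `quorum` are assumptions made for this example.

```python
import math
from collections import defaultdict


def weighted_vote(votes, accuracies, quorum=0.5):
    """Aggregate agent answers using log-odds weights.

    votes: dict mapping agent name -> answer.
    accuracies: dict mapping agent name -> estimated accuracy, 0.5 < p < 1.
    quorum: minimum share of total weight the winning answer must hold.
    Returns the winning answer, or None when no answer reaches the quorum.
    """
    scores = defaultdict(float)
    for agent, answer in votes.items():
        p = accuracies[agent]
        if not 0.5 < p < 1.0:
            raise ValueError(f"accuracy for {agent!r} must be in (0.5, 1)")
        scores[answer] += math.log(p / (1.0 - p))
    if not scores:
        return None
    total = sum(scores.values())
    winner = max(scores, key=scores.get)
    return winner if scores[winner] / total >= quorum else None


votes = {"a": "X", "b": "Y", "c": "Y"}
accuracies = {"a": 0.9, "b": 0.6, "c": 0.6}
print(weighted_vote(votes, accuracies))  # prints "X": one strong agent outweighs two weak ones
```

With equal accuracies the rule reduces to plain majority voting, which is a quick sanity check for any weighting scheme.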
Routing (19 papers)
- A Unified Approach to Routing and Cascading for LLMs - Derives optimal strategy for cascading, proves optimality of existing
- RouteLLM: Cost-Quality Routing for LLM Inference - Quality-constrained routing to cheapest model meeting
- RouteLLM: Learning to Route LLMs with Preference Data - Train router on human preference data for dynamic selection
- OptiRoute - kNN + hierarchical filtering with cost/ethics tradeoffs.
- Capability Instruction Tuning - Achieves 80% GPT-4o coverage with smaller model zoo.
- MoMA: Towards Generalized Routing - Unified LLM + agent routing with TOPSIS algorithm for
- Cross-Attention Routing - Single-head cross-attention for query-model matching.
- SATER: Dual-Mode Routing with Confidence-Aware Rejection - Dual-mode routing with shortest-response preference optimization
- IPR: Intelligent Prompt Routing - Quality-constrained routing with user-controlled tolerance parameter
- PILOT: Preference-Prior Routing with Budget Constraints - Contextual bandit (LinUCB) with preference-prior routing and
- STRMAC: State-Aware Routing - State-aware routing with separate encoding of history and
- Edge Multi-LLM: Hybrid Routing with Cascade/ABC Patterns - Hybrid routing with cascade/ABC (Agreement-Based Cascading) patterns.
- DAAO: Difficulty-Aware Agent Orchestration via VAE - VAE-based difficulty estimation for intelligent task routing. Maps tasks
- arXiv:2602.03814 (title not recorded)
- arXiv:2601.19793 (title not recorded)
- AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence
- Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
- Sustainable LLM Inference using Context-Aware Model Switching
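The MoMA entry above mentions TOPSIS for routing. As a reference point, here is a minimal sketch of the standard TOPSIS procedure (vector normalisation, weighting, distance to ideal and anti-ideal points) applied to model selection. It is the generic algorithm, not the MoMA implementation; the criteria, weights, and model figures below are made-up illustration values.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with TOPSIS.

    matrix: rows are alternatives, columns are criteria.
    weights: one weight per criterion.
    benefit: one bool per criterion, True when larger values are better.
    Returns closeness scores in [0, 1]; higher means closer to the ideal.
    """
    n_criteria = len(weights)
    # Vector-normalise each column (guard against an all-zero column).
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_criteria)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


# Hypothetical candidates: (quality, cost per 1k tokens, latency in seconds).
models = {
    "small": (0.70, 0.2, 0.3),
    "medium": (0.82, 1.0, 0.6),
    "large": (0.90, 5.0, 1.5),
}
scores = topsis(list(models.values()), [0.5, 0.3, 0.2], [True, False, False])
best = max(zip(models, scores), key=lambda pair: pair[1])[0]
print(best, [round(s, 3) for s in scores])
```

Changing the weights shifts the winner, which is the point of a multi-criteria router: the same candidate table can serve quality-first and cost-first callers.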
Memory (23 papers)
- Acon: Optimizing Context Compression - Task-specific context compression techniques.
- CCF: Context Compression Framework - Learned compression modules for context management.
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory - Scalable memory architecture achieving 91% latency reduction
- Context Engineering Survey - Taxonomy of context management strategies.
- xKV: Cross-Layer SVD for KV-Cache Compression - Cross-layer SVD for KV-cache compression exploiting singular
- TreeKV: Tree-Structured Cache Compression - Tree-structured cache compression with smooth context transitions.
- BET: Behavior-Equivalent Token - Single-token compression of system prompts via reconstruction
- A-MEM: Agentic Memory for LLM Agents - Zettelkasten-inspired agentic memory system where memories are
- MIRIX: Six-Type Memory System - Six-type memory system with multi-agent management architecture.
- MobiMem: Post-Deployment Evolution via Memory Modules - Post-deployment evolution via Profile, Experience, and Action
- MemGPT: Towards LLMs as Operating Systems - Operating system-inspired memory management for LLMs with virtual
- ICAL: Continual Learning of Multimodal Agents - Build memory of multimodal experience from suboptimal
- Lifelong Learning of Large Language Model based Agents: A Roadmap - Three core modules: Perception (multimodal input), Memory
- ARIA: Human-in-the-Loop Test-Time Learning - Agents identify knowledge gaps through self-dialogue, request
- Hindsight Belief Memory for Reasoning Agents - Belief Memory layer for reasoning agents that maintains and updates
- arXiv:2602.03784 (title not recorded)
- arXiv:2512.20237 (title not recorded)
- arXiv:2512.21567 (title not recorded)
- arXiv:2511.20857 (title not recorded)
- arXiv:2501.13956 (title not recorded)
- arXiv:2402.17753 (title not recorded)
- LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Code Generation (21 papers)
- MAR: Multi-Agent Reflexion Improves Reasoning Abilities - Multiple agents reflect and critique each other’s outputs.
- Self-Refine: Iterative Refinement with Self-Feedback - A single LLM acts as generator, feedback provider, and refiner
- Teaching Large Language Models to Self-Debug - Self-Debugging teaches LLMs to debug their own code via few-shot
- Reflexion: Language Agents with Verbal Reinforcement Learning - Agents maintain an episodic memory of verbal reflections that
- LATS: Language Agent Tree Search - Combines Monte Carlo Tree Search (MCTS) with LLM-based value
- Godel Agent: A Self-Referential Agent Framework - Inspired by Godel machines, agents can modify their own logic
- SICA: A Self-Improving Coding Agent - A unified agent that performs tasks AND improves its own
- Self-Improving AI Agents through Self-Play - Formalizes self-improvement as a Generator-Verifier-Updater (GVU)
- Voyager: An Open-Ended Embodied Agent with Large Language Models - Lifelong learning agent that builds an ever-growing library of
- CycleQD: Quality-Diversity for Agent Skill Acquisition - Uses Quality-Diversity framework with cyclic task focus, model
- EXIF: Automated Skill Discovery for Language Agents - Exploration-first strategy using two agents (Alice explores,
- RLAIF vs. RLHF: Scaling Reinforcement Learning with AI Feedback - RLAIF achieves comparable performance to RLHF at 10x lower cost.
- Constitutional AI: Harmlessness from AI Feedback - Two-phase approach: supervised phase (model critiques and revises)
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents - Combines guided MCTS with self-critique and DPO for learning
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows - Feedback mechanisms that enable agents to improve during
- A Survey of Self-Evolving Agents - Survey of self-evolving agent architectures and techniques.
- Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases - Production-scale code agent with Confucius SDK. Achieves 54.3% Resolve@1
- Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
- In Line with Context: Repository-Level Code Generation via Context Inlining
- Are LLMs Reliable Code Reviewers? Systematic Overcorrection in Requirement Conformance Judgement
- TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance
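The Self-Refine entry above describes a single LLM acting as generator, feedback provider, and refiner. The control flow of that loop can be sketched as follows. This is a minimal outline under stated assumptions: `self_refine` and its parameters are hypothetical names, and the three roles are passed in as plain callables (in practice they would be the same model with different prompts). The stub functions only exist to make the example runnable.

```python
def self_refine(task, generate, critique, refine, max_iters=3):
    """Run a generate -> critique -> refine loop.

    generate(task) -> draft
    critique(task, draft) -> feedback string, or None when satisfied
    refine(task, draft, feedback) -> improved draft
    Returns (final_draft, history_of_drafts).
    """
    draft = generate(task)
    history = [draft]
    for _ in range(max_iters):
        feedback = critique(task, draft)
        if feedback is None:  # critic has no further complaints
            break
        draft = refine(task, draft, feedback)
        history.append(draft)
    return draft, history


# Stub roles standing in for LLM calls.
def generate(task):
    return "def add(a, b): return a - b"


def critique(task, draft):
    return "use + instead of -" if "a - b" in draft else None


def refine(task, draft, feedback):
    return draft.replace("a - b", "a + b")


final, history = self_refine("write add(a, b)", generate, critique, refine)
print(final)         # def add(a, b): return a + b
print(len(history))  # 2
```

The `max_iters` cap matters in practice because a critic that never returns `None` would otherwise loop indefinitely and burn tokens.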
Orchestration (27 papers)
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs - Taxonomy of collaboration types (cooperation, competition, coopetition)
- Multi-Agent Collaboration via Evolving Orchestration - Puppeteer-style paradigm with RL-trained orchestrator for dynamic
- AgentsNet - Benchmark for 100-agent coordination scenarios.
- TRINITY: Evolved LLM Coordinator - Evolved LLM coordinator with Thinker/Worker/Verifier roles
- LATTS: Locally Adaptive Test-Time Scaling - Locally adaptive test-time scaling with verifier-based
- Hybrid Architectures for LLMs - Transformer + SSM hybrid analysis.
- Multi-LLM Orchestration Engine - Temporal graph + vector DB integration for multi-LLM orchestration.
- Pick and Spin: Efficient Multi-Model Orchestration - Unified Helm-based deployment with adaptive scale-to-zero
- LatentMAS: Direct Latent Space Collaboration - Direct latent space collaboration through hidden state sharing.
- LATS: Language Agent Tree Search - Combines Monte Carlo Tree Search (MCTS) with LLM-based value
- Multi-AI Agent System for Autonomous Optimization - Five specialized agents (Refinement, Execution, Evaluation,
- Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning - Develop meta-thinking (self-reflection, assessment, control of
- A Survey on LLM-based Multi-Agent System - Comprehensive survey on LLM-based multi-agent systems.
- Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases - Production-scale code agent with Confucius SDK. Achieves 54.3% Resolve@1
- AFlow: Automatic Workflow Discovery via MCTS - Monte Carlo Tree Search (MCTS) for automatic workflow discovery and
- SEW: Self-Evolving Workflows for Agent Systems - Self-evolving workflow patterns that adapt and improve through execution
- Forest-of-Thought: Multi-Tree Reasoning with Sparse Activation - Multi-tree reasoning architecture with sparse activation for parallel
- Scaling Agent Systems: Coordination Prediction and Resource Optimization - Prediction of coordination requirements for scaling multi-agent systems.
- AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration - Framework for dynamic sub-agent creation via unified agent abstraction.
- arXiv:2602.03794 (title not recorded)
- arXiv:2602.03845 (title not recorded)
- arXiv:2602.03695 (title not recorded)
- arXiv:2601.10560 (title not recorded)
- Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents - Entropy reduction correlates with higher-quality tool invocations. Dense proc…
- Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents - Framework for uncertainty quantification in LLM agents using conditional unce…
- ScaleCall — Agentic Tool Calling at Scale for Fintech: Challenges, Methods, and Deployment Insights -
- CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
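Wave-Based Parallel Execution appears in the P1 table without further detail. One common reading of "waves" is a layered topological ordering: every task whose prerequisites are complete runs in the same wave, and waves run one after another. The sketch below implements that reading only; it is an assumption about the technique, not a description of this project's scheduler, and `plan_waves` and the task names are hypothetical.

```python
def plan_waves(deps):
    """Group tasks into waves that can each run in parallel.

    deps: dict mapping task -> set of prerequisite tasks.
    Returns a list of waves; each wave is a sorted list of task names.
    Raises ValueError if the dependencies contain a cycle.
    """
    remaining = {task: set(prereqs) for task, prereqs in deps.items()}
    # Prerequisites that are never listed as keys are tasks with no dependencies.
    for prereqs in deps.values():
        for prereq in prereqs:
            remaining.setdefault(prereq, set())
    waves = []
    done = set()
    while remaining:
        ready = sorted(t for t, prereqs in remaining.items() if prereqs <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


deps = {
    "lint": set(),
    "build": set(),
    "test": {"build"},
    "deploy": {"test", "lint"},
}
print(plan_waves(deps))  # [['build', 'lint'], ['test'], ['deploy']]
```

Each inner list could be handed to a thread pool or a set of agents; the outer list gives the synchronisation points between waves.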
Security (2 papers)
- STPA MCP Framework: System-Theoretic Process Analysis for MCP Safety - Formal STPA safety analysis methodology for MCP tools. Systematically
- Agent-SafetyBench: Comprehensive Safety Evaluation for AI Agents - Comprehensive safety evaluation suite for agent behaviors across multiple
GitHub Issues
| Issue | Feature | Related Papers |
|---|---|---|
| #119 | Aegean Consensus Protocol | arxiv-2512.20184 |
| #125 | Task-Type Protocol Selection | arxiv-2502.19130 |
| #103 | CP-WBFT Byzantine Fault Tolerant Consensus | arxiv-2511.10400 |
| #152 | Free-MAD Anti-Conformity Scoring | arxiv-2509.11035 |
| #146 | TOPSIS Multi-Criteria Routing | arxiv-2509.07571 |
| #128 | IPR Quality-Constrained Routing | arxiv-2509.06274, arxiv-2406.18510 |
| #102 | PILOT Budget-Constrained Routing | arxiv-2508.21141 |
| #99 | SATER Confidence-Aware Routing | arxiv-2510.05164 |
| #121 | Agreement-Based Cascading (ABC) | arxiv-2410.10347 |
| #148 | Preference-Trained Router (RouteLLM) | arxiv-2406.18665 |
| #156 | Mem0 Scalable Long-Term Memory | arxiv-2504.19413 |
| #157 | MIRIX Six-Type Memory System | arxiv-2507.07957 |
| #149 | MobiMem Post-Deployment Evolution | arxiv-2512.15784 |
| #142 | Graph-Based Memory | arxiv-2504.19413 |
| #143 | Adaptive Memory | arxiv-2310.08560 |
| #122 | A-MEM Agentic Memory | arxiv-2502.12110 |
| #141 | TRINITY Thinker/Worker/Verifier Roles | arxiv-2512.04695 |
| #154 | RL-Trained Orchestrator | arxiv-2505.19591 |
| #335 | Evolving Orchestration Upgrade | arxiv-2505.19591 |
| #153 | LATTS Adaptive Test-Time Compute | arxiv-2509.20368 |
| #126 | Self-Refine Iterative Loop | arxiv-2303.17651 |
| #130 | Reflexion Verbal Reinforcement Learning | arxiv-2303.11366 |
| #150 | Voyager Skill Library Pattern | arxiv-2305.16291 |
| #151 | SICA Self-Improving Agent | arxiv-2504.15228 |
| #147 | Constitutional AI Self-Critique | arxiv-2212.08073 |
| #131 | Self-Debug Code Repair | arxiv-2304.05128 |
| #328 | STPA MCP Framework | arxiv-2601.08012 |
| #329 | AFlow MCTS Workflow Generation | arxiv-2410.10762 |
| #330 | SEW Self-Evolving Workflows | arxiv-2505.18646 |
| #333 | Higher-Order Voting (OW/ISP) | arxiv-2510.01499 |
| #331 | Forest-of-Thought Multi-Tree Reasoning | arxiv-2412.09078 |
| #332 | Agent-SafetyBench Evaluation Suite | arxiv-2412.14470 |
| #334 | DAAO VAE Difficulty Estimation | arxiv-2509.11079 |
| #336 | Hindsight Belief Memory | arxiv-2512.12818 |
| #337 | Scaling Agent Coordination Predictor | arxiv-2512.08296 |
| #338 | ZeroRouter Universal Difficulty Space | zerorouter-tbd |
| #1568 | Failure Lesson Injection | |
| #1569 | Skill Relevance Matching | |
| #1574 | Context Rot Prevention | |
| #1570 | Write-Time Memory Deduplication | arxiv-2601.02553 |
Search Tags
#adaptive #agent-specialization #agentic-memory #agreement #anti-conformity #attribute-extraction #automatic-discovery #bayesian #belief-state #benchmarking #budget-constraint #byzantine #cascade #code-repair #comprehensive #confidence-aware #constitutional #context-injection #context-management #contextual-bandit #coordination #coordinator #correlation-aware #cost-optimization #cross-critique #debate #deduplication #difficulty-estimation #dynamic #dynamic-linking #dynamic-selection #embedding #ensemble #episodic-memory #evolution #evolutionary #executable #execution-feedback #failure-recovery #fault-tolerance #feedback-learning #formal-safety #formal-verification #graph #hazard-analysis #hindsight #inter-agent #iterative #latent-space #lifelong-learning #lightweight #linucb #long-term #mcp #mcts #memory-efficiency #memory-evolution #multi-criteria #multi-tree #multimodal #observability #optimal-weighting #parallel-execution #pareto #policy #post-deployment #prediction #preference-data #principles #priority #production #protocol-selection #pruning #puppeteer #quality-constrained #quorum #reasoning #recency-decay #reflection #reflexion #reinforcement-learning #relational #relevance #risk-assessment #role-based #routing #rubber-duck #safety-evaluation #scalable #scaling #scoring #self-critique #self-debug #self-evolving #self-feedback #self-improvement #semantic #semantic-similarity #shortest-response #skill-library #sparse-activation #stpa #streaming #structured-memory #task-classification #task-routing #test-time #three-module #token-efficiency #tolerance #training-free #transfer-learning #universal-difficulty #vae #verbal-rl #verifier #versioning #wave-coordination #weighted #worker-dispatch #workflow-generation #workflow-optimization #zettelkasten
Registry Files
- papers.yaml - All 176 papers with metadata
- techniques.yaml - All 43 techniques with status
- sources.yaml - Product docs and other sources
How to Contribute
See CONTRIBUTING.md for guidelines on adding new research.
Generated from YAML registries. Last updated: 2026-03-18 (ET)