Down the MCP Rabbit Hole: Building a Standards Server
Build MCP standards server for Claude AI—implement Model Context Protocol for intelligent code standards and context-aware workflows.
Bottom Line Up Front
I built a standards server that was supposed to be a simple wrapper around my documentation repository. Three weeks later, I had written 6,000 lines of code across 47 components, implementing Redis caching, vector search, six different language analyzers, 88 tests, and a React UI.
For a read-only documentation server. That I'm the only user of.
This is a case study in scope creep, premature optimization, and what happens when you let "one more feature" become your guiding principle. The irony? Version 1 worked perfectly fine at 200 lines of code.
But here's the thing: personal projects are where we learn by overdoing it, by making every mistake in the book when the stakes are low. This post walks through the evolution from working prototype to over-engineered monstrosity, examining the classic developer pitfalls I hit along the way, including tool-driven architecture, the seduction of sophisticated patterns, and the massive gap between "it works" and "it's production ready."
The Numbers: Version 1 (200 lines, 2 hours, functional) → Version 4 (6,000+ lines, 3 weeks, questionably necessary). Redis cache with 30-minute TTL for documentation that changes once a month. Vector search implementation for 50 markdown files. Six language-specific analyzers for standards that are 90% YAML. This isn't a success story: it's a cautionary tale about knowing when good enough is perfect.
When Good Ideas Get Complicated
Remember last week when I was all excited about my standards repository? Well, I made the classic developer mistake: "You know what would make this better? If I rebuilt it from scratch with a completely different architecture!"
Enter the Model Context Protocol (MCP) – Anthropic's new way for LLMs to interact with external tools. The idea was simple: instead of copying CLAUDE.md into every project, why not serve the standards directly to Claude through MCP? This server became a key component of my broader workflow when supercharging development with Claude-Flow, enabling dynamic standards integration across multiple projects.
Three weeks and several rewrites later, I have github.com/williamzujkowski/mcp-standards-server. It works! Mostly. When Redis is happy. And the moon is in the right phase.
The Original Vision vs Reality
What I Planned (Week 1)
"I'll wrap my standards in an MCP server. How hard could it be?"
Pseudocode - Simplified Initial Vision:
```python
# My naive first attempt
class StandardsServer:
    def __init__(self):
        self.standards = load_standards()  # Easy!

    def get_standard(self, name):
        return self.standards[name]  # Done!
```
The Reality (Week 3)
⚠️ Warning: This code illustrates over-engineering patterns for educational purposes. Implement security controls and proper authentication for production MCP servers with external access.
Pseudocode - Simplified Current Architecture:
```python
# Current reality - 6000+ lines of code later
# Simplified representation showing core components
class MCPStandardsServer:
    def __init__(self):
        self.rule_engine = RuleEngine()
        self.semantic_search = HybridVectorStorage()
        self.cache_layer = RedisL1L2Cache()
        self.analyzers = MultiLanguageAnalyzerFactory()
        self.compliance_mapper = NISTComplianceEngine()
        self.token_optimizer = CompressionStrategy()
        # Plus 47 additional components for full functionality
```
Yeah, it got away from me a bit.
The Architecture Journey
Version 1: "Keep It Simple"
I started with stdio communication. Just pipe standards in and out. Clean, simple, working.
Then I thought: "What about caching?"
Version 2: "Add Some Redis"
Added Redis for caching. Now I had two problems: cache invalidation and Redis connection management.
Then I thought: "What about semantic search?"
Version 3: "ChromaDB Will Fix Everything"
Added ChromaDB for vector search. Now I had three problems: embeddings, vector storage, and "why is my laptop fan screaming?"
Then I thought: "What about multi-language support?"
Version 4: "The Kitchen Sink"
Current state. It has:
- 6 language analyzers (Python, JS, Go, Java, Rust, TypeScript)
- 25 comprehensive standards (up from my original 10)
- Redis L1/L2 caching architecture
- Semantic search with boolean operators
- NIST compliance mapping
- A React web UI (because why not?)
- Performance benchmarking
- 88 integration tests
I may have overdone it.
What Actually Works (The Good Parts)
Despite my scope creep, some genuinely useful stuff emerged:
Intelligent Standard Selection
Context-aware rule engine eliminates manual standard selection:
⚠️ Warning: This demonstrates API usage patterns. Implement proper input validation and security controls for production deployments.
```python
context = {
    "project_type": "web_application",
    "framework": "react",
    "requirements": ["accessibility", "performance"],
}

# Automatically loads: react-18-patterns, wcag-2.1, performance-optimization
standards = engine.evaluate(context)
```
Capabilities:
- Analyzes project structure and dependencies
- Maps requirements to relevant standards automatically
- Loads 3-5 standards per project (vs. 25 total available)
- Eliminates guesswork about which standards apply
- Adapts recommendations based on framework versions
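Under the hood, the rule engine is less magic than it sounds: it boils down to declarative condition-to-standards mappings. Here's a minimal, hypothetical sketch of that idea (the `RULES` table and `evaluate` function are illustrative, not the server's actual internals):

```python
# Illustrative only: map context conditions to the standards they pull in.
RULES = [
    {"if": {"framework": "react"}, "load": ["react-18-patterns"]},
    {"if": {"requirements": "accessibility"}, "load": ["wcag-2.1"]},
    {"if": {"requirements": "performance"}, "load": ["performance-optimization"]},
]

def evaluate(context: dict) -> list[str]:
    selected = []
    for rule in RULES:
        matched = True
        for key, expected in rule["if"].items():
            actual = context.get(key)
            # a condition matches on an exact value or membership in a list
            if not (actual == expected or (isinstance(actual, list) and expected in actual)):
                matched = False
                break
        if matched:
            selected.extend(rule["load"])
    return selected
```

Feed it the `context` dict from above and you get back `['react-18-patterns', 'wcag-2.1', 'performance-optimization']`. That's the whole trick: a handful of explicit rules beats guessing which of 25 standards apply.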
Multi-Language Code Analysis
Automatic language detection with fix generation:
```bash
mcp-standards validate src/ --language auto
# Detects Python, finds 3 PEP-8 violations
# Detects JavaScript, suggests ES6 improvements
# Generates fix patches automatically
```
Features:
- Supports 6 languages: Python, JavaScript, Go, Java, Rust, TypeScript
- Auto-detects language from file extensions and syntax
- Provides violation explanations with line numbers
- Generates `.patch` files for automatic fixes
- Integrates with pre-commit hooks
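The `--language auto` part sounds smarter than it is. Here's a hypothetical sketch of the dispatch logic, assuming an extension map plus a crude content sniff for extensionless files (the names here are illustrative, not the server's real analyzers):

```python
from pathlib import Path

# Illustrative extension-to-analyzer mapping; the real analyzers are richer.
EXTENSION_MAP = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".java": "java",
    ".rs": "rust",
}

def detect_language(path: Path) -> str | None:
    lang = EXTENSION_MAP.get(path.suffix)
    if lang:
        return lang
    # fall back to a shebang sniff for extensionless scripts
    head = path.read_text(errors="ignore")[:120]
    if head.startswith("#!") and "python" in head:
        return "python"
    return None

for file in Path("src").rglob("*"):
    if file.is_file() and (lang := detect_language(file)):
        print(f"{file}: {lang}")  # hand off to the matching analyzer
```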
Token Optimization That Actually Matters
⚠️ Warning: This example demonstrates API optimization techniques. Implement proper authentication and rate limiting when exposing MCP servers to untrusted clients.
Compressed formats reduce LLM token costs by 70-90%. This optimization strategy builds on the principles I discuss in my guide to progressive context loading for LLM workflows, where efficient token management becomes critical for complex projects:
```python
# Full standard: 5000 tokens
# Compressed: 500 tokens
# Reference only: 50 tokens
standard = get_standard("react-patterns", format="compressed")
```
Compression strategies:
- Bullet-point summaries (500 tokens, 90% reduction)
- Reference-only mode (50 tokens, 99% reduction)
- Dynamic expansion: Request details only when needed
- Saves ~$0.15 per standard load at current API pricing
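The three formats are less about clever compression and more about deciding how much of a document to ship. A simplified sketch, assuming markdown-formatted standards (the `render_standard` helper is illustrative, not the server's real API):

```python
def render_standard(name: str, body: str, format: str = "full") -> str:
    # Illustrative only: the real compression logic is more involved.
    if format == "full":
        return body  # ~5,000 tokens
    if format == "compressed":
        # keep only headings and bullet points as a summary
        summary = [line for line in body.splitlines() if line.startswith(("#", "-", "*"))]
        return "\n".join(summary)  # ~500 tokens
    if format == "reference":
        return f"Standard '{name}' is available; request format='full' for details."  # ~50 tokens
    raise ValueError(f"unknown format: {format}")
```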
The Struggles (Learning Moments)
Redis Is Not Your Friend at 3 AM
I spent two frustrating days tracking down silent cache failures before I realized I'd configured Redis with a noeviction policy. My "temporary" cache was storing everything indefinitely.
The Problem:
- Spent 2 days debugging silent cache failures
- Redis exceeded memory limits (maxmemory policy: noeviction)
- "Temporary" cache stored everything indefinitely
- 200MB cached data for documentation that changes monthly
- Cache hit rate: 99% (because nothing ever expired)
The Fix:
- Added 30-minute TTL on all cache entries
- Reduced maxmemory from 512MB to 64MB
- Implemented LRU eviction policy
- Result: Memory usage dropped 87%, performance unchanged
Lesson learned: TTLs exist for a reason. Use them.
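For the curious, the fix is nothing exotic. A minimal sketch of the pattern using redis-py, with `load_standard_from_disk` as an assumed helper and the eviction settings living in `redis.conf`:

```python
import redis
from pathlib import Path

CACHE_TTL_SECONDS = 30 * 60  # every entry expires after 30 minutes
# redis.conf side of the fix: maxmemory 64mb, maxmemory-policy allkeys-lru
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_standard_from_disk(name: str) -> str:
    # assumed layout: one markdown file per standard
    return (Path("docs/standards") / f"{name}.md").read_text()

def get_standard_cached(name: str) -> str:
    key = f"standard:{name}"
    cached = r.get(key)
    if cached is not None:
        return cached
    body = load_standard_from_disk(name)
    r.setex(key, CACHE_TTL_SECONDS, body)  # write with a TTL so nothing lives forever
    return body
```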
Vector Databases Are Hungry
The Absurdity:
- ChromaDB consumed 4GB RAM for 25 markdown documents
- 160MB per document for semantic search
- Index generation: 47 seconds
- Query latency: 89ms average
- Alternative: `grep -r "pattern" docs/` → 12ms
The Reality Check:
- Total corpus size: 250KB of text
- Vector embeddings: 1,536 dimensions per chunk
- Overhead ratio: 16,000:1 (storage vs. original text)
- Use cases requiring semantic search: 0
Lesson learned: Sometimes grep is enough. Not everything needs AI.
The MCP Protocol Is Still Evolving
The Breaking Change:
- MCP spec 0.3 → 0.4 changed the `tools` structure
- Server worked perfectly on Friday
- Monday: All tool calls failed with cryptic errors
- Anthropic docs: "We simplified the schema!"
- My perfectly working implementation: Broken
The Recovery:
- 6 hours rewriting tool definitions
- Updated SDK dependencies
- Rewrote 88 tests
- Added version checking middleware
- Now: Server checks MCP protocol version on startup
Lesson learned: Pin your dependencies when working with beta protocols. Check breaking change logs religiously.
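The version check itself is just a startup guard. A hypothetical sketch, assuming the SDK is installed as the `mcp` package and that 0.4 is the tested range (both assumptions, not the server's actual logic):

```python
from importlib.metadata import version

SUPPORTED_MCP_VERSIONS = ("0.4",)  # assumed tested range; bump deliberately

def check_protocol_version() -> None:
    installed = version("mcp")  # assumed distribution name for the SDK
    if not installed.startswith(SUPPORTED_MCP_VERSIONS):
        raise RuntimeError(
            f"MCP SDK {installed} is untested with these tool definitions; "
            f"expected a version starting with {SUPPORTED_MCP_VERSIONS}."
        )

check_protocol_version()  # fail fast at startup instead of with cryptic tool errors
```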
Unexpected Discoveries
The Web UI Nobody Asked For
"Quick" debugging interface → Full React application:
```bash
cd web && ./start.sh
# Full standards browser at localhost:3000
# Real-time WebSocket updates
# Rule testing playground
```
What Started as a 2-Hour Debug Tool:
- Interactive standards browser with search
- Real-time WebSocket updates when standards change
- Rule testing playground with live validation
- Syntax highlighting for code examples
- Mobile-responsive design (because of course)
- Dark mode toggle (essential)
- 50+ components, 3,000 lines of React
The Irony: I use this more than the CLI now. Sometimes procrastination produces useful things.
Performance Benchmarking Addiction
⚠️ Warning: This benchmarking code is for development and testing purposes. Never expose benchmarking endpoints in production without proper authentication and rate limiting.
Built tools to prove server speed. Discovered it was slow. Spent a week optimizing. Now: graphs!
```bash
python benchmarks/run_benchmarks.py
# Standard retrieval: 12ms average
# Semantic search: 89ms average
# Rule evaluation: 3ms average
```
Optimization Journey:
- Initial measurements: 340ms average retrieval (embarrassing)
- Profiled with cProfile: 85% time in JSON parsing
- Added msgpack serialization: 180ms (47% faster)
- Implemented response caching: 45ms (75% faster)
- Final optimization: Lazy-load standard details: 12ms (96% faster)
- Time invested: 1 week
- Users who care about 12ms vs 45ms: 0 (only me)
The Graphs: Created dashboards tracking latency percentiles, cache hit rates, memory usage over time. Nobody asked for these metrics. But they're pretty.
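Conceptually the harness is just perf_counter in a loop with percentiles on top. A toy sketch (with a stand-in `get_standard`, since the real retrieval call lives inside the server):

```python
import statistics
import time

def get_standard(name: str) -> str:
    return name * 1000  # stand-in for the real retrieval call

def benchmark(fn, runs: int = 200) -> dict:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)  # milliseconds
    return {
        "p50_ms": round(statistics.median(samples), 2),
        "p95_ms": round(statistics.quantiles(samples, n=20)[-1], 2),
        "max_ms": round(max(samples), 2),
    }

print(benchmark(lambda: get_standard("react-patterns")))
```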
Current State: "It Works on My Machine"
The honest status:
✅ What's Working:
- MCP server starts and serves standards
- 25 standards available and searchable
- CLI works (mostly)
- Tests pass (when Redis is running)
- Web UI loads (in Chrome)
⚠️ What's Flaky:
- Redis randomly disconnects
- Web UI websocket reconnection
- Performance under load (untested with >1 user)
- The documentation
❌ What's Broken:
- Windows support (WSL2 or suffer)
- The release automation I spent 3 days on
- My sleep schedule
Lessons Learned (So Far)
Start Smaller Than You Think
The Evolution:
- Week 1: "Simple wrapper" (200 lines, 1 file)
- Week 2: Added caching (1,200 lines, 8 files)
- Week 3: Added vector search (3,800 lines, 23 files, 4GB RAM)
- Week 4: Added web UI (6,000+ lines, 47 components)
What I Should Have Done:
- Ship Version 1
- Get feedback
- Add features based on actual needs, not hypothetical ones
- Iterate based on real usage
The Reality: Built for an audience of one (me). Over-engineered for problems I don't have.
Perfect Is the Enemy of Deployed
Version Comparison:
- Version 1: 200 lines, works, deployed, useful
- Version 2: 1,200 lines, faster caching, zero users noticed
- Version 3: 3,800 lines, semantic search, solves no real problems
- Version 4: 6,000 lines, full UI, impressive demos, occasional Redis crashes
Value Added Per Version:
- V1 → V2: Marginal (caching saves 50ms on repeated queries)
- V2 → V3: Negative (added complexity, solved nothing)
- V3 → V4: Mixed (UI is useful, but could have been separate project)
Lesson: I should have shipped V1 three weeks ago and iterated based on real feedback.
Tools Shape Solutions
The Pattern I Fell Into:
- Had Redis → Added caching to everything
- Had ChromaDB → Added vector search everywhere
- Had React experience → Built unnecessary UI
- Had time → Spent it adding features instead of shipping
Tools That Influenced Architecture:
- Redis: L1/L2 cache architecture (for 25 files)
- ChromaDB: Semantic search (for text searchable by grep)
- React: Full web UI (for debugging tool)
- Python async: Everything became async (unnecessary complexity)
The Trap: "I have this hammer, so everything looks like a nail." Technology-driven architecture instead of problem-driven. Classic scope creep in action.
What's Next (The Roadmap I'll Probably Ignore)
The Realistic List:
- Fix the Redis connection issues (week 5 of saying this)
- Write actual documentation
- Add integration tests that test integration
- Simplify the architecture (ha!)
The Dream List:
- VS Code extension
- Direct Claude Desktop integration
- Distributed standards federation
- GraphQL API
Try It Yourself (At Your Own Risk)
If you're brave enough to try this server, you might also want to explore my simpler (and more practical) approach to supercharging Claude CLI with standards that doesn't require Redis or ChromaDB. For insights on orchestrating multi-agent development workflows with Claude-Flow, I've documented how MCP servers integrate with swarm intelligence for complex engineering tasks.
AI skepticism note: MCP is bleeding-edge technology from Anthropic. The spec changes. The libraries break. The examples in the docs don't always work. Building production systems on top of experimental protocols is a recipe for midnight debugging sessions.
⚠️ Warning: These installation commands modify system packages. Use isolated environments (virtual environments, containers) to avoid conflicts.
```bash
# The optimistic quick start
pip install mcp-standards-server
mcp-standards serve --stdio

# The realistic setup
git clone https://github.com/williamzujkowski/mcp-standards-server.git
cd mcp-standards-server
python -m venv venv && source venv/bin/activate
pip install -e .
# Fix 17 dependency conflicts
# Install Redis
# Sacrifice a keyboard to the demo gods
python -m src
```
Fair warning: This is a work in progress. It works, but "works" is doing some heavy lifting here.
The Real Talk
This project taught me something important: The gap between "working prototype" and "production ready" is vast. My standards repository was immediately useful. This MCP server is technically superior and practically inferior. It's harder to install, easier to break, and solves problems that don't exist. For lessons on building resilient systems that fail gracefully, I've applied similar architectural patterns to ensure my homelab services remain operational even when individual components break.
The honest assessment: This is over-engineering as a learning exercise. The first version (200 lines) worked fine. The current version (6,000+ lines) is more impressive and less useful. Sometimes the best code is the code you don't write.
But I learned a ton:
- How MCP works (and doesn't)
- Redis patterns I'll never use again
- Why simple solutions often win
- That scope creep is my superpower and weakness
If you're deploying this beyond localhost, my post on zero trust network architecture covers practical guidance: securing MCP servers calls for similar VLAN isolation and firewall rules to keep sensitive standards data away from unauthorized access.
Will I keep working on it? Absolutely. Will it ever be "done"? Absolutely not.
That's the beauty of side projects – they're never finished, only in various states of "good enough for now."
The Bottom Line
Building an MCP server for my standards was like using a sledgehammer to hang a picture. It works, the picture is hung, but there's also a hole in the wall and my neighbors are asking questions.
But hey, it's MY hole in the wall, and I learned how to use a sledgehammer. I tested every feature, broke things multiple times, and rebuilt them better each iteration.
Sometimes that's enough.
Want to contribute? The code is at github.com/williamzujkowski/mcp-standards-server. Issues and PRs welcome. Especially if you know why Redis keeps disconnecting.
Or use the original standards repo. It still works great and doesn't require Redis.