
The Problem: AI Tools That Forget Everything

I built a standards repository that cut Claude CLI token usage by roughly 90% and automated NIST 800-53r5 compliance checks. It complements progressive context loading for efficient LLM workflows and ties into mastering prompt engineering, local LLM deployment, and Git workflow automation. The result: 15-minute project setup instead of 2 hours, automatic violation detection across 55 blog posts, and persistent context that survives between sessions.

Why it matters: AI tools forget everything between sessions. You explain coding standards repeatedly. Context explodes token budgets. Consistency depends on human memory.

I was using Claude CLI daily and hitting the same frustrations:

  • Explaining my coding style every single time
  • Getting inconsistent suggestions across sessions
  • Watching token counts explode with context
  • Copy-pasting the same standards repeatedly

Then I had an idea: What if I could give Claude permanent memory of how I like to work?

My First Attempt: Complete Disaster

In June 2025, I integrated my standards repo with Claude CLI for the first time. I ran the validation script on my blog codebase. The result?

87 violations across 23 files. I thought I was following best practices, but the automated checker told a different story. Fixing them took 4.5 hours of tedious work, but it probably prevented 12 broken link issues that would have made it to production.

The humbling part? I discovered I'd been consistently making the same mistake with frontmatter formatting across multiple posts. Manual code review never caught it because it looked fine to human eyes.

Enter the Standards Repository

I built github.com/williamzujkowski/standards, a comprehensive collection of development standards designed specifically for LLM consumption. It's an AI instruction manual for your projects.

The Magic: CLAUDE.md

The centerpiece is a file called CLAUDE.md that acts as an intelligent router. Drop it in your project, and suddenly Claude understands:

# Basic usage in your project
@load [CS:python + TS:pytest + SEC:*]

# Natural language works too
"I need to build a secure API"
→ Automatically loads: CS:api + SEC:auth + TS:integration

Real Example: Setting Up a New Project

Here's how I used it yesterday to bootstrap a new Python service:

View full setup script →

Then in Claude:

Me: I'm building a Python API that handles payment processing. 
    Set up the project structure following the standards.

Claude: I'll set up your payment processing API following the standards. 
        Loading [CS:python + CS:api + SEC:payments + COMPLIANCE:pci]...

        [Creates complete project structure with security controls,
         testing setup, CI/CD pipelines, and NIST compliance tags]

The result? A production-ready structure in minutes, not hours.

The Power of Token Optimization

The real game-changer is the roughly 90% token reduction (measured across Python, API, and React projects). Instead of pasting entire standards documents into context, I now use shorthand references.

The old way used 5,000+ tokens:

  • "Here are my Python standards" (wall of text)
  • "Here are my API patterns" (another wall)
  • "Here are security requirements" (yet another wall)

The new way uses fewer than 100 tokens: @load [CS:python + CS:api + SEC:*] tells Claude exactly what I need.

Claude references the full standards without needing them in context every time. In practice this works well when a standard is self-contained; heavily cross-referenced standards sometimes lose nuance. The compression trades completeness for speed.
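
To make the mechanics concrete, here's a minimal sketch of how a shorthand loader could resolve those codes. It's hypothetical, not the repository's actual router; the index entries and file paths are assumptions for illustration.

# Hypothetical index: shorthand code -> standards document (paths are illustrative)
STANDARDS_INDEX = {
    "CS:python": "docs/standards/coding/python.md",
    "CS:api": "docs/standards/coding/api.md",
    "SEC:auth": "docs/standards/security/auth.md",
    "SEC:input": "docs/standards/security/input-validation.md",
}

def resolve(load_expr: str) -> list[str]:
    """Expand '@load [CS:python + CS:api + SEC:*]' into document paths."""
    inner = load_expr.split("[", 1)[1].rsplit("]", 1)[0]
    paths = []
    for code in (c.strip() for c in inner.split("+")):
        if code.endswith(":*"):  # wildcard loads every standard under that prefix
            paths += [p for c, p in STANDARDS_INDEX.items() if c.startswith(code[:-1])]
        else:
            paths.append(STANDARDS_INDEX[code])  # unknown codes would raise KeyError here
    return paths

print(resolve("@load [CS:python + CS:api + SEC:*]"))
# ['docs/standards/coding/python.md', 'docs/standards/coding/api.md',
#  'docs/standards/security/auth.md', 'docs/standards/security/input-validation.md']

The shorthand expression is the only thing that lives in the prompt; the mapped documents stay on disk until Claude actually needs to consult them.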

NIST Compliance Built-In

Since I work in government-adjacent spaces, NIST 800-53r5 compliance is crucial. The standards include automatic control tagging:

View NIST compliance example →

Run the compliance checker:

./scripts/setup-nist-hooks.sh
git commit -m "Add user auth"
# Pre-commit hook validates NIST tags automatically
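
The real hook comes from setup-nist-hooks.sh. As a rough sketch of the idea, a pre-commit check only needs to scan staged files for control annotations; the @nist tag convention, keyword list, and file filter below are assumptions for illustration, not the repository's actual implementation.

#!/usr/bin/env python3
"""Hypothetical pre-commit check: block security-relevant code without @nist tags."""
import re
import subprocess
import sys

SECURITY_HINTS = re.compile(r"\b(password|token|session|encrypt|authenticate)\b", re.I)
NIST_TAG = re.compile(r"@nist\s+[a-z]{2}-\d+", re.I)  # e.g. "@nist ia-2"

# Files staged for this commit
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

failures = []
for path in staged:
    if not path.endswith(".py"):
        continue
    text = open(path, encoding="utf-8").read()
    if SECURITY_HINTS.search(text) and not NIST_TAG.search(text):
        failures.append(path)

if failures:
    print("Security-relevant files missing @nist control tags:")
    print("\n".join(f"  {p}" for p in failures))
    sys.exit(1)  # non-zero exit blocks the commit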

My Favorite Features

1. Context-Aware Loading

Claude detects what you're working on and loads relevant standards:

# Working on React component?
@load context:[auto]
# Automatically loads: FE:react + WD:components + TS:jest

# Fixing a security bug?
@task security_fix
# Loads: SEC:* + TS:regression + CS:error-handling
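
Under the hood, auto-detection only needs a mapping from what you're touching to the standards that apply. Here's a sketch of the idea, with an extension map I made up for illustration:

from pathlib import Path

# Hypothetical mapping from file extension to standard codes
CONTEXT_MAP = {
    ".py": ["CS:python", "TS:pytest"],
    ".tsx": ["FE:react", "WD:components", "TS:jest"],
    ".tf": ["OPS:terraform", "SEC:cloud"],
}

def detect_context(changed_files: list[str]) -> str:
    """Build an @load expression from the files currently being edited."""
    codes: list[str] = []
    for f in changed_files:
        for code in CONTEXT_MAP.get(Path(f).suffix, []):
            if code not in codes:
                codes.append(code)
    return f"@load [{' + '.join(codes)}]" if codes else "@load [CS:general]"

print(detect_context(["src/Button.tsx", "tests/test_api.py"]))
# @load [FE:react + WD:components + TS:jest + CS:python + TS:pytest]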

2. Project Kickstart

The kickstart prompt is pure magic. Feed it your project idea:

Project: Homelab monitoring dashboard
Tech: Python backend, React frontend
Requirements: Real-time metrics, mobile-friendly

[Paste into Claude with kickstart prompt]

Result: Complete implementation plan with:
- Architecture decisions
- Tool recommendations  
- Security considerations
- Testing strategy
- 6-month roadmap

3. Intelligent Suggestions

Claude now makes connections I wouldn't:

You're using Redis for sessions.
Also consider: [rate-limiting, cache-invalidation patterns]
Teams like yours often use: [Redis Sentinel for HA]
Related standard: [CS:caching + SEC:session-management]

Lessons Learned (The Hard Way)

The Pre-Commit Hook Nightmare

I set up a pre-commit hook to run standards validation automatically. First attempt? 100% failure rate. Every single commit got blocked.

After 2 hours of debugging, I discovered the hook was calling the wrong Python interpreter. The PATH issue was subtle. Now the hook catches violations automatically, and I'm still tuning its sensitivity.
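
One way to sidestep that class of failure, assuming the project keeps its own virtual environment, is to have the hook resolve the interpreter explicitly instead of trusting whatever python is first on PATH. A sketch of the guard, not the exact fix I shipped (the venv location and script name are assumptions):

#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-commit shim: always run validation with the project venv."""
import subprocess
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[2]        # .git/hooks/ -> repo root
VENV_PYTHON = REPO_ROOT / ".venv" / "bin" / "python"   # assumed venv location

interpreter = VENV_PYTHON if VENV_PYTHON.exists() else Path(sys.executable)
result = subprocess.run(
    [str(interpreter), str(REPO_ROOT / "scripts" / "validate_standards.py")]  # assumed script name
)
sys.exit(result.returncode)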

The trade-off:

  • Automated validation is fast and catches issues
  • False positives slow down workflow
  • Strict enforcement improves quality
  • Adds friction to rapid prototyping

I've found the benefits outweigh the costs.

False Positive Hell

My initial validation script flagged 312 "violations" across all 55 blog posts. I manually reviewed each one. Turns out 276 were false positives, an 88% false positive rate.

I spent 3 days tuning regex patterns and adjusting thresholds to get the FP rate down to 4%. That was tedious work.
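
To give a flavor of that tuning: the checks started as broad regexes and gradually grew narrower patterns and allowlists. The snippet below is a simplified stand-in for a link check, not the actual rules.

import re

# First pass: flag every URL, which is what produced the 88% false-positive rate
URL = re.compile(r"https?://[^\s)\]]+")

# Tuned pass: trim trailing punctuation and skip domains that kept triggering
# spurious reports (the allowlist entries here are illustrative)
ALLOWLIST = ("github.com", "docs.python.org", "nvlpubs.nist.gov")

def suspect_links(text: str) -> list[str]:
    """Return URLs worth re-checking, ignoring allowlisted domains."""
    urls = (u.rstrip(".,;:") for u in URL.findall(text))
    return [u for u in urls if not any(d in u for d in ALLOWLIST)]

print(suspect_links("See https://example.com/old-page, plus https://github.com/williamzujkowski/standards."))
# ['https://example.com/old-page']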

The lesson:

  • Standards prevent errors
  • Automation needs constant refinement
  • Human oversight prevents blockers
  • Tuning takes longer than expected

Was it worth it? Yes. The time saved catching real issues pays back the 3-day investment.

Template Validation: When the Template is Wrong

I created a blog post template to ensure consistency. First 5 posts using it: 2 passed validation, 3 failed due to subtle frontmatter issues.

Turns out the template was wrong, not the posts. After fixing the template, I validated all 48 past posts again, which took 34 minutes of scan time. Found 6 more issues that had propagated from the bad template.

The double-edged sword:

  • Templates ensure consistency
  • Templates constrain creativity
  • Templates are helpful
  • Templates propagate errors systematically

When the template is wrong, every downstream post inherits the mistake.

The CLAUDE.md Evolution

My CLAUDE.md file grew from 120 lines (v1.0) to 2,847 lines (v3.0) over 6 months. Each version added lessons from failed automation attempts.

I rewrote section 4 (enforcement rules) 12 times before I got it right. Or maybe I still haven't got it right. The complexity catches more edge cases now.

The trade-off between comprehensive rules and maintainability is constant:

  • More rules catch more issues
  • More rules make the system harder to understand
  • More rules require more maintenance effort
  • Complexity vs coverage is a sliding scale

I'm still finding the right balance.

Validation Speed: Fast but Hungry

The initial validation script took 147 seconds to scan all posts. That's too slow for a pre-commit hook.

After optimization with parallel processing and caching, I got it down to 12 seconds. The cost: memory usage went from 1.8GB to 2.1GB, roughly a 17% increase. For my laptop, that's acceptable. For CI servers with limited RAM, it could be a problem.

The speed improvement is worth the memory cost for my use case. Teams with larger codebases should test this trade-off for themselves.
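
The shape of that optimization is straightforward: validate posts in parallel and skip files that haven't changed since the last run. Here's a minimal sketch, assuming one validation function per post; the cache file, paths, and placeholder check are illustrative, not the script's actual design.

import concurrent.futures
import json
from pathlib import Path

CACHE_FILE = Path(".validation-cache.json")  # assumed cache location

def validate_post(path: Path) -> list[str]:
    """Placeholder for the real per-post checks (frontmatter, links, tags)."""
    text = path.read_text(encoding="utf-8")
    return [] if text.startswith("---") else [f"{path}: missing frontmatter"]

def validate_all(posts: list[Path]) -> list[str]:
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    todo = [p for p in posts if cache.get(str(p)) != p.stat().st_mtime]  # skip unchanged posts
    violations: list[str] = []
    with concurrent.futures.ProcessPoolExecutor() as pool:  # parallel scan
        for post, result in zip(todo, pool.map(validate_post, todo)):
            violations.extend(result)
            cache[str(post)] = post.stat().st_mtime
    CACHE_FILE.write_text(json.dumps(cache))
    return violations

if __name__ == "__main__":
    print(validate_all(sorted(Path("src/posts").glob("*.md"))))  # assumed posts directory

In a sketch like this, the extra memory comes from the worker processes; capping max_workers would be the first knob to try on a RAM-constrained CI runner.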

Git Hook Bypass Discovery

I discovered I could bypass standards validation with git commit --no-verify. That defeated the entire purpose.

I immediately closed that loophole by making the checks exit with code 1 whenever they detect a skipped validation. The cost: 3 commits I thought were fine got rejected. Humbling moment. It turns out my judgment of "good enough" isn't always aligned with the standards I set for myself.

What this taught me:

  • Automation doesn't trust humans
  • Human judgment is flexible
  • Human judgment is inconsistent
  • Standards enforce what I say I want, not what I think I want in the moment

Both frustrating and valuable.

What Worked

  • Start small: I began with just Python standards, expanded gradually
  • Version everything: Standards evolve, Git tracks the journey
  • Real examples: Abstract standards work poorly, concrete code examples work better
  • Token counting: Every character matters for LLM efficiency (I measured 90% reduction after optimization)

What Didn't

  • Over-engineering: My first version had 200+ micro-standards. Way too much complexity.
  • Perfect structure: Spent weeks organizing folders. Claude doesn't care about folder beauty.
  • Forcing adoption: People need to see value before they'll use new tools

Setting It Up for Your Projects

Want to try this yourself? Here's my recommended approach:

Quick Start (5 minutes)

# 1. Add to existing project
curl -O https://raw.githubusercontent.com/williamzujkowski/standards/master/docs/core/CLAUDE.md

# 2. Tell Claude about it
"Use CLAUDE.md for standards. I'm building [your project type]"

# 3. Watch the magic happen

Full Integration (30 minutes)

View complete integration script →

Real-World Impact: The Numbers

Since implementing this system, rough edges and all, I've measured concrete improvements. (For related AI development practices, see mastering prompt engineering and local LLM deployment.) The numbers:

Time savings:

  • Setup time: 2 hours → 15 minutes for new projects
  • Validation time: 147 seconds → 12 seconds (at the cost of roughly 17% more memory)

Quality improvements:

  • Token usage: 5,000+ tokens → 750 tokens (85% reduction)
  • False positive rate: 88% → 4% after 3 days of tuning

The journey milestones:

  • 87 initial violations found across 23 files
  • 4.5 hours spent fixing them
  • 6 issues propagated from a bad template
  • CLAUDE.md grew from 120 lines to 2,847 lines over 6 months
  • Section 4 (enforcement rules) went through 12 rewrites
  • 3 commits got rejected by hooks that I thought were fine
  • Full portfolio scan now takes 34 minutes

The benefits are real. The system requires ongoing maintenance. Standards reduce errors and add workflow complexity. I've found the trade-off acceptable for my projects. Your mileage may vary.

Tips for Claude CLI Power Users

1. Create Project-Specific Standards

# In your project's CLAUDE.md
project_context:
  style: "Google Python style"
  testing: "pytest with 90% coverage"
  special_rules:
    - "All API endpoints need rate limiting"
    - "Use structured logging everywhere"

2. Chain Commands Efficiently

View automated workflow example →

3. Build Your Own Standards

Don't just use mine – fork and customize:

# Add your team's conventions
team_standards:
  pr_size: "Max 400 lines"
  commit_style: "Conventional commits"
  review_sla: "24 hours"

The Unexpected Benefits (And Costs)

Beyond the obvious productivity gains, this system has delivered unexpected benefits and costs:

Documented tribal knowledge:

  • Those "oh, we always do X" conversations are now codified
  • Cost: updating docs is another maintenance burden

Improved code reviews:

  • "Does this follow our standards?" became "Run the checker"
  • Cost: reduces human judgment in reviews

Easier onboarding:

  • Hand new devs the standards repo and they're ready to go
  • Cost: they follow rules without understanding why

Consistent AI assistance:

  • Claude gives the same advice every time
  • Cost: that consistency becomes a constraint when the standards themselves need updating

The trade-off between consistency and flexibility is ongoing. Pre-commit hooks catch issues but add friction. Comprehensive standards help but require maintenance. I've found the system valuable for my projects, and I'm still tuning the balance between safety and speed.

Where It's Heading

I'm working on several ideas. These might change based on what proves useful.

VS Code extension:

  • Real-time standard suggestions while coding
  • Challenge: figuring out the extension API

GitHub Actions integration:

  • Automated standards enforcement in PRs
  • Challenge: performance on CI

Team analytics:

  • Track which standards get used or violated most
  • Challenge: privacy concerns need addressing

LLM fine-tuning:

  • Train models specifically on your standards
  • Challenge: ROI justification

These are ideas, not promises. I've learned that what sounds good in theory doesn't always work in practice. The validation script seemed simple until I hit the false positive problem.

Try It Yourself

The repository is open source and MIT licensed. Fork it, customize it, make it yours at github.com/williamzujkowski/standards.

Start small. Even just adding a CLAUDE.md with your basic preferences will transform how you work with AI tools.

The Bottom Line

We're using AI tools wrong if we're explaining the same things repeatedly. These tools should learn our preferences once and apply them consistently.

This standards repository turns Claude CLI from a smart tool into YOUR smart tool. One that knows your style, your requirements, and your way of working.

The future isn't just AI-assisted development. It's AI that knows how you like to develop.


Have you built something similar? How do you maintain consistency with AI tools? Drop me a line – I'm always looking for new patterns to steal... I mean, learn from.
