Vulnerability Management at Scale with Open Source Tools

BLUF: Enterprise Security Without Enterprise Costs

Modern vulnerability management demands continuous asset discovery, multi-source vulnerability correlation, risk-based prioritization, and automated remediation tracking. These capabilities traditionally require Qualys, Rapid7, or Tenable deployments costing $100K+ annually. Open source alternatives, when properly integrated, match or exceed commercial capabilities at zero licensing cost.

The challenge isn't finding open source security tools, it's architecting them into a cohesive vulnerability management program. Over three years, I built and refined an open source stack managing 200+ assets and 10,000+ vulnerability findings in my homelab. This system achieves Mean Time to Detect (MTTD) under 24 hours and Mean Time to Remediate (MTTR) under 7 days for critical vulnerabilities, metrics that rival commercial platforms.

The architecture includes:

24 integrated open source tools spanning discovery, scanning, orchestration, and visualization
Automated daily scanning across VMs, containers, bare metal, and cloud assets
Risk-based prioritization using CVSS scoring and exploit prediction
Grafana dashboards tracking 200+ assets with real-time vulnerability status
PostgreSQL + Elasticsearch backend handling 10,000+ vulnerability records[1]

This isn't theoretical, it's the battle-tested vulnerability management system protecting my infrastructure. Here's how to build it, including the mistakes that taught me what actually matters.

The Homelab Challenge That Changed Everything

Digital security concept with code and lock symbols Photo by Franck on Unsplash

I built a comprehensive vulnerability management system for my homelab using only open source tools. The challenge: achieve enterprise-grade capabilities without commercial licensing costs.

Vulnerability Management Architecture

graph TB
    subgraph "Data Collection"
        NVD[NVD Database]
        CVE[CVE/MITRE]
        GitHub[GitHub Advisory]
        OSV[OSV Database][9]
    end
    
    subgraph "Processing Pipeline"
        Collect[Data Collector]
        Parse[CVE Parser]
        Enrich[Data Enricher]
        Score[Risk Scorer]
    end
    
    subgraph "Storage & Analysis"
        DB[(PostgreSQL)]
        Cache[(Redis Cache)]
        ML[ML Analysis]
    end
    
    subgraph "Output"
        API[REST API]
        Dashboard[Dashboard]
        Alerts[Alert System]
    end
    
    NVD --> Collect
    CVE --> Collect
    GitHub --> Collect
    OSV --> Collect
    
    Collect --> Parse
    Parse --> Enrich
    Enrich --> Score
    
    Score --> DB
    Score --> Cache
    DB --> ML
    
    ML --> API
    ML --> Dashboard
    ML --> Alerts
    
    style Collect fill:#4caf50
    style Score fill:#ff9800
    style Alerts fill:#f44336

I developed an approach that combines multiple open source tools into a cohesive vulnerability management system. This solution, refined over years of experimentation, demonstrates that open source can match commercial capabilities when properly integrated.

Here's how we built it – mistakes, victories, and all.

The Complete Vulnerability Management Stack

Here's the stack I've successfully deployed in my homelab:

Discovery: Nmap, Masscan, Rumble
Vulnerability Scanning: OpenVAS/GVM[8], Nuclei, Wazuh[11]
Container Scanning: Trivy[6], Grype[7], Clair
Web Application: OWASP ZAP, Nikto, SQLMap[12]
Orchestration: Apache Airflow[10], n8n
Data Management: PostgreSQL, Elasticsearch
Visualization: Grafana, Kibana
Ticketing: GLPI, Request Tracker

Building the Foundation

Asset Discovery and Inventory

You can't protect what you don't know exists. Here's how to build comprehensive visibility:

Discovery Methods:

Active Scanning: Nmap for detailed host/service enumeration, Masscan for high-speed network sweeps (5M packets/sec capability)
Passive Monitoring: Network traffic analysis, DHCP logs, DNS queries for zero-impact discovery
Cloud API Integration: AWS Config, Azure Resource Graph, GCP Asset Inventory for authoritative cloud records
Agent-Based Inventory: Wazuh agents reporting from endpoints, especially critical for laptops and mobile devices

Asset Classification:

Infrastructure: Physical servers, virtual machines, network devices (switches, routers, firewalls)
Cloud Resources: EC2/VM instances, S3 buckets, Lambda functions, managed databases
Containers & Orchestration: Docker containers, Kubernetes pods, ephemeral workloads (average lifespan: 2-4 hours)
IoT/OT Devices: SCADA systems, building automation, IP cameras (often the forgotten attack surface)

Challenge Areas:

Shadow IT Discovery: Detecting unauthorized SaaS apps, rogue cloud accounts, developer sandbox environments
Cloud Sprawl: Multi-account AWS setups, cross-region resources, orphaned instances still incurring costs
Ephemeral Infrastructure: Container lifespans measured in minutes, auto-scaling groups creating/destroying instances
Multi-Cloud Visibility: Unified inventory across AWS, Azure, GCP without vendor lock-in

Validation and Ownership:

CMDB Reconciliation: Daily sync between discovered assets and authoritative records, flag discrepancies (expect 5-10% drift)
Ownership Assignment: Automated tagging based on VPC/subnet, AD group membership, or cost center tags
Decommission Detection: Identify assets offline >30 days, trigger cleanup workflows to prevent ghost asset accumulation

Vulnerability Scanning Orchestration

Integrate multiple scanners for comprehensive coverage, because no single tool catches everything.

Scanner Selection by Asset Type:

Network Infrastructure: OpenVAS/GVM[8] for comprehensive CVE coverage (90K+ vulnerability tests), authenticated scanning for patch levels
Containers: Trivy[6] for image scanning (supports 50+ OS packages), Grype[7] for SBOM analysis, Clair for registry integration
Web Applications: OWASP ZAP for active scanning, Nikto for web server misconfigurations, Nuclei for custom templates
Infrastructure as Code: Checkov for Terraform/CloudFormation, Terrascan for policy violations before deployment

Scanning Strategies:

Authentication: Credentialed scans find 3-4x more vulnerabilities than unauthenticated (but require secure credential vaulting)
Frequency Tiers: Critical assets daily, production weekly, dev/test monthly (balance coverage vs. network load)
Bandwidth Management: Rate limiting to <10% network capacity, schedule intensive scans during maintenance windows
Scan Windows: Production scans 2AM-6AM local time, coordinate with change freeze periods

False Positive Management:

Baseline Noise: Expect 20-40% false positive rate on first scan, drops to 5-10% after tuning
Triage Workflows: Automated pre-filtering based on CVSS score, asset criticality, exploit availability
Suppression Rules: Document exceptions (e.g., WAF-protected web apps, compensating controls), expire annually
Exception Tracking: Risk acceptance requires business owner approval, annual recertification

Orchestration and Aggregation:

Apache Airflow DAGs: Daily scan workflows, dependency chains (discovery → scan → normalize → dedupe)
Scan Scheduling: Staggered start times to prevent thundering herd on scanners, retry logic for transient failures
Result Aggregation: Normalize scanner outputs (SARIF, CycloneDX, OWASP Dependency-Check formats) into unified schema
Deduplication Logic: Merge findings across scanners (e.g., OpenVAS + Trivy both flag OpenSSL CVE), single source of truth

Remediation Tracking and Automation

Vulnerability management isn't just about finding issues – it's about fixing them. A structured remediation program requires prioritization frameworks, defined SLAs, automation opportunities, and comprehensive tracking mechanisms.

Prioritization Framework:
- CVSS base score[3] (severity foundation)
- EPSS exploit probability[4] (likelihood of active exploitation)
- CISA KEV catalog membership[2] (known exploited vulnerabilities)
- Asset criticality weighting (business impact multiplier)
SLA Recommendations:
- Critical vulnerabilities: <7 days to remediation
- High vulnerabilities: <30 days to remediation
- Medium vulnerabilities: <90 days to remediation
- Low vulnerabilities: next scheduled maintenance window
Automation Opportunities:
- Patch management via Ansible playbooks
- Configuration remediation (secure defaults, hardening)
- Temporary mitigations (WAF rules, network segmentation)
- Rollback procedures for failed patches
Tracking Mechanisms:
- Automatic ticket creation in GLPI or Request Tracker
- Assignment workflows based on asset ownership
- Aging alerts for approaching SLA breaches
- Velocity metrics showing remediation throughput
- Exception approval workflows for accepted risk

Dashboards and Reporting

Visualization is crucial for managing vulnerabilities at scale. Different stakeholders require different views: executives need strategic KPIs, security teams need operational metrics, and compliance teams need control coverage evidence.

Executive KPIs:
- Total risk score trending over time
- Mean Time to Detect (MTTD) / Mean Time to Remediate (MTTR) by severity
- Coverage percentage across asset inventory
- SLA compliance rate by team/business unit
Security Team Operational Dashboards:
- Vulnerability inventory breakdown by severity (Critical/High/Medium/Low)
- Scan completion status (last scan time, failed scans, coverage gaps)
- False positive queue requiring triage
- Remediation backlog by team with aging distribution
Compliance Views:
- NIST 800-53[5] control coverage (vulnerability scanning requirements)
- PCI DSS requirement mapping (quarterly scanning, critical patching)
- Vulnerability age distribution (evidence of timely remediation)
- Patch compliance percentages by system type
Alerting Configuration:
- Critical vulnerability detection (CVSS ≥9.0, CISA KEV additions)
- SLA breach warnings (7-day, 30-day, 90-day thresholds)
- Scan failures requiring investigation
- New asset discovery requiring baseline scanning

Integration and Automation

The key to scaling vulnerability management is automation:

# Apache Airflow DAG for automated vulnerability management
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
    # ... (additional implementation details)
# Define workflow
asset_discovery >> vulnerability_scan >> create_remediations >> auto_remediate >> generate_reports

Lessons Learned (The Hard Way)

After years of building and running vulnerability management programs, here's what actually matters:

1. Start with Asset Management (Seriously)

You can't secure what you don't know exists. I learned this when I discovered an old Raspberry Pi during a routine homelab scan – it had been running unpatched for months because I'd forgotten about it. It was hosting "just a test service" that turned out to have direct internet access.

Invest heavily in discovery. Your future self will thank you.

Asset management essentials:

Continuous discovery cycles (daily automated scans for critical networks, weekly for development environments)
Automated inventory reconciliation against CMDB with daily deviation reports
Ownership attribution tied to cost centers or AD groups for accountability
Decommission detection workflows triggering after 30 days offline to prevent ghost asset accumulation

2. Automate Everything Possible (Your Sanity Depends On It)

Manual processes don't scale. When you're managing 10 servers, spreadsheets work. At 100, you're drowning. At 1,000? You're already underwater.

Automate scanning, ticketing, and remediation. The time you spend automating today is time you won't spend firefighting tomorrow.

Automation milestones by scale:

10 servers: Manual scanning with documented procedures, spreadsheet tracking (still manageable)
100 servers: Automated scanning schedules, API-driven ticket creation, standardized remediation playbooks
1,000+ servers: Full orchestration with Apache Airflow, auto-remediation for low-risk patches, ML-assisted prioritization
Cross-cutting automation: Scan result normalization, deduplication logic, SLA monitoring alerts, remediation verification scans

3. Context is King (Not All Vulnerabilities Are Equal)

A critical vulnerability on a development server != critical vulnerability on production domain controller.

I once had a friend panic about a critical SSH vulnerability they found in my homelab. Yes, it was vulnerable. It was my honeypot – that was the point.

Risk-based prioritization factors:

Asset criticality weighting (production databases > development sandboxes > isolated test environments)
Network exposure assessment (internet-facing > DMZ > internal segmented networks)
Compensating controls evaluation (WAF protection, network segmentation, EDR coverage)
Business impact analysis (revenue-generating systems prioritized over back-office tools)

4. Measure What Actually Matters

Forget vanity metrics. Track these instead:

Operational efficiency metrics:

Mean Time to Detect (MTTD): How fast do you find problems? (Target: <24 hours for critical assets)
Mean Time to Remediate (MTTR): How fast do you fix them? (Target: <7 days for critical, <30 days for high)
Vulnerability aging distribution: Are old vulns festering? (Flag anything >90 days for critical/high severity)
Coverage percentage: What percentage of your infrastructure are you actually scanning? (Target: >95% of production assets)
False positive rate: Trending toward <10% after initial tuning period
SLA compliance rate: Percentage of vulnerabilities remediated within defined timeframes by severity tier
Remediation velocity: Number of vulnerabilities closed per week, trending over time to identify capacity constraints

If you can't answer these questions, you're flying blind. These metrics align with the SANS Vulnerability Management Maturity Model[13], providing a roadmap from initial to optimized program maturity.

5. Integration is Essential (Islands of Excellence Are Still Islands)

Your vulnerability management tools must integrate with:

Critical integration points:

CMDB/Asset Management: Authoritative source for asset inventory, ownership assignment, criticality ratings (know what you're protecting)
Ticketing systems: Automated workflow creation in GLPI/Jira/ServiceNow, assignment based on asset ownership (track the work)
CI/CD pipelines: Container scanning in build stages, infrastructure-as-code policy checks, break builds on critical findings (shift left, catch issues early)
SIEM/SOAR platforms: Correlation with active threats, exploit attempt detection, automated response playbooks (correlate with actual threats)
Patch management systems: Remediation verification, deployment success confirmation, rollback coordination
Configuration management: Ansible Tower/AWX integration for automated remediation playbook execution
Cloud APIs: AWS Security Hub, Azure Security Center, GCP Security Command Center for multi-cloud visibility

A disconnected tool is a tool that will be abandoned.

Conclusion

Building an effective vulnerability management program with open source tools is superior to commercial solutions when properly integrated. The key is thoughtful integration, automation, and continuous improvement.

Start small with asset discovery and basic scanning, then gradually add automation, integration, and advanced features. The framework presented here has successfully managed vulnerabilities across dozens of systems in my homelab environment, providing insights that scale to enterprise deployments.

Remember: vulnerability management is a program, not a project. Build it to be sustainable, scalable, and automated from day one.

References

NIST National Vulnerability Database (NVD) - U.S. government repository maintaining 200,000+ CVE records with CVSS scores, Common Platform Enumeration (CPE) data, and remediation guidance. Serves as the authoritative source for vulnerability intelligence worldwide, providing the foundation for vulnerability correlation across scanning tools.
CISA Known Exploited Vulnerabilities (KEV) Catalog - Cybersecurity and Infrastructure Security Agency's catalog of 1,100+ vulnerabilities with confirmed active exploitation in the wild. Provides binding operational directives for federal agencies and best-practice prioritization guidance for all organizations, updated continuously as new exploits emerge.
FIRST.org CVSS v3.1 Specification - Industry standard severity scoring system maintained by the Forum of Incident Response and Security Teams (FIRST). Provides consistent vulnerability severity assessment across base score (inherent characteristics), temporal score (exploit availability), and environmental score (organizational impact).
FIRST.org Exploit Prediction Scoring System (EPSS) - Probability-based framework estimating the likelihood of vulnerability exploitation within 30 days. Uses machine learning models analyzing 1,000+ features including exploit code availability, social media mentions, and threat intelligence feeds to prioritize remediation beyond CVSS severity alone.
NIST Special Publication 800-53 Rev. 5 - Comprehensive security and privacy control catalog for federal information systems and organizations. Defines vulnerability scanning requirements (RA-5) including frequency, coverage scope, remediation tracking, and information sharing that form the compliance foundation for federal and many commercial vulnerability management programs.
Trivy - Container Vulnerability Scanner - Aqua Security's open source vulnerability scanner supporting container images, filesystems, Git repositories, and infrastructure as code. Detects OS packages (Alpine, RHEL, Debian, Ubuntu, etc.) and language-specific dependencies (npm, pip, Maven, etc.) with comprehensive CVE database coverage and SBOM generation capabilities.
Grype - Vulnerability Scanner for Container Images and Filesystems - Anchore's open source scanner providing fast vulnerability detection against multiple databases (NVD, GitHub Security Advisories, OS-specific feeds). Supports SBOM ingestion via Syft, making it ideal for CI/CD pipeline integration where build-time vulnerability blocking is required.
Greenbone Vulnerability Management (GVM) / OpenVAS - Comprehensive open source vulnerability scanner maintaining 90,000+ network vulnerability tests (NVTs). Provides authenticated and unauthenticated scanning, compliance policy checks, and extensive CVE coverage rivaling commercial platforms. The community edition offers full scanning capabilities without licensing restrictions.
OSV.dev - Open Source Vulnerabilities Database - Distributed vulnerability database aggregating data from GitHub Security Advisories, PyPI, RustSec, and other ecosystem-specific sources. Provides precise affected version ranges and standardized JSON format ideal for automated tooling integration, particularly strong for open source dependency vulnerabilities.
Apache Airflow Documentation - Workflow orchestration platform enabling programmatic scheduling and monitoring of complex data pipelines. In vulnerability management contexts, Airflow DAGs coordinate multi-stage scanning workflows (discovery → scan → normalize → dedupe → ticket creation) with dependency management, retry logic, and detailed execution logging.
Wazuh Open Source Security Platform - Unified XDR and SIEM platform providing intrusion detection, vulnerability detection, compliance monitoring, and threat intelligence integration. Deploys agents to endpoints for continuous monitoring and integrates with vulnerability scanners to correlate detected vulnerabilities with actual exploitation attempts in real-time.
OWASP Top 10 Web Application Security Risks - Open Web Application Security Project's definitive awareness document identifying the most critical security risks to web applications. Updated every 3-4 years based on data from 40+ organizations spanning 400,000+ applications, providing the vulnerability categories that web application scanners prioritize in their detection rules.
SANS Vulnerability Management Maturity Model - Framework for assessing and improving vulnerability management program maturity across five levels: initial (reactive, manual processes), developing (some automation, inconsistent coverage), defined (standardized processes, full asset coverage), managed (metrics-driven, SLA compliance), and optimized (continuous improvement, predictive analytics). Provides roadmap for organizations evolving from ad-hoc scanning to enterprise-scale vulnerability programs.

Questions about scaling vulnerability management? Want to share your open source security stack? Let's connect and improve our collective security posture!