Data Sources Manager

A centralized, version‑controlled catalog of high‑quality data feeds for LLM‑based projects, with an initial focus on vulnerability research.

Overview

The Data Sources Manager streamlines source discovery, scoring, and consumption, enabling consistent, automated integration of reliable information into downstream workflows.

Features

  • Structured Metadata: Consistent JSON schema for tracking diverse source types
  • Quality Scoring: Numeric scoring (0-100) based on freshness, authority, coverage, and availability
  • User Preference Weights: Customize source priorities based on your needs
  • Automated Updates: Daily fetching, scoring, and indexing via GitHub Actions
  • Fast Lookups: Minimal-token lookups via a lightweight SQLite index
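
As a rough illustration of how the quality score and user preference weights could fit together, here is a minimal sketch that combines the four components into a 0-100 score using a weighted average. The function name, the default weight values, and the assumption that each component is itself scored 0-100 are hypothetical; the project's actual weights are defined in config/scoring-config.json and may be combined differently.

```python
from typing import Mapping, Optional

# Hypothetical default weights for illustration only; the real values
# are defined in config/scoring-config.json.
DEFAULT_WEIGHTS = {
    "freshness": 40,
    "authority": 30,
    "coverage": 20,
    "availability": 10,
}


def quality_score(
    components: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Return a 0-100 score as the weighted average of the components.

    `components` maps each component name to a 0-100 value (assumed).
    `weights` lets a user override the default priorities.
    """
    active = dict(DEFAULT_WEIGHTS if weights is None else weights)
    total = sum(active.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = [name for name in active if name not in components]
    if missing:
        raise KeyError(f"missing components: {missing}")
    raw = sum(components[name] * w for name, w in active.items()) / total
    # Clamp so out-of-range inputs cannot push the score outside 0-100.
    return round(max(0.0, min(100.0, raw)), 1)


# Illustrative component values, not real measurements.
example = {"freshness": 80, "authority": 90, "coverage": 70, "availability": 100}
default_score = quality_score(example)

# A user who cares mostly about freshness supplies their own weights.
user_weights = {"freshness": 70, "authority": 10, "coverage": 10, "availability": 10}
custom_score = quality_score(example, user_weights)
```

Dividing by the sum of the weights means user-supplied weights do not need to add up to any particular total, which keeps preference overrides easy to write by hand.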

Project Structure

data-sources-manager/
├── data-sources/                # Data source metadata files
│   └── vulnerability/           # Grouped by category
│       ├── cve/                 # Subcategories
│       │   ├── nvd.json
│       │   ├── vendor-advisory.json
│       │   └── …
│       ├── exploit-db.json
│       └── …
├── schemas/                     # JSON schema definitions
│   ├── source.schema.json       # Source metadata schema
│   └── quality.schema.json      # Quality scoring schema
├── config/                      # Configuration files
│   ├── categories.json          # Category definitions
│   └── scoring-config.json      # Quality scoring weights
├── tools/                       # Python utilities
│   ├── fetch_sources.py         # Update source data
│   ├── score_sources.py         # Calculate quality scores
│   └── index_sources.py         # Build search index
└── .github/workflows/           # CI/CD automation
    ├── update-sources.yml       # Daily source updates
    └── lint-schemas.yml         # Schema validation
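
To give a sense of what the indexing step (tools/index_sources.py) could look like, the following sketch builds a small SQLite table and queries the top sources in a category. The table layout, the column names (`id`, `category`, `quality_score`), and both function names are assumptions made for illustration; the actual index schema is not described here. The scores in the sample rows are invented.

```python
import sqlite3
from typing import Iterable, List, Mapping, Tuple


def build_index(conn: sqlite3.Connection, sources: Iterable[Mapping]) -> None:
    """Create (or refresh) a minimal lookup table from source records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources ("
        "id TEXT PRIMARY KEY, "
        "category TEXT NOT NULL, "
        "quality_score REAL NOT NULL)"
    )
    # Composite index so category lookups sorted by score stay cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_category_score "
        "ON sources (category, quality_score)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sources (id, category, quality_score) "
        "VALUES (:id, :category, :quality_score)",
        sources,
    )
    conn.commit()


def top_sources(
    conn: sqlite3.Connection, category: str, limit: int = 5
) -> List[Tuple[str, float]]:
    """Return (id, quality_score) pairs for a category, best first."""
    cursor = conn.execute(
        "SELECT id, quality_score FROM sources "
        "WHERE category = ? ORDER BY quality_score DESC LIMIT ?",
        (category, limit),
    )
    return cursor.fetchall()


# Sample records with made-up scores; ids mirror file names in the tree above.
sample = [
    {"id": "nvd", "category": "vulnerability", "quality_score": 92.0},
    {"id": "exploit-db", "category": "vulnerability", "quality_score": 78.5},
    {"id": "example-other", "category": "other", "quality_score": 99.0},
]

conn = sqlite3.connect(":memory:")
build_index(conn, sample)
best = top_sources(conn, "vulnerability", limit=2)
```

Returning only the identifier and score keeps the result small, which is in the spirit of the minimal-token lookups mentioned under Features; a caller can then open the matching JSON file only for the sources it actually needs.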

License

This project is available under the MIT License.