Python Dependency Scanning: What pip-audit Misses

Python's dependency ecosystem makes security uniquely challenging — fragmented advisory databases, multiple manifest formats, and transitive dependency chains. Learn how to check your PyPI packages for vulnerabilities and build a practical security workflow.

GeekWala Team

The data science team ran pip install and went home. Six months later, a vulnerability buried three levels deep in their TensorFlow dependency chain made it into their production API. Nobody had scanned for it. Nobody knew it was there.

pip install requests pulls in 4 dependencies. pip install tensorflow pulls in 97. Any of them could be vulnerable—and Python's advisory ecosystem is fragmented enough that no single tool will catch everything. Unlike npm, which has a unified advisory database and npm audit baked into the CLI, Python's vulnerability data lives scattered across NVD, OSV, PyPI, and GitHub advisories that don't fully sync with each other.

This guide cuts through the fragmentation: here's why Python scanning is harder than Node, what each tool actually covers, and how to build a practical workflow that catches the vulnerabilities that matter before they reach production.

TL;DR: Python's security tooling is more fragmented than npm's — multiple advisory databases don't fully sync, there are 7 different dependency manifest formats, and transitive dependency trees can run 100+ packages deep. pip-audit is a solid local tool but misses exploitation signals and some database coverage. For production monitoring, you need EPSS (exploitation probability) and CISA KEV on top of advisory data, plus support for poetry.lock and Pipfile.lock. Always use lock files — without them, scanning is guesswork.

The Python Dependency Landscape (It's Messier Than You Think)

Python's security problems stem from architectural choices made years ago, before security became table stakes:

Multiple advisory databases that don't sync: npm has a single security advisory channel. Python has fragmented databases where no single source covers everything:

Python Vulnerability Advisory Coverage (approximate)

    ┌─────────────────────────────────────────────┐
    │                NVD (NIST)                    │
    │   ┌─────────────────────────────────┐       │
    │   │         OSV / GitHub            │       │
    │   │   ┌─────────────────────┐       │       │
    │   │   │   PyPI Advisories   │       │       │
    │   │   │                     │       │       │
    │   │   │   Only here: ~15%   │       │       │
    │   │   │   of Python vulns   │       │       │
    │   │   └─────────────────────┘       │       │
    │   │   Only here: ~10%               │       │
    │   └─────────────────────────────────┘       │
    │   Only here: ~20%                           │
    │   (C extension vulns, system lib vulns)     │
    └─────────────────────────────────────────────┘
    Overlap: ~55%   |   Gap (not in any): ~5-10%

A vulnerability might be in NVD but skipped by the PyPI project. Or disclosed on GitHub but not reported to NIST. pip-audit queries PyPI + OSV. safety queries its own database. You could scan the same project with both tools and get different results. GeekWala queries all four sources simultaneously through OSV aggregation.
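
To make the "same project, different results" problem concrete, here is a minimal sketch of merging findings from two scanners and deduplicating them by shared identifiers. It assumes each finding is a dict with an "id" and an optional "aliases" list, which is the shape OSV records use. The function name and the sample IDs are made up for illustration and are not part of any tool's API.

```python
def merge_findings(*sources):
    """Group advisory records from several scanners by overlapping IDs.

    Two records count as the same vulnerability when their ID sets
    (primary id plus aliases) share at least one identifier.
    """
    groups = []  # disjoint sets of identifiers, one set per vulnerability
    for records in sources:
        for rec in records:
            ids = {rec["id"], *rec.get("aliases", [])}
            untouched = []
            for group in groups:
                if group & ids:
                    ids |= group  # same vulnerability seen under another ID
                else:
                    untouched.append(group)
            groups = untouched + [ids]
    return sorted(sorted(group) for group in groups)


# Hypothetical output from two scanners looking at the same project.
tool_a = [{"id": "GHSA-demo-0001", "aliases": ["CVE-0000-0001"]}]
tool_b = [{"id": "CVE-0000-0001"}, {"id": "PYSEC-0000-1"}]
merged = merge_findings(tool_a, tool_b)
```

Here the two tools report three records between them, but only two distinct vulnerabilities, one of which only the second tool saw.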

Dependency manifest chaos: Python has too many ways to declare dependencies, and tools don't support all of them equally:

| File Type | What It Does | Tool Support |
|---|---|---|
| requirements.txt | Exact pinned versions (best) | Most tools, but often ignores comments |
| setup.py | Declarative with metadata | Requires parsing Python code (fragile) |
| setup.cfg | INI-style declarative | Good support, rarely used |
| pyproject.toml | Modern standard | Growing support, edge cases remain |
| poetry.lock | Lock file from Poetry | Limited tool support |
| Pipfile.lock | Lock file from Pipenv | Barely supported outside Pipenv |

A project using poetry.lock? Some scanners just ignore it. Using pyproject.toml with complex version specs? You might get false positives or false negatives on transitive dependencies.
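
If your scanner skips these lock files, extracting the pinned versions yourself is not hard. The sketch below is one way to do it, assuming the usual layouts: Pipfile.lock is JSON with "default" and "develop" sections, and poetry.lock is TOML with a list of [[package]] tables. The function names are illustrative, and the poetry.lock reader relies on tomllib, which is only in the standard library from Python 3.11.

```python
import json

try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    tomllib = None


def pins_from_pipfile_lock(text):
    """Return {name: version} for every ==-pinned entry in a Pipfile.lock."""
    data = json.loads(text)
    pins = {}
    for section in ("default", "develop"):
        for name, meta in data.get(section, {}).items():
            version = meta.get("version", "")  # VCS/editable entries have none
            if version.startswith("=="):
                pins[name.lower()] = version[2:]
    return pins


def pins_from_poetry_lock(text):
    """Return {name: version} for every [[package]] table in a poetry.lock."""
    if tomllib is None:
        raise RuntimeError("reading poetry.lock this way needs Python 3.11+")
    data = tomllib.loads(text)
    return {pkg["name"].lower(): pkg["version"] for pkg in data.get("package", [])}
```

The resulting name-to-version mapping can be written out as name==version lines for any tool that only understands pip's requirements format.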

Transitive dependencies explode: Data science projects are notorious for this. Install scipy, get 30 indirect packages. Install tensorflow, get 100+. Each one is a potential vulnerability vector. Tracking exact versions across this tree requires lock files and accurate resolution—and many Python teams skip lock files entirely.
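
To see how large the tree is in your own environment, you can walk installed package metadata. This is a rough sketch using importlib.metadata from the standard library. It deliberately over-approximates: requirements guarded by extras are skipped, but platform and Python-version markers are not evaluated. The function names are illustrative.

```python
import re
from importlib import metadata

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name):
    # PEP 503 style normalization so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def _installed_requires(name):
    """Requirement strings declared by an installed distribution."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_deps(root, requires=_installed_requires):
    """Return the set of packages reachable from root, excluding root."""
    root = _normalize(root)
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for req in requires(current):
            spec, _, marker = req.partition(";")
            if "extra" in marker:
                continue  # optional dependency, only installed on request
            match = _NAME.match(spec)
            if not match:
                continue
            dep = _normalize(match.group(1))
            if dep != root and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

Calling len(transitive_deps("tensorflow")) inside a virtualenv where it is installed gives the count for that environment; the number varies by version and platform.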

Slow disclosure cycle: JavaScript has a rapid-response culture. A Node.js vulnerability drops on Monday, exploit PoC on Wednesday, everyone patches by Friday. Python projects? Some take months to report fixes. Obscure packages might never get fixes.

What Dependency Files GeekWala Actually Scans

We support the main formats, but with caveats:

✓ requirements.txt        Exact versions only (requests==2.31.0)
✓ setup.py               With install_requires
✓ setup.cfg              INI-style metadata
✓ pyproject.toml         PEP 517/518 standard
✓ poetry.lock            Full dependency tree
✓ Pipfile.lock           Full dependency tree
✓ requirements.lock      pip-compile output
✗ requirements-dev.txt   (unpinned) → Unscannable
✗ abstract specs         (requests>=2.0) → Can't resolve exact version

Real talk: If you're not using a lock file, you're not getting accurate vulnerability scanning. Period. Tools can't map requests>=2.0 to a specific version unless they resolve it themselves, and different environments might resolve to different versions. Use poetry.lock, Pipfile.lock, or pip-compile output. Commit it. Move on.
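
A quick way to find out whether a requirements file is scannable at all is to list the lines that are not exact pins. The sketch below uses only a regular expression rather than a full requirements parser, so treat it as a first-pass check. The function name is illustrative, and include lines such as -r other.txt are skipped rather than followed.

```python
import re

# name, optional [extras], a single == specifier with no wildcard, optional marker
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*;,]+\s*(;.*)?$"
)


def unpinned_requirements(text):
    """Return requirement lines that do not pin one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # blank lines, options, and --hash continuation lines
        line = line.split(" --hash", 1)[0].rstrip(" \\")
        if not _PINNED.match(line):
            loose.append(line)
    return loose


sample = """\
requests==2.31.0
urllib3>=1.26  # too loose
-r base.txt
flask
django==4.2.*
"""
print(unpinned_requirements(sample))
```

Anything this prints is a line whose installed version depends on when and where the install ran, which is exactly what a scanner cannot reason about.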

pip-audit vs GeekWala: When Each Tool Makes Sense

pip-audit is the official PyPA tool. It's lightweight, runs locally, and is genuinely trustworthy. But "official" doesn't mean "sufficient."

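pip-audit's strengths are real: it audits the active virtualenv when run with no arguments, supports hash-checked requirements files (pip-audit -r requirements.txt --require-hashes), runs on your own machine with only advisory lookups going over the network, and is maintained by the Python Packaging Authority. But it has Python-specific limitations that matter for production security: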

| Capability | pip-audit | GeekWala |
|---|---|---|
| Advisory sources | PyPI + OSV | OSV + NVD + CISA KEV (wider coverage for C extension vulns) |
| poetry.lock / Pipfile.lock | ✗ (requires pip format) | ✓ native parsing |
| Virtualenv integration | pip-audit -r / --strict | ✗ (file upload) |
| Transitive path tracing | Lists affected package | Shows full chain: app → requests → urllib3 → CVE |
| Exploitation signals | CVSS only | EPSS + CISA KEV |
| C extension vuln coverage | Limited (Python advisories only) | Cross-database (catches numpy/pillow C-layer vulns in NVD) |
| Historical trend tracking | ✗ | ✓ (track EPSS changes over time) |

Use pip-audit for local development: it catches obvious issues with zero setup, integrates into virtualenvs, and runs on your own machine (it only needs network access to fetch advisory data). Run it in pre-commit hooks.
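
For a hook or CI step, pip-audit's JSON output (pip-audit -r requirements.txt -f json) is easier to act on than the table view. The sketch below flattens that report into tuples. It assumes the report is either an object with a "dependencies" list or, in older releases, a bare list of dependency entries, so check it against the output of the version you have installed. The function name is illustrative.

```python
import json


def summarize_pip_audit(report_json):
    """Flatten a pip-audit JSON report into (package, version, vuln_id, fixes)."""
    data = json.loads(report_json)
    # Assumption: newer reports wrap entries in {"dependencies": [...]},
    # older ones emit the list directly.
    deps = data.get("dependencies", []) if isinstance(data, dict) else data
    findings = []
    for dep in deps:
        for vuln in dep.get("vulns", []):  # skipped packages carry no "vulns"
            findings.append(
                (dep["name"], dep.get("version"), vuln["id"], vuln.get("fix_versions", []))
            )
    return findings
```

A hook can then fail when the returned list is non-empty, or print only the entries that have no fix version yet.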

Use GeekWala for production monitoring — the EPSS and KEV enrichment, broader database coverage, and transitive path tracing matter when you're prioritizing 50+ findings across a team. Scan your Python dependencies →


See which PyPI vulnerabilities in your project are actually being exploited.

Upload your requirements.txt or poetry.lock → — get findings enriched with EPSS scores and CISA KEV status in under a minute. No account needed.


Real-World Python Vulnerability Examples

Vulnerabilities rarely stay contained in a single package. They cascade:

Example 1: urllib3 → requests → your app

In 2023, urllib3 had CVE-2023-45803 (CVSS 4.2: the request body was not stripped when a redirect changed the request method to GET). Your app doesn't directly depend on urllib3; it comes in transitively:

your-app/
├── requirements.txt
│   └── requests==2.31.0         ← You installed this
│       ├── urllib3==2.0.5       ← This came along for the ride
│       │   └── 🔴 CVE-2023-45803 (CVSS 4.2, request body leak on redirect)
│       ├── charset-normalizer
│       ├── idna
│       └── certifi
└── ...

pip-audit tells you "urllib3 has a vulnerability." GeekWala tells you the path: your-app → requests-2.31.0 → urllib3-2.0.5 → CVE-2023-45803. This matters because your fix is to bump requests (which pulls a patched urllib3), not to pin urllib3 directly — direct pinning can break requests' version constraints.
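
Path tracing like this is a shortest-path search over the dependency graph. The following is a minimal sketch of the idea, not GeekWala's implementation: it assumes you already have the graph as a dict mapping each package to its direct dependencies, and the function name is illustrative.

```python
from collections import deque


def find_path(graph, root, target):
    """Breadth-first search for the shortest dependency chain root -> target."""
    queue = deque([[root]])
    visited = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None  # target is not reachable from root


graph = {
    "your-app": ["requests"],
    "requests": ["urllib3", "charset-normalizer", "idna", "certifi"],
}
path = find_path(graph, "your-app", "urllib3")
print(" → ".join(path))
```

The second element of the path is the direct dependency you actually control, which is usually where the fix has to start.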

Example 2: cryptography (the cascading dependency)

cryptography is installed by 80%+ of PyPI packages, either directly or transitively. When cryptography-41.0.0 had a critical issue, it affected thousands of projects:

  • Django projects using encryption
  • FastAPI apps with auth
  • Data science projects using paramiko for SSH
  • ML pipelines using TensorFlow

A single vulnerability cascaded through dozens of unrelated packages. Your scanner must track this path accurately.

Example 3: YAML deserialization in PyYAML

PyYAML versions < 6.0 deserialize arbitrary Python objects by default. This is a code execution vulnerability if you load untrusted YAML. But if you control all YAML being loaded, it's a false positive.

CVSS says "Critical" (9.8). EPSS says "Low" (0.15). Why? Because most exploits require you to already control the input. Scanners that only look at CVSS flag this as critical. Scanners that include EPSS deprioritize it. GeekWala shows both, so you decide.

Setting Up Python Scanning: A Practical Workflow

Step 1: Audit your current state

# If using pip + requirements.txt
pip freeze > requirements.txt

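# If using Poetry
poetry lock  # Regenerates poetry.lock; skip if the committed lock file is current

# If using Pipenv
pipenv lock  # Regenerates Pipfile.lock; skip if the committed lock file is current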

# If using setuptools + pyproject.toml
# Ensure [project] section exists with dependencies

Step 2: Upload to GeekWala

Visit GeekWala's Python scanning page and upload your dependency file. GeekWala parses it and queries the NVD, OSV, PyPI, GitHub, and CISA KEV all at once.

Step 3: Interpret the results

GeekWala enriches each finding with three signals: CVSS (severity), EPSS (exploitation probability), and CISA KEV (confirmed active exploitation). Sort by KEV first, then EPSS — not by CVSS.
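
The "KEV first, then EPSS, not CVSS" ordering is easy to reproduce on any exported findings list. This sketch assumes each finding is a dict with "kev", "epss", and "cvss" keys; those field names and the sample IDs are hypothetical and not a documented export format.

```python
def prioritize(findings):
    """Sort findings: KEV entries first, then EPSS descending, CVSS as tiebreak."""
    return sorted(
        findings,
        key=lambda f: (
            not f.get("kev", False),  # False sorts first, so KEV entries lead
            -f.get("epss", 0.0),
            -f.get("cvss", 0.0),
        ),
    )


findings = [
    {"id": "DEMO-A", "cvss": 9.8, "epss": 0.15, "kev": False},
    {"id": "DEMO-B", "cvss": 7.5, "epss": 0.02, "kev": True},
    {"id": "DEMO-C", "cvss": 6.1, "epss": 0.90, "kev": False},
]
ordered = [f["id"] for f in prioritize(findings)]
```

Note that the CVSS 9.8 finding lands last here: it is neither known-exploited nor likely to be, which is the point of sorting this way.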

For Python specifically, pay extra attention to vulnerabilities in these high-impact packages:

Critical Python packages (vulnerability here = urgent regardless of EPSS):
  cryptography    → Used by 80%+ of PyPI transitively
  urllib3         → Under requests, the most-installed package
  numpy/scipy     → C extensions = memory safety risks
  pillow          → Image parsing = classic attack surface
  django/flask    → Web framework vulns = direct exposure

Step 4: Trace the dependency path

Click any vulnerability and GeekWala shows you: "Which package brought this in? Can I update just that package, or do I need to update a parent?"

Step 5: Set up continuous scanning

For production projects, set up scheduled scans:

  • Daily or weekly automatic scans
  • Webhooks notify Slack when new vulnerabilities appear
  • Custom thresholds (e.g., "fail CI if any vulnerability has EPSS > 0.8 and no patch exists")
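
A custom threshold like the one in the last bullet can be expressed as a small gate script. This is a sketch under assumptions: findings are dicts with an "epss" score and a "fix_versions" list, both hypothetical field names, and the function name is illustrative.

```python
import sys


def blocking_findings(findings, epss_threshold=0.8):
    """Return findings with EPSS above the threshold and no patched version."""
    return [
        f for f in findings
        if f.get("epss", 0.0) > epss_threshold and not f.get("fix_versions")
    ]


def gate(findings, epss_threshold=0.8):
    """Exit non-zero so the CI job fails when any blocking finding exists."""
    blockers = blocking_findings(findings, epss_threshold)
    for f in blockers:
        print(f"BLOCKING: {f['id']} (EPSS {f['epss']:.2f}, no fix available)")
    if blockers:
        sys.exit(1)
```

Keeping the filter separate from the exit call makes the policy testable without terminating the test runner.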

Why Python Dependency Security Is Different From Node

Python teams often ask: "Why can't Python scanning be as simple as npm?"

Three reasons:

  1. No centralized advisory database: npm's advisory system is built into the registry. Python's advisory data lives in multiple places that don't sync. This means no tool can be 100% comprehensive without querying multiple sources.

  2. Fragmented package ecosystem: Python allows pure wheels, C extensions, system bindings, and mixed environments. A vulnerability in NumPy's C layer might not appear in Python-specific databases. You need cross-ecosystem scanning.

  3. Multiple manifest formats with poor tooling support: npm has package.json and package-lock.json. Python has 7 different ways to declare dependencies, and tools support different subsets. Poetry users get different scanning coverage than Pipenv users. This is architectural fragmentation.

Bottom line: Python scanning requires more vigilance. Use lock files. Use multiple sources. Monitor EPSS scores, not just CVSS. And run continuous scans—a new exploit might emerge months after a CVE is published.

Python Dependency Security Best Practices

1. Always use lock files

Never commit abstract versions. Use poetry.lock or pip freeze > requirements.lock. This is non-negotiable. Your CI environment must build the same dependency tree every time, otherwise scanning is meaningless.

2. Automate lock file updates

Python's dependency trees are deep. Use:

  • pip-compile (pip-tools) for requirements.txt
  • Dependabot (GitHub) for automatic update PRs
  • Renovate (GitHub, GitLab, and others) for more configurable version bumping

Let these tools handle transitive dependency updates automatically. Review the PRs, test in CI, merge. Don't manually manage lock files for large projects.

3. Scan weekly, not just before release

EPSS scores change constantly. A vulnerability disclosed with EPSS 0.1 might spike to 0.9 when a PoC emerges. Weekly scans catch rising trends before they become emergencies.
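
Catching a rising trend only requires keeping last week's scores around. The sketch below compares two snapshots, each a dict mapping a vulnerability ID to its EPSS score. The function name, the 0.2 jump threshold, and the sample IDs are arbitrary choices for illustration.

```python
def rising_epss(previous, current, min_jump=0.2):
    """Return (id, before, after) for scores that rose by at least min_jump."""
    rising = []
    for vuln_id, score in current.items():
        before = previous.get(vuln_id)
        if before is not None and score - before >= min_jump:
            rising.append((vuln_id, before, score))
    # Largest increase first.
    return sorted(rising, key=lambda r: r[2] - r[1], reverse=True)


last_week = {"DEMO-X": 0.10, "DEMO-Y": 0.30}
this_week = {"DEMO-X": 0.90, "DEMO-Y": 0.35, "DEMO-Z": 0.50}
alerts = rising_epss(last_week, this_week)
```

Newly appearing IDs (DEMO-Z above) are left out on purpose, since they have no earlier score to compare against; a new-finding alert would handle those.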

4. Monitor data science libraries obsessively

NumPy, SciPy, Pandas, TensorFlow—these are high-value targets. A vulnerability in NumPy affects 100,000+ downstream projects. Don't ignore "moderate" CVSS vulnerabilities in these packages. Patch them faster than you would a low-traffic package.

5. Understand the difference between "patched" and "fixed for your code path"

A package might have a vulnerability you don't actually hit. Example: "Deserialization vulnerability in YAML parser; only affects untrusted YAML input." If you only load YAML you control, you might be tempted to defer patching. That is a gamble: a later code change or a supply chain attack could route untrusted input into that code path. Patch conservatively, which means patching anyway.

6. Keep cryptography up to date

cryptography is installed by ~80% of PyPI packages transitively. When it has a vulnerability, update it immediately. Don't wait for dependency chains to bubble up the update—pin it directly if needed.

7. Test updates in staging before production

A newer version of Django might fix a vulnerability but break an integration point. Test in staging. Never auto-deploy critical library updates to production without human verification.

Frequently Asked Questions

What if a transitive dependency has a vulnerability but the direct parent doesn't?

You're still responsible. In Python, transitive dependencies are where most vulnerabilities hide — requests alone brings in 4 packages. Update the parent package (preferred) or pin the transitive dependency directly in your lock file. Use pip-compile or poetry update to resolve the chain cleanly.

My project uses setup.py with install_requires — can I scan it?

Yes, but with caveats. setup.py declares abstract version ranges (requests>=2.0), which can't be mapped to specific vulnerable versions without resolution. GeekWala will attempt to parse it, but results are more accurate with a lock file. Run pip freeze > requirements.txt from your virtualenv for a snapshot scan.

A vulnerability was published, but my scanner didn't catch it — why?

Python's advisory fragmentation means vulns can appear in NVD days before OSV, or vice versa. GeekWala queries multiple databases to minimize this gap, but propagation takes 24-48 hours. For C extension vulnerabilities (numpy, pillow, cryptography), NVD often leads — these may not appear in PyPI advisories at all.
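
If you suspect a propagation gap, you can check a single package version against OSV directly. The sketch below uses OSV's public query endpoint (POST to https://api.osv.dev/v1/query with a package name, ecosystem, and version) and only the standard library. The helper names are illustrative, and error handling and pagination are left out.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version):
    """Request body asking OSV about one PyPI package at one version."""
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def extract_ids(response):
    """Advisory IDs from a decoded OSV response; no 'vulns' key means no hits."""
    return sorted(v["id"] for v in response.get("vulns", []))


def query_osv(name, version, timeout=10):
    """Send the query over the network and return the matching advisory IDs."""
    body = json.dumps(build_osv_query(name, version)).encode("utf-8")
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_ids(json.load(resp))
```

For example, query_osv("urllib3", "2.0.5") returns whatever OSV currently lists for that version; comparing that list with your scanner's output shows which side is lagging.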

Does GeekWala work with private PyPI repositories?

Not yet. GeekWala scans against public PyPI + OSV databases. Private packages are checked based on their declared version against known CVEs. If your private package wraps a public library, vulnerabilities in the public library are still detected. Enterprise private registry support is planned.

I have 50+ vulnerabilities — where do I start?

Filter for CISA KEV entries first, then sort by EPSS descending. For the full prioritization framework, see our EPSS deep dive. For Python specifically, also prioritize C extension libraries (cryptography, numpy, pillow) higher, since their vulnerabilities often involve memory safety issues that bypass Python's protections.

Can I use pip-audit --fix and then verify with GeekWala?

That's a solid workflow. pip-audit --fix attempts automatic resolution within your virtualenv. After fixing, regenerate your lock file and rescan with GeekWala to verify the fix didn't introduce new transitive vulnerabilities — which happens more often than you'd expect in deep Python dependency trees.

Does GeekWala support conda or mamba environments?

GeekWala doesn't parse environment.yml or conda lock files directly — conda uses its own package format and registry separate from PyPI. However, most conda environments also have pip-managed packages. Export those with pip freeze > requirements.txt (run from within your activated conda environment) and scan the requirements file. For pure conda packages (channels like conda-forge or defaults), use jake (Sonatype's open-source conda scanner) or the anaconda audit scan command if you're on the Anaconda commercial platform. Mamba works the same way since it's a drop-in conda replacement.

