GPXSheet security audit
Scope: the GPXSheet library, CLI, and especially the FastAPI web service
(src/gpxsheet/service/), reviewed against the Phase 2 audit checklist in
TODO.md ahead of public-internet exposure.
Date: 2026-06-08. Audited at version 0.1.0.
Legend: Fixed = remediated in code; Mitigated = bounded but residual risk remains; Deployment = must be handled by the operator (documented, not enforceable in code).
Summary of changes made
| Area | Finding | Severity | Status |
|---|---|---|---|
| XML safety | XXE / entity-expansion if lxml backend is present | High | Fixed |
| AuthN | No authentication on any endpoint | High | Fixed (opt-in) |
| Upload validation | Only a byte cap; no content sniff or point cap | Medium | Fixed |
| SSRF / egress | Client-triggered outbound Overpass calls | Medium | Mitigated |
| Info leak | Worker exception text returned to clients | Medium | Fixed |
| Secrets | Default minioadmin creds could reach prod | High | Fixed |
| Headers / CORS | No security headers; CORS undefined | Medium | Fixed |
| DoS | Synchronous endpoints; unbounded route size | Medium | Mitigated |
| Dependencies | No CVE gate in CI | Low | Fixed |
| Least privilege | Worker already runs non-root | — | OK |
1. XML upload safety (XXE / billion-laughs) — Fixed
gpxpy selects an XML backend at import time: lxml if installed, otherwise the
stdlib xml.etree/expat parser. Tested empirically:
- Stdlib backend (current default; lxml is not a dependency): external entities raise undefined entity, and Python 3.13+ expat enforces an input-amplification limit, so both XXE and billion-laughs are already blocked.
- lxml backend: lxml's default parser sets resolve_entities=True, so a malicious <!DOCTYPE>/<!ENTITY> would be a live XXE / entity-expansion vector. lxml can arrive transitively, so relying on "we don't install it" is fragile.
Fix: gpx.load_route() now rejects any document containing a <!DOCTYPE or
<!ENTITY declaration before parsing (gpx._reject_unsafe_xml). Legitimate
GPX never carries a DTD, so this is backend-independent defense in depth and
protects the CLI as well as the service. Covered by
tests/test_service_security.py::test_load_route_rejects_doctype.
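The pre-parse check described above can be sketched roughly as follows. This is an illustration, not the project's actual gpx._reject_unsafe_xml: the function name, the case-insensitive match, and the error text are assumptions. The byte scan also assumes an ASCII-compatible encoding such as UTF-8; a UTF-16 payload would need decoding before the check.

```python
import re

# Markers that open a DTD or an entity declaration. Legitimate GPX has neither.
# Case-insensitive matching is stricter than XML requires (an assumption).
_UNSAFE_XML = re.compile(rb"<!(?:DOCTYPE|ENTITY)", re.IGNORECASE)


def reject_unsafe_xml(payload: bytes) -> None:
    """Raise ValueError if the raw document declares a DTD or an entity.

    Runs before any XML parser sees the bytes, so the result does not
    depend on which backend gpxpy selected.
    """
    if _UNSAFE_XML.search(payload):
        raise ValueError("GPX must not contain DOCTYPE or ENTITY declarations")
```

Raising ValueError fits the error-handling policy in §5, where only ValueError messages are echoed back to the client.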
2. Authentication & per-key quotas — Fixed (opt-in)
Previously every endpoint was unauthenticated. Added optional API-key auth:
set GPXSHEET_API_KEYS (comma-separated). When set, all /v1/* endpoints
require a matching X-API-Key or Authorization: Bearer <key>; the rate limiter
then keys on the API key instead of the client IP, giving per-key quotas. When
unset, the service stays open (single-process dev convenience).
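A minimal sketch of the key lookup, assuming headers arrive as a plain mapping. The helper names (parse_api_keys, presented_key, authenticate) are hypothetical and the real service may wire this through a FastAPI dependency instead. The constant-time comparison is an added precaution, not something the audit states.

```python
import hmac
from typing import Iterable, Mapping, Optional


def parse_api_keys(raw: str) -> frozenset[str]:
    """Split a comma-separated GPXSHEET_API_KEYS value, dropping blanks."""
    return frozenset(k.strip() for k in raw.split(",") if k.strip())


def presented_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the key from X-API-Key or 'Authorization: Bearer <key>', if any."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-api-key"):
        return lowered["x-api-key"].strip()
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None


def authenticate(headers: Mapping[str, str], configured: Iterable[str]) -> Optional[str]:
    """Return the matching configured key, or None if nothing matches.

    The returned key doubles as the rate-limiter identity (per-key quota).
    """
    key = presented_key(headers)
    if key is None:
        return None
    for candidate in configured:
        # compare_digest avoids leaking key prefixes through timing
        if hmac.compare_digest(key.encode(), candidate.encode()):
            return candidate
    return None
```

When no keys are configured, the caller would skip authenticate entirely and fall back to the client IP as the rate-limit identity.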
Result downloads (/v1/jobs/{id}/result) require auth when keys are configured,
so rendered PDFs aren't world-readable by job ID. Job IDs remain unguessable
(uuid4), acting as a capability when auth is disabled.
Object-level authorization (BOLA): when keys are configured, a job records the
identity that created it and GET /v1/jobs/{id} and .../result return 404
(not 403, so IDs aren't confirmable) to any other key. The result cache is also
keyed by identity, so identical inputs from different keys never share a job.
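The ownership rule can be illustrated with a small in-memory model. The Job dataclass, the JobNotFound exception, and the dict-backed store are hypothetical stand-ins for whatever the service actually uses; only the behavior (same 404 for "missing" and "not yours", identity in the cache key) comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    owner: Optional[str]  # identity that created the job; None when auth is off


class JobNotFound(Exception):
    """The API layer would translate this into HTTP 404."""


def load_job_for(jobs: dict[str, Job], job_id: str, caller: Optional[str]) -> Job:
    """Return the job only if the caller created it."""
    job = jobs.get(job_id)
    # A missing job and someone else's job raise the same error,
    # so a caller cannot confirm that a foreign job ID exists.
    if job is None or job.owner != caller:
        raise JobNotFound(job_id)
    return job


def cache_key(identity: Optional[str], input_digest: str) -> tuple:
    """Result-cache key: identical inputs from different keys never collide."""
    return (identity, input_digest)
```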
Deployment: for a public deployment, set GPXSHEET_API_KEYS. Distributed (Redis-backed) quotas are still future work: the limiter is per-process, so behind multiple API replicas the effective limit is replicas × limit.
3. Upload validation beyond size — Fixed
read_gpx() previously enforced only GPXSHEET_MAX_UPLOAD_BYTES (25 MiB). Added:
- Content sniff: the payload must contain a <gpx root marker, else 400.
- Point cap: counts <trkpt>/<rtept> tags and rejects routes above GPXSHEET_MAX_POINTS (default 500 000) with 413, before parsing/rendering, so a monster route can't tie up a worker for minutes.
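The three checks could be combined along these lines. This is a sketch under assumptions: UploadRejected and validate_gpx_upload are hypothetical names, the 413 status for the byte cap is assumed (the audit only states the codes for the sniff and the point cap), and the regex counts only unprefixed opening tags, so namespace-prefixed elements would need extra handling.

```python
import re

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MiB, per GPXSHEET_MAX_UPLOAD_BYTES
MAX_POINTS = 500_000                 # default for GPXSHEET_MAX_POINTS

# Opening <trkpt ...> / <rtept ...> tags, including self-closing ones.
# Closing tags (</trkpt>) do not match because '<' must be followed by the name.
_POINT_TAG = re.compile(rb"<(?:trkpt|rtept)[\s>/]")


class UploadRejected(Exception):
    """Carries the HTTP status the API layer should return."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


def validate_gpx_upload(
    payload: bytes,
    max_bytes: int = MAX_UPLOAD_BYTES,
    max_points: int = MAX_POINTS,
) -> int:
    """Run the cheap pre-parse checks and return the counted point total."""
    if len(payload) > max_bytes:
        raise UploadRejected(413, "upload too large")  # status is an assumption
    if b"<gpx" not in payload:
        raise UploadRejected(400, "payload is not a GPX document")
    points = sum(1 for _ in _POINT_TAG.finditer(payload))
    if points > max_points:
        raise UploadRejected(413, "route has too many points")
    return points
```

All three checks are linear scans over the raw bytes, which is what keeps them cheap relative to parsing and rendering.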
4. SSRF & egress from OSM/Overpass — Mitigated
OSM enrichment (enrich.py via osmnx) issues outbound HTTP to a fixed Overpass
endpoint; the client cannot inject the URL, so there is no classic SSRF
surface. The residual risk is amplification: enrichment runs on every request
and a large/dense route triggers slow outbound work.
Mitigations: rate limiting + the async job queue + a 20-minute Dramatiq
time_limit + the upload size/point caps bound it. Operators that must prevent
outbound OSM should restrict egress at the network layer.
Deployment: restrict the container's egress (network policy / firewall) to the Overpass and OSM tile/Nominatim hosts you actually use.
5. Information leakage in job errors — Fixed
process_job previously stored f"{type(exc).__name__}: {exc}" on the job,
which is returned to clients in JobStatus.error — that could leak temp paths,
Overpass URLs, or library internals. Now only ValueError (intentional
input-validation messages such as the DTD rejection) is echoed; any other
exception is logged server-side and reported to the client as a generic
"internal render error".
6. MinIO credentials & bucket policy — Fixed + Deployment
The compose file and settings.minio_config() default to minioadmin /
minioadmin. The production path (create_app when GPXSHEET_REDIS_URL is set)
now refuses to boot if either MinIO credential is still the default
(_guard_prod_secrets). Downloads use short-lived presigned GET URLs
(url_expiry_seconds=3600); the bucket is created without a public policy, so
objects are reachable only via those signed URLs.
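A boot guard of this kind can be sketched as a pure function over the environment. This is not the project's _guard_prod_secrets; the function below is a hypothetical equivalent, and the variable name GPXSHEET_MINIO_SECRET_KEY is expanded from the shorthand used in this section.

```python
from typing import Mapping

DEFAULT_MINIO_CREDENTIAL = "minioadmin"


def guard_prod_secrets(env: Mapping[str, str]) -> None:
    """Refuse to start on the prod path while MinIO defaults are in place."""
    if not env.get("GPXSHEET_REDIS_URL"):
        return  # dev/eager path: default credentials are tolerated
    access = env.get("GPXSHEET_MINIO_ACCESS_KEY", DEFAULT_MINIO_CREDENTIAL)
    secret = env.get("GPXSHEET_MINIO_SECRET_KEY", DEFAULT_MINIO_CREDENTIAL)
    # Either credential being the default is enough to block startup
    if DEFAULT_MINIO_CREDENTIAL in (access, secret):
        raise RuntimeError(
            "refusing to start: MinIO credentials are still the default"
        )
```

In an application factory this would be called with `os.environ` before any storage client is constructed, so a misconfigured deployment fails at boot instead of at first upload.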
Deployment: supply real GPXSHEET_MINIO_ACCESS_KEY/_SECRET_KEY from a secret store; keep the bucket private; serve MinIO over TLS (GPXSHEET_MINIO_SECURE=1). Tighten presign expiry if download links are shared.
7. CORS & security headers — Fixed
Added a response-header middleware applying X-Content-Type-Options: nosniff,
X-Frame-Options: DENY, Referrer-Policy: no-referrer, and a strict
Content-Security-Policy: default-src 'none'; frame-ancestors 'none' (the API
serves only JSON/PDF/PNG). Strict-Transport-Security is sent when
GPXSHEET_ENABLE_HSTS=1 (enable only behind TLS). CORS is deny by default
(same-origin); set GPXSHEET_CORS_ORIGINS to an explicit allow-list for the
future browser front end — never * with credentials.
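The header set and the CORS allow-list parsing can be expressed as two small helpers that a response middleware would call. Both function names are hypothetical. The HSTS max-age value, the comma-separated format for GPXSHEET_CORS_ORIGINS, and refusing the wildcard outright (stricter than "never * with credentials") are assumptions made for this sketch.

```python
def security_headers(enable_hsts: bool = False) -> dict[str, str]:
    """Headers to add to every response; suitable for a JSON/PDF/PNG-only API."""
    headers = {
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "DENY",
        "Referrer-Policy": "no-referrer",
        "Content-Security-Policy": "default-src 'none'; frame-ancestors 'none'",
    }
    if enable_hsts:
        # Only meaningful behind TLS; the max-age value is an assumption
        headers["Strict-Transport-Security"] = "max-age=31536000"
    return headers


def parse_cors_origins(raw: str) -> list[str]:
    """Turn a comma-separated origin list into an explicit allow-list.

    An empty value yields an empty list, i.e. deny by default.
    """
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard CORS origin is not allowed; list origins explicitly")
    return origins
```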
8. DoS / resource limits — Mitigated
- Render jobs run off the request path (Dramatiq) with a 20-minute time_limit and max_retries=0.
- The point cap (§3) rejects oversized routes early.
- Every operation (render, analyze, validate) is now a queued job, so no request worker is held for the slow work on the prod path.
- Residual: the dev/eager path still runs jobs inline — a slow route can
occupy a request worker there. The byte + point caps bound this; for heavy
public use, set per-container memory limits (e.g. compose
mem_limit) plus a bounded queue.
Deployment: cap worker memory at the container level; run multiple workers behind a bounded queue; put a reverse-proxy request-body limit in front as a second line of defense for the byte cap.
9. Dependency CVEs — Fixed
pip-audit reports no known vulnerabilities at audit time. Added an audit job
to CI (.github/workflows/ci.yml) that installs the full extras and runs
pip-audit, so a newly disclosed CVE in a runtime dependency fails the build.
10. Least privilege — OK
The Docker image already creates and runs as a non-root app user. The worker
needs only Redis + MinIO; no extra capabilities are granted.
Operator checklist (public deployment)
- [ ] Set GPXSHEET_API_KEYS; distribute keys out-of-band.
- [ ] Set real MinIO credentials (boot guard enforces this on the prod path).
- [ ] Serve behind TLS; set GPXSHEET_ENABLE_HSTS=1 and GPXSHEET_MINIO_SECURE=1.
- [ ] Set GPXSHEET_CORS_ORIGINS only if a browser front end needs it.
- [ ] Restrict container egress to the Overpass/OSM hosts you use.
- [ ] Set a reverse-proxy body-size limit and per-container memory limits.
- [ ] Tune GPXSHEET_RATE_LIMIT_PER_MIN, GPXSHEET_MAX_UPLOAD_BYTES, GPXSHEET_MAX_POINTS for your traffic.