Shipping a 227 MB database and a 1.9 GB image to the public for $0/month
PantryAtlas ships two big files to anyone who wants them: a ~227 MB prebuilt recipe database (so you skip a multi-hour local ingest) and a ~1.9 GB SD card image (so you can flash a Pi and go). Both are public, both are versioned, and the whole arrangement currently costs me $0 a month — even though it’s built to survive being downloaded a lot.
This is the post about how, because “how do I serve big files to the public without a surprise bill?” is a question every side project eventually hits, and the answer has a genuinely surprising shape.
The cost model nobody believes at first
The artifacts live in a Cloudflare R2 bucket behind a custom domain,
dl.pantryatlas.org. The thing that makes R2 the right tool — and that reframes
the entire “what if people abuse my downloads?” worry — is its pricing:
R2 charges nothing for egress. Bandwidth out is free, at any volume.
Let that land. The classic fear with public downloads is the bandwidth bill: a file gets popular (or someone hammers it), terabytes go out, and S3 hands you an invoice with a comma you didn’t expect. On R2, that line item is zero dollars by design.
What R2 does meter (numbers current as of this writing, per Cloudflare):
| What | Price | Free tier / month |
|---|---|---|
| Egress / bandwidth | free | — |
| Class B ops (reads/GET) | $0.36 / million | 10 million |
| Class A ops (writes/PUT) | $4.50 / million | 1 million |
| Storage | $0.015 / GB-month | 10 GB-month |
My bucket holds 7 objects totalling 2.29 GB — under the 10 GB-month free storage tier. Writes happen only when I publish a release. Reads are free up to 10 million a month, and bandwidth is free, period. Add it up and the bill rounds to nothing.
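As a rough sketch of that arithmetic, the table can be turned into a small estimator. The prices and free-tier allowances are the ones listed above. The function name and the sample operation counts are illustrative assumptions, and any billing-unit rounding Cloudflare applies is ignored.

```python
# Prices and free-tier allowances copied from the table above.
STORAGE_PER_GB_MONTH = 0.015
CLASS_A_PER_MILLION = 4.50
CLASS_B_PER_MILLION = 0.36
FREE_STORAGE_GB = 10
FREE_CLASS_A = 1_000_000
FREE_CLASS_B = 10_000_000


def monthly_bill(storage_gb, class_a_ops, class_b_ops):
    """Estimate one month's R2 charge in dollars.

    Egress is not a parameter because it is not metered.
    """
    storage = max(0.0, storage_gb - FREE_STORAGE_GB) * STORAGE_PER_GB_MONTH
    writes = max(0, class_a_ops - FREE_CLASS_A) / 1_000_000 * CLASS_A_PER_MILLION
    reads = max(0, class_b_ops - FREE_CLASS_B) / 1_000_000 * CLASS_B_PER_MILLION
    return storage + writes + reads


# 2.29 GB is the real bucket size; the two op counts are made-up examples.
print(monthly_bill(storage_gb=2.29, class_a_ops=50, class_b_ops=200_000))
```

With every input under its free allowance, each term clamps to zero, which is why the bill stays at nothing until one of the three meters crosses its threshold.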
“Download abuse” is bounded by construction
So what’s the actual worst case if someone decides to be a jerk and pull the files on a loop?
On a traditional object store, the answer is “an unbounded bandwidth bill.” On R2, the only meter that moves is Class B operations, one per uncached GET, at $0.36 per million, with the first 10 million free every month. To spend even a single dollar beyond the free tier, an abuser would need to issue about 2.8 million billable requests on top of the 10 million free ones, roughly 13 million in a month. The bytes themselves, whether gigabytes or terabytes, cost nothing.
The exposure is real but small and bounded, which is the opposite of the open-ended bandwidth risk people instinctively brace for. That reframe is the whole point: on R2 you’re not defending a bank vault, you’re tending a tip jar.
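To make the bound concrete, here is a minimal sketch that inverts the Class B price: given a dollar amount, how many uncached GETs would it take in one month? The function name is hypothetical; the price and free allowance come from the pricing table.

```python
CLASS_B_PER_MILLION = 0.36   # dollars per million reads, from the pricing table
FREE_CLASS_B = 10_000_000    # free reads per month


def requests_to_spend(dollars):
    """Return (billable, total) uncached GETs needed in a month to incur `dollars`."""
    billable = round(dollars / CLASS_B_PER_MILLION * 1_000_000)
    return billable, FREE_CLASS_B + billable


billable, total = requests_to_spend(1.00)
print(f"{billable:,} billable reads, {total:,} requests in total")
```

One dollar works out to about 2.8 million billable reads, or just under 13 million requests once the free allowance is counted, and none of that depends on how many bytes each request returns.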
The one setting that actually matters
If you take one operational thing from this post: when you put a custom domain on
an R2 bucket, disable the bucket’s r2.dev public URL.
The r2.dev URL is the convenient development endpoint — but it bypasses
Cloudflare’s CDN, so every hit is an uncached origin read, and it’s a second public
door you have to reason about. Once dl.pantryatlas.org is live, the r2.dev
door should be shut. I verified mine is:
$ wrangler r2 bucket dev-url get pantryatlas-artifacts
Public access via the r2.dev URL is disabled.
With that closed, every download flows through the custom domain, which means it
flows through Cloudflare’s cache. And because I name artifacts immutably and by
version — recipes-v0.2.0.db, never recipes-latest.db — the cache can hold
them effectively forever. A popular release gets served from the edge, and most
requests never touch R2 at all (so they don’t even count as Class B reads).
One honest caveat: Cloudflare’s Free and Pro plans cap cached objects at 512 MB. The 227 MB database caches beautifully; the 1.9 GB image is over the ceiling, so it passes through to R2 on each download. That’s completely fine — egress is still free, and it’s one Class B op per download. But it’s why I don’t claim “everything serves from the edge.” The DB does; the image rides the free-egress origin path.
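A small sketch of that split, assuming the 512 MB cap is decimal megabytes (neither artifact is close to the boundary, so the unit does not change the outcome). The database size is the byte count from the manifest, the image size is the approximate 1.9 GB figure, and the download counts and cache hit ratio are made-up inputs.

```python
CACHE_CAP_BYTES = 512 * 1000**2  # assumption: cap interpreted as decimal MB

ARTIFACT_SIZES = {
    "recipe database": 237_912_064,   # byte count from the manifest
    "SD card image": 1_900_000_000,   # approximate, "~1.9 GB"
}


def serving_path(size_bytes):
    """Say whether an object of this size can be held in the edge cache."""
    return "edge cache" if size_bytes <= CACHE_CAP_BYTES else "R2 origin"


def class_b_reads(size_bytes, downloads, cache_hit_ratio):
    """Estimate Class B ops: every download if uncacheable, else only cache misses."""
    if size_bytes > CACHE_CAP_BYTES:
        return downloads
    return round(downloads * (1 - cache_hit_ratio))


for name, size in ARTIFACT_SIZES.items():
    # 1,000 downloads and a 99% hit ratio are illustrative values only.
    print(name, serving_path(size), class_b_reads(size, 1_000, 0.99))
```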
Belt and suspenders (none of it required)
Because the baseline is already cheap and bounded, the extra hardening is optional — “nice to have,” not “you’re exposed.” If you want it, three dashboard moves:
- A Cache Rule with a long edge TTL on dl.pantryatlas.org, to make edge-caching of the cacheable artifacts explicit and aggressive.
- A rate-limiting rule on the download path (Cloudflare’s free plan includes one) to throttle a single IP hammering the endpoint.
- R2 usage notifications. Cloudflare has no hard spend cap, so billing/usage alerts on Class B ops and storage are the real backstop. Set them and forget the whole thing.
The gotchas I actually hit
Getting the artifacts up there had its own surprises, mostly around size:
- wrangler won’t upload past ~300 MiB. The recipe DB squeaks under, but for the 1.9 GB image, wrangler r2 object put isn’t the tool. I publish that one with rclone against R2’s S3-compatible endpoint, which does proper multipart uploads. The release scripts and runbook live in ops/release/.
- Verify end to end, both directions. Every artifact ships with a published sha256 in a JSON manifest, and the Pi’s bootstrap downloads to a temp file, checks the hash, and only then atomically moves it into place. A truncated or corrupted 1.9 GB download fails closed instead of half-installing. You can see the live manifest at dl.pantryatlas.org/db/recipes-latest.json: 49,965 recipes, 237,912,064 bytes, hash and all.
- Two tracks, one source of truth. The hosted DB and the SD image are separate release artifacts but pin to the same manifest, so “what’s the current version?” has exactly one answer.
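The download, verify, then move pattern from the second item can be sketched in a few lines of standard-library Python. This is not the actual bootstrap code: the function names are hypothetical, and it assumes the expected hash has already been read out of the manifest (whose field names are not shown in this post).

```python
import hashlib
import os
import tempfile
import urllib.request


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-gigabyte image never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified(url, expected_sha256, dest):
    """Download to a temp file beside dest, check the hash, then move into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Same directory as dest, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            while True:
                chunk = resp.read(1024 * 1024)
                if not chunk:
                    break
                out.write(chunk)
        actual = sha256_of(tmp_path)
        if actual != expected_sha256.lower():
            raise ValueError(f"sha256 mismatch: expected {expected_sha256}, got {actual}")
        os.replace(tmp_path, dest)  # atomic replace on the same filesystem
    finally:
        # On any failure the partial file is removed and dest is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


# Usage (hypothetical URL path and hash placeholder):
# install_verified("https://dl.pantryatlas.org/...", "<sha256 from manifest>", "recipes.db")
```

Writing the temp file into the destination directory is the design choice that matters: a rename across filesystems is a copy, not an atomic swap, so the "fails closed" property depends on both paths living on the same volume.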
The takeaway
If you’ve got a large public artifact and you’ve been nervous about download bills,
R2’s free egress changes the calculation more than any clever caching trick. Pick
immutable versioned names, put a custom domain in front, shut the r2.dev
door, set a billing alert, and get on with building the thing.
PantryAtlas itself — the local-first recipe brain all of this is in service of — is at pantryatlas.org, open source at github.com/PantryAtlas/pantryatlas. The launch post is the what-and-why; the engineering post is how you rank 50,000 recipes on a Pi.