
Shipping a 227 MB database and a 1.9 GB image to the public for $0/month

6 min read By Craig Merry

PantryAtlas ships two big files to anyone who wants them: a ~227 MB prebuilt recipe database (so you skip a multi-hour local ingest) and a ~1.9 GB SD card image (so you can flash a Pi and go). Both are public, both are versioned, and the whole arrangement currently costs me $0 a month — even though it’s built to survive being downloaded a lot.

This is the post about how, because “how do I serve big files to the public without a surprise bill?” is a question every side project eventually hits, and the answer has a genuinely surprising shape.

The cost model nobody believes at first

The artifacts live in a Cloudflare R2 bucket behind a custom domain, dl.pantryatlas.org. The thing that makes R2 the right tool — and that reframes the entire “what if people abuse my downloads?” worry — is its pricing:

R2 charges nothing for egress. Bandwidth out is free, at any volume.

Let that land. The classic fear with public downloads is the bandwidth bill: a file gets popular (or someone hammers it), terabytes go out, and S3 hands you an invoice with a comma you didn’t expect. On R2, that line item is zero dollars by design.

What R2 does meter (numbers current as of this writing, per Cloudflare):

| What | Price | Free tier / month |
|---|---|---|
| Egress / bandwidth | free | |
| Class B ops (reads/GET) | $0.36 / million | 10 million |
| Class A ops (writes/PUT) | $4.50 / million | 1 million |
| Storage | $0.015 / GB-month | 10 GB-month |

My bucket holds 7 objects totalling 2.29 GB — under the 10 GB-month free storage tier. Writes happen only when I publish a release. Reads are free up to 10 million a month, and bandwidth is free, period. Add it up and the bill rounds to nothing.
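
To make "add it up" concrete, here is a minimal sketch of that arithmetic in Python. The rates and free tiers are the ones listed above. The function name and the operation counts in the example call are illustrative assumptions, and the sketch ignores any rounding a real invoice applies.

```python
# Rates and free tiers as quoted in this post; check Cloudflare for current values.
STORAGE_PER_GB_MONTH = 0.015
CLASS_A_PER_MILLION = 4.50
CLASS_B_PER_MILLION = 0.36
FREE_STORAGE_GB = 10
FREE_CLASS_A = 1_000_000
FREE_CLASS_B = 10_000_000


def monthly_r2_cost(storage_gb, class_a_ops, class_b_ops):
    """Estimate a monthly R2 bill in dollars.

    Bytes served never appear here because egress is free.
    """
    storage = max(0.0, storage_gb - FREE_STORAGE_GB) * STORAGE_PER_GB_MONTH
    writes = max(0, class_a_ops - FREE_CLASS_A) / 1_000_000 * CLASS_A_PER_MILLION
    reads = max(0, class_b_ops - FREE_CLASS_B) / 1_000_000 * CLASS_B_PER_MILLION
    return storage + writes + reads


# 2.29 GB stored is from this post; the write and read counts are made-up examples.
print(monthly_r2_cost(storage_gb=2.29, class_a_ops=50, class_b_ops=200_000))  # 0.0
```

Everything stays inside the free tiers, so each term is zero, whatever the download volume in bytes.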

“Download abuse” is bounded by construction

So what’s the actual worst case if someone decides to be a jerk and pull the files on a loop?

On a traditional object store, the answer is “an unbounded bandwidth bill.” On R2, the only meter that moves is Class B operations, one per uncached GET, at $0.36 per million, with the first 10 million free every month. To spend even a single dollar beyond the free tier, an abuser would need to issue about 2.8 million requests on top of the free 10 million, roughly 12.8 million uncached GETs in a month. The bytes themselves (gigabytes, terabytes) cost nothing.

The exposure is real but small and bounded, which is the opposite of the open-ended bandwidth risk people instinctively brace for. That reframe is the whole point: on R2 you’re not defending a bank vault, you’re tending a tip jar.
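
The bound can be turned around: given a dollar amount, how many uncached GETs does it take to spend it? A small sketch, assuming the Class B rate and free tier quoted earlier; the function name is hypothetical.

```python
CLASS_B_PER_MILLION = 0.36   # dollars per million Class B ops, as quoted in this post
FREE_CLASS_B = 10_000_000    # free Class B ops per month


def uncached_gets_to_spend(dollars):
    """Total uncached GETs in one month needed to run up `dollars` in Class B charges."""
    billable = dollars / CLASS_B_PER_MILLION * 1_000_000
    return FREE_CLASS_B + round(billable)


print(uncached_gets_to_spend(1))    # 12777778
print(uncached_gets_to_spend(100))  # 287777778
```

The cost grows linearly with request count and does not depend on file size, which is what makes the exposure bounded.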

The one setting that actually matters

If you take one operational thing from this post: when you put a custom domain on an R2 bucket, disable the bucket’s r2.dev public URL.

The r2.dev URL is the convenient development endpoint — but it bypasses Cloudflare’s CDN, so every hit is an uncached origin read, and it’s a second public door you have to reason about. Once dl.pantryatlas.org is live, the r2.dev door should be shut. I verified mine is:

$ wrangler r2 bucket dev-url get pantryatlas-artifacts
Public access via the r2.dev URL is disabled.

With that closed, every download flows through the custom domain, which means it flows through Cloudflare’s cache. And because I name artifacts immutably and by version (recipes-v0.2.0.db, never recipes-latest.db), the cache can hold them effectively forever. A popular release gets served from the edge, and most requests never touch R2 at all (so they don’t even count as Class B reads).
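
One way to keep the naming rule honest is a check in the release script that refuses anything without a pinned version. The sketch below is an assumption about how such a guard might look, not the actual PantryAtlas release code. The pattern (a name, then -vMAJOR.MINOR.PATCH, then an extension) and the image filename in the example are hypothetical.

```python
import re

# Hypothetical convention: <name>-v<major>.<minor>.<patch>.<ext>[.<ext>...]
VERSIONED_NAME = re.compile(
    r"^[a-z0-9][a-z0-9-]*-v\d+\.\d+\.\d+\.[a-z0-9]+(\.[a-z0-9]+)*$"
)


def is_versioned_name(filename):
    """Return True if the filename pins an explicit version, so it is safe to cache forever."""
    return VERSIONED_NAME.match(filename) is not None


print(is_versioned_name("recipes-v0.2.0.db"))          # True
print(is_versioned_name("recipes-latest.db"))          # False
print(is_versioned_name("pantryatlas-v0.2.0.img.xz"))  # True (made-up image name)
```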

One honest caveat: Cloudflare’s Free and Pro plans cap cached objects at 512 MB. The 227 MB database caches beautifully; the 1.9 GB image is over the ceiling, so it passes through to R2 on each download. That’s completely fine — egress is still free, and it’s one Class B op per download. But it’s why I don’t claim “everything serves from the edge.” The DB does; the image rides the free-egress origin path.
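
A rough way to see what that split means for the Class B meter is sketched below. It assumes the 512 MB cap is in decimal megabytes, treats cacheable objects as always served from the edge (optimistic), and uses made-up download counts. Only the database size comes from the published manifest; the image size is approximate.

```python
CACHE_LIMIT_BYTES = 512 * 1000**2  # assumption: "512 MB" read as decimal megabytes


def estimated_class_b_ops(artifacts):
    """Estimate monthly Class B ops for (size_bytes, downloads_per_month) pairs."""
    ops = 0
    for size_bytes, downloads in artifacts:
        if size_bytes > CACHE_LIMIT_BYTES:
            ops += downloads  # too big to cache: one origin read per download
        # Cacheable objects are treated as edge hits and add nothing (optimistic).
    return ops


db = (237_912_064, 50_000)      # size from the manifest; download count is made up
image = (1_900_000_000, 5_000)  # approximate size; download count is made up
print(estimated_class_b_ops([db, image]))  # 5000
```

Even with every image download hitting the origin, the count stays far below the 10 million free reads.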

Belt and suspenders (none of it required)

Because the baseline is already cheap and bounded, the extra hardening is optional — “nice to have,” not “you’re exposed.” If you want it, three dashboard moves:

  • A Cache Rule with a long edge TTL on dl.pantryatlas.org to make edge-caching of the cacheable artifacts explicit and aggressive.
  • A rate-limiting rule on the download path — Cloudflare’s free plan includes one — to throttle a single IP hammering the endpoint.
  • R2 usage notifications. Cloudflare has no hard spend cap, so billing/usage alerts on Class B ops and storage are the real backstop. Set them and forget the whole thing.

The gotchas I actually hit

Getting the artifacts up there had its own surprises, mostly around size:

  • wrangler won’t upload past ~300 MiB. The recipe DB squeaks under, but for the 1.9 GB image, wrangler r2 object put isn’t the tool. I publish that one with rclone against R2’s S3-compatible endpoint, which does proper multipart uploads. The release scripts and runbook live in ops/release/.
  • Verify end to end, both directions. Every artifact ships with a published sha256 in a JSON manifest, and the Pi’s bootstrap downloads to a temp file, checks the hash, and only then atomically moves it into place. A truncated or corrupted 1.9 GB download fails closed instead of half-installing. You can see the live manifest at dl.pantryatlas.org/db/recipes-latest.json — 49,965 recipes, 237,912,064 bytes, hash and all.
  • Two tracks, one source of truth. The hosted DB and the SD image are separate release artifacts but pin to the same manifest, so “what’s the current version?” has exactly one answer.
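
The download-verify-move sequence from the second bullet can be sketched in a few lines of Python. This is an illustration of the pattern, not the Pi bootstrap's actual code. The function name is hypothetical, and the caller is assumed to have already read the expected hash from the manifest (whose field names are not shown in this post).

```python
import hashlib
import os
import tempfile


def install_verified(chunks, expected_sha256, dest_path):
    """Write chunks to a temp file, verify SHA-256, then atomically move it into place.

    On a hash mismatch or any other error the temp file is removed and
    dest_path is left untouched, so a bad download fails closed.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    digest = hashlib.sha256()
    # Same directory as the destination, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if digest.hexdigest() != expected_sha256.lower():
            raise ValueError("sha256 mismatch; refusing to install")
        os.replace(tmp_path, dest_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In practice `chunks` would be an iterator over an HTTP response body read in blocks. Hashing while writing means the 1.9 GB image is never held in memory and never read twice.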

The takeaway

If you’ve got a large public artifact and you’ve been nervous about download bills, R2’s free egress changes the calculation more than any clever caching trick. Pick immutable versioned names, put a custom domain in front, shut the r2.dev door, set a billing alert, and get on with building the thing.

PantryAtlas itself — the local-first recipe brain all of this is in service of — is at pantryatlas.org, open source at github.com/PantryAtlas/pantryatlas. The launch post is the what-and-why; the engineering post is how you rank 50,000 recipes on a Pi.