Two cost categories: API charges (variable; applies on any platform) and platform costs (compute infrastructure).
--gemini-ocr and --extract-entries both call the Gemini API. Pricing as of early 2026 (verify current rates at ai.google.dev/pricing):
| Stage | Model (default) | Input | Output |
|---|---|---|---|
--gemini-ocr |
gemini-2.0-flash |
$0.10 / 1M tokens | $0.40 / 1M tokens |
--extract-entries |
gemini-3.1-flash-lite-preview |
$0.25 / 1M tokens | $1.50 / 1M tokens |
| fallback (dense pages) | gemini-2.5-flash |
$0.30 / 1M tokens | $2.50 / 1M tokens |
A Green Book page generates roughly 2,000 input tokens and 1,000 output tokens for OCR (gemini-2.0-flash, ~$0.0006/page), and another ~10,000 input / 2,000 output tokens for entry extraction (gemini-3.1-flash-lite-preview, ~$0.0055/page) — about $0.006 per page combined. Dense pages that exceed the output token limit automatically retry with gemini-2.5-flash, but this affects fewer than 5% of pages in practice.
--generate-prompts makes 2 Gemini calls with 4–8 sample images each — a one-time per-volume cost of roughly $0.01–$0.05 total.
Rough collection estimates:
| Collection | Pages | Prompt generation | OCR | NER | Total |
|---|---|---|---|---|---|
| One Green Book volume | ~100 pages | ~$0.02 (one-time) | ~$0.06 | ~$0.55 | ~$0.61 |
| Full Green Books corpus (14 volumes) | ~1,400 pages | ~$0.02/volume | ~$0.84 | ~$7.70 | ~$8.54 |
| Large city directory (500+ pages) | 500 pages | ~$0.02 (one-time) | ~$0.30 | ~$2.75 | ~$3.05 |
Free tier: The Gemini API free tier (no billing required) covers both models at no charge, subject to rate limits of 15 requests/minute and ~1,500 requests/day for gemini-2.0-flash. A single 100-page volume (~200 API calls total) fits comfortably within a single day’s free quota, though the 15 RPM cap means the API stages take ~15–20 minutes rather than a few minutes. For a full multi-volume corpus you will either need billing enabled or spread the run across several days.
Chandra OCR (--chandra-ocr) uses a local 5B model — no API key or API cost. Requires GPU.
The --geocode stage uses Nominatim (free, city-level accuracy) by default. Setting GOOGLE_MAPS_API_KEY enables address-level geocoding at roughly $0.005/request. Google Maps includes a $200/month free credit, which covers ~40,000 geocoding requests — more than the entire Green Books corpus.
The stages that use significant compute are Surya OCR (--surya-ocr, --surya-detect, --review-alignment) and Chandra OCR (--chandra-ocr). Gemini API stages are network-bound and run equally fast everywhere.
| Platform | Cost | Surya OCR (200 pages) | Notes |
|---|---|---|---|
| Mac (M-series, 16 GB+) | $0 (electricity) | ~5–8 min (MPS, --batch-size 4) |
Good for development and single-volume runs |
| Mac (8 GB) | $0 | ~10–15 min (MPS, --batch-size 1–2) |
Works; reduce batch size if OOM errors occur |
| Google Colab (free T4) | $0 | ~2–3 min (CUDA, --batch-size 8) |
Sessions expire; T4 not always available; --review-alignment requires a tunnel (e.g. ngrok) |
| Google Colab Pro | ~$10/month | ~1–2 min (T4/L4, --batch-size 8) |
Reliable GPU access, longer sessions |
| Google Colab Pro+ | ~$50/month | <1 min (A100, --batch-size 16) |
Background execution; best for large multi-volume runs |
The pipeline is designed so the compute-heavy steps (Surya OCR, interactive alignment review) can run on a GPU machine while everything else (downloading, Gemini OCR, entry extraction, geocoding, map generation) runs fine on a laptop.
A ready-to-run Colab notebook covering Surya OCR, alignment, and review is in colab/ocr-align-review.ipynb.