Epoch AI released a public registry of Docker images for SWE-bench and applied targeted layering optimizations to cut storage and runtime costs. The images are built in three nominal stages (base, env, instance). By moving large shared artifacts out of per-instance layers and into layers that many images reuse, the team reduced the unique-layer footprint of the 2290 x86_64 images from 684 GiB to 67 GiB, and that of the 500-image SWE-bench Verified set from 189 GiB to 30 GiB. Two concrete changes account for much of the saving. First, git clone operations were moved into the env stage so that multiple instances share the .git history; for example, one Django instance layer fell from 330 MB to 40 MB, because .git accounted for 291 MB of it. Second, heavy apt preinstalls (for matplotlib) were relocated to the env stage, shrinking a 1.9 GB instance layer to about 110 MB. The team also disabled pip caching across all images with PIP_NO_CACHE_DIR=0, a setting chosen to remain compatible with the very old pip versions that conda installs for older Python targets.
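To illustrate why moving .git into a shared layer shrinks the registry, here is a minimal sketch of a unique-layer footprint calculation. It is not Epoch's sizing script. The image names, layer digests, and the base and env sizes are hypothetical. Only the 330 MB instance layer and the 291 MB .git figure come from the text above, and the 39 MB remainder is simply their difference.

```python
from typing import Dict, List, Tuple

# A layer is (digest, size in MB). Digests here are made-up labels.
Layer = Tuple[str, int]


def unique_layer_footprint(images: Dict[str, List[Layer]]) -> int:
    """Total size when each distinct layer is stored once,
    no matter how many images reference it."""
    sizes_by_digest: Dict[str, int] = {}
    for layers in images.values():
        for digest, size in layers:
            sizes_by_digest[digest] = size
    return sum(sizes_by_digest.values())


# Before: every instance layer carries its own 291 MB copy of .git.
before = {
    "django-a": [("base", 100), ("env-django", 500), ("inst-a", 330)],
    "django-b": [("base", 100), ("env-django", 500), ("inst-b", 330)],
}

# After: the clone lives in the shared env layer (500 + 291 = 791 MB),
# and each instance layer keeps only what is specific to it.
after = {
    "django-a": [("base", 100), ("env-django-git", 791), ("inst-a", 39)],
    "django-b": [("base", 100), ("env-django-git", 791), ("inst-b", 39)],
}

saved = unique_layer_footprint(before) - unique_layer_footprint(after)
print(f"before={unique_layer_footprint(before)} MB, "
      f"after={unique_layer_footprint(after)} MB, saved={saved} MB")
```

With two instances the saving is one copy of .git. With N instances sharing the same env layer it grows to (N - 1) copies, which is why the effect is large across thousands of images.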
With the registry, Epoch ran SWE-bench Verified on a single GitHub Actions VM (32 cores, 128 GB RAM) in 62–73 minutes for each of several major models (gemini-2.0-flash-001: 62 min; claude-3-7-sonnet-20250219: 63 min; gpt-4o-2024-11-20: 70 min). That works out to roughly 8 seconds per sample, with a sustained throughput of about 2M tokens/min under a cap of 300k tokens per sample. Images are hosted on GitHub Container Registry under ghcr.io/epoch-research/swe-bench.eval.. and are all tagged latest. x86_64 coverage is 2290 of 2294 images, and arm64 is provided on a best-effort basis for 1819 images. Source commits and the sizing script (get_registry_size.py) are available in Epoch's SWE-bench repository.
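The per-sample figure follows directly from the wall-clock times and the 500 samples in SWE-bench Verified. A short check, using only the numbers quoted above (the function name is illustrative):

```python
SAMPLES = 500  # size of the SWE-bench Verified set

# Wall-clock minutes per full run, as reported above.
run_minutes = {
    "gemini-2.0-flash-001": 62,
    "claude-3-7-sonnet-20250219": 63,
    "gpt-4o-2024-11-20": 70,
}


def seconds_per_sample(minutes: float, samples: int = SAMPLES) -> float:
    """Average wall-clock seconds per sample for one full run."""
    return minutes * 60 / samples


for model, minutes in run_minutes.items():
    print(f"{model}: {seconds_per_sample(minutes):.2f} s/sample")
```

The 62–73 minute range corresponds to about 7.4 to 8.8 seconds per sample, consistent with the rounded figure of 8 seconds.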
In summary, Epoch AI published a public GitHub Container Registry of SWE-bench Docker images covering 2290 of 2294 x86_64 images (99.8%) and all 500 SWE-bench Verified images. The optimized registry takes 67 GiB for the full set, a roughly 10x reduction from 684 GiB of unique original layers, and 30 GiB for the Verified subset, a roughly 6x reduction from 189 GiB.
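The rounded ratios can be reproduced from the reported sizes and counts. This is plain arithmetic on the figures in the text, and the helper name is illustrative:

```python
def reduction_factor(before_gib: float, after_gib: float) -> float:
    """How many times smaller the optimized registry is."""
    return before_gib / after_gib


full_set = reduction_factor(684, 67)      # all x86_64 images
verified = reduction_factor(189, 30)      # SWE-bench Verified subset
coverage_pct = 2290 / 2294 * 100          # share of x86_64 images built

print(f"full set: {full_set:.1f}x, verified: {verified:.1f}x, "
      f"coverage: {coverage_pct:.1f}%")
```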