00·Inference
Cinder is a serverless runtime for any model that fits on a GPU. Deploy with a single primitive. Scale down to zero between requests. Pay only when the model thinks.
Measured on Llama-3.1 8B across 28 days of production traffic. Hardware: H100 SXM5, 80 GB. Cinder telemetry, January 2026.
01·Runtime
A Cinder function is a Python file with a decorator and a model definition. The runtime handles bin-packing, autoscaling, and cold starts. Nothing else.
a · snapshot
Model weights and Python state are restored from a CUDA memory snapshot. Cold starts take about four-tenths of a second at any model size up to 70 B parameters.
b · batching
Requests join an in-flight batch on arrival. P50 latency stays flat under load, then degrades gracefully past the GPU saturation point.
c · billing
You pay for the milliseconds a GPU is reserved for your function. Idle replicas are reclaimed inside three seconds. No minimum spend.
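To make item b concrete, here is a toy scheduler in plain Python. It is a sketch of continuous batching in general, not Cinder's implementation: the Request class, run_continuous_batching, and the max_batch limit are all hypothetical, and one loop iteration stands in for one decode step on the GPU.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival_step: int  # decode step at which the request arrives
    tokens_left: int   # tokens still to generate


def run_continuous_batching(requests, max_batch=4):
    """Toy scheduler: returns {request name: step at which it finished}.

    Note: decrements tokens_left on the Request objects passed in.
    """
    pending = deque(sorted(requests, key=lambda r: r.arrival_step))
    active, finished, step = [], {}, 0
    while pending or active:
        # Admit arrivals into the in-flight batch while a slot is free.
        while pending and pending[0].arrival_step <= step and len(active) < max_batch:
            active.append(pending.popleft())
        # One decode step yields one token for every active request.
        for req in active:
            req.tokens_left -= 1
        step += 1
        # Retire finished requests right away so their slots free up.
        for req in active:
            if req.tokens_left <= 0:
                finished[req.name] = step
        active = [r for r in active if r.tokens_left > 0]
    return finished


done = run_continuous_batching([
    Request("a", arrival_step=0, tokens_left=5),
    Request("b", arrival_step=2, tokens_left=2),
])
print(done)
```

Request b arrives two steps after a, joins the running batch, and finishes at step 4, one step before a. A static batcher would have held b until a completed.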
02·Code
A quickstart that runs against a real model in production. Copy it, paste it, run cinder deploy.
~/reactor/main.py
build d3f1c8
import cinder
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hugging Face repo id for the model used throughout this page.
MODEL_ID = "meta-llama/Llama-3.1-8B"

app = cinder.App("reactor")
gpu = cinder.GPU("H100", memory=80)

@app.cls(gpu=gpu, scaledown=3)
class Reactor:
    @cinder.enter()
    def load(self):
        self.tok = AutoTokenizer.from_pretrained(MODEL_ID)
        # Load weights in the checkpoint's dtype and move them onto the GPU.
        self.model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, torch_dtype="auto"
        ).to("cuda")

    @cinder.method()
    def think(self, prompt: str) -> str:
        # Input ids must live on the same device as the model.
        ids = self.tok.encode(prompt, return_tensors="pt").to(self.model.device)
        out = self.model.generate(ids, max_new_tokens=512)
        return self.tok.decode(out[0], skip_special_tokens=True)
03·Pricing
Three plans. Free and Reactor need no reserved-capacity contract. The free tier runs on shared A10G; paid tiers run on dedicated H100. Reserved capacity and bring-your-own VPC are available on Enterprise.
Free
$0/mo
$0.00012/sec on shared A10G after the first 30 GPU-hours.
Reactor
$0.0004/sec
Dedicated H100. Bin-packed across your fleet.
Enterprise
Talk to us · custom pricing
Reserved capacity, BYO VPC, SLA on P99 latency.
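A worked example of the per-millisecond billing arithmetic, using the two per-second rates listed above. The cost_usd helper is hypothetical, and it assumes linear billing with no rounding or minimum increment, which the plans above do not specify.

```python
from decimal import Decimal

# Per-second rates as listed in the plans above.
RATE_SHARED_A10G = Decimal("0.00012")     # Free plan, past the included GPU-hours
RATE_DEDICATED_H100 = Decimal("0.0004")   # Reactor plan


def cost_usd(reserved_ms: int, rate_per_sec: Decimal) -> Decimal:
    """Cost of reserving a GPU for reserved_ms milliseconds (assumed linear)."""
    return rate_per_sec * Decimal(reserved_ms) / Decimal(1000)


ONE_HOUR_MS = 60 * 60 * 1000

h100_hour = cost_usd(ONE_HOUR_MS, RATE_DEDICATED_H100)
a10g_hour = cost_usd(ONE_HOUR_MS, RATE_SHARED_A10G)
short_call = cost_usd(2_500, RATE_DEDICATED_H100)  # a 2.5 s reservation

print(f"H100 hour:  ${h100_hour:.2f}")
print(f"A10G hour:  ${a10g_hour:.3f}")
print(f"2.5 s call: ${short_call:.4f}")
```

Under these assumptions a full hour on a dedicated H100 comes to $1.44, an hour on shared A10G to $0.432, and a 2.5-second reservation on H100 to a tenth of a cent.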
04·Changelog
Snapshot resume. Cold starts on Llama-3.1 8B are down to 380 ms, a 64 % drop versus loading from cold. Available on all paid tiers from today.
Continuous batching for vLLM. Throughput on H100 sustains 12 k rps for Llama-3.1 8B under saturating load.
H200 in preview. 141 GB memory per GPU. Enterprise preview opens via solutions engineering.
Region: Frankfurt. Cinder is now in eu-central-1 alongside us-east-1, us-west-2, and ap-northeast-1.