AI Tools Prefill Is Compute-Bound. Decode Is Memory-Bound. Why Your GPU Shouldn’t Do Both. a large enterprise size a Kubernetes cluster for real-time inference on their customer-facing LLM product. We started with 64 H100…