David Nguyen

Life notes

A collection of thoughts, experiences, and life updates.

Latest

Designing a Multi-Tenant LLM Inference Platform, Part 2

Scaling a serving cell when cold starts take minutes: sizing warm spare capacity from forecast error, model-local standby, draining, and failing honestly when the KV cache is gone.

More posts
2026