Tejas Chopra | Sr. Software Engineer
Netflix

Tejas Chopra is a Sr. Engineer working on the ML & AI Platform at Netflix. He is also the creator of Headroom, a context optimization platform, the founder of EnsolAI, and a 2x TEDx speaker, and has built a career in distributed systems and cloud infrastructure. Tejas holds a Master's degree in Electrical and Computer Engineering from Carnegie Mellon University.

Appearances:

Future of Memory and Storage - Day 3 @ 08:55

Reducing LLM KV Cache Memory 80% Through Software Context Compression

Every LLM input token creates key-value cache entries in GPU HBM during inference. Today's AI agents routinely push contexts to 100K+ tokens, consuming gigabytes of KV cache per request. The industry response has been hardware-centric: more HBM, KV cache offloading to flash, CXL-attached memory. These approaches address the symptom. Context compression addresses the cause.

This talk presents Headroom, an open-source system (460+ GitHub stars) that compresses LLM input tokens by up to 80% before inference, shrinking the KV cache footprint by the same ratio. At that ratio, a 128K-token agent context drops from ~4GB to ~800MB, which allows roughly 5x the concurrency on identical hardware. The compression is deterministic, model-agnostic, and adds <100ms latency. It is also fully complementary to hardware-layer solutions: paged attention and flash-based offloading both benefit when operating on already-compressed context.
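The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal back-of-the-envelope model, not Headroom's code: the abstract does not name a model, so the layer count, KV-head count, head dimension, and 2-byte (FP16) values below are hypothetical, chosen only so that a 128K-token context lands at about 4 GiB. The function names are likewise illustrative.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """KV cache size for one request.

    Each layer stores one key and one value tensor (hence the factor 2),
    each holding tokens x kv_heads x head_dim values.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def tokens_after_compression(tokens, reduction_pct):
    """Token count left after removing reduction_pct percent of the input."""
    return tokens * (100 - reduction_pct) // 100


# Hypothetical model shape (an assumption, not from the abstract):
# 32 layers, 2 KV heads, head dimension 128, FP16 values.
LAYERS, KV_HEADS, HEAD_DIM = 32, 2, 128

full_tokens = 128 * 1024
small_tokens = tokens_after_compression(full_tokens, 80)

full_bytes = kv_cache_bytes(full_tokens, LAYERS, KV_HEADS, HEAD_DIM)
small_bytes = kv_cache_bytes(small_tokens, LAYERS, KV_HEADS, HEAD_DIM)

# How many compressed requests fit in the memory one uncompressed request used.
requests_per_budget = full_bytes // small_bytes

print(f"uncompressed: {full_bytes / 2**30:.2f} GiB")
print(f"compressed:   {small_bytes / 2**20:.0f} MiB")
print(f"concurrency:  {requests_per_budget}x")
```

Because KV cache size is linear in token count, the memory saving tracks the token reduction regardless of the model shape chosen. Only the absolute sizes depend on the assumed dimensions.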

Tejas Chopra, Sr. Software Engineer, Netflix

last published: 19/May/26 18:25 GMT
