Kyung Soo Lee is a senior Principal Engineer at SK hynix, a leading global semiconductor company. With over a decade of experience in memory and storage technologies, he is a visionary leader driving innovation at the forefront of the field. His expertise lies in developing emerging memory and storage solutions that shape the future of data storage and processing.
As LLMs scale to billions of parameters and handle complex, multi-turn workloads, inference efficiency is no longer determined solely by compute power, but also by how intelligently the KV cache is managed across memory and storage tiers. This talk explores a novel architecture that situates KV caching at the critical junction between GPU memory and hybrid storage. Using Linux volume groups and SPDK for NVMe over Fabrics, we treat SSD/HDD tiers as active memory extensions, not passive backends. Frequently accessed KV states remain in fast layers, while less active data moves to cost-efficient storage, where it can be fetched back rather than rebuilt, eliminating redundant attention recomputation. Integrated with the Dynamo KV Block Manager and dynamic logical volumes, this approach reduces time-to-first-token and power consumption while easing pressure on GPU memory (HBM). The result is higher concurrency: more simultaneous users without sacrificing responsiveness. The system adapts to real-time workload patterns, improving throughput and lowering operational cost, and offers a practical, scalable solution for production LLM deployment.
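The tiering idea in the abstract can be illustrated with a small sketch. This is not the architecture presented in the talk and does not use the Dynamo, SPDK, or LVM APIs. It is a toy two-tier cache in which an in-memory dictionary stands in for GPU memory and a second dictionary stands in for SSD/HDD-backed volumes. The class name `TieredKVCache`, the `recompute` callback, and the LRU demotion policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small fast tier backed by a larger slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for GPU memory (HBM), LRU-ordered
        self.slow = {}             # stands in for SSD/HDD-backed storage
        self.recomputations = 0    # how often KV state had to be rebuilt

    def put(self, block_id, kv_state):
        # A block lives in exactly one tier at a time.
        self.slow.pop(block_id, None)
        self.fast[block_id] = kv_state
        self.fast.move_to_end(block_id)
        # Demote the least recently used blocks instead of discarding them.
        while len(self.fast) > self.fast_capacity:
            cold_id, cold_state = self.fast.popitem(last=False)
            self.slow[cold_id] = cold_state

    def get(self, block_id, recompute):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # mark as recently used
            return self.fast[block_id]
        if block_id in self.slow:
            kv_state = self.slow.pop(block_id)  # fetch back from the slow tier
        else:
            kv_state = recompute(block_id)      # true miss: rebuild the state
            self.recomputations += 1
        self.put(block_id, kv_state)            # promote into the fast tier
        return kv_state


# Hypothetical usage: three blocks compete for two fast-tier slots.
cache = TieredKVCache(fast_capacity=2)
for block_id in ["a", "b", "c"]:
    cache.get(block_id, recompute=lambda b: f"kv({b})")

# Block "a" was demoted when "c" arrived, so this read is served from the
# slow tier and does not trigger another recomputation.
value = cache.get("a", recompute=lambda b: f"kv({b})")
```

In this sketch a demoted block costs a slow-tier read on its next access rather than a full recomputation, which is the trade-off the abstract describes. A real system would also have to account for transfer latency and block sizes, which are left out here.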