He is a Distinguished Engineer at SK hynix with extensive experience in system architecture and performance analysis of DRAM-based server platforms and emerging memory solutions. His work spans CXL memory (expansion and pooling), processing-in-memory (PIM), and MRDIMM, all aimed at enabling scalable data center systems. More recently, he has led performance characterization of large-scale LLM inference and AI agent systems, focusing on memory bottlenecks, KV cache dynamics, and end-to-end system optimization.
This presentation explores how CXL memory pooling and sharing could enable KV cache sharing across a memory hierarchy that includes VRAM, local DRAM, local SSD, and an ICMS-like tier. We focus on latency-sensitive, capacity-hungry inference patterns (e.g., multi-turn serving and multi-adapter workloads) where KV reuse and prefix overlap are prominent. The talk is concept-driven and grounded in published literature and public reports. We summarize the expected benefits, outline deployment constraints (ecosystem maturity, correctness and coherence boundaries, software support), and discuss how to prioritize a deployable subset of CXL capabilities rather than assuming that a “full spec implementation” is always optimal.