Khayam Anjam is a Staff Engineer at Micron Technology. As part of Micron's Data Center Workload Engineering team, he works on LPDDR and GDDR memory, characterizing and optimizing modern AI workloads on GPU-accelerated and data center platforms. He also contributes to technical papers, internal reports, and customer-facing collateral. Before joining Micron, Khayam designed AI systems for computer vision and anomaly detection. At Dell Technologies, he authored six patents while contributing to machine learning, analytics, and reliability initiatives. He brings more than 10 years of industry experience spanning performance engineering, generative AI, and distributed systems. Khayam holds an M.S. in Computer Engineering from Clemson University.
As Large Language Models (LLMs) push context windows into the millions of tokens and serve growing numbers of concurrent users, memory capacity has become a dominant constraint for scalable inference. This talk shows how higher-capacity LPDDR, paired with HBM in unified memory architectures such as NVIDIA's GH200 systems, enables efficient key-value (KV) cache offload. Because LPDDR combines large capacity with low power, this offload substantially improves inference scalability and responsiveness. The talk presents quantitative results showing how LPDDR-backed KV-cache offload increases achievable context length, raises throughput for concurrent users, and supports more simultaneous clients, all while benefiting from LPDDR's inherent energy efficiency.
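To make the capacity pressure concrete: the KV cache grows linearly with both context length and the number of concurrent sequences. The sketch below uses the common per-token estimate (2 x layers x KV heads x head dimension x bytes per element). The model dimensions are hypothetical, chosen only for illustration, and are not taken from the talk's results.

```python
# Rough KV-cache sizing sketch. The model dimensions used here are
# illustrative assumptions, not figures from the talk.


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold keys and values for num_tokens tokens.

    The leading factor of 2 accounts for storing both a key vector and a
    value vector per KV head, per layer, per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


# Hypothetical decoder: 80 layers, 8 KV heads (grouped-query attention),
# head dimension 128, FP16 (2-byte) cache entries.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
GIB = 1024 ** 3

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, num_tokens=1)
print(f"per token: {per_token / 1024:.0f} KiB")  # 320 KiB for these assumed dims

# The total footprint scales with context length times concurrent users.
for context in (128_000, 1_000_000):
    for users in (1, 8):
        total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context * users)
        print(f"{context:>9,} tokens x {users} users: {total / GIB:8.1f} GiB")
```

This estimate ignores model weights, activations, and allocator or paging overhead, so real footprints are larger. Even so, doubling either the context length or the user count doubles the KV cache, which is why spilling KV data from HBM into a larger LPDDR tier can raise both limits.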