Paul McLeod | System
Super Micro Computer, Inc.

Paul McLeod has over 20 years of experience in the storage and server industry, including 14 years at Supermicro, where he is currently a Product Director for storage servers. He was previously a Sr. Field Applications Engineer at Supermicro and a Product Marketing Engineer at Promise Technology. Paul was an early advocate of software-defined storage using industry standards and has worked with numerous customers on designing large-scale server and storage implementations.

Appearances:



Future of Memory and Storage - Day 1 @ 09:05

Reinventing Storage for Long-Context LLMs: Tiered KV Cache from HBM to NVMe

Explosive AI growth requires us to reinvent the rules of storage. As context windows and concurrent sessions grow, LLM inference is quietly hitting a wall where KV cache, not FLOPs, becomes the real performance bottleneck, and the traditional options (more GPUs, more HBM, shorter prompts) are all painfully expensive.

In this session, Supermicro and Graid Technology present a tiered KV cache design that turns dense NVMe-backed GPU servers into a high-performance KV cache tier, letting you scale context, concurrency, and sessions per node without blowing up your GPU budget. Using Supermicro NVMe-dense GPU platforms with Graid SupremeRAID™, the architecture turns SSD into a high-throughput, resilient KV cache tier with full enterprise RAID protection (0/1/5/6/10).

We will also discuss how a large-scale disaggregated inference workflow partitions the KV cache data, and the five tiers of KV cache storage:

1. HBM on GPUs
2. CPU DRAM on the storage server
3. Local SSD on the storage server
4. KV cache storage using DPUs
5. Network storage, which can be file or object

last published: 19/May/26 18:25 GMT
