Rohan Mehta | Member of Technical Staff
Micron

Rohan Mehta is a Member of Technical Staff - Systems Performance Engineer at Micron Technology, where he works on the Data Center Workload Engineering team. Since 2025, his work has focused on application‑level tracing, performance characterization, and solution architecture, using modeling and data analysis to guide next‑generation SSD architecture and optimizations for hyperscale and enterprise workloads. He also serves as Vice‑Chair of SNIA's DSN Forum, where he helps conduct educational webinars and industry tech talks that provide leadership and practical guidance on end‑to‑end storage and data networking solutions.

Appearances:

Future of Memory and Storage - Day 1 @ 08:50

Optimizing KV Cache Offload for Scalable, Cost-Efficient AI Inference

KV cache is the attention "memory" that LLMs build during prefill, enabling token generation during decode without recomputing prior context. As prompts grow longer and sessions become multi‑turn or agentic, the KV cache grows rapidly because it must retain previous prompts and answers as context for subsequent token generation. This strains scarce GPU HBM and forces tradeoffs across latency (time to first token, TTFT), throughput, and cost, and that pressure is driving tiered memory system architectures.

In this talk, we will share Micron's measured evidence that capacity‑centric designs, combined with targeted upgrades to lower‑cost tiers of fast, persistent storage, can materially improve both near‑term performance and long‑term returns on CapEx. We will also discuss drive‑level optimizations suited to some of the random and sequential access patterns seen in KV cache I/O. Attendees will take away actionable tiering heuristics grounded in balancing TCO across the memory and storage hierarchy.

Rohan Mehta, Member of Technical Staff, Micron

last published: 19/May/26 18:25 GMT
