Oleg is a systems architect at Sandisk focused on NAND flash management algorithms and performance analysis. His 14 years at the company span SanDisk’s acquisition by Western Digital (2016) and Sandisk’s return as an independent company (2025). He holds B.S. and M.S. degrees in Computer Science from the Belarusian State University of Informatics and Radioelectronics (BSUIR) and previously worked as a firmware engineer at Softeq Development.
Large language model inference is increasingly constrained by per-session KV cache growth in multi-turn and long-context workloads. Many KV-offload approaches treat the SSD as a generic spillover tier and optimize for averages. This leads to unpredictable tail latency, read amplification from fragmented layouts during rehydration, session continuations that devolve into many small reads because packing ignores how the SSD serves I/O, and read/write interference that destabilizes QoS.
We present a storage-centric study of KV-cache persistence and rehydration using a behavioral simulator that models a batched inference pipeline, captures host-staging backpressure, and treats layout and indexing as first-class inputs to read planning. We compare request-end flushing against token-streaming persistence, along with alternative packing and placement policies that shape the SSD I/O stream. We then translate the I/O and latency results into SSD requirements (bandwidth, mixed-workload QoS, endurance) and deliver quantified rehydration costs plus actionable KV layout guidelines that reduce fragmentation and stabilize tail latency. Attendees will leave with a framework for reasoning about KV offload tradeoffs and storage design priorities for LLM inference.
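To make the "many small reads" effect concrete, here is a minimal toy sketch. It is not the simulator described in the talk. It assumes the standard per-token KV size formula (K and V tensors per layer) and a hypothetical model shape. It also assumes a simple read planner that aligns extents to a 4 KiB read unit, coalesces adjacent ranges, and caps each command at 1 MiB. The helper names (`kv_bytes_per_token`, `plan_reads`, `streamed_layout`, `packed_layout`) are hypothetical, as are the interleaved token-streaming layout and the contiguous request-end layout they produce.

```python
from typing import List, Tuple

# (device byte offset, length in bytes) of one piece of a session's KV cache.
Extent = Tuple[int, int]


def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    # K and V are stored separately for every layer, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def plan_reads(extents: List[Extent], read_unit: int,
               max_io: int) -> Tuple[int, int]:
    """Return (read commands, bytes read) needed to rehydrate the extents.

    Each extent is widened to `read_unit` boundaries, adjacent or overlapping
    ranges are coalesced, and each coalesced run is split into commands of at
    most `max_io` bytes.
    """
    aligned = sorted(
        (off // read_unit * read_unit,
         -(-(off + length) // read_unit) * read_unit)
        for off, length in extents
    )
    runs: List[List[int]] = []
    for start, end in aligned:
        if runs and start <= runs[-1][1]:
            runs[-1][1] = max(runs[-1][1], end)  # coalesce with previous run
        else:
            runs.append([start, end])
    commands = sum(-(-(end - start) // max_io) for start, end in runs)
    bytes_read = sum(end - start for start, end in runs)
    return commands, bytes_read


def streamed_layout(num_tokens: int, token_bytes: int,
                    num_sessions: int, session_id: int) -> List[Extent]:
    # Hypothetical token streaming: concurrent sessions append one token's KV
    # at a time to a shared log, so this session's extents are interleaved.
    return [((t * num_sessions + session_id) * token_bytes, token_bytes)
            for t in range(num_tokens)]


def packed_layout(num_tokens: int, token_bytes: int, base: int) -> List[Extent]:
    # Hypothetical request-end flush: the whole session is written contiguously.
    return [(base + t * token_bytes, token_bytes) for t in range(num_tokens)]


if __name__ == "__main__":
    # Illustrative GQA-style shape: 32 layers, 8 KV heads, head_dim 128, FP16.
    tok = kv_bytes_per_token(32, 8, 128, 2)  # 131072 bytes (128 KiB) per token
    READ_UNIT, MAX_IO = 4096, 1 << 20         # assumed 4 KiB unit, 1 MiB max I/O
    # 1024-token session, 4 concurrent sessions sharing the streaming log.
    print("streamed:", plan_reads(streamed_layout(1024, tok, 4, 0), READ_UNIT, MAX_IO))
    print("packed:  ", plan_reads(packed_layout(1024, tok, 0), READ_UNIT, MAX_IO))
    # streamed: (1024, 134217728)
    # packed:   (128, 134217728)
```

Both layouts read the same 128 MiB in this toy case, but the interleaved layout needs 8x as many read commands, because no two of the session's extents are adjacent. Real devices add effects this sketch ignores, such as queue depth, die-level parallelism, and interference from concurrent writes.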