Ugur Kaynar is a Distinguished Engineer leading AI inference acceleration in the Dell AI Data Platform, driving innovation across storage connectors, protocols, data paths, and end‑to‑end I/O pipelines for large‑scale AI systems. She advances storage technologies for AI workloads and builds an open AI ecosystem through partnerships and community engagement. Her work spans storage–compute integration and open‑source collaboration. She holds a PhD in distributed caching for object stores.
Transformer-based generative AI has turned the key/value (KV) cache into one of the largest and most performance-critical working sets in modern AI systems. As context windows grow and request concurrency rises, KVCache capacity and bandwidth increasingly determine latency, throughput, and total cost, often driving decisions around GPU/HBM sizing, host memory, and storage tiering. This session brings together system builders and memory/storage architects to examine KVCache management end to end: data layout and access patterns; paging, allocation, and eviction; compression and quantization; multi-GPU and multi-node sharing; tiering and offload to host DRAM and NVMe/SSD; and reliability, isolation, and security considerations in multi-tenant deployments. We will connect software techniques to emerging hardware directions (e.g., higher-bandwidth memory, pooling/tiering, and disaggregated memory/storage) and highlight where cross-layer co-design is needed. Attendees will leave with a practical taxonomy of KVCache techniques, guidance on when to use each approach, and a set of metrics and workload characteristics for evaluating solutions in production.