Raj Uppala is the Sr. Director of Marketing & Partnerships for the Silicon IP business unit at Rambus. Prior to Rambus, Raj held several roles at Western Digital in product management, product marketing, and ecosystem partnerships for the Hard Disk Drive (HDD) product line and for a Smart Video product line encompassing cameras, AI analytics, and a Video Management System delivered as a service. Raj began his career designing memory and mixed-signal ICs, subsequently transitioning to marketing and product line management roles at a few semiconductor companies. He holds an MBA from Cornell University and an MS in EE from Mississippi State University.
AI inference spans diverse workloads, from low‑latency chat to long‑context reasoning and large‑scale recommendations, which makes single, monolithic accelerator and memory designs increasingly inefficient. This talk explains how inference naturally splits into prefill and decode stages with fundamentally different bottlenecks: prefill is compute‑bound, while decode is dominated by memory bandwidth and latency. By matching memory technologies to each stage (cost‑efficient GDDR or LPDDR for prefill, premium HBM reserved for decode, and pooled memory for KV cache offload), operators can significantly reduce cost per token without sacrificing latency. The session also outlines emerging disaggregated architectures for AI inference workloads.
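
The prefill/decode split can be illustrated with a simplified roofline estimate. The sketch below is not taken from the talk. It assumes roughly 2 FLOPs per model parameter per token, assumes weights are read from memory once per forward pass, and ignores KV cache and activation traffic. The `Accelerator` class, the function names, and the hardware figures in the usage note are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator described by two roofline parameters."""
    peak_flops: float     # peak compute, FLOP/s (illustrative value)
    mem_bandwidth: float  # memory bandwidth, bytes/s (illustrative value)

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) above which compute is the limit.
        return self.peak_flops / self.mem_bandwidth


def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Assumption: ~2 FLOPs per parameter per token, and every weight is
    read once per pass, so the parameter count cancels out.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def bottleneck(acc: Accelerator, tokens_per_pass: int) -> str:
    """Return 'compute' or 'memory' under the simplified roofline model."""
    if arithmetic_intensity(tokens_per_pass) >= acc.ridge_point:
        return "compute"
    return "memory"
```

Under these assumptions, a made-up device with `Accelerator(peak_flops=4e14, mem_bandwidth=2e12)` has a ridge point of 200 FLOP/byte. A prefill pass over a 2048-token prompt has an intensity of 2048 FLOP/byte and is classified as compute-bound, while a single-token decode step has an intensity of 1 FLOP/byte and is classified as memory-bound. Batching decode requests raises the intensity, which this sketch models only through `tokens_per_pass`.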