Kyumin Park is a seasoned expert in the Memory Business at Samsung Electronics, with two decades of experience in system-level memory analysis and architecture. She specializes in evaluating next-generation memory solutions, particularly CXL-based products and composable memory systems. Throughout her career, she has worked at the forefront of bridging hardware innovation with system-level performance, and she recently shared her insights as a speaker at the OCP 2025 Global Summit. Her technical authority is further demonstrated by her research published in IEEE (2025), which focuses on CXL memory optimization for cluster-based RAG systems. She currently leads the team responsible for strategic performance analysis of emerging AI workloads, driving the development of scalable and efficient memory hierarchies for the data-centric era.
As AI paradigms shift from compute-centric to data-centric, memory architecture innovation has become a critical imperative. Modern LLM services demand not only HBM’s extreme bandwidth but also unprecedented capacity expansion for KV Cache and RAG workloads. In this session, we propose a CXL-based Heterogeneous Memory Hierarchy to address these challenges. We introduce a Computing Offloading mechanism at the memory device level that minimizes data movement between the CPU and memory, significantly improving latency and effective bandwidth utilization, two common bottlenecks in large-scale AI inference. Furthermore, we present a scalable capacity strategy that uses CXL Memory Pooling to transcend the limits of individual nodes through dynamic resource allocation. Moving beyond theory, we provide empirical evaluation data from real-world environments, demonstrating improved AI inference performance and resource efficiency over conventional architectures. Building on our research on RAG optimization published in IEEE (2025), we conclude with practical architectural guidelines for the next generation of data-centric AI infrastructure.