I am a director leading the planning team for CXL DRAM products, working on CXL-based DRAM solutions that range from device-level products to pooling systems.
CXL memory is evolving beyond simple capacity expansion for specific workloads toward a pooled architecture that enables more flexible and scalable memory disaggregation. By decoupling memory from individual hosts, CXL-based pooling introduces a shared, dynamically allocatable memory layer across heterogeneous compute nodes. This architectural shift provides a practical foundation for tiered memory systems that can transparently support KV cache offloading in large-scale AI inference workloads. In particular, CXL memory pooling adds an intermediate-latency tier between local DRAM and remote storage, which helps balance cost efficiency and performance. As a result, CXL-based tiering emerges as a key enabler for scalable, memory-efficient deployment of next-generation large language models.