Hardware Systems Engineer at Meta, working on the design of AI and compute systems.
Compute Express Link (CXL) promises to unlock memory disaggregation and composability at hyperscale, but deploying it in production fleets introduces a new class of system-level challenges. In this talk, we share practical insights from hyperscale environments on where CXL meets reality, covering latency variability, reliability at scale, firmware and software maturity, and observability gaps. We will discuss how these challenges affect large-scale AI and memory-intensive workloads, and outline the validation and design strategies required to make CXL viable in production data centers.
Hyperscalers are increasingly deploying QLC flash in capacity-oriented tiers, relying on software intelligence and workload segmentation to maximize efficiency at scale. In large fleets, QLC behavior is shaped less by device limitations than by system-level orchestration, write management, and traffic smoothing. Under these controlled workloads, QLC's performance profile stabilizes, enabling predictable latency and cost efficiency. This reflects a broader trend in which hyperscale architecture, rather than raw media characteristics, defines real-world flash behavior.