Raghu Vamsi is an Associate Director at Samsung India, responsible for architecture and system software development for CXL. He leads a team that develops Processing-Near-Memory architecture, software stacks, and solutions for CXL and AXDIMM, as well as host software and mass-production solutions for CXL. He holds five U.S. patents and has authored publications. He holds a Master's degree in Information Technology from IIT Kharagpur.
Processing-Near-Memory (PNM) computing mitigates memory-bandwidth constraints in heterogeneous systems. This work attaches a CXL-enabled PNM accelerator that offloads the vector-similarity search of Retrieval-Augmented Generation (RAG) pipelines. The design is implemented on an Intel Agilex 7 I-Series FPGA-SoC with a quad-core ARM Cortex-A53 CPU, DDR4 memory, and a CXL 2.0 (Gen-4) interface. It follows a dual-scope execution model: a host-resident orchestration kernel performs coarse index partitioning, while device-resident fine-search kernels execute highly vectorized, memory-bound inner-product/L2-distance calculations directly on the CXL PNM device. The approach uses a FAISS configuration adapted to the CXL PNM hardware, together with on-device vector reads, to compute the similarity search. Analytical evaluation on representative RAG workloads predicts a 3.32× speedup over CPU + CXL memory-expander baselines and confirms a 100% F1 score for nearest-vector retrieval, validating the CXL-based PNM micro-architecture and its dual-scope offload strategy for scalable acceleration of memory-intensive RAG retrieval tasks.
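The dual-scope split described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and makes no FAISS or CXL calls: all function names are hypothetical, the "device" step is simulated on the host, and an inverted-file style partition (vectors grouped by nearest centroid) is assumed as the coarse index.

```python
def l2_sq(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, centroids):
    """Coarse index: assign each vector id to its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: l2_sq(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def host_coarse_select(query, centroids, nprobe):
    """Host scope: choose the nprobe partitions closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: l2_sq(query, centroids[c]))
    return order[:nprobe]


def device_fine_search(query, vectors, candidate_ids, k):
    """Device scope (simulated here): exhaustive L2 scan over the
    candidate vectors; this is the memory-bound part that would be
    offloaded to the PNM device."""
    scored = sorted((l2_sq(query, vectors[i]), i) for i in candidate_ids)
    return [i for _, i in scored[:k]]


def dual_scope_search(query, vectors, centroids, lists, nprobe=1, k=2):
    """Run the coarse step, then the fine step on the selected lists."""
    probes = host_coarse_select(query, centroids, nprobe)
    candidates = [vid for p in probes for vid in lists[p]]
    return device_fine_search(query, vectors, candidates, k)


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centroids = [[0.0, 0.5], [10.0, 10.5]]  # assumed precomputed
    lists = build_partitions(vectors, centroids)
    print(dual_scope_search([0.0, 0.2], vectors, centroids, lists))
```

In this sketch only the ids of the selected partitions cross from the coarse step to the fine step, so the bulk vector reads stay inside `device_fine_search`, which is the part of the work that is bandwidth-bound.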