William Jo is a senior SoC, storage, and memory systems architect with over 18 years of experience spanning SSD firmware, storage controller architecture, FPGA prototyping, RTL development, and hardware/software co-design. He has led advanced architecture efforts across NVMe, PCIe, CXL, Ethernet, LPDDR, and AI storage infrastructure, with a strong focus on translating forward-looking research concepts into practical, product-relevant system designs. His background includes senior technical leadership roles at Samsung and Solidigm, where he worked on SSD architecture, storage virtualization, controller scheduling, firmware platforms, and next-generation memory and storage systems. Currently, William leads architecture and FPGA/RTL development for advanced storage and memory programs, including network-attached direct storage platforms designed to overcome DPU-limited bandwidth paths in AI infrastructure, as well as LPDDR6-focused bridge SoC architecture for next-generation memory test systems. His work combines deep knowledge of storage, memory hierarchy, PCIe/NVMe systems, and performance optimization with hands-on prototyping using FPGA, RTL, Python, C++, SystemC, and system-level modeling. William’s recent work also includes architecture for computational storage and AI data-path acceleration, including near-storage vector processing, NVMe-based compute offload concepts, and FP32 vector compression using ZFP-V with approximately 6x data reduction. These efforts are aimed at reducing host data movement, improving bandwidth efficiency, and enabling more scalable memory and storage tiers for large-scale AI and LLM inference workloads. He holds a Ph.D. in Electrical and Electronic Engineering from Yonsei University and has authored multiple patents and publications in storage systems, SSD scheduling, metadata protection, and high-performance data processing. 
William is known for bridging low-level hardware architecture with system-level software requirements, leading cross-functional teams, and defining practical architectures for emerging storage, memory, and AI infrastructure platforms.
Large-scale approximate nearest neighbor (ANN) search over high-dimensional vectors (more than 1,000 dimensions) in billion-scale datasets faces fundamental system bottlenecks. Distance computation is constrained by limited DRAM capacity and bandwidth, while frequent movement of vectors between storage, memory, and CPU leads to high latency, high energy consumption, and an excessive memory footprint. These challenges restrict scalability and throughput for modern AI and retrieval workloads. We propose a storage-compute co-design approach that offloads distance computation to NVMe devices via an extended compute command. Instead of transferring full vectors to the host, the SSD performs L2 distance or cosine similarity calculations internally and returns compact results, significantly reducing PCIe traffic and DRAM bandwidth pressure. By minimizing host-side data movement and compute overhead, this architecture achieves a 2-3x search throughput improvement in large-scale ANN workloads while maintaining recall accuracy. Additionally, decoupling vector storage from host-resident processing reduces the index memory footprint of graph-based ANN methods such as HNSW.
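The data-movement argument above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: `SimulatedComputeSSD`, its `read_vectors` and `compute_distances` methods, the FP32 component size, and the 8-byte result record (4-byte ID plus 4-byte distance) are hypothetical stand-ins chosen for illustration, not the actual extended NVMe command or its payload format. Cosine similarity is expressed as a distance (1 minus similarity) so that smaller values mean nearer in both metrics.

```python
import heapq
import math
import random

FLOAT_BYTES = 4  # assumed FP32 vector components
ID_BYTES = 4     # assumed 4-byte vector ID in each result record


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class SimulatedComputeSSD:
    """Hypothetical stand-in for an SSD that can score vectors internally."""

    def __init__(self, vectors):
        self.vectors = vectors  # list of vectors, indexed by vector ID

    def read_vectors(self, ids):
        """Baseline path: ship full vectors to the host."""
        payload = [self.vectors[i] for i in ids]
        nbytes = sum(len(v) for v in payload) * FLOAT_BYTES
        return payload, nbytes

    def compute_distances(self, query, ids, k, metric=l2_distance):
        """Offload path: score on the device, return only k (id, distance) pairs."""
        scored = ((metric(query, self.vectors[i]), i) for i in ids)
        results = [(i, d) for d, i in heapq.nsmallest(k, scored)]
        # Bytes crossing the link: the query going in plus the compact results.
        nbytes = len(query) * FLOAT_BYTES + len(results) * (ID_BYTES + FLOAT_BYTES)
        return results, nbytes


def host_side_search(ssd, query, ids, k, metric=l2_distance):
    """Baseline: fetch every candidate vector, then score on the host."""
    vectors, nbytes = ssd.read_vectors(ids)
    scored = ((metric(query, v), i) for i, v in zip(ids, vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)], nbytes


if __name__ == "__main__":
    random.seed(0)
    dim, n_candidates, k = 1024, 200, 10
    ssd = SimulatedComputeSSD(
        [[random.random() for _ in range(dim)] for _ in range(n_candidates)]
    )
    query = [random.random() for _ in range(dim)]
    ids = list(range(n_candidates))

    host_results, host_bytes = host_side_search(ssd, query, ids, k)
    dev_results, dev_bytes = ssd.compute_distances(query, ids, k)

    print("same top-k:", host_results == dev_results)
    print("host-side bytes moved:", host_bytes)
    print("offload bytes moved:", dev_bytes)
```

With these assumed sizes, scoring 200 candidates of 1,024 dimensions on the host moves 200 x 1,024 x 4 = 819,200 bytes, while the offload path moves one 4,096-byte query plus ten 8-byte results. The sketch models only transfer volume for a single candidate batch; it says nothing about device-side compute latency or the end-to-end throughput figures quoted above.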