Valery Polyakov is a Director of Computational Chemistry at SUTRO Biopharma. He works in AI and machine learning, leveraging big data to facilitate drug discovery. He also oversees the development of computational infrastructure for Chemistry. Prior to that, he worked at Novartis, developing new drugs for treating cancer and infectious diseases. A compound that he co-invented is now in Phase II clinical trials. Before that, he worked for Sanofi in the molecular modeling and chemoinformatics groups. He holds a Ph.D. in Chemistry from Kiev Shevchenko University (Ukraine) and has received prestigious awards, including the Fulbright Scholarship. Valery has authored more than 36 published papers and 22 issued patents or published patent applications. In his free time, Valery plays underwater hockey and underwater rugby, and freedives.
We present an ultra-fast shape-based search workflow for screening large compound collections, such as those of commercial vendors. The three-dimensional shape of a molecule dictates its biological activity by enabling the molecule to fit into the binding pockets of proteins. Quite often, chemically distinct compounds that have similar shapes bind in a similar way. OpenEye pioneered an algorithm that compares the shapes of molecules by overlaying them in silico and measuring the differences between a query molecule and a target molecule. Overlaying shapes is computationally intensive and represents a bottleneck in searching for similar molecules. More recent publications describe alternatives to overlaying molecules, in which shapes are compared through shape-based descriptors. These methods are implemented in the ODDT package. We combined open-source software packages, ODDT and RDKit, to implement a workflow for ultra-fast conformer generation and matching that does not require storing pre-computed conformers on the file system or in memory. Moreover, the generated descriptors can optionally be stored in a MongoDB database for future searches. To speed up the search, we created a set of indexes from the transformed shape-based descriptors. We are in the process of calculating descriptors for multiple vendors, including Enamine’s “REAL” collection of 1.2 billion compounds. Currently, a shape similarity search over more than 70 million compounds takes less than 8 seconds! We exemplified our methodology with a screen for compounds that can act as putative TLR4 agonists. The search was based on a series of small-molecule TLR4 agonists known from the literature. In due course, we identified compounds with novel structural motifs that were active in mouse and human TLR4 reporter cell lines.
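To illustrate why descriptor comparison is so much cheaper than overlay, here is a minimal sketch of one descriptor family of this kind, Ultrafast Shape Recognition (USR), which ODDT offers in its shape module. The abstract does not state which descriptor or which transformation the workflow uses, so the choice of USR is an assumption, and the function names `usr_descriptor` and `usr_similarity` as well as the toy coordinates are hypothetical. The sketch uses only the Python standard library and plain coordinate tuples; in a real pipeline the coordinates would come from generated conformers.

```python
import math


def _distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment.

    This is one common variant; implementations differ in how they
    normalize the second and third moments.
    """
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(second), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords):
    """Return a 12-number USR-style shape descriptor for one conformer."""
    n = len(coords)
    # Reference point 1: centroid of all atoms.
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [_distance(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from reference point 3.
    d_fct = [_distance(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([_distance(c, ref) for c in coords]))
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]: inverse of 1 plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


# Hypothetical toy "conformers": an elongated and a compact arrangement of atoms.
linear = [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, -0.1, 0.2), (4.5, 0.0, 0.1), (6.1, 0.2, 0.0)]
compact = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.3, 0.0), (0.0, 0.0, 1.1), (1.0, 1.0, 1.0)]

query = usr_descriptor(linear)
print(usr_similarity(query, usr_descriptor(linear)))
print(usr_similarity(query, usr_descriptor(compact)))
```

Because each conformer is reduced to a short fixed-length vector built from interatomic distances, the descriptor does not change when the molecule is translated or rotated, and comparing two molecules costs a handful of subtractions instead of an alignment. Fixed-length numeric vectors of this sort are also easy to store and index in a database, which is consistent with the indexed search described above, although the exact indexing scheme is not specified in the abstract.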