David Talby is the Chief Executive Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life sciences. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and for Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and by the Game Changers Awards in 2023.
Integrating unstructured, multimodal healthcare data into standardized, analytics-ready formats has traditionally required multiple disconnected projects: LLM pipelines, OCR/PDF tools, FHIR mappings, and clinical terminology APIs. This talk presents a new approach, powered by generative and agentic AI, that enables healthcare organizations to consolidate all raw, multimodal, longitudinal clinical data into a unified OMOP Common Data Model through a single automated pipeline. We describe the architecture behind John Snow Labs’ Patient Journeys platform, which combines document understanding, entity linking, code normalization, temporal reasoning, and longitudinal patient modeling – entirely within the customer’s private cloud environment. Because information from both structured sources and unstructured free-text notes is reconciled into one semantic layer, the resulting data model delivers far more accurate downstream calculations for medical risk scoring, clinical coding, care gap detection, and population health metrics. Attendees will learn how to build and deploy a unified LLM-based data engineering pipeline, evaluate its output, and operationalize it for real-world use cases.