Latent Space
Deep technical AI engineering content. The go-to podcast for AI builders.
187 episodes curated
Episodes
🔬Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery
This podcast features Gabriele Corso and Jeremy Wohlwend, co-founders of Boltz and authors of the Boltz Manifesto, discussing the rapid evolution of structural biology models from AlphaFold to their own open-source suite, Boltz-1 and Boltz-2. The central thesis is that while single-chain protein structure prediction is largely “solved” through evolutionary hints, the next frontier lies in modeling complex interactions (protein-ligand, protein-protein) and generative protein design, which Boltz aims to democratize via open-source foundations and scalable infrastructure. Full Video Pod on YouTube.
The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI
From Palantir and Two Sigma to building Goodfire into the poster child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent $150M Series B funding round at a $1.25B valuation. In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire’s core bet: that the AI lifecycle is still fundamentally broken because the only reliable…
🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White
Editor’s note: Welcome to our new AI for Science pod, with your new hosts RJ and Brandon! See the writeup on Latent.Space (https://Latent.Space) for more details on why we’re launching two new pods this year. RJ Honicky is co-founder and CTO at MiraOmics (https://miraomics.bio/), building AI models and services for single-cell, spatial transcriptomics, and pathology slide analysis. Brandon Anderson builds AI systems for RNA drug discovery at Atomic AI (https://atomic.ai). Anything said on this podcast is his personal take, not Atomic’s. From building molecular dynamics simulations at the U…
Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay
From shipping Gemini Deep Think and IMO Gold to launching the Reasoning and AGI team in Singapore, Yi Tay has spent the last 18 months living through the full arc of Google DeepMind’s pivot from architecture research to RL-driven reasoning—watching his team grow from a dozen researchers to 300+, training models that solve International Math Olympiad problems in live competition, building the infrastructure to scale deep thinking across every domain, and driving Gemini to the top of the leaderboards in every category. Yi returns to dig into the inside story of the IMO effort and more!
Brex’s AI Hail Mary — With CTO James Reggio
From building internal AI labs to becoming CTO of Brex, James Reggio has helped lead one of the most disciplined AI transformations inside a real financial institution, where compliance, auditability, and customer trust actually matter. We sat down with Reggio to unpack Brex’s three-pillar AI strategy (corporate, operational, and product AI) [https://www.brex.com/journal/brex-ai-native-operations], how SOP-driven agents beat overengineered RL in ops, why Brex lets employees “build their own AI stack” instead of picking winners [https://www.conductorone.com/customers/brex/], and how a small…
Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah Hill-Smith
Happy New Year! You may have noticed that in 2025 we moved to YouTube as our primary podcasting platform. As we’ll explain in the next State of Latent Space post, we’ll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates! We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few AIGrant companies to raise a full seed round from Nat Friedman and Daniel Gross, and they have now become the independent gold standard for AI benchmarking…
[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena
We are reupping this episode after LMArena announced their fresh Series A (https://www.theinformation.com/articles/ai-evaluation-startup-lmarena-valued-1-7-billion-new-funding-round?rc=luxwz4), raising $150M at a $1.7B valuation, with $30M annualized consumption revenue (aka $2.5M MRR) after their September evals product launch. From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 in one of the most influential platforms in AI—trusted by millions of users…
[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al, Princeton
From undergraduate research seminars at Princeton to winning the Best Paper award at NeurIPS 2025, Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, and Benjamin Eysenbach defied conventional wisdom by scaling reinforcement learning networks to 1,000 layers deep—unlocking performance gains the RL community thought impossible. We caught up with the team live at NeurIPS to dig into the story behind RL1000: why deep networks have worked in language and vision but failed in RL for over a decade (spoiler: it’s not just about depth, it’s about the objective), and how they discovered that…
[State of Code Evals] After SWE-bench, CodeClash & SOTA Coding Benchmarks recap — John Yang
From creating SWE-bench in a Princeton basement to shipping CodeClash, SWE-bench Multimodal, and SWE-bench Multilingual, John Yang has spent the last year and a half watching his benchmark become the de facto standard for evaluating AI coding agents—trusted by Cognition (Devin), OpenAI, Anthropic, and every major lab racing to solve software engineering at scale. We caught up with John live at NeurIPS 2025 to dig into the state of code evals heading into 2026: why SWE-bench went from ignored (October 2023) to the industry standard after Devin’s launch (and how Walden emailed him two weeks before…)
[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI’s post-training evolution—from the PPO vs DPO debates of 2023 to today’s RLVR era, where the real innovation isn’t optimization methods but data quality, signal trust, and token efficiency. We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient methods (the difference is the input data, not the math), and how GRPO from DeepSeek Math was underappreciated…
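To make that parenthetical concrete, here is a rough sketch of the point (ours, not Josh’s or OpenAI’s training code): the policy-gradient update is identical in both regimes, and only the source of the rewards changes. The `reward_model` and `verifier` named in the comments are hypothetical stand-ins for a learned preference model and a programmatic checker.

```python
import torch

def policy_gradient_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    # The same REINFORCE-style objective in both regimes:
    # maximize E[log pi(y|x) * advantage], i.e. minimize its negative.
    advantages = rewards - rewards.mean()  # simple mean baseline
    return -(logprobs * advantages).mean()

# RLHF: rewards come from a learned preference/reward model.
#   rewards = reward_model(prompts, completions)   # hypothetical helper
# RLVR: rewards come from a programmatic verifier, e.g. unit tests
# or a math checker, on the same completions.
#   rewards = verifier(prompts, completions)       # hypothetical helper, 1.0/0.0

# Either way, the optimizer sees the identical policy-gradient loss:
logprobs = torch.tensor([-1.2, -0.8, -2.0])  # log pi(y|x) for 3 sampled completions
rewards = torch.tensor([1.0, 0.0, 1.0])      # from reward model OR verifier
print(policy_gradient_loss(logprobs, rewards))
```

In practice labs layer PPO- or GRPO-style machinery (clipping, KL penalties, group baselines) on top, but the gradient estimator stays the same; what changes is where the reward signal comes from.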
[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor
From Berkeley robotics and OpenAI’s 2017 Dota-era internship to shipping RL breakthroughs on GPT-4o, o1, and o3, and now leading model development at Cursor, Ashvin Nair has done it all. We caught up with Ashvin at NeurIPS 2025 to dig into the inside story of OpenAI’s reasoning team (spoiler: it went from a dozen people to 300+), why IOI Gold felt reachable in 2022 but somehow didn’t change the world when o1 actually achieved it, and how RL doesn’t generalize beyond the training distribution (and why that means you need to bring economically useful tasks into distribution by co-designing products)…
[State of AI Startups] Memory/Learning, RL Envs & DBT-Fivetran — Sarah Catanzaro, Amplify
From investing through the modern data stack era (DBT, Fivetran, and the analytics explosion) to now investing at the frontier of AI infrastructure and applications at Amplify Partners, Sarah Catanzaro has spent years at the intersection of data, compute, and intelligence—watching categories emerge, merge, and occasionally disappoint. We caught up with Sarah live at NeurIPS 2025 to dig into the state of AI startups heading into 2026: why $100M+ seed rounds with no near-term roadmap are now the norm (and why that terrifies her), and what the DBT-Fivetran merger really signals about the modern data stack…
One Year of MCP — with David Soria Parra and AAIF leads from OpenAI, Goose, Linux Foundation
One year ago, Anthropic launched the Model Context Protocol (MCP)—a simple, open standard for connecting AI applications to the data and tools they need. Today, MCP has exploded from a local-only experiment into the de facto protocol for agentic systems, adopted by OpenAI, Microsoft, Google, Block, and hundreds of enterprises building internal agents at scale. And now MCP is joining the newly formed Agentic AI Foundation (AAIF) under the Linux Foundation, alongside Block’s Goose coding agent, with founding members spanning the biggest names in AI and cloud infrastructure. We sat down with David Soria Parra…
Steve Yegge's Vibe Coding Manifesto: Why Claude Code Isn't It & What Comes After the IDE
Note: Steve and Gene’s talk on Vibe Coding and the post-IDE world was one of the top talks of AIE CODE. From building legendary platforms at Google and Amazon to authoring one of the most influential essays on AI-powered development (Revenge of the Junior Developer, quoted by Dario Amodei himself), Steve Yegge has spent decades at the frontier of software engineering—and now he’s leading the charge into what he calls the “factory farming” era of code. After stints at Sourcegraph and building Beads (a purely vibe-coded issue tracker with tens of thousands of users), Steve co-authored The Vibe Coding Manifesto…
⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI
From the frontlines of OpenAI’s Codex and GPT-5 training teams, Brian and Bill are building the future of AI-powered coding—where agents don’t just autocomplete, they architect, refactor, and ship entire features while you sleep. We caught up with them at the AI Engineer Conference right after the launch of Codex Max, OpenAI’s newest long-running coding agent, designed to work for 24+ hours straight, manage its own context, and spawn sub-agents to parallelize work across your entire codebase. We sat down with Brian and Bill to dig into what it actually takes to train a model that developers trust…