Latent Space
Deep technical AI engineering content. The go-to podcast for AI builders.
187 episodes curated
Episodes
The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic
We have announced our first speaker, friend of the show Dylan Patel, and topic slates for Latent Space LIVE! at NeurIPS. Sign up for IRL/Livestream and to debate! We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show! The vibe shift we observed in July, in favor of Claude 3.5 Sonnet (first introduced in June), has been remarkably long-lived and persistent, surviving multiple subsequent updates of 4o, o1, and Gemini versions, for Anthropic’s Claude to end 2024 as the preferred model for AI Engineers and
Why Compound AI + Open Source will beat Closed AI
We have a full slate of upcoming events: AI Engineer London, AWS Re:Invent in Las Vegas, and now Latent Space LIVE! at NeurIPS in Vancouver and online. Sign up to join and speak! We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show! We try to stay close to the inference providers as part of our coverage, as our podcasts with Together AI and Replicate will attest. However, one of the most notable pull quotes from our very well received Braintrust episode was Ankur Goyal's opinion that open source model adoption h
Agents @ Work: Lindy.ai
Alessio will be at AWS re:Invent next week and hosting a casual coffee meetup on Wednesday, RSVP here! And subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups! We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show! If you've been following the AI agents space, you have heard of Lindy AI; while founder Flo Crivello is hesitant to call it "blowing up," when folks like Andrew Wilkinson start obsessing over your product, you're definitely onto something. In our latest episode, Flo
Agents @ Work: Dust.tt
We are recording our next big recap episode and taking questions! Submit questions and messages on Speakpipe here for a chance to appear on the show! Also subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups! In our first ever episode with Logan Kilpatrick we called out the two hottest LLM frameworks at the time: LangChain and Dust. We’ve had Harrison from LangChain on twice (as a guest and as a co-host), and we’ve now finally come full circle as Stanislas from Dust joined us in the studio. After stints at Oracle and Stripe, Stan had joined OpenAI to work on mathema
In the Arena: How LMSys changed LLM Benchmarking Forever
Apologies for lower audio quality; we lost recordings and had to use backup tracks. Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena (fka LMSYS), the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. To many folks, Arena Elo is now cited more often than MMLU scores, and the platform has attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite it over their own formal academic benchmarks: The Limits of Static Benchmarks We’ve don
How NotebookLM Was Made
If you’ve listened to the podcast for a while, you might have heard our ElevenLabs-powered AI co-host Charlie a few times. Text-to-speech has made amazing progress in the last 18 months, with OpenAI’s Advanced Voice Mode (aka “Her”) as a sneak peek of the future of AI interactions (see our “Building AGI in Real Time” recap). Yet, we had yet to see a real killer app for AI voice (not counting music). Today’s guests, Raiza Martin and Usama Bin Shafqat, are the lead PM and AI engineer behind the NotebookLM feature flag that gave us the first viral AI voice experience, the “Deep Dive” podcast:
Building the AI Engineer Nation — with Josephine Teo, Minister of Digital Development and Information, Singapore
Singapore's GovTech is hosting an AI CTF challenge with ~$15,000 in prizes, starting October 26th, open to both local and virtual hackers. It will be hosted on Dreadnode's Crucible platform; signup here! It is common to say if you want to work in AI, you should come to San Francisco. Not everyone can. Not everyone should. If you can only do meaningful AI work in one city, then AI has failed to generalize meaningfully. As non-Americans working in the US, we know what it’s like to see AI progress so rapidly here, and yet be at a loss for what our home countries can do. Through Latent Space we’
Building the Silicon Brain - with Drew Houston of Dropbox
CEOs of publicly traded companies are often in the news talking about their new AI initiatives, but few of them have built anything with it. Drew Houston from Dropbox is different; he has spent over 400 hours coding with LLMs in the last year and is now refocusing his 2,500+ employees around this new way of working, 17 years after founding the company.

Timestamps
00:00 Introductions
00:43 Drew's AI journey
04:14 Revalidating expectations of AI
08:23 Simulation in self-driving vs. knowledge work
12:14 Drew's AI Engineering setup
15:24 RAG vs. long context in AI models
18:06 From "FileGPT" to Dr
Production AI Engineering starts with Evals — with Ankur Goyal of Braintrust
We are in 🗽 NYC this Monday! Join the AI Eng NYC meetup, bring demos and vibes! It is a bit of a meme that the first thing developer tooling founders think to build in AI is all the non-AI operational stuff outside the AI. There are well over 60 funded LLM Ops startups, all hoping to solve the new observability, cost tracking, security, and reliability problems that come with putting LLMs in production, not to mention new LLM-oriented products from established incumbent ops/o11y players like Datadog and Weights & Biases. Two years into the current hype cycle, the early winners have tende
Building AGI in Real Time (OpenAI Dev Day 2024)
We all have fond memories of the first Dev Day in 2023, and the blip that followed soon after. As Ben Thompson has noted, this year’s DevDay took a quieter, more intimate tone. No Satya, no livestream, (slightly fewer people?). Instead of putting ChatGPT announcements in DevDay as in 2023, o1 was announced 2 weeks prior, and DevDay 2024 was reserved purely for developer-facing API announcements, primarily the Realtime API, Vision Finetuning, Prompt Caching, and Model Distillation. However, the larger venue and more spread-out schedule did allow a lot more hallway conversations with attendees
Language Agents: From Reasoning to Acting
OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip! Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city. OpenAI’s recent o1 release (and Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search
The Ultimate Guide to Prompting
Noah Hein from Latent Space University is finally launching with a free lightning course this Sunday for those new to AI Engineering. Tell a friend! Did you know there are >1,600 papers on arXiv just about prompting? Between shots, trees, chains, self-criticism, planning strategies, and all sorts of other weird names, it’s hard to keep up. Luckily for us, Sander Schulhoff and team read them all and put together The Prompt Report as the ultimate prompt engineering reference, which we’ll break down step-by-step in today’s episode. In 2022 swyx wrote “Why “Prompt Engineering” and “Generative AI”
From API to AGI: Structured Outputs, OpenAI API platform and O1 Q&A — with Michelle Pokrass & OpenAI Devrel + Strawberry team
Congrats to Damien on successfully running AI Engineer London! See our community page and the Latent Space Discord for all upcoming events. This podcast came together in a far more convoluted way than usual, but it resulted in a tight 2 hours covering the ENTIRE OpenAI product suite across ChatGPT-latest, GPT-4o and the new o1 models, and how they are delivered to AI Engineers in the API via the new Structured Output mode, Assistants API, client SDKs, upcoming Voice Mode API, Finetuning/Vision/Whisper/Batch/Admin/Audit APIs, and everything else you need to know to be up to speed in Sep
Efficiency is Coming: 3000x Faster, Cheaper, Better AI Inference from Hardware Improvements, Quantization, and Synthetic Data Distillation
AI Engineering is expanding! Join the first 🇬🇧 AI Engineer London meetup in Sept and get in touch for sponsoring the second 🗽 AI Engineer Summit in NYC this Dec!

The commoditization of intelligence takes on a few dimensions:

* Time to Open Model Equivalent: 15 months between GPT-4 and Llama 3.1 405B
* 10-100x CHEAPER/year: from $30/mtok for Claude 3 Opus to $3/mtok for L3-405B, and a 400x reduction in the frontier OpenAI model from 2022-2024

Notably, for personal use cases, both Gemini Flash and now Cerebras Inference offer 1m tokens/day inference free, causing the Open Model Red Wedding
Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone.

"How I Use AI" - A Pragmatic Approach

Carlini's blog post "How I Use AI" went viral for good reason. Instead of giving a sweeping opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it into 12 sections:

* To make applications
* As a tutor
* To get started
* To simplify code
* For boring tasks
* To automate tasks
* As an API refe