Latent Space

Deep technical AI engineering content. The go-to podcast for AI builders.

187 episodes curated

Episodes

Aug 22, 2024· 1h 5min

Is finetuning GPT4o worth it? — with Alistair Pullen, Cosine (Genie)

Betteridge's law says no: with seemingly infinite flavors of RAG, and >2 million token context + prompt caching from Anthropic/DeepMind/DeepSeek, it's reasonable to believe that "in-context learning is all you need". But then there's Cosine Genie, the first to make a huge bet using OpenAI's new GPT-4o fine-tuning for code at the largest scale it has ever been used externally, resulting in what is now the #1 coding agent in the world according to SWE-Bench Full, Lite, and Verified. SWE-Bench has been the most successful agent benchmark of the year, receiving honors at ICLR (our interview here)

Aug 16, 2024· 58 min

AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai

Disclaimer: We recorded this episode ~1.5 months ago, timed for the FastHTML release. It then got bottlenecked by the Llama 3.1, Winds of AI Winter, and SAM2 episodes, so we're a little late. Since then FastHTML was released, swyx is building an app in it for AINews, and Anthropic has also released their prompt caching API. Remember when Dylan Patel of SemiAnalysis coined the GPU Rich vs GPU Poor war? (If not, see our pod with him.) The idea was that if you're GPU poor you shouldn't waste your time trying to solve GPU rich problems (i.e. pre-training large models) and are better off working

Aug 7, 2024· 1h 3min

Segment Anything 2: Demo-first Model Development

Because of the nature of SAM, this is more video-heavy than usual. See our YouTube! Because vision is first among equals in multimodality, and yet SOTA vision language models are closed, we've always had an interest in learning what's next in vision. Our first viral episode was Segment Anything 1, and we have since covered LLaVA, IDEFICS, Adept, and Reka. But just like with Llama 3, FAIR holds a special place in our hearts as the New Kings of Open Source AI. The list of sequels better than the originals is usually very short, but SAM 2 delighted us by not only being a better image segme

Aug 2, 2024· 1h 55min

The Winds of AI Winter (Q2 Four Wars Recap) + ChatGPT Voice Mode Preview

Thank you for 1m downloads of the podcast and 2m readers of the Substack! 🎉 This is the audio discussion following The Winds of AI Winter essay, which also serves as a recap of Q2 2024 in AI viewed through the lens of our Four Wars framework. Enjoy! Full Video Discussion. Full show notes are here. Timestamps
* [00:00:00] Intro Song by Suno.ai
* [00:02:01] Swyx and Alessio in Singapore
* [00:05:49] GPU Rich vs Poors: Frontier Labs
* [00:06:35] GPU Rich Frontier Models: Claude 3.5
* [00:10:37] GPU Rich helping Poors: Llama 3.1: The Synthetic Data Model
* [00:15:41] GPU Rich helping Poors: Fronti

Jul 23, 2024· 1h 5min

Llama 2, 3 & 4: Synthetic Data, RLHF, Agents on the path to Open Source AGI

If you see this in time, join our emergency LLM paper club on the Llama 3 paper! For everyone else, join our special AI in Action club on the Latent Space Discord for a special feature with the Cursor cofounders on Composer, their newest coding agent! Today, Meta is officially releasing the largest and most capable open model to date, Llama3-405B, a dense transformer trained on 15T tokens that beats GPT-4 on all major benchmarks. The 8B and 70B models from the April Llama 3 release have also received serious spec bumps, warranting the new label of Llama 3.1. If you are curious about the inf

Jul 12, 2024· 58 min

Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge

The first AI Engineer World's Fair talks from OpenAI and Cognition are up! In our Benchmarks 101 episode back in April 2023, we covered the history of AI benchmarks, their shortcomings, and our hopes for better ones. Fast forward 1.5 years: the pace of model development has far exceeded the speed at which benchmarks are updated. Frontier labs are still using MMLU and HumanEval for model marketing, even though most models are reaching their natural plateau at a ~90% success rate (any higher and they're probably just memorizing/overfitting). From Benchmarks to Leaderboards: Outside of being stale,

Jul 5, 2024· 1h 44min

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Livestreams for the AI Engineer World's Fair (Multimodality ft. the new GPT-4o demo, GPUs and Inference (ft. Cognition/Devin), CodeGen, and Open Models tracks) are now live! Subscribe to @aidotEngineer to get notifications of the other workshops and tracks! It's easy to get desensitized to new models topping leaderboards every other week; however, the top of the LMSys leaderboard has typically been the exclusive domain of very large, very well funded model labs like OpenAI, Anthropic, Google, and Meta. OpenAI had about 600 people at the time of GPT-4, and Google Gemini had 950 co-authors

Jun 25, 2024· 1h 21min

State of the Art: Training >70B LLMs on 10,000 H100 clusters

It's return guest season here at Latent Space! We last talked to Kanjun in October and Jonathan in May (and in December, post Databricks acquisition). Imbue and Databricks are back for a rare treat: a double-header interview covering DBRX from Databricks and Imbue 70B, a new internal LLM that "outperforms GPT-4o" zero-shot on a range of reasoning and coding-related benchmarks and datasets, while using 7x less data than Llama 3 70B. While Imbue, being an agents company rather than a model provider, are not releasing their models today, they are releasing almost everything else: * Cleaned-up

Jun 25, 2024· 49 min

[High Agency] AI Engineer World's Fair Preview

The World's Fair is officially sold out! Thanks for all the support, and stay tuned for recaps of all the great goings-on in this very special celebration of the AI Engineer! Longtime listeners will remember the fan favorite Raza Habib, CEO of HumanLoop, on the pod. Well, he's caught the podcasting bug and is now flipping the tables on swyx! Subscribe to High Agency wherever the finest Artificial Intelligence podcasts are sold. High Agency Pod Description: In this episode, I chatted with Shawn Wang about his upcoming AI engineering conference and what an AI engineer really is. It's been a year si

Jun 21, 2024· 1h 3min

How To Hire AI Engineers — with James Brady & Adam Wiggins of Elicit

Editor's note: One of the top reasons we have hundreds of companies and thousands of AI Engineers joining the World's Fair next week is, apart from discussing technology and being present for the big launches planned, to hire and be hired! Listeners loved our previous Elicit episode and were glad to welcome two more members of Elicit back for a guest post (and bonus podcast) on how they think through hiring. Don't miss their AI engineer job description and template, which you can use to create your own hiring plan! How to Hire AI Engineers. James Brady, Head of Engineering @ Elicit (ex-Sprin

Jun 11, 2024· 54 min

How AI is eating Finance — with Mike Conover of Brightwave

In April 2023 we released an episode named "Mapping the future of *truly* open source models" to talk about Dolly, the first open, commercial LLM. Mike was leading the OSS models team at Databricks at the time. Today, Mike is back on the podcast to give us the "one year later" update on the evolution of large language models and how he's been using them to build Brightwave, an AI research assistant for investment professionals. Today they are announcing a $6M seed round (led by Alessio and Decibel!) and sharing some of the learnings from serving customers with >$120B of assets under mana

Jun 10, 2024· 4h 29min

ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Our second wave of speakers for the AI Engineer World's Fair has been announced! The conference sold out of Platinum/Gold/Silver sponsors and Early Bird tickets! See our Microsoft episode for more info and buy now with code LATENTSPACE. This episode is straightforwardly a part 2 to our ICLR 2024 Part 1 episode, so without further ado, we'll just get right on with it! Timestamps
* [00:03:43] Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger
* [00:07:44] WebArena
* [00:18:45] Sotopia
* [00:24:00] Performance Improving Code Edits
* [00:29:39] Op

May 30, 2024· 57 min

How to train a Million Context LLM — with Mark Huang of Gradient.ai

AI Engineer World's Fair in SF! Prices go up soon. Note that there are 4 tracks per day and dozens of workshops/expo sessions; the livestream will air the most stacked speaker list/AI expo floor of 2024. Apply for free/discounted Diversity Program and Scholarship tickets here. We hope to make this the definitive technical conference for ALL AI engineers. Exactly a year ago, we declared the Beginning of Context=Infinity when Mosaic made their breakthrough training an 84k token context MPT-7B. A Brief History of Long Context: Of course, right when we released that episode, Anthropic fired the sta

May 27, 2024· 3h 38min

ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Durk Kingma, Christian Szegedy, Ilya Sutskever

Speakers for the AI Engineer World's Fair have been announced! See our Microsoft episode for more info and buy now with code LATENTSPACE — we've been studying the best ML research conferences so we can make the best AI industry conf! Note that this year there are 4 main tracks per day and dozens of workshops/expo sessions; the free livestream will air much less than half of the content this time. Apply for free/discounted Diversity Program and Scholarship tickets here. We hope to make this the definitive technical conference for ALL AI engineers. UPDATE: This is a 2-part episode — see Part 2 here

May 16, 2024· 54 min

Emulating Humans with NSFW Chatbots - with Jesse Silver

Disclaimer: today's episode touches on NSFW topics. There's no graphic content or explicit language, but we wouldn't recommend blasting this in work environments. Product website: https://usewhisper.me/ For over 20 years it's been an open secret that porn drives many new consumer technology innovations, from VHS and pay-per-view to VR and the Internet. It's been no different in AI: many of the most elite Stable Diffusion and Llama enjoyers and merging/prompting/PEFT techniques were born in the depths of subreddits and 4chan boards affectionately described by a friend of the pod as The Waifu Res