Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

Technology Podcasts

Location:

United States

Description:

The podcast by and for AI Engineers! In 2023, over 1 million visitors came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to your first introduction to the tech you'll be using in the next 3 months. We break news and publish exclusive interviews with OpenAI, tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Language:

English


Episodes

WebSim, WorldSim, and The Summer of Simulative AI — with Joscha Bach of Liquid AI, Karan Malhotra of Nous Research, Rob Haisfield of WebSim.ai

4/27/2024
We are 200 people over our 300-person venue capacity for AI UX 2024, but you can subscribe to our YouTube for the video recaps. Our next event, and largest EVER, is the AI Engineer World’s Fair. See you there! Parental advisory: adult language is used in the first 10 mins of this podcast.

Any accounting of Generative AI that ends with RAG as its “final form” is seriously lacking in imagination and missing out on its full potential. While AI generation is very good for “spicy autocomplete” and “reasoning and retrieval with in-context learning”, there’s a lot of untapped potential for simulative AI in exploring the latent space of multiverses adjacent to ours.

GANs

Many research scientists credit the 2017 Transformer for the modern foundation model revolution, but for many artists the origin of “generative AI” traces a little further back to the Generative Adversarial Networks proposed by Ian Goodfellow in 2014, spawning an army of variants and Cats and People that do not exist. We can directly visualize the quality improvement in the decade since.

GPT-2

Of course, text-generative AI later started claiming headlines, famously deemed “too dangerous to release” in 2019. AI Dungeon was the first to put GPT-2 to a purely creative use, replacing the human dungeon masters of the DnD/MUD games of yore. More recent gamelike work like the Generative Agents (aka Smallville) paper keeps exploring the potential of simulative AI for game experiences.

ChatGPT

Not long after ChatGPT broke the Internet, one of the most fascinating generative AI finds was Jonas Degrave (of DeepMind!)’s Building A Virtual Machine Inside ChatGPT. The open-ended interactivity of ChatGPT and all its successors enabled an “open world” type of simulation where “hallucination” is a feature and a gift to dance with, rather than a nasty bug to be stamped out. However, further updates to ChatGPT seemed to “nerf” the model’s ability to perform creative simulations, particularly with the deprecation of the `completion` mode of the API in favor of `chatCompletion`.

WorldSim (https://worldsim.nousresearch.com/)

It is with this context that we explain WorldSim and WebSim. We recommend you watch the WorldSim demo video on our YouTube for the best context, but basically, for developers: it is a Claude prompt that acts as a portal into another world of your own choosing, which you navigate with bash commands that you make up. The live video demo was highly enjoyable. Why Claude? Hints from Amanda Askell on the Claude 3 system prompt gave some inspiration, and subsequent discoveries that Claude 3 is “less nerfed” than GPT-4 Turbo turned the growing Simulative AI community into Anthropic stans.

WebSim (https://websim.ai/)

This was a one-day hackathon project inspired by WorldSim that should have won. In short, you type in a URL that you made up, and Claude 3 does its level best to generate a webpage that doesn’t exist but would fit your URL. All form POST requests are intercepted and responded to, and all links lead to even more webpages that don’t exist, generated as you click them. All pages are cacheable, modifiable and regeneratable - see WebSim for Beginners and the Advanced Guide. In the demo I saw, we were able to “log in” to a simulation of Elon Musk’s Gmail account and browse examples of emails that would have been in that universe’s Elon’s inbox. It was hilarious and impressive even back then. Since then, the project has become even more impressive, with both Siqi Chen and Dylan Field singing its praises.

Joscha Bach

Joscha actually spoke at the WebSim Hyperstition Night this week, so we took the opportunity to get his take on Simulative AI, as well as a roundup of all his other AI hot takes, for his first appearance on Latent Space. You can see it together with the full 2hr uncut demos of WorldSim and WebSim on YouTube!

Timestamps * [00:01:59] WorldSim at Replicate HQ * [00:11:03] WebSim at AGI House SF * [00:22:02] Joscha Bach at Hyperstition...
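For readers who want to try the idea themselves, here is a minimal sketch of the WorldSim/WebSim pattern using Anthropic's Python SDK: a system prompt that asks Claude to role-play a simulator, plus a loop that feeds it made-up commands. The actual WorldSim prompt is not public, so the prompt text below is an illustrative stand-in.

```python
# Minimal sketch of the simulator pattern: Claude role-plays a terminal in a
# made-up universe and stays in character for whatever command you type.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SIMULATOR_PROMPT = (
    "You are a command-line world simulator. The user is at a terminal inside "
    "a universe of their choosing. Respond only with plausible terminal output "
    "for whatever command they type; never break character."
)

history = []
while True:
    command = input("worldsim> ")  # e.g. `cd /multiverse/earth-2 && ls`
    history.append({"role": "user", "content": command})
    reply = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        system=SIMULATOR_PROMPT,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    print(text)
```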

Duration:00:53:47

High Agency Pydantic > VC Backed Frameworks — with Jason Liu of Instructor

4/19/2024
We are reuniting for the 2nd AI UX demo day in SF on Apr 28. Sign up to demo here! And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements!

About a year ago there was a lot of buzz around prompt engineering techniques to force structured output. Our friend Simon Willison tweeted a bunch of tips and tricks, but the most iconic one is Riley Goodside making it a matter of life or death: Guardrails (friend of the pod and AI Engineer speaker), Marvin (AI Engineer speaker), and jsonformer had also come out at the time. In June 2023, Jason Liu (today’s guest!) open sourced his “OpenAI Function Call and Pydantic Integration Module”, now known as Instructor, which quickly turned prompt engineering black magic into a clean, developer-friendly SDK. A few months later, model providers started to add function calling capabilities to their APIs as well as structured outputs support like “JSON Mode”, which was announced at OpenAI Dev Day (see recap here). In just a handful of months, we went from threatening to kill grandmas to first-class support from the research labs. And yet, Instructor was still downloaded 150,000 times last month. Why?

What Instructor looks like

Instructor patches your LLM provider SDKs to offer a new response_model option to which you can pass a structure defined in Pydantic (see the sketch below). It currently supports OpenAI, Anthropic, Cohere, and a long tail of models through LiteLLM.

What Instructor is for

There are three core use cases for Instructor:
* Extracting structured data: taking an input like an image of a receipt and extracting structured data from it, such as a list of checkout items with their prices, fees, and coupon codes.
* Extracting graphs: identifying nodes and edges in a given input to extract complex entities and their relationships. For example, extracting relationships between characters in a story or dependencies between tasks.
* Query understanding: defining a schema for an API call and using a language model to resolve a request into a more complex query that an embedding search alone could not handle. For example, creating date intervals from queries like “what was the latest thing that happened this week?” to then pass onto a RAG system or similar.
Jason called all these different ways of getting data from LLMs “typed responses”: taking strings and turning them into data structures.

Structured outputs as a planning tool

The first wave of agents was all about open-ended iteration and planning, with projects like AutoGPT and BabyAGI. Models would come up with a possible list of steps, and start going down the list one by one. It’s really easy for them to go down the wrong branch, or get stuck on a single step with no way to intervene. What if these planning steps were returned to us as DAGs using structured output, and then managed as workflows? This also makes it easier to train models on how to create these plans, as they are much more structured than a bullet point list. Once you have this structure, each piece can be modified individually by different specialized models. You can read some of Jason’s experiments here. While LLMs will keep improving (Llama 3 just got released as we write this), having a consistent structure for the output will make it a lot easier to swap models in and out. Jason’s overall message on how we can move from ReAct loops to more controllable Agent workflows mirrors the “Process” discussion from our Elicit episode.

Watch the talk

As a bonus, here’s Jason’s talk from last year’s AI Engineer Summit. He’ll also be a speaker at this year’s AI Engineer World’s Fair!

Timestamps * [00:00:00] Introductions * [00:02:23] Early experiments with Generative AI at StitchFix * [00:08:11] Design philosophy behind the Instructor library * [00:11:12] JSON Mode vs Function Calling * [00:12:30] Single vs parallel function calling * [00:14:00] How many functions is too many? * [00:17:39] How to evaluate function calling *...
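As a concrete illustration of the response_model pattern mentioned above, here is a rough sketch of Instructor with Pydantic; exact method names vary across Instructor versions, so treat it as illustrative rather than canonical.

```python
# Rough sketch of the Instructor pattern: patch the OpenAI client, define the
# shape you want in Pydantic, and pass it as response_model.
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel


class LineItem(BaseModel):
    name: str
    price: float


class Receipt(BaseModel):
    items: List[LineItem]
    total: float


client = instructor.patch(OpenAI())  # adds the response_model kwarg

receipt = client.chat.completions.create(
    model="gpt-4-turbo",
    response_model=Receipt,
    messages=[{"role": "user", "content": "3 coffees at $4.50 each, total $13.50"}],
)
print(receipt.items[0].name, receipt.total)  # validated, typed Python objects
```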

Duration:00:52:20

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit

4/11/2024
Maggie, Linus, Geoffrey, and the LS crew are reuniting for our second annual AI UX demo day in SF on Apr 28. Sign up to demo here! And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements!

It’s become fashionable for many AI startups to project themselves as “the next Google” - while the search engine is so 2000s, both Perplexity and Exa referred to themselves as a “research engine” or “answer engine” in our NeurIPS pod. However these searches tend to be relatively shallow, and it is challenging to zoom up and down the ladders of abstraction to garner insights. For serious researchers, this level of simple one-off search will not cut it. We’ve commented in our Jan 2024 Recap that Flow Engineering (simply: multi-turn processes over many-shot single prompts) seems to offer far more performance, control and reliability for a given cost budget. Our experiments with Devin and our understanding of what the new Elicit Notebooks offer give us a glimpse into the potential for very deep, open-ended, thoughtful human-AI collaboration at scale.

It starts with prompts

When ChatGPT exploded in popularity in November 2022, everyone was turned into a prompt engineer. While generative models were good at “vibe based” outcomes (tell me a joke, write a poem, etc) with basic prompts, they struggled with more complex questions, especially in symbolic fields like math, logic, etc. Two of the most important “tricks” that people picked up on were (see the sketch after this list):
* The Chain of Thought prompting strategy proposed by Wei et al in the “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” paper. Rather than doing traditional few-shot prompting with just questions and answers, adding the thinking process that led to the answer resulted in much better outcomes.
* Adding “Let’s think step by step” to the prompt as a way to boost zero-shot reasoning, which was popularized by Kojima et al in the Large Language Models are Zero-Shot Reasoners paper from NeurIPS 2022. This bumped accuracy from 17% to 79% compared to zero-shot.
Nowadays, prompts include everything from promises of monetary rewards to… whatever the Nous folks are doing to turn a model into a world simulator. At the end of the day, the goal of prompt engineering is increasing accuracy, structure, and repeatability in the generation of a model.

From prompts to agents

As prompt engineering got more and more popular, agents (see “The Anatomy of Autonomy”) took over Twitter with cool demos and AutoGPT became the fastest growing repo in GitHub history. The thing about AutoGPT that fascinated people was the ability to simply put in an objective without worrying about explaining HOW to achieve it, or having to write very sophisticated prompts. The system would create an execution plan on its own, and then loop through each task. The problem with open-ended agents like AutoGPT is that 1) it’s hard to replicate the same workflow over and over again, and 2) there isn’t a way to hard-code specific steps that the agent should take without actually coding them yourself, which isn’t what most people want from a product.

From agents to products

Prompt engineering and open-ended agents were great in the experimentation phase, but this year more and more of these workflows are starting to become polished products. Today’s guests are Andreas Stuhlmüller and Jungwon Byun of Elicit (previously Ought), an AI research assistant that they think of as “the best place to understand what is known”. Ought was a non-profit, but last September, Elicit spun off into a PBC with a $9m seed round. It is hard to quantify how much a workflow can be improved, but Elicit boasts some impressive numbers for research assistants: just four months after launch, Elicit crossed $1M ARR, which shows how much interest there is for AI products that just work. One of the main takeaways we had from the episode is how teams should focus on supervising the process, not the output. Their philosophy...
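To make the two prompting tricks above concrete, here is a toy illustration using the OpenAI Python SDK; the exemplars are paraphrased from the Wei et al. and Kojima et al. papers, not taken from the episode.

```python
# Toy comparison of few-shot chain-of-thought vs zero-shot "let's think step by step".
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "A juggler has 16 balls. Half are golf balls, and half of the golf balls "
    "are blue. How many blue golf balls are there?"
)

# 1) Few-shot chain of thought: the exemplar includes the reasoning, not just the answer.
few_shot_cot = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many does he have?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    f"Q: {QUESTION}\nA:"
)

# 2) Zero-shot chain of thought: just append the magic phrase.
zero_shot_cot = f"Q: {QUESTION}\nA: Let's think step by step."

for name, prompt in [("few-shot CoT", few_shot_cot), ("zero-shot CoT", zero_shot_cot)]:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {name} ---\n{resp.choices[0].message.content}\n")
```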

Duration:00:56:20

Latent Space Chats: NLW (Four Wars, GPT5), Josh Albrecht/Ali Rohde (TNAI), Dylan Patel/Semianalysis (Groq), Milind Naphade (Nvidia GTC), Personal AI (ft. Harrison Chase — LangFriend/LangMem)

4/6/2024
Our next 2 big events are AI UX and the World’s Fair. Join and apply to speak/sponsor! Due to timing issues we didn’t have an interview episode to share with you this week, but not to worry: we have more than enough “weekend special” content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI. Enjoy!

AI Breakdown

The indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape, and a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta’s AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents.

Thursday Nights in AI

We’re also including swyx’s interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A.

Dylan Patel on Groq

We hosted a private event with Dylan Patel of SemiAnalysis (our last pod here). Not all of it could be released, so we just talked about our Groq estimates.

Milind Naphade - Capital One

Following conversations at NeurIPS and Nvidia GTC, and ahead of the World’s Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One. We covered:
* Milind’s learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special
If you’d like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.

Personal AI Meetup

It all started with a meme: within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together the world’s first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.

Timestamps * [00:01:13] AI Breakdown Part 1 * [00:02:20] Four Wars * [00:13:45] Sora * [00:15:12] Suno * [00:16:34] The GPT-4 Class Landscape * [00:17:03] Data War: Reddit x Google * [00:21:53] Gemini 1.5 vs Claude 3 * [00:26:58] AI Breakdown Part 2 * [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4 * [00:31:11] Open Source Models - Mistral, Grok * [00:34:13] Apple MM1 * [00:37:33] Meta's $800b AI rebrand * [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents * [00:47:28] Adept episode - Screen Multimodality * [00:48:54] Top Model Research from January Recap * [00:53:08] AI Wearables * [00:57:26] Groq vs Nvidia month - GPU Chip War * [01:00:31] Disagreements * [01:02:08] Summer 2024 Predictions * [01:04:18] Thursday Nights in AI - swyx * [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show * [01:34:58] Groq

Transcript

[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co-host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI. In today's episode, we're collating some of Swyx and Alessio's recent appearances, all in one place for you to find.

[00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, Swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, Thursday Eye, and Chinatalk, all of which you can find in...

Duration:02:02:20

Presenting the AI Engineer World's Fair — with Sam Schillace, Deputy CTO of Microsoft

3/29/2024
TL;DR: You can now buy tickets, apply to speak, or join the expo for the biggest AI Engineer event of 2024. We’re gathering *everyone* you want to meet - see you this June.

In last year’s The Rise of the AI Engineer we put our money where our mouth was and announced the AI Engineer Summit, which fortunately went well: with ~500 live attendees and over ~500k views online, the first iteration of the AI Engineer industry affair seemed to be well received. Competing in an expensive city with 3 other more established AI conferences in the fall calendar, we broke through in terms of in-person experience and online impact. So at the end of Day 2 we announced our second event: the AI Engineer World’s Fair. The new website is now live, together with our new presenting sponsor: we were delighted to invite both Ben Dunphy, co-organizer of the conference, and Sam Schillace, the deputy CTO of Microsoft who wrote some of the first Laws of AI Engineering while working with early releases of GPT-4, on the pod to talk about the conference and how Microsoft is all-in on AI Engineering.

Rise of the Planet of the AI Engineer

Since the first AI Engineer piece, AI Engineering has exploded, and the title has been adopted across OpenAI, Meta, IBM, and many, many other companies. One year on, it is clear that AI Engineering is not only in full swing, but is an emerging global industry that is successfully bridging the gap:
* between research and product,
* between general-purpose foundation models and in-context use-cases,
* and between the flashy weekend MVP (still great!) and the reliable, rigorously evaluated AI product deployed at massive scale, assisting hundreds of employees and driving millions in profit.
The greatly increased scope of the 2024 AI Engineer World’s Fair (more stages, more talks, more speakers, more attendees, more expo…) helps us reflect the growth of AI Engineering in three major dimensions:
* Global Representation: the 2023 Summit was a mostly-American affair. This year we plan to have speakers from top AI companies across five continents, and explore the vast diversity of approaches to AI across global contexts.
* Topic Coverage: in 2023, the Summit focused on the initial questions that the community wrestled with - LLM frameworks, RAG and Vector Databases, Code Copilots and AI Agents. Those are evergreen problems that just got deeper. This year the AI Engineering field has also embraced new core disciplines with more explicit focus on Multimodality, Evals and Ops, Open Source Models and GPU/Inference Hardware providers.
* Maturity/Production-readiness: two new tracks are dedicated to AI in the Enterprise, government, education, finance, and other highly regulated industries, or AI deployed at larger scale: AI in the Fortune 500, covering at-scale production deployments of AI, and AI Leadership, a closed-door side event for technical AI leaders to discuss engineering and product leadership challenges as VPs and Heads of AI in their respective orgs.
We hope you will join Microsoft and the rest of us as either speaker, exhibitor, or attendee, in San Francisco this June. Contact us with any enquiries that don’t fall into the categories mentioned below.
Show Notes * Ben Dunphy * 2023 Summit * GitHub confirmed $100m ARR on stage * History of World’s Fairs * Sam Schillace * Writely on Acquired.fm * Early Lessons From GPT-4: The Schillace Laws * Semantic Kernel * Sam on Kevin Scott (Microsoft CTO)’s podcast in 2022 * AI Engineer World’s Fair (SF, Jun 25-27) * Buy Super Early Bird tickets (Listeners can use LATENTSPACE for $100 off any ticket until April 8, or use GROUP if coming in 4 or more) * Submit talks and workshops for Speaker CFPs (by April 8) * Enquire about Expo Sponsorship (Asap.. selling fast) Timestamps * [00:00:16] Intro * [00:01:04] 2023 AI Engineer Summit * [00:03:11] Vendor Neutral * [00:05:33] 2024 AIE World's Fair * [00:07:34] AIE World's Fair: 9 Tracks * [00:08:58]...

Duration:00:42:58

Why Google failed to make GPT-3 + why Multimodal Agents are the path to AGI — with David Luan of Adept

3/22/2024
Our next SF event is AI UX 2024 - let’s see the new frontier for UX since last year! Last call: we are recording a preview of the AI Engineer World’s Fair with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have! Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an “ex-technical co-founder type”. Reach out to him for more! David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, that resulted in Microsoft’s initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer" — his definition of useful AGI. Why Google *couldn’t* make GPT-3 While we wanted to discuss Adept, we couldn’t talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room. It’s often asked how Google had such a huge lead in 2017 with Vaswani et al creating the Transformer and Noam Shazeer predicting trillion-parameter models and yet it was David’s team at OpenAI who ended up making GPT 1/2/3. David has some interesting answers: “So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too… You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do. And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy end chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.” Cloning HGI for AGI Human intelligence got to where it is today through evolution. 
Some argue that to get to AGI, we will approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra’s Biological Anchors report: The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode). David argues that there’s a shortcut. We can bootstrap from existing intelligence. “Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get...

Duration:00:41:52

Making Transformers Sing - with Mikey Shulman of Suno

3/14/2024
Giving computers a voice has always been at the center of sci-fi movies; “I’m sorry Dave, I’m afraid I can’t do that” wouldn’t hit as hard if it just appeared on screen as a terminal output, after all. The first electronic speech synthesizer, the Voder, was built at Bell Labs 85 years ago (1939!), and it’s…. something: We will not cover the history of Text To Speech (TTS), but the evolution of the underlying architecture has generally been Formant Synthesis → Concatenative Synthesis → Neural Networks. Nowadays, state of the art TTS is just one API call away with models like Eleven Labs and OpenAI’s TTS, or products like Descript. Latency is minimal, they have very good intonation, and can mimic a variety of accents. You can hack together your own voice AI therapist in a day! But once you have a computer that can communicate via voice, what comes next? Singing🎶 of course! From Barking 🐶 to Singing 🎤 Today’s guest is Suno’s CEO and co-founder Mikey Shulman. He and his three co-founders, Georg, Martin, and Keenan, previously worked together at Kensho. One of their projects was financially-focused speech recognition (think earnings calls, etc), but all four of them happened to be musicians and audiophiles. They started playing around with text to speech + AI + audio generation and eventually left Kensho to work on it full time. A lot of people when we started a company told us to focus on speech. If we wanted to build an audio company, everyone said, speech is a bigger market. But I think there's something about music that's just so human and you almost couldn't prevent us from doing it. Like we just couldn't keep ourselves from building music models and playing with them because it was so much fun. Their first big product was Bark, the first open source transformer-based “text-to-audio” model (architecturally inspired by Karpathy’s NanoGPT) that went from 0 to ~19,000 Github stars in a month. At the time they felt like audio was years behind text and image as a generation modality; unlike its predecessors, Bark could not only generate speech, but also music and sound effects like crying, laughing, sighing, etc. You can find a few examples here. The main limitation they saw was text to speech training data being extremely limited. So what they did instead is build a new type of foundation model from scratch, trained on audio, and then tweak it to do text to speech. Turning audio into tokens to do self-supervised learning was the most important innovation. Unlike TTS models which are very narrow (and often sound unnatural), Bark was trained on real audio of real people from broad contexts, which made it harder to output unnatural sounding speech. As Bark got popular, more and more people started using it to generate music and it became clear that their architecture would work to generate music that people enjoyed, even though it might not be "on the AGI path” of other labs: Everybody is so focused on LLMs, for good reason, and information processing and intelligence there. And I think it's way too easy to forget that there's this whole other side of things that makes people feel, and maybe that market is smaller, but it makes people feel and it makes us really happy. Suno bursts on the scene In December 2023, Suno went viral with a gorgeous new website and launch tweet: And rave reviews: Music is core to our culture, but very few people are able to create it; Mikey and team want to make everyone an active participant in music making, not just a listener. A “Midjourney of Music”, if you like. 
We definitely had a lot of fun playing with Suno to generate all sort of Latent Space jingles and songs; the product is live at suno.ai if you want to get in the studio yourself! If Nas joined Latent Space instead of The Firm: 182B models > Blink-182 The soundtrack of the post-scarcity Latent Space ranch Scaling with Modal Given the December launch, scaling up for the Christmas rush was a major concern. This will be a...
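The "turn audio into tokens, then do next-token prediction" idea Mikey describes above can be sketched in a few lines of PyTorch. This toy bins waveform samples into 256 levels purely for illustration; real systems like Bark use a learned neural codec for tokenization, and this is not Suno's actual architecture.

```python
# Toy sketch: discretize audio, then train an ordinary next-token predictor over it.
import torch
import torch.nn as nn

VOCAB = 256  # toy 8-bit "codebook"; real models use a learned neural audio codec

def tokenize(audio: torch.Tensor) -> torch.Tensor:
    """Crudely map a float waveform in [-1, 1] to integer tokens in [0, 255]."""
    return ((audio.clamp(-1, 1) + 1) / 2 * (VOCAB - 1)).long()

class TinyAudioLM(nn.Module):
    """A miniature causal transformer over audio tokens."""
    def __init__(self, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        T = tokens.shape[1]
        # Causal mask: each position may only attend to earlier positions.
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        hidden = self.backbone(self.embed(tokens), mask=causal_mask)
        return self.head(hidden)

waveform = torch.rand(1, 512) * 2 - 1       # stand-in for 512 real audio samples
tokens = tokenize(waveform)                 # (1, 512) integer tokens
model = TinyAudioLM()
logits = model(tokens[:, :-1])              # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
print(f"next-audio-token loss: {loss.item():.3f}")
```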

Duration:00:52:51

Top 5 Research Trends + OpenAI Sora, Google Gemini, Groq Math (Jan-Feb 2024 Audio Recap) + Latent Space Anniversary with Lindy.ai, RWKV, Pixee, Julius.ai, Listener Q&A!

3/9/2024
We will be recording a preview of the AI Engineer World’s Fair soon with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have! Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an ex-technical co-founder type (can MVP products end to end, comfortable with ambiguous prod requirements, etc). Reach out to him for more! Thanks for all the love on the Four Wars episode! We’re excited to develop this new “swyx & Alessio rapid-fire thru a bunch of things” format with you, and feedback is welcome. Jan 2024 Recap The first half of this monthly audio recap pod goes over our highlights from the Jan Recap, which is mainly focused on notable research trends we saw in Jan 2024: Feb 2024 Recap The second half catches you up on everything that was topical in Feb, including: * OpenAI Sora - does it have a world model? Yann LeCun vs Jim Fan * Google Gemini Pro 1.5 - 1m Long Context, Video Understanding * Groq offering Mixtral at 500 tok/s at $0.27 per million toks (swyx vs dylan math) * The {Gemini | Meta | Copilot} Alignment Crisis (Sydney is back!) * Grimes’ poetic take: Art for no one, by no one * F*** you, show me the prompt Latent Space Anniversary Please also read Alessio’s longform reflections on One Year of Latent Space! We launched the podcast 1 year ago with Logan from OpenAI: and also held an incredible demo day that got covered in The Information: Over 750k downloads later, having established ourselves as the top AI Engineering podcast, reaching #10 in the US Tech podcast charts, and crossing 1 million unique readers on Substack, for our first anniversary we held Latent Space Final Frontiers, where 10 handpicked teams, including Lindy.ai and Julius.ai, competed for prizes judged by technical AI leaders from (former guest!) LlamaIndex, Replit, GitHub, AMD, Meta, and Lemurian Labs. The winners were Pixee and RWKV (that’s Eugene from our pod!): And finally, your cohosts got cake! We also captured spot interviews with 4 listeners who kindly shared their experience of Latent Space, everywhere from Hungary to Australia to China: * Balázs Némethi * Sylvia Tong * RJ Honicky * Jan Zheng Our birthday wishes for the super loyal fans reading this - tag @latentspacepod on a Tweet or comment on a @LatentSpaceTV video telling us what you liked or learned from a pod that stays with you to this day, and share us with a friend! As always, feedback is welcome. Timestamps * [00:03:02] Top Five LLM Directions * [00:03:33] Direction 1: Long Inference (Planning, Search, AlphaGeometry, Flow Engineering) * [00:11:42] Direction 2: Synthetic Data (WRAP, SPIN) * [00:17:20] Wildcard: Multi-Epoch Training (OLMo, Datablations) * [00:19:43] Direction 3: Alt. Architectures (Mamba, RWKV, RingAttention, Diffusion Transformers) * [00:23:33] Wildcards: Text Diffusion, RALM/Retro * [00:25:00] Direction 4: Mixture of Experts (DeepSeekMoE, Samba-1) * [00:28:26] Wildcard: Model Merging (mergekit) * [00:29:51] Direction 5: Online LLMs (Gemini Pro, Exa) * [00:33:18] OpenAI Sora and why everyone underestimated videogen * [00:36:18] Does Sora have a World Model? 
Yann LeCun vs Jim Fan * [00:42:33] Groq Math * [00:47:37] Analyzing Gemini's 1m Context, Reddit deal, Imagegen politics, Gemma via the Four Wars * [00:55:42] The Alignment Crisis - Gemini, Meta, Sydney is back at Copilot, Grimes' take * [00:58:39] F*** you, show me the prompt * [01:02:43] Send us your suggestions pls * [01:04:50] Latent Space Anniversary * [01:04:50] Lindy.ai - Agent Platform * [01:06:40] RWKV - Beyond Transformers * [01:15:00] Pixee - Automated Security * [01:19:30] Julius AI - Competing with Code Interpreter * [01:25:03] Latent Space Listeners * [01:25:03] Listener 1 - Balázs Némethi (Hungary, Latent Space Paper Club * [01:27:47] Listener 2 - Sylvia Tong (Sora/Jim Fan/EntreConnect) * [01:31:23] Listener 3 - RJ (Developers building Community & Content) * [01:39:25]...
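For readers curious about the "Groq Math" line item above, here is the back-of-envelope arithmetic on just the two numbers quoted (500 tok/s and $0.27 per million tokens); the actual swyx-vs-Dylan debate also involves chip-count and hardware-cost assumptions that are not reproduced here.

```python
# Revenue implied by a single, fully utilized Mixtral stream at the quoted numbers.
TOKENS_PER_SECOND = 500
PRICE_PER_MILLION_TOKENS = 0.27  # USD

tokens_per_hour = TOKENS_PER_SECOND * 3600                      # 1,800,000 tokens
dollars_per_hour = tokens_per_hour / 1e6 * PRICE_PER_MILLION_TOKENS
print(f"{tokens_per_hour:,} tokens/hour -> ${dollars_per_hour:.2f}/hour per busy stream")
# ~$0.49/hour, i.e. roughly $11.66/day, before any hardware or serving costs
```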

Duration:01:48:52

Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI

3/6/2024
Speaker CFPs and Sponsor Guides are now available for AIE World’s Fair — join us on June 25-27 for the biggest AI Engineer conference of 2024! Soumith Chintala needs no introduction in the ML world — his insights are incredibly accessible across Twitter, LinkedIn, podcasts, and conference talks (in this pod we’ll assume you’ll have caught up on the History of PyTorch pod from last year and cover different topics). He’s well known as the creator of PyTorch, but he's more broadly the Engineering Lead on AI Infra, PyTorch, and Generative AI at Meta. Soumith was one of the earliest supporters of Latent Space (and more recently AI News), and we were overjoyed to catch up with him on his latest SF visit for a braindump of the latest AI topics, reactions to some of our past guests, and why Open Source AI is personally so important to him. Life in the GPU-Rich Lane Back in January, Zuck went on Instagram to announce their GPU wealth: by the end of 2024, Meta will have 350k H100s. By adding all their GPU clusters, you'd get to 600k H100-equivalents of compute. At FP16 precision, that's ~1,200,000 PFLOPS. If we used George Hotz's (previous guest!) "Person of Compute" measure, Meta now has 60k humans of compute in their clusters. Occasionally we get glimpses into the GPU-rich life; on a recent ThursdAI chat, swyx prompted PaLM tech lead Yi Tay to write down what he missed most from Google, and he commented that UL2 20B was trained by accidentally leaving the training job running for a month, because hardware failures are so rare in Google. Meta AI’s Epic LLM Run Before Llama broke the internet, Meta released an open source LLM in May 2022, OPT-175B, which was notable for how “open” it was - right down to the logbook! They used only 16 NVIDIA V100 GPUs and Soumith agrees that, with hindsight, it was likely under-trained for its parameter size. In Feb 2023 (pre Latent Space pod), Llama was released, with a 7B version trained on 1T tokens alongside 65B and 33B versions trained on 1.4T tokens. The Llama authors included Guillaume Lample and Timothée Lacroix, who went on to start Mistral. July 2023 was Llama2 time (which we covered!): 3 model sizes, 7B, 13B, and 70B, all trained on 2T tokens. The three models accounted for a grand total of 3,311,616 GPU hours for all pre-training work. CodeLlama followed shortly after, a fine-tune of Llama2 specifically focused on code generation use cases. The family had models in the 7B, 13B, 34B, and 70B size, all trained with 500B extra tokens of code and code-related data, except for 70B which is trained on 1T. All of this on top of other open sourced models like Segment Anything (one of our early hits!), Detectron, Detectron 2, DensePose, and Seamless, and in one year, Meta transformed from a company people made fun of for its “metaverse” investments to one of the key players in the AI landscape and its stock has almost tripled since (about $830B in market value created in the past year). Why Open Source AI The obvious question is why Meta would spend hundreds of millions on its AI efforts and then release them for free. Zuck has addressed this in public statements: But for Soumith, the motivation is even more personal: “I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. Like, I grew up in India… And knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized. 
And that ended up helping me learn quicker and faster for like zero dollars. And I think that was a strong reason why I ended up where I am. So like that, like the open source side of things, I always push regardless of like what I get paid for, like I think I would do that as a passion project on the side… …I think at a fundamental level, the most beneficial value of open source is that you make the distribution to be very wide. It's just available with no friction and people can do transformative...
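The compute arithmetic in the GPU-rich section above works out as follows; the ~2 PFLOPS/GPU figure is the H100 FP16 spec-sheet number (with sparsity), and the divisor is George Hotz's ~20 PFLOPS "Person of Compute" estimate referenced in the text, so treat both constants as rough assumptions.

```python
# Reproducing the GPU-rich arithmetic above; both constants are rough assumptions.
H100_EQUIVALENTS = 600_000
FP16_PFLOPS_PER_H100 = 2.0   # ~1,979 TFLOPS with sparsity, rounded
PFLOPS_PER_PERSON = 20.0     # Hotz's "Person of Compute"

total_pflops = H100_EQUIVALENTS * FP16_PFLOPS_PER_H100
print(f"{total_pflops:,.0f} PFLOPS total")                              # ~1,200,000 PFLOPS
print(f"{total_pflops / PFLOPS_PER_PERSON:,.0f} 'persons of compute'")  # ~60,000
```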

Duration:01:20:00

A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate

2/28/2024
This Friday we’re doing a special crossover event in SF with of SemiAnalysis (previous guest!), and we will do a live podcast on site. RSVP here. Also join us on June 25-27 for the biggest AI Engineer conference of the year! Replicate is one of the most popular AI inference providers, reporting over 2 million users as of their $40m Series B with a16z. But how did they get there? The Definitive Replicate Story (warts and all) Their overnight success took 5 years of building, and it all started with arXiv Vanity, which was a 2017 vacation project that scrapes arXiv PDFs and re-renders them into semantic web pages that reflow nicely with better typography and whitespace. From there, Ben and Andreas’ idea was to build tools to make ML research more robust and reproducible by making it easy to share code artefacts alongside papers. They had previously created Fig, which made it easy to spin up dev environments; it was eventually acquired by Docker and turned into `docker-compose`, the industry standard way to define services from containerized applications. 2019: Cog The first iteration of Replicate was a Fig-equivalent for ML workloads which they called Cog; it made it easy for researchers to package all their work and share it with peers for review and reproducibility. But they found that researchers were terrible users: they’d do all this work for a paper, publish it, and then never return to it again. “We talked to a bunch of researchers and they really wanted that.... But how the hell is this a business, you know, like how are we even going to make any money out of this? …So we went and talked to a bunch of companies trying to sell them something which didn't exist. So we're like, hey, do you want a way to share research inside your company so that other researchers or say like the product manager can test out the machine learning model? They're like, maybe. Do you want like a deployment platform for deploying models? Do you want a central place for versioning models? We were trying to think of lots of different products we could sell that were related to this thing… So we then got halfway through our YC batch. We hadn't built a product. We had no users. We had no idea what our business was going to be because we couldn't get anybody to like buy something which didn't exist. And actually there was quite a way through our, I think it was like two thirds the way through our YC batch or something. And we're like, okay, well we're kind of screwed now because we don't have anything to show at demo day.” The team graduated YCombinator with no customers, no product and nothing to demo - which was fine because demo day got canceled as the YC W’20 class graduated right into the pandemic. The team spent the next year exploring and building Covid tools. 2021: CLIP + GAN = PixRay By 2021, OpenAI released CLIP. Overnight dozens of Discord servers got spun up to hack on CLIP + GANs. Unlike academic researchers, this community was constantly releasing new checkpoints and builds of models. PixRay was one of the first models being built on Replicate, and it quickly started taking over the community. Chris Dixon has a famous 2010 post titled “The next big thing will start out looking like a toy”; image generation would have definitely felt like a toy in 2021, but it gave Replicate its initial boost. 
2022: Stable Diffusion In August 2022 Stable Diffusion came out, and all the work they had been doing to build this infrastructure for CLIP / GANs models became the best way for people to share their StableDiffusion fine-tunes: And like the first week we saw people making animation models out of it. We saw people make game texture models that use circular convolutions to make repeatable textures. We saw a few weeks later, people were fine tuning it so you could put your face in these models and all of these other ways. […] So tons of product builders wanted to build stuff with it. And we were just sitting in there in the middle,...
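For context on what "packaging a model with Cog" means in practice, here is a minimal predict.py sketch based on Cog's public documentation; the paired cog.yaml (declaring Python version, dependencies, and GPU needs) is omitted, and the model logic below is a placeholder that the `cog build` and `cog predict` CLI then exercise.

```python
# Minimal sketch of a Cog predictor (predict.py); the model here is a placeholder.
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        # Load weights once, when the container starts (placeholder).
        self.model = lambda prompt: f"echo: {prompt}"

    def predict(
        self,
        prompt: str = Input(description="Text prompt for the model"),
    ) -> str:
        # Each call to `cog predict` or the generated HTTP API routes here.
        return self.model(prompt)
```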

Duration:01:10:04

Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

2/16/2024
We’re writing this one day after the monster release of OpenAI’s Sora and Gemini 1.5. We covered this on the ThursdAI space, so head over there for our takes. IRL: We’re ONE WEEK away from Latent Space: Final Frontiers, the second edition and anniversary of our first ever Latent Space event! Also: join us on June 25-27 for the biggest AI Engineer conference of the year! Online: All three Discord clubs are thriving. Join us every Wednesday/Friday!

Almost 12 years ago, while working at Spotify, Erik Bernhardsson built one of the first open source vector databases, Annoy, based on ANN search. He also built Luigi, one of the predecessors to Airflow, which helps data teams orchestrate and execute data-intensive and long-running jobs. Surprisingly, he didn’t start yet another vector database company, but instead in 2021 founded Modal, the “high-performance cloud for developers”. In 2022 they opened doors to developers after their seed round, and in 2023 announced their GA with a $16m Series A. More importantly, they have won fans among both household names like Ramp, Scale AI, Substack, and Cohere, and newer startups like (upcoming guest!) Suno.ai and individual hackers (Modal was the top tool of choice in the Vercel AI Accelerator).

We've covered the nuances of GPU workloads, and how we need new developer tooling and runtimes for them (see our episodes with Chris Lattner of Modular and George Hotz of tiny to start). In this episode, we run through the major limitations of the actual infrastructure behind the clouds that run these models, and how Erik envisions the “postmodern data stack”. In his 2021 blog post “Software infrastructure 2.0: a wishlist”, Erik had “Truly serverless” as one of his points:
* The word cluster is an anachronism to an end-user in the cloud! I'm already running things in the cloud where there's elastic resources available at any time. Why do I have to think about the underlying pool of resources? Just maintain it for me.
* I don't ever want to provision anything in advance of load.
* I don't want to pay for idle resources. Just let me pay for whatever resources I'm actually using.
* Serverless doesn't mean it's a burstable VM that saves its instance state to disk during periods of idle.
Swyx called this Self Provisioning Runtimes back in the day. Modal doesn’t put you in YAML hell, preferring to colocate infra provisioning right next to the code that utilizes it, so you can just add GPU (and disk, and retries…) - see the sketch below. After 3 years, we finally have a big market push for this: running inference on generative models is going to be the killer app for serverless, for a few reasons:
* AI models are stateless: even in conversational interfaces, each message generation is a fully-contained request to the LLM. There’s no knowledge that is stored in the model itself between messages, which means that tear down / spin up of resources doesn’t create any headaches with maintaining state.
* Token-based pricing is better aligned with serverless infrastructure than fixed monthly costs of traditional software.
* GPU scarcity makes it really expensive to have reserved instances that are available to you 24/7. It’s much more convenient to build with a serverless-like infrastructure.
In the episode we covered a lot more topics like maximizing GPU utilization, why Oracle Cloud rocks, and how Erik has never owned a TV in his life. Enjoy!

Show Notes * Modal * ErikBot * Erik’s Blog * Software Infra 2.0 Wishlist * Luigi * Annoy * Hetzner * CoreWeave * Cloudflare FaaS * Poolside AI * Modular Inference Engine

Chapters * [00:00:00] Introductions * [00:02:00] Erik's OSS work at Spotify: Annoy and Luigi * [00:06:22] Starting Modal * [00:07:54] Vision for a "postmodern data stack" * [00:10:43] Solving container cold start problems * [00:12:57] Designing Modal's Python SDK * [00:15:18] Self-Provisioning Runtime * [00:19:14] Truly Serverless Infrastructure * [00:20:52] Beyond model inference * [00:22:09]...
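As a flavor of the "infra next to the code" idea above, here is a minimal sketch using Modal's Python SDK roughly as it looked in early 2024; the Stub/decorator names and arguments have evolved since, so treat the details as illustrative.

```python
# Minimal sketch: GPU, image dependencies and retries declared next to the function.
import modal

stub = modal.Stub("gpu-hello")
image = modal.Image.debian_slim().pip_install("torch")


@stub.function(image=image, gpu="A100", retries=3)
def embed(text: str) -> int:
    import torch  # imported inside the container image defined above

    return torch.tensor([ord(c) for c in text]).sum().item()


@stub.local_entrypoint()
def main():
    # Runs remotely on an on-demand A100; no YAML, no cluster to manage.
    print(embed.remote("hello latent space"))
```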

Duration:01:02:25

Cloud Intelligence at the speed of 5000 tok/s - with Ce Zhang and Vipul Ved Prakash of Together AI

2/8/2024
Our first ever demo day aimed for 15-20 people and ended up ballooning to >200 and covered in the news. We are now running the 2024 edition in SF on Feb 23: Latent Space Final Frontiers, a startup and research competition in “The Autonomous Workforce”, ​”Beyond Transformers & GPUs”, and “​Embodied AI”. RSVP here! You can find all LS online/IRL events on our new calendar. Super Early Bird tickets have just gone on sale for AI Engineer World’s Fair, June 25-27!

Today we have the honor of hosting two of Together AI’s co-founders: Ce Zhang (CTO) and Vipul Ved Prakash (CEO). This is a rare opportunity to recap the history of the company since our last check-in with Tri Dao (Chief Scientist), some of their big releases, and do a deep dive into the state of the AI inference market. Together has emerged as one of the most consequential new startups in the new AI summer, last announcing a ~$100m Series A raise in November (at a ~$360-565m valuation). Note from the future: about a week after this pod was published, rumors were confirmed that Salesforce had led another $100m Series B at a $1b valuation. But there are at least three Togethers - Together the Research Lab, Together the Fine Tuning & Inference platform, and Together the custom models service. As we clarify on the pod, the overarching philosophy of Together is the ability to improve on all these fronts simultaneously by being “full stack”, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms.

Bringing Research and Industry Together

In just one year, Together has been behind some of the most exciting research in AI:
* RedPajama, a fully open source dataset for model pre-training which mirrored the Llama1 recipe. Then followed by RedPajama2, a 30T-token dataset of filtered and de-duplicated tokens.
* RedPajama-INCITE-3B and 7B, which were SOTA in a few benchmarks at the time of release.
* FlashAttention-2, developed by Together’s Chief Scientist Tri Dao. We covered FA-2 in a previous episode with him.
* Mamba-3B, the most promising transformer-alternative model that they released in collaboration with Cartesia.
* StripedHyena, a SOTA graft of Hyena state space models and transformer models.
* Medusa, an alternative to speculative decoding that lets you use multiple decoding heads instead of a draft model.
* MonarchMixer, which was one of the most popular orals at NeurIPS 2023. It’s an approach to transformers that replaces many of its core parts with Monarch matrices for better computational efficiency.
And I’m sure we missed something! As Vipul reveals, almost 50% of Together staff is researchers, and two of their co-founders (Chris Ré and Percy Liang) are professors at Stanford, so we can expect a lot more here.

Bringing “Disaggregated” GPUs Together

On their cloud, they offer inference as a service, fine-tuning, pre-training, etc, but unlike other providers they think of themselves as a disaggregated cloud. Today, they have ~8,000 A100 and H100 GPUs on their platform (an exclusive revealed on the pod!) totaling over 20 exaflops of compute, but instead of just buying more and putting them in a cluster and then exposing a `us-east-1` option for customers, they are taking heterogeneous compute sources and adding a unified layer on top of it for developers to consume. Building on Ce’s research, Together’s GPU Clusters are taking on comparable AWS and GCP offerings in both cost and speed. Take the Hessian AI center in Germany or the DoE’s INCITE; they have GPUs that they want to share with researchers, but they lack the cloud layer over them. Similarly, there’s starting to be more and more differentiation amongst types of GPUs: H100s, A100s, MI300s, etc. Each of them has different availability and performance based on task, and the end user shouldn’t have to be a hardware expert to run inference on a model, so Together abstracts a lot of that away. A big...
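As a quick sanity check on the fleet figures quoted above (and nothing more): ~8,000 GPUs adding up to 20+ exaflops implies roughly 2.5 PFLOPS per GPU on average, which is H100-class territory at low precision; the precision and sparsity assumptions behind the 20-exaflop number aren't stated on the pod.

```python
# Back-of-envelope check on the quoted fleet numbers (assumptions noted above).
TOTAL_EXAFLOPS = 20
GPU_COUNT = 8_000

pflops_per_gpu = TOTAL_EXAFLOPS * 1_000 / GPU_COUNT  # 1 exaflop = 1,000 petaflops
print(f"~{pflops_per_gpu:.1f} PFLOPS per GPU on average")
```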

Duration:01:03:11

Why StackOverflow usage is down 50% — with David Hsu of Retool

2/1/2024
We are announcing the second edition of our Latent Space demo day event in SF on 2/23: Final Frontiers, a startup and research competition in “The Autonomous Workforce”, ​”Beyond Transformers & GPUs”, and “​Embodied AI”. RSVP here! The first one was aimed for 15-20 people and ended up blowing up to >200 and covered in the Information - let’s see what a year of growth (and competition) does to the local events space in 2024. You can find all Latent Space events here, and of course get in touch with us to host your own AI Engineer meetups like AI Engineering Singapore. In our December 2023 recap we covered the Four Wars of the AI stack. But how do we know when it’s time to crown a winner? As we kick off 2024, we wanted to do a recap of the State of AI in 2023 to set a baseline of adoption for different products. Retool had a great report at the end of last year which covered a lot of it. David Hsu, CEO and co-founder of Retool, joined us to go over it together. We also talked about the history of Retool, why they were too embarrassed to present at YC demo day, and how they got to $1M ARR with 3 employees. If you’re a founder, there are a lot of nuggets of advice in here! Retool AI In our modeling of the “Software 3.0 Stack”, we have generally left a pretty wide open gap as to the “user interface” equivalent of the AI stack: Retool AI launched 4 months ago with some nifty features for SQL generation, and its own hosted vector storage service (using pgvector). However, as he explains on the pod, the more interesting potential of Retool is in helping developers build AI infused applications quickly, in combination with its Workflows feature. This moves Retool down the stack from just the UI for internal tooling to the business logic “piping” as well. There are a bunch of dedicated tools in this space like Respell, BuildShip, Flowise, and Ironclad Rivet. "We think that practically every internal app is going to be AI infused over the next three years." - David on the pod RIP StackOverflow? In July 2023 we talked about the impact of ChatGPT and Copilot: This was then disputed by StackOverflow, who pointed out (very fairly so) that there were privacy-related changes in their analytics instrumentation in 2022. StackOverflow no longer reports traffic, but based on StackOverflow’s continuing transparency we can see that organic declines have continued throughout 2023. Retool’s report comes over a year after those changes and has some self reported samples from users: * 57.6% of people said they have used StackOverflow less; almost all of them replaced it with ChatGPT and Copilot. * 10.2% said they no longer use StackOverflow. We also saw a lot more tools being released in the dev tools space such as (one of our oldest pod friends) Codeium (which just raised a $65M Series B), SourceGraph (and their newly released Cody), Codium AI (just released AlphaCodium which was picked up by Karpathy), Phind (which beat GPT-4 with OSS models), and Cursor, one of the most beloved products in the dev community at the moment. Intelligence is getting closer and closer to the IDE, and the trend doesn’t seem to be reverting. We already said that “You are not too old (to pivot into AI)“, and the advice still stands. When asked to rate “Preference for hiring engineers effective at using ChatGPT/Copilot for coding” on a scale of 1 to 10, where 10 is “Much more likely”, ~40% of companies voted 8-10. Having an AI Engineer skillset is extremely important. 
45% of companies between 1,000-4,999 employees said that they increased the difficulty of technical interviews to compensate for these new tools, so the gap between users and non-users will keep widening. Crossing the AI in Production Chasm Geoffrey Moore’s “Crossing the Chasm” is one of the most quoted business frameworks. Every market has an initial group of Innovators and Early Adopters, who are willing to suffer through the rough edges of products initially, and eventually crosses into...

Duration:00:57:20

The Four Wars of the AI Stack (Dec 2023 Audio Recap)

1/25/2024
Note for Latent Space Community members: we have now soft-launched meetups in Singapore, as well as two new virtual paper club/meetups for AI in Action and LLM Paper Club. We’re also running Latent Space: Final Frontiers, our second annual demo day hackathon from last year. Edit from March 2024: We did a followup on the Four Wars on the AI Breakdown. For the first time, we are doing an audio version of monthly AI Engineering recap that we publish on Latent Space! This month it’s “The Four Wars of the AI Stack”; you can find the full recap with all the show notes here: https://latent.space/p/dec-2023 * [00:00:00] Intro * [00:01:42] The Four Wars of the AI stack: Data quality, GPU rich vs poor, Multimodality, and Rag/Ops war * [00:03:17] Selection process for the four wars and notable mentions * [00:06:58] The end of low background tokens and the impact on data engineering * [00:08:36] The Quality Data Wars (UGC, licensing, synthetic data, and more) * [00:14:51] Synthetic Data * [00:17:49] The GPU Rich/Poors War * [00:18:21] Anyscale benchmark drama * [00:22:00] The math behind Mixtral inference costs * [00:28:48] Transformer alternatives and why they matter * [00:34:40] The Multimodality Wars * [00:38:10] Multiverse vs Metaverse * [00:45:00] The RAG/Ops Wars * [00:50:00] Will frameworks expand up, or will cloud providers expand down? * [00:54:32] Syntax to Semantics * [00:56:41] Outer Loop vs Inner Loop * [00:59:54] Highlight of the month Get full access to Latent Space at www.latent.space/subscribe

Duration:01:08:18


How to train your own Large Multimodal Model — with Hugo Laurençon & Leo Tronchon of HuggingFace M4

1/19/2024
Latent Space is heating up! Our paper club ran into Discord’s 99-person limits, oops. We are also introducing 2 new online meetups: LLM Paper Club Asia for the Asia timezone (led by Ivan), and AI in Action: hands-on application of AI (led by KBall). To be notified of all upcoming Latent Space events, subscribe to our new Luma calendar (sign up for individual events, or hit the RSS icon to sync all events to your calendar). In the halcyon open research days of 2022 BC (Before-ChatGPT), DeepMind was the first to create a SOTA multimodal model by taking a pre-existing LLM (Chinchilla 70B - now dead?) and a pre-existing vision encoder (CLIP) and training a “glue” adapter layer, inspiring a generation of stunningly cheap and effective multimodal models including LLaVA (one of the Best Papers of NeurIPS 2023), BakLLaVA and FireLLaVA. However (for reasons we discuss in today’s conversation), DeepMind’s Flamingo model was never open sourced. Based on the excellent Flamingo paper, LAION stepped up to create OpenFlamingo, but it never scaled beyond 9B. Simultaneously, the M4 (audio + video + image + text multimodality) research team at HuggingFace announced an independent effort to reproduce Flamingo up to the full 80B scale: The effort started in March, and was released in August 2023. We happened to be in Paris last year, and visited HuggingFace HQ to learn all about HuggingFace’s research efforts, and cover all the background knowledge LLM people need to become (what Chip Huyen has termed) “LMM” people. In other words: What is IDEFICS? IDEFICS is an Open Access Visual Language Model, available in 9B and 80B model sizes. As an attempt to re-create an open-access version of Flamingo, it seems to track very well on a range of multimodal benchmarks (which we discuss in the pod): You can see the reasoning abilities of the models to take a combination of interleaved images + text in a way that allows users to either describe images, ask questions about the images, or extend/combine the images into different artworks (e.g. poetry). From IDEFICS’s model card and blog post. The above demo screenshots are actually from the fine-tuned instruct versions of IDEFICS — which are again in 9B and 80B versions. IDEFICS was built by connecting two unimodal models together to provide the multimodality you see showcased above. * Llama v1 for language (specifically huggyllama/llama-65b) - the best available open model at the time, to be swapped for Mistral in the next version of IDEFICS * A CLIP model for vision (specifically laion/CLIP-ViT-H-14-laion2B-s32B-b79K - after a brief exploration of EVA-CLIP, which we discuss on the pod) OBELICS: a new type of Multimodal Dataset IDEFICS’ training data used the usual suspect datasets, but to get on par with Flamingo they needed to create a new dataset. Enter OBELICS: “An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”: * 115B text tokens * 141M English documents * 353M images This data was carefully curated and filtered from Common Crawl dumps between Feb 2020 and Feb 2023. We discuss the 2 months of mind-numbing, unglamorous work creating this pipeline: There are a lot of mentions of “multimodal web documents”, which deserve some explanation. We’ll show you instead of tell you: You can see from this graph that OBELICS ends up outperforming the other image-text pairs dataset (LAION in this case) when stacked head-to-head. 
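As a rough illustration of the “glue” adapter idea described above (a small trainable layer that lets a frozen LLM attend to features from a frozen vision encoder), here is a minimal Flamingo-style sketch in PyTorch. This is not IDEFICS’s actual training code; the class name, dimensions, and gating choice are invented for clarity.

```python
# Illustrative sketch only: a Flamingo/IDEFICS-style "glue" adapter that lets a frozen
# language model attend to features from a frozen vision encoder. Names and dimensions
# are made up; the real IDEFICS implementation lives in HuggingFace's m4/transformers code.
import torch
import torch.nn as nn

class VisionTextGlueAdapter(nn.Module):
    def __init__(self, vision_dim: int = 1024, text_dim: int = 4096, num_heads: int = 8):
        super().__init__()
        # Project frozen vision features into the language model's hidden size.
        self.vision_proj = nn.Linear(vision_dim, text_dim)
        # Cross-attention: text tokens (queries) attend to image tokens (keys/values).
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        # Tanh gate initialized at zero so the adapter starts as a no-op (as in Flamingo).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden: torch.Tensor, vision_feats: torch.Tensor) -> torch.Tensor:
        img_tokens = self.vision_proj(vision_feats)             # (B, num_img_tokens, text_dim)
        attn_out, _ = self.cross_attn(text_hidden, img_tokens, img_tokens)
        return text_hidden + torch.tanh(self.gate) * attn_out   # gated residual update

# Only the adapter is trained; the LLM and vision encoder stay frozen.
adapter = VisionTextGlueAdapter()
text_hidden = torch.randn(2, 16, 4096)   # stand-in for frozen LLM hidden states
vision_feats = torch.randn(2, 64, 1024)  # stand-in for frozen CLIP patch features
print(adapter(text_hidden, vision_feats).shape)  # torch.Size([2, 16, 4096])
```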
You can view a subset of OBELICS and perform visualizations on it here: 2024 Update: WebSight et al Most of this interview was recorded on Halloween 2023 at HuggingFace’s headquarters in Paris, in anticipation of an IDEFICS v2 release. However, several roadblocks emerged, including a notable scandal around CSAM in LAION-5B, which affected all models using that dataset. The M4 team has adopted a strategy of shipping smaller advancements in 2024, and the first ship of the year is WebSight, a dataset of 823,000 HTML/CSS code samples representing synthetically generated...

Duration:01:11:50


RLHF 201 - with Nathan Lambert of AI2 and Interconnects

1/11/2024
In 2023 we did a few Fundamentals episodes covering Benchmarks 101, Datasets 101, FlashAttention, and Transformers Math, and it turns out those were some of your evergreen favorites! So we are experimenting with more educational/survey content in the mix alongside our regular founder and event coverage. Pls request more! We have a new calendar for events; join to be notified of upcoming things in 2024! Today we visit the shoggoth mask factory: how do transformer models go from trawling a deeply learned latent space for next-token prediction to a helpful, honest, harmless chat assistant? Our guest “lecturer” today is Nathan Lambert; you might know him from his prolific online writing on Interconnects and Twitter, or from his previous work leading RLHF at HuggingFace and now at the Allen Institute for AI (AI2), which recently released the open source, GPT-3.5-class Tulu 2 model, which was trained with DPO. He’s widely considered one of the most knowledgeable people on RLHF and RLAIF. He recently gave an “RLHF 201” lecture at Stanford, so we invited him on the show to re-record it for everyone to enjoy! You can find the full slides here, which you can use as reference through this episode. Full video with synced slides For audio-only listeners, this episode comes with a slide presentation alongside our discussion. You can find it on our YouTube (like, subscribe, tell a friend, et al). Theoretical foundations of RLHF The foundations and assumptions that go into RLHF go back all the way to Aristotle (and you can find guidance for further research in the slide below), but there are two key concepts that will be helpful in thinking through this topic and LLMs in general: * Von Neumann–Morgenstern utility theorem: you can dive into the math here, but the TLDR is that when humans make decisions there’s usually a “maximum utility” function that measures what the best decision would be; the fact that this function exists makes it possible for RLHF to model human preferences and decision making. * Bradley-Terry model: given two items A and B from a population, you can model the probability that A will be preferred to B (or vice versa). In our world, A and B are usually two outputs from an LLM (or at the lowest level, the next token). It turns out that from this minimal set of assumptions, you can build up the mathematical foundations supporting the modern RLHF paradigm! The RLHF loop One important point Nathan makes is that "for many tasks we want to solve, evaluation of outcomes is easier than producing the correct behavior". For example, it might be difficult for you to write a poem, but it's really easy to say if you like or dislike a poem someone else wrote. Going back to the Bradley-Terry model we mentioned, the core idea behind RLHF is that when given two outputs from a model, you will be able to say which of the two you prefer, and we'll then re-encode that preference into the model. An important point that Nathan mentions is that when you use these preferences to change model behavior, "it doesn't mean that the model believes these things. It's just trained to prioritize these things". When you have a preference for a model not to return instructions on how to write a computer virus, for example, you're not erasing the weights that hold that knowledge; you're simply making it harder for that information to surface by prioritizing answers that don't return it. We'll talk more about this in our future Fine Tuning 101 episode as we break down how information is stored in models and how fine-tuning affects it. 
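To make the Bradley-Terry point concrete, here is a minimal sketch (our own illustration, not Nathan’s slides or any particular library’s API) of the preference probability and the standard pairwise loss a reward model is typically trained with. The scalar scores and toy numbers are made up.

```python
# Minimal sketch of the Bradley-Terry model as used in RLHF reward modeling.
# Given scalar rewards for two completions A and B, the probability that a human
# prefers A over B is sigmoid(score_A - score_B); the reward model is trained by
# maximizing the log-likelihood of the human-labeled preference pairs.
import torch
import torch.nn.functional as F

def bradley_terry_prob(score_a: torch.Tensor, score_b: torch.Tensor) -> torch.Tensor:
    """P(A preferred over B) under the Bradley-Terry model."""
    return torch.sigmoid(score_a - score_b)

def reward_model_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of observed preferences: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy example: the reward model scored the "chosen" answers higher than the "rejected" ones.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, -0.1, 1.5])
print(bradley_terry_prob(chosen, rejected))  # per-pair preference probabilities
print(reward_model_loss(chosen, rejected))   # scalar training loss
```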
At a high level, the loop looks something like this: For many RLHF use cases today, we can assume the model we're training is already instruction-tuned for chat or whatever behavior it is meant to achieve. In the "Reward Model & Other Infrastructure" portion we have multiple pieces: Reward + Preference Model The reward model is trying to signal to the model how much it should change its behavior based on human preferences, subject to a KL...

Duration:01:25:30


The Accidental AI Canvas - with Steve Ruiz of tldraw

1/5/2024
Happy 2024! We appreciated all the feedback on the listener survey (still open, link here)! It was surprising to see that some people’s favorite episodes were others’ least favorite, but we’ll always work on improving our audio quality and booking great guests. Help us out by leaving reviews on Twitter, YouTube, and Apple Podcasts! 🙏 Big thanks to Chris Anderson for the latest review - be like Chris! Note to the Audio-only Listener Because of the nature of today’s topic, it makes the most sense to follow along with the demo on video rather than audio. There’s also about 30 mins of demos and technical detail that we had to remove from the audio version, because they didn’t make sense without video. Trailer here. Full 90min chat: (In other words, pls jump over and watch on our YouTube if you can! Did you know we are now posting every episode to YouTube? We’ve been multimodal for a long time!) Trend 1: GPT4-V Coding You might remember Greg Brockman’s hand-scribble-to-working-website demo from the GPT-4 launch in March. This was largely inaccessible to the rest of us until the GPT4-V API was released at Dev Day in November. As mentioned in our November 2023 recap, one of the biggest viral trends was tldraw’s open source “Make It Real” demo: starting from a simple wireframe and text annotations, you could create a real, functioning UI with the click of a button. It provoked another crisis of confidence in developer circles, and responses from Excalidraw, a competitor; it also works from state charts. You can see us creating a Replit clone in this silent video here: Since our interview, the new GPT4-V coding metagame has been merging app UIs and SQL with Supabase (another AIE Summit speaker) and other backend tools: * generating SQL * converting ERDs to SQL (part 2, for MariaDB) * seeding sample data * doing migrations Trend 2: Latent Consistency Models As covered in the Latent Space Paper Club in November, 3 papers drove a roughly 100x acceleration in the speed of text-to-image generation over the past year: * Consistency Models (with Ilya Sutskever) * Latent Consistency Models (from Tsinghua) * LCM-LoRA (also Tsinghua, same authors) With the invaluable help of Fal.ai (friends of the show and the AI Engineer Summit, and progenitors of the viral GPU Rich/Poor hats mentioned on the Semianalysis episode), tldraw has also been at the forefront of putting this research into production, with two projects: * drawfast: add a prompt, start sketching into the canvas and see each stroke affect the drawing. Overlap multiple of them to extend and merge drawings. * lens: a collaborative canvas where in real time people can draw and have their sketch turn into AI-generated art. Start drawing at the bottom and see it scroll into the magic canvas. For nontechnical people in your life, we do recommend showing them lens.tldraw.com (and its predecessor that we discuss on the show) on your and their mobile devices. The Rise of Multimodal Prompting At the first AI Engineer Summit in October, Logan (our first guest!) declared this the Year of Multimodality. Over the next 2 months we saw an explosion of activity in multimodal: GPT-4V’s API release at OpenAI Dev Day (our coverage here), LLaVA (our chat with the author here on Visual Instruction Tuning), BakLLaVA, Qwen-VL, CogVLM, etc. On today’s episode we have Steve Ruiz, founder of tldraw. The project originally started as an open source whiteboard that Steve built for himself and then “accidentally made a really, really good visual multimodal prompting application environment”. 
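To make the “Make It Real” trend concrete, here is a rough sketch of the kind of GPT-4V call behind wireframe-to-HTML demos. This is not tldraw’s actual code or prompt; the prompt wording, file names, and model choice are illustrative assumptions (the GPT4-V API at the time was exposed as `gpt-4-vision-preview` in the OpenAI Python SDK).

```python
# Rough sketch (not tldraw's implementation) of the "Make It Real" idea: send a wireframe
# screenshot to a vision-capable GPT-4 model and ask for a working HTML page back.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("wireframe.png", "rb") as f:  # hypothetical screenshot exported from the canvas
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # GPT-4V model name at the time of this episode
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Turn this low-fidelity wireframe and its text annotations "
                                     "into a single self-contained HTML file with inline CSS and "
                                     "JS. Return only the HTML."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# Save the generated page so it can be opened in a browser.
with open("make_it_real.html", "w") as f:
    f.write(response.choices[0].message.content)
```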
It turns out that an infinite canvas and generative models are a very good match: * Design is iterative: DALL-E, Midjourney, etc. all work in a linear way: prompt goes in, 1-4 images come back. As you generate more, the previous images scroll away from your view. In a canvas environment, you can see the progression of your generation and visually “branch” by putting new prompts in different spaces. * UI has “layers”: when designing...

Duration:01:04:09


NeurIPS 2023 Recap — Top Startups

12/30/2023
We are running an end of year listener survey! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here. We can’t think of a more Latent-Space-y way to end 2023 than with a mega episode featuring many old and new friends recapping their biggest news, achievements, and themes and memes of the year! We previously covered the Best Papers of NeurIPS 2023, but the other part of NeurIPS being an industry friendly conference is all the startups that show up to hire and promote their latest and greatest products and papers! As a startup-friendly podcast, we of course were ready with our mics to talk to everyone we could track down. In lieu of an extended preamble, we encourage you to listen and click through all the interviews and show notes, all of which have been curated to match the references mentioned in the episode. Timestamps & Show Notes * [00:01:26] Jonathan Frankle - Chief Scientist, MosaicML/Databricks * see also the Mosaic/MPT-7B episode * $1.3B MosaicML x Databricks acquisition * [00:22:11] Lin Qiao - CEO, Fireworks AI * Fireworks Mixtral * [00:38:24] Aman Sanger - CEO, Anysphere (Cursor) * see also the Cursor episode * $8m seed from OpenAI * Tweet: Request-level memory-based KV caching * Tweet: GPT-4 grading and Trueskill ratings for rerankers * [00:51:14] Aravind Srinivas - CEO, Perplexity * 1m app installs on iOS and Android * pplx-online api 7b and 70b models * Shaan Puri/Paul Graham Fierce Nerds story * [01:04:26] Will Bryk - CEO, Metaphor * “Andrew Huberman may have singlehandedly ruined the SF social scene” * [01:12:49] Jeremy Howard - CEO, Answer.ai * see also the End of Finetuning episode * Jeremy’s podcast with Tanishq Abraham, Jess Leao * Announcing Answer.ai with $10m from Decibel VC * Laundry Buddy, Nov 2023 AI Meme of the Month * [01:37:13] Joel Hestness - Principal Scientist, Cerebras * CerebrasGPT, all the Cerebras papers we discussed * [01:56:34] Jason Corso - CEO, Voxel51 * Open Source FiftyOne project * CVPR Survival Guide * [02:02:39] Brandon Duderstadt - CEO, Nomic.ai * GPT4All, Atlas, Demo * [02:12:39] Luca Antiga - CTO, Lightning.ai * Pytorch Lightning, Lightning Studios, LitGPT * [02:29:46] Jay Alammar - Engineering Fellow, Cohere * The Illustrated Transformer Get full access to Latent Space at www.latent.space/subscribe

Duration:02:41:54


NeurIPS 2023 Recap — Best Papers

12/23/2023
We are running an end of year listener survey! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here. NeurIPS 2023 took place from Dec 10–16 in New Orleans. The Latent Space crew was onsite for as many of the talks and workshops as we could attend (and more importantly, hosted cocktails and parties after hours)! Picking from the 3586 papers accepted to the conference (available online, full schedule here) is an impossible task, but we did our best to present an audio guide with brief commentary on each pick. We also recommend MLContests.com’s NeurIPS recap, Seb Ruder’s NeurIPS primer, and Jerry Liu’s paper picks. We also found the VizHub guide useful for a t-SNE clustering of papers. Lots also happened in the arxiv publishing world outside NeurIPS, as highlighted by Karpathy, especially DeepMind’s Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models. Jan 2024 update: we also strongly recommend this pick of the year’s 10 best papers, including Pythia. We’ll start with the NeurIPS Best Paper Awards, then go to a selection of non-awarded but highly influential papers, and then arbitrary personal picks to round out the selection. Where we were able to do a poster session interview, please scroll to the relevant show notes for images of the poster under discussion. We give Chris Ré the last word, due to the Mamba and StripedHyena state space models drawing particular excitement while still being too early to assess for impact. Timestamps * [0:01:19] Word2Vec (Jeff Dean, Greg Corrado) * [0:15:28] Emergence Mirage (Rylan Schaeffer) * [0:28:48] DPO (Rafael Rafailov) * [0:41:36] DPO Poster Session (Archit Sharma) * [0:52:03] Datablations (Niklas Muennighoff) * [1:00:50] QLoRA (Tim Dettmers) * [1:12:23] DataComp (Samir Gadre) * [1:25:38] DataComp Poster Session (Samir Gadre, Alex Dimakis) * [1:35:25] LLaVA (Haotian Liu) * [1:47:21] LLaVA Poster Session (Haotian Liu) * [1:59:19] Tree of Thought (Shunyu Yao) * [2:11:27] Tree of Thought Poster Session (Shunyu Yao) * [2:20:09] Toolformer (Jane Dwivedi-Yu) * [2:32:26] Voyager (Guanzhi Wang) * [2:45:14] CogEval (Ida Momennejad) * [2:59:41] State Space Models (Chris Ré) Papers covered * Distributed Representations of Words and Phrases and their Compositionality (Word2Vec) Tomas Mikolov · Ilya Sutskever · Kai Chen · Greg Corrado · Jeff Dean. The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several improvements that make the Skip-gram model more expressive and enable it to learn higher quality vectors more rapidly. We show that by subsampling frequent words we obtain significant speedup, and also learn higher quality representations as measured by our tasks. We also introduce Negative Sampling, a simplified variant of Noise Contrastive Estimation (NCE) that learns more accurate vectors for frequent words compared to the hierarchical softmax. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple and efficient method for finding phrases, and show that their vector representations can be accurately learned by the Skip-gram model. 
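For readers who want the Word2Vec idea in code, here is a toy sketch of the skip-gram-with-negative-sampling objective the abstract above describes. It is a simplified illustration (uniform negative sampling, toy vocabulary and dimensions), not the paper’s original implementation.

```python
# Toy sketch of the skip-gram-with-negative-sampling (SGNS) objective: maximize
# log sigma(u_o . v_c) for true (center, context) pairs and log sigma(-u_k . v_c)
# for k sampled "negative" words. Real Word2Vec samples negatives from a unigram^0.75
# distribution; we sample uniformly here to keep the sketch short.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, num_negatives = 10_000, 100, 5
in_embed = nn.Embedding(vocab_size, dim)   # "center word" vectors v_c
out_embed = nn.Embedding(vocab_size, dim)  # "context word" vectors u_o

def sgns_loss(center: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    v_c = in_embed(center)                                    # (B, dim)
    u_o = out_embed(context)                                  # (B, dim)
    negatives = torch.randint(0, vocab_size, (center.size(0), num_negatives))
    u_neg = out_embed(negatives)                              # (B, K, dim)

    pos = F.logsigmoid((u_o * v_c).sum(-1))                   # log sigma(u_o . v_c)
    neg = F.logsigmoid(-(u_neg * v_c.unsqueeze(1)).sum(-1)).sum(-1)
    return -(pos + neg).mean()

# One toy step over a fake batch of (center, context) word-id pairs.
center_ids = torch.randint(0, vocab_size, (32,))
context_ids = torch.randint(0, vocab_size, (32,))
print(sgns_loss(center_ids, context_ids))
```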
* Some notable reflections from Tomas Mikolov - and debate over the Seq2Seq paper credit with Quoc Le * Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al.). Emergent abilities are abilities that are present in large-scale models but not in smaller models and are hard to predict. Rather than being a product of models’ scaling behavior, this paper argues that emergent abilities are mainly an artifact of the choice of metric...

Duration:03:20:26


The AI-First Graphics Editor - with Suhail Doshi of Playground AI

12/20/2023
We are running an end of year survey for our listeners! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here! Listen to the end for a little surprise from Suhail. Before language models became all the rage in November 2022, image generation was the hottest space in AI (it was the subject of our first piece on Latent Space!) In our interview with Sharif Shameem from Lexica we talked through the launch of Stable Diffusion and the early days of that space. At the time, the toolkit was still pretty rudimentary: Lexica made it easy to search images, you had the AUTOMATIC1111 Web UI to generate locally, some HuggingFace spaces that offered inference, and eventually DALL-E 2 through OpenAI’s platform, but not much beyond basic text-to-image workflows. Today’s guest, Suhail Doshi, is trying to solve this with Playground AI, an image editor reimagined with AI in mind. Some of the differences compared to traditional text-to-image workflows: * Real-time preview rendering using consistency models: as you change your prompt, you can see changes in real-time before doing a final rendering of it. * Style filtering: rather than having to prompt exactly how you’d like an image to look, you can pick from a whole range of filters, both from Playground’s model and from Stable Diffusion ones (like RealVis, Starlight XL, etc). We talk about this at 25:46 in the podcast. * Expand prompt: similar to DALL-E 3, Playground will do some prompt tuning for you to get better results in generation. Unlike DALL-E 3, you can turn this off at any time if you are a prompting wizard. * Image editing: after generation, you have tools like a magic eraser, inpainting pencil, etc. This makes it easier to do a full workflow in Playground rather than switching to another tool like Photoshop. Outside of the product, they have also trained a new model from scratch, Playground v2, which is fully open source and open weights and allows for commercial usage. They benchmarked the model against SDXL across 1,000 prompts and found that humans preferred the Playground generation 70% of the time. They had similar results on PartiPrompts: They also created a new benchmark, MJHQ-30K, for “aesthetic quality”: We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality. We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category. Suhail was pretty open with saying that Midjourney is currently the best product for image generation out there, and that’s why they used it as the base for this benchmark. I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately care: how close are you getting to the thing that people mostly agree with? [00:23:47] We also talked a lot about Suhail’s founder journey from starting Mixpanel in 2009, then going through YC again with Mighty, and eventually sunsetting that to pivot into Playground. 
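For a sense of what an FID-based benchmark like MJHQ-30K measures, here is a small sketch using torchmetrics’ FrechetInceptionDistance. This is not Playground’s evaluation code; the image tensors are random placeholders, and a real evaluation uses thousands of curated images with the default 2048-dim Inception features rather than the toy settings below.

```python
# Sketch of the core measurement behind an FID-style benchmark: compare the Inception-feature
# statistics of a curated "reference" set (e.g. high-quality Midjourney images per category)
# against a model's generations for the same prompts.
# Requires: pip install torchmetrics torch-fidelity
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=64)  # small feature size keeps this toy example stable

reference_images = torch.randint(0, 256, (128, 3, 299, 299), dtype=torch.uint8)  # placeholder
generated_images = torch.randint(0, 256, (128, 3, 299, 299), dtype=torch.uint8)  # placeholder

fid.update(reference_images, real=True)    # accumulate statistics of the reference distribution
fid.update(generated_images, real=False)   # accumulate statistics of the model's generations
print(f"FID: {fid.compute().item():.2f}")  # lower means generations look more like the reference
```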
Enjoy! Show Notes * Suhail’s Twitter * “Starting my road to learn AI” * Bill Gates book trip * Playground * Playground v2 Announcement * $40M raise announcement * “Running infra dev ops for 24 A100s” * Mixpanel * Mighty * “I decided to stop working on Mighty” * Fast.ai * Civit Timestamps * [00:00:00] Intros * [00:02:59] Being early in ML at Mixpanel * [00:04:16] Pivoting from Mighty to Playground and focusing...

Duration:00:57:09