
ThursdAI - The top AI news from the past week

News

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, discussing everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news

Location:

United States

Description:

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, discussing everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news

Language:

English


Episodes

ThursdAI - May 2nd - New GPT2? Copilot Workspace, Evals and Vibes from Reka, LLama3 1M context (+ Nous finetune) & more AI news

5/2/2024
Hey 👋 Look, it May or May not be the first AI newsletter you get in May, but it's for sure going to be a very information dense one. We had an amazing conversation on the live recording today; over 1K folks joined to listen to the first May updates from ThursdAI.

As you May know by now, I just love giving the stage to the folks who created the actual news I get to cover from week to week, and this week we had two of those conversations again. First we chatted with Piotr Padlewski from Reka, an author on the new Vibe-Eval paper & dataset which they published this week. We've had Yi and Max from Reka on the show before, but it was Piotr's first time; he was super knowledgeable and really fun to chat with. Specifically, as we at Weights & Biases launch a new product called Weave (which you should check out at https://wandb.me/weave), I'm getting a LOT more interested in evaluations and LLM scoring, and in fact we started the whole show today with a full segment on evals, vibe checks, and a new paper from Scale about overfitting (a small Weave tracing sketch follows at the end of these notes).

The second deep dive was with my friend Idan Gazit from GithubNext, about the new iteration of Github Copilot, called Copilot Workspace. It was a great one, and you should definitely give that one a listen as well.

TL;DR of all topics covered + show notes

* Scores and Evals
  * No notable changes, LLama-3 is still #6 on LMsys
  * gpt2-chat came and went (in-depth chan writeup)
  * Scale checked for data contamination on GSM8K using GSM-1K (Announcement, Paper)
  * Vibe-Eval from Reka - a set of multimodal evals (Announcement, Paper, HF dataset)
* Open Source LLMs
  * Gradient releases a 1M context window LLama-3 finetune (X)
  * MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4 (X, HF)
  * Nous Research - Hermes Pro 2 - LLama 3 8B (X, HF)
  * AI Town is running on Macs thanks to Pinokio (X)
  * LMStudio releases their CLI - LMS (X, Github)
* Big CO LLMs + APIs
  * Github releases Copilot Workspace (Announcement)
  * AI21 releases Jamba Instruct w/ 256K context (Announcement)
  * Google shows Med-Gemini with some great results (Announcement)
  * Claude releases an iOS app and Team accounts (X)
* This week's Buzz
  * We're heading to SF to sponsor the biggest LLama-3 hackathon ever with Cerebral Valley (X)
  * Check out my video for Weave, our new product, it's just 3 minutes (Youtube)
* Vision & Video
  * InternLM open sourced a bunch of LLama-3 and Phi based VLMs (HUB)
  * And they are MLXd by the "The Bloke" of MLX, Prince Canuma (X)
* AI Art & Diffusion & 3D
  * ByteDance releases Hyper-SD - Stable Diffusion in a single inference step (Demo)
* Tools & Hardware
  * Still haven't opened the AI Pin, and the Rabbit R1 just arrived, will open later today
* Co-Hosts and Guests
  * Piotr Padlewski (@PiotrPadlewski) from Reka AI
  * Idan Gazit (@idangazit) from Github Next
  * Wing Lian (@winglian)
  * Nisten Tahiraj (@nisten)
  * Yam Peleg (@yampeleg)
  * LDJ (@ldjconfirmed)
  * Wolfram Ravenwolf (@WolframRvnwlf)
  * Ryan Carson (@ryancarson)

Scores and Evaluations

A new corner in today's pod and newsletter, given the focus this week on new models and comparing them to existing ones.

What is GPT2-chat and who put it on LMSys? (and how do we even know it's good?)

For a very brief period this week, a new mysterious model appeared on LMSys, called gpt2-chat. It only appeared in the Arena and did not show up on the leaderboard, and yet tons of sleuths from 4chan to Reddit to X started trying to figure out what this model was and wasn't.

Folks started analyzing the tokenizer and the output schema, and tried to get the system prompt and gauge the context length. Many were hoping this was an early example of GPT4.5 or something else entirely. It did NOT help that uncle SAMA first posted a tweet and then edited it to remove the dash, and it was unclear if he was trolling again, foreshadowing a completely new release, or hinting at an old GPT-2 retrained on newer data. The model...
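Since evals were the theme of the opening segment, here is a minimal sketch of tracing an LLM call with Weave so you can score it later. The project name and the question are made up for illustration, and the API shape assumes the current `weave` Python package; check the Weave docs before relying on it.

```python
# pip install weave openai
import weave
from openai import OpenAI

weave.init("thursdai-evals")  # hypothetical W&B project name

client = OpenAI()

@weave.op()  # every call to this function gets traced and versioned in Weave
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What is a vibe check, in one sentence?"))
```

Once calls are traced like this, you can attach scorers to them, which is exactly the kind of "evals over vibes" workflow we spent the first segment on.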

Duration:01:49:03


📅 ThursdAI - April 25 - Phi-3 3.8B impresses, LLama-3 gets finetunes, longer context & ranks top 6 in the world, Snowflake's new massive MoE and other AI news this week

4/25/2024
Hey hey folks, happy ThursdAI 🎉 Not a lot of housekeeping here, just a reminder that if you're listening or reading from Europe, our European fullyconnected.com conference is happening on May 15 in London, and you're more than welcome to join us there. I will have quite a few event updates in the upcoming show as well.

Besides this, this week has been a very exciting one for smaller models, as Microsoft teased and then released Phi-3 with an MIT license, a tiny model that can run on most Macs with just 3.8B parameters, and is really punching above its weight, to a surprising and even eyebrow raising degree! Let's get into it 👇

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

TL;DR of all topics covered:

* Open Source LLMs
  * Microsoft open sources Phi-3 (X, HF)
  * LLama3 70B top 5 (now top 6) on LMsys (LMsys Arena)
  * Snowflake open sources Arctic - a massive hybrid MoE (X, Try it, HF)
  * Evolutionary model merge support in MergeKit (Blog)
  * Llama-3 8B finetunes roundup - longer context (128K) and Dolphin & Bagel finetunes
  * HuggingFace FINEWEB - a massive 45TB, 15T token high quality web dataset (the GPT4 of datasets) (HF) - see the streaming sketch at the end of these notes
  * Cohere open sourced their chat interface (X)
  * Apple open sources OpenELM, 4 models + a training library called CoreNet (HF, Github, Paper)
* Big CO LLMs + APIs
  * Google Gemini 1.5 Pro is #2 on the LMsys arena
  * Devin is now worth $2BN and Perplexity is also a unicorn
  * A newcomer called Augment (backed by Eric Schmidt) is coming out of stealth (X)
* Vision & Video
  * Adobe releases VideoGigaGAN - a high quality upscaler with temporal consistency (paper)
  * TLDraw autocomplete UI demo (X)
* This week's Buzz - What I learned in WandB this week
  * Joe Spisak talks about Llama3 on stage at WandB Fully Connected (Full Talk, TLDR)
* Voice & Audio
  * Play.ai (previously play.ht) releases a conversational voice AI platform (X)
* AI Art & Diffusion & 3D
  * IMGsys.org - like LMsys but for image generation models + leaderboard, from FAL (try it)
* Tools & Hardware
  * Rabbit R1 release party & no shipping update in sight
  * I'm disillusioned about my AI Pin and will return it

Open Source LLMs

Llama-3 1 week-aversary 🎂 - leaderboard ranking + finetunes

Well, it's exactly 1 week since we got Llama-3 from Meta, and as expected, the rankings tell a very good story. (It was also downloaded over 1.2M times and already has 600 derivatives on HuggingFace.)

Just on Monday, Llama-3 70B (the bigger version) took an incredible 5th place (now down to 6th) on LMSys. More surprising, given that the Arena now has category filters (you can filter by English only, longer chats, coding, etc.), if you switch to English only, this model shows up 2nd, and was number 1 for a brief period of time.

So just to sum up: an open weights model that you can run on most current consumer hardware is overtaking GPT-4-0409, Claude Opus, etc. This seems dubious, because while it's amazing, it's clearly not at the level of Opus or the latest GPT-4 if you've used it; in fact, it fails some basic logic questions in my tests. But it's a good reminder that it's really hard to know which model outperforms which, that the Arena ALSO has a bias (in who uses it, for example), and that evals are not a perfect way to explain which models are better.

However, LMsys is a big component of the overall vibes-based eval in our community, and Llama-3 is definitely a significant drop and it's really really good (even the smaller one). One not so surprising thing about it is that the Instruct version is also really really good, so much so that the first finetunes, like Eric Hartford's Dolphin (Dolphin-2.8-LLama3-70B), improve only a little bit over Meta's own instruct version, which was done very well. Per Joe Spisak's (Program Manager @ Meta AI) chat at the Weights & Biases conference last week...
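A dataset the size of FINEWEB is not something you download casually, but you can stream it. Here is a minimal sketch using the `datasets` library, assuming the dataset lives at `HuggingFaceFW/fineweb` on the Hub with a `text` column (verify the exact repo id and schema on the dataset card before running):

```python
# pip install datasets
from datasets import load_dataset

# Stream rows instead of downloading the full ~45TB corpus to disk.
fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, row in enumerate(fw):
    print(row["text"][:200])  # each row carries raw web text plus metadata
    if i >= 2:
        break
```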

Duration:01:21:34


📅 ThursdAI - Apr 18th - 🎉 Happy LLama 3 day + Bigxtral instruct, WizardLM gives and takes away + Weights & Biases conference update

4/18/2024
Happy LLama 3 day folks! After a lot of rumors, speculation, and apparently pressure from the big Zuck himself, we can finally call April 18th, 2024, LLaMa 3 day!

I am writing this from the lobby of the Marriott hotel in SF, where our annual conference, Fully Connected, is happening, and I recorded today's episode from my hotel room. I really wanna shout out how awesome it was to meet folks who are listeners of the ThursdAI pod and newsletter subscribers, participate in the events, and give high fives.

During our conference, we had the pleasure of having Joe Spisak, the Product Director of LLaMa at Meta, actually announce LLaMa3 on stage! It was so exhilarating; I was sitting in the front row, and then had a good chat with Joe outside of the show 🙌

The first part of the show was of course LLaMa 3 focused. We had such a great time chatting about the amazing new 8B and 70B models we got, and salivating after the announced but not yet released 400B model of LLaMa 3 😮 We also covered a BUNCH of other news from this week, which was already packed with tons of releases, and I was happy to share my experience running a workshop a day before our conference, focused on LLM evaluations. (If there's interest, I can share my notebooks and maybe even record a video walkthrough, let me know in the comments.)

Ok, let's dive in 👇

Happy LLama 3 day 🔥

The technical details

Meta has finally given us what we've all been waiting for: incredibly expensively trained (2 clusters of 24K H100s, over 15 trillion tokens) open weights models, the smaller 8B one and the larger 70B one. We got both instruction finetuned and base models, which is great for finetuners. Worth mentioning that this is a dense model (not a mixture of experts; all the parameters are used by the model during inference).

It is REALLY good at benchmarks, with the 8B model beating the previous generation (LLaMa 2 70B) on pretty much all benchmarks, and the new 70B inching up on the bigger releases from the past month or two, like Claude Haiku and even Sonnet! The only downsides are the 8K context window + no multimodality, but both are coming, according to Joe Spisak, who announced LLama3 on stage at our show Fully Connected 🔥 I was sitting in the front row and was very excited to ask him questions later!

By the way, Joe did go into details they haven't yet talked about publicly (see? I told you to come to our conference! and some of you did!) and I've been live-tweeting his whole talk + the chat outside with the "extra" spicy questions and Joe's winks haha, you can read that thread here.

The additional info

Meta has also partnered with both Google and Bing (take that, OpenAI), inserted LLama 3 into the search boxes of Facebook, Instagram, Messenger and Whatsapp, and deployed it to a new product called meta.ai (you can try it there now), so it's now serving LLama 3 to more than 4 billion people across all of those apps. Talk about compute cost!

Llama 3 also has a new tokenizer (which Joe encouraged us to "not sleep on") and a bunch of new security tools like Purple LLama and LLama Guard. TorchTune, the PyTorch team's recently released finetuning library, supports LLama3 finetuning natively out of the box as well (and integrates WandB as its first-party experiment tracking tool).

If you'd like more details directly from Joe, I was live tweeting his whole talk, and am working on getting the slides from our team. We'll likely have a recording as well; I'll post it as soon as we have it.

Here's a TL;DR (with my notes for the first time) of everything else we talked about. Given that today is LLaMa day, and I still have to do Fully Connected demos, I will "open source" my notes and refer you to the podcast episode to hear more detail about everything else that happened today 🫡

TL;DR of all topics covered:

* Meta releases LLama 3 - 8B, 70B and later 400B (Announcement, Models, Try it, Run Locally - and see the minimal snippet at the end of these notes)
* Open Source LLMs
  * Meta LLama 3 8B, 70B and later 400B...
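If you want to try the 8B instruct model yourself, here's a minimal sketch using transformers. It assumes you've been granted access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` repo and have a GPU with enough memory; treat it as a quick-start sketch rather than Meta's official recipe.

```python
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo, request access first
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "In one sentence, why was LLaMa 3 day a big deal?"}]
# The new tokenizer ships a chat template, so no hand-rolled prompt formatting is needed.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=100)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```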

Duration:02:13:43


📅 ThursdAI - Apr 11th, 2024 - GPT4 is king again, New Mixtral 8x22B + First finetune, New Gemini 1.5, Cohere beats old GPT4, more AI news

4/11/2024
This week was absolutely bonkers. For starters, for the first time ever, we got an open weights model (Command R+) to jump over GPT-4 in human rankings on LMsys. This is huge!

Then on Tuesday, it seemed all the companies just wanted to one-up one another. First, Gemini 1.5 released with updates: it became available in 180 countries, added audio mode, plus tons of API improvements and system prompts. Then, less than an hour later, OpenAI gave us a "majorly improved" GPT-4 Turbo version (2024-04-09) that is now back to being the BEST LLM IN THE WORLD. And to cap that day off, Mistral did the thing again, the thing being: dropping a torrent link in a tweet with no explanations. What was in that torrent was a Mixtral 8x22B MoE (which we started calling Bixtral), which comes with an Apache 2 license and seems to be VERY good! We also saw the first finetune from HuggingFace/KAIST folks less than 48 hours later (the authors of said finetune actually came on the show 🎉).

Fully Connected is a week from today! If you haven't yet signed up, use the THURSDAI promo code and come hear from Richard Socher (You.com), Jerry Liu (LlamaIndex CEO), Karoly (TwoMinutePapers), Joe Spisak (Meta), and leaders from NVIDIA, Snowflake, Microsoft, Coatue, Adobe, Siemens, Lambda and tons more 👇

TL;DR of all topics covered:

* Open Source LLMs
  * 🔥 Mistral releases Mixtral 8x22, an Apache 2 licensed MoE model (Torrent, TRY IT)
  * Cohere CMDR+ jumps to no. 6 on LMSys and beats GPT4 (X)
  * CodeGemma, RecurrentGemma & Gemma Instruct 1.1 (Announcement)
  * Auto-code-rover gets 22% on SWE-bench (Announcement)
  * HuggingFace - Zephyr 141B-A35B - the first Bixtral finetune (Announcement)
  * Mistral 22B - a single expert extracted from the MoE (Announcement, HF)
* This week's Buzz - Weights & Biases updates
  * Fully Connected is in 1 week! (Come meet us)
* Big CO LLMs + APIs
  * 🔥 GPT-4 Turbo is back to being the number 1 AI, with an 88.2% HumanEval score (X)
  * Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode (X)
  * LLama 3 coming out in less than a month (confirmed by Meta folks)
  * xAI Grok now powers news summaries on X (Example)
  * Cohere's new Rerank 3 (X)
* Voice & Audio
  * HuggingFace trained Parler-TTS (Announcement, Github)
  * Udio finally launched its service (Announcement, Leak, Try It)
  * Suno has added explore mode (suno.ai/explore)
* Hardware
  * Humane AI Pin has started shipping - reviews are not amazing

Open Source LLMs

Command R+ - the first open weights model that beats last year's GPT-4 versions

This is massive, really a milestone to be discussed: even though tons of other news happened, this is the first time an open weights model has beaten GPT-4 not on a narrow case (coding, medical) but on a general human evaluation on the Arena. This happened just a year after GPT-4 first came out, and is really really impressive. Command R+ has been getting a lot of great attention from the community as well; folks were really surprised by the overall quality, not to mention the multilingual abilities of CommandR+.

Mixtral 8x22B MoE with 65K context and Apache 2 license (Bigstral)

Despite the above, Cohere's time in the sun (i.e. top open weights model on LMsys) may not last long if the folks at Mistral have anything to say about it!

Mistral decided to cap the crazy Tuesday release day with another groundbreaking tweet of theirs, which included a torrent link and nothing else (since then they have of course uploaded the model to the hub), giving us what may unseat Command R from the rankings. The previous Mixtral (8x7B) signaled the age of MoEs, and each expert in it was Mistral 7B sized, but in this new, affectionately named Bixtral model, each expert is a massive 22B. We only got a base version of it, which is incredible in its own right, but it's not instruction finetuned yet, and the finetuner community is already cooking really hard! Though it's hard because...

Duration:01:38:35


📅 ThursdAI Apr 4 - Weave, CMD R+, SWE-Agent, Everyone supports Tool Use + JAMBA deep dive with AI21

4/4/2024
Happy first ThursdAI of April folks, did you have fun on April Fools? 👀 I hope you did; I ran a poll on my feed and 70% did not participate in April Fools, which makes me a bit sad!

Well alright, time to dive into the news of this week, and of course there are TONS of news, but I want to start with our own breaking news! That's right, we at Weights & Biases have breaking news of our own today: we've launched our new product called Weave! Weave is our new toolkit to track, version and evaluate LLM apps, so from now on, we have Models (what you probably know as Weights & Biases) and Weave. So if you're writing any kind of RAG system, anything that uses Claude or OpenAI, Weave is for you!

I'll be focusing on Weave and sharing more on the topic, but today I encourage you to listen to the launch conversation I had with Tim & Scott from the Weave team here at WandB, as they and the rest of the team worked their asses off for this release and we want to celebrate the launch 🎉

TL;DR of all topics covered:

* Open Source LLMs
  * Cohere - CommandR PLUS - 104B RAG optimized Sonnet competitor (Announcement, HF)
  * Princeton SWE-agent - OSS Devin - gets 12.29% on SWE-bench (Announcement, Github)
  * Jamba paper is out (Paper)
  * Mozilla LLamaFile now goes 5x faster on CPUs (Announcement, Blog)
  * Deepmind - Mixture-of-Depths paper (Thread, ArXiv)
* Big CO LLMs + APIs
  * Cloudflare AI updates (Blog)
  * Anthropic adds function calling support (Announcement, Docs - see the sketch at the end of these notes)
  * Groq lands function calling (Announcement, Docs)
  * OpenAI is now open to customers without login requirements
  * Replit Code Repair - a 7B finetune of DeepSeek that outperforms Opus (X)
  * Google announced Gemini prices + Logan joins (X)
* This week's Buzz - oh so much BUZZ!
  * Weave launch! Check Weave out! (Weave Docs, Github)
  * Sign up with promo code THURSDAI at fullyconnected.com
* Voice & Audio
  * OpenAI Voice Engine will not be released to developers (Blog)
  * Stable Audio v2 dropped (Announcement, Try here)
  * Lightning Whisper MLX - 10x faster than whisper.cpp (Announcement, Github)
* AI Art & Diffusion & 3D
  * DALL-E now has in-painting (Announcement)
* Deep dive
  * Jamba deep dive with Roi Cohen from AI21 and Maxime Labonne

Open Source LLMs

Cohere releases Command R+, a 104B RAG focused model (Blog)

Cohere surprised us: just 2.5 weeks after releasing Command-R (which became very popular and is no. 10 on the LMsys arena), they gave us its big brother, Command R PLUS. With 128K tokens in the context window, this model is multilingual as well, supporting 10 languages, and even has improved tokenization for those languages (a first!).

The main focus from Cohere is advanced function calling / tool use, and RAG of course, and this model specializes in those tasks, beating even GPT-4 Turbo. It's clear that Cohere is positioning themselves as RAG leaders, as evident by this accompanying tutorial on starting with RAG apps, and this model further solidifies their place as experts in this field. Congrats folks, and thanks for the open weights 🫡

SWE-Agent from Princeton

Folks remember Devin? The agent from that super cracked team, with a nice UI, that got 13% on SWE-bench, a very hard (for LLMs) benchmark that requires solving real world issues? Well, now we have an open source agent that comes very very close to that, called SWE-agent.

SWE-agent has a dedicated terminal and tools, and utilizes something called ACI (Agent Computer Interface), allowing the agent to navigate, search, and edit code.

The dedicated terminal in a docker environment really helps, as evident by a massive 12.3% score on SWE-bench, where GPT-4 gets only 1.4%! Worth mentioning that SWE-bench is a very hard benchmark that was created by the same folks who released SWE-agent, and here are some videos of them showing the agent off; this is truly an impressive achievement!

Deepmind publishes Mixture-of-Depths (arXiv)

Thanks to Hassan, who read the paper and wrote a deep dive, this paper by Deepmind...
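Since both Anthropic and Groq landed function calling this week, here is a minimal sketch of Anthropic-style tool use. The weather tool is made up for illustration, and the parameter shape follows Anthropic's Python SDK as I understand it (tool use launched in beta, so check the current docs before relying on this):

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A made-up tool definition: the model decides when to call it.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

msg = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Denver?"}],
)

# If the model chose to call the tool, a tool_use block shows up in the content.
for block in msg.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # e.g. get_weather {'city': 'Denver'}
```

Your code then runs the real tool and sends the result back in a follow-up message, which is the same loop Groq's OpenAI-compatible function calling uses.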

Duration:01:50:05


📅 ThursdAI - Mar 28 - 3 new MoEs (XXL, Medium and Small), Opus is 👑 of the arena, Hume is sounding emotional + How Tanishq and Paul turn brainwaves into SDXL images 🧠👁️

3/28/2024
Hey everyone, this is Alex, and can you believe that we're almost done with Q1 2024? March 2024 was kind of crazy of course, so I'm excited to see what April brings (besides Weights & Biases' conference in SF called Fully Connected, which I encourage you to attend and say hi to me and the team!).

This week we have tons of exciting stuff on the leaderboards; say hello to the new best AI in the world, Opus (+ some other surprises). In open source we had new MoEs (one from the Mosaic/Databricks folks, which tops the open source game; one from AI21 called Jamba, which shows that a transformers alternative/hybrid can actually scale; and a tiny MoE from Alibaba), as well as an incredible emotion TTS from Hume.

I also had the pleasure to finally sit down with friend of the pod Tanishq Abraham and Paul Scotti from MedArc and chat about MindEye 2, and how they teach AI to read minds using diffusion models 🤯🧠👁️

Thank you for reading ThursdAI - Recaps of the most high signal AI weekly spaces. This post is public, so feel free to share it.

TL;DR of all topics covered:

* AI Leaderboard updates
  * Claude Opus is the number 1 LLM on the Arena (and in the world)
  * Claude Haiku passes GPT4-0613
  * 🔥 Starling 7B beta is the best Apache 2 model on LMsys, passing GPT3.5
* Open Source LLMs
  * Databricks/Mosaic DBRX - a new top open access model (X, HF) - see the routing sketch at the end of these notes
  * 🔥 AI21 - Jamba 52B - joint Attention-Mamba MoE (Blog, HuggingFace)
  * Alibaba - Qwen1.5-MoE-A2.7B (Announcement, HF)
  * Starling - a 7B that beats GPT3.5 on LMsys (HF)
  * LISA beats LoRA as the frontrunner PEFT method (X, Paper)
  * Mistral 0.2 base released (Announcement)
* Big CO LLMs + APIs
  * Emad leaves Stability 🥺
  * Apple rumors - Baidu, Gemini, Anthropic, who else? (X)
* This week's Buzz
  * WandB workshop in SF confirmed for April 17 - LLM evaluations (sign up here)
* Vision & Video
  * Sora showed some demos by actual artists, Air Head was great (Video)
  * Tencent AniPortrait - generate photorealistic animated avatars (X)
  * MedArc - MindEye 2 - fMRI signals to diffusion models (X)
* Voice & Audio
  * Hume demos EVI - empathic voice analysis & generation (X, demo)
* AI Art & Diffusion & 3D
  * Adobe Firefly adds structure reference and style transfer (X, Demo)
* Discussion
  * Deep dive into MindEye 2 with Tanishq & Paul from MedArc
  * Is narrow finetuning done for, given larger context + cheaper prices? - a debate

🥇🥈🥉 Leaderboard updates from LMSys (Arena)

This week's updates to the LMsys arena are significant. (Reminder: LMsys uses a mix of MT-Bench, LLM-as-a-judge evaluation, and user ELO scores, where users play with these models and choose which answer they prefer.)

For the first time since the LMsys arena launched, the top model is NOT GPT-4 based. It's now Claude's Opus. That's not surprising if you've used the model; what IS surprising is that Haiku, its tiniest, fastest brother, is now well positioned at number 6, beating a GPT-4 version from the summer, Mistral Large, and other models, while being dirt cheap.

We also have an incredible showing from the only Apache 2.0 licensed model in the top 15, Starling LM 7B beta, now 13th on the chart - an incredible finetune of a finetune (OpenChat) of Mistral 7B.

👏 Yes, you can now run a GPT3.5-beating model on your Mac, fully offline 👏 Incredible.

Open Source LLMs (welcome to MoEs)

Mosaic/Databricks gave us DBRX, a 132B MoE trained on 12T tokens (X, Blog, HF)

Absolutely crushing the previous records, Mosaic has released the top open access model (one you can download, run and finetune) in a while, beating LLama 70B, Grok-1 (314B) and pretty much every other non closed source model in the world, not only on metrics and evals but also on inference speed.

It uses a Mixture of Experts (MoE) architecture with 16 experts, 4 of which activate for each token. This allows it to have 36 billion active parameters, compared to 13 billion for Mixtral, while only a fraction of the full 132B runs for any given token. DBRX has strong capabilities in math, code, and natural language understanding. The...
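To make "each token only activates a few experts" concrete, here is a toy sketch of top-k expert routing. The dimensions are illustrative, not DBRX's real ones; only the 16-experts-pick-4 shape mirrors DBRX.

```python
# Toy top-k mixture-of-experts routing, in the spirit of DBRX (16 experts, 4 active).
import torch
import torch.nn as nn

n_experts, top_k, d = 16, 4, 64

router = nn.Linear(d, n_experts)                      # scores each expert per token
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))

x = torch.randn(8, d)                                 # 8 tokens
scores = router(x).softmax(dim=-1)                    # (8, 16) routing probabilities
weights, idx = scores.topk(top_k, dim=-1)             # each token keeps its 4 best experts
weights = weights / weights.sum(-1, keepdim=True)     # renormalize over the chosen experts

out = torch.zeros_like(x)
for t in range(x.shape[0]):                           # each token runs ONLY its top-k experts,
    for w, e in zip(weights[t], idx[t]):              # so active params << total params
        out[t] += w * experts[e](x[t])
```

This is why DBRX can have 132B total parameters but only ~36B doing work per token: the router picks a small subset of experts, and the rest of the weights sit idle for that token.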

Duration:01:35:10


📅 ThursdAI - Mar 21 - Grok, GTC, first OSS AI hardware, Neuralink Human, Prompting Claude and more AI news

3/21/2024
March madness... I know for some folks this means basketball or something, but since this is an AI newsletter, and this March was indeed mad, I am claiming it. This week seemed to get madder from one day to the next, and the AI announcements kept coming throughout the recording; I used the "breaking news" button a few times during this week's show!

This week we covered tons of corporate AI drama in the BigCO segment, from the Inflection → Microsoft move, to Apple Gemini rumors, to Nvidia's GTC conference. But we also had a bunch of open source to go over, including an exciting glimpse into the O1 from Open Interpreter, whose founder Killian (of the ThursdAI mafia haha) joined to chat briefly after an all-nighter release push!

Another returning FOTP (friend of the pod), Matt Shumer, joined as we did a little deep dive into prompting Claude, and how he went viral (seems to happen a lot to Matt) with a project of his to make Claude write prompts for itself! Definitely worth a listen; it's the first segment after the TL;DR on the pod 👂 this week.

Btw, did you already check out Fully Connected? It's the annual Weights & Biases conference in SF next month, and tickets are flying. I'm going to be there and actually do a workshop one day prior; I would love to invite you to join as well!

TL;DR of all topics covered:

* Open Source LLMs
  * xAI open sources Grok (X, Blog, HF, Github)
  * Sakana AI releases a new paper + 2 JP merged SOTA models (X, Paper, Blogpost)
  * Open Interpreter announces O1 - the Linux for AI devices (X, Project)
  * LM Studio new modes (X)
* Big CO LLMs + APIs
  * Nvidia GTC conference - Blackwell platform, NIMs and Gr00t robotics
  * Jensen interviewed the transformers authors
  * Apple rumored to be looking at a deal including Gemini
  * Apple releases a multimodal MM1 paper (X)
  * Inflection founders leave to head Microsoft AI
  * Google opens up Gemini 1.5 with 1M context access to all (X)
* Vision & Video
  * NVIDIA + MIT release VILA (13B, 7B and 2.7B) (X, HuggingFace, Paper)
* This week's BUZZ
  * Fully Connected is coming; sign up here, get tickets, join us.
  * I'm running a workshop in SF a day before on improving your LLM step by step, including exciting announcements (same link)
* Voice & Audio
  * Suno V3 launched officially (X, Blog, Play with it)
  * Distil-whisper-v3 - a more accurate and 6x faster version of Whisper large (X, Code)
* AI Art & Diffusion & 3D
  * Stability presents SD3 TURBO - 4 steps to get the same high quality generation (Paper)
  * Stability open sources Stable Video 3D (Blog, Models)
* Tools & Others
  * Neuralink interview with the first human NeuroNaut - Nolan (X)
  * Lex & Sama released a podcast, barely any news
  * Matt Shumer releases his Claude prompt engineer (X, Metaprompt, Matt's Colab)

Open Source LLMs

xAI open sources Grok (X, Blog, HF, Github)

Well, Space Uncle Elon had a huge week, from sending Starship into orbit successfully to open sourcing an LLM for us, and a huge one at that. Grok is a 314B parameter behemoth with a mixture of experts architecture: 8 experts, 2 of which are active at the same time (roughly 80B active parameters). It's released as a base model, and maybe that's why it was received with initial excitement, but then... nobody in the GPU-poor compute category has the ability to run or finetune it!

In terms of performance, it barely beats out Mixtral while being almost 10x larger, which just shows that... data is important, maybe more important than Github stars, as Arthur (CEO of Mistral) helpfully pointed out to Igor (founder of xAI).

Still, big props to the team for training and releasing this model under an Apache 2 license.

Sakana AI launches 2 new models using evolutionary algorithm merging

Yeah, that's a mouthful. I've been following Hardmaru (David Ha) for a while, from before he joined Sakana, and only when the founder (and a co-author on transformers) Llion Jones talked about it on stage at GTC did the pieces connect. Sakana means fish in Japanese, and the idea behind this lab is to create things using nature-like...

Duration:01:44:52


🎂 ThursdAI BirthdAI March 14: Anthropic Haiku, Devin the new AI SWE, GPT4 gets hands, Cohere and Nous give us tool use models & more AI news

3/14/2024
"...Happy birthday dear ThursdAIiiiiiiii, happy birthday to youuuuuu 🎂" What a day! Today is π-day (March 14th), 2024. For some reason it's important, not only because it's GPT-4 anniversary, or Claude 1 anniversary, or even that Starship flew to space, but also 🥁 it's ThursdAI BirthdAI 🎉 Yeah, you heard that right, last year following GPT-4 release, I hopped into a twitter space with a few friends, and started chatting about AI, and while some friends came and went, I never stopped, in fact, I decided to leave my 15 year career in software, and focus on AI, learning publicly, sharing my learnings with as many people as possible and it's been glorious. And so today, I get to celebrate a little 💃 I also get to reminisce about the state of AI that we were at, back exactly a year ago. Context windows were tiny, GPT-4 came out with 8K (we casually now have models with 200K that cost $0.25/1M tokens), GPT-4 also showed unprecedented levels vision capabilities back then, and now, we have 1.3B parameters models that have similar level of visual understanding, open source was nascent (in fact, LLama.cpp only had it's first commit 4 days prior to GPT4 launch, Stanford released the first Alpaca finetune of Llama just a day prior. Hell even the chatGPT API only came out a few days before, so there was barely any products built with AI out there. Not to mention that folks were only starting to figure out what vector DBs were, what RAG is, how to prompt, and that it's possible to run these things in a loop and create agents! Other fields evolved as well, just hit play on this song I generated for ThursdAI with Suno V3 alpha, I can’t stop listening to it and imagining that this was NOT possible even a few months ago It's all so crazy and happening so fast, that annual moments like these propose a great opportunity to pause the acceleration for a sec. and contextualize it, and bask in the techno-optimism glory of aren't we lucky to live in these times? I sure am, and for me it's the ThursdAI birthday gift to be able to share my excitement with all of you! Thank you for being a subscriber, the best way you can support ThursdAI is to share this with a friend and tag us on socials 🫡 TL;DR of all topics covered: * Open Source LLMs * Together releases Sequoia speculative decoding (X, Blog) * Hermes Pro from NousResearch - Tool use and function calling (X, HF, Github) * Big CO LLMs + APIs * Anthropic releases Claude 3 Haiku (Announcement, Blog) * Cohere CMD+R (Announcement, HF) * This weeks Buzz * Early bird tickets for Fully Connected in SF are flying, come meet the Weights & Biases team. We're also going to be running a workshop a day before, come join us! (X) * Vision & Video * Deepseek VLM 1.3B and 7B (X,Announcement, HF) * Voice & Audio * Made a song with Suno v3 Alpha for ThursdAI, it's a banger (Song) * Hardware & Robotics (New) * OpenAI now powers Figure - the humanoid robot company (X) * Cerebras announces the fastest AI chip on earth (X) * Extropic made an announcement about their TPU - Thermodynamic Processing Unit * Tools & Agents * Devin from Cognition Labs (Announcement, 47 minute demo) Agents for your house and your Github tasks Say hello to Devin from Cognition Labs (Announcement, Real world demo) By far the most excited I've seen my X feed be this week, was excitement about Cognition Labs new agent called Devin, which they call the first AI software engineer. 
You should really watch the video, and then watch a few other videos, because, well, only a few folks are getting access, and yours truly is not one of them. It seems like a very published launch, backed by tons of VC folks, and everybody kept highlighting not only the innovative UI that Devin has, and it has a very polished UX/UI/Dev experience with access to a browser (where you can authenticate and it can pick up doing tasks), terminal (where you can scroll back and forth in time to see what it did when), but also a chat window and a...

Duration:01:58:04


📅 ThursdAI - Mar 7 - Anthropic gives us Claude 3, Elon vs OpenAI, Inflection 2.5 with Pi, img-2-3D from Stability & More AI news

3/7/2024
Hello hello everyone, happy spring! Can you believe it? It's already spring! We have tons of AI news for you to cover, starting with the most impactful one: did you already use Claude 3? Anthropic decided to celebrate Claude 1's birthday early (which btw is also ThursdAI's birthday and GPT-4's release date, March 14th, 2023) and gave us 3 new Claudes! Opus, Sonnet and Haiku.

TL;DR of all topics covered:

* Big CO LLMs + APIs
  * 🔥 Anthropic releases Claude Opus, Sonnet, Haiku (Announcement, try it)
  * Inflection updates Pi 2.5 - claims GPT4/Gemini equivalence with 40% less compute (announcement)
  * Elon sues OpenAI (link)
  * OpenAI responds (link)
  * Ex-Google employee was charged with trading AI secrets with China (article)
* Open Source LLMs
  * 01AI open sources Yi 9B (Announcement)
  * AnswerAI - Jeremy Howard, Johno & Tim Dettmers - train 70B at home with FSDP/QLoRA (X, Blog)
  * GaLore - training a 7B on a single consumer-grade GPU (24GB) (X)
  * Nous open sources Genstruct 7B - an instruction-generation model (Hugging Face)
  * Yam's GEMMA-7B Hebrew (X)
* This week's Buzz
  * Weights & Biases is coming to SF in April! Our annual conference, Fully Connected, is open for registration (Get your tickets and see us in SF)
* Vision & Video
  * Vik releases Moondream 2 (Link)
* Voice & Audio
  * Suno v3 alpha is blowing minds (Link)
* AI Art & Diffusion & 3D
  * The SD3 research paper is here (Link)
  * Tripo + Stability release TripoSR - FAST image-2-3D (link, Demo, FAST demo)
  * The story of how I created a competition between inference providers to get us sub-1.5s playground image gen (X)

Big CO LLMs + APIs

Anthropic releases Claude 3 Opus, Sonnet and Haiku

This was by far the biggest news of this week, specifically because the top keeps getting saturated with top of the line models! Claude Opus is actually preferable to many folks in blind studies over some GPT-4 versions, and as we were recording the pod, LMSys released their rankings: Claude Opus beats Gemini and is now 3rd in user preference on the LMSys rank.

The release is vast: they announced 3 new models but only gave us access to 2 of them, teasing that Haiku is much faster / cheaper than other options in that weight class out there. In addition to going head to head with GPT-4, Claude 3 is now finally also multimodal on inputs, meaning it can take images and understand graphs and charts. They also promised significantly fewer refusals and improved accuracy by almost 2x.

One incredible thing Claude always had is the 200K context window, and here they announced that they will be supporting up to 1M, but for now we still only get 200K. We were also promised support for function calling and structured output, but apparently that's "coming soon"; still great to see that they are aiming for it!

We were all really impressed with Claude Opus, from folks on stage who mentioned that it's easier to talk to and feels less sterile than GPT-4, to coding abilities that are not "lazy" and don't tell you to continue writing the rest of the code yourself in comments, to even folks who are jailbreaking the guardrails and getting Claude to speak about the "I" and metacognition.

Speaking of metacognition sparks, one of the prompt engineers on the team shared a funny story about doing a needle-in-haystack analysis, where Claude Opus responded with:

I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention

This split the X AI folks in two: many claiming "OMG it's self aware", and many others calling for folks to relax, since, like other models, this is still just spitting out one token at a time. I additionally like the openness with which the Anthropic folks shared the (very simple but carefully crafted) system prompt.

My personal take: I've always liked Claude; even v2 was great, until they nixed the long context for the free tier. This is a very strong, viable alternative to GPT-4 if you don't need DALL-E or code interpreter features, or the GPTs...

Duration:01:45:27


📅 ThursdAI - Feb 29 - Leap Year Special ✨

2/29/2024
Happy leap year day everyone, very excited to bring you a special once-in-4-years edition of ThursdAI 👏

(Today is also Dune 2 day (I am going to see the movie right after I write these here words) and well... to some folks, these are the bull market ₿ days as well. So congrats to all who weathered the bear market!)

This week we had another great show, with many updates and a deep dive. Again, I was able to cover most of the news AND bring you a little bit of a deep dive into a very interesting concept called Matryoshka Representation Learning (aka 🪆 embeddings), with two of the authors on the paper joining me to chat on the pod!

TL;DR of all topics covered:

* AI Art & Diffusion & 3D
  * Playground releases a new diffusion foundational model, Playground V2.5 (DEMO)
  * Alibaba teasing EMO - incredible animated faces (example)
  * Ideogram 1.0 announced - SOTA text generation (Announcement)
* Open Source LLMs
  * Gemma update - hard to finetune, not better than 7B Mistral
  * LLama 3 will release in June 2024, not anytime soon
  * Starcoder 2 + The Stack V2 (Announcement)
  * Berkeley Function-Calling Leaderboard (Announcement)
  * Argilla released OpenHermesPreferences, the largest open dataset for RLHF & DPO (Announcement)
  * STORM from Stanford to write long documents (Thread)
* Big CO LLMs + APIs
  * Mistral releases Mistral Large & Le Chat (Announcement, Le Chat)
  * Microsoft + Mistral strike a deal (Blog)
  * Google teases GENIE - a model that makes images into interactive games (announcement)
  * OpenAI allowing fine-tuning on GPT 3.5
  * Wordpress & Tumblr preparing to sell user data to OpenAI & Midjourney
* Other
  * Modular releases their MAX inference engine, compatible with PyTorch, Tensorflow & ONNX models (Announcement)
  * Interview with the MRL (Matryoshka Representation Learning) authors (in audio only) - see the small sketch at the end of these notes

AI Art & Diffusion

Ideogram 1.0 launches - superb text generation!

Ideogram, founded by ex-Google Imagen folks, which we've reported on before, finally announces 1.0, focusing on superb text-in-image generation. It's really great; I've generated a few owls already (don't ask, hooot) and I don't think I will stop. This is superb for meme creation and answering in multimedia, and it's fast as well; I'm very pleased! They also announced an investment round from A16Z to go with their 1.0 release. Definitely give them a try.

Playground V2.5

Suhail Doshi and Playground released a new foundational image model called Playground v2.5, and it looks awesome: very realistic, and honestly it looks like it beats MJ and DALL-E on many simple prompts. They also announced that this model received higher user preference scores based on 1K prompts (which we didn't get to see), but they have released this model into the wild; you can download it and play with a free demo provided by the Modal folks.

Another SORA moment? Alibaba teases EMO 🤯 (website)

Ok, this one has to be talked about. Alibaba released quite a few preview videos + a paper about something called EMO, a way to animate talking/singing avatars from just 1 image. It broke my brain, and I couldn't stop staring at it. Honestly, it's quite quite something. This model animates not only the mouth; eyes are blinking, there are emotions, hair moves, even earrings, and most impressively, the whole larynx muscle structure seems to be animated as well! Just look at this video, and then look at it again. The Github repo was created but no code released, and I really hope we get this code at some point, because animating videos with this fidelity + something like SORA could mean so many possible creations!

I wrote this tweet only two weeks ago, and I already feel that it's outdated; we're farther along that curve with EMO. What a great release! And just because it's so mind-blowing, here are a few more EMO videos for you to enjoy.

Open Source LLMs

Starcoder 2 + The Stack V2

The folks at Hugging Face and BigCode have released a beast on us, StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2...
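The punchline of 🪆 embeddings is that the first N dimensions of a Matryoshka-trained vector are themselves a usable embedding. Here is a minimal sketch of the truncate-and-renormalize trick, using random vectors as stand-ins for real model outputs:

```python
# Matryoshka embeddings: keep a prefix of the dimensions, then renormalize.
import numpy as np

def truncate(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and re-normalize to unit length."""
    small = emb[..., :dim]
    return small / np.linalg.norm(small, axis=-1, keepdims=True)

full = np.random.randn(2, 768)                 # stand-in for two real 768-d embeddings
full /= np.linalg.norm(full, axis=-1, keepdims=True)

for dim in (768, 256, 64):                     # smaller prefixes trade accuracy for speed/storage
    a, b = truncate(full, dim)
    print(dim, float(a @ b))                   # cosine similarity at each truncation level
```

With a model actually trained with the MRL objective, similarity rankings stay remarkably stable as you shrink the vectors, which is why it's so attractive for cutting vector DB storage costs.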

Duration:01:53:53


📅 ThursdAI Feb 22nd - Groq near instant LLM calls, SDXL Lightning near instant SDXL, Google gives us GEMMA open weights and refuses to draw white people, Stability announces SD3 & more AI news

2/23/2024
Hey, this is Alex. Ok, let's start with the big news: holy crap, this week was a breakthrough week for speed! We had Groq explode in popularity, and ByteDance release an updated SDXL model called Lightning, able to generate full blown SDXL 1024 images in 300ms. I've been excited about what real time LLM/diffusion can bring, and with both of these released the same week, I just had to go and test them out together.

Additionally, we had Google step into a big open weights role and give us Gemma, 2 open weights models, 2B and 7B (the latter closer to 9B, per Junyang), and it was great to see Google committing to releasing at least some models in the open. We also had breaking news: Emad from Stability announced SD3, which looks really great; Google will pay Reddit $200M for AI training on their data; & a few more things.

TL;DR of all topics covered:

* Big CO LLMs + APIs
  * Groq's custom LPU inference does 400 T/s Llama/Mistral generation (X, Demo) - see the timing sketch at the end of these notes
  * Google image generation is in hot water and was reportedly paused (refuses to generate white people)
  * Gemini 1.5 long context is very impressive to folks (Matt Shumer, Ethan Mollick)
* Open Weights LLMs
  * Google releases GEMMA, open weights 2B and 7B models (Announcement, Models)
  * Teknium releases Nous Hermes DPO (Announcement, HF)
* Vision & Video
  * YOLO V9 - SOTA real time object detector is out (Announcement, Code)
* This week's Buzz (What I learned in WandB this week)
  * Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)
* AI Art & Diffusion & 3D
  * ByteDance presents SDXL-Lightning (Try here, Model)
  * Stability announces Stable Diffusion 3 (Announcement)
* Tools
  * Replit releases a new experimental Figma plugin for UI → code (Announcement)
  * Arc browser adds "AI pinch to understand" summarization (Announcement)

Big CO LLMs + APIs

Groq's new LPU shows extreme performance for LLMs - up to 400 T/s (example)

* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP), which they categorize as a Linear Processor Unit (LPU). Unlike traditional GPUs, which are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.
* Analogy: they know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency), so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.
* Why would we need something like this? Some folks are saying that average human reading speed is only around 30 T/s. I created an example that uses near instant Groq Mixtral + Lightning SDXL to create images, with Mixtral as my prompt manager.

Open Source Weights LLMs

Google Gemma - 2B and 7B open weights models (demo)

* 4 hours after release, Llama.cpp added support, Ollama and LM Studio added support, and Tri Dao added Flash Attention support
* Vocab size is 256K
* 8K context window
* Tokenizer similar to LLama
* Folks are... not that impressed, as far as I've seen
* Trained on 6 trillion tokens
* Google also released Gemma.cpp (local CPU inference) - Announcement

Nous/Teknium re-release Nous Hermes with a DPO finetune (Announcement)

* The DPO version is performing better than previous models
* Models are GGUF and can be found here
* DPO enables improvements across the board

This week's Buzz (What I learned with WandB this week)

* Alex was in SF last week
* A16Z + 20-something cohosts, including Weights & Biases, talked about the importance of open source
* Huge shoutout to Rajko and Marco from A16Z, and tons of open source folks who joined
* Nous, Ollama, LLamaIndex, LMSys folks, Replicate, Perplexity, Mistral, Github, as well as Eric Hartford, Jon Durbin, Haotian Liu, HuggingFace, and tons of other great folks from Mozilla and the Linux Foundation, plus Percy from Together/Stanford

Also had a chance to check out one of the smol dinners in SF; they go really hard,...
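If you want to feel the 400 T/s yourself, here is a minimal sketch that times a Mixtral completion through Groq's OpenAI-compatible endpoint. The base URL and model id are what Groq documented at the time, so double check them against the current docs; this is a rough wall-clock measurement, not a rigorous benchmark.

```python
# pip install openai
import time
from openai import OpenAI

# Groq exposes an OpenAI-compatible API; get a key from console.groq.com.
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_KEY")

t0 = time.time()
resp = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # model id per Groq's docs at the time
    messages=[{"role": "user", "content": "Write a haiku about fast inference."}],
)
dt = time.time() - t0

print(resp.choices[0].message.content)
print(f"{resp.usage.completion_tokens / dt:.0f} tokens/sec (wall clock, incl. network)")
```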

Duration:01:48:04


🔥 ThursdAI - Feb 15, 2024 - OpenAI changes the Video Game, Google changes the Context game, and other AI news from past week

2/16/2024
Holy SH*T. These two words have been said on this episode multiple times, way more than ever before I want to say, and it's because we got 2 incredibly exciting breaking news announcements in a very very short amount of time (in the span of 3 hours), and the OpenAI announcement came as we were recording the space, so you'll get to hear our live reaction to this insanity.

We also had 3 deep dives, which I am posting on this week's episode: we chatted with Yi Tay and Max Bane from Reka, which trained and released a few new foundational multimodal models this week, and with Dome and Pablo from Stability, who released a new diffusion model called Stable Cascade. Finally, I had a great time hanging with Swyx (from Latent Space) and got a chance to turn the microphone back at him, and had a conversation about Swyx's background, Latent Space, and AI Engineer.

I was also very happy to be in SF today of all days, as my day is not over yet; there's still an event which we cohost together with A16Z, folks from Nous Research, Ollama and a bunch of other great folks. Just look at all these logos! Open Source FTW 👏

TL;DR of all topics covered:

* Breaking AI News
  * 🔥 OpenAI releases SORA - text to video generation (Sora blogpost with examples)
  * 🔥 Google teases Gemini 1.5 with a whopping 1 MILLION token context window (X, Blog)
* Open Source LLMs
  * Nvidia releases Chat With RTX local models (Blog, Download)
  * Cohere open sources Aya 101 - a 12.8B model supporting 101 languages (X, HuggingFace)
  * Nomic releases Nomic Embed 1.5 with Matryoshka embeddings (X)
* Big CO LLMs + APIs
  * Andrej Karpathy leaves OpenAI (Announcement)
  * OpenAI adds memory to ChatGPT (X)
* This week's Buzz (What I learned at WandB this week)
  * We launched a new course with Hamel Husain on enterprise model management (Course)
* Vision & Video
  * Reka releases Reka-Flash 21B & Reka Edge multimodal models (Blog, Demo)
* Voice & Audio
  * WhisperKit runs on watchOS now! (X)
* AI Art & Diffusion & 3D
  * Stability releases Stable Cascade - a new AI model based on Würstchen v3 (Blog, Demo)
* Tools & Others
  * Goody2ai - a very good and aligned AI that does NOT want to break the rules (try it)

🔥 Let's start with the breaking news (in the order they happened)

Google teases Gemini 1.5 with a whopping 1M context window

This morning, Jeff Dean released a thread full of crazy multimodal examples of their new 1.5 Gemini model, which can handle up to 1M tokens in the context window. The closest model so far was Claude 2.1, and that was not multimodal. They also claim they are researching up to 10M tokens of context.

The thread was chock full of great examples, some of which highlighted the multimodality of this incredible model, like pinpointing and giving a timestamp of an exact moment in an hour long movie, just from a sketch as input. This, honestly, blew me away. Thanks to the incredibly large context window, they were able to break a WHOLE 1 hour movie into frames, provide additional text tokens on top of it, and the model had near perfect recall.

They used Greg Kamradt's needle-in-the-haystack analysis on text, video and audio and showed incredible, near perfect recall, which highlights how much advancement we got in the area of context windows. (A minimal sketch of this kind of analysis follows at the end of these notes.) Just for reference, less than a year ago, we had this chart from Mosaic when they released MPT.

That graph's Y axis tops out at 60K; the graph above goes to 1 MILLION, and we're less than a year apart. Not only that, Gemini Pro 1.5 is also multimodal.

I've got to give props to the Gemini team; this is quite a huge leap for them, and for the rest of the industry this is a significant jump in what users will expect going forward! No longer will we be told "hey, your context is too long" 🤞

A friend of the pod, Enrico Shippole, joined the stage; you may remember him from our deep dive into extending the Llama context window to 128K, and he showed that a bunch of new research makes all this possible...
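For the curious, a needle-in-a-haystack test is simple to rig up yourself: bury one out-of-place fact at a known depth in a long filler context and check whether the model recalls it. A minimal sketch follows; `ask_model` is a stand-in for whatever API you're testing, and the filler/needle strings are made up.

```python
# Minimal needle-in-a-haystack harness; ask_model() stands in for a real API call.
FILLER = "The quick brown fox jumps over the lazy dog. " * 5000  # ~225K chars of haystack
NEEDLE = "The secret password for ThursdAI is 'bagel-42'."

def build_context(depth: float) -> str:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]

def ask_model(context: str, question: str) -> str:
    raise NotImplementedError("call your model of choice here")

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    answer = ask_model(build_context(depth), "What is the secret password?")
    print(depth, "recalled" if "bagel-42" in answer else "missed")
```

Sweep depth and context length together and you get exactly the recall heatmaps Kamradt popularized.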

Duration:01:57:37


📅 ThursdAI - Feb 8 - Google Gemini Ultra is here, Qwen 1.5 with Junyang and deep dive into ColBERT, RAGatouille and DSPy with Connor Shorten and Benjamin Clavie

2/9/2024
Hihi, this is Alex, from Weights & Biases, coming to you live, from Yosemite! Well, actually I'm writing these words from a fake virtual Yosemite that appears above my kitchen counter, as I'm now a Vision Pro user, and I will force myself to work inside this thing and tell you if it's worth it. I will also be on the lookout for anything AI related in this new spatial computing paradigm, like THIS for example!

But back to reality for a second: we had quite the show today! We had the awesome time of having Junyang Justin Lin, a dev lead at Alibaba, join us and talk about Qwen 1.5 and QwenVL, and then we had a deep dive into quite a few acronyms I've been seeing on my timeline lately, namely DSPy, ColBERT and (the funniest one) RAGatouille, and we had a chat with Connor from Weaviate and Benjamin, the author of RAGatouille, about what it all means! Really really cool show today; I hope you don't only read the newsletter but also listen on Spotify, Apple or right here on Substack.

TL;DR of all topics covered:

* Open Source LLMs
  * Alibaba releases a BUNCH of new Qwen 1.5 models, including a tiny 0.5B one (X announcement)
  * Abacus finetunes Smaug, top of the HF leaderboard, based on Qwen 72B (X)
  * LMsys adds more open source models, sponsored by Together (X)
  * Jina embeddings finetuned for code
* Big CO LLMs + APIs
  * Google rebrands Bard to Gemini and launches Gemini Ultra (Gemini)
  * OpenAI adds image metadata (Announcement)
  * OpenAI keys are now restricted per key (Announcement)
* Vision & Video
  * Bria - RMBG 1.4 - open source background removal that runs in your browser (X, DEMO)
* Voice & Audio
  * MetaVoice, a new Apache 2 licensed TTS (Announcement)
* AI Art & Diffusion & 3D
  * Microsoft added DALL-E editing with "Designer" (X thread)
  * Stability AI releases an update to SVD - video 1.1 launches with a web UI, much nicer videos
* Deep dive with Benjamin Clavie and Connor Shorten, show notes:
  * Benjamin's announcement of RAGatouille (X) - see the short usage sketch at the end of these notes
  * Connor's chat with Omar Khattab (author of DSPy and ColBERT) - Weaviate Podcast
  * Very helpful intro to ColBERT + RAGatouille - Notion

Open Source LLMs

Alibaba releases Qwen 1.5 - ranging from 0.5B to 72B (DEMO)

With 6 sizes, including 2 novel ones, from as little as a 0.5B parameter model, to an interesting 4B, all the way to a whopping 72B, Alibaba open sources additional Qwen checkpoints. We had the honor of having friend of the pod Junyang Justin Lin on again. He talked to us about how these sizes were selected, noted that even though this model beats Mistral Medium on some benchmarks, it remains to be seen how well it performs on human evaluations, and shared a bunch of details about open sourcing it.

The models were released with all the latest and greatest quantizations, significantly improved context length (32K), and support for both Ollama and LM Studio (which I helped make happen, and I am very happy with the way the ThursdAI community is growing and connecting!).

We also had a chat about QwenVL Plus and QwenVL Max, their API-only versions of their best vision enabled models, and had the awesome Piotr Skalski from Roboflow on stage to chat with Junyang about those models!

To me, a success of ThursdAI is when the authors of the things we talk about come on the show, and this is Junyang's second appearance, which he joined at midnight at the start of the Chinese New Year, so it was greatly appreciated; definitely give him a listen!

Abacus' Smaug climbs to the top of the Hugging Face leaderboard

Junyang also mentioned that Smaug is now at the top of the leaderboards. Coming from Abacus, this is a finetune of the previous Qwen-72B, not even this new one. It is the first model to achieve an average score of 80, an impressive showing from Abacus. Though they haven't released any new data, they said they are planning to! They also said they are planning to finetune Miqu, which we covered last time, the leak from Mistral that was acknowledged by Arthur Mensch, the CEO of Mistral. The techniques that...
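To ground the acronyms: here is roughly what RAGatouille looks like in use, per its README around this time. Treat the exact method names and arguments as a sketch and check the repo before running; the toy document and index name are made up.

```python
# pip install ragatouille
from ragatouille import RAGPretrainedModel

# ColBERT scores queries against documents token-by-token ("late interaction")
# instead of collapsing each into a single vector, which is what makes it
# stronger than plain dense retrieval on many benchmarks.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

RAG.index(
    collection=["ThursdAI is a weekly AI news podcast hosted by Alex Volkov."],
    index_name="thursdai-demo",
)

print(RAG.search("who hosts ThursdAI?", k=1))
```

The point of the library, per Benjamin, is exactly this: hiding ColBERT's training and indexing machinery behind a few lines so RAG practitioners can actually use it.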

Duration:01:53:51


📖 ThursdAI - Sunday special on datasets classification & alternative transformer architectures

2/5/2024
Hello hello everyone, welcome to another special episode (some podcasts call them just.. episodes I guess, but here you get AI news every ThurdsdAI, and on Sunday you get the deeper dives) BTW, I'm writing these words, looking at a 300 inch monitor that's hovering above my usual workstation in the Apple Vision Pro, and while this is an AI newsletter, and I've yet to find a connecting link (there's like 3 AI apps in there right now, one fairly boring chatbot, and Siri... don't get me started on Siri), I'll definitely be covering my experience in the next ThursdAI, because well, I love everything new and technological, AI is a huge part of it, but not the ONLY part! 📖 It's all about the (big) Datasets Ok back to the matter at hand, if you've used, finetuned, trained or heard about an AI model, you may or may not realize how important the dataset the model was trained with is. We often talk of this model, that model, and often the only different is, additional data that folks (who I sometimes refer to as alchemists) have collected, curated and structured, and creating/curating/editing those datasets is an art and a science. For example, three friends of the pod, namely LDJ with Capybara, Austin with OpenChat and Teknium with Hermes, have been consistently taking of the shelves open source models and making them smarter, more instruction tuned, better for specific purposes. These datasets are paired with different techniques as well, for example, lately the so-called DPO (Direct preference optimization) is a technique that showed promise, since it not only shows a model which answer is the correct for a specific query, it shows an incorrect answer as well, and trains the model to prefer one over the other. (see the recent Capybara DPO improvement by Argilla, which improved model metrics across every evaluation) These datasets can range from super high quality 16K rows, to millions of rows (Teknium's recently released Hermes, one of the higher quality datasets comes in at just a tad over exactly 1 million rows) and often times it's an amalgamation of different other datasets into 1. In the case of Hermes, Teknium has compiled this 1 million chats from at least 15 different datasets, some his own, some by folks like Jon Durbin, Garage bAInd, and shareGPT, from LMsys.org, which was complied by scraping the very popular sharegpt.com website, from folks who used the shareGPT extension to share they GPT4 conversations. It's quite remarkable how much of these datasets are just, conversations that users had with GPT-4! Lilac brings Garden With that backdrop of information, today on the pod we've got the co-founders of Lilac, Nikhil Thorat and Daniel Smilkov, who came on to chat about the new thing they just released called Lilac Garden. Lilac is an open source tool (you can find it RIGHT HERE) which is built to help make dataset creation, curation and classification, more science than art, and help visualize the data, cluster it and make it easily available. In the case of Hermes, that could be more than millions of rows of data. On the pod, I talk with Nikhil and Daniel about the origin of what they both did at Google, working on Tensorflow.js and then something called "know your data" and how eventually they realized that in this era of LLMs, open sourcing a tool that can understand huge datasets, run LLM based classifiers on top of them, or even train specific ones, is important and needed! 
To strengthen the point, two friends of the pod (Teknium was in the crowd sending us 👍), LDJ and Austin (aka Alignment Lab), were on stage with us and basically said that "it was pretty much the dark ages before Lilac", since something like the OpenOrca dataset is a whopping 4M rows of text.

Visualizations in the Garden

So what does Lilac actually look like? Here's a quick visualization of the top categories of texts from OpenOrca's 4 million rows, grouped by category title and showing each cluster. So you can see here, Translation...
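To make the DPO idea above concrete, here's a minimal, hypothetical sketch of the core objective in PyTorch. The preference row, the toy log-probabilities and the helper name are illustrative assumptions, not code from any of the projects mentioned; in practice you'd use a ready-made trainer rather than hand-rolling this.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: nudge the policy to prefer the chosen answer over
    the rejected one, relative to a frozen reference model."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Loss shrinks as the policy's preference margin grows past the reference's.
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

# A hypothetical preference row, as it might appear in a DPO dataset:
pair = {
    "prompt": "Explain what a dataset is.",
    "chosen": "A dataset is a structured collection of examples used to train...",
    "rejected": "idk, data I guess?",
}

# Toy sequence log-probabilities standing in for real model scores:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # lower when the policy prefers "chosen" more than the reference does
```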

Duration:00:50:37


ThursdAI - Feb 1, 2024- Code LLama, Bard is now 2nd best LLM?!, new LLaVa is great at OCR, Hermes DB is public + 2 new Embed models + Apple AI is coming 👀

2/2/2024
TL;DR of all topics covered + Show notes

* Open Source LLMs
  * Meta releases Code-LLama 70B - 67.8% HumanEval (Announcement, HF instruct version, HuggingChat, Perplexity)
  * Together added function calling + JSON mode to Mixtral, Mistral and CodeLLama
  * RWKV (non transformer based) Eagle-7B (Announcement, Demo, Yam's Thread)
  * Someone leaks Miqu, Mistral confirms it's an old version of their model
  * Olmo from Allen Institute - fully open source 7B model (Data, Weights, Checkpoints, Training code) - Announcement
* Datasets & Embeddings
  * Teknium open sources the Hermes dataset (Announcement, Dataset, Lilac)
  * Lilac announces Garden - LLM powered clustering cloud for datasets (Announcement)
  * BAAI releases BGE-M3 - multi-lingual (100+ languages), 8K context, multi functional embeddings (Announcement, Github, technical report)
  * Nomic AI releases Nomic Embed - fully open source embeddings (Announcement, Tech Report) - see the usage sketch at the end of this section
* Big CO LLMs + APIs
  * Bard with Gemini Pro becomes the 2nd best LLM in the world per LMsys, beating 2 out of 3 GPT4 variants (Thread)
  * OpenAI launches the GPT mention feature, and it's powerful! (Thread)
* Vision & Video
  * 🔥 LLaVa 1.6 - 34B achieves SOTA among open source vision models (X, Announcement, Demo)
* Voice & Audio
  * Argmax releases WhisperKit - super optimized (and on device) Whisper for iOS/Macs (X, Blogpost, Github)
* Tools
  * Infinite Craft - addicting concept-combining game using LLama 2 (neal.fun/infinite-craft/)

Haaaapy first of the second month of 2024 folks, how was your Jan? Not too bad I hope? We definitely got quite a show today; the live recording turned into a procession of breaking news, authors who came up, a deeper interview, and of course... news. This podcast episode focuses only on the news, but you should know that we had deeper chats with Eugene (PicoCreator) from RWKV, a deeper dive into Lilac, a dataset curation and segmentation tool, with founders Nikhil & Daniel, and also a breaking news segment where Nathan (from AI2) joined us to talk about the latest open source from AI2 👏

Besides that, oof what a week. It started out with the news that the new Bard API (apparently Gemini Pro + internet access) is now the 2nd best LLM in the world (according to LMSys at least), then there was the whole thing with Miqu, which turned out to be, yes, a leak of an earlier version of a Mistral model, which they acknowledged, and finally the release of LLaVa 1.6, becoming the SOTA of open source vision models, was very interesting!

Open Source LLMs

Meta releases CodeLLama 70B

Benches 67.8% on HumanEval (without fine-tuning) and is already available on HuggingChat, Perplexity and TogetherAI, quantized for MLX on Apple Silicon, and has several finetunes, including SQLCoder, which beats GPT-4 on SQL. Has a 16K context window, and is one of the top open models for code.

Eagle-7B RWKV based model

I was honestly a bit disappointed with the multilingual performance compared to the 1.8B StableLM, but the folks on stage told me not to compare this to a transformer model in the traditional sense, and rather to look at the potential here. So we had Eugene from the RWKV team join on stage and talk through the architecture, the fact that RWKV is the first AI model in the Linux Foundation and will always be open source, and that they are working on bigger models!
That interview will be released soon.

Olmo from AI2 - new fully open source 7B model (announcement)

This announcement came as breaking news; I got a tiny ping just before Nathan dropped a magnet link on X, and then they followed up with the Olmo release and announcement. A fully open source 7B model, including checkpoints, weights, Weights & Biases logs (coming soon), the dataset (Dolma) and just... everything you could ask about; they said they will tell you about this model. Incredible to see how open this effort is, and kudos to the team for such transparency. They also released a 1B version of Olmo, and you can read the technical report here.

Big CO LLMs +...
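Since this week brought two open embedding releases (BGE-M3 and Nomic Embed), here's a minimal retrieval-style sketch of how such models are typically used via sentence-transformers. The model id and the "search_query:"/"search_document:" task prefixes follow Nomic's model card at release; treat both as assumptions and double-check the card before copying this.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# trust_remote_code is needed because the model ships custom code.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1",
                            trust_remote_code=True)

docs = [
    "search_document: Olmo is a fully open source 7B model from AI2.",
    "search_document: Eagle-7B is an RWKV-based, non-transformer model.",
]
query = "search_query: which model is not transformer based?"

embeddings = model.encode(docs + [query])
print(cos_sim(embeddings[-1], embeddings[:-1]))  # query vs. each document
```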

Duration:01:22:35


📅 ThursdAI - Sunday special on Merging with Maxime LaBonne

1/28/2024
Hey everyone, we have an exciting interview today with Maxime Labonne. Maxime is a senior Machine Learning Scientist at JPMorgan, the author of the Hands-On GNNs book and of his own ML blog, the creator of LazyMergeKit (which we cover on the pod), and holds a PhD in Artificial Intelligence from the Institut Polytechnique de Paris. Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and previously finetuned OpenHermes using DPO techniques, which resulted in NeuralChat7B. For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models.

Model merging

Model merging has been around for a while but has recently been heating up, and Maxime has a lot to do with that: he recently checked, and LazyMergeKit, his wrapper on top of MergeKit by Charles Goddard (the library that put model merging into the mainstream), was responsible for more than 50% of the merged models on the HuggingFace hub leaderboard. Maxime also authored a model merging blogpost on HuggingFace and wrote quite a few articles and shared code that helped others put merged models out.

Modern day Alchemy

That blogpost is a great resource on what model merging actually does, so I won't go into depth on the algorithms here (refer to it if you want a deep dive), but in a nutshell: model merging is a technique that applies algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B), and creates a new model that often performs better than the previous ones, without additional training! (A minimal weight-averaging sketch follows after this section.) Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results, as you'll see below. Yes, as crazy as it sounds, this method can also create models of non-standard sizes, like 10B or 120B models, since it slices pieces of other models and stitches them together in new ways.

If you recall, we had a deep dive with Jon Durbin, who released Bagel, and Jon specifically mentioned that he created Bagel (based on Everything Everywhere All at Once) as a good base for merges that includes all the prompt formats; you can read and listen to that episode here.

This merge frenzy made HuggingFace change the leaderboard and add a checkbox that hides model merges, because they were flooding the leaderboard and often require much less effort than actually pre-training or even finetuning a model. And quite often the top of the leaderboard was overrun with model merges, like in this example of Bagel and its merges by CloudYu (which are not the top ones, but still in the top 10 as I write this).

On why it works? Nisten summarized this pretty well in a now-famous copypasta tweet, and I've confirmed with Maxime that this is his current understanding as well: it's quite unclear why merging performs so well, but that of course doesn't stop the "folks who look for AI Waifus" from merging away, and it has gotten folks like Nathan Lambert from interconnects.ai to start paying attention, even though he didn't want to!
(Still waiting on your writeup Nathan!) UPDATE: As of today, Monday Jan 29th, Nathan just released a super comprehensive deep dive into merges, which you can read here 👇👏

YALL + Automated LLM Evaluation

Maxime has also worked on so many models of his own that he built a convenient little leaderboard to track their performance, which he called YALL, Yet Another LLM Leaderboard, and it's on HuggingFace. You can see that...
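To make the "algorithms applied to weights" point concrete, here's a minimal sketch of the simplest possible merge: a linear average of two finetunes that share an architecture. This is not MergeKit's or LazyMergeKit's code (those implement richer methods like SLERP, TIES and the passthrough trick behind frankenmerges), and the model ids are placeholders, not models Maxime actually merged.

```python
from transformers import AutoModelForCausalLM

# Load the weights of two finetunes of the same base model (placeholder ids).
sd_a = AutoModelForCausalLM.from_pretrained("org/finetune-a").state_dict()
sd_b = AutoModelForCausalLM.from_pretrained("org/finetune-b").state_dict()

# Average every floating-point tensor; no GPU, no training, pure arithmetic.
merged = {
    name: (tensor + sd_b[name]) / 2 if tensor.is_floating_point() else tensor
    for name, tensor in sd_a.items()
}

model = AutoModelForCausalLM.from_pretrained("org/finetune-a")
model.load_state_dict(merged)
model.save_pretrained("./merged-model")  # ready to evaluate or upload
```

The appeal is exactly what the text describes: the whole operation is weight-space arithmetic, so it runs on a CPU in minutes, and tools like MergeKit just swap the averaging line for smarter interpolation schemes.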

Duration:00:35:30


📅 ThursdAI - Jan 24 - ⌛Diffusion Transformers,🧠 fMRI multimodality, Fuyu and Moondream1 VLMs, Google video generation & more AI news

1/26/2024
What A SHOW folks, I almost don't want to write anything in the newsletter, to MAKE you listen, haha, but I will, since I know many of you don't like listening to me babble. But if you choose one episode to listen to instead of just skimming the show notes, make it this one. We had 2 deep dives: one into the exciting world of multimodality, where we chatted with Vik, the creator of Moondream1, and with the co-founders of Prophetic, Wes and Eric, about their EEG/fMRI multimodal transformer (that's right!), and then a DEEP dive into the new Hourglass Diffusion Transformers with Tanishq from MedArc/Stability. More than 1300 tuned in to the live show 🔥 and I got some incredible feedback on the fly, which I cherish, so if you have friends who don't already know about ThursdAI, why not share this with them as well?

TL;DR of all topics covered:

* Open Source LLMs
  * Stability AI releases StableLM 1.6B params (X, Blog, HF)
  * InternLM2-Math - SOTA on math LLMs (90% GPT4 perf.) (X, Demo, Github)
  * MedArc analysis of the best open source models for medical research finds Qwen-72 the best open source doctor (X)
* Big CO LLMs + APIs
  * Google teases LUMIERE - incredibly powerful video generation (TTV and ITV) (X, Blog, ArXiv)
  * 🤗 HuggingFace announces Google partnership (Announcement)
  * OpenAI releases 2 new embedding models, tweaks turbo models and cuts costs (My analysis, Announcement)
  * Google to add 3 new AI features to Chrome (X, Blog)
* Vision & Video
  * Adept Fuyu Heavy - third in the world multimodal, while being 20x smaller than GPT4V and Gemini Ultra (X, Blog)
  * FireLLaVa - first LLaVa model with a commercially permissive license, from Fireworks (X, Blog, HF, DEMO)
  * Vikhyatk releases Moondream1 - tiny 1.6B VLM trained on Phi 1 (X, Demo, HF)
* This week's buzz 🐝🪄 - What I learned in WandB this week
  * New course announcement from Jason Liu & WandB - LLM Engineering: Structured Outputs (Course link)
* Voice & Audio
  * Meta W2V-BERT - speech encoder for low resource languages (announcement)
  * 11 labs has a dubbing studio (my dubbing test)
* AI Art & Diffusion & 3D
  * Instant ID - zero shot face transfer diffusion model (Demo)
  * 🔥 Hourglass Diffusion (HDiT) paper - high resolution image synthesis (X, Blog, Paper, Github)
* Tools & Others
  * Prophetic announces MORPHEUS-1, their EEG/fMRI multimodal ultrasonic transformer for lucid dream induction (Announcement)
  * NSF announces NAIRR with partnership from all major government agencies & labs, including OAI and WandB (Blog)
  * Runway adds multiple motion brushes for added creativity (X, How to)

Open Source LLMs

Stability releases StableLM 1.6B tiny LLM

Super super fast tiny model; I was able to run this in LMStudio, which just released an update supporting it. It punches above its weight, specifically on other languages like German/Spanish/French/Italian (beats Phi), and has a very surprisingly decent MT-Bench score as well. The license is not commercial per se, but rather a specific Stability AI membership. I was able to get above 120 tok/sec with this model in LM-Studio, and it was quite reasonable. Honestly, it's quite ridiculous how fast we've gotten to a point where we have an AI model that can weigh less than 1GB and has this level of performance 🤯

Vision & Video & Multimodality

Tiny VLM Moondream1 (1.6B) performs really well (Demo)

New friend of the pod Vik Hyatk trained Moondream1, a tiny multimodal VLM built with LLaVa on top of Phi 1 (not Phi 2 because of... issues), and while it's not commercially viable, it's really impressive how fast and how good it is.
Here's an example featuring two of my dear friends talking about startups, and you can see how well this TINY vision-enabled model understands the scene. This is not cherry picked; this is literally the first image I tried and my first result. "The image features two men sitting in chairs, engaged in a conversation. One man is sitting on the left side of the image, while the other is on the right side. They are both looking at a laptop..."
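If you want to try this yourself, Moondream1's HuggingFace card showed usage roughly along these lines via trust_remote_code. I'm reproducing it from memory, so treat the exact method names (encode_image, answer_question) as assumptions and check the model card before running.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# The model ships its own inference code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream1",
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream1")

image = Image.open("founders_chatting.jpg")  # any local photo
image_embeds = model.encode_image(image)
print(model.answer_question(image_embeds,
                            "What are the people in this image doing?",
                            tokenizer))
```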

Duration:01:40:45


📅 ThursdAI Jan 18 - Nous Mixtral, Deepmind AlphaGeometry, LMSys SGLang, Rabbit R1 + Perplexity, LLama 3 is training & more AI news this week

1/19/2024
👋 Hey there, it's been quite a week; it started slow and whoah, the last two days were jam-packed with news, I could barely keep up! But thankfully, the motto of ThursdAI is: we stay up to date so you don't have to! We had a milestone, 1.1K listeners tuned into the live show recording; it's quite the number, and I'm humbled to present the conversation and updates to that many people. If you're reading this but have never joined live, welcome! We go live every week on ThursdAI, 8:30AM Pacific time.

TL;DR of all topics covered:

* Open Source LLMs
  * Nous Hermes Mixtral finetune (X, HF DPO version, HF SFT version)
  * NeuralBeagle14-7B - from Maxime Labonne (X, HF)
    * It's the best-performing 7B parameter model on the Open LLM Leaderboard (when released; now 4th)
    * We had a full conversation with Maxime about merging that will release as a standalone episode on Sunday!
  * LMsys - SGLang - 5x performance on inference (X, Blog, Github)
  * NeuralMagic applying #SparseGPT to famous models to compress them with 50% sparsity (X, Paper)
* Big CO LLMs + APIs
  * 🔥 Google Deepmind solves geometry at Olympiad level with 100M synthetic data (Announcement, Blog)
  * Meta announces Llama3 is training, will have 350,000 H100 GPUs (X)
  * OpenAI releases guidelines for upcoming elections and removes restrictions on military use (Blog)
  * Sam Altman (in Davos) doesn't think that AGI will change things as much as people think (X)
  * Samsung S24 has AI everywhere, including real time translation of calls (X)
* Voice & Audio
  * Meta releases MAGNet (X, HF)
* AI Art & Diffusion & 3D
  * Stable Diffusion runs 100% in the browser with WebGPU, Diffusers.js (X thread)
  * DeciAI - Deci Diffusion - a text-to-image 732M-parameter model that's 2.6x faster and 61% cheaper than Stable Diffusion 1.5 with on-par image quality
* Tools & Hardware
  * Rabbit R1 announces a deal with Perplexity, giving a full year of Perplexity Pro to Rabbit R1 users; Perplexity will be the default search engine on Rabbit (link)

Open Source LLMs

Nous Research releases their first Mixtral finetune, in 2 versions: DPO and SFT (X, DPO HF)

This is the first Mixtral finetune from Teknium1 and the Nous team, trained on the Hermes dataset; it comes in two variants, SFT and SFT+DPO, and is a really, really capable model. They call it their flagship! This is the first Mixtral finetune to beat Mixtral Instruct, and it is potentially the best open source model available right now! 👏 It's already available at places like Together endpoints, with GGUF versions by TheBloke, and I've been running this model on my Mac for the past few days. Quite remarkable considering we're only in January and this is the best open chat model available to us. Make sure you use ample system prompting for it, as it was trained with system prompts in mind.

LMsys: 5x faster inference with SGLang & RadixAttention (Blog)

LMSys introduced SGLang, a new interface and runtime for improving the efficiency of large language model (LLM) inference. It claims to provide up to 5x faster inference speeds compared to existing systems like Guidance and vLLM. SGLang was designed to better support complex LLM programs through features like control flow, prompting techniques, and external interaction, co-designing the frontend language and the backend runtime. On the backend, it proposes a new technique called RadixAttention to automatically handle various patterns of key-value cache reuse, improving performance (see the sketch after this section).
Early users like the LLaVa team reported that SGLang provided significantly faster inference speeds in their applications compared to other options. The LMSys team released the code on GitHub for others to try out.

Big CO LLMs + APIs

Meta AI announcements (link)

This #BreakingNews came during our space: Mark Zuckerberg posted a video on Instagram saying that Llama3 is currently training and will be open sourced! He also said that Meta will have 350K (that's not a typo, 350,000) H100 GPUs by the end of the year, and a total of...
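For a feel of the frontend, here's roughly what an SGLang program looked like at launch, per the LMSys blog; this is a sketch from memory, so the exact API may have drifted since. The key idea: calls that share a prefix (the system prompt here) let RadixAttention reuse the KV cache on the runtime side.

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # The shared system prompt is a common prefix across calls, which
    # RadixAttention can cache and reuse server-side.
    s += sgl.system("You are a concise assistant.")
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

# Assumes a local SGLang runtime is already serving a model on port 30000.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is RadixAttention?")
print(state["answer"])
```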

Duration:01:10:40


🔥 ThursdAI Sunday special - Deep dives into Crew AI with Joao then a tasty Bagel discussion with Jon Durbin

1/15/2024
ThursdAI - Sunday special deep dive: interviews with João and Jon, AI agent crews and Bagel merges.

Happy Sunday dear reader. As you know by now, the ThursdAI pod is not a standard interview-based podcast; we don't focus on 1:1 guest/host conversations, but from time to time we do! And this week I was very lucky to have one invited guest and one surprise guest, and I'm very happy to bring you both these conversations today.

Get your Crew together - interview with João Moura, creator of CrewAI

We'll first hear from João Moura, the creator of CrewAI, the latest agent framework. João is a director of AI eng. at Clearbit (recently acquired by HubSpot) and created CrewAI for himself, to automate many of the things he didn't want to keep doing, for example, posting more on LinkedIn. CrewAI has been getting a lot of engagement lately: it's been trending #1 on GitHub and was #2 product of the day when Chris Messina hunted it (to João's complete surprise) on Product Hunt. CrewAI is built on top of LangChain and is an agent framework focused on orchestrating role-playing, autonomous agents (see the minimal crew sketch after this section). In our chat with João we go into the inspiration, the technical challenges and the success of CrewAI so far, how maintenance for CrewAI is now partly a family effort, and what's next for it.

Merges and Bagels - chat with Jon Durbin about Bagel, DPO and merging

The second part of today's pod is a conversation with Jon Durbin, a self-described AI tinkerer and software engineer. Jon is a Sr. applied AI researcher at Convai and is well known in our AI circles as a master finetuner and dataset curator. This interview was not scheduled, but I'm very happy it happened! If you've been following along with the AI/finetuning space, Jon's Airoboros dataset and set of models have often been mentioned and cited, and Jon's latest work on the Bagel models took the lead on the HuggingFace Open LLM Leaderboard. So when I mentioned on X (as I often do) that I was going to mention this on ThursdAI, Jon came up to the space and we had a great conversation, in which he shared a LOT of deep insights into finetuning, DPO (Direct Preference Optimization) and merging.

The series of Bagel datasets and models was inspired by the movie Everything Everywhere All at Once (which is a great movie, watch it if you haven't!) and alludes to Jon trying to throw together as many datasets as he could, and not only datasets! There has been a lot of interest in merging models recently; specifically, many folks are using MergeKit to merge models with other models (and often a model with itself) to create larger/better models, without additional training or GPU requirements. This is solely an engineering thing; some call it frankensteining, some frankenmerging. If you want to learn about merging, Maxime Labonne (the author of Phixtral) has co-authored a great deep dive on the HuggingFace blog; it's a great resource to quickly get up to speed.

So given the merging excitement, Jon set out to create a model that can be an incredible merge base; many models use different prompt techniques, and Jon has tried to cover as many as possible. Jon also released a few versions of the Bagel models, DPO and non-DPO, and we had a brief conversation about why the DPO versions are more factual and better at math, but not as good for role-playing (which, unsurprisingly, is what many agents are using these models for) or creative writing. The answer is, as always, the dataset mix!
I learned a TON from this brief conversation with Jon, and if you're interested in the incredible range of techniques in the open source LLM world, DPO and merging are definitely at the forefront of this space right now, and Jon is right at the crossroads of them, so it's definitely worth a listen, and I hope to get Jon back to share more in future episodes, so stay tuned!

So I'm in San Francisco, again... As I mentioned in the previous newsletter, I...
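For a flavor of what a CrewAI "crew" looked like around this release, here's a minimal sketch based on the project's README at the time; the field names are from memory and may have changed since, and the role/task text is made up for illustration (in the spirit of João's LinkedIn automation).

```python
from crewai import Agent, Task, Crew

# One role-playing agent...
writer = Agent(
    role="LinkedIn ghostwriter",
    goal="Turn engineering notes into an engaging post",
    backstory="You are a sharp tech writer with a knack for hooks.",
)

# ...one task assigned to it...
draft = Task(
    description="Write a three-paragraph LinkedIn post about model merging.",
    agent=writer,
)

# ...and a crew that orchestrates the run end to end.
crew = Crew(agents=[writer], tasks=[draft])
print(crew.kickoff())
```

The orchestration angle is the interesting part: with more agents and tasks in the lists, the crew passes intermediate results between role-players, which is exactly the "role-playing, autonomous agents" framing from the interview.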

Duration:00:42:23


📅 ThursdAI Jan 11 - GPTs store, Mixtral paper, Phi is MIT + Phixtral, 🥯 by Jon Durbin owns the charts + Alex goes to SF again and 2 deep dive interviews 🎙️

1/12/2024
Hey hey everyone, how are you this fine ThursdAI? 👋 I'm gud, thanks for asking! I'm continuing my experiment of spilling the beans and telling you about everything we talked about in advance, both on the pod and in the newsletter, so let me know if this is the right way to go or not; for the busy ones, it seems that it is. If you don't have an hour and 15 minutes, here's a short video recap of everything we chatted about: ThursdAI - Jan 11 2024 TL;DR

TL;DR of all topics covered + Show notes

* Open Source LLMs
  * 🔥 Donut from Jon Durbin is now top of the LLM leaderboard (X, HF, Wolfram's deep dive and scoring)
  * OpenChat January update - best open source 7B LLM (X, Hugging Face)
  * Our friends at NousResearch announce a seed round of $5.2M as their models pass 1.2 million downloads (X)
  * Argilla improved (Distillabeled?) the DPO-enhanced Neural Hermes with higher quality DPO pairs (X)
  * New MoEs are coming out like hotcakes - Phixtral and DeepSeek MoE (X, Omar Thread, Phixtral Thread)
  * Microsoft makes Phi MIT licensed 👏
* Big CO LLMs + APIs
  * OpenAI adds personalization & team tiers (Teams announcement)
  * OpenAI launches the GPT store (Store announcement, Store link)
  * Mixtral medium tops the LMsys human evaluation arena; it's the best LLM overall after GPT4 👏 (X)
* Hardware
  * Rabbit R1 is announced, $200 one-time with no subscription; everybody has a take (X)
* This week's Buzz from Weights & Biases
  * Hackathon with Together, Langchain and WandB (and ME!) this weekend at AGI house (X, Signup)
* Video
  * Bytedance releases MagicVideo-V2 video gen that looks great and passes Pika Labs in human tests (X)
* AI Art & Diffusion & 3D
  * Luma launched their online version of Genie and it's coming to the API (X)
* Show notes and links mentioned
  * MergeKit (github)
  * Jon Durbin's Contextual DPO dataset (HuggingFace)
  * Phixtral from Maxime Labonne (X, HuggingFace)
  * WandGPT - our custom Weights & Biases GPT (GPT store)
  * Visual Weather GPT by me - https://chatg.pt/artweather
  * Ask OpenAI to not train on your chats - https://privacy.openai.com/policies

AI Hardware

The X conversation had a new thing this week: the AI hardware startup Rabbit showcased their new $200 device (no subscriptions!) at CES, and everyone and their mom had an opinion! We had quite a long conversation about that with (his first time on ThursdAI 👏), as we both pre-ordered one; however, there were quite a few red flags, like for example: GPUs are costly, so how would an AI device that has AI in the cloud cost just a one-time 200 bucks?? There were other interesting things they showed during the demo, and I'll let you watch the full 30 minutes, and if you want to read more, here's a great deeper dive into this from .

UPDATE: As I'm writing this, the CEO of Rabbit (who's also on the board of Teenage Engineering, the amazing company that designed this device) tweeted that they sold out the initial first AND second batches of 10K units, netting a nice $2M in hardware sales in 48 hours!

Open Source LLMs

Mixtral paper dropped (ArXiv, Morgan's take)

Mistral finally published the paper on Mixtral of Experts, the MoE that's the absolute best open source model right now, and it's quite the paper. Nisten did a full paper reading with explanations on an X space, which I co-hosted, and we had almost 3K people tune in to listen. Here's the link to the live reading X space by Nisten.
And here are some notes, courtesy of Morgan McGuire (who's my boss at WandB btw 🙌):

* Strong retrieval across the entire context window: Mixtral achieves 100% retrieval accuracy regardless of the context length or the position of the passkey in the sequence.
* Experts don't seem to activate based on topic: surprisingly, the authors do not observe obvious patterns in the assignment of experts based on topic. For instance, at all layers, the distribution of expert assignments is very similar for ArXiv papers (written in LaTeX), for biology (PubMed abstracts), and for philosophy (PhilPapers)... (a minimal sketch of the top-2 routing behind this "expert assignment" follows below)
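To picture what "expert assignment" means here, below is a minimal sketch of Mixtral-style top-2 routing: a small learned gate scores 8 experts per token, only the top 2 actually run, and their outputs are mixed by renormalized gate weights. Dimensions and the expert MLPs are illustrative, not Mixtral's real configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, dim=512, n_experts=8):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: [tokens, dim]
        scores = self.gate(x)                    # [tokens, n_experts]
        top_w, top_i = scores.topk(2, dim=-1)    # pick 2 experts per token
        top_w = F.softmax(top_w, dim=-1)         # renormalize the pair
        out = torch.zeros_like(x)
        for slot in range(2):                    # run only the selected experts
            for e, expert in enumerate(self.experts):
                mask = top_i[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

moe = Top2MoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

The paper's observation above is about `top_i`: you might expect the router to send LaTeX tokens to one expert and biology tokens to another, but the assignment distribution looks similar across domains.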

Duration:01:16:41