GPT Reviews

A daily show about AI made by AI: news, announcements, and research from arXiv, mixed in with some fun. Hosted by Giovani Pete Tizzano, an overly hyped AI enthusiast; Robert, an often unimpressed analyst; Olivia, an overly online reader; and Belinda, a witty research expert.

Location:

United States

Genres:

News

Language:

English


Episodes

OpenAI's Strawberry Revolution 🍓 // Nvidia's Lucrative Paychecks 💸 // Google Pipe SQL Simplification 📊

8/29/2024
This episode dives into OpenAI's promising new model, Strawberry, which could revolutionize interactions in ChatGPT. We explore the financial envy Nvidia employees inspire in their Google and Meta counterparts due to lucrative stock options. Google’s new Pipe SQL syntax aims to simplify data querying, while concerns about research accessibility are raised. Finally, we discuss BaichuanSEED and Dolphin models, which highlight advancements in extensible data collection and energy-efficient processing, paving the way for enhanced AI capabilities.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:40 OpenAI Races to Launch Strawberry
03:07 Google, Meta workers envy Nvidia staffers’ fat paychecks: ‘Bought a 100K car … all cash’
05:01 Google's New Pipe SQL Syntax
06:12 Fake sponsor
07:47 BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
09:20 Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
11:09 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
12:50 Outro

Duration: 00:14:01

OpenAI's 'Strawberry' AI 🚀 // World's Fastest AI Inference ⚡ // Photo-realistic 3D Avatars 🎨

8/28/2024
OpenAI's 'Strawberry' AI tackles complex math and programming with enhanced reasoning, while Cerebras claims to have launched the fastest AI inference, enabling real-time applications at competitive prices. The GenCA model revolutionizes avatar creation with photo-realistic, controllable 3D avatars, and the "Build-A-Scene" paper introduces interactive 3D layout control for text-to-image generation, enhancing creative fields with dynamic object manipulation.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
02:02 OpenAI Shows ‘Strawberry’ AI to the Feds and Uses It to Develop ‘Orion’
03:23 Cerebras Launches the World’s Fastest AI Inference
05:07 Diffusion Models Are Real-Time Game Engines
06:15 Fake sponsor
08:06 The Mamba in the Llama: Distilling and Accelerating Hybrid Models
09:42 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars
11:16 Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
13:04 Outro

Duration: 00:14:14

Grok-2's Speed & Accuracy 🚀 // OpenAI's Transparency Push 🗳️ // LlamaDuo for Local LLMs 🔄

8/27/2024
Grok-2's advancements in speed and accuracy position it as a leading AI model, particularly in math and coding. OpenAI's backing of California's AI bill highlights the critical need for transparency in synthetic content, especially during an election year. The episode features groundbreaking research on the SwiftBrush diffusion model and K-Sort Arena for generative model evaluation. Additionally, the LlamaDuo pipeline offers a practical solution for migrating from cloud-based LLMs to local models, tackling privacy and operational challenges.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:55 grok-2 is Faster and Better
03:32 OpenAI supports California AI bill requiring 'watermarking' of synthetic content
04:53 Fake sponsor
06:45 SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
08:10 SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
09:40 K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
11:24 LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
13:26 Outro

Duration: 00:14:46

Salesforce's AI Sales Agents 🤖 // NVIDIA's Compact Language Model ⚡ // Optimized Computation for Performance 📊

8/26/2024
This episode dives into Salesforce's innovative AI sales agents that automate tasks but risk losing human touch, NVIDIA's compact yet powerful language model that promises efficiency, groundbreaking research showing how optimized computation can enhance model performance, and insights into compound inference systems revealing the delicate balance in maximizing language model effectiveness.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:49 Salesforce's New Sales AI Agents
03:09 Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy
04:52 avante.nvim
05:56 Fake sponsor
07:45 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
09:22 Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
11:15 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
13:10 Outro

Duration: 00:14:20

Amazon Cloud Chief Spicy Takes 🚀 // Zuckerberg's AI Vision 📈 // Multimodal Models for Safety 🔒

8/23/2024
This episode dives deep into the future of coding, challenging the belief that AI will render developers obsolete. It highlights Meta's stock surge, attributing it to Zuckerberg's compelling AI narrative that captivates investors. The discussion also covers groundbreaking research like Transfusion, which merges text and image processing, and the innovative approach of automated design for intelligent agents. Lastly, it emphasizes the xGen-MM framework's commitment to safety in AI, showcasing the critical need to mitigate harmful behaviors in advanced models.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:28 Amazon cloud chief: Devs may stop coding when AI takes over
02:53 Meta Shares Are Flying High as Zuckerberg Sells His AI Vision
04:34 I've Built My First Successful Side Project, and I Hate It
05:41 Fake sponsor
07:35 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
09:16 Automated Design of Agentic Systems
10:56 xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
12:44 Outro

Duration: 00:13:54

OpenAI's SearchGPT Launch 🔍 // Vision Transformers Efficiency 📊 // Automated Agent Design Revolution 🚀

8/19/2024
OpenAI's SearchGPT is launching with limited access for only 10,000 users, raising questions about trust and the potential risks of generative search products. A comprehensive analysis challenges the belief that Vision Transformers are inefficient, suggesting they can handle higher resolutions effectively. The introduction of Automated Design of Agentic Systems (ADAS) could revolutionize how intelligent agents are created, outperforming traditional hand-designed models. The xGen-MM framework aims to enhance multimodal AI capabilities while prioritizing safety measures to mitigate harmful behaviors.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:43 OpenAI is fresh out of SearchGPT
02:50 From ChatGPT to Gemini: how AI is rewriting the internet
04:32 On the speed of ViTs and CNNs
05:49 Fake sponsor
07:49 JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
09:34 Automated Design of Agentic Systems
11:12 xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
13:01 Outro

Duration: 00:14:11

Grok-2 Beta Release 🚀 // Apple's $1,000 Home Robot 🏡 // ChemVLM Breakthrough in Chemistry 🔬

8/15/2024
This episode dives into the Grok-2 Beta Release, highlighting its advanced reasoning capabilities and competitive edge. We explore Apple’s ambitious plans for a $1,000 tabletop robotic home device, set to transform smart home technology. The introduction of ChemVLM marks a breakthrough in chemistry research, effectively integrating chemical images and text. Lastly, InfinityMATH presents a scalable dataset that enhances language models' mathematical reasoning, showcasing impressive performance improvements.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:37 Grok-2 Beta Release
02:58 Apple Aiming to Launch Tabletop Robotic Home Device as Soon as 2026 With Pricing Around $1,000
04:29 Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels
05:34 Fake sponsor
07:16 Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
08:55 Generative Photomontage
10:26 InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
12:22 Outro

Duration: 00:13:41

Gemini Live AI Assistant 📱 // OpenAI’s Coding Benchmark ✅ // LongWriter’s 10K Word Generation ✍️

8/14/2024
This episode dives into Gemini Live's interactive AI capabilities, OpenAI's improved coding benchmark for reliable evaluations, LongWriter's breakthrough in generating ultra-long outputs, and SlotLifter's advancements in 3D object-centric learning. Each topic highlights significant innovations and their implications in the AI landscape.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:48 Gemini makes your mobile device a powerful AI assistant
03:08 New OpenAI Coding Benchmark
04:52 Things I learned from teaching
05:59 Fake sponsor
07:38 Imagen 3
09:05 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
10:46 SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
12:22 Outro

Duration: 00:13:23

Google Meet's AI Note-Taking 📝 // Trump’s AI Crowd Claims 🤔 // ControlNeXt & Image Generation 🎨

8/13/2024
Google Meet's new AI note-taking feature could change meeting dynamics, while Trump’s claims about Kamala Harris reveal the political implications of AI. The exploration of AI's role in scientific research raises ethical concerns, and cutting-edge papers on ControlNeXt, rStar, and FruitNeRF showcase advancements in image generation, reasoning capabilities, and fruit counting accuracy.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:43 Google Meet call will soon be able to take notes for you
02:56 Trump falsely claims Kamala Harris ‘AI’d’ her rally crowd size
04:23 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
05:35 Fake sponsor
07:15 ControlNeXt: Powerful and Efficient Control for Image and Video Generation
08:47 Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
10:41 FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
12:41 Outro

Duration: 00:13:51

OpenAI's Strawberry Model 🍓 // Meta's Celebrity Voice Assistants 🎙️ // Human-level Robot Table Tennis 🏓

8/12/2024
OpenAI's mysterious "Strawberry" AI model is causing a buzz in the tech world, with rumors of advanced reasoning capabilities. Meta is trying to improve its AI assistants by enlisting celebrities like Awkwafina to give them a more relatable and entertaining vibe. Google DeepMind's research on building a robot capable of playing table tennis at a human level is a remarkable exploration of robotics and sports. Finally, UC Berkeley and Google DeepMind's paper on optimizing LLM test-time compute and Harbin Institute of Technology's research on a general-purpose AI agent for long-horizon tasks are both groundbreaking developments in the field.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:35 Sam Altman teases project Strawberry
03:06 Meta courts celebs like Awkwafina to voice AI assistants ahead of Meta Connect
04:58 Achieving Human Level Competitive Robot Table Tennis
06:11 Fake sponsor
08:15 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
09:55 Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
11:41 UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
13:30 Outro

Duration: 00:15:27

Nvidia's Stock Struggles 📉 // Meta's AI Hallucinations 🤖 // Superconducting Microprocessors ⚡

8/2/2024
This episode dives into Nvidia's stock struggles amid rising competition, while also unpacking Meta's AI blunders and the implications of "hallucinations" in tech. We explore cutting-edge superconducting microprocessors that promise unprecedented energy efficiency and highlight groundbreaking AI research, including eavesdropping techniques and advancements in reinforcement learning.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:50 Nvidia Sank Again Today -- Time to Buy the Artificial Intelligence (AI) Growth Stock Hand Over Fist?
03:09 Meta blames hallucinations after its AI said Trump rally shooting didn’t happen
04:52 Superconducting Microprocessors? Turns Out They're Ultra-Efficient
06:07 Fake sponsor
07:48 Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations
09:22 SAPG: Split and Aggregate Policy Gradients
10:45 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
12:44 Outro

Duration: 00:14:41

Google's Gemma 2 vs. GPT-3.5 ⚔️ // Black Forest Labs' Flux Model 🌲 // Ethical Concerns in AI 🚨

8/2/2024
This episode dives into Google’s Gemma 2, which claims to outperform GPT-3.5 while tackling responsible AI practices. We explore Black Forest Labs' Flux model, featuring 12 billion parameters and tailored versions for various users. Olivia sheds light on the ethical concerns surrounding the resurgence of pseudoscience in machine learning, particularly physiognomy. Lastly, Belinda reviews critical research on AI safety, advocating for clearer metrics to prevent misleading claims about safety advancements.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:37 Google’s tiny AI model bests GPT-3.5
02:48 Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models
04:28 The reanimation of pseudoscience in machine learning and its ethical repercussions
06:06 Fake sponsor
08:04 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
09:55 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
11:41 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
13:33 Outro

Duration: 00:14:44

Apple's AI Feature Delay 📅 // SAM 2 Object Segmentation 🖼️ // Google's TPU Chips Shift ⚡

7/30/2024
Apple’s delay in releasing AI features until October could affect iPhone 16 sales and customer excitement. The tech giant’s choice to use Google’s TPU chips instead of Nvidia marks a significant shift in AI hardware competition. Meta’s SAM 2 introduces groundbreaking real-time object segmentation with zero-shot generalization, revolutionizing visual content interaction. Additionally, Sony AI’s research presents a cost-effective approach to training diffusion models, democratizing access to advanced AI technology.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:54 Apple Intelligence Won't Be Released Until October
03:09 Apple used Google's chips to train two AI models, research paper shows
04:44 A Visual Guide to Quantization
05:38 Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images
06:41 Fake sponsor
08:46 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
10:28 Theia: Distilling Diverse Vision Foundation Models for Robot Learning
12:27 Outro

Duration: 00:14:25

OpenAI's SearchGPT 🧐 // AI in Math Olympiad 🏅 // Unreliable AI Existential Risk 🔍

7/29/2024
OpenAI's new prototype, SearchGPT, promises to combine AI smarts with real-time web information to make search easier. AI has achieved silver-medal standards at the International Mathematical Olympiad, raising questions about the future of mathematics and the role of AI in solving complex problems. The reliability of AI existential risk probabilities is called into question in a thought-provoking article, challenging the authority we often assign to these forecasts and calling for more scrutiny. Three fascinating papers from UNC Chapel Hill, Google DeepMind, and a collaboration between Caltech and NVIDIA explore advancements in theorem proving, balancing fast and slow planning, and aligning large language models with Best-of-N distillation. These papers could transform the way we approach complex problems with language models and streamline the development of LLMs.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:54 OpenAI Announces SearchGPT
03:15 AI achieves silver-medal standard solving International Mathematical Olympiad problems
04:55 AI existential risk probabilities are too unreliable to inform policy
06:25 Fake sponsor
08:21 LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
10:10 System-1.x: Learning to Balance Fast and Slow Planning with Language Models
12:01 BOND: Aligning LLMs with Best-of-N Distillation
13:43 Outro

Duration: 00:15:50

Mistral Large 2 🌍 // Memphis Supercluster 💻 // Emergence in Complex Systems 🧩

7/26/2024
Mistral Large 2 is released with advanced features and multilingual support, and Elon Musk announces the Memphis Supercluster, aimed at creating the world's most powerful AI. We discuss emergence in complex systems and the MINT-1T dataset for training large multimodal models, then introduce OpenDevin, an open platform for developing AI agents, and MOMAland, a benchmark framework for multi-objective multi-agent reinforcement learning.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:39 Mistral Large 2 Release
03:01 Elon Musk Announces Memphis Supercomputer
04:48 The Puzzle of How Large-Scale Order Emerges in Complex Systems
06:22 Fake sponsor
08:37 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
10:16 OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
11:53 MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning
13:31 Outro

Duration: 00:14:51

Llama 3.1 Unveiled 🦙 // Alphabet's 14% Revenue Growth 📈 // MovieDreamer Revolutionizes Video 🎬

7/24/2024
This episode features the introduction of Llama 3.1, Meta's cutting-edge AI model with remarkable flexibility and extensive language support. We delve into Alphabet's impressive 14% revenue growth, highlighting the increasing demand for AI infrastructure in cloud computing. The System-1.x Planner is explored, demonstrating its innovative balance between fast and slow planning modes, leading to enhanced performance. Finally, we discuss MovieDreamer, a groundbreaking model that elevates video generation by ensuring narrative coherence and high visual quality in long-form content.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:39 Introducing Llama 3.1: Our most capable models to date
02:59 Alphabet revenue jump shows no sign of AI denting search business
04:36 Open Source AI Is the Path Forward
05:40 Fake sponsor
07:41 System-1.x: Learning to Balance Fast and Slow Planning with Language Models
09:31 KAN or MLP: A Fairer Comparison
11:08 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
12:53 Outro

Duration: 00:14:50

Meta's Llama 3.1 vs. GPT-4o 🤯 // OpenAI's own AI chips 🧐 // SlowFast-LLaVA for Video LLMs 🎬

7/23/2024
Meta's upcoming Llama 3.1 models could outperform the current state-of-the-art closed-source LLM model, OpenAI's GPT-4o. OpenAI is planning to develop its own AI chip to optimize performance and potentially supercharge its progress towards AGI. Apple's SlowFast-LLaVA is a new training-free video large language model that captures both detailed spatial semantics and long-range temporal context in video without exceeding the token budget of commonly used LLMs. Google's Conditioned Language Policy (CLP) framework builds on techniques from multi-task training and parameter-efficient finetuning to develop steerable models that can trade off multiple conflicting objectives at inference time.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:28 LLAMA 405B Performance Leaked
03:01 OpenAI Wants Its Own AI Chips
04:25 Towards more cooperative AI safety strategies
06:01 Fake sponsor
07:35 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
09:17 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
10:56 Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
12:46 Outro

Duration: 00:14:06

Claude for Android 🤖 // AI for Material Sciences ⚡ // TinkerBird Disrupts RAG Workflows 🐦

7/22/2024
Claude for Android is now available, bringing AI-powered assistance to a wider audience. MIT researchers have developed a new machine-learning framework that can predict materials' thermal properties up to 1,000 times faster than other AI-based techniques, potentially improving energy efficiency. TinkerBird, a vector database designed for efficient storage and retrieval of high-dimensional vectors, is disrupting traditional RAG workflows and eliminating roundtrip delays associated with client-server models. ChatQA 2, a Llama3-based model from NVIDIA, bridges the gap between open-access LLMs and leading proprietary models in long-context understanding and retrieval-augmented generation capabilities, while Stable Audio Open, an open-weights text-to-audio model from Stability AI, showcases potential for high-quality stereo sound synthesis at 44.1kHz.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:34 Claude for Android is here
02:50 AI method radically speeds predictions of materials’ thermal properties
04:44 TinkerBird
06:10 Fake sponsor
08:10 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
09:54 Stable Audio Open
11:28 Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
13:54 Outro

Duration: 00:15:04

OpenAI's GPT-4o mini 💰 // NVIDIA's Mistral NeMo 12B 🚀 // Transcribro speech recognition 🎤

7/19/2024
OpenAI has released its newest model, GPT-4o mini, which is more cost-efficient and excels in mathematical reasoning and coding tasks. NVIDIA's Mistral NeMo 12B is a state-of-the-art language model with unprecedented accuracy and enterprise-grade support. A new speech recognition keyboard and service for Android called Transcribro has been developed, which is private and on-device. Research papers explore the impact of vocabulary size on language model scaling, the use of large datastores for retrieval-based language models, and a method for generating long sequences of views of a cityscape using AI and computer vision.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:40 OpenAI Announces GPT 4o mini
03:11 Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model
05:28 Transcribro: Private and on-device speech recognition keyboard and service for Android
06:43 Fake sponsor
08:49 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
10:19 Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
11:49 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
13:26 Outro

Duration: 00:14:36

Copyright Infringement in AI Training 🚫 // Open-Source AI Models 🤖 // NVIDIA's Open-Source Transition 🆕

7/18/2024
Apple, Nvidia, Anthropic, and Salesforce were caught using content without creators' consent for AI training. Mistral AI launches two new open-source models, Codestral Mamba and Mathstral, with impressive capabilities. NVIDIA transitions to fully open-source GPU kernel modules, offering new capabilities and easy switching for users. Exciting research papers include Ref-AVS for multimodal object segmentation, Qwen2-Audio for large-scale audio-language modeling, and DiT-MoE for scaling diffusion transformers for image generation.

Contact: sergi@earkind.com

Timestamps:
00:34 Introduction
01:27 Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI
02:46 Mistral's New Open Source Models
04:09 NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules
05:37 Fake sponsor
07:15 Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
08:47 Qwen2-Audio Technical Report
10:49 Scaling Diffusion Transformers to 16 Billion Parameters
12:21 Outro

Duration: 00:13:22