Data Brew by Databricks

Technology Podcasts

Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.

Location:

United States

Twitter:

@databricks

Language:

English

Contact:

9254872927


Episodes

Reinforcement Fine-Tuning and the Future of Specialized AI Models

8/5/2025
What if building a custom AI model for your business was as simple as giving feedback, with no massive labeled datasets required? In this episode, we sit down with Travis Addair, CTO and Co-Founder of Predibase, creators of the first reinforcement fine-tuning platform, to explore the future of specialized AI. Discover how reinforcement fine-tuning is revolutionizing model customization, enabling you to start fast, adapt to your unique data, and keep improving through human feedback. Whether you’re an AI enthusiast or a business leader, you’ll learn how this breakthrough is making advanced AI accessible to everyone.

Duration:00:40:24

Benchmarking Domain Intelligence | Data Brew | Episode 45

4/24/2025
In this episode, Pallavi Koppol, Research Scientist at Databricks, explores the importance of domain-specific intelligence in large language models (LLMs). She discusses how enterprises need models tailored to their unique jargon, data, and tasks rather than relying solely on general benchmarks. Highlights include:
- Why benchmarking LLMs for domain-specific tasks is critical for enterprise AI.
- An introduction to the Databricks Intelligence Benchmarking Suite (DIBS).
- Evaluating models on real-world applications like RAG, text-to-JSON, and function calling.
- The evolving landscape of open-source vs. closed-source LLMs.
- How industry and academia can collaborate to improve AI benchmarking.

Duration:00:31:41

SWE-bench & SWE-agent | Data Brew | Episode 44

4/17/2025
In this episode, Kilian Lieret, Research Software Engineer, and Carlos Jimenez, Computer Science PhD Candidate at Princeton University, discuss SWE-bench and SWE-agent, two groundbreaking tools for evaluating and enhancing AI in software engineering. Highlights include:
- SWE-bench: A benchmark for assessing AI models on real-world coding tasks.
- Addressing data leakage concerns in GitHub-sourced benchmarks.
- SWE-agent: An AI-driven system for navigating and solving coding challenges.
- Overcoming agent limitations, such as getting stuck in loops.
- The future of AI-powered code reviews and automation in software engineering.

Duration:00:36:22

Enterprise AI: Research to Product | Data Brew | Episode 43

4/10/2025
In this episode, Dipendra Kumar, Staff Research Scientist, and Alnur Ali, Staff Software Engineer at Databricks, discuss the challenges of applying AI in enterprise environments and the tools being developed to bridge the gap between research and real-world deployment. Highlights include:
- The challenges of real-world AI: messy data, security, and scalability.
- Why enterprises need high-accuracy, fine-tuned models over generic AI APIs.
- How QuickFix learns from user edits to improve AI-driven coding assistance.
- The collaboration between research & engineering in building AI-powered tools.
- The evolving role of developers in the age of generative AI.

Duration:00:38:03

Multimodal AI | Data Brew | Episode 42

4/7/2025
In this episode, Chang She, CEO and Co-founder of LanceDB, discusses the challenges of handling multimodal data and how LanceDB provides a cutting-edge solution. He shares his journey from contributing to Pandas to building a database optimized for images, video, vectors, and subtitles. Highlights include:
- The limitations of traditional storage systems like Parquet for multimodal AI.
- How LanceDB enables efficient querying and processing of diverse data types.
- The growing importance of multimodal AI in enterprise applications.
- Future trends in AI, including a shift from single models to holistic AI systems.
- Predictions and "spicy takes" on AI advancements in 2025.

Duration:00:42:14

Age of Agents | Data Brew | Episode 41

3/27/2025
In this episode, Michele Catasta, President of Replit, explores how AI-driven agents are transforming software development by making coding more accessible and automating application creation. Highlights include:
- The difference between AI agents and copilots in software development.
- How AI is democratizing coding, enabling non-programmers to build applications.
- Challenges in AI agent development, including error handling and software quality.
- The growing role of AI in entrepreneurship and business automation.
- Why 2025 could be the year of AI agents and what’s next for the industry.

Duration:00:40:47

Reward Models | Data Brew | Episode 40

3/20/2025
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
Connect with Brandon Cui: https://www.linkedin.com/in/bcui19/

Duration:00:39:58

Retrieval, rerankers, and RAG tips and tricks | Data Brew | Episode 39

2/20/2025
In this episode, Andrew Drozdov, Research Scientist at Databricks, explores how Retrieval Augmented Generation (RAG) enhances AI models by integrating retrieval capabilities for improved response accuracy and relevance. Highlights include:
- Addressing LLM limitations by injecting relevant external information.
- Optimizing document chunking, embedding, and query generation for RAG.
- Improving retrieval systems with embeddings and fine-tuning techniques.
- Enhancing search results using re-rankers and retrieval diagnostics.
- Applying RAG strategies in enterprise AI for domain-specific improvements.

Duration:00:45:22

The Power of Synthetic Data | Data Brew | Episode 38

2/4/2025
In this episode, Yev Meyer, Chief Scientist at Gretel AI, explores how synthetic data transforms AI and ML by improving data access, quality, privacy, and model training. Highlights include:
- Leveraging synthetic data to overcome AI data limitations.
- Enhancing model training while mitigating ethical and privacy risks.
- Exploring the intersection of computational neuroscience and AI workflows.
- Addressing licensing and legal considerations in synthetic data usage.
- Unlocking private datasets for broader and safer AI applications.

Duration:00:42:28

Secret to Production AI: Tools & Infrastructure | Data Brew | Episode 37

1/22/2025
In this episode, Julia Neagu, CEO & co-founder of Quotient AI, explores the challenges of deploying Generative AI and LLMs, focusing on model evaluation, human-in-the-loop systems, and iterative development. Highlights include:
- Merging reinforcement learning and unsupervised learning for real-time AI optimization.
- Reducing bias in machine learning with fairness and ethical considerations.
- Lessons from large-scale AI deployments on scalability and feedback loops.
- Automating workflows with AI through successful business examples.
- Best practices for managing AI pipelines, from data collection to validation.

Duration:00:37:14

Mixture of Memory Experts (MoME) | Data Brew | Episode 36

1/10/2025
In this episode, Sharon Zhou, Co-Founder and CEO of Lamini AI, shares her expertise in the world of AI, focusing on fine-tuning models for improved performance and reliability. Highlights include:
- The integration of determinism and probabilism for handling unstructured data and user queries effectively.
- Proprietary techniques like memory tuning and robust evaluation frameworks to mitigate model inaccuracies and hallucinations.
- Lessons learned from deploying AI applications, including insights from GitHub Copilot’s rollout.
Connect with Sharon Zhou and Lamini:
https://www.linkedin.com/in/zhousharon/
https://x.com/realsharonzhou
https://www.lamini.ai/

Duration:00:41:24

Kumo AI & Relational Deep Learning | Data Brew | Episode 34

10/14/2024
In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering. Highlights include:
- How RDL enhances predictive modeling.
- Applications in fraud detection and recommendation systems.
- The use of graph neural networks to simplify complex data structures.

Duration:00:43:27