
AI Engineering Podcast

Technology Podcasts

This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, how to apply AI to your work, and what considerations are involved in building or customizing new models. Everything you need to know to deliver real impact and value with machine learning and artificial intelligence.

Location:

United States

Language:

English


Episodes

From Blind Spots to Observability: Operationalizing LLM Apps with OpenLit

2/15/2026
Summary: In this episode of the AI Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational foundations required to run LLM-powered applications in production. He highlights common early blind spots teams face, including opaque model behavior, runaway token costs, and brittle prompt management, emphasizing that strong observability and cost tracking must be established before an MVP ships. Aman explains how OpenLit leverages OpenTelemetry for vendor-neutral tracing across models, tools, and data stores, and introduces features such as prompt and secret management with versioning, evaluation workflows (including LLM-as-a-judge), and fleet management for OpenTelemetry collectors. The conversation covers experimentation patterns, strategies to avoid vendor lock-in, and how detailed stepwise traces reshape system design and debugging. Aman also shares recent advancements like a Kubernetes operator for zero-code instrumentation, multi-database configurations for environment isolation, and integrations with platforms such as Grafana and Dash0. They conclude by discussing lessons learned from building in the open, prioritizing reliability, developer experience, and data security, and preview future work on context management and closing the loop from experimentation to prompt/dataset improvements.
Announcements: aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: OpenLit, Fleet Hub, OpenTelemetry, LangFuse, LangSmith, TensorZero, AI Engineering Podcast Episode, Traceloop, Helicone, Clickhouse
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:50:36


Taming Voice Complexity with Dynamic Ensembles at Modulate

2/8/2026
Summary: In this episode of the AI Engineering Podcast, Carter Huffman, co-founder and CTO of Modulate, discusses the engineering behind low-latency, high-accuracy Voice AI. He explains why voice is a uniquely challenging modality due to its rich non-textual signals like tone, emotion, and context, and how simple speech-to-text-to-speech pipelines can't capture the necessary nuance. Carter introduces Modulate's Ensemble Listening Model (ELM) architecture, which uses dynamic routing and cost-based optimization to achieve scalability and precision in various audio environments. He covers topics such as reliability under distributed systems constraints, watchdogging with periodic model checks, structured long-horizon memory for conversations, and the trade-offs that make ensemble approaches compelling for repeated tasks at scale. Carter also shares insights on how ELMs generalize beyond voice, draws parallels to database query planners and mixture-of-experts, and discusses strategies for observability and evaluation in complex processing pipelines.
Announcements: aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Modulate, NASA Jet Propulsion Laboratory, OpenAI Whisper, Multi-Armed Bandit, Cost-Based Optimizer, GPT 5, LLM Attention, Transformer Architecture, Mixture of Experts, Dilated Convolution, Wavenet
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:59:25


GPU Clouds, Aggregators, and the New Economics of AI Compute

1/27/2026
Summary: In this episode I sit down with Hugo Shi, co-founder and CTO of Saturn Cloud, to map the strategic realities of sourcing and operating GPUs across clouds. Hugo breaks down today’s provider landscape—from hyperscalers to full-service GPU clouds, bare metal/concierge providers, and emerging GPU aggregators—and how to choose among them based on security posture, managed services, and cost. We explore practical layers of capability (compute, orchestration with Kubernetes/Slurm, storage, networking, and managed services), the trade-offs of portability on “Kubernetes-native” stacks, and the persistent challenge of data gravity. We also discuss current supply dynamics, the growing availability of on-demand capacity as newer chips roll out, and how AMD’s ecosystem is maturing as real competition to NVIDIA. Hugo shares patterns for separating training and inference across providers, why traditional ML is far from dead, and how usage varies wildly across domains like biotech. We close with predictions on consolidation, full‑stack experiences from GPU clouds, financial-style GPU marketplaces, and much-needed advances in reliability for long-running GPU jobs.
Announcements: aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Saturn Cloud, Pandas, NumPy, MatLab, AWS, GCP, Azure, Oracle Cloud, RunPod, FluidStack, SFCompute, KubeFlow, Lightning AI, DStack, Metaflow, Flyte, Arya AI, Dagster, Coreweave, Vultr, Nebius, Vast.ai, Weka, Vast Data, Slurm, CNCF == Cloud-Native Computing Foundation, Kubernetes, Terraform, ECS, Helm Chart, Block Storage, Object Storage, Container Registry, Crusoe, Alluxio, Data Virtualization, GB300, H100, Spot Instance, AWS Trainium, Google TPU (Tensor Processing Unit), AMD, ROCM, PyTorch, Google Vertex AI, AWS Bedrock, CUDA Python, Mojo, XGBoost, Random Forest, Ludwig, Paperspace, Voltage Park, Weights & Biases
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:46:02


The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI

1/19/2026
Summary: In this episode of the AI Engineering Podcast, Niklas Gustavsson, Chief Architect at Spotify, talks about scaling AI across engineering and product. He explores how Spotify's highly distributed architecture was built to support rapid adoption of coding agents like Copilot, Cursor, and Claude Code, enabled by standardization and Backstage. The conversation covers the tension between bottom-up experimentation and platform standardization, and how Spotify is moving toward monorepos and fleet management. Niklas discusses the emergence of "fleet-wide agents" that can execute complex code changes with robust testing and LLM-as-judge loops to ensure quality. He also touches on the shift in engineering workflows as code generation accelerates, the growing use of agents beyond coding, and the lessons learned in sandboxing, agent skills/rules, and shared evaluation frameworks. Niklas highlights Spotify's decade-long experience with ML product work and shares his vision for deeper end-to-end integration of agentic capabilities across the full product lifecycle and making collaborative "team-level memory" for agents a reality.
Announcements: aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Spotify, Developer Experience, LLM == Large Language Model, Transformers, BackStage, GitHub Copilot, Cursor, Claude Skills, Monorepo, MCP == Model Context Protocol, Claude Code, Product Manager, DORA Metrics, Type Annotations, BigQuery, PRD == Product Requirements Document, AI Evals, LLM-as-a-Judge, Agentic Memory
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:56:17


Generative AI Meets Accessibility: Benchmarks, Breakthroughs, and Blind Spots with Joe Devon

1/4/2026
Summary: In this episode Joe Devon, co-founder of Global Accessibility Awareness Day (GAAD), talks about how generative AI can both help and harm digital accessibility — and what it will take to tilt the balance toward inclusion. Joe shares his personal motivation for the work, real-world stakes for disabled users across web, mobile, and developer tooling, and compelling stories that illustrate why accessible design is a human-rights issue as much as a compliance checkbox. He digs into AI’s current and future roles: from improving caption quality and auto-generating audio descriptions to evaluating how well code-gen models produce accessible UI by default. Joe introduces AIMAC (AI Model Accessibility Checker), a new benchmark comparing top models on accessibility-minded code generation, what the results reveal, and how model providers and engineering teams can practically raise the bar with linters, training data, and cultural change. He closes with concrete guidance for leaders, why involving people with disabilities is non-negotiable, and how solving for edge cases makes AI—and products—better for everyone.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: AIMAC, GitHub, Global Accessibility Awareness Day (GAAD), GAAD Foundation, AltaVista, Cursor, Accessibility, Braille Display, Ben Ogilvie, State of Mobile App Accessibility Report, VT-100, Ghostty, Warp Terminal, LLM-as-a-Judge, FFMPEG, Aria Tags, Axe-Core, MiniMax M1, Codex Mini, Qwen, Kimi, Google Lighthouse, GitHub Copilot, Be-My-Eyes, Be-My-AI, WebAIM, XRAccess, XR == Extended Reality, Deque University, Fable
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:56:12


Beyond the Chatbot: Practical Frameworks for Agentic Capabilities in SaaS

12/28/2025
Summary: In this episode product and engineering leader Preeti Shukla explores how and when to add agentic capabilities to SaaS platforms. She digs into the operational realities that AI agents must meet inside multi-tenant software: latency, cost control, data privacy, tenant isolation, RBAC, and auditability. Preeti outlines practical frameworks for selecting models and providers, when to self-host, and how to route capabilities across frontier and cheaper models. She discusses graduated autonomy, starting with internal adoption and low-risk use cases before moving to customer-facing features, and why many successful deployments keep a human-in-the-loop. She also covers evaluation and observability as core engineering disciplines - layered evals, golden datasets, LLM-as-a-judge, path/behavior monitoring, and runtime vs. offline checks - to achieve reliability in nondeterministic systems.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Links: SaaS == Software as a Service, Multi-Tenancy, Few-shot Learning, LLM as a Judge, RAG == Retrieval Augmented Generation, MCP == Model Context Protocol, Loveable
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:53:47


MCP as the API for AI‑Native Systems: Security, Orchestration, and Scale

12/15/2025
Summary: In this episode Craig McLuckie, co-creator of Kubernetes and founder/CEO of Stacklok, talks about how to improve security and reliability for AI agents using curated, optimized deployments of the Model Context Protocol (MCP). Craig explains why MCP is emerging as the API layer for AI‑native applications, how to balance short‑term productivity with long‑term platform thinking, and why great tools plus frontier models still drive the best outcomes. He digs into common adoption pitfalls (tool pollution, insecure NPX installs, scattered credentials), the necessity of continuous evals for stochastic systems, and the shift from “what the agent can access” to “what the agent knows.” Craig also shares how ToolHive approaches secure runtimes, a virtual MCP gateway with semantic search, orchestration and transactional semantics, a registry for organizational tooling, and a console for self‑service—along with pragmatic patterns for auth, policy, and observability.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: GitHub, LinkedIn
Links: StackLok, MCP == Model Context Protocol, Kubernetes, CNCF == Cloud Native Computing Foundation, SDLC == Software Development Life Cycle, The Bitter Lesson, TLA+, Jepsen Tests, ToolHive, API Gateway, Glean
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:07:43


Context as Code, DevX as Leverage: Accelerating Software with Multi‑Agent Workflows

11/23/2025
Summary: In this episode Max Beauchemin explores how multiplayer, multi‑agent engineering is reshaping individual and team velocity for building data and AI systems. Max shares his journey from Airflow and Superset to going all‑in on AI coding agents, describing a pragmatic “AI‑first reflex” for nearly every task and the emerging role of humans as orchestrators of agents. He digs into shifting bottlenecks — code review, QA, async coordination — and how better DevX/AIX, just‑in‑time context via tools, and structured "context as code" can keep pace with agent‑accelerated execution. He then dives deep into Agor, a new open‑source agent‑orchestration platform: a spatial, multiplayer canvas that manages git worktrees and shared dev environments, enables templated prompts and zone‑based workflows, and exposes an internal MCP so agents can operate the system — and each other. Max discusses session forking, sub‑session trees, scheduling, and safety considerations, and how these capabilities enable parallelization, handoffs across roles, and richer visibility into prompting and cost/usage—pointing to a near future where software engineering centers on orchestrating teams of agents and collaborators. Resources: agor.live (docs, one‑click Codespaces, npm install), Apache Superset, and related MCP/CLI tooling referenced for agent workflows.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Links: Agor, Apache Airflow, Apache Superset, Preset, Claude Code, Codex, Playwright MCP, Tmux, Git Worktrees, Opencode.ai, GitHub Codespaces, Ona
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:59:49


Inside the Black Box: Neuron-Level Control and Safer LLMs

11/16/2025
Summary: In this episode of the AI Engineering Podcast, Vinay Kumar, founder and CEO of Arya.ai and head of Lexsi Labs, talks about practical strategies for understanding and steering AI systems. He discusses the differences between interpretability and explainability, and why post-hoc methods can be misleading. Vinay shares his approach to tracing relevance through deep networks and LLMs using DL Backtrace, and how interpretability is evolving from an audit tool into a lever for alignment, enabling targeted pruning, fine-tuning, unlearning, and model compression. The conversation covers setting concrete alignment metrics, the gaps in current enterprise practices for complex models, and tailoring explainability artifacts for different stakeholders. Vinay also previews his team's "AlignTune" effort for neuron-level model editing and discusses emerging trends in AI risk, multi-modal complexity, and automated safety agents. He explores when and why teams should invest in interpretability and alignment, how to operationalize findings without overcomplicating evaluation, and the best practices for private, safer LLM endpoints in enterprises, aiming to make advanced AI not just accurate but also acceptable, auditable, and scalable.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Lexsi Labs, Arya.ai, Deep Learning, AlexNet, DL Backtrace, Gradient Boost, SAE == Sparse AutoEncoder, Shapley Values, LRP == Layerwise Relevance Propagation, IG == Integrated Gradients, Circuit Discovery, F1 Score, LLM As A Judge
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:00:52


Building the Internet of Agents: Identity, Observability, and Open Protocols

11/9/2025
Summary: In this episode Guillaume de Saint Marc, VP of Engineering at Cisco Outshift, talks about the complexities and opportunities of scaling multi‑agent systems. Guillaume explains why specialized agents collaborating as a team inspire trust in enterprise settings, and contrasts rigid, “lift-and-shift” agentic workflows with fully self-forming systems. We explore the emerging Internet of Agents, the need for open, interoperable protocols (A2A for peer collaboration and MCP for tool calling), and new layers in the stack for syntactic and semantic communication. Guillaume details foundational needs around discovery, identity, observability, and fine-grained, task/tool/transaction-based access control (TBAC), along with Cisco’s open-source AGNTCY initiative, directory concepts, and OpenTelemetry extensions for agent traces. He shares concrete wins in IT/NetOps—network config validation, root-cause analysis, and the CAIPE platform engineer agent—showing dramatic productivity gains. We close with human-in-the-loop UX patterns for multi-agent teams and SLIM, a high-performance group communication layer designed for agent collaboration.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Outshift by Cisco, Multi-Agent Systems, Deep Learning, Meraki, Symbolic Reasoning, Transformer Architecture, DeepSeek, LLM Reasoning, René Descartes, Kanban, A2A (Agent-to-Agent) Protocol, MCP == Model Context Protocol, AGNTCY, ICANN == Internet Corporation for Assigned Names and Numbers, OSI Layers, OCI == Open Container Initiative, OASF == Open Agentic Schema Framework, Oracle AgentSpec, Splunk, OpenTelemetry, CAIPE == Community AI Platform Engineer, AGNTCY Coffee Shop
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:07:14


Agents, IDEs, and the Blast Radius: Practical AI for Software Engineers

11/2/2025
Summary: In this episode of the AI Engineering Podcast, Will Vincent, Python developer advocate at JetBrains (PyCharm), talks about how AI utilities are revolutionizing software engineering beyond basic code completion. He discusses the shift from "vibe coding" to "vibe engineering," where engineers collaborate with AI agents through clear guidelines, iterative specs, and tight guardrails. Will shares practical techniques for getting real value from these tools, including loading the whole codebase for context, creating agent specifications, constraining blast radius, and favoring step-by-step plans over one-shot generations. The conversation covers code review gaps, deployment context, and why continuity across tools matters, as well as JetBrains' evolving approach to integrated AI, including support for external and local models. Will emphasizes the importance of human oversight, particularly for architectural choices and production changes, and encourages experimentation and playfulness while acknowledging the ethics, security, and reliability tradeoffs that come with modern LLMs.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: Will Vincent
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: JetBrains, Simon Willison, Vibe Engineering Post, GitHub Copilot, AGENTS.md, Copilot AGENTS.md instructions, Kiro IDE, Claude Code, JetBrains QuickEdit, Claude Agent in JetBrains IDEs, Ruff, uv, ty, pyrefly, IDE == Integrated Development Environment, Ollama, LM Studio, Google Gemma, Deepseek, gpt-oss, Ollama Cloud, Gemini Diffusion, Django Annual Survey, Co-Intelligence
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:59:18


From MRI to World Models: How AI Is Changing What We See

10/26/2025
Summary: In this episode of the AI Engineering Podcast, Daniel Sodickson, Chief of Innovation in Radiology at NYU Grossman School of Medicine, talks about harnessing AI systems to truly understand images and revolutionize science and healthcare. Dan shares his journey from linear reconstruction to early deep learning for accelerated MRI, highlighting the importance of domain expertise when adapting models to specialized modalities. He explores "upstream" AI that changes what and how we measure, using physics-guided networks, prior knowledge, and personal baselines to enable faster, cheaper, and more accessible imaging. The conversation covers multimodal world models, cross-disciplinary translation, explainability, and a future where agents flag abnormalities while humans apply judgment, as well as provocative frontiers like "imaging without images," continuous health monitoring, and decoding brain activity. Dan stresses the need to preserve truth, context, and human oversight in AI-driven imaging, and calls for tools that distill core methodologies across disciplines to accelerate understanding and progress.
Announcements: aiengineeringpodcast.com/prefect, aiengineeringpodcast.com/bruin
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: MRI == Magnetic Resonance Imaging, Linear Algorithm, Non-Linear Algorithm, Compressed Sensing, Dictionary Learning Algorithm, Deep Learning, CT Scan, Cambrian Explosion, LIDAR Point Cloud, Synthetic Aperture Radar, Geoffrey Hinton, Co-Intelligence, Tomography, X-Ray Crystallography, CERN, CLIP Model, Physics-Guided Neural Network, Functional MRI, A Path Toward Autonomous Machine Intelligence
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:48:51


Specs, Tests, and Self‑Verification: The Playbook for Agentic Engineering Teams

10/19/2025
Summary: In this episode Andrew Filev, CEO and founder of ZenCoder, takes a deep dive into the system design, workflows, and organizational changes behind building agentic coding systems. He traces the evolution from autocomplete to truly agentic models, discusses why context engineering and verification are the real unlocks for reliability, and outlines a pragmatic path from “vibe coding” to AI‑first engineering. Andrew shares ZenCoder’s internal playbook: PRD and tech spec co‑creation with AI, human‑in‑the‑loop gates, test‑driven development, and emerging BDD-style acceptance testing. He explores multi-repo context, cross-service reasoning, and how AI reshapes team communication, ownership, and architecture decisions. He also covers cost strategies, when to choose agents vs. manual edits, and why self‑verification and collaborative agent UX will define the next wave. Andrew offers candid lessons from building ZenCoder—why speed of iteration beats optimizing for weak models, how ignoring the emotional impact of vibe coding slowed brand momentum, and where agentic tools fit across greenfield and legacy systems. He closes with predictions for the next year: self‑verification, parallelized agent workflows, background execution in CI, and collaborative spec‑driven development moving code review upstream.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Zencoder, Wrike, DARPA Robotics Challenge, Cognitive Computing, Andrew Ng, Sebastian Thrun, Github Copilot, RAG == Retrieval Augmented Generation, Re-ranking, Claude Sonnet 3.5, SWE-Bench, Vibe Coding, AI First Engineering, Waterfall Software Engineering, Agile Software Engineering, PRD == Product Requirements Document, BDD == Behavior-Driven Development, VSCode
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:06:28


From Probabilistic to Trustworthy: Building Orion, an Agentic Analytics Platform

10/11/2025
Summary: In this episode of the AI Engineering Podcast, Lucas Thelosen and Drew Gillson talk about Orion, their agentic analytics platform that delivers proactive, push-based insights to business users through asynchronous thinking with rich organizational context. Lucas and Drew share their approach to building trustworthy analysis by grounding in semantic layers, fact tables, and quality-assurance loops, as well as their focus on accuracy through parallel test-time compute and evolving from probabilistic steps to deterministic tools. They discuss the importance of context engineering, multi-agent orchestration, and security boundaries for enterprise deployments, and share lessons learned on consistency, tool design, user change management, and the emerging role of "AI manager" as a career path. The conversation highlights the future of AI knowledge workers collaborating across organizations and tools while simplifying UIs and raising the bar on actionable, trustworthy analytics.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: LinkedIn, LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Gravity, Orion, Data Engineering Podcast Episode, Site Reliability Engineering, Anthropic Claude Sonnet 4.5, A2A (Agent2Agent) Protocol, Simon Willison, AI Lethal Trifecta, Behavioral Science, Grounded Theory, LLM as a Judge, RLHF == Reinforcement Learning from Human Feedback
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:12:19


Building Production-Ready AI Agents with Pydantic AI

10/6/2025
Summary: In this episode of the AI Engineering Podcast, Samuel Colvin, creator of Pydantic and founder of Pydantic Inc, talks about Pydantic AI - a type-safe framework for building structured AI agents in Python. Samuel explains why he built Pydantic AI to bring FastAPI-like ergonomics and production-grade engineering to agents, focusing on strong typing, minimal abstractions, and reliability, observability, and stability. He explores the evolving agent ecosystem, patterns for single vs. many agents, graphs vs. durable execution, and how Pydantic AI approaches structured I/O, tool calling, and MCP with type safety in mind. Samuel also shares insights on design trade-offs, model-provider churn, schema unification, safe code execution, security gaps, and the importance of open standards and OpenTelemetry for observability.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: GitHub, LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Pydantic, Pydantic AI, Pydantic Inc, Pydantic Logfire, OpenAI Agents, Google ADK, LangChain, LlamaIndex, CrewAI, Durable Execution, Temporal, MCP == Model Context Protocol, Claude Code, Typescript, Gemini Structured Output, OpenAI Structured Output, Dottxt Outlines SDK, smolagents, LiteLLM, OpenRouter, OpenAI Responses API, FastAPI, SQLModel, AI SDK JavaScript, LangGraph, NextJS, Pyodide, AI Elements
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:50:53


From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI

9/28/2025
Summary: In this episode of the AI Engineering Podcast, Brijesh Tripathi, CEO of Flex AI, talks about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. He discusses Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: LinkedIn
Links: Flex AI, Aurora Super Computer, CoreWeave, Kubernetes, CUDA, ROCm, Tensor Processing Unit (TPU), PyTorch, Triton, Trainium, ASIC == Application Specific Integrated Circuit, SOC == System On a Chip, Loveable, FlexAI Blueprints, Tenstorrent
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:55:19


Right-Sizing AI: Small Language Models for Real-World Production

9/20/2025
Summary: In this episode of the AI Engineering Podcast, Steven Huels, VP of AI Engineering at Red Hat, talks about the practical applications of small language models (SLMs) for production workloads. He discusses how SLMs offer a pragmatic choice due to their ability to fit on single enterprise GPUs and provide model selection trade-offs. The conversation covers self-hosting vs using API providers, organizational capabilities needed for running production-grade LLMs, and the importance of guardrails and automated evaluation at scale. They also explore the rise of agentic systems and service-oriented approaches powered by smaller models, highlighting advances in customization and deployment strategies. Steven shares real-world examples and looks to the future of agent cataloging, continuous retraining, and resource efficiency in AI engineering.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: RedHat AI Engineering, Generative AI, Predictive AI, ChatGPT, QLORA, HuggingFace, vLLM, OpenShift AI, Llama Models, DeepSeek, GPT-OSS, Mistral, Mixture of Experts (MoE), Qwen, InstructLab, SFT == Supervised Fine Tuning, LORA
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:50:58


AI Agents and Identity Management

9/13/2025
Summary: In this episode of the AI Engineering Podcast, Julianna Lamb, co-founder and CTO of Stytch, talks about the complexities of managing identity and authentication in agentic workflows. She explores the evolving landscape of identity management in the context of machine learning and AI, highlighting the importance of flexible compute environments and seamless data exchange. The conversation covers implications of AI agents on identity management, including granular permissions, OAuth protocol, and adapting systems for agentic interactions. Julianna also discusses rate limiting, persistent identity, and evolving standards for managing identity in AI systems. She emphasizes the need to experiment with AI agents and prepare systems for integration to stay ahead in the rapidly advancing AI landscape.
Announcements: aiengineeringpodcast.com/prefect
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Stytch, AI Agent, Machine To Machine Authentication, API Authentication, MCP == Model Context Protocol, OAuth, Identity Provider, OAuth Scopes, OAuth 2.1, Captcha, RBAC == Role-Based Access Control, ABAC == Attribute-Based Access Control, ReBAC == Relationship-Based Access Control, Google Zanzibar, Idempotence, Dynamic Client Registration, Large Action Models, Claude Code
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:53:32


Revolutionizing Production Systems: The Resolve AI Approach

9/3/2025
Summary: In this episode of the AI Engineering Podcast, Spiros Xanthos, CEO of Resolve AI, shares his insights on building agentic capabilities for operational systems. He discusses the limitations of traditional observability tools and the need for AI agents that can reason through complex systems to provide actionable insights and solutions. The conversation highlights the architecture of Resolve AI, which integrates with existing tools to build a comprehensive understanding of production environments, and emphasizes the importance of context and memory in AI systems. Spiros also touches on the evolving role of AI in production systems, the potential for AI to augment human operators, and the need for continuous learning and adaptation to fully leverage these advancements.
Contact Info: LinkedIn
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: Resolve AI, Splunk, OpenTelemetry, Splunk Observability, Context Engineering, Grafana, Kubernetes, PagerDuty
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:00:51:01


Designing Scalable AI Systems with FastMCP: Challenges and Innovations

8/26/2025
Summary: In this episode of the AI Engineering Podcast, Jeremiah Lowin, founder and CEO of Prefect Technologies, talks about the FastMCP framework and the design of MCP servers. Jeremiah explains the evolution of FastMCP, from its initial creation as a simpler alternative to the MCP SDK to its current role in facilitating the deployment of AI tools. The discussion covers the complexities of designing MCP servers, the importance of context engineering, and the potential pitfalls of overwhelming AI agents with too many tools. Jeremiah also highlights the importance of simplicity and incremental adoption in software design, and shares insights into the future of MCP and the broader AI ecosystem. The episode concludes with a look at the challenges of authentication and authorization in AI applications and the exciting potential of MCP as a protocol for the future of AI-driven business logic.
Contact Info: LinkedIn, GitHub
Closing Announcements: Data Engineering Podcast, Podcast.__init__, site, iTunes
Links: FastMCP, FastMCP Cloud, Prefect, Model Context Protocol (MCP), AI Tools, FastAPI, Python Decorator, Websockets, SSE == Server-Sent Events, Streamable HTTP, OAuth, MCP Gateway, MCP Sampling, Flask, Django, ASGI, MCP Elicitation, AuthKit, Dynamic Client Registration, smolagents, Large Action Models, A2A
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Duration:01:13:57