Linear Digressions-logo

Linear Digressions

Technology Podcasts

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.

Location:

United States

Description:

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.

Language:

English


Episodes
Ask host to enable sharing for playback control

So long, and thanks for all the fish

7/26/2020
All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years. It’s been a ride, and a real pleasure and privilege to talk to you each week. Thanks, best wishes, and good night! —Katie and Ben

Duration:00:35:44

Ask host to enable sharing for playback control

A Reality Check on AI-Driven Medical Assistants

7/19/2020
The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting better care, and achieving better outcomes, with these algorithms in the mix? The answer isn’t no, exactly, but it’s not a resounding yes, because...

Duration:00:14:00

Ask host to enable sharing for playback control

A Data Science Take on Open Policing Data

7/12/2020
A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying how those topics can be studied with data science methods, should get in touch to come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay Area data scientist and a volunteer who reached out to tell us about his studies with the Stanford Open Policing dataset.

Duration:00:23:44

Ask host to enable sharing for playback control

Procella: YouTube's super-system for analytics data storage

7/5/2020
This is a re-release of an episode that originally ran in October 2019. If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use cases. You also wouldn’t be YouTube, which found themselves with this problem (gigantic data needs and several very different use cases of what they needed to do with that data) and went a different...

Duration:00:29:48

Ask host to enable sharing for playback control

The Data Science Open Source Ecosystem

6/28/2020
Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on the support of open source projects, on an individual and institutional level....

Duration:00:23:06

Ask host to enable sharing for playback control

Rock the ROC Curve

6/21/2020
This is a re-release of an episode that first ran on January 29, 2017. This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars, it's a fantastic go-to metric for all your classifier quality needs.

Duration:00:15:52

Ask host to enable sharing for playback control

Criminology and Data Science

6/14/2020
This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a wide range of topics—common misconceptions around race and crime statistics, how methodologically-driven criminology scholars think about building...

Duration:00:30:57

Ask host to enable sharing for playback control

Racism, the criminal justice system, and data science

6/7/2020
As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to give a prediction about the likelihood of an offender to re-offend if released, based on the attributes of the individual, and guess what: it shows disparities in the predictions for black and white...

Duration:00:31:36

Ask host to enable sharing for playback control

An interstitial word from Ben

6/4/2020
A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.

Duration:00:05:59

Ask host to enable sharing for playback control

Convolutional Neural Networks

5/31/2020
This is a re-release of an episode that originally aired on April 1, 2018 If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.

Duration:00:21:55

Ask host to enable sharing for playback control

Stein's Paradox

5/24/2020
This is a re-release of an episode that was originally released on February 26, 2017. When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group? The James-Stein estimator tells you how to combine individual and group information make predictions that, taken...

Duration:00:27:02

Ask host to enable sharing for playback control

Protecting Individual-Level Census Data with Differential Privacy

5/17/2020
The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping personal information private when datasets are released or used for study. Differential privacy is...

Duration:00:21:19

Ask host to enable sharing for playback control

Causal Trees

5/10/2020
What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are on the case. This episodes explores their algorithm for recursively partitioning a dataset to find heterogeneous treatment effects, or for you ML nerds, applying decision trees to causal inference...

Duration:00:15:27

Ask host to enable sharing for playback control

The Grammar Of Graphics

5/3/2020
You may not realize it consciously, but beautiful visualizations have rules. The rules are often implict and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’s a bit abstract but very profound, and these principles underlie the ggplot2 package in R that makes famously beautiful plots with minimal code. This episode covers a paper by Hadley Wickham (author...

Duration:00:35:38

Ask host to enable sharing for playback control

Gaussian Processes

4/26/2020
It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate—linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparameteric option where you can fit over all the possible types of functions, using the data points in your datasets as constraints on the results that you get (the idea being that, no matter what the “true” underlying...

Duration:00:20:55

Ask host to enable sharing for playback control

Keeping ourselves honest when we work with observational healthcare data

4/19/2020
The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hundreds or thousands of small, and sometimes large, decisions to make in their day-to-day analysis work. What data should they include in their studies? What method should they use to analyze it? What...

Duration:00:19:08

Ask host to enable sharing for playback control

Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell

4/12/2020
AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now to keep it much safer in the long run. Prof. Russell’s new book, “Human Compatible: Artificial Intelligence and the Problem of Control” gives an accessible but deeply thoughtful exploration of why he...

Duration:00:28:58

Ask host to enable sharing for playback control

Putting machine learning into a database

4/5/2020
Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actually be deployed inside the database. Why? One strong advantage for databases is they have built-in features for data governance, including things like permissioning access and tracking the provenance...

Duration:00:24:22

Ask host to enable sharing for playback control

The work-from-home episode

3/29/2020
Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode explores this a little bit, informally, as we compare our new work-from-home setups and reflect on what’s working well and what we’re finding challenging.

Duration:00:29:06

Ask host to enable sharing for playback control

Understanding Covid-19 transmission: what the data suggests about how the disease spreads

3/22/2020
Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but are contagious anyway. This episode digs into the epidemiological model that was published in Science this week—this model finds that the data suggests that the majority of carriers of the...

Duration:00:25:25