Linear Digressions

In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.

More Information

Location:

United States

Language:

English


Episodes

The Normal Distribution and the Central Limit Theorem

12/9/2018
When you think about it, it’s pretty amazing that we can draw conclusions about huge populations, even the whole world, based on datasets that are comparatively very small (a few thousand, or a few hundred, or even sometimes a few dozen). That’s the power of statistics, though. This episode is kind of a two-for-one, but we’re excited about it: first we’ll talk about the Normal or Gaussian distribution, which is maybe the most famous probability distribution out there, and then turn to...

Duration:00:27:11
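
To see the Central Limit Theorem in action without any real data, here is a small Python simulation (our own sketch, not something taken from the episode). It repeatedly averages draws from a skewed Exponential(1) distribution, an arbitrary choice made for illustration, and checks two CLT predictions: the spread of the sample means shrinks like 1/sqrt(n), and their distribution becomes more symmetric (skewness heading toward 0, as for a Normal) as the sample size n grows.

```python
import math
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return `trials` means, each computed from n draws of Exponential(rate=1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]


def skewness(xs):
    """Sample skewness; 0 for a symmetric distribution such as the Normal."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in xs)


def summarize(n, trials=5000, seed=0):
    """Mean, observed spread, CLT-predicted spread, and skewness of the sample means."""
    means = sample_means(n, trials, seed)
    predicted_sd = 1.0 / math.sqrt(n)  # Exponential(1) has mean 1 and sd 1
    return (statistics.fmean(means), statistics.stdev(means),
            predicted_sd, skewness(means))


if __name__ == "__main__":
    for n in (1, 5, 50):
        mean, sd, predicted, skew = summarize(n)
        print(f"n={n:2d} mean={mean:.3f} sd={sd:.3f} "
              f"predicted sd={predicted:.3f} skewness={skew:.2f}")
```

Even though a single draw is strongly skewed, averages of 50 draws already look close to Normal, which is the practical reason small samples can say so much about large populations.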

Software 2.0

12/2/2018
Neural nets are a way you can model a system, sure, but if you take a step back, squint, and tilt your head, they can also be called… software? Not in the sense that they’re written in code, but in the sense that the neural net itself operates under the same set of general requirements as does software that a human would write. Namely, neural nets take inputs and create outputs from them according to a set of rules, but the thing about the inside of the neural net black box is that it’s...

Duration:00:17:22

Limitations of Deep Nets for Computer Vision

11/18/2018
Deep neural nets have a deserved reputation as the best-in-breed solution for computer vision problems. But there are many aspects of human vision that we take for granted and where neural nets struggle. This episode covers an eye-opening paper that summarizes some of the interesting weak spots of deep neural nets. Relevant links: https://arxiv.org/abs/1805.04025

Duration:00:27:19

Building Data Science Teams

11/11/2018
At many places, data scientists don’t work solo anymore—it’s a team sport. But data science teams aren’t simply teams of data scientists working together. Instead, they’re usually cross-functional teams with engineers, managers, data scientists, and sometimes others all working together to build tools and products around data science. This episode talks about some of those roles on a typical data science team, what the responsibilities are for each role, and what skills and traits are most...

Duration:00:25:09

Optimized Optimized Web Crawling

11/4/2018
Last week’s episode, about methods for optimized web crawling logic, left off on a bit of a cliffhanger: the data scientists had found a solution to the problem, but it wasn’t something that the engineers (who own the search codebase, remember) liked very much. It was black-boxy, hard to parallelize, and introduced a lot of complexity to their code. This episode takes a second crack, where we formulate the problem a little differently and end up with a different, arguably more elegant...

Duration:00:19:42

Optimized Web Crawling

10/28/2018
Got a fun optimization problem for you this week! It’s a two-for-one: how do you optimize the web crawling logic of an operation like Google search so that the results are, on average, as up-to-date as possible, and how do you optimize your solution of choice so that it’s maintainable by software engineers in a huge distributed system? We’re following an excellent post from the Unofficial Google Data Science blog going through this problem. Relevant links:...

Duration:00:22:29
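
One common way to make "as up-to-date as possible" concrete is a textbook freshness model. We're assuming this model here, and it may differ from the formulation in the blog post or the episode. It treats each page as changing according to a Poisson process with rate Δ, and re-crawls the page every 1/ρ time units. The expected fraction of time the cached copy is fresh is then (ρ/Δ)(1 - exp(-Δ/ρ)). The sketch below uses that formula to greedily split a fixed crawl budget across pages. The page change rates and the budget are made-up illustrative numbers.

```python
import math


def freshness(change_rate, crawl_rate):
    """Expected fraction of time a page is up to date (assumed Poisson change model).

    The page changes `change_rate` times per day on average and is
    re-crawled every 1 / crawl_rate days.
    """
    if crawl_rate <= 0:
        return 0.0
    if change_rate == 0:
        return 1.0
    x = change_rate / crawl_rate  # expected number of changes between crawls
    return (1.0 - math.exp(-x)) / x


def allocate_crawls(change_rates, budget, step=0.01):
    """Greedily split `budget` crawls/day to maximize mean freshness.

    Each step gives `step` crawls/day to the page with the largest marginal
    gain. Freshness is concave in crawl_rate, so this greedy allocation is
    optimal on the step grid.
    """
    rates = [0.0] * len(change_rates)
    for _ in range(round(budget / step)):
        gains = [freshness(d, r + step) - freshness(d, r)
                 for d, r in zip(change_rates, rates)]
        best = max(range(len(gains)), key=gains.__getitem__)
        rates[best] += step
    return rates


def mean_freshness(change_rates, crawl_rates):
    """Average freshness across pages for a given crawl allocation."""
    return sum(freshness(d, r)
               for d, r in zip(change_rates, crawl_rates)) / len(change_rates)


if __name__ == "__main__":
    pages = [0.1, 1.0, 10.0]  # hypothetical change rates (changes per day)
    alloc = allocate_crawls(pages, budget=3.0)
    print([round(r, 2) for r in alloc], round(mean_freshness(pages, alloc), 3))
```

A perhaps surprising consequence of this model is that, with a tight budget, the page that changes most often can receive fewer crawls than a moderately changing one. That page goes stale again almost immediately after each visit, so crawl effort spent on it buys very little freshness.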

Better Know a Distribution: The Poisson Distribution

10/21/2018
The Poisson distribution is a probability distribution used to model the number of events that happen in a given interval of time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things: there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick introduction to the distribution, and then a focus on two of our favorite applications: using the Poisson distribution to identify supernovas and study army deaths from...

Duration:00:31:51
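
For a quick feel for the distribution, here is a minimal Python sketch (ours, not from the episode). It implements the Poisson probability mass function P(N = k) = λ^k e^(-λ) / k! and a simple sampler using Knuth's multiplication method. It then checks simulated counts against the formula. A handy sanity check is that a Poisson distribution's mean and variance are both equal to λ. The rate λ = 2 and the sample size are arbitrary choices.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam): lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def sample_poisson(lam, rng):
    """Knuth's method: multiply uniforms until the product drops to exp(-lam) or below.

    The number of extra multiplications needed is a Poisson(lam) draw.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate(lam=2.0, n=20000, seed=1):
    """Draw n Poisson(lam) counts and print observed frequencies next to the pmf."""
    rng = random.Random(seed)
    draws = [sample_poisson(lam, rng) for _ in range(n)]
    for k in range(7):
        print(k, round(draws.count(k) / n, 4), round(poisson_pmf(k, lam), 4))
    return draws


if __name__ == "__main__":
    draws = simulate()
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    print(f"sample mean={mean:.3f} sample variance={var:.3f}")  # both should be near lam
```

Knuth's method is simple but its running time grows with λ, so it's best suited to small rates like the ones used here.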

Searching for Datasets with Google

10/14/2018
If you wanted to find a dataset of jokes, how would you do it? What about a dataset of podcast episodes? If your answer was “I’d try Google,” you might have been disappointed—Google is a great search engine for many types of web data, but it didn’t have any special tools to navigate the particular challenges of, well, dataset data. But all that is different now: Google recently announced Google Dataset Search, an effort to unify metadata tagging around datasets and complementary efforts on...

Duration:00:19:53

It's our fourth birthday

10/7/2018
We started Linear Digressions 4 years ago… this isn’t a technical episode, just two buddies shooting the breeze about something we’ve somehow built together.

Duration:00:22:05

Gigantic Searches in Particle Physics

9/30/2018
This week, we’re dusting off the ol’ particle physics PhD to bring you an episode about ambitious new model-agnostic searches for new particles happening at CERN. Traditionally, new particles have been discovered by “targeted searches,” where scientists have a hypothesis about the particle they’re looking for and where it might be found. However, with the huge amounts of data coming out of CERN, a new type of broader search algorithm is starting to be deployed. It’s a strategy that casts a...

Duration:00:24:46

Data Engineering

9/23/2018
If you’re a data scientist, you know how important it is to keep your data orderly, clean, moving smoothly between different systems, well-documented… there’s a ton of work that goes into building and maintaining databases and data pipelines. This job, that of owner and maintainer of the data being used for analytics, is often the realm of data engineers. From extract, transform, and load (ETL) procedures to the data storage strategy and even the definitions of key data quantities that...

Duration:00:16:22

Text Analysis for Guessing the NYTimes Op-Ed Author

9/16/2018
A very intriguing op-ed was published in the NY Times recently, in which the author (a senior official in the Trump White House) claimed to be a minor saboteur of sorts, acting with his or her colleagues to undermine some of Donald Trump’s worst instincts and tendencies. Pretty stunning, right? So who is the author? It’s a mystery—the op-ed was published anonymously. That hasn’t stopped people from speculating though, and some machine learning on the vocabulary used in the op-ed is one way...

Duration:00:18:37
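
As a toy illustration of comparing texts by vocabulary, the sketch below scores candidate writing samples against an anonymous text by the cosine similarity of their word counts. This is a generic bag-of-words approach and not necessarily what any of the analyses discussed in the episode did. All texts are made-up placeholders, not real writings. Real stylometry typically uses much larger samples and carefully chosen features, such as function-word frequencies.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercased bag of words; a crude stand-in for a real stylometric feature set."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (1.0 = same proportions)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, candidate_texts):
    """Return (name, similarity) pairs, most similar candidate first."""
    target = word_counts(anonymous_text)
    scores = {name: cosine_similarity(target, word_counts(text))
              for name, text in candidate_texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    mystery = "the committee should carefully review the proposal before the vote"
    candidates = {  # hypothetical writing samples
        "Candidate A": "the committee will carefully review every proposal",
        "Candidate B": "our team shipped the new feature yesterday",
    }
    print(rank_candidates(mystery, candidates))
```

Raw word counts mix topic with style, so a high score can just mean two texts are about the same thing. That is one reason serious authorship analyses lean on topic-neutral features.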

The Three Types of Data Scientists, and What They Actually Do

9/9/2018
If you've been in data science for more than a year or two, chances are you've noticed changes in the field as it's grown and matured. And if you're newer to the field, you may feel like there's a disconnect between lots of different stories about what data scientists should know, or do, or expect from their job. This week, we cover two thought pieces, one that arose from interviews with 35(!) data scientists speaking about what their jobs actually are (and aren't), and one from the head of...

Duration:00:23:24

Agile Development for Data Scientists, Part 2: Where Modifications Help

8/26/2018
There's just too much interesting stuff at the intersection of agile software development and data science for us to be able to cover it all in one episode, so this week we're picking up where we left off last time. We'll give a quick overview of agile for those who missed last week or still have some questions, and then cover some of the aspects of agile that don't work well out-of-the-box when applied to data analytics. Fortunately, though, there are some straightforward modifications to...

Duration:00:27:16

Agile Development for Data Scientists, Part 1: The Good

8/19/2018
If you're a data scientist at a firm that does a lot of software building, chances are good that you've seen or heard engineers sometimes talking about "agile software development." If you don't work at a software firm, agile practices might be newer to you. In either case, we wanted to go through a great series of blog posts about some of the practices from agile that are relevant for how data scientists work, in hopes of inspiring some transfer learning from software development to data...

Duration:00:25:56

Re - Release: How To Lose At Kaggle

8/12/2018
We've got a classic for you this week as we take a week off for the dog days of summer. See you again next week! Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists. Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced. It's not just bad luck: a very specific combination of overfitting on popular competitions can take someone who is in the top few spots in the final days of a contest and bump...

Duration:00:17:54
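
During a competition, Kaggle ranks submissions on a public subset of the test data. The final standings use a held-out private subset. The quick simulation below is our own sketch of the general selection effect, not the episode's exact analysis. It shows how picking the best of many submissions on a small public split can look great and then fall back on the private one. Every "model" here is pure random guessing, so its true accuracy is 50%. The split sizes and number of submissions are hypothetical.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def leaderboard_shakeup(n_models=200, n_public=100, n_private=1000, seed=0):
    """Select the best of many random-guess models on a small public split.

    Returns (best public accuracy, that model's private accuracy).
    """
    rng = random.Random(seed)
    public_labels = [rng.randint(0, 1) for _ in range(n_public)]
    private_labels = [rng.randint(0, 1) for _ in range(n_private)]
    best_public, best_private = -1.0, None
    for _ in range(n_models):
        # Each "model" is noise, so its expected accuracy on any split is 0.5.
        public_preds = [rng.randint(0, 1) for _ in range(n_public)]
        private_preds = [rng.randint(0, 1) for _ in range(n_private)]
        score = accuracy(public_preds, public_labels)
        if score > best_public:
            best_public = score
            best_private = accuracy(private_preds, private_labels)
    return best_public, best_private


if __name__ == "__main__":
    public, private = leaderboard_shakeup()
    print(f"best public accuracy={public:.2f}  same model on private={private:.2f}")
```

The smaller the public split and the more submissions you try against it, the larger this gap tends to get. The public score of the winning model is mostly luck.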

Troubling Trends In Machine Learning Scholarship

8/5/2018
There are a lot of great machine learning papers coming out every day and, if we're being honest, some papers that are not as great as we'd wish. In some ways this is symptomatic of a field that's growing really quickly, but it's also an artifact of strange incentive structures in academic machine learning, and of the fact that sometimes machine learning is just really hard. At the same time, a high quality of academic work is critical for maintaining the reputation of the field, so in this...

Duration:00:29:34

Can Fancy Running Shoes Cause You To Run Faster?

7/29/2018
The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article that went through an analysis of thousands of runners'...

Duration:00:28:37
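
The correlation-versus-causation worry shows up clearly in a small simulation with entirely hypothetical numbers. This is not the NY Times data or method. We assume fitter runners are more likely to buy the shoe, and by default the shoe has no effect at all. A naive comparison of shoe wearers with everyone else still shows a big speed gap. Comparing runners within the same fitness group removes that gap. This only works because the simulated confounder (fitness) is observed; real analyses rarely get that luxury.

```python
import random
import statistics


def simulate_runners(n=20000, shoe_effect=0.0, seed=0):
    """Hypothetical race data: fitter runners are more likely to wear the shoe.

    Returns (is_fit, wears_shoe, finish_minutes) tuples. `shoe_effect` is the
    true number of minutes the shoe saves (0 means no causal effect).
    """
    rng = random.Random(seed)
    runners = []
    for _ in range(n):
        is_fit = rng.random() < 0.5                        # the confounder
        wears_shoe = rng.random() < (0.7 if is_fit else 0.2)
        base = 100.0 if is_fit else 130.0                  # fitter runners finish faster
        minutes = base - shoe_effect * wears_shoe + rng.gauss(0, 10)
        runners.append((is_fit, wears_shoe, minutes))
    return runners


def naive_difference(runners):
    """Mean minutes of non-wearers minus wearers (positive = wearers look faster)."""
    shoe = [m for _, s, m in runners if s]
    other = [m for _, s, m in runners if not s]
    return statistics.fmean(other) - statistics.fmean(shoe)


def stratified_difference(runners):
    """Compare wearers and non-wearers within each fitness group, weighted by group size."""
    total, weighted = 0, 0.0
    for group in (True, False):
        members = [r for r in runners if r[0] == group]
        weighted += len(members) * naive_difference(members)
        total += len(members)
    return weighted / total


if __name__ == "__main__":
    runners = simulate_runners()
    print(f"naive gap: {naive_difference(runners):.1f} min, "
          f"within-group gap: {stratified_difference(runners):.1f} min")
```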

Compliance Bias

7/22/2018
When you're using an A/B test to understand the effect of a treatment, there are a lot of assumptions about how the treatment (and control, for that matter) gets applied. For example, it's easy to think that everyone who was assigned to the treatment arm actually gets the treatment, everyone in the control arm doesn't, and that the two groups get their treatment instantaneously. None of these things is guaranteed in real life, and if you really care about measuring your treatment effect then that's...

Duration:00:23:27
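
Here is a minimal sketch of why noncompliance matters, under assumptions we chose for illustration. We assume one-sided noncompliance: only some users assigned to treatment actually take it, and nobody in control does. We also assume that compliers are more engaged at baseline and that the true effect on those treated is 2.0. The sketch compares three standard estimates, which are not necessarily the approach taken in the episode. The first is a naive as-treated comparison. The second is the intent-to-treat (ITT) estimate. The third is the Wald estimator, ITT divided by the take-up rate, which estimates the complier average causal effect when assignment affects outcomes only through treatment.

```python
import random
import statistics


def simulate_ab_test(n=40000, effect=2.0, seed=0):
    """Hypothetical A/B test with one-sided noncompliance.

    Returns (assigned, treated, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        assigned = rng.random() < 0.5
        complier = rng.random() < 0.6          # would take the treatment if offered
        baseline = 12.0 if complier else 8.0   # assumption: compliers are more engaged
        treated = assigned and complier        # control users never get treated
        outcome = baseline + effect * treated + rng.gauss(0, 3)
        rows.append((assigned, treated, outcome))
    return rows


def mean_where(rows, pred):
    """Mean outcome over rows where pred(assigned, treated) is true."""
    return statistics.fmean(o for a, t, o in rows if pred(a, t))


def estimates(rows):
    """Return (as-treated difference, ITT, Wald/complier-effect estimate)."""
    as_treated = mean_where(rows, lambda a, t: t) - mean_where(rows, lambda a, t: not t)
    itt = mean_where(rows, lambda a, t: a) - mean_where(rows, lambda a, t: not a)
    # Share of the treatment arm that actually received the treatment.
    take_up = sum(t for a, t, _ in rows if a) / sum(a for a, _, _ in rows)
    return as_treated, itt, itt / take_up


if __name__ == "__main__":
    as_treated, itt, cace = estimates(simulate_ab_test())
    print(f"as-treated={as_treated:.2f}  ITT={itt:.2f}  complier effect={cace:.2f}")
```

In this setup the as-treated comparison overstates the effect because compliers were different to begin with. The ITT estimate is diluted by users who never took the treatment, and rescaling by take-up recovers the effect for compliers.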