
Linear Digressions


In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.


Location: United States

Language: English


Episodes

Federated Learning

7/14/2019
This is a re-release of an episode first released in May 2017. As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm? In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the...

Duration: 00:15:02
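
To make the setup a bit more concrete, here is a minimal sketch of the federated averaging idea on a toy linear-regression problem. Everything in it (the number of clients, the model, the learning rate) is illustrative rather than taken from the episode; the point is only that raw data stays on each simulated device and only model weights travel to the server.

```python
import numpy as np

# Hypothetical setup: a handful of clients, each holding a private,
# differently-sized dataset. The server never sees raw data, only weights.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    n = int(rng.integers(20, 200))          # imbalanced datasets across devices
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    """A few gradient-descent steps on one device's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Federated averaging: broadcast the model, train locally on each device,
# then average the returned weights, weighted by local dataset size.
w_global = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("recovered weights:", w_global)       # should approach true_w
```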

Endogenous Variables and Measuring Protest Effectiveness

7/7/2019
This is a re-release of an episode first released in February 2017. Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers? It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed. In other words, protest size is endogenous to legislative response, and understanding...

Duration: 00:17:58
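
The endogeneity problem described here has a standard (if imperfect) remedy: find an instrument that shifts the treatment but has no direct path to the outcome, then use two-stage least squares. The simulation below is purely illustrative, with made-up variables and a made-up instrument, just to show how a hidden confounder biases the naive regression while the IV estimate recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Toy data-generating process: an unobserved confounder drives both protest
# size and legislative response, so naive regression overstates the effect.
confounder = rng.normal(size=n)
instrument = rng.normal(size=n)   # moves protest size, but not the outcome directly
protest = confounder + instrument + rng.normal(size=n)
response = 0.5 * protest + 2.0 * confounder + rng.normal(size=n)   # true effect: 0.5

# Naive OLS slope is biased upward by the confounder.
naive_slope = np.polyfit(protest, response, 1)[0]

# Two-stage least squares: (1) predict protest size from the instrument,
# (2) regress the outcome on that prediction.
slope, intercept = np.polyfit(instrument, protest, 1)
protest_hat = slope * instrument + intercept
iv_slope = np.polyfit(protest_hat, response, 1)[0]

print("naive OLS:", round(naive_slope, 2), "| IV estimate:", round(iv_slope, 2))
```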

Deepfakes

6/30/2019
Generative adversarial networks (GANs) are producing some of the most realistic artificial videos we’ve ever seen. These videos are usually called “deepfakes”. Even to an experienced eye, it can be a challenge to distinguish a fabricated video from a real one, which is an extraordinary challenge in an era when the truth of what you see on the news or especially on social media is worthy of skepticism. And just in case that wasn’t unsettling enough, the algorithms just keep getting better and...

Duration: 00:15:08
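
Since the episode is really about GANs, here is a deliberately tiny sketch of the adversarial training loop, fitting a one-dimensional Gaussian rather than video frames. The networks, optimizer settings, and target distribution are placeholders, and it assumes PyTorch; real deepfake generators are vastly larger, but the generator-versus-discriminator loop has the same shape.

```python
import torch
import torch.nn as nn

# Toy GAN: the generator learns to mimic samples from N(3, 1).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) + 3.0        # samples from the "true" distribution
    fake = G(torch.randn(64, 8))           # generator output from random noise

    # Discriminator: learn to label real samples 1 and fakes 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator call its fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(fake.mean().item(), fake.std().item())   # should drift toward mean 3, std 1
```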

Revisiting Biased Word Embeddings

6/23/2019
The topic of bias in word embeddings gets yet another pass this week. It all started a few years ago, when an analogy task performed on Word2Vec embeddings showed some indications of gender bias around professions (as well as other forms of social bias getting reproduced in the algorithm’s embeddings). We returned to the topic a while later to cover methods for de-biasing embeddings and counteracting this effect. And now we’re back, with a second pass on the original Word2Vec analogy task,...

Duration: 00:18:09
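
The analogy probe at the center of this storyline is easy to rerun yourself. The sketch below assumes the gensim library and a locally downloaded copy of the pre-trained GoogleNews Word2Vec vectors; the file name is just the conventional one, so point it at wherever your copy lives.

```python
from gensim.models import KeyedVectors

# Pre-trained 300-dimensional Word2Vec embeddings (path is a placeholder).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "man is to doctor as woman is to ___" via vector arithmetic:
# doctor - man + woman, then nearest neighbors.
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))
```

Biased completions of analogies like this one are the kind of result the original audit flagged, and exactly what the follow-up work re-examines.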

Attention in Neural Nets

6/16/2019
There’s been a lot of interest lately in the attention mechanism in neural nets—it’s got a colloquial name (who’s not familiar with the idea of “attention”?) but it’s more like a technical trick that’s been pivotal to some recent advances in computer vision and especially word embeddings. It’s an interesting example of trying out human-cognitive-ish ideas (like focusing consideration more on some inputs than others) in neural nets, and one of the more high-profile recent successes in playing...

Duration: 00:26:32
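
Stripped of everything else, attention is a learned weighted average: each query scores every key, the scores are softmaxed into weights, and the weights mix the values. A minimal NumPy sketch of scaled dot-product attention (shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; the softmaxed scores become a
    weighted average over the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)    # each row sums to 1: "where to look"
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))          # (4, 8), rows of weights sum to 1
```

In a real network Q, K, and V come from learned linear projections of the layer's inputs, and many such attention heads run in parallel.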

Interview with Joel Grus

6/9/2019
This week’s episode is a special one, as we’re welcoming a guest: Joel Grus is a data scientist with a strong software engineering streak, and he does an impressive amount of speaking, writing, and podcasting as well. Whether you’re a new data scientist just getting started, or a seasoned hand looking to improve your skill set, there’s something for you in Joel’s repertoire.

Duration: 00:39:45

Re-release: Factorization Machines

6/2/2019
What do you get when you cross a support vector machine with matrix factorization? You get a factorization machine, and a darn fine algorithm for recommendation engines.

Duration: 00:20:08
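
To unpack the "SVM crossed with matrix factorization" one-liner: a second-order factorization machine scores a feature vector with a global bias, a linear term, and a pairwise-interaction term whose weights are inner products of learned latent factors. A small NumPy sketch of just the prediction step (all dimensions and values are made up):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.
    w0: global bias, w: linear weights, V: (n_features, k) latent factors.
    The interaction term uses the O(n*k) identity:
    0.5 * sum_f [ (V[:, f] . x)^2 - (V[:, f]^2 . x^2) ]"""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

rng = np.random.default_rng(0)
n_features, k = 10, 4
x = rng.integers(0, 2, size=n_features).astype(float)   # e.g. one-hot user/item flags
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.1, size=(n_features, k))
print(fm_predict(x, w0, w, V))
```

The latent factors are what make this work on sparse recommendation data: two features that never co-occur in training can still get a sensible interaction weight through the factors they share with everything else.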

Re-release: Auto-generating websites with deep learning

5/26/2019
We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website...

Duration: 00:19:38
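
As a rough schematic of the idea (not the blog post's actual architecture), the sketch below wires a small convolutional encoder to a recurrent decoder in PyTorch: the screenshot becomes a feature vector, and that vector conditions a token-by-token generator over a markup vocabulary. All layer sizes and the vocabulary are placeholders.

```python
import torch
import torch.nn as nn

class WireframeToMarkup(nn.Module):
    """Image-to-sequence skeleton: CNN encoder -> GRU decoder over markup tokens."""
    def __init__(self, vocab_size=100, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(                    # screenshot -> feature vector
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden),
        )
        self.embed = nn.Embedding(vocab_size, hidden)    # previously generated tokens
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)        # next-token scores

    def forward(self, image, token_ids):
        h0 = self.encoder(image).unsqueeze(0)            # image summary as initial state
        out, _ = self.decoder(self.embed(token_ids), h0)
        return self.head(out)                            # (batch, seq_len, vocab)

model = WireframeToMarkup()
fake_screenshot = torch.randn(2, 3, 64, 64)
fake_tokens = torch.randint(0, 100, (2, 20))
print(model(fake_screenshot, fake_tokens).shape)         # torch.Size([2, 20, 100])
```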

Advice to those trying to get a first job in data science

5/19/2019
We often hear from folks wondering what advice we can give them as they search for their first job in data science. What does a hiring manager look for? Should someone focus on taking classes online, doing a bootcamp, reading books, something else? How can they stand out in a crowd? There’s no single answer, because so much depends on the person asking in the first place, but that doesn’t stop us from giving some perspective. So in this episode we’re sharing that advice out more widely, so...

Duration: 00:17:32

Re-release: Machine Learning Technical Debt

5/12/2019
This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows. If you've worked in software before, you're probably familiar with the idea of technical debt: the inefficiencies that crop up in the code when you're trying to go fast. You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on....

Duration: 00:22:28

Estimating Software Projects, and Why It's Hard

5/5/2019
If you’re like most software engineers and, especially, data scientists, you find it really hard to make accurate estimates of how long a project will take to complete. Don’t feel bad: statistics is most likely actively working against your best efforts to give your boss an accurate delivery date. This week, we’ll talk through a great blog post that digs into the underlying probability and statistics assumptions that are probably driving your estimates, versus the ones that maybe should be...

Duration: 00:19:07
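
One way to see the trap the episode walks through: if individual task times are right-skewed (log-normal is a common modeling assumption), adding up "typical" per-task estimates badly undershoots both the mean and the tail of the total. A small simulation sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 tasks per project, each with a median time of ~1 unit but a heavy right
# tail. The sigma here is illustrative, not an empirical estimate.
n_tasks, n_projects = 20, 100_000
task_times = rng.lognormal(mean=0.0, sigma=1.0, size=(n_projects, n_tasks))
totals = task_times.sum(axis=1)

print("sum of per-task medians (a 'typical' estimate):", n_tasks * np.median(task_times))
print("median project time:", np.median(totals))
print("mean project time:  ", totals.mean())
print("95th percentile:    ", np.quantile(totals, 0.95))
# The mean and the tail are driven by a few blow-up tasks, which is why
# adding up typical-case estimates comes out optimistic so often.
```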

The Black Hole Algorithm

4/28/2019
53.5 million light-years away, there’s a gigantic galaxy called M87 with something interesting going on inside it. Between Einstein’s theory of relativity and the motion of a group of stars in the galaxy (the motion is characteristic of there being a huge gravitational mass present), scientists have believed for years that there is a supermassive black hole at the center of that galaxy. However, black holes are really hard to see directly because they aren’t a light source like a star or a...

Duration: 00:20:17

Structure in AI

4/21/2019
As artificial intelligence algorithms get applied to more and more domains, a question that often arises is whether to somehow build structure into the algorithm itself to mimic the structure of the problem. There’s usually some amount of knowledge we already have of each domain, an understanding of how it usually works, but it’s not clear how (or even if) to lend this knowledge to an AI algorithm to help it get started. Sure, it may get the algorithm caught up to where we already were on...

Duration: 00:19:05

The Great Data Science Specialist vs. Generalist Debate

4/14/2019
It’s not news that data scientists are expected to be capable in many different areas (writing software, designing experiments, analyzing data, talking to non-technical stakeholders). One thing that has been changing, though, as the field becomes a bit older and more mature, is our ideas about what data scientists should focus on to stay relevant. Should they specialize in a particular area (if so, which one)? Should they instead stay general and work across many different areas? In either...

Duration: 00:14:10

Google X, and Taking Risks the Smart Way

4/7/2019
If you work in data science, you’re well aware of the sheer volume of high-risk, high-reward projects that are hypothetically possible. The fact that they’re high-reward means they’re exciting to think about, and the payoff would be huge if they succeed, but the high-risk piece means that you have to be smart about what you choose to work on and be wary of investing all your resources in projects that fail entirely or starve other, higher-value projects. This episode focuses mainly on Google...

Duration: 00:19:04

Statistical Significance in Hypothesis Testing

3/31/2019
When you are running an AB test, one of the most important questions is how much data to collect. Collect too little, and you can end up drawing the wrong conclusion from your experiment. But in a world where experimenting is generally not free, and you want to move quickly once you know the answer, there is such a thing as collecting too much data. Statisticians have been solving this problem for decades, and their best practices are encompassed in the ideas of power, statistical...

Duration: 00:22:34
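
As one concrete piece of the machinery the episode covers, here is the standard normal-approximation sample-size formula for a two-proportion A/B test, assuming scipy is available. The baseline rate and minimum detectable lift are invented; the takeaway is how quickly the required n grows as the effect you care about shrinks.

```python
from scipy.stats import norm

def samples_per_group(p_baseline, lift, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-proportion z-test:
    n ~ (z_{1-alpha/2} + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2"""
    p1, p2 = p_baseline, p_baseline + lift
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

# Illustrative numbers: 10% baseline conversion, hoping to detect a 1-point lift.
print(round(samples_per_group(0.10, 0.01)))   # roughly 15,000 users per arm
```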

The Language Model Too Dangerous to Release

3/24/2019
OpenAI recently created a cutting-edge new natural language processing model, but unlike all their other projects so far, they have not released it to the public. Why? It seems to be a little too good. It can answer reading comprehension questions, summarize text, translate from one language to another, and generate realistic fake text. This last case, in particular, raised concerns inside OpenAI that the raw model could be dangerous if bad actors had access to it, so researchers will spend...

Duration: 00:21:01
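
The model in question is GPT-2. The full-size weights were withheld at the time, but smaller checkpoints were released, and the sketch below shows what sampling from one looks like, assuming the Hugging Face transformers package; the model name, prompt, and sampling settings are illustrative.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" is the publicly released 124M-parameter checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-k sampling, roughly in the spirit of the samples OpenAI published.
output = model.generate(input_ids, max_length=60, do_sample=True, top_k=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```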

The cathedral and the bazaar

3/17/2019
Imagine you have two choices of how to build something: top-down and controlled, with a few people playing a master designer role, or bottom-up and free-for-all, with nobody playing an explicit architect role. Which one do you think would make the better product? “The Cathedral and the Bazaar” is an essay exploring this question for open source software, and making an argument for the bottom-up approach. It’s not entirely intuitive that projects like Linux or scikit-learn, with many...

Duration: 00:32:36

AlphaStar

3/10/2019
It’s time for our latest installment in the series on artificial intelligence agents beating humans at games that we thought were safe from the robots. In this case, the game is StarCraft, and the AI agent is AlphaStar, from the same team that built the Go-playing AlphaGo AI a few years ago. StarCraft presents some interesting challenges though: the gameplay is continuous, there are many different kinds of actions a player must take, and of course there are the usual complexities of playing...

Duration: 00:22:03

Are machine learning engineers the new data scientists?

3/3/2019
For many data scientists, maintaining models and workflows in production is both a huge part of their job and not something they necessarily trained for if their background is more in statistics or machine learning methodology. Productionizing and maintaining data science code has more in common with software engineering than traditional science, and to reflect that, there’s a new-ish role, and corresponding job title, that you should know about. It’s called machine learning engineer, and...

Duration: 00:20:46