Data Skeptic Podcast

The Data Skeptic Podcast features conversations with researchers and other professionals active in applying data science to real-world problems. The topics relate to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches. The podcast has an alternating format, with even episodes featuring long-form conversations and odd episodes featuring short discussions of data science topics, aimed at listeners who might not be familiar with some of the topics discussed on the show.

More Information

Location:

United States

Language:

English


Episodes

The Louvain Method for Community Detection

10/12/2018
Without getting into definitions, we have an intuitive sense of what a "community" is. The Louvain Method for Community Detection is one of the best known mathematical techniques designed to detect communities. This method requires typical graph data in which people are nodes and edges are their connections. It's easy to imagine this data in the context of Facebook or LinkedIn but the technique applies just as well to any other dataset like cellular phone calling records or pen-pals. The...

Duration:00:26:45

Cultural Cognition of Scientific Consensus

10/5/2018
In this episode, our guest Dan Kahan discusses his research into how people consume and interpret science news. In an era of fake news, motivated reasoning, and alternative facts, important questions need to be asked about how people understand new information. Dan is a member of the Cultural Cognition Project at Yale University, a group of scholars interested in studying how cultural values shape public risk perceptions and related policy beliefs. In a paper titled Cultural cognition of...

Duration:00:31:47

False Discovery Rates

9/28/2018
A false discovery rate (FDR) is a methodology that can be useful when struggling with the problem of multiple comparisons. In any experiment, if the experimenter checks more than one dependent variable, then they are making multiple comparisons. Naturally, if you make enough comparisons, you will eventually find some correlation. Classically, people applied the Bonferroni Correction. In essence, this procedure dictates that you should lower your p-value threshold (raise your standard of evidence) by...

Duration:00:14:05

Deep Fakes

9/21/2018
Digital videos can be described as sequences of still images and associated audio. Audio is easy to fake. What about video? A video can easily be broken down into a sequence of still images replayed rapidly in sequence. In this context, videos are simply very high dimensional sequences of observations, ripe for input into a machine learning algorithm. The availability of commodity hardware, clever algorithms, and well-designed software to implement those algorithms at scale make it...

Duration:00:30:22

Fake News Midterm

9/14/2018
In this episode, Kyle reviews what we've learned so far in our series on Fake News and talks briefly about where we're going next.

Duration:00:19:17

Quality Score

9/7/2018
Two weeks ago we discussed click through rates, or CTRs, and their usefulness and limits as a metric. Today, we discuss a related metric known as quality score. While that phrase has probably been used to mean dozens of different things in different contexts, our discussion centers on the idea of quality score encountered in Search Engine Marketing (SEM). SEM is the practice of purchasing keyword targeted ads shown to customers using a search engine. Most SEM is managed via an auction...

Duration:00:18:53

The Knowledge Illusion

8/31/2018
Kyle interviews Steven Sloman, Professor in the school of Cognitive, Linguistic, and Psychological Sciences at Brown University. Steven is co-author of The Knowledge Illusion: Why We Never Think Alone and Causal Models: How People Think about the World and Its Alternatives. Steven shares his perspective and research into how people process information and what this teaches us about the existence of and belief in fake news.

Duration:00:40:00

Click Through Rates

8/24/2018
A Click Through Rate (CTR) is the proportion of clicks to impressions of some item of content shared online. This terminology is most commonly used in digital advertising but applies just as well to content websites might choose to feature on their homepage or in search results. A CTR is intuitively appealing as a metric for optimization. After all, if users are uninterested in some content, under normal circumstances, it's reasonable to assume they would ignore the content, rather than...

Duration:00:31:43

Algorithmic Detection of Fake News

8/17/2018
The scale and frequency with which information can be distributed on social media make the problem of fake news a rapidly metastasizing issue. Any content filtering or labeling demands an algorithmic solution. In today's episode, Kyle interviews Kai Shu and Mike Tamir about their independent work exploring the use of machine learning to detect fake news. Kai Shu and his co-authors published Fake News Detection on Social Media: A Data Mining Perspective, a research paper which both...

Duration:00:53:46

Ant Intelligence

8/10/2018
If you prepared a list of creatures regarded as highly intelligent, it's unlikely ants would make the cut. This is expected, as on an individual level, ants do not generally display behavior that most humans would regard as intelligence. In fact, it might even be true that most species of ants are unable to learn. Despite this, ant colonies have evolved excellent survival mechanisms through the careful orchestration of ants.

Duration:00:28:16

Human Detection of Fake News

8/3/2018
With publications such as "Prior exposure increases perceived accuracy of fake news", "Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning", and "The science of fake news", Gordon Pennycook is asking and answering analytical questions about the nature of human intuition and fake news. Gordon appeared on Data Skeptic in 2016 to discuss people's ability to recognize pseudo-profound bullshit. This episode explores his work...

Duration:00:28:25

Spam Filtering with Naive Bayes

7/27/2018
Today's spam filters are advanced data-driven tools. They rely on a variety of techniques to effectively and often seamlessly filter out junk email from good email. Whitelists, blacklists, traffic analysis, network analysis, and a variety of other tools are probably employed by most major players in this area. Naturally, content analysis can be an especially powerful tool for detecting spam. Given the binary nature of the problem (spam or not spam), it's clear that this is a great problem to use machine...

Duration:00:16:09

The Spread of Fake News

7/20/2018
How does fake news get spread online? It's not just a matter of manipulating search algorithms. The social platforms for sharing play a major role in the distribution of fake news. But how significant an impact can there be? How significantly can bots influence the spread of fake news? In this episode, Kyle interviews Filippo Menczer, Professor of Computer Science and Informatics. Fil is part of the Observatory on Social Media (OSoMe, https://osome.iuni.iu.edu/tools/). OSoMe are the...

Duration:00:45:17

Fake News

7/13/2018
This episode kicks off our new theme of "Fake News" with guests Robert Sheaffer and Brad Schwartz. Fake news is a new label for an old idea. For our purposes, we will define fake news as information created to deliberately mislead while masquerading as a legitimate, journalistic source of truth. It's become a modern topic of discussion as our cultures adapt to the fledgling mechanisms of communication introduced by online platforms. What was the earliest incident of fake news? That's a...

Duration:00:38:17

Dev Ops for Data Science

7/11/2018
We revisit the 2018 Microsoft Build in this episode, focusing on the latest ideas in DevOps. Kyle interviews Cloud Developer Advocates Damien Brady, Paige Bailey, and Donovan Brown to talk about DevOps and data science and databases. For a data scientist, what does it even mean to “build”? Packaging and deployment are things that a data scientist doesn't normally have to consider in their day-to-day work. The process of making an AI app is usually divided into two streams of work: data...

Duration:00:38:19

First Order Logic

7/6/2018
Logic is a foundation of mathematical systems. Its roots are the values true and false, and its power is in what its rules allow you to prove. Propositional logic provides its user with variables. This episode gets into First Order Logic, an extension of propositional logic.

Duration:00:16:50

Blind Spots in Reinforcement Learning

6/29/2018
An intelligent agent trained in a simulated environment may be prone to making mistakes in the real world due to discrepancies between the training and real-world conditions. The areas where an agent makes mistakes, known as "blind spots," are hard to find and can stem from various causes. In this week’s episode, Kyle is joined by Ramya Ramakrishnan, a PhD candidate at MIT, to discuss the idea of “blind spots” in reinforcement learning and approaches to discover them.

Duration:00:27:34

Defending Against Adversarial Attacks

6/22/2018
In this week’s episode, our host Kyle interviews Gokula Krishnan from ETH Zurich about his recent contributions to defenses against adversarial attacks. The discussion centers around his latest paper, titled “Defending Against Adversarial Attacks by Leveraging an Entire GAN,” and his proposed algorithm, aptly named ‘Cowboy.’

Duration:00:31:28

Transfer Learning

6/15/2018
On a long car ride, Linhda and Kyle record a short episode. This discussion is about transfer learning, a technique used in machine learning to leverage training from one domain to get a head start on learning in another domain. Transfer learning has some obvious appealing features. Take the example of an image recognition problem. There are now many widely available models that do general image recognition. Detecting that an image contains a "sofa" is an impressive feat. However, for a...

Duration:00:18:03

Medical Imaging Training Techniques

6/8/2018
Medical imaging is a highly effective tool used by clinicians to diagnose a wide array of diseases and injuries. However, it often requires exceptionally trained specialists such as radiologists to interpret accurately. In this episode of Data Skeptic, our host Kyle Polich is joined by Gabriel Maicas, a PhD candidate at the University of Adelaide, to discuss machine learning systems that can be used by radiologists to improve their accuracy and speed of diagnosis.

Duration:00:25:20