O'Reilly Data Show


Sebastopol, CA


The O'Reilly Data Show explores the opportunities and techniques driving big data and data science. Through interviews and analysis, we highlight the people putting data to work.






How Ray makes continuous learning accessible and easy to scale

The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications. In this episode of the Data Show, I spoke with Robert Nishihara and Philipp Moritz, graduate students at UC Berkeley and members of RISE Lab. I wanted to get an update on Ray, an open source distributed execution framework that makes it easy for machine learning engineers and data scientists to scale reinforcement learning and other related continuous...

Duration: 00:18:28

Why AI and machine learning researchers are beginning to embrace PyTorch

The O’Reilly Data Show Podcast: Soumith Chintala on building a worthy successor to Torch and on deep learning within Facebook. In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered around...

Duration: 00:36:56

How big data and AI will reshape the automotive industry

The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services. In this episode of the Data Show, I spoke with Evangelos Simoudis, co-founder of Synapse Partners and a frequent contributor to O’Reilly. He recently published a book entitled The Big Data Opportunity in Our Driverless Future, and I wanted to get his thoughts on the transportation industry and the role of big data and analytics in its future. Simoudis is an entrepreneur, and he also advises and invests...

Duration: 00:51:05

A framework for building and evaluating data products

The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches. In this episode of the Data Show, I spoke with Grace Huang, data science lead at Pinterest. With its combination of a large social graph, enthusiastic users, and multimedia data, Pinterest has long struck me as a fascinating lab for data science. Huang described the challenge of building a sustainable content ecosystem and shared lessons from the...

Duration: 00:22:17

Building a next-generation platform for deep learning

The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI. In this episode of the Data Show, I speak with Naveen Rao, VP and GM of the Artificial Intelligence Products Group at Intel. In an earlier episode, we learned that scaling current deep learning models requires innovations in both software and hardware. Through his startup Nervana (since acquired by Intel), Rao has been at the forefront of building a next-generation platform for deep...

Duration: 00:27:49

A scalable time-series database that supports SQL

The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series. In this episode of the Data Show, I spoke with Michael Freedman, CTO of Timescale and professor of computer science at Princeton University. When I first heard that Freedman and his collaborators were building a time-series database, my immediate reaction was: “Don’t we have enough options already?” The early incarnation of Timescale was a startup focused on IoT, and it was while building...

Duration: 00:49:12

Programming collective intelligence for financial trading

The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models. In this episode of the Data Show, I spoke with Geoffrey Bradway, VP of engineering at Numerai, a new hedge fund that relies on contributions from external data scientists. The company hosts regular competitions where data scientists submit machine learning models for classification tasks. The most promising submissions are then added to an ensemble of models that the company...

Duration: 00:26:52

Creating large training data sets quickly

The O’Reilly Data Show Podcast: Alex Ratner on why weak supervision is the key to unlocking dark data. In this episode of the Data Show, I spoke with Alex Ratner, a graduate student at Stanford and a member of Christopher Ré’s Hazy research group. Training data has always been important in building machine learning algorithms, and the rise of data-hungry deep learning models has heightened the need for labeled data sets. In fact, the challenge of creating training data is ongoing for many...

Duration: 00:47:09

Data science and deep learning in retail

The O’Reilly Data Show Podcast: Jeremy Stanley on hiring and leading machine learning engineers to build world-class data products. In this episode of the Data Show, I spoke with Jeremy Stanley, VP of data science at Instacart, a popular grocery delivery service that is expanding rapidly. As Stanley describes it, Instacart operates a four-sided marketplace composed of retail stores, products within the stores, shoppers assigned to the stores, and customers who order from Instacart. The...

Duration: 00:49:27

Language understanding remains one of AI’s grand challenges

The O’Reilly Data Show Podcast: David Ferrucci on the evolution of AI systems for language understanding. In this episode of the Data Show, I spoke with David Ferrucci, founder of Elemental Cognition and senior technologist at Bridgewater Associates. Ferrucci served as principal investigator of IBM’s DeepQA project and led the Watson team that became champion of the Jeopardy! quiz show. Elemental Cognition (EC) is a research group focused on building an AI system that will be equipped...

Duration: 00:38:04

Data preparation in the age of deep learning

The O’Reilly Data Show Podcast: Lukas Biewald on why companies are spending millions of dollars on labeled data sets. In this episode of the Data Show, I spoke with Lukas Biewald, co-founder and chief data scientist at CrowdFlower. In a previous episode we covered how the rise of deep learning is fueling the need for large labeled data sets and high-performance computing systems. CrowdFlower has a service that many leading companies have come to rely on to provide them with labeled data...

Duration: 00:36:16

Scaling machine learning

The O’Reilly Data Show Podcast: Reza Zadeh on deep learning, hardware/software interfaces, and why computer vision is so exciting. In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh is also the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). Our conversation...

Duration: 00:56:46

Architecting and building end-to-end streaming applications

The O’Reilly Data Show Podcast: Karthik Ramasamy on Heron, DistributedLog, and designing real-time applications. In this episode of the Data Show, I spoke with Karthik Ramasamy, adjunct faculty member at UC Berkeley, former engineering manager at Twitter, and co-founder of Streamlio. Ramasamy managed the team that built Heron, an open source distributed stream processing engine compatible with Apache Storm. While Ramasamy has seen firsthand what it takes to build and deploy large-scale...

Duration: 00:45:10

Becoming a machine learning engineer

The O’Reilly Data Show Podcast: Aurélien Géron on enabling companies to use machine learning in real-world products. In this episode of the Data Show, I spoke with Aurélien Géron, a serial entrepreneur, data scientist, and author of a popular new book entitled Hands-on Machine Learning with Scikit-Learn and TensorFlow. Géron’s book is aimed at software engineers who want to learn machine learning and start deploying machine learning models in real-world products. As more companies adopt...

Duration: 00:41:03

Natural language analysis using Hierarchical Temporal Memory

The O’Reilly Data Show Podcast: Francisco Webber on building HTM-based enterprise applications. In this episode of the Data Show, I spoke with Francisco Webber, founder of Cortical.io, a startup that is applying tools based on Hierarchical Temporal Memory (HTM) to natural language understanding. While HTM has been around for more than a decade, there aren’t many companies that have released products based on it (at least compared to other machine learning methods). Numenta, an...

Duration: 00:51:03

Saving the world—or at least the world’s scientific and government data

The O’Reilly Data Show Podcast: Max Ogden on data preservation, distributed trust, and bringing cutting-edge technology to journalism. In this special episode of the Data Show, O'Reilly's Jenn Webb speaks with Maxwell Ogden, director of Code for Science and Society. Recently, Ogden and Code for Science have been working on the ongoing rescue of data.gov and assisting with other data rescue projects, such as Data Refuge; they’re also the nonprofit developers supporting Dat, a data...

Duration: 00:40:33

Deep learning that's easy to implement and easy to scale

The O’Reilly Data Show Podcast: Anima Anandkumar on MXNet, tensor computations and deep learning, and techniques for scaling algorithms. In this episode of the Data Show, I spoke with Anima Anandkumar, a leading machine learning researcher, and currently a principal research scientist at Amazon. I took the opportunity to get an update on the latest developments on the use of tensors in machine learning. Most of our conversation centered around MXNet—an open source, efficient, scalable...

Duration: 00:36:41

Building machine learning solutions that can withstand adversarial attacks

The O’Reilly Data Show Podcast: Parvez Ahammad on minimal supervision, and the importance of explainability, interpretability, and security. In this episode of the Data Show, I spoke with Parvez Ahammad, who leads the data science and machine learning efforts at Instart Logic. He has applied machine learning in a variety of domains, most recently to computational neuroscience and security. Along the way, he has assembled and managed teams of data scientists and has had to grapple with...

Duration: 00:44:46

Deep learning for Apache Spark

The O’Reilly Data Show Podcast: Jason Dai on BigDL, a library for deep learning on existing data frameworks. In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel, and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific and longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair scheduler, and the “yarn-client” mode....

Duration: 00:30:59

The key to building deep learning solutions for large enterprises

The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM. As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library...

Duration: 00:35:07
