DataCafé

Science Podcasts

Welcome to the DataCafé: a special-interest Data Science podcast with Dr Jason Byrne and Dr Jeremy Bradley, interviewing leading data science researchers and domain experts in all things business, stats, maths, science and tech.

Location:

United Kingdom

Language:

English


Episodes

Science Communication with physicist Laurie Winkless, author of "Sticky" & "Science and the City"

6/2/2023
A key part of the scientific method is communicating insights to an audience, whatever the field of research or problem context. This is where the ultimate value comes from: sharing cutting-edge results that can improve our understanding of the world and help deliver new innovations in people's lives. Effective science communication sits at the intersection of data, research, and the art of storytelling. In this episode of the DataCafé we have the pleasure of welcoming Laurie Winkless, a physicist, author and science communication expert. Laurie has extensive experience in science journalism, having written numerous fascinating articles for Forbes, Wired, Esquire, and The Economist. She has also authored two science books, which we talk about today:

- Sticky: The Secret Science of Surfaces
- Science and the City: The Mechanics behind the Metropolis

Laurie tells us about the amazing insights in her books, drawn from her research, interviews and discussions with leading scientists around the world, and gives us an idea of how the scientific method sits at the core of this work. Her efforts involve moving across many complicated data landscapes to uncover and articulate the key insights of the scientists working in these fields. And she does this through the art of storytelling, in a manner that captures people's imagination whilst educating and surprising them at the same time.

Interview guest: Laurie Winkless, physicist, author, science communicator. Contactable via her website, and on Twitter, Mastodon and LinkedIn.

Further information:
- www.lauriewinkless.com
- "Why do things stick to each other?", The Royal Institution
- https://twitter.com/laurie_winkless
- https://scicomm.xyz/@LaurieWinkless
- https://www.linkedin.com/in/laurie-winkless/
- Sticky: The Secret Science of Surfaces
- Science and the City: The Mechanics behind the Metropolis

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:36:46

A Culture of Innovation

9/6/2022
Culture is a key enabler of innovation in an organisation. Culture underpins the values that are important to people and the motivations for their behaviours. When these values and behaviours align with the goals of innovation, they can lead to high performance across teams tasked with the challenge of leading, inspiring and delivering innovation. Many scientists and researchers face these challenges in various scenarios, yet may be unaware of the level of influence exerted by the culture they are part of. In this episode we talk about what it means to design and embed a culture of innovation. We outline some of our findings from the literature about the levels of culture that may be invisible or difficult to measure. Assessing culture helps us understand the ways it can empower people to experiment and take risks, and the importance this has for innovation. And where a culture is deemed to be limiting innovation, action can be taken to foster the right culture and steer the organisation towards a better chance of success.

Further Reading:
- Paper: Hogan & Coote (2014), Organizational Culture, Innovation and Performance
- Book: Exploring Corporate Strategy: Text and Cases
- Article: Understanding Organisational Culture - Checklist by CMI
- Article: The Cultural Web
- Paper: Mossop et al. (2013), Analysing the hidden curriculum: use of a cultural web
- Book: Bruch & Vogel (2011), Fully Charged: How Great Leaders Boost Their Organization's Energy and Ignite High Performance
- Webinar: Bruch (2012), Fully Charged: How Great Leaders Boost Their Organization's Energy and Ignite High Performance
- Article: Pisano (2019), The Hard Truth About Innovative Cultures

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 12 Aug 2022

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:33:52

Scaling the Internet

7/30/2022
Do you have multiple devices connected to your internet, fighting for your bandwidth? Are you asking your children (or even neighbours!) to get off the network so you can finish an important call? Recent lockdowns caused huge network contention as everyone moved to online meetings and virtual classrooms. This is an optimisation challenge that requires advanced modelling and simulation to tackle. How can a network provider know how much bandwidth to provision to a town or a city to cope with peak demand? That's where agent-based simulations come in: they allow network designers to anticipate and then plan for high-demand events, applications and trends. In this episode of the DataCafé we hear from Dr. Lucy Gullon, AI and Optimisation Research Specialist at Applied Research, BT. She tells us about the efforts underway to assess the need for bandwidth across different households and locations, and the work her team leads to model, simulate and optimise the provision of that bandwidth across the UK's network. We hear how planning for peak use, where, say, the nation is streaming a football match, is an important consideration. At the same time, reacting to times of low throughput makes it possible to switch off unused circuits and equipment and save a lot of energy.

Interview guest: Dr. Lucy Gullon, AI and Optimisation Research Specialist, Applied Research, BT.

Further reading:
- BT Research and Development
- AnyLogic agent-based simulator
- Article: Agent-based modelling
- Article: Prisoner's Dilemma
- Article: Crowd Simulation
- Book: Science and the City
- Research group: Traffic Modelling

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 5 May 2022
Interview date: 27 Apr 2022

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
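For listeners who want a flavour of the agent-based approach after the episode, here is a minimal sketch in Python. It is entirely our own toy model, not BT's: each household agent draws a bandwidth demand, demands spike during a big broadcast event, and we estimate how often aggregate demand would exceed a provisioned capacity. All numbers are invented.

```python
import random

# Toy agent-based simulation of household bandwidth demand
# (our own illustration; every parameter below is an assumption).
random.seed(42)

N_HOUSEHOLDS = 1000        # agents sharing one exchange (assumed)
PROVISIONED_MBPS = 7500    # capacity we want to stress-test (assumed)

def household_demand(event: bool) -> float:
    """Draw one household's demand in Mbps for a single time step."""
    demand = random.expovariate(1 / 4.0)    # everyday browsing/streaming
    if event and random.random() < 0.6:     # 60% tune in to the match
        demand += random.uniform(3, 8)      # an HD stream on top
    return demand

def overload_rate(event: bool, steps: int = 500) -> float:
    """Fraction of time steps where total demand exceeds capacity."""
    overloads = sum(
        sum(household_demand(event) for _ in range(N_HOUSEHOLDS)) > PROVISIONED_MBPS
        for _ in range(steps)
    )
    return overloads / steps

print(f"Overload rate on a quiet evening: {overload_rate(event=False):.1%}")
print(f"Overload rate during a big match: {overload_rate(event=True):.1%}")
```

Even this crude model shows the planning trade-off: capacity that is ample on a quiet evening can be overwhelmed by a correlated demand spike.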

Duration:00:45:24

[Bite] Documenting Data Science Projects

6/29/2022
Do you ever find yourself wondering what data you used in a project? When it was obtained and where it is stored? Or even just how to run a piece of code that produced a previous output and needs to be revisited? Chances are the answer is yes. And it's likely you have been frustrated by not knowing how to reproduce an output, rerun a codebase, or even who to talk to to obtain a refresh of the data, in some way, shape or form. The problem that a lot of project teams face, and data scientists in particular, is agreeing on and committing to documenting their work in a robust and reliable fashion. Documentation is a broad term and can refer to all manner of project details, from the actions captured in a team meeting to the technical guides for executing an algorithm. In this bite episode of the DataCafé we discuss the challenges around documentation in data science projects (though it applies more broadly). We motivate the need for good documentation through agreement on the responsibilities, expectations and methods of capturing notes and guides. This can be everything from a summary of the data sources and how to preprocess input data, to project plans and meeting minutes, through to technical details of the dependencies and setup for running code.

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:16:43

Landing Data Science Projects: The Art of Change Management & Implementation

5/31/2022
Are people resistant to change? And if so, how do you manage that when trying to introduce and deliver innovation through data science? In this episode of the DataCafé we discuss the challenges faced when trying to land a data science project. There are a number of potential barriers to success that need to be carefully managed. We talk about "change management" and the aspects of employee behaviour and stakeholder management that influence the chances of landing a project. This is especially important for embedding innovation in your company or organisation, and for implementing a plan to sustain the changes needed to deliver long-term value.

Further reading & references:
- Kotter's 8 Step Change Plan

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 10 February 2022

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:29:59

[Bite] Version Control for Data Scientists

5/5/2022
Data scientists usually have to write code to prototype software, be it to preprocess and clean data, engineer features, build a model, or deploy a codebase into a production environment or other use case. Tracking the evolution of a codebase is important for a number of reasons, and this is where version control can help. In this bite episode of the DataCafé we talk about the motivators for version control and how it can strengthen your code development and teamwork in building a data science model, pipeline or product.

Further reading:
- Version control
- git-scm
- "Version Control & Git"
- "Learn git"
- "Become a git guru"
- Gitflow workflow
- "A successful git branching model"
- Branching strategies

Recording date: 21 April 2022

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:15:33

Deep Learning Neural Networks: Building Trust and Breaking Bias

4/7/2022
We explore one of the key issues around deep learning neural networks: how can you prove that your neural network will perform correctly? Especially if the neural network in question is at the heart of a mission-critical application, such as making a real-time control decision in an autonomous car. Similarly, how can you establish whether you've trained the neural network at the heart of a loan decision agent with a prebuilt bias? How can you be sure that your black box is going to adapt to critical new situations? We speak with Prof. Alessio Lomuscio about how Mixed Integer Linear Programs (MILPs) and symbolic interval propagation can be used to capture and solve verification problems in large neural networks. Prof. Lomuscio leads the Verification of Autonomous Systems Group in the Department of Computing at Imperial College London; their results have shown that verification is feasible for models with millions of tunable parameters, which was previously not possible. Tools like VENUS and VeriNet, developed in their lab, can verify key operational properties in deep learning networks, and this has particular relevance for safety-critical applications in, for example, the aviation industry, medical imaging and autonomous transportation. Particularly importantly, given that neural networks are only as good as the training data they have learned from, it is also possible to prove that a particular defined bias does or does not exist for a given network. This latter case is, of course, important for many social and industrial applications: being able to show that a decisioning tool treats people of all genders, ethnicities and abilities equitably.

Interview guest: Alessio Lomuscio, Professor of Safe Artificial Intelligence in the Department of Computing at Imperial College London. Anyone wishing to contact Alessio about his team's verification technology can do so via his Imperial College website, or via the Imperial College London spin-off Safe Intelligence, which will be commercialising the AI verification technology in the future.

Further Reading:
- Publication list for Prof. Alessio Lomuscio
- Paper: Formal Analysis of Neural Network-based Systems in the Aircraft Domain
- Paper: Scalable Complete Verification of ReLU Neural Networks via Dependency-based Branching
- Paper: DEEPSPLIT: An Efficient Splitting Method for Neural Network Verification via Indirect Effect Analysis
- Team: Verification of Autonomous Systems Group
- Tools: VENUS and VeriNet

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
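To give a feel for how such guarantees are even possible, here is a toy sketch of interval bound propagation, a simpler relative of the symbolic interval propagation discussed in the episode (VENUS and VeriNet use far tighter and more scalable methods). The random network below stands in for a trained model; everything about it is assumed for illustration.

```python
import numpy as np

# Naive interval bound propagation: push an input box [lo, hi] through
# affine + ReLU layers to obtain *guaranteed* output bounds.
rng = np.random.default_rng(0)

def affine_bounds(lo, hi, W, b):
    """Exact interval bounds of W @ x + b for x in the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so bounds map straight through."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

# A random 2-16-16-2 network standing in for a trained model (assumed).
layers = [(rng.normal(size=(16, 2)), rng.normal(size=16)),
          (rng.normal(size=(16, 16)), rng.normal(size=16)),
          (rng.normal(size=(2, 16)), rng.normal(size=2))]

lo, hi = np.array([0.4, 0.4]), np.array([0.6, 0.6])  # input perturbation box
for i, (W, b) in enumerate(layers):
    lo, hi = affine_bounds(lo, hi, W, b)
    if i < len(layers) - 1:
        lo, hi = relu_bounds(lo, hi)

print("output lower bounds:", lo)
print("output upper bounds:", hi)
# If, say, lo[0] > hi[1], then class 0 is provably chosen for *every*
# input in the box -- the flavour of guarantee verification tools provide.
```

Interval bounds like these are sound but loose; the MILP and symbolic approaches in the episode exist precisely to tighten them enough to verify real networks.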

Duration:00:51:25

[Bite] Wordle: Winning against the algorithm

3/14/2022
The grey, green and yellow squares taking over social media in the last few weeks is an example of the fascinating field of study known as Game Theory. In this bite episode of DataCafé we talk casually about Wordle - the internet phenomenon currently challenging players to guess a new five letter word each day. Six guesses inform players what letters they have gotten right and if they are in the right place. It’s a lovely example of the different ways people approach game strategy through their choice of guesses and ways to use the information presented within the game. Wordles WordleAbsurdleNerdleQuordleFoclach Analysis Statistical analysis of hard-mode Wordle with MatlabThe science behind Wordle Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate. Recording date: 15 Feb 2022 Intro music by Music 4 Video Library (Patreon supporter) Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
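For the curious, here is a minimal greedy Wordle strategy sketch in Python: score each candidate guess by the expected number of words that would remain after seeing its feedback, and pick the guess that narrows things down fastest. The tiny word list is our own invention, purely for illustration; a real solver would use the full answer list.

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Wordle feedback: 'G' green, 'Y' yellow, '-' grey (handles repeats)."""
    result = ["-"] * 5
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def expected_remaining(guess: str, candidates: list[str]) -> float:
    """Average number of candidates left after the guess's feedback."""
    buckets = Counter(feedback(guess, answer) for answer in candidates)
    n = len(candidates)
    return sum(size * size for size in buckets.values()) / n

WORDS = ["crane", "slate", "about", "pious", "trace", "stale", "least"]
best = min(WORDS, key=lambda g: expected_remaining(g, WORDS))
print("best opening guess on this toy list:", best)
```

The same scoring loop, rerun on the surviving candidates after each round of feedback, gives a complete (if greedy) playing strategy.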

Duration:00:11:28

Series 2 Introduction

3/14/2022
Looks like we might be about to have a new series of DataCafé!

Recording date: 15 Feb 2022
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:05:31

[Bite] Why Data Science projects fail

6/21/2021
Data science in a commercial setting should be a no-brainer, right? Firstly, data is becoming ubiquitous, with gigabytes being generated and collected every second. And secondly, new and more powerful data science tools and algorithms are being developed and published every week. Surely just bringing the two together will deliver success... In this episode, we explore why so many data science projects fail to live up to their initial potential. A recent Gartner report anticipates that 85% of data science projects will fail to deliver the value they should due to "bias in data, algorithms or the teams responsible for managing them". There are many reasons why data science projects stutter, even aside from the data, the algorithms and the people. We discuss six key technical reasons why data science projects typically don't succeed, based on our experience, and one big non-technical reason! And having been on the air for a year now, we'd like to give a big thank you to all our brilliant guests and listeners: we really could not have done this without you! It's been great getting feedback and comments on episodes. Do get in touch at jeremy@datacafe.uk or jason@datacafe.uk if you would like to tell us your experiences of successful or unsuccessful data science projects and share your ideas for future episodes.

Further Reading and Resources:
- Article: Why Big Data Science & Data Analytics Projects Fail
- Article: 10 reasons why data science projects fail
- Press release: Gartner Says Nearly Half of CIOs Are Planning to Deploy Artificial Intelligence
- Article: 6 Reasons Why Data Science Projects Fail
- Blog: Reasons Why Data Projects Fail

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 18 June 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:19:22

Data Science for Good

5/31/2021
What's the difference between a commercial data science project and a data science project for social benefit? Often, so-called Data Science for Good projects involve throwing together many people from different backgrounds under a common motivation to have a positive effect. We talk to a data science team that was formed to tackle the unemployment crisis emerging from the pandemic and to help people find excellent jobs, in different industries, for which they have a good skills match. We interview Erika Gravina, Rajwinder Bhatoe and Dehaja Senanayake about their story helping to create the Job Finder Machine with the Emergent Alliance, DataSparQ, Reed and Google.

Further Information:
- Project: Job Finder Machine
- Project group: Emergent Alliance
- DataSparQ
- Shout out: Code First Girls

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Interview date: 25 March 2021
Recording date: 13 May 2021
Intro audio: Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:36:14

[Bite] Data Science and the Scientific Method

5/3/2021
The scientific method consists of systematic observation, measurement and experiment, and the formulation, testing and modification of hypotheses. But what does this mean in the context of data science, where a wealth of unstructured data and a variety of computational models can be used to deduce an insight and inform a stakeholder's decision? In this bite episode we discuss the importance of the scientific method for data scientists. Data science is, after all, the application of scientific techniques and processes to large data sets to obtain impact in a given application area. So we ask how the scientific method can be harnessed efficiently and effectively when there is so much uncertainty in the design and interpretation of an experiment or model.

Further Reading and Resources:
- Paper: Defining the scientific method
- Paper: Big data: the end of the scientific method
- Article: The Data Scientific Method
- Article: The scientific method of machine learning
- Article: Putting the 'Science' Back in Data Science
- Podcast: In Our Time: The Scientific Method
- Podcast: The end of the scientific method
- Video: The Scientific Method
- Cartoon: Machine Learning

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 30 April 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
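As a concrete flavour of the hypothesise-experiment-test loop in a data science setting, here is a minimal sketch. The scenario and every number are invented: we frame a null hypothesis about a product change and test it against simulated experimental data.

```python
import numpy as np
from scipy import stats

# H0: a new recommender variant does NOT change session length.
# We simulate an A/B experiment and run a two-sample t-test.
rng = np.random.default_rng(3)
control = rng.normal(12.0, 4.0, 500)      # minutes/session, current variant
treatment = rng.normal(12.5, 4.0, 500)    # minutes/session, new variant

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Only if p falls below a threshold fixed *before* the experiment do we
# reject H0; otherwise the honest conclusion is "no evidence of an effect yet".
```

The discipline here is less about the test itself and more about committing to the hypothesis and the decision rule before looking at the data.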

Duration:00:17:21

Data Science on Mars

4/19/2021
On 30 July 2020 NASA launched the Mars 2020 mission from Earth, carrying a rover called Perseverance and a rotorcraft called Ingenuity, to land on and study Mars. The mission so far has been a resounding success, touching down in Jezero Crater on 18 February 2021 and sending back data and imagery of the Martian landscape ever since. The aim of the mission is to advance NASA's scientific goals of establishing whether there was ever life on Mars, what its climate and geology are, and to pave the way for human exploration of the red planet in the future. Ingenuity will also demonstrate the first powered flight on another world, in the low-density atmosphere of Mars, approximately 1% of the density of Earth's atmosphere. The efforts involved are an impressive demonstration of the advances and expertise of the science, engineering and project teams. Data from the mission will drive new scientific insights as well as prove the technical abilities demonstrated throughout. Of particular interest is the Terrain Relative Navigation (TRN) system that enables autonomous landing of missions on planetary bodies like Mars, which is so far away that we cannot have ground communications on Earth in the loop. We talk with Prof. Paul Byrne, a planetary geologist from North Carolina State University, about the advances in planetary science and what the Mars 2020 mission means for him, his field of research, and for humankind.

Further Reading and Resources:
- Website: Profile page for Prof. Paul Byrne at the Center for Geospatial Analytics at NCSU - https://bit.ly/3gkP4vD
- Website: Mars 2020 - https://mars.nasa.gov/mars2020/
- Paper: Mars 2020 Science Definition Team Report - https://go.nasa.gov/3x5d6AF
- Video: Perseverance Rover's Descent and Touchdown on Mars - https://bit.ly/32o6248
- Website: Lunar rocks and soils from Apollo missions - https://curator.jsc.nasa.gov/lunar/
- Article: Terrain Relative Navigation - https://go.nasa.gov/2RMd9RZ
- Paper: A General Approach to Terrain Relative Navigation for Planetary Landing - https://bit.ly/3mXCN1z
- Video: Terrain Relative Navigation, NASA JPL - https://bit.ly/2QCcTEB
- Video: Studying Alien Worlds to Understand Earth - https://bit.ly/3tpZ1f3

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Interview date: 25 March 2021
Recording date:

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
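To illustrate the core idea behind TRN, here is a toy sketch: match a noisy descent-camera patch against an onboard orbital map by brute-force sum-of-squared-differences search. Real TRN matches landmark features with robust estimation under hard real-time constraints; this conveys only the intuition, and all the data is synthetic.

```python
import numpy as np

# Toy terrain-relative localisation: find where a camera patch best
# matches a pre-loaded map (synthetic terrain, invented noise level).
rng = np.random.default_rng(1)
orbital_map = rng.normal(size=(80, 80))          # stand-in terrain map

true_row, true_col = 30, 47                      # the lander's true offset
patch = orbital_map[true_row:true_row + 16, true_col:true_col + 16].copy()
patch += 0.1 * rng.normal(size=patch.shape)      # descent-camera noise

best_ssd, best_pos = np.inf, None
for r in range(orbital_map.shape[0] - 16):
    for c in range(orbital_map.shape[1] - 16):
        ssd = np.sum((orbital_map[r:r + 16, c:c + 16] - patch) ** 2)
        if ssd < best_ssd:
            best_ssd, best_pos = ssd, (r, c)

print("true position:", (true_row, true_col), "estimated:", best_pos)
```

Knowing its position relative to mapped hazards is what lets the lander divert autonomously, with no human in the loop.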

Duration:00:58:35

[Bite] How to hire a great Data Scientist

4/5/2021
Welcome to the first DataCafé Bite: a bite-size episode where Jason and Jeremy drop in for a quick chat about a relevant or newsworthy topic from the world of data science. In this episode, we discuss how to hire a great data scientist, a challenge faced by many companies that is not easy to get right. From endless coding tests and weird logic puzzles to personality quizzes and competency-based interviews, there are many examples of how companies try to assess how a candidate handles and reacts to data problems. We share our thoughts and experiences on ways to set yourself up for success in hiring the best person for your team or company. Have you been asked to complete a week-long data science mini-project for a company, or taken part in a data hackathon? We'd love to hear your experiences of good and bad hiring practice around data science. You can email us at jason@datacafe.uk or jeremy@datacafe.uk with your experiences. We'll be sure to revisit this topic as it's such a rich and changing landscape.

Further Reading:
- Article: Guide to hiring data scientists - https://bit.ly/2OjnALi
- Article: Hiring a data scientist: the good, the bad and the ugly! - https://bit.ly/3cMpLR5
- Article: How to Hire - https://bit.ly/3dCLTfO
- Podcast: How to start a startup - https://bit.ly/3sOWxGU
- Video: Adam Grant: Hire for Culture Fit or Add? - https://bit.ly/3cNGWl3

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 1 April 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:14:15

Bayesian Inference: The Foundation of Data Science

3/23/2021
In this episode we talk about all things Bayesian. What is Bayesian inference, and why is it the cornerstone of data science? Bayesian statistics embodies the data scientist and their role in the data modelling process. A data scientist starts with an idea of how to capture a particular phenomenon in a mathematical model, maybe derived from talking to experts in the company. This represents the prior belief about the model. Then the model consumes data around the problem: historical data, real-time data, it doesn't matter. This data is used to update the model, and the result is called the posterior. Why is this data science? Because models that react to data and refine their representation of the world in response to the data they see are what the data scientist is all about. We talk with Dr Joseph Walmswell, Principal Data Scientist at life sciences company Abcam, about his experience with Bayesian modelling.

Further Reading:
- Publication list for Dr. Joseph Walmswell - https://bit.ly/3s8xluH
- Blog: Bayesian inference for parameter estimation - https://bit.ly/2OX46fV
- Book chapter: Bayesian Inference - https://bit.ly/2Pi9Ct9
- Article: The Monty Hall problem - https://bit.ly/3f1pefr
- Podcast: "The truth about obesity and Covid-19" - https://bbc.in/3lBqCGS
- Article: "Understanding lateral flow antigen testing for people without symptoms" - https://bit.ly/313JDs9
- Article: "Households and bubbles of pupils, students and staff of schools, nurseries and colleges: get rapid lateral flow tests" - https://bit.ly/3c5ZXih

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 16 March 2021
Interview date: 26 February 2021

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
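Here is a minimal worked example of the prior-to-posterior loop described above, using the conjugate Beta-Binomial model. The scenario and all numbers are our own invention, not Dr Walmswell's application.

```python
# Prior belief: a domain expert thinks a production line's defect rate
# is around 5%. Beta(2, 38) has mean 2 / (2 + 38) = 0.05.
prior_alpha, prior_beta = 2, 38

# New evidence arrives (invented numbers): 9 defects in 100 inspections.
defects, inspected = 9, 100

# Conjugacy makes the update a one-liner:
# posterior = Beta(alpha + defects, beta + non-defects).
post_alpha = prior_alpha + defects
post_beta = prior_beta + (inspected - defects)

prior_mean = prior_alpha / (prior_alpha + prior_beta)
post_mean = post_alpha / (post_alpha + post_beta)

print(f"prior mean defect rate:     {prior_mean:.3f}")
print(f"posterior mean defect rate: {post_mean:.3f}")
# The posterior blends expert belief with evidence; the more data the
# model sees, the further the estimate is pulled toward the observed 9%.
```

This is the whole Bayesian loop in miniature: prior belief, data, updated belief, ready to be updated again when the next batch arrives.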

Duration:00:42:16

Apple Tasting: Reinforcement learning for quality control

2/22/2021
Have you ever come home from the supermarket to discover that one of the apples you bought is rotten? It's likely your trust in that grocer was diminished, or you might stop buying that particular brand of apples altogether. In this episode, we discuss how the quality controls in a production line need to use smart sampling methods in order to avoid sending bad products to the customer, which could ruin the reputation of both the brand and the seller. To do this we describe a thought experiment called Apple Tasting. This allows us to demonstrate the concepts of regret and reward in a sampling process, giving rise to the use of contextual bandit algorithms. Contextual bandits come from the field of reinforcement learning, a form of machine learning where an agent performs an action and tries to maximise the cumulative reward from its environment over time. Standard bandit algorithms simply choose between a number of actions and measure the reward in order to determine the average reward of each action. But a contextual bandit also uses information from its environment to inform both the likely reward and regret of subsequent actions. This is particularly useful in personalised product recommendation engines, where the bandit algorithm is given some contextual information about the user. Back to Apple Tasting and product quality control. The contextual bandit in this scenario consumes a signal from a benign test that is indicative, but not conclusive, of there being a fault, and then makes the decision whether or not to perform a more in-depth test. So the answer for when you should discard or test your product depends on the relative costs of making the right decision (reward) or wrong decision (regret), and how your experience of the environment affected these in the past. We speak with Prof. David Leslie about how this logic can be applied to any manufacturing pipeline where there is a downside risk in not quality-checking the product but a cost in a false positive detection of a bad product; the same logic extends to other areas of application too.

Interview guest: David Leslie, Professor of Statistical Learning in the Department of Mathematics and Statistics at Lancaster University.

Further Reading:
- Publication list for Prof. David Leslie - http://bitly.ws/bQ4a
- Paper: "Selecting Multiple Web Adverts - a Contextual Multi-armed Bandit with State Uncertainty" - http://bitly.ws/bQ3X
- Paper: "Apple tasting" - http://bitly.ws/bQeW
- Paper: "AutoML for Contextual Bandits" - https://arxiv.org/abs/1909.03212

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
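For a hands-on flavour, here is a toy simulation of the apple-tasting trade-off. It is our simplification, not Prof. Leslie's formulation: an epsilon-greedy agent sees a benign test score as its context and learns, per score bucket, whether the expected cost of shipping untested outweighs the cost of an in-depth test. All costs and fault rates are invented.

```python
import random

random.seed(0)

COST_TEST = 1.0          # every in-depth test costs this (assumed)
COST_BAD_SHIPPED = 20.0  # reputational cost of a rotten apple (assumed)

def world() -> tuple[float, bool]:
    """Return (benign test score, truly faulty?). Higher = more suspect."""
    faulty = random.random() < 0.1
    score = random.gauss(0.7 if faulty else 0.3, 0.15)
    return score, faulty

BUCKETS = 10  # discretise the context into suspicion buckets
values = {(b, a): 0.0 for b in range(BUCKETS) for a in ("ship", "test")}
counts = {k: 0 for k in values}

def bucket(score: float) -> int:
    return min(BUCKETS - 1, max(0, int(score * BUCKETS)))

for step in range(20000):
    score, faulty = world()
    b = bucket(score)
    if random.random() < 0.1:                            # explore
        action = random.choice(("ship", "test"))
    else:                                                # exploit: least cost
        action = min(("ship", "test"), key=lambda a: values[(b, a)])
    cost = COST_TEST if action == "test" else (COST_BAD_SHIPPED if faulty else 0.0)
    counts[(b, action)] += 1                             # incremental average
    values[(b, action)] += (cost - values[(b, action)]) / counts[(b, action)]

ship_rate = [counts[(b, "ship")] / max(1, counts[(b, "ship")] + counts[(b, "test")])
             for b in range(BUCKETS)]
print("P(ship) by suspicion bucket:", [f"{p:.2f}" for p in ship_rate])
```

The learned policy should ship freely when the benign signal is reassuring and pay for the in-depth test as suspicion rises, exactly the context-dependent decision rule the episode describes.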

Duration:00:35:26

Optimising the Future

1/4/2021
As we look ahead to a new year, and reflect on the last, we consider how data science can be used to optimise the future. But to what degree can we trust past experiences and observations, essentially relying on historical data to predict the future? And with what level of accuracy? In this episode of the DataCafé we ask: how can we optimise our predictions of future scenarios to maximise the benefit we can obtain from them while minimising the risk of unknowns? Data science is made up of many diverse technical disciplines that can help to answer these questions, two among them being mathematical optimisation and machine learning. We explore how these two fascinating areas interact and how each can turbo-charge the other's cutting edge in the future. We speak with Dr. Dimitrios Letsios from King's College London about his work in optimisation and what he sees as exciting new developments in the field through working together with machine learning.

Interview guest: Dr. Dimitrios Letsios, lecturer (assistant professor) in the Department of Informatics at King's College London and a member of the Algorithms and Data Analysis Group.

Further reading:
- Dimitrios Letsios' publication list - https://bit.ly/35vHirH
- Paper on accounting for uncertainty in an optimisation model: Approximating Bounded Job Start Scheduling with Application in Royal Mail Deliveries under Uncertainty - https://bit.ly/3pLHICV
- Paper on lexicographic optimisation: Exact Lexicographic Scheduling and Approximate Rescheduling - https://bit.ly/3rS8Xxk
- Paper on combining AI and optimisation: Argumentation for Explainable Scheduling - https://bit.ly/3oobgGF

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 23 October 2020
Interview date: 21 February 2020
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
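As a small illustration of how historical data can feed an optimisation model, here is a sketch using SciPy's linear programming solver. The scenario and every number are invented: a naive forecast from past demand becomes a constraint in a production plan.

```python
from scipy.optimize import linprog

# Step 1 (the "machine learning", in miniature): forecast tomorrow's
# demand from past observations with a simple average (invented data).
past_demand = [42, 45, 39, 48, 44]
forecast = sum(past_demand) / len(past_demand)

# Step 2 (the optimisation): maximise profit 3*x1 + 5*x2, subject to
#   x1 + 2*x2 <= 60        (machine hours available, assumed)
#   x1 +   x2 <= forecast  (don't produce more than predicted demand)
# linprog minimises, so we negate the objective.
res = linprog(c=[-3, -5],
              A_ub=[[1, 2], [1, 1]],
              b_ub=[60, forecast],
              bounds=[(0, None), (0, None)])

print("optimal production plan:", res.x)
print("expected profit:", -res.fun)
```

The weak link is visible right in the code: the optimiser treats the forecast as fact, which is exactly why combining optimisation with honest uncertainty estimates is such an active research direction.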

Duration:00:35:54

US Election Special

11/1/2020
What exciting data science problems emerge when you try to forecast an election? Many, it turns out! We're very excited to turn our DataCafé lens on the current presidential race in the US as an exemplar of statistical modelling right now. Typically, state election polls ask around 1,000 people, in a state of maybe 12 million, how they will vote (or even whether they have voted already) and return a predictive result with an estimated polling error of about 4%. In this episode, we look at polling as a data science activity and discuss how issues of sampling bias can have dramatic impacts on the outcome of a given poll. Elections are a fantastic use case for Bayesian modelling, where pollsters have to tackle questions like "What's the probability that a voter in Florida will vote for President Trump, given that they are white, over 60 and college educated?". There are many such questions, as each electorate feature (gender, age, race, education, and so on) potentially adds another multiplicative factor to the size of the demographic sample needed to get a meaningful result out of an election poll. Finally, we even hazard a quick piece of psephological analysis ourselves and show how some naive Bayes techniques can at least get a foot in the door of these complex forecasting problems. (Caveat: correlation is still very important and can be a source of error if not treated appropriately!)

Further reading:
- Article: Ensemble Learning to Improve Machine Learning Results - https://bit.ly/34MW3HO
- Paper: https://bit.ly/3efx5nm
- Interactive map: Explore The Ways Trump Or Biden Could Win The Election - https://53eig.ht/2TIlAvh
- Podcast: 538 Politics Podcast - https://53eig.ht/2HSkwCA
- Updated US polling map: Consensus Forecast Electoral Map - https://bit.ly/2HY1FWk

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 30 October 2020
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
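Here is a minimal naive Bayes sketch of the kind of conditional question quoted above, on a tiny invented poll. Real pollsters would need demographic weighting, smoothing and far more data, and, as the caveat in the episode says, naive Bayes deliberately ignores correlations between features.

```python
# Each respondent: (age_band, education, vote) -- entirely invented data.
poll = [
    ("over60", "college", "A"), ("over60", "no_college", "B"),
    ("under60", "college", "A"), ("under60", "college", "A"),
    ("over60", "college", "B"), ("under60", "no_college", "B"),
    ("over60", "no_college", "B"), ("under60", "college", "A"),
]

def naive_bayes(age: str, edu: str) -> dict[str, float]:
    """P(candidate | age, edu), assuming features independent given vote."""
    scores = {}
    for cand in ("A", "B"):
        group = [r for r in poll if r[2] == cand]
        prior = len(group) / len(poll)
        p_age = sum(r[0] == age for r in group) / len(group)
        p_edu = sum(r[1] == edu for r in group) / len(group)
        scores[cand] = prior * p_age * p_edu   # in practice, add smoothing
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(naive_bayes("over60", "college"))
```

The independence assumption is what defuses the multiplicative sample-size explosion described above: each feature's likelihood is estimated separately, at the price of ignoring how features co-vary.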

Duration:00:31:54

Forecasting Solar Radiation Storms

10/19/2020
What are solar storms? How are they caused? And how can we use data science to forecast them? In this episode of the DataCafé we talk about the Sun, how it drives space weather, and the efforts to forecast solar radiation storms, which can have a massive impact here on Earth. On a regular day, the Sun emits a constant stream of charged particles, or plasma, from its surface into the solar system, known as the solar wind. But in times of high activity it can undergo much more explosive phenomena, two of these being solar flares and coronal mass ejections (CMEs). These eruptions on the Sun launch energetic particles into space in the form of plasma and magnetic field that can reach us here on Earth and cause radiation storms and/or geomagnetic storms. These storms can degrade satellites, affect telecommunications and power grids, and disrupt space exploration and aviation. Although we can be glad the strongest events are rare, this rarity makes them hard to predict, because of the difficulties in observing, studying and classifying them. So the challenge becomes: how can we forecast them? To answer this we speak to Dr. Hazel Bain, a research scientist specialising in the development of tools for operational space weather forecasting. She tells us about her efforts to bring together physics-based models with machine learning in order to improve solar storm forecasts and provide alerts to customers in industries like aviation, agriculture and space exploration.

With special guest Dr. Hazel M. Bain, Research Scientist at the Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado Boulder and NOAA's Space Weather Prediction Center (SWPC).

Further reading:
- Online presentation: Solar Radiation Storms - https://bit.ly/3k8WuBc
- Article: NASA Space Weather - https://go.nasa.gov/2T3v5VG
- Algorithm: AdaBoost - https://bit.ly/35bkfSU
- Press release: New Space Weather Advisories Serve Aviation - https://bit.ly/3dyqDHI
- Paper: Shock Connectivity in the 2010 August and 2012 July Solar Energetic Particle Events Inferred from Observations and ENLIL Modeling - https://bit.ly/2IEtGTs
- Paper: Diagnostics of Space Weather Drivers Enabled by Radio Observations - https://arxiv.org/abs/1904.05817
- Paper: Bridging EUV and White-Light Observations to Inspect the Initiation Phase of a "Two-Stage" Solar Eruptive Event - https://arxiv.org/abs/1406.4919

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
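Since the episode's resources mention AdaBoost, here is a toy sketch of the machine learning side of such a forecast: an AdaBoost classifier trained to flag storms from synthetic stand-ins for flare magnitude and CME speed. The data, features and storm threshold are entirely invented and bear no relation to the operational models discussed.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
flare_mag = rng.gamma(2.0, 1.0, n)      # proxy for X-ray flare class (invented)
cme_speed = rng.normal(600, 250, n)     # km/s, proxy for CME speed (invented)

# Invented ground truth: fast CMEs from big flares tend to cause storms.
risk = 0.002 * cme_speed + 0.5 * flare_mag + rng.normal(0, 0.8, n)
storm = (risk > 2.5).astype(int)

X = np.column_stack([flare_mag, cme_speed])
X_tr, X_te, y_tr, y_te = train_test_split(X, storm, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy on synthetic storms: {clf.score(X_te, y_te):.2f}")
```

In operational forecasting the hard part is everything this sketch hides: rare events, noisy observations, and the physics-based models that supply meaningful features in the first place.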

Duration:00:40:03

Entrepreneurship in Data Science

9/19/2020
How do you get your latest and greatest data science tool to make an impact? How can you avoid wasting time building a supposedly great data product only to see it fall flat on launch? In this episode, we discuss why you need to start with the idea before you get to a data product. As all good entrepreneurs know, if you can't sell the idea, you're certainly not going to be able to sell the product. We take inspiration from a particular way of thinking about software engineering called Lean Startup, and learn how it can be applied to data science projects and to startups in general. We are lucky enough to talk with Freddie Odukomaiya, CTO of a startup that is aiming to revolutionise commercial property decision-making. He tells us about his entrepreneurial journey creating an innovative data tech company, and we learn how Lean Startup has influenced the way he has approached developing his business.

Interview guest: Freddie Odukomaiya, CTO and Co-founder of GeoHood.

Further reading:
- Article: The Lean Startup Methodology by Eric Ries
- Article: Data science and entrepreneurship: Business models for data science
- Article: A Lean Start-up Approach to Data Science by Ben Dias
- Podcast: Linear Digressions with Katie and Ben

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 11 September 2020
Interview date: 16 June 2020

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

Duration:00:50:45