Machine Learning with Python
Alexander Cane
Ways to combat overfitting depend on the algorithm and come down to choosing the right values for the trainer's hyperparameters. In practice, a model is not evaluated on the same input data that was used to train it. Instead, 10-20% of all available data is set aside as a separate evaluation set, another 10-20% goes into a validation set, and the remaining 60-80% is given to the trainer. How the data should be split depends on the data and the task. Random sampling is often a good method if the inputs are independent of each other and there is no strong imbalance between the number of positive and negative entries.
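A minimal sketch of such a 60/20/20 split in Python, assuming scikit-learn (the synthetic dataset is only a stand-in for real data; the book itself may split differently):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute your own features and labels.
X, y = make_classification(n_samples=1000, random_state=42)

# First carve off 20% of the data as the evaluation set; stratify=y keeps
# the positive/negative ratio the same in every subset.
X_rest, X_eval, y_rest, y_eval = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Then take 25% of the remaining 80% (i.e. 20% of the total) for
# validation, leaving 60% of the data for the trainer.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_eval))  # 600 200 200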
The intuitive analogy here is the same as with university studies: the teacher solves some problems with students in class and gives them other, similar problems on the exam. What matters here (both in teaching students and in training models) is that the problems are varied, so that students cannot simply memorize the answers, while those who have mastered the material can repeat the thought process on similar problems and answer correctly.
In machine learning, we split the data into these sets for the same reason: we use the evaluation set to evaluate each model we train, across different approaches, algorithms, and model types, in order to select the best one. That is, for each model we get two accuracy values: accuracy on the training set and accuracy on the evaluation set. It is normal for the former to be somewhat higher than the latter, but not by much. A large gap indicates overfitting.
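To make the comparison concrete, here is a minimal sketch, again assuming scikit-learn; RandomForestClassifier and the synthetic data are illustrative choices, not the book's:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data split into training and evaluation sets.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on the data the model saw versus data it did not.
train_acc = accuracy_score(y_train, model.predict(X_train))
eval_acc = accuracy_score(y_eval, model.predict(X_eval))
print(f"training accuracy:   {train_acc:.3f}")
print(f"evaluation accuracy: {eval_acc:.3f}")
# A small gap is normal; a large one (say 1.00 train vs 0.75 eval)
# indicates overfitting.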
Duration - 3h 23m.
Author - Alexander Cane.
Narrator - Alexander Cane.
Published Date - Friday, 12 January 2024.
Copyright - © 2020 MS Publishing LLC.
Location:
United States
Language:
English
1 Opening Credits
Duration: 00:00:10
2 Introduction
Duration: 00:13:35
3 Chapter 1
Duration: 00:15:56
4 Chapter 2
Duration: 00:16:33
5 Chapter 3
Duration: 00:16:25
6 Chapter 4
Duration: 00:12:15
7 Chapter 5
Duration: 00:20:12
8 Chapter 6
Duration: 00:27:35
9 Chapter 7
Duration: 00:13:49
10 Chapter 8
Duration: 00:06:25
11 Chapter 9
Duration: 00:31:37
12 Chapter 10
Duration: 00:08:06
13 Chapter 11
Duration: 00:12:29
14 Conclusion
Duration: 00:08:27
15 Ending Credits
Duration: 00:00:10