
Title Page
1/1/2025
Copyright Page
Dedication
About the Author
About the Reviewers
Acknowledgement
Preface
Table of Contents
1. Creating Data Repositories for Machine Learning
Introduction
Structure
Objectives
Introduction to data in ML
Identifying data sources
Identifying location of data
Collecting data
File formats for ML
Types of data involved
Analyzing data characteristics
Determining storage mediums
Conclusion
Multiple choice questions
Answer key
2. Implementing Data Ingestion Solutions
Introduction to data ingestion on AWS
Understanding data ingestion
Data ingestion in ML workflows
Overview of AWS services for data ingestion
Data processing type
Batch load vs. streaming
Batch load
Streaming
Choosing between batch load and streaming
Use cases and implications for ML
Services for batch data ingestion
Services for real-time data ingestion
Orchestrating data ingestion pipelines
Principles of data pipeline orchestration
Batch-based ML workloads
Streaming-based ML workloads
Understanding AWS services for data ingestion
Real-time data streaming
Concepts of Kinesis data streams
Creating and using a data stream
Scaling your stream
Simplifying data loading
Concepts of Kinesis Data Firehose
Automating data loading
Processing large datasets
Concepts of Amazon EMR
Scaling and optimization
Serverless data integration
Concepts of AWS Glue
Using AWS Glue for data integration
Leveraging AWS Glue for scalable data integration
Advanced stream processing
Concepts of Apache Flink
Building a stream processing application
Scaling and monitoring your application
Scheduling jobs
Strategies for job scheduling
Tools for job scheduling in AWS
Best practices for job management
3. Transforming Data into Insights
Understanding data transformation needs
Data transformation techniques
Different data transformation techniques
AWS Glue and its role in data transformation
Functioning of AWS Glue Data Catalog
Practical example of using AWS Glue Data Catalog for a data lake
AWS Glue Data Catalog crawlers
AWS Glue best practices
Handling ML-specific data
Data structures for ML
Big data processing frameworks overview
Handling large datasets using SageMaker and EMR
Optimizing data for ML algorithms
Techniques to optimize data
Best practices in data transformation for ML
Impact of data quality on ML model performance
Data transformation in action
4. Data Sanitization and Preparation
Introduction to data understanding
Handling unstructured data on AWS
Descriptive statistics and data exploration
Identifying and handling missing or corrupt data
Identifying missing data
Handling missing data
Identifying corrupt data
Handling corrupt data
Data preprocessing steps
Data formatting
Data normalization
Data augmentation
Data scaling
File formats for ML workflows
Data encryption and security services
Navigating labeled data challenges
5. Feature Engineering
Definition and importance of feature engineering
ML pipeline
Identifying and extracting features from text data
Tokenization
Bag of Words
Word embeddings
N-grams
Part-of-speech tagging
Named entity recognition
Sentiment analysis
Tools and libraries
Identifying and extracting features from speech data
Techniques for feature extraction
Mel-frequency cepstral coefficients
Spectrogram
Pitch and fundamental frequency
Identifying and extracting features from an image
Identifying and extracting features from numerical data
Comparing feature engineering techniques
6. Data Analysis and Visualization
Creating graphs
Scatter plots
Time series plots
Histograms
Box plots
Interpreting descriptive statistics
Correlation
Summary statistics
Calculating the correlation coefficient
P-value
Performing cluster analysis
Hierarchical clustering
Diagnosis of clusters
Elbow plot
Determining cluster size
7. Framing Business Problems as ML Problems
Identifying ML applicability in business scenarios
Supervised vs. unsupervised learning
Supervised learning
Working of supervised learning
Types of supervised learning models
Unsupervised learning
Working of unsupervised learning
Techniques used in unsupervised learning
Hybrid learning
Comparison of supervised and unsupervised learning
8. Selecting Appropriate ML Models
Overview of common ML models
XGBoost
Working of XGBoost
Key features and advantages
Best use cases and practical examples
Disadvantages of XGBoost
Logistic regression
Working of logistic regression
Advantages of logistic regression
Log odds interpretation
Limitations of logistic regression
Suitable applications and examples
Use cases not suitable for logistic regression
Decision trees
Working of decision trees
Disadvantages of decision trees
Random forests
Working of random forests
Disadvantages of random forests
Understanding neural networks
Recurrent neural networks
Disadvantages of RNNs
Convolutional neural networks
Disadvantages of CNNs
Insights into ensemble and transfer learning techniques
Ensemble methods
Disadvantages of ensemble methods
Transfer learning
Disadvantages of transfer learning
Model selection criteria based on data and problem type
AWS tools and services for model implementation
AWS SageMaker
Key features of AWS SageMaker
Best use cases
AWS Deep Learning AMIs
Key features of AWS Deep Learning AMIs
AWS Lambda and other services
Key features of AWS Lambda
Other AWS services for model implementation
9. Training ML Models
Data splitting
Importance of data splitting
Basic approach to training and validation sets
Real-world scenario
Advanced considerations in cross-validation
Implementing k-fold cross-validation
Pitfalls to avoid
Best practices for data splitting
Optimization techniques for ML training
Role of optimization in ML training
Understanding gradient descent as foundation of optimization
Practical application of mini-batch gradient descent
Advanced optimization techniques
Momentum