
Title Page
1/30/2025
Copyright Page
Dedication Page
About the Authors
About the Reviewers
Acknowledgements
Preface
Table of Contents
1. Understanding Data Engineering
Introduction
Structure
Objectives
Data engineering’s role in modern data systems
Core concepts of data engineering
Data processing and ingestion
Data storage and serving
Data orchestration and governance
Lifecycle of data
Conclusion
Questions
2. Data Engineering Patterns, Terminologies, and Technical Stack
Understanding data engineering patterns
Importance of data engineering patterns
Examples of data engineering patterns
Real-time data ingestion
Caching
Effective use of patterns
Data processing and ingestion patterns
Batch ingestion and processing
Real-time ingestion and processing
Micro-batching
Lambda architecture
ETL and ELT
Data storage and processing patterns
Databases and transactional data
Data warehouse for data analytics
Data lake and medallion architecture
Data replication and partitioning
Hot vs. cold storage
Data caching and low-latency serving
Data search patterns
Domain specific patterns
Miscellaneous patterns
Data security patterns
Data observability and monitoring patterns
Idempotency and deduplication patterns
Data orchestration patterns
3. Batch Ingestion and Processing
Use cases for batch systems
ETL pipelines for a data warehouse
Data archival pipelines
Building precomputed aggregates for BI
Training ML models
Designing batch processing and ingestion system
Technologies for batch systems
Real-world examples
Batch processing in banking
Batch processing in retail media networks
4. Real-time Ingestion and Processing
Use cases for real-time systems
Pipelines for real-time analytics
Change data capture for high availability
Real-time ML scoring
Designing a real-time system
Technologies for real-time systems
Payment fraud detection
Gaming
5. Micro-batching
Use cases for micro-batching
Data ingestion into data lake
Near real-time data analysis
Data quality validations
Designing micro-batching system
Technologies for micro-batching systems
Vehicle tracking in logistics
IoT
6. Lambda Architecture
Use cases for Lambda architecture pattern
Machine learning model creation and scoring
Real-time data analysis with historical bias
Designing system with a Lambda pattern
Speed layer
Batch layer
Serving layer
Technologies for Lambda systems
Fintech
Kafka setup instructions
7. ETL and ELT
Use cases for ETL and ELT patterns
ETL in data warehousing
ELT in clickstream analysis
Designing ETL and ELT system
Forward population using ELT
Backward population using ETL
Technologies for ETL and ELT systems
Banking
8. Data Fundamentals
E-commerce application example
Overview of data modeling
Structured data and tabular data representation
Semi-structured data and JSON data format
JSON data format
Structured vs. semi-structured data model
Unstructured data and binary data format
Transactional and analytical data
Exercises
9. Databases and Transactional Data
Understanding relational databases
Introduction to distributed databases
Database views
Primary and secondary indexes
Primary indexes
Secondary indexes
Importance of index key order in secondary indexes
ACID transactions in traditional RDBMS
Transactions in distributed databases
Durability in MongoDB
Write to majority with journaling enabled
Write to all replica sets
Eventual consistency in DynamoDB
10. Data Warehouse and Data Analytics
Data analytics and business intelligence
Data warehouse
Differences between database and data warehouse
Types of data workload
Data serving latency
Recent data vs. historical data
Raw data vs. filtered and processed data
Data storage format
Database vs. data warehouse
Features of data warehouse
Materialized views
Refreshing materialized views
Database views vs. materialized views
Columnar storage format
Example of row-oriented and columnar storage formats
Star schema and Snowflake schema
Choice between star and Snowflake schemas
11. Data Lake and Medallion Architecture
Travel aggregator example
Differences between data warehouse and data lake
Data lake architecture
Organizing data in data lake
Use of extract-load-transform pattern
Medallion architecture
Transforming data from bronze layer
Transforming data from silver layer
Putting it all together
Benefits of medallion architecture
Separation of concerns
Reusability of data pipeline
Importance of bronze layer in medallion architecture
12. Data Replication and Partitioning
Faults and fault tolerance
Basics of data replication
Types of data replication
Configuring more than one replica
Reading the data from the replicas
Cross datacenter replication
Bi-directional XDCR and conflict resolution
Data partitioning
Hash partitioning
Range partitioning
Other popular partitioning schemes
Scatter and gather operations
13. Hot Versus Cold Data Storage
Identifying hot, warm, and cold data
Data access frequency
Data recency
Visualizing hot, warm, and cold data segregation
Introduction to data caching
Data archival
Defining data lifecycle using AWS S3
Accessing archived data
Comparing storage classes
14. Data Caching and Low Latency Serving
Online movie database
User authentication service
Populating the cache
Using local Memcached for caching
Reading from the cache
Quality of data caching and cache eviction policies
Cache staleness, invalidation, and expiry
Cache invalidation or cleanup
Cache expiry
Caching of pre-processed data
Data prefetching
Caching on laptops and mobile devices
15. Data Search Patterns
Full text search
Benefits of pre-processing
Full text search example
Advanced features of full text search
Vector search
Introduction to vector
Vector similarity search
Vector databases and vector indexes
Using vector database
Vector search example
Quality of vector search results