Building Multimodal Generative AI and Agentic Applications

Indrajit Kar

This audiobook is narrated by a digital voice.

Description: Generative AI and agentic AI are reshaping how we interact with data, enabling intelligent systems that can reason, generate, and autonomously act across multiple modalities. From text and...

Premium Chapters

Title Page

Copyright Page

Dedication Page

About the Author

About the Reviewers

Acknowledgement

Preface

1. Introducing New Age Generative AI

Introduction

Structure

Objectives

Overview of generative AI

Retrieval system

Sparse retrieval

Dense retrieval

Generation system

Types of generation systems

Autoregressive generation

Prompting strategies

Understanding where generation systems excel

Combining retrieval and generation

Retrieval-augmented generation

RAG working

Architecture of a basic RAG pipeline

Types of RAG architectures

Iterative RAG

Vector databases and RAG

Prompt engineering for RAG

Advanced RAG techniques

Applications of RAG

Orchestration in AI systems

Orchestration in RAG systems

Orchestration in agentic systems

Tokens in AI systems

Vector database

Understanding vector databases

Indexing algorithms in vector databases

Search algorithms in vector databases

Embeddings and embedding models

Importance of vector databases for RAG and agentic systems

Reranking

Bi-encoders vs. cross-encoders

Cross-encoders for reranking

Guardrails

Types of guardrails

Methods of applying guardrails

Without guardrails

Industry examples of guardrail solutions

Agents

Agentic RAG vs. non-agentic RAG

Model Context Protocols

Conclusion

2. Deep Dive into Multimodal Systems

Understanding vision-language models

Categories of vision-language models

Core architectural components of vision-language models

Challenges in vision-language models

Multimodal GenAI system

Multimodal vector embedding

Multimodal vector database

Collections

Points and point IDs

Vectors

Payload

Storage and vector store

Indexing

Implementation comparisons

Single collection, partitioned via payload

Multiple collections with global indexing

Multimodal generative AI systems vs. VLMs

Vision-language models

Multimodal generative AI systems

Using vision-language models

Using multimodal generative AI systems

Real-world example comparison

Output-based classification of multimodal systems

Text-to-image systems

Image-to-text systems

Text and image systems

Text-only to specifications and image systems

Text-to-SQL systems

Text-to-code systems

3. Implementing Unimodal Local GenAI System

GPU in today’s generative AI systems

Using a local GPU

Architectural components

About Ollama

Alternatives to Ollama

Generate a PDF document with Ollama

RAG implementation

Load and chunk the PDF document

Alternative chunking strategies in LangChain

Creating embeddings with metadata

Using them in code

Hybrid search with semantic and keyword

Other retrievers you can use

Conversation memory buffer

LLM configuration natural language generation

ReAct prompt template

Building the conversational QA chain

User chat loop

Challenges in RAG

4. Implementing Unimodal API-based GenAI Systems

Getting started with OpenAI APIs and models

OpenAI as a company

Overview of the OpenAI API

Core API endpoints

Major OpenAI models

Accessing OpenAI models

Choosing the right model

Best practices for beginners

From OpenAI to agentic AI

OpenAI’s agentic API ecosystem

Responses API

Agents SDK

Operator

Codex

Assistants API

Multi-document query

Implementing modular RAG with OpenAI

Main controller

Configuration

Embedding initialization

Vector store setup

Metadata tagging

Document loading and chunking

Hybrid retriever

Enforce metadata-based filtering during retrieval

Language model

Prompt template

RAG chain assembly

Conversational memory

Dependencies

To do

5. Implementing Agentic GenAI Systems with Human-in-the-loop

Architecting agentic GenAI systems

Parallel pattern

Sequential pattern

Loop pattern

Router pattern

Aggregator pattern

Network pattern

Hierarchical pattern

Human-in-the-loop pattern

Shared tools pattern

Database with tools pattern

Memory transformation using tools

Planner-executor pattern

Critic or validator pattern

Negotiator pattern

Multimodal agent pattern

Voting or consensus pattern

Supervisor-subordinate pattern

Watchdog or recovery pattern

Temporal planner pattern

Human-in-the-loop

End-to-end human-in-the-loop RAG workflow

From HITL to multi-agent human-in-the-loop RAG

Agentic AI vs. AI agents

6. Two and Multi-stage GenAI Systems

Concepts of interactions in dense retrievals

No interaction

Full interaction

Late interaction

Multi-vector representations

Differentiation from late interaction architectures

Role of interaction models in two-stage RAG systems

Interaction in the retrieval phase

Reranking with various interaction models

Integration into two-stage RAG architectures

Two-stage RAG architecture

Stage one dense retrievals

Stage-two, reranking for semantic precision

The strategic role of two-stage design

Two-stage RAG vs. late interaction

Capabilities of ColBERT and ColPali

Use of two-stage RAG

Multi-stage RAG

Beyond two-stage systems

Components of multi-stage RAG

Benefits of multi-stage RAG

Types of multi-stage RAG

Grading mechanisms

Challenges and considerations

Token utilization in multi-stage RAG systems

Grading types

Implementation of multi-stage RAG workflow with routing

7. Building a Bidirectional Multimodal Retrieval System

Integration and design implications

Understanding a multimodal retrieval system

Technical architecture

Applications and implications

Code implementation and explanation

Requirement

Frontend

Data directory

The retrieval system

Loaders

Embedding utils
