Launch a Fully Managed RAG Workflow With MongoDB Atlas and Amazon Bedrock

Babu Srinivasan, Igor Alekseev, Erik Onnen • 6 min read • Published May 02, 2024 • Updated May 08, 2024

Introduction

MongoDB Atlas is now natively integrated with Amazon Bedrock Knowledge Base, making it even easier to build generative AI applications backed by enterprise data.

Amazon Bedrock, the managed cloud service for generative AI from Amazon Web Services (AWS), lets developers build applications on top of powerful foundation models like Anthropic's Claude, Cohere Embed, and Amazon Titan. By integrating with Atlas Vector Search, Amazon Bedrock lets customers use the vector database capabilities of Atlas to bring up-to-date context from proprietary data to foundation model outputs.

With the click of a button, Amazon Bedrock now integrates MongoDB Atlas as a vector database into its fully managed, end-to-end retrieval-augmented generation (RAG) workflow, removing the need to build custom integrations to data sources or manage data flows.

Companies using MongoDB Atlas and Amazon Bedrock can now rapidly deploy and scale generative AI apps grounded in up-to-date, accurate enterprise data. For enterprises with the most demanding privacy requirements, this capability is also available via AWS PrivateLink (more details at the bottom of this article).

What is retrieval-augmented generation?

One of the biggest challenges when working with generative AI is avoiding hallucinations, or erroneous results returned by the foundation model (FM) being used. FMs are trained on public information that quickly becomes outdated, and they cannot take advantage of the proprietary information that enterprises possess.

One way to tackle hallucinating FMs is to supplement a query with your own data using a workflow known as retrieval-augmented generation, or RAG. In a RAG workflow, the FM seeks specific data (for instance, a customer's previous purchase history) from a designated database that acts as a “source of truth” to augment the results it returns. For the FM to search for and locate that data and use it to augment its responses, the relevant data needs to be turned into vectors and stored in a vector database.
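To make the retrieval step concrete, here is a minimal sketch in plain Python. It is not how Bedrock or Atlas implement retrieval: the three-dimensional vectors, the `STORE` contents, and all function names are hypothetical stand-ins for real embeddings and a real vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2):
    """Return the text of the k stored chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, context_chunks):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
STORE = [
    {"text": "Customer 42 bought a laptop in March.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Our return window is 30 days.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Customer 42 bought a mouse in April.", "vector": [0.8, 0.3, 0.1]},
]
```

In a real workflow, the query vector would come from the same embedding model used at ingestion time, and the augmented prompt would be sent to the generative model.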

How does the Knowledge Base integration work?

Within Amazon Bedrock, developers can now “click to add” MongoDB Atlas as the vector data store for a knowledge base that powers RAG.

In the workflow, a customer chooses two different models: an embedding model and a generative model. These models are then orchestrated and used by Bedrock Agents during the interaction with the knowledge base — in this case, MongoDB Atlas.

Bedrock reads your text data from an S3 bucket, chunks the data, and then uses the embedding model chosen by the user to create the vector embeddings, storing these text chunks, embeddings, and related metadata in MongoDB Atlas’ vector database. An Atlas vector search index is also created as part of the setup for querying the vector embeddings.
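The ingestion flow described above can be sketched roughly as follows. This is an illustration only: Bedrock performs chunking and embedding for you, its chunking strategy is not the character-based one shown here, and the exact document shape it writes may differ. `fake_embed` is a toy stand-in for an embedding model, and the field names are the ones used in the index definition later in this article.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def fake_embed(text, dimensions=8):
    """Toy stand-in for an embedding model. NOT a real embedding."""
    vector = [0.0] * dimensions
    for index, char in enumerate(text):
        vector[index % dimensions] += ord(char) / 1000.0
    return vector


def to_documents(chunks, embed, source):
    """Pair each chunk with its embedding and some metadata."""
    return [
        {
            "bedrock_text_chunk": chunk,
            "bedrock_embedding": embed(chunk),
            "bedrock_metadata": {"source": source, "chunk_index": index},
        }
        for index, chunk in enumerate(chunks)
    ]
```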

Why choose MongoDB Atlas as a Bedrock knowledge base?

MongoDB Atlas combines operational data, vector data, and metadata in a single platform, making it an ideal knowledge base for Amazon Bedrock users who want to augment their generative AI experiences while also simplifying their generative AI stack.

In addition, MongoDB Atlas gives developers the ability to set up dedicated infrastructure for search and vector search workloads, optimizing compute resources to scale search and database independently.

Solution architecture

In this architecture, documents are uploaded to the Amazon Bedrock Knowledge Base (via S3) and stored within the MongoDB Atlas vector store. User queries are then addressed through specialized Amazon Bedrock Agents tailored to individual use cases, using the MongoDB Atlas vector search functionality.
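Bedrock issues the vector queries on your behalf, but it can help to see what a vector search against Atlas looks like. The sketch below builds a `$vectorSearch` aggregation pipeline. The index name `vector_index` is a hypothetical placeholder, and the field names assume the index definition used later in this article.

```python
def build_vector_search_pipeline(
    query_vector,
    index_name="vector_index",  # hypothetical index name
    path="bedrock_embedding",
    limit=5,
    num_candidates=100,
):
    """Build an aggregation pipeline that runs an Atlas vector search."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return only the text chunk and its similarity score.
            "$project": {
                "_id": 0,
                "bedrock_text_chunk": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the pipeline would be passed to `collection.aggregate(pipeline)`. The query vector must come from the same embedding model that produced the stored embeddings.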

Dataset

In this demo, we use the Best Practices Guide for MongoDB to populate our knowledge base. Please download the PDF (by clicking on “Read Whitepaper” or “Email me the PDF”). Alternatively, you can download it from the GitHub repository. Once you have the PDF, upload it into an S3 bucket for hosting. (Note the bucket name as we will use it later in the article.)
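If you prefer to script the upload, a minimal sketch with boto3 might look like this. The bucket name, key prefix, and function names are placeholders, and the code assumes AWS credentials are already configured in your environment.

```python
from pathlib import Path


def s3_object_key(local_path, prefix=""):
    """Derive the S3 object key from the local file name and an optional prefix."""
    name = Path(local_path).name
    cleaned = prefix.strip("/")
    return f"{cleaned}/{name}" if cleaned else name


def upload_pdf(local_path, bucket, prefix=""):
    """Upload the PDF to S3 and return its s3:// URI."""
    import boto3  # imported lazily so s3_object_key works without boto3 installed

    key = s3_object_key(local_path, prefix)
    boto3.client("s3").upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"
```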

Prerequisites

MongoDB Atlas account
AWS account

Implementation steps

Atlas Cluster and Database Setup

Log in or sign up to MongoDB Atlas.
Set up the MongoDB Atlas cluster with an M10 or greater configuration. Note that an M0 (free) cluster does not support this setup.
Set up the database user and network access.
Copy the connection string.
Create a database and collection.
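The last step can also be done from code. The following is a sketch using PyMongo, assuming the connection string copied above. The database and collection names you pass in are your own choice, and the function names are illustrative.

```python
def connect(connection_string):
    """Open a client using the Atlas connection string copied earlier."""
    from pymongo import MongoClient  # imported lazily; requires pymongo

    return MongoClient(connection_string)


def ensure_collection(client, database_name, collection_name):
    """Create the collection if it does not exist yet and return a handle to it."""
    database = client[database_name]
    if collection_name not in database.list_collection_names():
        database.create_collection(collection_name)
    return database[collection_name]
```

MongoDB creates databases lazily, so creating the collection explicitly is what makes the database appear in the Atlas console.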

Atlas Vector Search index

Before we create an Amazon Bedrock knowledge base (using MongoDB Atlas), we need to create an Atlas Vector Search index.

In the MongoDB Atlas Console, navigate to your cluster and select the Atlas Search tab.

Select Create Search Index, select Atlas Vector Search, and select Next.

Select the database and the collection where the embeddings are stored.

Supply the following JSON in the index definition and click Next, confirming and creating the index on the next page.
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "bedrock_embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "bedrock_metadata",
      "type": "filter"
    },
    {
      "path": "bedrock_text_chunk",
      "type": "filter"
    }
  ]
}

Note: The fields in the JSON are customizable but should match the fields we configure in the Amazon Bedrock AWS console. If your source content contains filter metadata, the fields need to be included in the JSON array above in the same format: {"path": "<attribute_name>","type":"filter"}.
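As an alternative to the console, the same definition can be generated and submitted from code. The sketch below builds the definition and appends extra filter fields in the format described in the note. `create_vector_index` assumes a recent PyMongo release (4.7 or later) in which `SearchIndexModel` accepts a `type` argument, and the index name `vector_index` is a hypothetical placeholder.

```python
def build_index_definition(filter_fields=(), num_dimensions=1536):
    """Build the Atlas Vector Search index definition used in this tutorial."""
    fields = [
        {
            "type": "vector",
            "path": "bedrock_embedding",
            "numDimensions": num_dimensions,
            "similarity": "cosine",
        },
        {"type": "filter", "path": "bedrock_metadata"},
        {"type": "filter", "path": "bedrock_text_chunk"},
    ]
    # Any filter metadata in the source content needs its own filter entry.
    fields.extend({"type": "filter", "path": name} for name in filter_fields)
    return {"fields": fields}


def create_vector_index(collection, definition, name="vector_index"):
    """Create the index on a PyMongo collection (assumes pymongo >= 4.7)."""
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(definition=definition, name=name, type="vectorSearch")
    return collection.create_search_index(model)
```

The `numDimensions` value must match the output size of the embedding model you select in Bedrock.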

Amazon Bedrock Knowledge Base

In the AWS console, navigate to Amazon Bedrock, and then click Get started.

Next, click on Model Access.

Ensure that the Amazon and Anthropic models are selected.

Next, navigate to Knowledge Bases in the left-hand menu and select Create Knowledge Base.

Give your knowledge base a name and select Next. (Add an optional description, if you’d like.)

Supply the S3 bucket where you uploaded the PDF of “Best Practices Guide for MongoDB” from earlier and select Next.

Next, select Titan Embeddings Model.

Scroll down to configure MongoDB Atlas as the vector database which was set up earlier.

Scroll down to fill out the MongoDB configuration options.
- The configuration steps below assume connectivity to MongoDB Atlas over the Internet, which is recommended only for non-production use cases.
- To configure connectivity over a secured PrivateLink (PL) connection, follow the additional steps of the CDK script (detailed in the ReadMe) to configure the Endpoint Service.
- To supply the secret ARN, create a secret in this format: {"username":"xxxx","password":"xxx"}.
  Note: As a recommended security practice, the credentials should NOT have Atlas Admin privileges. They should be no more permissive than the “Read and write to any database” permission.
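The secret can also be created programmatically. Here is a minimal sketch with boto3 and AWS Secrets Manager. The secret name is a placeholder you choose, and the code assumes AWS credentials are configured.

```python
import json


def build_secret_string(username, password):
    """Serialize the Atlas credentials in the format Bedrock expects."""
    return json.dumps({"username": username, "password": password})


def create_atlas_secret(secret_name, username, password):
    """Store the credentials in AWS Secrets Manager and return the secret ARN."""
    import boto3  # imported lazily so build_secret_string works without boto3

    client = boto3.client("secretsmanager")
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(username, password),
    )
    return response["ARN"]
```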

Also enter the metadata field mappings, matching the JSON index definition you set up earlier in Atlas via the JSON editor.

Next, review and create the knowledge base.
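For readers who want to automate this step, the console choices map roughly onto a `create_knowledge_base` request in the boto3 `bedrock-agent` client. The sketch below reflects our understanding of that API; verify the parameter names against the current boto3 reference before relying on it. The IAM role ARN is an assumption (the console creates a role for you), and every argument value is a placeholder.

```python
def build_knowledge_base_request(
    name,
    role_arn,
    embedding_model_arn,
    atlas_endpoint,
    database_name,
    collection_name,
    vector_index_name,
    credentials_secret_arn,
):
    """Assemble the request for a knowledge base backed by MongoDB Atlas."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "MONGO_DB_ATLAS",
            "mongoDbAtlasConfiguration": {
                "endpoint": atlas_endpoint,
                "databaseName": database_name,
                "collectionName": collection_name,
                "vectorIndexName": vector_index_name,
                "credentialsSecretArn": credentials_secret_arn,
                # These must match the paths in the Atlas index definition.
                "fieldMapping": {
                    "vectorField": "bedrock_embedding",
                    "textField": "bedrock_text_chunk",
                    "metadataField": "bedrock_metadata",
                },
            },
        },
    }


def create_knowledge_base(request):
    """Submit the request and return the new knowledge base ID."""
    import boto3  # imported lazily; requires AWS credentials

    client = boto3.client("bedrock-agent")
    response = client.create_knowledge_base(**request)
    return response["knowledgeBase"]["knowledgeBaseId"]
```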

Once the creation is complete, navigate to the Data Source and click the Sync button to sync the data source.
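The Sync button corresponds to an ingestion job. A sketch of starting one and polling until it finishes is shown below. It assumes `client` is a boto3 `bedrock-agent` client, that you already know the knowledge base and data source IDs, and that the terminal statuses are the ones listed in the code. The polling interval and limit are arbitrary choices.

```python
import time

TERMINAL_STATUSES = ("COMPLETE", "FAILED", "STOPPED")


def sync_data_source(client, knowledge_base_id, data_source_id,
                     poll_seconds=10, max_polls=60):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    job_id = job["ingestionJobId"]
    status = job["status"]

    polls = 0
    while status not in TERMINAL_STATUSES and polls < max_polls:
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        status = job["status"]
        polls += 1
    return status
```

A return value other than `COMPLETE` means the sync failed, was stopped, or did not finish within the polling limit.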

When the sync completes, you can navigate to your database collection in the MongoDB Atlas console. Note the vector size matches the embeddings model vector size.
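The vector size can also be checked from code. This is a small sketch assuming PyMongo, the `bedrock_embedding` field name from the index definition, and an expected size of 1536 to match the `numDimensions` value used earlier.

```python
def embedding_dimension(document, field="bedrock_embedding"):
    """Return the length of the embedding stored in a document."""
    vector = document.get(field)
    if not isinstance(vector, list):
        raise ValueError(f"document has no list-valued '{field}' field")
    return len(vector)


def check_collection(collection, expected=1536, field="bedrock_embedding"):
    """Fetch one ingested document and confirm its embedding has the expected size."""
    document = collection.find_one({field: {"$exists": True}})
    if document is None:
        raise LookupError("no ingested documents found; has the sync completed?")
    return embedding_dimension(document, field) == expected
```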

Amazon Bedrock Agent

Amazon Bedrock Agents orchestrate interactions between foundation models, data sources, software applications, and user conversations. In addition, agents automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions.

In the Amazon Bedrock console, create an Agent.

Provide the agent Name and an optional description.

Select a model and provide the prompt.

For our agent, we will skip the Action Group and configure our knowledge base instead. Select the knowledge base configured earlier, supply instructions for the Agent and select Add.

Next, save the configuration to create the agent.

Once the agent is successfully created, go ahead and test it by asking a question.
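Outside the console, the agent can be queried through the boto3 `bedrock-agent-runtime` client. The sketch below assumes `client` is such a client and that you have an agent ID and alias ID. The response is an event stream, so the helper joins the streamed chunks into one answer. Function names are illustrative.

```python
import uuid


def collect_completion(event_stream):
    """Join the text chunks of an agent response stream into a single string."""
    parts = []
    for event in event_stream:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"])
    # Decode once at the end so multi-byte characters split across chunks survive.
    return b"".join(parts).decode("utf-8")


def ask_agent(client, agent_id, agent_alias_id, question):
    """Send one question to the agent in a fresh session and return its answer."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),
        inputText=question,
    )
    return collect_completion(response["completion"])
```

Reusing the same `sessionId` across calls lets the agent keep conversational context. This sketch starts a new session for every question.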

Conclusion

This article demonstrated how to establish a knowledge base in Amazon Bedrock, using MongoDB Atlas as the vector database. Once set up, Amazon Bedrock uses your MongoDB Atlas knowledge base for data ingestion, and you can then create an Agent capable of responding to inquiries based on your accurate, proprietary data.
