The Great Continuous Migration: CDC Jobs With Kafka and Relational Migrator

Anaiya Raisinghani12 min read • Published May 02, 2024 • Updated May 02, 2024

AWS Kafka SQL MongoDB

Rate this tutorial

Are you ready to finally move your relational data over to MongoDB while ensuring every change to your database is properly handled? While this process can be jarring, MongoDB’s Relational Migrator is here to help simplify things. In this tutorial, we will go through in-depth how to conduct change data captures from your relational data from MySQL to MongoDB Atlas using Confluent Cloud and Relational Migrator.

What are CDC jobs?

Change data capture or CDC jobs are specific processes that track any and all changes in a database! Even if there is a small update to one row (or 100), a change data capture job will ensure that this change is accurately reflected. This is very important in a world where people want accurate results immediately — data needs to be updated constantly. From basic CRUD (create, read, update, delete) instances to more complex data changes, CDC jobs are incredibly important when dealing with data.

What is MongoDB Relational Migrator?

MongoDB Relational Migrator is our tool to help developers migrate their relational databases to MongoDB. The great part about it is that Relational Migrator will actually help you to write new code or edit existing code to ensure your migration process works as smoothly as possible, as well as automate the conversion process of your database's schema design. This means there’s less complexity and downtime and fewer errors than if tasked with dealing with this manually.

What is Confluent Cloud and why are we using it?

Confluent Cloud is a Kafka service used to handle real-time data streaming. We are using it to deal with streaming real-time changes from our relational database to our MongoDB Atlas cluster. The great thing about Confluent Cloud is it’s simple to set up and integrates seamlessly with a number of other platforms and connectors. Also, you don’t need Kafka to run production migrations as the embedded mode is sufficient for the majority of migrations.

We also recommend that users start off with the embedded version even if they are planning to use Relational Migrator in the future for a quick start since it has all of the same features, except for the additional resilience in long-running jobs.

Kafka can be relatively complex, so it’s best added to your migration job as a specific step to ensure there is limited confusion with the process. We recommend working immediately on your migration plan and schema design and then adding Kafka when planning your production cutover. Let’s get started.

Pre-requisites for success

MongoDB Atlas account
Amazon RDS account
Confluent Cloud account
MongoDB Relational Migrator — this tutorial uses version 1.5.
MySQL
MySQL Workbench — this tutorial uses version 8.0.36. Workbench is so you can visually interact with your MySQL database, so it is optional, but if you’d like to follow the tutorial exactly, please download it onto your machine.

Download MongoDB Relational Migrator

Please make sure you download Relational Migrator on your machine. The version we are using for this tutorial is version 1.5.0. Make sure it works and you can see it in your browser before moving on.

Create your sink database

While our relational database is our source database, where our data ends up is called our sink database. In this tutorial, we want our data and all our changes to end up in MongoDB, so let’s create a MongoDB Atlas cluster to ensure that happens. If you need help creating a cluster, please refer to the documentation. Please keep note of the region you’re creating your cluster in and ensure you are choosing to host your cluster in AWS. Keep your username and password somewhere safe since you’ll need them later on in this tutorial, and please make sure you’ve allowed access from anywhere (0.0.0.0/0) in your “Network Access” tab. If you do not have the proper network access in place, you will not be able to connect to any of the other necessary platforms. Note that “Access from Anywhere” is not recommended for production and is used for this tutorial for ease of reference. Grab your cluster’s connection string and save it in a safe place. We will need it later.

Get your relational database ready

For this tutorial, I created a relational database using MySQL Workbench. The data used is taken from Kaggle in the form of a .csv file, if you want to use the same one: World Happiness Index: 2019. Once your dataset has been properly downloaded into your MySQL database, let’s configure our relational database to our Amazon RDS account. For this tutorial, please make sure you’ve downloaded your .csv file into your MySQL database either by using the terminal commands or by using MySQL Workbench. We’re configuring our relational database to our Amazon RDS account so that instead of hosting our database locally, we can host it in the cloud, and then connect it to Confluent Cloud and ensure any changes to our database are accurately reflected when we eventually sync our data over to MongoDB Atlas.

Create a database in Amazon RDS

As of right now, Confluent Cloud’s Custom Connector only supports Amazon instances, so please ensure you’re using Amazon RDS for your relational databases since other cloud providers will not work at the moment. Since it’s important to keep everything secure, you will need to ensure networking access, with the possibility of requiring AWS Privatelink. Sign in to your Amazon account and head over to “Amazon RDS.” You can find it in the search bar at the top of the screen.

Click on “Databases” on the left-hand side of the screen. If you don’t have a database ready to use (specifically in your Amazon account), please create one by clicking the orange button. You’ll be taken to this page. Please select the MySQL option:

After selecting this, scroll down and change the MySQL version to the version compatible with your version of Workbench. For the tutorial, we are using version 8.0.36. Then, please fill out the Settings area. For your DB cluster identifier, choose a name for your database cluster. Choose a Master username, hit the Self managed credentials toggle, and fill in a password. Please do not forget this username and password, you will need it throughout the tutorial to successfully set up your various connections. For the rest of this database set-up process, you can keep everything default except please press the toggle to ensure the database allows Public Access. This is crucial! Follow the rest of the steps to complete and create your database.

When you see the green “Available” status button, that means your database is ready to go.

Create a parameter group

Now that our database is set up, we need to create a parameter group and modify some things to ensure we can do CDC jobs. We need to make sure this part works in order to successfully handle our CDC jobs. On the left-hand side of your Amazon RDS homepage, you’ll see the “Parameter groups” button. Please press that and create a new parameter group. Under the dropdown “Parameter group family,” please pick mysql8.0 since that is the version we are running for this tutorial. If you’re using something different, please feel free to use a different version. Give the parameter group a name and a description and hit the orange “create” button. Once it’s created, click on the parameter name, hit the “Edit” button, search for binlog_format, and change the “Value” column from “MIXED” to “ROW.” This is important to do because changing this setting allows for recording any database changes at a “row” level. This means each and every little change to your database will be accurately recorded. Without making this change, you won’t be able to properly conduct any CDC jobs. Now, let’s associate our database with this new parameter group. Click on “Databases,” choose the one we just created, and hit “Modify.” Scroll all the way down to “DB Parameter Group.” Click on the drop-down and associate it with the group you just created. As an example, here is mine:

Modify the instance and click “Save.” Once you’re done, go in and “Reboot” your database to ensure these changes are properly saved. Please keep in mind that you’re unable to reboot while the database is being modified and need to wait until it’s in the “Available” state. Head over to the “Connectivity & security” tab in your database and copy your “Endpoint” under where it says “Endpoint & port.” Now, we’re going to connect our Amazon RDS database to our MySQL Workbench!

Connect Amazon RDS to relational database

Launch MySQL Workbench and click the “+” button to establish a new connection. Your endpoint that was copied above will go into your “Hostname.” Keep the port the same. (It should be 3306.) Your username and password are from when you created your cluster. It should look something like this:

Click on “Test Connection” and you should see a successful connection.

If you’re unable to connect when you click on “Test Connection,” go into your Amazon RDS database, click on the VPC security group, click on “Edit inbound rules,” click on “Add rule,” select “All traffic” under “Type,” select “Anywhere-IPv4,” and save it. Try again and it will work.

Now, run a simple SQL command in Workbench to test and see if you can interact with your database and see the logs in Amazon RDS. I’m just running a simple update statement:

Code Snippet

This is just changing the original score of Finland from 7.769 to 7.8.

It’s been successfully changed and if we keep an eye on Amazon RDS, we don’t see any issues.

Now, let’s configure our Confluent Cloud account!

Configure Confluent Cloud account

Our first step is to create a new environment. We can use a free account here as well:

On the cluster page, please choose the “Basic” tier. This tier is free as well. Please make sure you have configured your zones and your region for where you are. These need to match up with both your MongoDB Atlas cluster region and your Amazon RDS database region.

Once your cluster is configured, we need to take note of a number of keys and IDs in order to properly connect to Relational Migrator. We need to take note of the:

Cluster ID.
Environment ID.
Bootstrap server.
REST endpoint.
Cloud API key and secret.
Kafka API key and secret.

You can find most of these from your “Cluster Settings,” and the Environment ID can be found on the right-hand side of your environment page in Confluent.

For Cloud API keys, click on the three lines on the right-hand side of Confluent’s homepage.

Click on “Cloud API keys” and grab the “key” and “secret” if you’ve already created them, or create them if necessary.

For the Kafka API keys, head over to your Cluster Overview, and on the left-hand side, click “API Keys” to create them. Once again, save your “key” and “secret.”

All of this information is crucial since you’re going to need it to insert into your user.properties folder to configure the connection between Confluent Cloud and MongoDB’s Relational Migrator.

As you can see from the documentation linked above, your Cloud API keys will be saved in your user.properties file as:

migrator.confluent.cloud-credentials.api-key
migrator.confluent.cloud-credentials.api-secret

And your Kafka API keys as:

migrator.confluent.kafka-credentials.api-key
migrator.confluent.kafka-credentials.api-secret

Now that we have our Confluent Cloud configured and all our necessary information saved, let’s configure our connection to MongoDB Relational Migrator.

Connect Confluent Cloud to MongoDB Relational Migrator

Prior to this step, please ensure you have successfully downloaded Relational Migrator locally.

We are going to use our terminal to access our user.properties file located inside our Relational Migrator download and edit it accordingly to ensure a smooth connection takes place.

Use the commands to find our file in your terminal window:

Code Snippet

Once you see your user.properties file, open it with:

Code Snippet

Once your file is opened, we need to make some edits. At the very top of the file, uncomment the line that says:

Code Snippet

Be sure to comment out anything else in this section that is uncommented. We only want the Confluent profile active. Immediately under this section, we need to add in all our keys from above. Do it as such:

Code Snippet

There is no need to edit anything else in this file. Just please make sure you’re using the correct server port: 8278.

Once this is properly edited, write it to the file using Ctr + O. Press enter, and exit the file using Ctr + X.

Now, once the file is saved, let’s run MongoDB Relational Migrator.

Running MongoDB Relational Migrator

We can get it up and running straight from our terminal. Use the commands shown below to do so:

Code Snippet

This will open Spring and the Relational Migrator in your browser:

Once Relational Migrator is running in your browser, connect it to your MySQL database:

You want to put in your host name (what we used to connect our Amazon RDS to MySQL Workbench in the beginning), the database with your data in it (mine is called amazonTest but yours will be different), and then your username and password. Hit the “Test connection” button to ensure the connection is successful. You’ll see a green bar at the bottom if it is.

Now, we want to select the tables to use. We are just going to click our database:

Then, define your initial schema. We are just going to start with a recommended MongoDB schema because it’s a little easier to work with.

Once this is done, you’ll see what your relational schema will look like once it’s migrated as documents in MongoDB Atlas!

Now, click on the “Data Migration” tab at the top of the screen. Remember we created a MongoDB cluster at the beginning of this tutorial for our sink data? We need all that connection information.

First, enter in again all your AWS RDS information that we had loaded in earlier. That is our source data, and now we are setting up our destination, or sink, database.

Enter in the MongoDB connection string for your cluster. Please ensure you are putting in the correct username and password.

Then, hit “Test connection” to make sure you can properly connect to your Atlas database.

When you first specify that you want a continuous migration, you will get this message saying you need to generate a script to do so. Click the button and a script will download and then will be placed in your MySQL Workbench. The script looks like this:

Code Snippet

/*
* Relational Migrator needs source database to allow change data capture.
* The following scripts must be executed on MySQL source database before starting migration.
* For more details, please see https://debezium.io/documentation/reference/stable/connectors/mysql.html#setting-up-mysql
*/

/*
* Before initiating migration job, the MySQL user is required to be able to connect to the source database.
* This MySQL user must have appropriate permissions on all databases for which the Relational Migrator is supposed to capture changes.
*
* Connect to Amazon RDS Mysql instance, follow the below link for instructions:
* https://dev.mysql.com/doc/mysql-cluster-excerpt/8.0/en/mysql-cluster-replication-schema.html
*
* Grant the required permissions to the user
*/

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.*  TO 'anaiya'@'%' ;

/* Finalize the user’s permissions: */
FLUSH PRIVILEGES;

/* Furthermore, binary logging must be enabled for MySQL replication on AWS RDS instance. Please see the below for instructions:
* https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/
*
* If the instance is using the default parameter group, you will need to create a new one before you can make any changes.
* For MySQL RDS instances, create a Parameter Group for your chosen MySQL version.
* For Aurora MySQL clusters, create a DB Cluster Parameter Group for your chosen MySQL version.
* Edit the group and set the "binlog_format" parameter to "ROW".
* Make sure your database or cluster is configured to use the new Parameter Group.
*
* Please note that you must reboot the database cluster or instance to apply changes, follow below for instructions:
* https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_RebootCluster.html
*/

Run this script in MySQL Workbench by hitting the lightning button. You’ll know it was successful if you don’t see any error messages in Workbench. You will also see that in Relational Migrator, the “Generate Script” message is gone, telling you that you can now use continuous snapshot.

Start it and it’ll run! Your snapshot stage will finish first, and then your continuous stage will run:

While the continuous snapshot is running, make a change in your database. I am changing the happiness score for Finland from 7.8 to 5.8:

Code Snippet

Once you run your change in MySQL Workbench, click on the “Complete CDC” button in Relational Migrator.

Now, let’s check out our MongoDB Atlas cluster and see if the data is properly loaded with the correct schema and our change has been properly streamed:

As you can see, all your information from your original MySQL database has been migrated to MongoDB Atlas, and you’re capable of streaming in any changes to your database!

Conclusion

In this tutorial, we have successfully migrated your MySQL data and set up continuous data captures to MongoDB Atlas using Confluent Cloud and MongoDB Relational Migrator. This is super important since it means you are able to see real-time changes in your MongoDB Atlas database which mirrors the changes impacting your relational database.

For more information and help, please use the following resources:

Rate this tutorial

Tutorial

Use MongoDB as the Data Store for your Strapi Headless CMS

Sep 23, 2022 | 8 min read

Tutorial

How to Leverage an Event-Driven Architecture with MongoDB and Databricks

Jul 13, 2023 | 9 min read

Tutorial

Adding Real-Time Notifications to Ghost CMS Using MongoDB and Server-Sent Events

Aug 14, 2023 | 7 min read

News & Announcements

MongoDB's New Time Series Collections

May 13, 2022 | 8 min read

What are CDC jobs?
What is MongoDB Relational Migrator?
What is Confluent Cloud and why are we using it?
Pre-requisites for success
Download MongoDB Relational Migrator
Create your sink database
Get your relational database ready
Create a database in Amazon RDS
Connect Amazon RDS to relational database
Configure Confluent Cloud account
Connect Confluent Cloud to MongoDB Relational Migrator
Running MongoDB Relational Migrator
Conclusion

MongoDB

The Great Continuous Migration: CDC Jobs With Kafka and Relational Migrator

What are CDC jobs?

What is MongoDB Relational Migrator?

What is Confluent Cloud and why are we using it?

Pre-requisites for success

Download MongoDB Relational Migrator

Create your sink database

Get your relational database ready

Create a database in Amazon RDS

Create a parameter group

Connect Amazon RDS to relational database

Configure Confluent Cloud account

Connect Confluent Cloud to MongoDB Relational Migrator

Running MongoDB Relational Migrator

Conclusion

Related

Use MongoDB as the Data Store for your Strapi Headless CMS

How to Leverage an Event-Driven Architecture with MongoDB and Databricks

Adding Real-Time Notifications to Ghost CMS Using MongoDB and Server-Sent Events

MongoDB's New Time Series Collections

Table of Contents