One challenge often arises when breaking down monolithic applications into microservices: reworking the data layer. One of the common tasks is to migrate the relational database into a more scalable option using NoSQL databases. Apache Cassandra is a popular choice, but there are a number of challenges when migrating the model from SQL database to it.
Is it possible? Yes! So let’s go through the process of migrating Relational Databases into NoSQL store. Maybe you will learn something in the way…
What about the other Challenges?
– Have enough understanding of the old application;
– Simplify the old model to mitigate possible addictions and bad design;
– Trying to think about all possibilities to make it right the first time(you probably won’t!).
Before start…
In order to start, two principles need to be clear:
Query first: Before start modelling tables and columns, one item that is more important is to understand how the data is retrieved. This will make the performance optimization much easier.
Denormalization: In the normal relational modelling you create tables and relate them to foreign keys. This new approach you create tables to support queries and those tables don’t have relationships (no Joins allowed).
Let’s do it piece by piece (like Jack the Ripper)
The entire process could be split into 2 phases. The first phase is a closer look at the monolithic application, in order to identify the application flow and strong entities. This is a crucial phase to understand the flow of the data and to identify what is a key point or what is the result of Bad Design. This will also give you a bigger picture of what needs to be migrated.
There are many ways of mapping old application queries. One suggestion is to run the existing application with DebugMode “ON” and try to identify action points, like Buttons and Screen Initialization(Menu or link), and track those actions into executions(Selects, Inserts, Updates, Deletes) on SQL database. For each action point that results in a Database execution, you should create a record in a separated document (ex. Spreadsheet). This document should contain as much as possible information about the function that you mapped, with description and queries that are related to that function(endpoint, action, description, service name, related entities, data queries).
The second phase uses the output of phase 1 to create the first draft of Application Queries. Those queries will drive the creation of keyspaces, tables and columns in the new model. The suggested approach uses the Methodology presented by Artem Chebotko from DataStack Inc. You can check the complete paper about it here.…in conclusion.
Moving to Cassandra is a complex job, but if you follow this procedure, the migration is a feasible task and easier than you think. One benefit of going through this path is a more scalable solution(at least we hope so!). Just before finishing if I could give some advice about it:
Must do:
- Tables need to be created considering mainly the queries and the columns that need to be filtered. Normally they are partition keys to your tables.
- Test tables performance against data, using automated tools(like JMeter).
Not to do:
- Try to reproduce any SQL features into Cassandra(Joins and index are not recommended!)
- Care too much about data replication on tables. This will happen frequently.
- Try to model right on the first try. The model should be evolved during the whole Development process.