Database Maintenance, Data Migrations, and Model Versioning

Most data-driven applications are backed by at least one database, if not several. Schema changes and data migrations are the driving forces behind the data reconciliation nightmare. This article names the commonly known culprits, proposes some remediations, and points to resources that can help find a solution.

There is a more specific “How to version mongodb models” article that looks at the same problem from a model versioning perspective.

In this article we will talk about:

Even though this blogpost was designed to offer complementary materials to those who bought my Testing nodejs Applications book, the content can help any software developer tune up their working environment. You can use this link to buy the book.

Introduction

The first iteration of an application is always easy to pull off. Problems start to creep in when subsequent iterations bring schema changes, data migrations, or major upgrades. This article names what to anticipate after deploying a database to production, and how to mitigate issues related to running a production database deployment.

The database migration term is seen from the following perspectives, at least in this article:

Database vendors have pre-determined release cycles, which makes it easy to anticipate when a new release will be available and to align application development with the release cycles of supporting software. Risky feature changes can then be scheduled for periods when we don't expect big changes from the database vendor. Such a model should work well within a continuous deployment environment as well.

Strategy

The database provider release cycle suggests that decoupling application changes from database engine changes would make things a little easier, especially in data-driven applications where schema change is inevitable and frequent.

Just as database engine changes have their own upgrade time, model changes should have their own model version update time. With this knowledge, we have two granular scenarios: one regarding underlying software upgrades and another regarding code changes.

For a database engine version upgrade, we can expect the reverse operation, which is a downgrade to a previous version. For model revision updates, we can expect a rollback as the reverse operation.
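
The rollback idea for model revisions can be sketched as follows. This is only an illustration under stated assumptions: the `revisions` list, the `migrate` helper, and the `fullName` and `active` fields are hypothetical names invented for the example, and documents are plain in-memory objects rather than records in a real database.

```javascript
// Hypothetical model revisions; each one knows how to apply and undo itself.
const revisions = [
  {
    version: 1,
    // Revision 1 merges first and last name into a single field.
    up: (doc) => {
      const { firstName, lastName, ...rest } = doc;
      return { ...rest, fullName: `${firstName} ${lastName}` };
    },
    down: (doc) => {
      const { fullName, ...rest } = doc;
      const [firstName, ...others] = fullName.split(' ');
      return { ...rest, firstName, lastName: others.join(' ') };
    },
  },
  {
    version: 2,
    // Revision 2 adds a flag with a default value.
    up: (doc) => ({ ...doc, active: true }),
    down: (doc) => {
      const { active, ...rest } = doc;
      return rest;
    },
  },
];

// Moves a document between model versions: `up` is applied in ascending
// order for an update, `down` in descending order for a rollback.
function migrate(doc, from, to) {
  let current = doc;
  if (to > from) {
    for (const rev of revisions) {
      if (rev.version > from && rev.version <= to) current = rev.up(current);
    }
  } else {
    for (const rev of [...revisions].reverse()) {
      if (rev.version <= from && rev.version > to) current = rev.down(current);
    }
  }
  return current;
}

const original = { firstName: 'Ada', lastName: 'Lovelace' };
const updated = migrate(original, 0, 2);    // model update
const rolledBack = migrate(updated, 2, 0);  // rollback
console.log(updated);
console.log(rolledBack);
```

Keeping a `down` next to every `up` is one possible design choice; it makes the rollback path something that can be tested before it is needed.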

It is better to have upgrades and model updates at separate times.

There are state-driven and migration-driven database migration strategies. The “State vs migration-driven database delivery” article helps to understand more about those two strategies.

Challenges when upgrading database engines

To clear things up, we will have to answer the following question:

When is the right time to migrate the database? When is the right time to switch the application to use the new version of the database?

When upgrading mongodb versions, a challenge is how to achieve that in a production environment. The gravity of the problem ranges from no expected issue at all to more severe scenarios such as incompatible data formats or changes in data access APIs.

Database engine upgrade presents three opportunities to do the upgrade:

Each opportunity has its own challenges and calls for different procedures. We always have to strive for safe procedures for database engine upgrades.

What are the implications of a driver API change when migrating to a new database version?

The database engine upgrade (major version) qualifies to keep the title of “database upgrade”. The following list has additional resources about the database upgrade subject:

Challenges when versioning mongoose models

Model versioning goes hand in hand with data migration, because a change in a model schema requires the stored data structures to reflect the definitions on the model.

However, given the inconsistencies that can be introduced in minor and major versions, it makes sense to do model versioning (mongoose schema changes) when the risk of database engine related issues is really low: that is, only while patches are being released, before a new minor or major release.
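
One way to make this timing rule explicit is a small check that classifies the jump between two engine versions. The sketch below assumes version strings follow a plain numeric `major.minor.patch` layout; `parseVersion`, `releaseType`, and `isSafeForModelChange` are hypothetical helper names, and the version numbers are only sample inputs.

```javascript
// Parses "major.minor.patch" into numbers; assumes plain numeric versions.
function parseVersion(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

// Classifies the jump between two engine versions.
function releaseType(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return 'major';
  if (a.minor !== b.minor) return 'minor';
  if (a.patch !== b.patch) return 'patch';
  return 'none';
}

// Hypothetical policy: ship model changes only when the engine is
// unchanged or moving by a patch release.
function isSafeForModelChange(from, to) {
  return ['none', 'patch'].includes(releaseType(from, to));
}

console.log(releaseType('4.4.6', '4.4.8'));          // patch
console.log(isSafeForModelChange('4.4.8', '5.0.0')); // false
```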

Another issue to figure out is making sure transformations are a part of booting from backups. That way, applying schema changes becomes a ritual, reducing the risk of having corrupt data at any given time.

What kind of problems should we expect when forced to run one migration script twice or thrice? Is the state going to be changed or preserved? The key point: migration scripts should be re-runnable.
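
A re-runnable script can be approximated by guarding each transform with a version marker stored on the document. The following is a minimal sketch, not a definitive implementation: the `schemaVersion` field, `TARGET_VERSION`, and `migrateUser` are assumptions made for the example, and the data lives in memory rather than in mongodb.

```javascript
// Hypothetical target version for the user model.
const TARGET_VERSION = 2;

// Re-runnable transform: documents already at the target version are
// returned untouched, so a second or third run changes nothing.
function migrateUser(doc) {
  if ((doc.schemaVersion || 1) >= TARGET_VERSION) return doc;
  const { name, ...rest } = doc;
  return { ...rest, profile: { name }, schemaVersion: TARGET_VERSION };
}

const once = [{ _id: 1, name: 'Ada' }].map(migrateUser);
const twice = once.map(migrateUser);
console.log(JSON.stringify(once) === JSON.stringify(twice)); // true
```

In a script that talks to a real database, the same guard would usually sit in the query filter as well, so that already-migrated documents are never selected in the first place.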

Release policy that aligns with database engine releases

The database in our context is mongodb, but tactics described here can be applied to any other kind of database.

Adopting a release policy that aligns with database engine releases, instead of colliding with them, helps make operations a little smoother.

Automated backups/restore

mongodb provides two mechanisms that we can tap into to automate backups and restores: mongodump/mongorestore and mongoexport/mongoimport.

Our task is to figure out how to fit transform operations when executing mongoimport or mongorestore operations.
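
One place to fit a transform is between the export and the import. The sketch below assumes the dump was produced in mongoexport's default layout of one JSON document per line; `transformDump` and `renameUsername` are hypothetical helpers, and the `username` to `handle` rename is an invented schema change used only for illustration.

```javascript
// Applies `transform` to every document in a newline-delimited JSON dump
// and returns the dump in the same one-document-per-line format.
function transformDump(ndjson, transform) {
  return ndjson
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.stringify(transform(JSON.parse(line))))
    .join('\n');
}

// Hypothetical schema change: rename `username` to `handle`.
function renameUsername(doc) {
  if (!('username' in doc)) return doc; // already transformed, or never had it
  const { username, ...rest } = doc;
  return { ...rest, handle: username };
}

const dump = '{"_id":1,"username":"ada"}\n{"_id":2,"handle":"grace"}\n';
const transformed = transformDump(dump, renameUsername);
console.log(transformed);
```

The transformed text can then be written to a file and handed to mongoimport. Because the transform skips documents that already have the new shape, running it again on its own output leaves the dump unchanged.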

What are the known problems that the mongodump + mongorestore utilities may face when migrating to a new version of mongodb?

What are the known problems that the mongoexport + mongoimport utilities may face when migrating to a new version of mongodb?

Additional resources about backup and restore:

Key tasks to execute when upgrading database engine

These tasks do not cover containerized environments such as docker, or orchestrated environments such as kubernetes.

Finding and updating Packages

Every operating system has its own way of finding and upgrading its packages. The next example showcases how this is accomplished in a Linux environment, more specifically on Ubuntu.

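# Refreshes the package index from the configured repositories
$ apt-get update

# Upgrades installed packages to their latest versions
$ apt-get upgrade
$ apt-get dist-upgrade

# Installing `mongodb` alone
$ apt-get install -y mongodb-org

# Restarting the service so the newly installed package is picked up
# (the mongodb-org package registers the service as `mongod`)
$ service mongod restart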

Example:

Running MongoDB

Most of the time, the following commands are enough to start, restart, or stop mongodb when it runs as a service. They cover both Ubuntu and macOS, and may be executed right after updating mongodb packages.

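# Force-stopping a stuck process first (not required all the time)
$ killall mongod && sleep 3 && service mongod start

# Starting, restarting, or stopping the service on Ubuntu
$ service mongod start
$ service mongod restart
$ service mongod stop

# macOS
$ mongod # running it directly; the run mode is up to you, but this one is not advised

# Finding out if a service, in this case `mongod`, is available
$ service --status-all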

Example:

redis Maintenance and Migration

Key takeaway

Or things that are good to know

Conclusion

In this article, we revisited data reconciliation issues associated with schema changes, database engine versions, and data migrations in development and production settings. There are additional complementary materials in the “Testing nodejs applications” book.

References