Database Maintenance, Data Migrations, and Model Versioning
Most data-driven applications are backed by at least one database, if not several. Schema changes and data migrations are the driving forces behind the data reconciliation nightmare. This article names the commonly known culprits, proposes some remediations, and points to resources that can help find a solution.
There is a more specific “How to version mongodb models” article that looks at the same problem from a model versioning perspective.
In this article we will talk about:
- Better versioning of mongoose models
- Safe procedures for database engine upgrades
- Release cycles that do not collide schema change with database engine upgrades
- Synchronizing releases (patch vs minor) with database engine releases (patch vs minor)
- Upgrading the mongoose model version in a production environment
- Upgrading to a new mongodb database version in a production environment
- Aligning continuous deployment with upgrades of the underlying infrastructure (database engines, database drivers, model upgrades, data access frameworks, etc)
Even though this blog post was designed to offer complementary material to those who bought my Testing nodejs Applications book, the content can help any software developer tune up their working environment. You can use this link to buy the book.
Introduction
The first iteration of an application is always easy to pull off. Problems start to creep in when subsequent iterations bring schema changes, data migrations, or major upgrades. This article names what to anticipate after deploying a database to production, and how to mitigate issues related to running a production database deployment.
In this article, the term database migration is seen from at least the following perspectives:
- Patching database engines (patch versions)
- Updating database engines (minor versions)
- Upgrading database engines (major versions)
- Patching database driver/ORM/ODM (patch versions)
- Updating database driver/ORM/ODM (minor versions)
- Upgrading database driver/ORM/ODM (major versions)
- Automated backups and restore
- Data migration from one database engine version to the next
- Model schema change on same database engine versions
- Model schema change on different database engine versions
- State vs Migration driven database delivery
Database vendors have predetermined release cycles. That makes it easy to anticipate when a new release will be available, and to align application development with the release cycles of the supporting software. Risky feature changes can then be scheduled in cycles where we do not expect huge changes from the database vendor. Such a model should work well within a continuous deployment environment as well.
Strategy
The database provider's release cycle suggests that decoupling schema changes from database engine upgrades would make things a little easier, especially in data-driven applications where schema change is inevitable and frequent.
As database engine changes have their own upgrade time, model changes should also have their own model version update time. With this knowledge, we have two granular scenarios: one regarding upgrades of the underlying software, and another regarding code change.
For a database engine version upgrade, we can expect the reverse operation to be a downgrade to the previous version. For model revision updates, we can expect a rollback as the reverse operation.
It is better to have upgrades and model updates at separate times.
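The rollback of a model revision can be sketched as a pair of functions per revision, one going forward and one going back. This is only a minimal sketch under stated assumptions: the `schemaVersion` field, the `fullName` to `firstName`/`lastName` split, and the `revision2` object are hypothetical, and plain objects stand in for stored documents.

```javascript
// Hypothetical revision 2 of a user model: split `fullName` into two fields.
// Plain objects stand in for stored documents; no database is involved.
const revision2 = {
  version: 2,
  // up: apply the model revision to one document
  up(doc) {
    const { fullName = "", ...rest } = doc;
    const [firstName = "", ...others] = fullName.split(" ");
    return { ...rest, firstName, lastName: others.join(" "), schemaVersion: 2 };
  },
  // down: roll the same document back to the previous revision
  down(doc) {
    const { firstName = "", lastName = "", ...rest } = doc;
    return { ...rest, fullName: `${firstName} ${lastName}`.trim(), schemaVersion: 1 };
  },
};

const original = { _id: 1, fullName: "Ada Lovelace", schemaVersion: 1 };
const upgraded = revision2.up(original);
const rolledBack = revision2.down(upgraded);
console.log(upgraded);
console.log(rolledBack);
```

Keeping `up` and `down` side by side makes it easier to review whether a revision is really reversible before it ships. A revision that drops data has no honest `down`, which is worth knowing ahead of a release.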
There are state-driven and migration-driven database migration strategies. The State vs migration-driven database delivery article helps to understand more about those two strategies.
Challenges when upgrading database engines
To clear things up, we will have to answer the following questions:
When is the right time to migrate the database? When is the right time to switch the application to use the new version of the database?
When upgrading mongodb versions, the challenge is how to achieve that in a production environment. The severity of the problem ranges from no expected issue at all to harsher scenarios such as incompatibility in data format, or changes in data access APIs.
A database engine presents three opportunities to do the upgrade:
- Patching database engines (patch versions)
- Updating database engines (minor versions)
- Upgrading database engines (major versions)
Each opportunity has its own challenges and calls for its own procedures. We always have to strive for safe procedures for database engine upgrades.
What are the implications of a driver API change when migrating to a new database version?
Strictly speaking, only the database engine upgrade (major version) deserves to keep the title of “database upgrade”. The following list has additional resources about the database upgrade subject:
- Avoid downtime by Using Blue-Green Deployment to reduce Downtime Risk
- Deploying mongodb using docker
- Running a mongodb Database in Kubernetes with StatefulSets
- Automated blue/green database deployments
- Zero Downtime Deployment with a Database
- 100 Percent Online Deployments: Blue-Green Deployment
Challenges when versioning mongoose models
Model versioning goes hand in hand with data migration, because a change in model schema requires the stored data structures to reflect the definitions on the model.
However, given the inconsistencies that can be introduced in minor and major versions, it makes sense to do model versioning (mongoose schema changes) when the risk of database engine related issues is really low: meaning only when patches are being released, before a new minor or major release.
Another issue to figure out is making sure transformations are a part of booting from backups. That way, applying schema changes becomes a ritual, reducing the risk of having corrupt data at any given time.
What kind of problems should we expect when forced to run one migration script twice or thrice? Is the state going to be changed or preserved? The keyword: the scripts should be re-runnable.
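One common way to make a script re-runnable is to guard every transformation with a version check. This is a minimal sketch, assuming a hypothetical `schemaVersion` field on each document and a hypothetical rename of `username` to `handle`; plain objects stand in for stored documents.

```javascript
// Hypothetical migration: rename `username` to `handle` (schemaVersion 1 -> 2).
const TARGET_VERSION = 2;

function migrate(doc) {
  // Guard: a document already at (or past) the target version is left untouched,
  // so a second or third run preserves state instead of changing it again.
  if ((doc.schemaVersion || 1) >= TARGET_VERSION) return doc;
  const { username, ...rest } = doc;
  return { ...rest, handle: username, schemaVersion: TARGET_VERSION };
}

const docs = [
  { _id: 1, username: "ada" },                   // never migrated
  { _id: 2, handle: "grace", schemaVersion: 2 }, // already migrated
];

const firstRun = docs.map(migrate);
const secondRun = firstRun.map(migrate);

// Running the script again changes nothing.
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // true
```

Without the guard, the second run would read a missing `username` and overwrite `handle` with `undefined`, which is exactly the kind of silent corruption a repeated run can cause.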
Release policy that aligns with database engine releases
The database in our context is mongodb, but the tactics described here can be applied to any other kind of database.
Adopting a release policy that aligns with database engine releases, instead of colliding with them, helps make operations a little smoother.
- Release cycles that do not collide schema change with database engine version bumps
- Synchronizing releases (patch vs minor) with database engine releases (patch vs minor)
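Such a policy can be turned into a small check that runs before a release is cut. The sketch below is one possible reading of the policy, not a prescribed implementation: it assumes engine versions follow a "major.minor.patch" shape, the function names `bumpType` and `schemaChangeAllowed` are hypothetical, and the version strings are sample values.

```javascript
// Classify the bump between two "major.minor.patch" engine versions.
function bumpType(current, next) {
  const [cMajor, cMinor, cPatch] = current.split(".").map(Number);
  const [nMajor, nMinor, nPatch] = next.split(".").map(Number);
  if (nMajor !== cMajor) return "major";
  if (nMinor !== cMinor) return "minor";
  if (nPatch !== cPatch) return "patch";
  return "none";
}

// Hypothetical policy: a schema change may ship only when the engine is
// unchanged or receives nothing more than a patch in the same release window.
function schemaChangeAllowed(current, next) {
  const bump = bumpType(current, next);
  return bump === "none" || bump === "patch";
}

console.log(bumpType("4.2.3", "4.2.8"), schemaChangeAllowed("4.2.3", "4.2.8")); // patch true
console.log(bumpType("4.2.8", "4.4.0"), schemaChangeAllowed("4.2.8", "4.4.0")); // minor false
```

A check like this could run in a deployment pipeline to refuse a release that bundles a schema change with a minor or major engine bump.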
Automated backups/restore
mongodb provides two mechanisms that we can tap into to automate backups and restore: mongodump/mongorestore and mongoexport/mongoimport.
Our task is to figure out how to fit transform operations in when executing mongoimport or mongorestore operations.
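For the export/import path, one place to fit a transform is between the two tools. By default mongoexport writes one JSON document per line, so the exported text can be rewritten line by line before mongoimport reads it. The sketch below covers only that path and makes assumptions: `transform` and `transformExport` are hypothetical names, the `username` to `handle` rename is an invented example, and the input is a small inline sample rather than a real export. mongodump produces BSON, so a similar step on that path would need a BSON parser instead.

```javascript
// Hypothetical transform applied to every exported document.
function transform(doc) {
  const { username, ...rest } = doc;
  return username === undefined ? doc : { ...rest, handle: username };
}

// Apply `transform` to newline-delimited JSON, skipping blank lines.
function transformExport(ndjson) {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.stringify(transform(JSON.parse(line))))
    .join("\n");
}

// Inline sample standing in for the text produced by an export.
const exported = '{"_id":1,"username":"ada"}\n{"_id":2,"handle":"grace"}\n';
console.log(transformExport(exported));
```

Because `transform` leaves already converted documents alone, the same step can sit in every restore without changing data that is already in the new shape.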
What are the known problems that the mongodump + mongorestore utilities may face when migrating to a new version of mongodb?
What are the known problems that the mongoexport + mongoimport utilities may face when migrating to a new version of mongodb?
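One known limitation of any JSON-based path can be shown without a database: plain JSON has no date type, so a value that is a date in the application comes back as a string. The sketch below uses only built-in JavaScript and an invented sample document. mongoexport mitigates this by emitting Extended JSON, which wraps such values in type markers, but the MongoDB documentation still cautions against using mongoexport and mongoimport for full production backups because not every BSON type is reliably preserved.

```javascript
// A document as the application sees it: `createdAt` is a Date, not a string.
const doc = { _id: 1, createdAt: new Date("2020-01-01T00:00:00.000Z") };

// Round trip through plain JSON text, as a naive export/import would do.
const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.createdAt instanceof Date);          // true
console.log(roundTripped.createdAt instanceof Date); // false
console.log(typeof roundTripped.createdAt);          // string
```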
Additional resources about backup and restore:
- How to back up restore and migrate a mongodb database on ubuntu 14.04
- mongodb cluster migration with zero downtime
Key tasks to execute when upgrading database engine
- [ ] task one ~ backup and lock writes
- [ ] task two ~ upgrade packages
- [ ] task three ~ stop/restart the database server
- [ ] task four ~ execute migration scripts
- [ ] task five ~ booting the database server from new version
These tasks do not cover containerized environments such as docker, or orchestrated environments such as kubernetes.
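The checklist above can be sketched as an ordered runner that stops at the first failure, so that a later task such as the migration scripts never runs on top of a failed backup. This is a sketch under stated assumptions: `runTasks` is a hypothetical helper, and every `run` function is a placeholder for the real backup, package, or migration tooling.

```javascript
// Run upgrade tasks strictly in order; stop at the first failure so that
// later tasks (for example migration scripts) never run on a broken state.
async function runTasks(tasks) {
  const done = [];
  for (const { name, run } of tasks) {
    try {
      await run();
      done.push(name);
    } catch (err) {
      return { ok: false, done, failed: name, reason: err.message };
    }
  }
  return { ok: true, done };
}

// Placeholder tasks: each `run` would call the real tooling.
// Here they only resolve, so that the flow itself can be exercised.
const tasks = [
  { name: "backup and lock writes", run: async () => {} },
  { name: "upgrade packages", run: async () => {} },
  { name: "stop/restart the database server", run: async () => {} },
  { name: "execute migration scripts", run: async () => {} },
  { name: "boot the database server from new version", run: async () => {} },
];

runTasks(tasks).then((result) => console.log(result));
```

The returned `done` list records how far the upgrade got, which is the information needed to decide between resuming and downgrading.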
Finding and updating Packages
Every operating system has its own way of finding and upgrading its packages. The next example showcases how this is accomplished in a Linux environment, more specifically on Ubuntu.
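# Refreshes the package index from the configured repositories (including PPAs)
$ apt-get update
# Upgrades installed packages to their latest versions
$ apt-get upgrade
$ apt-get dist-upgrade
# Installing mongodb alone
$ apt-get install -y mongodb-org
# Restarting the service so that the newly installed package is picked up
# (the mongodb-org package registers its service as mongod)
$ service mongod restart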
Running MongoDB
Most of the time, the following commands are enough to start, restart, or stop mongodb when it runs as a service. The commands cover both Ubuntu and macOS, and may be executed right after updating the mongodb packages.
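# not required all the time: force-stop a stuck mongod, then start it again
$ killall mongod && sleep 3 && service mongod start
# regular start (the mongodb-org package registers its service as mongod)
$ service mongod start
# macOS
$ mongod # running it directly is up to you, but it is not advised
# Finding out whether a service, in this case mongod, is available
$ service --status-all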
redis Maintenance and Migration
Key takeaway
Or things that are good to know
- To avoid “loss of fidelity” in migration tasks, use BSON instead of JSON
- The mongodb engine can detect whether a collection or database exists and take action accordingly.
- Run migrations during non-peak hours: migrations are CPU/Memory/Disk intensive tasks. They can bring the system down.
- Lock writes, to preserve data consistency across shards + datacenters (non-peak hours | maintenance window)
- Use db.fsyncLock() + db.fsyncUnlock() to lock and unlock database writes during migration task one and chunk migration
- Migrations based on Export + Import functions do not guarantee data integrity
- Migrations based on mongodump and mongorestore guarantee data integrity
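The lock and unlock pair is easy to get wrong when the work in between fails, because the database would then stay locked. A minimal sketch of a safer shape follows. It assumes a `db` object exposing `fsyncLock()` and `fsyncUnlock()`, as the mongo shell's `db` does; `withWriteLock` and `fakeDb` are hypothetical names, and the fake exists only so the sketch can run without a server.

```javascript
// Run `task` while writes are locked, and always unlock afterwards.
// `db` is expected to expose fsyncLock() and fsyncUnlock(), as the
// mongo shell's `db` object does. `task` is synchronous in this sketch.
function withWriteLock(db, task) {
  db.fsyncLock();
  try {
    return task();
  } finally {
    db.fsyncUnlock(); // runs even when the task throws
  }
}

// Stand-in for the shell's `db`, only to make the sketch runnable here.
const calls = [];
const fakeDb = {
  fsyncLock: () => calls.push("lock"),
  fsyncUnlock: () => calls.push("unlock"),
};

withWriteLock(fakeDb, () => calls.push("backup"));
console.log(calls.join(" -> ")); // lock -> backup -> unlock
```

Since the lock blocks writes, the wrapped task should be a read-only step such as taking the backup; migration scripts that write data have to run after the unlock.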
Conclusion
In this article, we revisited data reconciliation issues associated with schema change, database engine versions, and data migration in development and production settings. There is additional complementary material in the “Testing nodejs applications” book.
References
- Testing nodejs Applications book
- “How We Do mongodb Migrations at Coinbase” Coinbase Blog
- How To Back Up, Restore, and Migrate a mongodb Database on Ubuntu 14.04 ~ DigitalOcean community tutorials
- Migrate ~ to mongodb Atlas Mongodb Documentation
- Managing Schema changes with MongoDB ~ Derick Rethans Blog
- “How to run two versions of mongodb on a same linux machine” ~ StackOverflow Answer
- “How to Upgrade mongodb Versions Without Downtime” ~ Percona Live Session
- How to run mongodb on Kubernetes ~ Google Developers Labs