I use this example when I speak about devops and when I teach devops trainings.
I call it the migration sandwich. (Nothing to do with the cube rule).
A piece of bread isn't a sandwich and a single migration in a tool like alembic isn't a "sandwich" either. You have a couple layers of bread with one or several layers of toppings and it's not a sandwich until it's all done.
People get a laugh out of the "idiot sandwich meme" and we always have a good conversation about what gnarly migrations people have seen or done (72+ hours of runtime, splitting to dozens or more tables and then reconstructing things, splitting things out to safely be worked on in the expanded state for weeks, etc).
I had never heard it called "expand and contract" before reading this article a few years ago.
What does everyone else call these?
I have usually heard it called "A–AB–B migrations". As in, you support version A, then you support both version A and version B, then you support just version B.
The rest of the sequencing details follow from this idea.
I’ve always paired this with the strangler pattern.
“In programming, the strangler fig pattern or strangler pattern is an architectural pattern that involves wrapping old code, with the intent of redirecting it to newer code.”
https://en.wikipedia.org/wiki/Strangler_fig_pattern
This is the same pattern as versioning, but with an extremely short sunset for the old version.
That is a very reductionist view and not particularly useful either. It shares elements with versioning and you could likely implement this using explicit versioning, but it is completely independent of it.
The main difference I see is one of focus: here the focus is on migrating a database without downtime or excessive global locks, and keeping multiple versions of the schema is a detail.
Seems like you agree with my assessment.
:(
Things can have similarities without being the same. Also not being unique is not a moral failure.
I think I am failing to understand your stance.
Saying the pattern is the same isn't the same as saying two things fitting the pattern are the same in every respect.
Patterns are necessarily reductionist, like any sort of comparison of things that aren't 100% similar.
There is value in recognizing patterns. They're useful for comprehension and memory.
> This is the same pattern as versioning
I guess we disagree on the reasonable interpretations of this sentence
That's only half of the sentence
It's actually not.
Versioning is a concept where each version lives in non-intersecting time intervals.
This concept focuses entirely on the fact that your structures' lifetimes must have non-empty intersections. It's close to the opposite.
> "Versioning is a concept where each version lives in non-intersecting time intervals."
Is it? Node.js publishes "Current", "LTS" and "Maintenance" versions, and there's always a reasonable time interval during which consumers typically upgrade from e.g. Maintenance to a newer LTS or even Current. From the publishing side, that's very similar to "expand and contract": temporarily expanding what's supported to include Current, then dropping support for the oldest versions, leaving Maintenance. It's continuous instead of ad hoc, and there are more than 2 versions involved, but the principle is basically the same (at least if you squint).
Though I guess if you're talking strictly about schema management strategies, then yeah, "versioning" might be very different from "expand contract", as you noted.
It is often necessary to use multiple versions at the same time
To me it's just a blue-green type deployment for schemas. You have an old and a new thing, you split and merge as traffic replays to the new thing and shows that it's viable and not breaking, and you swap over as you can.
Evens between odds. Where even schema versions are migrations and 'divisible between two'.
But green/blue and A/AB/B I've used before to discuss the same.
This is just the natural solution that falls out if you want to change a schema with no downtime. I always just called it “dual writing”.
We always called these "four-phase migrations". An old Stripe article used similar naming[0].
[0]: https://stripe.com/blog/online-migrations
I’ve heard that one too. I think the key insight is that you need to “stop the bleeding” (stop creating more old data that needs to be migrated) before you do any backfilling. That’s why I always called it dual writing because that’s the stop the bleeding step.
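To make the ordering concrete, here is a minimal in-memory sketch of "dual write first, backfill second". Everything in it is an assumption for illustration: the two dicts stand in for the old and new tables, and the `email` to `email_address` change, `transform`, `save_user` and `backfill` are hypothetical names, not anything from the article.

```python
# Stand-ins for the old and new tables (hypothetical data).
old_table = {
    1: {"email": "Old1@Example.com"},
    2: {"email": "Old2@Example.com"},
}
new_table = {}


def transform(old_row):
    """Map an old-shaped row to the new shape (here: renamed and normalized)."""
    return {"email_address": old_row["email"].lower()}


def save_user(user_id, email):
    """Dual write: every live write lands in both shapes, so no new
    old-only data is created once this code path is deployed."""
    old_table[user_id] = {"email": email}
    new_table[user_id] = transform(old_table[user_id])


def backfill(batch_size=100):
    """Copy up to batch_size rows that exist only in the old shape.
    Rows already present in new_table came from live dual writes and are
    left alone. Returns the number of rows copied."""
    missing = [uid for uid in old_table if uid not in new_table][:batch_size]
    for uid in missing:
        new_table[uid] = transform(old_table[uid])
    return len(missing)
```

Because the backfill only touches rows that are missing from the new shape, it can run in small batches across several deploys and be rerun until it returns 0.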
In the step where writes are added, from the client to the new schema instance, how does this work when writes depend on existing data, when the existing data hasn’t even been copied? Constraints like foreign keys will prevent this from working, no?
You wouldn’t enable foreign key constraints until you finish backfilling the old data in that case. Or you do it in phases where you do all of these steps and migrate the dependency first.
1) double write essentially
2) migration involves the problem of mixing a migration write with an actual live in flight mutation. Cassandra would solve this with additional per cell write time tracking or a migrated vs new mutation flag
3) and then you have deletes. So you'll need a tombstone mechanism, because if a live delete of a cell value is overwritten by a migrated value, then data that was deleted comes back to life
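Points 2 and 3 can be sketched with a last-write-wins merge. This is a toy model under stated assumptions, not Cassandra's actual implementation: `Cell`, `apply`, `live_delete` and `read` are hypothetical names, and the migration is assumed to replay each value with the write time it had in the old system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[str]
    write_time: int        # logical or wall-clock time of the original write
    deleted: bool = False  # tombstone marker


def apply(store: Dict[str, Cell], key: str, incoming: Cell) -> None:
    """Last-write-wins: keep whichever cell carries the newer write_time."""
    current = store.get(key)
    if current is None or incoming.write_time > current.write_time:
        store[key] = incoming


def live_delete(store: Dict[str, Cell], key: str, now: int) -> None:
    """A delete is stored as a tombstone instead of removing the key."""
    apply(store, key, Cell(None, now, deleted=True))


def read(store: Dict[str, Cell], key: str) -> Optional[str]:
    cell = store.get(key)
    return None if cell is None or cell.deleted else cell.value


store: Dict[str, Cell] = {}
apply(store, "a", Cell("live value", write_time=10))  # in-flight live write
apply(store, "a", Cell("migrated", write_time=5))     # older migrated value loses
live_delete(store, "b", now=20)                       # live delete leaves a tombstone
apply(store, "b", Cell("migrated", write_time=5))     # does not resurrect "b"
```

The tombstone has to stay around at least until the backfill has finished, otherwise the migrated value has nothing to lose against.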
This is the model we used at the SaaS I worked for a decade ago. It worked great to allow for smooth, zero-downtime upgrades across a fleet of thousands of DB servers serving tens of thousands of app servers and millions of active users.
The following academic work (132 page pdf) elucidates this pattern in the context of a real application:
https://ris.utwente.nl/ws/portalfiles/portal/275963001/PDEng...
I’m confused. I thought Expand and Contract was about mutating an existing schema, adding columns and tables, not creating a full replacement schema. But maybe I misunderstood?
What’s in the article, I know as the Strangler Fig Pattern.
What you describe is what they describe as well:
> For column-level changes, this often means adding new columns to a table that have the characteristics you want while leaving the current columns as-is.
I think what makes it confusing is that their diagrams depict a completely separate schema, but what they describe is really just altering the existing schema.
> What’s in the article, I know as the Strangler Fig Pattern.
The strangler fig pattern is mostly concerned with migrating from old software to new software, for example from a monolith to microservices. But I guess you can also apply it to database schemas.
Expand Contract from Fowler's bliki
https://martinfowler.com/bliki/ParallelChange.html
Expand the interface contract and then contract the interface contract? ;)
I use Prisma on almost all my Node.js projects these days, and I wish that part of schema migrations was also automated by Prisma. But last I checked, it doesn't even rename columns properly.
I feel like maybe they should invest more R&D in their migrations technology? The ORM is pretty great.
Lack of reasonable support for migrations turned me off to Prisma when I first encountered it (in a KCDodds Remix app circa 2021-ish). I'm surprised that's still unaddressed.
Lack of support for running migrations, or generating migrations?
What do you find good about Prisma?
The ORM is great! I usually set up my projects so the Prisma generated types are shared with the front end, and I build my own types on top of those.
I like the prisma schema first way of specifying my models too. It’s pretty intuitive and readable, it centralizes all my models in one place.
The migration system could be more advanced but does the job. Multiple production projects I worked heavily on use it.
Overall I think it’s very well designed software
Is there any easy way to implement this pattern in AWS RDS deployments where we need to deploy multiple times a day and need it to be done in a few minutes?
In my experience, this process typically spans multiple deploys. I would say the key insight that I have taken away from decades of applying this approach, is that data migrations need to be done in an __eventually consistent__ approach, rather than as an all-or-nothing, stop-the-world, global transaction or transformation.
Indeed, this pattern in particular is extremely useful in environments where you are trying to make changes to one part of a system while multiple deploys are happening across the entire system, or where you are dealing with a change that requires a large number of clients to be updated and you don't have direct control of those clients or they operate in a loosely-connected fashion.
So, regardless of AWS RDS as your underlying database technology, plan to break these steps up into individual deployment steps. I have, in fact, done this with systems deployed over AWS RDS, but also with systems deployed to on-prem SQL Server and Oracle, to nosql systems (this is especially helpful in those environments), to IoT and mobile systems, to data warehouse and analysis pipelines, and on and on.
I'm not sure about the name, but this is a great little doc for introducing junior devs to migrations.
The only thing I would add is minor and major version changes, so it's clear how the different class ent stages are labeled and how you track when you're ready to backfill.
Ok hear me out. What if this whole process was statefully managed for you as an add on to your database?
Like you essentially defined the steps in a temporal like workflow and then it does all the work of expanding, verifying and contracting.
I'm hearing you out, but how is this going to affect the part of this that is client behavior rather than database behavior? If there is some kind of sdk that actually captures the interface here (that is, that the client needs to be compatible with both versions of the schema at once for a while) and pushes that back to the client, that could be interesting, like a way to define that column "name" and columns "first name", "last name" are conceptually part of the same thing and that the client code paths must provide handling for both at once.
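For what "handling for both at once" might look like on the client side, here is a small sketch. It assumes rows arrive as plain dicts and that the new columns are spelled `first_name` and `last_name`; `display_name` and `name_fields` are hypothetical helpers, not part of any SDK.

```python
def display_name(row: dict) -> str:
    """Read path: prefer the new columns, fall back to the old `name`."""
    first, last = row.get("first_name"), row.get("last_name")
    if first is not None or last is not None:
        return " ".join(part for part in (first, last) if part)
    return row.get("name") or ""


def name_fields(full_name: str) -> dict:
    """Write path: produce a payload that satisfies both shapes at once.
    Splitting on the first space is a naive assumption for illustration."""
    first, _, last = full_name.partition(" ")
    return {"name": full_name, "first_name": first, "last_name": last}
```

Once every row has the new columns and no client reads `name` any more, the fallback branch and the `name` key in the payload are the parts that get deleted in the contract step.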
If you solve that "verifying" step, you will already revolutionize software development.
It should be this way. Clients should have some protocol to communicate the schema they expect to the database probably with some versioning scheme. The database should be able to serve multiple mutually compatible views over the schema (stay robust to column renames for example). The database should manage and prevent the destruction of in use views of that schema. After an old view has been made incompatible, old clients needing that view should be locked out.
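Part of this can be approximated today with plain SQL views. A rough sketch using Python's built-in `sqlite3`, assuming a column that was renamed from `name` to `full_name`; the table, the view name `users_v1` and the data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    -- Compatibility view for old clients that still expect a `name` column.
    CREATE VIEW users_v1 AS SELECT id, full_name AS name FROM users;
    INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace');
    """
)

# Old clients read through the view, new clients read the table directly.
old_client_row = conn.execute("SELECT id, name FROM users_v1").fetchone()
new_client_row = conn.execute("SELECT id, full_name FROM users").fetchone()
```

This only covers reads. In SQLite, writes through a view need `INSTEAD OF` triggers, and nothing here tracks which clients still depend on `users_v1`, which is the part the comment above wants the database to manage.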
> The database should manage and prevent the destruction of in use views of that schema. After an old view has been made incompatible, old clients needing that view should be locked out.
this is the interesting part where the article's process matters. how do you make incompatible changes without breaking clients?
On the Rails side, GitLab has an extensive set of helpers for this that a lot of Rails projects have adopted. I would love to see them pulled out into a gem or adopted into Rails core proper: https://gitlab.com/gitlab-org/gitlab-foss/blob/master/lib/gi...