Sometimes, old design decisions come back to bite you. This is one of those tales.
A project I’m working on had a Django model similar to this:
from django.db import models

class Municipality(models.Model):
    code = models.CharField(max_length=2, primary_key=True)
    name = models.CharField(max_length=100)
and it was used by other models such as:
class ZipCode(models.Model):
    code = models.CharField(max_length=4, primary_key=True)
    municipality = models.ForeignKey(Municipality)
And all was good, until we needed to expand the municipality model to support different countries. A single unique field containing only the code, which may collide across countries, was no longer enough.
For all the modern parts of the code base we use UUIDs as the primary key, so we wanted to migrate the primary key of Municipality to a UUID, while retaining all the relations.
As of September 2017, Django does not support migrating primary keys in any nice manner, so we’re on our own here.
We tried a lot of magic solutions, but we always got stuck with the migrations system not being able to detect and handle the changes nicely.
After some research and quite a bit of trial and error, we settled on the following approach. It has some drawbacks I’ll get back to, but works reasonably well.
A quick reminder on how the world looks from a database perspective. When you define a ForeignKey field in Django, it creates a database column of the same type as the primary key of the referenced model on the referring model, and adds the foreign key constraints. So in the example above, we have two tables (in pseudo SQL):
CREATE TABLE municipality (
    code varchar(2) PRIMARY KEY NOT NULL,
    name varchar(100)
);
CREATE TABLE zipcode (
    code varchar(4) PRIMARY KEY NOT NULL,
    municipality_id varchar(2) NOT NULL REFERENCES municipality (code)
);
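To make the constraint concrete, here is a small sketch using Python's built-in sqlite3 module rather than Django. The table layout follows the pseudo SQL above, and the sample rows are made up for illustration. It shows that the referring column simply stores the primary key value of the referenced row, and that the database rejects values that do not exist there, which is why the constraint has to go before the primary key can change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.execute(
    "CREATE TABLE municipality ("
    " code VARCHAR(2) PRIMARY KEY NOT NULL,"
    " name VARCHAR(100))"
)
conn.execute(
    "CREATE TABLE zipcode ("
    " code VARCHAR(4) PRIMARY KEY NOT NULL,"
    " municipality_id VARCHAR(2) NOT NULL REFERENCES municipality (code))"
)

# Made-up sample data.
conn.execute("INSERT INTO municipality VALUES ('03', 'Oslo')")
conn.execute("INSERT INTO zipcode VALUES ('0150', '03')")

# The referring column holds the raw primary key value of the municipality.
stored = conn.execute("SELECT municipality_id FROM zipcode").fetchone()[0]

# A value with no matching municipality is rejected by the constraint.
try:
    conn.execute("INSERT INTO zipcode VALUES ('9999', 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```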
So, we need to:
- Break the foreign key constraints.
- Alter the root model.
- Map the new primary key ids to the old ones.
- Re-apply the foreign keys to it.
We start by breaking the foreign keys in the referring models.
class ZipCode(models.Model):
    code = ...  # Same as before
    municipality = models.CharField(max_length=2)  # Foreign key removed here
python manage.py makemigrations -n break_zipcode_muni_foreignkey
Now that the Municipality model is free from any external referring models, we can go to work on it.
Start by adding the new id field:
import uuid

class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4)  # code and name as before
python manage.py makemigrations -n add_id_field_to_muni
For some reason, using uuid.uuid4() as a default function in the migration didn’t work in my case, so I added a step in the created migration to create new unique ids for all rows:
def create_ids(apps, schema_editor):
    Municipality = apps.get_model('myapp', 'Municipality')
    # Give every existing row its own freshly generated UUID
    for m in Municipality.objects.all():
        m.id = uuid.uuid4()
        m.save()

# ...

operations = [
    migrations.AddField(...),
    migrations.RunPython(
        code=create_ids,
        reverse_code=migrations.RunPython.noop,
    ),
]
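A likely explanation for the default not working: when Django adds a column to a table that already has rows, it calls the default once and uses that single value for every existing row, so all municipalities would start out with the same UUID. The sketch below imitates that with plain sqlite3 (no Django involved; the table and rows are made up) and then applies the same per-row fix that create_ids performs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE municipality (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO municipality VALUES (?, ?)",
    [("01", "First"), ("02", "Second"), ("03", "Third")],  # made-up rows
)

# Imitate adding the column: the default is evaluated once, and that
# single value is what every existing row ends up with.
one_value = str(uuid.uuid4())
conn.execute(f"ALTER TABLE municipality ADD COLUMN id TEXT DEFAULT '{one_value}'")
distinct_before = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM municipality"
).fetchone()[0]

# The fix from the data migration: generate a fresh UUID per row.
codes = [row[0] for row in conn.execute("SELECT code FROM municipality")]
for code in codes:
    conn.execute(
        "UPDATE municipality SET id = ? WHERE code = ?",
        (str(uuid.uuid4()), code),
    )
distinct_after = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM municipality"
).fetchone()[0]
```

Before the per-row update all three rows share one id; afterwards each row has its own.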
Now we have a UUID id field on Municipality, and we should be able to switch the primary key around:
class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4, primary_key=True)  # primary_key added
    code = models.CharField(max_length=2, unique=True)  # primary_key replaced with unique
Create the migration, and make sure that the AlterField operation for code is run before the AlterField on id so that we never have two primary keys at the same time. We’ve added primary_key to the id field and unique=True to the code field, since we still want to enforce that constraint for now, and we lost it when we removed the primary_key attribute from it.
Congratulations, we now have a new UUID primary key. But we still need to restore the foreign keys we broke along the way.
Let’s start by creating an empty migration:
python manage.py makemigrations --empty -n fix_zipcode_fk_to_muni_uuid myapp
Open the file, and let us begin:
import django.db.models.deletion
from django.db import migrations, models


def match(apps, schema_editor):
    ZipCode = apps.get_model('myapp', 'ZipCode')
    Municipality = apps.get_model('myapp', 'Municipality')
    for zip_code in ZipCode.objects.all():
        # municipality still holds the old two-character code at this point
        municipality = Municipality.objects.get(code=zip_code.municipality)
        zip_code.temp_muni = municipality.pk  # store the UUID, not the instance
        zip_code.save()

# ...

operations = [
    migrations.AddField(
        model_name='zipcode',
        name='temp_muni',
        field=models.UUIDField(null=True),
    ),
    migrations.RunPython(
        code=match,
        reverse_code=migrations.RunPython.noop,
    ),
    migrations.RemoveField(model_name='zipcode', name='municipality'),
    migrations.RenameField(
        model_name='zipcode', old_name='temp_muni', new_name='municipality'),
    migrations.AlterField(
        model_name='zipcode',
        name='municipality',
        field=models.ForeignKey(
            on_delete=django.db.models.deletion.PROTECT,
            to='myapp.Municipality'),
    ),
]
Let us go through the steps here.
- Add a temporary field for storing the UUID of Municipality that we want to connect to. We don’t make it a ForeignKey field just yet, as Django gets confused about the naming later on.
- We run the match function to look up the new ids by the old lookup key, and store it in the temporary field temp_muni.
- Remove the old municipality field.
- Rename the temporary field to municipality.
- Finally migrate the type of municipality to a foreign key to create all the database constraints we need.
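The lookup in match runs one query per zip code. As a rough illustration of the same mapping done in a single statement, plus a check that no row was left unmatched before the old column is dropped, here is a sketch in plain sqlite3. The tables, the sample rows and the unmatched check are assumptions for illustration and are not part of the migration above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE municipality (id TEXT PRIMARY KEY, code TEXT UNIQUE)")
conn.execute("CREATE TABLE zipcode (code TEXT PRIMARY KEY, municipality TEXT)")

# Made-up sample data: two municipalities, three zip codes.
ids = {"01": str(uuid.uuid4()), "02": str(uuid.uuid4())}
conn.executemany(
    "INSERT INTO municipality (id, code) VALUES (?, ?)",
    [(new_id, code) for code, new_id in ids.items()],
)
conn.executemany(
    "INSERT INTO zipcode VALUES (?, ?)",
    [("1000", "01"), ("1001", "01"), ("2000", "02")],
)

# Temporary, nullable column for the new key.
conn.execute("ALTER TABLE zipcode ADD COLUMN temp_muni TEXT")

# Look up the new id via the old code, for all rows at once.
conn.execute(
    "UPDATE zipcode SET temp_muni = ("
    " SELECT m.id FROM municipality AS m WHERE m.code = zipcode.municipality)"
)

# Sanity check before removing the old column: nothing may be left unmatched.
unmatched = conn.execute(
    "SELECT COUNT(*) FROM zipcode WHERE temp_muni IS NULL"
).fetchone()[0]
mapping = dict(conn.execute("SELECT code, temp_muni FROM zipcode"))
```

If unmatched is anything other than zero, some zip code points at a municipality code that no longer exists, and the final step of turning the column into a non-null foreign key would fail.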
And there you go. All done.
There are some downsides here. Since we split the work into several migrations, we leave ourselves vulnerable if any of the later migrations fail. This will probably leave the application in a pretty unworkable state, so make sure to test the migrations quite a bit. You can reduce the risk by hand editing all the steps into one migration, but if you have references from multiple different apps, then you need to do the breaking and restoring in separate steps anyway.
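Why a single migration is safer, assuming a database with transactional schema changes (PostgreSQL and SQLite have them, MySQL does not): Django by default wraps each migration in a transaction, so a failure part-way through undoes the earlier steps of that same migration. The sketch below shows the underlying behaviour with plain sqlite3; the table and the simulated failure are made up.

```python
import sqlite3

# isolation_level=None lets us control the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE zipcode (code TEXT PRIMARY KEY, municipality TEXT)")


def column_names(connection):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in connection.execute("PRAGMA table_info(zipcode)")]


conn.execute("BEGIN")
try:
    conn.execute("ALTER TABLE zipcode ADD COLUMN temp_muni TEXT")  # first step succeeds
    raise RuntimeError("simulated failure in a later step")
except RuntimeError:
    conn.execute("ROLLBACK")  # the first step is undone together with the rest
else:
    conn.execute("COMMIT")

columns_after_failure = column_names(conn)
```

After the rollback the table looks exactly as it did before the migration started. Steps spread over several migrations do not get this protection, since each migration is committed on its own.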
Logging / Debugging
You’ll most likely end up with some SQL errors during the process of creating these, so a nice trick I like to do is to create a simple logging migration operation.
def log(message):
    # Returns a function with the signature RunPython expects
    def fake_op(apps, schema_editor):
        print(message)
    return fake_op

# ...

operations = [
    migrations.RunPython(log('Step 1')),
    migrations.AlterField(...),
    migrations.RunPython(log('Step 2')),
    # ...
]
This allows you to see where in the process the migration fails.
To see what SQL Django creates for a given migration, run python manage.py sqlmigrate <appname> <migration_number>. This is super useful for checking whether operations are run in the order that you expect.