Creating seed data in a flask-migrate or alembic migration

Each Answer to this Q is separated by one/two green lines.

How can I insert some seed data in my first migration? If the migration is not the best place for this, then what is the best practice?

"""empty message

Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069

"""

# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None

from alembic import op
import sqlalchemy as sa


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table('list_type',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('name', sa.String(length=80), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('name')
    )
    op.create_table('job',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('list_type_id', sa.Integer(), nullable=False),
    sa.Column('record_count', sa.Integer(), nullable=False),
    sa.Column('status', sa.Integer(), nullable=False),
    sa.Column('sf_job_id', sa.Integer(), nullable=False),
    sa.Column('created_at', sa.DateTime(), nullable=False),
    sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
    sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###

    # ==> INSERT SEED DATA HERE <==


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###

Alembic has, as one of its operation, bulk_insert(). The documentation gives the following example (with some fixes I’ve included):

from datetime import date
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Date
from alembic import op

# Create an ad-hoc table to use for the insert statement.
accounts_table = table('account',
    column('id', Integer),
    column('name', String),
    column('create_date', Date)
)

op.bulk_insert(accounts_table,
    [
        {'id':1, 'name':'John Smith',
                'create_date':date(2010, 10, 5)},
        {'id':2, 'name':'Ed Williams',
                'create_date':date(2007, 5, 27)},
        {'id':3, 'name':'Wendy Jones',
                'create_date':date(2008, 8, 15)},
    ]
)

Note too that the alembic has an execute() operation, which is just like the normal execute() function in SQLAlchemy: you can run any SQL you wish, as the documentation example shows:

from sqlalchemy.sql import table, column
from sqlalchemy import String
from alembic import op

account = table('account',
    column('name', String)
)
op.execute(
    account.update().\
        where(account.c.name==op.inline_literal('account 1')).\
        values({'name':op.inline_literal('account 2')})
        )

Notice that the table that is being used to create the metadata that is used in the update statement is defined directly in the schema. This might seem like it breaks DRY (isn’t the table already defined in your application), but is actually quite necessary. If you were to try to use the table or model definition that is part of your application, you would break this migration when you make changes to your table/model in your application. Your migration scripts should be set in stone: a change to a future version of your models should not change migrations scripts. Using the application models will mean that the definitions will change depending on what version of the models you have checked out (most likely the latest). Therefore, you need the table definition to be self-contained in the migration script.

Another thing to talk about is whether you should put your seed data into a script that runs as its own command (such as using a Flask-Script command, as shown in the other answer). This can be used, but you should be careful about it. If the data you’re loading is test data, then that’s one thing. But I’ve understood “seed data” to mean data that is required for the application to work correctly. For example, if you need to set up records for “admin” and “user” in the “roles” table. This data SHOULD be inserted as part of the migrations. Remember that a script will only work with the latest version of your database, whereas a migration will work with the specific version that you are migrating to or from. If you wanted a script to load the roles info, you could need a script for every version of the database with a different schema for the “roles” table.

Also, by relying on a script, you would make it more difficult for you to run the script between migrations (say migration 3->4 requires that the seed data in the initial migration to be in the database). You now need to modify Alembic’s default way of running to run these scripts. And that’s still not ignoring the problems with the fact that these scripts would have to change over time, and who knows what version of your application you have checked out from source control.

Migrations should be limited to schema changes only, and not only that, it is important that when a migration up or down is applied that data that existed in the database from before is preserved as much as possible. Inserting seed data as part of a migration may mess up pre-existing data.

As most things with Flask, you can implement this in many ways. Adding a new command to Flask-Script is a good way to do this, in my opinion. For example:

@manager.command
def seed():
    "Add seed data to the database."
    db.session.add(...)
    db.session.commit()

So then you run:

python manager.py seed

MarkHildreth has supplied an excellent explanation of how alembic can handle this. However, the OP was specifically about how to modify a flask-migration migration script. I’m going to post an answer to that below to save people the time of having to look into alembic at all.

Warning
Miguel’s answer is accurate with respect to normal database information. That is to say, one should follow his advice and absolutely not use this approach to populate a database with “normal” rows. This approach is specifically for database rows which are required for the application to function, a kind of data which I think of as “seed” data.

OP’s script modified to seed data:

"""empty message

Revision ID: 384cfaaaa0be
Revises: None
Create Date: 2013-10-11 16:36:34.696069

"""

# revision identifiers, used by Alembic.
revision = '384cfaaaa0be'
down_revision = None

from alembic import op
import sqlalchemy as sa


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    list_type_table = op.create_table('list_type',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('name', sa.String(length=80), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('name')
    )
    op.create_table('job',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('list_type_id', sa.Integer(), nullable=False),
    sa.Column('record_count', sa.Integer(), nullable=False),
    sa.Column('status', sa.Integer(), nullable=False),
    sa.Column('sf_job_id', sa.Integer(), nullable=False),
    sa.Column('created_at', sa.DateTime(), nullable=False),
    sa.Column('compressed_csv', sa.LargeBinary(), nullable=True),
    sa.ForeignKeyConstraint(['list_type_id'], ['list_type.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    ### end Alembic commands ###


    op.bulk_insert(
        list_type_table,
        [
            {'name':'best list'},
            {'name': 'bester list'}
        ]
    )


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('job')
    op.drop_table('list_type')
    ### end Alembic commands ###

Context for those new to flask_migrate

Flask migrate generates migration scripts at migrations/versions. These scripts are run in order on a database in order to bring it up to the latest version. The OP includes an example of one of these auto-generated migration scripts. In order to add seed data, one must manually modify the appropriate auto-generated migration file. The code I have posted above is an example of that.

What changed?

Very little. You will note that in the new file I am storing the table returned from create_table for list_type in a variable called list_type_table. We then operate on that table using op.bulk_insert to create a few example rows.

You can also use Python’s faker library which may be a bit quicker as you don’t need to come up with any data yourself. One way of configuring it would be to put a method in a class that you wanted to generate data for as shown below.

from extensions import bcrypt, db

class User(db.Model):
    # this config is used by sqlalchemy to store model data in the database
    __tablename__ = 'users'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(150))
    email = db.Column(db.String(100), unique=True)
    password = db.Column(db.String(100))

    def __init__(self, name, email, password, fav_movie):
        self.name = name
        self.email = email
        self.password = password

    @classmethod
    def seed(cls, fake):
        user = User(
            name = fake.name(),
            email = fake.email(),
            password = cls.encrypt_password(fake.password()),
        )
        user.save()

    @staticmethod
    def encrypt_password(password):
        return bcrypt.generate_password_hash(password).decode('utf-8')

    def save(self):
        db.session.add(self)
        db.session.commit()

And then implement a method that calls the seed method which could look something like this:

from faker import Faker
from users.models import User

fake = Faker()
    for _ in range(100):
        User.seed(fake)

If you prefer to have a separate function to seed your data, you could do something like this:

from alembic import op
import sqlalchemy as sa

from models import User

def upgrade():
    op.create_table('users',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('name', sa.String(length=80), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('name')
    )

    # data seed
    seed()


def seed():
    op.bulk_insert(User.__table__,
        [
            {'name': 'user1'},
            {'name': 'user2'},
            ...
        ]
    )

This way, you don’t need to save the return of create_table into a separate variable to then pass it on to bulk_insert.


The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .