8.8: Implementing your database schema (Advanced Internet Programming)

I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

— Linus Torvalds (2006)
comp.version-control.git, 27 July 2006

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

— Frederick Brooks (1975)
The Mythical Man Month

Anxiety is understandable if you do not have much experience with databases. Fortunately, there is no need to worry. If you make a mistake, you aren’t going to hurt anyone or destroy a business. You have the freedom to make changes and evolve your database design as your project evolves.

When I build software, I like to start by designing the extreme ends of the layered architecture, before writing any code:

I sketch rough screen designs on paper (i.e., the output of the presentation layer)
I create sample data using a spreadsheet or text file (i.e., the output of the persistence layer)

These two designs interconnect: the screen designs inform the structure of data that needs to be stored, and the sample data informs what needs to be shown on the screen. Using sketches and spreadsheets makes it easy to experiment and change quickly.

From this starting point, the screen designs guide the creation of components in React/Angular. The sample data inspires the creation of the database schema and persistence layer. Together, these layers provide a framework to add the intermediate domain logic.

When implementing the database, you can use any approach that works for you. I believe that a good way to start is by creating three files:

schema.sql or schema.js
This file contains any required declarations: create table statements in SQL, db.createCollection statements in MongoDB, and any constants.
sample_data.sql or sample_data.js
This file contains statements to insert sample data used during development. When you deploy your system to production, you will not use any of this data.
drop.sql or drop.js
This file drops any of the tables or collections created by schema.sql/schema.js. Note that these statements may need to be in the reverse order to the original declarations in schema.sql.

These files give developers the flexibility to recreate or delete the database when needed, and populate the database with sample data.

These database scripts simplify version tracking in git. The script files are small, easy to read and easy to edit. They avoid the need to commit the large binary database files created by Postgres or MongoDB.

These files accelerate early development: developers can immediately create a clean database by dropping everything (drop.sql/drop.js) and rebuilding the database using the schema (schema.sql/schema.js).

When a project enters production, there are more sophisticated approaches to migrating the database for schema changes. In particular, you may use a strategy such as the following:

Including a table or collection in the database to store a value such as schema_version. You increase this number each time you change the database design. ^[1]
For each version of the schema, create three files:
1. A schema.sql or schema.js file that can recreate the schema on a blank database.
2. An upgrade.sql or upgrade.js file that modifies the previous version of the schema, and its data, to upgrade to the next version.
3. A corresponding downgrade.sql or downgrade.js, which is used in emergencies to reverse the upgrade (e.g., where the upgrade is causing errors).

Schema migration tools (such as Flyway, Liquibase, and a range of other options for npm) can automate the schema migration process. However, I do not recommend schema migration when you are building your first project.

Shopping list database

Database design is an iterative process: the design will evolve as your understanding of a problem domain improves. I’d like to give you an insight into how this could happen when developing the shopping list schema.

When I designed the shopping list schema, I began by creating a sample table that looks like this:

Description	Quantity
Chocolate bars	5
Pasta	1
Eggs	12
Milk	2

I converted the table into a simple schema:

create table shopping_item(
    description text primary key,
    quantity numeric not null
);

Using the Postgres shell (i.e., psql -h localhost postgres), I then inserted data:

insert into shopping_item
    (description, quantity)
values
    ('Chocolate bar', 5),
    ('Pasta', 1),
    ('Eggs', 12),
    ('Milk', 2);

As I started down this path, I wondered what should happen if I enter the same data twice. For example, is the following data valid?

Description	Quantity
Chocolate bars	5
Chocolate bars	15
Chocolate bars	3
Pasta	1
Pasta	1
Eggs	12
Milk	2
Milk	1

This presents a design decision with potential differences in opinion. You might say “no — the database should not allow duplicates”. I decided, in this case, to say “yes — duplicate descriptions are allowed”. Both decisions are appropriate, depending on the design objectives.

However, by deciding to allow duplicates, I needed to evolve the database schema. In a SQL database, the primary key values must be unique. Duplicate descriptions are not valid in a table where description is the primary key. Inserting duplicate values results in a database error.

I changed the database schema to look like this:

create table shopping_item(
    id serial primary key,
    description text not null,
    quantity numeric not null
);

In Postgres, the serial data type is an automatically incrementing integer.

In the future, I expect that this schema will continue to change. For example, it might be necessary to add another column for units (e.g., does “1” milk mean one 600mL carton or 1L?) or to change the type of quantity so that it can store text (e.g., “One 600mL carton”). When I make the system multi-user, I will need to add a user relation and define the relationship between users and their shopping lists.

However, right now, it doesn’t need to be perfect: it just needs to be a rough first draft to begin coding.

1. In MongoDB, you might even store this as a property of individual records in your collection.