CS 312 - Practical DB II

Goals

Learn how to make multiple tables in the database
Learn about implementing join tables and querying them
Learn a little more about Promises and asynchronous code

Background

In the database lecture, I talked about various data relationships, but in our database practical, we used a very basic data model that only had a single table. As the real strength of using a relational database in the first place is being able to do data modeling, I wanted you to have an example that went a little farther.

Prerequisites

Visit the Practical DB II page on Replit (or click the 'Start Project' link for Practical DB II on our team page).
Click through to the GitHub classroom assignment to create your private repository.
Return to the assignment on Replit and go to the Version Control panel.
Click the button to create a new git repository (not the one for connecting to an existing repo) OR type git init in the shell.
Commit the current state of the practical to your repository.
Open the shell and follow the instructions on your GitHub repository for connecting an existing repository:
1. git remote add origin repo-name where repo-name is the name of your repository (e.g., https://github.com/csci312-s21/practical-dbii-ChristopherPAndrews.git)
2. git branch -m main
3. git push -u origin main

Overview

In this practical, we are going to add genres to FilmExplorer. The film objects that we have been passing around have a property called genre_ids, which is an array of numbers. These correspond to a list of genres available through the MovieDB's API. This list is pretty small, and we could just include a lookup table in the client and look up the genres when we display the films. However, in the interest of giving you a data modeling example, we are going to store them in the database.

Data modeling

There is a many-to-many relationship between films and genres: films can have several genres and a genre can contain several films. We will model this in the database with three tables:

Film: this table contains all of the film properties we are used to like id, title, etc... it will not contain any genre information
Genre: this table contains genre ids and their related names
FilmGenre: this table is the join table that ties the other two together. It has two foreign keys that refer to the id fields of the two other tables.

When we pull films out of the database, we will denormalize the data. In other words, we will combine these three tables together to produce film objects that include a list of the genre names associated with the film.

Set up the database

The first step will be the same first step we took before -- invoke npx knex init to create the knexfile.js.

Update the knexfile.js to look like this:


module.exports = {
  development: {
    client: "sqlite3",
    connection: {
      filename: "./filmexplorer.sqlite3",
    },
    useNullAsDefault: true,
  },

  test: {
    client: "sqlite3",
    connection: ":memory:",
    useNullAsDefault: true,
  },

  production: {
    client: "pg",
    connection: process.env.DATABASE_URL,
    migrations: {
      directory: "./migrations"
    },
    seeds: {
      directory: "./seeds"
    },
    ssl: {
      require: true,
      rejectUnauthorized: false
    }
  },
};

Note that this is virtually identical to what we had before, except that we do not have a special directive for handling the seeding for tests (we will just stick with the same seed as we use for development).

The migration

The next step is to create the migration with npx knex migrate:make films. Recall that the purpose of this step is to specify the schema for our tables. Here is the up for the migration:

exports.up = function(knex) {
    return knex.schema.createTable("Film", table => {
        table.integer("id").unique().notNullable();
        table.string("title").unique().notNullable();
        table.text("overview");
        table.string("poster_path").notNullable();
        table.float("vote_average");
        table.integer("rating");
        table.string("release_date");
      })
      .createTable("Genre", table=>{
        table.integer("id").unique().notNullable();
        table.string("name").unique().notNullable();
      })
      .createTable("FilmGenre", table =>{
        table.integer("film_id");
        table.integer("genre_id");
        table.foreign("film_id").references("Film.id").onDelete("CASCADE");
        table.foreign("genre_id").references("Genre.id").onDelete("CASCADE");
      });
};

There are a couple of things worth noting in here. First, we are specifying all three tables in the same migration. Migrations should be organized around functionality in the project. We can create multiple tables by chaining createTable commands together.

The second thing to look at is how we create foreign keys in knex. Note that it takes two directives: one creates the actual column, and the second sets it up as a foreign key. We use references to provide the actual connection. Notice how we use the form TableName.propertyName to make it clear which column in the database we are talking about. I have also included an onDelete directive, which tells the database how deletions from the referenced tables should be handled. I have specified that deletes should "CASCADE", which means that when, for example, a film is deleted, all records in FilmGenre that reference that film should also be deleted.

Of course, we can't forget the down which removes all of the modifications made by this migration. Follow the pattern in the last DB practical and use dropTableIfExists to remove the tables. Like createTable, dropTableIfExists calls can be chained together so you can delete all three tables.

You can now run the migration with npx knex migrate:latest

Seeding the database

Create a new seed file with npx knex seed:make load-films. Open up the new file and add const fs = require("fs"); at the top.

In the previous practical, we just read in the JSON file of data and dumped it right into the database. We could get away with the because the structure of the objects stored in the JSON file was an exact match for the schema of the Article table in the database. We can do the same thing for the Genre table.

Read in the file:

const genreContents = fs.readFileSync("./data/genres.json");

Parse the JSON to get an array of objects:

const genres = JSON.parse(genreContents);

Insert the collection into the genre table (clearing it out first):

await knex("Genre").del();
await knex.batchInsert("Genre", genres, 100);

The Film table is a little harder. If you look closely, you will also find that the schema we just specified for the Film table doesn't match the JSON file structure -- there are a number of properties that we don't need, so I left them off.

Follow the same pattern: read in the films.json file and parse the JSON.

Before we can put it in the database, however, we are going to change the structure. Use map to create a new array of film objects that only have id, title, overview, poster_path, vote_average, release_date, and rating properties. The original films don't have ratings, so you can just set the value of this property to 0.

Once you have the data marshalled into this new form, you can follow the delete and then batch insert pattern to load the films.

The final step is to load the join table. To be able to batch load, we need to have a JavaScript structure that looks like an array of objects, with each object having a film_id property and a genre_id property.

[{film_id:324857, genre_id: 28}, 
{film_id:324857, genre_id: 12}, 
{film_id:324857, genre_id: 16}, 
{film_id:324857, genre_id: 878}, 
...etc]

Create a new variable called genreMap and set it equal to an empty array.

Use forEach to iterate over the films (the unmodified ones that still have a genre_ids property). For each film, iterate over genre_ids and for each one, push a new object with the film id and genre id into genreMap.

Once you are done, use the delete and batch insert pattern one more time.

Now you can run the seed file to load the database with npx knex seed:run.

The more alert among you will have noticed that some snake_case naming has slithered into the code. The naming format comes from the JSON files I pulled down from the MovieDB. For pragmatic reasons I didn't "clean" the data files (I wanted to keep the barrier to updating the data later low). If this was a real application that solely relies on the database, I would have included a transformation to camelCase as part of the seeding and migration process. That would be an example of "bumping the lamp".

Library functions

Now that the database is configured, we can write our accessor functions. I've re-written the routes to make use of three functions: getFilm, getAllFilms, and updateFilmRating. We will deal with each of these in turn.

Get a single film

The query we need for this is pretty much the same a the one we used in Simplepedia to get a single article.

const [film] = await knex("Film").select().where({id:id});
if (film){
    return film;
  }else{
    return null;
  }

The problem here, of course, is that the film doesn't have its genre information yet. There is a way to write a query to do the double join necessary to assemble all of that information together, but to get the genres in a nice array structure requires a non-standard aggregation function (a function that combine multiple rows into one).

In SQLite, that query looks like this:

SELECT film.*, group_concat(name, ", ") 
FROM film 
JOIN filmgenre ON film.id=filmgenre.film_id 
JOIN genre ON genre.id=filmgenre.genre_id 
GROUP BY title ;

Unfortunately, group_concat doesn't exist in Postgres, so if we used this and tried to deploy our code on Heroku with a Postgres database, the query would break. So, we will use a two step process and assemble our final objects in JavaScript.

Fetching genres

You will note that there is an extra function in backend-utils.js called getGenres(id). As you might imagine, it will return a list of the genre names associated with a particular film.

This will be our first JOIN operation as we join together Genre and FilmGenre. A JOIN operation essentially makes a combination table that has all of the columns of both tables. We can add a condition in to specify how to combine the rows from the two tables together. In this case, we want rows to be married when Genre.id is equal to FilmGenre.genre_id.

On top of this, we only want the rows where film_id is equal to id, and we only want the list of genre names.

Putting this all together, we get a query that looks like this:

const genres = await knex.select( "name")
    .from("FilmGenre")
    .join("Genre", "Genre.id", "FilmGenre.genre_id")
    .where({"film_id":id});

This should give us the list of genres that we want. Unfortunately, it has the same issue we had previously when we extracted sections for Simplepedia. The objects in the array that this returns are of the form {name:"Science Fiction"}, but we just want the simple strings.

So, for the last step in this function, use map to distill each object down to a simple string.

When you think you have this, try running the tests to make sure this function is correct (you may want to use test.only to isolate the genres test).

When the test passes, use the function to add the genres to the new film object in getFilm. You should now pass a few more tests. If you run the dev server, you will also be able to access the /api/film/:id endpoint with your browser.

Fetching all films

Fetching all of the films is even more straightforward. It is the same query as fetching the single film, but without the condition:

const films = await knex("Film").select();

Things get more interesting when we try to add the genres in, however.

Here is one approach that doesn't work:

films.forEach(async (film)=>{
  film.genres = await getGenres(id);
});

return films;

It seems really promising. This looks like the logical extension of adding genres to a single film. But it really doesn't work.

The problem is the asynchronous code. The await in there means that we will properly wait for getGenres to complete, BUT only within the asynchronous function that contains the call. The forEach DOES NOT wait. It sees that the function returns a Promise and moves on to the next item in the list. All of those assignment statements are all queued up to happen in the future, but the forEach just cranks on through all of the items in the list, and then the films array is returned BEFORE any of those assignments are made.

There are two solutions to this problem. The simple solution is to use a conventional for loop. Without the extra functions, it will be the execution of getAllFilms that waits, and you will get all of the updates before returning the array. You are welcome to use this approach. If asynchronous code is making you want to cover your ears and sing "la la la" at the top of your voice, use this approach and skip down to "Regardless...".

There is a second way that leans into the asynchronous approach. The async and await syntax is just a slightly nicer way to deal with Promises. We can treat an async function exactly the same as a function that returns a Promise. If we went back to our first attempt using forEach, and swapped the forEach out for map, we would end up with an array of Promises.

An array of Promises may not sound very useful, but we have a function called Promise.all() that is designed for dealing with them. Promise.all() takes in an array of Promises and returns a Promise that resolve when all of the Promises have resolved (or one of them is rejected).

Since this just returns a Promise, we can use await to wait for it to resolve:

await Promise.all(
  films.map(async (film)=>{
    film.genres = await getGenres(film.id);
  }));

The cool thing about this approach is since we aren't waiting for each Promise to resolve in turn, they can complete out of turn if one of them gets hung up a bit of I/O or some other blocking process.

Regardless of which approach you use, you should be able to return the films collection now. Run the tests and you should pass a few more of them. You should also be able to see the entire film collection at /api/films, and you should be able to see the genre list. Even better, you can fire up the main web page and you should see all of the films (with genre list!).

Updating ratings

After all of that, updating the film ratings is a bit of an anti-climax. This is virtually identical to how we updated the articles in Simplepedia. The big difference is that we are only updating the rating property. This is primarily because our application doesn't alter any other parts of films, but also to avoid having to subtract the genres property from the film objects before sending them back to the database.

const count = await knex("Film").where({id:id}).update({rating:rating});

return (count === 1);

With this in place, all of the tests should now pass.

Finishing Up

Add and commit your changes to Github.
Submit your repository to Gradescope

I will expect that this:

Passes all tests
Passes all ESLint checks

Last updated 05/07/2021