coursera-data-engineering

Interacting With a Relational Database Using SQL

As a Data Engineer, you want to help the organization find valuable insights through data. To do so, you should learn the basics of querying and manipulating data with SQL.

In this lab, you will be working with a DVD rental sample relational database where you will find information about stores, customers, and rented films. Below you will find the diagram showing the relationships between the tables, and a brief description of each table.

The purpose of this lab is to use SQL language to query the database in order to answer business questions. As you complete each exercise, you should check the output you get from each exercise cell block with the expected output. If you get stuck, you can always check the solution notebook.

To open the solution notebook, follow these steps:

Table of Contents

1 - Database

You will be working with a modified version of the Sakila Sample Database, which is licensed under the New BSD license.

For learning purposes, let’s assume that the data belongs to Rentio, which is a fictitious company dedicated to renting movies to clients from all around the world.

In the database, you will be able to find the data of the stores and the staff who work in them as well as their addresses. Each store manages its inventory, so when a store receives a new DVD, information about the film, category, language and actors is inserted into the database. Also, every time a new customer rents a film, the customer’s basic information is inserted into the database along with their address. Additionally, a rental is added as soon as a transaction occurs with information about inventory, film, and paying customers.

Throughout the exercises in this lab, you will work with the various tables in this database, helping the business understand the most rented films by different attributes.

Rentio’s transactional database includes the following tables.

Here you can find the entity-relationship model (ERM) of the transactional database showing all the tables and how they are related:

rentio-database-erd

Note: As you’re scrolling down to do the SQL exercises, you can split this editor into two to keep the database diagram next to you.

2 - Running SQL Commands in a Notebook

In this lab, you will leverage the use of SQL “magic” (%sql or %%sql) offered by the ipython-sql extension, which allows you to connect to a database and issue SQL commands interactively within a JupyterLab notebook. JupyterLab defines “magic” as special commands prefixed with %. Here, you’ll employ the load_ext magic to load the ipython-sql extension. Load the SQL module:

%load_ext sql

The provided magic command loads the ipython-sql extension, enabling connection to databases supported by SQLAlchemy. Before you start issuing SQL queries to the database, you need first to establish a connection to the MYSQL database. For that, you need information such as the database username, password, hostname (or address), port number and the database name. This information is provided to you and stored in the environmental variables defined in the src/env file. Run the following cell to load the values for these variables, and establish a connection to the MYSQL database.

import os 
import socket

from dotenv import load_dotenv

load_dotenv('./src/env', override=True)

DBHOST = socket.gethostname()
DBPORT = os.getenv('DBPORT')
DBNAME = os.getenv('DBNAME')
DBUSER = os.getenv('DBUSER')
DBPASSWORD = os.getenv('DBPASSWORD')

connection_url = f'mysql+pymysql://{DBUSER}:{DBPASSWORD}@{DBHOST}:{DBPORT}/{DBNAME}'

%sql {connection_url}

3 - Create, Read, Update, and Delete (CRUD) Operations

CRUD stands for Create, Read, Update, and Delete, which are basic operations for manipulating data. When we talk about databases, we use INSERT INTO, SELECT, UPDATE, and DELETE statements respectively to refer to CRUD operations.

3.1 - CREATE TABLE

Before using the statements for CRUD operations, you will see the CREATE TABLE statement which is used to create a new table in a database. You must specify the name of the columns, and the data type for each column. You can check the full list of data types here.

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
   ...
);

Exercise 1

Write a SQL query to create a replica of the category table called category_copy. Use these columns:

column name data type
category_id INTEGER
name VARCHAR(25)
last_update TIMESTAMP

VARCHAR(25) means a string of variable character length that can contain up to 25 characters in it.

Note: You will be using the magic command %%sql to allow multiline query syntax.

Note: Replace all None in the provided code.

%%sql 
CREATE TABLE category_copy (
    category_id INTEGER,
    name VARCHAR(25),
    last_update TIMESTAMP
);

3.2 - SELECT

The SELECT statement is used to get data from a database. It also goes along with the FROM clause to indicate the table you want to query.

You could specify the columns of the table you want to retrieve from the query by listing each one as follows:

SELECT
    column1,
    column2,
    column3,
    ...
FROM table_name;

What’s more, you could use * to get all the columns from the table:

SELECT
    *
FROM table_name;

The LIMIT clause is used to limit the number of rows the query is returning.

SELECT
    *
FROM table1
LIMIT 1;

Exercise 2

Write a SQL query to retrieve the title, length, and release year of the film table. Limit output to only 4 records.

%%sql
SELECT title, length, release_year
FROM film
LIMIT 4;
Expected Output

Note: The order and actual values in the records may change.

title length release_year
ACADEMY DINOSAUR 86 2006
ACE GOLDFINGER 48 2006
ADAPTATION HOLES 50 2006
AFFAIR PREJUDICE 117 2006

Exercise 3

Write an SQL query to get all the columns of the store table.

%%sql
SELECT *
FROM store;
Expected Output

Note: The order and some details of the records may change.

store_id manager_staff_id address_id last_update
1 1 1 2006-02-15 09:57:12.000
2 2 2 2006-02-15 09:57:12.000

3.3 - WHERE

The WHERE clause is used to filter data based on a condition. In the end, the query will return the rows which satisfy the condition.

SELECT
    *
FROM table_name
WHERE column1 = 'value1';

Exercise 4

Write an SQL query to retrieve the first name, last name, and email of each active staff member from the staff table.

%%sql
SELECT first_name, last_name, email
FROM staff
WHERE active;
Expected Output

Note: The order of the records may change.

first_name last_name email
Mike Hillyer Mike.Hillyer@sakilastaff.com
Jon Stephens Jon.Stephens@sakilastaff.com

3.4 - INSERT INTO

The INSERT INTO statement is used to insert new rows in a table.

You could insert new rows without specifying some columns, but you will have to write the column names and values that you want to insert. That’s useful when some columns are filled automatically by the default value of the column or when the column is of SERIAL data type.

INSERT INTO table_name (
  column1,
  column2,
  column3,
  ...
)
VALUES (
  'value1',
  'value2',
  'value3',
  ...
);

If you are adding a row to a table and will be specifying all the column values, then you don’t need to specify the column names and can just specify the values to be inserted.

INSERT INTO table_name
VALUES (
  'value1',
  'value2',
  'value3',
  ...
);

Exercise 5

Write an SQL query to insert the following rows to the category_copy table:

category_id name last_update
1 Horror 2006-02-15 09:46:27.000
10 Animation 2006-02-15 09:46:27.000
20 Pop 2006-02-15 09:46:27.000
%%sql
INSERT INTO category_copy 
VALUES 
(1, 'Horror', '2006-02-15 09:46:27.000'),
(10, 'Animation', '2006-02-15 09:46:27.000'),
(20, 'Pop', '2006-02-15 09:46:27.000');

Execute the SELECT statement on the table to check that the values were successfully added:

%%sql
SELECT *
FROM category_copy;
Expected Output

Note: The order of the records may change.

category_id name last_update
1 Horror 2006-02-15 09:46:27.000
10 Animation 2006-02-15 09:46:27.000
20 Pop 2006-02-15 09:46:27.000

3.5 - UPDATE

The UPDATE statement is used to change the values of some columns on existing rows in a table. You could use the WHERE clause to filter the rows you want to change.

UPDATE table_name
SET
  column2 = 'value2',
  column3 = 'value3',
...
WHERE column1 = 'value1';

Exercise 6

Write an SQL query to perform the following changes in the category_copy table:

You can add more cells for each query if you want.

%%sql
UPDATE category_copy 
SET last_update = '2020-09-12 08:00:00.000';

UPDATE category_copy 
SET category_id = 2
WHERE name = 'Animation';

UPDATE category_copy 
SET name = 'Action'
WHERE category_id = 1;

Execute the SELECT statement on the table to check that the values were successfully updated:

%%sql
SELECT *
FROM category_copy;
Expected Output

Note: The order of the records may change.

category_id name last_update
1 Action 2020-09-12 08:00:00.000
2 Animation 2020-09-12 08:00:00.000
20 Pop 2020-09-12 08:00:00.000

3.6 - DELETE

The DELETE statement is used to delete existing rows in a table. It also goes along with the FROM clause to indicate the table where you want to delete the rows. You could use the WHERE clause to filter the rows you want to delete.

You have to be careful because you will delete all rows of a table if you don’t specify a condition:

DELETE FROM table_name;

That’s why you should add a condition unless you want to delete all rows:

DELETE FROM table_name
WHERE column1 = 'value1';

Exercise 7

Write an SQL query to delete the row where the category_id is 20 from the category_copy table.

%%sql
DELETE FROM category_copy 
WHERE category_id = 20;

Execute the SELECT statement on the table to check that the row were successfully deleted:

%%sql
SELECT *
FROM category_copy;
Expected Output

Note: The order of the records may change.

category_id name last_update
1 Action 2020-09-12 08:00:00.000
2 Animation 2020-09-12 08:00:00.000

4 - SQL Clauses

In the next sections, you will see how to create more complex SQL queries to read data from a database using the most common clauses.

4.1 - ALIASES

Aliases temporarily change the name of a column. They allow you to use and display a more readable name for the columns. To create an alias you could use the keywords AS next to a column to change its name. It is optional and mainly serves to improve clarity.

SELECT
    column1 AS Alias_Column_1,
    column2 AS Alias_Column_2,
    column3 AS Alias_Column_3,
    ...
FROM table_name;

Exercise 8

Write an SQL query to obtain the title, length, and release year of the film table. Change column names to have film_ as a prefix.

%%sql
SELECT 
    title AS film_title, 
    length AS film_length,
    release_year AS film_release_year
FROM film
LIMIT 10;
Expected Output

Note: Not all of the records are shown here. Their order may change.

film_title film_length film_release_year
ACADEMY DINOSAUR 86 2006
ACE GOLDFINGER 48 2006
ADAPTATION HOLES 50 2006
AFFAIR PREJUDICE 117 2006

4.2 - JOIN

The JOIN clause is used to combine the data from multiple tables based on a shared column between the tables.

SELECT
    *
FROM table1
JOIN table2 ON table1.column1 = table2.column2;

By default, the JOIN clause is equivalent to INNER JOIN, which combines rows from two tables based on a related column between them, returning only the rows with matching values in the specified column in both tables. There are also other types of joins:

Exercise 9

Write an SQL query to get the city, address and district for each address in the table address.

%%sql
SELECT city, address, district
FROM address
JOIN city ON address.city_id = city.city_id
LIMIT 5;
Expected Output

Note: Not all of the records are shown here. Their order and values may change.

city address district
Abha 733 Mandaluyong Place Asir
Abu Dhabi 535 Ahmadnagar Manor Abu Dhabi

Exercise 10

Write an SQL query to obtain the country, city, and address of the active staff members.

%%sql
SELECT country.country, city.city, address.address
FROM address
JOIN staff ON staff.address_id = address.address_id
JOIN city ON address.city_id = city.city_id
JOIN country ON city.country_id = country.country_id
WHERE staff.active;

Note: The order and actual values in the records may change.

country city address
Canada Lethbridge 23 Workhaven Lane
Australia Woodridge 1411 Lillydale Drive

4.3 - GROUP BY

The GROUP BY SQL statement groups rows that have the same values in specified columns into summary rows. It is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to perform calculations on each group of data. So you can use it to help answer questions like “What is the average film length by film release year?”.

SELECT
    column1,
    COUNT(column2),
    SUM(column3)
FROM table1
GROUP BY column1;

Exercise 11

Write an SQL query to retrieve the number of films by rating in the film table.

%%sql
SELECT rating, COUNT(film_id) AS films_count
FROM film
GROUP BY rating;

Note: The order and actual values in the records may change.

rating films_count
PG 194
G 178
NC-17 210
PG-13 223
R 195

4.4 - ORDER BY

The ORDER BY clause is used to sort the rows in ascending or descending order based on one or more columns. By default, this clause will sort the rows in ascending order, but you could use the DESC keyword to order the rows in descending order.

SELECT
    *
FROM table1
ORDER BY column1 DESC;

Exercise 12

Write an SQL query to get the number of films by category. Sort the results by the number of films in ascending order.

%%sql
SELECT category.name AS category, COUNT(film.film_id) AS films_count
FROM film
JOIN film_category ON film.film_id = film_category.film_id
JOIN category ON film_category.category_id = category.category_id
GROUP BY category.name
ORDER BY films_count;

Note: Not all of the records are shown here. Their order and values may change.

category films_count
Music 51
Horror 56
Classics 57

During this lab, you’ve written several SQL queries to manipulate data from the DVD rental sample database. First, you created queries to gather general information relevant to the business. Then, you built queries using more complex clauses to perform data transformation while combining multiple tables. Finally, you performed other basic CRUD operations. Congratulations! You now have the basic knowledge to process data using SQL.

References

[1] SQL Tutorial, W3schools, 2022. [Online]. Available: https://www.w3schools.com/sql/. [Accessed: 07- Mar- 2022]