Learning SQL the Badass Way (2)

Some vocabulary

This is the second blog post from “Learn SQL the badass way”. I outlined the scope of the blog in the first blog entry where I explained why I set the scope to R users and people with some programming languages. You may not find what you are looking for if you are not familiar with R or if you do not have any other experience to work with data, because I focus on the data management part in this blog.

Before we get in touch with new SQL commands, we have to learn some SQL vocabulary. As in other languages, we have to learn some basic vocabulary to advance our SQL skills. Let’s say we have a small collection of tables about books. In the author table we store information about the book author(e.g. first name, second name, etc); in the book table contains information about each book (e.g. genre etc); and the sales table summarizes the sales data for each book. How do we manage all of these tables and the dependencies in SQL?

Sometimes you encounter a diagram to display how each table is related to each other and we can think of all the independent tables as a collection of tables and we have to figure out how they are related. A entity relationship diagram (ERD) will help you to see the relation of each table, it displays the collection of entities and attributes.

In the SQL world, entities are independent objects. For instance, the book table is a independent object because it exits next to other entities of our collection. Entities have attributes or properties. For instance, the book table contains title, subtitle, book id and more attributes. Thus, entities refer to tables (or in my world data frame) of our collection and attributes refer to columns, or I would say variables, in the table.

Furthermore, we can differentiate between primary and foreign keys in the tables:

  • Primary key: Is a unique indicator that helps us to match tables (e.g. a unique author ID)
  • Foreign keys: Is a primary key that is defined in other table to create a link between tables (Book ID in Book table and the sales table)

We may also differentiate what information a attribute stores. Some common data formats are:

  • Char (for characters)
  • Varchar (for variable character length)
  • Numeric (Integer, time)

Types of SQL Statements

In the SQL world, we can ultimately distinguish between the data definition and manipulation language:

  • Data Definition Language (DDL):
    • Commands from the DDL are used to define, change, or drop tables (database)
    • SQL examples: Create, Alter, Truncate, Drop
  • Data Manipulation Language (DML):
    • DML is used to read and modify data in tables
    • Those operations are sometimes named as CRUD operations and we learned them in the last blog: Create, Read, Update, and Delete rows in a table
    • SQL examples: INSERT, SELECT, UPDATE, DELETE

Now, let’s put some of these concepts into practice:

CREATE

  • You can create new tables with CREATE TABLE command. It works in three steps. You have to provide a name for your table, each column needs a name, and you have to specify which kind of information will be stored (e.g. numerical values, characters) in the column
  • The following command creates a toy table with for petsales with five variables:
CREATE TABLE PETSALE (
    ID INTEGER NOT NULL,
    PET CHAR(20),
    SALEPRICE DECIMAL(6,2),
    PROFIT DECIMAL(6,2),
    SALEDATE DATE
    );

As the output illustrates, we can add options to create the table:

  • The ID variable is an integer that does not accept zero, or in other words: NOT NULL
  • The column PET is generated to store to character string
  • The column SALEDATE stores dates
  • And we could also set a primary key with the clause: PRIMARY KEY
  • A second example
CREATE TABLE PET (
    ID INTEGER NOT NULL,
    ANIMAL VARCHAR(20),
    QUANTITY INTEGER
    );
  • So far, both tables do not contain any values. With INSERT INTO, we fill the table with corresponding values:
INSERT INTO PETSALE VALUES
    (1,'Cat',450.09,100.47,'2018-05-29'),
    (2,'Dog',666.66,150.76,'2018-06-01'),
    (3,'Parrot',50.00,8.9,'2018-06-04'),
    (4,'Hamster',60.60,12,'2018-06-11'),
    (5,'Goldfish',48.48,3.5,'2018-06-14');
  • And for the second table:
INSERT INTO PET VALUES
    (1,'Cat',3),
    (2,'Dog',4),
    (3,'Hamster',2);

As we learned in the last session, we can use SELECT to check whether it worked:

SELECT * FROM PET;
Table 1: 3 records
ID ANIMAL QUANTITY
1 Cat 3
2 Dog 4
3 Hamster 2

ALTER TABLE

  • We use the ALTER TABLE statement to add, delete, or modify columns. For instance:
  • ADD COLUMN, DROP COLUMN; ALTER COLUMN; RENAME COLUMN

First: ADD COLUMN

ALTER TABLE PETSALE
ADD COLUMN QUANTITY INTEGER;
SELECT * FROM PETSALE;
Table 2: 5 records
ID PET SALEPRICE PROFIT SALEDATE QUANTITY
1 Cat 450.09 100.47 2018-05-29 NA
2 Dog 666.66 150.76 2018-06-01 NA
3 Parrot 50.00 8.90 2018-06-04 NA
4 Hamster 60.60 12.00 2018-06-11 NA
5 Goldfish 48.48 3.50 2018-06-14 NA
  • Again, fill in your values
UPDATE PETSALE SET QUANTITY = 9 WHERE ID = 1;
UPDATE PETSALE SET QUANTITY = 24 WHERE ID = 5;
  • Check whether it worked
SELECT * FROM PETSALE;
Table 3: 5 records
ID PET SALEPRICE PROFIT SALEDATE QUANTITY
1 Cat 450.09 100.47 2018-05-29 9
2 Dog 666.66 150.76 2018-06-01 NA
3 Parrot 50.00 8.90 2018-06-04 NA
4 Hamster 60.60 12.00 2018-06-11 NA
5 Goldfish 48.48 3.50 2018-06-14 24

Second: DROP COLUMN

ALTER TABLE PETSALE
DROP COLUMN PROFIT;
SELECT * FROM PETSALE;
Table 4: 5 records
ID PET SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 24

Third: ALTER COLUMN

  • We can change the data type, for instance, to increase the length of a character variable to VARCHAR(20) with ALTER COLUMN
ALTER TABLE PETSALE
ALTER COLUMN PET SET DATA TYPE VARCHAR(20);
SELECT * FROM PETSALE;
Table 5: 5 records
ID PET SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 24

Forth: RENAME COLUMN

  • Use RENAME COLUMN to change to a new name:
ALTER TABLE PETSALE
RENAME COLUMN PET TO ANIMAL;
SELECT * FROM PETSALE;
Table 6: 5 records
ID ANIMAL SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 24

Truncate

  • The TRUNCATE statement will remove all(!) rows from an existing table, just like the one we created in the beginning, however, it does not delete the table itself.
TRUNCATE TABLE PET IMMEDIATE;

Caution: DROP TABLE tablename; drops the entire table!

DROP TABLE PETSALE;
Edgar Treischl
Edgar Treischl
Senior Research Fellow

My research interests include quantitative methods, evaluation, causality and so much more. matter.

Related