SQL for Data Analysis: Instructor Notes

Learning objectives

Data

As for all of the Data Carpentry ecology lessons, this lesson uses the Portal Project Teaching Database. The data is available at http://dx.doi.org/10.6084/m9.figshare.1314459 and the download includes a SQLite database file (portal_mammals.sqlite) as well as three .csv files (species.csv, plots.csv, surveys.csv) that can be imported into SQLite.

Note that the figshare download is an archive (.zip) file that rudely explodes all of the files into your current directory.

Instructor’s setup notes

By default SQLite Manager opens in a separate window and it is not possible to zoom in to enlarge the font so that it is more readable, especially for students in the back rows.

The way to fix this is to:

  1. Open the SQLite Manager
  2. Click on the OptionsOptions button .
  3. Chose Start SQLite Manager: in a new tab.

You can then use Ctrl - + to zoom just like any other web page.

Motivation and Framing

See this slide deck as a sample intro for the lesson: SQL Intro Deck

Key points:

If you’ve written up a diagram of the data analysis pipeline (raw data -> clean data -> import and analyze -> results -> visualization), it can be helpful to identify that you’re now somewhere between clean data and analysis.

Lesson outline

00-sql-introduction

Tips

00-supplement-database-design.md

(optional) The first lesson includes a brief introduction to data design and choosing database systems. This material expands on the database design in the first section.

01-sql-basic-queries

02-sql-aggregation

03-sql-joins-aliases

Alternative activities

Queries on the board

As you teach the lesson, it can be helpful to pause and write up the query keywords on the board. This could look like this:

FROM table

WHERE (conditional statement, applies to row values) (AND/OR)

ORDER BY column/FUNCTION(column) (ASC/DESC)


* After 02-sql-aggregation

SELECT column FUNCTION(column) AGGREGATE_FUNCTION(column) FROM table

WHERE (conditional statement, applies to row values) (AND/OR) (IS (NOT) NULL) GROUP BY column HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC)


* After 03-sql-joins-aliases

SELECT column FUNCTION(column) AGGREGATE_FUNCTION(column) FROM table JOIN table ON table.col = table.col WHERE (conditional statement, applies to row values) (AND/OR) (IS (NOT) NULL) GROUP BY column HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC)


As a bonus, if you can leave this on the board, it translates nicely into 
the `dplyr` portion of the `R` lesson, i.e.: 

SQL: dplyr:

SELECT column select(col) FUNCTION(column) mutate(col = fcn(col)) AGGREGATE_FUNCTION(column) summarize(col = fcn(col)) FROM table JOIN table ON table.col = table.col WHERE (conditional statement, applies to row values) filter(condition) (AND/OR) (IS (NOT) NULL) is.na() GROUP BY column group_by(col) HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC) arrange() ~~~

“Interactive” database

If you want to try something more active (esp. if you’re teaching SQL in the afternoon!), this is a an interactive activity to try.