Python for data analysis

Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.

This is an introduction to Python designed for participants with no programming experience. The lesson is based on the Data Carpentry Python for Ecologists lesson and has been adapted to use a different dataset. It can be taught in 1 to 1.5 days (about 8 to 9 hours). It starts with basic information about Python syntax, Anaconda Navigator, and the Spyder IDE, then moves through importing CSV files, using the pandas package to work with DataFrames, calculating summary information from a DataFrame, manipulating DataFrames, automating steps with loops and functions, and a brief introduction to plotting. The last episode demonstrates how to work with databases directly from Python.
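As a rough preview of the kind of workflow the lesson builds toward, here is a minimal sketch. It is not taken from the lesson itself: the sample data, the column names (site, year, n_records), the table name (samples), and the helper total_for_year are all invented for illustration, and the lesson uses its own dataset.

```python
import io
import sqlite3

import pandas as pd

# Invented sample data standing in for a CSV file on disk (hypothetical columns).
csv_text = """site,year,n_records
A,2020,4
A,2021,6
B,2020,10
B,2021,20
"""

# Import the CSV into a pandas DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Summary information: mean number of records per site.
mean_by_site = df.groupby("site")["n_records"].mean()


def total_for_year(data, year):
    """Return the summed n_records for one year (hypothetical helper)."""
    return int(data.loc[data["year"] == year, "n_records"].sum())


# Automate a repeated step with a loop and a function.
totals = {}
for year in sorted(df["year"].unique()):
    totals[int(year)] = total_for_year(df, year)

# Work with a database from Python: write the table to an in-memory
# SQLite database and query it back into a DataFrame.
conn = sqlite3.connect(":memory:")
df.to_sql("samples", conn, index=False)
from_db = pd.read_sql_query(
    "SELECT site, SUM(n_records) AS total FROM samples GROUP BY site ORDER BY site",
    conn,
)
conn.close()

print(mean_by_site)
print(totals)
print(from_db)
```

Each step here (reading data, summarizing, automating, querying a database) corresponds to a topic that the episodes cover in much more detail.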

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.

Prerequisites

This lesson requires a working copy of Python.
To get the most out of these materials, please install everything before working through this lesson.

Schedule

Setup
    Download files required for the lesson
00:00  1. Before we start
    What is Python?
    Why should I learn Python?
00:20  2. Introduction to Python
    How can I do basic calculations in Python?
    How can I use functions in Python?
    What data types are there?
01:30  3. Starting With Data
    How can I import data in Python?
    What is pandas?
    Why should I use pandas to work with data?
03:00  4. Indexing, Slicing and Subsetting DataFrames in Python
    How can I access specific data within my data set?
    How can Python and pandas help me to analyse my data?
04:00  5. Manipulating DataFrames with pandas
    How can I reshape DataFrames and make tables?
    Can I work with data from multiple sources?
    How can I combine data from different data sets?
05:00  6. Data workflows and automation
    How can I automate operations in Python?
    What are functions and why should I use them?
06:30  7. Making Plots With Matplotlib
    How can I visualize data in Python?
08:00  8. Accessing SQLite Databases Using Python & Pandas
    How can we query databases from within Python?
08:45  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.