Before we start
|
Python is an open source and platform independent programming language
the SciPy ecosystem for Python provides the tools necessary for scientific computing
Spyder is a great IDE to code in and interact with Python
with its large community it is easy to find help on the internet
|
Introduction to Python
|
the console works like a fancy calculator
naming variables in Python should be consistent; there are style guides, such as PEP 8, that can be followed
assigning a value to one variable does not change the values of other variables
functions take zero or more arguments, some of which can be optional
modules extend the functionality of Python
Python has numerical, text and logical data types
lists and numpy arrays are versatile data structures in Python
subsetting uses square brackets and indexing starts at 0
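The points above can be sketched in a few lines. This is only an illustration: the variable names and values are made up, and round() and math.sqrt() simply stand in for "a function with an optional argument" and "a function from a module".

```python
import math

import numpy as np

# Assigning a new value to x does not change y, which kept the old value.
x = 5
y = x
x = 10

# round() takes an optional second argument (the number of digits).
rounded_default = round(3.14159)
rounded_two = round(3.14159, 2)

# Modules add functionality, here the math module from the standard library.
root = math.sqrt(16)

# Numerical, text and logical data types.
type_names = [type(v).__name__ for v in (1, 1.5, "text", True)]

# Lists and numpy arrays are subset with square brackets; indexing starts at 0.
numbers = [10, 20, 30, 40]
first = numbers[0]
middle = numbers[1:3]        # the stop index 3 is not included

# Arithmetic on a numpy array is applied element by element.
arr = np.array(numbers)
doubled = arr * 2
```

Printing any of these names in the console shows its value, for example print(middle) or print(doubled).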
|
Starting With Data
|
pd.read_csv() is used to import tabular data into Python
dataframes can be inspected with the attributes .dtypes and .shape and the methods .head() and .tail()
individual columns from a dataframe can be chosen using ['ColumnName']
methods for basic statistics on dataframes are .describe(), .min(), .max(), .mean(), …
.groupby() can be used to group a dataframe by categories
the datetime module offers functionality for working with dates and times
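A minimal sketch of these steps follows. The table is hypothetical and is read from an in-memory string so that the example runs without a file on disk; with real data, a file path would be passed to pd.read_csv() instead.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical data, inlined only for illustration.
csv_text = """site,date,temperature
A,2020-01-01,3.5
A,2020-01-02,4.5
B,2020-01-01,1.0
B,2020-01-02,2.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the dataframe: .dtypes and .shape are attributes (no parentheses),
# .head() and .tail() are methods.
print(df.dtypes)
print(df.shape)
print(df.head(2))
print(df.tail(2))

# Choose one column and compute basic statistics on it.
temps = df['temperature']
mean_temp = temps.mean()
print(temps.describe())

# Group by a categorical column and summarise each group.
site_means = df.groupby('site')['temperature'].mean()

# Turn a date string into a datetime object.
first_day = datetime.strptime(df['date'].iloc[0], '%Y-%m-%d')
```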
|
Indexing, Slicing and Subsetting DataFrames in Python
|
use column labels in [] to access individual columns
indexing starts at 0; when choosing a range by position, the stop bound is one step BEYOND the last row you want to select
using the = operator, as in y = x, does not create a copy of x; instead y refers to the same object as x. The .copy() method creates a true copy
use label based loc and integer position based iloc for subsetting rows and columns in dataframes
we can also use criteria with ==, >, <, != etc. in subsetting
missing values in the form of NaN can be dropped with .dropna()
.to_csv() saves a dataframe as a csv file
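The following sketch illustrates these points on a small, made-up dataframe (the column names, row labels and values are assumptions for the example). Note that label slices with loc include the stop label, unlike position slices with iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with explicit row labels and one missing value.
df = pd.DataFrame(
    {"species": ["a", "b", "c", "d"], "weight": [10.0, np.nan, 30.0, 40.0]},
    index=["r0", "r1", "r2", "r3"],
)

# = only gives the same object a second name; .copy() makes a true copy.
same = df
independent = df.copy()
df.loc["r0", "weight"] = 11.0   # visible through `same`, not in `independent`

# Access a single column by its label.
weights = df["weight"]

# Position based: rows 0 and 1, the stop bound 2 is excluded.
first_two = df.iloc[0:2]

# Label based: the stop label "r2" is included.
by_label = df.loc["r0":"r2", "weight"]

# Subset with a criterion (a comparison with NaN is False).
heavy = df[df["weight"] > 20]

# Drop rows that contain missing values.
complete = df.dropna()

# Without a path, .to_csv() returns the csv text; pass a file name to write a file.
csv_text = complete.to_csv()
```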
|
Manipulating DataFrames with pandas
|
the method .pivot_table() creates pivot tables or can just be used to reshape data from long to wide format
the method .melt() brings data back to long format
the function pd.concat() can be used to concatenate/stack two DataFrames
axis=0 stacks vertically (row-wise) and axis=1 horizontally (column-wise)
the function pd.merge() can be used to join two DataFrames and requires joining keys
pandas can perform inner joins (the default in merge()), left joins, right joins and full outer joins
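As a rough illustration of reshaping, concatenating and joining, the sketch below uses two small made-up tables; all column names and values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical long-format data: one row per site and year.
long_df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "count": [5, 7, 2, 4],
})

# Long to wide: one row per site, one column per year.
# pivot_table() aggregates with the mean by default.
wide = long_df.pivot_table(index="site", columns="year", values="count")

# Wide back to long.
back_to_long = wide.reset_index().melt(
    id_vars="site", var_name="year", value_name="count"
)

# Concatenate: axis=0 stacks rows, axis=1 places frames side by side.
top = long_df.iloc[:2]
bottom = long_df.iloc[2:]
stacked = pd.concat([top, bottom], axis=0)
side_by_side = pd.concat(
    [top.reset_index(drop=True), bottom.reset_index(drop=True)], axis=1
)

# Join on a key column; site "B" has no match and site "C" has no counts.
sites = pd.DataFrame({"site": ["A", "C"], "region": ["north", "south"]})
inner = pd.merge(long_df, sites, on="site")               # default: inner join
left = pd.merge(long_df, sites, on="site", how="left")
outer = pd.merge(long_df, sites, on="site", how="outer")  # full join
```

The inner join keeps only rows whose key appears in both tables, while the left and outer joins fill the missing side with NaN.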
|
Data workflows and automation
|
|
Making Plots With Matplotlib
|
|
Accessing SQLite Databases Using Python & Pandas
|
|