Introduction to Python
|
Python is an interpreted language
The REPL (Read-Eval-Print loop) allows rapid development and testing of code segments
Jupyter notebooks build on the REPL concept and allow code, results and documentation to be maintained together and shared
The Jupyter notebook is a complete IDE (Integrated Development Environment)
|
Python basics
|
The Jupyter environment can be used to write code segments and display results
Data types in Python are implicit: the type of a variable is determined by the value assigned to it
Basic data types are Integer, Float, String and Boolean
Lists and Dictionaries are structured data types
Arithmetic uses the standard arithmetic operators; precedence can be changed using brackets (parentheses)
Help is available for built-in functions using the help() function; further help and code examples are available online
In Jupyter you can get help on function parameters using shift+tab
Many functions are in fact methods associated with specific object types
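The points above can be illustrated with a short sketch. The variable names and values are made up for illustration only.

```python
# Types are inferred from the assigned value; no declarations are needed.
count = 3            # int
price = 2.5          # float
name = "sensor"      # str
active = True        # bool

readings = [1.5, 2.0, 2.5]               # list: an ordered collection of values
site = {"id": "A1", "elevation": 120}    # dict: key/value pairs

# Brackets override the default operator precedence.
without_brackets = 2 + 3 * 4      # multiplication is done first
with_brackets = (2 + 3) * 4       # addition is done first

# Many functions are methods attached to a specific object type.
shouting = name.upper()           # a str method
readings.append(3.0)              # a list method

print(type(count).__name__, type(price).__name__,
      type(name).__name__, type(active).__name__)
print(without_brackets, with_brackets, shouting, readings)
```

Calling help(len), or help(str.upper) for a method, prints the built-in documentation for that function.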
|
Python control structures
|
Most programs will require ‘Loops’ and ‘Branching’ constructs.
The if, elif and else statements allow for branching in code.
The for and while statements allow for looping through sections of code
The programmer must provide a condition to end a while loop.
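A minimal sketch of the constructs listed above. The function name classify and the temperature thresholds are hypothetical, chosen only to give the branches something to test.

```python
def classify(temperature):
    """Branch on a value using if / elif / else."""
    if temperature < 0:
        return "freezing"
    elif temperature < 20:
        return "mild"
    else:
        return "warm"

# A for loop visits each item of a sequence in turn.
labels = []
for t in [-5, 10, 30]:
    labels.append(classify(t))

# A while loop repeats until its condition becomes False,
# so the body must change something the condition depends on.
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1
    steps += 1

print(labels, steps)
```

If the line countdown -= 1 were left out, the condition would never become False and the loop would run forever.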
|
Creating re-usable code
|
Functions are used to create re-usable sections of code
Using parameters with functions makes them more flexible
You can use functions written by others by importing the libraries containing them into your code
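As a small sketch of these ideas, the hypothetical function circle_area below takes one required parameter and one optional parameter, and relies on the standard math library for the value of pi.

```python
import math  # importing a library makes the functions and values it contains available

def circle_area(radius, precision=2):
    """Return the area of a circle, rounded to `precision` decimal places."""
    return round(math.pi * radius ** 2, precision)

# The same function is re-used with different arguments.
area_default = circle_area(1.0)                # uses the default precision
area_precise = circle_area(1.0, precision=4)   # overrides the default

print(area_default, area_precise)
```

Giving precision a default value keeps the common case short while still allowing callers to change the behaviour.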
|
Processing data from a file
|
Reading data from files is far more common than program ‘input’ requests or hard coding values
Python provides simple means of reading from a text file and writing to a text file
Tabular data is commonly recorded in a ‘csv’ file
Text files like csv files can be thought of as being a list of strings. Each string is a complete record
You can read and write a file one record at a time
Python has built-in functions and string methods, such as split(), to parse (split up) records into individual tokens
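The sketch below writes a small csv file and reads it back one record at a time. The file name, column names and values are invented for illustration, and the file is created in a temporary directory so the example is self-contained.

```python
import os
import tempfile

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "readings.csv")

# Write the file one record (line) at a time.
with open(path, "w") as out_file:
    out_file.write("site,temperature\n")
    out_file.write("A1,12.5\n")
    out_file.write("B2,15.0\n")

# Read it back one record at a time and split each record into tokens.
temperatures = {}
with open(path) as in_file:
    header = in_file.readline().rstrip("\n").split(",")
    for line in in_file:
        site, temperature = line.rstrip("\n").split(",")
        temperatures[site] = float(temperature)   # tokens are strings until converted

print(header, temperatures)
```

Splitting on a comma is enough for simple files like this one; values that themselves contain commas or quotes are better handled with the standard csv module.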
|
Dates and Time
|
Date and Time functions in Python come from the datetime library, which needs to be imported
You can use format strings to have dates/times displayed in any representation you like
Internally, dates and times are stored in special data structures which allow you to access their component parts
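A minimal sketch of parsing and formatting with the datetime library. The date used here is an arbitrary example value.

```python
from datetime import datetime

# Parse a string into a datetime object using a format string.
moment = datetime.strptime("2017-03-21 14:30:00", "%Y-%m-%d %H:%M:%S")

# The component parts are available as attributes.
year, month, hour = moment.year, moment.month, moment.hour

# Format strings control how the value is displayed.
day_first = moment.strftime("%d/%m/%Y")
time_only = moment.strftime("%H:%M")

print(year, month, hour, day_first, time_only)
```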
|
Processing JSON data
|
JSON is a popular format for transferring data and is used by a great many web-based APIs
The JSON data format is very similar to the Python Dictionary structure.
The complex structure of a JSON document means that it cannot easily be ‘flattened’ into tabular data
We can use Python code to extract values of interest and place them in a csv file
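The following sketch shows one way to pull selected values out of nested JSON and write them as csv. The document contents and field names are invented, and the csv is written to an in-memory buffer rather than a file to keep the example self-contained.

```python
import csv
import io
import json

# A small, made-up JSON document with nested structure.
document = '''
[
  {"id": 1, "user": {"name": "Ana", "location": {"city": "Lima"}}, "tags": ["a", "b"]},
  {"id": 2, "user": {"name": "Ben", "location": {"city": "Oslo"}}, "tags": []}
]
'''

# json.loads turns the text into Python lists and dictionaries.
records = json.loads(document)

# Extract only the values of interest and write one csv row per record.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "name", "city", "tag_count"])
for record in records:
    writer.writerow([
        record["id"],
        record["user"]["name"],
        record["user"]["location"]["city"],
        len(record["tags"]),          # a list is summarised, not flattened
    ])

csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file, the buffer can be replaced with a file opened using open(filename, "w", newline="").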
|
Reading data from a file using Pandas
|
pandas is a Python library containing functions and data structures to assist in data analysis
pandas data structures are the Series (like a vector) and the Dataframe (like a table)
The pandas read_csv function allows you to read an entire csv file into a Dataframe
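A minimal sketch, assuming pandas is installed. The csv text is made up and is read from an in-memory buffer; passing a file name to read_csv works the same way.

```python
import io

import pandas as pd

# Made-up csv content standing in for a file on disk.
csv_text = "site,temperature\nA1,12.5\nB2,15.0\n"

# read_csv loads the whole file into a Dataframe in one call.
df = pd.read_csv(io.StringIO(csv_text))

# A single column of a Dataframe is a Series.
temperature = df["temperature"]

print(df.shape)
print(temperature.mean())
```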
|
Extracting rows and columns
|
|
Data Aggregation using Pandas
|
Summarising numerical and categorical variables is a very common requirement
Missing data can interfere with how statistical summaries are calculated
Missing data can be replaced or created, depending on requirements
Summarising or aggregating can be done over single or multiple variables at the same time
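The sketch below, assuming pandas is installed, shows how a missing value changes a summary and how aggregation over groups can be done. The column names and values are invented for illustration.

```python
import pandas as pd

# Made-up data with one missing value in the numerical column.
df = pd.DataFrame({
    "village": ["A", "A", "B", "B"],
    "rooms": [2.0, None, 3.0, 5.0],
})

# By default the missing value is skipped: (2 + 3 + 5) / 3
mean_skipping_missing = df["rooms"].mean()

# Replacing it with 0 changes the result: (2 + 0 + 3 + 5) / 4
mean_after_fill = df["rooms"].fillna(0).mean()

# Summarise a categorical variable by counting each value.
village_counts = df["village"].value_counts()

# Aggregate a numerical variable per group, with several statistics at once.
by_village = df.groupby("village")["rooms"].agg(["count", "mean"])

print(mean_skipping_missing, mean_after_fill)
print(by_village)
```

Whether filling with 0 is appropriate depends on what a missing value means in the data; it is used here only to show that the choice affects the summary.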
|
Joining Pandas Dataframes
|
You can join pandas Dataframes in much the same way as you join tables in SQL
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other.
concat() can also combine Dataframes by columns, but the merge() function is the preferred way to do this
The merge() function is equivalent to the SQL JOIN clause. ‘left’, ‘right’ and ‘inner’ joins are all possible.
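A short sketch of both functions, assuming pandas is installed. The Dataframes and the key column id are made up for illustration.

```python
import pandas as pd

first = pd.DataFrame({"id": [1, 2], "village": ["A", "B"]})
second = pd.DataFrame({"id": [3], "village": ["C"]})

# concat adds the rows of one Dataframe to the other.
stacked = pd.concat([first, second], ignore_index=True)

rooms = pd.DataFrame({"id": [2, 3, 4], "rooms": [3, 1, 2]})

# merge matches rows on a key column, like a SQL JOIN.
inner = pd.merge(stacked, rooms, on="id", how="inner")  # only ids present in both
left = pd.merge(stacked, rooms, on="id", how="left")    # every row of stacked is kept

print(inner)
print(left)
```

In the left join, the row with id 1 has no match in rooms, so its rooms value is missing (NaN).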
|
Wide and long data formats
|
The melt() method can be used to change from wide to long format
The pivot() method can be used to change from the long to wide format
Aggregations are best done from data in the long format.
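The round trip between the two formats can be sketched as follows, assuming pandas is installed. The column names are invented for illustration.

```python
import pandas as pd

# Wide format: one column per measured variable.
wide = pd.DataFrame({"id": [1, 2], "goats": [3, 0], "hens": [5, 2]})

# melt: wide to long, one row per (id, animal) pair.
long = wide.melt(id_vars="id", var_name="animal", value_name="count")

# Aggregating is straightforward in the long format.
totals = long.groupby("animal")["count"].sum()

# pivot: long back to wide.
back_to_wide = long.pivot(index="id", columns="animal", values="count")

print(long)
print(totals)
print(back_to_wide)
```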
|
Data visualisation using Matplotlib
|
Graphs can be drawn directly from pandas, but pandas still uses matplotlib to draw them
Different graph types have different data requirements
Graphs are created from a variety of discrete components placed on a ‘canvas’; you don’t have to use them all
Plotting multiple graphs on a single ‘canvas’ is possible
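A minimal sketch of two graphs on one canvas, assuming pandas and matplotlib are installed. The data is made up, and the Agg backend is selected only as an assumption so that the example runs without a display; in a Jupyter notebook that line would normally be omitted.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"year": [2015, 2016, 2017], "rooms": [2, 3, 5]})

# One figure (the canvas) holding two axes side by side.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Graph drawn with matplotlib directly; title and axis label are optional components.
ax_line.plot(df["year"], df["rooms"])
ax_line.set_title("Line")
ax_line.set_xlabel("year")

# Graph drawn from pandas; it is still rendered by matplotlib on the given axes.
df.plot(x="year", y="rooms", kind="bar", ax=ax_bar, legend=False)
ax_bar.set_title("Bar")

fig.tight_layout()
```

Calling fig.savefig with a file name would write the combined figure to disk.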
|
Accessing SQLite Databases
|
The SQLite database system is directly available from within Python
A database table and a pandas Dataframe can be considered similar structures
Using pandas to return all of the results from a query is simpler than using sqlite3 alone
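The sketch below compares the two approaches, assuming pandas is installed. It uses an in-memory database with a made-up table called farms so that no database file is needed.

```python
import sqlite3

import pandas as pd

# An in-memory database with a small, invented table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE farms (id INTEGER, village TEXT, rooms INTEGER)")
con.executemany(
    "INSERT INTO farms VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "B", 3), (3, "B", 5)],
)
con.commit()

query = "SELECT village, rooms FROM farms WHERE rooms > 2 ORDER BY id"

# Using sqlite3 alone: create a cursor, execute, then fetch the rows as tuples.
cur = con.cursor()
cur.execute(query)
rows = cur.fetchall()

# Using pandas: one call returns all results as a Dataframe with column names.
df = pd.read_sql_query(query, con)

con.close()

print(rows)
print(df)
```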
|