1.6. Digging into Py(thon)¶

Hopefully this goes better than Chrissy Teigen’s experience:

1.6.1. Tutorials¶

You can’t learn programmatic material during class sessions, try though I might to make it possible. You can only learn through practice. You should be checking out tutorials and lessons online in your free time.

Two great options:

Codecademy is great. You can probably blast through the key lessons before the free trial expires (currently 7 days).
Go through #3 to #14 of A Whirlwind Tour of Python.

As you follow either of those, I would put the code you write inside the /codebook/ folder inside your Class Notes repo.

You can call the file(s) whatever you want. (If you want a suggestion for a filename, “Cheatsheet”, “Whirlwind Cheatsheet”, or “Codecademy Cheatsheet” make sense.)
Our resources page has a python cheatsheet you can download.

Do you prefer to learn through games?

(That’s how I learned python! I built solvers for Sudoku and the Cracker Barrel golf tee game… Both taught me a LOT about programming in python, problem solving strategies, and data structures.)
Edabit has a bunch of games. If you log in, you can search for python challenges that take from 1 minute to … longer… For example, the Museum of Dull Things. If you find any games illuminating, please let me know via the class discussions repo!

1.6.2. Python essentials¶

Ok, now we are going to live code a bunch together. I want you to get comfortable typing commands yourself rather than copy-pasting. This is slightly more painful in the beginning, but it has a much better payoff in the long run.

Tip

Copy all of these code blocks to your own notebook file inside your notes folder and try to run them. Make sure Jupyter Lab calls the cells “code” cells instead of markdown cells, and then use CTRL + ENTER to run them.

HINT: You can copy code cells on this website by clicking the “copy” symbol in the upper-right corner of code blocks. If you want to copy chunks of files that include sections of Markdown, watch this video.

1.6.2.1. Comments¶

In python code blocks, the “#” character tells python to ignore the rest of the line.

Leaving GOOD comments in the code is important! Good, smart code tries to reduce the use of comments by writing code so obvious that it is “self-documenting” (I’ll explain why later).

But for now… you should err on the side of adding MORE comments. Why? If you need to take a break and come back at a later point, comments will help to quickly bring you up to speed if you forgot why you put in a particular line of code. Getting in the habit of using comments is smart.

You’ll become more discerning about comments as you progress. Two articles about how and when to use comments: this link and this link

1.6.2.2. Arithmetic¶

# YOU: TYPE ALL OF THESE OUT ON YOUR OTHER PARTICIPATION SHEET... YOU CAN OMIT THE COMMENTS IF YOU WANT
# YOU: TRY VARIATIONS TOO...

print(2+3) # addition
print(2-3) # subtraction
print(2/3) # division - in Python 3, division of integers (a data type) inherently returns floats (a data type)
print(type(2), type(2/3)) # see?
print(2//3, type(2//3)) # floor division returns an integer. 
# FOR YOU to try: use this to tell me how many full hours are in 7643 minutes?

print(2%3) # mod operator
print(2*3) # multiplication
print(2**3) # 2 to the power of three
print(2^3) # ^ is NOT the power operator!!! it is a 'bit' operator - you don't need to know this for now

int(2+3*(4+15)/3) # 1. PEMDAS applies 
                  # 2. If the last command in a cell returns an *object*, jupyter auto prints it w/o needing print()
                  # 3. this should be a float (21.0), but you can convert a float to an int with the int() function

1.6.2.3. Parentheses - Grouping and Calling¶

As the above example shows, parentheses are for grouping ((4+15)/3 forces addition before division) and calling a function (e.g. print() means the print function is called on the inputs inside the parentheses).

1.6.2.4. Logic and comparisons¶

The comparison operators are == (equals), != (Not equal), > (greater than), >= (equal or greater than), <, and <=. Each of these prompts Python to evaluate the truth of the comparison and return True or False.

True and False are booleans. In Python, booleans behave like integers: True is equal to 1, and False is equal to 0.

# YOU: TYPE ALL OF THESE OUT ON YOUR OTHER PARTICIPATION SHEET... YOU CAN OMIT THE COMMENTS IF YOU WANT
# YOU: TRY VARIATIONS TOO...

print(3>3)            # 3 is not greater than 3, so this evaluates to...
                      # YOU: try a few of the other comparison operators
print(True == 1)      
print(type(True), int(True), type(False), int(False)) # print() can print a sequence of objects 
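
Because True and False behave like 1 and 0, you can add comparisons together to count how many of them are true. Here is a minimal sketch of that idea; the variable names are made up for illustration.

```python
# Three comparisons, each of which evaluates to True or False
first_check = 5 > 3      # True
second_check = 2 > 3     # False
third_check = 10 >= 10   # True

# Adding booleans treats True as 1 and False as 0
total_true = first_check + second_check + third_check
print(total_true)

# bool is a special kind of int, which is why the addition works
print(isinstance(True, int))
```

This trick comes up often later, for example when counting how many rows of a dataset satisfy a condition.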

The logic operators are and, or, and not. They evaluate a sequence of statements and return a true or false boolean.

What does or mean? In common parlance, or usually means “Do you want A or do you want B? (pick one)”. Mathematically, or works like a dad joke - You: “Dad, are we rich or poor?” Dad: “Yes”.

# YOU: TYPE ALL OF THESE OUT ON YOUR OTHER PARTICIPATION SHEET... YOU CAN OMIT THE COMMENTS IF YOU WANT

a = True              # you assign variables by writing: VariableName = Thing. 
b = False
print(a and b)        # if both sides of *and* are true, the whole thing is
print(a or b)         # if either side of *or* is true, the whole thing is
print(a and not b)    # *not* negates what is after it
print(not a or not b) # "not b" is true, so the whole thing is true

The membership operators in and not in check whether the left object is or is not in the object on the right side.

# try these... what do you get?
a=3
b=[1,2,3]
print(a in b)
print(a not in b)
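print(b in a)      # this raises a TypeError: an int is not something Python can look "inside" of
print(b not in a)  # this never runs, because the line above stops the cell with an error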
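
The membership operators work on more than lists. The sketch below shows how they behave on a string and on a dictionary (dictionaries are covered later on this page); the word and the ticker/price pairs are made up for illustration.

```python
word = 'python'
has_th = 'th' in word            # for strings, `in` looks for a substring
print(has_th)

prices = {'AAA': 150, 'BBB': 300}    # hypothetical tickers and prices
key_found = 'AAA' in prices          # for a dict, `in` checks the KEYS
value_found = 150 in prices          # ...so a value is not found this way
value_found_explicit = 150 in prices.values()  # check the values explicitly
print(key_found, value_found, value_found_explicit)
```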

The identity operators is and is not check whether the left side and the right side are the same object.

WARNING: is and == are NOT the same!!! Here is an example borrowed from G4G.

list1 = [] 
list2 = [] 
list3=list1 
print(list1 == list2)
print(list1 is list2)
print(list1 is list3)

Parentheses: You can (and certainly will at some point need to) check for the truth of statements involving many variables, and complex logic requests. You can dictate the order Python evaluates statements. So, for example,

if (Poor and TaxRateAtOrBelowNegative10) or (MiddleClass and TaxRateAtOrBelow5) or (Rich and TaxRateBelow15):
    start_audit()

will audit rich filers if their tax rate is below 15%, but will only audit poor filers if their tax rate is at or below -10%.

# a few silly examples

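print((3>3) == False) # 3 is not greater than 3, so (3>3) is False, and False == False evaluates to... 
print(3>3 == False)   # without parentheses, Python chains the comparisons: (3>3) and (3==False)
print((3>3) != True)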
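
The middle line above surprises many people. Python reads a run of comparisons without parentheses as a "chain", meaning each neighboring pair is checked and the results are joined with and. A short sketch of what that means (the variable names and the age example are made up for illustration):

```python
chained = 3 > 3 == False                 # what Python sees without parentheses
spelled_out = (3 > 3) and (3 == False)   # the same chain, written out
grouped = (3 > 3) == False               # parentheses force the comparison you probably meant
print(chained, spelled_out, grouped)

# Chaining is handy when you want to check a range
age = 25                                 # hypothetical value
working_age = 18 <= age < 65             # same as (18 <= age) and (age < 65)
print(working_age)
```

When in doubt, add parentheses so the order of evaluation is explicit.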

1.6.2.5. Variables are pointers¶

Read this page!

I’ll simply provide the following warning: Unless you read and understand the link above, any time you write x=y, you might be creating a secret bug in your code that will cause potentially enormous errors!

To illustrate:

x = [1, 2, 3]
print(type(x))
y = x
print(y)
x.append(4) 
print(y) # y was changed as well... Why? Read the page above!

<class 'list'>
[1, 2, 3]
[1, 2, 3, 4]
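
If you want y to be a separate list rather than a second name for the same list, make a copy. The sketch below uses the list method .copy(), which is one of several ways to do this; the variable z is added for illustration.

```python
x = [1, 2, 3]
y = x           # y is just another name for the SAME list
z = x.copy()    # z is a NEW list with the same contents (list(x) or x[:] also work)

x.append(4)

print(y)                   # changed, because y and x are the same object
print(z)                   # unchanged, because z is its own object
print(y is x, z is x)      # the identity operator confirms it
```

Note that .copy() is a "shallow" copy: if the list contains other lists, those inner lists are still shared.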

1.6.2.6. Everything is an object¶

Referring again to Whirlwind of Python,

In object-oriented programming languages like Python, an object is an entity that contains data along with associated metadata and/or functionality. In Python everything is an object, which means every entity has some metadata (called attributes) and associated functionality (called methods). These attributes and methods are accessed via the dot syntax.

So, object.method(<arguments here>) will call the function method from/on object, and the function uses whatever arguments you pass it.

Examples:

Above, the object x has the type list, and lists have a “method” called append.
In the stock prices program we show during the lectures, we imported a package: import pandas_datareader as pdr. Now, the “package” pandas_datareader is actually an “object” (which we call pdr for convenience). That object - like any object - has “method” functions. In that code, for example, I called pdr.get_data_yahoo(stocks) to download stock prices.
Seriously, EVERYTHING is an object.
- Lists are objects (duh)
- Attributes and methods of objects are themselves objects. Put type(x.append) at the end of the code block above.
- Files

1.6.2.7. Common object types¶

Boolean and int were covered above.

None. See here.

float.

Warning

Beware of comparing floating point numbers! Below is an example, and see here for the explanation.

print(1.0+2.0 == 3.0)
print(0.1+0.2 == 0.3) # FALSE?!

True
False
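
A common workaround is to check whether two floats are "close enough" instead of exactly equal. The sketch below uses math.isclose from the standard library; the tolerance in the last line is an arbitrary value picked for illustration.

```python
import math

total = 0.1 + 0.2
print(total)                        # not exactly 0.3, due to how floats are stored

exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)    # compares within a small relative tolerance
manual_check = abs(total - 0.3) < 1e-9     # the same idea, done by hand
print(exactly_equal, close_enough, manual_check)
```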

str. There are built-in functions that work on strings directly

a='some string' # a = "some string" is the same. 

# some functions work on strings directly
print(len(a)) 

# string types also have many functions as methods
print(a.upper())
#YOU: type a.<tab> in your notebook, and jupyter will open a list of possible functions!

1.6.2.8. Built in data structures¶

Python has list, tuple, dict, and set. Beginners typically rely on lists extensively, but as you progress, you will find that all four are extremely useful, because their unique traits solve different needs.

After class, you should absolutely read this and as you do, try the examples, and throw them into your growing personal cheat sheet.

First, let me illustrate the use of .extend() vs .append() vs + for lists:

L=[8, 5, 6, 3, 7]
L.extend([5])    # extend concatenates
L.extend([3,4])  # concatenation works the same with more elements
L = L + [13,14]  # + concatenates
L.append(7)      # append adds its entire argument to the list as a new element. 
L.append([6])    # 7 is an int, so it goes in as an int, but [6] is a *list*, so append puts a list as the element
L.append([8,9])  # see, the last element is [8,9]
L

[8, 5, 6, 3, 7, 5, 3, 4, 13, 14, 7, [6], [8, 9]]
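
The other three built-in structures mentioned at the top of this section work along similar lines. Here is a quick sketch of each, just to show what they look like; the variable names and values are made up, and the reading linked above covers them properly.

```python
# tuple: like a list, but it cannot be changed after it is created
point = (3, 4)
print(point[0])

# dict: maps "keys" to "values", so you can look things up by name
capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau'}
capitals['Arizona'] = 'Phoenix'     # add a new key-value pair
print(capitals['Alaska'])

# set: an unordered collection that keeps only unique elements
unique = set([8, 5, 6, 3, 7, 5, 3])
print(len(unique))                  # the duplicates are gone
```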

Now, let’s all define this vector: L=[8, 5, 6, 3, 7].

Exercises: Write code that does the following:

Returns the length.
Returns the largest element.
Returns the smallest element.
Returns the total of the vector.
Returns the first element. See this awesome answer to learn about “slicing” lists in Python. If that link is dead: https://stackoverflow.com/questions/509211/understanding-slice-notation?rq=1
Returns the last element.
Returns the first 2 elements.
Returns the last 2 elements.
Returns the odd numbered elements (i.e. [8,6,7]).

I’d suggest putting what you just learned about how python indexes an object and how to slice a list into your personal cheat sheet until you have it memorized thoroughly.
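
As a starting point for that cheat sheet, here is a sketch of the main indexing and slicing patterns. It uses a different, made-up list of letters so you can still work the exercises above on your own.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first = letters[0]             # indexing starts at 0
last = letters[-1]             # negative indexes count from the end
middle = letters[1:4]          # start at index 1, stop BEFORE index 4
from_third = letters[2:]       # leave out the stop to go to the end
every_other = letters[::2]     # the third number is the step size
backwards = letters[::-1]      # a step of -1 walks the list in reverse

print(first, last)
print(middle, from_third)
print(every_other, backwards)
```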

1.6.2.9. For loops¶

Python loops are very intuitive:

for state in states:
    capitol=stateCapitals[state]
    print(capitol)
    print(capitol.upper())
    <you can use as many lines as you need, just keep indenting>
    <the indents are 4 spaces, or more commonly, a <tab>>
    
print(states) # <-- the for loop ends when you write a line of 
              # code (not a comment!) that is unindented 

So, for each state, Python will start the indented block of code and run each line within the code block in sequence. So if the list of states is [Alabama, Alaska, Arizona,...], Python will…

Set state = ‘Alabama’
Set capitol = ‘Montgomery’
Print ‘Montgomery’
Print ‘MONTGOMERY’
Execute the next two lines of code that I’ve “skipped above”.
At the end of the block of code, python will check if there is another element in the states vector. There is!
Set state = ‘Alaska’
Set capitol = ‘Juneau’
Print ‘Juneau’
Print ‘JUNEAU’
…
Set state = ‘Wyoming’
Set capitol = ‘Cheyenne’
Print ‘Cheyenne’
Print ‘CHEYENNE’
Is there another state? No? Ok! The for-loop is complete! Python will exit the code block and proceed. The next line of code is print(states) and so that’s the next thing it will do.
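
If you want to run the loop yourself, you first need to define states and stateCapitals. The sketch below assumes a short list of three states and a dictionary of their capitals (both made up here, since the lecture code does not define them), and adds a list called shouted to collect the results.

```python
states = ['Alabama', 'Alaska', 'Wyoming']               # assumed, shortened list
stateCapitals = {'Alabama': 'Montgomery',
                 'Alaska': 'Juneau',
                 'Wyoming': 'Cheyenne'}                 # assumed lookup table

shouted = []                        # collect results as the loop runs
for state in states:
    capitol = stateCapitals[state]  # look up this state's capital
    print(capitol)
    print(capitol.upper())
    shouted.append(capitol.upper())

print(states)    # unindented, so this runs once, after the loop finishes
print(shouted)
```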

A few comments:

PYTHON AND INDENTATION

In python, indentations at the beginning of lines are not “up to the user”. Indentations indicate a “block” of code that is run as a unit.

if 7 < 5:     
    # this is false, obvi, so 
    # nothing under/inside the
    # if statement runs
    print('I am NOT here.') 
    
# the next unindented line is
# not governed by the "if"
# so it does run
print('But I am here') 

But I am here

The syntax for a for loop is for <name> in <iterable object>:. You must include the colon! After that, all lines of code within the “block” of code of the for loop are indented. See the “PYTHON AND INDENTATION” note above.
- Note: When I write anything inside <>, you should drop the “<” and “>” symbols too.
The iterable object can be anything Python can iterate through, e.g. a list. (But not just lists!) So the list above is a list of states, and note that it is descriptively named “states”.
You decide what the <name> is, and it should be something that communicates the content.
- Generally speaking, don’t name variables in Python x, y, z, vector, myvector and other uninformative names! Use informative names to make your code readable!
- If you are looping over letters, each object might be called a letter; if you are looping over stocks, each element should probably be called a stock... (DUH, right?)

Conversely, how you use whitespace within a line is up to you. Both of these lines of code are the same:

print(      a)

print(a)

1.6.2.10. If, elif, else¶

Syntax:

if <condition 1>:                         # you must use the colon!
    <do some stuff if condition is true>
elif <condition 2>:                       # as in "Else-If"
    <do stuff if cond 1 is false and 2 is true>
else:
    <if neither 1 nor 2 is true, do this>

Comments:

You can include zero or as many elif code blocks as you want
You can omit the else block entirely
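Whatever is in <condition> should evaluate to True or False. (Python also treats 1 as true and 0 as false, and more generally treats 0, None, and empty objects like [] or '' as false and most other objects as true.)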
See the “Logic and comparisons” section above on how Python evaluates conditions
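
Here is a small runnable sketch of the syntax above. The variable names and the return value are made up for illustration. The second half shows that an empty list counts as false when used as a condition.

```python
stock_return = -0.02          # hypothetical daily return

if stock_return > 0:
    label = 'up'
elif stock_return < 0:        # only checked if the first condition is false
    label = 'down'
else:                         # runs only if neither condition is true
    label = 'flat'
print(label)

holdings = []                 # an empty list is treated as false
if holdings:
    status = 'invested'
else:
    status = 'all cash'
print(status)
```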

1.6.2.11. While¶

Syntax:

while <condition is True>:
    <do some stuff>

For example:

counter = 0
while counter < 7:
    print(counter)
    counter += 1 # "+=" is short for "add to myself". 
                 # Here, it's an abbreviation for: counter = counter + 1

I have one important comment about while loops: Every time through the loop, there must be a chance for the condition to become False. If not, your code will loop forever!
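
To make that concrete, here is a sketch of a loop where it is less obvious when the condition becomes False: counting how many years it takes for a balance to double. The starting balance and the 7% growth rate are made-up numbers. The loop is safe because the balance grows on every pass, so the condition must eventually fail.

```python
balance = 100.0
years = 0

while balance < 200:             # keep going until the balance has doubled
    balance = balance * 1.07     # hypothetical 7% annual growth
    years += 1                   # count this pass through the loop

print(years, balance)
```

If the growth rate were 0 (or negative), the balance would never reach 200 and this loop would run forever.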

We won’t use while loops in this class. But if you ever write one, and it is stuck in an infinite loop, you can stop the kernel by typing i, i. Or click the “Terminals and Kernels” tab in the left sidebar and “shutdown” next to your code’s filename.

1.6.2.12. Writing your own functions¶

Writing your own functions is important for improving the clarity of your code because it

separates different strands of logic
allows you to reuse code
prevents copy/paste errors

To write a function, write def <nameOfYourChoice>(<you can specify arguments the function takes, or none>): and then write your indented code block that is the function.

On inputs:

Any object(s) you want can be given as inputs! You can give as inputs a variable, a list, a dictionary, even a function. Remember, in python, everything is an object.
Functions can get “positional” arguments or keyword arguments. Positional arguments are understood because Python figures them out based on the order in which you provide them. Keyword arguments are passed by name (e.g. b=3), so their order does not matter.

On outputs:

Any object(s) you want can be returned as outputs! Once the code executes a line starting with return, the function will end and output whatever is on that line. It can be a list, set, function, dictionary, string. It can be a dictionary with lists inside it, or a list with dictionaries inside it. Go wild if you want! (While practicing python. In practice, don’t be complex for the sake of it!)

On documentation:

Code that is poorly documented won’t be used. By you, by you in the future, or by others. So you should document it! You do this by adding line(s) immediately after the first line, as the example below shows.
The docstring can be accessed by users via <functionName>? or help(<functionName>), the same as for any other function. In fact, this is how help is written in all Python functions we’ve used!

Example: The function below shows off positional and keyword arguments, how to write a multiline “docstring”, how the function ends once a return is executed, outputting a list, and setting default values for inputs.

def f(x, a=1, b=1):
    '''
    The first argument you give goes to x, the second to a, the third to b.
    If you do not provide a or b, they default to the value 1.
    '''
    if x < 0:
        return "WHOA THIS IS NEGATIVE"
    return [a + b * x, 2] # you can return any object(s) you want! this is a list, for example

print(f(-100))    # notice it never gets to a+b*x
print(f(2,2,2))   
print(f(1))       # uses the default value of a and b
print(f(1,b=3))   # uses the default value of a
help(f)           # the docstring is useful!

WHOA THIS IS NEGATIVE
[6, 2]
[2, 2]
[4, 2]
Help on function f in module __main__:

f(x, a=1, b=1)
    The first argument you give goes to x, the second to a, the third to b.
    If you do not provide a or b, they default to the value 1.

# this won't work! python requires you to use the keyword arguments AFTER the positional arguments
print(f(b=3,1)) 

  File "<ipython-input-7-4bfe94fa827f>", line 2
    print(f(b=3,1))
                ^
SyntaxError: positional argument follows keyword argument

1.6.2.13. Scope¶

I want you to be generally aware of the concept of “global” and “local” scope. Generally, python objects are available only within the region they are defined and subregions therein. Put differently, objects are available downstream, but not upstream.

x=1
def silly_func():
    xyz = 14
    print(x) # variables defined OUTSIDE AND BEFORE a function are visible INSIDE the func
    
silly_func() 

print(xyz)   # variables defined INSIDE a function are NOT visible OUTSIDE the func

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-e4e4f48ed516> in <module>
----> 1 print(xyz)   # variables defined INSIDE a function are NOT visible OUTSIDE the func

NameError: name 'xyz' is not defined

x = 1
def silly_func():
    x=2
    return x
print(silly_func())
print(x)               # changing the downstream variable inside the function didn't change the upstream version

2
1
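
So how do you get a change made inside a function back out to the upstream variable? One simple pattern is to return the new value and reassign it yourself. A minimal sketch, with a made-up function name:

```python
x = 1

def add_one(value):
    value = value + 1    # this only changes the local name "value"
    return value

result = add_one(x)
print(result, x)         # x itself is untouched by the function call

x = add_one(x)           # reassign explicitly to update the upstream variable
print(x)
```

This keeps it obvious, from reading the code outside the function, exactly where x changes.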

1.6.3. Popular, nay, essential packages¶

As the semester proceeds, you will surely need to learn (to some degree) the following packages. For each, you might note the most common and useful functions, and copy “cookbook” uses of the packages which you can paste into new programs. (E.g. how to open a csv file.)

Note: Neither I nor most programmers commit many functions of many packages to memory. We simply know what can be done and, when needed, we search (tab completion/google/stack overflow) for the command/recipe for that function.

Built-in packages: os, sys, itertools, re, datetime, csv
Datasci packages (Anaconda installs these for you!). Note: the aliases here aren’t strictly needed, but by convention, virtually everyone uses the shorter names
- pandas as pd
- seaborn as sns
- matplotlib as mpl
- statsmodels.api as sm
- matplotlib.pyplot as plt
- numpy as np
- sklearn
Web crawling
- requests, requests_html, urllib
- time and tqdm
- beautifulsoup4 as bs4
- html5lib
- selenium
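
As an example of a “cookbook” recipe worth saving, here is a sketch of writing and then opening a csv file using only built-in packages. The file name and the rows are made up, and the file is put in a temporary folder so nothing in your repo is touched. (Later in the semester you will more likely use pandas for this.)

```python
import csv
import os
import tempfile

rows = [['ticker', 'price'], ['AAA', '10.5'], ['BBB', '20.0']]   # made-up data

folder = tempfile.mkdtemp()                  # a scratch folder for this demo
path = os.path.join(folder, 'prices.csv')    # os.path.join builds the file path

# write the rows out to a csv file
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# open the csv file and read it back in as a list of lists
with open(path, newline='') as f:
    loaded = list(csv.reader(f))

print(loaded)
```

Note that the csv package reads every value in as a string, so numbers need to be converted (e.g. with float()) before doing math on them.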

1.6.4. Clear output and rerun from the start!¶

Code must run from beginning to end and produce the same thing every time

Restart the kernel and clear output
Run all cells

1.6.5. How do I…?¶

1.6.5.1. Stuck on syntax issues for a function?¶

See the tips in the Jupyter Lab page.

1.6.5.2. Copy chunks of lecture files into my own code?¶

I think this is the easiest way:

Clone the lecture repo to your computer.
Open the code you’re working on in Jupyter Lab.
Open the lecture code you’re working on in Jupyter Lab in a new tab.
Drag the lecture code to the right until it snaps into a new panel.
Click and hold to the left of the block you want to drag into your code and drag it into your code.

Pro Tip

You can select a bunch of code blocks and drag them all at once. To do that, click to the left of a cell you want, hold the shift button, and then click up or down until all the cells you want are highlighted. Then do step 5 above.

Want another amateur youtube video? Here’s one of me showing those steps!

LeDataSciFi-2021