Data Analysis using Jupyter Notebooks Part 2

Benjamin J. Morgan

Tutorial 2

Mathematical functions, importing modules, and variables

Simple mathematical operations can be typed directly into code cells and run to perform calculations, much as you would use a calculator; e.g.

1+2+3+4

or

5**3 # five cubed

Other mathematical operations are provided as Python functions, such as log and sqrt. These two functions are both contained in the math module, which needs to be imported before you can use them. The math module also contains standard mathematical constants, such as $\mathrm{e}$ and $\pi$.
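As a minimal sketch of the import-then-call pattern described above (the particular functions and values are chosen purely for illustration):

```python
import math  # the module must be imported before its contents can be used

log_value = math.log10(1000)   # base-10 logarithm of 1000
root_value = math.sqrt(16)     # square root of 16
cos_value = math.cos(math.pi)  # cosine of pi; trigonometric functions take radians

print(log_value, root_value, cos_value)
```

Functions and constants from a module are always written with the module name in front, e.g. math.sqrt and math.pi.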

Import the `math` module and use the appropriate functions and constants to perform the following calculations:

- $\ln(2)$
- $\sqrt{121}$
- $\sin( \pi/2 )$
- $\mathrm{e}^2$

Variables

Variables let you store things to use later. Once you have stored a value in a variable, you can use the variable in place of the original value. Variable assignment (storing a value in a variable) uses a single = symbol, e.g. variable_name = value

a = 6

b = 2

a * b

Variable names must start with a letter or an underscore, so spider and _book are valid variable names, but 2_times_2 is not. In Python, variable names are case-sensitive, so apple, Apple, and APPLE are distinct variables.

apple = 1

Apple = 2

APPLE = 3

print( apple, Apple, APPLE )

1 2 3
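One way to check the naming rule without triggering a syntax error is the built-in string method isidentifier(). This is an illustrative sketch rather than part of the original tutorial, and the list name candidates is made up for the example:

```python
# str.isidentifier() reports whether a string obeys Python's naming rules
candidates = ['spider', '_book', '2_times_2']

results = []
for name in candidates:
    results.append(name.isidentifier())
    print(name, name.isidentifier())
```

The first two names are reported as valid and the third as invalid, because it starts with a digit.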

Data Types

In Python, different kinds of data are represented by different data types. Unlike some other programming languages, you do not need to specify what data type a variable will be used to store before using it. Instead, Python works this out when you run your code. You can also use a single variable to store different data types at different points in a piece of code, although this can make the code harder for people to understand.

var = 1 # integer

print( var )

var = 2 # integer

print( var )

var = 3.5 # floating point number, called a "float"

print( var )

var = 'a variable'

print( var )
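To see which data type Python has chosen for a value, you can use the built-in type() function. A short sketch, reusing the values from the cells above (the names type_1, type_2, and type_3 are introduced only for this example):

```python
var = 1
type_1 = type(var)
print(type_1)  # <class 'int'>

var = 3.5
type_2 = type(var)
print(type_2)  # <class 'float'>

var = 'a variable'
type_3 = type(var)
print(type_3)  # <class 'str'>
```

The same variable name ends up holding three different data types in turn, which is exactly the situation that can make code harder to follow.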

Another data type is the list. In Python, lists represent an ordered sequence of values (which could be other lists):

a = 1

b = 2

c = 3

my_list = [ a, b, c ]

print( my_list )

You can access particular elements in a list using list indexing, with square [ ] brackets:

print( my_list[2] )

Remember that Python starts counting from zero, so my_list[2] returns the third element in my_list (you can think of this as two positions along from the front).
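The following sketch illustrates zero-based indexing on a three-element list, and what happens if you ask for a position beyond the end. The try/except construction is used here only so that the example runs to completion; it is not something the tutorial has introduced.

```python
my_list = [1, 2, 3]

first = my_list[0]  # zero positions along from the front
third = my_list[2]  # two positions along from the front
print(first, third)

# A three-element list has valid indices 0, 1, and 2, so index 3 is an error
out_of_range = False
try:
    my_list[3]
except IndexError as error:
    out_of_range = True
    print('IndexError:', error)
```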

To access a range of values in a list, you can use a slice, with the : symbol:

longer_list = [2,4,6,8,10,12,14,16]

longer_list[2:4]

Slices work by giving you all elements, counting from the index on the left of the : up to but not including the index on the right of the :. This is why the previous example gives us the third element (2 from the beginning), the fourth element (3 from the beginning), and does not give us the fifth element (4 from the beginning).
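As a quick check of this rule, the sketch below prints the slice from the example, plus one more. Note that the number of elements returned is the right-hand index minus the left-hand index.

```python
longer_list = [2, 4, 6, 8, 10, 12, 14, 16]

middle = longer_list[2:4]       # indices 2 and 3, but not 4
print(middle)                   # [6, 8]

first_three = longer_list[0:3]  # indices 0, 1, and 2, but not 3
print(first_three)              # [2, 4, 6]
```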

List indexing and slicing also accept negative numbers, which indicate counting from the end of the list instead of from the beginning:

longer_list[-2]
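A short sketch of negative indices in both indexing and slicing, using the same list (the variable names are chosen for illustration only):

```python
longer_list = [2, 4, 6, 8, 10, 12, 14, 16]

last = longer_list[-1]         # the final element
second_last = longer_list[-2]  # one position before the final element
print(last, second_last)       # 16 14

# In a slice, the right-hand index is still excluded
near_end = longer_list[-3:-1]
print(near_end)                # [12, 14]
```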

You can also select every $n$th element in a sequence by adding a third number to a slice:

longer_list[2::2]

In this example we have left out the second number, which normally indicates where to stop. If you do not specify where to stop, the slice will continue to the end of the list. The previous example, then, means “every second element of longer_list, starting with the third element (2 from the beginning), and continuing until the end of the list.”
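The sketch below shows the result of that slice, along with two variations on the same start:stop:step pattern (the variations are illustrative and not taken from the tutorial):

```python
longer_list = [2, 4, 6, 8, 10, 12, 14, 16]

every_second = longer_list[2::2]  # indices 2, 4, 6
print(every_second)               # [6, 10, 14]

bounded = longer_list[1:6:2]      # indices 1, 3, 5 (stops before index 6)
print(bounded)                    # [4, 8, 12]

every_third = longer_list[::3]    # start omitted too, so indices 0, 3, 6
print(every_third)                # [2, 8, 14]
```

Leaving out the first number works in the same way as leaving out the second: the slice then begins at the start of the list.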

numpy arrays

Arrays are particularly useful for storing and working with numerical data. Arrays are part of the numpy module, so you need to import numpy before you can use them. Remember that this often looks like import numpy as np, which allows us to use the shorthand np instead of numpy when referring to it in our code:

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9,10])

What are some ways to generate sequences of numbers, without having to type them all out one after the other? Look at the `numpy.arange()` and `numpy.linspace()` functions in your semester 1 notebooks. What is the difference between these two functions?
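As a starting point for comparing the two functions, here is a minimal sketch with arbitrarily chosen arguments. np.arange() takes a start, a stop, and a step size, and excludes the stop value; np.linspace() takes a start, a stop, and the number of points, and by default includes the stop value.

```python
import numpy as np

# start at 0, stop before 10, in steps of 2
stepped = np.arange(0, 10, 2)
print(stepped)  # [0 2 4 6 8]

# 5 evenly spaced points from 0 to 10, both ends included
spaced = np.linspace(0, 10, 5)
print(spaced)   # [ 0.   2.5  5.   7.5 10. ]
```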

numpy arrays can be operated on like vectors: you can write code that performs the same operation on every element in the array:

print(a**2)
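Other arithmetic operators behave in the same element-by-element way. The sketch below also contrasts this with a plain Python list, where * means repetition rather than multiplication (the variable names are chosen for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])

doubled = a * 2      # multiplies every element by 2
summed = a + a       # adds arrays element by element
print(doubled)       # [2 4 6]
print(summed)        # [2 4 6]

plain_list = [1, 2, 3]
repeated = plain_list * 2  # for a list, * repeats the contents instead
print(repeated)            # [1, 2, 3, 1, 2, 3]
```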

numpy also contains a large number of mathematical functions, including functions such as np.sqrt() and np.sin(). These examples behave the same as math.sqrt() and math.sin() if you use them to operate on single numbers, but they can also operate on lists of numbers or entire arrays at once:

import math

import numpy as np

print( math.sqrt(4) ) # square root of a single number

print( np.sqrt(4) ) # square root of a single number

print( np.sqrt( [4,9,16] ) ) # square root of every element of a list of numbers

a = np.array( [4, 9, 16] ) # create a numpy array

print( np.sqrt(a) ) # square root of every element of a numpy array
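For contrast, the math version only accepts a single number. This sketch shows math.sqrt() raising a TypeError when given a list, while np.sqrt() handles the same list; the try/except is there only so the example runs to completion.

```python
import math
import numpy as np

values = [4, 9, 16]

# math.sqrt() expects one number, so passing a list is an error
math_failed = False
try:
    math.sqrt(values)
except TypeError as error:
    math_failed = True
    print('math.sqrt failed:', error)

# np.sqrt() converts the list to an array and works element by element
roots = np.sqrt(values)
print(roots)  # [2. 3. 4.]
```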

numpy includes common statistical functions for finding means, standard deviations, maximum and minimum values, or the index of a maximum or minimum value.

a = np.random.random(10) # create a numpy array with 10 random numbers between 0 and 1

print( 'a:', np.round(a, 2) ) # print a, rounded to 2 decimal places.

print( 'mean:', np.mean(a) )

print( 'std dev:', np.std(a) )

print( 'minimum value:', np.min(a) )

print( 'maximum value:', np.max(a) )

print( 'index of the minimum value:', np.argmin(a) )

print( 'index of the maximum value:', np.argmax(a) )
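Because the array above is random, its statistics change on every run. The following sketch uses a small fixed array (values chosen for illustration) so the results can be checked by hand. By default np.std() divides by the number of values $N$ (the population standard deviation); passing ddof=1 divides by $N-1$ instead, giving the sample standard deviation.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(data)
population_std = np.std(data)      # divides by N
sample_std = np.std(data, ddof=1)  # divides by N - 1

print('mean:', mean)                       # 5.0
print('population std dev:', population_std)  # 2.0
print('sample std dev:', sample_std)

index_of_min = np.argmin(data)
index_of_max = np.argmax(data)
print('index of the minimum value:', index_of_min)  # 0
print('index of the maximum value:', index_of_max)  # 7
```

The argmin and argmax results are indices, so data[index_of_min] and data[index_of_max] recover the minimum and maximum values themselves.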