# 2. Scalar types and control structures in Python

The open-access textbook *Minimalist Data Wrangling with Python* by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF; a printed version can be ordered from Amazon: AU CA DE ES FR IT JP NL PL SE UK US). It is a non-profit project. Although available online, it is a whole course, and should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Make sure to check out the author’s other book, *Deep R Programming* [34].

In this part, we introduce the basics of the Python language itself.
As it is a general-purpose tool, various packages supporting
data wrangling operations will be provided as third-party extensions.
In further chapters, based on the concepts discussed here,
we will be able to use
**numpy**, **scipy**,
**matplotlib**, **pandas**, **seaborn**,
and other packages with some healthy degree of confidence.

## 2.1. Scalar types

The five ubiquitous scalar types (i.e., *single* or *atomic* values) are:

- `bool` – logical,
- `int`, `float`, `complex` – numeric,
- `str` – character.
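As a quick illustration, the built-in **type** function (discussed later in this chapter) can confirm which scalar type a given literal belongs to. The example values and variable names below are arbitrary choices made for this sketch.

```python
# Query the type of one example literal per scalar type.
# The particular values are arbitrary.
t_logical = type(True)
t_integer = type(42)
t_real = type(3.14)
t_complex = type(1+2j)
t_text = type("spam")
print(t_logical, t_integer, t_real, t_complex, t_text)
## <class 'bool'> <class 'int'> <class 'float'> <class 'complex'> <class 'str'>
```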

### 2.1.1. Logical values

There are only two possible logical (Boolean) values: `True` and `False`.
We can type:

```
True
## True
```

to instantiate one of them. This might seem boring; unless, when trying to play with the above code, we fell into the following pitfall.

Important

Python is case-sensitive.
Writing “`TRUE`” or “`true`” instead of “`True`” is an error.

### 2.1.2. Numeric values

The three numeric scalar types are:

- `int` – integers, e.g., `1`, `-42`, `1_000_000`;
- `float` – floating-point (real) numbers, e.g., `-1.0`, `3.14159`, `1.23e-4`;
- (*) `complex` – complex numbers, e.g., `1+2j` (these are infrequently used in our applications; however, see Section 4.1.4).

In practice, numbers of the type `int` and `float` often interoperate
seamlessly. We usually do not have to think about them as being of
distinct types.

`1.23e-4` and `9.8e5` are examples of numbers entered
using the so-called scientific notation, where “`e`” stands for
“times 10 to the power of”.
Additionally, `1_000_000` is a decorated (more human-readable) version
of `1000000`. Use the `print` function to check their values.
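A minimal sketch of such a check is given below. The variable names `a`, `b`, and `c` are introduced only for this illustration.

```python
a = 1.23e-4      # 1.23 times 10 to the power of -4
b = 9.8e5        # 9.8 times 10 to the power of 5
c = 1_000_000    # the underscores are ignored by the interpreter
print(a, b, c)
## 0.000123 980000.0 1000000
print(a == 0.000123, b == 980000.0, c == 1000000)
## True True True
```

Note that a number entered in scientific notation is always of the type `float`, even if it represents a whole number like `9.8e5`.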

#### 2.1.2.1. Arithmetic operators

Here is the list of available arithmetic operators:

```
1 + 2 # addition
## 3
1 - 7 # subtraction
## -6
4 * 0.5 # multiplication
## 2.0
7 / 3 # float division (the result is always of the type float)
## 2.3333333333333335
7 // 3 # integer division
## 2
7 % 3 # division remainder
## 1
2 ** 4 # exponentiation
## 16
```

The precedence of these operators is quite predictable, e.g., exponentiation has higher priority than multiplication and division, which in turn bind more strongly than addition and subtraction. Consequently:

```
1 + 2 * 3 ** 4 # the same as 1+(2*(3**4))
## 163
```

is different from, e.g., `((1+2)*3)**4`.
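To see the difference in practice, we can compare the two forms directly. This is only a small sketch; the variable names are made up for the occasion.

```python
default_order = 1 + 2 * 3 ** 4       # evaluated as 1 + (2 * (3 ** 4))
forced_order = ((1 + 2) * 3) ** 4    # round brackets override the precedence
print(default_order, forced_order)
## 163 6561
```

When in doubt, adding explicit brackets costs nothing and makes the intended order of evaluation obvious.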

Note

Keep in mind that computers’ floating-point arithmetic is precise
only up to about 15–17 significant decimal digits.
As a consequence, the result of `7/3` is only approximate
(`2.3333333333333335`).
We will get back to this topic in Section 5.5.6.
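Another classic illustration of this limitation, sketched below with a made-up variable name, is that adding `0.1` and `0.2` does not yield exactly `0.3`. Rounding both sides to a fixed number of decimal digits before comparing is one possible (though crude) workaround.

```python
total = 0.1 + 0.2
print(total)
## 0.30000000000000004
print(total == 0.3)                          # an exact comparison fails
## False
print(round(total, 10) == round(0.3, 10))    # compare up to 10 decimal digits
## True
```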

#### 2.1.2.2. Creating named variables

Named variables can be introduced using the *assignment operator*,
`=`. They can store arbitrary Python objects and be
referred to anytime. Names of variables can include any lower- and uppercase
letters, underscores, and (except at the beginning) digits.
It is best to make them self-explanatory, like:

```
x = 7 # read: let `x` from now on be equal to 7 (or: `x` becomes 7)
```

We can check that `x` (great name, by the way: it means
*something of general interest* in mathematics) is now available
for further reference by printing out the value it is bound to:

```
print(x) # or just `x`
## 7
```

New variables can easily be created based on existing ones:

```
my_2nd_variable = x/3 - 2 # creates `my_2nd_variable`
print(my_2nd_variable)
## 0.3333333333333335
```

Also, existing variables can be rebound to any other value whenever we please:

```
x = x/3 # let the new `x` be equal to the old `x` (7) divided by 3
print(x)
## 2.3333333333333335
```

Create two named variables `height` (in centimetres) and `weight`
(in kilograms). Based on them, determine your BMI.

Note

(*) Augmented assignments are also available. For example:

```
x *= 3
print(x)
## 7.0
```

In this context, the above is equivalent to `x = x*3`. In other
words, it created a new object and rebound `x` to it. Nevertheless,
in other scenarios, augmented assignments modify the objects they act
upon *in place*; compare Section 3.5.
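(*) Other arithmetic operators have their augmented counterparts too. Here is a brief sketch; `y` is just an auxiliary name used for this illustration.

```python
y = 10
y += 5     # same as y = y + 5
y -= 3     # same as y = y - 3
y /= 4     # same as y = y / 4; from now on, y is a float
y **= 2    # same as y = y ** 2
print(y)
## 9.0
```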

### 2.1.3. Character strings

Character strings (objects of the type `str`) consist of arbitrary text.
They are created using either double quotes or apostrophes:

```
print("spam, spam, #, bacon, and spam")
## spam, spam, #, bacon, and spam
print("Cześć! ¿Qué tal?")
## Cześć! ¿Qué tal?
print('"G\'day, howya goin\'," he asked.\n"Fine, thanks," she responded.\\')
## "G'day, howya goin'," he asked.
## "Fine, thanks," she responded.\
```

Above, “`\'`” (a way to include an apostrophe in an apostrophe-delimited
string), “`\\`” (a backslash), and “`\n`” (a newline character)
are examples of *escape sequences*.
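Even though an escape sequence is typed using two characters, it denotes a single character in the resulting string. The sketch below verifies this with the built-in **len** function, which returns the length of a string; the variable names are arbitrary.

```python
newline = "\n"
backslash = "\\"
apostrophe = '\''
print(len(newline), len(backslash), len(apostrophe))   # one character each
## 1 1 1
print(apostrophe == "'")   # the same string, delimited with double quotes
## True
```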

Multiline strings are also possible:

```
"""
spam\\spam
tasty\t"spam"
lovely\t'spam'
"""
## '\nspam\\spam\ntasty\t"spam"\nlovely\t\'spam\'\n'
```

Call the **print** function on the above object to reveal
the special meaning of the included escape sequences.

Important

Many string operations are available, e.g., for formatting, pattern searching, or extracting matching chunks. They are especially important in the art of data wrangling as information often arrives in textual form. Chapter 14 covers this topic in detail.

#### 2.1.3.1. F-strings (formatted string literals)

Also, *f-strings* (formatted string literals)
help prepare nice output messages:

```
x = 2
f"x is equal to {x}"
## 'x is equal to 2'
```

Notice the “`f`” prefix. The “`{x}`” part was replaced with the value
stored in the `x` variable.

There are many options available. As usual, it is best
to study the documentation
in search of interesting features. Here, let us just mention that
we will frequently be referring to placeholders like
“`{variable:width}`” and “`{variable:width.precision}`”,
which specify the field width and the number of fractional digits
of a number. This way, we can output a series of values nicely aligned
one below another.

```
π = 3.14159265358979323846
e = 2.71828182845904523536
print(f"""
π = {π:10.8f}
e = {e:10.8f}
""")
##
## π = 3.14159265
## e = 2.71828183
```

“`10.8f`” means that a value should be formatted as a `float`,
be of width at least ten characters (text columns),
and use eight fractional digits.
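Here is a further small sketch of how the field width behaves for different types. The values and names are arbitrary, and the square brackets are added only to make the padding visible. By default, numbers are aligned to the right and strings to the left.

```python
value = 3.14159
count = 42
label = "spam"
line1 = f"[{value:8.3f}]"   # width 8, three fractional digits
line2 = f"[{count:8}]"      # width 8; numbers are right-aligned by default
line3 = f"[{label:8}]"      # width 8; strings are left-aligned by default
print(line1, line2, line3, sep="\n")
## [   3.142]
## [      42]
## [spam    ]
```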

## 2.2. Calling built-in functions

We have a few functions at our disposal. For instance:

```
e = 2.718281828459045
round(e, 2)
## 2.72
```

We rounded Euler’s number `e` to two decimal digits.

Call **help**`("round")` to access the function’s manual. Note that the second argument,
called `ndigits`, which we set to `2`, has a default value of `None`.
Check what happens when we omit it during the call.

### 2.2.1. Positional and keyword arguments

As **round** has two parameters, `number` and `ndigits`, the
following (and no other) calls are equivalent:

```
print(
round(e, 2), # two arguments matched positionally
round(e, ndigits=2), # positional and keyword argument
round(number=e, ndigits=2), # two keyword arguments
round(ndigits=2, number=e) # the order does not matter for keyword args
)
## 2.72 2.72 2.72 2.72
```

That no other form is permitted is left as an exercise, i.e., positionally matched arguments must be listed before the keyword ones.
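One way to observe this rule without crashing our session is sketched below. It relies on the built-in **compile** function, which merely parses a piece of code given as a string, and on the **try**/**except** construct; neither is covered in this chapter, so treat this only as a preview. The names `code` and `outcome` are made up for this example.

```python
code = "round(number=e, 2)"   # a positional argument after a keyword one
try:
    compile(code, "<example>", "eval")   # only parses the text; runs nothing
    outcome = "accepted"
except SyntaxError:
    outcome = "rejected"
print(outcome)
## rejected
```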

### 2.2.2. Modules and packages

Other functions are available in numerous Python modules and packages (which are collections of modules).

For example, **math** features many mathematical functions:

```
import math # the math module must be imported prior to its first use
print(math.log(2.718281828459045)) # the natural logarithm (base e)
## 1.0
print(math.floor(-7.33)) # the floor function
## -8
print(math.sin(math.pi)) # sin(pi) equals 0 (with some numeric error)
## 1.2246467991473532e-16
```

See the official documentation for the comprehensive list of objects defined therein. On a side note, all floating-point computations in any programming language are subject to round-off errors and other inaccuracies. This is why the result of \(\sin\pi\) is not exactly 0, but some value very close thereto. We will elaborate on this topic in Section 5.5.6.

Packages can be given aliases, for the sake of code readability
or due to our being lazy. For instance, we are used to importing
the **numpy** package under the `np` alias:

```
import numpy as np
```

And now, instead of writing, for example,
**numpy.random.rand**`()`, we can call:

```
np.random.rand() # a pseudorandom value in [0.0, 1.0)
## 0.6964691855978616
```
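The same mechanism works for any module. Below is a minimal sketch based on **math** from the standard library; the alias `m` is a hypothetical choice (it is not a common convention). We also show a related form, **from** … **import**, not discussed above, which makes selected names available directly.

```python
import math as m              # `m` is an arbitrary alias chosen for brevity
from math import floor, pi    # import selected names directly

print(m.sqrt(16))
## 4.0
print(floor(-7.33))
## -8
print(m.pi == pi)             # both names refer to the same value
## True
```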

### 2.2.3. Slots and methods

Python is an object-orientated programming language.
Each object is an instance of some *class* whose name we can reveal
by calling the **type** function:

```
x = 1+2j
type(x)
## <class 'complex'>
```

Important

Classes define the following kinds of *attributes*:

- *slots* – associated data,
- *methods* – associated functions.

Call **help**`("complex")` to reveal that
the `complex` class defines, amongst others,
the **conjugate** method and the `real` and `imag` slots.

Here is how we can read the two slots:

```
print(x.real) # access slot `real` of object `x` of the class `complex`
## 1.0
print(x.imag)
## 2.0
```

And here is an example of a method call:

```
x.conjugate() # equivalently: complex.conjugate(x)
## (1-2j)
```

Notably, the documentation of this function can be
accessed by typing **help**`("complex.conjugate")`
*(class name – dot – method name)*.
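Objects of other types have their own methods as well. The sketch below calls **upper** on a string and **is_integer** on a `float`, alongside the complex-number attributes discussed above. The variable names `z` and `word` are arbitrary.

```python
z = 3+4j
print(z.real, z.imag)         # two slots
## 3.0 4.0
print(z.conjugate())          # a method of the class `complex`
## (3-4j)
print(abs(z))                 # the modulus: sqrt(3**2 + 4**2)
## 5.0
word = "spam"
print(word.upper())           # a method of the class `str`
## SPAM
print((7.0).is_integer())     # a method of the class `float`
## True
```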

## 2.3. Controlling program flow

### 2.3.1. Relational and logical operators

We have several operators which return a single logical value:

```
1 == 1.0 # is equal to?
## True
2 != 3 # is not equal to?
## True
"spam" < "egg" # is less than? (with respect to the lexicographic order)
## False
```

Some more examples:

```
math.sin(math.pi) == 0.0 # well, numeric error...
## False
abs(math.sin(math.pi)) <= 1e-9 # is close to 0?
## True
```
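The **math** module also provides the **isclose** function, which performs this kind of tolerance-based comparison for us. A short sketch follows; note that when comparing against zero, we should pass the absolute tolerance `abs_tol` explicitly, because it defaults to 0. The name `value` is introduced just for this example.

```python
import math

value = math.sin(math.pi)
print(math.isclose(value, 0.0, abs_tol=1e-9))   # absolute tolerance
## True
print(math.isclose(value, 0.0))                 # default abs_tol is 0.0
## False
print(math.isclose(0.1 + 0.2, 0.3))             # default relative tolerance
## True
```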

Logical results might be combined using **and**
(conjunction; for testing if both operands are true) and
**or** (alternative; for determining whether at least one operand
is true). Likewise, **not** (negation) is available too.

```
3 <= math.pi and math.pi <= 4
## True
not (1 > 2 and 2 < 3) and not 100 <= 3
## True
```

Notice that **not** `100 <= 3` is equivalent to `100 > 3`.
Also, based on De Morgan’s laws,
**not** `(1 > 2` **and** `2 < 3)` is true
if and only if `1 <= 2` **or** `2 >= 3` holds.
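As there are only four combinations of two logical values, De Morgan’s laws can be verified exhaustively. The sketch below does so using the **for** loop, which is introduced properly only in the next chapter; the names `law1`, `law2`, and `all_hold` are made up for this illustration.

```python
all_hold = True
for p in (True, False):
    for q in (True, False):
        # not (p and q)  is the same as  (not p) or (not q)
        law1 = (not (p and q)) == ((not p) or (not q))
        # not (p or q)   is the same as  (not p) and (not q)
        law2 = (not (p or q)) == ((not p) and (not q))
        all_hold = all_hold and law1 and law2
print(all_hold)
## True
```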

Assuming that `p`, `q`, `r` are logical
and `a`, `b`, `c`, `d` are variables of the type `float`,
simplify the following expressions:

- **not not** `p`,
- **not** `p` **and not** `q`,
- **not** `(`**not** `p` **or not** `q` **or not** `r)`,
- **not** `a == b`,
- **not** `(b > a` **and** `b < c)`,
- **not** `(a>=b` **and** `b>=c` **and** `a>=c)`,
- `(a>b` **and** `a<c)` **or** `(a<c` **and** `a>d)`.

### 2.3.2. The **if** statement

The **if** statement allows us to execute a chunk of code
conditionally, based on whether the provided expression is true or not.

For instance, given some variable:

```
x = np.random.rand() # a pseudorandom value in [0.0, 1.0)
```

we can react enthusiastically to its being less than 0.5 (note the colon after the tested condition):

```
if x < 0.5: print("spam!")
```

We did not get excited because `x` is equal to:

```
print(x)
## 0.6964691855978616
```

Multiple **elif** (*else-if*) parts can also be added.
They can be followed by an optional **else** part, which is executed
if none of the tested conditions is true.

```
if x < 0.25: print("spam!")
elif x < 0.5: print("ham!") # i.e., x in [0.25, 0.5)
elif x < 0.75: print("bacon!") # i.e., x in [0.5, 0.75)
else: print("eggs!") # i.e., x >= 0.75
## bacon!
```

If more than one statement is to be executed conditionally, an indented code block can be introduced.

```
if x >= 0.25 and x <= 0.75:
    print("spam!")
    print("I love it!")
else:
    print("I'd rather eat spam!")
print("more spam!") # executed regardless of the condition's state
## spam!
## I love it!
## more spam!
```
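Indented blocks can be used in every branch, including the **elif** ones. Here is a self-contained sketch that determines the sign of a number; the names `value`, `sign`, and `description` as well as the tested value are arbitrary.

```python
value = -3.5   # an arbitrary example value
if value < 0:
    sign = -1
    description = "negative"
elif value == 0:
    sign = 0
    description = "zero"
else:
    sign = 1
    description = "positive"
print(sign, description)   # executed regardless of which branch was taken
## -1 negative
```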

Important

The indentation must be neat and consistent. We recommend using four spaces. The reader is encouraged to try to execute the following code chunk and note what kind of error is generated:

```
if x < 0.5:
    print("spam!")
   print("ham!") # :(
```

For a given BMI, print out the corresponding category as defined by the WHO (underweight if below 18.5 kg/m², normal range up to 25.0 kg/m², etc.). Bear in mind that the BMI is a simplistic measure. Both the medical and statistical communities pointed out its inherent limitations. Read the Wikipedia article thereon for more details (and appreciate the amount of data wrangling required for its preparation: tables, charts, calculations; something that we will be able to do quite soon, given quality reference data, of course).

(*) Check if it is easy to find on the internet (in reliable sources) some raw datasets related to the body mass studies, e.g., measuring subjects’ height, weight, body fat and muscle percentage, etc.

### 2.3.3. The **while** loop

The **while** loop executes a given statement or a series of
statements as long as a given condition is true.

For example, here is a simple simulator determining how long we have to wait until drawing the first number not greater than 0.01 whilst generating numbers in the unit interval:

```
count = 0
while np.random.rand() > 0.01:
    count = count + 1
print(count)
## 117
```
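Because the above result depends on pseudorandom numbers, it will differ from run to run. For comparison, here is a deterministic sketch that counts how many times a number must be halved before it drops below 1. The names `n` and `steps` and the starting value are arbitrary.

```python
n = 100.0
steps = 0
while n >= 1:
    n = n / 2            # both indented statements form the loop body
    steps = steps + 1
print(steps, n)
## 7 0.78125
```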

Using the **while** loop,
determine the arithmetic mean of 10 randomly generated numbers
(i.e., the sum of the numbers divided by 10).

## 2.4. Defining functions

We can also introduce our own functions as a means for code reuse.
For instance, below is one that computes the minimum
(with respect to the `<` relation) of three given
objects:

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs.
    By the way, this is a docstring (documentation string);
    call help("min3") later.
    """
    if a < b:
        if a < c:
            return a
        else:
            return c
    else:
        if b < c:
            return b
        else:
            return c
```

Example calls:

```
print(min3(10, 20, 30),
min3(10, 30, 20),
min3(20, 10, 30),
min3(20, 30, 10),
min3(30, 10, 20),
min3(30, 20, 10))
## 10 10 10 10 10 10
```

Note that the function *returns* a value. The result can be
fetched and used in further computations:

```
x = min3(np.random.rand(), 0.5, np.random.rand()) # minimum of 3 numbers
x = round(x, 3) # do something with the result
print(x)
## 0.5
```

Write a function named **bmi** which computes and returns a
person’s BMI, given their weight (in kilograms) and height (in centimetres).
As documenting functions constitutes a good development practice,
do not forget about including a docstring.

We can also introduce new variables inside a function’s body. This can help the function perform what it has been designed to do.

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs
    (alternative version).
    """
    m = a # a local (temporary/auxiliary) variable
    if b < m:
        m = b
    if c < m: # be careful! no `else` or `elif` here; it's a separate `if`
        m = c
    return m
```

Example call:

```
m = 7
n = 10
o = 3
min3(m, n, o)
## 3
```

All *local variables* cease to exist once the function call completes.
Notice that `m` inside the function is a variable independent
of `m` in the global (calling) scope.

```
print(m) # this is still the global `m` from before the call
## 7
```
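The same effect can be observed in the following self-contained sketch. The function **double** and the variable names are hypothetical, introduced only for this illustration: assigning to `result` inside the function does not affect the global variable of the same name.

```python
def double(v):
    """Returns twice the given value."""
    result = 2 * v   # a local variable; it shadows the global `result`
    return result

result = "untouched"   # a global variable
y = double(21)
print(y, result)       # the global `result` was not modified by the call
## 42 untouched
```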

Implement a function **max3** which determines
the maximum of three given values.

Write a function **med3** which determines the median of three
given values (the value that is in-between two other ones).

(*) Indite a function **min4** to compute the
minimum of four values.

Note

*Lambda expressions* give us an uncomplicated way to define
functions using a single line of code.
Their syntax is: **lambda** `argument_name: return_value`.

```
square = lambda x: x**2 # i.e., def square(x): return x**2
square(4)
## 16
```

Objects generated through lambda expressions do not have to be assigned a name – they can be anonymous. This is useful when calling methods that take other functions as their arguments. With lambdas, the latter can be generated on the fly.

```
def print_x_and_fx(x, f):
    """
    Arguments: x - some object; f - a function to be called on x
    """
    print(f"x = {x} and f(x) = {f(x)}")

print_x_and_fx(4, lambda x: x**2)
## x = 4 and f(x) = 16
print_x_and_fx(math.pi/4, lambda x: round(math.cos(x), 5))
## x = 0.7853981633974483 and f(x) = 0.70711
```
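Functions that accept other functions can also return values instead of printing them. Below is one more sketch along these lines; **apply_twice** is a hypothetical helper defined only for this example.

```python
def apply_twice(f, x):
    """
    Arguments: f - a function of one argument; x - some object.
    Returns f(f(x)).
    """
    return f(f(x))

print(apply_twice(lambda v: v + 3, 10))   # (10 + 3) + 3
## 16
print(apply_twice(lambda v: v**2, 3))     # (3**2)**2
## 81
```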

## 2.5. Exercises

What does **import** `xxxxxx` **as** `x` mean?

What is the difference between **if** and **while**?

Name the scalar types we introduced in this chapter.

What is a docstring and how can we create and access it?

What are keyword arguments?