# 2. Scalar types and control structures in Python

This open-access textbook is, and will remain, freely available for everyone’s enjoyment (also in PDF; a paper copy can also be ordered). It is a non-profit project. Although available online, it is a whole course, and should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Make sure to check out *Deep R Programming* [35] too.

In this part, we introduce the basics of the Python language itself.
As Python is a general-purpose tool, packages supporting
data wrangling operations are provided as third-party extensions.
In further chapters, extending upon the concepts discussed here,
we will be able to use **numpy**, **scipy**,
**matplotlib**, **pandas**, **seaborn**,
and other packages with a healthy degree of confidence.

## 2.1. Scalar types

*Scalars* are *single* or *atomic* values.
Their five ubiquitous types are:

- `bool` – logical,
- `int`, `float`, `complex` – numeric,
- `str` – character.

Let’s discuss them in detail.

### 2.1.1. Logical values

There are only two possible logical (Boolean) values: `True` and `False`.
By typing:

```
True
## True
```

we instantiated the former. This is a dull exercise unless we have fallen into the undermentioned pitfall.

Important

Python is a case-sensitive language.
Writing “`TRUE`” or “`true`” instead of “`True`” is an error.
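A minimal sketch of what this pitfall looks like in practice: an undefined name such as `TRUE` raises a `NameError`. The `try`/`except` construct is not covered in this chapter; it is used here only so that the example runs to completion, and `error_name` is an illustrative variable.

```python
# `True` is a built-in constant; `TRUE` is merely an undefined name.
try:
    TRUE
except NameError as err:
    error_name = type(err).__name__
    print(error_name)
## NameError
```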

### 2.1.2. Numeric values

The three numeric scalar types are:

- `int` – integers, e.g., `1`, `-42`, `1_000_000`;
- `float` – floating-point (real) numbers, e.g., `-1.0`, `3.14159`, `1.23e-4`;
- (*) `complex` – complex numbers, e.g., `1+2j`.

In practice, numbers of the type `int` and `float` often interoperate
seamlessly. We usually do not have to think about them as being of
distinct types.
On the other hand, complex numbers are rather infrequently used in
basic data science applications (but see Section 4.1.4).

`1.23e-4` and `9.8e5` are examples of numbers in *scientific notation*,
where “`e`” stands for “… times 10 to the power of …”.
Additionally, `1_000_000` is a decorated (more human-readable) version
of `1000000`. Use the `print` function to check out their values.
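The remarks above can be checked with a short sketch (the variable name `total` is only illustrative): mixing an `int` with a `float` yields a `float`, while scientific notation and digit separators are merely alternative spellings of the same values.

```python
# Mixing `int` and `float` operands yields a `float`.
total = 1 + 2.0
print(total, type(total))
## 3.0 <class 'float'>

# Scientific notation and digit separators do not change the value.
print(9.8e5 == 980000.0)
## True
print(1_000_000 == 1000000)
## True
```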

#### 2.1.2.1. Arithmetic operators

Here is the list of available arithmetic operators:

```
1 + 2 # addition
## 3
1 - 7 # subtraction
## -6
4 * 0.5 # multiplication
## 2.0
7 / 3 # float division (results are always of the type float)
## 2.3333333333333335
7 // 3 # integer division
## 2
7 % 3 # division remainder
## 1
2 ** 4 # exponentiation
## 16
```

The precedence of these operators is quite predictable, e.g., exponentiation has higher priority than multiplication and division, which in turn bind more strongly than addition and subtraction. Thus,

```
1 + 2 * 3 ** 4
## 163
```

is the same as `1+(2*(3**4))` and is different from,
e.g., `((1+2)*3)**4`.
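To see the effect of the grouping, the three variants can be evaluated side by side. This is only an illustrative sketch; the variable names are not part of the original text.

```python
# Parentheses make the default grouping explicit.
default_grouping = 1 + 2 * 3 ** 4
explicit_grouping = 1 + (2 * (3 ** 4))
other_grouping = ((1 + 2) * 3) ** 4
print(default_grouping, explicit_grouping, other_grouping)
## 163 163 6561
```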

Note

Keep in mind that computers’ floating-point arithmetic is precise
only up to roughly 15–17 significant decimal digits.
As a consequence, the result of `7/3` is only approximate;
hence the `2.3333333333333335` above.
We will discuss this topic in Section 5.5.6.
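A classic illustration of this limitation is sketched below. It relies on the **math** module, which is introduced only later in this chapter, and on its `isclose` function, which compares two numbers up to a small relative tolerance. The variable `s` is an illustrative name.

```python
import math

# 0.1, 0.2, and 0.3 have no exact binary representation,
# so the computed sum is only approximately equal to 0.3.
s = 0.1 + 0.2
print(s)
## 0.30000000000000004
print(s == 0.3)
## False
print(math.isclose(s, 0.3))  # comparison with a relative tolerance
## True
```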

#### 2.1.2.2. Creating named variables

A named variable can be introduced through the *assignment operator*,
`=`. It can store an arbitrary Python object which we
can recall at any later time. Names of variables can include
any lower- and uppercase letters, underscores, and
(except at the beginning) digits.
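As a quick sanity check of these naming rules, we may use the `isidentifier` method of strings (methods are discussed later in this chapter). The sketch below is only illustrative; note that this method checks the syntax of a name only, so it does not detect reserved words such as `if`.

```python
# `str.isidentifier` reports whether a string is a syntactically valid name.
valid = "my_2nd_variable".isidentifier()
starts_with_digit = "2nd_variable".isidentifier()
has_hyphen = "my-variable".isidentifier()
print(valid, starts_with_digit, has_hyphen)
## True False False
```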

To make our code easier to understand for humans, it is best to use names that are self-explanatory, like:

```
x = 7 # read: let `x` from now on be equal to 7 (or: `x` becomes 7)
```

“`x`” is a great name: it means *something of general interest* in mathematics.
Let’s print out the value it is bound to:

```
print(x) # or just `x`
## 7
```

New variables can easily be created based on existing ones:

```
my_2nd_variable = x/3 - 2 # creates `my_2nd_variable`
print(my_2nd_variable)
## 0.3333333333333335
```

Existing variables may be rebound to any other value freely:

```
x = x/3 # let the new `x` be equal to the old `x` (7) divided by 3
print(x)
## 2.3333333333333335
```

Define two named variables `height` (in centimetres) and `weight`
(in kilograms). Determine the corresponding
body mass index (BMI).

Note

(*) Augmented assignments are also available. For example:

```
x *= 3
print(x)
## 7.0
```

In this context, the foregoing is equivalent to `x = x*3`. In other
words, it creates a new object. Nevertheless, in some scenarios,
augmented assignments may modify the objects they act upon *in place*;
compare Section 3.5.
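The difference can be sketched as follows. The example assumes some familiarity with lists, which are not introduced in this chapter, and the names `u`, `v`, `a`, and `b` are purely illustrative.

```python
# Floats are immutable: `*=` rebinds the name to a new object.
u = 7.0
v = u        # `v` and `u` refer to the same float object
v *= 3
print(u, v)
## 7.0 21.0

# Lists are mutable: `+=` extends the existing object in place.
a = [1, 2]
b = a        # `b` and `a` refer to the same list object
b += [3]
print(a, b)
## [1, 2, 3] [1, 2, 3]
```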

### 2.1.3. Character strings

Character strings (objects of the type `str`) store text data.
They are created using apostrophes or double quotes:

```
print("spam, spam, #, bacon, and spam")
## spam, spam, #, bacon, and spam
print('Cześć! ¿Qué tal?')
## Cześć! ¿Qué tal?
print('"G\'day, how\'s it goin\'," he asked.\\\n"All good," she responded.')
## "G'day, how's it goin'," he asked.\
## "All good," she responded.
```

We see some examples of
*escape sequences* here:

- “`\'`” is a way to include an apostrophe in an apostrophe-delimited string,
- “`\\`” enters a backslash,
- “`\n`” inputs a newline character.
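Although each escape sequence is typed using two symbols, it denotes a single character. A brief sketch using the built-in `len` function, which returns the number of characters in a string (the variable `text` is illustrative):

```python
# Each escape sequence denotes exactly one character.
print(len('\''), len("\\"), len("\n"), len("\t"))
## 1 1 1 1
text = 'a\\b\nc'
print(len(text))  # a, backslash, b, newline, c
## 5
```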

Multiline strings are created using three apostrophes or double quotes:

```
"""
spam\\spam
tasty\t"spam"
lovely\t'spam'
"""
## '\nspam\\spam\ntasty\t"spam"\nlovely\t\'spam\'\n'
```

Call the **print** function on the above objects to reveal
the meaning of the included escape sequences.

Important

Many string operations are available, e.g., for formatting and pattern searching. They are especially important in the art of data wrangling as information often arrives in textual form. Chapter 14 covers this topic in detail.

#### 2.1.3.1. F-strings (formatted string literals)

*F-strings* are formatted string literals:

```
x = 2
f"x is equal to {x}"
## 'x is equal to 2'
```

Notice the “`f`” prefix. The “`{x}`” part was replaced with the value
stored in the `x` variable.

The formatting of items can be fine-tuned. As usual, it is best
to study the documentation
in search of noteworthy features. Here, let’s just mention that
we will frequently be referring to placeholders like
“`{value:width}`” and “`{value:width.precision}`”,
which specify the field width and the number of fractional digits
of a number. This way, we can output a series of values aesthetically
aligned one beneath another.

```
π = 3.14159265358979323846
e = 2.71828182845904523536
print(f"""
π = {π:10.8f}
e = {e:10.8f}
πe² = {(π*e**2):10.8f}
""")
##
## π = 3.14159265
## e = 2.71828183
## πe² = 23.21340436
```

“`10.8f`” means that a value should be formatted as a `float`,
be of width at least ten characters (text columns),
and use eight fractional digits.
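As a further hedged illustration with made-up values (`a`, `b`, `row_a`, and `row_b` are illustrative names), a common field width makes numbers of different magnitudes line up at the decimal point:

```python
a = 3.14159
b = 123.456
row_a = f"a = {a:8.2f}"  # width 8, two fractional digits
row_b = f"b = {b:8.2f}"
print(row_a)
print(row_b)
## a =     3.14
## b =   123.46
```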

## 2.2. Calling built-in functions

We have a few base functions at our disposal. For instance,
to round Euler’s number `e` to two decimal digits, we can call:

```
e = 2.718281828459045
round(e, 2)
## 2.72
```

Call **help**`("round")` to access the function’s manual. Note that the second argument,
called `ndigits`, which we set to `2`, defaults to `None`.
Check what happens when we omit it during the call.

### 2.2.1. Positional and keyword arguments

The **round** function has two parameters, `number` and `ndigits`.
Thus, the following calls are equivalent:

```
print(
    round(e, 2),  # two arguments matched positionally
    round(e, ndigits=2),  # positional and keyword argument
    round(number=e, ndigits=2),  # two keyword arguments
    round(ndigits=2, number=e)  # the order does not matter for keyword args
)
## 2.72 2.72 2.72 2.72
```

Verifying that no other call scheme is permitted is left as an exercise. In particular, positionally matched arguments must be listed before the keyword ones.
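One possible way to observe this rule without interrupting a script is sketched below. It uses the built-in `compile` function to parse a call given as a string; `bad_call` and `rejected` are illustrative names, and the `try`/`except` construct is not covered in this chapter.

```python
# A positional argument placed after a keyword one is rejected when the
# code is parsed, i.e., before anything is executed.
bad_call = "round(number=2.718281828459045, 2)"
try:
    compile(bad_call, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
## True
```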

### 2.2.2. Modules and packages

Python modules and packages (which are collections of modules)
define thousands of additional functions.
For example, **math** features the most common mathematical routines:

```
import math # the math module must be imported before we can use it
print(math.log(2.718281828459045)) # the natural logarithm (base e)
## 1.0
print(math.floor(-7.33)) # the floor function
## -8
print(math.sin(math.pi)) # sin(pi) equals 0 (with small numeric error)
## 1.2246467991473532e-16
```

See the official documentation for the comprehensive list of objects available. On a side note, all floating-point computations in any programming language are subject to round-off errors and other inaccuracies. This is why the result of \(\sin\pi\) is not exactly 0, but some value very close thereto. We will elaborate on this topic in Section 5.5.6.

Packages can be given aliases, for the sake of code readability
or due to our being lazy. For instance, in Chapter 4
we will get used to importing the **numpy** package
under the `np` alias:

```
import numpy as np
```

And now, instead of writing, for example,
**numpy.random.rand**`()`, we can call:

```
np.random.rand() # a pseudorandom value in [0.0, 1.0)
## 0.6964691855978616
```
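The same mechanism works for any module. Below is a small sketch based on the standard **math** module, where `m` is an arbitrary alias chosen for illustration only (it is not a widely used convention):

```python
import math
import math as m  # `m` is an arbitrary alias chosen for this sketch

print(m is math)  # both names refer to the same module object
## True
print(m.sqrt(16.0))
## 4.0
```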

### 2.2.3. Slots and methods

Python is an object-orientated programming language.
Each object is an instance of some *class* whose name we can reveal
by calling the **type** function:

```
x = 1+2j
type(x)
## <class 'complex'>
```

Important

Classes define two kinds of *attributes*:

- *slots* – associated data,
- *methods* – associated functions.

Call **help**`("complex")` to reveal that
the `complex` class defines, amongst others,
the **conjugate** method and the `real` and `imag` slots.

Here is how we can read the two slots:

```
print(x.real) # access slot `real` of object `x` of the class `complex`
## 1.0
print(x.imag)
## 2.0
```

And here is an example of a method call:

```
x.conjugate() # equivalently: complex.conjugate(x)
## (1-2j)
```

Notably, the documentation of this function can be
accessed by typing **help**`("complex.conjugate")`
*(class name – dot – method name)*.
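Objects of other types follow the same pattern. For instance, here is a brief sketch with a `float` (the variable `y` is illustrative); the `float` class also features the `real` and `imag` slots, as well as methods of its own, such as **is_integer**:

```python
y = 2.5
print(type(y))
## <class 'float'>
print(y.real, y.imag)  # floats expose the same two slots as complex numbers
## 2.5 0.0
print(y.is_integer())  # equivalently: float.is_integer(y)
## False
```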

## 2.3. Controlling program flow

### 2.3.1. Relational and logical operators

We have several operators which return a single logical value:

```
1 == 1.0 # is equal to?
## True
2 != 3 # is not equal to?
## True
"spam" < "egg" # is less than? (with respect to the lexicographic order)
## False
```

Some more examples:

```
math.sin(math.pi) == 0.0 # well, numeric error...
## False
abs(math.sin(math.pi)) <= 1e-9 # is close to 0?
## True
```

Logical results can be combined using **and**
(*conjunction*; for testing if both operands are true) and
**or** (*alternative*; for determining whether at least one operand
is true). Likewise, **not** stands for *negation*.

```
3 <= math.pi and math.pi <= 4 # is it between 3 and 4?
## True
not (1 > 2 and 2 < 3) and not 100 <= 3
## True
```
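As a side note, Python also permits chaining relational operators, which yields a more compact way of writing the first test above. A minimal sketch (`between` and `chained` are illustrative names):

```python
import math

between = 3 <= math.pi and math.pi <= 4
chained = 3 <= math.pi <= 4  # shorthand for the line above
print(between, chained)
## True True
```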

Notice that `not 100 <= 3` is equivalent to `100 > 3`.
Also, based on the de Morgan laws, `not (1 > 2 and 2 < 3)` is true
if and only if `1 <= 2 or 2 >= 3` holds.
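The de Morgan laws can be verified exhaustively, as there are only four combinations of two logical values. The sketch below defines a hypothetical helper function, `de_morgan_holds`; defining our own functions is discussed at the end of this chapter.

```python
def de_morgan_holds(p, q):
    """Check both de Morgan laws for one pair of logical values."""
    first = (not (p and q)) == ((not p) or (not q))
    second = (not (p or q)) == ((not p) and (not q))
    return first and second

print(de_morgan_holds(True, True), de_morgan_holds(True, False),
      de_morgan_holds(False, True), de_morgan_holds(False, False))
## True True True True
```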

Assuming that `p`, `q`, `r` are logical
and `a`, `b`, `c`, `d` are variables of the type `float`,
simplify the following expressions:

- `not not p`,
- `not p and not q`,
- `not (not p or not q or not r)`,
- `not a == b`,
- `not (b > a and b < c)`,
- `not (a>=b and b>=c and a>=c)`,
- `(a>b and a<c) or (a<c and a>d)`.

### 2.3.2. The **if** statement

The **if** statement executes a chunk of code
*conditionally*, based on whether the provided expression is true or not.
For instance, given some variable:

```
x = np.random.rand() # a pseudorandom value in [0.0, 1.0)
```

we can react enthusiastically to its being less than 0.5:

```
if x < 0.5: print("spam!") # note the colon after the tested condition
```

Actually, we remained cool as a cucumber (nothing was printed)
because `x` is equal to:

```
print(x)
## 0.6964691855978616
```

Multiple **elif** (*else-if*) parts can also be added.
Their conditions are tested one by one, until one of the tests turns out to
be successful.
At the end, an optional **else** part can be included,
which is executed when all of the tested conditions turn out to be false.

```
if x < 0.25: print("spam!")
elif x < 0.5: print("ham!") # i.e., x in [0.25, 0.5)
elif x < 0.75: print("bacon!") # i.e., x in [0.5, 0.75)
else: print("eggs!") # i.e., x >= 0.75
## bacon!
```

Note that if we wrote the second condition
as `x >= 0.25 and x < 0.5`, we would introduce some redundancy;
when it is being considered, we already know that
`x < 0.25` (the first test) is *not* true.
Similarly, the `else` part is only executed when all the tests
fail, which in our case happens if
neither `x < 0.25`, `x < 0.5`, nor `x < 0.75` is true,
i.e., if `x >= 0.75`.
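To see how the boundary values are classified, the same chain of tests can be wrapped in a function and called on a few fixed inputs. This is only a sketch: the name `pick` is hypothetical, and both indented code blocks and function definitions are explained further on in this chapter.

```python
def pick(x):
    """Return the message selected by the if-elif-else chain."""
    if x < 0.25:
        return "spam!"
    elif x < 0.5:    # reached only if x >= 0.25
        return "ham!"
    elif x < 0.75:   # reached only if x >= 0.5
        return "bacon!"
    else:            # reached only if x >= 0.75
        return "eggs!"

print(pick(0.1), pick(0.25), pick(0.5), pick(0.75))
## spam! ham! bacon! eggs!
```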

Whenever more than one statement is to be executed conditionally,
an *indented code block* can be introduced.

```
if x >= 0.25 and x <= 0.75:
    print("bacon!")
    print("I love it!")
else:
    print("I'd rather eat spam!")
print("more spam!") # executed regardless of the condition's state
## bacon!
## I love it!
## more spam!
```

Important

The indentation must be neat and consistent. We recommend using
*four spaces*. Note the kind of error generated when we try executing:

```
if x < 0.5:
    print("spam!")
   print("ham!")  # :(
```

```
IndentationError: unindent does not match any outer indentation level
```

For a given BMI, print out the corresponding category as defined by the WHO (underweight if less than 18.5 kg/m², normal range up to 25.0 kg/m², etc.). Bear in mind that the BMI is a simplistic measure. Both the medical and statistical communities pointed out its inherent limitations. Read the Wikipedia article thereon for more details (and appreciate the amount of data wrangling required for its preparation: tables, charts, calculations; something that we will be able to perform quite soon, given quality reference data, of course).

(*) Check if it is easy to find on the internet (in reliable sources) some raw datasets related to the body mass studies, e.g., measuring subjects’ height, weight, body fat and muscle mass, etc.

### 2.3.3. The **while** loop

The **while** loop executes a given statement or a series of
statements *as long as* a given condition is true.
For example, here is a simple simulator determining
how long we have to wait *until* drawing the first value
*not* greater than 0.01 whilst generating numbers in the unit interval:

```
count = 0
while np.random.rand() > 0.01:
    count = count + 1
print(count)
## 117
```
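As the above result depends on the pseudorandom numbers drawn, here is a deterministic sketch of the same pattern, with illustrative variable names. It counts how many times 1.0 must be halved before the result is no longer greater than 0.01:

```python
x = 1.0
count = 0
while x > 0.01:  # repeat as long as the condition holds
    x = x / 2
    count = count + 1
print(count, x)
## 7 0.0078125
```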

Using the **while** loop,
determine the arithmetic mean of 100 randomly generated numbers
(i.e., the sum of the numbers divided by 100).

## 2.4. Defining functions

As a means for *code reuse*, we can define *our own* functions.
For instance, below is a procedure that computes the minimum
(with respect to the `<` relation) of three given
objects:

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs.
    By the way, this is a docstring (documentation string);
    call help(min3) later to view it.
    """
    if a < b:
        if a < c:
            return a
        else:
            return c
    else:
        if b < c:
            return b
        else:
            return c
```

Example calls:

```
print(min3(10, 20, 30),
      min3(10, 30, 20),
      min3(20, 10, 30),
      min3(20, 30, 10),
      min3(30, 10, 20),
      min3(30, 20, 10))
## 10 10 10 10 10 10
```

Note that **min3** *returns* a value. The result it yields can be
consumed in further computations:

```
x = min3(np.random.rand(), 0.5, np.random.rand()) # minimum of 3 numbers
x = round(x, 3) # transform the result somehow
print(x)
## 0.5
```

Write a function named **bmi** which computes and returns a
person’s BMI, given their weight (in kilograms) and height (in centimetres).
As documenting functions constitutes a good development practice,
do not forget about including a docstring.

New variables can be introduced inside a function’s body. This can help the function perform its duties.

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs
    (alternative version).
    """
    m = a  # a local (temporary/auxiliary) variable
    if b < m:
        m = b
    if c < m:  # be careful! no `else` or `elif` here: it's a separate `if`
        m = c
    return m
```

Example call:

```
m = 7
n = 10
o = 3
min3(m, n, o)
## 3
```

All *local variables* cease to exist once the function call ends.
Notice that `m` inside the function is a variable independent
of `m` in the global (calling) scope.

```
print(m) # this is still the global `m` from before the call
## 7
```
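Conversely, a name introduced inside a function is not visible outside of it. In the sketch below, the hypothetical function `double` and its local variable `helper` serve as an illustration (the `try`/`except` construct, not covered in this chapter, merely keeps the example runnable):

```python
def double(value):
    helper = 2 * value  # `helper` exists only while the call is running
    return helper

result = double(21)
try:
    helper  # not defined in the global scope
    leaked = True
except NameError:
    leaked = False
print(result, leaked)
## 42 False
```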

Implement a function **max3** which determines
the maximum of three given values.

Write a function **med3** which determines the median of three
given values (the value that lies between the other two).

(*) Indite a function **min4** to compute the
minimum of four values.

Note

*Lambda expressions* give us an uncomplicated way to define
functions using a single line of code.
They are defined using the syntax **lambda** `argument_name: return_expression`.

```
square = lambda x: x**2 # i.e., def square(x): return x**2
square(4)
## 16
```

Objects generated through lambda expressions do not have to be assigned a name: they can remain anonymous. This is useful when calling a method which takes another function as its argument. With lambdas, the latter can be generated on the fly.

```
def print_x_and_fx(x, f):
    """
    Arguments: x - some object; f - a function to be called on x
    """
    print(f"x = {x} and f(x) = {f(x)}")

print_x_and_fx(4, lambda x: x**2)
## x = 4 and f(x) = 16
print_x_and_fx(math.pi/4, lambda x: round(math.cos(x), 5))
## x = 0.7853981633974483 and f(x) = 0.70711
```

## 2.5. Exercises

What does **import** `xxxxxx` **as** `x` mean?

What is the difference between **if** and **while**?

Name the scalar types we introduced in this chapter.

What is a docstring and how can we create and access it?

What are keyword arguments?