# 2. Scalar Types and Control Structures in Python

The open-access textbook *Minimalist Data Wrangling with Python* by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF; a printed version can be ordered from Amazon: AU CA DE ES FR IT JP NL PL SE UK US). It is a non-profit project. Although available online, it is a whole course; it should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated. Also, make sure to check out my other book, *Deep R Programming* [34].

In this part, we introduce the basics of the Python language itself.
As it is a general-purpose tool, various packages supporting
data wrangling operations will be provided as third-party extensions.
In further chapters, based on the concepts discussed here,
we will be able to use
**numpy**, **scipy**,
**matplotlib**, **pandas**, **seaborn**,
and other packages with some healthy degree of confidence.

## 2.1. Scalar Types

The five ubiquitous scalar types (i.e., *single* or *atomic* values) are:

- `bool` – logical,
- `int`, `float`, `complex` – numeric,
- `str` – character.

### 2.1.1. Logical Values

There are only two possible logical (Boolean) values: `True` and `False`.
We can type:

```
True
## True
```

to instantiate one of them. This might seem boring, unless we fell into the following pitfall when trying to play with the above code.

Important

Python is case-sensitive.
Writing “`TRUE`” or “`true`” instead of “`True`” is an error.
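As a minimal sketch of this pitfall, and assuming a fresh session where no variable named `true` has been defined, referring to the misspelt name raises a `NameError`. We catch it below only so that the example can run to completion.

```python
# `True` is a built-in constant, whereas `true` is merely an undefined name.
try:
    true
except NameError as err:
    message = str(err)  # store the error message instead of crashing

print(message)
```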

### 2.1.2. Numeric Values

The three numeric scalar types are:

- `int` – integers, e.g., `1`, `-42`, `1_000_000`;
- `float` – floating-point (real) numbers, e.g., `-1.0`, `3.14159`, `1.23e-4`;
- (*) `complex` – complex numbers, e.g., `1+2j` (these are infrequently used in our applications; however, see Section 4.1.4).

In practice, numbers of type `int` and `float` often interoperate seamlessly.
We usually do not have to think about them as being of distinct types.

`1.23e-4` and `9.8e5` are examples of numbers entered
using the so-called scientific notation, where “`e`” stands for
“times 10 to the power of”.
Additionally, `1_000_000` is a decorated (more human-readable) version
of `1000000`. Use the `print` function to check their values.
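For instance, the following short check (the variable names are ours, chosen for illustration only) prints the three literals mentioned above and confirms that the underscores do not change the value:

```python
small = 1.23e-4       # 1.23 times 10 to the power of -4
big = 9.8e5           # 9.8 times 10 to the power of 5
million = 1_000_000   # underscores are ignored by the interpreter

print(small)
## 0.000123
print(big)
## 980000.0
print(million)
## 1000000
print(million == 1000000)
## True
```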

#### 2.1.2.1. Arithmetic Operators

Here is the list of available arithmetic operators:

```
1 + 2 # addition
## 3
1 - 7 # subtraction
## -6
4 * 0.5 # multiplication
## 2.0
7 / 3 # float division (the result is always of type float)
## 2.3333333333333335
7 // 3 # integer division
## 2
7 % 3 # division remainder
## 1
2 ** 4 # exponentiation
## 16
```
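The behaviour of `//` and `%` is less obvious for negative operands and for floats. The sketch below illustrates that integer division rounds towards minus infinity and that the quotient and the remainder always fit together.

```python
print(-7 // 3)   # rounds towards minus infinity, not towards zero
## -3
print(-7 % 3)    # the remainder takes the sign of the divisor
## 2
print((-7 // 3) * 3 + (-7 % 3))  # quotient*divisor + remainder gives -7 back
## -7
print(7.0 // 3)  # with a float operand, the result is a float
## 2.0
```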

The precedence of these operators is quite predictable, e.g., exponentiation has higher priority than multiplication and division, which in turn bind more strongly than addition and subtraction. Consequently:

```
1 + 2 * 3 ** 4 # the same as 1+(2*(3**4))
## 163
```

is different from, e.g., `((1+2)*3)**4`.
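To see the effect of precedence and grouping side by side, we can evaluate a few variants. The last two lines show two cases that sometimes surprise newcomers: exponentiation groups from right to left and binds more strongly than the unary minus.

```python
print(1 + 2 * 3 ** 4)       # the same as 1+(2*(3**4))
## 163
print(((1 + 2) * 3) ** 4)   # round brackets override the default order
## 6561
print(2 ** 3 ** 2)          # the same as 2**(3**2)
## 512
print(-2 ** 2)              # the same as -(2**2)
## -4
```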

Note

Keep in mind that computers’ floating-point arithmetic is precise
only up to about 15–17 significant decimal digits.
As a consequence, the result of `7/3` is only approximate
(`2.3333333333333335`).
We will get back to this topic in Section 5.5.6.
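A classic illustration of this limited precision (the variable name `total` is ours): the decimal fractions 0.1, 0.2, and 0.3 have no exact binary representation, so their rounded versions do not add up as we might expect.

```python
total = 0.1 + 0.2
print(total)          # not exactly 0.3
## 0.30000000000000004
print(total == 0.3)
## False
```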

#### 2.1.2.2. Creating Named Variables

Named variables can be introduced using the *assignment operator*,
`=`. They can store arbitrary Python objects and be
referred to at any time.
Names of variables can include any lower- and uppercase letters,
underscores, and digits (except at the beginning).
It is best to make them self-explanatory, like:

```
x = 7 # read: let `x` from now on be equal to 7 (or: `x` becomes 7)
```

We can check that `x` (great name, by the way: it means
*something of general interest* in mathematics) is now available
for further reference by printing out the value that is bound therewith:

```
print(x) # or just `x`
## 7
```

New variables can easily be created based on existing ones:

```
my_2nd_variable = x/3 - 2 # creates `my_2nd_variable`
print(my_2nd_variable)
## 0.3333333333333335
```

Also, existing variables can be re-bound to any other value whenever we please:

```
x = x/3 # let the new `x` be equal to the old `x` (7) divided by 3
print(x)
## 2.3333333333333335
```

Create two named variables `height` (in centimetres) and `weight` (in kilograms).
Based on them, determine your BMI.

Note

(*) Augmented assignments are also available. For example:

```
x *= 3
print(x)
## 7.0
```

In this context, the above is equivalent to `x = x*3`, i.e.,
a new object has been created and bound to `x`. Nevertheless, in other scenarios,
augmented assignments modify the objects they act upon in place;
compare Section 3.5.
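The difference between the two behaviours can be sketched as follows. The example borrows a list, i.e., a mutable object that is discussed only later in the book, so it should be treated as a preview rather than a full explanation.

```python
x = 7
y = x        # `y` refers to the same object as `x`
x *= 3       # numbers are immutable: `x` is re-bound to a new object
print(x, y)  # `y` still refers to the old value
## 21 7

a = [1, 2]   # a list (a mutable object)
b = a        # `b` refers to the same list as `a`
a *= 2       # the list is modified in place...
print(a, b)  # ...so the change is visible through `b` as well
## [1, 2, 1, 2] [1, 2, 1, 2]
```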

### 2.1.3. Character Strings

Character strings (objects of type `str`),
which can consist of arbitrary text, are
created using either double quotes or apostrophes:

```
print("spam, spam, #, bacon, and spam")
## spam, spam, #, bacon, and spam
print("Cześć! ¿Qué tal?")
## Cześć! ¿Qué tal?
print('"G\'day, howya goin\'," he asked.\n"Fine, thanks," she responded.\\')
## "G'day, howya goin'," he asked.
## "Fine, thanks," she responded.\
```

Above, “`\'`” (a way to include an apostrophe in an apostrophe-delimited
string), “`\\`” (a backslash), and “`\n`” (a newline character)
are examples of *escape sequences*.
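Although each of these escape sequences is typed using two characters, it denotes a single character in the resulting string. A quick check with the built-in **len** function (which returns the number of characters in a string) illustrates this:

```python
print(len("a\nb"))  # three characters: a, newline, b
## 3
print(len("\\"))    # a single backslash
## 1
print(len('\''))    # a single apostrophe
## 1
```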

Multiline strings are also possible:

```
"""
spam\\spam
tasty\t"spam"
lovely\t'spam'
"""
## '\nspam\\spam\ntasty\t"spam"\nlovely\t\'spam\'\n'
```

Call the **print** function on the above object to reveal
the special meaning of the included escape sequences.

Important

Many string operations are available. They are related, for example, to formatting, pattern searching, or extracting matching chunks. They are especially important in the art of data wrangling as oftentimes information comes to us in textual form. We shall be covering this topic in detail in Chapter 14.

#### 2.1.3.1. F-Strings (Formatted String Literals)

Also, the so-called *f-strings* (formatted string literals)
can be used to prepare nice output messages:

```
x = 2
f"x is equal to {x}"
## 'x is equal to 2'
```

Notice the “`f`” prefix. The “`{x}`” part was replaced with the value
stored in the `x` variable.

There are many options available. As usual, it is best
to study the documentation
in search of interesting features. Here, let us just mention that
we will frequently be referring to placeholders like
“`{variable:width}`” and “`{variable:width.precision}`”,
which specify the field width and the number of fractional digits
of a number. This can result in a series of values nicely aligned
one below another.

```
π = 3.14159265358979323846
e = 2.71828182845904523536
print(f"""
π = {π:10.8f}
e = {e:10.8f}
""")
##
## π = 3.14159265
## e = 2.71828183
```

“`10.8f`” means that a value should be formatted as a `float`,
be of width at least ten characters (text columns),
and use eight fractional digits.
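When the width exceeds the number of characters actually needed, the value is padded with spaces on the left. Here is a small sketch with two hypothetical measurements (the values are made up for illustration) formatted using `8.2f`:

```python
height = 1.75   # hypothetical values, for illustration only
weight = 68.5
print(f"height = {height:8.2f}")
print(f"weight = {weight:8.2f}")
## height =     1.75
## weight =    68.50
```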

## 2.2. Calling Built-in Functions

There are quite a few built-in functions ready for use. For instance:

```
e = 2.718281828459045
round(e, 2)
## 2.72
```

This rounds `e` to two decimal digits.

Call **help**`("round")` to access the function’s manual. Note that the second argument,
called `ndigits`, which we set to `2`, has a default value of `None`.
Check what happens when we omit it during the call.

### 2.2.1. Positional and Keyword Arguments

As **round** has two parameters, `number` and `ndigits`, the
following (and no other) calls are equivalent:

```
print(
    round(e, 2), # two arguments matched positionally
    round(e, ndigits=2), # positional and keyword argument
    round(number=e, ndigits=2), # two keyword arguments
    round(ndigits=2, number=e) # the order does not matter for keyword args
)
## 2.72 2.72 2.72 2.72
```

Verifying that no other form is allowed is left as an exercise. In particular, positionally matched arguments must be listed before the keyword ones.
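One way to check an invalid form without crashing our session is to ask Python to merely parse the offending call. The sketch below uses the built-in **compile** function for this purpose; the variable names `code` and `is_valid` are ours.

```python
code = "round(number=e, 2)"  # a positional argument after a keyword one
try:
    compile(code, "<example>", "eval")  # parse only; nothing is executed
    is_valid = True
except SyntaxError:
    is_valid = False  # the parser has rejected the call

print(is_valid)
## False
```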

### 2.2.2. Modules and Packages

Other functions are available in numerous Python modules and packages (which are collections of modules).

For example, **math** features many mathematical functions:

```
import math # the math module must be imported prior to its first use
print(math.log(2.718281828459045)) # the natural logarithm (base e)
## 1.0
print(math.floor(-7.33)) # the floor function
## -8
print(math.sin(math.pi)) # sin(pi) equals 0 (with some numeric error)
## 1.2246467991473532e-16
```

See the official documentation for the comprehensive list of objects defined therein. On a side note, all floating-point computations in any programming language are subject to round-off errors and other inaccuracies. This is why the result of \(\sin\pi\) is not exactly 0, but some value very close thereto. We will elaborate on this topic in Section 5.5.6.

Packages can be given aliases, for the sake of code readability
or due to our being lazy. For instance, we are used to importing
the **numpy** package under the `np` alias:

```
import numpy as np
```

And now, instead of writing, for example,
**numpy.random.rand**`()`, we can call:

```
np.random.rand() # a pseudorandom value in [0.0, 1.0)
## 0.6964691855978616
```
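Aliases are not specific to **numpy**. As a small sketch, the standard **math** module can be imported under a shorter name in exactly the same way; the alias `m` below is our arbitrary choice, not an established convention.

```python
import math as m  # `m` is an alias chosen for this example only

print(m.sqrt(16.0))    # the same function as math.sqrt
## 4.0
print(m.floor(-7.33))
## -8
```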

### 2.2.3. Slots and Methods

Python is an object-oriented programming language.
Each object is an instance of some *class* whose name we can reveal
by calling the **type** function:

```
x = 1+2j
type(x)
## <class 'complex'>
```

Important

Classes define the following kinds of *attributes*:

- *slots* – associated data,
- *methods* – associated functions.

Call **help**`("complex")` to reveal that
the `complex` class features, amongst others,
the **conjugate** method and the `real` and `imag` slots.

Here is how we can read the two slots:

```
print(x.real) # access slot `real` of object `x` of class `complex`
## 1.0
print(x.imag)
## 2.0
```

And here is an example of a method call:

```
x.conjugate() # equivalently: complex.conjugate(x)
## (1-2j)
```

Notably, the documentation of this function can be
accessed by typing **help**`("complex.conjugate")`
*(class name – dot – method name)*.
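To recap the distinction on another complex number (the name `z` is ours): slots are read without round brackets, whereas methods must be called.

```python
z = 3+4j
print(type(z))
## <class 'complex'>
print(z.real, z.imag)   # slots: plain data, no round brackets
## 3.0 4.0
print(z.conjugate())    # a method: must be called
## (3-4j)
print(z.conjugate() == complex.conjugate(z))  # two equivalent call forms
## True
```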

## 2.3. Controlling Program Flow

### 2.3.1. Relational and Logical Operators

We have several operators which return a single logical value:

```
1 == 1.0 # is equal to?
## True
2 != 3 # is not equal to?
## True
"spam" < "egg" # is less than? (with respect to the lexicographic order)
## False
```

Some more examples:

```
math.sin(math.pi) == 0.0 # well, numeric error...
## False
abs(math.sin(math.pi)) <= 1e-9 # is close to 0?
## True
```
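Instead of hand-picking a threshold each time, we may also rely on **math.isclose**, a standard-library function that tests approximate equality. The sketch below assumes the default relative tolerance is acceptable; note that comparisons against zero need an explicit absolute tolerance (`abs_tol`), because a relative tolerance is meaningless there.

```python
import math

print(0.1 * 3 == 0.3)               # exact comparison fails due to round-off
## False
print(math.isclose(0.1 * 3, 0.3))   # approximate comparison
## True
print(math.isclose(math.sin(math.pi), 0.0, abs_tol=1e-9))
## True
```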

Logical results might be combined using **and**
(conjunction; for testing if both operands are true) and
**or** (alternative; for determining whether at least one operand
is true). Likewise, **not** (negation) is available too.

```
3 <= math.pi and math.pi <= 4
## True
not (1 > 2 and 2 < 3) and not 100 <= 3
## True
```

Notice that `not 100 <= 3` is equivalent to `100 > 3`.
Also, by De Morgan’s laws, `not (1 > 2 and 2 < 3)` is true
if and only if `1 <= 2 or 2 >= 3` holds.
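We can convince ourselves that one of De Morgan’s laws holds for all combinations of logical values by printing a small truth table. The sketch relies on the **for** loop, which is introduced only later in the book; the names `lhs`, `rhs`, and `all_agree` are ours.

```python
all_agree = True
for p in (False, True):
    for q in (False, True):
        lhs = not (p and q)        # negated conjunction
        rhs = (not p) or (not q)   # alternative of negations
        all_agree = all_agree and (lhs == rhs)
        print(p, q, lhs, rhs)
## False False True True
## False True True True
## True False True True
## True True False False
print(all_agree)
## True
```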

Assuming that `p`, `q`, `r` are logical
and `a`, `b`, `c`, `d` are float-type variables,
simplify the following expressions:

- `not not p`,
- `not p and not q`,
- `not (not p or not q or not r)`,
- `not a == b`,
- `not (b > a and b < c)`,
- `not (a>=b and b>=c and a>=c)`,
- `(a>b and a<c) or (a<c and a>d)`.

### 2.3.2. The **if** Statement

The **if** statement allows us to execute a chunk of code
conditionally, based on whether the provided expression is true or not.

For instance, given some variable:

```
x = np.random.rand() # a pseudorandom value in [0.0, 1.0)
```

we can react enthusiastically to its being less than 0.5 (note the colon after the tested condition):

```
if x < 0.5: print("spam!")
```

which did not happen, because `x` is equal to:

```
print(x)
## 0.6964691855978616
```

Multiple **elif** (*else-if*) parts can also be added,
followed by an optional **else** part, which is executed
if none of the tested conditions is true.

```
if x < 0.25: print("spam!")
elif x < 0.5: print("ham!") # i.e., x in [0.25, 0.5)
elif x < 0.75: print("bacon!") # i.e., x in [0.5, 0.75)
else: print("eggs!") # i.e., x >= 0.75
## bacon!
```
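Note that the conditions are tested in order and only the first matching branch is executed. A sketch with a fixed (non-random) value, chosen so that the outcome is reproducible:

```python
x = 0.1  # a fixed value satisfying both x < 0.25 and x < 0.5
if x < 0.25:
    message = "spam!"
elif x < 0.5:   # also true for x = 0.1, but never reached
    message = "ham!"
else:
    message = "eggs!"
print(message)
## spam!
```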

If more than one statement is to be executed conditionally, an indented code block can be introduced.

```
if x >= 0.25 and x <= 0.75:
    print("spam!")
    print("I love it!")
else:
    print("I'd rather eat spam!")
print("more spam!") # executed regardless of the condition's state
## spam!
## I love it!
## more spam!
```

Important

The indentation must be neat and consistent. We recommend using four spaces. The reader is encouraged to try to execute the following code chunk and note what kind of error is generated:

```
if x < 0.5:
    print("spam!")
   print("ham!") # :(
```
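If we prefer to inspect the error without interrupting our session, one option is to pass inconsistently indented code as a string to the built-in **compile** function and catch the resulting exception. This is only a sketch; the names `source` and `error_name` are ours.

```python
source = (
    "if x < 0.5:\n"
    "    print('spam!')\n"
    "   print('ham!')\n"   # three spaces instead of four
)
try:
    compile(source, "<example>", "exec")  # parse only; nothing is executed
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__

print(error_name)
## IndentationError
```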

For a given BMI, print out the corresponding category as defined by the WHO (underweight if below 18.5, normal range up to 25.0, etc.). Let us bear in mind that the BMI is a simplistic measure. Both the medical and statistical communities point out its inherent limitations. Read the Wikipedia article thereon for more details (and appreciate the amount of data wrangling required for its preparation – tables, charts, calculations; something that we will be able to do quite soon, given good reference data, of course).

(*) Check if it is easy to find on the internet (in reliable sources) some raw data sets related to the body mass studies, e.g., measuring subjects’ height, weight, body fat and muscle percentage, etc.

### 2.3.3. The **while** Loop

The **while** loop executes a given statement or a series of
statements as long as a given condition is true.

For example, here is a simple simulator determining how long we have to wait until drawing the first number not greater than 0.01 whilst generating numbers in the unit interval:

```
count = 0
while np.random.rand() > 0.01:
    count = count + 1
print(count)
## 117
```
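The outcome of such a simulation varies from run to run. For reproducibility, we can fix the state of the pseudorandom number generator first. The sketch below assumes the standard-library **random** module in place of **numpy** (so that it runs without any third-party packages); the seed value is arbitrary.

```python
import random

random.seed(123)  # an arbitrary seed, fixed only for reproducibility
count = 0
while random.random() > 0.01:  # a pseudorandom value in [0.0, 1.0)
    count += 1                 # one more draw greater than 0.01
print(count)  # the number of draws before the first value <= 0.01
```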

Using the **while** loop,
determine the arithmetic mean of 10 randomly generated numbers
(i.e., the sum of the numbers divided by 10).

## 2.4. Defining Own Functions

We can also define our own functions as a means for code reuse.
For instance, below is one that computes the minimum
(with respect to the `<` relation) of three given
objects:

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs.
    By the way, this is a docstring (documentation string);
    call help(min3) later.
    """
    if a < b:
        if a < c:
            return a
        else:
            return c
    else:
        if b < c:
            return b
        else:
            return c
```

Example calls:

```
print(min3(10, 20, 30),
      min3(10, 30, 20),
      min3(20, 10, 30),
      min3(20, 30, 10),
      min3(30, 10, 20),
      min3(30, 20, 10))
## 10 10 10 10 10 10
```
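Rather than listing all the orderings by hand, we can generate them automatically and compare our function against the built-in **min**. This is only a sketch: it repeats the definition of **min3** so that it is self-contained, and it relies on **itertools.permutations** and the **for** loop, which are not covered in this chapter.

```python
import itertools

def min3(a, b, c):
    """Return the minimum of three inputs (same logic as in the text)."""
    if a < b:
        if a < c:
            return a
        else:
            return c
    else:
        if b < c:
            return b
        else:
            return c

agree = True
for a, b, c in itertools.permutations([10, 20, 30]):  # all six orderings
    agree = agree and (min3(a, b, c) == min(a, b, c))
print(agree)
## True
```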

Note that the function *returns* a value. The result can be
fetched and used in further computations:

```
x = min3(np.random.rand(), 0.5, np.random.rand()) # minimum of 3 numbers
x = round(x, 3) # do something with the result
print(x)
## 0.5
```

Write a function named **bmi** which computes and returns a
person’s BMI, given their weight (in kilograms) and height (in centimetres).
As documenting functions constitutes a good development practice,
do not forget about including a docstring.

We can also introduce new variables inside a function’s body. This can help the function perform what it has been designed to do.

```
def min3(a, b, c):
    """
    A function to determine the minimum of three given inputs
    (alternative version).
    """
    m = a # a local (temporary/auxiliary) variable
    if b < m:
        m = b
    if c < m: # be careful! no `else` or `elif` here; it's a separate `if`
        m = c
    return m
```

Example call:

```
m = 7
n = 10
o = 3
min3(m, n, o)
## 3
```

All *local variables* cease to exist once the function call has finished.
Notice that `m` inside the function is a variable independent
of `m` in the global (calling) scope.

```
print(m) # this is still the global `m` from before the call
## 7
```
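Similarly, a local variable whose name does not occur in the global scope is simply unavailable after the call. A minimal sketch, with the hypothetical function **double** and variable names of our own choosing:

```python
def double(value):
    doubled_inside = 2 * value  # exists only while the function runs
    return doubled_inside

print(double(21))
## 42

try:
    doubled_inside              # not defined in the global scope
    is_visible = True
except NameError:
    is_visible = False
print(is_visible)
## False
```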

Write a function **max3** which determines
the maximum of three given values.

Write a function **med3** which determines the median of three
given values (the one value that is in-between the other ones).

(*) Write a function **min4** to compute the
minimum of four values.

Note

*Lambda expressions* give us an uncomplicated way to define
functions using a single line of code.
Their syntax is: `lambda argument_name: return_value`.

```
square = lambda x: x**2 # i.e., def square(x): return x**2
square(4)
## 16
```

Objects generated through lambda expressions do not have to be assigned a name – they can be anonymous. This is useful when calling methods that take other functions as their arguments. With lambdas, the latter can be generated on the fly.

```
def print_x_and_fx(x, f):
    """
    Arguments: x - some object; f - a function to be called on x
    """
    print(f"x = {x} and f(x) = {f(x)}")

print_x_and_fx(4, lambda x: x**2)
## x = 4 and f(x) = 16
print_x_and_fx(math.pi/4, lambda x: round(math.cos(x), 5))
## x = 0.7853981633974483 and f(x) = 0.70711
```
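Some built-in functions accept other functions as arguments, too. For example, **max** and **min** feature the `key` parameter, which specifies what the compared values should be transformed to before the comparison takes place. A short sketch (the variable names are ours):

```python
# the longest string, i.e., the one maximising len(s):
longest = max("spam", "bacon", "eggs", key=lambda s: len(s))
print(longest)
## bacon

# the number closest to zero, i.e., the one minimising abs(v):
closest = min(-3, 2, 1, key=lambda v: abs(v))
print(closest)
## 1
```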

## 2.5. Exercises

What does `import xxxxxx as x` mean?

What is the difference between **if** and **while**?

Name the scalar types we introduced in this chapter.

What is a docstring and how can we create and access it?

What are keyword arguments?