Any bug/typo reports and fixes are appreciated.
Note that the most up-to-date version of this book can be found at https://datawranglingpy.gagolewski.com/.
Below is the list of the most noteworthy changes.
under development (v188.8.131.52xxx):
New HTML theme (featuring light and dark mode).
Seaborn is no longer used where it can easily be replaced by 1–3 calls to the lower-level matplotlib, especially in the numpy chapters.
Use numpy.genfromtxt more eagerly.
A few more examples of using f-strings for pretty-printing of results.
(…) to do (…) work in progress (…)
Numeric reference style; updated bibliography.
Reduced the file size of the screen-optimised PDF at the cost of slightly lower quality of some figures.
The print-optimised PDF now rasterises only selected parts of figures rather than whole pages containing them. This should considerably improve the quality of the printed version.
Minor extensions, including: pandas.Series.dt.strftime, more details on how to avoid pitfalls in data frame indexing, etc.
The first printed (paperback) version can be ordered from Amazon.
Fixed page margin and header sizes.
Minor typesetting and other fixes.
ISBN 978-0-6455719-1-2 assigned.
Handling tied observations.
Plots look better when printed in black and white.
Other minor extensions and material reordering: more aggregation functions, pandas.unique, pandas.factorize, probability vectors representing binary categorical variables, etc.
The Kolmogorov–Smirnov Test (one- and two-sample).
The Pearson Chi-Squared Test (one- and two-sample, and for independence).
Dealing with round-off and measurement errors.
Adding white noise (jitter).
Matrices are iterable.
Matrix multiplication, dot products.
Euclidean distance, few-nearest-neighbour and fixed-radius search.
Aggregation of multidimensional data.
Regression with k-nearest neighbours.
Least squares fitting of linear regression models.
Geometric transforms; orthonormal matrices.
SVD and dimensionality reduction/PCA.
Classification with k-nearest neighbours.
Clustering with k-means.
Text Processing and Regular Expression chapters merged.
Unidimensional Data Aggregation and Transformation chapters merged.
pandas.GroupBy objects are iterable.
Argument unpacking and variadic arguments.
More lightweight mathematical notation.
Some equalities related to the mathematical functions we rely on (the natural logarithm, cosine, etc.).
A way to compute the most correlated pair of variables.
A note on modifying elements in an array and on adding new rows and columns.
An example seasonal plot in the time series chapter.
Solutions to the SQL exercises added; to ignore small round-off errors, use pandas.testing.assert_frame_equal instead of pandas.DataFrame.equals.
More details on file paths.
Many chapters merged or relocated.
Added captions to all figures.
Improved formatting of elements (information boxes such as note, important, exercise, example).
First public release – most chapters are drafted, more or less.
Using Sphinx for building.
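One of the entries above recommends pandas.testing.assert_frame_equal over pandas.DataFrame.equals when small round-off errors should be ignored. Below is a minimal sketch of the difference. It assumes a recent pandas version, in which floating-point columns are compared approximately by default (the tolerances quoted in the comments are the documented defaults since pandas 1.1).

```python
import pandas as pd

# Two data frames that differ only by a tiny floating-point round-off error:
# 0.1 + 0.2 evaluates to 0.30000000000000004, not exactly 0.3
a = pd.DataFrame({"x": [0.1 + 0.2, 1.0]})
b = pd.DataFrame({"x": [0.3, 1.0]})

# DataFrame.equals tests for exact equality, so the round-off error matters
print(a.equals(b))  # False

# assert_frame_equal compares float columns approximately by default
# (check_exact=False, rtol=1e-5, atol=1e-8 in recent pandas versions);
# it raises an AssertionError only if the frames differ beyond the tolerance
pd.testing.assert_frame_equal(a, b)  # passes silently
```

In unit tests or exercise solutions, assert_frame_equal also reports which column and which values differ, which is more informative than the single Boolean returned by equals.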