Changelog
Important
Any bug/typo reports/fixes are appreciated. The most up-to-date version of this book can be found at https://datawranglingpy.gagolewski.com/.
Below is the list of the most noteworthy changes.
under development (v1.0.3.9xxx):
New HTML theme (includes the light and dark mode).
Not using seaborn where it can easily be replaced by 1–3 calls to the lower-level matplotlib, especially in the numpy chapters. This way, we can learn how to create some popular charts from scratch. In particular, we are now using our own functions to display a heat map and a pairs plot.
Use numpy.genfromtxt more eagerly.
A few more examples of using f-strings to pretty-print results.
Bug fixes and loads of other minor extensions.
(…) Updated to Python 3.x, numpy 2.x, pandas 2.x (…)
(…) to do (…) work in progress (…) more to come (…)
2023-02-06 (v1.0.3):
Numeric reference style; updated bibliography.
Reduced the file size of the screen-optimised PDF at the cost of a slight decrease in the quality of some figures.
The print-optimised PDF now uses selective rasterisation of parts of figures, not whole pages containing them. This should increase the quality of the printed version of this book.
Bug fixes.
Minor extensions, including: pandas.Series.dt.strftime, more details on how to avoid pitfalls in data frame indexing, etc.
2022-08-24 (v1.0.2):
The first printed (paperback) version can be ordered from Amazon.
Fixed page margin and header sizes.
Minor typesetting and other fixes.
2022-08-12 (v1.0.1):
Cover.
ISBN 978-0-6455719-1-2 assigned.
2022-07-16 (v1.0.0):
Preface complete.
Handling tied observations.
Plots now look better when printed in black and white.
Exception handling.
File connections.
Other minor extensions and material reordering: more aggregation functions, pandas.unique, pandas.factorize, probability vectors representing binary categorical variables, etc.
Final proofreading and copyediting.
2022-06-13 (v0.5.1):
The Kolmogorov–Smirnov Test (one and two sample).
The Pearson Chi-Squared Test (one and two sample and for independence).
Dealing with round-off and measurement errors.
Adding white noise (jitter).
Lambda expressions.
Matrices are iterable.
2022-05-31 (v0.4.1):
The Rules.
Matrix multiplication, dot products.
Euclidean distance, few-nearest-neighbour and fixed-radius search.
Aggregation of multidimensional data.
Regression with k-nearest neighbours.
Least squares fitting of linear regression models.
Geometric transforms; orthonormal matrices.
SVD and dimensionality reduction/PCA.
Classification with k-nearest neighbours.
Clustering with k-means.
Text Processing and Regular Expression chapters merged.
Unidimensional Data Aggregation and Transformation chapters merged.
pandas.GroupBy objects are iterable.
Semitransparent histograms.
Contour plots.
Argument unpacking and variadic arguments (*args, **kwargs).
2022-05-23 (v0.3.1):
More lightweight mathematical notation.
Some equalities related to the mathematical functions we rely on (the natural logarithm, cosine, etc.).
A way to compute the most correlated pair of variables.
A note on modifying elements in an array and on adding new rows and columns.
An example seasonal plot in the time series chapter.
Solutions to the SQL exercises added; to ignore small round-off errors, use pandas.testing.assert_frame_equal instead of pandas.DataFrame.equals.
More details on file paths.
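The v0.3.1 note on comparing data frames can be illustrated with a minimal sketch. The two frames below are hypothetical and are not taken from the book's SQL exercises; they differ only by floating-point round-off.

```python
import pandas as pd

# Hypothetical data: the second frame carries a round-off error,
# because 0.1 + 0.2 is not stored as exactly 0.3.
expected = pd.DataFrame({"x": [0.3, 1.0]})
computed = pd.DataFrame({"x": [0.1 + 0.2, 1.0]})

# Exact comparison: reports a mismatch because of the round-off error.
exactly_equal = expected.equals(computed)
print(exactly_equal)

# Tolerant comparison: raises AssertionError only if the frames differ
# beyond the default tolerances; otherwise it returns None silently.
pd.testing.assert_frame_equal(expected, computed)
```

Note that assert_frame_equal also checks column names, index values, and dtypes by default, so it is stricter than a purely numerical comparison in those respects.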
2022-04-12 (v0.2.1):
Many chapters merged or relocated.
Added captions to all figures.
Improved formatting of elements (information boxes such as note, important, exercise, example).
2022-03-27 (v0.1.1):
The first public release: most chapters are drafted, more or less.
Using Sphinx for building.
2022-01-05 (v0.0.0):
Project started.