Changelog
Important
Any bug/typo reports and fixes are appreciated.
Note that the most up-to-date version of this book can be found at https://datawranglingpy.gagolewski.com/.
Below is the list of the most noteworthy changes.
under development (v1.0.3.9xxx):
New HTML theme (featuring light and dark mode).
Avoid seaborn where it can easily be replaced by one to three calls to the lower-level matplotlib, especially in the numpy chapters.
Use numpy.genfromtxt more eagerly.
A few more examples of using f-strings to pretty-print results.
(…) to do (…) work in progress (…)
2023-02-06 (v1.0.3):
Numeric reference style; updated bibliography.
Reduced the file size of the screen-optimised PDF at the cost of a slight decrease in the quality of some figures.
The print-optimised PDF now rasterises only selected parts of figures rather than the whole pages containing them. This should considerably improve the quality of the printed version.
Bug fixes.
Minor extensions, including pandas.Series.dt.strftime, more details on how to avoid pitfalls in data frame indexing, etc.
2022-08-24 (v1.0.2):
The first printed (paperback) version can be ordered from Amazon.
Fixed page margin and header sizes.
Minor typesetting and other fixes.
2022-08-12 (v1.0.1):
Cover.
ISBN 978-0-6455719-1-2 assigned.
2022-07-16 (v1.0.0):
Preface complete.
Handling tied observations.
Plots look better when printed in black and white.
Exception handling.
File connections.
Other minor extensions and material reordering: more aggregation functions, pandas.unique, pandas.factorize, probability vectors representing binary categorical variables, etc.
Final proof-reading.
2022-06-13 (v0.5.1):
The Kolmogorov–Smirnov Test (one- and two-sample).
The Pearson Chi-Squared Test (one- and two-sample, and for independence).
Dealing with round-off and measurement errors.
Adding white noise (jitter).
Lambda expressions.
Matrices are iterable.
2022-05-31 (v0.4.1):
The Rules.
Matrix multiplication, dot products.
Euclidean distance, few-nearest-neighbour and fixed-radius search.
Aggregation of multidimensional data.
Regression with k-nearest neighbours.
Least squares fitting of linear regression models.
Geometric transforms; orthonormal matrices.
SVD and dimensionality reduction/PCA.
Classification with k-nearest neighbours.
Clustering with k-means.
Text Processing and Regular Expression chapters merged.
Unidimensional Data Aggregation and Transformation chapters merged.
pandas.GroupBy objects are iterable.
Semitransparent histograms.
Contour plots.
Argument unpacking and variadic arguments (*args, **kwargs).
2022-05-23 (v0.3.1):
More lightweight mathematical notation.
Some equalities related to the mathematical functions we rely on (the natural logarithm, cosine, etc.).
A way to compute the most correlated pair of variables.
A note on modifying elements in an array and on adding new rows and columns.
An example seasonal plot in the time series chapter.
Solutions to the SQL exercises added; to ignore small round-off errors, use pandas.testing.assert_frame_equal instead of pandas.DataFrame.equals.
More details on file paths.
2022-04-12 (v0.2.1):
Many chapters merged or relocated.
Added captions to all figures.
Improved formatting of information boxes (note, important, exercise, example).
2022-03-27 (v0.1.1):
First public release – most chapters are drafted, more or less.
Using Sphinx to build the book.
2022-01-05 (v0.0.0):
Project started.