Changelog

Important

Any bug/typo reports/fixes are appreciated. The most up-to-date version of this book can be found at https://datawranglingpy.gagolewski.com/.

Below is the list of the most noteworthy changes.

  • under development (v1.0.3.9xxx):

    • New HTML theme (includes the light and dark mode).

    • Not using seaborn where it can easily be replaced by 1–3 calls to the lower-level matplotlib, especially in the numpy chapters. This way, we can learn how to create some popular charts from scratch. In particular, we are now using our own functions to display a heat map and a pairs plot.

    • Use numpy.genfromtxt more eagerly.

    • A few more examples of using f-strings for results’ pretty-printing.

    • Bug fixes and loads of other minor extensions.

    • (…) to do (…) work in progress (…) more to come (…)

  • 2023-02-06 (v1.0.3):

    • Numeric reference style; updated bibliography.

    • Reduced the file size of the screen-optimised PDF at the cost of a slight decrease in the quality of some figures.

    • The print-optimised PDF now uses selective rasterisation of parts of figures, not whole pages containing them. This should increase the quality of the printed version of this book.

    • Bug fixes.

    • Minor extensions, including: pandas.Series.dt.strftime, more details on how to avoid pitfalls in data frame indexing, etc.

  • 2022-08-24 (v1.0.2):

    • The first printed (paperback) version can be ordered from Amazon.

    • Fixed page margin and header sizes.

    • Minor typesetting and other fixes.

  • 2022-08-12 (v1.0.1):

    • Cover.

    • ISBN 978-0-6455719-1-2 assigned.

  • 2022-07-16 (v1.0.0):

    • Preface complete.

    • Handling tied observations.

    • Plots now look better when printed in black and white.

    • Exception handling.

    • File connections.

    • Other minor extensions and material reordering: more aggregation functions, pandas.unique, pandas.factorize, probability vectors representing binary categorical variables, etc.

    • Final proofreading and copyediting.

  • 2022-06-13 (v0.5.1):

    • The Kolmogorov–Smirnov Test (one- and two-sample).

    • The Pearson Chi-Squared Test (one- and two-sample, and for independence).

    • Dealing with round-off and measurement errors.

    • Adding white noise (jitter).

    • Lambda expressions.

    • Matrices are iterable.

  • 2022-05-31 (v0.4.1):

    • The Rules.

    • Matrix multiplication, dot products.

    • Euclidean distance, few-nearest-neighbour and fixed-radius search.

    • Aggregation of multidimensional data.

    • Regression with k-nearest neighbours.

    • Least squares fitting of linear regression models.

    • Geometric transforms; orthonormal matrices.

    • SVD and dimensionality reduction/PCA.

    • Classification with k-nearest neighbours.

    • Clustering with k-means.

    • Text Processing and Regular Expression chapters merged.

    • Unidimensional Data Aggregation and Transformation chapters merged.

    • pandas.GroupBy objects are iterable.

    • Semitransparent histograms.

    • Contour plots.

    • Argument unpacking and variadic arguments (*args, **kwargs).

  • 2022-05-23 (v0.3.1):

    • More lightweight mathematical notation.

    • Some equalities related to the mathematical functions we rely on (the natural logarithm, cosine, etc.).

    • A way to compute the most correlated pair of variables.

    • A note on modifying elements in an array and on adding new rows and columns.

    • An example seasonal plot in the time series chapter.

    • Solutions to the SQL exercises added; to ignore small round-off errors, use pandas.testing.assert_frame_equal instead of pandas.DataFrame.equals.

    • More details on file paths.

  • 2022-04-12 (v0.2.1):

    • Many chapters merged or relocated.

    • Added captions to all figures.

    • Improved formatting of elements (information boxes such as note, important, exercise, example).

  • 2022-03-27 (v0.1.1):

    • The first public release: most chapters are drafted, more or less.

    • Using Sphinx for building.

  • 2022-01-05 (v0.0.0):

    • Project started.
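
As an illustration of the v0.3.1 note on preferring pandas.testing.assert_frame_equal over pandas.DataFrame.equals, here is a minimal sketch. The data frames a and b are made up for this example and do not come from the book's SQL exercises; the default tolerances of assert_frame_equal are assumed.

```python
import pandas as pd

# Two hypothetical data frames whose values differ only by round-off:
# 0.1 + 0.2 is not exactly 0.3 in floating-point arithmetic.
a = pd.DataFrame({"x": [0.1 + 0.2, 1.0]})
b = pd.DataFrame({"x": [0.3, 1.0]})

# DataFrame.equals requires exact equality of the stored values.
exact = a.equals(b)

# assert_frame_equal compares floats up to a tolerance by default
# and raises an AssertionError when the frames really differ.
try:
    pd.testing.assert_frame_equal(a, b)
    approx = True
except AssertionError:
    approx = False

print(f"equals: {exact}, assert_frame_equal passed: {approx}")
```

With these inputs, the exact comparison fails while the tolerance-based one succeeds, which is why the latter is the safer choice for checking numerical results.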