Minimalist Data Wrangling with Python
Minimalist Data Wrangling with Python by Marek Gagolewski is envisaged as a student’s first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results.
Although available online, this is a whole course. It should be read from the beginning to the end. In particular, refer to the Preface for general introductory remarks.
For many students around the world, educational resources are hardly affordable. Therefore, I have decided that this book should remain an independent, non-profit, open-access project (available both in PDF and HTML forms). Whilst, for some people, the presence of a “designer tag” from a major publisher might still be a proxy for quality, it is my hope that this publication will prove useful to those who seek knowledge for knowledge’s sake.
Please spread the news about it by sharing the above URLs with your mates, peers, or students. Thank you.
Also, check out my other book, Deep R Programming.
Any bug/typo reports/fixes are appreciated: please submit them via this project’s GitHub repository.
Consider citing this book as: Gagolewski M. (2023), Minimalist Data Wrangling with Python, Zenodo, Melbourne, DOI: 10.5281/zenodo.6451068, ISBN: 978-0-6455719-1-2, URL: https://datawranglingpy.gagolewski.com/.
A printed version can be ordered from Amazon: AU CA DE ES FR IT JP NL PL SE UK US. Note that I get 0% revenue from sales (price = cost of printing + Amazon fee). Let me know if you find a vendor who can deliver the book to some geographic regions more cheaply.
Copyright (C) 2022–2023 by Marek Gagolewski. Some rights reserved.
This material is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
- 1. Getting started with Python
- 2. Scalar types and control structures in Python
- 3. Sequential and other types in Python
- 4. Unidimensional numeric data and their empirical distribution
- 5. Processing unidimensional data
- 6. Continuous probability distributions
- 7. Multidimensional numeric data at a glance
- 8. Processing multidimensional data
- 9. Exploring relationships between variables
- 10. Introducing data frames
- 11. Handling categorical data
- 12. Processing data in groups
- 13. Accessing databases
- 14. Text data
- 15. Missing, censored, and questionable data
- 16. Time series