Minimalist Data Wrangling with Python
Minimalist Data Wrangling with Python is envisaged as a student’s first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results.
For many students around the world, educational resources are hardly affordable. Therefore, I have decided that this book should remain an independent, non-profit, open-access project (available both in PDF and HTML forms). Whilst, for some people, the presence of a “designer tag” from a major publisher might still be a proxy for quality, it is my hope that this publication will prove useful to those who seek knowledge for knowledge’s sake.
Please spread the word about it by sharing its URL (https://datawranglingpy.gagolewski.com/) with your mates/peers/students. Thanks.
Also, consider citing this book as: Gagolewski M. (2022), Minimalist Data Wrangling with Python, Zenodo, Melbourne, DOI: 10.5281/zenodo.6451068, ISBN: 978-0-6455719-1-2, URL: https://datawranglingpy.gagolewski.com/.
A printed version (the same as the aforementioned PDF one) can be ordered from Amazon: AU, CA, DE, ES, FR, IT, JP, NL, PL, SE, UK, US. Note that I get 0% revenue from sales (price = cost of printing + Amazon fee). If you know a vendor who can deliver the book to some geographic regions more cheaply, let me know.
Any bug/typo reports/fixes are appreciated. Although available online, this is a whole course and should be read from beginning to end. In particular, refer to the Preface for general introductory remarks.
Copyright (C) 2022 by Marek Gagolewski. Some rights reserved.
This material is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
- 1. Getting Started with Python
- 2. Scalar Types and Control Structures in Python
- 3. Sequential and Other Types in Python
- 4. Unidimensional Numeric Data and Their Empirical Distribution
- 5. Processing Unidimensional Data
- 6. Continuous Probability Distributions
- 7. Multidimensional Numeric Data at a Glance
- 8. Processing Multidimensional Data
- 9. Exploring Relationships Between Variables
- 10. Introducing Data Frames
- 11. Handling Categorical Data
- 12. Processing Data in Groups
- 13. Accessing Databases
- 14. Text Data
- 15. Missing, Censored, and Questionable Data
- 16. Time Series