Introduction to Statistics

An introductory statistical data science book for STEM students.
Author

Dr. Cheng-Han Yu

Published

October 15, 2024

Welcome

This is the website for my introductory statistics book. The book currently serves as the main reference for my MATH 4720 Statistical Methods and MATH 4740 Biostatistical Methods classes at Marquette University.[1] Some topics can also be covered in introductory data science, regression, or other applied statistics courses. You’ll learn basic probability and statistical concepts as well as data analysis techniques such as linear regression using R[2] and Python.

The book balances the following aspects of statistics:

  • mathematical formulation and statistical computation
  • distribution-based and simulation-based inferences
  • methodology and applications
  • classical/frequentist and Bayesian approaches

The materials in this book are adapted and modified from several existing statistics books, course websites, notes, and tutorials to provide the best teaching and learning experience at Marquette. The main reference books are:

License

This website is (and will always be) free to use, and is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License. If you’d like to give back, please consider reporting a typo or opening a pull request at github.com/chenghanyustats/introstatsbook.


  1. For historical reasons, the two courses are numbered as senior-level courses. However, both are introductory courses taken mainly by undergraduate students in STEM fields.

  2. This book uses base R (the default R syntax) rather than the tidyverse, the meta-package ecosystem I teach in my MATH/COSC 3570 Introduction to Data Science course. While the tidyverse offers a unified approach to data science and is becoming increasingly popular, I was somewhat persuaded by Prof. Matloff’s TidyverseSkeptic post and believe there is value in learning and teaching base R in an introductory statistics course. To be clear, both base R and the tidyverse are excellent tools (I use both in my research and teaching). Being proficient in both approaches can only enhance your data science skills.

  3. This book has served as the official textbook for MATH 4720 for years. However, I feel that it is more suitable for senior undergraduate or graduate students.