Statistics and Data

What is statistics? What is data science? What are data? I bet you’ve heard about data science quite often these days. In fact, data science is a fairly new buzzword that was not nearly as popular ten years ago. Even now, I don’t think there is a formal and clear definition of data science. In my opinion, data science is such a broad subject that any technique related to data could be viewed as part of it, and this book is by no means a comprehensive data science book covering every aspect of dealing with data.

While data science is hot and fancy, statistics is a dull old word that has been used for centuries. Believe it or not, by the 18th century the term statistics was already being used to describe the systematic collection of demographic and economic data by states [1]. In mathematics, statistics, or more formally statistical inference [2], is the process of using data analysis to infer properties of a population. Without a doubt, statistics plays an important role in many aspects of working with data, no matter how data science is defined or how it evolves.

In this part, Statistics and Data, we define statistics and data, and learn about the computing software we will be using for the rest of the book. When I was a college student, I learned statistics with pen and paper, doing all the calculations by hand or with a calculator. We won’t do that anymore, and you shouldn’t either, because nearly every company or institution does its statistical analysis using computer software, whatever that software may be.


  1. https://en.wikipedia.org/wiki/History_of_statistics

  2. https://en.wikipedia.org/wiki/Statistical_inference