variable | N | Mean | SD | VAR | Q1 | Median | Q3 | Min | Max |
---|---|---|---|---|---|---|---|---|---|
age | 1046 | 29.898 | 14.415 | 207.791 | 21.000 | 28.000 | 39.000 | 0 | 80.000 |
fare | 1308 | 33.295 | 51.759 | 2678.960 | 7.896 | 14.454 | 31.275 | 0 | 512.329 |
8 Summarizing numerical data
8.1 learning objectives
At the end of this chapter, you will know how to:
Calculate measures of central tendency
Calculate measures of dispersion
- Use these measures to summarize a numerical data.
8.2 Descriptive statistics
While summarizing categorical data is done with frequency and relative frequency tables, numerical data can be summarized using descriptive statistics divided into three groups:
- Measures of central tendency.
- Measures of dispersion.
- Measures of skewness.
In this chapter, we will focus on only the first two: measures of central tendency and measures of dispersion.
8.2.1 Measures of central tendency
Measures of central tendency help you summarize data by looking at the central point within it. While the mode (the value(s) with the highest frequency) is also considered a measure of central tendency, in this chapter we will only consider two: the average (or mean), and the median.
Statistic | description | formula | Excel function |
---|---|---|---|
Average or Mean | The sum of values divided by the number of observations | \[ \overline{X} = \frac{\sum{X}}{n} \] | =AVERAGE(x:x) |
Median | the middle value of a set of number once sorted in ascending or descending order | If n is odd: \[ M_x = x_\frac{n + 1}{2} \] If n is even: \[ M_x = \frac{x_{(n/2)} + x_{(n/2)+1}}{2} \] |
|
Table 8.1. Measures of central tendency
8.2.1.1 Demo
In the following video, I explain the how to manually calculate the average and median in a set of numerical data.
8.2.2 Measures of dispersion
Statistic | Definition | Formula | Excel formula |
---|---|---|---|
Variance (Var) | Expected squared deviation from the mean. Measures how far numbers spread around the average | \[ Var = \frac{\sum{(x_i-\overline{x})^2}}{N} \] | =VAR(x:x) |
Standard deviation (SD) | Square root of the Variance. | \[ SD = \sqrt{\frac{\sum{(x_i-\overline{x})^2}}{N}} \] | =STDEV(x) |
Range | Difference between the minimum and maximum values in the data. | N/A | Minimum value: Maximum value: Range: |
Interquartile range (IQR) | The difference between the first quartile (Q1) and the third quartile (Q3). It measures the spread of the middle 50% of the data. | N/A | Q1: Q3: IQR: |
Table 8.2. Measures of dispersion
8.2.2.1 Demo
In the following video, I explain the how to manually calculate the differente measures of dispersion in a set of numerical data.
8.3 Creating a descriptive statistics summary
Using the measures of central tendency and dispersion listed above, we can construct a descriptive statistics summary for any number of numerical variables in a dataset. Note that these summaries don’t typically display the interquartile range (IQR) and the range, but instead provide the actual values require to calculate them (i.e., Min, Q1, Q3, Max).
8.3.1 Demo
In the following video, I show you how to create descriptive statistics summary in Excel.
8.4 Conclusion
That’s it. Now you know how to summarize a numerical variable using measures of central tendency and dispersion!