8  Summarizing numerical data

8.1 learning objectives

At the end of this chapter, you will know how to:

  • Calculate measures of central tendency

  • Calculate measures of dispersion

  • Use these measures to summarize a numerical data.

8.2 Descriptive statistics

While summarizing categorical data is done with frequency and relative frequency tables, numerical data can be summarized using descriptive statistics divided into three groups:

  1. Measures of central tendency.
  2. Measures of dispersion.
  3. Measures of skewness.

In this chapter, we will focus on only the first two: measures of central tendency and measures of dispersion.

8.2.1 Measures of central tendency

Measures of central tendency help you summarize data by looking at the central point within it. While the mode (the value(s) with the highest frequency) is also considered a measure of central tendency, in this chapter we will only consider two: the average (or mean), and the median.

Statistic description formula Excel function
Average or Mean The sum of values divided by the number of observations \[ \overline{X} = \frac{\sum{X}}{n} \] =AVERAGE(x:x)
Median the middle value of a set of number once sorted in ascending or descending order

If n is odd:

\[ M_x = x_\frac{n + 1}{2} \]

If n is even:

\[ M_x = \frac{x_{(n/2)} + x_{(n/2)+1}}{2} \]

=MEDIAN(x:x)

=PERCENTILE(x:x,0.50)

=QUARTILE(x:x,2)

Table 8.1. Measures of central tendency

8.2.1.1 Demo

In the following video, I explain the how to manually calculate the average and median in a set of numerical data.

8.2.2 Measures of dispersion

Statistic Definition Formula Excel formula
Variance (Var) Expected squared deviation from the mean. Measures how far numbers spread around the average \[ Var = \frac{\sum{(x_i-\overline{x})^2}}{N} \] =VAR(x:x)
Standard deviation (SD) Square root of the Variance. \[ SD = \sqrt{\frac{\sum{(x_i-\overline{x})^2}}{N}} \] =STDEV(x)
Range Difference between the minimum and maximum values in the data. N/A

Minimum value: =MIN(x:x)

Maximum value: =MAX(x:x)

Range: =MAX(x:x)-MIN(x:x)

Interquartile range (IQR) The difference between the first quartile (Q1) and the third quartile (Q3). It measures the spread of the middle 50% of the data. N/A

Q1: =QUARTILE(x:x,1)

Q3: =QUARTILE(x:x,3)

IQR: =Q3-Q1

Table 8.2. Measures of dispersion

8.2.2.1 Demo

In the following video, I explain the how to manually calculate the differente measures of dispersion in a set of numerical data.

8.3 Creating a descriptive statistics summary

Using the measures of central tendency and dispersion listed above, we can construct a descriptive statistics summary for any number of numerical variables in a dataset. Note that these summaries don’t typically display the interquartile range (IQR) and the range, but instead provide the actual values require to calculate them (i.e., Min, Q1, Q3, Max).

Table 8.3. Descriptive statistics summary of age and fare in the Titanic dataset.
variable N Mean SD VAR Q1 Median Q3 Min Max
age 1046 29.898 14.415 207.791 21.000 28.000 39.000 0 80.000
fare 1308 33.295 51.759 2678.960 7.896 14.454 31.275 0 512.329

8.3.1 Demo

In the following video, I show you how to create descriptive statistics summary in Excel.

8.4 Conclusion

That’s it. Now you know how to summarize a numerical variable using measures of central tendency and dispersion!