Parameters of the normal distribution

Normal distributions are a family. The specific members of the family are identified by two parameters: the mean and the standard deviation.

Thomas Kinzler (thomas.kinzler@tccd.edu), Daniel Kaplan (https://dtkaplan.github.io)
2020-06-16


Orientation

The bell-shaped distribution is often a good description of the distribution of a variable, so often that it’s called the “normal” distribution.

Actually, there is not just one normal distribution but a whole family of them, all of which have the characteristic bell shape. To identify an individual member of this bell family, two numbers suffice:

• the mean: the location of the peak of the bell
• the standard deviation: the width of the bell
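
As a small illustration of how these two numbers pick out one member of the family, the sketch below uses NormalDist from Python's standard statistics module. The means and standard deviations (170 and 10, 160 and 5) are made-up values chosen only for illustration; they are not taken from the lesson's data.

```python
from statistics import NormalDist

# Two hypothetical members of the normal family (illustrative values only).
wide = NormalDist(mu=170, sigma=10)
narrow = NormalDist(mu=160, sigma=5)

# The density is highest at the mean: that is the location of the peak.
peak_wide = wide.pdf(170)
peak_narrow = narrow.pdf(160)

# A smaller standard deviation gives a narrower, and therefore taller, bell.
print(peak_wide < peak_narrow)  # True

# Whichever member you pick, about 68% of the distribution lies
# within one standard deviation of the mean.
frac_wide = wide.cdf(180) - wide.cdf(160)
frac_narrow = narrow.cdf(165) - narrow.cdf(155)
print(round(frac_wide, 3), round(frac_narrow, 3))  # 0.683 0.683
```

Changing mu slides the bell left or right without changing its shape; changing sigma stretches or squeezes it around the same peak.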

What’s potentially confusing here is that you may already know the mean as a calculation you do on a variable: add up the n values and divide by n. The standard deviation is likewise the result of a calculation on the values. Amazingly enough, the numbers you get from these calculations identify the bell-shaped distribution that best matches the distribution of the variable. And if the variable itself has a bell-like distribution, you’ll see that its distribution is often a close match to the normal curve.
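
The two calculations can be sketched in a few lines of Python. The list of heights is hypothetical (invented for this example, not drawn from NHANES2), and the standard deviation here uses the common n - 1 divisor for a sample, which is an assumption about which version of the calculation is meant.

```python
import math
from statistics import mean, stdev

# Hypothetical heights in cm (made-up values, not from NHANES2).
heights = [158.0, 162.5, 165.0, 170.0, 171.5, 176.0, 183.0]
n = len(heights)

# Mean: add up the n values and divide by n.
m = sum(heights) / n

# Standard deviation: square root of the sum of squared deviations
# from the mean, divided by n - 1 (the usual sample version).
sd = math.sqrt(sum((x - m) ** 2 for x in heights) / (n - 1))

# The standard library functions give the same answers.
print(math.isclose(m, mean(heights)), math.isclose(sd, stdev(heights)))
```

The pair (m, sd) is exactly what is needed to name one member of the normal family.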

One consequence of this is that people often can make a pretty good estimate of the mean and standard deviation of a variable just by looking at its distribution, say with a histogram or a violin plot.

This lesson is about learning to eyeball the mean and standard deviation from a display of the distribution. Reasons to do this include having a better understanding of the mean and standard deviation and, importantly, being able to double-check calculations in order to avoid blunders. Even with a computer, it’s easy to make mistakes like reading the wrong number from a table of calculations.

Activity

Open up the Density Little App (see footnote 1). In the Data tab in the top tool bar, set the Source Package to Little Apps and the Data set to NHANES2. Set the response variable to height_adults.

1. The graphic shows a traditional plot of the distribution of the response variable, called a density plot. If you’re familiar with a histogram, you might like to think about a density plot as a kind of smoothed histogram without the jagged, abrupt changes from bar to bar.

2. Click on Apps Control (the three parallel lines icon). Check the box next to Compare to a normal distribution with the same mean and variance to overlay a theoretical normal distribution. The distribution will be shown as a black, bell-shaped curve. The location and width of a normal distribution are described by the mean and standard deviation. In picking the particular normal distribution to overlay, the mean and standard deviation have been set to those of the response variable. Click on the Graph tab in the upper tool bar to see a larger graph.

• Compare the theoretical distribution to the actual distribution. You’ll see that the actual values of height occur less frequently near the center than they would for a theoretical normal distribution. The tails, both right and left, line up pretty well with the theoretical normal distribution.
• What ranges of height occur more frequently in the actual heights than in the theoretical distribution?
3. Several variables are listed below along with a description of the shape of the distribution.

• diastolic blood pressure is slightly left skew

• systolic blood pressure is slightly right skew

• age is flat and truncated to the left

• testosterone is bi-modal

• bmi_adults is right skew, truncated to the left

• Look at each of these variables and figure out what differences between the theoretical normal distribution and the actual data correspond to the various labels.

4. Look through the various data sets to find variables that are a good match to the normal distribution.

• Comment on whether most variables have a “normal” shape.
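
The comparison the app draws in step 2 can also be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated right-skewed values (not a variable from NHANES2), the helper names actual_fraction and normal_fraction are hypothetical, and comparing in bands one standard deviation wide is a choice made for this sketch rather than what the app does.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed variable: simulated data standing in for a real one.
values = [random.lognormvariate(3.3, 0.25) for _ in range(5000)]

# The normal distribution with the same mean and standard deviation as the data.
fitted = NormalDist.from_samples(values)

def actual_fraction(lo, hi):
    """Fraction of the data falling in the interval [lo, hi)."""
    return sum(lo <= v < hi for v in values) / len(values)

def normal_fraction(lo, hi):
    """Fraction the fitted normal distribution puts in the same interval."""
    return fitted.cdf(hi) - fitted.cdf(lo)

# Compare the two in bands one standard deviation wide,
# moving from the left tail to the right tail.
m, s = fitted.mean, fitted.stdev
for k in range(-3, 3):
    lo, hi = m + k * s, m + (k + 1) * s
    print(f"{lo:6.1f} to {hi:6.1f}: "
          f"data {actual_fraction(lo, hi):.3f}, "
          f"normal {normal_fraction(lo, hi):.3f}")
```

Bands where the data fraction is larger than the normal fraction are the ranges that occur more often in the actual data than in the theoretical distribution; for a right-skewed variable, expect that in the far right tail.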

Version 0.3, 2020-08-13