Introduces the distinction between quantitative and categorical variables through their very different appearances in a point plot.
Data can take many forms, but one of the most common is the data frame, a spreadsheet-like arrangement of rows and columns. We’ll work with the data frame (or data set) Natality_2014 in the Source package Little Apps, which is based on data published by the US Centers for Disease Control.
Natality_2014 has 100,000 rows. Each row reports a live birth in the US in 2014. There are dozens of variables, a few of which are shown below.
It’s hard to draw much of a conclusion by looking directly at a large data frame. But a graphical display of data can help.
A point plot is a basic statistical graphic that displays two variables from a data frame. One variable is represented on the vertical axis and another on the horizontal axis. The following point plot, for example, shows the baby’s weight in grams (dbwt) against the length of the pregnancy in weeks (combgest).
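To make the row-to-point correspondence concrete, here is a minimal Python sketch. It assumes a data frame can be modeled as a list of rows, one dictionary per row. The three rows are hypothetical values invented for illustration, not records from Natality_2014, and the helper name `point_plot_coords` is our own. Each row becomes exactly one point: one variable supplies the horizontal position and the other supplies the vertical position.

```python
# Hypothetical rows invented for illustration; they are NOT taken from Natality_2014.
# Each dict is one row of the data frame; its keys are the variable names.
rows = [
    {"dbwt": 3300, "combgest": 39},
    {"dbwt": 3650, "combgest": 40},
    {"dbwt": 2900, "combgest": 37},
]

def point_plot_coords(rows, x_var, y_var):
    """Hypothetical helper: turn each row into one (x, y) point.

    x comes from the variable on the horizontal axis and
    y from the variable on the vertical axis.
    """
    return [(row[x_var], row[y_var]) for row in rows]

points = point_plot_coords(rows, x_var="combgest", y_var="dbwt")

# One row in, one point out: print the pairing so each point can be traced to its row.
for row, (x, y) in zip(rows, points):
    print(f"row {row} -> point at x={x}, y={y}")
```

Because the mapping is one row to one point, a plot of \(n\) rows always contains \(n\) points, even when some of them land on top of each other.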
Referring to the graph in the previous section …
Set the Source package to Little Apps, the data set to Natality_2014, dbwt as the response variable, and combgest as the explanatory variable. The resulting plot should look much like the graph seen in the introduction to this lesson. Change the sample size to \(n = 5\) by clicking on the n=50 icon and choosing n=5. Click on the Graph tab in the top tool bar to see a larger graph.
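Choosing \(n = 5\) means the plot shows only 5 of the 100,000 rows. The following minimal Python sketch shows that step. It assumes, without confirmation from the app, that the rows are drawn at random without replacement. `sample_rows` is a hypothetical helper, and the stand-in frame records only each row's index.

```python
import random

# Hypothetical stand-in for a 100,000-row data frame: each row just records its index.
frame = [{"row_id": i} for i in range(100_000)]

def sample_rows(frame, n, seed=None):
    """Hypothetical helper: return n distinct rows chosen at random without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)

small = sample_rows(frame, n=5, seed=1)
larger = sample_rows(frame, n=50, seed=1)
print(len(small), len(larger))  # 5 50
```

With the same seed, the same rows come back every time. Changing or omitting the seed gives a different set of rows.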
In the “Data” tab in the top tool bar, you will see the graph along with the data shown in the plot, in data-frame format.
Change the explanatory variable to
For each of the \(n=5\) rows of the data frame displayed in the Data tab, find the corresponding point in the graphic.
Change \(n\) to 500. In the sex graph, all the points are lined up in two columns.
Explain why. . . .
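One way to check an explanation is to count the horizontal positions a categorical variable can produce. This Python sketch assumes each category is drawn at its own fixed horizontal position. Sorting the levels alphabetically is our choice and not necessarily what Little Apps does. The five rows are hypothetical values invented for illustration, and `category_positions` is a hypothetical helper.

```python
# Hypothetical rows invented for illustration; they are NOT taken from Natality_2014.
rows = [
    {"dbwt": 3300, "sex": "F"},
    {"dbwt": 3650, "sex": "M"},
    {"dbwt": 2900, "sex": "F"},
    {"dbwt": 3420, "sex": "M"},
    {"dbwt": 4010, "sex": "M"},
]

def category_positions(values):
    """Hypothetical helper: give each distinct category a horizontal position 1, 2, ...
    in alphabetical order of the category labels."""
    return {level: pos for pos, level in enumerate(sorted(set(values)), start=1)}

positions = category_positions(row["sex"] for row in rows)

# Every row with the same category gets the same x, so points stack into columns.
points = [(positions[row["sex"]], row["dbwt"]) for row in rows]
distinct_x = sorted({x for x, _ in points})

print(positions)   # {'F': 1, 'M': 2}
print(distinct_x)  # [1, 2]
```

However many rows are added, a two-level variable such as sex yields only two distinct x positions. A quantitative variable such as combgest can take many different values along the axis.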
Version 0.3, 2020-08-13