Page:Sm all cc.pdf/56

through the data, not connecting line segments.

  • If possible, transform one or both variables so that the relationship between them is linear (e.g., choose among linear, semilog, and log-log plots).
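One rough way to compare the three plot types is to compute the linear correlation coefficient of the data after each transformation and see which is closest to ±1. The sketch below is only an illustration of that idea, not a method given in the text: the function names `pearson_r` and `rank_plot_types` are hypothetical, "semilog" is assumed to mean a logarithmic Y axis, and logarithmic transforms are skipped when a variable has non-positive values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rank_plot_types(xs, ys):
    """Rank candidate plot types by |r| of the transformed data, best first.

    Assumption (not from the text): a higher |r| after transformation is
    taken as a sign of a more nearly linear relationship.
    """
    candidates = {"linear": (xs, ys)}
    y_positive = all(y > 0 for y in ys)
    x_positive = all(x > 0 for x in xs)
    if y_positive:
        # Semilog: logarithmic Y axis, linear X axis.
        candidates["semilog"] = (xs, [math.log10(y) for y in ys])
    if x_positive and y_positive:
        # Log-log: both axes logarithmic.
        candidates["log-log"] = (
            [math.log10(x) for x in xs],
            [math.log10(y) for y in ys],
        )
    scores = {name: abs(pearson_r(tx, ty)) for name, (tx, ty) in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data following a power law, Y = 2 * X**3.
xs = [1, 2, 3, 4, 5]
ys = [2 * x**3 for x in xs]
for name, score in rank_plot_types(xs, ys):
    print(f"{name}: |r| = {score:.4f}")
```

A correlation coefficient can hide curvature, so a ranking like this is at most a starting point; looking at the transformed plot itself is still the real test.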

Individual scientific specialties routinely violate one or more of the hints above. Each specialty also uses arbitrary unstated conventions for some plotting options:

  • whether to frame the plot or just use one annotated line for each axis;
  • whether to use an internal grid or just marginal tics on the frame or lines;
  • whether to put tics on one or both sides, and whether to put them inside or outside the plot frame.

Extrapolation and Interpolation:

If a relationship has been established between variables X and Y, then one can predict the value of Yᵢ at a possibly unmeasured value of Xᵢ. The reliability of this prediction depends dramatically on where the new Xᵢ is with respect to the locations of the Xᵢ that established the relationship. Several rules of thumb apply to interpolation and extrapolation:

  • interpolation to an Xᵢ location that is between closely spaced previous Xᵢ is relatively safe,
  • interpolation between widely spaced previous Xᵢ is somewhat hazardous,
  • extrapolation for a short distance (<20% of the range of the previous X) is somewhat hazardous,
  • extrapolation for a great distance is foolhardy, and
  • both interpolation and extrapolation are much more reliable when the relationship is based on independent data than when it is based on non-independent data such as a time series.
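The first four rules of thumb can be sketched as a small classifier for a new X value. Only the 20% figure for a short extrapolation comes from the rules above. Everything else here is an assumption made for illustration: the function name `classify_prediction`, the treatment of any extrapolation of 20% or more as a great distance, and especially `wide_gap_fraction`, an arbitrary cutoff for what counts as "widely spaced". The fifth rule, about independent data versus time series, is not something this sketch can judge.

```python
import bisect

def classify_prediction(x_new, x_known, short_fraction=0.20, wide_gap_fraction=0.25):
    """Label a prediction at x_new by the rules of thumb for X locations.

    short_fraction: extrapolation distance, as a fraction of the range of
        the previous X values, below which extrapolation counts as short
        (the 20% figure from the rules above).
    wide_gap_fraction: ASSUMED cutoff, as a fraction of the same range,
        above which the gap bracketing x_new counts as widely spaced.
    """
    xs = sorted(set(x_known))
    if len(xs) < 2:
        raise ValueError("need at least two distinct previous X values")
    lowest, highest = xs[0], xs[-1]
    data_range = highest - lowest

    if lowest <= x_new <= highest:
        i = bisect.bisect_left(xs, x_new)
        if xs[i] == x_new:
            return "measured X"
        # xs[i - 1] < x_new < xs[i]: the two previous X values bracketing x_new.
        gap = xs[i] - xs[i - 1]
        if gap / data_range >= wide_gap_fraction:
            return "interpolation, widely spaced: somewhat hazardous"
        return "interpolation, closely spaced: relatively safe"

    distance = lowest - x_new if x_new < lowest else x_new - highest
    if distance / data_range < short_fraction:
        return "short extrapolation: somewhat hazardous"
    return "long extrapolation: foolhardy"

# Hypothetical X locations: a dense cluster plus one isolated point.
previous_x = [0, 1, 2, 3, 10]
for x in (1.5, 6, 11, 15):
    print(x, "->", classify_prediction(x, previous_x))
```

With these hypothetical values, 1.5 falls in a narrow gap, 6 falls in the wide gap between 3 and 10, 11 lies 10% of the range beyond the data, and 15 lies 50% beyond it.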

For example, when we saw the pattern of temporal changes in the U.S. deficit, the data appeared to fit a trend of increasing deficit rather well, so one should be able to extrapolate to 1991 fairly reliably. However, extrapolation ability is weaker for a time series than for independent events. As I am typing this, it is January 1991, the U.S. has just gone to war, Savings & Loans are dropping like flies, the U.S. is in a recession, and a deficit as small as the extrapolated value of 22% seems hopelessly optimistic. In contrast, when you read this, the U.S. budget hopefully is running a surplus.

As another example, we have already examined the changes with time of cigarette smoking among high school students, and we concluded that extrapolation from the two points of Figure 11a was foolhardy. With the data from Figure 11b, we might extrapolate beyond 1989 by perhaps 2-3 years and before 1975 by perhaps one year; the difference in confidence between these two extrapolations is due to the better-defined trend for 1983-1989 than for 1976-1980. Because these data are from a time series, any extrapolation is somewhat hazardous: if cigarette smoking were found in 1990 to be an aphrodisiac, the 1983-1989 pattern would immediately become an obsolete predictor of 1990 smoking rates. If there were such a thing as a class of 1986.5, then interpolation for the interval 1983-1989 would be very reliable (error <0.5%), because of extensive data coverage and small variance about the overall trend. In contrast, interpolation of a predicted value for some of the unsampled years in the interval 1975-1980 would have an error of at least 1%, partly because data spacing is larger but primarily because we are unsure how much of the apparent secular change