Probability Plots

Posted by Beetle B. on Tue 06 June 2017

Sample Percentiles

Defining sample percentiles is not straightforward. With only 10 points, what is the 23rd percentile?

One rule: Order the \(n\) samples from smallest to largest. Then the \(i\)-th smallest value is the \(100\left(\frac{i-0.5}{n}\right)\)th sample percentile. Intermediate percentiles are obtained by linear interpolation.
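As a minimal sketch of this rule, the function below answers the question about the 23rd percentile of 10 points. The name `sample_percentile` is made up for illustration, and clamping to the smallest or largest value outside the range covered by the rule is an assumption, since the rule itself says nothing about those percentiles.

```python
def sample_percentile(data, p):
    """Return the p-th sample percentile (0 <= p <= 100) of data."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: outside the range covered by the rule, clamp to the extremes.
    if p <= 100 * 0.5 / n:
        return xs[0]
    if p >= 100 * (n - 0.5) / n:
        return xs[-1]
    # Invert p = 100 * (i - 0.5) / n to get a fractional 1-based rank i.
    rank = p * n / 100 + 0.5
    lower = min(int(rank), n - 1)  # 1-based rank of the point just below
    frac = rank - lower
    # Linear interpolation between the two neighbouring order statistics.
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p23 = sample_percentile(data, 23)
```

Here the 2nd smallest value is the 15th percentile and the 3rd smallest is the 25th, so the 23rd percentile lies 8/10 of the way between them: 2.8, up to floating-point rounding.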

Probability Plot

The purpose of a probability plot is to determine if the sample matches a given distribution. The points in the plot are \((x,y)\) where \(y\) is the \(i\)-th smallest sample observed and \(x\) is the corresponding percentile of the distribution, i.e. the \(100\left(\frac{i-0.5}{n}\right)\)th percentile from the previous section.

If the sample matches the distribution, the points will fall close to the 45 degree line \(y=x\).
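A rough sketch of how the points could be computed when the reference distribution is normal, using `statistics.NormalDist` from the Python standard library. The function name `normal_plot_points` and the sample values are invented for illustration.

```python
from statistics import NormalDist


def normal_plot_points(data, mu=0.0, sigma=1.0):
    """Return the (x, y) points of a probability plot against N(mu, sigma)."""
    ys = sorted(data)
    n = len(ys)
    ref = NormalDist(mu, sigma)
    # x: the 100*(i - 0.5)/n th percentile of the reference distribution
    # y: the i-th smallest observation
    return [(ref.inv_cdf((i - 0.5) / n), y) for i, y in enumerate(ys, start=1)]


sample = [5.2, 3.1, 6.9, 4.7, 5.0]  # made-up observations
points = normal_plot_points(sample, mu=5.0, sigma=1.0)
```

Passing `points` to any scatter-plot routine and drawing the line \(y=x\) on top gives the plot described above.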

But what if we want to check against a family of distributions? Usually, if the curve is not straight, no set of parameters will work. And if it is straight, you need to find the parameters that will make it 45 degrees.

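For example, when a normal sample is plotted against the standard normal distribution, the points fall near a line whose slope is approximately \(\sigma\) and whose intercept is approximately \(\mu\).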

Consider the following non-normal distributions:

  1. Symmetric with light tails
  2. Symmetric with heavy tails
  3. Skewed

When plotting against a normal distribution, the first will give an S-shaped curve: mostly straight in the middle, but deviating at the extremes. The second is also S-shaped, but curved in the opposite direction. For a skewed distribution, the curve will be on the same side of the line at both extremes.
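These three shapes can be checked numerically with idealised samples, i.e. by using the exact percentiles of each distribution in place of data. In the sketch below, the choice of distributions (uniform for light tails, Laplace for heavy tails, exponential for skew) and the reference line through the quartile points are assumptions made for illustration. The function returns how far the lowest and highest plotted points sit above (+) or below (-) that line.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tail_residuals(quantile, n=100):
    """Vertical distance of the lowest and highest points from the quartile line."""
    z25, z75 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    q25, q75 = quantile(0.25), quantile(0.75)
    slope = (q75 - q25) / (z75 - z25)
    intercept = q25 - slope * z25
    residuals = []
    for i in (1, n):  # smallest and largest of n idealised observations
        p = (i - 0.5) / n
        line_y = intercept + slope * STD_NORMAL.inv_cdf(p)
        residuals.append(quantile(p) - line_y)
    return tuple(residuals)


def uniform_quantile(p):  # symmetric, light tails
    return p


def laplace_quantile(p):  # symmetric, heavy tails
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))


def exponential_quantile(p):  # skewed to the right
    return -math.log(1 - p)


light = tail_residuals(uniform_quantile)
heavy = tail_residuals(laplace_quantile)
skewed = tail_residuals(exponential_quantile)
```

The signs come out as (+, -) for the light-tailed case, (-, +) for the heavy-tailed case, and (+, +) for the skewed case: two S shapes bending in opposite directions, and a curve that stays on one side of the line at both ends.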

How do we know if the line is straight? Sampling variability ensures it will never be perfectly straight. For \(n<30\), deviations from a straight line are not evidence of much. For \(n \gg 30\), you'll get close to a linear relationship if the sample really does come from the distribution.
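A small simulation can give a feel for the effect of sample size. The sketch below draws genuinely normal samples and measures straightness by the correlation between the plotted \(x\) and \(y\) values. Using correlation as the measure, and the particular sizes, repeat count, and seed, are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def normal_scores(n):
    """Standard normal percentiles used as x-coordinates of the plot."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def plot_correlation(sample, zs):
    """Correlation between the plot's x and y values (1.0 = perfectly straight)."""
    ys = sorted(sample)
    n = len(ys)
    z_mean, y_mean = sum(zs) / n, sum(ys) / n
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(n, repeats=200, seed=0):
    """Average plot correlation over many normal samples of size n."""
    rng = random.Random(seed)
    zs = normal_scores(n)
    total = 0.0
    for _ in range(repeats):
        total += plot_correlation([rng.gauss(0, 1) for _ in range(n)], zs)
    return total / repeats


small = mean_correlation(10)
large = mean_correlation(500)
```

Even though every sample here is normal, the average correlation for \(n=10\) should come out noticeably lower than for \(n=500\), which is why wiggles in a small-sample plot say little.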

Beyond Normality

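For a family of distributions with a location parameter \(\theta_{1}\) and a scale parameter \(\theta_{2}\), plot the sample against the standard member of the family: \(\theta_{1}=0,\theta_{2}=1\). As in the normal case, a straight line then has intercept approximately \(\theta_{1}\) and slope approximately \(\theta_{2}\).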
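As an illustration with a non-normal family, the sketch below uses the logistic distribution, whose standard member has the percentile function \(\ln\frac{p}{1-p}\). The logistic family is just one example of a location-scale family, and the function names and the least-squares fit are assumptions, not a prescribed method.

```python
import math


def std_logistic_quantile(p):
    """Percentile function of the standard logistic (location 0, scale 1)."""
    return math.log(p / (1 - p))


def fit_location_scale(data, std_quantile):
    """Least-squares (location, scale) from a plot against the standard member."""
    ys = sorted(data)
    n = len(ys)
    qs = [std_quantile((i - 0.5) / n) for i in range(1, n + 1)]
    q_mean, y_mean = sum(qs) / n, sum(ys) / n
    sxy = sum((q - q_mean) * (y - y_mean) for q, y in zip(qs, ys))
    sxx = sum((q - q_mean) ** 2 for q in qs)
    scale = sxy / sxx                   # slope of the fitted line
    location = y_mean - scale * q_mean  # intercept of the fitted line
    return location, scale


# Idealised "sample": exact percentiles of a logistic with theta1 = 3, theta2 = 2.
n = 25
ideal = [3 + 2 * std_logistic_quantile((i - 0.5) / n) for i in range(1, n + 1)]
theta1_hat, theta2_hat = fit_location_scale(ideal, std_logistic_quantile)
```

Swapping in a different standard percentile function adapts the same sketch to another location-scale family.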