Sample Percentiles
Sample percentiles have no single obvious definition. For example, what is the 23rd percentile of 10 points?
One rule: order the \(n\) samples from smallest to largest. The \(i\)-th smallest value is then the \(100\left(\frac{i-0.5}{n}\right)\)-th sample percentile. Percentiles between these values are found by linear interpolation.
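The rule above can be sketched in a few lines of Python. The function name `sample_percentile` is made up for illustration. The rule says nothing about percentages below \(100\left(\frac{0.5}{n}\right)\) or above \(100\left(\frac{n-0.5}{n}\right)\), so clamping to the smallest and largest value is an assumption of this sketch.

```python
def sample_percentile(data, p):
    """Return the p-th sample percentile (0 <= p <= 100) of data.

    The i-th smallest value is treated as the 100*(i - 0.5)/n th percentile,
    with linear interpolation in between. Clamping outside that range is an
    assumption; the rule itself does not cover it.
    """
    xs = sorted(data)
    n = len(xs)
    # Solve 100*(h - 0.5)/n = p for the (1-based, possibly fractional) rank h.
    h = n * p / 100 + 0.5
    if h <= 1:
        return xs[0]
    if h >= n:
        return xs[-1]
    lower = int(h)        # 1-based rank of the value just below
    frac = h - lower      # how far to move towards the next value
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# The 23rd percentile of the ten points 1, 2, ..., 10.
print(sample_percentile(range(1, 11), 23))
```

With ten points, the 2nd and 3rd smallest values are the 15th and 25th percentiles, so the 23rd percentile lies 80% of the way from the 2nd value to the 3rd. For the points 1 to 10 that is roughly 2.8.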
Probability Plot
The purpose of a probability plot is to check whether a sample is consistent with a given distribution. The points in the plot are \((x,y)\), where \(y\) is the \(i\)-th smallest observation and \(x\) is the \(100\left(\frac{i-0.5}{n}\right)\)-th percentile of the distribution (the percentage comes from the formula in the previous section).
If the sample matches the distribution, the points will fall close to the line \(y=x\), that is, a straight line at a 45 degree angle.
But what if we want to check against a family of distributions? Usually, if the curve is not straight, no set of parameters will work. And if it is straight, you need to find the parameters that will make it 45 degrees.
When a normal sample is plotted against the standard normal distribution, the slope of the line is approximately \(\sigma\) and the intercept is approximately \(\mu\).
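As a minimal sketch of this, the code below builds the plot points against the standard normal distribution and reads off a slope and intercept. The helper names `normal_plot_points` and `fit_line` are hypothetical, and fitting the line by ordinary least squares is an assumption of this sketch, not something these notes prescribe. The sample is an idealized one whose ordered values sit exactly on the percentiles of a normal distribution with \(\mu=10\) and \(\sigma=2\), so the fit is not disturbed by sampling noise.

```python
from statistics import NormalDist, fmean


def normal_plot_points(sample):
    """Pair each ordered observation with a standard normal percentile."""
    ys = sorted(sample)
    n = len(ys)
    std = NormalDist()  # mean 0, standard deviation 1
    xs = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, ys))


def fit_line(points):
    """Ordinary least-squares line through the points: (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Idealized sample: exactly the percentiles of a normal with mu=10, sigma=2.
n = 20
target = NormalDist(mu=10, sigma=2)
sample = [target.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

slope, intercept = fit_line(normal_plot_points(sample))
print(f"slope ~ sigma: {slope:.3f}, intercept ~ mu: {intercept:.3f}")
```

With a real sample the two numbers would only approximate \(\sigma\) and \(\mu\).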
Consider the following non-normal distributions:
- Symmetric with light tails
- Symmetric with heavy tails
- Skewed
When plotted against a normal distribution, a symmetric light-tailed distribution gives an S-shaped curve: mostly straight in the middle, but deviating from the line at the extremes. A symmetric heavy-tailed distribution also gives an S shape, but curved in the opposite direction. A skewed distribution gives a curve that lies on the same side of the line at both extremes.
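The three shapes can be checked numerically. The sketch below uses one example of each kind, chosen here for illustration only: the uniform distribution (light tails), the Laplace distribution (heavy tails), and the exponential distribution (right-skewed). It uses exact percentiles instead of random samples, and it takes the reference line to be the one through the first and third quartile points, which is an assumed convention. It then reports how far the lowest and highest plot points sit above (+) or below (-) that line.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal percentile function

# Percentile functions of three example distributions (illustrative choices).
quantiles = {
    "light tails (uniform)": lambda p: p,
    "heavy tails (Laplace)": lambda p: (
        math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))
    ),
    "right-skewed (exponential)": lambda p: -math.log(1 - p),
}


def tail_residuals(q, n=20):
    """Vertical distance from the quartile line at the two extreme points."""
    slope = (q(0.75) - q(0.25)) / (z(0.75) - z(0.25))
    intercept = q(0.25) - slope * z(0.25)
    p_low, p_high = 0.5 / n, (n - 0.5) / n
    low = q(p_low) - (intercept + slope * z(p_low))
    high = q(p_high) - (intercept + slope * z(p_high))
    return low, high


for name, q in quantiles.items():
    low, high = tail_residuals(q)
    print(f"{name}: low end {low:+.2f}, high end {high:+.2f}")
```

The light-tailed case comes out above the line at the low end and below it at the high end, the heavy-tailed case shows the opposite signs, and the skewed case is above the line at both ends.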
How do we know if the line is straight? Sampling variability ensures that it will never be perfectly straight. For \(n<30\), deviations from a straight line are not strong evidence of anything. For \(n \gg 30\), a sample that really comes from the distribution will give a nearly linear relationship.
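A small simulation gives a feel for this. The sketch below draws repeated samples that really are normal and summarizes how straight each probability plot is. Using the correlation between the plot's \(x\) and \(y\) values as the measure of straightness is an assumption of this sketch, as are the function names, the replicate count, and the seed.

```python
import random
from statistics import NormalDist, fmean


def plot_correlation(sample, xs):
    """Correlation between distribution percentiles xs and the ordered sample."""
    ys = sorted(sample)
    xbar, ybar = fmean(xs), fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math_sqrt(sxx * syy)


def math_sqrt(value):
    """Square root via exponentiation, to keep the imports minimal."""
    return value ** 0.5


def mean_correlation(n, replicates=100, seed=0):
    """Average plot correlation over many truly normal samples of size n."""
    rng = random.Random(seed)
    std = NormalDist()
    xs = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    values = [
        plot_correlation([rng.gauss(0, 1) for _ in range(n)], xs)
        for _ in range(replicates)
    ]
    return fmean(values)


for n in (10, 30, 500):
    print(n, round(mean_correlation(n), 4))
```

Even though every sample is normal, the small samples give visibly less straight plots on average than the large ones, which is why a wobbly plot at small \(n\) says little.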
Beyond Normality
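For a distribution with location parameter \(\theta_{1}\) and scale parameter \(\theta_{2}\), plot against the standard member of the family: \(\theta_{1}=0,\theta_{2}=1\). As in the normal case, the slope of the line then estimates \(\theta_{2}\) and the intercept estimates \(\theta_{1}\).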
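As an illustration with a non-normal family, the sketch below uses the logistic distribution, chosen here only because its standard percentile function has the simple closed form \(\ln\frac{p}{1-p}\). The values \(\theta_{1}=3\) and \(\theta_{2}=2\), the idealized noise-free sample, and the least-squares fit are all assumptions made for the example.

```python
import math
from statistics import fmean


def logistic_quantile(p):
    """Percentile function of the standard logistic (location 0, scale 1)."""
    return math.log(p / (1 - p))


def fit_line(xs, ys):
    """Ordinary least-squares line through the points: (slope, intercept)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


n = 25
theta1, theta2 = 3.0, 2.0  # hypothetical location and scale

# x: percentiles of the standard distribution at 100*(i - 0.5)/n percent.
xs = [logistic_quantile((i - 0.5) / n) for i in range(1, n + 1)]
# y: idealized ordered sample from the shifted and scaled distribution.
ys = [theta1 + theta2 * x for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope ~ theta2: {slope:.3f}, intercept ~ theta1: {intercept:.3f}")
```

The fit recovers the scale as the slope and the location as the intercept because a location-scale change shifts and stretches every percentile by the same linear map.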