\(\newcommand{\Cov}{\mathrm{Cov}}\) \(\newcommand{\Corr}{\mathrm{Corr}}\) \(\newcommand{\Sample}{X_{1},\dots,X_{n}}\)
The Method of Moments
Let \(\Sample\) be a random sample from a pmf or a pdf. For \(k=1,2,3,\dots\), the kth sample moment is \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{k}\), and the corresponding kth population moment is \(E(X^{k})\). So the first sample moment is the sample mean \(\bar{X}\).
If the pdf is \(f(x;\theta_{1},\dots,\theta_{m})\), then we estimate the \(\theta_{i}\) by equating the first \(m\) sample moments to the first \(m\) population moments and solving for the parameters. The resulting estimators are called moment estimators.
Note that this method could lead to problems like a negative value for a parameter when the distribution requires a positive value.
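As a worked sketch (using the uniform distribution on \([0,\theta]\) as an illustration, not a case treated above): the first population moment is \(E(X)=\theta/2\), so equating it to the first sample moment gives
\[
\bar{X}=\frac{\hat{\theta}}{2}\quad\Rightarrow\quad\hat{\theta}=2\bar{X}.
\]
Note that \(2\bar{X}\) can turn out smaller than the largest observation, which is impossible for the true \(\theta\); this illustrates the kind of problem mentioned above.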
Maximum Likelihood Estimation
This method is preferred when the sample size is large.
Let \(\Sample\) have joint pmf or pdf \(f(x_{1},x_{2},\dots,x_{n};\theta_{1},\dots,\theta_{m})\), where the \(\theta_{i}\) are unknown. When the \(x_{i}\) are the observed values and this expression is regarded as a function of the \(\theta_{i}\), it is called the likelihood function. The maximum likelihood estimates are the values of the \(\theta_{i}\) that maximize the likelihood function.
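For a random sample (i.e. iid observations) the joint pdf factors, so the likelihood can be written as a product over the observations:
\[
L(\theta_{1},\dots,\theta_{m})=f(x_{1},\dots,x_{n};\theta_{1},\dots,\theta_{m})=\prod_{i=1}^{n}f(x_{i};\theta_{1},\dots,\theta_{m}).
\]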
In words, find the set of parameters that maximizes the probability of observing this particular sample.
Note that these methods may yield biased estimators.
A common trick: maximize \(\ln L\) rather than \(L\) itself. This works because \(\ln\) is strictly increasing, so both are maximized by the same parameter values, and the log is defined since the likelihood is positive at the observed data.
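Concretely, for an iid sample the log turns the product into a sum, which is usually easier to differentiate:
\[
\ell(\theta_{1},\dots,\theta_{m})=\ln L(\theta_{1},\dots,\theta_{m})=\sum_{i=1}^{n}\ln f(x_{i};\theta_{1},\dots,\theta_{m}).
\]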
To use the MLE, you need to know the underlying distribution.
Example
As an example, let \(\Sample\) be a random sample from an exponential distribution with parameter \(\lambda\). Then the likelihood is
\[
L(\lambda)=\prod_{i=1}^{n}\lambda e^{-\lambda x_{i}}=\lambda^{n}e^{-\lambda\sum x_{i}},
\qquad
\ln L(\lambda)=n\ln\lambda-\lambda\sum x_{i}.
\]
Differentiate with respect to \(\lambda\) and set the derivative to zero: \(\frac{n}{\lambda}-\sum x_{i}=0\), which gives \(\hat{\lambda}=\frac{n}{\sum x_{i}}=\frac{1}{\bar{x}}\).
The MLEs for some common distributions:
Exponential MLE
The exponential MLE for \(\lambda\) is \(\hat{\lambda}=\frac{1}{\bar{x}}\).
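A minimal numerical sketch (assuming NumPy and SciPy are available; the simulated sample and variable names are illustrative only) that maximizes the exponential log-likelihood directly and compares the result to \(1/\bar{x}\):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)  # simulated sample, true lambda = 1/2

# Negative log-likelihood of an exponential sample: -(n ln(lambda) - lambda * sum(x_i))
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x)         # numerical maximizer of the likelihood
print(1 / x.mean())  # closed-form MLE 1/x-bar; the two should agree closely
```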
Binomial MLE
The binomial MLE for \(p\) is \(x/n\) where \(x\) is the number of successes.
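A sketch of the derivation (treating the total number of successes \(x\) in \(n\) trials as a single binomial observation):
\[
L(p)=\binom{n}{x}p^{x}(1-p)^{n-x},\qquad
\frac{d}{dp}\ln L(p)=\frac{x}{p}-\frac{n-x}{1-p}=0
\quad\Rightarrow\quad\hat{p}=\frac{x}{n}.
\]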
Normal MLE
For the normal MLE:
- \(\hat{\mu}=\bar{x}\)
- \(\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}\)
- Note that this expression uses the unknown \(\mu\). We often just swap it for the sample mean.
- If using the sample mean, then this is clearly a biased estimator. It is, however, consistent (see below); a short calculation of the bias follows this list.
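The bias can be quantified with a standard calculation (stated here without full proof):
\[
E\!\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right]=\frac{n-1}{n}\sigma^{2},
\]
so the bias is \(-\sigma^{2}/n\), which vanishes as \(n\rightarrow\infty\). Dividing by \(n-1\) instead of \(n\) gives the usual unbiased sample variance.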
The Invariance Principle
Let \(\hat{\theta}_{1},\dots,\hat{\theta}_{m}\) be the mle’s of the parameters \(\theta_{1},\dots,\theta_{m}\). Then the mle of any function \(h(\theta_{1},\dots,\theta_{m})\) is \(h(\hat{\theta}_{1},\dots,\hat{\theta}_{m})\).
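For example, continuing the normal case above: since \(\hat{\sigma}^{2}\) is the mle of \(\sigma^{2}\), the mle of \(\sigma\) is
\[
\hat{\sigma}=\sqrt{\hat{\sigma}^{2}}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}.
\]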
Note that in some distributions, the mle of the mean is not \(\bar{x}\).
A Desirable Property of the Maximum Likelihood Estimate
In general, when \(n\) is large the mle of \(\theta\) is approximately the MVUE of \(\theta\) (i.e. even if it is biased, it is nearly unbiased and has close to the minimum variance).
Some Complications
Occasionally, calculus will fail you when trying to calculate the MLE, e.g. when the likelihood is maximized at a boundary of the parameter space or is not differentiable there (for a uniform distribution on \([0,\theta]\), the mle is \(\max_{i}x_{i}\), which no derivative condition will find).
Also, you need to know the distribution.
Some problems will yield multiple solutions to the MLE problem. In other cases, there is no maximum.
Occasionally, you can get a nonsensical solution.
Consistency
An estimator \(\hat{\theta}\) is said to be consistent if \(\forall\epsilon>0,P(|\hat{\theta}-\theta|\ge\epsilon)\rightarrow 0\) as \(n\rightarrow\infty\).
\(\bar{X}\) is a consistent estimator for \(\mu\) if \(\sigma^{2}<\infty\). Sketch of proof: use Chebyshev’s Inequality and note that \(\sigma_{\bar{X}}^{2}=\frac{\sigma^{2}}{n}\); the inequality is written out below.
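Writing the sketch out, Chebyshev's Inequality applied to \(\bar{X}\) gives
\[
P\left(\left|\bar{X}-\mu\right|\ge\epsilon\right)\le\frac{\sigma_{\bar{X}}^{2}}{\epsilon^{2}}=\frac{\sigma^{2}}{n\epsilon^{2}}\rightarrow 0\quad\text{as }n\rightarrow\infty.
\]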
An MLE is consistent if the following conditions hold (a small simulation illustrating consistency is sketched after this list):
- Identification. If the parameters for the distribution differ, then so does the distribution. In other words, no two sets of \(\{\theta_{i}\}\) result in the same distribution.
- Compactness of the parameter space. This ensures that the supremum of the likelihood is actually attained, ruling out sequences of parameter values whose likelihood gets arbitrarily close to the supremum without reaching it. This is not a necessary condition.
- Continuity: \(\ln f(x;\theta)\) is continuous in \(\theta\) for almost all values of \(x\).
- Dominance: there exists an integrable function \(D(x)\) such that \(\left|\ln f(x;\theta)\right|<D(x)\) for all \(\theta\in\Theta\).
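A minimal simulation sketch (NumPy only; the true parameter and sample sizes are arbitrary choices for illustration) showing the exponential MLE \(\hat{\lambda}=1/\bar{x}\) settling toward the true \(\lambda\) as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(42)
true_lam = 2.0

# For increasing sample sizes, compute the MLE 1/x-bar and its absolute error.
for n in (10, 100, 1000, 10000, 100000):
    x = rng.exponential(scale=1 / true_lam, size=n)
    lam_hat = 1 / x.mean()
    print(f"n={n:>6}  lambda_hat={lam_hat:.4f}  error={abs(lam_hat - true_lam):.4f}")
```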