Probstat/notes/t-distributions

This is part of probstat.

In many applications, including computing confidence intervals and hypothesis testing, we need to know the distribution of the sample mean. Let $\bar{X}$ be the sample mean of a normal population with mean $\mu$, computed from a sample of size $n$. If the variance $\sigma^2$ of the population is known, we have that

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathrm{Normal}(0, 1).$$

However, in most cases, we do not know $\sigma^2$ and we only have the sample variance

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Therefore, we would like to use the sample standard deviation $S$ instead of the real standard deviation $\sigma$. We hope that

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

will be close to normal. This is indeed true. Although $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is not normal, it is t-distributed with $n-1$ degrees of freedom. (See definition below.) Therefore, we can use the table for the t-distribution to compute probabilities for various events related to this statistic. Note that the t-table is slightly harder to use than the standard normal table, because there are two parameters, i.e., the degrees of freedom and the level of confidence that you want.

We first show an example of using the t-distribution and then discuss the mechanics behind this usage.

EX1: We take a sample of size 10 and calculate the sample mean $\bar{X}$ and the sample variance $S^2$. Find the confidence interval for the population mean $\mu$ with 80% confidence level.

Solution: Since the sample size is 10, the degrees of freedom to use is 9. We look at the t-table and find out that

$$P\{-1.383 \le T_9 \le 1.383\} = 0.8,$$

where $T_9$ is a t-distributed random variable with 9 degrees of freedom.

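Since $\frac{\bar{X} - \mu}{S/\sqrt{10}}$ is t-distributed with 9 degrees of freedom, we have that

$$P\left\{-1.383 \le \frac{\bar{X} - \mu}{S/\sqrt{10}} \le 1.383\right\} = 0.8.$$

Rearranging for $\mu$ and putting in the actual values of $\bar{X}$ and $S$, we have that the confidence interval with 80% confidence level is

$$\left(\bar{X} - 1.383 \cdot \frac{S}{\sqrt{10}},\ \bar{X} + 1.383 \cdot \frac{S}{\sqrt{10}}\right).$$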
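The table lookup in this example can also be reproduced numerically. The following is a minimal Python sketch that uses only the standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `t_confidence_interval`) are illustrative, and the sample mean 20.0 and sample variance 5.0 are hypothetical values chosen for the demonstration, not the values of the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Value t with P(T <= t) = p, found by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(mean, variance, n, level):
    """Two-sided interval: mean +/- t * S / sqrt(n), with n - 1 degrees of freedom."""
    t_crit = t_quantile(1 - (1 - level) / 2, n - 1)
    half = t_crit * math.sqrt(variance / n)
    return mean - half, mean + half

# Hypothetical sample statistics (not taken from the example).
t_crit = t_quantile(0.90, 9)
low, high = t_confidence_interval(mean=20.0, variance=5.0, n=10, level=0.80)
print(round(t_crit, 3))
print(round(low, 3), round(high, 3))
```

An 80% two-sided interval leaves 10% in each tail, which is why the quantile is taken at 0.90 rather than 0.80.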

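Comparison with the case where $\sigma^2$ is known. As a comparison, suppose that the real variance $\sigma^2$ is actually 5. Using the standard normal table, our confidence interval will be smaller. That is, we have that

$$P\left\{-1.282 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{10}} \le 1.282\right\} = 0.8.$$

Plugging in the values of $\bar{X}$ and $\sigma$, we have that the confidence interval with 80% confidence level is

$$\left(\bar{X} - 1.282 \cdot \frac{\sqrt{5}}{\sqrt{10}},\ \bar{X} + 1.282 \cdot \frac{\sqrt{5}}{\sqrt{10}}\right) \approx \left(\bar{X} - 0.91,\ \bar{X} + 0.91\right),$$

which, for a sample variance close to 5, is slightly smaller than the previous interval, because the multiplier 1.282 is smaller than 1.383.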
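The known-variance case can be sketched the same way with `statistics.NormalDist` from the Python standard library. The function name `z_confidence_interval` and the sample mean 20.0 are assumptions for illustration. The t-based half-width uses the table value 1.383 for 9 degrees of freedom and assumes, hypothetically, that the sample variance also equals 5.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sigma2, n, level):
    """Two-sided interval when the population variance sigma2 is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z_crit * math.sqrt(sigma2 / n)
    return mean - half, mean + half

# Critical value for an 80% two-sided interval (10% in each tail).
z_crit = NormalDist().inv_cdf(0.90)

# Hypothetical sample mean; sigma^2 = 5 and n = 10 as in the comparison.
low, high = z_confidence_interval(mean=20.0, sigma2=5.0, n=10, level=0.80)
z_half = (high - low) / 2

# Half-width of the t-based interval, assuming S^2 = 5 (an assumption)
# and the t-table value 1.383 for 9 degrees of freedom.
t_half = 1.383 * math.sqrt(5.0 / 10)

print(round(z_crit, 3))
print(round(z_half, 3), round(t_half, 3))
```

The t-based interval is wider because it also has to account for the uncertainty in estimating $\sigma$ by $S$.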

Student's t-Distribution

Note that our goal is to find the distribution of

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

Let's recall two important facts about sample means and sample variances. First, $\frac{(n-1)S^2}{\sigma^2}$ is chi-squared with $n-1$ degrees of freedom, and furthermore, $\bar{X}$ and $S^2$ are independent. These imply that $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is a quotient of a unit normal random variable and the square root of an independent chi-squared random variable divided by its degrees of freedom. This is exactly what a t-distributed random variable is.

Definition: If $Z$ is a unit normal random variable and $\chi^2_n$ is a chi-squared random variable with $n$ degrees of freedom, such that $Z$ and $\chi^2_n$ are independent, we call a random variable

$$T_n = \frac{Z}{\sqrt{\chi^2_n / n}}$$

a t-distributed random variable with $n$ degrees of freedom.

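To see that $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is t-distributed with $n-1$ degrees of freedom, we calculate

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}$$

and note that the form of the quantity matches the definition of the t-distributed random variable where $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2}$.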
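The claim can be checked empirically. The sketch below is a simulation under assumed parameters (standard normal population, sample size 10, 20000 trials, fixed seed), with illustrative function names. It computes the statistic for many samples and measures how often it falls within the t-table cutoff 1.383 and within the normal-table cutoff 1.282.

```python
import math
import random

def t_statistic(sample, mu):
    """(sample mean - mu) / (S / sqrt(n)), with S^2 the (n - 1)-denominator variance."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu) / math.sqrt(s2 / n)

def simulate_statistics(n=10, mu=0.0, sigma=1.0, trials=20000, seed=1):
    """Draw `trials` normal samples of size n and return their t statistics."""
    rng = random.Random(seed)
    return [
        t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
        for _ in range(trials)
    ]

def fraction_within(values, cutoff):
    """Fraction of values with absolute value at most cutoff."""
    return sum(1 for v in values if abs(v) <= cutoff) / len(values)

stats = simulate_statistics()
cov_t = fraction_within(stats, 1.383)  # cutoff from the t-table, 9 degrees of freedom
cov_z = fraction_within(stats, 1.282)  # cutoff from the standard normal table
print(cov_t, cov_z)
```

If the statistic is t-distributed with 9 degrees of freedom, the first fraction should be close to 0.8, while the normal cutoff should cover noticeably less than 80% of the samples.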

The t-distribution is a flattened-out version of the normal distribution, with a lower peak and heavier tails (see pdf below). It gets closer and closer to the normal distribution as the degrees of freedom increase. See Wikipedia's article for more graphs and information.

[Figure: probability density functions of Student's t-distribution. Image from [1].]
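The convergence to the normal distribution can be illustrated numerically. The sketch below evaluates the t density at the center (x = 0) and in the tail (x = 3) for a few degrees of freedom and compares it with the standard normal density. The function name `t_pdf` and the chosen degrees of freedom are illustrative assumptions.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()  # standard normal: mean 0, standard deviation 1
dfs = [1, 5, 30, 1000]

peaks = [t_pdf(0.0, df) for df in dfs]  # density at the center
tails = [t_pdf(3.0, df) for df in dfs]  # density three units from the center

for df, peak, tail in zip(dfs, peaks, tails):
    print(df, round(peak, 4), round(tail, 4))
print("normal", round(normal.pdf(0.0), 4), round(normal.pdf(3.0), 4))
```

For small degrees of freedom the peak is lower and the tail is heavier than for the normal density; both differences shrink as the degrees of freedom grow.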

Links