Probstat/notes/hypothesis testing
- This is part of probstat.
Since it is very hard to obtain complete information about a population, we usually end up with a much smaller sample of data. A question arises: how can we be confident that a conclusion drawn from the collected sample is correct, rather than an artifact of chance?
This section tries to answer this question.
A motivating example
Your friend gives you a coin and claims that this coin is special. (It is unclear what is special about this coin.)
You want to test this claim, so you toss the coin 20 times.
If you get 10 heads, do you believe your friend that the coin is special?
If you get 12 heads, do you believe your friend that the coin is special?
How about 15 heads? How about 18 heads? How about 20 heads?
Let's consider each case.
10: What is the probability that a normal coin turns up at least 10 heads in 20 coin tosses? About 58%. So this does not show anything special about the coin.
12: What is the probability that a normal coin turns up at least 12 heads in 20 coin tosses? About 25%. So this coin might be a bit special?
15: What is the probability that a normal coin turns up at least 15 heads in 20 coin tosses? About 2%. Either this coin is special or I am very lucky.
18: What is the probability that a normal coin turns up at least 18 heads in 20 coin tosses? About 0.02%. Either this coin is special or I am extremely lucky.
20: What is the probability that a normal coin turns up at least 20 heads in 20 coin tosses? About 1 in a million. I definitely should believe that this coin is special.
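These probabilities come from the binomial distribution: with a fair coin, the chance of getting at least k heads in 20 tosses is the sum of C(20, i)/2^20 over i from k to 20. A quick check in Python (the function name `prob_at_least` is ours, not from any library):

```python
from math import comb

def prob_at_least(k, n=20, p=0.5):
    """P(at least k heads in n independent tosses of a coin with head probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for k in (10, 12, 15, 18, 20):
    print(f"P(>= {k} heads) = {prob_at_least(k):.7f}")
```

Running this reproduces the percentages above (about 0.588, 0.252, 0.021, 0.0002, and one in a million).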
Let's go back to our reasoning in the previous coin example.
We want to reject some belief, namely that the coin is normal; here, "normal" refers to the probability of turning up heads. So the hypothesis that we want to test (or reject) is the following:
- H0: "the probability that the coin turns up heads is 0.5".
If the experimental result contradicts this hypothesis, we can reject it. However, note that it is impossible to completely contradict this hypothesis: even a result of 1000 heads in 1000 coin tosses does not contradict it, because there is a non-zero probability of obtaining that result. Therefore, we settle for a result that is "unlikely" enough. The degree of "unlikeliness" determines our confidence in rejecting the hypothesis.
Consider this criterion:
- We shall reject H0 if, after tossing the coin 20 times, we get at least 18 heads.
We know that if the hypothesis is true, the probability that we reject it is about 0.02%. Therefore, if we do reject it, it is extremely unlikely that this happened merely by chance. The probability that we reject H0 when it is actually true is the significance level of the test; in this case, the significance level of the test is 0.02% (i.e., 0.0002). (Note that if the significance level is very small, it means that if we reject H0, the rejection is very significant.)
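This 0.02% is just the probability, under a fair coin, of landing in the rejection region, which can be computed directly from the binomial distribution:

```python
from math import comb

# Significance level of the rule "reject H0 when we see >= 18 heads in 20 tosses":
# the probability of that outcome under H0 (a fair coin).
alpha = sum(comb(20, k) for k in range(18, 21)) / 2**20
print(f"alpha = {alpha:.6f}")
```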
The null hypothesis
- See also the Wikipedia article.
When we perform hypothesis testing, we usually start with a hypothesis that describes a "normal" situation, usually referred to as the null hypothesis. This hypothesis is there so that we can accept or reject it based on experimental data.
In the previous example, the null hypothesis specifies that the head probability of the coin is 1/2. Let's consider another example. Suppose that we know that, on average, students get 80 points on the final exam of the probability class. This semester, we try something different: we add another review section each week, and we would like to test whether the review sections improve the test scores. Let μ denote the average score of students taking the review sections. Our null hypothesis is
- H0: μ ≤ 80,
which says that the review sections do not improve the score.
After we set up the null hypothesis, we should create a criterion for accepting or rejecting it. For example, we may decide to reject the null hypothesis if the average score of the students taking the review sections is greater than 90.
Error types
For any criterion for testing a null hypothesis, there is usually a chance that we make a mistake. There are two types of errors:
- Type I error --- occurs when the hypothesis is true, but we reject it.
- Type II error --- occurs when the hypothesis is false, but we accept it.
The table below lists all four cases. (Shamelessly taken from [1].)
| | Null hypothesis (H0) is true | Null hypothesis (H0) is false |
|---|---|---|
| Reject null hypothesis | Type I error (false positive) | Correct outcome (true positive) |
| Fail to reject null hypothesis | Correct outcome (true negative) | Type II error (false negative) |
When we perform statistical hypothesis testing, we would like to check whether the hypothesis is consistent with the observed data. Therefore, we shall reject the hypothesis only if the data is strongly inconsistent with it, i.e., we shall reject H0 if the observed data is very improbable under the assumption that H0 is true. More specifically, we want the test to reject H0, when H0 is true, with probability at most some small value α. Common values for α are 0.1 (10%), 0.05 (5%), 0.01 (1%), or even 0.005 (0.5%). The value α is called the level of significance of the test. Note that this value is also the type I error rate of the test.
EX1: Suppose that you know that the average height of Kasetsart University students is 170 cm with a variance of 10. You look at students of the Faculty of Engineering and suspect that these students may not share the average height of KU students. Let μ denote the mean height of engineering students. We assume that the heights are normally distributed and that the variance is the same as that of KU students' heights, i.e., σ² = 10. Now, our null hypothesis is:
- H0: μ = 170.
We shall take a sample of size n = 10. Let's design a test criterion with level of significance α = 0.01.
Our test will consider the sample mean X̄ and will reject H0 if X̄ is far from 170. We have to figure out how far from 170 the sample mean needs to be.
First, recall that the sample mean X̄ of a sample of size n from a normal population with mean μ and variance σ² is normally distributed with mean μ and variance σ²/n. Therefore, when H0 is true, the statistic

    (X̄ − 170) / √(σ²/n) = (X̄ − 170) / √(10/10) = X̄ − 170

is unit normal. Let's refer to this statistic as Z.
If we look at the standard normal table, we find that

    P{ |Z| > 2.576 } ≈ 0.01.

Therefore, if we reject H0 when

    |X̄ − 170| > 2.576,

our test will have the level of significance of 0.01 as required.
Note: in this case, the population that we want to test is the engineering students, not all KU students.
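The calculation in EX1 can be sketched with Python's standard library alone; the critical value 2.576 is the z-value that cuts off probability 0.005 in each tail of the standard normal distribution:

```python
from statistics import NormalDist
from math import sqrt

alpha, n, var, mu0 = 0.01, 10, 10.0, 170.0
sd_of_mean = sqrt(var / n)                  # std. dev. of the sample mean; here 1.0
z = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value, about 2.576
threshold = z * sd_of_mean
print(f"reject H0 when |sample mean - {mu0}| > {threshold:.3f}")
```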
Concerning the type II errors
From the previous example, if the actual mean μ is only slightly different from 170, the hypothesis H0 is incorrect, but it will be extremely hard to reject H0. Therefore, in this case, the type II error rate will be very high. From this example, we can see that the type II error depends on how far the actual parameter is from the one in the null hypothesis.
EX2: Consider the average height example. If the actual mean is μ = 175.5 and we use the same test criterion, what is the type II error rate?
We incorrectly accept H0 if 167.424 ≤ X̄ ≤ 172.576. To simplify our analysis, let's approximate the error by assuming that we accept the null hypothesis whenever X̄ ≤ 172.576. (This causes only a very small error in our probability calculation.)
Since the population mean is μ = 175.5 and the variance of the sample mean is σ²/n = 1, the probability that this happens is

    P{ X̄ ≤ 172.576 } = P{ X̄ − 175.5 ≤ −2.92 }.

This is approximately 0.5 − 0.498 = 0.002, because X̄ − 175.5 is unit normal. (See the table here.)
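The table lookup 0.5 − 0.498 = 0.002 can be checked numerically. The true mean used here, μ = 175.5, is an assumed value chosen to match that answer (the exact value did not survive in this copy of the notes):

```python
from statistics import NormalDist
from math import sqrt

mu_true, var, n = 175.5, 10.0, 10     # assumed true mean; variance and sample size from EX1
accept_upper = 170 + 2.576            # approximate acceptance region: sample mean <= 172.576
sd_of_mean = sqrt(var / n)            # std. dev. of the sample mean = 1.0
beta = NormalDist(mu_true, sd_of_mean).cdf(accept_upper)   # P(accept H0 | true mean)
print(f"type II error = {beta:.4f}")
```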
Tests concerning the mean of a normal population
In this section, we discuss how to design a test for the mean of a normal population. That is, we want to test the null hypothesis
- H0: μ = μ0,
where μ0 is some specified constant. We usually test against the alternative hypothesis H1: μ ≠ μ0.
When σ² is known
When the variance σ² of the population is known, we can design a test with a specified level of significance using a calculation like the one in our previous example.
- Details will be added later.
When σ² is unknown
The same reasoning we used when deriving the confidence interval for the sample mean when the population variance is unknown also works here. Recall that if we take a sample of size n: X1, X2, ..., Xn, we can compute the statistics

    X̄ = (X1 + X2 + ... + Xn) / n

and

    S² = ((X1 − X̄)² + (X2 − X̄)² + ... + (Xn − X̄)²) / (n − 1).

Then, we have that the statistic

    T = (X̄ − μ0) / (S/√n)

has a t-distribution with n − 1 degrees of freedom when H0 is true.
Given this complete description of the distribution of T, we can design a hypothesis testing procedure as required.
EX3: Suppose that you know that the average height of Kasetsart University students is 170 cm. You look at students of the Faculty of Engineering and suspect that these students may not share the average height of KU students. Let μ denote the mean height of engineering students. We shall take a sample of size 10. Let's design a test criterion with level of significance α = 0.05 for the following hypothesis:
- H0: μ = 170.
Let

    T = (X̄ − 170) / (S/√10).

Since the sample size is 10, we have n − 1 = 9. Therefore, we shall look at the t-distribution table for 9 degrees of freedom. (See [2].) We have that

    P{ |T| > 2.262 } = 0.05.

This implies that we shall reject H0 when

    |X̄ − 170| / (S/√10) > 2.262.
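Putting EX3 together with a made-up sample (the ten heights below are purely illustrative, and 2.262 is the tabled two-sided critical value for 9 degrees of freedom):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample of 10 engineering students' heights (cm).
sample = [172, 168, 175, 171, 169, 174, 173, 170, 176, 172]
n = len(sample)
t_crit = 2.262                     # from the t table: P(|T| > 2.262) = 0.05, 9 d.o.f.

x_bar = mean(sample)               # sample mean
s = stdev(sample)                  # sample standard deviation (n - 1 denominator)
t_stat = (x_bar - 170) / (s / sqrt(n))
print(f"T = {t_stat:.3f}; reject H0: {abs(t_stat) > t_crit}")
```

For this particular sample, T ≈ 2.449 > 2.262, so the test would reject H0.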
Testing for the equivalence of the means
In this part, we have two populations. The first population is normal with mean μ1 and variance σ1², and the second population is also normal with mean μ2 and variance σ2². We want to test
- H0: μ1 = μ2.
When the variances of both populations are known
In this section, we assume that the variances σ1² and σ2² are known.
Suppose that we take a sample of size n from the first population and a sample of size m from the second population. We shall compute the sample means X̄ and Ȳ.
Note that if μ1 = μ2, then E[X̄ − Ȳ] = μ1 − μ2 = 0. Therefore, we can write the hypothesis as
- H0: E[X̄ − Ȳ] = 0.
This new representation of the hypothesis suggests that we should reject the hypothesis if

    |X̄ − Ȳ|

is large.
Recall that

    X̄ − Ȳ  is normal with mean μ1 − μ2 and variance σ1²/n + σ2²/m.

Therefore, we have that if H0 is true, the random variable

    Z = (X̄ − Ȳ) / √(σ1²/n + σ2²/m)

is unit normal. Hence, a criterion for the hypothesis test can be calculated using the standard normal table.
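A sketch of this two-sample Z test in Python; all the sample means, variances, and sizes below are made up for illustration:

```python
from statistics import NormalDist
from math import sqrt

def two_sample_z(x_bar, y_bar, var1, var2, n, m):
    """Z statistic for testing H0: mu1 = mu2 when both variances are known."""
    return (x_bar - y_bar) / sqrt(var1 / n + var2 / m)

# Illustrative numbers: sample means 172.0 and 170.5, known variances 10 and 12,
# sample sizes n = 25 and m = 30.
z = two_sample_z(172.0, 170.5, 10.0, 12.0, 25, 30)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96 for significance level 0.05
print(f"Z = {z:.3f}; reject H0 at the 0.05 level: {abs(z) > z_crit}")
```

Here Z ≈ 1.68 < 1.96, so at the 0.05 level this particular data would not reject H0.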
When the variances are not known
- To be added later...