Probstat/notes/hypothesis testing
- This is part of probstat.
Since it is very hard to obtain complete information about a population, we usually end up with a much smaller sample of data. A question arises: how can we be confident that a conclusion drawn from the collected sample is correct, rather than an artifact of chance?
This section tries to answer this question.
A motivating example
Your friend gives you a coin and claims that this coin is special. (It is unclear what is special about this coin.)
You want to test this claim, so you toss the coin 20 times.
If you get 10 heads, do you believe your friend that the coin is special?
If you get 12 heads, do you believe your friend that the coin is special?
How about 15 heads? How about 18 heads? How about 20 heads?
Let's consider each case.
10: What is the probability that a normal coin turns up at least 10 heads in 20 coin tosses? About 58%. So this does not show anything special about the coin.
12: What is the probability that a normal coin turns up at least 12 heads in 20 coin tosses? About 25%. So this coin might be a bit special?
15: What is the probability that a normal coin turns up at least 15 heads in 20 coin tosses? About 2%. Either this coin is special or I am very lucky.
18: What is the probability that a normal coin turns up at least 18 heads in 20 coin tosses? About 0.02%. Either this coin is special or I am extremely lucky.
20: What is the probability that a normal coin turns up at least 20 heads in 20 coin tosses? About 1 in a million. I definitely should believe that this coin is special.
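These probabilities come from the binomial distribution: with a fair coin, the chance of getting at least k heads in 20 tosses is the sum of C(20, i)/2^20 over i from k to 20. A quick check in Python (the function name `prob_at_least` is ours, not from any library):

```python
from math import comb

def prob_at_least(k, n=20, p=0.5):
    """P(at least k heads in n independent tosses of a coin with head probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for k in (10, 12, 15, 18, 20):
    print(f"P(>= {k} heads) = {prob_at_least(k):.7f}")
```

Running this reproduces the percentages above (about 0.588, 0.252, 0.021, 0.0002, and one in a million).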
Let's go back to our reasoning in the previous coin example.
We want to reject some belief, namely that the coin is normal; here, "normal" refers to the probability of turning up heads. So the hypothesis that we want to test (or reject) is the following:
- H0: "the probability that the coin turns up heads is 0.5".
If the experimental result contradicts this hypothesis, we can reject it. However, note that it is impossible to completely contradict this hypothesis: even a result of 1000 heads in 1000 coin tosses does not contradict it, because there is a non-zero probability of obtaining that result. Therefore, we settle for a result that is "unlikely" enough. The degree of "unlikeliness" determines our confidence in rejecting the hypothesis.
Consider this criterion:
- We shall reject H0 if, after tossing the coin 20 times, we get at least 18 heads.
We know that if the hypothesis is true, the probability that we reject it is about 0.02%. Therefore, if we do reject it, it is extremely unlikely that this happened merely by chance. The probability that we reject H0 when it is actually true is the significance level of the test; in this case, the significance level of the test is 0.02% (i.e., 0.0002). (Note that if the significance level is very small, it means that if we reject H0, the rejection is very significant.)
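This 0.02% is just the probability, under a fair coin, of landing in the rejection region, which can be computed directly from the binomial distribution:

```python
from math import comb

# Significance level of the rule "reject H0 when we see >= 18 heads in 20 tosses":
# the probability of that outcome under H0 (a fair coin).
alpha = sum(comb(20, k) for k in range(18, 21)) / 2**20
print(f"alpha = {alpha:.6f}")
```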
The null hypothesis
- See also the Wikipedia article.
When we perform hypothesis testing, we usually start with a hypothesis that describes a "normal" situation, usually referred to as the null hypothesis. This hypothesis is there so that we can accept or reject it based on experimental data.
In the previous example, the null hypothesis specifies that the head probability of the coin is 1/2. Let's consider another example. Suppose that we know that, on average, students get 80 points on the final exam of the probability class. This semester, we try something different: we add another review section each week, and we would like to test whether the review sections improve the test scores. Let μ denote the average score of students taking the review sections. Our null hypothesis is
- H0: μ ≤ 80,
which says that the review sections do not improve the score.
After we set up the null hypothesis, we should create a criterion for accepting or rejecting it. For example, we may decide to reject the null hypothesis if the average score of the students taking the review sections is greater than 90.
Error types
For any criterion for testing a null hypothesis, there is usually a chance that we make a mistake. There are two types of errors:
- Type I error --- occurs when the hypothesis is true, but we reject it.
- Type II error --- occurs when the hypothesis is false, but we accept it.
The table below lists all four cases. (Shamelessly taken from [1].)
| | Null hypothesis (H0) is true | Null hypothesis (H0) is false |
|---|---|---|
| Reject null hypothesis | Type I error (false positive) | Correct outcome (true positive) |
| Fail to reject null hypothesis | Correct outcome (true negative) | Type II error (false negative) |
When we perform statistical hypothesis testing, we would like to check whether the hypothesis is consistent with the observed data. Therefore, we shall reject the hypothesis only if the data is strongly inconsistent with it, i.e., we shall reject H0 if the observed data is very improbable under the assumption that H0 is true. More specifically, we want the test to reject H0, when H0 is true, with probability at most some small value α. Common values for α are 0.1 (10%), 0.05 (5%), 0.01 (1%), or even 0.005 (0.5%). The value α is called the level of significance of the test. Note that this value is also the type I error rate of the test.
EX1: Suppose that you know that the average height of Kasetsart University students is 170 cm with a variance of 10. You look at students of the Faculty of Engineering and suspect that these students may not share the average height of KU students. Let μ denote the mean height of engineering students. We assume that the heights are normally distributed and that the variance is the same as that of KU students' heights, i.e., σ² = 10. Now, our null hypothesis is:
- H0: μ = 170.
We shall take a sample of size n = 10. Let's design a test criterion with level of significance α = 0.01.
Our test will consider the sample mean X̄ and will reject H0 if X̄ is far from 170. We have to figure out how far from 170 the sample mean needs to be.
First, recall that the sample mean X̄ of a sample of size n from a normal population with mean μ and variance σ² is normally distributed with mean μ and variance σ²/n. Therefore, when H0 is true, the statistic

    (X̄ − 170) / √(σ²/n) = (X̄ − 170) / √(10/10) = X̄ − 170

is unit normal. Let's refer to this statistic as Z.
If we look at the standard normal table, we find that

    P{ |Z| > 2.576 } ≈ 0.01.

Therefore, if we reject H0 when

    |X̄ − 170| > 2.576,

our test will have the level of significance of 0.01 as required.
Note: in this case, the population that we want to test is the engineering students, not all KU students.
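The calculation in EX1 can be sketched with Python's standard library alone; the critical value 2.576 is the z-value that cuts off probability 0.005 in each tail of the standard normal distribution:

```python
from statistics import NormalDist
from math import sqrt

alpha, n, var, mu0 = 0.01, 10, 10.0, 170.0
sd_of_mean = sqrt(var / n)                  # std. dev. of the sample mean; here 1.0
z = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value, about 2.576
threshold = z * sd_of_mean
print(f"reject H0 when |sample mean - {mu0}| > {threshold:.3f}")
```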
Concerning the type II errors
From the previous example, if the actual mean μ is only slightly different from 170, the hypothesis H0 is incorrect, but it will be extremely hard to reject H0. Therefore, in this case, the type II error rate will be very high. From this example, we can see that the type II error depends on how far the actual parameter is from the one in the null hypothesis.
EX2: Consider the average height example. If the actual mean is μ = 175.5 and we use the same test criterion, what is the type II error rate?
We incorrectly accept H0 if 167.424 ≤ X̄ ≤ 172.576. To simplify our analysis, let's approximate the error by assuming that we accept the null hypothesis whenever X̄ ≤ 172.576. (This causes only a very small error in our probability calculation.)
Since the population mean is μ = 175.5 and the variance of the sample mean is σ²/n = 1, the probability that this happens is

    P{ X̄ ≤ 172.576 } = P{ X̄ − 175.5 ≤ −2.92 }.

This is approximately 0.5 − 0.498 = 0.002, because X̄ − 175.5 is unit normal. (See the table here.)
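The table lookup 0.5 − 0.498 = 0.002 can be checked numerically. The true mean used here, μ = 175.5, is an assumed value chosen to match that answer (the exact value did not survive in this copy of the notes):

```python
from statistics import NormalDist
from math import sqrt

mu_true, var, n = 175.5, 10.0, 10     # assumed true mean; variance and sample size from EX1
accept_upper = 170 + 2.576            # approximate acceptance region: sample mean <= 172.576
sd_of_mean = sqrt(var / n)            # std. dev. of the sample mean = 1.0
beta = NormalDist(mu_true, sd_of_mean).cdf(accept_upper)   # P(accept H0 | true mean)
print(f"type II error = {beta:.4f}")
```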
Tests concerning the mean of a normal population
In this section, we discuss how to design a test for the mean of a normal population. That is, we want to test the null hypothesis
- H0: μ = μ0,
where μ0 is some specified constant. We usually test against the alternative hypothesis H1: μ ≠ μ0.
When σ² is known
When the variance σ² of the population is known, we can design a test with a specified level of significance using a calculation like the one in our previous example.
- Details will be added later.
When σ² is unknown
The same reasoning we used when deriving the confidence interval for the sample mean when the population variance is unknown also works here. Recall that if we take a sample of size n: X1, X2, ..., Xn, we can compute the statistics

    X̄ = (X1 + X2 + ... + Xn) / n

and

    S² = ((X1 − X̄)² + (X2 − X̄)² + ... + (Xn − X̄)²) / (n − 1).

Then, we have that the statistic

    T = (X̄ − μ0) / (S/√n)

has a t-distribution with n − 1 degrees of freedom when H0 is true.
Given this complete description of the distribution of T, we can design a hypothesis testing procedure as required.
EX3: Suppose that you know that the average height of Kasetsart University students is 170 cm. You look at students of the Faculty of Engineering and suspect that these students may not share the average height of KU students. Let μ denote the mean height of engineering students. We shall take a sample of size 10. Let's design a test criterion with level of significance α = 0.05 for the following hypothesis:
- H0: μ = 170.
Let

    T = (X̄ − 170) / (S/√10).

Since the sample size is 10, we have n − 1 = 9. Therefore, we shall look at the t-distribution table for 9 degrees of freedom. (See [2].) We have that

    P{ |T| > 2.262 } = 0.05.

This implies that we shall reject H0 when

    |X̄ − 170| / (S/√10) > 2.262.
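Putting EX3 together with a made-up sample (the ten heights below are purely illustrative, and 2.262 is the tabled two-sided critical value for 9 degrees of freedom):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample of 10 engineering students' heights (cm).
sample = [172, 168, 175, 171, 169, 174, 173, 170, 176, 172]
n = len(sample)
t_crit = 2.262                     # from the t table: P(|T| > 2.262) = 0.05, 9 d.o.f.

x_bar = mean(sample)               # sample mean
s = stdev(sample)                  # sample standard deviation (n - 1 denominator)
t_stat = (x_bar - 170) / (s / sqrt(n))
print(f"T = {t_stat:.3f}; reject H0: {abs(t_stat) > t_crit}")
```

For this particular sample, T ≈ 2.449 > 2.262, so the test would reject H0.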
Testing for the equivalence of the means
In this part, we have two populations. The first population is normal with mean μ1 and variance σ1², and the second population is also normal with mean μ2 and variance σ2². We want to test
- H0: μ1 = μ2.
When the variances of both populations are known
In this section, we assume that the variances σ1² and σ2² are known.
Suppose that we take a sample of size n from the first population and a sample of size m from the second population. We shall compute the sample means X̄ and Ȳ.
Note that if μ1 = μ2, then E[X̄ − Ȳ] = μ1 − μ2 = 0. Therefore, we can write the hypothesis as
- H0: E[X̄ − Ȳ] = 0.
This new representation of the hypothesis suggests that we should reject the hypothesis if

    |X̄ − Ȳ|

is large.
Recall that

    X̄ − Ȳ  is normal with mean μ1 − μ2 and variance σ1²/n + σ2²/m.

Therefore, we have that if H0 is true, the random variable

    Z = (X̄ − Ȳ) / √(σ1²/n + σ2²/m)

is unit normal. Hence, a criterion for the hypothesis test can be calculated using the standard normal table.
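A sketch of this two-sample Z test in Python; all the sample means, variances, and sizes below are made up for illustration:

```python
from statistics import NormalDist
from math import sqrt

def two_sample_z(x_bar, y_bar, var1, var2, n, m):
    """Z statistic for testing H0: mu1 = mu2 when both variances are known."""
    return (x_bar - y_bar) / sqrt(var1 / n + var2 / m)

# Illustrative numbers: sample means 172.0 and 170.5, known variances 10 and 12,
# sample sizes n = 25 and m = 30.
z = two_sample_z(172.0, 170.5, 10.0, 12.0, 25, 30)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96 for significance level 0.05
print(f"Z = {z:.3f}; reject H0 at the 0.05 level: {abs(z) > z_crit}")
```

Here Z ≈ 1.68 < 1.96, so at the 0.05 level this particular data would not reject H0.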
When the variances are not known
- To be added later...