Probstat/notes/confidence intervals
Author: Jittat
Suppose that we take a sample <math>X_1,X_2,\ldots,X_n</math> of size <math>n</math> from a population that is normally distributed with mean <math>\mu</math> and variance <math>\sigma^2</math>. In this section, we assume that we do not know <math>\mu</math> but we do know the variance <math>\sigma^2</math>. The case where the variance is unknown is discussed [[Probstat/notes/t-distributions|here]].

We would like to estimate the mean <math>\mu</math>. To do so, we compute the sample mean <math>\bar{X}</math>. It is almost certain that <math>\bar{X}\neq\mu</math>, but we hope that <math>\bar{X}</math> will be close to <math>\mu</math>. In this section, we quantify how close the sample mean is to the real mean. More precisely, we would like to find an error range <math>\beta</math> such that we have some ''confidence'' that
<center>
<math>\bar{X}-\beta < \mu < \bar{X}+\beta</math>,
</center>
i.e., that <math>\mu</math> lies within <math>\bar{X}\pm\beta</math> (or in the range <math>(\bar{X}-\beta,\bar{X}+\beta)</math>).
== Definitions ==
When computing <math>\beta</math>, we usually specify the confidence level <math>1-\alpha</math> that we want to achieve.

'''Two-sided confidence interval.''' Suppose that we take a sample <math>X_1,X_2,\ldots,X_n</math> of size <math>n</math> and compute <math>\bar{X}</math>. The interval <math>(\bar{X}-\beta,\bar{X}+\beta)</math> is called a ''confidence interval with confidence level <math>1-\alpha</math>'' if the probability that the real mean <math>\mu</math> lies in the interval <math>(\bar{X}-\beta,\bar{X}+\beta)</math> is <math>1-\alpha</math>. That is,

<center>
<math>P\left\{\bar{X}-\beta < \mu < \bar{X}+\beta\right\} = 1-\alpha</math>.
</center>

If we know the distribution of <math>\bar{X}</math>, we can use it to find <math>\beta</math> for the required confidence level.

As discussed in [[Probstat/notes/sample means and sample variances|the last section]], since the population is normal, the random variable <math>\bar{X}</math> is a normal random variable with mean <math>\mu</math> and standard deviation <math>\sigma/\sqrt{n}</math>, i.e.,
<center>
<math>\bar{X}\sim Normal(\mu,\sigma^2/n)</math>.
</center>
''Remark:'' When we write <math>A\sim Normal(a,b)</math>, we mean that the random variable <math>A</math> is normally distributed with mean <math>a</math> and variance <math>b</math>.

Therefore, the random variable
<center>
<math>\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}=\sqrt{n}(\bar{X}-\mu)/\sigma</math>
</center>

is a unit (standard) normal random variable. We can then use the [https://en.wikipedia.org/wiki/Standard_normal_table standard normal table] to find probabilities related to this random variable.

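The same rearrangement works for any confidence level. As a short general derivation (the symbol <math>z_{\alpha/2}</math> is notation introduced here and is not used elsewhere in these notes), let <math>z_{\alpha/2}</math> be the number with <math>P\{Z > z_{\alpha/2}\} = \alpha/2</math> for a unit normal <math>Z</math>; by symmetry, <math>P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1-\alpha</math>. Substituting <math>Z=\sqrt{n}(\bar{X}-\mu)/\sigma</math> and solving both inequalities for <math>\mu</math> gives

<center>
<math>
P\left\{-z_{\alpha/2} < \sqrt{n}(\bar{X}-\mu)/\sigma < z_{\alpha/2}\right\}
= P\left\{\bar{X}-z_{\alpha/2}\sigma/\sqrt{n} < \mu < \bar{X}+z_{\alpha/2}\sigma/\sqrt{n}\right\} = 1-\alpha,
</math>
</center>

so the error range is <math>\beta = z_{\alpha/2}\sigma/\sqrt{n}</math>. The table values 1.96, 1.64 and 1.28 used in the examples are (rounded) values of <math>z_{\alpha/2}</math> for <math>\alpha = 0.05, 0.1, 0.2</math>.
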
=== Examples ===

'''EX1:''' From the standard normal table, we find that

<center>
<math>P\left\{-1.96 < \sqrt{n}(\bar{X}-\mu)/\sigma < 1.96 \right\} = 0.95</math>,
</center>

which means that

<center>
<math>P\left\{\bar{X}-1.96\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\sigma/\sqrt{n} \right\} = 0.95</math>.
</center>

By the definition above, the interval

<center>
<math>(\bar{X}-1.96\sigma/\sqrt{n}, \bar{X} + 1.96\sigma/\sqrt{n})</math>
</center>

is a confidence interval with a 95% confidence level.

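The table lookups in EX1 can also be reproduced numerically. The following is a minimal sketch that assumes Python 3.8+ and its standard-library statistics.NormalDist; the helper names prob_between and two_sided_critical_value are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

# Standard (unit) normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def prob_between(a: float, b: float) -> float:
    """P{a < Z < b} for a unit normal Z (what the table lookup gives)."""
    return Z.cdf(b) - Z.cdf(a)


def two_sided_critical_value(confidence: float) -> float:
    """Return z such that P{-z < Z < z} = confidence.

    The two tails together hold alpha = 1 - confidence, so each tail holds
    alpha / 2 and z is the (1 - alpha / 2) quantile of Z.
    """
    alpha = 1.0 - confidence
    return Z.inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    print(round(prob_between(-1.96, 1.96), 4))
    print(round(two_sided_critical_value(0.95), 2))
```

Running it prints 0.95 and 1.96, matching the table values used in EX1.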
'''EX2:''' Suppose that we know that the population has variance <math>\sigma^2 = 5</math>. We compute the mean of a sample of size 10. Find the confidence interval with a 90% confidence level.

'''Solution:''' Let <math>Z</math> be a unit normal random variable. From the standard normal table, we find that

<center>
<math>P\{ -1.64 < Z < 1.64 \} \approx 0.9</math>.
</center>

Taking <math>Z = \sqrt{n}(\bar{X}-\mu)/\sigma</math>, we have that

<center>
<math>
P\left\{\bar{X}-1.64\sigma/\sqrt{n} < \mu < \bar{X} + 1.64\sigma/\sqrt{n}\right\} =
P\{ -1.64 < Z < 1.64 \} \approx 0.9</math>.
</center>

Plugging in <math>\sigma=\sqrt{5}</math> and <math>n=10</math>, we get <math>1.64\sigma/\sqrt{n} = 1.64\sqrt{5}/\sqrt{10} \approx 1.16</math>. Thus, the confidence interval with a 90% confidence level is

<center>
<math>(\bar{X} - 1.16, \bar{X} + 1.16)</math>.
</center>

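As a numerical check of EX2, here is a small sketch (assuming Python 3.8+ with statistics.NormalDist; two_sided_half_width is a hypothetical helper name) that evaluates <math>\beta = z\sigma/\sqrt{n}</math> both with the rounded table value 1.64 and with the exact 0.95 quantile of the unit normal.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_half_width(z: float, sigma: float, n: int) -> float:
    """Error range beta = z * sigma / sqrt(n) for a known population s.d."""
    return z * sigma / sqrt(n)


sigma = sqrt(5.0)  # population variance sigma^2 = 5 (EX2)
n = 10             # sample size (EX2)

# Rounded table value used in the text.
beta_table = two_sided_half_width(1.64, sigma, n)

# Exact 90% two-sided critical value: the 0.95 quantile of the unit normal.
z_exact = NormalDist().inv_cdf(0.95)
beta_exact = two_sided_half_width(z_exact, sigma, n)

print(f"beta with z = 1.64:  {beta_table:.3f}")
print(f"beta with z = {z_exact:.3f}: {beta_exact:.3f}")
```

It prints about 1.160 for <math>z = 1.64</math> and about 1.163 for the exact value <math>z \approx 1.645</math>, so rounding the table value has little effect here.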
'''EX3:''' Consider the previous population. Suppose that we want the error range to be small. More precisely, we want a confidence interval of total width 0.1, i.e., the sample mean should be within 0.05 of the real mean, with an 80% confidence level:

<center>
<math>
P\left\{\bar{X}-0.05 < \mu < \bar{X} + 0.05 \right\} = 0.8
</math>
</center>

How large a sample do we need?

'''Solution:''' From the standard normal table, for a unit normal random variable <math>Z</math>,

<center>
<math>
P\{-1.28 < Z < 1.28\} \approx 0.8.
</math>
</center>

Setting <math>Z = \sqrt{n}(\bar{X}-\mu)/\sigma</math>, we get

<center>
<math>
P\{-1.28 < \sqrt{n}(\bar{X}-\mu)/\sigma < 1.28\} =
P\{\bar{X}-1.28\sigma/\sqrt{n} < \mu < \bar{X} + 1.28\sigma/\sqrt{n} \}
</math>
</center>

Therefore we want <math>1.28\sigma/\sqrt{n} \le 0.05</math>. This holds when <math>\sqrt{n} \ge 1.28\cdot\sqrt{5}/0.05\approx 57.243</math>, i.e., <math>n \ge 1.28^2\cdot 5/0.05^2 = 3276.8</math>. Since <math>n</math> must be an integer, we need a sample of size at least 3277.

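The calculation in EX3 generalizes: the smallest integer <math>n</math> with <math>z\sigma/\sqrt{n} \le E</math>, where <math>E</math> is the required accuracy (0.05 in EX3), is <math>n = \lceil (z\sigma/E)^2 \rceil</math>. A minimal sketch, assuming Python 3.8+ (required_sample_size is a hypothetical name):

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(z: float, sigma: float, margin: float) -> int:
    """Smallest integer n with z * sigma / sqrt(n) <= margin.

    Rearranging z * sigma / sqrt(n) <= margin gives n >= (z * sigma / margin) ** 2.
    """
    return ceil((z * sigma / margin) ** 2)


sigma = 5 ** 0.5  # population variance 5, as in EX2 and EX3
margin = 0.05     # required accuracy in EX3

# Rounded table value z = 1.28.
print(required_sample_size(1.28, sigma, margin))
# Exact 80% two-sided critical value: the 0.9 quantile of the unit normal.
print(required_sample_size(NormalDist().inv_cdf(0.9), sigma, margin))
```

With the table value 1.28 it returns 3277, as above; with the exact quantile (about 1.2816) it returns 3285, which shows that the required sample size is fairly sensitive to how <math>z</math> is rounded.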
== One-sided confidence intervals ==
In many cases, we only need a guarantee on one side, i.e., only an upper bound or only a lower bound on the real mean. For example, we may want to say that the real mean is not much larger than the sample mean, i.e.,

<center>
<math>
P\{\mu < \bar{X}+\beta\} = 1-\alpha.
</math>
</center>

In this case, we compute a ''one-sided confidence interval'' <math>(-\infty,\bar{X}+\beta)</math> using essentially the same approach as in the two-sided case.

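In general (again with notation introduced only for this derivation), let <math>z_{\alpha}</math> satisfy <math>P\{Z > z_{\alpha}\} = \alpha</math> for a unit normal <math>Z</math>, so that by symmetry <math>P\{Z > -z_{\alpha}\} = 1-\alpha</math>. Substituting <math>Z=\sqrt{n}(\bar{X}-\mu)/\sigma</math> and solving for <math>\mu</math> gives

<center>
<math>
P\left\{\sqrt{n}(\bar{X}-\mu)/\sigma > -z_{\alpha}\right\} = P\left\{\mu < \bar{X}+z_{\alpha}\sigma/\sqrt{n}\right\} = 1-\alpha,
</math>
</center>

so <math>\beta = z_{\alpha}\sigma/\sqrt{n}</math>. Compared with the two-sided case, all of <math>\alpha</math> sits in a single tail, which makes the critical value smaller: for an 80% level it is about 0.84, versus the 1.28 used for the two-sided interval in EX3.
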
'''EX1:''' Suppose that we know that the population has variance <math>\sigma^2 = 5</math>. We compute the mean of a sample of size 10. Find the value <math>\beta</math> such that <math>(-\infty,\bar{X}+\beta)</math> is a confidence interval for the real mean <math>\mu</math> with an 80% confidence level.

'''Solution:''' Let <math>Z</math> be a unit normal random variable. From the standard normal table, we find that

<center>
<math>P\{ Z > -0.84 \} \approx 0.8</math>.
</center>

From this, we can say that

<center>
<math>P\{ \sqrt{n}(\bar{X}-\mu)/\sigma > -0.84 \} = P\{ \bar{X} + 0.84\sigma/\sqrt{n} > \mu \} = P\{ \mu < \bar{X} + 0.84\sigma/\sqrt{n} \}\approx 0.8</math>.
</center>

Thus, with <math>\sigma=\sqrt{5}</math> and <math>n=10</math>, the interval that we want is <math>(-\infty,\bar{X} + 0.84\sigma/\sqrt{n}) \approx (-\infty,\bar{X} + 0.594)</math>.

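The one-sided error range can be computed the same way, except that the critical value is the <math>1-\alpha</math> quantile of the unit normal rather than the <math>1-\alpha/2</math> quantile. A minimal sketch assuming Python 3.8+ (one_sided_upper_beta is a hypothetical name):

```python
from math import sqrt
from statistics import NormalDist


def one_sided_upper_beta(confidence: float, sigma: float, n: int) -> float:
    """beta such that (-inf, xbar + beta) is a one-sided interval for mu.

    All of alpha = 1 - confidence lies in one tail, so we use the
    `confidence` quantile of the unit normal (z with P{Z > -z} = confidence).
    """
    z = NormalDist().inv_cdf(confidence)
    return z * sigma / sqrt(n)


beta = one_sided_upper_beta(0.80, sqrt(5.0), 10)
print(f"{beta:.3f}")
```

It prints 0.595; the value 0.594 above comes from rounding the critical value to 0.84 (the exact value is about 0.8416).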
== Remarks ==
Be careful when talking about probability in connection with confidence intervals. We can talk about the probability that the sample mean is close to the actual mean '''only before''' we take a sample. After we obtain the sample and compute the value of <math>\bar{X}</math>, it no longer makes sense to talk about probability, because the resulting interval either contains the mean or does not. Therefore, at that point, we can only say that the interval has, for example, a 90% confidence level.
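The remark can be illustrated by simulation. The sketch below (Python 3.8+; the true mean mu = 3.0 and the function name coverage are hypothetical, chosen only for illustration) repeatedly draws samples of size 10 from a normal population with <math>\sigma^2 = 5</math>, builds the 90% interval from each sample, and counts how often the interval contains <math>\mu</math>. The 90% describes this long-run frequency of the procedure, not any single computed interval.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu: float, sigma: float, n: int, confidence: float,
             trials: int, seed: int = 0) -> float:
    """Fraction of repeated samples whose interval xbar +/- beta contains mu.

    Each trial draws a fresh normal sample, so the interval changes every
    time; mu is fixed and is either inside a given interval or not.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    beta = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - beta < mu < xbar + beta:
            hits += 1
    return hits / trials


print(coverage(mu=3.0, sigma=sqrt(5.0), n=10, confidence=0.90, trials=10_000))
```

The printed fraction should be close to 0.9, while each individual interval produced inside the loop either contains mu or does not.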
Current revision as of 03:00, 5 December 2014.