Probstat/notes/sample means and sample variances

From Theory Wiki
: ''This is part of [[probstat]]''
Consider a certain distribution.  The mean <math>\mu</math> of the distribution is the expected value of a random variable <math>X</math> sampled from the distribution.  I.e.,
  
<center>
<math>\mu = E[X].</math>
</center>

Also recall that the variance of the distribution is
<center>
<math>\sigma^2=Var(X)=E[(X-\mu)^2]=E[X^2] - E[X]^2.</math>
</center>
  
 
And finally, the standard deviation is <math>\sigma = \sqrt{Var(X)}</math>.
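As a quick numeric sketch (not part of the original notes), the three definitions above can be checked for a fair six-sided die, whose distribution is uniform on <math>\{1,\ldots,6\}</math>:

```python
# Mean, variance, and standard deviation of a fair six-sided die,
# computed directly from the definitions mu = E[X] and
# sigma^2 = E[(X - mu)^2] = E[X^2] - E[X]^2.
outcomes = [1, 2, 3, 4, 5, 6]                       # each with probability 1/6
mu = sum(outcomes) / len(outcomes)                  # E[X] = 3.5
ex2 = sum(x * x for x in outcomes) / len(outcomes)  # E[X^2]
var = ex2 - mu ** 2                                 # E[X^2] - E[X]^2
sd = var ** 0.5
print(mu, var, sd)                                  # 3.5, then approx. 2.9167 and 1.7078
```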
== Sample Statistics ==
Suppose that you take <math>n</math> samples <math>X_1,X_2,\ldots,X_n</math> independently from this distribution.  (Note that <math>X_1,X_2,\ldots,X_n</math> are random variables.)
  
 
=== Sample means ===
The statistic

<center>
<math>\bar{X} = \frac{X_1+X_2+\cdots+X_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i</math>
</center>
 
is called a '''sample mean.'''  Since <math>X_1,X_2,\dots,X_n</math> are random variables, the mean <math>\bar{X}</math> is also a random variable.
 
  
We hope that <math>\bar{X}</math> approximates <math>\mu</math> well.  We can compute:
<center>
<math>E[\bar{X}]= E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}E\left[\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n} n\mu = \mu</math>
</center>

and since <math>X_1,X_2,\ldots,X_n</math> are independent, we have that

<center>
<math>Var(\bar{X}) = Var\left(\frac{X_1+X_2+\cdots+X_n}{n}\right) = \frac{1}{n^2}Var(X_1+X_2+\cdots+X_n) = \frac{1}{n^2}\cdot n\cdot Var(X) = \frac{\sigma^2}{n}.</math>
</center>
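These two facts can be verified exactly on a toy example (a sketch, not from the notes): enumerate every equally likely sample of size <math>n=3</math> from a fair-coin population <math>\{0,1\}</math> and compute the mean and variance of <math>\bar{X}</math> directly.

```python
# Verify E[Xbar] = mu and Var(Xbar) = sigma^2 / n by enumerating all
# 2^n equally likely samples from the toy population {0, 1}.
from itertools import product

population = [0, 1]
mu = sum(population) / len(population)                              # 0.5
sigma2 = sum((v - mu) ** 2 for v in population) / len(population)   # 0.25

n = 3
xbars = [sum(s) / n for s in product(population, repeat=n)]
e_xbar = sum(xbars) / len(xbars)                                    # E[Xbar]
var_xbar = sum((x - e_xbar) ** 2 for x in xbars) / len(xbars)       # Var(Xbar)
print(e_xbar, var_xbar, sigma2 / n)   # 0.5, then two values close to 1/12
```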
  
 
=== Sample variances and sample standard deviations ===
We can also use the sample to estimate <math>\sigma^2</math>.

The statistic

<center>
<math>S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}</math>
</center>

is called a '''sample variance'''.  The sample standard deviation is <math>S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}}</math>.

Note that the denominator is <math>n-1</math> instead of <math>n</math>.
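For what it's worth, Python's standard library follows the same convention (a sketch for comparison; `statistics.variance` uses the <math>n-1</math> denominator, while `statistics.pvariance` divides by <math>n</math>):

```python
# Sample variance by hand vs. the stdlib: statistics.variance divides by
# n - 1, while statistics.pvariance divides by n.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
xbar = statistics.mean(data)                                   # 5.0
s2 = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)      # 32/7
print(s2, statistics.variance(data), statistics.pvariance(data))
```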
We can show that <math>E[S^2] = \sigma^2</math>.
<center>
<math>
\begin{array}{rcl}
\mathrm{E}[S^2] &=& \mathrm{E}\left[\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}\right] \\
&=& \mathrm{E}\left[\frac{\sum_{i=1}^n (X_i^2 -2X_i\bar{X} + \bar{X}^2)}{n-1}\right] \\
&=& \frac{1}{n-1}\left( \sum_{i=1}^n E[X_i^2] - 2\cdot\sum_{i=1}^n E[X_i\bar{X}] + \sum_{i=1}^n E[\bar{X}^2] \right) \\
&=& \frac{1}{n-1}\left( \sum_{i=1}^n E[X_i^2]
- 2\cdot\sum_{i=1}^n E\left[X_i\left((1/n)\sum_{j=1}^n X_j\right)\right]
+ \sum_{i=1}^n E\left[\left((1/n)\sum_{j=1}^n X_j\right)^2\right] \right) \\
&=& \frac{1}{n-1}\left( \sum_{i=1}^n E[X_i^2]
- (2/n)\cdot\sum_{i=1}^n \sum_{j=1}^n E\left[X_i\cdot X_j\right]
+ n\cdot E\left[\left((1/n)\sum_{j=1}^n X_j\right)^2\right] \right) \\
&=& \frac{1}{n-1}\left( \sum_{i=1}^n E[X_i^2]
- (2/n)\cdot\sum_{i=1}^n \sum_{j=1}^n E\left[X_i\cdot X_j\right]
+ (1/n)\cdot E\left[\left(\sum_{j=1}^n X_j\right)^2\right] \right) \\
\end{array}
</math>
</center>

We note that since <math>X_i</math> and <math>X_j</math> are independent when <math>i\neq j</math>, we have that

<center><math>E[X_iX_j] = E[X_i]E[X_j] = \mu\cdot\mu = \mu^2</math>.</center>

Let's deal with the middle term here.

<center>
<math>
\begin{array}{rcl}
\sum_{i=1}^n \sum_{j=1}^n E\left[X_i\cdot X_j\right] &=& \sum_{i=1}^n E[X_iX_i] + \sum_{i=1}^n\sum_{j\neq i} E[X_iX_j]\\
&=& \sum_{i=1}^n E[X_i^2] + \sum_{i=1}^n\sum_{j\neq i} E[X_i]E[X_j]\\
&=& n E[X^2] + n(n-1)E[X]E[X]\\
&=& n E[X^2] + n(n-1)\mu^2\\
\end{array}
</math>
</center>

Let's work on the third term, which ends up being the same as the middle term.

<center>
<math>
E\left[\left(\sum_{j=1}^n X_j\right)^2\right] = E\left[\sum_{j=1}^n \sum_{k=1}^n X_jX_k\right]
= \sum_{j=1}^n \sum_{k=1}^n E[X_jX_k] = n E[X^2] + n(n-1)\mu^2.
</math>
</center>
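Both the middle and the third term reduce to the identity <math>\sum_j\sum_k E[X_jX_k] = nE[X^2] + n(n-1)\mu^2</math>, which can be sanity-checked by enumeration (a sketch, reusing the fair-coin population <math>\{0,1\}</math> as a toy example, which is not part of the notes):

```python
# Check E[(sum_j X_j)^2] = sum_{j,k} E[X_j X_k] = n*E[X^2] + n*(n-1)*mu^2
# for n = 3 i.i.d. draws from the toy population {0, 1}.
from itertools import product

population = [0, 1]
mu = sum(population) / len(population)                   # E[X] = 0.5
ex2 = sum(v * v for v in population) / len(population)   # E[X^2] = 0.5

n = 3
samples = list(product(population, repeat=n))            # all 2^n samples
lhs = sum(sum(s) ** 2 for s in samples) / len(samples)   # E[(sum_j X_j)^2]
rhs = n * ex2 + n * (n - 1) * mu ** 2
print(lhs, rhs)   # both 3.0
```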

Let's put everything together:

<center>
<math>
\begin{array}{rcl}
\mathrm{E}[S^2]
&=& \frac{1}{n-1}\left( \sum_{i=1}^n E[X_i^2]
- (2/n)\cdot\sum_{i=1}^n \sum_{j=1}^n E\left[X_i\cdot X_j\right]
+ (1/n)\cdot E\left[\left(\sum_{j=1}^n X_j\right)^2\right] \right) \\
&=& \frac{1}{n-1}\left( n E[X^2]
- (2/n)(n E[X^2] + n(n-1)\mu^2)
+ (1/n)(n E[X^2] + n(n-1)\mu^2) \right) \\
&=& \frac{1}{n-1}\left( n E[X^2]
- 2E[X^2] - 2(n-1)\mu^2
+ E[X^2] + (n-1)\mu^2 \right) \\
&=& \frac{1}{n-1}\left((n-1) E[X^2] - (n-1)\mu^2  \right) \\
&=& E[X^2] - \mu^2 = \sigma^2\\
\end{array}
</math>
</center>
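The conclusion <math>E[S^2] = \sigma^2</math> can be confirmed exactly on a small population by averaging <math>S^2</math> over every equally likely sample (a sketch with a toy fair-coin population, not from the notes; it also shows that dividing by <math>n</math> instead of <math>n-1</math> would be biased):

```python
# E[S^2] equals sigma^2 when dividing by n - 1; dividing by n instead
# gives the biased value (n-1)/n * sigma^2.  Toy population {0, 1}.
from itertools import product

population = [0, 1]
mu = sum(population) / len(population)
sigma2 = sum((v - mu) ** 2 for v in population) / len(population)   # 0.25

n = 3
s2_unbiased, s2_biased = [], []
for s in product(population, repeat=n):          # all 2^n equally likely samples
    xbar = sum(s) / n
    ss = sum((x - xbar) ** 2 for x in s)
    s2_unbiased.append(ss / (n - 1))
    s2_biased.append(ss / n)

print(sum(s2_unbiased) / 2 ** n, sigma2)                # both approx. 0.25
print(sum(s2_biased) / 2 ** n, (n - 1) / n * sigma2)    # both approx. 1/6
```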

=== Summary ===

Sample means:

<center>
<math>\bar{X} = \frac{X_1+X_2+\cdots+X_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i</math>
</center>

Sample variance:

<center>
<math>S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}</math>
</center>

==== Properties of sample means and sample variances ====

{{กล่องเทา|
* <math>E[\bar{X}] = \mu</math>
* <math>Var[\bar{X}] = \sigma^2/n</math>
* <math>E[S^2] = \sigma^2</math>
}}

== Distribution of sample means ==
While we know basic properties of sample means <math>\bar{X}</math>, if we want to perform other statistical calculations (e.g., computing confidence intervals or testing hypotheses), it is very useful to know the ''exact'' distribution of <math>\bar{X}</math>.

For a general population, it is hard to deal with the distribution of <math>\bar{X}</math> exactly.  However, if the population is normal, we are in very good shape.

Recall the definition of <math>\bar{X}</math>:

<center>
<math>\bar{X} = (X_1+X_2+\cdots+X_n)/n.</math>
</center>

Therefore, <math>\bar{X}</math> is a (scaled) sum of independent normally distributed random variables.  A nice property of normal random variables is that a sum of normally distributed random variables remains a normal random variable.  Since a normal random variable is uniquely determined by its mean and variance, we have the following observation.

{{กล่องเทา|
'''Distribution of sample means of normal populations.'''

If the population is normally distributed with mean <math>\mu</math> and variance <math>\sigma^2</math>, the distribution of <math>\bar{X}</math> is normal with mean <math>\mu</math> and variance <math>\sigma^2/n</math>.
}}
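As an illustrative simulation (a sketch, not from the notes; the parameter values below are arbitrary), we can draw many samples from a normal population and check that the empirical mean and variance of <math>\bar{X}</math> match <math>\mu</math> and <math>\sigma^2/n</math>:

```python
# Monte Carlo sketch: sample means of a normal population have mean mu
# and variance sigma^2 / n.  Parameters are arbitrary; seeded for repeatability.
import random

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 4, 20000

xbars = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

m = sum(xbars) / trials
v = sum((x - m) ** 2 for x in xbars) / trials
print(m, v, sigma ** 2 / n)   # m close to 10, v close to sigma^2/n = 1.0
```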

=== Examples ===
'''Ex1.''' Suppose that the population is normally distributed with mean <math>\mu = 15</math> and variance <math>\sigma^2 = 15</math>.  If you select a sample of size 20, what is the probability that the sample mean <math>\bar{X}</math> is greater than 17?

'''Solution:'''

The sample mean <math>\bar{X}</math> is normal with mean <math>\mu=15</math> and variance <math>\sigma^2/n = 15/20 = 0.75</math>.  Therefore,

<center>
<math>\frac{\bar{X} - 15}{\sqrt{0.75}}</math>
</center>

is unit normal.

Note that

<center>
<math>
P\{\bar{X} > 17\} = P\{\frac{\bar{X} - 15}{\sqrt{0.75}} > \frac{17 - 15}{\sqrt{0.75}}\} \approx P\{\frac{\bar{X} - 15}{\sqrt{0.75}} > 2.309 \}
</math>
</center>
  
We can look at the standard normal table and find out that <math>P\{ 0 \leq Z \leq 2.31\} = 0.48956</math> for a unit normal random variable ''Z''.  Thus, the probability

<center>
<math>
P\{\bar{X} > 17\} \approx 0.5 - 0.48956 = 0.0104,
</math>
</center>

which is roughly 1%.
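The answer can be double-checked with the standard library's `statistics.NormalDist` (a sketch; `NormalDist` is available in Python 3.8 and later):

```python
# P{Xbar > 17} for Xbar ~ Normal(mean=15, variance=15/20).
from statistics import NormalDist

mu, sigma2, n = 15, 15, 20
xbar = NormalDist(mu=mu, sigma=(sigma2 / n) ** 0.5)
p = 1 - xbar.cdf(17)
print(p)   # close to the 0.0104 read off the table above
```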

'''Ex2.'''

: ''To be added...''
== Why do we use normal distributions? ==
Normal random variables appear very often in our treatment of statistics.  This is not just a coincidence.  See [[probstat/notes/limit theorems|limit theorems]].

Current revision as of 09:41, 5 December 2014
