Difference between revisions of "Machine Learning at U of C"

From Theory Wiki
Line 21: Line 21:
  
 
And that's the reason why we try to learn <math>\mathbb{E}_p[y|x]</math>
 
And that's the reason why we try to learn <math>\mathbb{E}_p[y|x]</math>
 +
 +
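The proof is short. Writing <math>f_p(x) = \mathbb{E}_p[y|x]</math>, add and subtract <math>f_p(x)</math> inside the square: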
 +
 +
<math>
 +
\int (y-h(x))^2 dP  = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2 dP
 +
</math>
 +
 +
We get
 +
 +
<math> 
 +
\int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP 
 +
</math>
 +
 +
 +
Then observe that,
 +
 +
* The first term depends only on the distribution <math>\mathbb{P}</math>, not on <math>h</math>
 +
* The third term is zero
 +
 +
<math> \int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx  </math>
 +
 +
Observe also that the inner integral <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math> is zero, because <math>\mathbb{E}[\mathbb{E}[y|x] | x] = \mathbb{E}[y|x]</math>.
 +
 +
 +
* The second term is equal to
 +
 +
<math>
 +
\int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y)dy dx = \int_{X} (f_p(x) - h(x))^2 p(x) dx = ||f_p - h||^2_{L_2(\mathbb{P})}
 +
</math>
  
 
=== Ordinary Least Squares ===
 
=== Ordinary Least Squares ===

Revision as of 07:21, 30 March 2007

This page contains a list of topics, definitions, and results from the Machine Learning course at the University of Chicago.

Week 1

Learning problem

Given a distribution <math>\mathbb{P}</math> on <math>X \times Y</math>, we want to learn the objective function <math>f_p</math> (with respect to the distribution <math>\mathbb{P}</math>).


Learning Algorithms

Let Z be the set of possible samples. A learning algorithm is a function that maps a finite sequence of samples from Z to a measurable function; F denotes here the class of all measurable functions. Sometimes we consider a class of computable functions instead.
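
As a minimal sketch of this definition, the Python snippet below treats a learning algorithm as an ordinary function from a finite list of samples to a hypothesis. The names `Sample`, `Hypothesis`, and `mean_learner` are hypothetical, and the constant predictor is chosen only for illustration; the notes do not prescribe any particular algorithm.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]           # one element z = (x, y) of Z (assumed real-valued)
Hypothesis = Callable[[float], float]  # one element h of the class F

def mean_learner(samples: Sequence[Sample]) -> Hypothesis:
    """A deliberately simple learning algorithm: ignore x and always
    predict the average of the observed y values."""
    if not samples:
        raise ValueError("need at least one sample")
    y_bar = sum(y for _, y in samples) / len(samples)
    return lambda x: y_bar

# The algorithm consumes samples and returns a function that can be evaluated anywhere.
h = mean_learner([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(h(10.0))  # 3.0
```

The point of the sketch is only the type signature: samples go in, a function comes out.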

Loss function

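Suppose the learning algorithm outputs h. The learning error can be measured by the expected squared loss

<math>\int (y-h(x))^2 dP</math>

One can prove that minimizing this quantity reduces to minimizing the following quantity, where <math>f_p(x) = \mathbb{E}_p[y|x]</math>.

<math>||f_p - h||^2_{L_2(\mathbb{P})} = \int (f_p(x)-h(x))^2 dP</math>

And that's the reason why we try to learn <math>\mathbb{E}_p[y|x]</math>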
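
The role of the conditional expectation can be checked numerically. The sketch below assumes a toy distribution that is not from the notes: x uniform on [0, 1] and y = 2x plus Gaussian noise, so that E[y|x] = 2x. It estimates the expected squared loss of the conditional mean and of two hypothetical competitors (`h_shifted`, `h_flat`) by Monte Carlo.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy distribution: x ~ Uniform(0, 1), y = 2x + N(0, 0.5^2).
samples = []
for _ in range(20000):
    x = rng.random()
    samples.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

def empirical_risk(h, data):
    """Monte Carlo estimate of the squared loss, the integral of (y - h(x))^2 dP."""
    return sum((y - h(x)) ** 2 for x, y in data) / len(data)

def f_p(x):        # conditional mean E[y | x] under the assumed model
    return 2.0 * x

def h_shifted(x):  # hypothetical competitor with a constant bias
    return 2.0 * x + 0.3

def h_flat(x):     # hypothetical competitor with the wrong slope
    return 1.5 * x

risk_f = empirical_risk(f_p, samples)
risk_shifted = empirical_risk(h_shifted, samples)
risk_flat = empirical_risk(h_flat, samples)
print(risk_f, risk_shifted, risk_flat)
```

Under this assumed model the first value should sit near the noise variance 0.25, and the other two should exceed it by roughly the squared distance between each competitor and the conditional mean (0.09 and about 0.083).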

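The proof is easy.

<math>\int (y-h(x))^2 dP = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2 dP</math>

We get

<math>\int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP</math>

Then observe that,

  • The first term depends only on the distribution <math>\mathbb{P}</math>
  • The third term is zero

<math>\int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx</math>

Observe also that the term <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math> is zero.

  • The second term is equal to

<math>\int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y)dy dx = ||f_p - h||^2_{L_2(\mathbb{P})}</math>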
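
The three-term decomposition used in this proof can also be verified on simulated data. This is a sketch under an assumed toy model (x uniform on [0, 1], y = 2x plus Gaussian noise, so f_p(x) = 2x), which is not part of the notes; the hypothesis `h` and the helper `mean` are hypothetical and chosen only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed toy model: x ~ Uniform(0, 1), y = 2x + N(0, 0.5^2), so f_p(x) = 2x.
data = []
for _ in range(20000):
    x = rng.random()
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

def f_p(x):
    return 2.0 * x

def h(x):  # an arbitrary hypothesis to compare against f_p
    return 1.5 * x + 0.1

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Empirical versions of the left-hand side and of the three terms.
total = mean((y - h(x)) ** 2 for x, y in data)
first = mean((y - f_p(x)) ** 2 for x, y in data)
second = mean((f_p(x) - h(x)) ** 2 for x, y in data)
third = 2.0 * mean((y - f_p(x)) * (f_p(x) - h(x)) for x, y in data)

print(total, first + second + third)  # equal up to floating-point rounding
print(third)                          # sample estimate of the cross term
```

The identity holds sample by sample, so the two printed sums agree up to rounding, while the cross term is only approximately zero on a finite sample and shrinks as the sample grows.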

Ordinary Least Squares

Tikhonov Regularization

Week 2