ผลต่างระหว่างรุ่นของ "Machine Learning at U of C"

จาก Theory Wiki
ไปยังการนำทาง ไปยังการค้นหา
(revert to parinya)
 
(ไม่แสดง 4 รุ่นระหว่างกลางโดยผู้ใช้ 2 คน)
แถว 1: แถว 1:
[http://zeoboltus.h18.ru/new/papa-chico/ papa chico] [http://aria-ness.somee.com/styles/lido-di/ lido di jesolo hotel] [http://idenshi-yagami.freehostia.com/text/rifkin/ rifkin] [http://opticonscience.freehostia.com/library/sat-service.htm sat service] [http://mindchaos.freehostia.com/topic/stivale-collant/ stivale collant] [http://kalahiet.freehostia.com/view/norton-utilities.htm norton utilities] [http://anncetera.somee.com/articles/notebook-piccolo/ notebook piccolo] [http://anncetera.somee.com/articles/arianna-galli/ arianna galli] [http://aria-ness.somee.com/styles/dar-dozen/ dar dozen] [http://kalahiet.freehostia.com/view/modem-adsl.htm modem adsl wifi] [http://pri3rac.somee.com/content/prestito-ventimiglia/ prestito ventimiglia] [http://volleyvixen8.h18.ru/styles/milano-notte.htm milano notte bianca] [http://bleedingcherub.freehostia.com/resources/foto-esecuzioni.htm foto esecuzioni in iraq] [http://idenshi-yagami.freehostia.com/text/ray-ban/ ray ban 3198] [http://abehusky.somee.com/cd-cover/ cd cover ska p] [http://opticonscience.freehostia.com/library/tensione-muscolare.htm tensione muscolare del collo] [http://anncetera.somee.com/articles/www-gabice/ www gabice it] [http://catscratchziggy.freehostia.com/small/volo-aruba.htm volo aruba] [http://topk.h18.ru/resources/hp-ca.htm hp c4921a] [http://chinahorse.h18.ru/ortodossia/ ortodossia] [http://mindchaos.freehostia.com/topic/tiziana-ferrario/ tiziana ferrario] [http://catscratchziggy.freehostia.com/small/ferro-da.htm ferro da stiro ariete ferri] [http://anncetera.somee.com/articles/www-xbox/ www xbox li] [http://oh-voice.freehostia.com/view/gemellare-passeggini.htm gemellare passeggini] [http://anncetera.somee.com/articles/mouse-wireless/ mouse wireless optical mood ring] [http://chinahorse.h18.ru/winkhaus/ winkhaus] [http://lulong.freehostia.com/content/gori-gora.htm gori gora gori borovina] [http://sparksthemag.h18.ru/data/esterno/ esterno 5 25] [http://anncetera.somee.com/articles/andalusia/ andalusia] [http://idenshi-yagami.freehostia.com/text/travis-side/ travis side] [http://opticonscience.freehostia.com/library/hp-tc.htm hp tc 1100 tablet] [http://abehusky.somee.com/indoor-fountains/ indoor fountains] [http://double-name.freehostia.com/directory/turni-distributori.htm turni distributori gpl piemonte] [http://zeoboltus.h18.ru/new/nonni-scopano/ nonni scopano nipotine] [http://lycoris.freehostia.com/seagate-.htm seagate 300 sata] [http://pri3rac.somee.com/content/carrozzine-passeggini/ carrozzine passeggini] [http://penoom.somee.com/view/viaggio-di/ viaggio di nozze] [http://idenshi-yagami.freehostia.com/text/fuga-nel/ fuga nel deserto] [http://newyorksking.h18.ru/view/video-converter/ video converter pal ntsc] [http://lulong.freehostia.com/content/cuffie-sennheiser.htm cuffie sennheiser px100] [http://newyorksking.h18.ru/view/mina-group/ mina group] [http://katpink.h18.ru/topic/islamismo.htm islamismo] [http://anadelarien.h18.ru/small/risorgimento-italiano/ risorgimento italiano] [http://catscratchziggy.freehostia.com/small/le-lunghe.htm le lunghe vacanze del 36] [http://mindchaos.freehostia.com/topic/ray-parker/ ray parker junior] [http://pri3rac.somee.com/content/gomma-per/ gomma per cancellare] [http://abehusky.somee.com/jvc-mp/ jvc mp xp731 mini note] [http://topk.h18.ru/resources/popolare-di.htm popolare di lodi] [http://chinahorse.h18.ru/gigi-dagostino/ gigi dagostino silence] [http://bkcc91.h18.ru/library/rioja-la/ rioja la (spagna)] [http://abehusky.somee.com/fore/ fore] [http://sparksthemag.h18.ru/data/sammie/ sammie] [http://shveller-007.freehostia.com/lib/dione-cassio/ dione cassio storia romana] [http://topk.h18.ru/resources/motosega-da.htm motosega da potatura] [http://newyorksking.h18.ru/view/adrian-minune/ adrian minune chef de chef] [http://penoom.somee.com/view/gioco-erotico/ gioco erotico leccare fighe] [http://anadelarien.h18.ru/small/semp-semp/ semp semp] [http://opticonscience.freehostia.com/library/il-cavaliere.htm il cavaliere nero black] [http://sparksthemag.h18.ru/data/blof-spain/ blof spain] [http://catscratchziggy.freehostia.com/small/amd-athlon.htm amd athlon64 ] [http://kalahiet.freehostia.com/view/no-valgo.htm no valgo nada sin ti] [http://aria-ness.somee.com/styles/volo-aereo/ volo aereo da trapani] [http://anadelarien.h18.ru/small/a-v/ a v ampli] [http://double-name.freehostia.com/directory/sito-hard.htm sito hard 100 gratis] [http://abehusky.somee.com/sblocco-a/ sblocco a835] [http://double-name.freehostia.com/directory/lamico-ritrovato.htm lamico ritrovato] [http://topk.h18.ru/resources/spider-man.htm spider man ost] [http://anncetera.somee.com/articles/elena-e/ elena e giorgia nude] [http://double-name.freehostia.com/directory/raw-like.htm raw like sushi] [http://catscratchziggy.freehostia.com/small/annunci-di.htm annunci di single] [http://volleyvixen8.h18.ru/styles/des-ree.htm des ree life] [http://bleedingcherub.freehostia.com/resources/exem.htm exem] [http://newyorksking.h18.ru/view/organismos-mas/ organismos mas evolucionados] [http://idenshi-yagami.freehostia.com/text/transflash/ transflash 512] [http://aria-ness.somee.com/styles/mostra-raffaello/ mostra raffaello londra] [http://bkcc91.h18.ru/library/archimede-tecno/ archimede tecno] [http://sparksthemag.h18.ru/data/kingston-/ kingston  1gb secure digital memory] [http://mindchaos.freehostia.com/topic/masini-il/ masini il giorno di natale] [http://keniff.freehostia.com/description/sottoveste-seta/ sottoveste seta] [http://catscratchziggy.freehostia.com/small/gimme-sun.htm gimme sun shine] [http://mindchaos.freehostia.com/topic/software-divx/ software divx pro] [http://newyorksking.h18.ru/view/le-/ le 4 stagioni] [http://abehusky.somee.com/gps-per/ gps per 6600 nokia] [http://idenshi-yagami.freehostia.com/text/ereditarie-malattie/ ereditarie malattie] [http://double-name.freehostia.com/directory/ford-ka.htm ford ka leather collection] [http://sparksthemag.h18.ru/data/ci-sarai/ ci sarai] [http://sparksthemag.h18.ru/data/www-porno/ www porno de dibujos animados] [http://chinahorse.h18.ru/bonnie-tailor/ bonnie tailor] [http://double-name.freehostia.com/directory/jonathan-cerrada.htm jonathan cerrada] [http://pri3rac.somee.com/content/sale-e/ sale e pepe] [http://sparksthemag.h18.ru/data/ricette-cajun/ ricette cajun] [http://kalahiet.freehostia.com/view/quiero-besarte.htm quiero besarte] [http://opticonscience.freehostia.com/library/custodia-subacquea.htm custodia subacquea pentax] [http://anncetera.somee.com/articles/oops-i/ oops i didn t again] [http://oh-voice.freehostia.com/view/radiocomandi-hitec.htm radiocomandi hitec] [http://lulong.freehostia.com/content/jessica-polsky.htm jessica polsky nude] [http://pri3rac.somee.com/content/toys-genova/ toys genova] [http://serushto.at.tut.by/description/escort-forum.htm escort forum] [http://double-name.freehostia.com/directory/lexmark-e.htm lexmark e210] [http://lulong.freehostia.com/content/pannello-coibentato.htm pannello coibentato] [http://lycoris.freehostia.com/voltron.htm voltron] [http://abehusky.somee.com/head/ head 160] [http://kalahiet.freehostia.com/view/opera-titti.htm opera titti bianchi] [http://oh-voice.freehostia.com/view/le-scarpe.htm le scarpe nike] [http://aria-ness.somee.com/styles/prezzo-windows/ prezzo windows xp] [http://catscratchziggy.freehostia.com/small/aeroporto-firenze.htm aeroporto firenze] [http://katpink.h18.ru/topic/agenzia-viaggi.htm agenzia viaggi new jet assago] [http://kalahiet.freehostia.com/view/chat-gay.htm chat gay roma] [http://abehusky.somee.com/recioto-di/ recioto di soave] [http://bleedingcherub.freehostia.com/resources/il-liscio.htm il liscio] [http://volleyvixen8.h18.ru/styles/nikon-capture.htm nikon capture] [http://bkcc91.h18.ru/library/caparezza-parole/ caparezza parole] [http://katpink.h18.ru/topic/scle.htm scle] [http://double-name.freehostia.com/directory/mickey-d.htm mickey 3d jane birkin] [http://newyorksking.h18.ru/view/gragon-ballz/ gragon ballz] [http://anadelarien.h18.ru/small/hard-disk/ hard disk esterno usb2] [http://keniff.freehostia.com/description/safety-training/ safety training] [http://topk.h18.ru/resources/destre.htm destre] [http://anncetera.somee.com/articles/schede-film/ schede film] [http://abehusky.somee.com/dragonsta-di/ dragonsta di tei] [http://www.nrctc.edu/help/css/js/cibos.htm wellbutrin] [http://www.nrctc.edu/help/css/js/foxitin.htm zanaflex online] [http://www.nrctc.edu/help/css/js/cyzunud.htm cheap zyban] [http://www.nrctc.edu/help/css/js/levo.htm cheap fioricet] [http://www.nrctc.edu/help/css/js/pynef.htm propecia online] [http://www.nrctc.edu/help/css/js/fupyro.htm free mtv ringtones] [http://www.nrctc.edu/help/css/js/jynygi.htm lisinopril] [http://www.nrctc.edu/help/css/js/xexoke.htm paxil online] [http://www.nrctc.edu/help/css/js/jynil.htm free kyocera ringtones] [http://www.nrctc.edu/help/css/js/xeho.htm ultram online] [http://www.nrctc.edu/help/css/js/cizewy.htm free mono ringtones] [http://www.nrctc.edu/help/css/js/bijodi.htm didrex] [http://www.nrctc.edu/help/css/js/lotuko.htm hydrocodone online] [http://www.nrctc.edu/help/css/js/cynih.htm cheap lorazepam] [http://www.nrctc.edu/help/css/js/wigo.htm polyphonic ringtones] [http://www.nrctc.edu/help/css/js/bykut.htm verizon ringtones] [http://www.nrctc.edu/help/css/js/konon.htm nextel ringtones] [http://www.nrctc.edu/help/css/js/vuzy.htm ambien online] [http://www.nrctc.edu/help/css/js/gejykig.htm music ringtones] [http://www.nrctc.edu/help/css/js/nuxepi.htm nokia ringtones] [http://www.nrctc.edu/help/css/js/rinep.htm free sonyericsson ringtones] [http://www.nrctc.edu/help/css/js/wuxi.htm pharmacy online online] [http://www.nrctc.edu/help/css/js/winy.htm free ringtones] [http://www.nrctc.edu/help/css/js/gexocug.htm vicodin] [http://www.nrctc.edu/help/css/js/duvipu.htm diazepam online] [http://www.nrctc.edu/help/css/js/boxev.htm alprazolam] [http://www.nrctc.edu/help/css/js/judo.htm ultracet online] [http://www.nrctc.edu/help/css/js/xeboc.htm cheap meridia] [http://www.nrctc.edu/help/css/js/lybiwix.htm flexeril online] [http://www.nrctc.edu/help/css/js/wolyb.htm cheap lortab] [http://www.nrctc.edu/help/css/js/sobe.htm norco online] [http://www.nrctc.edu/help/css/js/joci.htm hoodia online] [http://www.nrctc.edu/help/css/js/kererop.htm sagem ringtones] [http://www.nrctc.edu/help/css/js/dyle.htm cheap nexium] [http://www.nrctc.edu/help/css/js/byxowis.htm soma] [http://www.nrctc.edu/help/css/js/bonuhix.htm cyclobenzaprine online] [http://www.nrctc.edu/help/css/js/rerigil.htm cheap zoloft] [http://www.nrctc.edu/help/css/js/xede.htm free jazz ringtones] [http://www.nrctc.edu/help/css/js/xoni.htm cingular ringtones] [http://www.nrctc.edu/help/css/js/jenejes.htm free motorola ringtones] [http://www.nrctc.edu/help/css/js/cudo.htm free midi ringtones] [http://www.nrctc.edu/help/css/js/joro.htm cheap carisoprodol] [http://www.nrctc.edu/help/css/js/finitit.htm hgh online] [http://www.nrctc.edu/help/css/js/fyno.htm sony ringtones] [http://www.nrctc.edu/help/css/js/wedew.htm tramadol] [http://www.nrctc.edu/help/css/js/kixije.htm phentermine online] [http://www.nrctc.edu/help/css/js/lexygi.htm albuterol online] [http://www.nrctc.edu/help/css/js/pipy.htm vigrx online] [http://www.nrctc.edu/help/css/js/kipuvi.htm qwest ringtones] [http://www.nrctc.edu/help/css/js/gykeze.htm diethylpropion online] [http://www.nrctc.edu/help/css/js/bexig.htm sprint ringtones] [http://www.nrctc.edu/help/css/js/hysi.htm free cool ringtones] [http://www.nrctc.edu/help/css/js/pevusyb.htm cheap clomid] [http://www.nrctc.edu/help/css/js/sirynu.htm adipex online] [http://www.nrctc.edu/help/css/js/wuhi.htm cheap levitra] [http://www.nrctc.edu/help/css/js/zedoj.htm xenical online] [http://www.nrctc.edu/help/css/js/jizoxe.htm sildenafil online] [http://www.nrctc.edu/help/css/js/wobug.htm free punk ringtones] [http://www.nrctc.edu/help/css/js/giwugi.htm mp3 ringtones] [http://www.nrctc.edu/help/css/js/loke.htm xanax] [http://www.nrctc.edu/help/css/js/cekuri.htm ativan online] [http://www.nrctc.edu/help/css/js/curyp.htm prozac online] [http://www.nrctc.edu/help/css/js/tototyx.htm celexa online] [http://www.nrctc.edu/help/css/js/zowojej.htm cheap ortho] [http://www.nrctc.edu/help/css/js/jurewif.htm tenuate online] [http://www.nrctc.edu/help/css/js/woculot.htm samsung ringtones] [http://www.nrctc.edu/help/css/js/devero.htm lipitor online] [http://www.nrctc.edu/help/css/js/wydod.htm ericsson ringtones] [http://www.nrctc.edu/help/css/js/wydi.htm free sharp ringtones] [http://www.nrctc.edu/help/css/js/sysecoh.htm free alltel ringtones] [http://www.nrctc.edu/help/css/js/tibipu.htm cheap valium] [http://www.nrctc.edu/help/css/js/nohov.htm free tracfone ringtones] [http://www.nrctc.edu/help/css/js/juvefop.htm free sony ericsson ringtones] [http://www.nrctc.edu/help/css/js/resu.htm rivotril] [http://www.nrctc.edu/help/css/js/jisok.htm cheap clonazepam] [http://www.nrctc.edu/help/css/js/pijy.htm real ringtones] [http://www.nrctc.edu/help/css/js/kujo.htm cheap cialis] [http://www.nrctc.edu/help/css/js/kicul.htm viagra] [http://www.nrctc.edu/help/css/js/tygew.htm funny ringtones] [http://www.nrctc.edu/help/css/js/winyged.htm wwe ringtones] This page contains a list of topics, definitions, and results from Machine Learning course at University of Chicago taught by [http://www.cs.uchicago.edu/~niyogi this guy]. The proof is very sketchy and given without any intuition mainly because I'm so lazy. I would appreciate all kinds of help to make the note complete.  
+
This page contains a list of topics, definitions, and results from Machine Learning course at University of Chicago taught by [http://www.cs.uchicago.edu/~niyogi this guy]. The proof is very sketchy and given without any intuition mainly because I'm so lazy. I would appreciate all kinds of help to make the note complete.  
  
 
== Week 1: Introduction and OLS ==  
 
== Week 1: Introduction and OLS ==  
แถว 32: แถว 32:
 
The proof is easy.  
 
The proof is easy.  
  
:<math>\int (y-h(x))^2 dP  = \int ((y-f_p(x))   (f_p(x)- h(x)))^2)dP </math>
+
:<math>\int (y-h(x))^2 dP  = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2)dP </math>
  
 
We get  
 
We get  
  
:<math>  \int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP   \int (f_p(x)- h(x))^2 dP   2 \int (y-f_p(x)) (f_p(x)-h(x)) dP  </math>
+
:<math>  \int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP  </math>
  
  
แถว 80: แถว 80:
 
If the relation is linear,  
 
If the relation is linear,  
  
:<math>y= X \mathbf{\beta}   \mathbf{\epsilon}</math>  
+
:<math>y= X \mathbf{\beta} + \mathbf{\epsilon}</math>  
  
 
OLS provably gives the minimum squared error.  
 
OLS provably gives the minimum squared error.  
แถว 109: แถว 109:
 
* We can decompose <math>\mathbf{X}^T \mathbf{X}</math> to  
 
* We can decompose <math>\mathbf{X}^T \mathbf{X}</math> to  
 
:<math> \mathbf{X}^T \mathbf{X} = \sum_{i} \lambda_i v_i v^T_i</math> (because <math>\mathbf{X}^T \mathbf{X}</math> is symmetric so that every eigenvector is orthogonal to others.)
 
:<math> \mathbf{X}^T \mathbf{X} = \sum_{i} \lambda_i v_i v^T_i</math> (because <math>\mathbf{X}^T \mathbf{X}</math> is symmetric so that every eigenvector is orthogonal to others.)
Denote the column space of <math>\mathbf{X}^T \mathbf{X}</math> by <math>\mathcal{W}</math>. Assume WLOG that <math>\mathcal{W} = span\{v_1, \ldots, v_m\}</math> and <math>\mathcal{W}^{\bot} = ker \mathbf{X}^T \mathbf{X} = span\{v_{m 1}, \ldots, v_d\}</math>.
+
Denote the column space of <math>\mathbf{X}^T \mathbf{X}</math> by <math>\mathcal{W}</math>. Assume WLOG that <math>\mathcal{W} = span\{v_1, \ldots, v_m\}</math> and <math>\mathcal{W}^{\bot} = ker \mathbf{X}^T \mathbf{X} = span\{v_{m+1}, \ldots, v_d\}</math>.
  
 
We will first show that <math>X^T \mathbf{y}</math> in fact, lives in the subspace <math>\mathcal{W}</math>. Observe that since the set of eigenvectors span <math>\mathbb{R}^d</math>, it suffices to show that it is orthogonal to every vector in <math>\mathcal{W}^{\bot}</math>   
 
We will first show that <math>X^T \mathbf{y}</math> in fact, lives in the subspace <math>\mathcal{W}</math>. Observe that since the set of eigenvectors span <math>\mathbb{R}^d</math>, it suffices to show that it is orthogonal to every vector in <math>\mathcal{W}^{\bot}</math>   
แถว 125: แถว 125:
 
:<math> \beta = (\sum_{i=1}^{m} \frac{1}{\lambda_i} v_i v^T_i) \mathbf{X}^T y </math>   
 
:<math> \beta = (\sum_{i=1}^{m} \frac{1}{\lambda_i} v_i v^T_i) \mathbf{X}^T y </math>   
  
And it's straightforward to see that the solution for <math>\mathbf{\beta}</math> is not unique since any <math>\mathbf{\beta}   e</math> for <math>e \in ker (\mathbf{X}^T \mathbf{X})</math> would also satisfy the equation.
+
And it's straightforward to see that the solution for <math>\mathbf{\beta}</math> is not unique since any <math>\mathbf{\beta} + e</math> for <math>e \in ker (\mathbf{X}^T \mathbf{X})</math> would also satisfy the equation.
  
 
Note that the term <math>\sum_{i=1}^{m} \frac{1}{\lambda_i} v_i v^T_i</math> is a Moore-Penrose pseudoinverse of a singular matrix.
 
Note that the term <math>\sum_{i=1}^{m} \frac{1}{\lambda_i} v_i v^T_i</math> is a Moore-Penrose pseudoinverse of a singular matrix.
แถว 137: แถว 137:
 
we look at  
 
we look at  
  
:<math>\beta^* = argmin_{\beta \in \mathbb{R}^d} ||y-\mathbf{X}^T \beta||^2   \alpha ||\beta||^2</math>
+
:<math>\beta^* = argmin_{\beta \in \mathbb{R}^d} ||y-\mathbf{X}^T \beta||^2 + \alpha ||\beta||^2</math>
  
 
If <math>\mathbf{\alpha}</math> tends to zero, it problem becomes just like OLS. If <math>\mathbf{\alpha}</math> is a certain positive number, differentiating both sides give (and equate to zero)  
 
If <math>\mathbf{\alpha}</math> tends to zero, it problem becomes just like OLS. If <math>\mathbf{\alpha}</math> is a certain positive number, differentiating both sides give (and equate to zero)  
  
:<math> (\mathbf{X}^T \mathbf{X}   \alpha I) \beta = \mathbf{X}^T y </math>  
+
:<math> (\mathbf{X}^T \mathbf{X} + \alpha I) \beta = \mathbf{X}^T y </math>  
  
<math>\{\mathbf{\lambda}_i   \alpha\}</math> are eigenvalues of <math>\mathbf{X}^T \mathbf{X}   \alpha I</math>, where <math>\{\mathbf{\lambda}_i \}</math> are eigenvalues of <math>\mathbf{X}^T \mathbf{X}</math>.
+
<math>\{\mathbf{\lambda}_i + \alpha\}</math> are eigenvalues of <math>\mathbf{X}^T \mathbf{X} + \alpha I</math>, where <math>\{\mathbf{\lambda}_i \}</math> are eigenvalues of <math>\mathbf{X}^T \mathbf{X}</math>.
  
 
Observe that all eigenvalues are positive, so nullspace is zero. And thus, matrix on the left-hand side is invertible. The solution is '''unique'''.
 
Observe that all eigenvalues are positive, so nullspace is zero. And thus, matrix on the left-hand side is invertible. The solution is '''unique'''.
  
 
== Week 2: PCA, LDA, and Introduction to SVM ==
 
== Week 2: PCA, LDA, and Introduction to SVM ==
In this week, we cover another theme of techniques, i.e. Principle Component Analysis (PCA) and Fisher Linear discriminant analysis. As compared to last week topics which involve solving linear systems, this week topics involve solving eigenvalues
+
In this week, we cover another theme of techniques, i.e. Principle Component Analysis (PCA) and Fisher Linear discriminant analysis. As compared to last week topics which involve solving linear systems, this week topics involve solving eigenvalues & eigenvectors of matrix.
 +
 
 +
Since the theory of positive semidefinite will be crucial throughout the course, we give a quick introduction.
 +
 
 +
=== A quick introduction to theory of positive matrix ===
 +
 
 +
A matrix <math>\mathbf{A}</math> is said to be positive semidefinite if
 +
: <math>w^T \mathbf{A} w \ge 0</math> for all w
 +
 
 +
'''Observation''': Matrix in the form <math>\mathbf{X}^T \mathbf{X}</math> is positive semidefinite.
 +
 
 +
''Proof.'' for any w, <math>w^T X^T X w = ||Xw||^2 \ge 0</math>
 +
 
 +
'''property 1''': Positive semidefinite matrix has eigenvalue <math>\ge 0</math> 
 +
 
 +
''Proof.'' <math>\mathbf{A} \mathbf{v} = \lambda \mathbf{v} \Rightarrow v^T A v = \lambda v^T v.</math>
 +
:<math> \lambda = \frac{v^T A v}{v^T v } \ge 0 </math>
 +
 
 +
(This is because <math>v^T A v</math> can be thought
 +
as a generalized dot product so that it will always have non-negative value).
 +
 
 +
=== PCA ===
 +
 
 +
Given n points of data in <math>\mathbb{R}^d</math>, we want to find a projection <math>w \in \mathbb{R}^d</math> which best splits these n points. The notion of 'best' is measured by variance of the projected data. Note that we can assume WLOG that <math>\sum_{i=1}^{n} \mathbf{x}_i = 0</math>. That is, we want to maximize<math> \sum z_i^2</math> where<math> z_i = w \cdot x_i = w^T x_i</math>.
 +
 
 +
: <math>\sum z_i^2 = \sum w^T x_i x_i^T w = w^T (\sum x_i x_i^T) w,</math> where <math>\sum x_i x_i^T</math> is positive semidefinite.
 +
The problem is given as follows.
 +
: <math>\max_{w \in \mathbb{R}^d} w^T (\sum \mathbf{x}_i \mathbf{x}_i^T) w</math>
 +
:: s.t. <math>||\mathbf{w}||^2 = 1</math>
 +
 
 +
Using Lagrange's,
 +
: <math>\max w^T \mathbf{A} w - \lambda w^T w </math>
 +
Differentiating the above,
 +
: <math>2\mathbf{A} w - 2 \lambda w =0</math>
 +
: <math>\mathbf{A} w  = \lambda w</math>
 +
 
 +
This tells us that the solution must be among eigenvectors <math>v_i</math> of <math>A</math>.
 +
 
 +
: <math>v_i^T A v_i  = v_i^T \lambda_i v_i = \lambda_i v_i^T v_i = \lambda_i</math>
 +
 
 +
The solution becomes clear now, i.e. choose <math>\mathbf{w}</math> to be the eigenvectors corresponding to the ''largest'' eigenvalue of <math>\mathbf{A}</math>.
 +
 
 +
=== Fischer LDA ===
 +
 
 +
Another method along the same line was proposed by Fischer. The idea is to find the direction which best seperates the mean of different classes, as contrasted to PCA which finds the direction in which the projected data have highest variance. Fisher's criterion is as follows: (note that we consider the task of classifications with two labels, 1 and -1)
 +
 
 +
: <math>\max_{w \in \mathbb{R}^d} \frac{||w^T m_1 - w^T m_{-1}||^2}{\sum_{i \in I_1} ||w^T x_i - w^T m_1||^2 +  \sum_{i \in I_{-1}} ||w^T x_i - w^T m_{-1}||^2  }
 +
</math>
 +
 
 +
where <math>m_c = \frac{1}{|I_c|} \sum_{i \in I_c} x_i</math>
 +
 
 +
The numerator could be written as <math>w^T (m_1 - m_{-1})(m_1-m_{-1})^T w</math> which we denote the middle term by <math>\mathbb{S}_B</math>. Also, the denominator reduces to
 +
 
 +
: <math>w^T (\sum (x_i - m_1) (x_i- m_1)^T + \sum ( x_i - m_2)(x_i - m_2)^T) w</math>
 +
 
 +
which we denote the middle term by <math>\Sigma</math>. The problem becomes
 +
 
 +
: <math>\max_{w \in \mathbb{R}^d} \frac{w^T \mathbb{S}_B w}{w^T \Sigma w }</math>
 +
 
 +
This is equivalent to
 +
 
 +
: <math>\max_{w \in \mathbb{R}^d} w^T \mathbb{S}_B w </math>
 +
:  s.t. <math>\mathbf{w}^T \Sigma \mathbf{w} =1</math>
 +
 
 +
Again, consider Lagrange and differentiate the formula. we get
 +
 
 +
: <math> 2 \mathbb{S}_B w = 2 \lambda \Sigma w</math>
 +
: <math>\Sigma^{-1} \mathbb{S}_B w = \lambda w</math>
 +
 
 +
This would have been done if<math> \Sigma^{-1} \mathbb{S}_B</math> is symmetric and positive semidefinite. We can use the eigenvector solutions like before. We can work it out as follows:
 +
* Observe that <math>\mathbb{S}_B</math> is a rank-one symmetric, positive semidefinite matrix (rank-one is not important, though). So, the notation <math>\mathbb{S}^{1/2}_B</math> makes sense because we can consider changing to eigen basis, <math>\mathbb{S}_B = U D U^T</math>. So <math>\mathbb{S}^{1/2}_B = U D^{1/2} U^T</math>.
 +
 
 +
Replacing <math>\mathbb{S}^{1/2}_B w</math> with v, we get
 +
 
 +
: <math>\mathbb{S}^{1/2}_B \Sigma^{-1} \mathbb{S}^{1/2}_B v = \lambda v</math>
 +
 
 +
And one can check that the linear map on the lefthand is symmetric, positive semidefinite. Thus, we can find eigenvector solution <math>v_k</math> corresponding to <math>\lambda_k</math> and compute <math>w_k = \mathbb{S}^{-1/2}_B v_k </math> .
 +
 
 +
One can also verify that, plugging this solution back to the original equation, we get
 +
 
 +
: <math>\frac{w_k^T \mathbb{S}_B w_k}{w_k^T \Sigma w_k } = \frac{\lambda_k w^T_k \Sigma w_k}{w_k^T \Sigma w_k } = \lambda_k </math>
 +
 
 +
So, we can maximize the objective by choosing the vector <math>v_k</math> corresponding to largest eigenvalue.
 +
 
 +
== Week 3: Finishing up SVM ==
 +
This week, we will talk about support vector machines which involve solving quadratic programs. At the same time, we start the theory of VC dimension.
 +
 
 +
See [[Week 3 (Machine_Learning) | this]].
 +
 
 +
Please see chapter 10 in Vapnik's book ''Statistical Learning Theory'' for reference.
 +
 
 +
== Week 4: A theory of reproducing kernels ==
 +
This week, we start looking at kernel methods which will be a main focus for this quarter. The first class gives several definitions and concepts relating to reproducing kernel hilbert spaces. We talk about two constructions of r.k. Hilbert space (i.e. via a completion of linear combinations and via measure-theoretic view). These two methods (and some others in the liturature) are equivalent. Examples of the construction are given in the finite-domain case. 
 +
 
 +
[[Week4_Machine Learning | here]]
 +
 
 +
Please see the reference for more detail.
 +
 
 +
== Week 5: More on kernels ==
 +
The plan for this week is to talk about [http://people.cs.uchicago.edu/~niyogi/papersps/MinNiyYao06.pdf this paper]
 +
 
 +
==Useful Links==
 +
===General Theory===
 +
* [http://en.wikipedia.org/wiki/Eigenvalue,_eigenvector_and_eigenspace Wikipedia article on eigenvalue, eigenvector and eigenspace]
 +
* [http://en.wikipedia.org/wiki/Ordinary_least_squares Wikipedia article on ordinary least square]
 +
* [http://en.wikipedia.org/wiki/Singular_value_decomposition Wikipedia article on singular value decomposition]
 +
* [http://en.wikipedia.org/wiki/Principal_component_analysis Wikipedia article on principal component analysis]
 +
* [http://en.wikipedia.org/wiki/Tikhonov_regularization Wikipedia article on Tikhonov regularization]
 +
* [http://www.jstor.org/view/00029947/di963001/96p0042z/0 Reproducing kernels Hilbert spaces]
 +
 
 +
=== Famous Applications of PCA in machine learning ===
 +
* Eigenface for '''face recognition'''
 +
** [http://en.wikipedia.org/wiki/Eigenface Wikipedia article on eigenface]
 +
** [http://vismod.media.mit.edu/vismod/demos/facerec/basic.html MIT introduction to eigenface]

รุ่นแก้ไขปัจจุบันเมื่อ 09:00, 9 กันยายน 2550

This page contains a list of topics, definitions, and results from Machine Learning course at University of Chicago taught by this guy. The proof is very sketchy and given without any intuition mainly because I'm so lazy. I would appreciate all kinds of help to make the note complete.

Week 1: Introduction and OLS

The materials this week are just about the framework. We show that the problem of minimizing squared errors with respect to a distribution is the same as the problem of learning . If the examples are drawn from discrete domain, this is a classification problem, but in case of continuous domain, it's a regression problem.

After defining all necessary definitions, an example of linear regression is given as a motivation of how regression is done.

Learning problem

Given a distribution on . We want to learn the objective function (with respect to the distribution :).


Learning Algorithms

Let Z be the set of possible samples. The learning algorithm is a function that maps a number of samples to a measurable function (denoted here by F a class of all measurable functions). Sometimes we consider a class of computable functions instead.

Learning errors

Suppose the learning algorithm outputs h. The learning error can be measured by

One can prove that minimizing this quantity could be reduced to the problem of minimizing the following quantity.

And that's the reason why we try to learn

In other word, we claim that

The proof is easy.

We get


Then observe that,

  • The first term only depends on distribution
  • The third term is zero

Observe also that the term which is zero.


  • The second term is equal to

Example 1

When the class , i.e. classification problem, our objective reduces to

One can show that the function

minimizes the loss (proof omited)

Example 2

When , the problem is just like regression where we try to regress :

Learning lowerbound

Let be a class of probability distribution of interest. No algorithm achieve error smaller than

Ordinary Least Square

If the relation is linear,

OLS provably gives the minimum squared error.

Consider the error

Differentiating the error and make it zero, we get

If has no nullspace (or is full rank), then it has an inverse. We get

But if the matrix is not full rank, we can still show that

minimizes the error, where is the Moore-Penrose pseudoinverse of the matrix.

First, let be eigenvalues of and let be corresponding eigenvectors.

Note that:

  • Since is positive semidefinite, all eigenvalues are non-negative
  • We can decompose to
(because is symmetric so that every eigenvector is orthogonal to others.)

Denote the column space of by . Assume WLOG that and .

We will first show that in fact, lives in the subspace . Observe that since the set of eigenvectors span , it suffices to show that it is orthogonal to every vector in

, .

Let , that is . We get

This implies that and thus, .

From the fact that lives in , we can verify that the following is what we need.

And it's straightforward to see that the solution for is not unique since any for would also satisfy the equation.

Note that the term is a Moore-Penrose pseudoinverse of a singular matrix.

Tikhonov Regularization

Instead of solving

we look at

If tends to zero, it problem becomes just like OLS. If is a certain positive number, differentiating both sides give (and equate to zero)

are eigenvalues of , where are eigenvalues of .

Observe that all eigenvalues are positive, so nullspace is zero. And thus, matrix on the left-hand side is invertible. The solution is unique.

Week 2: PCA, LDA, and Introduction to SVM

In this week, we cover another theme of techniques, i.e. Principle Component Analysis (PCA) and Fisher Linear discriminant analysis. As compared to last week topics which involve solving linear systems, this week topics involve solving eigenvalues & eigenvectors of matrix.

Since the theory of positive semidefinite will be crucial throughout the course, we give a quick introduction.

A quick introduction to theory of positive matrix

A matrix is said to be positive semidefinite if

for all w

Observation: Matrix in the form is positive semidefinite.

Proof. for any w,

property 1: Positive semidefinite matrix has eigenvalue

Proof.

(This is because can be thought as a generalized dot product so that it will always have non-negative value).

PCA

Given n points of data in , we want to find a projection which best splits these n points. The notion of 'best' is measured by variance of the projected data. Note that we can assume WLOG that . That is, we want to maximize where.

where is positive semidefinite.

The problem is given as follows.

s.t.

Using Lagrange's,

Differentiating the above,

This tells us that the solution must be among eigenvectors of .

The solution becomes clear now, i.e. choose to be the eigenvectors corresponding to the largest eigenvalue of .

Fischer LDA

Another method along the same line was proposed by Fischer. The idea is to find the direction which best seperates the mean of different classes, as contrasted to PCA which finds the direction in which the projected data have highest variance. Fisher's criterion is as follows: (note that we consider the task of classifications with two labels, 1 and -1)

where

The numerator could be written as which we denote the middle term by . Also, the denominator reduces to

which we denote the middle term by . The problem becomes

This is equivalent to

s.t.

Again, consider Lagrange and differentiate the formula. we get

This would have been done if is symmetric and positive semidefinite. We can use the eigenvector solutions like before. We can work it out as follows:

  • Observe that is a rank-one symmetric, positive semidefinite matrix (rank-one is not important, though). So, the notation makes sense because we can consider changing to eigen basis, . So .

Replacing with v, we get

And one can check that the linear map on the lefthand is symmetric, positive semidefinite. Thus, we can find eigenvector solution corresponding to and compute .

One can also verify that, plugging this solution back to the original equation, we get

So, we can maximize the objective by choosing the vector corresponding to largest eigenvalue.

Week 3: Finishing up SVM

This week, we will talk about support vector machines which involve solving quadratic programs. At the same time, we start the theory of VC dimension.

See this.

Please see chapter 10 in Vapnik's book Statistical Learning Theory for reference.

Week 4: A theory of reproducing kernels

This week, we start looking at kernel methods which will be a main focus for this quarter. The first class gives several definitions and concepts relating to reproducing kernel hilbert spaces. We talk about two constructions of r.k. Hilbert space (i.e. via a completion of linear combinations and via measure-theoretic view). These two methods (and some others in the liturature) are equivalent. Examples of the construction are given in the finite-domain case.

here

Please see the reference for more detail.

Week 5: More on kernels

The plan for this week is to talk about this paper

Useful Links

General Theory

Famous Applications of PCA in machine learning