MINDLAB READING

From Theory Wiki
Current revision as of 15:09, 28 June 2007

This page is the (temporary?) main information page of the MIND lab reading group. Topics for the reading group can be any research work, old or new, that is interesting or important to many members.

Plan

First, NOTE that although each reading group has a leader, a reading group is NOT a talk. The leader is usually the person most familiar with the selected topic, but he/she does not necessarily know everything. To keep the reading group alive, everyone should try quite hard to understand the selected material so that the discussions will be fruitful.

For the first 2-3 weeks, Jung will take the lead. He will cover three topics which are frequently mentioned in the literature and which he is most familiar with.

After that, if it succeeds, we hope to continue the process and rotate the leader week by week. The leader should have at least two weeks to read the topic.

Date and Time

At this moment, I plan to arrange the reading group every Friday afternoon, for no more than two hours (1pm - 3pm), but let us discuss more convenient dates and times.

May 11, 2007: Mixture Models and EM algorithm

Bishop's PRML book

This topic is a nice introduction to the Bayesian paradigm in machine learning. After this meeting, we should be able to answer the following questions:

  • What is the Bayesian machine learning paradigm?
      • The Bayes equation
  • Why is being Bayesian a good idea? What are the advantages of the Bayesian paradigm over the classical one?
      • It is intuitive and easy to understand (though not necessarily easy to do).
      • It can solve the model selection problem.
  • How can we train the learner in the Bayesian paradigm?
      • This talk illustrates one Bayesian toolkit: the EM algorithm.
  • What is Ockham's razor? Does it make sense?
      • From the Bayesian philosophy, we show a reasonable solution to the related problem: what is the best parameter space given the data?
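As a concrete feel for the EM toolkit mentioned above, here is a minimal, illustrative EM sketch for a two-component 1D Gaussian mixture. The data and initial values are invented for the example; Bishop's chapter 9 gives the real derivation.

```python
import math

def em_gmm_1d(xs, mu, var, pi, iters=100):
    """EM for a two-component 1D Gaussian mixture.
    mu, var, pi are 2-element lists: initial means, variances, weights."""
    for _ in range(iters):
        # E-step: responsibilities r[n][k] = p(component k | x_n)
        r = []
        for x in xs:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: re-estimate parameters from responsibility-weighted data
        for k in (0, 1):
            nk = sum(rn[k] for rn in r)
            mu[k] = sum(rn[k] * x for rn, x in zip(r, xs)) / nk
            var[k] = sum(rn[k] * (x - mu[k]) ** 2 for rn, x in zip(r, xs)) / nk
            pi[k] = nk / len(xs)
    return mu, var, pi

# Two clear clusters, around 0 and around 5 (made-up data)
data = [-0.3, 0.1, 0.2, -0.1, 0.3, 4.8, 5.1, 5.0, 5.2, 4.9]
mu, var, pi = em_gmm_1d(data, mu=[1.0, 4.0], var=[1.0, 1.0], pi=[0.5, 0.5])
print(mu)  # means converge near 0.04 and 5.0
```

Note that this only gives a (local) maximum-likelihood fit; the fully Bayesian treatment of mixtures is exactly what the later meetings on marginalization address.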

Main paper

  • Chris Bishop's PRML book, chapter 9. (I have one copy left; anyone who does not have this book is welcome to borrow it from me.)

Supplementary

  • David Mackay's book chapters 2 (prerequisite), 20 and 22: http://www.inference.phy.cam.ac.uk/itprnn/book.pdf
  • Zoubin Ghahramani's slides: lecture1 (http://www.gatsby.ucl.ac.uk/~zoubin/course05/lect1.pdf), lecture2 (http://www.gatsby.ucl.ac.uk/~zoubin/course05/lect2m.pdf), lecture3 (http://www.gatsby.ucl.ac.uk/~zoubin/course05/lect3.pdf) (DO NOT read PCA and FA)
  • Chris Bishop's slides: http://research.microsoft.com/~cmbishop/downloads/Bishop-Exeter-BCS.ppt (DO NOT read variational inference)

Prerequisite

To understand how EM works, we need to know a bit about information entropy, Jensen's inequality and the Kullback-Leibler divergence. So please take a look at their properties before the reading group: David Mackay's book chapter 2 is a very good introduction.
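A quick way to get comfortable with these quantities is to compute them on a small discrete distribution. The sketch below (plain Python, illustrative only) checks the two properties the EM derivation actually relies on:

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

# Gibbs' inequality (a consequence of Jensen's inequality):
# D(p||q) >= 0, with equality iff p == q.
assert kl(p, q) >= 0 and kl(p, p) == 0

# Entropy is maximised by the uniform distribution.
assert entropy(p) <= entropy(q)
```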

May 18, 2007: Finite Markov Chains

David Mackay's ground-breaking book: http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Last meeting illustrated one way to do Bayesian inference: the EM algorithm. However, it can give us only the maximum likelihood of the parameter, which is not exactly what we want in Bayesian inference. What we want in Bayesian inference is called marginalization (an inference based on averaging over all parameters). Normally, it is a real pain to calculate marginalization. Hence, our goal is to find a general Bayesian toolkit that can practically compute marginalization. The two most famous methods are (1) Monte Carlo sampling and (2) variational inference.
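To see what marginalization means in the simplest possible case, here is an illustrative toy (the model and numbers are invented for the example): the evidence p(D) = ∫ p(D|θ) p(θ) dθ of a coin-flip model under a uniform prior, computed both exactly and by naive Monte Carlo averaging over prior samples.

```python
import math
import random

random.seed(0)

# Data: a particular sequence of n flips containing k heads.
n, k = 10, 3

def likelihood(theta):
    return theta ** k * (1 - theta) ** (n - k)

# Exact evidence: the integral of theta^k (1-theta)^(n-k) over [0,1]
# is the Beta function B(k+1, n-k+1) = k! (n-k)! / (n+1)!
exact = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)

# Naive Monte Carlo marginalization: average the likelihood
# over samples drawn from the uniform prior.
samples = 100_000
estimate = sum(likelihood(random.random()) for _ in range(samples)) / samples

print(exact, estimate)  # both close to 1/1320
```

For this one-dimensional toy, naive prior sampling works; in high dimensions it fails badly, which is why the more sophisticated methods below are needed.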

We want to cover Monte Carlo sampling, but before that we need the notion of Markov chains.

In this meeting, we will see the class of ergodic Markov chains, the stationary distribution of an ergodic Markov chain, the detailed balance condition and a brilliant proof technique, namely, coupling.
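The stationary distribution and the detailed balance condition can be checked numerically on a tiny example. The sketch below uses a random walk on a weighted triangle (the weights are arbitrary illustrative values); such a chain is reversible, with stationary probability proportional to each vertex's total weight.

```python
# Random walk on a weighted undirected graph: P[i][j] = w[i][j] / sum_j w[i][j].
w = [[0, 2, 1],
     [2, 0, 3],
     [1, 3, 0]]
deg = [sum(row) for row in w]
P = [[w[i][j] / deg[i] for j in range(3)] for i in range(3)]

# Stationary distribution by repeated multiplication pi <- pi P
# (the chain is ergodic, so this converges from any starting point).
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# For a weighted random walk, pi_i is proportional to the weight at i.
expected = [d / sum(deg) for d in deg]   # [3/12, 5/12, 4/12]
assert all(abs(a - b) < 1e-9 for a, b in zip(pi, expected))

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
for i in range(3):
    for j in range(3):
        assert abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-9
```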

Main Material

  • Eric Vigoda's lecture: http://www-static.cc.gatech.edu/~vigoda/MCMC_Course/MC-basics.pdf

Prerequisite

Mainly we require a fairly good knowledge of linear algebra (e.g. pp. 347-358 of ftp://joshua.smcvt.edu/pub/hefferon/book/book.pdf).

  • You should know what the eigen-decomposition of a matrix is, in other words a diagonalization.

Supplementary: Introduction to Markov Chain

  • A lecture note by Olle Häggström: http://citeseer.ist.psu.edu/493576.html
  • A draft book by Levin, Peres and Wilmer: http://www.oberlin.edu/markov/book/
  • A book chapter by C. M. Grinstead and J. L. Snell (very good for a real newbie): http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf
  • A more advanced introduction by Jerrum and Sinclair: http://citeseer.ist.psu.edu/jerrum96markov.html

NOTE FOR UNUSUAL DATE: SUNDAY May 27, 2007

Some applications of finite Markov chains, continuing from the last meeting. We revisit the definitions of aperiodic and irreducible Markov chains, see three examples of reversible and non-reversible chains, and look at applications to statistical physics (the hard-core model, q-coloring) and approximate counting (counting knapsack solutions).

The materials are from Olle Häggström's book, chapters 6 and 7, and the knapsack example of Jerrum and Sinclair.
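For a feel of the hard-core model, here is an illustrative Glauber-dynamics sketch on the 6-cycle. With fugacity 1, the chain's stationary distribution is uniform over the 18 independent sets of C6, so a long-run average of the number of occupied vertices should approach the exact expectation 30/18.

```python
import random

random.seed(1)

# Hard-core model on the 6-cycle: valid configurations are independent sets.
# Glauber dynamics: pick a uniformly random vertex; if a neighbour is
# occupied, the vertex must be empty; otherwise occupy it with prob. 1/2.
n = 6
x = [0] * n          # start from the empty independent set
total = 0
steps = 200_000
for _ in range(steps):
    v = random.randrange(n)
    if x[(v - 1) % n] or x[(v + 1) % n]:
        x[v] = 0
    else:
        x[v] = 1 if random.random() < 0.5 else 0
    total += sum(x)

avg_occupied = total / steps
print(avg_occupied)  # exact expectation is 30/18, about 1.667
```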

June 1, 2007: (Engineering) Monte Carlo Methods

The topics we will cover are rejection sampling, uniform sampling, importance sampling, and the Markov chain Monte Carlo (MCMC) method, especially the Metropolis and Gibbs algorithms. If there is enough time, Jung will briefly explain slice sampling, simulated annealing, MCMCMC and/or asymmetric Metropolis-Hastings methods.
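A minimal random-walk Metropolis sketch, targeting a standard normal for illustration (the step size and run length are arbitrary choices, not recommendations):

```python
import math
import random

random.seed(42)

def metropolis(log_target, x0, step, n):
    """Random-walk Metropolis: propose x' = x + Uniform(-step, step),
    accept with probability min(1, target(x') / target(x))."""
    xs, x = [], x0
    lp = log_target(x)
    for _ in range(n):
        prop = x + random.uniform(-step, step)
        lp_prop = log_target(prop)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        xs.append(x)
    return xs

# Target: standard normal. An unnormalised log-density is enough for MCMC,
# which is exactly why it suits Bayesian posteriors.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=2.0, n=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 0 and 1
```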

Introduction to Monte Carlo

  • David Mackay's book chapter 29
  • Chris Bishop's book chapter 11
  • Radford Neal's famous technical report: http://www.cs.toronto.edu/~radford/ftp/review.pdf
  • Jordan et al.'s introduction to MCMC for machine learning: http://www.cs.berkeley.edu/~jordan/courses/281B-spring04/readings/mlintro.ps

Prerequisite

  • Basically, all we need to know are topics from the last meeting:
      • You should know what an ergodic Markov chain is.
      • You should know what the detailed-balance property of a Markov chain is.

June 8, 2007: Learning via Fourier transform

This meeting will be led by Parinya Chalermsook from the University of Chicago. He will talk about the connection between Fourier analysis and machine learning. The abstract is given below.

Abstract

Fourier analysis is one of the major areas of mathematics research. The fact that any function can be written as a linear combination of Fourier basis functions is well known to the community. We shall consider an important special case in which we want to learn a boolean function defined on the hypercube. I will show that learning boolean functions reduces to learning the large Fourier coefficients of the function. Then I will present two generic learning algorithms that solve this problem under some assumptions on the class of functions to be learned. If time permits, we will see the correctness proof of both algorithms.
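The Fourier coefficients in the abstract can be computed by brute force for tiny n, which is a useful sanity check even though the point of the talk is to avoid exactly this enumeration. The parity example below is illustrative: parity puts all its Fourier mass on the single set S = {0, 1, 2}.

```python
from itertools import combinations, product

def fourier_coefficient(f, S, n):
    """f_hat(S) = E_x[ f(x) * chi_S(x) ] over the uniform hypercube,
    where chi_S(x) = (-1)^(sum of x_i for i in S)."""
    total = 0
    for x in product((0, 1), repeat=n):
        chi = (-1) ** sum(x[i] for i in S)
        total += f(x) * chi
    return total / 2 ** n

# The parity function on 3 bits, written with +-1 outputs.
def parity(x):
    return (-1) ** sum(x)

n = 3
coeffs = {S: fourier_coefficient(parity, S, n)
          for r in range(n + 1) for S in combinations(range(n), r)}

assert coeffs[(0, 1, 2)] == 1.0
assert all(c == 0.0 for S, c in coeffs.items() if S != (0, 1, 2))
```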

June 15, 2007: Back to Basics (The Family of Tail Inequalities)

The Markov, Chebyshev and Chernoff bounds, with applications to coin tossing, statistical parameter estimation and set balancing.

This talk is planned to be given by Aun+.
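For the coin-tossing application, the three bounds can be evaluated side by side (n and eps below are arbitrary illustrative values); the point is how dramatically they tighten:

```python
import math

# Tail bounds for X = number of heads in n fair coin flips,
# bounding P(X >= (1/2 + eps) * n).
n, eps = 1000, 0.1
t = (0.5 + eps) * n          # threshold: 600 heads

# Markov:    P(X >= t) <= E[X] / t
markov = (n / 2) / t
# Chebyshev: P(|X - n/2| >= eps*n) <= Var[X] / (eps*n)^2, with Var[X] = n/4
chebyshev = (n / 4) / (eps * n) ** 2
# Chernoff-Hoeffding: P(X >= (1/2 + eps) n) <= exp(-2 n eps^2)
chernoff = math.exp(-2 * n * eps ** 2)

print(markov, chebyshev, chernoff)
# roughly 0.83 vs 0.025 vs 2e-9: each bound uses strictly more structure.
assert chernoff < chebyshev < markov
```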

(THURSDAY; START 12.15) June 21, 2007: Nested sampling and MCMCMC

This might be the last talk of this series. It is going to be on Thursday at 12.15 instead of Friday at 13.00.

This is part of my work at Cambridge with DJCM. This talk will introduce a recent Monte Carlo technique named nested sampling, due to John Skilling. Nested sampling can be used to calculate (unnormalized) integrals, especially evidence and free-energy calculations.

However, there is one difficult sub-task of nested sampling, namely, to sample a point uniformly from a truncated distribution. There are two methods proposed to solve the task: the first is just an ordinary MCMC algorithm and the second is the so-called ellipsoid method (based on rejection sampling of a mixture-of-uniform distribution). These two methods have pros and cons. My contribution is to propose a way to solve the problem by using Metropolis-coupled MCMC to validly link those two methods.
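A toy sketch of the nested sampling loop (uniform prior on [0,1] and a Gaussian likelihood, both invented for illustration). Because the toy likelihood is monotone, the constrained prior is just an interval that we can sample directly; this sidesteps the hard sub-task discussed above, where in general one needs MCMC or the ellipsoid method.

```python
import math
import random

random.seed(0)

def likelihood(theta):
    # Gaussian likelihood with sigma = 0.1, prior Uniform(0, 1).
    return math.exp(-theta ** 2 / (2 * 0.1 ** 2))

N = 100                                   # number of live points
live = [random.random() for _ in range(N)]
Z, X_prev = 0.0, 1.0
for i in range(1, 2000):
    # Discard the worst live point; its likelihood is the new contour.
    # (Likelihood decreases with theta here, so the worst is max(live).)
    worst = max(live)
    L_min = likelihood(worst)
    X = math.exp(-i / N)                  # expected remaining prior mass
    Z += L_min * (X_prev - X)             # accumulate the evidence integral
    X_prev = X
    # Replace it by a prior sample subject to L(theta) > L_min; by
    # monotonicity the constrained prior is just Uniform(0, worst).
    live.remove(worst)
    live.append(random.uniform(0, worst))

# Add the contribution of the remaining live points.
Z += X_prev * sum(likelihood(t) for t in live) / N
print(Z)  # the analytic evidence is 0.1 * sqrt(pi/2), about 0.1253
```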

References

  • Nested sampling: http://www.inference.phy.cam.ac.uk/bayesys/
  • Simulated tempering: http://www.jstor.org/view/01621459/di986005/98p0228m/0
  • Slice sampling: http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1056562461
  • Metropolis-coupled MCMC: http://www.stat.umn.edu/geyer/8931/c.pdf
  • Ellipsoid method: http://arxiv.org/abs/0704.3704
  • See also Fill & Aldous' book (MCMC chapter): http://www.stat.berkeley.edu/~aldous/RWG/book.html

Useful Materials

  • Basic probability book: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
  • Lovasz's discrete mathematics: http://research.microsoft.com/users/lovasz/dmbook.ps
  • Tom Minka's in-depth notes on matrix algebra: http://research.microsoft.com/~minka/papers/matrix/
  • Good introduction to SVD: http://www.uwlax.edu/faculty/will/svd/index.html
  • Auckland's very good course on inverse problems (SVD, regularization, Bayesian methods, dynamic filtering and MCMC): http://www.math.auckland.ac.nz/~phy707/
  • Some notes on the Bayesian view of Ockham's razor:
      • A note by Ian Murray and Zoubin Ghahramani: http://www.gatsby.ucl.ac.uk/~zoubin/papers/05occam/
      • Also read David Mackay's chapter 28.
  • Other ML courses:
      • Princeton: http://www.cs.princeton.edu/courses/archive/spring06/cos511/schedule.html
      • UC Berkeley: http://www.cs.berkeley.edu/~jordan/courses/281B-spring04/lectures.html
      • UBC (British Columbia): http://www.cs.ubc.ca/~murphyk/Teaching/CS540_Fall05/LectureNotes/index.html
      • CMU: http://www.cs.cmu.edu/~avrim/ML07/ (see also his 04 notes)
      • Toronto: http://www.cs.toronto.edu/~roweis/csc2515/texts.html
      • Chicago (incomplete): see the "Machine Learning at U of C" wiki page
      • MIT: course1 (http://www.mit.edu/~9.520/), more advanced (http://ocw.mit.edu/OcwWeb/Mathematics/18-465Spring-2004/LectureNotes/index.htm)
  • Richard Weber's lectures (discrete optimization, game theory and control theory without tears): http://www.statslab.cam.ac.uk/%7Errw1/teaching/index.html
  • Stochastic processes without tears by Yuri Suhov (best introductory lecture notes on the topic): http://www.statslab.cam.ac.uk/~yms/

Interesting papers (please VOTE)

Classic Works

  • The Strength of Weak Learnability by Rob Schapire: http://citeseer.ist.psu.edu/schapire90strength.html
  • The AdaBoost paper: http://www.face-rec.org/algorithms/Boosting-Ensemble/decision-theoretic_generalization.pdf
      These two papers are the origin of all boosting methods.
  • A Theory of the Learnable by L. G. Valiant: http://www.cs.toronto.edu/~roweis/csc2515/readings/p1134-valiant.pdf
  • Shannon's source coding and channel capacity theorems
      See Mackay's textbook.

Today's hot topics

  • Graphical Models for Machine Learning (Bishop's book chapter 8).
  • Kernel Methods and Gaussian Processes (Bishop's book chapter 6).
  • Clustering (continued from the first meeting)
      • Various proposals for selecting K: X-means, G-means, PG-means, Full-Bayes
      • Reduction of dimensionality: the random projection method
      • Some theory: Arora's paper
      • Spectral clustering (currently the hottest clustering algorithm)