MINDLAB READING

This page serves as the (temporary?) main information page of the MIND lab reading group. The reading group can cover any research work, old or new, as long as it is interesting or important to many members.

Contents

1 Plan
2 Date and Time
3 May 11, 2007: Mixture Models and EM algorithm
4 May 18, 2007: Finite Markov Chains
5 May 25, 2007: Monte Carlo Methods
- 5.1 Introduction to Monte Carlo
- 5.2 Prerequisite
6 June 1, 2007: Continue from previous meetings
7 June 8, 2007: Least Squares Method and Beyond
8 Useful Materials
9 Interesting papers (please VOTE)
- 9.1 Classic Works
- 9.2 Today's hot topics

Plan

First, NOTE that although each reading group meeting has a leader, a reading group is NOT a talk. The leader may be the person most familiar with the selected topic, but he/she does not necessarily know everything. To keep the reading group alive, everyone should try hard to understand the selected material so that the discussions are fruitful.

For the first 2-3 weeks, Jung will take the lead. He will cover three topics that are frequently mentioned in the literature and that he is most familiar with.

After that, if it succeeds, we hope to continue the process and change the leader week by week. The leader should have at least 2 weeks to read up on the topic.

Date and Time

At this moment, I plan to hold the reading group every Friday afternoon for no more than two hours (1pm - 3pm), but let us discuss more convenient dates and times.

May 11, 2007: Mixture Models and EM algorithm

Bishop's PRML book

This topic is a nice introduction to the Bayesian paradigm in machine learning. After this meeting, we should be able to answer the following questions:

What is the Bayesian machine learning paradigm?

The Bayes equation

Why is being Bayesian a good idea? What are the advantages of the Bayesian paradigm over the classical paradigm?

It is intuitive and easy to understand (but might not be easy to do)
It can solve the model selection problem

How can we train the learner in the Bayesian paradigm?

This meeting illustrates one tool from the Bayesian toolkit: the EM algorithm.

What is Ockham's razor? Does it make sense?

From the Bayesian point of view, we show a reasonable solution to a related problem: what is the best parameter space given the data?
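To make the EM algorithm mentioned above concrete, here is a minimal sketch for a one-dimensional mixture of Gaussians. It is only an illustration under stated assumptions, not the presentation used in the meeting: the function names (`normal_pdf`, `em_gmm_1d`), the synthetic data, the initial parameters and the number of iterations are all hypothetical choices.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """EM for a 1D Gaussian mixture (illustrative helper, not a library API).

    Returns the fitted means, variances, mixing weights and the
    log-likelihood measured at the start of every iteration.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        log_liks.append(log_lik)
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    return means, variances, weights, log_liks

# Synthetic data (an assumption for the demo): two clusters around -2 and 3.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
means, variances, weights, log_liks = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

A useful thing to check while reading Bishop's chapter 9: the recorded log-likelihood never decreases from one iteration to the next, which is the key property of EM.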

Main paper

Chris Bishop's PRML book chapter 9. (I have one copy left; anyone who does not have this book, please feel free to borrow it from me.)

Supplementary

David Mackay's book chapters 2 (prerequisite), 20 and 22 get it here
Zoubin Ghahramani's slides lecture1 lecture2 lecture3 (DO NOT read PCA and FA)
Chris Bishop's slides click (DO NOT read variational inference)

Prerequisite

To understand how EM works, we need to know a bit about information entropy, Jensen's inequality and the Kullback-Leibler divergence. So please take a look at their properties before the reading group: David Mackay's book chapter 2 is a very good introduction.
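As a quick warm-up for these three prerequisites, the following sketch computes entropy and KL divergence for small discrete distributions and checks Jensen's inequality for the logarithm numerically. The distributions `p`, `q` and the values `xs` are arbitrary examples chosen for illustration, and natural logarithms (nats) are assumed.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1.0 / 3, 1.0 / 3, 1.0 / 3]

# KL divergence is non-negative and zero when both arguments are equal.
print(kl_divergence(p, q), kl_divergence(p, p))

# Against the uniform distribution, KL(p || q) = log(3) - H(p).
print(math.log(3) - entropy(p))

# Jensen's inequality for the concave log: E[log X] <= log E[X].
xs = [1.0, 2.0, 4.0]
mean_of_log = sum(pi * math.log(x) for pi, x in zip(p, xs))
log_of_mean = math.log(sum(pi * x for pi, x in zip(p, xs)))
print(mean_of_log, log_of_mean)
```

The non-negativity of the KL divergence is itself a consequence of Jensen's inequality, and it is the fact used to show that EM maximizes a lower bound on the log-likelihood.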

May 18, 2007: Finite Markov Chains

David Mackay's ground-breaking book click

The last meeting illustrated one way to do Bayesian inference: the EM algorithm. However, it gives us only the maximum likelihood estimate of the parameters, which is not exactly what we want in Bayesian inference. What we want is called marginalization (inference based on averaging over all parameter values). Normally, marginalization is a real pain to compute. Hence, our goal is to find a general Bayesian toolkit that can compute it in practice. The two most famous methods are (1) Monte Carlo sampling and (2) variational inference.

We want to cover Monte Carlo sampling, but first we need the notion of Markov chains.
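In symbols, the difference between marginalization and a maximum likelihood point estimate can be written as follows. The notation is an assumption made for this note: $\theta$ denotes the parameters, $D$ the observed data and $x^{*}$ a new observation to be predicted.

```latex
\begin{align}
p(\theta \mid D) &= \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
  \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta, \\
p(x^{*} \mid D) &= \int p(x^{*} \mid \theta)\, p(\theta \mid D)\, d\theta, \\
\hat{\theta}_{\mathrm{ML}} &= \arg\max_{\theta}\, p(D \mid \theta),
  \qquad p(x^{*} \mid D) \approx p(x^{*} \mid \hat{\theta}_{\mathrm{ML}}).
\end{align}
```

The first line is Bayes' rule, the second is the marginalized (averaged) prediction, and the third is the plug-in approximation obtained from a single maximum likelihood estimate. The integrals in the first two lines are what make exact Bayesian inference hard.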

In this meeting, we will see the class of ergodic Markov chains, the stationary distribution of an ergodic Markov chain, the detailed balance condition and a brilliant proof technique, namely, coupling.
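The stationary distribution and the detailed balance condition can be checked numerically on a tiny chain. The sketch below is only an illustration: the two 3-state transition matrices `P` and `Q` and the helper names are made up for this page, and the stationary distribution is found by simply iterating the chain from a uniform start.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, n_iter=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(n_iter):
        dist = step(dist, P)
    return dist

def satisfies_detailed_balance(pi, P, tol=1e-9):
    """Check pi[i] * P[i][j] == pi[j] * P[j][i] for every pair of states."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))

# A lazy random walk on a path of 3 states (reversible).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

# A biased walk around a cycle of 3 states (ergodic but not reversible).
Q = [[0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1]]

pi_P = stationary(P)
pi_Q = stationary(Q)
print(pi_P, satisfies_detailed_balance(pi_P, P))
print(pi_Q, satisfies_detailed_balance(pi_Q, Q))
```

Both chains are ergodic and have a unique stationary distribution, but only the first one satisfies detailed balance. Detailed balance is sufficient for stationarity, not necessary.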

Main Material

Eric Vigoda's lecture click.

Prerequisite

~~Mainly we require a fairly good knowledge on linear algebra E.g. pp. 347-358 of this book.~~

You should know what the eigendecomposition of a matrix is, or in other words, diagonalization.

Supplementary: Introduction to Markov Chain

Lecture notes by Olle Häggström click (hurry up while it lasts!)
A draft book by Levin, Perez and Wilmer click
A book chapter by C. M. Grinstead and J. L. Snell very good for a real newbie
A more advanced introduction by Jerrum and Sinclair click

May 25, 2007: Monte Carlo Methods

The topics we will cover are rejection sampling, uniform sampling, importance sampling, and the Markov chain Monte Carlo (MCMC) method, especially the Metropolis and Gibbs algorithms. If there is enough time, Jung will briefly explain slice sampling, simulated annealing, MCMCMC and/or asymmetric Metropolis-Hastings methods.
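As a taste of the two simplest methods in this list, here is a small sketch of rejection sampling and importance sampling. The target (a Beta(2,2) density on [0, 1]), the uniform proposal, the bound 1.5 and the function names are all assumptions chosen to keep the example short.

```python
import random

def target_pdf(x):
    """Beta(2,2) density on [0, 1]; its maximum value is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, rng, bound=1.5):
    """Draw n samples from target_pdf using a uniform proposal on [0, 1].

    `bound` must satisfy target_pdf(x) <= bound for all x in [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.random()          # proposal draw
        u = rng.random()          # uniform height under the envelope
        if u * bound <= target_pdf(x):
            samples.append(x)     # accept with probability target/bound
    return samples

def importance_estimate(f, n, rng):
    """Estimate E[f(X)] under target_pdf with uniform proposal draws.

    The importance weight is target_pdf(x) / 1, since the proposal density is 1.
    """
    total = 0.0
    for _ in range(n):
        x = rng.random()
        total += f(x) * target_pdf(x)
    return total / n

rng = random.Random(0)
samples = rejection_sample(20000, rng)
sample_mean = sum(samples) / len(samples)
second_moment = importance_estimate(lambda x: x * x, 20000, rng)
print(sample_mean, second_moment)
```

For this target the exact values are E[X] = 0.5 and E[X^2] = 0.3, so both estimates can be compared against the truth. Both methods degrade quickly in high dimensions, which is the motivation for MCMC.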

Introduction to Monte Carlo

David Mackay's book chapter 29
Chris Bishop's book chapter 11
Radford Neal's famous technical report click
Jordan et al.'s introduction of MCMC for machine learning click

Prerequisite

Basically, all we need to know are the topics from the last meeting:

You should know what an ergodic Markov chain is
You should know what the detailed-balance property of a Markov chain is
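These two prerequisites are exactly what the Metropolis algorithm relies on: with a symmetric proposal and acceptance probability min(1, p(x')/p(x)), the resulting chain satisfies detailed balance with respect to the target p. Below is a minimal random-walk Metropolis sketch. The target (a standard normal, known only up to its normalizing constant), the step size, the burn-in length and the function names are assumptions for illustration.

```python
import math
import random

def log_target(x):
    """Log of an unnormalized standard normal density."""
    return -0.5 * x * x

def metropolis(log_p, x0, n_samples, step, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_p(proposal) - log_p(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples

rng = random.Random(1)
chain = metropolis(log_target, 0.0, 50000, 1.0, rng)
kept = chain[1000:]  # discard a short burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)
```

Note that only ratios of the target density are needed, so the normalizing constant never has to be computed. That is what makes the method useful for Bayesian posteriors.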

June 1, 2007: Continue from previous meetings

I'm sure that I cannot cover the topics of Markov chains, Monte Carlo and MCMC in just 2 meetings (starting from zero knowledge). Hence, this meeting is intended to finish the remaining material.

June 8, 2007: Least Squares Method and Beyond

This meeting will be led by Parinya Chalermsook from the University of Chicago. He will talk about ordinary least squares (OLS), principal component analysis (PCA) and regularization. See his notes about the topics here.
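For readers new to the topic, the following sketch shows OLS and one simple form of regularization (a ridge penalty on the slope only) for a single input variable. It is not taken from Parinya's notes: the function name `fit_line`, the penalty parameter `lam` and the toy data are hypothetical.

```python
def fit_line(xs, ys, lam=0.0):
    """Fit y = slope * x + intercept by least squares.

    With lam = 0 this is ordinary least squares. With lam > 0 a ridge
    penalty lam * slope**2 is added (the intercept is not penalized),
    which shrinks the slope toward zero.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

ols = fit_line(xs, ys)
ridge = fit_line(xs, ys, lam=10.0)
print(ols, ridge)
```

On this data OLS recovers the line exactly, while the ridge fit trades some accuracy for a smaller slope. This is the same preference for simpler models that appears in the Ockham's razor discussion.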

Useful Materials

Basic probability book
Lovasz's discrete mathematics
Tom Minka's in-depth notes on matrix algebra
Some notes about Bayesian view on Ockham's razor

A note by Ian Murray and Zoubin Ghahramani
Also read David Mackay's chapter 28.

Other ML courses

Princeton click
UC Berkeley click
CMU click (see also his 04 notes)
Toronto click

Richard Weber's lectures: discrete optimization, game theory and control theory without tears
Stochastic processes without tears by Yuri Suhov: the best introductory lecture notes on the topic

Interesting papers (please VOTE)

Classic Works

The Strength of Weak Learnability by Rob Schapire click.
Adaboost paper [1]

These two papers are the origin of all boosting methods.

A Theory of the Learnable by L. G. Valiant click

Shannon's source coding and channel capacity theorems

See Mackay's textbook.

Inverse Problems: PCA, SVD and Regularization (Ockham's razor revisited) click.

Today's hot topics

Graphical Models for Machine Learning (Bishop's book chapter 8).

Kernel Methods and Gaussian Processes (Bishop's book chapter 6).

Clustering (continue from first meeting)

Various proposals for selecting K: X-means, G-means, PG-means, Full-Bayes
Reduction of dimensionality: random projection method
Some theory: Arora's paper
Spectral Clustering (currently the hottest clustering algorithm)
