Newsgroups: sci.math.stat
Path: sparky!uunet!paladin.american.edu!gatech!rpi!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!sdd.hp.com!news.cs.indiana.edu!umn.edu!charlie
From: charlie@umnstat.stat.umn.edu (Charles Geyer)
Subject: Re: Metropolis sampler
Message-ID: <1993Jan8.232157.1897@news2.cis.umn.edu>
Keywords: Metropolis sampler
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: isles.stat.umn.edu
Organization: School of Statistics, University of Minnesota
References: <1993Jan7.120831.11046@cs.kuleuven.ac.be>
Date: Fri, 8 Jan 1993 23:21:57 GMT
Lines: 171

In article <1993Jan7.120831.11046@cs.kuleuven.ac.be> maurits@cs.kuleuven.ac.be
(Maurits Malfait) writes:

>We are working on an algorithm that incorporates a Metropolis
>sampler (*).  Therefore we are interested in references and
>information concerning:
>
>1. The convergence properties of a Metropolis sampler:
>   - when is the starting state of a Metropolis sampler
>     'forgotten'?
>   - how fast and how good does the sampler approximate the
>     Boltzmann distribution?

IMHO that is the wrong question to ask.  The right question is how fast the
sampler mixes.  This is discussed in the current issue of Statistical Science
in papers by Gelman and Rubin and by myself, with comments by about 10
discussants.  It should be out real soon now.

The output of the sampler is a Markov chain X_1, X_2, ... whose equilibrium
distribution is the distribution of interest, the one you want to sample
from.  Averages with respect to that distribution are approximated by
averages over the chain.

   E g(X)   is approximated by   gbar_n = (1/n) sum_{i=1}^{n} g(X_i)

The distribution of gbar_n is usually asymptotically normal:

   sqrt(n) [gbar_n - E g(X)]  converges in distribution to  N(0, sigma^2)

where

   sigma^2 = gamma_0 + 2 sum_{i=1}^{infinity} gamma_i          (*)

and

   gamma_i = Cov[ g(X_0), g(X_i) ]

at stationarity, i.e. when X_0 has the equilibrium distribution and X_i
is the i-th step of a chain started at X_0.

More precisely, if the sampler is reversible and irreducible, the CLT holds
whenever the sum (*) is finite (Kipnis and Varadhan, Commun. Math. Phys.,
1986), and in all nice cases (the precise technical condition is Harris
recurrence) the asymptotics do not depend on the starting distribution.

Hence asymptotically the starting distribution is irrelevant.
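
To make the role of (*) concrete (an illustration of mine, not something in
the original post): if g is the identity and the chain happens to behave like
an AR(1) process with lag-one autocorrelation rho, then gamma_i = rho^i gamma_0
and (*) sums to

   sigma^2 = gamma_0 [ 1 + 2 rho / (1 - rho) ] = gamma_0 (1 + rho) / (1 - rho)

so a chain with rho = 0.9 needs roughly 1.9 / 0.1 = 19 times as many iterations
as independent sampling for the same accuracy.  That ratio is the sense in
which mixing speed, rather than burn-in, is the question that matters.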

In practice one does usually want to throw away a short initial segment of
a run, but there is no reason to run until the initial state is "forgotten"
or the chain "reaches equilibrium".  Strictly speaking, it never does.  All
that is necessary is that the chain reach a state that is reasonably likely,
a state that has reasonable probability of occurring in a sample of the
size collected.  If one knows the mode of the distribution, for example,
one could start there with no burn-in.

The trick is to prove that (*) is finite and then to estimate it accurately.
This is still an area of active research, but I do give some advice in the
Statistical Science paper mentioned above.

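As one concrete possibility (my own sketch, not taken from the post), the
batch means method estimates sigma^2 from a single run: cut the run into a
number of long consecutive batches and use the variability of the batch means.
The chain below is just a toy AR(1) stand-in for real sampler output, and the
batch count of 30 is an arbitrary choice.

    import numpy as np

    def mcse_batch_means(gx, n_batches=30):
        # Split g(X_1), ..., g(X_n) into consecutive batches; the variance of
        # the batch means, scaled by the batch length, estimates sigma^2 in (*).
        n = len(gx) // n_batches * n_batches          # drop any partial batch
        batches = np.asarray(gx[:n]).reshape(n_batches, -1)
        batch_len = batches.shape[1]
        sigma2_hat = batches.mean(axis=1).var(ddof=1) * batch_len
        return np.sqrt(sigma2_hat / n)                # std. error of gbar_n

    # toy stand-in for sampler output: an AR(1) chain with autocorrelation 0.9
    rng = np.random.default_rng(0)
    x = np.zeros(100_000)
    for i in range(1, x.size):
        x[i] = 0.9 * x[i - 1] + rng.standard_normal()
    print(x.mean(), "+/-", mcse_batch_means(x))

How long the batches must be for this to be reliable depends on how fast the
chain mixes, which is the point of the whole discussion.
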
>2. Variants of the Metropolis algorithm and recent developments.

The main variant is the so-called Metropolis-Hastings algorithm (Hastings,
Biometrika, 1970), which I described here last year

> Newsgroups: sci.math.stat
> From: charlie@umnstat.stat.umn.edu (Charles Geyer)
> Subject: Re: Sampling from mixtures of distributions
> Organization: School of Statistics, University of Minnesota
> Date: Thu, 23 Apr 1992 00:07:22 GMT
>
> In article <20514@castle.ed.ac.uk> efsv02@castle.ed.ac.uk (M J Brewer) writes:
>
> >I am conducting Gibbs sampling on graphical models.
> >
> >I need to be able to sample from, for example,
> >
> >   f( x ) = k * exp(-.5*((x-u)/v)^2) * exp(-.5*((g(x)-w)/y)^2)
> >
> >where k is an unknown constant ( => rejection sampling)
> >      u,w are known "means",
> >      v,y are known "variances", (cf. Normal)
> >      and g(x) is a NON-LINEAR function of x.
> >
> >I'm not sure about the envelope for rejection sampling; I've plotted the
> >above function, and for a quadratic g(), a Normal envelope would seem ok,
> >but I'm not sure about the choice of variance; also I need to be sure
> >that the envelope "blankets" f() in the tails.
> >
> >Does anyone have any ideas?
>
> Yes.  If the one-dimensional conditionals needed for the Gibbs sampler
> are hard to sample from, DON'T USE the Gibbs sampler.
>
> There is sure to be a much simpler Metropolis-Hastings algorithm that
> not only will be easier to code but also will run faster.
>
> For those who may still think that the Gibbs sampler is the name of the
> game, the Metropolis-Hastings algorithm (Hastings, Biometrika, 1970) is
> the following.
>
> One update step (say just updating one variable) moving from the state x
> to a new state x'
>
> 1. simulate a realization y from the density q(x, . )
>
>    [the density q is chosen to be easy to simulate from but is
>     otherwise arbitrary]
>
> 2. calculate the Hastings ratio
>
>            p(y) q(y,x)
>        r = -----------
>            p(x) q(x,y)
>
>    [where p is the density of the distribution of interest times
>     an unknown proportionality constant which cancels]
>
> 3. (Metropolis rejection)
>
>    if (r >= 1) then
>       x' := y
>    else
>       generate u uniform(0,1)
>       if u < r then
>          x' := y
>       else
>          x' := x
>       fi
>    fi
>
> The Gibbs sampler is the special case where q is the one-dimensional
> conditional.  It avoids the rejection step (r is always exactly 1),
> but it does so at the cost of complicating step 1 and so is rarely
> worth the trouble unless the sampling from one-dimensional conditionals
> is very fast.
>
> Almost any q will do (one does have to check that the resulting Markov
> chain is irreducible).  Choose one that makes the update fast.  The best
> way to speed up your sampler is to speed up the update even if that increases
> the sampling variance a bit.  Though more samples are required for the
> same accuracy, they are generated much faster.

The original Metropolis algorithm requires q(x,y) = q(y,x), but there is no
particular reason for that restriction.  The more general Hastings update
permits a wider range of "proposal distributions".

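To make the quoted recipe concrete, here is a minimal sketch of one
Metropolis-Hastings update in Python, written by me rather than taken from the
post.  It targets the density from the quoted question, with placeholder
choices for g, u, v, w, y and for the proposal scale; because the random-walk
proposal is symmetric, the q(y,x)/q(x,y) factor in the Hastings ratio cancels.

    import math, random

    # placeholder parameter values (not from the original posts)
    u, v, w, y = 0.0, 1.0, 1.0, 0.5

    def g(x):
        return x * x                          # some non-linear g(x)

    def log_p(x):
        # log of f(x) = k * exp(-.5*((x-u)/v)^2) * exp(-.5*((g(x)-w)/y)^2);
        # the unknown constant k drops out of the Hastings ratio.
        return -0.5 * ((x - u) / v) ** 2 - 0.5 * ((g(x) - w) / y) ** 2

    def mh_update(x, scale=1.0):
        # One Metropolis-Hastings step with a Gaussian random-walk proposal.
        # The proposal is symmetric, so r reduces to p(y)/p(x).
        prop = x + scale * random.gauss(0.0, 1.0)
        log_r = log_p(prop) - log_p(x)
        if log_r >= 0 or random.random() < math.exp(log_r):
            return prop                       # accept
        return x                              # reject: stay at x

    # run the chain
    state, chain = 0.0, []
    for _ in range(50_000):
        state = mh_update(state)
        chain.append(state)

A non-symmetric proposal would only change the computation of log_r, which
then picks up the extra term log q(prop, x) - log q(x, prop).
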
BTW, note that there is no need for the distribution being sampled to
be a Boltzmann distribution (what statisticians call an exponential family
model).

>3. Possibilities to accelerate a Metropolis sampler.

There are many.  One is the Swendsen-Wang algorithm and related schemes
(Swendsen and Wang, Phys. Rev. Lett., 1987; Wang and Swendsen, Physica A,
1990).  Another is the Metropolis-coupled chain scheme mentioned by Geyer
(Computing Science and Statistics: Proc. 23rd Symp. Interface, 1991).
These are, however, special cases of the general Metropolis-Hastings
algorithm.

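For the Metropolis-coupled scheme, the idea (sketched here in my own code and
notation, with a made-up bimodal target and an arbitrary temperature ladder) is
to run several chains against flattened versions p(x)^(1/T) of the target and
occasionally propose swapping the states of two chains; the swap is itself a
Metropolis move on the product chain, and only the T = 1 chain is used for
averages.

    import math, random

    def log_p(x):
        # made-up bimodal target, chosen only to illustrate the scheme;
        # log-sum-exp form keeps the evaluation numerically stable
        a = -0.5 * (x - 3.0) ** 2
        b = -0.5 * (x + 3.0) ** 2
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def tempered_log_p(x, temp):
        return log_p(x) / temp                # chain at temperature T targets p^(1/T)

    def mh_step(x, temp, scale=1.0):
        # ordinary random-walk Metropolis step within one tempered chain
        prop = x + scale * random.gauss(0.0, 1.0)
        log_r = tempered_log_p(prop, temp) - tempered_log_p(x, temp)
        return prop if log_r >= 0 or random.random() < math.exp(log_r) else x

    def swap_step(states, temps):
        # propose exchanging the states of two adjacent chains; accepting with
        # the Metropolis ratio below preserves the joint equilibrium distribution
        i = random.randrange(len(states) - 1)
        log_r = (tempered_log_p(states[i + 1], temps[i])
                 + tempered_log_p(states[i], temps[i + 1])
                 - tempered_log_p(states[i], temps[i])
                 - tempered_log_p(states[i + 1], temps[i + 1]))
        if log_r >= 0 or random.random() < math.exp(log_r):
            states[i], states[i + 1] = states[i + 1], states[i]

    temps = [1.0, 2.0, 4.0, 8.0]              # arbitrary temperature ladder
    states = [0.0] * len(temps)
    samples = []
    for _ in range(20_000):
        states = [mh_step(s, t) for s, t in zip(states, temps)]
        swap_step(states, temps)
        samples.append(states[0])             # keep only the temperature-1 chain

The hotter chains move easily between the modes and feed those moves down to
the cold chain through the swaps, which is what the scheme buys over a single
Metropolis chain on a multimodal target.
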
The main point is that the Metropolis-Hastings algorithm gives tremendous
scope for experimentation.  There are lots of things to try.

--
Charles Geyer
School of Statistics
University of Minnesota
charlie@umnstat.stat.umn.edu