Newsgroups: sci.math.stat
Path: sparky!uunet!paladin.american.edu!gatech!rpi!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!sdd.hp.com!news.cs.indiana.edu!umn.edu!charlie
From: charlie@umnstat.stat.umn.edu (Charles Geyer)
Subject: Re: Metropolis sampler
Message-ID: <1993Jan8.232157.1897@news2.cis.umn.edu>
Keywords: Metropolis sampler
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: isles.stat.umn.edu
Organization: School of Statistics, University of Minnesota
References: <1993Jan7.120831.11046@cs.kuleuven.ac.be>
Date: Fri, 8 Jan 1993 23:21:57 GMT
Lines: 171

In article <1993Jan7.120831.11046@cs.kuleuven.ac.be> maurits@cs.kuleuven.ac.be
(Maurits Malfait) writes:

>We are working on an algorithm that incorporates a Metropolis
>sampler (*).  Therefore we are interested in references and
>information concerning:
>
>1. The convergence properties of a Metropolis sampler:
>   - when is the starting state of a Metropolis sampler
>     'forgotten'?
>   - how fast and how good does the sampler approximate the
>     Boltzmann distribution?

IMHO that is the wrong question to ask.  The right question is how fast the
sampler mixes.  This is discussed in the current issue of Statistical Science
in papers by Gelman and Rubin and by myself, with comments by about 10
discussants.  It should be out real soon now.

The output of the sampler is a Markov chain X_1, X_2, ... whose equilibrium
distribution is the distribution of interest, the one you want to sample
from.  Averages with respect to that distribution are approximated by
averages over the chain.

   E g(X)   is approximated by   gbar_n = (1/n) sum_{i=1}^{n} g(X_i)

The distribution of gbar_n is usually asymptotically normal:

   sqrt(n) [gbar_n - E g(X)]  converges in distribution to  N(0, sigma^2)

where

   sigma^2 = gamma_0 + 2 sum_{i=1}^{infinity} gamma_i          (*)

and

   gamma_i = Cov[ g(X_0), g(X_i) ]

at stationarity, i.e. when X_0 has the equilibrium distribution and X_i
is the i-th step of a chain started at X_0.

More precisely, if the sampler is reversible and irreducible, the CLT holds
whenever the sum (*) is finite (Kipnis and Varadhan, Commun. Math. Phys.,
1986), and in all nice cases (the precise technical condition is Harris
recurrence) the asymptotics do not depend on the starting distribution.

Hence asymptotically the starting distribution is irrelevant.
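
To make the role of (*) concrete (an illustration of mine, not something in
the original post): if g is the identity and the chain happens to behave like
an AR(1) process with lag-one autocorrelation rho, then gamma_i = rho^i gamma_0
and (*) sums to

   sigma^2 = gamma_0 [ 1 + 2 rho / (1 - rho) ] = gamma_0 (1 + rho) / (1 - rho)

so a chain with rho = 0.9 needs roughly 1.9 / 0.1 = 19 times as many iterations
as independent sampling for the same accuracy.  That ratio is the sense in
which mixing speed, rather than burn-in, is the question that matters.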

In practice one does usually want to throw away a short initial segment of
a run, but there is no reason to run until the initial state is "forgotten"
or the chain "reaches equilibrium".  Strictly speaking, it never does.  All
that is necessary is that the chain reach a state that is reasonably likely,
a state that has reasonable probability of occurring in a sample of the
size collected.  If one knows the mode of the distribution, for example,
one could start there with no burn-in.

The trick is to prove that (*) is finite and then to estimate it accurately.
This is still an area of active research, but I do give some advice in the
Statistical Science paper mentioned above.

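As one concrete possibility (my own sketch, not taken from the post), the
batch means method estimates sigma^2 from a single run: cut the run into a
number of long consecutive batches and use the variability of the batch means.
The chain below is just a toy AR(1) stand-in for real sampler output, and the
batch count of 30 is an arbitrary choice.

    import numpy as np

    def mcse_batch_means(gx, n_batches=30):
        # Split g(X_1), ..., g(X_n) into consecutive batches; the variance of
        # the batch means, scaled by the batch length, estimates sigma^2 in (*).
        n = len(gx) // n_batches * n_batches          # drop any partial batch
        batches = np.asarray(gx[:n]).reshape(n_batches, -1)
        batch_len = batches.shape[1]
        sigma2_hat = batches.mean(axis=1).var(ddof=1) * batch_len
        return np.sqrt(sigma2_hat / n)                # std. error of gbar_n

    # toy stand-in for sampler output: an AR(1) chain with autocorrelation 0.9
    rng = np.random.default_rng(0)
    x = np.zeros(100_000)
    for i in range(1, x.size):
        x[i] = 0.9 * x[i - 1] + rng.standard_normal()
    print(x.mean(), "+/-", mcse_batch_means(x))

How long the batches must be for this to be reliable depends on how fast the
chain mixes, which is the point of the whole discussion.
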
>2. Variants of the Metropolis algorithm and recent developments.

The main variant is the so-called Metropolis-Hastings algorithm (Hastings,
Biometrika, 1970), which I described here last year

> Newsgroups: sci.math.stat
> From: charlie@umnstat.stat.umn.edu (Charles Geyer)
> Subject: Re: Sampling from mixtures of distributions
> Organization: School of Statistics, University of Minnesota
> Date: Thu, 23 Apr 1992 00:07:22 GMT
>
> In article <20514@castle.ed.ac.uk> efsv02@castle.ed.ac.uk (M J Brewer) writes:
>
> >I am conducting Gibbs sampling on graphical models.
> >
> >I need to be able to sample from, for example,
> >
> >   f( x ) = k * exp(-.5*((x-u)/v)^2) * exp(-.5*((g(x)-w)/y)^2)
> >
> >where k is an unknown constant ( => rejection sampling)
> >      u,w are known "means",
> >      v,y are known "variances", (cf. Normal)
> >      and g(x) is a NON-LINEAR function of x.
> >
> >I'm not sure about the envelope for rejection sampling; I've plotted the
> >above function, and for a quadratic g(), a Normal envelope would seem ok,
> >but I'm not sure about the choice of variance; also I need to be sure
> >that the envelope "blankets" f() in the tails.
> >
> >Does anyone have any ideas?
>
> Yes.  If the one-dimensional conditionals needed for the Gibbs sampler
> are hard to sample from, DON'T USE the Gibbs sampler.
>
> There is sure to be a much simpler Metropolis-Hastings algorithm that
> not only will be easier to code but also will run faster.
>
> For those who may still think that the Gibbs sampler is the name of the
> game, the Metropolis-Hastings algorithm (Hastings, Biometrika, 1970) is
> the following.
>
> One update step (say just updating one variable) moving from the state x
> to a new state x'
>
> 1. simulate a realization y from the density q(x, . )
>
>    [the density q is chosen to be easy to simulate from but is
>     otherwise arbitrary]
>
> 2. calculate the Hastings ratio
>
>            p(y) q(y,x)
>        r = -----------
>            p(x) q(x,y)
>
>    [where p is the density of the distribution of interest times
>     an unknown proportionality constant which cancels]
>
> 3. (Metropolis rejection)
>
>    if (r >= 1) then
>       x' := y
>    else
>       generate u uniform(0,1)
>       if u < r then
>          x' := y
>       else
>          x' := x
>       fi
>    fi
>
> The Gibbs sampler is the special case where q is the one-dimensional
> conditional.  It avoids the rejection step (r is always exactly 1),
> but it does so at the cost of complicating step 1 and so is rarely
> worth the trouble unless the sampling from one-dimensional conditionals
> is very fast.
>
> Almost any q will do (one does have to check that the resulting Markov
> chain is irreducible).  Choose one that makes the update fast.  The best
> way to speed up your sampler is to speed up the update even if that increases
> the sampling variance a bit.  Though more samples are required for the
> same accuracy, they are generated much faster.

The original Metropolis algorithm requires q(x,y) = q(y,x), but there is no
particular reason for that restriction.  The more general Hastings update
permits a wider range of "proposal distributions".

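To make the quoted recipe concrete, here is a minimal sketch of one
Metropolis-Hastings update in Python, written by me rather than taken from the
post.  It targets the density from the quoted question, with placeholder
choices for g, u, v, w, y and for the proposal scale; because the random-walk
proposal is symmetric, the q(y,x)/q(x,y) factor in the Hastings ratio cancels.

    import math, random

    # placeholder parameter values (not from the original posts)
    u, v, w, y = 0.0, 1.0, 1.0, 0.5

    def g(x):
        return x * x                          # some non-linear g(x)

    def log_p(x):
        # log of f(x) = k * exp(-.5*((x-u)/v)^2) * exp(-.5*((g(x)-w)/y)^2);
        # the unknown constant k drops out of the Hastings ratio.
        return -0.5 * ((x - u) / v) ** 2 - 0.5 * ((g(x) - w) / y) ** 2

    def mh_update(x, scale=1.0):
        # One Metropolis-Hastings step with a Gaussian random-walk proposal.
        # The proposal is symmetric, so r reduces to p(y)/p(x).
        prop = x + scale * random.gauss(0.0, 1.0)
        log_r = log_p(prop) - log_p(x)
        if log_r >= 0 or random.random() < math.exp(log_r):
            return prop                       # accept
        return x                              # reject: stay at x

    # run the chain
    state, chain = 0.0, []
    for _ in range(50_000):
        state = mh_update(state)
        chain.append(state)

A non-symmetric proposal would only change the computation of log_r, which
then picks up the extra term log q(prop, x) - log q(x, prop).
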
BTW, note that there is no need for the distribution being sampled to
be a Boltzmann distribution (what statisticians call an exponential family
model).

>3. Possibilities to accelerate a Metropolis sampler.

There are many.  One is the Swendsen-Wang algorithm and related schemes
(Swendsen and Wang, Phys. Rev. Lett., 1987; Wang and Swendsen, Physica A,
1990).  Another is the Metropolis-coupled chain scheme mentioned by Geyer
(Computing Science and Statistics: Proc. 23rd Symp. Interface, 1991).
These are, however, special cases of the general Metropolis-Hastings
algorithm.

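For the Metropolis-coupled scheme, the idea (sketched here in my own code and
notation, with a made-up bimodal target and an arbitrary temperature ladder) is
to run several chains against flattened versions p(x)^(1/T) of the target and
occasionally propose swapping the states of two chains; the swap is itself a
Metropolis move on the product chain, and only the T = 1 chain is used for
averages.

    import math, random

    def log_p(x):
        # made-up bimodal target, chosen only to illustrate the scheme;
        # log-sum-exp form keeps the evaluation numerically stable
        a = -0.5 * (x - 3.0) ** 2
        b = -0.5 * (x + 3.0) ** 2
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def tempered_log_p(x, temp):
        return log_p(x) / temp                # chain at temperature T targets p^(1/T)

    def mh_step(x, temp, scale=1.0):
        # ordinary random-walk Metropolis step within one tempered chain
        prop = x + scale * random.gauss(0.0, 1.0)
        log_r = tempered_log_p(prop, temp) - tempered_log_p(x, temp)
        return prop if log_r >= 0 or random.random() < math.exp(log_r) else x

    def swap_step(states, temps):
        # propose exchanging the states of two adjacent chains; accepting with
        # the Metropolis ratio below preserves the joint equilibrium distribution
        i = random.randrange(len(states) - 1)
        log_r = (tempered_log_p(states[i + 1], temps[i])
                 + tempered_log_p(states[i], temps[i + 1])
                 - tempered_log_p(states[i], temps[i])
                 - tempered_log_p(states[i + 1], temps[i + 1]))
        if log_r >= 0 or random.random() < math.exp(log_r):
            states[i], states[i + 1] = states[i + 1], states[i]

    temps = [1.0, 2.0, 4.0, 8.0]              # arbitrary temperature ladder
    states = [0.0] * len(temps)
    samples = []
    for _ in range(20_000):
        states = [mh_step(s, t) for s, t in zip(states, temps)]
        swap_step(states, temps)
        samples.append(states[0])             # keep only the temperature-1 chain

The hotter chains move easily between the modes and feed those moves down to
the cold chain through the swaps, which is what the scheme buys over a single
Metropolis chain on a multimodal target.
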
The main point is that the Metropolis-Hastings algorithm gives tremendous
scope for experimentation.  There are lots of things to try.

--
Charles Geyer
School of Statistics
University of Minnesota
charlie@umnstat.stat.umn.edu