- Xref: sparky comp.arch:10675 comp.lang.forth:3485
- Path: sparky!uunet!know!cass.ma02.bull.com!mips2!news.bbn.com!usc!zaphod.mps.ohio-state.edu!darwin.sura.net!Sirius.dfn.de!Urmel.Informatik.RWTH-Aachen.DE!messua!dak
- From: dak@messua.informatik.rwth-aachen.de (David Kastrup)
- Newsgroups: comp.arch,comp.lang.forth
- Subject: Re: What's RIGHT with stack machines
- Message-ID: <dak.721621337@messua>
- Date: 13 Nov 92 02:22:17 GMT
- References: <Bx5AIr.EAy.2@cs.cmu.edu> <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at> <id.D6UU.5Z@ferranti.com> <lg0eheINNs7l@exodus.Eng.Sun.COM>
- Sender: news@Urmel.Informatik.RWTH-Aachen.DE (Newsfiles Owner)
- Organization: Rechnerbetrieb Informatik / RWTH Aachen
- Lines: 35
- Nntp-Posting-Host: messua
-
-
- >>I predict that before too long all high performance commodity micros will do
- >>scheduling at runtime.
-
- >I don't think the situation is as clear-cut as you describe it.
-
- >There are certain scheduling techniques that tend to work well no
- >matter where you use them -- as long as you have enough registers, it
- >doesn't hurt to stick a few instructions between a load into a
- >register and the subsequent use of that register. On superscalar
- >machines, it is generally a bad idea to do too many of exactly the
- >same thing in a big lump (i.e., ld, ld, ld or fadd, fadd, fadd), and
- >if you have the option of mixing things up a bit, you should.
- >Increasing the size of basic blocks (through code replication,
- >typically) is another trick for helping most machines, since branches
- >often stall pipelines.
-
- >These are general rules, and they won't yield optimum performance, but
- >you must trade them off against the costs of generating
- >implementation-specific code. Those costs include
-
- > (1) less sharing of text and libraries
- > (2) lots of cache flushing and
- > (3) scheduling and register allocation are not necessarily cheap.
-
- See the MIPS processors (Microprocessor without Interlocked Pipeline Stages)
- for a clever design idea: they have simply left out the hardware interlocks.
- If you start an instruction that uses a register which a previous instruction
- has not yet finished filling, there is no stall; the OLD value is used.
- So the compiler/assembler has to insert nops explicitly in order to
- prevent register clashes. But interlocking is something a compiler can do
- perfectly well, and the die space thus saved can be used for other
- purposes (bigger caches, etc.). And because of the Harvard bus architecture,
- there is no difference between waiting for an interlock and fetching a nop.
- The only problem: code tends to get longer.
-
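To make the nop business concrete, here is a rough sketch of the kind of pass an assembler would need on a machine without interlocks. It is a toy model and not how any real MIPS assembler is written: the instruction tuples, the `insert_nops` name, and the assumption that only loads (`lw`) have a delay of exactly one slot are all made up for illustration.

```python
# Toy model: each instruction is (opcode, dest_register, source_registers).
# Assumption (hypothetical): only "lw" has a delay, and it is exactly one slot.

NOP = ("nop", None, ())


def insert_nops(program):
    """Return a copy of program where no instruction reads a register
    in the slot directly after the load that fills it."""
    out = []
    for opcode, dest, sources in program:
        if out:
            prev_opcode, prev_dest, _ = out[-1]
            # Without interlocks the hardware would hand us the OLD value,
            # so software has to pad the slot itself.
            if prev_opcode == "lw" and prev_dest in sources:
                out.append(NOP)
        out.append((opcode, dest, sources))
    return out


program = [
    ("lw", "r1", ("r4",)),
    ("add", "r2", ("r1", "r3")),   # reads r1 right after its load: needs a nop
    ("lw", "r5", ("r4",)),
    ("sub", "r6", ("r2", "r3")),   # independent of r5: fills the slot for free
    ("add", "r7", ("r5", "r6")),
]

for instr in insert_nops(program):
    print(instr)
```

Only the first load gets a nop behind it. The second one is followed by an instruction that does not touch r5, so the slot is filled with useful work and the code does not grow there.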