- Path: sparky!uunet!ogicse!plains!news.u.washington.edu!milton!chuckb
- From: chuckb@milton.u.washington.edu (Chuck Bass)
- Newsgroups: comp.sys.sgi
- Subject: Fastest way to transform points???
- Message-ID: <chuckb.714532056@milton>
- Date: 23 Aug 92 01:07:36 GMT
- Article-I.D.: milton.chuckb.714532056
- Sender: news@u.washington.edu (USENET News System)
- Organization: University of Washington
- Lines: 80
-
- I recently attempted to speed up some things that I was doing.
- Namely I wanted to decrease a bottleneck's effect. I found I
- was transforming points and multiplying matrices a lot.
-
- I use a standard matrix multiplier for this purpose, i.e. an unrolled
- 4x4 matrix multiply.
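
For reference, an unrolled 4x4 multiply of the kind described here might look roughly like the sketch below. The names Mat4 and mat4_mul are made up for illustration and do not come from the post, and the float[4][4] layout is an assumption. The row loop is kept and the two inner loops are written out by hand.

```c
/* Hypothetical names: Mat4 and mat4_mul are illustrative only. */
typedef float Mat4[4][4];

/* out = a * b, with the column and dot-product loops unrolled.
   out must not be the same array as a or b. */
void mat4_mul(Mat4 out, Mat4 a, Mat4 b)
{
    int i;

    for (i = 0; i < 4; i++) {
        /* Load row i of a once and reuse it for all four columns. */
        float a0 = a[i][0], a1 = a[i][1], a2 = a[i][2], a3 = a[i][3];

        out[i][0] = a0*b[0][0] + a1*b[1][0] + a2*b[2][0] + a3*b[3][0];
        out[i][1] = a0*b[0][1] + a1*b[1][1] + a2*b[2][1] + a3*b[3][1];
        out[i][2] = a0*b[0][2] + a1*b[1][2] + a2*b[2][2] + a3*b[3][2];
        out[i][3] = a0*b[0][3] + a1*b[1][3] + a2*b[2][3] + a3*b[3][3];
    }
}
```

This does 64 multiplies and 48 adds per product, all on the CPU/FPU, so it is the baseline that any call through the graphics library has to beat.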
-
- I attempted to improve performance using something like
-
- pushmatrix();
- loadmatrix(M1);
- multmatrix(M2);
- getmatrix(M3);
- popmatrix();
-
-
- This fragment turns out to be about 1/3 the speed of my 4x4
- matrix multiply
-
- 100000 mat multiplies took 2.51 seconds
- 100000 point transforms took 0.76 seconds
- 100000 sgmat multiplies took 11.27 seconds
-
- The sgmat case (11.27 seconds above) uses the stack manipulations. I
- rewrote the code to only do the following:
-
- multmatrix(M2)
-
- This version gave the following timings:
-
- 100000 mat multiplies took 2.51 seconds
- 100000 point transforms took 0.76 seconds
- 100000 sgmat multiplies took 2.45 seconds
-
-
- This leads me to believe that there is no hardware involved in
- the stack multmatrix routine. I suspect that the difference in
- performance is because of the matrix inversion that takes place
- when a loadmatrix call is made.
-
- These results seem to be fairly consistent, i.e. a point transform takes
- around 1/4 the time of a matrix multiply. (I did do an sginap
- after I opened the window to give the window manager a chance to
- open the window, etc.)
-
- These results are from a PI 4D25 (the 4D35 gives similar
- results, only faster). hinv says:
- 1 20 MHZ IP6 Processor
- FPU: MIPS R2010A/R3010 VLSI Floating Point Chip
- Revision: 2.0
- CPU: MIPS R2000A/R3000 Processor Chip Revision: 2.0
- On-board serial ports: 2
- Data cache size: 32 Kbytes
- Instruction cache size: 64 Kbytes
- Main memory size: 16 Mbytes
- Integral Ethernet: ec0, version 0
- Genlock option installed
- Tape drive: unit 2 on SCSI controller 0: QIC 150
- Disk drive: unit 1 on SCSI controller 0
- Integral SCSI controller 0: Version WD33C93A
- Graphics board: GR1.2 Bit-plane, Z-buffer, Turbo options
- installed
-
- My question is: is there a faster way using some of the "built
- in matrix 'stuff'"? If there is not, are there faster ways of
- making the transform? I know I can reduce it to a 4x3 matrix
- multiply for a gain of 25%. Are there other such optimizations?
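
The 4x3 reduction mentioned here could be sketched as follows. This is only an illustration under two assumptions that the post does not state: points are row vectors (p' = p * M, translation in the fourth row), and every matrix involved is affine, so its fourth column is (0, 0, 0, 1). The names Mat4, xform_point_affine and mat4_mul_affine are hypothetical.

```c
/* Hypothetical names; assumes row vectors (p' = p * M) and affine
   matrices whose fourth column is (0, 0, 0, 1). */
typedef float Mat4[4][4];

/* Transform the point (x, y, z, 1) by m: 9 multiplies, no w output.
   out may be the same array as p, since p is copied first. */
void xform_point_affine(float out[3], const float p[3], Mat4 m)
{
    float x = p[0], y = p[1], z = p[2];

    out[0] = x*m[0][0] + y*m[1][0] + z*m[2][0] + m[3][0];
    out[1] = x*m[0][1] + y*m[1][1] + z*m[2][1] + m[3][1];
    out[2] = x*m[0][2] + y*m[1][2] + z*m[2][2] + m[3][2];
}

/* out = a * b for two affine matrices: 36 multiplies instead of 64.
   out must not be the same array as a or b. */
void mat4_mul_affine(Mat4 out, Mat4 a, Mat4 b)
{
    int i, j;

    for (i = 0; i < 4; i++) {
        /* Column 3 of a is skipped here: it is 0 for rows 0..2. */
        for (j = 0; j < 3; j++)
            out[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + a[i][2]*b[2][j];
        out[i][3] = 0.0f;
    }
    /* Row 3 of a has an implicit 1 in column 3, which adds b's translation. */
    out[3][0] += b[3][0];
    out[3][1] += b[3][1];
    out[3][2] += b[3][2];
    out[3][3] = 1.0f;
}
```

If a perspective matrix is ever on the stack the affine assumption fails and the full 4x4 product is needed. For collision detection in world space, where only rotations, scales and translations are composed, the assumption usually holds.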
-
-
- Thanks,
-
-
- Chuck Bass
- College of Forest Systems Engineering
- University of Washington
- chuckb@u.washington.edu
-
- PS I need to transform the points to do collision detection.
- Currently the function clipbox is not implemented on our
- machine ;-(.
-