Interview: id's John Carmack
by Frank Bernier


I recently had the chance to interview id Software's John Carmack, the father of 3D gaming. As co-founder, owner, and lead programmer at id Software, Carmack has been working his magic since 1991 on legendary games like Wolfenstein 3D, DOOM, and, of course, Quake.

I asked John specific questions about how the recently announced Power Macintosh G4 could improve gaming for Mac users. I was particularly interested to hear his views on AltiVec and how it could improve 3D rendering.

Now, on to the interview.


Bernier: First of all, thanks for taking the time, John. It is much appreciated.

Carmack: You caught me at a pretty good time -- I am typing in PPC opcodes for the Mac dynamic bytecode compiler. It is mind-numbing work, and a dodge was welcome. :-)


Bernier: What do you think about Apple's new G4 Velocity Engine (aka AltiVec)?

Carmack: It has the same basic capabilities as the Intel and AMD extensions, but it has several conveniences beyond them. The main one is just architectural -- trinary ops are a lot nicer than destructive binary ops. Like the PIII, but unlike AMD, the registers are separate, so they can be used at any time. There are improvements in permutation and other housekeeping tasks.

The AltiVec compiler support is better than the x86 solutions, but that is a mixed blessing. I don't consider encouraging people to write compiler specific code a particularly good thing. The incoming memory streaming is nice, but Intel's write combining is better for outgoing streaming.
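
For readers unfamiliar with the distinction between destructive two-operand instructions and the three-operand style Carmack prefers, here is a minimal sketch in plain, portable C. It is not AltiVec or SSE code: the vec4 struct and all function names are hypothetical stand-ins for a 128-bit register and its instructions, and the "copy count" is only what this sketch does when every input has to stay live.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector standing in for one 128-bit SIMD register. */
typedef struct { float lane[4]; } vec4;

/* Two-operand, destructive form: dst = dst * src.
   The previous contents of dst are overwritten. */
void mul_destructive(vec4 *dst, const vec4 *src)
{
    for (size_t i = 0; i < 4; i++)
        dst->lane[i] *= src->lane[i];
}

/* Three-operand, non-destructive form: out = a * b.
   Both inputs are left intact. */
void mul_three_operand(vec4 *out, const vec4 *a, const vec4 *b)
{
    for (size_t i = 0; i < 4; i++)
        out->lane[i] = a->lane[i] * b->lane[i];
}

/* Compute a*b and a*c while keeping a, b, and c live.
   Returns the number of extra register copies this sketch had to make. */
int two_products_destructive(const vec4 *a, const vec4 *b, const vec4 *c,
                             vec4 *ab, vec4 *ac)
{
    *ab = *a;                 /* copy 1: a would otherwise be destroyed */
    mul_destructive(ab, b);
    *ac = *a;                 /* copy 2 */
    mul_destructive(ac, c);
    return 2;
}

/* Same result with the three-operand form: no preparatory copies. */
int two_products_three_operand(const vec4 *a, const vec4 *b, const vec4 *c,
                               vec4 *ab, vec4 *ac)
{
    mul_three_operand(ab, a, b);
    mul_three_operand(ac, a, c);
    return 0;
}
```

Both routines produce identical products; the difference is only the bookkeeping needed to keep the inputs alive, which is the convenience being described.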


Bernier: Will this improve how fast 3D rendering is done on the Mac (especially Quake 3 Arena)?

Carmack: Apple has AltiVec code in their upcoming OpenGL, but I don't have benchmark results yet. It will definitely be faster, but I can't say how much. In many cases, the B&W Macs are pixel fill limited, so you may have to go to a lower resolution to see any speedup. The first generation of G4 systems don't have the new memory controller, so they won't be taking much advantage of the increased G4 bandwidth.


Bernier: Why do fast Macs (B&W G3), even though they often beat Pentium machines in Photoshop head-to-head, trail behind PC machines in 3D games?

Carmack: Photoshop filters are excellent optimization testbeds, but they are rarely apples to apples comparisons because of their hand tuned nature. I'm pretty sure that if Intel put some of their top guys on the task, they could be competitive with PPC on most Photoshop filters. There are certain tasks that AltiVec can be much better at, but in general, the raw computational rates are pretty close.

The positive side is that it is just easier to write maximum performance PPC assembly than Intel assembly, especially for floating point. That can often be a very important distinction. For straight compiled code, the G4 doesn't have as strong of benefits, although the better memory bandwidth will still be a win.

Don't believe any hype about the G4 being "three times as fast" as a Pentium III. It isn't. It is somewhat faster per clock on average, but definitely not twice as fast, and it trails the fastest x86 machines by quite a few MHz right now, and that isn't likely to change due to its short pipeline length.

The G4 is great. I like it. Everyone should keep their claims rational.


Bernier: In other words, where is the bottleneck? (Quake 3 Arena, for example, tops out at 42 fps on a B&W G3 450 with a Voodoo 3 2000, while a PII 450 or a PIII with similar settings does much better.) Is it because the PPC G3 can't get as much memory bandwidth as you can with the Intel Pentium Pro (and beyond) 'write-combining' cache control mode?

Carmack: The Voodoo case is almost completely due to the lack of write combining. Intel write combining is an almost perfect way to program cards like the Voodoo.

The ATI case is a bit more complicated. The Rage128 card in the B&W Macs is running twenty or thirty MHz slower than the Rage128 you get for the PC. Apple started getting Rage128 cards before ATI had worked out all the production issues. The add-in cards ATI sells operate at the same rate as the Windows cards. The Rage128's preferred means of command transport leans on AGP a bit, so they have to use a less optimal method with a PCI card.

The 66 MHz PCI interface just isn't as good as an AGP interface. The ATI Windows driver supports direct page flipping when in fullscreen mode, which gives it a small performance edge.

The ATI Windows driver supports add mode multitexture combining, which makes a couple things (skies, notably) somewhat faster. A declocked PCI Rage128 in a 400 MHz Windows system performs about the same as a B&W Mac, but with a normal setup it goes noticeably faster.

The RagePro Mac driver is operating synchronously, instead of with DMA buffers, which is why the RagePro Windows driver is half again as fast. It needs to be rewritten.


Bernier: Does the new G4 have more restrictions on data alignment than write combining?

Carmack: It's difficult to answer that without getting very technical. It's not exactly an apples to apples comparison.


Bernier: Will the G4's higher memory throughput (2x?) make things better?

Carmack: The higher memory throughput should help all programs, which is the best type of improvement to make, but it doesn't completely make up for write combining on those tasks that it is uniquely suited for.
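
Write combining comes up in several of the answers above. As a rough illustration only, the C sketch below counts bus transactions for a stream of 4-byte stores, with and without a combining buffer that merges stores falling in the same aligned block. The 32-byte block size, the function name count_bus_writes, and the single-buffer model are assumptions made for this sketch; real hardware has several buffers and additional flush conditions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed combining-buffer size in bytes, for illustration only. */
#define WC_LINE_BYTES 32u

/* Count bus transactions for n 4-byte stores at the given byte addresses.
   combine == 0: every store goes out on the bus by itself.
   combine != 0: consecutive stores to the same aligned WC_LINE_BYTES block
                 are merged and leave the CPU as a single burst. */
size_t count_bus_writes(const uint32_t *addrs, size_t n, int combine)
{
    if (!combine)
        return n;

    size_t transactions = 0;
    int have_line = 0;
    uint32_t line = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t this_line = addrs[i] / WC_LINE_BYTES;
        if (!have_line || this_line != line) {
            /* A store to a different block flushes the old buffer
               and opens a new one: one more burst on the bus. */
            transactions++;
            line = this_line;
            have_line = 1;
        }
    }
    return transactions;
}
```

Under these assumptions, 64 sequential 4-byte stores cost 64 transactions uncombined and 8 combined, while stores spaced 32 bytes apart gain nothing. That is one way to see why a card fed by long linear writes benefits so much.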


Bernier: I was surprised to see an article at Tom's Hardware that concluded that AGP 2x was not any faster than PCI on a PC machine. Do you think the AGP situation on the G4 will be different?

Carmack: It depends on the motherboard chipset, the 3D card, and the operations you are performing. For many existing games, the bandwidth requirements aren't always enough to stress even normal PCI, but that is definitely changing.

A good PCI implementation can be better than a bad AGP implementation, but two to four times the theoretical bandwidth (and other benefits) should never be dismissed.

Just this last weekend, I was testing PCI vs AGP in the Matrox G200 Linux driver. AGP was twice as fast (although still not close to what it should be able to do in theory), and didn't slow the main CPU with PCI snoop cycles.

The last time I talked to Apple engineers about the new motherboard, they were very confident about the memory subsystem performance, but we didn't go in depth on the AGP system.
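
The "two to four times the theoretical bandwidth" figure in the answer above can be checked with simple arithmetic: peak bandwidth is bus width times clock rate times transfers per clock. The sketch below uses nominal figures (32-bit buses, 33 MHz for standard PCI, 66 MHz for AGP). Which buses Carmack had in mind is an assumption here, and peak_mb_per_s is a hypothetical helper name.

```c
/* Peak theoretical bandwidth in MB/s:
   bytes per transfer * clock in MHz * transfers per clock.
   Nominal clocks are rounded to whole MHz, so results are approximate. */
unsigned peak_mb_per_s(unsigned bus_bytes, unsigned clock_mhz,
                       unsigned transfers_per_clock)
{
    return bus_bytes * clock_mhz * transfers_per_clock;
}
```

With these inputs, standard PCI is peak_mb_per_s(4, 33, 1), AGP 1x is peak_mb_per_s(4, 66, 1), and AGP 2x is peak_mb_per_s(4, 66, 2), which gives the 2x and 4x ratios over standard PCI. These are ceilings, not measured rates.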


Bernier: Are you planning to release a version of Quake III Arena that takes advantage of AltiVec?

Carmack: One of the major technical directions for Q3 was to focus as much time as possible into a specific OpenGL path. We spend 50% of our execution time there. This path is amenable to SIMD optimizations for PIII/K6/AltiVec in drivers, and for hardware geometry acceleration. Apple is pursuing the driver optimizations. There are no other hot spots in Q3 where hand tuned optimization would pay off noticeably.

I am considering some code style changes for future work that may make it more reasonable to (optionally) allow compiler vector types for some speedup throughout the code, but an early first pass at it was a big mess, and the performance benefits are unproven.

Maybe I'll write AltiVec code for the new bone based models just for the experience, but the possible speedup would only be a few percent.
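
To give a feel for what "compiler vector types" look like in practice, here is a small sketch that blends a vertex position between two bone-transformed positions, once with ordinary float arrays and once with a vector type. It is not Quake 3 code. It assumes the GCC/Clang vector_size extension rather than AltiVec's own vector float type, and the names float4, blend_scalar, and blend_vector are made up for the example.

```c
/* GCC/Clang vector extension: four floats handled as a single value.
   This is compiler-specific, which is exactly the trade-off discussed above. */
typedef float float4 __attribute__((vector_size(16)));

/* Plain scalar version: out = p0 * w0 + p1 * (1 - w0), one lane at a time. */
void blend_scalar(const float p0[4], const float p1[4], float w0, float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = p0[i] * w0 + p1[i] * (1.0f - w0);
}

/* Vector-type version: the same arithmetic written once for all four lanes.
   The compiler is free to map it onto whatever SIMD unit the target has. */
float4 blend_vector(float4 p0, float4 p1, float w0)
{
    float4 w  = { w0, w0, w0, w0 };
    float4 iw = { 1.0f - w0, 1.0f - w0, 1.0f - w0, 1.0f - w0 };
    return p0 * w + p1 * iw;
}
```

The two functions compute the same thing; the vector version simply moves the per-lane loop into the type, at the cost of tying the source to compilers that support the extension.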


Bernier: I think Apple has made remarkable changes in the last 2 years but people keep saying Mac users need an affordable "gaming Mac". If you were Steve Jobs, what would you do about that?

Carmack: The base iMac should have:

  • 64 meg of ram
  • Rage128 graphics
  • Better speakers
  • Better mouse

That isn't particularly biased towards gaming, it would just be a solid system without any holes. The graphics definitely need to be improved before I can recommend an iMac to someone that wants to play games.

John Carmack

Rank
Co-founder, owner and lead programmer

Mission
Develop core technology for all id titles

Active Duty
1991-Present

Special Training
Self taught. Consumes all available knowledge relating to 3D programming

Background
Somehow ended up at Softdisk.

Off-Duty
Ferraris, Ferraris, Ferraris

Quote
"If you don't care enough to have something of your own to say, they shouldn't be quoting you."