# -------------------------------------------------------------------
# OBJECTPROCESSOR              (c) Copyright 1995-1996 Nat! & KKP
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and Nat! found
# out about the Jaguar. Since we are not under NDA or anything from
# Atari we feel free to give this to you for educational purposes
# only.
#
# Please note that this is not official documentation from Atari
# or derived work thereof (neither of us has ever seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#    nat@zumdick.rhein-main.de
# or
#    kkp@gamma.dou.dk
#
# If you could do us a small favor, don't use this information for
# those lame flamewars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
# $Id: op.txt,v 1.10 1996/01/28 20:23:20 nat Exp $
#
# If there are two theories I put the more likely one first.
# -------------------------------------------------------------------

Things to know about the Objectprocessor (OP):

-1  Imagine a phrase being an entity of 64 bits (or 8 bytes for that
    matter).

0.  The object list is a linked list.

1.  The object list is traversed by the object processor for each!
    scanline.

2.  The Objectprocessor probably works like this:
    Whenever a new scanline needs to be displayed, the objectprocessor
    provides a linebuffer to the videosystem. While the videosystem is
    busy displaying this, the OP readies the next scanline. (It uses a
    doublebuffering strategy.) It does this by traversing the objectlist
    and interpreting each object in sequence. Each object gets the
    chance ONCE per scanline to fill the linebuffer. It fills the
    linebuffer at a specified horizontal position for a specified width.
    The data in the linebuffer is always overwritten (except when the
    Read-Modify-Write bit is set). If the active object has the
    transparent bit set, it will not overwrite values in the linebuffer
    when its source pixel has the value zero. The 'transparency' check
    is done before looking up the pixel's color in the CLUT (1 - 256
    color modes).

2.1 The sooner an object appears in the list, the further in the
    background it appears. The linebuffer is initialized with the
    linebuffer background color (BG) before the objectprocessor starts
    filling the linebuffer. One may also assume that the OP normally
    traverses the linebuffer from left to right, except when the
    horizontal flip bit is set. (Very useful information indeed! (har) )
    Each bitmap object is made up of pixels. These pixels can either
    contain the color itself (direct) as in CrY and TrueColor modes or
    be an index into a Colorlookuptable (indirect).

2.2 We assume that the OP writes into the linebuffer locally, so that
    the objectdata is read over the bus, but not written into the
    linebuffer over the bus (which would be way evil).

2.3 The videosystem can deal with 16bit RGB/Crycolor and 24bit RGB
    pixels. The size of the pixels the OP writes into the linebuffer
    and pulls out of the CLUT depends on the pixeltype chosen for the
    videosystem.

2.4 The objects in the objectlist are *modified* by the OP. This means
    that an object list is only good for one frame. You need to
    continually refresh your object list each VBLANK.

3.  ...

4.  ...

5.  The last object must be a STOP object.

6.  The Objectlist must be doublephrase aligned. This means that the
    lower nybble of the address must be zero.

7.  The address of the image of an object must be (as expected) phrase
    aligned (zero in the lower 3 bits).

8.  There are five different objects that the Objectprocessor knows
    about. These are:

      1. Bitmapped Object
      2. Scaled bitmapped object
      3. GPU-Object (Calls the GPU to do the displaying ?? )
      4. Branchobject
      5.
         Stopobject (marks the end of the object list)

    The objects have different sizes. The minimum size of an object is
    a "phrase".

      Object type    Number    Size in phrases
      -----------------------------------------
      BITMAP         0         2
      SCALE          1         3 (4?)
      GPU            2         1
      BRANCH         3         1
      STOP           4         1

    It looks like you need to pad your scale objects to four phrases...

9   To keep the Objectprocessor from fetching data (and wasting
    bandwidth) during the VBLANK you usually put two branch objects at
    the beginning of the display list. They branch to the stop object
    if the first displayable scanline has not been reached or the last
    displayable scanline has already been displayed.

10. Just reading concurrently from the linebuffers while the OP is
    displaying data produces glitches. Advice: Stay out of them!

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
10  This is what a branch object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0:

63       56        48        40       32       24       16       8    3   0
+--------^---------^-----+---^--------^--------+--------+--+-----^----+---+
|         unused         |    Link-address     | unknown|CC|   VCnt   |011|
+------------------------+---------------------+--------+--+----------+---+
  63 .............43       42..........24       23...16 15.14 13 .... 3

The branch objects are used to compare the current scanline with the
value stored in the branch object. Depending on the branch object's
comparison mode, the branch is taken either on < == != or >.
A taken branch uses the Link-address and continues at the phrase-indexed
object it points to. If the comparison fails, the OP simply examines and
handles the next object in the list.

   VCnt:  This is the value the vertical scanline counter (VC) is
          compared with.
          For CC code 10 the operation goes:

             if( object->VCnt < VC)
                goto object->link;

   Conditioncodes:

      Values    Comparison/Branch
      ------------------------------------------------
      000       Branch on equal         (VCnt == VC)
      001       Branch on less than     (VCnt > VC)
      010       Branch on greater than  (VCnt < VC)

[...]

+--------+---------+------+-----+---------+---------+---+---+-------------+
| unused | 1stpix  |flags | idx | iwidth  | dwidth  | p | d |    x-pos    |
+--------+---------+------+-----+---------+---------+---+---+-------------+
 63...55  54....49  48..45 44.38  37...28   27..18 17.15 14.12  11.....0
           6bit      4bit   7bit   10bit     10bit  3bit  3bit   12bit

Curiously there seem to be some unused bits in the top half of this
second phrase. Anyway starting from the left:

   firstpix: Pixels to skip
   flags:    How to handle the source data
   index:    Index into the CLUT
   iwidth:   Width of the image
   dwidth:   Offset to the next line of the image
   pitch:    Increment for the Datapointer
   depth:    Pixeldepth of the bitmap
   x-pos:    Horizontal position of the object

   1stpix:
      This is a field of 6 bits that contains the number of 'bits' to
      skip before fetching the first pixel. This must be used whenever
      your bitmap data isn't phrase aligned. Maybe most often used for
      CLUT modes. You get the value you want to write here by
      calculating:
         pixelindex * bits_per_pixel   (e.g. 8 for 256 color mode)

   flags:
      You can tell the Objectprocessor the way it should handle the
      display data. These are the values you set here:

      Bit0              Bit1              Bit2          Bit3
      --------------------------------------------------------------
      Horizontal Flip   ReadModifyWrite   Transparent   Release

      A few guesses as to what each flag does:

      Horizontal flip:
      Lets the Objectprocessor run its path from the other end of the
      spritedata, which should effectively flip your sprite data.

      ReadModifyWrite:
      The object processor reads the pixel from the line buffer, does
      something with the bitmap pixel value and the linebuffer pixel
      value and stores the result back into the linebuffer.

      Theory 1.
      For Crycolor the lower byte of the bitmap pixel value is sign
      extended and added to the lower byte of the linebuffer pixel
      value, thereby increasing or decreasing (depending on the sign)
      the intensity of the linebuffer pixel. This is a 'saturating add',
      meaning that you don't wrap around: subtractions stick at 0 and
      additions stick at 255.
      The cryhues (upper byte) are mangled even more strangely. The
      effect could (with the right values) be like looking through a
      colored glass (your bitmap object with the RMW-flag set) onto the
      background (the other bitmap objects below it).
      This might be similar to what happens when gouraudshading. Refer
      to the blitter docs.

      Theory 2.
      Both values are simply added together.

      Transparent:
      When the source pixel is zero, this pixel will not be written.
      This is the way to achieve transparent sprites with the GPU.
      (Both CLUT and non-CLUT pixels)

      Release:
      If cleared, the OP 'hogs' the bus for the time it takes to fetch
      the scanline data of the object. If this bit is set, the bustime
      is shared with other processors. If you have lotsa interrupts
      going, this might be worthwhile.

   index (idx):
      Index into the ColorLookUpTable (CLUT).
      This information is only used for 1, 2 or 4 bitplane objects, to
      determine the offset in the CLUT to use.

      1 bitplane        2 bitplane        4 bitplane
      -------------------------------------------------------
      iiiiiiii          iiiiii0           iiiii00

      The value is shifted left once and then used as an index into the
      CLUT. Note that in 2 + 4 bitplane modes not all bits are in use,
      because the lower bits are replaced with the pixel value.

      For example in 4-bits-per-pixel mode pixel #7 and an idx value
      of 64 gives you an index of (64*2)+7 -> 135

      So you preload the CLUT with the colors you want to use, for
      example green at index #241. When you want to display a small
      green arrow on the screen (as a pointer) for example, you set
      your object to transparent and the index to 120.
      When the object processor fetches a set pixel, it will write the
      green value into the linebuffer.

   iwidth:
      Tell the OP how many *phrases* to draw in each line. This is the
      actual number of phrases to draw, not the horizontal offset used
      to index the next line (dwidth). This is probably not just
      #pixels_to_draw * bits_per_pixel / 64, but rather the number of
      phrases the object spans. If a 32bit object spans two phrases
      you should enter a two here.

   dwidth:
      The horizontal phrase offset the OP should use to index to the
      next line. If your data is laid out in consecutive strips of
      horizontal data like this:

         screen :  00000000000
                   11111111111
                   22222222222
                   33333333333

         memory :  00000000000111111111112222222222233333333333

      then this will be just the same as iwidth. But if your data is
      laid out like this:

         00000000000xxxxx11111111111xxxxx22222222222xxxxx33333333333xxxxx

      you should set dwidth to the proper offset so that adding dwidth
      to the phrase-address will bring you to the next line. (This
      might be useful for 'horizontally scrolling' objects.)

   pitch (p):
      If you so desire you can organize your bitmap data in even
      stranger ways than one would think possible. With this value you
      control the datapointer that the OP uses to traverse your bitmap
      data. This value is added to the datapointer after each fetch.
      If you use a 0 you will always be fetching the same phrase over
      and over again. Normally you set pitch to 1, to advance through
      memory contiguously.

   depth (d):
      The number of bits of each pixel. This specifies the rez of the
      object. You have the choice between direct pixel modes (16 or
      24/32 bits) and indirect (CLUT) pixel modes. Note that using
      transparency effectively reduces the number of available colors
      by one (color #0).
      Values:
         0     1 bits per pixel         2 colors   CLUT
         1     2 bits per pixel         4 colors   CLUT
         2     4 bits per pixel        16 colors   CLUT
         3     8 bits per pixel       256 colors   CLUT
         4    16 bits per pixel     65536 colors   CRY
         5    24 bits per pixel    16 Mio Colors   TrueColor
         6    unused
         7    unused

   xpos:
      The horizontal position of the object on the screen (or in the
      linebuffer if you will).

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
13. This is what a scaled bitmap object looks like.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 3):

63       56        48        40       32       24       16       8    3   0
+--------^---------^-----+---^--------^--------+--------^--+-----^----+---+
|      Data-address      |    Link-address     |  Height   |   YPos   |001|
+------------------------+---------------------+-----------+----------+---+
  63 .............43       42..........24       23 ..... 14 13 ..... 3 2.0
       21 bits                 19 bits            10 bits    11 bits  3 bits

Except for the type, which is different, this is just the same as the
first phrase of the bitmap (non-scaled) object.

Phrase #1 (2 of 3):

This is the same as the second phrase of the 'bitmapped' object.

Phrase #2 (3 of 3):

63       56        48        40       32       24       16       8        0
+--------^---------^---------^--------^--------+--------+--------+--------+
|                    unused                    | remain | VScale | HScale |
+----------------------------------------------+--------+--------+--------+
                                                23...16   15...8   7...0

   remainder: Keeps the VScale remainder ***DESTROYED BY THE OP***
   v-scale:   Vertical scaling factor
   h-scale:   Horizontal scaling factor

The scale is a fractional representation, using 3 bits for the integer
part and 5 bits for the fractional part. Or in ASCII-Graphics:

   76543210     00100000  or  0x20  is  1.0
   iiifffff     00010000  or  0x10  is  0.5

The remainder is used by the objectprocessor as scratch storage for the
vertical scaling. You should initialize it to 0.5 for best results,
although in a lot of democode it's initialized to 1.0.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
14.
    The elusive GPU-object
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 1):

63       56        48        40       32       24       16       8    3   0
+--------^---------^---------^--------^--------^--------^--------^----+---+
|                              datafield                              |010|
+---------------------------------------------------------------------+---+

The GPU object gets an interrupt; it is believed that the OP is not
halted because of this action. You might want to stuff some information
into the datafield, which the GPU could then read from the OLP
registers. But what for ?

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
15  You can also look at the objects in terms of C-structs. This is
    what they'd look like.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

/* DON'T USE THESE BITFIELDS WITH ANYTHING OTHER THAN A ***GOOD***
   C-COMPILER AND A MOTOROLA PROCESSOR */

#define byte    unsigned char
#define word    unsigned short
#define lword   unsigned long
#define phrase  unsigned long long

typedef struct
{
   lword    data:21;
   lword    link:19;
   word     height:10;
   word     ypos:11;
   word     type:3;
} bitmap_obj_phrase_0;

typedef struct
{
   word     unused:9;
   word     firstpix:6;
   word     flags:4;
   word     index:7;
   word     iwidth:10;
   word     dwidth:10;
   word     pitch:3;
   word     depth:3;
   word     x_pos:12;
} bitmap_obj_phrase_1;

typedef struct
{
   phrase   unused:40;       /* bits 63..24, so the fields sum to 64 */
   word     remainder:8;
   word     v_scale:8;
   word     h_scale:8;
} scale_obj_phrase_2;

typedef struct
{
   lword    unused:21;
   lword    link:19;
   word     unknown:8;       /* bits 23..16, maybe index to register ? */
   word     conditioncode:2; /* bits 15..14 */
   word     ypos:11;
   word     type:3;
} branch_obj_phrase_0;

typedef struct
{
   phrase   unused:61;
   word     type:3;
} stop_obj_phrase_0;

typedef struct
{
   phrase   unknown:61;
   word     type:3;
} gpu_obj_phrase_0;

typedef struct
{
   stop_obj_phrase_0    p0;
} stop_obj;

typedef struct
{
   branch_obj_phrase_0  p0;
} branch_obj;

typedef struct
{
   gpu_obj_phrase_0     p0;
} gpu_obj;

typedef struct
{
   bitmap_obj_phrase_0  p0;
   bitmap_obj_phrase_1  p1;
} bitmap_obj;

typedef struct
{
   bitmap_obj_phrase_0  p0;
   bitmap_obj_phrase_1  p1;
   scale_obj_phrase_2   p2;
   /* need one padding phrase ? */
} scale_obj;


SMALL DISCUSSION:

Since the object processor walks the object list for each scanline, you
should consider the following:

If you have 64 bitmap objects in your object list, a vertical rez of
240 lines and a refreshrate of 60Hz, the Objectprocessor is pulling

   60 hz * 240 lines * 64 objects * 2 phrases = 1.8 Mio phrases/s
                                              ~ 14.7 Mio bytes/s

for the object processor list alone! (ca. 14% of the system's bandwidth)

If you figure you're using 128x128x16bit sprites, fully visible, you're
doing:

   128x128*16bits/64bits   = 4096 phrases a sprite
   64 sprites at 60hz      = 3840 sprites/s
   yields 15728640 phrases/s or 120 Mbytes/s

So it is fairly easy to unknowingly saturate the bus with a nice object
list... It should be obvious that non-"truecolor" sprites still make
lotsa sense when you're using the OP heavily.

It would have been better in our opinion if Atari had used a small
2-Kbit hitbuffer (or single bit Z-Buffer) and reversed the object order,
so that the nearest object comes first and the background last in the
object list. With such a slightly more complicated scheme, the OP could
run at a rather constant:

   hrez * vrez * refresh * average_bits_per_pixel
   ----------------------------------------------  phrases/s
                        64


NEEDED STUFF:

Need to document the logic for setting up objects that cross boundaries
(especially the scaled bitmaps).
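
WORKED EXAMPLE (a sketch, not from Atari and not verified on hardware):

A few of the calculations described in this text, written as small C
helpers so the numbers can be checked. All function names here are made
up for illustration. The CLUT rule assumes that the two worked examples
in the "index (idx)" description (idx shifted left once, pixel value
added) hold in general, which is a guess. The scale helper assumes the
3.5 fixed-point format where 0x20 is 1.0.

```c
#include <assert.h>
#include <stdint.h>

/* CLUT entry selected by an object's idx field and a pixel value:
   idx is shifted left once and the pixel value is added.
   (Assumption taken from the text's examples.) */
unsigned clut_index(unsigned idx, unsigned pixel)
{
    return (idx << 1) + pixel;
}

/* Encode a scale factor as 3 integer bits + 5 fraction bits,
   so 1.0 -> 0x20 and 0.5 -> 0x10. Rounds to nearest and clamps
   to the largest value that fits into 8 bits. */
uint8_t scale_3_5(double factor)
{
    unsigned v;

    assert(factor >= 0.0 && factor < 8.0);
    v = (unsigned)(factor * 32.0 + 0.5);
    if (v > 255u)
        v = 255u;
    return (uint8_t)v;
}

/* Phrases of image data in a bitmap of the given size and depth. */
unsigned long bitmap_phrases(unsigned long width, unsigned long height,
                             unsigned long bits_per_pixel)
{
    return width * height * bits_per_pixel / 64ul;
}

/* Phrases per second fetched for the object list alone, given that
   the OP walks the whole list once per scanline. */
unsigned long list_phrases_per_sec(unsigned long refresh_hz,
                                   unsigned long lines,
                                   unsigned long objects,
                                   unsigned long phrases_per_object)
{
    return refresh_hz * lines * objects * phrases_per_object;
}
```

With the figures used in the text, clut_index(64, 7) gives 135 and
clut_index(120, 1) gives 241, scale_3_5(1.0) gives 0x20, a 128x128x16bit
sprite takes 4096 phrases, and list_phrases_per_sec(60, 240, 64, 2)
gives 1843200 phrases/s (times 8 for bytes/s).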