Graphics modes 9++ and 10++
===========================

This text is a translation of the
original article published
in the Polish magazine "Atarynka",
no. 2/2002.

  The most often used mode is "Konop's
  mode", used for the first time
  by Konop/Shadows in "Asskicker".
  It has been used for years,
  and probably there's no other mode
  as fast and with similar screen
  parameters (square pixels, small
  screen memory, 16 shades [or other
  GTIA mode]).
                     lewiS/AiDS
               "32 shades in mode 9"
                (article published
             in "Atarynka" no. 1/2002)

In this article I'll present a way
of obtaining a mode which differs
from "Konop's mode" (also known as 9+)
by the lack of blank lines, and which
is called, depending on the GTIA mode
used, 9++ or 10++.

Advantages of the new mode
--------------------------
1. It's faster than the one with
   blank lines (9+).
2. The Display List is significantly
   shorter (every mode line is created
   by a single DL instruction).
3. Since the DL for this mode can load
   ANTIC's screen memory pointer only
   once, you don't need two DLs for
   double-buffering; you may just
   alter the screen address in the DL.
4. It's easy to set a different height
   (up to 16 scanlines) for a line
   of mode (e.g. 3, 5 or 6 instead
   of 4).
5. It looks better - it's a matter
   of taste, but in my opinion colors
   in GTIA 10 mode look much better
   - the blank lines lower
   the brightness and saturation.

Disadvantages
-------------
1. This mode makes ANTIC eat fewer
   cycles, at the cost of some
   additional work for the 6502.
   There's not much of this work for
   the CPU; unfortunately it must be
   properly synchronized with
   the display. It means that
   in practice it's not easy to get
   the first advantage, i.e. it
   requires additional programming
   effort.
2. It's known that there are problems
   with displaying graphics in
   the last, 240th scanline. This isn't
   a problem for 9+ mode, because
   you may put JVB at the end of the DL
   to get 60 lines.
   In modes 9++/10++
   we may stay with 59 lines or...
   use advantage 4 and lower
   the height of the top or the bottom
   line to 3.

Idea
----
The idea is simple - let's make ANTIC
display the same line a few times,
using its internal memory. This is
what happens in mode 8 (system
GRAPHICS 3), where the data comes from
main memory only in the first scanline,
and then from ANTIC's own memory.

Theory
------
ANTIC contains an internal 4-bit
register, DCTR, which counts scanlines.
Normally, for every line of mode it
counts from zero to a value specific
to the mode (in our case of ANTIC 15
mode it's zero). It's different after
enabling vertical scrolling. In the
first scrolled line DCTR counts from
VSCROL to 0, in the subsequent ones
normally, and in the last one from
0 to VSCROL. "Counts from a to b"
means: first DCTR is loaded with a,
and near the end of every scanline
DCTR is compared with b; if they are
equal, the next DL instruction will be
fetched; otherwise DCTR is incremented
by 1.
Consider the following DL:
$2f
$0f
and VSCROL=13 (decimal).
We will see:
- first line 4 times (DCTR values
  13,14,15,0 - DCTR wraps from 15
  to 0, because it's 4-bit)
- second line 14 times (DCTR values
  from 0 to 13)
Similarly for VSCROL=3:
- first line 14 times
- second line 4 times
What is important, the screen data
repeated in several scanlines is
fetched from main memory only once.

Obtaining the mode
------------------
As should now be easy to figure out,
to have each line shown 4 times,
we create the following Display List:
b($6f),a(screen)
b($0f)
b($2f)
b($0f)
...
and only have to update the VSCROL
register at the proper moments.

The following picture should help in
understanding what happens
(in this English translation
I'll just use ASCII art, I hope
it's readable :-) ).

DCTR
 13 @-------------------------
 14 --------------------------
 15 --------------------------
  0 --------------------------
  0 ==========================
  1 ==========================
  2 ==========================
  3 ====!==================*##
      \--visible screen--/

'-' - first line of mode
      (bit 5 in DL is set)
'=' - second line of mode
      (bit 5 in DL is clear)
'@' - here VSCROL must contain 13
'!' - here the DLI starts
'*' - here VSCROL must contain 3
'#' - here we put 13 into VSCROL

This picture shows two lines
of mode, i.e. 8 scanlines.
It also shows the timing (not
to scale) - as time goes
by, each scanline is displayed
from left to right.

First VSCROL must contain 13 decimal
(it's a 4-bit register, so the high
nibble of the value written to $d405
may be anything). As soon as ANTIC
loads DCTR with this value, the first
line of mode is guaranteed
to be displayed correctly. Then
the second line will be shown
and only near the end of its last
scanline is VSCROL required to
contain 3. So we have plenty of time
to write 3 to VSCROL. We may do it
e.g.
on DLI in the first line.
It's much worse with preparing
to display the third line of mode.
As you can see in the picture ('#'),
there's little time to put 13 into
VSCROL (just a few cycles).

Possible implementations
------------------------
We may update VSCROL:
a) using a DLI,
b) inside the unrolled code of our
   effect,
c) using one of the POKEY timer IRQs.

Implementation a) is undoubtedly
the easiest; unfortunately it's also
the slowest one. Implementation b)
is optimal; unfortunately
it can be applied only to simple
effects, like bump mapping.
Implementation c) could be superior
to a), because POKEY timers may
be set with single-cycle accuracy,
so we don't need to waste time
in the interrupt routine.
Unfortunately this implementation
is impractical (in a game or demo),
because POKEY timers are also used
for generating sound and we don't
want to sacrifice music quality
(fewer channels) only to enable
a new graphics mode.

DLI interrupt routine
---------------------
We want it fast. It turns out that
a DLI every second line of mode,
i.e. every 8 scanlines, is enough.
We enable it in DL opcodes with
bit 5 clear. The interrupt
starts near the beginning of the last
scanline (see '!' in the picture
above). Normally the NMI service
routine checks the interrupt type (DLI
or VBLKI) and saves the registers
(at least the accumulator).
Nevertheless we still have much time
before writing 13 to VSCROL.
The easiest way to waste this time
(and ensure proper synchronization)
is to use WSYNC.
Then we just write
13 to VSCROL, and it turns out that
by then it's late enough to write 3
there as well.
Here comes an example:

    lda #<dli
    sta $200     ; VDSLST: DLI vector
    lda #>dli
    sta $201
    lda #$22
    sta $22f     ; SDMCTL: normal width
    lda #<dl
    sta $230     ; SDLSTL: DL address
    lda #>dl
    sta $231
    lda #$40
    sta $26f     ; GPRIOR: GTIA mode 9
    lda #$c0
    sta $d40e    ; NMIEN: DLI and VBLKI
    jmp *

dli pha
    sta $d40a    ; WSYNC: wait for the
                 ; end of the scanline
    lda #13
    sta $d405    ; VSCROL for '-' line
    lda #3
    sta $d405    ; VSCROL for '=' line
    pla
    rti

; 2 blank lines, 1 line of mode
dl  db $90,$6f
    dw $f000
; 29 times $8f,$2f => 58 lines
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f,$8f,$2f,$8f,$2f,$8f,$2f
    db $8f,$2f
    db $41
    dw dl

What's not obvious at first sight
is why we write 3 to VSCROL at the end
of the DLI, and not before writing
to $d40a.
As shown in the picture, VSCROL
in the second line of mode is checked
near the end of the scanline. To be
exact, that is where ANTIC checks
whether the current DL opcode should
end its execution.
But when we use a DLI, which is called
in the last scanline of a DL opcode,
there's a second (first in the order
of execution) comparison of DCTR
with VSCROL. Thus VSCROL must already
contain 3 when the interrupt starts.

How many cycles do we gain?
---------------------------
It's an important question, because
it was mentioned as the first
advantage of the new mode.
A single line of 9+ mode can look
in the DL as follows:
    db $0f,$00,$4f
    dw screen
    db $00
Total: 6 cycles for DL
+ (64 or 80) cycles for screen
(depending on width).
For 9++ mode it's only one DL opcode
(i.e. 1 cycle) + (32 or 40) cycles
for screen. Additionally the 6502 has
to update VSCROL, which takes
6 cycles. Counting the whole screen
(2 blank lines, 59 lines of mode,
screen address in first line, JVB)
we get:
Narrow screen:
9+: 1+59*(6+64)+2+3=4136
9++: 1+59*(1+32+6)+2+3=2307
Normal screen:
9+: 1+59*(6+80)+2+3=5080
9++: 1+59*(1+40+6)+2+3=2779
Roughly speaking the gain is about
6-7% of CPU time. Unfortunately it's
true only for the optimal
implementation, i.e.
updating VSCROL
in unrolled code. If we use a DLI,
as in the example, the new 9++ mode
turns out a bit slower than the old 9+
(but still much faster than a mode
with no blank lines, done only
with the DL). The reason is the time
we waste in STA $d40a. A good
(but hard) solution is to make use
of this time for some calculations.

Don't forget about the other advantages
of 9++ mode, which can make it better
for your needs than a traditional
DL-only mode. When using a DLI, there's
one more advantage: you can change
the background color in different
lines at almost no cost.

          Piotr Fusik (Fox/Taquart)
               <fox@scene.pl>

P.S. You may use the described VSCROL
method to get other interesting modes,
e.g. a hardware-supported 40x40 text
mode.
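
As a sketch of that (and of
advantage 4), here is how the DLI
from the example might look for lines
5 scanlines high. It is untested and
only follows from the Theory section:
for a height of n scanlines the first
line must count from (17-n) AND 15
to 0, and the second from 0 to n-1,
so for n=5 we need 12 and 4.
The names vs1 and vs2 are made up
for this sketch, and I assume your
assembler accepts equ.

; values for n=5 (assumed, see above)
vs1 equ 12       ; (17-n) AND 15
vs2 equ 4        ; n-1

dli pha
    sta $d40a    ; WSYNC, as before
    lda #vs1
    sta $d405    ; VSCROL for '-' line
    lda #vs2
    sta $d405    ; VSCROL for '=' line
    pla
    rti

The DL from the example would have
to be shortened as well, because
59 lines of this height don't fit
on the screen.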