=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

			     AM/FM TECHCORNER

			The magic of "Octa"-sound

			Written by Teijo Kinnunen.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

To all TechCorner readers:

So far I have not received a single letter from you, so I would really like
to hear your comments about TechCorner. I would also be glad to hear your
suggestions about what I should cover in future TechCorners (I'll soon run
out of ideas!). You're also welcome to send any questions concerning
audio/music programming, which I'll attempt to answer in the next
TechCorner.

My address is:

	Teijo Kinnunen
	Oksantie 19
	SF-86300  OULAINEN
	FINLAND

(I'm sorry, but I probably won't have time to reply individually.)

or FidoNet: 2:228/402

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
And back to the point...

As you most likely know, there are music programs that can "split" the audio
channels, giving up to eight independent channels. Examples are Oktalyzer,
StarTrekker and OctaMED, OctaMED being the best (advertisement!!).

All these programs use a similar method to produce the sound. They also have
similar restrictions, for example:

	* heavy CPU load
	* volume control not possible on channel-by-channel basis
	* rough looping resolution (a workaround could be possible, but quite
	  complex)
	* decreased sound quality

Other means to produce eight-channel sound could be possible, but I will
describe the method all of the above programs use.

The "magic" method is simply to mix two samples into one, which is then played
out. It has to happen in real time, though. The critical point is that the
samples don't usually have the same playback rate. The mixing routine has to
remove or double bytes in order to achieve the correct playback frequency,
and this degrades the sound quality.
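
To make the "remove or double bytes" idea concrete, here is a minimal sketch in
C (not the OctaMED code). The function name and the floating-point step are my
own assumptions, chosen for readability only; the real routine shown later
uses integer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Nearest-sample rate conversion (hypothetical helper).
 * `step` is the number of source bytes to advance per output byte:
 * step > 1.0 drops source bytes, step < 1.0 repeats them.
 * The caller must make sure (n - 1) * step stays inside `src`. */
void resample_nearest(const int8_t *src, double step, int8_t *dst, size_t n)
{
    double pos = 0.0;                    /* position in the source sample */
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[(size_t)pos];       /* integer part selects the byte */
        pos += step;
    }
}
```

With a step of 1.5 the source bytes 0, 1, 3, 4 are read (byte 2 is dropped);
with a step of 0.75 the bytes 0, 0, 1, 2, 3 are read (byte 0 is doubled).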

OctaMED uses two buffers per channel. The samples are mixed into a buffer,
which is then played at a constant frequency (period). While one buffer is
being played, the other one is being filled. When the first buffer has been
played, the second one starts playing and the first is refilled. This
technique is called double-buffering.
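
The buffer bookkeeping can be sketched in C as follows. This is only a model
under my own assumptions: the struct and function names are hypothetical, and
the 200-byte size is simply the one used in the listing later in this article.

```c
#include <stdint.h>

#define BUF_LEN 200                 /* bytes per buffer, as in the listing */

/* Hypothetical model of one double-buffered output channel. */
struct dbuf {
    int8_t buf[2][BUF_LEN];
    int    playing;                 /* index (0 or 1) of the buffer being played */
};

/* Called when the playing buffer has run out. The buffer that was filled
 * in the meantime becomes the playing one; the return value is the buffer
 * that is now free to be refilled. */
int8_t *dbuf_swap(struct dbuf *d)
{
    d->playing ^= 1;
    return d->buf[d->playing ^ 1];
}
```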

As already mentioned, the output rate is fixed. Naturally it has to be slow
enough that one buffer can be filled before the other has been played. On
7 MHz 68000 machines a good output period is approx. H-2/C-3 (OctaMED uses
period 227 (non-HQ)). If not all four channels are split, or a fast processor
is being used, there's more time for filling the buffers, and a higher output
frequency (that is, a smaller period) can be used (in HQ-mode, OctaMED uses
the highest possible frequency, period 124). The higher the output frequency,
the better the sound quality.
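
As a worked example of how a period translates into an output rate and a time
budget, here is a small C sketch. It assumes a PAL machine (audio clock
3546895 Hz); the function names are hypothetical. Note that the rate is
clock / period, so a smaller period means a higher frequency.

```c
/* Assumption: PAL Amiga. The audio period counts ticks of this clock. */
#define PAULA_CLOCK_PAL 3546895.0   /* Hz */

/* Output rate in samples per second for a given hardware period. */
double period_to_hz(unsigned period)
{
    return PAULA_CLOCK_PAL / period;
}

/* Time in milliseconds that `bytes` samples last at a given period,
 * i.e. how long the CPU has to fill the other buffer. */
double buffer_ms(unsigned period, unsigned bytes)
{
    return 1000.0 * bytes * period / PAULA_CLOCK_PAL;
}
```

Under that assumption, period 227 gives about 15625 Hz and a 200-byte buffer
lasts about 12.8 ms; period 124 gives about 28604 Hz and the same buffer lasts
only about 7.0 ms.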

The sample buffers are played out using normal DMA output. However, the sample
pointers (AUDxLC) have to be constantly swapped. The only correct method to
do this is via audio interrupts. In 5- to 8-channel modes, OctaMED also uses
this interrupt for timing the music, which is very handy. StarTrekker, for
example, uses VBlank timing for music (as far as I know) and "assumes" that a
certain number of samples are played during one frame, which is very bad.

It's wise to keep all channels in exactly the same phase with each other.
When the DMA is started (done only once), you have to set all audio DMA bits
with the same MOVE-instruction. As a result, we can assume that the audio
interrupts will occur _exactly_ at the same time. So, you only need to have
one audio interrupt that handles all channels at the same time.
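
For illustration, the word that enables all four audio DMA channels in one
write can be built like this. The bit names follow the usual Amiga include
file conventions; the function itself is a hypothetical helper that only
computes the value and does not touch the hardware.

```c
#include <stdint.h>

#define DMAF_SETCLR 0x8000u   /* bit 15: 1 = set the bits below, 0 = clear them */
#define DMAF_AUD0   0x0001u
#define DMAF_AUD1   0x0002u
#define DMAF_AUD2   0x0004u
#define DMAF_AUD3   0x0008u

/* Value to write to DMACON ($DFF096) with a single MOVE.W so that all
 * four channels start in the same phase. */
uint16_t audio_dma_start_word(void)
{
    return (uint16_t)(DMAF_SETCLR | DMAF_AUD0 | DMAF_AUD1 | DMAF_AUD2 | DMAF_AUD3);
}
```

The result is $800F.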

To clarify everything, let's have a look at a real 8-channel routine. It's
a very stripped-down version of the OctaMED routine. To simplify things, it
only handles one split channel, and doesn't handle repeat.

Below is the macro that fetches the samples, does the actual mixing, and
pushes the result into the playback buffer:

; This code does the magic 8 channel thing (mixing).
MAGIC_8TRK	MACRO
		swap	d6
		swap	d7
		move.b	0(a3,d6.w),d0
		add.b	0(a4,d7.w),d0
		move.b	d0,(a1)+
		swap	d6
		swap	d7
		add.l	d1,d6
		add.l	d2,d7
		ENDM

This is the shortest way to do it (if someone can find a shorter/faster way,
*PLEASE* let me know ;-). This macro is repeated many times, once for each
byte of the playback buffer (on OctaMED max. 1600 times/interrupt), so it
had better be fast.

Let's examine this macro more closely. First we'll have a look at the register
usage.

A3 and A4 are pointers to the _beginning_ of the samples to mix. They remain
constant throughout the mixing. Index registers D6.w and D7.w will be used to
get the actual offset. The samples will be mixed in D0, and the resulting
sample will be pushed into the buffer (pointed to by A1). Note that the sample
data must be halved beforehand (converted into 7-bit dynamic range by shifting
sample bytes right one bit position). This saves us from using an extra
ASR-instruction.
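
A minimal C sketch of that pre-halving step follows. The helper names are
hypothetical. The rounding is written out by hand to match an arithmetic shift
right (round toward negative infinity), because C's >> on negative values is
implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Halve one signed 8-bit sample the way ASR.B #1 would. */
int8_t halve(int8_t s)
{
    return (int8_t)(s >= 0 ? s / 2 : -((1 - s) / 2));
}

/* Halve a whole sample in place, once, before playback starts. */
void halve_sample(int8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] = halve(data[i]);
}
```

Each halved sample lies in -64..63, so the sum of two of them lies in
-128..126 and always fits in a signed byte.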

As mentioned above, D6 and D7 are used to index the sample data; they are
offsets from the beginning of the sample. However, a resolution of one byte
is far too coarse. Therefore we need to have a 16-bit fraction:

	SSSSSSSS SSSSSSSS FFFFFFFF FFFFFFFF

The upper word 'S' is the actual byte offset from the beginning of the sample,
and 'F' is the fraction part. When the value is updated (to point to the next
sample to mix), it's handled as a 32-bit value. (D1 and D2 contain the 32-bit
numbers to add each time; they are constant values based on the current
playback periods of the channels.)

However, when the sample value is fetched, the fraction part must be ignored.
A single SWAP instruction takes care of that. As a result

	FFFFFFFF FFFFFFFF SSSSSSSS SSSSSSSS

the lower word can be easily used for indexing. Another SWAP, and everything is
back again for a new cycle.
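
For readers who prefer C, the macro and its 16.16 fixed-point indexing can be
modelled roughly like this. It is a sketch, not a translation that a compiler
would turn into equally fast code; the function and parameter names are my
own, and the comments give the corresponding registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix `n` output bytes from two pre-halved samples.
 * step_a / step_b are 16.16 fixed point: 65536 means "advance one source
 * byte per output byte". The caller must keep both indexes inside the
 * sample data. */
void mix_two(const int8_t *a, uint32_t step_a,   /* A3, D1 */
             const int8_t *b, uint32_t step_b,   /* A4, D2 */
             int8_t *out, size_t n)              /* A1     */
{
    uint32_t pos_a = 0, pos_b = 0;               /* D6, D7 */
    for (size_t i = 0; i < n; i++) {
        /* pos >> 16 plays the role of SWAP: only the upper word indexes. */
        int sum = a[pos_a >> 16] + b[pos_b >> 16];
        out[i] = (int8_t)sum;    /* in range because inputs were halved */
        pos_a += step_a;         /* 32-bit add carries fraction into offset */
        pos_b += step_b;
    }
}
```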

This was the most critical part of 8-channel output, but let's also look at the
interrupt code.

_IntHandler8:	movem.l	d2/d5-d7/a2-a5,-(sp)

DB is a pointer to the data area, so we can use A6-relative data addressing.

		lea	DB,a6
; ================ 8 channel handling (buffer swap) ======
		not.b	whichbuff-DB(a6)	;swap buffer
		bne.s	usebuff1

'whichbuff' tells us which buffer is currently in use. not.b toggles it, and
we change the buffer pointers accordingly. A1 (int_Data) points to the buffers
(each 200 bytes long). A0 points to $DFF000 (custom chips).

		move.l	a1,$a0(a0)		;ac_data = buffer 1 (offs = 0)
		move.w	#100,$a4(a0)		;ac_len = 200 bytes
		bra.s	buffset
usebuff1	lea	200(a1),a1		;ac_data = buffer 2 (offs = 200)
		move.l	a1,$a0(a0)
		move.w	#100,$a4(a0)

The audio interrupt request bit MUST be cleared (very important).

buffset		move.w	#1<<7,$9c(a0)

Set the volume to maximum.

		move.w	#64,$a8(a0)
; ============== fill buffers ============

To make things easier, I've set up some pseudo-audio-hardware registers
(track0hw, track4hw). However, instead of a length (ac_len), they contain
ac_end, which points to the end of the sample.

startfillb	lea	track0hw-DB(a6),a2
;calculate channel A period

Some wizard-stuff again... It will calculate the fraction value to add each
cycle. The actual formula is:

                  227 * 65536    14876672
	fracval = ----------- = ----------
                    period        period

But as the result could be > 65535 and DIVU can't produce quotients that large
(its quotient is only 16 bits wide), it will be calculated as

                   3719168
	fracval = --------- * 4
                    period

ac_per of 0 is considered silence...
D1 will contain fracval and D2 will contain fracval / 4.

		move.l	#3719168,d7	;227 * 16384
		move.w	ac_per(a2),d6
		beq.s	setpzero0
		move.l	d7,d2
		divu 	d6,d2
		moveq	#0,d1
		move.w	d2,d1
		add.l	d1,d1
		add.l	d1,d1
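
The same calculation modelled in C, as a sketch. The function name is
hypothetical. Like the listing, it does not check for a quotient that
overflows 16 bits, so it assumes a period of at least 57 (3719168 / 56 would
already exceed 65535).

```c
#include <stdint.h>

/* 16.16 step per output byte for a channel playing at `period`,
 * with the output running at period 227. Returns 0 for period 0
 * ("silence"). Assumes period >= 57 so the DIVU quotient fits. */
uint32_t fracval_from_period(uint16_t period)
{
    if (period == 0)
        return 0;
    uint32_t q = (uint32_t)3719168 / period;   /* 227 * 16384 / period (DIVU) */
    return q * 4u;                             /* the two ADD.L d1,d1 */
}
```

For period 227 this gives exactly 65536 (a step of 1.0). For period 428 it
gives 34756, whereas 14876672 / 428 would give 34758: the two lowest bits are
lost, which is the price of keeping the quotient within 16 bits.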

Then we fetch the required addresses. A5 is the sample end pointer, and A3
(after checking) is the sample start pointer. Note: A3 is the _current_
start pointer; it will change after each fill.

;get channel A addresses
		move.l	ac_end(a2),a5
		move.l	(a2),d0
		beq.s	setpzero0
chA_dfnd	move.l	d0,a3	;a3 = start address, a5 = end address

The following code checks whether the sample would run past the end address
during this fill. If so, the channel is turned off.

;calc bytes before end
		mulu	#200<<3,d2
		clr.w	d2
		swap	d2
; d2 = # of bytes/fill
		add.l	a3,d2	;d2 = end position after this fill
		sub.l	a5,d2	;subtract sample end
		bmi.s	norestart0
		clr.l	(a2)
setpzero0	lea	zerodata-DB(a6),a3
		moveq	#0,d1
norestart0
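
The arithmetic of this check can be followed in C. This sketch mirrors the
instructions as listed (names are hypothetical); `q` is the 16-bit DIVU
quotient, i.e. fracval / 4, that is still in D2 at this point.

```c
#include <stdint.h>

/* MULU #200<<3 followed by CLR.W / SWAP: (q * 1600) >> 16. */
uint32_t lookahead_bytes(uint16_t q)
{
    return ((uint32_t)q * (200u << 3)) >> 16;
}

/* 1 if the channel may keep playing, 0 if it is silenced.
 * `cur` and `end` stand for the addresses in A3 and A5; the comparison
 * matches the BMI test for ordinary (non-wrapping) addresses. */
int fits_before_end(uint32_t cur, uint32_t end, uint16_t q)
{
    return cur + lookahead_bytes(q) < end;
}
```

With these constants a step of 1.0 (q = 16384) yields 400, i.e. twice the 200
source bytes one fill consumes, so as written the test seems to stop a sample
up to one buffer early. The text does not say whether that margin is
intentional.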

Now repeat everything for the other channel....

;channel B period
		move.w	SIZE4TRKHW+ac_per(a2),d6
		beq.s	setpzero0b
		divu	d6,d7
		moveq	#0,d2
		move.w	d7,d2
		add.l	d2,d2
		add.l	d2,d2
;channel B addresses
		move.l	SIZE4TRKHW+ac_end(a2),a5
		move.l	SIZE4TRKHW(a2),d0
		beq.s	setpzero0b
		move.l	d0,a4
		mulu	#200<<3,d7
		clr.w	d7
		swap	d7
		add.l	a4,d7
		sub.l	a5,d7
		bmi.s	norestart0b
		clr.l	SIZE4TRKHW(a2)
setpzero0b	lea	zerodata-DB(a6),a4
		moveq	#0,d2
norestart0b

Finally, it's time to mix. It'll be done 200 times. To save time, the loop is
unrolled: DBF will occur only after every 20th mix (10 passes of 20 mixes).

		moveq	#0,d6	;clear index regs
		moveq	#0,d7
		moveq	#9,d5	;DBF counter
do8trkmagic
		MAGIC_8TRK	;20 times..
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK
		MAGIC_8TRK

		dbf	d5,do8trkmagic	;do until cnt zero

Then add the integer parts of the advanced indexes to the sample pointers (the
fraction part is cleared first).

end8trkmagic	clr.w	d6
		clr.w	d7
		swap	d6
		swap	d7
		add.l	d6,(a2)
		add.l	d7,SIZE4TRKHW(a2)
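
How far a sample pointer moves per fill can be worked out with a small C
sketch (hypothetical helper, same 16.16 convention as the index registers):

```c
#include <stdint.h>

/* Whole source bytes consumed after `steps` mixes at step `fracval`
 * (16.16 fixed point). This is the value left in the index register
 * after CLR.W and SWAP. */
uint32_t bytes_consumed(uint32_t fracval, uint32_t steps)
{
    return (fracval * steps) >> 16;
}
```

For example, 200 mixes at fracval 65536 advance the pointer by 200 bytes, and
200 mixes at fracval 34756 advance it by 106 bytes. The leftover fraction
(less than one source byte) is dropped, since the index registers are cleared
again before the next fill.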

And exit the interrupt...

		movem.l	(sp)+,d2/d5-d7/a2-a5
		rts


I have provided you with an example program that plays two samples through a
single channel (for two seconds). Its arguments are:

	example <sample1> <period1> <sample2> <period2>

(where periods are usually between 200 and 900)

Have a look at the sources as well. The program consists of an interface &
loader part (written in C), and the player & audio part (in assembler).

As usual, feel free to use the code in your own programs!