Programming for non-programmers

David Holden introduces yet more complex variables in the third instalment in the series on programming.

Part 3 - Arrays

I have previously described Real, Integer and String variables, but it often happens that you have lots of variables of the same type, and it would be extremely tedious, as well as wasteful of space and slow, to create a separate named variable for each item. Basic, in common with most other programming languages, has a way of joining a series of similar variables into a structure called an array.

Arrays, like ordinary variables, can hold integers, strings or real numbers. In fact, there's a fourth type of array that you can use with BBC Basic, the byte array, but I'll leave that until later. To create an array you use the keyword DIM, which is short for DIMension, and the array type is defined by the form of the variable name you assign to it as with other variables. Unlike single variables, arrays must be defined before they are used, and when you define them you have to specify the number of elements you want the array to have, which is how many variables you want the array to hold. For example, to create an array to hold ten integers you would use -

    DIM numbers%(10)

Each element in the array is then addressed using the array name followed by the element number in brackets, for example -

    numbers%(4) = 236
    numbers%(6) = 23 + 7
    PRINT numbers%(4)

and so on.

Creating arrays of real or string variables is done in exactly the same way, for example -

    DIM string_array$(8)
    DIM numbers(147)

Experienced programmers will have realised the deliberate mistake earlier. When you create an array, it will actually have one more element than the number used because it also has a zeroth element. So -

    DIM numbers%(10)

really creates an array of 11 integers, addressed as numbers%(0) to numbers%(10). Quite often it is more convenient to simply ignore the first element so that the index is always a positive number, but you should be aware that it exists.
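
As a small sketch of this (the values stored are arbitrary and chosen only for illustration), both ends of the array can be used like any other element -

    DIM numbers%(10)
    REM the first element is number 0, the last is number 10
    numbers%(0) = 5
    numbers%(10) = 7
    PRINT numbers%(0) + numbers%(10)

This prints 12. There is no element 11, because the eleven elements are numbered from 0 to 10.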

As you might expect, you can use an element of an array anywhere in your program where you could use a 'normal' variable. Similarly the index does not have to be a simple number but can be a variable or an expression.

    result% = numbers%(2*5)
    index% = 8
    result% = numbers%(index%)

So far all these arrays have been of the type known as one dimensional, but arrays in BBC Basic can be multi dimensional. These are sometimes referred to as matrix arrays, and can be thought of as a sort of grid, where each element is addressed by its co-ordinates. This analogy can become a bit strained when you realise that a multi dimensional array can have many dimensions and is not constrained by the geometrical limit of three.

As an example, suppose you wanted to create an array to hold a list of people's names, surnames and forenames. You could use -

    DIM forename$(100)
    DIM surname$(100)
    forename$(6)="Fred"
    surname$(6)="Smith"
    PRINT forename$(6)+" "+surname$(6)

Alternatively a more elegant approach might be -

    DIM name$(100,2)
    name$(6,1)="Fred"
    name$(6,2)="Smith"
    PRINT name$(6,1)+" "+name$(6,2)

In either case the last line would print 'Fred Smith'.

To make the example clearer I used 'DIM name$(100,2)', but in reality this would be extremely wasteful of space. It would allocate space for 303 strings (101 x 3), not 200. Although it is common to ignore the '0' index when using a single dimensional array, it is extremely wasteful with a multi-dimensional array. The more efficient way to do this would therefore be -

    DIM name$(99,1)
    name$(5,0)="Fred"
    name$(5,1)="Smith"
    PRINT name$(5,0)+" "+name$(5,1)

It will be worth your while thinking about this, bearing in mind that arrays begin with Element 0, and making sure you understand how it really gives an array of 2 x 100 elements.
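
If it helps, you can let Basic do the counting. Each dimension has one more element than the number used in the DIM statement, so multiplying these together gives the total number of elements. A minimal sketch of the arithmetic for the two versions above -

    REM DIM name$(100,2) gives 101 x 3 elements
    PRINT (100+1)*(2+1)
    REM DIM name$(99,1) gives 100 x 2 elements
    PRINT (99+1)*(1+1)

The first PRINT gives 303 and the second gives 200.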

Some programming languages, including some variants of Basic, but not BBC Basic, support a special type of array in which each element is called a Record. Briefly, this is a system where each element of the array is made up of a group of variables, which may be of different types. As you can imagine, this is ideally suited for creating database applications.

The byte array

This is not an array like those previously described. It is a way of reserving a block of memory within the program's workspace. Data of various types can be placed in this block or read from it. Other variants of Basic use the keywords PEEK and POKE to do this, but BBC Basic has a much more powerful system called indirection operators.

It is quite possible to write complex programs without ever using or needing byte arrays, and I could ignore the byte array for the present, but there is a very good reason why I am introducing it at this stage. Programs need to communicate with the computer's operating system (OS), and to do this it is often necessary to pass data back and forth. As the OS will not be able to access the program's variables directly the data must be passed in a block of RAM, and the best way to do this is to use a byte array. It is therefore a good idea to familiarise yourself with their use as soon as possible.

A byte array is created in a similar way to other arrays, using the DIM keyword, but the syntax is slightly different. To reserve an array of 1000 bytes of memory you would use -

    DIM array% 1000

You should immediately notice one difference from previous arrays; there are no brackets. This is because we have not created an array in the normal sense, just reserved 1000 bytes of RAM. The variable array% is not the array itself, it is just a normal integer variable. When memory is allocated for the array, Basic stores the address of the start of the RAM it has reserved into this variable, so it becomes a pointer to this RAM. The first byte is at the location array%. You can therefore see that although array% is just like any other integer variable it is important for you not to change it or to allow it to become changed or it will no longer point to the start of the reserved RAM. Note that the last byte of this array will not be at array% + 1000 but at array% + 999 because the first byte is at array% + 0.

Before I describe exactly how data is stored in and retrieved from a byte array I will have to explain a little bit about how the computer's RAM is arranged and addressed. As you will probably be aware, RAM is organised into bytes, and each byte is made up of 8 bits. A byte is quite small, and can only hold a number with a value from 0 to 255. Now the RISC processor in a RISC OS computer is a 32 bit processor, and so it naturally works with data in 32 bit or 4 byte 'chunks'. These 4 byte chunks are called words.

The term word is slightly ambiguous. In the days of 8 and 16 bit computers it was normally used to mean two bytes, or 16 bits, and the expression long word or double word was used to refer to a group of four bytes or 32 bits. However, all modern computers are (at least) 32 bit, and so the term is now commonly employed to refer to 4 bytes, which is how I shall use it throughout this series.

A Basic integer variable is stored in a 4 byte word, and as this can be held in a single register in the Risc processor it can be manipulated very quickly, which is the main reason why integer variables should be used wherever possible.

In Part 2 I said that an integer can be a positive or negative number, with a range from -2,147,483,648 to +2,147,483,647. The reason for these rather strange numbers is that the integer is stored in a 4 byte word, but there has to be a way of telling whether it's a positive or negative number. Basic does this by using bit 31 (the 'top' bit) as a flag. If this bit is set, the number is negative; if not, it's positive. So, although the number is stored in 32 bits, only 31 of them are used to hold the actual number. The reason why the biggest negative number can be one more than the biggest positive number is that the positive 'count' starts at 0 whereas the negative 'count' starts at -1. None of this is of any great importance to the programmer unless you are trying to store very large numbers in an integer variable. If you try to store a number bigger than 2,147,483,647 some very surprising things will happen because bit 31 will become set and your big positive number will suddenly become a totally different negative number!

There are four indirection operators that can be used to store data in memory. These are -

? to store a byte or character
! to store a word or integer
$ to store a string
| to store a floating point number

The syntax for storing data in memory using indirection operators is -

    <indirection operator><address> = <data>

and to retrieve it -

    <data> = <indirection operator><address>

Naturally the data type must match the indirection operator used. The same rules apply to indirection operators as to other numeric variable types, so if you try to store a real number using the '!' word operator it will be truncated to an integer.

Now for some examples. Let's assume that you have created a byte array of 1000 bytes exactly as described above using -

    DIM array% 1000

To store an integer number 4567 at the start of this array you would use -

    !array% = 4567

or if it was a Basic integer variable 'number%' -

    !array% = number%

To retrieve the data and 'read' it back into a variable -

    number% = !array%

The same system can be used with strings, for example -

    $array% = "This is a string"
    name$ = "Fred"
    $array% = name$
    another$ = $array%

It may help you to understand how indirection operators work if you think of them like this:

!array% means the word located at the address in RAM pointed to by array%
$array% means the collection of characters that make up a string located at the address in RAM pointed to by array%

The pointer to the place you want to put or retrieve the data from can be an expression. In fact, it would normally be an expression, as you would not just want to put data at the start of the array. If it is an expression then the expression must be enclosed in brackets.

    !(array%+4)=number%
    $(array%+200)="Fred Smith"
    index%=20 : !(array%+index%)=number%
    number%=!(array%+index%)
    number%=!(array%+(index%*4))

When using the '!' or '?' operators it is possible to simplify the base + offset system by putting the operator between the two numbers.

    array%!index% = number%
    array%!(index%*4) = number%

is equivalent to

    !(array%+index%) = number%
    !(array%+(index%*4)) = number%

Remember that you can only do this with byte and word operators, not with string or floating point.
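
The examples so far have all used the word operator. As a small sketch of the byte operator '?', the following looks at the individual characters of a string stored in a byte array. The name block% is just an illustration, and ASC and CHR$ are standard Basic functions which convert between a character and its character code -

    DIM block% 100
    $block% = "Fred"
    REM the byte at offset 0 holds the code for "F", which is 70
    PRINT block%?0
    REM the byte at offset 1 holds the code for "r"
    PRINT CHR$(block%?1)
    REM change the first character to "T"
    block%?0 = ASC("T")
    PRINT $block%

The last line prints 'Tred', because only the first byte of the string has been changed.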

In fact, you can do almost anything with this type of data that you could do with a 'normal' Basic variable. For example -

    array%!4 = 654
    array%!8 = 1560
    array%!12 = array%!4 + array%!8
    number% = array%!4 + 20
    array%!4 = array%!4 + 200
    PRINT array%!8

Array boundary checking

If you try to address an element of an array that doesn't exist, for example, if you create an array with 20 elements and try to store something in element number 22, then Basic will generate a Subscript out of range error. This will not happen with a byte array. If you create a byte array of 100 bytes and then try to store an integer at an address 120 bytes after the start of the array Basic will do exactly that. Unfortunately this will probably be disastrous. Your number will be stored 20 bytes after the end of the 'safe' allocated RAM, probably overwriting another variable or some other important part of Basic's workspace. The result might not be immediate catastrophe, but sooner or later your program will almost certainly crash with an apparently unconnected error. It is therefore absolutely vital that when you use a byte array you make sure that you always keep within its boundaries. If in doubt, make the array a bit bigger than strictly necessary to allow for slight 'overruns'.
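
Since Basic will not do the checking for you, one possible approach is to do it yourself before storing anything. This is only a sketch; the names size%, block% and offset% are made up for the example, and it relies on a simple IF test. Remember that a word occupies 4 bytes, so all four of them must fit inside the array -

    size% = 100
    DIM block% size%
    offset% = 120
    REM only store the word if all 4 bytes lie within the array
    IF offset% >= 0 AND offset% + 4 <= size% THEN block%!offset% = 1234 ELSE PRINT "Offset is outside the byte array"

With offset% set to 120 the test fails, so nothing is stored and the message is printed. With an offset of 96 or less the word would be stored normally.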

Initialising arrays

When an array, but not a byte array, is created every element is set to zero or, in the case of a string array, to an empty string. With a byte array there is no actual creation process; Basic merely sets aside the required number of bytes of RAM, and so this RAM will contain whatever random data it happened to have.

It is possible to set every element of an array to any chosen value. This is done by using -

    <array name>() = <value>

Note that this is just the same as the method used to assign a value to a single element of an array except that the brackets are empty. For example -

    numbers%() = 100

would set every element of the integer array numbers% to the value 100.

    names$() = ""

would set every element of the string array names$ to a null string. It does not matter whether the array is single or multi dimensional, every element will be set. If you want to set part of an array or a byte array you will have to use another method.

The FOR NEXT loop

I have previously described one way of making Basic operate in a loop, the REPEAT - UNTIL structure. There are, in fact, two other types of loop, and one of the most useful is FOR - NEXT. It is also the fastest, so it is the best to use for things like setting multiple elements of an array to a value.

The structure of the loop is -

    FOR <variable> = <number> TO <number> [STEP <number>]
      (code to be executed)
    NEXT

The main feature of the FOR NEXT loop is that it is executed a fixed number of times instead of until a certain condition is met. For example, to set elements 20 to 30 of the integer array 'numbers%' to zero -

    FOR count% = 20 TO 30
    numbers%(count%) = 0
    NEXT

If, as in this case, the STEP keyword is omitted then the loop counter, which in this case is the variable count%, will increment in single steps. The code above is therefore exactly equivalent to -

    FOR count% = 20 TO 30 STEP 1
    numbers%(count%) = 0
    NEXT

The loop counter does not have to move in positive steps. The step can be negative, so the same result would be obtained by -

    FOR count% = 30 TO 20 STEP -1
    numbers%(count%) = 0
    NEXT

Although the loop counter must obviously be a number or a numeric variable, the upper and lower limits can be numbers, variables, or expressions. All that really matters is that they can be evaluated to numbers. For example, our previous loop could have been written as -

    start% = 20
    end% = 30
    FOR count% = start% TO end%
    numbers%(count%) = 0
    NEXT
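
The same kind of loop can also be used on a byte array which, as mentioned earlier, is not cleared when it is created. As a sketch (the names block% and offset% are only for illustration), this sets a 1000 byte array to zero one word at a time. The STEP of 4 moves the offset on by one word each time round the loop, and the last word starts at offset 996 so that its final byte is at offset 999 -

    DIM block% 1000
    REM clear the array a word (4 bytes) at a time
    FOR offset% = 0 TO 996 STEP 4
      block%!offset% = 0
    NEXT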

The strict syntax of Basic specifies that the loop variable that the NEXT keyword applies to should appear after the word, so in all the previous examples the last line should be -

    NEXT count%

You can do this if you wish. It sometimes helps to make the code easier to read with very complex structures having lots of FOR NEXT loops, as you can see which NEXT belongs to which FOR. However, as FOR NEXT loops should never overlap there is no structural reason to do this.
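
As an illustration of where naming the variable can help, here is a sketch with one loop inside another, used to fill a two dimensional array. The array grid% and the variables row% and column% are invented for the example. Note that the inner loop is completely contained within the outer one -

    DIM grid%(9,4)
    FOR row% = 0 TO 9
      FOR column% = 0 TO 4
        REM each element holds its row number times its column number
        grid%(row%,column%) = row% * column%
      NEXT column%
    NEXT row%
    PRINT grid%(3,4)

The last line prints 12, the value stored when row% was 3 and column% was 4.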

Although we haven't done much actual programming in this session we've covered some very important topics. You should therefore have plenty of new things to try out for yourself, which is what this series is intended to encourage you to do.

David Holden