

RISC World

RISCLua programming exercise

Gavin Wraith on extending Lua with a construct to iterate over the contents of a directory.

It involves only a few lines of code in ARM assembler and C to provide an iterator function dir for Lua so that syntax of the form:

    for name in dir(dirname) do ... end

can be used. The report is intended as a small illustration to show how the RISC OS PRMs (Programmer's Reference Manuals), the APCS (ARM Procedure Call Standard) and the Lua/C API (Application Program Interface) can be used together to extend Lua on a RISC OS computer.

Lua was designed to be easy to extend in this sort of way. In fact Programming in Lua, by Roberto Ierusalimschy, ISBN 85-903798-1-7, has an example in the last chapter (29, Managing Resources) of implementing a directory iterator for Unix. But Unix directories are not quite the same as RISC OS ones. Although I had no doubt that compiling the code given in the book, using Unixlib, would provide something that worked, I still felt that that was a long way round. I wanted to work it out for myself from first principles, using no extra baggage. I need hardly say that iterating over a directory is a facility that standard Lua cannot already provide, because Lua is written in ANSI C, and the notion of a directory is far too platform-dependent to be part of that. Lua is intended to be usable on devices like washing machines or snowmobiles, which may not even have filing systems, let alone directories.

Let us start at the bottom, and consult the PRMs to see which bits of RISC OS let us discover the contents of a directory. In volume 2, page 70, we find details of the SWI (SoftWare Interrupt) OS_GBPB, which returns information about a named directory if entered with R0 having the value 9, 10, 11 or 12. The simplest variant is OS_GBPB 9, so that is the one we will use. This returns information only on the names of the objects in the directory, whereas the others also give further information (size, access permissions, datestamp, filetype, etc). Generalizing what is done here to a fancier construct, say:

    for name, filetype in dir(dirname, leafpattern) do ... end

which provides the names and filetypes of all the objects whose names match the wild-carded string leafpattern, is only a little more complicated, and can be left as an exercise. I will try to steer the simplest path.

So we enter the SWI with:

R0 = 9
R1 = pointer to directory name (it will be null-terminated)
R2 = pointer to a buffer for the object's name
R3 = 1 (number of objects to be read)
R4 = offset of object to read (0 initially)
R5 = length of buffer
R6 = 0 (match all object names)

On exit from the SWI, R0, R1, R2, R5 and R6 are preserved. R3 is the number of objects read. R4 is the offset of the next object to read, or -1 if there are no more. If R3 = 1 then the object name read will be a null-terminated string in the buffer pointed at by R2. This tells us what goes on at a low level. We note that R4 is a sort of iteration variable. The PRMs do not tell us what the intermediate values of R4 mean, and we do not need to know, apart from the fact that we start the loop with R4 = 0 and, if it terminates without error, it must terminate with R4 = -1.
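The register contract just described can be tried out away from RISC OS with a stand-in. The following is only a sketch: mock_gbpb9, mock_entries and count_entries are hypothetical names, the directory is a fixed list, and the offsets are plain array indices, which is an assumption made for the mock (the real values of R4 are opaque).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory: a fixed list of leaf names. */
static const char *const mock_entries[] = { "!Boot", "Apps", "Docs" };
enum { MOCK_COUNT = 3 };

/* Mimics the OS_GBPB 9 contract for one object at a time.
   buf/buflen play the part of R2/R5, offset that of R4 on entry.
   *nread plays the part of R3 on exit, and the return value that of
   R4 on exit: the next offset, or -1 when there is nothing more. */
int mock_gbpb9(char *buf, size_t buflen, int offset, int *nread)
{
    assert(buf != NULL && nread != NULL && buflen > 0);
    *nread = 0;
    if (offset < 0 || offset >= MOCK_COUNT)
        return -1;                      /* no more objects */
    if (strlen(mock_entries[offset]) >= buflen)
        return -1;                      /* name does not fit: give up */
    strcpy(buf, mock_entries[offset]);
    *nread = 1;
    return offset + 1;                  /* assumption: offsets are indices */
}

/* Start with offset 0 and keep calling until the offset becomes -1.
   Returns the number of names that were read. */
int count_entries(void)
{
    char buf[128];
    int offset = 0, nread = 0, count = 0;
    while (offset != -1) {
        offset = mock_gbpb9(buf, sizeof buf, offset, &nread);
        if (nread == 1)
            count++;                    /* buf now holds one leaf name */
    }
    return count;
}
```

The loop never inspects the offset except to compare it with -1, which is all that the PRM description entitles us to do.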

We must now use the Objasm assembler to turn this information into a routine that can be called as a C function. The APCS says (I oversimplify) that C functions are realized by ARM subroutines, with the first four arguments to the function held in the registers R0, R1, R2 and R3 and the remaining arguments on a full descending stack pointed to by R13. The returned value is held in R0. The return address is held in R14. We will name the C function rdir.

The idea is that rdir will return the new offset. The assembler code for this does most of its work shuffling register values. We write this in a file s.sys containing:

; s.sys

        AREA    RDIR,CODE
        EXPORT  rdir

rdir
        STMFD   sp!,{R1-R6,R14}
        MOV     R6,#0           ; no pattern to match
        MOV     R5,R2           ; size of buffer
        MOV     R4,R3           ; offset
        MOV     R3,#1           ; number of objects to read
        MOV     R2,R1           ; buffer
        MOV     R1,R0           ; directory name
        MOV     R0,#9
        SWI     &2000C          ; XOS_GBPB - error returning SWI
        MOVVS   R3,#0           ; in case of error state none read
        CMP     R3,#1           ; anything read?
        MOVEQ   R0,R4           ; if yes return offset
        MVNNE   R0,#0           ; if no return -1
        LDMFD   sp!,{R1-R6,pc}

        END

The Objasm assembler will convert s.sys into an AOF (Acorn Object Format) file o.sys. The linker does not care how AOF files have been created, whether compiled from C or assembled with Objasm. However, to be referenced by C code our rdir subroutine has to be dressed in respectable C clothes - that is, given a prototype declaration.

We create a header file h.rdir containing rdir's prototype declaration.

/* rdir.h */

extern int rdir(const char *dir, const char *buf, int buflen, int offset);

Note that this function has just four arguments, which the APCS tells us will be the contents of registers R0-R3 when the subroutine rdir is entered. To see how this function is to be used to implement a directory iterator in Lua, we have to alter our perspective for a moment and look at Lua's very general notion of for-loop. This is detailed in section 2 of chapter 7 of the book, under the title The Semantics of the Generic for, which I paraphrase here in a simplified form that is adequate to our special case.

In general, a construction like:

    for v1, ..., vn in exprlist do block end

in Lua is equivalent to

    do
        local f, s, i = exprlist
        while true do
            local v1, ..., vn = f(s, i)
            i = v1
            if i == nil then break end
            block
        end
    end

We call f the iterator function, s the invariant state and i the control variable. The loop is terminated when the control variable becomes nil.
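For readers more at home in C, the same protocol can be imitated by hand. This is a sketch under stated assumptions: NIL, array_state, next_index and sum_with_generic_for are hypothetical names, the control variable is an int, and -1 stands in for nil.

```c
#include <assert.h>
#include <stddef.h>

#define NIL (-1)    /* stands in for Lua's nil in this sketch */

/* The invariant state s: an array and its length. */
typedef struct {
    const int *data;
    int len;
} array_state;

/* The iterator function f(s, i): given the invariant state and the
   current control value, return the next control value, or NIL when
   the sequence is exhausted. */
int next_index(const array_state *s, int i)
{
    int next = (i == NIL) ? 0 : i + 1;
    assert(s != NULL);
    return (next < s->len) ? next : NIL;
}

/* The expansion of the generic for, written out by hand.
   The "block" here simply adds up the elements visited. */
int sum_with_generic_for(const array_state *s)
{
    int (*f)(const array_state *, int) = next_index;
    int i = NIL;                /* initial value of the control variable */
    int total = 0;

    for (;;) {
        int v1 = f(s, i);       /* local v1 = f(s, i) */
        i = v1;                 /* i = v1 */
        if (i == NIL)
            break;              /* if i == nil then break end */
        total += s->data[i];    /* block */
    }
    return total;
}
```

The loop body knows nothing about how the next value is found; everything specific to the sequence lives in f and s.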

In our case there is only one variable, name, which holds the object name, and exprlist consists of the single item dir(dirname). The invariant state can be taken to be nil, and the initial value of the control variable can also be taken to be nil; subsequently it is the object name. So the function dir need return just the iterator function. In our case the iterator function will be realized as a C function that returns the object name, or nil if there are no more objects. It will produce its result using internal state provided by dirname and the mysterious offset value used and returned by rdir.

In Lua, functions are first-class citizens. That is, they can be assigned to variables just like other sorts of values. In C and Basic, functions cannot be assigned to variables. There, they are indissolubly wedded to one name, which they get when they are defined, and which is part of their definition. The Lua bulletin boards are full of messages from veterans of C and Basic who ask questions like "when I am debugging, how do I find out the name of the function I am in?" This makes no sense, because functions in Lua do not necessarily have a name. It is like asking, "if I have a value of 77, how do I find out the name of the variable that is storing it?"

Lua has lexical scoping. This means that a local variable is visible only within the block in which it is defined, and its subblocks, and after its declaration as a local variable. Local variables can hold function values, of course. So what happens if such a local function value is returned as a result from the body of an enclosing function? What happens to the values of local variables defined in that enclosing function that happen to be used in the returned function? Consider, for example:

    function fred (x)
        local y = x + 1
        return function (z)
            return y + z
        end
    end

The value returned by fred is a function that depends on the local variable y which is not an explicit parameter, like z. The Lua terminology is that y is an upvalue of the function returned by fred. It depends, of course, on the actual invocation of fred that is used. What is returned by fred is not simply a function but a closure, the word used in computer science to mean the combination of function together with an environment (i.e. list of bindings) that interprets the function's upvalues - the variables it uses that are defined in an enclosing scope. If you are familiar with a functional programming language this notion will not be strange.
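C has no closures, but the idea of a function packaged with its environment can be imitated with a struct. The sketch below assumes, purely for illustration, a fred that sets its local y to x + 1 and returns a function adding y to its argument z; the names closure, closure_call and the arithmetic are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A hand-made closure: the code of the function together with the
   environment that gives a value to its upvalue y. */
typedef struct closure {
    int y;                                      /* the upvalue */
    int (*call)(const struct closure *, int);   /* the function itself */
} closure;

/* Body of the returned function: z is the explicit parameter,
   y is found in the environment. */
int closure_call(const closure *self, int z)
{
    assert(self != NULL);
    return self->y + z;
}

/* Each invocation of fred builds a fresh environment, so closures
   obtained from different invocations carry different values of y. */
closure fred(int x)
{
    closure c;
    c.y = x + 1;            /* assumption: stands for "local y = ..." */
    c.call = closure_call;
    return c;
}
```

Two closures made by fred(1) and fred(10) share the same code but give different answers for the same z, because each carries its own y.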

To return to our directory iteration, what dir must do is return a closure consisting of the iterator function together with an environment for the directory name and for the offset variable, which the iterator function must update each time it is used. In C, because you cannot have local functions, you cannot really talk about upvalues. However, the Lua/C API lets you attach upvalues to a C function to make a C closure, and for anything that need not be kept separately for each closure you can use static variables. These are variables whose scope is limited to the file in which they are defined. We need to define two C functions: one to provide the dir function in Lua, which we will call rdir_read, and another to provide the iterator, which we will call rdir_iter. We also need a static variable rdir_buf to provide the buffer to hold the results after the call to XOS_GBPB 9 in the function rdir. It will be defined by:

static char rdir_buf[128];

I am presuming that 128 characters is sufficient for leaf names. All C functions that define functions in Lua take as a single argument a pointer to the Lua state and return an integer denoting the number of results the corresponding Lua function returns. So we will have a prototype declaration:

static int rdir_iter (lua_State *L);

Here is the C source in c.rdir.

/* rdir.c */

#include "lauxlib.h"
#include "lualib.h"
#include "rdir.h"

static int rdir_iter (lua_State *L);

/* Filled in by rdir with the leaf name, so it must not be const. */
static char rdir_buf[128];

int rdir_read (lua_State *L) {
    lua_pushnumber(L,0);                    /* upvalue 1: initial offset */
    lua_pushstring(L,lua_tostring(L,1));    /* upvalue 2: directory name */
    lua_pushcclosure(L,&rdir_iter,2);
    return 1;
}

static int rdir_iter (lua_State *L) {
    int offset = (int) lua_tonumber(L,lua_upvalueindex(1));
    const char * dirname = lua_tostring(L,lua_upvalueindex(2));
    int n = rdir(dirname,rdir_buf,128,offset);
    if (n == -1)
        lua_pushnil(L);                     /* no more objects */
    else {
        lua_pushnumber(L, n);
        lua_replace(L,lua_upvalueindex(1)); /* store the new offset */
        lua_pushstring(L,rdir_buf);
    }
    return 1;
}

Note that rdir_read was not declared as static. That is so that it is accessible elsewhere to be declared as code for the function dir. This function pushes an initial value of 0 for the offset onto the Lua stack, reads the dirname argument (lua_tostring(L,1)) and pushes that and finally pushes a pointer to rdir_iter and declares that these three items form a closure (lua_pushcclosure).

The rdir_iter function gets the upvalues offset and dirname and calls our low-level function rdir using them, and rdir_buf, as arguments. If it gets a value of -1 then it pushes a nil onto the stack. Otherwise it pushes the new offset, pulls it off again with lua_replace to update the upvalue, and finally pushes the object name in rdir_buf.
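The division of labour between the two functions can be seen without Lua or RISC OS in the following sketch. Everything in it is hypothetical: mock_rdir imitates rdir over a fixed list of names, with offsets that are simply indices, and the struct dir_closure plays the part of the two upvalues. To keep the sketch self-contained, the buffer lives in the struct rather than in a static variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory contents for the mock. */
static const char *const mock_names[] = { "Alpha", "Beta" };
enum { MOCK_COUNT = 2 };

/* Imitates rdir: copies the name at offset into buf and returns the
   new offset, or -1 if nothing was read. The dir argument is ignored. */
int mock_rdir(const char *dir, char *buf, int buflen, int offset)
{
    (void)dir;
    assert(buf != NULL && buflen > 0);
    if (offset < 0 || offset >= MOCK_COUNT)
        return -1;
    if ((int)strlen(mock_names[offset]) >= buflen)
        return -1;
    strcpy(buf, mock_names[offset]);
    return offset + 1;
}

/* The environment of the iterator: what the upvalues hold in Lua. */
typedef struct {
    const char *dirname;
    int offset;
    char buf[128];
} dir_closure;

/* Counterpart of rdir_read: set up the environment, offset 0. */
dir_closure dir_open(const char *dirname)
{
    dir_closure c;
    c.dirname = dirname;
    c.offset = 0;
    c.buf[0] = '\0';
    return c;
}

/* Counterpart of rdir_iter: return the next name, or NULL (for nil)
   when there are no more, updating the stored offset on the way. */
const char *dir_next(dir_closure *c)
{
    int n = mock_rdir(c->dirname, c->buf, (int)sizeof c->buf, c->offset);
    if (n == -1)
        return NULL;
    c->offset = n;
    return c->buf;
}
```

A caller obtains an environment from dir_open once and then calls dir_next repeatedly until it returns NULL, which is exactly what the generic for does with the closure returned by dir.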

Bar a line or two of book-keeping for registering the new function and for amendments to the makefile to take account of o.sys and o.rdir, that is it. The reason that I have not included these details here is that in a real-life situation one hardly ever extends by a single function. One usually extends by a whole library of functions. In the case of dir, for example, it would be sensible to include also functions to return the filetype of a file, and other properties. It is the convention in Lua to implement such libraries as tables, and I wanted to avoid getting into this topic.

An important aspect of the code presented here is that it lets us iterate over huge directories without incurring any storage penalties, so long as we use sensible code in the body of the loop. This is in contrast to the technique of doing all the calls to XOS_GBPB 9 at once to build up a table of the directory's contents. Big directories will mean a big table, and that might not be appropriate. This way at least gives you the option of parsimony.

I doubt whether I would have been able to implement this little project if I had not got the book to explain what was behind the definitions in the Lua reference manual. You might say that it was an exercise to help my understanding of the book. I was astonished at how little coding was required, in fact. I have written this up to try and give heart to those who think that they too might attempt something similar, whether they be RISC OS programmers who know about C and ARM assembler but not about Lua, or those who know Lua but nothing about RISC OS.

Gavin Wraith
