A complete knight tour

The following is a blackboard based complete knight-tour, adapted from Evan Tick's well known benchmark program.

% recommended use: ?-go(5).
go(N):-
	time(_),
	  init(N,M),             % prepare the chess board
	  knight(M,1,1),!,       % finds the first complete tour
	time(T),
  write(time=T),nl,statistics,show(N). % shows the answer

% fills the blackboard with free logical variables
% representing empty cell on the chess board
init(N,_):-
  for(I,1,N),     % generates I from 1 to N nondeterministically
    for(J,1,N),   % for/3 is the same as range/3 in the Art of Prolog
      bb_def(I,J,_NewVar),  % puts a free slot in the hashing table
  fail.
init(N,M):-
  M is N*N.                 % returns the number of cells

% tries to make a complete knight tour
knight(0,_,_) :- !.
knight(K,A,B) :-
  K1 is K-1,
  val(A,B,K),  % here we mark (A,B) as the K-th cell of the tour
  move(Dx,Dy), % we try a next move nondeterministically
  step(K1,A,B,Dx,Dy).

% makes a step and then tries more
step(K1,A,B,Dx,Dy):-
    C is A + Dx,
    D is B + Dy,
    knight(K1,C,D).

% shows the final tour
show(N):-
  for(I,1,N),
    nl,
    for(J,1,N),
      val(I,J,V),
      write(' '),X is 1-V // 10, tab(X),write(V),
  fail.
show(_):-nl.

% possible moves of the knight
move( 2, 1). move( 2,-1). move(-2, 1). move(-2,-1).
move( 1, 2). move(-1, 2). move( 1,-2). move(-1,-2).

Constant time access in this kind of problems to cell(I,J) is essential for efficiency as it is the most frequent operation. While the blackboard based version takes 39s in BinProlog for a 5x5 squares chess board, an equivalent program representing the board with a list of lists takes 147s in BinProlog, 167s in emulated Sicstus 2.1 and 68 seconds in native Sicstus 2.1. Results are expected to improve somewhat with binary trees or functor-arg representation of the board but they will still remain worse than with the blackboard based sparse array, due to their relatively high log(N) or constant factor. Moreover, representing large size (possibly updatable!) arrays with other techniques is prohibitively expensive and can get very complicated due to arity limits or tree balancing as it can see for example in the Quintus library.