
Types of Parallel Models

The IRIX system supports a variety of parallel programming models. You can compare these models on two features:

Granularity
    The relative size of the units of computation that are affected: single statements, functions, or entire processes.

Communication channel
    The basic mechanism by which the independent, concurrent units of the program exchange data and synchronize their activity.

A summary comparison of the available models is shown in Table 3-1.

Table 3-1. Comparing Parallel Models

Power Fortran(TM), IRIS POWER C(TM)
    Granularity:   Looping statement (DO or for statement)
    Communication: Shared variables in a single user address space.

Ada 95 tasks
    Granularity:   Ada procedure
    Communication: Shared variables in a single user address space.

POSIX threads
    Granularity:   C function
    Communication: Shared variables in a single user address space.

Lightweight UNIX processes (sproc())
    Granularity:   C function
    Communication: Arena memory segment in a single user address space.

General UNIX processes (fork(), exec())
    Granularity:   Process
    Communication: Arena segment mapped to multiple address spaces.

Remote Procedure Call (RPC)
    Granularity:   Process
    Communication: Memory copy within a node, or UDP or TCP network between nodes.

Parallel Virtual Machine (PVM)
    Granularity:   Process
    Communication: Memory copy within a node, or TCP socket between nodes.

Message-Passing Interface (MPI)
    Granularity:   Process
    Communication: Memory copy within a node, or TCP socket between nodes.


Statement-Level Parallelism

Parallelism at the finest level of granularity is provided for three languages:

In all three languages, the run-time library--which provides the execution environment for the compiled program--contains support for parallel execution. The compiler generates library calls that create subprocesses and distribute loop iterations to them.

The run-time support can adapt itself dynamically to the number of available CPUs. Alternatively, you can direct it--through program source statements, or through environment variables set at execution time--to use a specific number of CPUs.
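
As a sketch of how this looks in source code, the following loop is annotated with IRIS POWER C style directives. The #pragma spellings and the iterate() clause are stated here as assumptions; consult the IRIS POWER C documentation (or the equivalent Power Fortran directives) for the authoritative syntax.

    /*
     * Hedged sketch: a data-parallel loop under IRIS POWER C.
     * The directive spellings below are assumptions; check the compiler
     * documentation for the exact syntax.  The run-time library that the
     * compiler links in distributes the iterations of the pfor loop
     * across the available CPUs.
     */
    void vector_add(int n, double *a, double *b, double *c)
    {
        int i;

    #pragma parallel shared(a, b, c, n) local(i)
        {
    #pragma pfor iterate(i = 0; n; 1)
            for (i = 0; i < n; i++)
                c[i] = a[i] + b[i];
        }
    }

At run time, the number of cooperating processes can typically be capped with an environment variable recognized by the run-time library (MP_SET_NUMTHREADS in the MIPSpro environment is one such variable; verify the name against the compiler release notes).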

Statement-level parallel support is based on using common variables in memory, and so it can be used only within the bounds of a single-memory system, such as a CHALLENGE system or a single node in a POWER CHALLENGEarray.


Thread-Level Parallelism

A thread is an independent execution state within the context of a larger program. A UNIX process normally consists of an address space and one thread, together with a large collection of state information: a table of open files, a set of signal handlers, a process ID, an effective user ID, and so on.

There are three key differences between a thread and a process:

At this time, IRIX supports only one thread per process. However, Silicon Graphics, Inc. has announced the intention of supporting the POSIX standard for multithreaded applications in a future release.
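
For illustration of the POSIX model listed in Table 3-1, the sketch below creates a second thread inside one process; both threads share all global data and coordinate through it. It is illustrative only since, as noted above, this interface is not yet part of IRIX.

    /*
     * Minimal sketch of the POSIX threads model: two threads of execution
     * in one process, sharing the process's global data.
     */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_counter = 0;                 /* visible to every thread */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        pthread_mutex_lock(&lock);                 /* coordinate through shared memory */
        shared_counter++;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        pthread_create(&tid, NULL, worker, NULL);  /* second thread, same address space */
        pthread_join(tid, NULL);                   /* wait for it to finish */
        printf("counter = %d\n", shared_counter);
        return 0;
    }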

In the meantime, the Silicon Graphics, Inc. implementation of the Ada 95 language includes support for multitasking Ada programs--using what are essentially threads in the sense defined here. For a complete discussion of the Ada 95 task facility, refer to the Ada 95 Reference Manual, which installs with the Ada 95 compiler (GNAT) product.


Process-Level Parallelism

A UNIX process consists of an address space, a varied set of state values, and one thread of execution. The main task of the IRIX kernel is to create processes and to dispatch them to different CPUs so as to maximize the utilization of the system.

IRIX contains a variety of interprocess communication (IPC) mechanisms, which are discussed in Chapter 2, "Interprocess Communication." These mechanisms can be used to exchange data and to coordinate the activities of multiple, asynchronous processes within a single-memory system. (Processes running in different nodes of an array must use one of the abstract models described in the next topic.)

In traditional UNIX practice, one process creates another with the system call fork(), which makes a duplicate of the calling process, after which the two copies execute concurrently. Typically the new process immediately uses the exec() function to load a new program.

The fork(2) reference page contains a complete list of the state values that are duplicated when a process is created. The exec(2) reference page details the process of creating a new program image for execution.
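
The following sketch shows the traditional sequence in C: fork() duplicates the calling process, and the child immediately replaces its copied image with execl(). The program launched here (/bin/ls) is only an example.

    /*
     * Traditional UNIX process creation: fork() followed by exec().
     * The child replaces its copied image with a new program while
     * the parent waits for it to terminate.
     */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        pid_t pid = fork();                  /* duplicate the calling process */

        if (pid == 0) {
            /* child: load a new program image (example program only) */
            execl("/bin/ls", "ls", "-l", (char *)0);
            _exit(1);                        /* reached only if execl() fails */
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);        /* parent: wait for the child */
        } else {
            perror("fork");
        }
        return 0;
    }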

IRIX also supports the system function sproc(), which creates a lightweight process. A process created with sproc() shares some of its state values with its parent process (the sproc(2) reference page details how this sharing is specified).
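
A minimal sketch of the same idea with sproc() follows. The PR_SADDR attribute requests a shared address space, so the child function sees the same global variables as its parent; the other inheritance attributes are listed in the sproc(2) reference page.

    /*
     * Sketch of a lightweight process created with sproc().  Because
     * PR_SADDR requests a shared address space, parent and child both
     * see the global variable 'result'.
     */
    #include <sys/types.h>
    #include <sys/prctl.h>
    #include <sys/wait.h>
    #include <stdio.h>

    static int result = 0;                   /* shared between parent and child */

    static void worker(void *arg)
    {
        result = 42;                         /* runs in the parent's address space */
    }

    int main(void)
    {
        pid_t pid = sproc(worker, PR_SADDR); /* create the lightweight process */

        if (pid == -1) {
            perror("sproc");
            return 1;
        }
        wait(NULL);                          /* wait for the child to exit */
        printf("result = %d\n", result);     /* prints 42 */
        return 0;
    }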

In particular, a lightweight process does not have its own address space; it continues to execute in the address space of the original process. In this respect, a lightweight process is like a thread (see "Thread-Level Parallelism"). However, a lightweight process differs from a true thread in two significant ways:

The library support for statement-level parallelism is based on the use of lightweight processes, coordinating their activities through semaphores (see "Statement-Level Parallelism" and "Using IRIX Semaphores").


Portable, Abstract Models

There are three portable, abstract models of parallel execution that are supported by Silicon Graphics, Inc. systems. Each provides a method of distributing a computation within a single-memory system or across the nodes of a multiple-memory system, without having to reflect the system configuration in the source code. The three programming models are:

- Message-Passing Interface (MPI)
- Parallel Virtual Machine (PVM)
- Remote Procedure Call (RPC)

Each of the three has its particular strengths and weaknesses.


Message-Passing Interface (MPI) Model

MPI is a standard programming interface for the construction of a portable, parallel application in Fortran 77 or in C, especially when the application can be decomposed into a fixed number of processes operating in a fixed topology (for example, a pipeline, grid, or tree).
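
A minimal MPI program in C looks like the following: each process in the fixed set learns its rank, and process 1 sends an integer to process 0. The calls shown are standard MPI; how the program is started (for example, with mpirun) depends on the installation.

    /*
     * Minimal MPI sketch: a fixed set of processes, each identified by
     * its rank, exchanging one message.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in all? */

        if (rank == 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0 && size > 1) {
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("process 0 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }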

A highly tuned, efficient implementation of MPI is included with the Array software CD for Array systems such as the POWER CHALLENGEarray. MPI is the recommended parallel model for use with Array products.

MPI is discussed in more detail under "Using MPI and PVM".


Parallel Virtual Machine (PVM) Model

PVM is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous, concurrent computing framework on interconnected computers of varied architecture. Using PVM, you can create a parallel application that executes as a set of concurrent processes on a set of computers. The set can include Silicon Graphics, Inc. uniprocessors, multiprocessors, and nodes of Array systems.
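
In outline, a PVM program enrolls in the virtual machine, spawns tasks, and exchanges packed messages. The sketch below shows a master task using the standard PVM 3 calls; the name of the spawned executable, "worker", is hypothetical.

    /*
     * Sketch of a PVM 3 master task: enroll, spawn one worker task,
     * and receive an integer back.  The executable name "worker" is
     * hypothetical.
     */
    #include <pvm3.h>
    #include <stdio.h>

    int main(void)
    {
        int mytid = pvm_mytid();             /* enroll in the virtual machine */
        int tid, result;

        /* start one copy of "worker" anywhere in the virtual machine */
        if (pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 1, &tid) != 1) {
            fprintf(stderr, "spawn failed\n");
            pvm_exit();
            return 1;
        }

        pvm_recv(tid, 0);                    /* wait for a message with tag 0 */
        pvm_upkint(&result, 1, 1);           /* unpack one integer from it */
        printf("task %x: worker %x returned %d\n", mytid, tid, result);

        pvm_exit();                          /* leave the virtual machine */
        return 0;
    }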

An implementation of PVM is included with the Array software CD for Silicon Graphics, Inc. Array systems. PVM is better able than MPI to deal with a heterogeneous network of computers. In every other way, MPI is preferable; in particular, when the application runs in the context of a single Array system, an MPI design has better performance.

PVM is discussed in more detail under "Using MPI and PVM".


Remote Procedure Call (RPC) Model

RPC is a standard programming interface originally developed at Sun Microsystems, Inc. and used as the basis of Sun's Network File System (NFS) standard. RPC is used extensively within the IRIX system (and in most current UNIX implementations) to provide NFS and network management services.

The purpose of the RPC interface is to distribute services across a network, so that one program can easily supply a service to all others. An RPC server program registers the services it can provide with RPC. A client program anywhere in the network can issue a remote procedure call for a registered service, and the RPC interface takes care of locating the server program, invoking its service, and returning the result values to the caller.
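
With the simplified ONC RPC calls, a server registers a C function under a program, version, and procedure number, and a client invokes it by those same numbers; the library locates the server and moves the arguments. In the sketch below, the program number 0x20000042 (from the user-defined range) and the "double it" procedure are examples only.

    /*
     * Sketch of the simplified ONC RPC interface.  The program number
     * and the procedure are examples only.
     */
    #include <rpc/rpc.h>
    #include <stdio.h>

    #define EX_PROG 0x20000042               /* example user-defined program number */
    #define EX_VERS 1
    #define EX_PROC 1

    static int result;

    static char *double_it(char *in)         /* server: called once per request */
    {
        result = *(int *)in * 2;
        return (char *)&result;
    }

    static void run_server(void)
    {
        registerrpc(EX_PROG, EX_VERS, EX_PROC, double_it,
                    (xdrproc_t)xdr_int, (xdrproc_t)xdr_int);
        svc_run();                           /* dispatch requests; does not return */
    }

    static void run_client(char *host)
    {
        int in = 21, out = 0;

        /* synchronous: blocks until the remote procedure has completed */
        if (callrpc(host, EX_PROG, EX_VERS, EX_PROC,
                    (xdrproc_t)xdr_int, (char *)&in,
                    (xdrproc_t)xdr_int, (char *)&out) == 0)
            printf("server says %d\n", out);
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            run_client(argv[1]);             /* run as client against a host name */
        else
            run_server();                    /* otherwise run as the server */
        return 0;
    }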

RPC by itself does not support concurrent execution. A remote procedure call, like a local procedure call, is synchronous; that is, the caller is blocked until the called procedure completes its work. RPC is a method of distributing a computation over a network, not a method of parallel execution. However, RPC can be combined with other parallel execution models. For example, a thread or lightweight process can issue remote procedure calls.

RPC libraries are included in IRIX. For an overview of RPC programming, see the IRIX Network Programming Guide. For further details, refer to the rpc(3R) reference page.

