Parallel PIKAIA Homepage





Welcome to the Parallel PIKAIA Homepage. If you are interested in using the genetic algorithm based FORTRAN-77 optimization subroutine PIKAIA, and the modeling fitness-function that you want to maximize is computationally intensive, you have come to the right place. Here you will find all that you need to run PIKAIA in parallel (using the "Full Generational Replacement" evolutionary strategy) on your Linux cluster or supercomputer.

PIKAIA was developed by Paul Charbonneau and Barry Knapp at the High Altitude Observatory of the National Center for Atmospheric Research, and version 1.2 was released in April 2002. The parallel implementations found here were developed by Travis Metcalfe: the PVM version was part of his doctoral thesis in astronomy at the University of Texas at Austin, and the MPI version was developed during a postdoctoral fellowship at the Harvard-Smithsonian Center for Astrophysics.



Quick start: PVM | MPI


Background

In 1998 we began a project to adapt some computationally intensive codes to interface with PIKAIA. On the fastest processors available at the time, a single model would run in about 45 seconds. We knew that the optimization would require ~10^6 model evaluations, so it was clear that a serial version of PIKAIA would require many months to finish on a single processor (10^6 evaluations at 45 seconds each is more than 500 days of CPU time). To tackle the problem on a reasonable timescale, we decided to incorporate the message passing routines of the Parallel Virtual Machine (PVM) software [and later the Message Passing Interface (MPI)] into the "Full Generational Replacement" evolutionary strategy of PIKAIA. These software packages allow a collection of networked computers to cooperate on a problem as if they were a single multi-processor parallel machine. All of the software and documentation is free. We had no trouble installing them on our Linux cluster, and the sample programs that come with the distributions make them easy to learn and use. The trickiest part of the procedure was deciding how to split up the workload among the various computers.


Splitting up the workload

A genetic algorithm based fitting procedure quite naturally divides into two basic functions: evaluating the modeling fitness-function, and applying the genetic operators to each generation once the fitnesses have been calculated. When we profiled the distribution of execution time for each part of our code, this division became even more obvious. As with the vast majority of real-life applications, fitness evaluation is by far the most computationally demanding step. For our application, 93% of CPU time is spent carrying out fitness evaluation, 4% carrying out breeding and GA internal operations (such as mutation rate adjustment), and 3% for system and I/O. It thus seemed reasonable to allocate slave tasks to perform the model calculations, while a master task took care of the GA-related operations.

In addition to decomposing the function of the code, a further division based on the data was also possible. Fitness evaluation across the population is inherently a parallel process, since each model can be evaluated independently of the others. Moreover, it requires minimal transfer of information, since all that the user-supplied function requires is the n-dimensional floating-point array of parameters defining one single instance of the model, and all it needs to return is the floating-point value corresponding to the model's fitness. It is then natural to send one model to each available processor, so the number of machines available controls the number of models that can be calculated in parallel. Maximal use of each processor is then assured by choosing a population size that is an integer multiple of the number of available processors.
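
The only interface the slaves need, then, is the fitness function itself. A minimal sketch of what such a user-supplied function looks like is given below; the name userff follows the userff.f file in this distribution, but the body is only a placeholder, not the sample function shipped with PIKAIA.

      function userff(n, x)
c     Placeholder fitness function: n scaled parameters in (0,1)
c     come in, one floating-point fitness goes out.  A real
c     application would map x onto physical model parameters, run
c     the (expensive) model, and return a number that is larger
c     for better models.
      implicit none
      integer n
      real userff, x(n)
      integer i
      real sum
      sum = 0.0
      do 10 i = 1, n
         sum = sum + (x(i) - 0.5)**2
 10   continue
c     this toy example simply rewards proximity to the centre of
c     the unit hypercube
      userff = 1.0/(1.0 + sum)
      return
      end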

In practice, this recipe for dividing the workload between the available processors proved to be very scalable. Since very little data is exchanged between the master and slave tasks, our 64-node Linux cluster provided a speedup factor of about 53 over the performance on a single processor.
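
For reference, that speedup corresponds to a parallel efficiency of roughly

    53 / 64 = 0.83,

i.e. each processor is usefully busy about 83% of the time; the remainder is presumably spent in the serial GA bookkeeping on the master and in slaves idling at the end of each generation while the slowest models finish.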


PVM Version

Master Program
Starting with the source code for PIKAIA 1.2, we used the message passing routines from PVM to create a parallel fitness evaluation subroutine. The original code evaluates the fitnesses of the population one at a time in a DO loop. We replaced this procedure with a single call to a new subroutine that evaluates the fitnesses in parallel on all available processors. This parallel version of PIKAIA (pikaia_master.f) along with the parallel fitness evaluation subroutine (pvm_fitness.f) constitutes the master program, which runs on the central computer of our Linux cluster.
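
Schematically, the change amounts to replacing a serial loop of the following form (the variable names and the argument list of the call are illustrative here, not copied from the distribution):

c     original serial evaluation (schematic): one model at a time
      do 10 ip = 1, np
         fitns(ip) = ff(n, oldph(1,ip))
 10   continue

with a single call that farms the whole population out to the slaves, along the lines of

c     parallel evaluation (schematic argument list)
      call pvm_fitness(n, np, nslave, oldph, fitns)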

After starting the slave program on every available processor, pvm_fitness.f sends an array containing scaled values of the parameters to each slave job over the network. In the first generation, these values are completely random; in subsequent generations, they are the result of the selection, crossover, and mutation of the previous generation, performed by the non-parallel portions of PIKAIA. Next, the subroutine listens for responses from the network and sends a new set of parameters to each slave job as it finishes the previous calculation. When all sets of parameters have been sent out, the subroutine begins looking for jobs that may have crashed and re-submits them to slaves that have finished and would otherwise sit idle. If a few jobs do not return a fitness after an elapsed time much longer than the average runtime required to compute a model, the subroutine assigns them a fitness of zero. When every set of parameters in the generation has been assigned a fitness value, the subroutine returns to the main program to perform the genetic operations -- resulting in a new generation of models to calculate. The process continues for a fixed number of generations, chosen to maximize the success rate of the search.
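
The core of this scatter/gather logic can be sketched as follows. This is not the actual pvm_fitness.f: the argument list, message tags and bookkeeping are illustrative assumptions, and the crash-detection, time-out and re-submission logic described above is omitted for brevity.

      subroutine pvm_fitness(n, np, nslave, oldph, fitns)
c     Sketch only -- not the distribution's pvm_fitness.f.  Hand
c     each slave one model, then keep feeding idle slaves new
c     parameter sets as results come back, until all np fitnesses
c     are filled in.  Assumes np >= nslave (population an integer
c     multiple of the number of slaves, as recommended above).
      implicit none
      include 'fpvm3.h'
      integer maxsl
      parameter (maxsl = 128)
      integer n, np, nslave
      real oldph(n,np), fitns(np)
      integer tids(maxsl), numt
      integer is, ip, next, idone, info, bufid, nbytes, msgtag, who
c     start one copy of the slave program on each available node
      call pvmfspawn('ff_slave', PVMDEFAULT, '*', nslave, tids, numt)
c     prime every slave with one model (tag 1 = work)
      next = 1
      do 10 is = 1, nslave
         call pvmfinitsend(PVMDEFAULT, info)
         call pvmfpack(INTEGER4, next, 1, 1, info)
         call pvmfpack(REAL4, oldph(1,next), n, 1, info)
         call pvmfsend(tids(is), 1, info)
         next = next + 1
 10   continue
c     collect results (tag 2) and hand each idle slave the next model
      do 20 idone = 1, np
         call pvmfrecv(-1, 2, bufid)
         call pvmfbufinfo(bufid, nbytes, msgtag, who, info)
         call pvmfunpack(INTEGER4, ip, 1, 1, info)
         call pvmfunpack(REAL4, fitns(ip), 1, 1, info)
         if (next .le. np) then
            call pvmfinitsend(PVMDEFAULT, info)
            call pvmfpack(INTEGER4, next, 1, 1, info)
            call pvmfpack(REAL4, oldph(1,next), n, 1, info)
            call pvmfsend(who, 1, info)
            next = next + 1
         endif
 20   continue
c     dismiss the slaves (tag 3 = stop, an empty message)
      do 30 is = 1, nslave
         call pvmfinitsend(PVMDEFAULT, info)
         call pvmfsend(tids(is), 3, info)
 30   continue
      return
      end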

Slave Program
To allow many instances of our code to run simultaneously, we added a front end that communicates with the master program through PVM routines. This code (ff_slave.f) combined with the fitness function (userff.f) constitutes the slave program, and is run on each node of our Linux cluster. The operation of the slave program is relatively simple. Once it is started by the master program, it receives a set of parameters from the network. It then calls the fitness function (a parallel version of the sample fitness function twod in this distribution) with these parameters as arguments, and returns a fitness value to the master program.
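
A matching sketch of the slave side is shown below; it follows the tag convention of the master sketch above and is illustrative rather than a copy of ff_slave.f.

      program ffslave
c     Illustrative PVM slave: receive a parameter set from the
c     parent (master) task, evaluate the user-supplied fitness
c     function, return the result, and repeat until the master
c     sends the stop message (tag 3).
      implicit none
      include 'fpvm3.h'
      integer n
c     n = number of model parameters, known to the slave at compile
c     time (2 would match the two-dimensional sample problem)
      parameter (n = 2)
      real x(n), fit, userff
      external userff
      integer mytid, ptid, info, bufid, nbytes, msgtag, who, ip
c     enrol in PVM and find the master (parent) task
      call pvmfmytid(mytid)
      call pvmfparent(ptid)
 10   continue
c     accept any message from the master: tag 1 = work, tag 3 = stop
      call pvmfrecv(ptid, -1, bufid)
      call pvmfbufinfo(bufid, nbytes, msgtag, who, info)
      if (msgtag .eq. 3) then
         call pvmfexit(info)
         stop
      endif
      call pvmfunpack(INTEGER4, ip, 1, 1, info)
      call pvmfunpack(REAL4, x, n, 1, info)
      fit = userff(n, x)
c     return the model index and its fitness under tag 2
      call pvmfinitsend(PVMDEFAULT, info)
      call pvmfpack(INTEGER4, ip, 1, 1, info)
      call pvmfpack(REAL4, fit, 1, 1, info)
      call pvmfsend(ptid, 2, info)
      go to 10
      end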

Compiling and Running
This distribution of PVM-PIKAIA was developed to work under PVM 3.4.4. Once you have obtained and installed this software, you can compile the sample program using aimk (architecture-independent make), which comes with the distribution. Place all of the source code for Parallel PIKAIA and our Makefile.aimk in the $PVM_ROOT/examples directory, and type aimk pikaia_ff. The code can then be run in parallel once the PVM daemon has been started on every available node of your machine.
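
Assuming a standard PVM 3.4.4 setup, the whole sequence might look something like the following; the host names are placeholders for your own nodes, and aimk installs the executable under $PVM_ROOT/bin/$PVM_ARCH:

  # from the directory containing the Parallel PIKAIA sources
  cp *.f Makefile.aimk $PVM_ROOT/examples
  cd $PVM_ROOT/examples
  aimk pikaia_ff

  # start the PVM daemons: "add" starts pvmd on each node, "conf"
  # lists the virtual machine, "quit" leaves it running
  pvm
  pvm> add node01 node02 node03
  pvm> conf
  pvm> quit

  # run the master program; it spawns the slaves itself
  $PVM_ROOT/bin/$PVM_ARCH/pikaia_ff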


MPI Version

Dual-Function Front End
MPI requires the user to specify the number of available processors manually at the beginning of the run. It then spawns that many instances of the specified program: the first one acts as the master, and all the rest run as slaves. This dual functionality requires a front end program (mpi_pikaia.f) to determine whether a given instance is the master or a slave, to call the appropriate code as a subroutine, and to terminate all of the jobs when PIKAIA is done.
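
A sketch of how such a front end can be organized is given below. The MPI calls are standard, but the argument lists of pikaia_master and ff_slave and the termination convention (an empty message with tag 0) are illustrative assumptions, not copied from mpi_pikaia.f.

      program mpipikaia
c     Illustrative dual-function front end: every MPI process runs
c     this program; rank 0 becomes the master and the others become
c     slaves.
      implicit none
      include 'mpif.h'
      integer ierr, rank, nproc, i
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
      if (rank .eq. 0) then
c        rank 0 runs the GA itself, farming fitness evaluations
c        out to the nproc-1 slaves (argument list is a guess)
         call pikaia_master(nproc - 1)
c        when the GA is finished, tell every slave to stop
c        (an empty message with tag 0)
         do 10 i = 1, nproc - 1
            call MPI_SEND(0.0, 0, MPI_REAL, i, 0,
     &                    MPI_COMM_WORLD, ierr)
 10      continue
      else
c        all other ranks just evaluate models on request
         call ff_slave
      endif
      call MPI_FINALIZE(ierr)
      end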

Master Subroutine
The master subroutine for MPI (pikaia_master.f) is nearly identical to the PVM version -- instead of evaluating the fitnesses of the population one at a time in a DO loop, we make a single call to a new subroutine that evaluates the fitnesses in parallel. For the MPI version, there is simply a different parallel fitness evaluation subroutine (mpi_fitness.f).

Just as with the PVM version, mpi_fitness.f sends an array containing scaled values of the parameters to each slave job over the network. In the first generation, these values are completely random; in subsequent generations, they are the result of the selection, crossover, and mutation of the previous generation, performed by the non-parallel portions of PIKAIA. Next, the subroutine listens for responses from the network and sends a new set of parameters to each slave job as it finishes the previous calculation. When all sets of parameters have been sent out, the subroutine begins looking for jobs that may have crashed and re-submits them to slaves that have finished and would otherwise sit idle. If a few jobs do not return a fitness after an elapsed time much longer than the average runtime required to compute a model, the subroutine assigns them a fitness of zero. When every set of parameters in the generation has been assigned a fitness value, the subroutine returns to the main program to perform the genetic operations -- resulting in a new generation of models to calculate. The process continues for a fixed number of generations, chosen to maximize the success rate of the search.
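
As before, the scatter/gather loop can be sketched as follows. This is not the actual mpi_fitness.f: here the message tag is used to carry the index of the population member, so that the returned fitness can be matched to the right model, and the crash-detection and time-out logic described above is omitted.

      subroutine mpi_fitness_sketch(n, np, nslave, oldph, fitns)
c     Sketch only.  Slaves are ranks 1..nslave; a work message
c     carries the parameter array under a tag equal to the model
c     index, and the fitness comes back under the same tag.
c     Assumes np >= nslave.
      implicit none
      include 'mpif.h'
      integer n, np, nslave
      real oldph(n,np), fitns(np)
      integer is, ip, next, idone, who, ierr
      integer status(MPI_STATUS_SIZE)
      real fit
c     give every slave one model to start with
      next = 1
      do 10 is = 1, nslave
         call MPI_SEND(oldph(1,next), n, MPI_REAL, is, next,
     &                 MPI_COMM_WORLD, ierr)
         next = next + 1
 10   continue
c     collect fitnesses and keep the slaves busy until all are done
      do 20 idone = 1, np
         call MPI_RECV(fit, 1, MPI_REAL, MPI_ANY_SOURCE,
     &                 MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
         ip  = status(MPI_TAG)
         who = status(MPI_SOURCE)
         fitns(ip) = fit
         if (next .le. np) then
            call MPI_SEND(oldph(1,next), n, MPI_REAL, who, next,
     &                    MPI_COMM_WORLD, ierr)
            next = next + 1
         endif
 20   continue
      return
      end

Using the tag as the model index keeps the messages down to the bare minimum described above: n reals out to a slave, one real back.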

Slave Subroutine
The slave subroutine communicates with the master task through MPI calls. This code (ff_slave.f) relies on a user-defined fitness function (userff.f), and runs simultaneously on every available processor. The operation of each slave task is relatively simple. Once it is started, it waits to receive a set of parameters from the master task over the network. It then calls the fitness function (a parallel version of the sample fitness function twod in this distribution) with these parameters as arguments, and returns a fitness value to the master task.
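
A matching sketch of the slave side, with the same illustrative tag convention as above (again, not a copy of ff_slave.f):

      subroutine ff_slave
c     Illustrative slave loop: wait for a parameter set from the
c     master (rank 0), evaluate the user-supplied fitness function,
c     and return the fitness under the same tag, until the stop
c     message (tag 0) arrives from the front end.
      implicit none
      include 'mpif.h'
      integer nmax
c     nmax is an arbitrary upper bound on the number of parameters
      parameter (nmax = 32)
      real x(nmax), fit, userff
      external userff
      integer n, ierr
      integer status(MPI_STATUS_SIZE)
 10   continue
      call MPI_RECV(x, nmax, MPI_REAL, 0, MPI_ANY_TAG,
     &              MPI_COMM_WORLD, status, ierr)
      if (status(MPI_TAG) .eq. 0) return
c     recover the number of parameters actually sent
      call MPI_GET_COUNT(status, MPI_REAL, n, ierr)
      fit = userff(n, x)
      call MPI_SEND(fit, 1, MPI_REAL, 0, status(MPI_TAG),
     &              MPI_COMM_WORLD, ierr)
      go to 10
      end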

Compiling and Running
This distribution of MPI-PIKAIA was developed to work under MPICH 1.2.5. Once you have obtained and installed this software, you can compile the sample program using make with the provided Makefile. Simply place all of the source code for MPI-PIKAIA in the directory where you want it to run, and type make pikaia. The code can then be run in parallel by issuing a command like: mpirun -np <#proc> pikaia
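
For instance, to run with one master process plus 16 slaves (so that a population size of, say, 64 or 128 keeps every slave busy), the run might be launched like this; the hosts machine file and the -machinefile option are site-specific details of an MPICH installation:

  make pikaia
  # 17 processes = 1 master + 16 slaves
  mpirun -np 17 -machinefile hosts pikaia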


Contact Travis Metcalfe with questions and comments about this page.