
parallel execution does not work (anymore?)


Hi,

I just noticed that running jobs in parallel doesn't work correctly. The MWE below doesn't print the correct MPI rank and world size at all; instead it looks like mpirun launches N independent copies of the code.

from __future__ import absolute_import, print_function

from dolfin import *

mpi_rank = MPI.rank(mpi_comm_world())
mpi_size = MPI.size(mpi_comm_world())

print("rank / size: {0} / {1}".format(mpi_rank, mpi_size))

Output:

fenics@5d9c83c65909:~/shared/scratch/mpi_test$ python mpi_test.py 
rank / size: 0 / 1
fenics@5d9c83c65909:~/shared/scratch/mpi_test$ mpirun -n 4 python mpi_test.py 
rank / size: 0 / 1
rank / size: 0 / 1
rank / size: 0 / 1
rank / size: 0 / 1 

I believe I saw some other code running in parallel correctly, but that was a few weeks back, before I pulled the latest stable Docker image.

Can anyone reproduce this? Or what am I doing wrong here?

asked Sep 20, 2016 by smiter FEniCS Novice (190 points)

Works fine for me in the quay.io/fenicsproject/dev Docker image:

docker run --rm -ti quay.io/fenicsproject/dev
[...]
fenics@ca41b9567784:~$ python mpi_test.py 
rank / size: 0 / 1
fenics@ca41b9567784:~$ mpirun -np 4 python mpi_test.py 
rank / size: 1 / 4
rank / size: 3 / 4
rank / size: 2 / 4
rank / size: 0 / 4

I'm completely stunned. Pulling the latest stable Docker image again triggered the download of a few chunks, but the result remains the same.

Even running a simple MPI C program fails to give the correct output:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    /* Initialize MPI; the command-line arguments are not needed here. */
    MPI_Init(NULL, NULL);

    /* Query the world size and this process's rank. */
    int mpi_size;
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    int mpi_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

    printf("rank / size: %d / %d\n", mpi_rank, mpi_size);

    MPI_Finalize();

    return 0;
}

Output:

fenics@5d9c83c65909:~/shared/scratch/mpi_test$ mpirun -n 4 ./mpi_test 
rank / size: 0 / 1
rank / size: 0 / 1
rank / size: 0 / 1
rank / size: 0 / 1

Any ideas, other than re-creating the Docker container from scratch?
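
One way to narrow a symptom like this down (a hedged sketch using standard Open MPI/MPICH/glibc tooling, not something from the original thread) is to check whether the launcher and the compiler wrapper come from the same MPI implementation:

# Which MPI stack does the launcher belong to?
# (Open MPI reports "mpirun (Open MPI) x.y.z"; MPICH's Hydra reports "HYDRA".)
mpirun --version

# Which MPI stack does the compiler wrapper use?
# (MPICH's mpicc understands -show; Open MPI's uses --showme.)
mpicc -show

# Which libmpi is the compiled binary actually linked against?
ldd ./mpi_test | grep -i mpi

If the launcher and the library disagree, each launched process initializes as its own single-rank job, which matches the output above.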

Are you running Docker on Mac or Windows? Maybe you need to allocate more CPUs to the virtual machine?

Ah, Debian's alternatives system somehow got messed up: the compiler wrappers were linked against Open MPI, while mpirun and mpiexec pointed to MPICH. That explains the symptom: a binary built against one MPI implementation but launched with the other's mpirun initializes as N independent single-rank jobs. After forcing a reinstall of the mpich package and updating all symlinks to MPICH, the output is as expected:

fenics@5d9c83c65909:~/shared/scratch/mpi_test$ mpirun -n 4 ./mpi_test
rank / size: 0 / 4
rank / size: 1 / 4
rank / size: 2 / 4
rank / size: 3 / 4
fenics@5d9c83c65909:~/shared/scratch/mpi_test$ python mpi_test.py 
rank / size: 0 / 1
fenics@5d9c83c65909:~/shared/scratch/mpi_test$ mpirun -n 4 python mpi_test.py 
rank / size: 2 / 4
rank / size: 1 / 4
rank / size: 0 / 4
rank / size: 3 / 4
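
For the record, a sketch of how the alternatives can be inspected and repaired inside the container (the group names mpi and mpirun are what Debian uses; exact paths vary per release, so treat this as an illustration rather than the exact commands from the thread):

# Show what each Debian alternatives group currently points to.
update-alternatives --display mpi
update-alternatives --display mpirun

# Force-reinstall MPICH, then re-point both groups at it.
sudo apt-get install --reinstall mpich
sudo update-alternatives --config mpi
sudo update-alternatives --config mpirun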

How and why that happened I do not know, but it's worth remembering ;-)

Thanks for your help!

...