
OpenMPI 1.6.5 and threading causing poor performance

0 votes

I know hybrid MPI/threading is not supported in dolfin, but this is not really a question about that.

The problem I am having is that running jobs with mpirun gives poor performance with OpenMPI. For some reason, OpenMPI seems to create a lot of extra threads, presumably for its own communication, and turning them off makes it run a lot faster.
This is with OpenMPI 1.6.5 on Ubuntu 14.04.

e.g. if I run the simple demo_poisson.py with a 320x320 mesh:


bash> time mpirun -n 6 python demo_poisson.py

real 0m40.151s

bash> # turn off threading
bash> export OMP_NUM_THREADS=1
bash> time mpirun -n 6 python demo_poisson.py

real 0m5.722s

Has anybody seen this before?

asked Sep 3, 2014 by chris_richardson FEniCS Expert (31,740 points)

Weird. I do not see this, either on my Mac or on our cluster. Timings are around 5-6 s.

Yes, I also thought it was weird. It is most pronounced on a dual-socket 2x8-core server, but then I tried setting OMP_NUM_THREADS=1 on some other machines and found a slight speedup there too...

1 Answer

0 votes
 
Best answer

The linear solver is probably calling BLAS functions, and BLAS might be using more threads than you have cores.

answered Sep 3, 2014 by Garth N. Wells FEniCS Expert (35,930 points)
selected Sep 4, 2014 by chris_richardson
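
One quick way to test that theory is to cap the BLAS/OpenMP thread pools before numpy (and hence BLAS) is loaded. A minimal sketch, assuming an OpenBLAS- or OpenMP-backed BLAS; both environment variables are read at library load time, so they must be set before the import:

import os
# Must be set before numpy/BLAS is loaded; the libraries read these when they start up.
os.environ.setdefault("OMP_NUM_THREADS", "1")       # any OpenMP-threaded BLAS
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")  # OpenBLAS specifically
import numpy

If the timings improve with these limits in place, a threaded BLAS over-subscribing the cores is the likely culprit.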

Unfortunately, I don't think the answer is that simple. In fact, I also don't think it has anything to do with FEniCS/dolfin, which is (a) a relief, but (b) means I shouldn't really be raising it here...
Look at this:

#!/usr/bin/python
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Pass explicit MPI datatypes; loop forever so the processes can be inspected with ps.
while True:
    if rank == 0:
        data = numpy.arange(1000, dtype='i')
        comm.Send([data, MPI.INT], dest=1, tag=77)
    elif rank == 1:
        data = numpy.empty(1000, dtype='i')
        comm.Recv([data, MPI.INT], source=0, tag=77)

If I do

mpirun -n 2 python example.py

and then

ps -eLf

I can see 32 threads, i.e. twice the number of cores on the machine.
It seems to be something about this installation of OpenMPI.
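
One way to narrow down where the extra threads come from is to watch the process's kernel-level thread count as each library gets loaded. A rough sketch of such a check on Linux (it reads the Threads: field of /proc/self/status):

#!/usr/bin/python
# Rough diagnostic (Linux only): report this process's kernel thread count
# after each import, to see which library actually spawns the extra threads.
def thread_count():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])

print("at startup: %d" % thread_count())
import numpy                    # a threaded BLAS may start worker threads here
print("after importing numpy: %d" % thread_count())
from mpi4py import MPI          # MPI_Init may start progress threads here
print("after importing mpi4py.MPI: %d" % thread_count())

Running that under mpirun, with and without OMP_NUM_THREADS=1 set, should show whether the count jumps when numpy is imported or when MPI is initialised.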

...