This is a read-only copy of the old FEniCS QA forum. Please visit the new QA forum to ask questions.

Parallel: How and Why?


Hello everybody,

I am trying to run my FEniCS Python code in parallel, both on my notebook (Ubuntu 14.04 LTS with the default DOLFIN 1.6 from the PPA) and on a cluster with the HashDist 1.6 build.

On my notebook the script runs with 4 threads just by starting it, without setting anything and without using mpirun. On the cluster it runs with only one thread and seems to be slower, even though each core is much faster.

When I run with mpirun -np 4, every print output and plot appears four times, but the code does not seem to speed up.

So the questions are:

  1. Why and how does it run in parallel on my notebook but not on the cluster?
  2. How do I correctly run with MPI? (I couldn't find much about it by searching.)

Thanks for the effort,
Jo

asked Jun 6, 2016 by jenom FEniCS Novice (690 points)

2 Answers

Best answer

The parallel features in FEniCS target high-performance computing
on parallel clusters. On laptops, the computations are often limited
by memory bandwidth, i.e. by the transfer of data between memory
and CPU. Therefore adding threads may not result in a speed-up.
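To make the memory-bandwidth argument concrete, here is a rough roofline-style estimate (attainable rate = min(peak compute, arithmetic intensity x bandwidth)) for a sparse matrix-vector product in CSR format, a typical kernel inside iterative linear solvers. It is a sketch under stated assumptions: 8-byte values and 4-byte column indices, only the matrix traffic is counted, and the bandwidth and per-core peak figures are hypothetical, not measurements of any real machine.

```python
def csr_spmv_intensity(value_bytes=8, index_bytes=4):
    """Upper bound on flops per byte for y = A @ x with A stored in CSR format.

    Each stored nonzero costs one multiply and one add (2 flops) and at least
    one value plus one column index read from memory. Traffic for x, y and the
    row pointers is ignored, so the true intensity is lower still.
    """
    return 2.0 / (value_bytes + index_bytes)


def attainable_gflops(peak_gflops, bandwidth_gb_s, intensity):
    """Roofline estimate: the rate is capped either by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * intensity)


if __name__ == "__main__":
    ai = csr_spmv_intensity()
    # Hypothetical laptop: ~20 GB/s memory bandwidth shared by all cores,
    # ~10 GFLOP/s peak per core.
    bandwidth = 20.0
    for cores in (1, 4):
        print(cores, "core(s):", attainable_gflops(cores * 10.0, bandwidth, ai), "GFLOP/s")
```

With these made-up numbers both the 1-core and the 4-core case are capped at about 3.3 GFLOP/s by bandwidth, which is the effect described above: the extra threads all wait on the same memory.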

answered Jun 7, 2016 by Kent-Andre Mardal FEniCS Expert (14,380 points)
selected Jun 7, 2016 by jenom

Thank you very much for your reply.

So if I understand this correctly, the HashDist version does not focus on OpenMP, and thus PETSc is compiled without it. If I choose to use MPI (for problems complex enough to profit from it), I will have to guard my log output somehow, and the rest is done in the background?

Our compute server has 1 TB of shared memory and 80 cores, so I think it might be more efficient to just use OpenMP, although of course the memory is not equally well connected to all processors. Is there a way to modify the FEniCS YAML file in HashDist so that PETSc is compiled with OpenMP?

Running my scripts with OMP_NUM_THREADS=1 or OMP_NUM_THREADS=4 doesn't seem to change anything.

Yes, this is probably the way to go.
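The output appears four times under mpirun -np 4 because every MPI process runs the whole script. A minimal sketch of guarding output so only rank 0 prints is below. It assumes the launcher exports its rank in the environment (Open MPI sets OMPI_COMM_WORLD_RANK, MPICH/Hydra sets PMI_RANK); the helper names mpi_rank and print_root are hypothetical. Inside a DOLFIN 1.x script, MPI.rank(mpi_comm_world()) is the more direct way to get the rank.

```python
import os


def mpi_rank():
    """Best-effort guess of this process's MPI rank; 0 when run serially.

    Assumes the MPI launcher exported the rank: Open MPI uses
    OMPI_COMM_WORLD_RANK, MPICH's Hydra launcher uses PMI_RANK.
    """
    for var in ("OMPI_COMM_WORLD_RANK", "PMI_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    return 0


def print_root(*args, **kwargs):
    """Print only on rank 0, so each message appears once under mpirun."""
    if mpi_rank() == 0:
        print(*args, **kwargs)


if __name__ == "__main__":
    print_root("Solver finished")  # printed once, not once per process
```

The same rank check can wrap any other output that should happen only once, such as writing a summary file.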


There could be many reasons for slow performance. Have you set the environment variable OMP_NUM_THREADS=1 when running with parallel processes?

answered Jun 7, 2016 by nate FEniCS Expert (17,050 points)
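One way to follow this suggestion from inside the script itself is sketched below, assuming the OpenMP runtime (or a threaded BLAS) is loaded only when dolfin or numpy is imported. OpenMP reads OMP_NUM_THREADS when it initialises, so the variable has to be set before those imports; setting it afterwards has no effect. The helper threads_per_process is hypothetical.

```python
import os

# Must run before importing anything that may load an OpenMP runtime or a
# threaded BLAS (e.g. dolfin, numpy). setdefault keeps a value the user has
# already exported, so the command line still wins.
os.environ.setdefault("OMP_NUM_THREADS", "1")


def threads_per_process():
    """Number of OpenMP threads each MPI process is asked to use."""
    return int(os.environ["OMP_NUM_THREADS"])


if __name__ == "__main__":
    # Heavy imports go here, after the environment is set, e.g.:
    # from dolfin import *
    print("OpenMP threads per process:", threads_per_process())
```

Setting it on the command line instead, as in OMP_NUM_THREADS=1 mpirun -np 4 python script.py, achieves the same; with Open MPI across several nodes, mpirun -x OMP_NUM_THREADS forwards the variable to the remote processes.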
...