This is a read only copy of the old FEniCS QA forum. Please visit the new QA forum to ask questions

Speed-up options in FEniCS?

0 votes

I am playing around with the Hyperelasticity sample. I have 16 cores on one compute unit, and I can get more units if needed. They also provide GPU support. What options can be used to parallelize the sample?

I would love to know whether there is a simple, lazy on/off switch for multithreading.
I tried parameters["num_threads"] = 6 as the first line of the parameter definitions and got this:
[Screenshot of the resulting error; image not available]

Sadly, I have not found a list of the available PETScOptions or of other parameters related to threading or GPUs. Is there such a list? Could you point me to one or provide it?

Is MPI the only option left? How should the original sample code be changed to run under MPI (on one compute node, for example)? Are any special compiler or execution flags or parameters required to run it?

Probably off-topic: I found an FEM implementation on the GPU via PETSc (SNES ex52) that shows OpenCL/CUDA FEM solvers. Do you happen to have a GPGPU-related development branch?

asked Oct 24, 2014 by Panda FEniCS Novice (440 points)

1 Answer

+2 votes
 
Best answer

Usually, you can just run your code with e.g.:
mpirun -n 8 python example.py

If you run on a single node with multiple cores, the communication will use a shared-memory transfer layer, so it should be quite efficient.

answered Oct 24, 2014 by chris_richardson FEniCS Expert (31,740 points)
selected Oct 26, 2014 by Panda

Would specifying parameters["num_threads"] be enough to parallelize the given example (the Hyperelasticity sample)? Which parts would and would not be parallelized?

Threading may speed up assembly, but it is not yet compatible with MPI. Use one or the other.
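
One way to respect the "one or the other" rule in a script is to switch threading on only when the script was not launched under mpirun. The following is a minimal sketch under stated assumptions, not part of DOLFIN: the helper names mpi_world_size and pick_num_threads are hypothetical, and the MPI launch is guessed from the environment variables set by Open MPI (OMPI_COMM_WORLD_SIZE) and by MPICH-style launchers (PMI_SIZE), which may not cover every MPI implementation.

```python
import os


def mpi_world_size(environ=None):
    """Best-effort guess of the MPI process count from launcher variables.

    Assumption: the launcher exports OMPI_COMM_WORLD_SIZE (Open MPI) or
    PMI_SIZE (MPICH-style). Returns 1 when neither variable is present.
    """
    env = os.environ if environ is None else environ
    for name in ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE"):
        value = env.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 1


def pick_num_threads(requested, environ=None):
    """Return `requested` for a serial run and 0 when running under MPI."""
    return 0 if mpi_world_size(environ) > 1 else requested


# Hypothetical use in a DOLFIN script of that era (not executed here):
#     parameters["num_threads"] = pick_num_threads(6)
```

This assumes that a value of 0 leaves threaded assembly switched off. With that assumption, the same script can be started either as python example.py (threads) or as mpirun -n 8 python example.py (MPI) without editing it.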

So if I set parameters["num_threads"], it would not require MPI and would just use threads where it can? Great! That leads me to the error in the screenshot above. Are there any special requirements on compilation or execution to use parameters["num_threads"]?

Well, it works out of the box, and the speed-up is real (nearly 16x on 16 threads)! =)
Yet the result is quite strange:

[Screenshot of the result; image not available]

It seems to automatically split the mesh into 16 parts; it generated 16 vtu files and opened 16 UI windows with them inside. How can I merge the results into a single vtu (and show it as one mesh in the UI)?

Install ParaView to view the vtu files...
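
If the per-process vtu files are already on disk, one option that needs no DOLFIN at all is to list them in a ParaView collection (.pvd) file, so that ParaView loads them together as parts of one dataset. A minimal sketch using only the Python standard library; the helper name write_pvd_collection and the file-name pattern in the usage comment are hypothetical, since the real piece names depend on how the sample writes its output.

```python
import glob
import os
import xml.etree.ElementTree as ET


def write_pvd_collection(piece_pattern, pvd_path, timestep=0):
    """Write a ParaView .pvd file referencing every .vtu piece that matches.

    Returns the list of piece files found (sorted by name; the order only
    determines the part index, not how the mesh is displayed).
    """
    pieces = sorted(glob.glob(piece_pattern))
    root = ET.Element("VTKFile", type="Collection", version="0.1")
    collection = ET.SubElement(root, "Collection")
    base = os.path.dirname(os.path.abspath(pvd_path))
    for part, piece in enumerate(pieces):
        ET.SubElement(
            collection,
            "DataSet",
            timestep=str(timestep),
            part=str(part),
            # Store paths relative to the .pvd so the folder can be moved.
            file=os.path.relpath(os.path.abspath(piece), base),
        )
    ET.ElementTree(root).write(pvd_path, xml_declaration=True, encoding="utf-8")
    return pieces


# Hypothetical usage, assuming pieces named displacement_p0.vtu, ...:
#     write_pvd_collection("displacement_p*.vtu", "displacement_all.pvd")
```

Opening the resulting .pvd file in ParaView should then show all pieces as a single mesh. Note that when output is written through File("name.pvd") in a parallel run, DOLFIN normally writes such an index file itself, so opening that file instead of the individual vtu files may already be enough.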

...