FFC failure with PETSc on more than one compute node

Hi,

We have a code that fails when it runs this call on more than one compute node:

V = VectorFunctionSpace(mesh, "CG", 2)

It fails at the ffc stage:

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
Rank 0 [Fri Oct 9 16:42:35 2015] [c3-0c0s2n1] application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
In instant.recompile: The module did not compile with command 'cmake -DDEBUG=TRUE .', see '/work/z01/z01/adrianj/.instant/ffc_form_e1514c0a3fffa4b6c0eda574739266f451a578aa/compile.log'
Traceback (most recent call last):
File "multiple_solitons.py", line 184, in <module>
V = VectorFunctionSpace(mesh, "CG", 2)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/dolfin/functions/functionspace.py", line 628, in __init__
constrained_domain=constrained_domain)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/dolfin/functions/functionspace.py", line 153, in __init__
ufc_element, ufc_dofmap = jit(self._ufl_element, mpi_comm=mesh.mpi_comm())
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/dolfin/compilemodules/jit.py", line 68, in mpi_jit
output = local_jit(*args, **kwargs)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/dolfin/compilemodules/jit.py", line 128, in jit
return form_compiler.jit(form, parameters=p)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/ffc/jitcompiler.py", line 72, in jit
return jit_element(ufl_object, parameters)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/ffc/jitcompiler.py", line 177, in jit_element
compiled_form, module, prefix = jit_form(form, parameters)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/ffc/jitcompiler.py", line 148, in jit_form
cache_dir = cache_dir)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/ffc/backends/ufc/build.py", line 73, in build_ufc_module
**kwargs)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/instant/build.py", line 563, in build_module
recompile(modulename, module_path, new_compilation_checksum, build_system)
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/instant/build.py", line 152, in recompile
instant_error(msg % (cmd, compile_log_filename_dest))
File "/work/y07/y07/cse/fenics/1.5.0/lib/python2.7/site-packages/instant/output.py", line 85, in instant_error
raise RuntimeError(text)
RuntimeError: In instant.recompile: The module did not compile with command 'cmake -DDEBUG=TRUE .', see '/work/z01/z01/adrianj/.instant/ffc_form

etc....
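
The RuntimeError above points at a compile.log inside the instant cache directory, which usually holds the actual cmake or compiler error. As a rough sketch only (assuming Python 3 and that the cache sits under ~/.instant, as the path in the message suggests; the helper names `latest_compile_log` and `tail` are hypothetical and not part of instant or FFC), the most recent log could be located and printed like this:

```python
import os
from pathlib import Path


def latest_compile_log(cache_dir):
    """Return the most recently modified compile.log under cache_dir, or None."""
    root = Path(cache_dir)
    if not root.is_dir():
        return None
    # Search recursively so the exact cache layout does not matter.
    logs = list(root.rglob("compile.log"))
    if not logs:
        return None
    return max(logs, key=lambda p: p.stat().st_mtime)


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, errors="replace") as fh:
        return fh.read().splitlines()[-n:]


if __name__ == "__main__":
    # Assumption: the cache lives in ~/.instant on the machine in question.
    cache = os.path.expanduser("~/.instant")
    log = latest_compile_log(cache)
    if log is None:
        print("no compile.log found under", cache)
    else:
        print(log)
        print("\n".join(tail(log)))
```

The last lines of that log would normally show whether cmake itself failed to start or whether the generated code failed to compile.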

However, it works when I run it within a single compute node; it fails only when I use more than one. Our compute nodes have 24 cores each, so the above works on up to 24 cores inside one node but fails if, for instance, I use 12 cores on each of 2 nodes (24 altogether).

Any ideas what could be causing this? Is it a PETSc/petsc4py issue?

I cleared the instant cache before the runs.
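
Because the failing command is cmake and the failure appears only once a second node is involved, one thing that could be compared across nodes is the build environment each process sees. The sketch below is only an illustration under assumptions: it needs Python 3 (for `shutil.which`), `node_report` is a hypothetical helper, and nothing in this post confirms that the nodes actually differ. It reports the hostname, where `cmake` resolves on PATH, and whether the cache directory is writable from that host.

```python
import os
import shutil
import socket
import tempfile


def node_report(cache_dir, tools=("cmake",)):
    """Collect facts about this host that a JIT build depends on."""
    report = {"host": socket.gethostname(), "cache_dir": cache_dir}
    for tool in tools:
        # None means the tool is not on PATH for this process.
        report[tool] = shutil.which(tool)
    writable = False
    if os.path.isdir(cache_dir):
        try:
            # Creating (and auto-deleting) a temp file proves write access.
            with tempfile.NamedTemporaryFile(dir=cache_dir):
                writable = True
        except OSError:
            writable = False
    report["cache_writable"] = writable
    return report


if __name__ == "__main__":
    # Assumption: the cache directory is ~/.instant.
    print(node_report(os.path.expanduser("~/.instant")))
```

Launched once per node with whatever job launcher the system provides, the printed dictionaries can be compared side by side; a node where `cmake` is `None` or `cache_writable` is `False` would be a candidate explanation, though not the only possible one.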

thanks

adrianj

closed with the note: Please send a message to fenics-support@fenicsproject.org.
asked Oct 9, 2015 by adrianj FEniCS Novice (120 points)
closed Oct 11, 2015 by Garth N. Wells