This is a read-only copy of the old FEniCS Q&A forum. Please visit the new Q&A forum to ask questions.

Error code 76 while using MUMPS on hpc

0 votes

Hi,

I am trying to solve a Navier-Stokes-based problem (around 3M degrees of freedom in total). Due to the bad conditioning of the system (inherent to the method I am using), iterative solvers don't converge, which is why I use MUMPS. On an HPC cluster with mpirun and 24 cores, the Stokes variant of the problem is solved in a reasonable amount of time. Yet when I try to solve the nonlinear Navier-Stokes problem, it exits with an error on the first Newton iteration. Here is a trimmed-down section of the output I get with a log level of DBG (only 3 cores for this attempt):

Solving linear system of size 3109410 x 3109410 (PETSc LU solver, (null)).
Elapsed time: 19431.1 (PETSc LU solver)
Elapsed time: 19431.1 (LU solver)

*** Error: Unable to successfully call PETSc function 'KSPSolve'.
*** Reason: PETSc error code is: 76.
*** Where: This error was encountered inside /soft/fenics/1.5.0/src/dolfin-1.5.0/dolfin/la/PETScLUSolver.cpp.
*** Process: unknown


*** DOLFIN version: 1.5.0
*** Git changeset:
*** -------------------------------------------------------------------------

Does it make sense that it is using the PETSc LU solver? Is this somehow a part of MUMPS? I construct the problem and its solver parameters with the following lines:

problem = NonlinearVariationalProblem(F, upp, bcs, J)
solverNavier = NonlinearVariationalSolver(problem)

prm = solverNavier.parameters
prm['newton_solver']['relative_tolerance'] = 1E-4
prm['newton_solver']['maximum_iterations'] = 8
prm['newton_solver']['relaxation_parameter'] = 1.0
prm['newton_solver']['linear_solver'] = 'mumps'

solverNavier.solve()
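One way to confirm which linear solver backends the DOLFIN build on the cluster actually exposes is the sketch below (environment-dependent: it assumes a legacy DOLFIN install; `list_lu_solver_methods` and `has_lu_solver_method` existed alongside `list_linear_solver_methods` in DOLFIN of that era):

```python
from dolfin import list_linear_solver_methods, list_lu_solver_methods, has_lu_solver_method

# Print every linear solver method and every LU backend this FEniCS build was compiled with
list_linear_solver_methods()
list_lu_solver_methods()

# Programmatic check that 'mumps' is really available as an LU backend
print(has_lu_solver_method("mumps"))
```

If `'mumps'` is listed, the parameter setting above should route the Newton iterations through MUMPS rather than PETSc's default LU.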

On this forum I've seen a number of similar questions. They seem to point to a 4 GB memory limit in the default PETSc LU solver, and most were resolved by switching to MUMPS. Yet that is exactly what I believe I have specified above. Am I doing something completely wrong?

Thanks,
Stein

asked Apr 12, 2016 by Stein FEniCS Novice (190 points)

1 Answer

0 votes

MUMPS can be a bit buggy in my experience (also see this bug report). Try switching to superlu_dist as your linear solver. It's another direct solver that also works in parallel.
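If the build includes it, the switch is a one-line change to the parameter set from the question (a sketch, assuming the same `prm` handle as in the original snippet):

```python
# Use SuperLU_DIST instead of MUMPS as the direct solver for the Newton iterations
prm['newton_solver']['linear_solver'] = 'superlu_dist'
```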

Alternatively, have you tried to solve a smaller problem size using MUMPS? It could just be a memory issue, especially considering you're trying to solve a large nonlinear system.

answered Apr 12, 2016 by FF FEniCS User (4,630 points)
edited Apr 13, 2016 by FF

Thanks for the response. I'm afraid FEniCS is not configured with superlu_dist on the cluster: it does not show up in the list from list_linear_solver_methods().
I did solve some smaller problems with MUMPS (around 1M degrees of freedom). But could a memory issue really explain this? The Stokes problem solves without any trouble, and this failure occurs on the very first Newton iteration. Also, if I log on to the compute node, I see it using only about 20% of the available memory.

Sorry, I can't comment too rigorously on the memory requirements (hopefully someone else can jump in on that). If you don't think it's a memory issue, you may want to look into compiling a separate copy of FEniCS on your cluster using HashDist, this time including superlu_dist.
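One more thing worth trying if it does turn out to be MUMPS running out of workspace (PETSc error code 76 generally wraps an error reported by an external package): raise MUMPS's memory relaxation through PETSc options before calling solve. A sketch, assuming your DOLFIN build exposes `PETScOptions`; ICNTL(14) is the percentage by which MUMPS inflates its estimated working space, with a default of 20:

```python
from dolfin import PETScOptions

# Allow MUMPS 50% extra working space over its initial estimate (default is 20%)
PETScOptions.set("mat_mumps_icntl_14", 50)
```

Set this before constructing or calling the solver so the option is picked up when the factorization runs.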

...