
Going parallel/distributed

+6 votes

Dear all,

I am moving my simple linear elastic program to parallel and, next, distributed environments. Right now I would just like to get threads and MPI working, recreating Figure 10.8 of the book (page 216), where I want to create a field based on the thread number or MPI rank.

First problem: I don't know how to retrieve the thread number. All I did was set parameters["num_threads"] = 6, but I cannot be sure it is actually working and spawning threads.

Second problem: MPI won't work on my system (Mac OS X 10.9.4, FEniCS binary application, not built from sources). With parameters["mesh_partitioner"] = "SCOTCH" and solve(A, u.vector(), b, 'petsc') I expected an error at worst, but not the crash you can see below.

If I could make MPI and threads work I'd be very happy! The book isn't really helpful to me here (I only found Section 10.4.1, "Parallel computing"), so any hint would be really appreciated!

Thanks!

$ mpirun -np 4 python linelastic.py
Number of global vertices: 1028
Number of global cells: 1929
*** -------------------------------------------------------------------------
*** Warning: Parameters *_domains of assemble has been deprecated in DOLFIN version 1.4.0.
*** It will be removed from version 1.6.0.
*** Parameter *_domains of assemble will be removed. Include this in the ufl form instead.
*** -------------------------------------------------------------------------

I am rank *** -------------------------------------------------------------------------
*** Warning: MPI::process_number has been deprecated in DOLFIN version 1.4.
*** It will be removed from version 1.5.
*** MPI::process_number() has been replaced by MPI::rank(MPI_Comm).
I am rank *** -------------------------------------------------------------------------
*** Warning: MPI::process_number has been deprecated in DOLFIN version 1.4.
*** It will be removed from version 1.5.
*** MPI::process_number() has been replaced by MPI::rank(MPI_Comm).
I am rank *** -------------------------------------------------------------------------
I am rank *** -------------------------------------------------------------------------
*** Warning: MPI::process_number has been deprecated in DOLFIN version 1.4.
*** It will be removed from version 1.5.
*** MPI::process_number() has been replaced by MPI::rank(MPI_Comm).
*** -------------------------------------------------------------------------
*** -------------------------------------------------------------------------

 0

 2
*** -------------------------------------------------------------------------

 3
*** Warning: MPI::process_number has been deprecated in DOLFIN version 1.4.
*** It will be removed from version 1.5.
*** MPI::process_number() has been replaced by MPI::rank(MPI_Comm).
*** -------------------------------------------------------------------------

 1
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: No support for this operation for this object type!
[0]PETSC ERROR: Matrix format mpiaij does not have a built-in PETSc LU!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 3, Wed Aug 29 11:26:24 CDT 2012 
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Unknown Name on a darwin13. named Senseis-MacBook-Pro.local by sensei Fri Jul 25 17:01:28 2014
[0]PETSC ERROR: Libraries linked from /Users/johannr/fenics-1.4.0/local/lib
[0]PETSC ERROR: Configure run at Tue Jun  3 13:32:33 2014
[0]PETSC ERROR: Configure options --prefix=/Users/johannr/fenics-1.4.0/local COPTFLAGS=-O2 --with-debugging=0 --with-clanguage=cxx --with-c-support=1 --with-blas-lapack-dir=/usr --with-umfpack=1 --with-umfpack-include=/Users/johannr/fenics-1.4.0/local/include/suitesparse --with-umfpack-lib="[/Users/johannr/fenics-1.4.0/local/lib/libumfpack.a,/Users/johannr/fenics-1.4.0/local/lib/libamd.a]" --with-spooles=1 --with-spooles-include=/Users/johannr/fenics-1.4.0/local/include --with-spooles-lib=/Users/johannr/fenics-1.4.0/local/lib/libspooles.a --with-ptscotch=1 --with-ptscotch-dir=/Users/johannr/fenics-1.4.0/local --with-ml=1 --with-ml-include=/Users/johannr/fenics-1.4.0/local/include/trilinos --with-ml-lib=/Users/johannr/fenics-1.4.0/local/lib/libml.dylib --with-hdf5=1 --with-hdf5-dir=/Users/johannr/fenics-1.4.0/local --with-x=0 -with-x11=0 --with-fortran=0 --with-shared-libraries=1 PETSC_DIR=/Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc PETSC_ARCH=darwin13.2.0-cxx-opt
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: MatGetFactor() line 3876 in /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc/src/mat/interface/matrix.c
[0]PETSC ERROR: PCSetUp_LU() line 133 in /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: PCSetUp() line 832 in /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: KSPSetUp() line 278 in /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: KSPSolve() line 402 in /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/PETSc/src/ksp/ksp/interface/itfunc.c
Traceback (most recent call last):
  File "linelastic.py", line 134, in <module>
    solve(A, u.vector(), b, 'petsc')
  File "/Applications/FEniCS.app/Contents/Resources/lib/python2.7/site-packages/dolfin/fem/solving.py", line 278, in solve
    return cpp.la_solve(*args)
  File "/Applications/FEniCS.app/Contents/Resources/lib/python2.7/site-packages/dolfin/cpp/la.py", line 4482, in la_solve
    return _la.la_solve(*args)
RuntimeError: 

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics@fenicsproject.org
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to successfully call PETSc function 'KSPSolve'.
*** Reason:  PETSc error code is: 56.
*** Where:   This error was encountered inside /Users/johannr/fenics-1.4.0/fenics-superbuild/build-fenics/CMakeExternals/src/DOLFIN/dolfin/la/PETScLUSolver.cpp.
*** Process: unknown
*** 
*** DOLFIN version: 1.4.0
*** Git changeset:  3b6582dfb45139c906c13b9ad57395632a2090f4
*** -------------------------------------------------------------------------

>>>> A LONG DUMP...

--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 59263 on node Senseis-MacBook-Pro.local exited on signal 6 (Abort trap: 6).
--------------------------------------------------------------------------
asked Jul 25, 2014 by senseiwa FEniCS User (2,620 points)

1 Answer

+7 votes

Hi,

Firstly, as far as I know, threading and MPI don't mix. Someone correct me if I'm wrong, but the

parameters["num_threads"] = 6

line sets how many threads to use when assembling the matrices, not when solving the problem (via PETSc). Since the latter is generally the more expensive task, I would suggest you comment out that line and focus on getting MPI working.
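
If you want to convince yourself that threaded assembly is actually kicking in, one rough check is to time assemble() with different thread counts. This is just a sketch against the DOLFIN 1.4-era Python API, on a throwaway Poisson-style form (the mesh size and form are made up for illustration):

from dolfin import *
import time

# hypothetical test problem, just big enough for assembly time to be visible
mesh = UnitSquareMesh(500, 500)
V = FunctionSpace(mesh, 'CG', 1)
u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v))*dx

assemble(a)  # warm-up assembly so JIT compilation doesn't pollute the timings

for n in [1, 6]:
    parameters["num_threads"] = n
    t0 = time.time()
    assemble(a)
    print "num_threads = %d: assembled in %.2f s" % (n, time.time() - t0)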

Next up, the error message is saying that PETSc's built-in LU solver (which is what the 'petsc' method maps to) has no parallel implementation for distributed (mpiaij) matrices. Switch 'petsc' for an iterative method like 'gmres' and that might do the trick.
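
You can check which solver and preconditioner names your build actually accepts with the listing helpers in the DOLFIN Python API (the first one is even commented out in your own script):

from dolfin import *

list_linear_solver_methods()            # method names accepted by solve(A, x, b, method)
list_krylov_solver_preconditioners()    # preconditioner names for Krylov solvers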

Lastly, to get rid of those pesky warnings, use something like:

comm = mpi_comm_world()
mpiRank = MPI.rank(comm)

to get the local process's rank.
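
For a minimal sanity check of the MPI side (a sketch, assuming the DOLFIN 1.4 Python API), run something like this under mpirun and you should get one line per process, without the deprecation warnings:

from dolfin import mpi_comm_world, MPI

comm = mpi_comm_world()
print "I am rank %d of %d" % (MPI.rank(comm), MPI.size(comm))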

Enjoy!

answered Jul 25, 2014 by mwelland FEniCS User (8,410 points)

To that point, I think you have to choose an MPI-capable solver or it won't run in parallel anyway (the defaults don't, IIRC).

Thanks for the pointers. I actually have two projects, one with MPI and one with threads (the same code). Still, I cannot get MPI to work: the solve does not converge. However, when using the same code with threads (supposedly I'm using threads), it converges:

Traceback (most recent call last):
  File "linelastic.py", line 124, in <module>
    solve(A, u.vector(), b, 'gmres')
  File "/Applications/FEniCS.app/Contents/Resources/lib/python2.7/site-packages/dolfin/fem/solving.py", line 278, in solve
    return cpp.la_solve(*args)
  File "/Applications/FEniCS.app/Contents/Resources/lib/python2.7/site-packages/dolfin/cpp/la.py", line 4482, in la_solve
    return _la.la_solve(*args)
RuntimeError: 

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics@fenicsproject.org
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to solve linear system using PETSc Krylov solver.
*** Reason:  Solution failed to converge in 10000 iterations (PETSc reason DIVERGED_ITS, residual norm ||r|| = 7.359133e+02).
*** Where:   This error was encountered inside PETScKrylovSolver.cpp.
*** Process: unknown
*** 
*** DOLFIN version: 1.4.0
*** Git changeset:  3b6582dfb45139c906c13b9ad57395632a2090f4
*** -------------------------------------------------------------------------

So... how can I test MPI? :(

Does the single-core (multi-threaded) code also use gmres?

It looks like the linear solver is having trouble finding a solution, which usually means you need a better preconditioner. Try adding these lines to the top of your code:

PETScOptions.set('pc_type', 'asm') 
PETScOptions.set('sub_pc_type', 'lu')
PETScOptions.set('pc_asm_overlap', '10')

which sets up a domain decomposition preconditioner called additive Schwarz. As I understand it (probably simplistically), it breaks the problem into small chunks which are given to different cores. Each core then solves its own chunk with an 'lu' (direct) solver. Then you stitch everything together again and iterate until you get a global solution. It usually works pretty well for me. You can play with the overlap, and try changing the sub-solver from 'lu' to 'ilu' (incomplete LU) too.
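
Putting it together with the gmres suggestion above (just a sketch combining the pieces already in this thread; A, b and V are the objects from your script):

from dolfin import *

# ... build V and assemble A and b as in your script ...

# additive Schwarz preconditioner: LU on each subdomain, GMRES as the outer solver
PETScOptions.set('pc_type', 'asm')
PETScOptions.set('sub_pc_type', 'lu')
PETScOptions.set('pc_asm_overlap', '10')

u = Function(V)
solve(A, u.vector(), b, 'gmres')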

Hope this helps!

Unfortunately, it doesn't :(

I am attaching the complete source. It's a simple linear elastic problem, and it works with SHARED=1, using threads. By the way, how can I be sure FEniCS is actually using threads?

With MPI (mpirun -np 4 python linelastic.py) FEniCS dies horribly. As you can see, I also tried gmres, but with MPI it still dies.

from dolfin import *

# Do not output to file, just visualize
DEBUG = 1

# Components by components
COMPONENTS = 1

# Shared or distributed memory
SHARED = 0

# Constants
Young   = 30e6
Poisson = 0.25
Force   = 1.0
Width   = 7.0
Height  = 14.0
Mu      = Young / (2.0 * (1.0 + Poisson))
Lambda  = Young * Poisson / ((1.0 + Poisson) * (1.0 - 2.0 * Poisson))

# Plane Stress Elastic Coefficients
C1111 = Young/(1.0-Poisson*Poisson)
C1122 = C1111*Poisson 
C2222 = C1111
C1212 = C1111*(1.0-Poisson)*0.5

# Threads or MPI solver parameters
if SHARED:
    parameters["num_threads"] = 6
else:
    parameters["mesh_partitioner"] = "SCOTCH"

# Indices for sigma/epsilon tensors
i, j = indices(2)

# Floating point tolerance
Epsilon = 10e-7

# Create mesh and define function space (x0, y0)-(x1, y1), nx, ny
mesh = RectangleMesh(0, 0, Width, Height, 10, 20)

# 1st order Lagrange shape functions
V = VectorFunctionSpace(mesh, 'CG', 1)

# Displacement function
u = TrialFunction(V)

# Test function
v = TestFunction(V)

# Strain
def eps(v):
    if COMPONENTS:
        return as_tensor((Dx(v[i],j)+Dx(v[j],i))*0.5, [i,j])
    return sym(grad(v))

# Stress
def sigma(v):
    if COMPONENTS:
        return as_tensor([[C1111*eps(v)[0,0]+C1122*eps(v)[1,1], C1212*eps(v)[0,1]], [C1212*eps(v)[1,0], C2222*eps(v)[1,1]+C1122*eps(v)[0,0]]])
    return 2.0 * Mu * eps(v) + Lambda * tr(eps(v)) * Identity(v.geometric_dimension())

# Bulk Force
f = Constant((0, 0))

# Set up boundary conditions
u0 = Constant((0.0, 0.0))

def on_bottom(x, on_boundary):
    return on_boundary and abs(x[1]) < Epsilon

# Force to be applied to the top
force_vector = Expression(("0", "1000000"))

# Now let's move to the boundary
boundary_parts = MeshFunction("size_t", mesh, mesh.topology().dim() - 1)

# The top boundary class
class TopBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and abs(x[1] - Height) < Epsilon

# Mark the top boundary with 1
NeumannBoundary = TopBoundary()
NeumannBoundary.mark(boundary_parts, 1)

# Dirichlet condition on the bottom boundary
bcBottom = DirichletBC(V, u0, on_bottom)
bcs      = [bcBottom]

# Variational form
a = inner(sigma(u), eps(v)) * dx
L = inner(f, v) * dx + 1*inner(force_vector, v) * ds(1)

# Compute solution
A = assemble(a, exterior_facet_domains = boundary_parts)
b = assemble(L, exterior_facet_domains = boundary_parts)
for condition in bcs: condition.apply(A, b)

#list_linear_solver_methods()
if SHARED:
    print 'Thread #'  #, thread.get_ident()????
else:
    comm = mpi_comm_world()
    mpiRank = MPI.rank(comm)
    print 'MPI PROCESS RANK ', mpiRank

# Check if it works with shared memory and some options
PETScOptions.set('pc_type', 'asm') 
PETScOptions.set('sub_pc_type', 'lu')
PETScOptions.set('pc_asm_overlap', '10')

# SOLVE!
u = Function(V)
if SHARED:
    solve(A, u.vector(), b, 'lu')
else:
    solve(A, u.vector(), b, 'gmres') # PETSC crashes

print
print '=================================================================================='
print

# Use discontinuous functions
V_2D = TensorFunctionSpace(mesh, "CG", 1)
V_1D = VectorFunctionSpace(mesh, "CG", 1)
V_0D = FunctionSpace(mesh, "Discontinuous Lagrange", 0)

# Define the norm of the displacement
def u_norm(v):
    return sqrt(v[0] ** 2 + v[1] ** 2)

def u_sinn(v):
    return sin(u_norm(v))

# Stress
stress = project(sigma(u), V = V_2D)

# Norm
norm_u = project(u_norm(u), V = V_0D)

# Max
maxx_s = project(sigma(u)[1, 1], V = V_0D)

# Sin
sinn_u = project(u_sinn(u), V = V_0D)

# Eps
eps_uu = project(eps(u), V = V_2D)

# Processor or thread
if SHARED:
    cpu_id = project(Constant(1.0), V_0D)
else:
    cpu_id = project(Constant(MPI.rank(mesh.mpi_comm())), V_0D)

print '> Max norm of displacement: ', norm_u.vector().max()
print '> Max stress sigma_{1, 1} : ', maxx_s.vector().max()
print '> Strain in (w/2, h/2) 1,1: ', eps_uu(Width/2, Height/2)

# Interpolate on a finer mesh
finerMesh = RectangleMesh(0, 0, Width, Height, 100, 200)
new_u = project(u_norm(u), V = V_0D)
print '> Max norm of displacement: ', new_u.vector().max()

# Rename the displacement
u.rename("Displacement", "label")

print
print '=================================================================================='
print

if not DEBUG:
    # Save displacement
    File("linelastic_displacement.pvd", "compressed") << u

    # Save stress
    File("linelastic_stress.pvd", "compressed") << stress

    # Save displacement norm
    File("linelastic_norm.pvd", "compressed") << norm_u

    # Save strain
    File("linelastic_strain.pvd", "compressed") << eps_uu

    # Save displacement & Stress
    File("linelastic.pvd", "compressed") << u, norm_u

    if MPI.size(mesh.mpi_comm()) > 1:
        File("partitions.pvd") << CellFunction("size_t", mesh, MPI.rank(mesh.mpi_comm()))

else:
    #plot(mesh, title="Finite element mesh")
    #plot(u[1], axes = True, title = "Displacement Norm")
    plot(stress[1, 1], axes = True, title = 'Sigma_11')
    #plot(eps(u)[1, 1], mode = 'displacement', axes = True, title = 'Sin |u|')
    #plot(cpu_id)
    interactive()

I don't have experience with the discontinuous elements, so I removed the second part of your code and it ran just fine with my Ubuntu package and with a custom Linux build (up to 64 cores). Does your CG element solve work correctly? For me, GMRES and a direct LU solve (MUMPS) both worked fine in parallel.
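
If your PETSc build includes a parallel direct solver such as MUMPS (check the output of list_lu_solver_methods(); the configure line you posted doesn't mention MUMPS, so this may not apply to your install), a direct solve is a quick way to rule out preconditioner trouble:

# sketch: parallel direct solve, only if MUMPS is available in your PETSc build
solve(A, u.vector(), b, 'mumps')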

Re: DG, a recent post by G. Wells on the news feed might be relevant:

"Garth Wells @garth_wells
DG is now supported in parallel in DOLFIN. Code is in the development branch and will be part of the next release. #fenicsnews"

Maybe you will have to build the dev version to get your DG working.

Thanks mwelland, can you post the code you ran? Or at least your modifications?

(sorry I was late answering)

It really is your code up to the first "================================" line:

On our supercomputer we don't have SCOTCH, so I commented that line out in the code below. I also ran it on my local Ubuntu install (from the repo), which did run fine with SCOTCH.

Are you still getting a PETSc crash?

from dolfin import *

# Do not output to file, just visualize
DEBUG = 1

# Components by components
COMPONENTS = 1

# Shared or distributed memory
SHARED = 0

# Constants
Young   = 30e6
Poisson = 0.25
Force   = 1.0
Width   = 7.0
Height  = 14.0
Mu      = Young / (2.0 * (1.0 + Poisson))
Lambda  = Young * Poisson / ((1.0 + Poisson) * (1.0 - 2.0 * Poisson))

# Plane Stress Elastic Coefficients
C1111 = Young/(1.0-Poisson*Poisson)
C1122 = C1111*Poisson 
C2222 = C1111
C1212 = C1111*(1.0-Poisson)*0.5

# Threads or MPI solver parameters
if SHARED:
    parameters["num_threads"] = 6
else:
    pass
    #parameters["mesh_partitioner"] = "SCOTCH"

# Indices for sigma/epsilon tensors
i, j = indices(2)

# Floating point tolerance
Epsilon = 10e-7

# Create mesh and define function space (x0, y0)-(x1, y1), nx, ny
mesh = RectangleMesh(0, 0, Width, Height, 10, 20)

# 1st order Lagrange shape functions
V = VectorFunctionSpace(mesh, 'CG', 1)

# Displacement function
u = TrialFunction(V)

# Test function
v = TestFunction(V)

# Strain
def eps(v):
    if COMPONENTS:
        return as_tensor((Dx(v[i],j)+Dx(v[j],i))*0.5, [i,j])
    return sym(grad(v))

# Stress
def sigma(v):
    if COMPONENTS:
        return as_tensor([[C1111*eps(v)[0,0]+C1122*eps(v)[1,1], C1212*eps(v)[0,1]], [C1212*eps(v)[1,0], C2222*eps(v)[1,1]+C1122*eps(v)[0,0]]])
    return 2.0 * Mu * eps(v) + Lambda * tr(eps(v)) * Identity(v.geometric_dimension())

# Bulk Force
f = Constant((0, 0))

# Set up boundary conditions
u0 = Constant((0.0, 0.0))

def on_bottom(x, on_boundary):
    return on_boundary and abs(x[1]) < Epsilon

# Force to be applied to the top
force_vector = Expression(("0", "1000000"))

# Now let's move to the boundary
boundary_parts = MeshFunction("size_t", mesh, mesh.topology().dim() - 1)

# The top boundary class
class TopBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and abs(x[1] - Height) < Epsilon

# Mark the top boundary with 1
NeumannBoundary = TopBoundary()
NeumannBoundary.mark(boundary_parts, 1)

# Dirichlet condition on the bottom boundary
bcBottom = DirichletBC(V, u0, on_bottom)
bcs      = [bcBottom]

# Variational form
a = inner(sigma(u), eps(v)) * dx
L = inner(f, v) * dx + 1*inner(force_vector, v) * ds(1)

# Compute solution
A = assemble(a, exterior_facet_domains = boundary_parts)
b = assemble(L, exterior_facet_domains = boundary_parts)
for condition in bcs: condition.apply(A, b)

#list_linear_solver_methods()
if SHARED:
    print 'Thread #'  #, thread.get_ident()????
else:
    comm = mpi_comm_world()
    mpiRank = MPI.rank(comm)
    print 'MPI PROCESS RANK ', mpiRank

# Check if it works with shared memory and some options
PETScOptions.set('pc_type', 'asm') 
PETScOptions.set('sub_pc_type', 'lu')
PETScOptions.set('pc_asm_overlap', '10')

# SOLVE!
u = Function(V)
if SHARED:
    solve(A, u.vector(), b, 'lu')
else:
    solve(A, u.vector(), b, 'gmres') # PETSC crashes
...