
Solving linear system in parallel

+1 vote

Hi,

I implemented an iterative method in C++ using FEniCS for the assembly of some system/mass matrices. In each iteration I have to solve the following linear system several times:

K^(-1) x = y

which is realised by

solve(K, y, x, "gmres", "icc")

Here K is the system matrix for the Laplacian, which stays constant the whole time and is therefore assembled only once. The solver and preconditioner turn out to be very fast in 1D; in 2D I use bicgstab with hypre_amg.
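For reference, the repeated-solve pattern looks roughly like this (a minimal sketch, not my exact code; it uses the KrylovSolver class of the C++ interface so the solver is set up only once):

Matrix K;
Vector x, y;
assemble(K, a);                        // a: bilinear form of the Laplacian, assembled once

KrylovSolver solver("gmres", "icc");   // "bicgstab" + "hypre_amg" in 2D
solver.parameters["relative_tolerance"] = 1e-8;

// inside the iteration, with x given:
solver.solve(K, y, x);                 // y = K^(-1) x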

Now, running some bigger computations, I noticed that FEniCS only uses one core (at 100%), which is of course not reasonable. I looked at the parallel tutorials, but they only cover solving the PDE a single time.
If I simply start my program with mpirun, I of course run into an error.

Is there any way to solve just Kx = y in parallel, for a given matrix K and a given vector y, and then continue in serial?

The same question holds for the multiplication of matrices, which can be done perfectly in parallel. Can I just pass a parameter to the linear algebra backend, telling it to compute in parallel?

Regards

asked May 10, 2016 by Kokett FEniCS Novice (300 points)

2 Answers

0 votes

I tried setting up a direct parallel solver, with PETSc and MUMPS:

PETScLUSolver Solver("mumps");
Solver.solve(*K, y, x);

but it still uses only one core.

Maybe let me reformulate my question:
Is there any way to implement a parallel solver (I would like to have parallel bicgstab + hypre_amg) in FEniCS without using MPI?

Edit: Or maybe there is a way with MPI? Still, I have no idea how to implement this and how to communicate between the processes.
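As a starting point I tried to at least see what MPI gives me from the C++ side; a minimal sketch (just my attempt, run with "mpirun -n 4 ./demo"):

#include <dolfin.h>

int main()
{
  // Each MPI process prints its rank and the total number of processes;
  // as far as I can tell DOLFIN takes care of MPI_Init itself.
  const unsigned int rank  = dolfin::MPI::rank(MPI_COMM_WORLD);
  const unsigned int nproc = dolfin::MPI::size(MPI_COMM_WORLD);
  dolfin::info("Hello from process %u of %u", rank, nproc);
  return 0;
}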

answered May 11, 2016 by Kokett FEniCS Novice (300 points)
edited May 11, 2016 by Kokett

Hey. I have been trying to solve Navier-Stokes in parallel with bicgstab/gmres/cg + hypre_amg (since hypre_parasails does not work for some reason), but so far I have had no luck getting a solution. It seems to work fine with a simple Poisson problem, however, so I believe the issue is the mesh/preconditioner selection.
It does iterate and converge, though.

Here is a snippet of the code to configure it. Also make sure to put the following at the start of the code. (Regarding mpirun: it handles all the communication between the processes for you, so you don't have to deal with that, as far as I understand.)
parameters["ghost_mode"] = "shared_facet"

Then just run it with "mpirun -n numberofcores python yourscript.py"

# Compute solution
problem = NonlinearVariationalProblem(F, u_, bcs, J)
solver  = NonlinearVariationalSolver(problem)
#parameters['linear_algebra_backend'] = 'PETSc'

prm = solver.parameters
#prm['linear_algebra_backend']= 'PETSc'
prm['newton_solver']['absolute_tolerance'] = 1E-5
prm['newton_solver']['relative_tolerance'] = 1E-5
prm['newton_solver']['maximum_iterations'] = 10000
prm['newton_solver']['relaxation_parameter'] = 1.0
#parameters['linear_algebra_backend'] = 'PETSc'
#prm['newton_solver']['linear_solver'] = "cg"
#prm['newton_solver']['linear_solver'] = "mumps"
if True :
    prm['newton_solver']['linear_solver'] = 'bicgstab'
    #prm['newton_solver']['linear_solver'] = "cg"
    #prm['newton_solver']['linear_solver'] = 'gmres'
    prm['newton_solver']['preconditioner'] = 'hypre_amg'
    prm['newton_solver']['krylov_solver']['absolute_tolerance'] = 1E-6
    prm['newton_solver']['krylov_solver']['relative_tolerance'] = 1E-6
    prm['newton_solver']['krylov_solver']['maximum_iterations'] = 100
    prm['newton_solver']['krylov_solver']['monitor_convergence'] = True
    prm['newton_solver']['krylov_solver']['nonzero_initial_guess'] = True
    prm['newton_solver']['krylov_solver']['gmres']['restart'] = 40

Thanks for your answer, but the problem is: I don't want to apply a mesh partitioner. Maybe I should give a little more insight into my code structure (pseudocode):

int main()
{
    // do stuff in serial
    set up grid
    set up function spaces and auxiliary variables
    assemble system matrix K

    while iteration < iter_max
    {
        assemble mass matrices M1, M2

        // do stuff in parallel
        solve Ky = x
        y = M1*M2*y
        // end parallel computation

        // continue things in serial

        iteration++
    }
}

If I run it with MPI - and hence with a mesh partitioner - the multiplication produces some errors, so my aim is to solve only the system Ky = x in parallel and, once that is done, continue in serial.

I'm not sure if this is even possible with FEniCS.

Hmm. I don't have time to test right now, but I had success using MPI.rank to force only one process to perform a given task. Not sure if it will work for your application, but you can do the following:

if MPI.rank(mpi_comm_world()) == 0:
    wdata = open(resultvarName, 'ab')
    dfile = csv.writer(wdata, dialect='excel')
    dfile.writerow([flowRate, u_flow, flux, avgTemp, outTemp, voltage, current, inU, outU, (time.time() - solverStart)/60.0])
    wdata.close()

This is how I save data obtained from integration over a surface (such as an average temperature etc.). The integrations and computations are all done in parallel, but the answers are coupled, i.e. when you integrate over a surface with 8 processes, all of them get access to the same result; I think it splits the integration into 8 parts, sums them, and returns the complete answer to each process.
I am not sure what you mean by "the multiplication produces some errors", but if you use the approach above you need to make sure all processes have access to the solved matrix. Apparently you can use Bcast(data) to send the solution to all processes; I am not sure, but have a look at the following slides, slide 11, "MPI Basics": http://www.math.pitt.edu/~sussmanm/3040Summer14/fenicsI-4up.pdf

I think you will need to use Bcast and Recv to make sure that all processes have the same matrices after you do the math in serial.

Then the pseudocode would be:

set up grid
set up function spaces and auxiliary variables

if (MPI.rank(mpi_comm_world()) == 0)
{
    matrix = assemble system matrix K
    comm.Bcast(matrix)
}
matrix = comm.Recv(matrix)

while iteration < iter_max
{
    if (MPI.rank(mpi_comm_world()) == 0)
    {
        matrix = assemble mass matrices M1, M2
        comm.Bcast(matrix)
    }
    M1, M2 = comm.Recv(matrix)

    y = M1*M2*y
    // end parallel computation

    if (MPI.rank(mpi_comm_world()) == 0)
    {
        // continue things in serial
    }

    iteration++
}

This is a rather nasty implementation, so I think it might be better if you tell us what the multiplication error actually is.
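For the Bcast part, here is a minimal C++ sketch (untested, just to illustrate the idea): MPI_Bcast is collective, so every process calls the same function and there is no separate receive call.

#include <mpi.h>
#include <vector>

// All ranks call MPI_Bcast with the same arguments; afterwards every rank
// holds the data that rank 0 had. The buffer must already have the right
// size on every rank before the call.
void broadcast_from_rank0(std::vector<double>& values)
{
  MPI_Bcast(values.data(), static_cast<int>(values.size()),
            MPI_DOUBLE, 0 /* root */, MPI_COMM_WORLD);
}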

Up to now I have not been able to implement this for a simple example, but I am a beginner with MPI. I am still a little confused that MPI.h does not contain a receive command...

Here are a few more details of my code, to show the overall structure and where the error occurs:

I mainly work with two classes:

1) PDEHandler: contains the Functionspace, system and mass matrix of my bilinear form
2) NewtonSolver: mainly contains a CG method to solve a Newton system of the type
M1 K^(-1) M2 K^(-1) x = M1 K^(-1) y
where M1 is newly assembled in each iteration

For applying the CG method I have to solve K^(-1) x = y.

Here is the code (shortened). I work with pointers to easily refine the mesh, and I know I should use shared_ptr...

int main()
{
// Initialize mesh
Mesh* mesh=MakeMesh(Case,h);
// initialize PDEHandler
PDEHandler* PDESolve = new PDEHandler(*mesh, Case);
// initialize NewtonSolver
SemiSmoothNewtonSolver* NewtonSolver = new SemiSmoothNewtonSolver(PDESolve->V, /* ... */);

// define Functions
Function* z = new Function(PDESolve->V);
Function* lambda = new Function(PDESolve->V);
Function* adp = new Function(PDESolve->V);

// exact solution (if known - else = 0)
Function* p_ex = new Function(PDESolve->V);
Function* u_ex = new Function(PDESolve->V);
Function* y_ex = new Function(PDESolve->V);


// computed solution
Function* p = new Function(PDESolve->V);
Function* u = new Function(PDESolve->V);
Function* y = new Function(PDESolve->V);

// Initialize test problem
InitializeTestProblem(*p_ex,*u_ex,*y_ex,*z, PDESolve);

// iterate
for (unsigned int k = 1; k < max_iter; k++)
{

    // Solve subproblem
    NewtonSolver->Solve(*u,*lambda,*z,*adp,*p,k);


    // update lambda - this is already done in the newton method
    *lambda->vector() = *adp->vector();


    // approx. error - if exact solution is known
    ErrorNormFunctional E1(*mesh, *u_ex, *u);
    arr_error.push_back( assemble(E1) );

    // optimality
    ErrorNormFunctional E2(*mesh, temp2, *u);
    double opt = assemble(E2);

}


delete mesh;
return 0;
}

The Solve method of my Newton solver is a bit longer, so I just post the assembly of the right-hand side of the Newton equation:

void SemiSmoothNewtonSolver::ComputeRHS(Vector& rhs, const Function& lambda, const Function& z, Vector& g, const int k)
{

// compute g
temp1 = pua;
pMua->mult(temp1, g); //g = Mua * pua;

temp1 = pub;
pMub->mult(temp1, temp2); // temp2 = Mub * pub;

// g = Mua*pua + Mub*pub
g += temp2;

MultKinv(g,temp2);                      // temp2 = K^(-1)*g
temp2 -= *z.vector();                   // temp2 = K^(-1)*g - z
PDESolve->Mdir.mult(temp2,temp1);       // temp1 = M ( K^(-1)*g -z)
MultKinv(temp1,temp2);                  // temp2 = K^(-1 )M ( K^(-1)*g -z)
pMinact->mult(temp2,temp1);             // temp1 = MI K^(-1) M ( K^(-1)*g -z)
temp1 *= -1.0/alpha(k);                 // temp1 = -1/alpha * MI K^(-1) M ( K^(-1)*g -z)
pMinact->mult(*lambda.vector(), rhs);   // rhs = MI*lambda

rhs += temp1;                           // rhs = MI*lambda -1/alpha * MI K^(-1) M ( K^(-1)*g -z)

}

where the multiplication with Kinv = K^(-1) is performed with the solve command.
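For reference, MultKinv is just a thin wrapper around the solve command; a sketch (the member names are placeholders, not my exact code):

// Sketch of MultKinv: applying K^(-1) means one linear solve with the
// constant system matrix (here called PDESolve->K as a placeholder).
void SemiSmoothNewtonSolver::MultKinv(const GenericVector& g, GenericVector& result)
{
    solve(PDESolve->K, result, g, "gmres", "icc");   // result = K^(-1) g
}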

If I run this code (which runs fine in serial) with MPI, it produces a run-time error at the first multiplication, pMub->mult(temp1, temp2);, complaining:

PETSC ERROR: Nonconforming object sizes,
PETSC ERROR: Mat mat,Vec y: local dim 233 241!

I think this is due to the partitioning of the mesh and the resulting different local matrix dimensions.
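One way to check this (a sketch, assuming the local_range interface of GenericMatrix/GenericVector) is to print the local sizes on every process:

// Print the locally owned row range of the matrix and the local range of the
// vector on each process; for mult() to work they have to match.
const auto mat_rows = pMub->local_range(0);
const auto vec_rows = temp1.local_range();
info("rank %u: matrix rows [%u, %u), vector entries [%u, %u)",
     dolfin::MPI::rank(MPI_COMM_WORLD),
     (unsigned int) mat_rows.first, (unsigned int) mat_rows.second,
     (unsigned int) vec_rows.first, (unsigned int) vec_rows.second);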

Have you tried adding the following parameter at the start of your code (before you build the mesh)? I had the same issue when trying to set different materials with a DG space in parallel.

parameters["ghost_mode"] = "shared_facet";

Hi,

this produces the same error.

I inadvertently added a new answer instead of a reply. Sorry about that. Let's continue discussing there.

0 votes

I found the source of the error - though I have no solution yet.

The problem is the following: in my class NewtonSolver I initialise some helper vectors of the following form with an initialiser list:

SemiSmoothNewtonSolver(..., int len):
    help(MPI_COMM_WORLD, len)
{
    // ...
}

where len is the (global) number of DOFs. I thought this was a good idea to avoid re-initialising my helper vectors in each iteration. Here is a minimal example (taken from the Poisson demo) to reproduce the error. It works fine in serial and fails in parallel:

#include <dolfin.h>
#include "Poisson.h"   // generated form file, as in the Poisson demo
using namespace dolfin;

// Source, dUdN and DirichletBoundary are defined as in the Poisson demo

class mySolver
{
public:
    Matrix* K;

    void sol(GenericVector& u, int len)
    {
            Vector b(MPI_COMM_WORLD, len);

            b = 1;

            //each of the following lines produces some errors
            K->mult(b,u);
            solve(*K, u, b, "bicgstab", "hypre_amg");
    }
};


int main()
{
 // Create mesh and function space
 UnitSquareMesh mesh(32, 32);
 Poisson::FunctionSpace V(mesh);

// Define boundary condition
Constant u0(0.0);
DirichletBoundary boundary;
DirichletBC bc(V, u0, boundary);

// Define variational forms
Poisson::BilinearForm a(V, V);
Poisson::LinearForm L(V);

Source f;
dUdN g;
L.f = f;
L.g = g;


// Compute solution
Function u(V);

Matrix K;
Vector b;

assemble(K, a);
assemble(b,L);
bc.apply(K,b);


mySolver NewtonSolver;
NewtonSolver.K = &K;


NewtonSolver.sol(*u.vector(), b.size());
//solve(K, *u.vector(), b, "bicgstab", "hypre_amg");


// Plot solution
plot(u);
interactive();

return 0;
}

How can I create a vector of the correct size, filled with 1s, on each process, such that the multiplication and solve commands above work?

answered May 14, 2016 by Kokett FEniCS Novice (300 points)

Okay, here is a workaround (not very nice, but working):

I assemble a vector with a LinearForm and use this vector to initialise my helper vectors, so that the specific parallel structure is copied:

Vector vec_init;
assemble(vec_init, PoissonLinearForm);

[...]

SemiSmoothNewtonSolver(..., Vector& _init):
     help(_init)
{
...
}

Now (nearly) everything works perfectly in parallel. There are still some problems left, but I will figure those out. Thanks for your help!
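A cleaner variant might be to let the matrix itself set up the parallel layout via GenericMatrix::init_vector; a sketch (untested) of how the sol method from the minimal example could look without the len argument:

void sol(GenericVector& u)
{
    // Let K decide the layout of the helper vector instead of passing the
    // global length by hand.
    Vector b;
    K->init_vector(b, 1);   // layout matching the columns of K (the input of mult)
    b = 1.0;

    K->mult(b, u);
    solve(*K, u, b, "bicgstab", "hypre_amg");
}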

...