
Assemble system in parallel and store complete sparse matrix

0 votes

Is there any nice way in FEniCS to assemble systems IN PARALLEL and store the complete (not partitioned) assembled sparse matrix? In serial it's no problem, but it becomes very slow for large systems. A workaround like in https://fenicsproject.org/qa/435/read-and-write-matrix-in-dolfin using

A = PETScMatrix()
A.binary_dump()

would be fine for me if it were possible to load the dumped matrix afterwards in serial.

asked Apr 20, 2017 by RR FEniCS User (3,330 points)

1 Answer

0 votes

You probably want to use PETSc directly. You can get the PETSc Mat from dolfin with

A.mat()

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatView.html
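
For example, dumping the complete matrix to a single binary file could look like this from petsc4py (a minimal sketch; the file name A.bin is just an illustration):

from dolfin import *
from petsc4py import PETSc

# A is the dolfin PETScMatrix that was assembled in parallel (assumption)
Amat = A.mat()                                  # underlying petsc4py Mat

# collective binary dump of the complete matrix into one file
viewer = PETSc.Viewer().createBinary('A.bin', 'w')
Amat.view(viewer)

A file written this way should be readable again with MatLoad / Mat.load, also with a different number of processes.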

answered Apr 21, 2017 by chris_richardson FEniCS Expert (31,740 points)

Thanks.

I'm already using the .mat() functionality in serial. But how can I store the distributed matrix in parallel? I don't want to store one part of the matrix per process.

I have two ideas in mind, but I'm not sure whether either of them works:

1st: initialize PETScMatrix with mpi_comm_self()

A = PETScMatrix(mpi_comm_self())
assemble(lhs, tensor=A)   # assemble the form into A

In this case, would every process store the complete matrix? Would the assembly still be done in parallel, and could I then store the matrix from e.g. process 0?
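
A minimal check of this idea could look like the following sketch (assuming A was assembled into the mpi_comm_self() matrix as above and petsc4py is available):

m = A.mat()
# every rank should report the full global size and a single trivial
# ownership block if it really holds the complete matrix
print(MPI.rank(mpi_comm_world()), m.getSize(), m.getOwnershipRanges())

if MPI.rank(mpi_comm_world()) == 0:
    rptr, cind, vals = m.getValuesCSR()   # complete CSR data on rank 0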

2nd: Is there any way to assemble into a distributed PETScMatrix and then collect the complete matrix on one process after assembly? (pseudocode):

A = distributed matrix
# after assembly, collect the complete matrix on one rank:
A_complete = gather(A, root=0)

Is it possible to do something like this?
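
One way to realize this (it is really a gather rather than a broadcast) could be to collect the local CSR blocks with mpi4py and stack them with scipy on rank 0. A sketch, assuming A is the assembled, distributed PETScMatrix:

from mpi4py import MPI as mpi
import scipy.sparse as sp

comm = mpi.COMM_WORLD
Amat = A.mat()                           # underlying petsc4py Mat

nrows, ncols = Amat.getSize()            # global size
rptr, cind, vals = Amat.getValuesCSR()   # CSR block of the locally owned rows

# gather the per-rank CSR pieces on rank 0 (rank order = row-block order)
pieces = comm.gather((rptr, cind, vals), root=0)

if comm.rank == 0:
    blocks = [sp.csr_matrix((v, c, r), shape=(len(r) - 1, ncols))
              for r, c, v in pieces]
    A_complete = sp.vstack(blocks).tocsr()   # complete matrix on rank 0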

I thought that PETSc MatView could deal with parallel matrices.

Sorry, I'm not familiar with the PETSc syntax. Maybe this little example shows what I want to do. I want to have the following code snippet running in parallel (print output shown for mpirun -n 2):

A = PETScMatrix()
b = PETScVector()
assemble_system(lhs, rhs, BC, A_tensor=A, b_tensor=b)  # assembled in parallel
Amat = A.mat()                           # underlying petsc4py Mat
r, c = Amat.getSize()                    # global size
rptr, cind, vals = Amat.getValuesCSR()   # CSR data of the locally owned rows

All MPI processes know the complete size of A:

print(r, c)   

>>> (7930, 7930)
>>> (7930, 7930)

But if I look at the OwnershipRanges, it's obvious that the matrix is distributed:

print(Amat.getOwnershipRanges())

>>> [   0 3955  7930]
>>> [   0 3955  7930]

Now I could save the matrix in serial as follows, but of course this no longer works once the matrix is distributed, since rptr, cind, vals only hold the local values on each process.

dump_csr('Unit_Cube_mat', vals, cind, rptr, r)  # my save function, needs the complete matrix on one process

I was hoping there is something easier than creating a new PETScMatrix(mpi_comm_self()) on proc 0 and copying all the values from the parallel matrix (based on the global ownership ranges) into it.
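
If the MatView / binary-viewer route from the answer works, the serial post-processing could stay almost unchanged. A sketch, assuming the parallel run dumped the matrix to 'Unit_Cube_mat.bin' (made-up file name) with a PETSc binary viewer:

from petsc4py import PETSc

# run this part with a single process
viewer = PETSc.Viewer().createBinary('Unit_Cube_mat.bin', 'r')
A_serial = PETSc.Mat().load(viewer)

r, c = A_serial.getSize()
rptr, cind, vals = A_serial.getValuesCSR()   # now the complete matrix in one piece
dump_csr('Unit_Cube_mat', vals, cind, rptr, r)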

...