Hi all,
I plan on running some of my code in parallel and want to do a strong scaling study. Our research group's cluster has 80 nodes, each with 20 Intel Xeon E5-2680v2 2.8 GHz cores, and uses the SLURM job scheduler. I modified the demo_poisson.py code to solve what I believe is a decently large problem: 1,002,001 vertices and 2,000,000 cells. Here's the code:
from dolfin import *

# 1000x1000 unit square mesh: 1,002,001 vertices, 2,000,000 cells
mesh = UnitSquareMesh(1000, 1000)
V = FunctionSpace(mesh, "Lagrange", 1)

# Dirichlet boundary: the left and right edges of the unit square
def boundary(x):
    return x[0] < DOLFIN_EPS or x[0] > 1.0 - DOLFIN_EPS

u0 = Constant(0.0)
bc = DirichletBC(V, u0, boundary)

# Variational problem
u = TrialFunction(V)
v = TestFunction(V)
f = Expression("10*exp(-(pow(x[0] - 0.5, 2) + pow(x[1] - 0.5, 2)) / 0.02)")
g = Expression("sin(5*x[0])")
a = inner(grad(u), grad(v))*dx
L = f*v*dx + g*v*ds

u = Function(V)
set_log_level(DEBUG)
solve(a == L, u, bc)
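For the scaling numbers themselves, rather than reading the elapsed times out of the DEBUG log, I was thinking of timing the assembly+solve call directly, with a barrier and a max-reduce over ranks. A rough sketch of what I mean (untested; it assumes mpi4py is installed against the same MPI that dolfin uses, and the timed helper is just something I made up for illustration):

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD

def timed(label, func):
    # Start all ranks together, then report the slowest rank's wall time
    comm.Barrier()
    t0 = time.time()
    result = func()
    elapsed = comm.allreduce(time.time() - t0, op=MPI.MAX)
    if comm.rank == 0:
        print("%s: %.3f s on %d ranks" % (label, elapsed, comm.size))
    return result

# e.g. wrap the existing call in demo_poisson.py:
# timed("assemble+solve", lambda: solve(a == L, u, bc))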
And here's the shell script I used to submit the job (not sure if this is necessary information, but here it is anyway):
#!/bin/bash
#SBATCH -J test
#SBATCH -o test.txt
#SBATCH -n 4
#SBATCH -N 1
echo "============="
echo "1 MPI process"
echo "============="
mpirun -np 1 python demo_poisson.py
echo "============="
echo "2 MPI process"
echo "============="
mpirun -np 2 python demo_poisson.py
echo "============="
echo "4 MPI process"
echo "============="
mpirun -np 4 python demo_poisson.py
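Once the single-node runs make sense, I was planning to extend the same script to more cores and a second node, roughly along these lines (just a sketch I haven't run yet; the job/output names are placeholders and the directives may need adjusting for our cluster):

#!/bin/bash
#SBATCH -J test_scaling
#SBATCH -o test_scaling.txt
#SBATCH -N 2                    # two whole nodes
#SBATCH --ntasks-per-node=20    # 20 cores per node on our cluster

for np in 1 2 4 8 16 20 40; do
    echo "============="
    echo "$np MPI processes"
    echo "============="
    mpirun -np $np python demo_poisson.py
done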
And here's the output:
=============
1 MPI process
=============
Solving linear variational problem.
Matrix of size 1002001 x 1002001 has 7006001 (0.000697805%) nonzero entries.
Elapsed time: 0.34781 (Build sparsity)
Elapsed time: 0.0301769 (Init tensor)
Elapsed time: 9.53674e-07 (Delete sparsity)
---
Elapsed time: 1.22476 (Assemble cells)
Elapsed time: 0.046319 (Apply (PETScMatrix))
Elapsed time: 0.00504494 (Build sparsity)
Elapsed time: 8.4877e-05 (Apply (PETScVector))
Elapsed time: 0.00675416 (Init tensor)
Elapsed time: 9.53674e-07 (Delete sparsity)
Elapsed time: 3.40939e-05 (Apply (PETScVector))
---
Elapsed time: 1.06944 (Assemble cells)
Elapsed time: 0.072017 (Assemble exterior facets)
Elapsed time: 0.000118971 (Apply (PETScVector))
Computing sub domain markers for sub domain 0.
---
Elapsed time: 1.45284 (DirichletBC init facets)
Elapsed time: 1.45608 (DirichletBC compute bc)
Applying boundary conditions to linear system.
Elapsed time: 9.98974e-05 (Apply (PETScVector))
Elapsed time: 0.00995994 (Apply (PETScMatrix))
Elapsed time: 1.47875 (DirichletBC apply)
Solving linear system of size 1002001 x 1002001 (PETSc LU solver, (null)).
Elapsed time: 12.6462 (PETSc LU solver)
Elapsed time: 12.6465 (LU solver)
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-mat_superlu_dist_colperm value: MMD_AT_PLUS_A
=============
2 MPI process
=============
...
Matrix of size 1002001 x 1002001 has 3513181 (0.000349916%) nonzero entries.
Diagonal: 3506776 (99.8177%), off-diagonal: 1593 (0.0453435%), non-local: 4812 (0.13697%)
Matrix of size 1002001 x 1002001 has 3500295 (0.000348633%) nonzero entries.
Diagonal: 3493879 (99.8167%), off-diagonal: 1595 (0.0455676%), non-local: 4821 (0.137731%)
Elapsed time: 0.221062 (Build sparsity)
Elapsed time: 0.215639 (Build sparsity)
Elapsed time: 0.024085 (Init tensor)
Elapsed time: 9.53674e-07 (Delete sparsity)
Elapsed time: 0.024086 (Init tensor)
Elapsed time: 1.90735e-06 (Delete sparsity)
Elapsed time: 0.539708 (Assemble cells)
---
Elapsed time: 0.722365 (Assemble cells)
Elapsed time: 0.259871 (Apply (PETScMatrix))
Elapsed time: 0.0276511 (Apply (PETScMatrix))
Elapsed time: 0.00205278 (Build sparsity)
Elapsed time: 0.00235796 (Build sparsity)
Elapsed time: 0.000106812 (Apply (PETScVector))
Elapsed time: 0.00275898 (Init tensor)
Elapsed time: 0 (Delete sparsity)
Elapsed time: 5.60284e-05 (Apply (PETScVector))
Elapsed time: 0.00246716 (Init tensor)
Elapsed time: 0 (Delete sparsity)
Elapsed time: 6.00815e-05 (Apply (PETScVector))
Elapsed time: 3.98159e-05 (Apply (PETScVector))
Elapsed time: 0.339929 (Assemble cells)
Elapsed time: 0.0274351 (Assemble exterior facets)
Elapsed time: 0.513431 (Assemble cells)
Elapsed time: 0.0339222 (Assemble exterior facets)
Elapsed time: 0.18015 (Apply (PETScVector))
Elapsed time: 0.000181913 (Apply (PETScVector))
Computing sub domain markers for sub domain 0.
Computing sub domain markers for sub domain 0.
---
Elapsed time: 0.660405 (DirichletBC init facets)
Elapsed time: 0.661449 (DirichletBC compute bc)
Applying boundary conditions to linear system.
---
Elapsed time: 0.815914 (DirichletBC init facets)
Elapsed time: 0.816927 (DirichletBC compute bc)
Applying boundary conditions to linear system.
Elapsed time: 0.155571 (Apply (PETScVector))
Elapsed time: 9.799e-05 (Apply (PETScVector))
Elapsed time: 0.00872207 (Apply (PETScMatrix))
Elapsed time: 0.834597 (DirichletBC apply)
Elapsed time: 0.00855994 (Apply (PETScMatrix))
Elapsed time: 0.836372 (DirichletBC apply)
Solving linear system of size 1002001 x 1002001 (PETSc LU solver, (null)).
Elapsed time: 9.50418 (PETSc LU solver)
Elapsed time: 9.50418 (PETSc LU solver)
Elapsed time: 9.5045 (LU solver)
Elapsed time: 9.50451 (LU solver)
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-mat_superlu_dist_colperm value: MMD_AT_PLUS_A
=============
4 MPI process
=============
...
Matrix of size 1002001 x 1002001 has 1743023 (0.000173607%) nonzero entries.
...
Applying boundary conditions to linear system.
Elapsed time: 9.48906e-05 (Apply (PETScVector))
Elapsed time: 0.0143411 (Apply (PETScVector))
Elapsed time: 0.0473421 (Apply (PETScVector))
Elapsed time: 0.012908 (Apply (PETScVector))
Elapsed time: 0.00383282 (Apply (PETScMatrix))
Elapsed time: 0.381625 (DirichletBC apply)
Elapsed time: 0.003865 (Apply (PETScMatrix))
Elapsed time: 0.38162 (DirichletBC apply)
Elapsed time: 0.00385189 (Apply (PETScMatrix))
Elapsed time: 0.00387883 (Apply (PETScMatrix))
Elapsed time: 0.381851 (DirichletBC apply)
Elapsed time: 0.381871 (DirichletBC apply)
Solving linear system of size 1002001 x 1002001 (PETSc LU solver, (null)).
Elapsed time: 6.83441 (PETSc LU solver)
Elapsed time: 6.83475 (LU solver)
Elapsed time: 6.83441 (PETSc LU solver)
Elapsed time: 6.83475 (LU solver)
Elapsed time: 6.83441 (PETSc LU solver)
Elapsed time: 6.83475 (LU solver)
Elapsed time: 6.83442 (PETSc LU solver)
Elapsed time: 6.83475 (LU solver)
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-mat_superlu_dist_colperm value: MMD_AT_PLUS_A
Based on these solve times, my scaling looks quite poor: speedups of about 1.33 on 2 processes and 1.85 on 4 processes (parallel efficiencies of roughly 67% and 46%) don't seem very good, especially for a problem with about a million degrees of freedom. The numbers don't get any better when I increase the size of the problem. Am I missing something here, or is this expected?
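For reference, those speedup and efficiency figures come from a quick check on the "LU solver" wall times reported above, along these lines:

# Speedup and parallel efficiency from the "LU solver" wall times above
times = {1: 12.65, 2: 9.50, 4: 6.83}   # seconds, rounded from the logs
for p in sorted(times):
    speedup = times[1] / times[p]
    efficiency = speedup / p
    print("p=%d  speedup=%.2f  efficiency=%.0f%%" % (p, speedup, 100 * efficiency))
# p=1  speedup=1.00  efficiency=100%
# p=2  speedup=1.33  efficiency=67%
# p=4  speedup=1.85  efficiency=46%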
Thanks,
Justin
PS: I had to clip a lot of the output (denoted by "...") because of the character limit, so let me know if more information is needed.