This is a read-only copy of the old FEniCS QA forum. Please visit the new QA forum to ask questions

list_timings() output; build sparsity; bad parallel work balance

0 votes

The pictures show two simulations of the same program, run with different numbers of processes (256 and 1024), solving a Poisson equation on a unit square with around 80 million cells.
They show the distribution of each internal DOLFIN timer. In some cases there are big differences between the minimum and maximum time spent on a single task/timer.

I'm wondering about the behaviour of "build sparsity" and "init tensor":
in some simulations the work is well balanced, so there is nearly no difference between processes. But in other cases the distribution of the "build sparsity" timer influences the distribution of the "init tensor" timer. You can see these two behaviours in the plots linked below.
Are there any explanations for this phenomenon?

https://www.dropbox.com/sh/du2y46hgq6o578r/AACQvxUIAchx9QpKr2MMU7Z7a/list_timings_256_256_1_box.png?dl=0

https://www.dropbox.com/sh/du2y46hgq6o578r/AAB5bvKyD1uGvjSuqzh8fn3_a/list_timings_256_1024_4_box.png?dl=0

Thomas

asked Oct 14, 2014 by tombo FEniCS Novice (260 points)

1 Answer

0 votes

Try averaging over a number of runs. Building the sparsity pattern involves communication, and timings for such functions can be noisy on some systems (what system are you on?).
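A simple way to do that outside of DOLFIN's own timers is to wrap the step you care about and collect wall-time statistics over repeated runs (a stand-in sketch using only the Python standard library; `time_repeated` and the dummy workload are hypothetical names, not part of DOLFIN):

```python
import statistics
import time

def time_repeated(fn, n_runs=5):
    """Time fn over n_runs and report min/mean/max wall time.

    Repeating and averaging smooths out the communication noise
    that can make a single timing unrepresentative.
    """
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return {"min": min(samples),
            "mean": statistics.mean(samples),
            "max": max(samples)}

# Dummy workload standing in for the assembly/solve step to profile.
stats = time_repeated(lambda: sum(i * i for i in range(100000)), n_runs=3)
```

The resulting `min`/`mean`/`max` per process can then be compared across ranks, much like the columns in the `list_timings()` output.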

Make sure you're using the development version of DOLFIN - it has a number of scalability improvements over the 1.4 release.

We're interested in user experiences in parallel, so we'd be happy to hear about any results. You can post results to fenics@fenicsproject.org or on Google+ (https://plus.google.com/communities/105550716956576029273)

answered Nov 5, 2014 by Garth N. Wells FEniCS Expert (35,930 points)