I am interested in solving the Stokes equations with flux constraints on the boundary of, say, a square domain, e.g. $\int_e v[0]\,ds = C$ for some edge $e$. For this I use a mixed formulation with velocity, pressure and a Lagrange multiplier; the latter is a constant function on the whole domain: FunctionSpace(mesh, 'R', 0).
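For concreteness, the multiplier enters the variational form roughly as in the sketch below (this is not my actual code: boundary_markers, the marker id 1 for the edge $e$ and the flux value C are placeholders, and the Dirichlet BCs on the rest of the boundary are omitted).
from dolfin import *
mesh = UnitSquareMesh(32, 32)
# mark the edge e (here: the left side of the square) with id 1
boundary_markers = FacetFunction('size_t', mesh, 0)
AutoSubDomain(lambda x, on_boundary: on_boundary and near(x[0], 0.0)).mark(boundary_markers, 1)
ds = Measure('ds')[boundary_markers]
V = VectorFunctionSpace(mesh, 'P', 2)   # velocity
Q = FunctionSpace(mesh, 'P', 1)         # pressure
R = FunctionSpace(mesh, 'R', 0)         # global Lagrange multiplier
W = MixedFunctionSpace([V, Q, R])
u, p, lam = TrialFunctions(W)
v, q, mu = TestFunctions(W)
C = 1.0  # prescribed flux value (placeholder)
a = (inner(grad(u), grad(v)) - div(v)*p - div(u)*q)*dx \
    + lam*v[0]*ds(1) + mu*u[0]*ds(1)
rhs = Constant(C)*mu*ds(1)  # note: this weights the constraint by the edge length |e|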
Doing so, I have noticed that this severely impacts the performance of assembly (time) and of solving the system (time + memory), compared with 'strong' Dirichlet BCs. Looking for a base case, I came up with the following problem: solve the Laplace equation on a square with homogeneous Neumann BCs and a zero-mean constraint on the solution. The snippet below looks at the assembly of the $\int \nabla u \cdot \nabla v$ term only.
# coding: utf-8
from dolfin import *
from time import clock
# track assembly progress, as suggested by debmukh
set_log_level(PROGRESS)
N = 600
side_length = 10
mesh = UnitSquareMesh(N, N)
mesh.coordinates()[:] = mesh.coordinates()*side_length
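# scalar P1 space for the solution and a global Real space for the multiplier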
V = FunctionSpace(mesh, 'P', 1)
L = FunctionSpace(mesh, 'R', 0)
u = TrialFunction(V)
v = TestFunction(V)
tic = clock()
assemble(inner(grad(u), grad(v))*dx)
elapsed_simple = clock() - tic
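# repeat the assembly with trial/test functions taken from the mixed space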
W = MixedFunctionSpace([V, L])
u, p = split(TrialFunction(W))
v, q = split(TestFunction(W))
tic = clock()
assemble(inner(grad(u), grad(v))*dx)
elapsed_mixed = clock() - tic
slow = elapsed_mixed/elapsed_simple
dofs = V.dofmap().global_dimension()
print 'Simple assembly time: {0}s'.format(elapsed_simple)
print ' Mixed assembly time: {0}s (x{1})'.format(elapsed_mixed, int(slow))
print ' Slow down / #DOFs² = {0}'.format(slow/dofs**2)
To assemble the Dirichlet form in the snippet above, in which the Lagrange multiplier is not involved in any way apart from the function space, the time required with the mixed function space is ~300 times larger than with the simple function space (250 s, up from 0.7 s). It therefore seems that assembling the stiffness matrix by hand, i.e. assembling the blocks $\int_\Omega \nabla u \cdot \nabla v\; dx$ and $\int_e u\, v\; ds$ individually and gluing them together afterwards, would be much faster.
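For the zero-mean base case, this "gluing by hand" would look roughly like the sketch below, where the coupling block reduces to the vector $c_i = \int_\Omega \phi_i\, dx$ (assuming the PETSc backend with petsc4py and scipy available; the block names are mine):
from dolfin import *
from scipy.sparse import csr_matrix, bmat
mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, 'P', 1)
u, v = TrialFunction(V), TestFunction(V)
A = assemble(inner(grad(u), grad(v))*dx)   # V x V stiffness block
c = assemble(v*dx).array()                 # coupling with the single Real dof: c_i = \int phi_i dx
# convert the dolfin matrix to scipy CSR (PETSc backend with petsc4py assumed)
indptr, indices, data = as_backend_type(A).mat().getValuesCSR()
A_sp = csr_matrix((data, indices, indptr))
# glue the saddle-point system [[A, c], [c^T, 0]]
K = bmat([[A_sp, csr_matrix(c.reshape(-1, 1))],
          [csr_matrix(c.reshape(1, -1)), None]]).tocsr()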
Is this expected, or am I doing something silly here? Shouldn't it be possible to optimize assembly in some way so it is both speedy and automatic?
Edits:
I should mention that this was tested using FEniCS 1.4. I'm not able to try version 1.5 since I'm using an academic, shared system at the moment. Could someone with access to 1.5 try to see if this has changed?
Could this be related to FFC issue #61?