I managed to find the answer by mimicking the non-liner Poisson demo. Adding it here in case someone else has a similar problem.
The following lines work fine ( of course I have ommited, the mesh, bcs, etc here)
(w,q) = TestFunctions(W)
(v, p) = TrialFunctions(W)
def mu(p):
return 1 + p*p
F = (inner(sym(grad(w)), 2*mu(p,D)*sym(grad(v))) - div(w)*p - q*div(v))*dx - inner(w, rho*b)*dx
u_p = Function(W) # the most recently computed solution
R = action(F, u_p)
DR = derivative(R, u_p) # Gateaux derivative
problem = NonlinearVariationalProblem(R, u_p, bcs, DR)
solver = NonlinearVariationalSolver(problem)