Gossip and consensus algorithms are attractive for distributed function computation, but in networks with poor edge expansion, such as grids and random geometric graphs, their message complexity scales suboptimally. We study a class of distributed averaging algorithms in which each node runs a local linear predictor based on two memory taps. Agents iteratively update their local value with a convex combination of the linear predictor and the usual weighted linear combination of the values received from neighboring nodes. We derive the optimal mixing parameter for the convex combination and show how it can easily be computed in a distributed manner. Surprisingly, using just two memory taps provably leads to significant improvements in the rate of convergence. For example, for an $n$-node two-dimensional grid topology we obtain a factor-of-$\sqrt{n}$ reduction in the number of iterations required to reach a prescribed level of accuracy.
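
To make the update rule concrete, the following minimal NumPy sketch simulates one plausible instance of this scheme on a small cycle graph: each node mixes the usual weighted neighbor combination with a two-tap linear extrapolation through its previous two values. The extrapolation formula, the fixed mixing parameter `alpha`, the cycle topology, and the uniform weight matrix are all illustrative assumptions; in particular, the optimal mixing parameter derived in the paper is not computed here.

```python
import numpy as np

def two_tap_consensus(W, x0, alpha=0.8, num_iters=200):
    """Each node mixes the standard weighted neighbor combination with a
    local two-tap linear predictor (extrapolation through its last two values)."""
    x_prev = x0.copy()   # memory tap 1: value at iteration t-1
    x_curr = x0.copy()   # memory tap 2: value at iteration t
    for _ in range(num_iters):
        x_neigh = W @ x_curr                               # weighted neighbor combination
        x_pred = 2.0 * x_curr - x_prev                     # two-tap linear extrapolation (assumed predictor form)
        x_next = alpha * x_neigh + (1.0 - alpha) * x_pred  # convex combination of the two
        x_prev, x_curr = x_curr, x_next
    return x_curr

# Illustrative setup: uniform weights on a 20-node cycle (assumed topology).
n = 20
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

x0 = np.random.rand(n)
x_final = two_tap_consensus(W, x0)
print("target average:", x0.mean())
print("max deviation :", np.abs(x_final - x0.mean()).max())
```

Because the weight matrix above is doubly stochastic and the predictor's coefficients sum to one, the convex combination preserves the network-wide average at every iteration, so the iterates converge to the true average for any mixing parameter that keeps the two-tap recursion stable.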