Fork_join.for_
Passing a single index means calling a closure at each step, which means a local reference will have to be heap allocated (and worse, that floats will be boxed). Instead we give the function a pair of low,high bounds for a local for loop.
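
A minimal sketch of the difference, using only the standard library. `for_chunked` is a hypothetical, sequential stand-in that only mimics the low,high calling convention; it is not the library's implementation, and `sum_squares` and the chunk size of 4 are made up for illustration.

```ocaml
(* Hypothetical sequential stand-in for a chunked for: it calls
   [f low high] once per chunk instead of [f i] once per index.
   A real parallel version would hand the chunks to different workers. *)
let for_chunked ~chunk_size n (f : int -> int -> unit) : unit =
  let i = ref 0 in
  while !i < n do
    let low = !i in
    let high = min (n - 1) (low + chunk_size - 1) in
    f low high;
    i := high + 1
  done

(* Sum of the squares of 0 .. n-1, accumulated as a float. *)
let sum_squares n =
  let total = ref 0. in
  for_chunked ~chunk_size:4 n (fun low high ->
    (* [acc] is used only inside this plain [for] loop and never escapes,
       so ocamlopt can usually keep it as an unboxed local variable. *)
    let acc = ref 0. in
    for j = low to high do
      acc := !acc +. float_of_int (j * j)
    done;
    (* One update of shared state per chunk rather than per index.
       This is safe here only because the stand-in is sequential; a
       parallel version would need an atomic or a lock for [total]. *)
    total := !total +. !acc);
  !total
```

With a per-index callback, the accumulator would have to be captured by the closure and updated on every call, which is the allocation and boxing cost described above.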
Fork_join.{for_,map_reduce_commutative}