1) I've removed the costing changes. As Tom mentions elsewhere in this thread, that's probably not needed for v1 and it's true those estimates are probably somewhat sketchy anyway.
but I noticed the comment immediately above that says
* Notify child node about limit. Note: think not to "optimize" by * skipping ExecSetTupleBound if compute_tuples_needed returns < 0. We * must update the child node anyway, in case this is a rescan and the * previous time we got a different result.
which applies to WITH_TIES too, no? So I've modified compute_tuples_needed to return -1, and reverted this change. Not sure, though.
I agree that passing -1 to ExecSetTupleBound is more appropriate implementation
3) I'm a bit confused by the initialization added to ExecInitLimit. It first gets the tuple descriptor from the limitstate (it should not do so directly but use ExecGetResultType). But when it creates the extra slot, it uses ops extracted from the outer plan. That's strange, I guess ...
And then it extracts the descriptor from the outer plan and uses it when calling execTuplesMatchPrepare. But AFAIK it's going to be compared to the last_slot, which is using a descriptor from the limitstate.
IMHO all of this should use descriptor/ops from the outer plan, no? It probably does not change anything because limit does not project, but it seems confusing.