On Saturday 26 January 2008, I wrote:
On Friday 25 January 2008, Bernie Cosell wrote:
I've been fiddling with the recursion for the expected number of probes [without replacement] to find any of M specific values among N objects. ... But for general M it seems a lot harder. The recursion I have is:
Em(N) = m/N + (N-m)/N * (1 + Em(N-1))
... = m/N + (N-m)/N + (N-m)/N . Em(N-1) = 1 + (N-m)/N . Em(N-1)
It's pretty straightforward to unwind this in the case m=2 (you get a sum of products where most of the terms cancel, and then the sum is very easy), which leads to the conjecture
Em(N) = (N+1)/(m+1)
which, once made, is easy to verify by induction: base case N=m is trivial, and the induction step is simple algebra.
So there ought to be a simple reason. What is it? The expected number of trials is Pr(at least one trial) + Pr(at least two trials) + ..., and Pr(at least k trials) is (N-m choose k) / (N choose k) = (N-m)!(N-k)!/(N-m-k)!N!, which leads to the same telescoping sum as you get from unwinding the recurrence, but it doesn't seem like there's an insightful one-liner down this road. Is there some correspondence between (N,m) without replacement and (N+1,m+1) with replacement? Even if there is, it's probably not very clear and obvious; after all, the with-replacement case can go on unboundedly long, and the without-replacement case can't. But surely this must be obvious when you look at it right. (Knuth's TAOCP puts the thing we're summing into the more helpful form (N-k choose m) / (N choose m), whereupon the sum over k becomes obvious. That's nice, but it still doesn't feel like it's providing much insight into why the answer should be (N+1)/(m+1).) -- g