--- Michael Kleber <kleber@brandeis.edu> wrote:
Eugene Salamin wrote:
Common sense predicts that when each object is observed multiple times, the posterior will sharply select a unique N. On the other hand, a distribution with a long tail of objects appearing just once will provide little information at large N.
I was a bit confused here about that second sentence. If N is very large, then all n selections will be different. So it is only in this case that the likelihood is a flat function of N for large N. Even a single duplication must make the likelihood fall off with increasing N, but as my example showed, the falloff need not be sufficient to ensure that the posterior is a proper probability (i.e. the sum over N could diverge).
Unfortunately, I expect that my real-world application will turn out to be the latter case; that my number of samples will be small enough that I'll be lucky to ever see anything three times. (In which case, I suppose, knowing the total number of distinct elements seen is all the information there is!)
Let N be the unknown actual number of objects. Let n be the number of draws, with replacement. Let d be the number of distinct objects found. Let P(d|N,n) be the probability of getting d when N and n are known (so that the sum of P(d|N,n) over all d equals 1). Then P(d|N,n) equals n! times binomial(N,d) times the coefficient of t^n in [exp(t/N)-1]^d. (The exponent on t is n, the number of draws: n! times that coefficient is the number of ways n draws can cover d given objects, divided by N^n.) By Bayes' theorem, this equals the likelihood of N given the observed n and d.
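For anyone who wants to check the formula numerically, here is a minimal Python sketch. It assumes the equivalent closed form P(d|N,n) = binomial(N,d) * S / N^n, where S is the number of ways n draws can hit every one of d given objects, counted by inclusion-exclusion. The function names are my own, purely for illustration.

```python
from fractions import Fraction
from math import comb


def surjections(n, d):
    """Number of ways n labelled draws can cover all of d given objects.

    Inclusion-exclusion; equals n! times the coefficient of t^n in (exp(t)-1)^d.
    """
    return sum((-1) ** k * comb(d, k) * (d - k) ** n for k in range(d + 1))


def prob_distinct(d, N, n):
    """P(d|N,n): probability that n draws with replacement from N
    equally likely objects show exactly d distinct objects."""
    if d > N or d > n:
        return Fraction(0)
    return Fraction(comb(N, d) * surjections(n, d), N ** n)


if __name__ == "__main__":
    n = 3
    for N in (10, 100, 1000):
        # Likelihood of N with no duplicates (d = n) and with one duplicate (d = n - 1).
        print(N, float(prob_distinct(n, N, n)), float(prob_distinct(n - 1, N, n)))
```

With n = 3, the d = 3 column tends to 1 as N grows (the flat case), while the d = 2 column works out to 3(N-1)/N^2, which falls off only like 1/N. Under a flat prior on N that sum diverges, so this is one instance where a single duplication is not enough to give a proper posterior.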
On Thomas's suggestion (off-list), I had already started trying to work out the Bayesian approach, but to say that my skills here are rusty would be misrepresenting the presence of some mettle (heh) in the first place. Maybe tomorrow I'll try to work out the case where each thing is seen at most three times.
Start with Jaynes. Publisher's web page (http://us.cambridge.org/titles/catalogue.asp?isbn=0521592712). It's a fun book to read: the math is not too difficult, and it's full of history and amusing paradoxes. It's also 758 pages. Also see the official E. T. Jaynes web site at (http://bayes.wustl.edu/).
An observation (n1, n2, ...) that is atypical of any N may lead one, in a real world situation, to entertain additional hypotheses, for example, that we have been disinformed.
Yes, that's quite likely for me too. (In particular, my ability to recognize when two draws are actually the same or different is almost certainly imperfect.)
Problems like this one, Bayes' theorem, scientific inference, probability and common sense, probability as an extension of deductive logic: these are the subject matter of Edwin T. Jaynes' book "Probability Theory: The Logic of Science".
On that note, let me announce my change in employment. Having been a math professor at MIT and Brandeis for six years, I've just started doing something different. As of last month, I'm working at the Broad (rhymes with "road") Institute, an umbrella organization for research on genomics in medicine, affiliated with MIT, Harvard, and the Whitehead Institute (which the whole structure was a part of until last November).
This is Eric Lander's institute, the birthplace of the Human Genome Project and probably the world's leading center for genome sequencing. You may have seen the news story two or three weeks ago that the dog genome had just been released; that was us.
I'm in the Whole Genome Assembly group. We do "shotgun assembly": the people in the laboratory take the DNA from an organism and chop it up into lots of little pieces, and read the sequences of letters near each end of each little piece; we take all the pieces and put them back together.
Immediate change: the old mathematician's dilemma of how to deal with the question "But what real use is your work?" is entirely gone. "Oh, we're going to cure cancer" is sort of an ace in the hole...
--Michael Kleber kleber@broad.mit.edu
Since you will be doing real world scientific inference, you certainly will need to have a working knowledge of Bayesian methodology. So again, I recommend Jaynes.
Gene