On 4/23/07, James Propp <propp@math.wisc.edu> wrote:
Steve Witham writes:
Wouldn't it be easier to look at the minimum frequency instead of the distribution of frequencies? I realize it's not the same question, but as long as you're hunting in the dark.
I agree: for fixed n and N, one could look at the sequence s of length n that occurs the fewest times in one's sequence S of length N, and try to make that minimum count as large as possible by choosing S cleverly.
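To make the minimum-count quantity concrete, here is a minimal Python sketch. It assumes S is given as a string of '0' and '1' characters and that occurrences are counted as overlapping windows (no wraparound); the name min_word_count is made up for illustration and does not come from the thread.

```python
from collections import Counter
from itertools import product


def min_word_count(S, n):
    """Return (word, count) for a length-n binary word occurring least often in S.

    Assumptions: S is a str over '0'/'1'; windows overlap and do not wrap around.
    Words that never occur are included with count 0.
    """
    # Count every length-n window of S.
    counts = Counter(S[i:i + n] for i in range(len(S) - n + 1))
    # Enumerate all 2^n binary words so that absent words are seen too.
    words = ("".join(bits) for bits in product("01", repeat=n))
    # min() returns the first word (in lexicographic order) attaining the minimum.
    return min(((w, counts[w]) for w in words), key=lambda pair: pair[1])


if __name__ == "__main__":
    # A de Bruijn sequence for n = 3, unrolled: every 3-bit word occurs once.
    print(min_word_count("0001011100", 3))
```

Choosing S "cleverly" in this sense would mean searching over strings of length N for one that maximises the returned count; the sketch only evaluates a given S.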
There seems to be some confusion here --- serving to emphasise the point I was trying to make earlier about considering the standard deviation of the distances between successive occurrences of a given word, rather than dragging in unnecessary sums to N. Unless the (asymptotic) density of every word of fixed length n is 2^(-n) --- i.e. the sequence is oo-distributed --- you aren't even going to consider the sequence. What will affect your discrepancy measure is _not_ the number of occurrences of the word in some interval of interest, but how irregularly they occur --- effectively, the standard deviation.
Fred Lunnon
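As a rough illustration of the gap-based measure described above, the following Python sketch computes the standard deviation of the distances between successive occurrences of a word. It assumes S is a finite string, that occurrences may overlap, and that the population standard deviation is the one intended; the name gap_stdev is hypothetical, and none of these choices is fixed by the thread.

```python
from statistics import pstdev


def gap_stdev(S, word):
    """Population standard deviation of the distances between successive
    occurrences of `word` in S, or None if there are fewer than two occurrences.

    Assumptions: S and word are str; occurrences may overlap.
    """
    # Start positions of every (possibly overlapping) occurrence.
    positions = [i for i in range(len(S) - len(word) + 1) if S.startswith(word, i)]
    # Distances between consecutive occurrences.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) if gaps else None


if __name__ == "__main__":
    # Both strings contain "1" three times, but only the first is evenly spaced.
    print(gap_stdev("1001001000", "1"))
    print(gap_stdev("1001000001", "1"))
```

The two example strings have the same number of occurrences of the word, so a pure count cannot tell them apart, while the gap spread can.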