Those interested in this topic will probably enjoy http://research.microsoft.com/pubs/209622/patsort-sigmod14.pdf "Patience is a Virtue: Revisiting Merge and Sort on Modern Processors" Abstract: The vast quantities of log-based data appearing in data centers has generated an interest in sorting almost-sorted datasets. We revisit the problem of sorting and merging data in main memory, and show that a long-forgotten technique called Patience Sort can, with some key modifications, be made competitive with today’s best comparison-based sorting techniques for both random and almost sorted data. Patience sort consists of two phases: the creation of sorted runs, and the merging of these runs. Through a combination of algorithmic and architectural innovations, we dramatically improve Patience sort for both random and almost-ordered data. Of particular interest is a new technique called ping-pong merge for merging sorted runs in main memory. Together, these innovations produce an extremely fast sorting technique that we call P3 Sort (for Ping-Pong Patience+ Sort), which is competitive with or better than the popular implementations of the fastest comparison-based sort techniques of today. For example, our implementation of P3 sort is around 20% faster than GNU Quicksort on random data, and 20% to 4x faster than Timsort for almost sorted data. Finally, we investigate replacement selection sort in the context of single-pass sorting of logs with bounded disorder, and leverage P3 sort to improve replacement selection. Experiments show that our proposal, P3 replacement selection, significantly improves performance, with speedups of 3x to 20x over classical replacement selection. -Thomas C On Wed, May 21, 2014 at 1:07 PM, Henry Baker <hbaker1@pipeline.com> wrote:
In computer architectures, branch prediction "works" only to the extent that the probability of one arm is substantially smaller than the other arm. If both arms are equally likely, then many/(most?) branch prediction schemes will actually do worse, since the overheads for the branch prediction scheme aren't paid for on either arm.
The best sorting algorithms arrange for the comparisons to be "most informative" -- i.e., their branch probabilities are very close to 1/2 -- so traditional "branch prediction" isn't going to be very helpful.
Now to the extent that you can defer branches and deal with the consequences later may allow you to overlap some activity so that previously sequential operations are now in parallel.
However, you should be aware that for most modern CPU's, things like "instruction fetch" are more-or-less free, since instruction fetch units typically fetch instructions from both branch paths.
At 04:58 PM 5/20/2014, Warren D Smith wrote:
Suppose you are doing quicksort. Zoing, zoing, zoing, you do compares, and suddenly you decide "aha! this last comparison tells me I need to swap item[K]!" And then you branch off someplace else in your program to do it. Big lose. Mispredicted branch.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun