Agreed.
Blind men & elephants come from designing & evaluating
at the same time.
I'm trying to figure out what major mechanisms are required
to achieve "essentially" optimal results w/o overdesigning.
E.g., I've been resisting allowing "massive" simultaneous
rearrangements, but they may be required so that
worst-case behavior doesn't dominate the average-case
behavior.
Why I'm doing this: I've been unhappy with the standard
RAM model, which assumes constant access time for every
memory access -- an assumption that is complete hogwash.
What I'm really trying to design/model here is a memory
"cache", but a bizarre type of cache in that every cache
line has a slightly different access time, with more
"remote" elements being monotonically slower than "closer"
elements. As in modern microprocessors, the fastest
cache lines may be 3 orders of magnitude faster than
the slowest ones. Furthermore, the
cost of a cache line is directly correlated with its
speed, so there are far more slower elements than faster
ones.
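
For concreteness, here is a minimal sketch of one candidate
access-cost function (the name access_cost, the power law, and
the alpha parameter are all placeholders, not commitments):

    # Index 0 is the fastest cache line; access cost grows
    # monotonically with the index.  With alpha = 0.5, position
    # 10**6 is about 1000x slower than position 0 -- the
    # "3 orders of magnitude" spread.
    def access_cost(i, alpha=0.5):
        return 1.0 + i ** alpha

    print(access_cost(0), access_cost(10**6))   # 1.0 vs 1001.0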
I haven't settled on a particular cost model yet,
because I want to play with several different ones.
As usual, we want most of the "work" to be done in
the closest/fastest cache lines, including any work
of rearranging the order of the cache lines. This
is because "small" rearrangements can be much faster
than "large" rearrangements.
These are all "exclusive" caches, meaning that a
particular item lives in exactly one cache line.
I think I'm close to deciding that some massive
rearrangements -- e.g., replacing the first N elements
of the fastest part of the cache with N sequential
remote elements -- will be a necessary built-in feature.
However, it is likely that the cost of such a massive
rearrangement will have to be correlated with the
"distance" as well as the number of items. Furthermore,
it may be necessary and/or convenient to have the
number of elements (N) involved in each transfer be
correlated with the distance. There is a tradeoff
here because a large transfer from "far" may bring
in something you need, but also a lot of other stuff
you didn't want -- thus "polluting" the fast cache.
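
One way to write that tradeoff down (every detail below is an
illustrative assumption, not a decision, again using
access_cost() from above): charge a block transfer a startup
cost that grows with the distance plus a per-element cost, and
tie the block size to the distance so the startup cost is
amortized -- at the price of dragging in unwanted items:

    # Swap the first n fast elements with n sequential elements
    # starting at position d (exclusive cache: the displaced fast
    # elements move out to where the remote block came from).
    def block_transfer_cost(d, n):
        return access_cost(d) + n   # distance startup + per-item work

    # Tie the transfer size to the distance: fetch more when
    # reaching farther, e.g. a transfer from d = 10**6 moves
    # about 1000 elements.
    def block_size_for_distance(d, beta=0.5):
        return max(1, int(d ** beta))

    d = 10**6
    n = block_size_for_distance(d)
    print(n, block_transfer_cost(d, n))   # 1000 2001.0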
This is reminiscent of 1950's/1960's style programming
where large blocks of data had to be brought in/out by
"channel operations"; some of these channels were even
smart enough to transfer data *within* main memory w/o
even touching an I/O device.
At 12:44 PM 4/20/2018, Tomas Rokicki wrote:
>I think we need some sort of explicit cost model here. Otherwise we are
>each blindly touching a different part of the elephant.