[math-fun] intel and bad instruction sets history/philosophy
When reading Knuth a long time ago, I never cared much for his MIX code; I just skipped over it. Why should I waste my time trying to understand some assembly code for a non-existent computer?
--I was kind of annoyed by MIX too. One of the reasons the C language was originally successful was that it tried to make a high-level language which also enjoyed the benefits of, and unashamedly tried to connect to, assembler. (As opposed to pretending assembler did not exist.) So Knuth would have been better off using C than MIX in many ways. However, C fell short of its design goal, plus now with C++ has become a horrible mess. But anyway, apparently with MMIX, Knuth is trying to be a bit of a proselytizer for what instruction sets ought to have, rather than just trying to approximate what they are. Also a bit of a bizarre move for a textbook, but I hope it'll influence people a bit to overcome the energy barrier.
I used to like elegant instruction sets: the PDP-10 and the Motorola 68000. National Semiconductor came out with a yet better instruction set in their 32000 microprocessor. We seriously thought about using the 32000 in our medical lasers at Coherent, but when we talked to the National Semiconductor rep, they had totally inferior in-circuit emulators and such development hardware. We went with the 68000.
As hardware technology advanced, we had prefetching, simultaneous instruction execution, pipelining, and so on. The nail in the coffin for me (or should it be the stake through the heart?) was when I was told by someone I trusted that modern compilers can produce faster executing code than all but the most expert hand coding of assembly. At that point, I changed over completely, and now I just couldn't care less about instruction sets.
--umm. Well, it depends: do you want performance? If you do, then you should care about instruction sets and architecture. But here is the thing: 90% of computer customers have never heard of the POPCOUNT instruction, much less want to use it and know they want to. So as a business matter, intel does not care about making good instruction sets. What a heck of a lot of computer buyers DO care about, is: can I run some piece of software for shooting monsters with great graphics, or watching porno movies, that was originally written in assembler for performance? And I want that software right now, I do not want to wait a few years for somebody to re-code it. So the fact that, if better instruction sets & architectures are provided, then in 5 years we'll get better performing software, is not a matter of business importance. As a total guess I'd think we could get 2X speedup and/or electrical power reduction with better architectures, but businesswise they do not care because that speedup is for FUTURE software, not today's software. Almost nobody buys a computer because of hoped-for future software.
Warren, what is this better architecture that Intel ran up a flagpole? What should we search for to look it up?
--itanium. And by the way, looking into this, it seems ARM and itanium have made some business progress, BUT the way they did so was by going into entirely new markets, like servers and cell phones. In those markets, the software was not already there. So the buyers really *DID* care about "future software" as opposed to "old software." So they really did care about performance. So that allowed these new better architectures to gain some business success.
Why did Apple switch from the 68000 to the x86? Steve Jobs may have been an asshole, but he was a smart one.
Yes, backward compatibility is very important. Nobody wants to be endlessly rewriting their code; we want to move on to newer and better things, not rehashing the old.
What makes a computer language "better"; that seems to be a matter of personal opinion. I did my 68000 coding at Coherent Medical in C, and I liked the language. Later, Coherent switched over to Windows based controllers and C++, but by then I had moved from software engineering to optics and laser engineering. I never learned C++, but I can appreciate its usefulness. I'm still fond of Fortran. I tend to detest new languages, and the only one I've embraced is Python. Once at a computer show, I asked a Forth guy what's so good about Forth. He told me astronomers like it, but otherwise go read about it. When I asked a real astronomer, he said "Nah, we don't use Forth." Then there was Smalltalk; it was supposed to solve all the problems of the universe. I said that when it takes off, there will be plenty of books to read about it; I'm still waiting.
--quite. But if you look at some of the languages, they have nice ideas. And I have ideas of my own too for what should be in languages. But all that usually is not going anywhere, due to the same energy barrier as Esperanto, only with some new bad incentives added on top (like: fear that the language will go away and you'll be left twisting in the wind).
So, why are we not speaking Esperanto? Maybe because its only use is to converse with fellow fanatics.
--it is because, even though it is a better language, and humanity would be better off if we all spoke it -- we'd outperform old-style humanity by a lot -- the average Joe does not find it to his individual benefit to learn Esperanto right now. It only works if nearly everybody does it. If 1 person does it, it's the opposite of useful. It's a barrier to human progress caused by the difference between individual and society-wide incentives. And similar remarks could be made about, e.g. switching to better energy sources... switching to better voting systems... This is a fundamental common problem-structure that comes up in many guises.
--I was kind of annoyed by MIX too. One of the reasons the C language was originally successful was that it tried to make a high-level language which also enjoyed the benefits of, and unashamedly tried to connect to, assembler. (As opposed to pretending assembler did not exist.) So Knuth would have been better off using C than MIX in many ways. However, C fell short of its design goal, plus now with C++ has become a horrible mess.
--for example... Even though C "tried to connect with assembler", it in many ways failed to do so. What about "add with carry"? Sorry, C pretends that does not exist, and so if you want to write multiprecision arithmetic, you cannot use C (unless willing to pay a heavy performance price) and must put in some assembler. All because the providers of C *SUCKED.* And practically every high-level language in the universe pretends add-with-carry does not exist. It is just appalling. What is their problem?

And popcount and such are excellent instructions for dealing with sets as bits, which again practically every high-level language pretends do not exist. (With C you can get it thru nonstandard compiler extensions that are ugly and nonportable. It makes you live in hell.) All because the language designers suck. And if they'd put this in the language, hardware manufacturers would have had incentive to add popcount to hardware. The level of stupidity here was unbelievable.

Now if you ask language designers, they say stuff like "oh, instructions do not matter, syntactical details do not matter, it is our overarching concepts that matter. You are just a naive child." Bullshit. The reality is, C caught on EXACTLY because it connected well to assembler and it had compact, well thought out syntax. Not perfect in either respect, but it tried. And language designers today completely ignore that history that is just staring them in the face.

And why did C++ catch on? It is an ugly huge piece of crap. If C++ had been introduced ab initio back in the day, it never would have caught on. It would rightly have been regarded as garbage and landed on the scrap heap with PL/I and stuff. Heck, they probably could not have even built a compiler for it. The sole reason C++ caught on was it was built on top of C and exploited the desire for backwards compatibility to the hilt.

Anyway, how about SWAP(a,b)? That is something that one does very often, but no(?) high-level language provides it. That kind of thing is like a thorn in your side. It just annoys you a little each day.

And how about this: I want some stuff to be computed at compilation time, other stuff at run time, and I want to control that with same-syntax language performing this control. (E.g. "#for" would unroll a loop at compile time, versus plain "for" is a loop done at runtime.) Again, no language I know of has this obvious idea.

And why do we get cyclic shifts of words, but not reversal, so we cannot get the dihedral group? Why? For what possible reason? I could go on and on. The level of stupidity is just appalling.
On 19-Apr-14 21:33, Warren D Smith wrote:
--I was kind of annoyed by MIX too. One of the reasons the C language was originally successful was that it tried to make a high-level language which also enjoyed the benefits of, and unashamedly tried to connect to, assembler. (As opposed to pretending assembler did not exist.) So Knuth would have been better off using C than MIX in many ways. However, C fell short of its design goal, plus now with C++ has become a horrible mess. --for example...
Even though C "tried to connect with assembler" it in many ways failed to do so.
What about "add with carry"? Sorry, C pretends that does not exist and so if you want to write multiprecision arithmetic, you cannot use C (unless willing to pay heavy performance price) and must put in some assembler.
All because, the providers of C, *SUCKED.* And practically every high-level language in the universe, pretends add-with-carry does not exist. It is just appalling. What is their problem? At least in python, integers are inherently arbitrary precision, expanding as necessary. So add-with-carry isn't needed.
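A minimal sketch, in portable standard C, of the tax being described: each limb of a multiprecision add must re-derive the carry with comparisons, since the language exposes no carry flag. (GCC and Clang do offer the nonstandard `__builtin_addcll` as an escape hatch.)

```c
#include <stdint.h>
#include <stddef.h>

/* Multiprecision add in portable C: the hardware carry flag is
   invisible, so each limb re-derives the carry with two compares --
   the performance price complained about above. */
static uint64_t mp_add(uint64_t *r, const uint64_t *a,
                       const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = (s < carry);    /* overflow in a[i] + carry */
        r[i] = s + b[i];
        uint64_t c2 = (r[i] < s);     /* overflow in s + b[i]     */
        carry = c1 | c2;
    }
    return carry;                     /* carry out of the top limb */
}
```

Modern compilers can sometimes pattern-match this back into an actual add-with-carry instruction, but nothing in the language guarantees it.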
And popcount and such are excellent instructions for dealing with sets as bits, which again practically every high level language pretends does not exist. (With C you can get it thru nonstandard compiler extensions that are ugly and nonportable. It makes you live in hell.) All because the language designers suck. And if they'd put this in the language, hardware manufacturers would have had incentive to add popcount to hardware. The level of stupidity here was unbelievable.
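For reference, the standard bit-parallel workaround C programmers write when they avoid the nonstandard `__builtin_popcountll` (which compiles to the POPCNT instruction where the target has it):

```c
#include <stdint.h>

/* Portable population count: sum bits in parallel, doubling the
   field width at each step (1-bit fields, then 2, 4, 8), and
   finally gather the byte sums with a multiply. */
static unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```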
Now if you ask language designers, they say stuff like "oh, instructions do not matter, syntactical details do not matter, it is our overarching concepts that matter. You are just a naive child." Bullshit.
The reality is, C caught on EXACTLY because it connected well to assembler and it had compact, well thought out syntax. Not perfect in either respect, but it tried. And language designers today completely ignore that history that is just staring them in the face.

One thing I hate about C is that it uses declarations far removed from the executable code to decide whether a<b is a signed or unsigned comparison. And I even more hate the way java "fixed" this problem by only having signed integers. In general, the overloading of operators depending on the type of the operands seems to me like a bad idea, particularly when the operands don't carry an inherent type. And in C, the "usual [but not necessarily logical] arithmetic conversions" compound the problem. Some other bad decisions in C: terminating strings with null instead of having a length; having the shift operators with the wrong precedence (shifts should have the same precedence as * and /), as well as other precedence mistakes.

And why did C++ catch on? It is an ugly huge piece of crap. If C++ had been introduced ab initio back in the day, it never would have caught on. It would rightly have been regarded as garbage and landed on the scrap heap with PL/I and stuff. Heck, they probably could not have even built a compiler for it. The sole reason C++ caught on was it was built on top of C and exploited the desire for backwards compatibility to the hilt.
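The signed/unsigned complaint made concrete in a few lines of C: once one operand of a<b is unsigned, the "usual arithmetic conversions" silently convert the signed operand, flipping the answer for negative values. The helper names below are illustrative only, and the widening fix assumes long long is wider than unsigned int.

```c
/* With mixed operands, a is converted to unsigned before comparing,
   so -1 becomes UINT_MAX and the comparison goes "wrong". */
static int lt_mixed(int a, unsigned b)
{
    return a < b;
}

/* Widening both to a common signed type restores the expected
   ordering (assumes long long is wider than unsigned int). */
static int lt_signed(int a, unsigned b)
{
    return (long long)a < (long long)b;
}
```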
Anyway, how about SWAP(a,b)? That is something that one does very often, but no(?) high level language provides it. That kind of thing is like a thorn in your side. It just annoys you a little each day. python: a,b = b,a
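A sketch of the usual C workaround, which shows exactly why it grates: the macro must be told the type, because the language cannot infer it.

```c
/* Generic-ish swap for C: the type T is passed explicitly since C
   has no type inference; the trailing underscore on the temporary
   reduces the chance of capturing a caller's name. */
#define SWAP(T, a, b) \
    do { T swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)
```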
And how about this: I want some stuff to be computed at compilation time, other stuff at run time I want to control that with same-syntax language performing this control. (E.g. "#for" would unroll a loop at compile time, versus plain "for" is a loop done at runtime.) Again, no language I know of has this obvious idea.
And why do we get cyclic shifts of words, but not reversal, so we cannot get the dihedral group? Why? For what possible reason?
I could go on and on. The level of stupidity is just appalling.
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
* Warren D Smith <warren.wds@gmail.com> [Apr 20. 2014 07:41]:
[...]
And why did C++ catch on? It is an ugly huge piece of crap.
Christ! Opinions are ..., but everybody has got one.
If C++ had been introduced ab initio back in the day, it never would have caught on. It would rightly have been regarded as garbage and landed on the scrap heap with PL/I and stuff. Heck, they probably could not have even built a compiler for it. The sole reason C++ caught on was it was built on top of C and exploited the desire for backwards compatibility to the hilt.
Anyway, how about SWAP(a,b)? That is something that one does very often, but no(?) high level language provides it. That kind of thing is like a thorn in your side. It just annoys you a little each day.
std::swap() ? http://www.cplusplus.com/reference/algorithm/swap/ It doesn't get much more generic than that.
And how about this: I want some stuff to be computed at compilation time, other stuff at run time I want to control that with same-syntax language performing this control.
Ever heard of templates? Looks like you'd appreciate (gasp!) template metaprogramming.
(E.g. "#for" would unroll a loop at compile time, versus plain "for" is a loop done at runtime.) Again, no language I know of has this obvious idea.
That's a tall request. But there: OpenMP. The proper approach is (of course!) to write programs that write programs.
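"Programs that write programs" in miniature: a hypothetical build-time generator that emits the unrolled loop body a "#for" would produce in-language. The names acc and data are placeholders, not from any real codebase.

```c
#include <stdio.h>

/* Write the body of an unrolled loop into buf, as a code generator
   run at build time would.  Returns the number of characters
   written (assumes cap is large enough for the n lines). */
static int gen_unrolled(char *buf, size_t cap, int n)
{
    size_t len = 0;
    len += (size_t)snprintf(buf + len, cap - len, "/* generated code */\n");
    for (int i = 0; i < n; i++)
        len += (size_t)snprintf(buf + len, cap - len,
                                "    acc += data[%d];\n", i);
    return (int)len;
}
```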
And why do we get cyclic shifts of words, but not reversal, so we cannot get the dihedral group? Why? For what possible reason?
The designers of C didn't think of every possible use 30 years in the future. They should really be ashamed. I found it more astounding that the unix command line doesn't give the dihedral group. That's why there is "rotate", near bottom of the page http://jjj.de/shell/shellpage.html
I could go on and on. The level of stupidity is just appalling.
Here is what I like about every single language that is "so much cooler than C": it's written in C. And, by the way, there is a threshold in programming language criticism where "Write your own or cork it!" applies. Best, jj
[...]
On 20/04/2014 02:33, Warren D Smith wrote:
Anyway, how about SWAP(a,b)? That is something that one does very often, but no(?) high level language provides it. That kind of thing is like a thorn in your side. It just annoys you a little each day.
Common Lisp: (rotatef a b) Python: a,b = b,a Ruby: a,b = b,a C++: std::swap(a,b)
And why do we get cyclic shifts of words, but not reversal, so we cannot get the dihedral group? Why? For what possible reason?
The vast majority of the world's software doesn't have any particular need to represent subsets of small dihedral groups with machine integers. There's only so much space in the instruction set encoding, and it's not clear that bit reversal would earn its place. (Perhaps it might, for FFT-related reasons. But if no major CPU architecture has ever had a bit-reversal operation -- is that true? -- then it seems pretty likely that at least once someone's given careful thought to whether it would be worth including and decided not.) -- g
* Gareth McCaughan <gareth.mccaughan@pobox.com> [Apr 20. 2014 16:25]:
[...]
(Perhaps it might, for FFT-related reasons. But if no major CPU architecture has ever had a bit-reversal operation -- is that true? -- then it seems pretty likely that at least once someone's given careful thought to whether it would be worth including and decided not.)
I do not know how many gates it takes for revbin, but speculate it would be very cheap (certainly cheaper than shift/rotate by [arg] positions), as you'd need just N/2 "swap bit pair" mechanisms (where N = word length). There is another _very_ good reason for revbin: quite a few tricks in use rely on the fact that the adder propagates "instantaneously" through arbitrarily many positions. I'll show a very neat trick in another message that works in _one_ direction. Best, jj
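For comparison, the software fallback: the classic halving trick reverses a 32-bit word in about a dozen word operations (ARM did later add a hardware RBIT instruction). A sketch:

```c
#include <stdint.h>

/* Full 32-bit reversal in O(log N) word operations: swap adjacent
   bits, then bit pairs, nibbles, bytes, and finally halfwords. */
static uint32_t revbin32(uint32_t x)
{
    x = ((x >> 1)  & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2)  & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4)  & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8)  & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    x = (x >> 16) | (x << 16);
    return x;
}
```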
Doesn't it take _no_ gates to reverse a word, no matter how long it is? Just a bunch of wires. On Sun, Apr 20, 2014 at 10:38 AM, Joerg Arndt <arndt@jjj.de> wrote:
I do not know how many gates it takes for revbin, but speculate it would be very cheap (certainly cheaper than shift/rotate by [arg] positions), as you'd need just N/2 "swap bit pair" mechanisms (where N = word length).
On 20/04/2014 15:38, Joerg Arndt wrote:
* Gareth McCaughan <gareth.mccaughan@pobox.com> [Apr 20. 2014 16:25]: ...
(Perhaps it might, for FFT-related reasons. But if no major CPU architecture has ever had a bit-reversal operation -- is that true? -- then it seems pretty likely that at least once someone's given careful thought to whether it would be worth including and decided not.)
I do not know how many gates it takes for revbin, but speculate it would be very cheap (certainly cheaper than shift/rotate by [arg] positions), as you'd need just N/2 "swap bit pair" mechanisms (where N = word length).
The scarce resource is not gates, nor even "wiring space" on the chip. It's space in the instruction set: whatever opcodes you dedicate to bit-reversal (or maybe, ARM-like, to making bit-reversal an option in other instructions) are not available for (1) other things that might offer more performance benefits in real code or (2) future expansion.
There is another _very_ good reason for revbin: quite a few tricks in use rely on the fact that the adder propagates "instantaneously" through arbitrarily many positions. I'll show a very neat trick in another message that works in _one_ direction.
Is that "very good, for people who love neat hacks" or "very good, in actual performance improvement in databases and web browsers and first-person shooter games and nuclear bomb simulations and other things that people with money to spend on their computers care about"? (For the avoidance of doubt, I personally derive much more pleasure from a neat bit-twiddling hack than from seeing someone's database run 0.1% faster. But I, and others like me, don't account for a large fraction of Intel's -- or ARM's, AMD's, etc. -- revenues.) -- g
It strikes me that making a bit reverse without gates isn't a simple analog of making a "move" instruction. You'd need a 64 layer board.
Or a two layer board. On Apr 20, 2014 1:47 PM, "Dave Dyer" <ddyer@real-me.net> wrote:
It strikes me that making a bit reverse without gates isn't a simple analog of making a "move" instruction.
You'd need a 64 layer board.
I'm seeing only two layers: 64 wires run horizontally, then they hit a diagonal row of holes, through which they all dive to the other layer, in which they run vertically. On Sun, Apr 20, 2014 at 4:48 PM, Tom Rokicki <rokicki@gmail.com> wrote:
Or a two layer board. On Apr 20, 2014 1:47 PM, "Dave Dyer" <ddyer@real-me.net> wrote:
It strikes me that making a bit reverse without gates isn't a simple analog of making a "move" instruction.
You'd need a 64 layer board.
* Warren D Smith <warren.wds@gmail.com> [Apr 20. 2014 07:41]:
[...]
As hardware technology advanced, we had prefetching, simultaneous instruction execution, pipelining, and so on. The nail in the coffin for me (or should it be the stake through the heart?) was when I was told by someone I trusted that modern compilers can produce faster executing code than all but the most expert hand coding of assembly. At that point, I changed over completely, and now I just couldn't care less about instruction sets.
That was me? Background: I wrote a simple 3dim graphics engine some years ago in C++. The machine code (AMD64 system) was just 44 _kilo_ bytes (texture mapping, transparency, colored light, ..., all that included). Looking at the assembler code was pretty much a revelation: few humans could possibly do better, and, more importantly, would be willing to do so.
[...]
Warren, what is this better architecture that Intel ran up a flagpole? What should we search for to look it up?
--itanium.
Have you ever worked with an itanium system? No? I thought so.
And by the way, looking into this, it seems ARM and itanium have made some business progress,
itanium is dead. AMD(64) has forced intel to make much better CPUs for us unwashed masses.

At work, I have a system with an Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz. The performance is beyond awesome. Just one detail: the CPU has what intel calls a "loop stream detector (abbrev.: LSD)". When a tight loop is executed, this is detected and instructions are executed without the need to go through the (pre-)decoder. Thus performance of some of my combinatorial generators is simply doubled, sometimes needing only a few CPU cycles per generated object.

Cherry picked examples:
------------------------------------------------------------
// output of demo/comb/composition-nz-subset-lex-demo.cc:
// Description:
//% Compositions of n into positive parts, subset-lex order.
----- args=32 0
COMPOSITION_NZ_SUBSET_LEX_FIXARRAYS defined.
forward:
ct=2147483648
./bin 32 0  2.08s user 0.00s system 99% cpu 2.079 total
==> 2147483648/2.08 == 1032.444061 [M per second]; 1/rate == 3.39 [cycles]
------------------------------------------------------------
// output of demo/comb/mixedradix-subset-lexrev-demo.cc:
// Description:
//% Mixed radix numbers in reversed subset-lexicographic order.
----- args=8 16 1
MIXEDRADIX_SUBSET_LEXREV_FIXARRAYS is defined.
backward:
ct=4294967296
./bin 8 16 1  2.47s user 0.00s system 99% cpu 2.476 total
==> 4294967296/2.47 == 1738.853156 [M per second]; 1/rate == 2.01 [cycles]
------------------------------------------------------------
End cherry pick.

Most simple such generators need less than 10 cycles per object ( >= 350 M objects per second, 500 M/sec quite usual); 20 cycles is almost "lame" by this CPU's standards.

Currently AMD is falling behind at the performance end and intel is much less forced to both innovate and keep the prices reasonable. This is very bad.
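A quick sanity check of the quoted figures, assuming the 3.5 GHz clock: a rate in objects per second converts directly to cycles per object, matching the "1/rate == 3.39 [cycles]" line above.

```c
#include <math.h>

/* cycles per object = clock frequency / object rate; e.g.
   3.5e9 Hz / 1032.444061e6 objects/sec is about 3.39 cycles. */
static double cycles_per_object(double clock_ghz, double mobj_per_sec)
{
    return clock_ghz * 1e9 / (mobj_per_sec * 1e6);
}
```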
[...]
Best, jj
participants (7)
- Allan Wechsler
- Dave Dyer
- Gareth McCaughan
- Joerg Arndt
- Mike Speciner
- Tom Rokicki
- Warren D Smith