I wouldn't confuse machine instruction encoding with machine instruction execution. There are some machines that "compile" standardly-encoded instructions on-the-fly into the instruction cache as very different RISC/VLIW instructions.* Thus, a lot of compiler cleverness (e.g., LEA) is wasted, because both the clever and the non-clever instruction encodings end up as the same bit patterns (and execution speeds) in the instruction cache.

So unless the instructions are almost never already in the instruction cache at the time they are actually executed, there isn't much penalty for the non-clever encoding. Note also that many machines do relatively aggressive pre-fetch on instructions, because the penalty for extra speculation on instructions isn't very high. On newer Intel processors with execute-only pages, the processor is free to pre-fetch like crazy.

Thus, the only savings from such clever compiler encodings is in the size of the binary file, which -- in these days of 100MB-1GB applications -- is pretty insignificant.

* This is yet another reason for separate instruction & data caches, which can cause all sorts of mischief when they get out of sync -- e.g., you can hide malicious code in plain sight (well, in the instruction cache), while the data cache shows the non-malicious code. But this is a discussion for another day.

At 08:53 AM 4/8/2015, Joerg Arndt wrote:
less asm.lst
[...]
   9:foo.cc        **** unsigned foo(unsigned x)
  10:foo.cc        **** {
  64                    .loc 1 10 0
  65                    .cfi_startproc
  66                .LVL0:
  67                .LBB2:
  11:foo.cc        ****     unsigned y = x*x;
  68                    .loc 1 11 0
  69 0000 89F8          movl    %edi, %eax      # x, y
  70 0002 0FAFC7        imull   %edi, %eax      # x, y
  71                .LVL1:
  12:foo.cc        ****     return x + 1 + 4*y;
  72                    .loc 1 12 0
  73 0005 8D448701      leal    1(%rdi,%rax,4), %eax    #, D.2236
  74                .LVL2:
  75                .LBE2:
  13:foo.cc        **** }
[...]
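For readers who want to try this, here is a minimal sketch of the function as it can be pieced together from the source lines interleaved in the listing. The helper lea_like() and the self_check() routine are hypothetical additions for illustration, not part of foo.cc; lea_like() just spells out the base + index*scale + displacement arithmetic that the single leal on listing line 73 folds into one instruction.

```cpp
#include <cassert>

// Reconstructed from the source lines shown in the listing (foo.cc, lines 9-13).
unsigned foo(unsigned x)
{
    unsigned y = x*x;
    return x + 1 + 4*y;
}

// Hypothetical helper (not in the original): the arithmetic performed by
//   leal 1(%rdi,%rax,4), %eax
// i.e. base + index*scale + disp, keeping the low 32 bits of the result.
unsigned lea_like(unsigned base, unsigned index, unsigned scale, unsigned disp)
{
    return base + index*scale + disp;
}

// Hypothetical self-check: both forms agree, including on wrap-around,
// since unsigned arithmetic is modular.
void self_check()
{
    const unsigned samples[] = {0u, 1u, 2u, 3u, 65535u, 65536u, 0xFFFFFFFFu};
    for (unsigned x : samples)
        assert(foo(x) == lea_like(x, x*x, 4, 1));
}
```

Calling self_check() from a small main() and compiling with something like g++ -O2 -g should exercise it; whether a given compiler version actually emits the same leal as above is not guaranteed.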
And compilers do MUCH more sophisticated things than this.