There are a few cases where this can be useful. Apart from the obvious
(and silly)
100*n + 200*n
where we generate one IMUL instead of two, consider:
15*n + 31*n
Currently, the compiler strength-reduces both imuls, generating:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 MOVQ AX, CX
0x0008 00008 SHLQ $4, AX
0x000c 00012 SUBQ CX, AX
0x000f 00015 MOVQ CX, DX
0x0012 00018 SHLQ $5, CX
0x0016 00022 SUBQ DX, CX
0x0019 00025 ADDQ CX, AX
0x001c 00028 MOVQ AX, "".~r1+16(SP)
0x0021 00033 RET
But combining the imuls is both faster and shorter:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 IMULQ $46, AX
0x0009 00009 MOVQ AX, "".~r1+16(SP)
0x000e 00014 RET
even without strength-reduction.
Moreover, consider:
5*n + 7*(n+1) + 11*(n+2)
We already have a rule that rewrites 7(n+1) into 7n+7, so the
generated code (without imuls merging) looks like this:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 LEAQ (AX)(AX*4), CX
0x0009 00009 MOVQ AX, DX
0x000c 00012 NEGQ AX
0x000f 00015 LEAQ (AX)(DX*8), AX
0x0013 00019 ADDQ CX, AX
0x0016 00022 LEAQ (DX)(CX*2), CX
0x001a 00026 LEAQ 29(AX)(CX*1), AX
0x001f 00031 MOVQ AX, "".~r1+16(SP)
But with imuls merging, the 5n, 7n and 11n factors get merged, and the
generated code looks like this:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 IMULQ $23, AX
0x0009 00009 ADDQ $29, AX
0x000d 00013 MOVQ AX, "".~r1+16(SP)
0x0012 00018 RET
Which is both faster and shorter; that's also the exact same code that
clang and the intel c compiler generate for the above expression.
Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9
Reviewed-on: https://go-review.googlesource.com/55143
Reviewed-by: Keith Randall <khr@golang.org>