cmd/6g: avoid MOVSD between registers

MOVSD only copies the low half of the packed register pair, while MOVAPD copies both halves. I assume the internal register renaming works better with the latter, since it makes our code run 25% faster. Before: mandelbrot 16000 gcc -O2 mandelbrot.c 28.44u 0.00s 28.45r gc mandelbrot 44.12u 0.00s 44.13r gc_B mandelbrot 44.17u 0.01s 44.19r After: mandelbrot 16000 gcc -O2 mandelbrot.c 28.22u 0.00s 28.23r gc mandelbrot 32.81u 0.00s 32.82r gc_B mandelbrot 32.82u 0.00s 32.83r R=ken2 CC=golang-dev https://golang.org/cl/6248068
2024-11-20 09:44:45 -07:00 · 2012-05-30 14:41:19 -04:00 · 2012-05-30 14:41:19 -04:00 · a768de8347
commit a768de8347
parent eb056dbea7
1 changed files with 11 additions and 0 deletions
--- a/src/cmd/6g/peep.c
+++ b/src/cmd/6g/peep.c
@ -283,6 +283,12 @@ loop1:
 	// copyprop.  Now that copyprop is done, remov MOVLQZX R1, R2
 	// if it is dominated by an earlier ADDL/MOVL/etc into R1 that
 	// will have already cleared the high bits.
+	//
+	// MOVSD removal.
+	// We never use packed registers, so a MOVSD between registers
+	// can be replaced by MOVAPD, which moves the pair of float64s
+	// instead of just the lower one.  We only use the lower one, but
+	// the processor can do better if we do moves using both.
 	for(r=firstr; r!=R; r=r->link) {
 		p = r->prog;
 		if(p->as == AMOVLQZX)
@ -290,6 +296,11 @@ loop1:
 		if(p->from.type == p->to.type)
 		if(prevl(r, p->from.type))
 			excise(r);
+		
+		if(p->as == AMOVSD)
+		if(regtyp(&p->from))
+		if(regtyp(&p->to))
+			p->as = AMOVAPD;
 	}

 	// load pipelining