qbit/go - go - Tape:neT

mirror of https://github.com/golang/go synced 2024-11-14 08:40:27 -07:00

Commit Graph

Author

SHA1

Message

Date

Michael Munday

eed6938cbb

cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions

The instructions allow moves between floating point and general
purpose registers without any conversion taking place.
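For readers more familiar with Go than with s390x assembly, the same kind of bit-for-bit move is what `math.Float64bits` and `math.Float64frombits` express. The sketch below only illustrates the semantics (reinterpretation of the bits, no numeric conversion). The helper names `toBits` and `fromBits` are hypothetical, and whether any particular compiler version lowers these calls to LGDR/LDGR is not stated in the commit message and is not assumed here.

```go
package main

import (
	"fmt"
	"math"
)

// toBits reinterprets the 64 bits of f as an integer (no numeric conversion).
func toBits(f float64) uint64 { return math.Float64bits(f) }

// fromBits reinterprets the 64 bits of b as a float64 (no numeric conversion).
func fromBits(b uint64) float64 { return math.Float64frombits(b) }

func main() {
	f := 1.0
	fmt.Printf("%#x\n", toBits(f))     // raw IEEE 754 bit pattern: 0x3ff0000000000000
	fmt.Println(uint64(f))             // numeric conversion, for contrast: 1
	fmt.Println(fromBits(toBits(f)))   // round trip preserves the value: 1
}
```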

Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6
Reviewed-on: https://go-review.googlesource.com/38777
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

2017-04-17 16:33:51 +00:00

Michael Munday

a524616860

cmd/{asm,internal/obj/s390x}, math: remove emulated float instructions

The s390x port was based on the ppc64 port and, because of the way the
port was done, inherited some instructions from it. ppc64 supports
3-operand (4-operand for FMADD etc.) floating point instructions
but s390x doesn't (the destination register is always an input) and
so these were emulated.

There is a bug in the emulation of FMADD whereby if the destination
register is also a source for the multiplication it will be
clobbered. This doesn't break any assembly code in the std lib but
could affect future work.
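The clobbering described above can be modelled in a few lines of Go. This is a toy sketch, not the assembler's actual code: the `regs` type, the function names, and the exact emulation sequence (copy the addend into the destination, then issue the native multiply-add) are assumptions chosen to reproduce the failure mode the message describes.

```go
package main

import "fmt"

// regs is a toy floating-point register file (hypothetical model).
type regs [4]float64

// fmaddNative models a multiply-add whose destination is also an input:
// r[dst] = r[a]*r[b] + r[dst].
func (r *regs) fmaddNative(a, b, dst int) {
	r[dst] = r[a]*r[b] + r[dst]
}

// fmaddEmulated models a naive emulation of the 4-operand form
// dst = a*b + c: first copy c into dst, then run the native instruction.
// If dst is also a or b, the copy overwrites a multiplication source.
func (r *regs) fmaddEmulated(a, b, c, dst int) {
	r[dst] = r[c]
	r.fmaddNative(a, b, dst)
}

func main() {
	// Distinct destination: F3 = F0*F1 + F2 = 2*3 + 10.
	r := regs{2, 3, 10, 0}
	r.fmaddEmulated(0, 1, 2, 3)
	fmt.Println(r[3]) // 16

	// Destination aliases a multiplication source: F0 = F0*F1 + F2.
	// The intended result is 16, but F0 is overwritten with 10 first.
	r = regs{2, 3, 10, 0}
	r.fmaddEmulated(0, 1, 2, 0)
	fmt.Println(r[0]) // 40
}
```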

To fix this I have gone through the floating point instructions and
removed all unnecessary 3-/4-operand emulation. The compiler doesn't
need it and assembly writers don't need it, it's just a source of
bugs.

I've also deleted the FNMADD family of emulated instructions. They
aren't used anywhere.

Change-Id: Ic07cedcf141a6a3b43a0c84895460f6cfbf56c04
Reviewed-on: https://go-review.googlesource.com/33350
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

2017-02-10 16:11:25 +00:00

Bill O'Farrell

b6a15683f0

math: use SIMD to accelerate some scalar math functions on s390x

Note, most math functions are structured to use stubs, so that they can
be accelerated with assembly on any platform.
Sinh, cosh, and tanh were not structured with stubs, so this CL does
that. This set of routines was chosen as likely to produce good speedups
with assembly on any platform.

Technique used was minimax polynomial approximation using tables of
polynomial coefficients, with argument range reduction.
A table of scaling factors was also used for cosh and log10.

                     before       after      speedup
BenchmarkCos         22.1 ns/op   6.79 ns/op  3.25x
BenchmarkCosh       125   ns/op  11.7  ns/op 10.68x
BenchmarkLog10       48.4 ns/op  12.5  ns/op  3.87x
BenchmarkSin         22.2 ns/op   6.55 ns/op  3.39x
BenchmarkSinh       125   ns/op  14.2  ns/op  8.80x
BenchmarkTanh        65.0 ns/op  15.1  ns/op  4.30x
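The speedup column is the ratio of the two timings (before divided by after). A short Go sketch that recomputes it from the figures in the table; the `row` type and `speedup` function are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// row holds one line of the benchmark table, in ns/op.
type row struct {
	name          string
	before, after float64
}

// speedup is the ratio of the old timing to the new timing.
func speedup(before, after float64) float64 { return before / after }

func main() {
	rows := []row{
		{"BenchmarkCos", 22.1, 6.79},
		{"BenchmarkCosh", 125, 11.7},
		{"BenchmarkLog10", 48.4, 12.5},
		{"BenchmarkSin", 22.2, 6.55},
		{"BenchmarkSinh", 125, 14.2},
		{"BenchmarkTanh", 65.0, 15.1},
	}
	for _, r := range rows {
		fmt.Printf("%-15s %.2fx\n", r.name, speedup(r.before, r.after))
	}
}
```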

Accuracy was tested against a high precision
reference function to determine maximum error.
Approximately 4,000,000 points were tested for each function,
producing the following result.
Note: ulperr is error in "units in the last place"

       max
      ulperr
sin    1.43 (returns NaN beyond +-2^50)
cos    1.79 (returns NaN beyond +-2^50)
cosh   1.05
sinh   3.02
tanh   3.69
log10  1.75

Also includes a set of tests that exercise the non-vector functions even
when SIMD is enabled.

Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc
Reviewed-on: https://go-review.googlesource.com/32352
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>

2016-11-11 20:20:23 +00:00

3 Commits