qbit/go

mirror of https://github.com/golang/go synced 2024-11-19 14:14:40 -07:00

Author	SHA1	Message	Date
Robert Griesemer	654a185f20	math/big: faster assembly kernels for AddVx/SubVx for 386. (analog to Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae) Fixes #9243. benchmark old ns/op new ns/op delta BenchmarkAddVV_1 5.76 5.60 -2.78% BenchmarkAddVV_2 7.17 6.98 -2.65% BenchmarkAddVV_3 8.69 8.57 -1.38% BenchmarkAddVV_4 10.5 10.5 +0.00% BenchmarkAddVV_5 13.3 11.6 -12.78% BenchmarkAddVV_1e1 20.4 19.3 -5.39% BenchmarkAddVV_1e2 166 140 -15.66% BenchmarkAddVV_1e3 1588 1278 -19.52% BenchmarkAddVV_1e4 16138 12657 -21.57% BenchmarkAddVV_1e5 167608 127836 -23.73% BenchmarkAddVW_1 4.87 4.76 -2.26% BenchmarkAddVW_2 6.10 6.07 -0.49% BenchmarkAddVW_3 7.75 7.65 -1.29% BenchmarkAddVW_4 9.30 9.39 +0.97% BenchmarkAddVW_5 10.8 10.9 +0.93% BenchmarkAddVW_1e1 18.8 18.8 +0.00% BenchmarkAddVW_1e2 143 134 -6.29% BenchmarkAddVW_1e3 1390 1266 -8.92% BenchmarkAddVW_1e4 13877 12545 -9.60% BenchmarkAddVW_1e5 155330 125432 -19.25% benchmark old MB/s new MB/s speedup BenchmarkAddVV_1 5556.09 5715.12 1.03x BenchmarkAddVV_2 8926.55 9170.64 1.03x BenchmarkAddVV_3 11042.15 11201.77 1.01x BenchmarkAddVV_4 12168.21 12245.50 1.01x BenchmarkAddVV_5 12041.39 13805.73 1.15x BenchmarkAddVV_1e1 15659.65 16548.18 1.06x BenchmarkAddVV_1e2 19268.57 22728.64 1.18x BenchmarkAddVV_1e3 20141.45 25033.36 1.24x BenchmarkAddVV_1e4 19827.86 25281.92 1.28x BenchmarkAddVV_1e5 19092.06 25031.92 1.31x BenchmarkAddVW_1 822.12 840.92 1.02x BenchmarkAddVW_2 1310.89 1317.89 1.01x BenchmarkAddVW_3 1549.31 1568.26 1.01x BenchmarkAddVW_4 1720.45 1703.77 0.99x BenchmarkAddVW_5 1857.12 1828.66 0.98x BenchmarkAddVW_1e1 2126.39 2132.38 1.00x BenchmarkAddVW_1e2 2784.49 2969.21 1.07x BenchmarkAddVW_1e3 2876.89 3157.35 1.10x BenchmarkAddVW_1e4 2882.32 3188.51 1.11x BenchmarkAddVW_1e5 2575.16 3188.96 1.24x (measured on OS X 10.9.5, 2.3 GHz Intel Core i7, 8GB 1333 MHz DDR3) Change-Id: I46698729d5e0bc3e277aa0146a9d7a086c0c26f1 Reviewed-on: https://go-review.googlesource.com/2560 Reviewed-by: Keith Randall <khr@golang.org>	2015-01-08 20:58:59 +00:00
Robert Griesemer	067acd51b0	math/big: faster "pure Go" addition/subtraction for long vectors (platforms w/o corresponding assembly kernels) For short vector adds there's some erratic slow-down, but overall these routines have become significantly faster. This only matters for platforms w/o native (assembly) versions of these kernels, so we are not concerned about the minor slow-down for short vectors. This code was already reviewed under Mercurial (golang.org/cl/172810043) but wasn't submitted before the switch to git. Benchmarks run on 2.3GHz Intel Core i7, running OS X 10.9.5, with the respective AddVV and AddVW assembly routines disabled. benchmark old ns/op new ns/op delta BenchmarkAddVV_1 6.59 7.09 +7.59% BenchmarkAddVV_2 10.3 10.1 -1.94% BenchmarkAddVV_3 10.9 12.6 +15.60% BenchmarkAddVV_4 13.9 15.6 +12.23% BenchmarkAddVV_5 16.8 17.3 +2.98% BenchmarkAddVV_1e1 29.5 29.9 +1.36% BenchmarkAddVV_1e2 246 232 -5.69% BenchmarkAddVV_1e3 2374 2185 -7.96% BenchmarkAddVV_1e4 58942 22292 -62.18% BenchmarkAddVV_1e5 668622 225279 -66.31% BenchmarkAddVW_1 6.81 5.58 -18.06% BenchmarkAddVW_2 7.69 6.86 -10.79% BenchmarkAddVW_3 9.56 8.32 -12.97% BenchmarkAddVW_4 12.1 9.53 -21.24% BenchmarkAddVW_5 13.2 10.9 -17.42% BenchmarkAddVW_1e1 23.4 18.0 -23.08% BenchmarkAddVW_1e2 175 141 -19.43% BenchmarkAddVW_1e3 1568 1266 -19.26% BenchmarkAddVW_1e4 15425 12596 -18.34% BenchmarkAddVW_1e5 156737 133539 -14.80% BenchmarkFibo 381678466 132958666 -65.16% benchmark old MB/s new MB/s speedup BenchmarkAddVV_1 9715.25 9028.30 0.93x BenchmarkAddVV_2 12461.72 12622.60 1.01x BenchmarkAddVV_3 17549.64 15243.82 0.87x BenchmarkAddVV_4 18392.54 16398.29 0.89x BenchmarkAddVV_5 18995.23 18496.57 0.97x BenchmarkAddVV_1e1 21708.98 21438.28 0.99x BenchmarkAddVV_1e2 25956.53 27506.88 1.06x BenchmarkAddVV_1e3 26947.93 29286.66 1.09x BenchmarkAddVV_1e4 10857.96 28709.46 2.64x BenchmarkAddVV_1e5 9571.91 28409.21 2.97x BenchmarkAddVW_1 1175.28 1433.98 1.22x BenchmarkAddVW_2 2080.01 2332.54 1.12x BenchmarkAddVW_3 2509.28 2883.97 1.15x BenchmarkAddVW_4 2646.09 3356.83 1.27x BenchmarkAddVW_5 3020.69 3671.07 1.22x BenchmarkAddVW_1e1 3425.76 4441.40 1.30x BenchmarkAddVW_1e2 4553.17 5642.96 1.24x BenchmarkAddVW_1e3 5100.14 6318.72 1.24x BenchmarkAddVW_1e4 5186.15 6350.96 1.22x BenchmarkAddVW_1e5 5104.07 5990.74 1.17x Change-Id: I7a62023b1105248a0e85e5b9819d3fd4266123d4 Reviewed-on: https://go-review.googlesource.com/2480 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>	2015-01-08 17:00:59 +00:00
Robert Griesemer	80b3ff9f82	math/big: faster assembly kernels for AddVx/SubVx for amd64. Replaced use of rotate instructions (RCRQ, RCLQ) with ADDQ/SBBQ for restoring/saving the carry flag per suggestion from Torbjörn Granlund (author of GMP bignum libs for C). The rotate instructions tend to be slower on today's machines. benchmark old ns/op new ns/op delta BenchmarkAddVV_1 5.69 5.51 -3.16% BenchmarkAddVV_2 7.15 6.87 -3.92% BenchmarkAddVV_3 8.69 8.06 -7.25% BenchmarkAddVV_4 8.10 8.13 +0.37% BenchmarkAddVV_5 8.37 8.47 +1.19% BenchmarkAddVV_1e1 13.1 12.0 -8.40% BenchmarkAddVV_1e2 78.1 69.4 -11.14% BenchmarkAddVV_1e3 815 656 -19.51% BenchmarkAddVV_1e4 8137 7345 -9.73% BenchmarkAddVV_1e5 100127 93909 -6.21% BenchmarkAddVW_1 4.86 4.71 -3.09% BenchmarkAddVW_2 5.67 5.50 -3.00% BenchmarkAddVW_3 6.51 6.34 -2.61% BenchmarkAddVW_4 6.69 6.66 -0.45% BenchmarkAddVW_5 7.20 7.21 +0.14% BenchmarkAddVW_1e1 10.0 9.34 -6.60% BenchmarkAddVW_1e2 45.4 52.3 +15.20% BenchmarkAddVW_1e3 417 491 +17.75% BenchmarkAddVW_1e4 4760 4852 +1.93% BenchmarkAddVW_1e5 69107 67717 -2.01% benchmark old MB/s new MB/s speedup BenchmarkAddVV_1 11241.82 11610.28 1.03x BenchmarkAddVV_2 17902.68 18631.82 1.04x BenchmarkAddVV_3 22082.43 23835.64 1.08x BenchmarkAddVV_4 31588.18 31492.06 1.00x BenchmarkAddVV_5 38229.90 37783.17 0.99x BenchmarkAddVV_1e1 48891.67 53340.91 1.09x BenchmarkAddVV_1e2 81940.61 92191.86 1.13x BenchmarkAddVV_1e3 78443.09 97480.44 1.24x BenchmarkAddVV_1e4 78644.18 87129.50 1.11x BenchmarkAddVV_1e5 63918.48 68150.84 1.07x BenchmarkAddVW_1 13165.09 13581.00 1.03x BenchmarkAddVW_2 22588.04 23275.41 1.03x BenchmarkAddVW_3 29483.82 30303.96 1.03x BenchmarkAddVW_4 38286.54 38453.21 1.00x BenchmarkAddVW_5 44414.57 44370.59 1.00x BenchmarkAddVW_1e1 63816.84 68494.08 1.07x BenchmarkAddVW_1e2 140885.41 122427.16 0.87x BenchmarkAddVW_1e3 153258.31 130325.28 0.85x BenchmarkAddVW_1e4 134447.63 131904.02 0.98x BenchmarkAddVW_1e5 92609.41 94509.88 1.02x Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae Reviewed-on: https://go-review.googlesource.com/2503 Reviewed-by: Keith Randall <khr@golang.org>	2015-01-08 16:57:11 +00:00
Shenghou Ma	43178697db	math/big: panic if n <= 0 for ProbablyPrime Fixes #9509 Change-Id: I3b86745d38e09093fe2f4b918d774bd6608727d7 Reviewed-on: https://go-review.googlesource.com/2313 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-01-05 23:11:35 +00:00
Fazlul Shahriar	e6f76aac32	math: be consistent in how we document special cases Change-Id: Ic6bc4af7bcc89b2881b2b9e7290aeb6fd54804e2 Reviewed-on: https://go-review.googlesource.com/2239 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-01-05 21:01:46 +00:00
Guobiao Mei	59cb2d9ca6	math/rand: fix example_test to show with the correct method Originally it used r.Int63() to show "Uint32", and now we use the correct r.Uint32() method. Fixes #9429 Change-Id: I8a1228f1ca1af93b0e3104676fc99000257c456f Reviewed-on: https://go-review.googlesource.com/2069 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2014-12-23 17:24:24 +00:00
Alberto Donizetti	5de497bc6f	math: Added parity check to ProbablyPrime Fixes #9269 Change-Id: I25751632e95978537b656aedfa5c35ab2273089b Reviewed-on: https://go-review.googlesource.com/1380 Reviewed-by: Robert Griesemer <gri@golang.org>	2014-12-12 00:25:16 +00:00
Russ Cox	09d92b6bbf	all: power64 is now ppc64 Fixes #8654. LGTM=austin R=austin CC=golang-codereviews https://golang.org/cl/180600043	2014-12-05 19:13:20 -05:00
Austin Clements	062e354c84	[dev.power64] runtime: power64 fixes and ports of changes Fix include paths that got moved in the great pkg/ rename. Add missing runtime/arch_* files for power64. Port changes that happened on default since branching to runtime/{asm,atomic,sys_linux}_power64x.s (precise stacks, calling convention change, various new and deleted functions). Port struct renaming and fix some bugs in runtime/defs_linux_power64.h. LGTM=rsc R=rsc, dave CC=golang-codereviews https://golang.org/cl/161450043	2014-10-27 17:27:03 -04:00
Austin Clements	f0bd539c59	[dev.power64] all: merge default into dev.power64 This brings dev.power64 up-to-date with the current tip of default. go_bootstrap is still panicking with a bad defer when initializing the runtime (even on amd64). LGTM=rsc R=rsc CC=golang-codereviews https://golang.org/cl/152570049	2014-10-22 15:51:54 -04:00
Austin Clements	2bd616b1a7	build: merge the great pkg/ rename into dev.power64 This also removes pkg/runtime/traceback_lr.c, which was ported to Go in an earlier commit and then moved to runtime/traceback.go. Reviewer: rsc@golang.org rsc: LGTM	2014-10-22 13:25:37 -04:00
Keith Randall	96d1e4ab59	math/big: Allow non-prime modulus for ModInverse The inverse is defined whenever the element and the modulus are relatively prime. The code already handles this situation, but the spec does not. Test that it does indeed work. Fixes #8875 LGTM=agl R=agl CC=golang-codereviews https://golang.org/cl/155010043	2014-10-14 14:09:56 -07:00
Casey Marshall	7371153321	math/big: Fixes issue 8920 (*Rat).SetString checks for denominator. LGTM=gri R=golang-codereviews, gri CC=golang-codereviews https://golang.org/cl/159760043	2014-10-13 12:41:14 -07:00
Robert Griesemer	87f51f1031	math/big: fix doc comments Fixes #8904. TBR=iant R=iant CC=golang-codereviews https://golang.org/cl/148650043	2014-10-07 10:56:58 -07:00
Robert Griesemer	28ddfb090c	math/big: math.Exp should return result >= 0 for |m| > 0 The documentation states that Exp(x, y, m) computes x**y mod |m| for m != nil && m > 0. In math.big, Mod is the Euclidean modulus, which is always >= 0. Fixes #8822. LGTM=agl, r, rsc R=agl, r, rsc CC=golang-codereviews https://golang.org/cl/145650043	2014-10-02 13:02:25 -07:00
Russ Cox	4a8cb4a49c	math: avoid assumption of denormalized math mode in Sincos The extra-clever code in Sincos is trying to do if v&2 == 0 { mask = 0xffffffffffffffff } else { mask = 0 } It does this by turning v&2 into a float64 X0 and then using MOVSD $0.0, X3 CMPSD X0, X3, 0 That CMPSD is defined to behave like: if X0 == X3 { X3 = 0xffffffffffffffff } else { X3 = 0 } which gives the desired mask in X3. The goal in using the CMPSD was to avoid a conditional branch. This code fails when called from a PortAudio callback. In particular, the failure behavior is exactly as if the CMPSD always chose the 'true' execution. Notice that the comparison X0 == X3 is comparing as floating point values the 64-bit pattern v&2 and the actual floating point value zero. The only possible values for v&2 are 0x0000000000000000 (floating point zero) and 0x0000000000000002 (floating point 1e-323, a denormal). If they are both comparing equal to zero, I conclude that in a PortAudio callback (whatever that means), the processor is running in "denormals are zero" mode. I confirmed this by placing the processor into that mode and running the test case in the bug; it produces the incorrect output reported in the bug. In general, if a Go program changes the floating point math modes to something other than what Go expects, the math library is not going to work exactly as intended, so we might be justified in not fixing this at all. However, it seems reasonable that the client code might have expected "denormals are zero" mode to only affect actual processing of denormals. This code has produced what is in effect a gratuitous denormal by being extra clever. There is nothing about the computation being requested that fundamentally requires a denormal. It is also easy to do this computation in integer math instead: mask = ((v&2)>>1)-1 Do that. For the record, the other math tests that fail if you put the processor in "denormals are zero" mode are the tests for Frexp, Ilogb, Ldexp, Logb, Log2, and FloatMinMax, but all fail processing denormal inputs. Sincos was the only function for which that mode causes incorrect behavior on non-denormal inputs. The existing tests check that the new assembly is correct. There is no test for behavior in "denormals are zero" mode, because I don't want to add assembly to change that. Fixes #8623. LGTM=josharian R=golang-codereviews, josharian CC=golang-codereviews, iant, r https://golang.org/cl/151750043	2014-09-26 17:13:24 -04:00
Russ Cox	c007ce824d	build: move package sources from src/pkg to src Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.	2014-09-08 00:08:51 -04:00

17 Commits
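
Commit 4a8cb4a49c replaces a floating-point comparison with the integer expression mask = ((v&2)>>1)-1. The following is a minimal Go sketch of that expression, not the assembly from the commit; the function name maskFromV is hypothetical and chosen only for illustration. It relies on unsigned wraparound: when bit 1 of v is clear, 0-1 wraps to all ones, and when it is set, 1-1 gives zero.

```go
package main

import "fmt"

// maskFromV is a hypothetical helper that computes the branch-free mask
// described in the commit message: all ones when v&2 == 0, zero otherwise.
// Unsigned subtraction wraps around, so 0-1 yields 0xffffffffffffffff.
func maskFromV(v uint64) uint64 {
	return ((v & 2) >> 1) - 1
}

func main() {
	// Only bit 1 of v matters, so v = 0..3 covers every case.
	for v := uint64(0); v < 4; v++ {
		fmt.Printf("v=%d mask=%#x\n", v, maskFromV(v))
	}
}
```

Because the computation never leaves integer registers, it should not depend on the processor's "denormals are zero" setting, which is the point the commit message makes.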
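
Commit 96d1e4ab59 documents that ModInverse works for any modulus that is relatively prime to the element, not only for prime moduli. The sketch below illustrates this with the composite modulus 10. The wrapper name inverseMod is hypothetical, and the nil result for non-coprime inputs is the behavior documented in recent Go releases; at the time of the commit that case may have been handled differently.

```go
package main

import (
	"fmt"
	"math/big"
)

// inverseMod is a hypothetical wrapper around (*big.Int).ModInverse.
// It returns x with (g*x) mod n == 1. In recent Go releases ModInverse
// returns nil when g and n are not relatively prime (assumption: a
// current toolchain is used).
func inverseMod(g, n int64) *big.Int {
	return new(big.Int).ModInverse(big.NewInt(g), big.NewInt(n))
}

func main() {
	// 10 is not prime, but gcd(3, 10) == 1, so an inverse exists.
	fmt.Println(inverseMod(3, 10))
	// gcd(4, 10) == 2, so no inverse exists.
	fmt.Println(inverseMod(4, 10) == nil)
}
```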
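
Commit 28ddfb090c makes Exp return a non-negative result, consistent with the Euclidean modulus used elsewhere in math/big. The following sketch shows the effect with a negative base and an odd exponent. The helper name expMod is hypothetical, and the treatment of a negative modulus as |m| follows the current package documentation.

```go
package main

import (
	"fmt"
	"math/big"
)

// expMod is a hypothetical wrapper around (*big.Int).Exp that computes
// x**y mod |m| for small int64 inputs.
func expMod(x, y, m int64) *big.Int {
	return new(big.Int).Exp(big.NewInt(x), big.NewInt(y), big.NewInt(m))
}

func main() {
	// (-2)**3 == -8; the Euclidean remainder of -8 modulo 5 lies in [0, 5).
	r := expMod(-2, 3, 5)
	fmt.Println(r, r.Sign() >= 0)
}
```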
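
Commits 067acd51b0 and 80b3ff9f82 both concern vector addition kernels that propagate a carry from one word to the next. The sketch below shows the general idea in portable Go. It is not the code from either commit: it uses math/bits.Add64, which was added to the standard library after these commits, and it works on []uint64 rather than the package's internal word type. The name addVV is borrowed from the benchmark names for readability only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV is an illustrative kernel, not the math/big implementation.
// It sets z = x + y word by word (least significant word first) and
// returns the final carry (0 or 1). It assumes len(x) and len(y) are
// at least len(z).
func addVV(z, x, y []uint64) (carry uint64) {
	for i := range z {
		// bits.Add64 returns the sum and the carry-out for one word,
		// taking the previous word's carry as input.
		z[i], carry = bits.Add64(x[i], y[i], carry)
	}
	return carry
}

func main() {
	x := []uint64{^uint64(0), 0} // low word is all ones
	y := []uint64{1, 0}
	z := make([]uint64, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c)
}
```

The carry produced by the low word ripples into the high word, which is the dependency the assembly kernels handle with the processor's carry flag.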