1
0
mirror of https://github.com/golang/go synced 2024-09-29 09:34:28 -06:00
Commit Graph

35645 Commits

Author SHA1 Message Date
isharipo
b80b4a23d1 cmd/internal/obj/x86: add missing legacy insts
Minimizes the amount of "TODO" stuff in test suite
of cmd/asm/internal/asm/testdata/amd64enc.s.

Some instructions were already implemented, but
test cases for them were commented-out.

Does not enable MMX instructions, calls/jumps and some
segment registers instructions.

-- Affected instructions --
BLENDVPD, BLENDVPS
BSWAPW
CBW
CDQE
CLAC
CLFLUSHOPT
CMPXCHG16B
CRC32B, CRC32L, CRC32W
CWDE
FBLD
FBSTP
FCMOVB
FCMOVBE
FCMOVE
FCMOVNB
FCMOVNBE
FCMOVU
FCOMI
FCOMIP
IMUL3L, IMUL3Q, IMUL3W
ICEBP, INT
INVPCID
LARQ
LGDT, LIDT, LLDT
LMSW
LTR
LZCNTL, LZCNTQ, LZCNTW
MONITOR
MOVBELL, MOVBEQQ, MOVBEWW
MOVBQZX
MOVQ
MOVSWW, MOVZWW
MWAIT
NOPL, NOPW
PBLENDVB
PEXTRW
RDPKRU
RDRANDL, RDRANDQ, RDRANDW
RDSEEDL, RDSEEDQ, RDSEEDW
RDTSCP
SAHF
SGDT, SIDT
SLDTL, SLDTQ, SLDTW
SMSWL, SMSWQ, SMSWW
STAC
STRL, STRQ, STRW
SYSENTER, SYSENTER64
SYSEXIT, SYSEXIT64
SHA256RNDS2
TZCNTL, TZCNTQ, TZCNTW
UD1, UD2
WRPKRU
XRSTOR, XRSTOR64
XRSTORS, XRSTORS64
XSAVE, XSAVE64
XSAVEC, XSAVEC64
XSAVEOPT, XSAVEOPT64
XSAVES, XSAVES64
XSETBV

Fixes #6739

Change-Id: I8b125d9a5ea39bb4b9da7e66a63a16f609cef376
Reviewed-on: https://go-review.googlesource.com/97235
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 21:55:25 +00:00
Daniel Martí
c55505bae2 cmd/vet: type conversions never have side effects
Make the hasSideEffects func use type information to see if a CallExpr
is a type conversion or not. In case it is, there cannot be any side
effects.

Now that vet always has type information, we can afford to use it here.
Update the tests and remove the TODO there too.

Change-Id: I74fdacf830aedf2371e67ba833802c414178caf1
Reviewed-on: https://go-review.googlesource.com/79536
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-02-27 21:48:10 +00:00
Ilya Tocar
c2ccc48165 cmd/compile/internal/ssa: refactor zeroUpper32Bits
Explicitly whitelist args of OpSelect{1|2} that zero upper 32 bits.
Use better values in corresponding test.
This should have been a part of  CL 96815, but it was submitted, before
relevant comments.

Change-Id: Ic85d90a4471a17f6d64f8f5c405f21378bf3a30d
Reviewed-on: https://go-review.googlesource.com/97295
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 20:38:32 +00:00
Darshan Parajuli
ccaa2bc5c0 fmt: change some unexported method names to camel case
Change-Id: I12f96a9397cfaebe11e616543d804bd62cd72da0
Reviewed-on: https://go-review.googlesource.com/97379
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-27 20:12:04 +00:00
Josh Bleecher Snyder
15b0d1376a cmd/compile: clean up comments
Follow-up to CL 94256.

Change-Id: I61c450dee5975492192453738f734f772e95c1a5
Reviewed-on: https://go-review.googlesource.com/97515
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-27 20:06:22 +00:00
ChrisALiles
4f5389c321 cmd/compile: move the SSA local type definitions to a single location
Fixes #20304

Change-Id: I52ee02d1602ed7fffc96b27fd60990203c771aaf
Reviewed-on: https://go-review.googlesource.com/94256
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-02-27 19:40:36 +00:00
Ilya Tocar
0f2ef0ad44 cmd/compile/internal/ssa: combine byte stores on amd64
On amd64 we optimize  encoding/binary.BigEndian.PutUint{16,32,64}
into bswap + single store, but strangely enough not LittleEndian.PutUint{16,32}.
We have similar rules, but they use 64-bit shifts everywhere,
and fail for 16/32-bit case. Add rules that matchLittleEndian.PutUint,
and relevant tests. Performance results:

LittleEndianPutUint16-6    1.43ns ± 0%    1.07ns ± 0%   -25.17%  (p=0.000 n=9+9)
LittleEndianPutUint32-6    2.14ns ± 0%    0.94ns ± 0%   -56.07%  (p=0.019 n=6+8)

LittleEndianPutUint16-6  1.40GB/s ± 0%  1.87GB/s ± 0%   +33.24%  (p=0.000 n=9+9)
LittleEndianPutUint32-6  1.87GB/s ± 0%  4.26GB/s ± 0%  +128.54%  (p=0.000 n=8+8)

Discovered, while looking at ethereum_ethash from community benchmarks

Change-Id: Id86d5443687ecddd2803edf3203dbdd1246f61fe
Reviewed-on: https://go-review.googlesource.com/95475
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 19:38:50 +00:00
Matthew Dempsky
d7cd61ceaa cmd/compile: fix inlining of constant if statements
We accidentally overlooked needing to still visit Ninit for OIF
statements with constant conditions in golang.org/cl/96778.

Fixes #24120.

Change-Id: I5b341913065ff90e1163fb872b9e8d47e2a789d2
Reviewed-on: https://go-review.googlesource.com/97475
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-27 19:27:32 +00:00
Heschi Kreinick
230b0bad1f cmd/link: fix up location lists for dsymutil
LLVM tools, particularly lldb and dsymutil, don't support base address
selection entries in location lists. When targeting GOOS=darwin,
mode, have the linker translate location lists to CU-relative form
instead.

Technically, this isn't necessary when linking internally, as long as
nobody plans to use anything other than Delve to look at the DWARF. But
someone might want to use lldb, and it's really confusing when dwarfdump
shows gibberish for the location entries. The performance cost isn't
noticeable, so enable it even for internal linking.

Doing this in the linker is a little weird, but it was more expensive in
the compiler, probably because the compiler is much more stressful to
the GC. Also, if we decide to only do it for external linking, the
compiler can't see the link mode.

Benchmark before and after this commit on Mac with -dwarflocationlists=1:

name        old time/op       new time/op       delta
StdCmd            21.3s ± 1%        21.3s ± 1%    ~     (p=0.310 n=27+27)

Only StdCmd is relevant, because only StdCmd runs the linker. Whatever
the cost is here, it's not very large.

Change-Id: I200246dedaee4f824966f7551ac95f8d7123d3b1
Reviewed-on: https://go-review.googlesource.com/89535
Reviewed-by: David Chase <drchase@google.com>
2018-02-27 18:55:23 +00:00
Tobias Klauser
5b21bf6f81 runtime: simplify walltime/nanotime on linux/{386,amd64}
Avoid an unnecessary MOVL/MOVQ.

Follow CL 97377

Change-Id: Ic43976d6b0cece3ed455496d18aedd67e0337d3f
Reviewed-on: https://go-review.googlesource.com/97358
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-27 18:42:41 +00:00
Tobias Klauser
2013ad897d os, syscall: use pipe2 instead of pipe syscall on OpenBSD
The pipe2 syscall is part of OpenBSD since version 5.7 and thus exists in
all officially supported versions.

Follows CL 38426 and CL 94035

Change-Id: I8f93ecbc89664241f1b6b0d069e948776941b1d0
Reviewed-on: https://go-review.googlesource.com/97356
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-27 18:37:36 +00:00
Philip Hofer
81786649c5 cmd/compile/internal/ssa: clear branch likeliness in clobberBlock
The branchelim pass makes some blocks unreachable, but does not
remove them from Func.Values. Consequently, ssacheck complains
when it finds a block with a non-zero likeliness value but no
successors.

Fixes #24014

Change-Id: I2dcf1d8f4e769a2f363508dab3b11198ead336b6
Reviewed-on: https://go-review.googlesource.com/96075
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-27 18:14:05 +00:00
Josh Bleecher Snyder
c5d6c42d35 runtime: improve 386/amd64 systemstack
Minor improvements, noticed while investigating other things.

Shorten the prologue.

Make branch direction better for static branch prediction;
the most common case by far is switching stacks (g==curg).

Change-Id: Ib2211d3efecb60446355cda56194221ccb78057d
Reviewed-on: https://go-review.googlesource.com/97377
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-27 18:10:38 +00:00
Joe Tsai
f399af3114 go/doc: replace unexported values with underscore if necessary
When a var or const declaration contains a mixture of exported and unexported
identifiers, replace the unexported identifiers with underscore.
Otherwise, the LHS and the RHS may mismatch or the declaration may mismatch
with an iota from above.

Fixes #22426

Change-Id: Icd5fb81b4ece647232a9f7d05cb140227091e9cb
Reviewed-on: https://go-review.googlesource.com/94877
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-02-27 16:29:17 +00:00
erifan01
ed6c6c9c11 math: optimize sinh and cosh
Improve performance by reducing unnecessary function calls

Benchmarks:

Tme    old time/op  new time/op  delta
Cosh-8   229ns ± 0%   138ns ± 0%  -39.74%  (p=0.008 n=5+5)
Sinh-8   231ns ± 0%   139ns ± 0%  -39.83%  (p=0.008 n=5+5)

Change-Id: Icab5485849bbfaafca8429d06b67c558101f4f3c
Reviewed-on: https://go-review.googlesource.com/85477
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-02-27 04:34:37 +00:00
Josh Bleecher Snyder
486caa26d7 runtime: short-circuit typedmemmove when dst==src
Change-Id: I855268a4c0d07ad602ec90f5da66422d3d87c5f2
Reviewed-on: https://go-review.googlesource.com/94595
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 00:56:18 +00:00
Giovanni Bajo
68def82008 cmd/compile: fix bit-test rules for highest bit
Bit-test rules failed to match when matching the highest bit
of a word because operands in SSA are signed int64. Fix
them by treating them as unsigned (and correctly handling
32-bit operands as well).

Tests will be added in next CL.

Change-Id: I491c4e88e7e2f87e9bb72bd0d9fa5d4025b90736
Reviewed-on: https://go-review.googlesource.com/94765
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 00:51:40 +00:00
Giovanni Bajo
098208a0d9 cmd/compile: fold bit masking on bits that have been shifted away
Spotted while working on #18943, it triggers once during bootstrap.

Change-Id: Ia4330ccc6395627c233a8eb4dcc0e3e2a770bea7
Reviewed-on: https://go-review.googlesource.com/94764
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 00:51:19 +00:00
Chad Rosier
ecd9e8a2fe cmd/compile/internal/ssa: combine zero stores into larger stores on arm64
This reduces the go tool binary on arm64 by 12k.

go1 results on Amberwing:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       249ns ± 0%     249ns ± 0%    ~     (p=0.087 n=10+10)
RegexpMatchEasy0_1K       584ns ± 0%     584ns ± 0%    ~     (all equal)
RegexpMatchEasy1_32       246ns ± 0%     246ns ± 0%    ~     (p=1.000 n=10+10)
RegexpMatchEasy1_1K       806ns ± 0%     806ns ± 0%    ~     (p=0.706 n=10+9)
RegexpMatchMedium_32      314ns ± 0%     314ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%    ~     (p=0.245 n=10+8)
RegexpMatchHard_32       2.75µs ± 1%    2.75µs ± 1%    ~     (p=0.690 n=10+10)
RegexpMatchHard_1K       78.9µs ± 0%    78.9µs ± 1%    ~     (p=0.295 n=9+9)
FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
FmtFprintfString          112ns ± 0%     112ns ± 0%    ~     (all equal)
FmtFprintfInt             117ns ± 0%     116ns ± 0%  -0.85%  (p=0.000 n=10+10)
FmtFprintfIntInt          181ns ± 0%     181ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt     222ns ± 0%     224ns ± 0%  +0.90%  (p=0.000 n=9+10)
FmtFprintfFloat           318ns ± 1%     322ns ± 0%    ~     (p=0.059 n=10+8)
FmtManyArgs               736ns ± 1%     735ns ± 0%    ~     (p=0.206 n=9+9)
Gzip                      437ms ± 0%     436ms ± 0%  -0.25%  (p=0.000 n=10+10)
HTTPClientServer         89.8µs ± 1%    90.2µs ± 2%    ~     (p=0.393 n=10+10)
JSONEncode               20.1ms ± 1%    20.2ms ± 1%    ~     (p=0.065 n=9+10)
JSONDecode               94.2ms ± 1%    93.9ms ± 1%  -0.42%  (p=0.043 n=10+10)
GobDecode                12.7ms ± 1%    12.8ms ± 2%  +0.94%  (p=0.019 n=10+10)
GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.052 n=10+10)
Mandelbrot200            5.06ms ± 0%    5.05ms ± 0%  -0.04%  (p=0.000 n=9+10)
TimeParse                 450ns ± 3%     446ns ± 0%    ~     (p=0.238 n=10+9)
TimeFormat                485ns ± 1%     483ns ± 1%    ~     (p=0.073 n=10+10)
Template                 90.4ms ± 0%    90.7ms ± 0%  +0.29%  (p=0.000 n=8+10)
GoParse                  6.01ms ± 0%    6.03ms ± 0%  +0.35%  (p=0.000 n=10+10)
BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.315 n=10+10)
Fannkuch11                3.40s ± 0%     3.37s ± 0%  -0.92%  (p=0.000 n=10+10)
[Geo mean]               67.9µs         67.9µs       +0.02%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%  -0.08%  (p=0.003 n=8+10)
RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%    ~     (p=0.642 n=8+10)
RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%    ~     (p=0.690 n=10+9)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%    ~     (p=0.661 n=10+9)
RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%    ~     (p=0.190 n=10+9)
RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.669 n=10+10)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.718 n=9+9)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.24%  (p=0.000 n=10+10)
JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%    ~     (p=0.065 n=9+10)
JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%  +0.42%  (p=0.041 n=10+10)
GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%  -0.92%  (p=0.016 n=10+10)
GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%    ~     (p=0.055 n=10+10)
Template               21.5MB/s ± 0%  21.4MB/s ± 0%  -0.30%  (p=0.000 n=9+10)
GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%  -0.36%  (p=0.000 n=10+10)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.323 n=10+10)
[Geo mean]             56.0MB/s       55.9MB/s       -0.07%

Change-Id: Ia732fa57fbcf4767d72382516d9f16705d177736
Reviewed-on: https://go-review.googlesource.com/96435
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-27 00:07:25 +00:00
Josh Bleecher Snyder
3a9e4440fd cmd/compile: tighten after lowering
Moving tighten after lowering benefits from the removal of values by
lowering and lowered CSE. It lets us make better decisions about
which values are rematerializable and which generate flags.
Empirically, it lowers stack usage (by avoiding spills)
and generates slightly smaller and faster binaries.


Fixes #19853
Fixes #21041

name        old time/op       new time/op       delta
Template          195ms ± 4%        193ms ± 4%  -1.33%  (p=0.000 n=92+97)
Unicode          94.1ms ± 9%       92.5ms ± 8%  -1.66%  (p=0.002 n=97+95)
GoTypes           572ms ± 5%        566ms ± 7%  -0.92%  (p=0.001 n=95+98)
Compiler          2.56s ± 4%        2.52s ± 3%  -1.41%  (p=0.000 n=94+97)
SSA               6.52s ± 2%        6.47s ± 3%  -0.82%  (p=0.000 n=96+94)
Flate             117ms ± 5%        116ms ± 7%  -0.72%  (p=0.018 n=97+97)
GoParser          148ms ± 6%        146ms ± 4%  -0.97%  (p=0.002 n=98+95)
Reflect           370ms ± 7%        363ms ± 6%  -1.79%  (p=0.000 n=99+98)
Tar               175ms ± 6%        173ms ± 6%  -1.11%  (p=0.001 n=94+95)
XML               204ms ± 6%        201ms ± 5%  -1.49%  (p=0.000 n=97+96)
[Geo mean]        363ms             359ms       -1.22%

name        old user-time/op  new user-time/op  delta
Template          251ms ± 5%        245ms ± 5%  -2.40%  (p=0.000 n=97+93)
Unicode           131ms ±10%        128ms ± 9%  -1.93%  (p=0.001 n=100+99)
GoTypes           760ms ± 4%        752ms ± 4%  -0.96%  (p=0.000 n=97+95)
Compiler          3.51s ± 3%        3.48s ± 2%  -1.04%  (p=0.000 n=96+95)
SSA               9.57s ± 4%        9.52s ± 2%  -0.50%  (p=0.004 n=97+96)
Flate             149ms ± 6%        147ms ± 6%  -1.46%  (p=0.000 n=98+96)
GoParser          184ms ± 5%        181ms ± 7%  -1.84%  (p=0.000 n=98+97)
Reflect           469ms ± 6%        461ms ± 6%  -1.69%  (p=0.000 n=100+98)
Tar               219ms ± 8%        217ms ± 7%  -0.90%  (p=0.035 n=96+96)
XML               255ms ± 5%        251ms ± 6%  -1.48%  (p=0.000 n=98+98)
[Geo mean]        476ms             469ms       -1.42%

name        old alloc/op      new alloc/op      delta
Template         37.8MB ± 0%       37.8MB ± 0%  -0.17%  (p=0.000 n=100+100)
Unicode          28.8MB ± 0%       28.8MB ± 0%  -0.02%  (p=0.000 n=100+95)
GoTypes           112MB ± 0%        112MB ± 0%  -0.20%  (p=0.000 n=100+97)
Compiler          466MB ± 0%        464MB ± 0%  -0.27%  (p=0.000 n=100+100)
SSA              1.49GB ± 0%       1.49GB ± 0%  -0.08%  (p=0.000 n=100+99)
Flate            24.4MB ± 0%       24.3MB ± 0%  -0.25%  (p=0.000 n=98+99)
GoParser         30.7MB ± 0%       30.6MB ± 0%  -0.26%  (p=0.000 n=99+100)
Reflect          76.4MB ± 0%       76.4MB ± 0%    ~     (p=0.253 n=100+100)
Tar              38.9MB ± 0%       38.8MB ± 0%  -0.20%  (p=0.000 n=100+97)
XML              41.5MB ± 0%       41.4MB ± 0%  -0.19%  (p=0.000 n=100+98)
[Geo mean]       77.5MB            77.4MB       -0.16%

name        old allocs/op     new allocs/op     delta
Template           381k ± 0%         381k ± 0%  -0.15%  (p=0.000 n=100+100)
Unicode            342k ± 0%         342k ± 0%  -0.01%  (p=0.000 n=100+98)
GoTypes           1.19M ± 0%        1.18M ± 0%  -0.24%  (p=0.000 n=100+100)
Compiler          4.52M ± 0%        4.50M ± 0%  -0.29%  (p=0.000 n=100+100)
SSA               12.3M ± 0%        12.3M ± 0%  -0.11%  (p=0.000 n=100+100)
Flate              234k ± 0%         234k ± 0%  -0.26%  (p=0.000 n=99+96)
GoParser           318k ± 0%         317k ± 0%  -0.21%  (p=0.000 n=99+100)
Reflect            974k ± 0%         974k ± 0%  -0.03%  (p=0.000 n=100+100)
Tar                392k ± 0%         391k ± 0%  -0.17%  (p=0.000 n=100+99)
XML                404k ± 0%         403k ± 0%  -0.24%  (p=0.000 n=99+99)
[Geo mean]         794k              792k       -0.17%

name        old object-bytes  new object-bytes  delta
Template          393kB ± 0%        392kB ± 0%  -0.19%  (p=0.008 n=5+5)
Unicode           207kB ± 0%        207kB ± 0%    ~     (all equal)
GoTypes          1.23MB ± 0%       1.22MB ± 0%  -0.11%  (p=0.008 n=5+5)
Compiler         4.34MB ± 0%       4.33MB ± 0%  -0.15%  (p=0.008 n=5+5)
SSA              9.85MB ± 0%       9.85MB ± 0%  -0.07%  (p=0.008 n=5+5)
Flate             235kB ± 0%        234kB ± 0%  -0.59%  (p=0.008 n=5+5)
GoParser          297kB ± 0%        296kB ± 0%  -0.22%  (p=0.008 n=5+5)
Reflect          1.03MB ± 0%       1.03MB ± 0%  -0.00%  (p=0.008 n=5+5)
Tar               332kB ± 0%        331kB ± 0%  -0.15%  (p=0.008 n=5+5)
XML               413kB ± 0%        412kB ± 0%  -0.19%  (p=0.008 n=5+5)
[Geo mean]        728kB             727kB       -0.17%

Change-Id: I9b5cdb668ed102a001897a05e833105acba220a2
Reviewed-on: https://go-review.googlesource.com/95995
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 00:03:24 +00:00
Keith Randall
4b00d3f4a2 cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7

It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.

Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.

The big win is that the write barrier test goes from:

MOVL	runtime.writeBarrier(SB), CX
TESTL	CX, CX
JNE	60

to

CMPL	runtime.writeBarrier(SB), $0
JNE	59

It's one instruction and one byte smaller.

Fixes #19485
Fixes #15245
Update #22460

Binaries are about 0.15% smaller.

Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-26 23:49:44 +00:00
Kunpei Sakai
30673769ed cmd/compile: fix typechecking in finishcompare
Previously, finishcompare just used SetTypecheck, but this didn't
recursively update any untyped bool typed subexpressions. This CL
changes it to call typecheck, which correctly handles this.

Also cleaned up outdated code for simplifying logic.

Updates #23834

Change-Id: Ic7f92d2a77c2eb74024ee97815205371761c1c90
Reviewed-on: https://go-review.googlesource.com/97035
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 22:10:51 +00:00
Kunpei Sakai
0c471dfae2 cmd: avoid unnecessary type conversions
CL generated mechanically with github.com/mdempsky/unconvert.

Also updated cmd/compile/internal/ssa/gen/*.rules manually.

Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26 20:22:06 +00:00
Ilya Tocar
f4d9c30901 cmd/compile/internal/amd64: use appropriate NEG for div
Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW,
we will reduce code size, because NEGL doesn't require rex prefix.
This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736,
and remove zero-extensions of DIVL results.
Also adds test for redundant zero extend elimination.

Fixes #23310

Change-Id: Ic58c3104c255a71371a06e09d10a975bbe5df587
Reviewed-on: https://go-review.googlesource.com/96815
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 20:09:21 +00:00
Yury Smolsky
3b7ad1680f archive/zip: improve Writer.Create documentation on how to add directories
FileHeader.Name also reflects this fact.

Fixes #24018

Change-Id: Id0860a9b23c264ac4c6ddd65ba20e0f1f36e4865
Reviewed-on: https://go-review.googlesource.com/97057
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-26 19:58:48 +00:00
motemen
fd5bf0393e cmd/go: fix formatting of file paths under cwd
The output of go with -x flag is formatted in a manner that file paths
under current directory are modified to start with a dot (.), but when
the directory path ends with a slash (/), the formatting goes wrong.

Fixes #23982

Change-Id: I8f8d15dd52bee882a9c6357eb9eabdc3eaa887c3
GitHub-Last-Rev: 1493f38baf
GitHub-Pull-Request: golang/go#23985
Reviewed-on: https://go-review.googlesource.com/95755
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-26 19:42:47 +00:00
Hana Kim
a5c987fcbb cmd/trace: trace error check and more logging in annotations test
This is for debugging the reported flaky tests.

Update #24081

Change-Id: Ica046928f675d69e38251a47a6f225efedce920c
Reviewed-on: https://go-review.googlesource.com/96855
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-02-26 19:18:01 +00:00
Robert Griesemer
99843e22e8 go/types: type-check embedded methods in correct scope (regression)
Change https://go-review.googlesource.com/79575 fixed the computation
of recursive method sets by separating the method set computation from
type computation. However, it didn't track an embedded method's scope
and as a result, some methods' signatures were typed in the wrong
context.

This change tracks embedded methods together with their scope and
uses that scope for the correct context setup when typing those
method signatures.

Fixes #23914.

Change-Id: If3677dceddb43e9db2f9fb3c7a4a87d2531fbc2a
Reviewed-on: https://go-review.googlesource.com/96376
Reviewed-by: Alan Donovan <adonovan@google.com>
2018-02-26 18:46:26 +00:00
Ian Lance Taylor
85caeafb8c unsafe: fix reference to string header
Fixes #24115

Change-Id: I89d3d5a9c0916fd2e21fe5930549c4129de8ab48
Reviewed-on: https://go-review.googlesource.com/96983
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-26 18:35:46 +00:00
Rens Rikkerink
cbfda7f892 cmd/cgo: clarify implicit "cgo" build constraint
When using the special import "C", the "cgo" build constraint is implied for the go file,
potentially triggering unclear "undefined" error messages.
Explicitly explain this in the documentation.

Updates #24068

Change-Id: Ib656ceccd52c749ffe7fb2d3db9ac144f17abb32
GitHub-Last-Rev: 5a13f00a9b
GitHub-Pull-Request: golang/go#24072
Reviewed-on: https://go-review.googlesource.com/96655
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-26 18:32:38 +00:00
Robert Griesemer
515fa58ac9 cmd/compile: track line directives w/ column information
Extend cmd/internal/src.PosBase to track column information,
and adjust the meaning of the PosBase position to mean the
position at which the PosBase's relative (line, col) position
starts (rather than indicating the position of the //line
directive). Because this semantic change is made in the
compiler's noder, it doesn't affect the logic of src.PosBase,
only its test setup (where PosBases are constructed with
corrected incomming positions). In short, src.PosBase now
matches syntax.PosBase with respect to the semantics of
src.PosBase.pos.

For #22662.

Change-Id: I5b1451cb88fff3f149920c2eec08b6167955ce27
Reviewed-on: https://go-review.googlesource.com/96535
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 18:32:03 +00:00
Robert Griesemer
6fa6bde924 cmd/compile/internal/syntax: implement //line :line:col handling
For line directives which have a line and a column number,
an omitted filename means that the filename has not changed
(per the issue below).

For line directives w/o a column number, an omitted filename
means the empty filename (to preserve the existing behavior).

For #22662.

Change-Id: I32cd9037550485da5445a34bb104706eccce1df1
Reviewed-on: https://go-review.googlesource.com/96476
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 18:27:44 +00:00
Robert Griesemer
5c08b9e8bd cmd/compile/internal/syntax: remove dependency on cmd/internal/src
For dependency reasons, the data structure implementing source
positions in the compiler is in cmd/internal/src. It contains
highly compiler specific details (e.g. inlining index).

This change introduces a parallel but simpler position
representation, defined in the syntax package, which removes
that package's dependency on cmd/internal/src, and also removes
the need to deal with certain filename-specific operations
(defined by the needs of the compiler) in the syntax package.
As a result, the syntax package becomes again a compiler-
independent, stand-alone package that at some point might
replace (or augment) the existing top-level go/* syntax-related
packages.

Additionally, line directives that update column numbers
are now correctly tracked through the syntax package, with
additional tests added. (The respective changes also need to
be made in cmd/internal/src; i.e., the compiler accepts but
still ignores column numbers in line directives.)

This change comes at the cost of a new position translation
step, but that step is cheap because it only needs to do real
work if the position base changed (i.e., if there is a new file,
or new line directive).

There is no noticeable impact on overall compiler performance
measured with `compilebench -count 5 -alloc`:

name       old time/op       new time/op       delta
Template         220ms ± 8%        228ms ±18%    ~     (p=0.548 n=5+5)
Unicode          119ms ±11%        113ms ± 5%    ~     (p=0.056 n=5+5)
GoTypes          684ms ± 6%        677ms ± 3%    ~     (p=0.841 n=5+5)
Compiler         3.19s ± 7%        3.01s ± 1%    ~     (p=0.095 n=5+5)
SSA              7.92s ± 8%        7.79s ± 1%    ~     (p=0.690 n=5+5)
Flate            141ms ± 7%        139ms ± 4%    ~     (p=0.548 n=5+5)
GoParser         173ms ±12%        171ms ± 4%    ~     (p=1.000 n=5+5)
Reflect          417ms ± 5%        411ms ± 3%    ~     (p=0.548 n=5+5)
Tar              205ms ± 5%        198ms ± 2%    ~     (p=0.690 n=5+5)
XML              232ms ± 4%        229ms ± 4%    ~     (p=0.690 n=5+5)
StdCmd           28.7s ± 5%        28.2s ± 2%    ~     (p=0.421 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         269ms ± 4%        265ms ± 5%    ~     (p=0.421 n=5+5)
Unicode          153ms ± 7%        149ms ± 3%    ~     (p=0.841 n=5+5)
GoTypes          850ms ± 7%        862ms ± 4%    ~     (p=0.841 n=5+5)
Compiler         4.01s ± 5%        3.86s ± 0%    ~     (p=0.190 n=5+4)
SSA              10.9s ± 4%        10.8s ± 2%    ~     (p=0.548 n=5+5)
Flate            166ms ± 7%        167ms ± 6%    ~     (p=1.000 n=5+5)
GoParser         204ms ± 8%        206ms ± 7%    ~     (p=0.841 n=5+5)
Reflect          514ms ± 5%        508ms ± 4%    ~     (p=0.548 n=5+5)
Tar              245ms ± 6%        244ms ± 3%    ~     (p=0.690 n=5+5)
XML              280ms ± 4%        278ms ± 4%    ~     (p=0.841 n=5+5)

name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%    ~     (p=0.841 n=5+5)
Unicode         28.8MB ± 0%       28.8MB ± 0%    ~     (p=0.841 n=5+5)
GoTypes          113MB ± 0%        113MB ± 0%    ~     (p=0.151 n=5+5)
Compiler         468MB ± 0%        468MB ± 0%  -0.01%  (p=0.032 n=5+5)
SSA             1.50GB ± 0%       1.50GB ± 0%    ~     (p=0.548 n=5+5)
Flate           24.4MB ± 0%       24.4MB ± 0%    ~     (p=1.000 n=5+5)
GoParser        30.7MB ± 0%       30.7MB ± 0%    ~     (p=1.000 n=5+5)
Reflect         76.5MB ± 0%       76.5MB ± 0%    ~     (p=0.548 n=5+5)
Tar             38.9MB ± 0%       38.9MB ± 0%    ~     (p=0.222 n=5+5)
XML             41.6MB ± 0%       41.6MB ± 0%    ~     (p=0.548 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          382k ± 0%         382k ± 0%  +0.01%  (p=0.008 n=5+5)
Unicode           343k ± 0%         343k ± 0%    ~     (p=0.841 n=5+5)
GoTypes          1.19M ± 0%        1.19M ± 0%  +0.01%  (p=0.008 n=5+5)
Compiler         4.53M ± 0%        4.53M ± 0%  +0.03%  (p=0.008 n=5+5)
SSA              12.4M ± 0%        12.4M ± 0%  +0.00%  (p=0.008 n=5+5)
Flate             235k ± 0%         235k ± 0%    ~     (p=0.079 n=5+5)
GoParser          318k ± 0%         318k ± 0%    ~     (p=0.730 n=5+5)
Reflect           978k ± 0%         978k ± 0%    ~     (p=1.000 n=5+5)
Tar               393k ± 0%         393k ± 0%    ~     (p=0.056 n=5+5)
XML               405k ± 0%         405k ± 0%    ~     (p=0.548 n=5+5)

name       old text-bytes    new text-bytes    delta
HelloSize        672kB ± 0%        672kB ± 0%    ~     (all equal)
CmdGoSize       7.12MB ± 0%       7.12MB ± 0%    ~     (all equal)

name       old data-bytes    new data-bytes    delta
HelloSize        133kB ± 0%        133kB ± 0%    ~     (all equal)
CmdGoSize        390kB ± 0%        390kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.07MB ± 0%       1.07MB ± 0%    ~     (all equal)
CmdGoSize       11.2MB ± 0%       11.2MB ± 0%    ~     (all equal)

Passes toolstash compare.

For #22662.

Change-Id: I19edb53dd9675af57f7122cb7dba2a6d8bdcc3da
Reviewed-on: https://go-review.googlesource.com/94515
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 18:27:15 +00:00
Brad Fitzpatrick
b1accced20 strings: add Builder benchmarks comparing bytes.Buffer and strings.Builder
Despite the existing test that locks in the allocation behavior, people
really want a benchmark. So:

BenchmarkBuildString_Builder/1Write_NoGrow-4    20000000  60.4 ns/op   48 B/op  1 allocs/op
BenchmarkBuildString_Builder/3Write_NoGrow-4    10000000   230 ns/op  336 B/op  3 allocs/op
BenchmarkBuildString_Builder/3Write_Grow-4      20000000   102 ns/op  112 B/op  1 allocs/op
BenchmarkBuildString_ByteBuffer/1Write_NoGrow-4 10000000   125 ns/op  160 B/op  2 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_NoGrow-4  5000000   339 ns/op  400 B/op  3 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_Grow-4    5000000   316 ns/op  336 B/op  3 allocs/op

I don't think these allocate-as-fast-as-you-can benchmarks are very
interesting because they're effectively just GC benchmarks, but sure.
If one wants to see that there's 1 fewer allocation, there it is. The
ns/op and B/op numbers will change as the built string size changes.

Updates #18990

Change-Id: Ifccf535bd396217434a0e6989e195105f90132ae
Reviewed-on: https://go-review.googlesource.com/96980
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
2018-02-26 18:00:12 +00:00
Tobias Klauser
495eb3f922 syscall: remove/update outdated TODO comments
Error returns for linux/arm syscalls are handled since a long time.

Remove another list of unimplemented syscalls, following CL 96315.

The root-only check in TestSyscallNoError was shown to be sufficient as
part of CL 84485 already.

NetBSD and OpenBSD do not implement the sendfile syscall (yet), so add a
link to golang.org/issue/5847

Change-Id: I07efc3c3203537a4142707385f31b59dc0ecca42
Reviewed-on: https://go-review.googlesource.com/97115
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-26 17:54:31 +00:00
Tobias Klauser
ad9814de61 os: unify supportsCloseOnExec definition
On Darwin and FreeBSD, supportsCloseOnExec is defined in its own file,
even though it is set to true as on other Unices. Drop the separate
definitions but keep the accompanying comments.

Change-Id: Iab1d20e1b2590800f141d54b55a099c9cd7ae57e
Reviewed-on: https://go-review.googlesource.com/97155
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-26 17:10:24 +00:00
Alex Brainman
9cae3aaf47 os: do not forget to set ModeDevice when using ModeCharDevice
Fixes #23123

Change-Id: Ia4ac947cc49ef3d150ef60a095b86552dcef397d
Reviewed-on: https://go-review.googlesource.com/84435
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
2018-02-26 17:02:29 +00:00
Tobias Klauser
144bf04a2b net, internal/poll, net/internal/socktest: use SOCK_{CLOEXEC,NONBLOCK} accept4/socket flags on OpenBSD
The SOCK_CLOEXEC and SOCK_NONBLOCK flags to the socket syscall and the
accept4 syscall are supported since OpenBSD 5.7.

Follows CL 40895 and CL 94295

Change-Id: Icaf35ace2ef5e73279a70d4f1a9fbf3be9371e6c
Reviewed-on: https://go-review.googlesource.com/97196
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-26 16:59:38 +00:00
Kevin Burke
db7af2e67b os/user: clean up grammar in comments
Change-Id: If9fe04894851d60a682346415c2e5523b2f04929
Reviewed-on: https://go-review.googlesource.com/96981
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2018-02-26 16:57:47 +00:00
Kunpei Sakai
7e9a8546e4 time: avoid unnecessary type conversions
Change-Id: Ic318c25b21298ec123eb27c814c79f637887713c
Reviewed-on: https://go-review.googlesource.com/97135
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-26 16:14:51 +00:00
Giovanni Bajo
dd3b4714be build: small cleanup in error message in make.bat
Contrary to bash, double quotes cannot be used to group
arguments in Windows shell, so they were being printed as
literals by the echo command.

Since a literal '>' is present in the string, it is sufficient
to escape it correctly through '^'.

Change-Id: Icc8c92b3dc8d813825adadbe3d921a38d44a1a94
Reviewed-on: https://go-review.googlesource.com/97056
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2018-02-26 10:27:14 +00:00
unknown
e9c57bea11 net/http,doc: use HTTP status code constants where applicable
There are a few places where the integer value is used.
Use the equivalent constants to aid with readability.

Change-Id: I023b1dbe605340544c056d0e0d9d6d5a7d7d0edc
GitHub-Last-Rev: c1c90bcd25
GitHub-Pull-Request: golang/go#24123
Reviewed-on: https://go-review.googlesource.com/96984
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-02-26 05:04:31 +00:00
Agniva De Sarker
39852bf4cc archive/tar: remove loop label from reader
CL 14624 introduced this label. At that time,
the switch-case had a break to label statement which made this necessary.
But now, the code no longer has a break statement and it directly returns.

Hence, it is no longer necessary to have a label.

Change-Id: Idde0fcc4d2db2d76424679f5acfe33ab8573bce4
Reviewed-on: https://go-review.googlesource.com/96935
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-25 21:50:17 +00:00
Lubomir I. Ivanov (VMware)
7a218942be os/user: obtain a user home path on Windows
newUserFromSid() is extended so that the retriaval of the user home
path based on a user SID becomes possible.

(1) The primary method it uses is to lookup the Windows registry for
the following key:
  HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\[SID]

If the key does not exist the user might not have logged in yet.
If (1) fails it falls back to (2)

(2) The second method the function uses is to look at the default home
path for users (e.g. WINAPI's GetProfilesDirectory()) and append
the username to that. The procedure is in the lines of:
  c:\Users + \ + <username>

The function newUser() now requires the following arguments:
  uid, gid, dir, username, domain
This is done to avoid multiple calls to usid.String() and
usid.LookupAccount("") in the case of a newUserFromSid()
call stack.

The functions current() and newUserFromSid() both call newUser()
supplying the arguments in question. The helpers
lookupUsernameAndDomain() and findHomeDirInRegistry() are
added.

This commit also updates:
- go/build/deps_test.go, so that the test now includes the
"internal/syscall/windows/registry" import.
- os/user/user_test.go, so that User.HomeDir is tested on Windows.

GitHub-Last-Rev: 25423e2a38
GitHub-Pull-Request: golang/go#23822
Change-Id: I6c3ad1c4ce3e7bc0d1add024951711f615b84ee5
Reviewed-on: https://go-review.googlesource.com/93935
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-24 23:13:03 +00:00
Daniel Martí
c879153831 cmd/compile/internal/syntax: use stringer for operators and tokens
With its new -linecomment flag, it is now possible to use stringer on
values whose strings aren't valid identifiers. This is the case with
tokens and operators in Go.

Operator alredy had inline comments with each operator's string
representation; only minor modifications were needed. The inline
comments were added to each of the token names, using the same strategy.

Comments that were previously inline or part of the string arrays were
moved to the line immediately before the name they correspond to.

Finally, declare tokStrFast as a function that uses the generated arrays
directly. Avoiding the branch and strconv call means that we avoid a
performance regression in the scanner, perhaps due to the lack of
mid-stack inlining.

Performance is not affected. Measured with 'go test -run StdLib -fast'
on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5
runs before and after the changes are:

	parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s)
	allocated 449.282Mb (263.137Mb/s)

	parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s)
	allocated 449.290Mb (263.256Mb/s)

Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b
Reviewed-on: https://go-review.googlesource.com/95357
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-02-24 00:20:46 +00:00
Ilya Tocar
c3935c08d2 math/big: speed-up addMulVVW on amd64
Use MULX/ADOX/ADCX instructions to speed-up addMulVVW,
when they are available. addMulVVW is a hotspot in rsa.
This is faster than ADD/ADC/IMUL version, because ADOX/ADCX only
modify carry/overflow flag, so they can be interleaved with each other
and with MULX, which doesn't modify flags at all.
Increasing unroll factor to e. g. 16 makes rsa 1% faster, but 3PrimeRSA2048Decrypt
performance falls back to baseline.

Updates #20058

AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)

crypto/rsa speed-up is smaller, but stil noticeable:

RSA2048Decrypt-8        1.61ms ± 1%  1.38ms ± 1%  -14.13%  (p=0.000 n=10+10)
RSA2048Sign-8           1.93ms ± 1%  1.70ms ± 1%  -11.86%  (p=0.000 n=10+10)
3PrimeRSA2048Decrypt-8   932µs ± 0%   828µs ± 0%  -11.15%  (p=0.000 n=10+10)

Results on crypto/tls:

HandshakeServer/RSA-8                        901µs ± 1%    777µs ± 0%  -13.70%  (p=0.000 n=10+8)
HandshakeServer/ECDHE-P256-RSA-8            1.01ms ± 1%   0.90ms ± 0%  -11.53%  (p=0.000 n=10+9)

Full math/big benchmarks:

name                              old time/op    new time/op     delta
AddVV/1-8                           3.74ns ± 6%     3.55ns ± 2%     ~     (p=0.082 n=10+8)
AddVV/2-8                           3.96ns ± 2%     3.98ns ± 5%     ~     (p=0.794 n=10+9)
AddVV/3-8                           4.97ns ± 2%     4.94ns ± 1%     ~     (p=0.081 n=10+9)
AddVV/4-8                           5.59ns ± 2%     5.59ns ± 2%     ~     (p=0.809 n=10+10)
AddVV/5-8                           6.63ns ± 1%     6.62ns ± 1%     ~     (p=0.560 n=9+10)
AddVV/10-8                          8.11ns ± 1%     8.11ns ± 2%     ~     (p=0.402 n=10+10)
AddVV/100-8                         46.9ns ± 2%     46.8ns ± 1%     ~     (p=0.809 n=10+10)
AddVV/1000-8                         389ns ± 1%      391ns ± 4%     ~     (p=0.809 n=10+10)
AddVV/10000-8                       5.05µs ± 5%     4.98µs ± 2%     ~     (p=0.113 n=9+10)
AddVV/100000-8                      55.3µs ± 3%     55.2µs ± 3%     ~     (p=0.796 n=10+10)
AddVW/1-8                           3.04ns ± 3%     3.02ns ± 3%     ~     (p=0.538 n=10+10)
AddVW/2-8                           3.57ns ± 2%     3.61ns ± 2%   +1.12%  (p=0.032 n=9+9)
AddVW/3-8                           3.77ns ± 1%     3.79ns ± 2%     ~     (p=0.719 n=10+10)
AddVW/4-8                           4.69ns ± 1%     4.69ns ± 2%     ~     (p=0.920 n=10+9)
AddVW/5-8                           4.58ns ± 1%     4.58ns ± 1%     ~     (p=0.812 n=10+10)
AddVW/10-8                          7.62ns ± 2%     7.63ns ± 1%     ~     (p=0.926 n=10+10)
AddVW/100-8                         41.1ns ± 2%     42.4ns ± 3%   +3.34%  (p=0.000 n=10+10)
AddVW/1000-8                         386ns ± 2%      389ns ± 4%     ~     (p=0.514 n=10+10)
AddVW/10000-8                       3.88µs ± 3%     3.87µs ± 3%     ~     (p=0.448 n=10+10)
AddVW/100000-8                      41.2µs ± 3%     41.7µs ± 3%     ~     (p=0.148 n=10+10)
AddMulVVW/1-8                       3.28ns ± 2%     3.26ns ± 3%     ~     (p=0.107 n=10+10)
AddMulVVW/2-8                       4.26ns ± 2%     4.24ns ± 3%     ~     (p=0.327 n=9+9)
AddMulVVW/3-8                       5.07ns ± 2%     5.26ns ± 2%   +3.73%  (p=0.000 n=10+10)
AddMulVVW/4-8                       6.40ns ± 2%     6.50ns ± 2%   +1.61%  (p=0.000 n=10+10)
AddMulVVW/5-8                       6.77ns ± 2%     6.86ns ± 1%   +1.38%  (p=0.001 n=9+9)
AddMulVVW/10-8                      12.2ns ± 2%     10.6ns ± 3%  -13.65%  (p=0.000 n=10+10)
AddMulVVW/100-8                     79.7ns ± 2%     52.4ns ± 1%  -34.17%  (p=0.000 n=10+10)
AddMulVVW/1000-8                     695ns ± 1%      491ns ± 2%  -29.39%  (p=0.000 n=9+10)
AddMulVVW/10000-8                   7.26µs ± 2%     5.92µs ± 6%  -18.42%  (p=0.000 n=10+10)
AddMulVVW/100000-8                  72.6µs ± 2%     62.2µs ± 2%  -14.31%  (p=0.000 n=10+10)
DecimalConversion-8                  108µs ±19%      104µs ± 4%     ~     (p=0.460 n=10+8)
FloatString/100-8                    926ns ±14%      908ns ± 5%     ~     (p=0.398 n=9+9)
FloatString/1000-8                  25.7µs ± 1%     25.7µs ± 1%     ~     (p=0.739 n=10+10)
FloatString/10000-8                 2.13ms ± 1%     2.12ms ± 1%     ~     (p=0.353 n=10+10)
FloatString/100000-8                 207ms ± 1%      206ms ± 2%     ~     (p=0.912 n=10+10)
FloatAdd/10-8                       61.3ns ± 3%     61.9ns ± 3%     ~     (p=0.183 n=10+10)
FloatAdd/100-8                      62.0ns ± 2%     62.9ns ± 4%     ~     (p=0.118 n=10+10)
FloatAdd/1000-8                     84.7ns ± 2%     84.4ns ± 1%     ~     (p=0.591 n=10+10)
FloatAdd/10000-8                     305ns ± 2%      306ns ± 1%     ~     (p=0.443 n=10+10)
FloatAdd/100000-8                   2.45µs ± 1%     2.46µs ± 1%     ~     (p=0.782 n=10+10)
FloatSub/10-8                       56.8ns ± 4%     56.5ns ± 5%     ~     (p=0.423 n=10+10)
FloatSub/100-8                      57.3ns ± 4%     57.1ns ± 5%     ~     (p=0.540 n=10+10)
FloatSub/1000-8                     66.8ns ± 4%     66.6ns ± 1%     ~     (p=0.868 n=10+10)
FloatSub/10000-8                     199ns ± 1%      198ns ± 1%     ~     (p=0.287 n=10+9)
FloatSub/100000-8                   1.47µs ± 2%     1.47µs ± 2%     ~     (p=0.920 n=10+9)
ParseFloatSmallExp-8                8.74µs ±10%     9.48µs ±10%   +8.51%  (p=0.010 n=9+10)
ParseFloatLargeExp-8                39.2µs ±25%     39.6µs ±12%     ~     (p=0.529 n=10+10)
GCD10x10/WithoutXY-8                 173ns ±23%      177ns ±20%     ~     (p=0.698 n=10+10)
GCD10x10/WithXY-8                    736ns ±12%      728ns ±16%     ~     (p=0.838 n=10+10)
GCD10x100/WithoutXY-8                325ns ±16%      326ns ±14%     ~     (p=0.912 n=10+10)
GCD10x100/WithXY-8                  1.14µs ±13%     1.16µs ± 6%     ~     (p=0.287 n=10+9)
GCD10x1000/WithoutXY-8               851ns ±25%      820ns ±12%     ~     (p=0.592 n=10+10)
GCD10x1000/WithXY-8                 2.89µs ±17%     2.85µs ± 5%     ~     (p=1.000 n=10+9)
GCD10x10000/WithoutXY-8             6.66µs ±12%     6.82µs ±19%     ~     (p=0.529 n=10+10)
GCD10x10000/WithXY-8                18.0µs ± 5%     17.2µs ±19%     ~     (p=0.315 n=7+10)
GCD10x100000/WithoutXY-8            77.8µs ±18%     73.3µs ±11%     ~     (p=0.315 n=10+9)
GCD10x100000/WithXY-8                186µs ±14%      204µs ±29%     ~     (p=0.218 n=10+10)
GCD100x100/WithoutXY-8              1.09µs ± 1%     1.09µs ± 2%     ~     (p=0.117 n=9+10)
GCD100x100/WithXY-8                 7.93µs ± 1%     7.97µs ± 1%   +0.52%  (p=0.006 n=10+10)
GCD100x1000/WithoutXY-8             2.00µs ± 3%     2.04µs ± 6%     ~     (p=0.053 n=9+10)
GCD100x1000/WithXY-8                9.23µs ± 1%     9.29µs ± 1%   +0.63%  (p=0.009 n=10+10)
GCD100x10000/WithoutXY-8            10.2µs ±11%      9.7µs ± 6%     ~     (p=0.278 n=10+9)
GCD100x10000/WithXY-8               33.3µs ± 4%     33.6µs ± 4%     ~     (p=0.481 n=10+10)
GCD100x100000/WithoutXY-8            106µs ±17%      105µs ±13%     ~     (p=0.853 n=10+10)
GCD100x100000/WithXY-8               289µs ±17%      276µs ± 8%     ~     (p=0.353 n=10+10)
GCD1000x1000/WithoutXY-8            12.2µs ± 1%     12.1µs ± 1%   -0.45%  (p=0.007 n=10+10)
GCD1000x1000/WithXY-8                131µs ± 1%      132µs ± 0%   +0.93%  (p=0.000 n=9+7)
GCD1000x10000/WithoutXY-8           20.6µs ± 2%     20.6µs ± 1%     ~     (p=0.326 n=10+9)
GCD1000x10000/WithXY-8               238µs ± 1%      237µs ± 1%     ~     (p=0.356 n=9+10)
GCD1000x100000/WithoutXY-8           117µs ± 8%      114µs ±11%     ~     (p=0.190 n=10+10)
GCD1000x100000/WithXY-8             1.51ms ± 1%     1.50ms ± 1%     ~     (p=0.053 n=9+10)
GCD10000x10000/WithoutXY-8           220µs ± 1%      218µs ± 1%   -0.86%  (p=0.000 n=10+10)
GCD10000x10000/WithXY-8             3.04ms ± 0%     3.05ms ± 0%   +0.33%  (p=0.001 n=9+10)
GCD10000x100000/WithoutXY-8          513µs ± 0%      511µs ± 0%   -0.38%  (p=0.000 n=10+10)
GCD10000x100000/WithXY-8            15.1ms ± 0%     15.0ms ± 0%     ~     (p=0.053 n=10+9)
GCD100000x100000/WithoutXY-8        10.4ms ± 1%     10.4ms ± 2%     ~     (p=0.258 n=9+9)
GCD100000x100000/WithXY-8            205ms ± 1%      205ms ± 1%     ~     (p=0.481 n=10+10)
Hilbert-8                           1.25ms ±15%     1.24ms ±17%     ~     (p=0.853 n=10+10)
Binomial-8                          3.03µs ±24%     2.90µs ±16%     ~     (p=0.481 n=10+10)
QuoRem-8                            1.95µs ± 1%     1.95µs ± 2%     ~     (p=0.117 n=9+10)
Exp-8                               5.12ms ± 2%     3.99ms ± 1%  -22.02%  (p=0.000 n=10+9)
Exp2-8                              5.14ms ± 2%     3.98ms ± 0%  -22.55%  (p=0.000 n=10+9)
Bitset-8                            16.4ns ± 2%     16.5ns ± 2%     ~     (p=0.311 n=9+10)
BitsetNeg-8                         46.3ns ± 4%     45.8ns ± 4%     ~     (p=0.272 n=10+10)
BitsetOrig-8                         250ns ±19%      247ns ±14%     ~     (p=0.671 n=10+10)
BitsetNegOrig-8                      416ns ±14%      429ns ±14%     ~     (p=0.353 n=10+10)
ModSqrt225_Tonelli-8                 400µs ± 0%      320µs ± 0%  -19.88%  (p=0.000 n=9+7)
ModSqrt224_3Mod4-8                   123µs ± 1%       97µs ± 0%  -21.21%  (p=0.000 n=9+10)
ModSqrt5430_Tonelli-8                1.87s ± 0%      1.39s ± 1%  -25.70%  (p=0.000 n=9+10)
ModSqrt5430_3Mod4-8                  630ms ± 2%      465ms ± 1%  -26.12%  (p=0.000 n=10+10)
Sqrt-8                              25.8µs ± 1%     25.9µs ± 0%   +0.66%  (p=0.002 n=10+8)
IntSqr/1-8                          11.3ns ± 1%     11.3ns ± 2%     ~     (p=0.360 n=9+10)
IntSqr/2-8                          26.6ns ± 1%     27.4ns ± 2%   +2.87%  (p=0.000 n=8+9)
IntSqr/3-8                          36.5ns ± 6%     36.6ns ± 5%     ~     (p=0.589 n=10+10)
IntSqr/5-8                          57.2ns ± 2%     57.8ns ± 1%   +0.92%  (p=0.045 n=10+9)
IntSqr/8-8                           112ns ± 1%       93ns ± 1%  -16.60%  (p=0.000 n=10+10)
IntSqr/10-8                          148ns ± 1%      129ns ± 5%  -12.85%  (p=0.000 n=10+10)
IntSqr/20-8                          642ns ±28%      692ns ±21%     ~     (p=0.105 n=10+10)
IntSqr/30-8                         1.03µs ±18%     1.06µs ±15%     ~     (p=0.422 n=10+8)
IntSqr/50-8                         2.33µs ±14%     2.14µs ±20%     ~     (p=0.063 n=10+10)
IntSqr/80-8                         4.06µs ±13%     3.72µs ±14%   -8.31%  (p=0.029 n=10+10)
IntSqr/100-8                        5.79µs ±10%     5.20µs ±18%  -10.15%  (p=0.004 n=10+10)
IntSqr/200-8                        17.1µs ± 1%     12.9µs ± 3%  -24.44%  (p=0.000 n=10+10)
IntSqr/300-8                        35.9µs ± 0%     26.6µs ± 1%  -25.75%  (p=0.000 n=10+10)
IntSqr/500-8                        84.9µs ± 0%     71.7µs ± 1%  -15.49%  (p=0.000 n=10+10)
IntSqr/800-8                         170µs ± 1%      142µs ± 2%  -16.73%  (p=0.000 n=10+10)
IntSqr/1000-8                        258µs ± 1%      218µs ± 1%  -15.65%  (p=0.000 n=10+10)
Mul-8                               10.4ms ± 1%      8.3ms ± 0%  -20.05%  (p=0.000 n=10+9)
Exp3Power/0x10-8                     311ns ±15%      321ns ±24%     ~     (p=0.447 n=10+10)
Exp3Power/0x40-8                     358ns ±21%      346ns ±37%     ~     (p=0.591 n=10+10)
Exp3Power/0x100-8                    611ns ±19%      570ns ±27%     ~     (p=0.393 n=10+10)
Exp3Power/0x400-8                   1.31µs ±26%     1.34µs ±19%     ~     (p=0.853 n=10+10)
Exp3Power/0x1000-8                  6.76µs ±23%     6.22µs ±16%     ~     (p=0.095 n=10+9)
Exp3Power/0x4000-8                  37.6µs ±14%     36.4µs ±21%     ~     (p=0.247 n=10+10)
Exp3Power/0x10000-8                  345µs ±14%      310µs ±11%   -9.99%  (p=0.005 n=10+10)
Exp3Power/0x40000-8                 2.77ms ± 1%     2.34ms ± 1%  -15.47%  (p=0.000 n=10+10)
Exp3Power/0x100000-8                25.1ms ± 1%     21.3ms ± 1%  -15.26%  (p=0.000 n=10+10)
Exp3Power/0x400000-8                 225ms ± 1%      190ms ± 1%  -15.61%  (p=0.000 n=10+10)
Fibo-8                              23.4ms ± 1%     23.3ms ± 0%     ~     (p=0.052 n=10+10)
NatSqr/1-8                          58.4ns ±24%     59.8ns ±38%     ~     (p=0.739 n=10+10)
NatSqr/2-8                           122ns ±21%      122ns ±16%     ~     (p=0.896 n=10+10)
NatSqr/3-8                           140ns ±28%      148ns ±30%     ~     (p=0.288 n=10+10)
NatSqr/5-8                           193ns ±29%      210ns ±34%     ~     (p=0.469 n=10+10)
NatSqr/8-8                           317ns ±21%      296ns ±25%     ~     (p=0.393 n=10+10)
NatSqr/10-8                          362ns ± 8%      373ns ±30%     ~     (p=0.617 n=9+10)
NatSqr/20-8                         1.24µs ±16%     1.06µs ±29%  -14.57%  (p=0.019 n=10+10)
NatSqr/30-8                         1.90µs ±32%     1.71µs ±10%     ~     (p=0.176 n=10+9)
NatSqr/50-8                         4.22µs ±19%     3.67µs ± 7%  -13.03%  (p=0.017 n=10+9)
NatSqr/80-8                         7.33µs ±20%     6.50µs ±15%  -11.26%  (p=0.009 n=10+10)
NatSqr/100-8                        9.84µs ±18%     9.33µs ± 8%     ~     (p=0.280 n=10+10)
NatSqr/200-8                        21.4µs ± 7%     20.0µs ±14%     ~     (p=0.075 n=10+10)
NatSqr/300-8                        38.0µs ± 2%     31.3µs ±10%  -17.63%  (p=0.000 n=10+10)
NatSqr/500-8                         102µs ± 5%      101µs ± 4%     ~     (p=0.780 n=9+10)
NatSqr/800-8                         190µs ± 3%      166µs ± 6%  -12.29%  (p=0.000 n=10+10)
NatSqr/1000-8                        277µs ± 2%      245µs ± 6%  -11.64%  (p=0.000 n=10+10)
ScanPi-8                             144µs ±23%      149µs ±24%     ~     (p=0.579 n=10+10)
StringPiParallel-8                  25.6µs ± 0%     25.8µs ± 0%   +0.69%  (p=0.000 n=9+10)
Scan/10/Base2-8                      305ns ± 1%      309ns ± 1%   +1.32%  (p=0.000 n=10+9)
Scan/100/Base2-8                    1.95µs ± 1%     1.98µs ± 1%   +1.10%  (p=0.000 n=10+10)
Scan/1000/Base2-8                   19.5µs ± 1%     19.7µs ± 1%   +1.39%  (p=0.000 n=10+10)
Scan/10000/Base2-8                   270µs ± 1%      272µs ± 1%   +0.58%  (p=0.024 n=9+9)
Scan/100000/Base2-8                 10.3ms ± 0%     10.3ms ± 0%   +0.16%  (p=0.022 n=9+10)
Scan/10/Base8-8                      146ns ± 4%      154ns ± 4%   +5.57%  (p=0.000 n=9+9)
Scan/100/Base8-8                     748ns ± 1%      759ns ± 1%   +1.51%  (p=0.000 n=9+10)
Scan/1000/Base8-8                   7.88µs ± 1%     8.00µs ± 1%   +1.64%  (p=0.000 n=10+10)
Scan/10000/Base8-8                   155µs ± 1%      155µs ± 1%     ~     (p=0.968 n=10+9)
Scan/100000/Base8-8                 9.11ms ± 0%     9.11ms ± 0%     ~     (p=0.604 n=9+10)
Scan/10/Base10-8                     140ns ± 5%      149ns ± 5%   +6.39%  (p=0.000 n=9+10)
Scan/100/Base10-8                    680ns ± 0%      688ns ± 1%   +1.08%  (p=0.000 n=9+10)
Scan/1000/Base10-8                  7.09µs ± 1%     7.16µs ± 1%   +0.98%  (p=0.019 n=10+10)
Scan/10000/Base10-8                  149µs ± 3%      150µs ± 3%     ~     (p=0.143 n=10+10)
Scan/100000/Base10-8                9.16ms ± 0%     9.16ms ± 0%     ~     (p=0.661 n=10+9)
Scan/10/Base16-8                     134ns ± 5%      135ns ± 3%     ~     (p=0.505 n=9+9)
Scan/100/Base16-8                    560ns ± 1%      563ns ± 0%   +0.67%  (p=0.000 n=10+8)
Scan/1000/Base16-8                  6.28µs ± 1%     6.26µs ± 1%     ~     (p=0.448 n=10+10)
Scan/10000/Base16-8                  161µs ± 1%      162µs ± 1%   +0.74%  (p=0.008 n=9+9)
Scan/100000/Base16-8                9.64ms ± 0%     9.64ms ± 0%     ~     (p=0.436 n=10+10)
String/10/Base2-8                    116ns ±12%      118ns ±13%     ~     (p=0.645 n=10+10)
String/100/Base2-8                   871ns ±23%      860ns ±22%     ~     (p=0.699 n=10+10)
String/1000/Base2-8                 10.0µs ±20%     10.0µs ±23%     ~     (p=0.853 n=10+10)
String/10000/Base2-8                 110µs ±21%      120µs ±25%     ~     (p=0.436 n=10+10)
String/100000/Base2-8                768µs ±11%      733µs ±16%     ~     (p=0.393 n=10+10)
String/10/Base8-8                   51.3ns ± 1%     51.0ns ± 3%     ~     (p=0.286 n=9+9)
String/100/Base8-8                   284ns ± 9%      272ns ±12%     ~     (p=0.267 n=9+10)
String/1000/Base8-8                 3.06µs ± 9%     3.04µs ±10%     ~     (p=0.739 n=10+10)
String/10000/Base8-8                36.1µs ±14%     35.1µs ± 9%     ~     (p=0.447 n=10+9)
String/100000/Base8-8                371µs ±12%      373µs ±16%     ~     (p=0.739 n=10+10)
String/10/Base10-8                   167ns ±11%      165ns ± 9%     ~     (p=0.781 n=10+10)
String/100/Base10-8                  727ns ± 1%      740ns ± 2%   +1.70%  (p=0.001 n=10+10)
String/1000/Base10-8                5.30µs ±18%     5.37µs ±14%     ~     (p=0.631 n=10+10)
String/10000/Base10-8               45.0µs ±14%     44.6µs ±10%     ~     (p=0.720 n=9+10)
String/100000/Base10-8              5.10ms ± 1%     5.05ms ± 3%     ~     (p=0.211 n=9+10)
String/10/Base16-8                  47.7ns ± 6%     47.7ns ± 6%     ~     (p=0.985 n=10+10)
String/100/Base16-8                  221ns ±10%      234ns ±27%     ~     (p=0.541 n=10+10)
String/1000/Base16-8                2.23µs ±11%     2.12µs ± 8%   -4.81%  (p=0.029 n=9+8)
String/10000/Base16-8               28.3µs ±21%     28.5µs ±14%     ~     (p=0.796 n=10+10)
String/100000/Base16-8               291µs ±16%      293µs ±15%     ~     (p=0.931 n=9+9)
LeafSize/0-8                        2.43ms ± 1%     2.49ms ± 1%   +2.56%  (p=0.000 n=10+10)
LeafSize/1-8                        49.7µs ± 9%     46.3µs ±16%   -6.78%  (p=0.017 n=10+9)
LeafSize/2-8                        48.4µs ±18%     46.3µs ±19%     ~     (p=0.436 n=10+10)
LeafSize/3-8                        81.7µs ± 3%     80.9µs ± 3%     ~     (p=0.278 n=10+9)
LeafSize/4-8                        47.0µs ± 7%     47.9µs ±13%     ~     (p=0.905 n=9+10)
LeafSize/5-8                        96.8µs ± 1%     97.3µs ± 2%     ~     (p=0.515 n=8+10)
LeafSize/6-8                        82.5µs ± 4%     80.9µs ± 2%   -1.92%  (p=0.019 n=10+10)
LeafSize/7-8                        67.2µs ±13%     66.6µs ± 9%     ~     (p=0.842 n=10+9)
LeafSize/8-8                        46.0µs ±28%     45.1µs ±12%     ~     (p=0.739 n=10+10)
LeafSize/9-8                         111µs ± 1%      111µs ± 1%     ~     (p=0.739 n=10+10)
LeafSize/10-8                       98.8µs ± 4%     97.9µs ± 3%     ~     (p=0.278 n=10+9)
LeafSize/11-8                       96.8µs ± 1%     96.4µs ± 1%     ~     (p=0.211 n=9+10)
LeafSize/12-8                       81.0µs ± 4%     81.3µs ± 3%     ~     (p=0.579 n=10+10)
LeafSize/13-8                       79.7µs ± 5%     79.2µs ± 3%     ~     (p=0.661 n=10+9)
LeafSize/14-8                       67.6µs ±12%     65.8µs ± 7%     ~     (p=0.447 n=10+9)
LeafSize/15-8                       63.9µs ±17%     66.3µs ±14%     ~     (p=0.481 n=10+10)
LeafSize/16-8                       44.0µs ±28%     46.0µs ±27%     ~     (p=0.481 n=10+10)
LeafSize/32-8                       46.2µs ±13%     43.5µs ±18%     ~     (p=0.156 n=9+10)
LeafSize/64-8                       53.3µs ±10%     53.0µs ±19%     ~     (p=0.730 n=9+9)
ProbablyPrime/n=0-8                 3.60ms ± 1%     3.39ms ± 1%   -5.87%  (p=0.000 n=10+9)
ProbablyPrime/n=1-8                 4.42ms ± 1%     4.08ms ± 1%   -7.69%  (p=0.000 n=10+10)
ProbablyPrime/n=5-8                 7.57ms ± 2%     6.79ms ± 1%  -10.24%  (p=0.000 n=10+10)
ProbablyPrime/n=10-8                11.6ms ± 2%     10.2ms ± 1%  -11.69%  (p=0.000 n=10+10)
ProbablyPrime/n=20-8                19.4ms ± 2%     16.9ms ± 2%  -12.89%  (p=0.000 n=10+10)
ProbablyPrime/Lucas-8               2.81ms ± 2%     2.72ms ± 1%   -3.22%  (p=0.000 n=10+9)
ProbablyPrime/MillerRabinBase2-8     797µs ± 1%      680µs ± 1%  -14.64%  (p=0.000 n=10+10)

name                              old speed      new speed       delta
AddVV/1-8                         17.1GB/s ± 6%   18.0GB/s ± 2%     ~     (p=0.122 n=10+8)
AddVV/2-8                         32.4GB/s ± 2%   32.2GB/s ± 4%     ~     (p=0.661 n=10+9)
AddVV/3-8                         38.6GB/s ± 2%   38.9GB/s ± 1%     ~     (p=0.113 n=10+9)
AddVV/4-8                         45.8GB/s ± 2%   45.8GB/s ± 2%     ~     (p=0.796 n=10+10)
AddVV/5-8                         48.1GB/s ± 2%   48.3GB/s ± 1%     ~     (p=0.315 n=10+10)
AddVV/10-8                        78.9GB/s ± 1%   78.9GB/s ± 2%     ~     (p=0.353 n=10+10)
AddVV/100-8                        136GB/s ± 2%    137GB/s ± 1%     ~     (p=0.971 n=10+10)
AddVV/1000-8                       164GB/s ± 1%    164GB/s ± 4%     ~     (p=0.853 n=10+10)
AddVV/10000-8                      126GB/s ± 6%    129GB/s ± 2%     ~     (p=0.063 n=10+10)
AddVV/100000-8                     116GB/s ± 3%    116GB/s ± 3%     ~     (p=0.796 n=10+10)
AddVW/1-8                         2.64GB/s ± 3%   2.64GB/s ± 3%     ~     (p=0.579 n=10+10)
AddVW/2-8                         4.49GB/s ± 2%   4.44GB/s ± 2%   -1.09%  (p=0.040 n=9+9)
AddVW/3-8                         6.36GB/s ± 1%   6.34GB/s ± 2%     ~     (p=0.684 n=10+10)
AddVW/4-8                         6.83GB/s ± 1%   6.82GB/s ± 2%     ~     (p=0.905 n=10+9)
AddVW/5-8                         8.75GB/s ± 1%   8.73GB/s ± 1%     ~     (p=0.796 n=10+10)
AddVW/10-8                        10.5GB/s ± 2%   10.5GB/s ± 1%     ~     (p=0.971 n=10+10)
AddVW/100-8                       19.5GB/s ± 2%   18.9GB/s ± 2%   -3.22%  (p=0.000 n=10+10)
AddVW/1000-8                      20.7GB/s ± 2%   20.6GB/s ± 4%     ~     (p=0.631 n=10+10)
AddVW/10000-8                     20.6GB/s ± 3%   20.7GB/s ± 3%     ~     (p=0.481 n=10+10)
AddVW/100000-8                    19.4GB/s ± 2%   19.2GB/s ± 3%     ~     (p=0.165 n=10+10)
AddMulVVW/1-8                     19.5GB/s ± 2%   19.7GB/s ± 3%     ~     (p=0.123 n=10+10)
AddMulVVW/2-8                     30.1GB/s ± 2%   30.2GB/s ± 3%     ~     (p=0.297 n=9+9)
AddMulVVW/3-8                     37.9GB/s ± 2%   36.5GB/s ± 2%   -3.63%  (p=0.000 n=10+10)
AddMulVVW/4-8                     40.0GB/s ± 2%   39.4GB/s ± 2%   -1.58%  (p=0.001 n=10+10)
AddMulVVW/5-8                     47.3GB/s ± 2%   46.6GB/s ± 1%   -1.35%  (p=0.001 n=9+9)
AddMulVVW/10-8                    52.3GB/s ± 2%   60.6GB/s ± 3%  +15.76%  (p=0.000 n=10+10)
AddMulVVW/100-8                   80.3GB/s ± 2%  122.1GB/s ± 1%  +51.92%  (p=0.000 n=10+10)
AddMulVVW/1000-8                  92.0GB/s ± 1%  130.3GB/s ± 2%  +41.61%  (p=0.000 n=9+10)
AddMulVVW/10000-8                 88.2GB/s ± 2%  108.2GB/s ± 5%  +22.66%  (p=0.000 n=10+10)
AddMulVVW/100000-8                88.2GB/s ± 2%  102.9GB/s ± 2%  +16.69%  (p=0.000 n=10+10)

Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
Reviewed-on: https://go-review.googlesource.com/74851
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>
2018-02-24 00:13:03 +00:00
Joe Tsai
9697a119e6 archive/zip: fix handling of Info-ZIP Unix extended timestamps
The Info-ZIP Unix1 extra field is specified as such:
>>>
Value    Size   Description
-----    ----   -----------
0x5855   Short  tag for this extra block type ("UX")
TSize    Short  total data size for this block
AcTime   Long   time of last access (GMT/UTC)
ModTime  Long   time of last modification (GMT/UTC)
<<<

The previous handling was incorrect in that it read the AcTime field
instead of the ModTime field.

The test-osx.zip test unfortunately locked in the wrong behavior.
Manually parsing that ZIP file shows that the encoded MS-DOS
date and time are 0x4b5f and 0xa97d, which corresponds with a
date of 2017-10-31 21:11:58, which matches the correct mod time
(off by 1 second due to MS-DOS timestamp resolution).

Fixes #23901

Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
Reviewed-on: https://go-review.googlesource.com/96895
Run-TryBot: Joe Tsai <joetsai@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-23 23:48:32 +00:00
Ian Lance Taylor
804e3e565e runtime: don't check for String/Error methods in printany
They have either already been called by preprintpanics, or they can
not be called safely because of the various conditions checked at the
start of gopanic.

Fixes #24059

Change-Id: I4a6233d12c9f7aaaee72f343257ea108bae79241
Reviewed-on: https://go-review.googlesource.com/96755
Reviewed-by: Austin Clements <austin@google.com>
2018-02-23 22:39:46 +00:00
Yuval Pavel Zholkover
a5e8e2d998 os: respect umask in Mkdir and OpenFile on BSD systems when perm has ModeSticky set
Instead of calling Chmod directly on perm, stat the created file/dir to extract the
actual permission bits which can be different from perm due to umask.

Fixes #23120.

Change-Id: I3e70032451fc254bf48ce9627e98988f84af8d91
Reviewed-on: https://go-review.googlesource.com/84477
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-23 22:37:55 +00:00
Austin Clements
788464724c runtime: reduce arena size to 4MB on 64-bit Windows
Currently, we use 64MB heap arenas on 64-bit platforms. This works
well on UNIX-like OSes because they treat untouched pages as
essentially free. However, on Windows, committed memory is charged
against a process whether or not it has demand-faulted physical pages
in. Hence, on Windows, even a process with a tiny heap will commit
64MB for one heap arena, plus another 32MB for the arena map. Things
are much worse under the race detector, which increases the heap
commitment by a factor of 5.5X, leading to 384MB of committed memory
at runtime init.

Fix this by reducing the heap arena size to 4MB on Windows.

To counterbalance the effect of increasing the arena map size by a
factor of 16, and to further reduce the impact of the commitment for
the arena map, we switch from a single entry L1 arena map to a 64
entry L1 arena map.

Compared to the original arena design, this slows down the
x/benchmarks garbage benchmark by 0.49% (the slow down of this commit
alone is 1.59%, but the previous commit bought us a 1% speed-up):

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)

(https://perf.golang.org/search?q=upload:20180223.1)

(This was measured on linux/amd64 by modifying its arena configuration
as above.)

Fixes #23900.

Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
Reviewed-on: https://go-review.googlesource.com/96780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:51 +00:00