1
0
mirror of https://github.com/golang/go synced 2024-10-05 16:31:21 -06:00
Commit Graph

26 Commits

Author SHA1 Message Date
Nathan VanBenschoten
b04f3b06ec all: replace strings.Index with strings.Contains where possible
Change-Id: Ia613f1c37bfce800ece0533a5326fca91d99a66a
Reviewed-on: https://go-review.googlesource.com/18120
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
2016-02-19 01:06:05 +00:00
Paul Wankadia
5ccaf0255b regexp/syntax: fix factoring of common prefixes in alternations
In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given
"abc" as its input string, the automaton would consume "ab" and
then stop (when unanchored) whereas it should consume all of "abc"
as per leftmost semantics.

Fixes #13812.

Change-Id: I67ac0a353d7793b3d0c9c4aaf22d157621dfe784
Reviewed-on: https://go-review.googlesource.com/18357
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-08 16:41:46 +00:00
Russ Cox
0680e9c0c1 regexp/syntax: fix handling of \Q...\E
It's not a group: must handle the inside as a sequence of literal chars,
not a single literal string.

That is, \Qab\E+ is the same as ab+, not (ab)+.

Fixes #11187.

Change-Id: I5406d05ccf7efff3a7f15395bdb0cfb2bd23a8ed
Reviewed-on: https://go-review.googlesource.com/17233
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-12-01 22:45:12 +00:00
Caleb Spare
e36bf614c8 regexp: fix one-pass compilation
The one-pass transformation is structured as a search over the input
machine for conditions that violate the one-pass requisites. At each
iteration, we should fully explore all non-input paths that proceed from
the current instruction; this is implemented via recursive check calls.
But when we reach instructions that demand input (InstRune*), these
should be put onto the search queue.

Instead of searching this way, the routine previously (effectively)
proceeded through the machine one instruction at a time until finding an
Inst{Match,Fail,Rune*}, calling check on each instruction. This caused
bug #11905, where the transformation stopped before rewriting all
InstAlts as InstAltMatches.

Further, the check function unnecessarily recurred on InstRune*
instructions. (I believe this helps to mask the above bug.)

This change also deletes some unused functions and duplicate test cases.

Fixes #11905.

Change-Id: I5b0b26efea3d3bd01c7479a518b5ed1b886701cd
Reviewed-on: https://go-review.googlesource.com/17195
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-25 17:44:17 +00:00
Caleb Spare
d1eedfe132 regexp: fix LiteralPrefix for certain onepass progs
The prefix computation for onepass was incorrectly returning
complete=true when it encountered a beginning-of-text empty width match
(^) in the middle of an expression.

Fix by returning complete only when the prefix is followed by $ and then
an accepting state.

Fixes #11175.

Change-Id: Ie9c4cf5f76c1d2c904a6fb2f016cedb265c19fde
Reviewed-on: https://go-review.googlesource.com/16200
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-25 17:28:33 +00:00
Caleb Spare
937172b247 regexp: add Copy method to Regexp
This helps users who wish to use separate Regexps in each goroutine to
avoid lock contention. Previously they had to parse the expression
multiple times to achieve this.

I used variants of the included benchmark to evaluate this change. I
used the arguments -benchtime 20s -cpu 1,2,4,8,16 on a machine with 16
hardware cores.

Comparing a single shared Regexp vs. copied Regexps, we can see that
lock contention causes huge slowdowns at higher levels of parallelism.
The copied version shows the expected linear speedup.

name              old time/op  new time/op  delta
MatchParallel      366ns ± 0%   370ns ± 0%   +1.09%   (p=0.000 n=10+8)
MatchParallel-2    324ns ±28%   184ns ± 1%  -43.37%  (p=0.000 n=10+10)
MatchParallel-4    352ns ± 5%    93ns ± 1%  -73.70%   (p=0.000 n=9+10)
MatchParallel-8    480ns ± 3%    46ns ± 0%  -90.33%    (p=0.000 n=9+8)
MatchParallel-16   510ns ± 8%    24ns ± 6%  -95.36%   (p=0.000 n=10+8)

I also compared a modified version of Regexp that has no mutex and a
single machine (the "RegexpForSingleGoroutine" rsc mentioned in
https://github.com/golang/go/issues/8232#issuecomment-66096128).

In this next test, I compared using N copied Regexps vs. N separate
RegexpForSingleGoroutines. This shows that, even for this relatively
simple regex, avoiding the lock entirely would only buy about 10-12%
further improvement.

name              old time/op  new time/op  delta
MatchParallel      370ns ± 0%   322ns ± 0%  -12.97%    (p=0.000 n=8+8)
MatchParallel-2    184ns ± 1%   162ns ± 1%  -11.60%  (p=0.000 n=10+10)
MatchParallel-4   92.7ns ± 1%  81.1ns ± 2%  -12.43%  (p=0.000 n=10+10)
MatchParallel-8   46.4ns ± 0%  41.8ns ±10%   -9.78%   (p=0.000 n=8+10)
MatchParallel-16  23.7ns ± 6%  20.6ns ± 1%  -13.14%    (p=0.000 n=8+8)

Updates #8232.

Change-Id: I15201a080c363d1b44104eafed46d8df5e311902
Reviewed-on: https://go-review.googlesource.com/16110
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-25 17:26:37 +00:00
Tamir Duberstein
3aa755b8fb regexp/syntax: correctly print ^ BOL and $ EOL
Fixes #12980.

Change-Id: I936db2f57f7c4dc80bb8ec32715c4c6b7bf0d708
Reviewed-on: https://go-review.googlesource.com/16112
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-25 17:25:22 +00:00
Didier Spezia
f75f2f3fcc regexp: fix slice bounds out of range panics
Regular expressions involving a (x){0} term are
simplified by removing this term from the
expression, just before the expression is compiled.

The number of subexpressions is evaluated before
the simplification. The number of capture instructions
in the compiled expressions is not necessarily in line
with the number of subexpressions.

When the ReplaceAll(String) methods are used, a number
of capture slots (nmatch) is evaluated as 2*(s+1)
(s being the number of subexpressions).

In some case, it can be higher than the number of capture
instructions evaluated at compile time, resulting in a
panic when the internal slices of regexp.machine
are resized to this value.

Fixed by capping the number of capture slots to the number
of capture instructions.

I must say I do not really see the benefits of setting
nmatch lower than re.prog.NumCap using this 2*(s+1) formula,
so perhaps this can be further simplified.

Fixes #11178
Fixes #11176

Change-Id: I21415e8ef2dd5f2721218e9a679f7f6bfb76ae9b
Reviewed-on: https://go-review.googlesource.com/14013
Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-23 03:30:25 +00:00
Seth Hoenig
226aaf4267 regexp: add runnable example to regex.Split
The existing comment for regex.Split contains a plain text example,
while many of the other regex functions have runnable examples. This
change provides a runnable example for Split.

Change-Id: I5373f57f532fe843d7d0adcf4b513061ec797047
Reviewed-on: https://go-review.googlesource.com/14737
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-23 03:29:32 +00:00
Michael Matloob
71e83b8855 regexp: small correction to test comment
s/Backtrace/Backtrack/

Change-Id: I062aab18f23f2bc2110cf7210c2e7264747e02cf
Reviewed-on: https://go-review.googlesource.com/11091
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-14 17:09:13 +00:00
Rob Pike
ae38ef4cdf regexp: suggest go doc, not godoc
In 1.6, go doc is more likely to be available.

Change-Id: I970ad1d3317b35273f5c8d830f75713d3570c473
Reviewed-on: https://go-review.googlesource.com/10518
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-01 20:16:31 +00:00
Dmitry Savintsev
8cb9c21cce regexp: trivial change in comments to update code.google.com link
Replaced code.google.com/p/re2/ with github.com/google/re2/ and
updated the file names (re2-exhaustive.txt.bz2 not re2.txt.gz)
as well as the re2 make command (make log).

Change-Id: I15937b0b8a898d78d45366857ed86421c8d69960
Reviewed-on: https://go-review.googlesource.com/9372
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-27 20:18:25 +00:00
Michael Matloob
485f348979 regexp: set b.cap[0] and b.cap[1] only when captures requested
Fixes #10319

Change-Id: I96015b0e1dff30a72de11fea3837638b5c672891
Reviewed-on: https://go-review.googlesource.com/8501
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-17 17:10:07 +00:00
Matthew Brennan
a513088396 regexp: skip backtracker for long programs
This update makes maxBacktrackLen return 0 if
len(prog.Inst) > maxBacktrackProg. This prevents an attempt to
backtrack against a nil bitstate.

Fixes #10319

Change-Id: Icdbeb2392782ccf66f9d0a70ea57af22fb93f01b
Reviewed-on: https://go-review.googlesource.com/8473
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-09 09:38:23 +00:00
Brad Fitzpatrick
fc9a234d9f regexp: fix link to RE2 syntax
Fixes #10224

Change-Id: I21037379b4667575e51ab0b6b683138c505c3f68
Reviewed-on: https://go-review.googlesource.com/7960
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-03-23 19:17:52 +00:00
Michael Matloob
93238623e2 regexp: port RE2's bitstate backtracker to the regexp package
This is a port of RE2's bitstate backtracker, which triggers under
the same conditions that the RE2 backtracker triggers.  However I wasn't
sure how to port over some of the optimizations in the RE2 backtracker,
and there is a ~2% penalty on benchmarks that don't trigger the backtracker.

benchmark                                 old ns/op      new ns/op      delta
BenchmarkLiteral                          312            189            -39.42%
BenchmarkNotLiteral                       4435           3001           -32.33%
BenchmarkMatchClass                       5758           4378           -23.97%
BenchmarkMatchClass_InRange               5385           4084           -24.16%
BenchmarkReplaceAll                       5291           3505           -33.76%
BenchmarkAnchoredLiteralShortNonMatch     190            200            +5.26%
BenchmarkAnchoredLiteralLongNonMatch      189            194            +2.65%
BenchmarkAnchoredShortMatch               479            304            -36.53%
BenchmarkAnchoredLongMatch                478            499            +4.39%
BenchmarkOnePassShortA                    791            798            +0.88%
BenchmarkNotOnePassShortA                 3202           1571           -50.94%
BenchmarkOnePassShortB                    614            633            +3.09%
BenchmarkNotOnePassShortB                 2685           881            -67.19%
BenchmarkOnePassLongPrefix                152            154            +1.32%
BenchmarkOnePassLongNotPrefix             505            533            +5.54%
BenchmarkMatchEasy0_32                    139            171            +23.02%
BenchmarkMatchEasy0_1K                    653            1797           +175.19%
BenchmarkMatchEasy0_32K                   12032          13346          +10.92%
BenchmarkMatchEasy0_1M                    462882         461272         -0.35%
BenchmarkMatchEasy0_32M                   15015339       15365238       +2.33%
BenchmarkMatchEasy1_32                    122            168            +37.70%
BenchmarkMatchEasy1_1K                    3339           2612           -21.77%
BenchmarkMatchEasy1_32K                   72330          71721          -0.84%
BenchmarkMatchEasy1_1M                    2545410        2652284        +4.20%
BenchmarkMatchEasy1_32M                   80072063       82609750       +3.17%
BenchmarkMatchMedium_32                   2359           1980           -16.07%
BenchmarkMatchMedium_1K                   75939          58593          -22.84%
BenchmarkMatchMedium_32K                  2450907        2501106        +2.05%
BenchmarkMatchMedium_1M                   78707697       80174418       +1.86%
BenchmarkMatchMedium_32M                  2535146010     2570896441     +1.41%
BenchmarkMatchHard_32                     4297           2960           -31.11%
BenchmarkMatchHard_1K                     133592         88997          -33.38%
BenchmarkMatchHard_32K                    4240445        4336907        +2.27%
BenchmarkMatchHard_1M                     136187006      139350238      +2.32%
BenchmarkMatchHard_32M                    4350855890     4478537306     +2.93%

benchmark                    old MB/s     new MB/s     speedup
BenchmarkMatchEasy0_32       228.74       186.11       0.81x
BenchmarkMatchEasy0_1K       1565.91      569.64       0.36x
BenchmarkMatchEasy0_32K      2723.31      2455.10      0.90x
BenchmarkMatchEasy0_1M       2265.32      2273.22      1.00x
BenchmarkMatchEasy0_32M      2234.68      2183.79      0.98x
BenchmarkMatchEasy1_32       261.08       190.22       0.73x
BenchmarkMatchEasy1_1K       306.59       391.91       1.28x
BenchmarkMatchEasy1_32K      453.03       456.88       1.01x
BenchmarkMatchEasy1_1M       411.95       395.35       0.96x
BenchmarkMatchEasy1_32M      419.05       406.18       0.97x
BenchmarkMatchMedium_32      13.56        16.16        1.19x
BenchmarkMatchMedium_1K      13.48        17.48        1.30x
BenchmarkMatchMedium_32K     13.37        13.10        0.98x
BenchmarkMatchMedium_1M      13.32        13.08        0.98x
BenchmarkMatchMedium_32M     13.24        13.05        0.99x
BenchmarkMatchHard_32        7.45         10.81        1.45x
BenchmarkMatchHard_1K        7.67         11.51        1.50x
BenchmarkMatchHard_32K       7.73         7.56         0.98x
BenchmarkMatchHard_1M        7.70         7.52         0.98x
BenchmarkMatchHard_32M       7.71         7.49         0.97x

Fixes #4154

Change-Id: Iff7fb9507f0872b320d08afc08679751ed1b28bc
Reviewed-on: https://go-review.googlesource.com/2153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-23 16:17:42 +00:00
Josh Bleecher Snyder
2adc4e8927 all: use "reports whether" in place of "returns true if(f)"
Comment changes only.

Change-Id: I56848814564c4aa0988b451df18bebdfc88d6d94
Reviewed-on: https://go-review.googlesource.com/7721
Reviewed-by: Rob Pike <r@golang.org>
2015-03-18 15:14:06 +00:00
Shenghou Ma
e24e299f41 regexp: update URLs in tests
Change-Id: I06035d949272157bbb7255563b37ac93cbf07f15
Reviewed-on: https://go-review.googlesource.com/3272
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-01-26 01:54:28 +00:00
Michael Matloob
c7eb9663aa regexp: fix typo in comment: s/onpass/onepass/
Change-Id: Idff57050a34d09e7fa9b77e9b53d61bb5ea2a71c
Reviewed-on: https://go-review.googlesource.com/2095
Reviewed-by: Minux Ma <minux@golang.org>
2014-12-24 07:30:28 +00:00
Robin Eklind
04c7b68b4a regexp/syntax: Clarify comment of OpAnyCharNotNL.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/171560043
2014-11-11 18:52:07 -08:00
Russ Cox
22be4bfdbf regexp: fix TestOnePassCutoff
The stack blowout can no longer happen,
but we can still test that too-complex regexps
are rejected.

Replacement for CL 162770043.

LGTM=iant, r
R=r, iant
CC=bradfitz, golang-codereviews
https://golang.org/cl/162860043
2014-10-20 12:16:46 -04:00
Ian Lance Taylor
0f022fdd52 regexp/syntax: fix validity testing of zero repeats
This is already tested by TestRE2Exhaustive, but the build has
not broken because that test is not run when using -test.short.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/155580043
2014-10-20 08:12:45 -07:00
Ian Lance Taylor
3c5fd98918 regexp: correct doc comment for ReplaceAllLiteralString
Fixes #8959.

LGTM=adg
R=golang-codereviews, adg
CC=golang-codereviews
https://golang.org/cl/161790043
2014-10-19 10:28:27 -07:00
Russ Cox
85fd0fd7c4 regexp/syntax: regenerate doc.go from re2 syntax
Generated using re2/doc/mksyntaxgo.

Fixes #8505.

LGTM=iant
R=r, iant
CC=golang-codereviews
https://golang.org/cl/155890043
2014-10-06 15:32:11 -04:00
Russ Cox
9b2b0c8c16 regexp/syntax: reject large repetitions created by nesting small ones
Fixes #7609.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/150270043
2014-09-30 12:08:09 -04:00
Russ Cox
c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00