Regular Expressions and negating a whole character group [duplicate]

Question

I'm attempting something which I feel should be fairly obvious to me, but it's not. I'm trying to match a string which does NOT contain a specific sequence of characters. I've tried using [^ab], [^(ab)], etc. to match strings containing no 'a's or 'b's, or only 'a's, or only 'b's, or 'ba', but not match on 'ab'. The examples I gave won't match 'ab', it's true, but they also won't match 'a' alone, and I need them to. Is there some simple way to do this?

@finnw maybe he was referring to it in the context of stackoverflow.com/q/36754105/3186555? (user3186555, commented Apr 21, 2016 at 1:24)

Peter Boughton · Accepted Answer · 2009-06-10 18:20:39Z

473

Using a character class such as [^ab] will match a single character that is not within the set of characters (the ^ is the negating part).
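To see why, here is a small Python sketch (standard `re` module; the sample strings are arbitrary choices) that repeats the character class across a whole string. Because `[^ab]` tests one character at a time, any string containing an a or a b is rejected, including 'a' and 'ba':

```python
import re

# [^ab] matches exactly ONE character that is neither 'a' nor 'b'.
# Repeating it with + and anchoring both ends rejects any string that
# contains an 'a' or a 'b' anywhere, not just the sequence "ab".
char_class = re.compile(r"^[^ab]+$")

samples = ["xyz", "a", "ba", "ab"]
results = {s: char_class.match(s) is not None for s in samples}
print(results)
```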

To match a string which does not contain the multi-character sequence ab, you want to use a negative lookahead:

^(?:(?!ab).)+$

And here is the above expression dissected in regex comment mode:

(?x)    # enable regex comment mode
^       # match start of line/string
(?:     # begin non-capturing group
  (?!   # begin negative lookahead
    ab  # literal text sequence ab
  )     # end negative lookahead
  .     # any single character
)       # end non-capturing group
+       # repeat previous match one or more times
$       # match end of line/string
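A minimal Python sketch (standard `re` module; the sample strings are illustrative) that checks the compact pattern and the commented version agree. Here the commented form is passed with the `re.VERBOSE` flag, which is equivalent to a leading `(?x)`. Because of the `+`, the empty string is rejected; use `*` instead if an empty string should count as "not containing ab".

```python
import re

# Compact form: at each position, refuse to start "ab", then consume one char.
compact = re.compile(r"^(?:(?!ab).)+$")

# The same pattern with comments, using re.VERBOSE (whitespace is ignored).
verbose = re.compile(
    r"""
    ^       # match start of string
    (?:     # begin non-capturing group
      (?!   # begin negative lookahead
        ab  # literal text sequence ab
      )     # end negative lookahead
      .     # any single character
    )       # end non-capturing group
    +       # repeat previous match one or more times
    $       # match end of string
    """,
    re.VERBOSE,
)

samples = ["a", "b", "aaa", "ba", "xyz", "ab", "cabd", ""]
results = {s: compact.match(s) is not None for s in samples}
print(results)
```

Strings made only of 'a's, only of 'b's, or containing 'ba' all pass, while anything containing 'ab' fails, which is what the question asks for.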

edited Jun 10, 2009 at 18:20

answered Jun 10, 2009 at 18:11

Peter Boughton

Gibado Over a year ago

Dissecting the regex was very helpful for me. Thank you.

phil294 Over a year ago

...and for replacing it, probably just ^((?:(?!ab).)+)$, with the lookahead inside the repeated group so that every position is checked.

Thiago Mata Over a year ago

A small note: the . in "any single character" only matches within a single line; by default it does not match newlines. If you need this to work on multi-line input, you may need to replace it with (.|\n).
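A Python sketch of this caveat (standard `re` module; the two-line sample strings are made up). It compares the comment's `(.|\n)` rewrite, written here as the non-capturing `(?:.|\n)`, with Python's `re.DOTALL` flag, which makes `.` match newlines as well; other engines call this single-line or `s` mode:

```python
import re

# Default: '.' does not match '\n', so the tempered token cannot step over a
# newline and the anchored match fails even when "ab" never appears.
plain = re.compile(r"^(?:(?!ab).)+$")
# The comment's fix: let each step consume either any character or a newline.
alt_newline = re.compile(r"^(?:(?!ab)(?:.|\n))+$")
# Alternative: keep '.' and enable DOTALL so that '.' also matches '\n'.
dotall = re.compile(r"^(?:(?!ab).)+$", re.DOTALL)

no_ab = "foo\nbar"   # two lines, no "ab"
has_ab = "foo\nbab"  # "ab" on the second line

patterns = {"plain": plain, "alt_newline": alt_newline, "dotall": dotall}
results = {
    name: (rx.match(no_ab) is not None, rx.match(has_ab) is not None)
    for name, rx in patterns.items()
}
print(results)
```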

Nikkorian Over a year ago

Thanks for that, very informative. Having played with it, I think it's worth noting that the 'ab', which you describe as a "literal text sequence" (and which is one in your example), can in fact be a complex regular expression. So if you have a regex that matches some pattern in strings, wrapping it inside '^(?:(?!' and ').)+$' produces a regex that matches strings that do not contain a match for the original regex.
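A minimal Python sketch of this point. The function `not_containing` is a hypothetical helper written for this illustration, not part of any library; it assumes the inner pattern is a valid regex fragment and uses the default mode in which `.` does not match newlines:

```python
import re

def not_containing(pattern: str) -> re.Pattern:
    """Hypothetical helper: compile a regex that matches a whole, non-empty
    string only if no match of `pattern` starts at any character position."""
    return re.compile(r"^(?:(?!" + pattern + r").)+$")

# The inner pattern can be any regex, not only a literal sequence.
no_three_digits = not_containing(r"\d{3}")
no_ab_or_cd = not_containing(r"ab|cd")

results = {
    "a12b34": no_three_digits.match("a12b34") is not None,
    "a123b": no_three_digits.match("a123b") is not None,
    "acbd": no_ab_or_cd.match("acbd") is not None,
    "xcdx": no_ab_or_cd.match("xcdx") is not None,
}
print(results)
```

The alternation needs no extra grouping because the negative lookahead already delimits it.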

KalenGi Over a year ago

The Debug feature of RegexBuddy does a great job of illustrating how the negative lookahead works.

Geremia · Accepted Answer · 2023-09-02 23:44:22Z

238

Use negative lookahead (cf. Regexr.com explanation):

^(?!.*ab).*$
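A Python sketch of this pattern (standard `re` module; the samples are arbitrary). Unlike Peter's `+` version, it ends in `.*`, so it also accepts the empty string. For single-line input it gives the same answer as a plain substring test:

```python
import re

# The lookahead runs once, at the start of the string: if ".*ab" could match
# there ("ab" occurs somewhere), the negative lookahead fails; otherwise ".*"
# consumes the whole string.
anchored = re.compile(r"^(?!.*ab).*$")

samples = ["", "a", "ba", "bbb", "ab", "xxaby"]
results = {s: anchored.match(s) is not None for s in samples}
print(results)
```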

UPDATE: In the comments below, I stated that this approach is slower than the one given in Peter's answer. I've run some tests since then and found that it's actually slightly faster. However, the reason to prefer this technique over the other is not speed, but simplicity.

The other technique, described here as a tempered greedy token, is suitable for more complex problems, like matching delimited text where the delimiters consist of multiple characters (like HTML, as Luke commented below). For the problem described in the question, it's overkill.

For anyone who's interested, I tested with a large chunk of Lorem Ipsum text, counting the number of lines that don't contain the word "quo". These are the regexes I used:

(?m)^(?!.*\bquo\b).+$

(?m)^(?:(?!\bquo\b).)+$

Whether I search for matches in the whole text, or break it up into lines and match them individually, the anchored lookahead consistently outperforms the floating one.
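The two patterns can be checked for agreement in Python (standard `re` module). The five lines of text below are made up for illustration, not the Lorem Ipsum sample from the answer, and this sketch makes no timing claims; it only shows that both patterns select the same lines. "quod" survives because `\b` requires a word boundary after "quo":

```python
import re

# Made-up sample text, one candidate per line.
text = "\n".join([
    "dolor sit amet",
    "quo minus id",
    "quod maxime",
    "neque porro quo",
    "ut enim ad",
])

# (?m) makes ^ and $ match at every line start/end.
anchored = re.compile(r"(?m)^(?!.*\bquo\b).+$")   # one lookahead per line
tempered = re.compile(r"(?m)^(?:(?!\bquo\b).)+$")  # lookahead at every char

anchored_lines = anchored.findall(text)
tempered_lines = tempered.findall(text)
print(anchored_lines)
print(tempered_lines)
```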

edited Sep 2, 2023 at 23:44

Geremia

answered Jun 10, 2009 at 18:10

Alan Moore

Blixt Over a year ago

I believe this is more efficient: (?:(?!ab).)*

Peter Boughton Over a year ago

It also needs start/end anchors to enforce the check on the whole string.
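A Python sketch of why the anchors matter for Blixt's pattern (standard `re` module; the test string is arbitrary):

```python
import re

unanchored = re.compile(r"(?:(?!ab).)*")
anchored = re.compile(r"^(?:(?!ab).)*$")

s = "xxabyy"
# Without anchors, the pattern simply stops before "ab" and reports
# the prefix as a successful match.
prefix = unanchored.match(s).group()
print(repr(prefix))             # 'xx'
# With ^...$ the whole string must pass the check, so there is no match.
print(anchored.match(s))        # None
# re.fullmatch gives the same whole-string behaviour without writing anchors.
print(unanchored.fullmatch(s))  # None
```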

Alan Moore Over a year ago

@Blixt: yes, it is. But it's also harder to read, especially for regex newbies. The one I posted will be efficient enough for most applications.

Peter Boughton Over a year ago

Don't write code aimed at newbies! If code is hard to read, leave comments/documentation so they can learn, instead of using lesser code that keeps them ignorant.

Alan Moore Over a year ago

If I had thought there would be a noticeable difference between the two approaches, I wouldn't have hesitated to recommend the faster one. On the other hand, regexes are so opaque (if not cryptic), I think it's worthwhile to break the knowledge into smaller, more manageable chunks whenever possible.

Abhinav Gupta · Accepted Answer · 2009-06-10 18:16:07Z

75

Yes, it's called negative lookahead. It goes like this: (?!regex here). So abc(?!def) will match abc not followed by def, e.g. the abc in abce, abc, abck, etc.

Similarly there is positive lookahead - (?=regex here). So abc(?=def) will match abc followed by def.

There are also negative and positive lookbehind: (?<!regex here) and (?<=regex here) respectively.
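A Python sketch (standard `re` module) of all four lookaround forms. The `x`/`y` prefixes used for the lookbehind cases are arbitrary choices for this illustration:

```python
import re

neg_ahead = re.compile(r"abc(?!def)")   # abc NOT followed by def
pos_ahead = re.compile(r"abc(?=def)")   # abc followed by def
neg_behind = re.compile(r"(?<!x)abc")   # abc NOT preceded by x
pos_behind = re.compile(r"(?<=x)abc")   # abc preceded by x

def found(rx, s):
    """True if rx matches anywhere in s."""
    return rx.search(s) is not None

results = {
    "neg_ahead": [found(neg_ahead, s) for s in ["abce", "abc", "abcdef"]],
    "pos_ahead": [found(pos_ahead, s) for s in ["abcdef", "abce"]],
    "neg_behind": [found(neg_behind, s) for s in ["yabc", "xabc"]],
    "pos_behind": [found(pos_behind, s) for s in ["xabc", "yabc"]],
}
# The lookaround only checks its text; that text is not part of the match.
matched_text = pos_ahead.search("abcdef").group()
print(results, matched_text)
```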

One point to note is that lookarounds, including negative lookahead, are zero-width. That is, they do not consume any characters.

So it may look like a(?=b)c will match "abc", but it won't. It will match 'a', then the positive lookahead with 'b', but it won't move forward into the string. Then it will try to match 'c' against 'b', which fails. Similarly, ^a(?=b)b$ will match 'ab' and not 'abb', because lookarounds are zero-width (in most regex implementations).
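Both examples can be checked with a short Python sketch (standard `re` module):

```python
import re

# The lookahead sees 'b' but does not consume it, so 'c' must match at the
# same position as 'b', which is impossible.
abc_result = re.search(r"a(?=b)c", "abc")
# Here the lookahead and the literal 'b' examine the same character.
ab_result = re.match(r"^a(?=b)b$", "ab")
abb_result = re.match(r"^a(?=b)b$", "abb")

print(abc_result, ab_result is not None, abb_result)  # None True None
```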

More information on this page

answered Jun 10, 2009 at 18:16

Abhinav Gupta

Leith Over a year ago

Referencing the 'lookbehind' operators as well was useful, not all online regex parsers/documentation will include it, even if it is valid and works.

Guilherme Taffarel Bergamin Over a year ago

In regex101.com, ?! did ignore that group, but it kept matching the rest of the string. Is there a way to use it to exclude the whole line if it contains a specific pattern?

Krabat · Accepted Answer · 2010-11-17 13:10:38Z

6

"abc(?!def) will match abc not followed by def. So it'll match abce, abc, abck, etc." What if I want neither def nor xyz? Will it be abc(?!(def)(xyz))?

I had the same question and found a solution:

abc(?:(?!def))(?:(?!xyz))

These non-capturing groups are combined by "AND", so this should do the trick. Hope it helps.
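A Python sketch (standard `re` module; the sample strings are arbitrary) checking that both lookaheads must succeed at the position right after abc. The single-lookahead form `abc(?!def|xyz)` is a common equivalent added here for comparison; it is not from the answer:

```python
import re

# The answer's version: two non-capturing groups, each holding a lookahead.
grouped = re.compile(r"abc(?:(?!def))(?:(?!xyz))")
# The same two lookaheads written back to back, without the extra groups.
stacked = re.compile(r"abc(?!def)(?!xyz)")
# One lookahead with an alternation inside.
alternation = re.compile(r"abc(?!def|xyz)")

samples = ["abcdef", "abcxyz", "abcxyd", "abc"]
results = {
    s: [rx.search(s) is not None for rx in (grouped, stacked, alternation)]
    for s in samples
}
print(results)
```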

answered Nov 17, 2010 at 13:10

Krabat

Scratte Over a year ago

Where is that quote from? Only part of it comes from this Answer. Apart from that, you haven't answered the Question; you seem to have answered something that you haven't linked to. I think abc(?:(?!def)(?!xyz)) would do. They're in the non-capturing group already, so there's no need to put another one inside it. They're also not "combined by AND". They're checked one at a time, just like ab is first checked for a, then for b, but lookaheads just don't move the cursor along.

Copas · Accepted Answer · 2009-06-10 18:10:19Z

5

Using a regex as you described is the simple way (as far as I am aware). If you want a range you could use [^a-f].

answered Jun 10, 2009 at 18:10

Copas

patjbs · Accepted Answer · 2009-06-10 20:33:35Z

5

In this case I might simply avoid regular expressions altogether and go with something like:

if (StringToTest.IndexOf("ab", StringComparison.Ordinal) < 0)
{
    // "ab" does not occur in StringToTest: do stuff here
}

This is likely also going to be much faster (a quick test vs regexes above showed this method to take about 25% of the time of the regex method). In general, if I know the exact string I'm looking for, I've found regexes are overkill. Since you know you don't want "ab", it's a simple matter to test if the string contains that string, without using regex.

answered Jun 10, 2009 at 20:33

patjbs

Peter Boughton Over a year ago

This is a good point! If the sequence is a simple string then a regex is over-complicating things; a contains/indexOf check is the more sensible option.

Diego Perini · Accepted Answer · 2010-12-08 17:45:47Z

4

Just search for "ab" in the string then negate the result:

!/ab/.test("bamboo"); // true
!/ab/.test("baobab"); // false

It seems easier and should be faster too.
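The same idea in Python (standard `re` module), checked against a pure-regex negative lookahead on the answer's two sample words:

```python
import re

samples = ["bamboo", "baobab"]

# Pull the negation out of the regex: search for "ab", then invert the result.
negated_search = {s: re.search(r"ab", s) is None for s in samples}
# The same answer from a negative lookahead that must cover the whole string.
lookahead = {s: re.fullmatch(r"(?!.*ab).*", s) is not None for s in samples}
print(negated_search, lookahead)
```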

answered Dec 8, 2010 at 17:45

Diego Perini

Amit Joki · Accepted Answer · 2014-07-02 17:12:04Z

4

The simplest way is to pull the negation out of the regular expression entirely:

if (!userName.matches("^([Ss]ys)?admin$")) { ... }
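A Python sketch of the same approach. Java's `String.matches()` must match the entire string, so `re.fullmatch` is the closest Python counterpart, and the `^`/`$` anchors become unnecessary. The helper name `is_not_admin` and the sample user names are hypothetical:

```python
import re

# Whole-string pattern: optional "Sys"/"sys" prefix, then "admin".
admin_pattern = re.compile(r"([Ss]ys)?admin")

def is_not_admin(user_name: str) -> bool:
    """Hypothetical helper: True unless the whole name matches the pattern."""
    return admin_pattern.fullmatch(user_name) is None

names = ["admin", "Sysadmin", "sysadmin", "administrator", "bob"]
results = {u: is_not_admin(u) for u in names}
print(results)
```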

edited Jul 2, 2014 at 17:12

Amit Joki

answered Jun 10, 2009 at 18:16

user71268

Godeke Over a year ago

While this is useful if you are consuming just that expression, as part of a larger expression the negative lookahead method described by Peter allows both positive and negative conditions in a single string.

user71268 Over a year ago

Absolutely true. But the question was to "match a string which does NOT contain a specific sequence of characters". I think for that purpose negative lookahead is overkill.

Jamel Toms Over a year ago

Can't do this if you're using a text editor.

mwieczorek Over a year ago

Not useful if you're using regex outside of a programming language, like Apache or Nginx config....

Wiktor Stribiżew · Accepted Answer · 2020-06-29 09:14:51Z

2

The regex [^ab] will match for example 'ab ab ab ab' but not 'ab', because it will match on the string ' a' or 'b '.

What language/scenario do you have? Can you subtract results from the original set, and just match ab?

If you are using GNU grep, and are parsing input, use the '-v' flag to invert your results, returning all non-matches. Other regex tools also have a 'return nonmatch' function, too.

If I understand correctly, you want everything except for those items which contain 'ab' anywhere.

edited Jun 29, 2020 at 9:14

Wiktor Stribiżew

answered Jun 10, 2009 at 18:13

maxwellb

Scratte Over a year ago

"The regex [^ab] will match for example 'ab ab ab ab' but not 'ab', because it will match on the string ' a' or 'b '.". This seems to be incorrect. [^ab] is a character class that matches everything except a's and b's. Obviously it will match the spaces.
