Discussion:
RegEx: text not inside
davidoff12001@yahoo.de [sed-users]
2014-08-14 21:01:57 UTC
Hello,

I've read a difficult problem:
a RegEx, that matches a string, which doesn't include a 2nd string.
foo-xxx--bar -> no match
foo-xyx--bar -> match
It should match, if "foo.+?bar" doesn't contain "xx", something like
"foo.+(?!xx)bar", but that matches always.

Any idea?

-davidoff12001
Tim Chase sed@thechases.com [sed-users]
2014-08-14 21:48:43 UTC
Post by ***@yahoo.de [sed-users]
a RegEx, that matches a string, which doesn't include a 2nd string.
foo-xxx--bar -> no match
foo-xyx--bar -> match
It should match, if "foo.+?bar" doesn't contain "xx", something like
"foo.+(?!xx)bar", but that matches always.
You mention wanting to match, but not what you want to do with the
match. Just print it? Extract portions? Find lines that match and
then perform some other action?

You also write to the sed-users mailing list but your syntax at the
bottom looks like a Perlish regex, so I want to confirm that you're
using sed, not some other program.

In a Perlish regex, you'd do something like

foo(?:(?!xx).)*bar

which works at least here in Python (whose regex syntax is mostly PCRE-compatible)
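
As a minimal sketch (not part of the original post), the two patterns from this thread can be compared with Python's `re` module. The names `NAIVE`, `TEMPERED`, and `matches` are made up for illustration. The naive pattern succeeds on both samples because `(?!xx)` is only checked at the single position right before "bar"; the second pattern checks it before every character.

```python
import re

# The pattern from the original question: the lookahead is tested only once,
# immediately before "bar", where the next characters are "ba", never "xx".
NAIVE = re.compile(r"foo.+(?!xx)bar")

# The pattern suggested in the reply: the lookahead is tested before
# every character consumed between "foo" and "bar".
TEMPERED = re.compile(r"foo(?:(?!xx).)*bar")


def matches(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ("foo-xxx--bar", "foo-xyx--bar"):
        print(sample, matches(NAIVE, sample), matches(TEMPERED, sample))
```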

-tim
davidoff12001@yahoo.de [sed-users]
2014-08-15 12:35:27 UTC
Post by Tim Chase ***@thechases.com [sed-users]
You also write to the sed-users mailing list but your syntax at the
bottom looks like a Perlish regex, so I want to confirm that you're
using sed, not some other program.
You are right, it's a third-party program that uses PCRE. I thought
they were compatible and asked here because very clever people
write here.
Post by Tim Chase ***@thechases.com [sed-users]
foo(?:(?!xx).)*bar
And my last thought was right. :-) :-) I don't understand it but this
works, great! Thank you very much!!

-davidoff12001
Tim Chase sed@thechases.com [sed-users]
2014-08-15 13:36:27 UTC
Post by ***@yahoo.de [sed-users]
Post by Tim Chase ***@thechases.com [sed-users]
foo(?:(?!xx).)*bar
And my last thought was right. :-) :-) I don't understand it but
this works, great!
It translates as

foo        # the literal "foo"
(?:        # a non-collecting group
  (?!xx)   # assert that "xx" doesn't match here
  .        # any single character
)          # the end of the non-collecting group
*          # that "any character where 'xx' doesn't match"
           # zero or more times
bar        # the literal "bar"
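
The annotated breakdown above can be kept in runnable form. This is a sketch that assumes Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and treats `#` as the start of a comment; the name `PATTERN` is illustrative only.

```python
import re

# Same regex as foo(?:(?!xx).)*bar, written out with inline comments.
PATTERN = re.compile(
    r"""
    foo         # the literal "foo"
    (?:         # start of a non-capturing group
      (?!xx)    # assert that "xx" does not start at this position
      .         # any single character
    )*          # end of the group, repeated zero or more times
    bar         # the literal "bar"
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    print(PATTERN.search("foo-xyx--bar"))
    print(PATTERN.search("foo-xxx--bar"))
```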

-tim
davidoff12001@yahoo.de [sed-users]
2014-08-15 15:34:38 UTC
Post by Tim Chase ***@thechases.com [sed-users]
(?: # a non-collecting group
Thanks for the explanation. I've rarely used these groups and had
forgotten them.
Post by Tim Chase ***@thechases.com [sed-users]
. # any single character
) # the end of the non-collecting group
* # that "any character where 'xx' doesn't match"
Nice idea.

-davidoff12001

Davide Brini dave_br@gmx.com [sed-users]
2014-08-14 22:04:51 UTC
Post by ***@yahoo.de [sed-users]
Hello,
a RegEx, that matches a string, which doesn't include a 2nd string.
foo-xxx--bar -> no match
foo-xyx--bar -> match
It should match, if "foo.+?bar" doesn't contain "xx", something like
"foo.+(?!xx)bar", but that matches always.
The best you can do in sed is to check that there is a match for (as an
example) foo-...-bar and if so, check that there is no match for whatever
you want to exclude.
ISTR that there are plans to add PCRE support to GNU sed, so at some
point you may be able to do it with a single regex in sed too.

But for the time being, use perl. Seriously.
--
D.
Jim Hill gjthill@gmail.com [sed-users]
2014-08-15 01:21:35 UTC
That's pushing regex limits, but it's easily within sed's:

/foo.*bar/!d
/foo.*xx.*bar/d
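
For readers without sed at hand, the same two-step filter can be sketched in Python: keep a line only if it matches `foo.*bar` and does not match `foo.*xx.*bar`. The function names `keep_line` and `filter_lines` are hypothetical and not from the thread.

```python
import re

# Step 1: the line must contain "foo" followed somewhere by "bar".
HAS_SPAN = re.compile(r"foo.*bar")
# Step 2: the line must not contain "xx" somewhere between a "foo" and a "bar".
HAS_EXCLUDED = re.compile(r"foo.*xx.*bar")


def keep_line(line):
    """Mirror the two sed commands: /foo.*bar/!d then /foo.*xx.*bar/d."""
    return bool(HAS_SPAN.search(line)) and not HAS_EXCLUDED.search(line)


def filter_lines(lines):
    """Return only the lines that survive both checks."""
    return [line for line in lines if keep_line(line)]


if __name__ == "__main__":
    print(filter_lines(["foo-xxx--bar", "foo-xyx--bar", "nothing here"]))
```

Note that this is a per-line test, so it is not identical to the single lookahead regex: a line such as "foo-xx-bar foo-bar" is dropped here, while `foo(?:(?!xx).)*bar` would still match its second "foo-bar".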


Daniel Goldman dgoldman@ehdp.com [sed-users]
2014-08-15 01:59:00 UTC
Not quite sure what you want to match (not match). The regex also seems
a bit odd to me. To simplify things, it would help if you would give
concrete test cases, like:


Match:
........
........
........
........


No Match:
........
........
........
........


Then someone can make a test file and propose a solution.


Daniel
Post by ***@yahoo.de [sed-users]
Hello,
a RegEx, that matches a string, which doesn't include a 2nd string.
foo-xxx--bar -> no match
foo-xyx--bar -> match
It should match, if "foo.+?bar" doesn't contain "xx", something like
"foo.+(?!xx)bar", but that matches always.
Any idea?
-davidoff12001