Discussion:
help with sed-friendly regex with negative look-ahead syntax for sendmail
Rob Kudyba rkudyba@yahoo.com [sed-users]
2017-02-23 15:51:31 UTC
Permalink
The regex below (the left side in bold) does not seem to work in sendmail's R & K commands. I'm trying to allow emails from cities and schools that have email addresses that look like ***@ci.boston.us or ***@aschool.ny.us. It seems that negative look-aheads are not supported. Would anyone have any idea how to express this in sed-friendly syntax? Note that the right side (unbolded) works fine by itself.
(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ny.us$)[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)


Tim Chase sed@thechases.com [sed-users]
2017-02-23 19:35:35 UTC
Permalink
Post by Rob Kudyba ***@yahoo.com [sed-users]
So the below regex (the left side in bold) does not seem to work in
sendmail's R & K commands. I’m trying to allow emails from cities
negative look-aheads are not supported. Would anyone have any idea
on how to make this sed-friendly syntax? Note that the right side
(unbolded) works fine by itself.
Your message came through as plain-text (and the mailing list
apparently stripped any HTML mime-part) which doesn't show bold.
Could you provide parallel examples of both the working and
non-working regexps?

-tim
Rob Kudyba rkudyba@yahoo.com [sed-users]
2017-02-23 19:43:55 UTC
Permalink
Post by Tim Chase ***@thechases.com [sed-users]
Your message came through as plain-text (and the mailing list
apparently stripped any HTML mime-part) which doesn't show bold.
Could you provide parallel examples of both the working and
non-working regexps?
This by itself works:
[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)

I'm just trying to preface that with:
(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ny.us$)

Or just:
(?!.*\@ci\..+?\.us$)

So the entire regex I have that works on regex testers is:
(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ny.us$)[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)

But it appears I have to convert PCRE syntax to sed-friendly syntax.
Rob Kudyba rkudyba@yahoo.com [sed-users]
2017-02-23 19:45:46 UTC
Permalink
I'll try copy/paste as plain text:

This by itself works:
[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)

I'm just trying to preface that with:
(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ny.us$)

Or just:
(?!.*\@ci\..+?\.us$)

So the entire regex I have that works on regex testers is:
(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ny.us$)[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)


Daniel Goldman dgoldman@ehdp.com [sed-users]
2017-02-23 20:13:49 UTC
Permalink
It would help if you provided a test input file and the output you
want. Otherwise, at least for me, I'm kind of guessing about what you
are trying to do, and can't know for sure that a suggested solution does
all it needs to. Daniel
Rob Kudyba rkudyba@yahoo.com [sed-users]
2017-02-23 20:26:09 UTC
Permalink
If you look at the guide at http://www.xiitec.com/blog/2009/02/25/using-regular-expressions-in-sendmail/, the goal is to reject emails from certain top-level domains, such as .info, .us, and .bid, but only if they have a subdomain. So:
good: ***@nicedomain.us, so *@*.us
bad: ***@subdomain.notnicedomain.us, so *@*.*.us
This regex works:
[a-zA-Z_0-9.-]+\@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|bid)
I'm trying to prepend to the above regex a negative look-ahead that would, in essence, ignore emails like:
1) ***@ci.anydomain.us, so *@ci.*.us
2) ***@anything.state.us (where state is one of the 50 states or DC), so *@*.ny.us (ny or any other state, that is)
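
The rule described in this message can be written down as a small table of cases. The sketch below uses Python's `re` module only because it supports negative look-ahead and can be run directly; it is not sendmail or sed syntax. The helper name `is_rejected` and the sample addresses are made up for illustration, and the second look-ahead is written as `@.+\.ny\.us$`, which is an assumed reading of the `*@*.ny.us` intent.

```python
import re

# Look-ahead version of the rule (PCRE-style; POSIX regexes cannot express this).
# Only "ny" is shown; other state codes would need to be added the same way.
REJECT = re.compile(
    r"(?!.*@ci\..+?\.us$)"    # exception: city addresses, *@ci.*.us
    r"(?!.*@.+\.ny\.us$)"     # exception: state addresses, *@*.ny.us (assumed reading)
    r"[a-zA-Z_0-9.-]+@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|bid)"
)


def is_rejected(address: str) -> bool:
    """Return True when the address matches the reject pattern."""
    return REJECT.match(address) is not None


# Hypothetical sample addresses, one per case named in the message.
SAMPLES = [
    "user@nicedomain.us",               # no subdomain: should be accepted
    "user@subdomain.notnicedomain.us",  # subdomain under .us: should be rejected
    "clerk@ci.boston.us",               # city exception: should be accepted
    "teacher@aschool.ny.us",            # state exception: should be accepted
]

for address in SAMPLES:
    print(address, "reject" if is_rejected(address) else "accept")
```

A table like this doubles as the test input and expected output for any look-ahead-free rewrite.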

Daniel Goldman dgoldman@ehdp.com [sed-users]
2017-02-23 22:44:46 UTC
Permalink
Maybe others will help. I will not help without a test input file, with
the required output. That's the only way I can know that a solution is
correct. Otherwise, I'm just guessing and wasting time.

I think you need a test file, too, to verify any proposed solution.

You know exactly which emails to reject. So I think you could easily
create the test input and output files, to take care of all cases.

I'm not trying to be difficult. If you want to be effective using sed,
you need to make good test files. In any case, thanks for posting.
Cameron Simpson cs@zip.com.au [sed-users]
2017-02-24 20:42:12 UTC
Permalink
Post by Rob Kudyba ***@yahoo.com [sed-users]
I'm trying to include/append to the beginning of the above regex, a negative
look-ahead that would, in essence, ignore emails like:1)
The simplest thing would be to reject them in a separate prior step.

In sed, something like:

/\.info$/d
/\.us$/d
/things-you-would-otherwise-accept/{
do whatever with those
}

Don't put it all in one regexp.

Cheers,
Cameron Simpson <***@zip.com.au>
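
The "separate prior step" idea carries over to any regex engine without look-ahead: test the exceptions first with plain patterns, then apply the pattern that already works. A minimal sketch in Python, chosen only so it can be run; the names `ALLOW`, `REJECT`, and `verdict` and the sample addresses are hypothetical, and only `ny` stands in for the state codes.

```python
import re

# Stage 1: exceptions that must never be rejected (plain patterns, no look-ahead).
ALLOW = [
    re.compile(r"@ci\..+\.us$"),   # city addresses, *@ci.*.us
    re.compile(r"@.+\.ny\.us$"),   # state addresses, *@*.ny.us (only "ny" shown)
]

# Stage 2: the reject pattern from the thread, with the lazy quantifiers
# written as greedy ones (this does not change which addresses match).
REJECT = re.compile(
    r"[a-zA-Z_0-9.-]+@[a-zA-Z_0-9-]+\.+[a-zA-Z_0-9.-]+\.(us|info|bid)"
)


def verdict(address: str) -> str:
    """Apply the exceptions first, then the reject pattern."""
    if any(pattern.search(address) for pattern in ALLOW):
        return "accept"
    if REJECT.match(address):
        return "reject"
    return "accept"


for address in ("clerk@ci.boston.us", "user@subdomain.notnicedomain.us"):
    print(address, verdict(address))
```

Keeping the two stages apart also means a new exception is one more entry in the list rather than another clause in a long expression.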
sharma__r@hotmail.com [sed-users]
2017-02-25 08:41:46 UTC
Permalink
POSIX "sed" not only lacks lookarounds but also uses a different notation for quantifiers.

The way I look at lookarounds is that they are positional: they constrain your existing regexes so that they look at things with a finer focus.

With that rough idea we can rework your negative look-ahead regex in regular sed syntax as follows. Start from the part that already works:

/[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)/

We'll break it up and look at the individual portions from left to right:

[a-zA-Z_0-9.-]+ => [a-zA-Z_0-9.-]\{1,\}
< => <
@ => @
[a-zA-Z_0-9-]+? => [a-zA-Z_0-9-]\{1,\}
\.+ => \.\{1,\}
[a-zA-Z_0-9.-]+? => [a-zA-Z_0-9.-]\{1,\}
\.(us|info|to|br|bid|cn|ru) => \.\(us\|info\|to\|br\|bid\|cn\|ru\)

(There is no lazy +? in POSIX regexes; for a plain match/no-match test the greedy form accepts the same lines.)

So, as a first pass, the regex becomes (despite how it looks, the below is all on one line and there are NO spaces at all):

/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.\(us\|info\|to\|br\|bid\|cn\|ru\)/
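
The piece-by-piece translation can also be sketched mechanically. The function below is a hypothetical helper, `ere_to_bre`, that covers only the constructs used in this thread: `+` (greedy or lazy), grouping parentheses, and `|`. It keeps a `\{1,\}` quantifier for every `+`, including the lazy `+?` ones, on the assumption that only match/no-match matters. The `\|` it emits is a GNU extension, not POSIX.

```python
def ere_to_bre(pattern: str) -> str:
    """Translate a small ERE/PCRE subset to GNU sed basic-regex notation.

    Handles only '+', '+?', '(', ')' and '|'. Backslash escapes and simple
    bracket expressions are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Escaped character such as \. : copy both characters as-is.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Bracket expression: copy verbatim up to the closing bracket.
            end = pattern.index("]", i + 2)
            out.append(pattern[i:end + 1])
            i = end + 1
        elif ch == "+":
            # "One or more" becomes an interval expression.
            out.append(r"\{1,\}")
            i += 1
            if i < len(pattern) and pattern[i] == "?":
                i += 1  # drop the lazy marker, which BRE cannot express
        elif ch in "()|":
            # Grouping and alternation need a backslash in BRE.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(ere_to_bre(r"[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)"))
```

This is not a general converter: `*`, `?`, `{m,n}`, anchors inside groups, and bracket expressions containing `]` are outside what it handles.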





Now we have to constrain it some more so that it restricts only somewhat, akin to tightening a water tap without closing it off completely:

/regex_above/{
/@.\{1,\}\.ny\.us$/d
/@ci\..\{1,\}\.us$/d
# add some more here per taste
## whatever reaches this point has effectively passed the negative look-ahead
}

Keep in mind that /regex_above/ is for GNU sed and not POSIX sed.


For POSIX sed you would need to break it up piece by piece, like so:

/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.us/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.info/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.to/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.br/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.bid/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.cn/bA
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.ru/bA
d
#
:A
/@.\{1,\}\.ny\.us$/d
/@ci\..\{1,\}\.us$/d
## add more negative look-ahead constraints here
## whatever reaches this point has effectively passed the negative look-ahead

NOTE: I did not verify the regexes themselves, which are based on what you gave here. Some of them look suspect (this could be due to mailer issues), so you need to take a careful, closer look at them.

HTH
Rob Kudyba rkudyba@yahoo.com [sed-users]
2017-02-27 14:47:24 UTC
Permalink
Hm, no matter what I try, sendmail replies with:
/etc/mail/sendmail.cf: line 200: unknown configuration line "/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\..\{1,\}[a-zA-Z_0-9.-]\.(us\|info\|to\|br\|bid\|cn\|ru\)/{"
Perhaps it's not exactly sed syntax for sendmail?


Daniel Goldman dgoldman@ehdp.com [sed-users]
2017-02-27 18:18:14 UTC
Permalink
If you are asking a sendmail question (are you?), this may not be the
best place to get help.

If you want to transform text (do you?), using sed or whatever is best
suited, this group is a good place to get help, especially if you can
post a test input file, and the desired output.

Yes, different software packages have subtle differences in their
regular expression syntax.

Daniel
sharma__r@hotmail.com [sed-users]
2017-03-02 08:10:03 UTC
Permalink
That is because you are using GNU sed's enhanced syntax, us\|info\|....,
which "sendmail" will not like.

You'll have to break it up piece by piece, like so:

/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.us/bOK
/[a-zA-Z_0-9.-]\{1,\}<@[a-zA-Z_0-9-]\{1,\}\.\{1,\}[a-zA-Z_0-9.-]\{1,\}\.info/bOK
....

d
:OK
# do whatever you wanted to do with the .us|.info|.... mails here

rkudyba@yahoo.com [sed-users]
2017-03-02 15:39:31 UTC
Permalink
Hi Sharma,

But I don't have to break it up piece by piece as this works by itself:


[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)


It's the negative lookahead that I can't figure out. This tutorial says POSIX should work, and perhaps I have to use one of the switches like -s or -n: http://etutorials.org/Server+Administration/Sendmail/Part+III+The+Configuration+File/Chapter+23.+The+K+Database-Map+Configuration+Command/regex/


But this might fall out of the scope of this group as it really may be a sendmail question.

