mutt+sed - deleting line ranges

Discussion:

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-19 01:06:39 UTC

so my mailer is "mutt" (www.mutt.org) - and it
has this nifty feature of a "display filter",
ie messages get filtered through this program before the
resulting text is shown, so i can change and delete text.
changing allows me to correct those misspelled words and
deleting allows for hiding signature and maillist footers.

i had been using "sed" as my filter for many years now.
the setup for mutt is as easy as this line:

set display_filter="/bin/sed -f /home/user/guckes/.mutt/df.sed"

so.. it's just a simple "set option=value" command.

getting rid of signatures which are delimited by
the sigdashes line (dash-dash-space) is dead easy:

/-- /,$d

since that change by yahoo groups my From: line looks like this:

From: "Sven Guckes maillists-***@guckes.net [sed-users]"

fixing this to "name <address>" is easy within sed:

/^From: .*sed-users/s:"\(.*\) \([^ ]*@[^ ]*\) \[sed-users\]".*:\1 <\2>:
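
A quick way to check the From: rewrite outside of mutt. This is only a sketch: the function name fix_from is made up for the test, and the capture groups are written as \( ... \), which is the usual basic-regex spelling.

```shell
# fix_from is a hypothetical helper name; it wraps the substitution from
# the message, with the capture groups written as \( ... \).
fix_from() {
  sed '/^From: .*sed-users/s:"\(.*\) \([^ ]*@[^ ]*\) \[sed-users\]".*:\1 <\2>:'
}

# Feed one sample header line through the filter.
printf '%s\n' 'From: "Sven Guckes maillists-yahoo@guckes.net [sed-users]"' | fix_from
# prints: From: Sven Guckes <maillists-yahoo@guckes.net>
```

Lines that do not start with "From: " or do not mention sed-users pass through unchanged.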

but i should also get rid of seeing those footers now..
example:

------------------------------------
Posted by: Sven Guckes <maillists-***@guckes.net>
------------------------------------

deleting this in the vim editor (www.vim.org) is easy:

:/Posted by/-1,/Posted by/+1d

that's because you can do easy arithmetic
on the pattern to get a new line address:

/pattern/-1,/pattern/+1

but how do you do arithmetic with lines in sed again?
how to tell sed to "find the matching line, go back one,
and then delete three (current and next two) lines"?

i am sure the hold and pattern spaces are to be
used here - but i never quite got the hang of it. :-/
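
One way the hold space can be put to work, shown on the paragraph case: collect each blank-line-separated paragraph in the hold space, then swap it into the pattern space and drop it if it contains the pattern. A rough sketch only; the function name is invented and the pattern "Posted by:" is hard-wired for brevity.

```shell
# drop_posted_paragraphs is a hypothetical name.
# /./{H;$!d;}  : append every non-empty line to the hold space; unless it is
#                the last line, delete it and start the next cycle.
# x            : on a blank line (or the last line) swap in the collected paragraph.
# /Posted by:/d: drop the paragraph if it contains the pattern.
drop_posted_paragraphs() {
  sed -e '/./{H;$!d;}' -e 'x;/Posted by:/d'
}

printf '%s\n' 'keep one' 'keep two' '' 'drop me' 'Posted by: x' '' 'tail' |
  drop_posted_paragraphs
```

Each surviving paragraph comes out with a leading empty line, because H always puts a newline in front of what it appends. Whether that matters depends on the filter.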

maybe this can even be extended to
"delete this paragraph if it contains pattern" or
"delete from line1 to line2 if it contains pattern"?

incidentally - is there some "visualizer"
to show what sed is currently doing?
you know, debugger style and all. ;)

Sven

Daniel Goldman dgoldman@ehdp.com [sed-users]

2016-01-19 04:42:29 UTC

A minor point, but assuming the dash-dash-space is a line by itself (is
that the case?), you might use /^-- $/,$d instead.
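
To see why the anchors matter, here is a small comparison of the two addresses on a message whose body happens to contain "-- " in the middle of a line. The helper names and the sample text are made up for illustration.

```shell
# Hypothetical helper names for the two variants discussed above.
strip_sig_loose()  { sed '/-- /,$d'; }     # matches "-- " anywhere in a line
strip_sig_strict() { sed '/^-- $/,$d'; }   # matches only a line that is exactly "-- "

# Sample message: the first body line contains "-- " but is not a sig separator.
sample() {
  printf '%s\n' 'use "rm -- -file" here' 'more text' '-- ' 'Sven'
}

sample | strip_sig_loose    # prints nothing: line 1 already triggers the deletion
sample | strip_sig_strict   # prints the two body lines, drops the signature
```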

The only "visualizer" I know of is Aurelio Jargas' sedsed debugger.

"ex scripts" might help with the task, or might just be interesting for
anyone not familiar with them. Hardly anybody does ex scripts anymore.
However, they are very useful, and kind of related to sed scripts.

Anybody good at vi ex mode could be good at using ex scripts.

Of course, "ex scripts" are NOT filters. ex always edits the file in
place. That is probably a deal-breaker for you. Perhaps an ex script
could somehow be embedded within a shell script to serve as a filter. Or
perhaps mutt has another option that would allow ex to be used. Anyway,
here's how ex would get rid of the lines, using the same vi syntax you
used (since the syntax is just vi ex mode):

========================

$ cat message.txt
Hello, Dan!!!

------------------------------------
Posted by: Sven Guckes <***@experts.sed>
------------------------------------

$ cat script.txt
/^Posted by:/-1,//+1d
wq

$ cp message.txt temp.txt
$ cat script.txt | ex -s temp.txt
$ cat temp.txt
Hello, Dan!!!

========================

As you say, the sed hold space can be used for the same end. It's
arguably more tricky / tedious to program / maintain. awk is another
possibility: the previous line can be saved to a variable, and printed
if current line does not match the pattern.
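
A minimal sketch of the awk idea, with one assumption: instead of a single saved "previous line" it keeps a sliding window of three lines, so the dashes line after "Posted by:" can be checked as well. The function name and the patterns are illustrative.

```shell
# strip_footer_awk is a hypothetical name. A line is printed only once it
# is known not to be the first line of a three-line footer.
strip_footer_awk() {
  awk '
    { buf[++n] = $0 }                      # append the current line to the window
    n == 3 {
      if (buf[1] ~ /^--*$/ && buf[2] ~ /^Posted by:/ && buf[3] ~ /^--*$/) {
        n = 0                              # whole footer matched: discard it
      } else {
        print buf[1]                       # oldest line is safe to emit
        buf[1] = buf[2]; buf[2] = buf[3]; n = 2
      }
    }
    END { for (i = 1; i <= n; i++) print buf[i] }   # flush what is left
  '
}

printf '%s\n' 'Hello, Dan!!!' '----' 'Posted by: Sven' '----' | strip_footer_awk
# prints: Hello, Dan!!!
```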

Daniel

Cameron Simpson cs@zip.com.au [sed-users]

2016-01-19 05:07:03 UTC

Post by Daniel Goldman ***@ehdp.com [sed-users]
A minor point, but assuming the dash-dash-space is a line by itself (is
that the case?), you might use /^-- $/,$d instead.

Yes, signatures are supposed to be introduced by a "-- " line. Particularly so
people like Sven can delete them:-)

Cheers,
Cameron Simpson <***@zip.com.au>

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-19 16:42:24 UTC

Post by Daniel Goldman ***@ehdp.com [sed-users]
A minor point, but assuming the dash-dash-space is a line by
itself (is that the case?), you might use /^-- $/,$d instead.

oops.. yes - i simply had forgotten to type the BOL/EOL anchors.
so, indeed, "/^-- $/,$d" is what i use in my display_filter.

Post by Daniel Goldman ***@ehdp.com [sed-users]
The only "visualizer" I know of is Aurelio Jargas' sedsed debugger.

why - Aurelio's debugger, of course! :-)

but i never really worked with it.
is anyone using it? can you please
share some examples with screenshots?

Post by Daniel Goldman ***@ehdp.com [sed-users]
"ex scripts" might help with the task, or might just be interesting for
anyone not familiar with them. Hardly anybody does ex scripts anymore.
However, they are very useful, and kind of related to sed scripts.
Anybody good at vi ex mode could be good at using ex scripts.
Of course, "ex scripts" are NOT filters. ex always edits
the file in place. That is probably a deal-breaker for you.

the in-place-editing is a deal-breaker, indeed.
reading those mails certainly requires a filter.

once i am replying to a message i am within vim.
and i do use a lot of commands+macros then.
but i really do not want to change using
sed as my display_filter in mutt.

Post by Daniel Goldman ***@ehdp.com [sed-users]
Perhaps an ex script could somehow be embedded
within a shell script to serve as a filter.
Or perhaps mutt has another option that would allow ex to be used.

you can practically use *anything* as the display_filter. :-)
set display_filter="~/bin/script"

but i do not want to rewrite all that sed does for me in "ex". ;)

Post by Daniel Goldman ***@ehdp.com [sed-users]
As you say, the sed hold space can be used for the same end.
It's arguably more tricky / tedious to program / maintain.
awk is another possibility: the previous line can be saved to a
variable, and printed if current line does not match the pattern.

i have been thinking about using awk for a long time now..
but most things so far were simply changes on one line -
so there hasnt been any good reason for this yet.
besides - sed is really really FAST! :)

but once i get the feeling for using hold+pattern space more
maybe i can also tackle those problems i had described
which concern conditions over multiple lines.

i'd describe these kind of problems like this:
RE1...RE2...RE3

the patterns RE1 and RE3 give the "frame" -
and RE2 comes somewhere in between -
hence the dots to describe that.

the basic idea is to require that given frame
for making changes/deletions to pattern RE2.
read this as "make changes/deletions to RE2
only iff it is surrounded by RE1 and RE3".

one of these is deleting those extra "signatures":

------------------------------------
Posted by: ...
------------------------------------

here, the patterns are like these:
RE1="^-{37}$"
RE2="^Posted by:"
RE3="^-{37}$"

a possible notation:
|RE1|RE2|RE3|/RE2/d

so "/RE2/d" simply deletes all lines
within the RE1,RE3 block matching RE2.
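
In plain sed the notation above comes out roughly as a range address with a nested command. A sketch, using '^--*$' for the dashes lines (an assumption, any run of dashes) and an invented function name:

```shell
# delete_re2_in_frame is a hypothetical name.
# /RE1/,/RE3/ selects the frame; inside it, lines matching RE2 are deleted.
delete_re2_in_frame() {
  sed '/^--*$/,/^--*$/{
/^Posted by:/d
}'
}

printf '%s\n' 'Hello' '----' 'Posted by: x' '----' 'Posted by: outside' |
  delete_re2_in_frame
# prints: Hello, the two dashes lines, and "Posted by: outside"
```

Note the limits: the frame lines themselves stay, and sed does not look ahead, so if RE3 never shows up the range simply runs to the end of the input.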

it is pretty obvious that such constructs can
become arbitrarily large in the amount of data they span.
however, maybe they can be restricted in some way?
something like "test for max N lines"?

maybe this will look pretty weird in sed,
and can be described more easily in awk.
a nice syntax would certainly be a
reason to switch to awk then. ;)

so.. if you have any ideas - i am really
looking forward to some ideas handling
those RE1...RE2...RE3 problems. :)

Sven

dgoldman@ehdp.com [sed-users]

2016-01-19 20:18:05 UTC

I use sedsed sometimes to verify logic, to follow what sed is doing. sedsed is not perfect, sometimes it gets confused (I forget the exact circumstances). However, it is usually helpful.

I would not call sedsed a "visualizer". To me, it seems like a debugger. It shows various intermediate steps and buffer contents. I like it.

If Aurelio is on the group, it might be better for him to explain sedsed, since he created it and knows the most about it.

If we do not hear from Aurelio in a while, or nobody else jumps in the meantime, I'll provide some examples in a week or so.

Daniel

[Non-text portions of this message have been removed]

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-19 21:25:55 UTC

Post by ***@ehdp.com [sed-users]
I use sedsed sometimes to verify logic, to follow what sed is doing.
sedsed is not perfect, sometimes it gets confused (I forget
the exact circumstances). However, it is usually helpful.
I would not call sedsed a "visualizer". To me, it seems like a debugger.
It shows various intermediate steps and buffer contents. I like it.
If Aurelio is on the group, it might be better for him to explain
sedsed, since he created it and knows the most about it.

i just realized that i had not looked at
Aurelio's website for quite some time.

it's one of those big sites with a TON of info
which made me want to learn yet another language -
in this case LANG=pt_BR (brazilian portuguese).

here is my short info:

Aurelio Jargas - aurelio.net - sed, vim - and more! LANG=pt_BR
sed: http://aurelio.net/sed/
http://aurelio.net/projects/sedsed/
http://aurelio.net/projects/sedsokoban/
http://sed.sf.net [since 2002] "the sed $HOME"
https://github.com/aureliojargas/sed.sf.net
shell: http://aurelio.net/shell/ cygwin; awk bash grep sed
vim: http://aurelio.net/vim/

Post by ***@ehdp.com [sed-users]
If we do not hear from Aurelio in a while, or nobody else jumps
in the meantime, I'll provide some examples in a week or so.

examples are *always* welcome! :)

Aurelio has given a few right there:
http://aurelio.net/projects/sedsed/

Sven

dgoldman@ehdp.com [sed-users]

2016-01-19 21:40:03 UTC

Yes, Aurelio has a great website with high quality software. Yes, the sedsed examples on the website are good.

sedsed is not that complicated, which is great. Basically, instead of saying "sed ..." one says "sedsed -d ...". To my understanding, that is about all required. There is not much else to it, other than the ability to suppress parts of the debugging output to reduce clutter.

The debugging output lets one follow the logic of what sed (or I guess the python-based sed that Aurelio wrote?) is doing.

Daniel

dgoldman@ehdp.com [sed-users]

2016-01-19 22:18:02 UTC

Sven,

Here's a way (I think) to include the filter in a shell wrapper (filter.sh file). In this case, I think using a wrapper is better than solely relying upon sed, because it gives you a lot more flexibility to use other tools (tr, grep, awk, ex, etc).

$ cat input.txt
Hello, Dan!!!
------------------------------------
Posted by: Sven Guckes <***@...>
------------------------------------

$ cat sed.txt
s/Hello/Goodbye/g

$ cat ex.txt
/^Posted by:/-1,//+1d
wq

$ cat filter.sh
#!/bin/bash

TMP_FILE=$(mktemp) || exit 1 # Unpredictable temp file name
trap 'rm -f "$TMP_FILE"' EXIT # No clutter (signal 9, KILL, cannot be trapped)

cat > "$TMP_FILE" # Filter in

sed -i -f sed.txt "$TMP_FILE" # Modify content with sed (-i as in GNU sed)
cat ex.txt | ex -s "$TMP_FILE" # Modify content with ex
# Modify content with awk, or whatever

cat "$TMP_FILE" # Filter out

# EOF

$ cat input.txt | filter.sh
Goodbye, Dan!!!

Using a wrapper, and especially using ex, will not be as fast as using pure sed, as you point out. However, in practice the difference is usually not perceptible. I use sed thousands of times most days, and write new sed code just about every day, but almost always within a shell or C wrapper. Of course, I also have an e3-1245 cpu. If it ends up too slow for you, then you might need to do something different.

I rarely use awk. It is surprisingly powerful and clean. However, I just never need awk. I use "sed / tr / grep / cut / paste / etc." for simple operations, shell scripts for wrappers and moderately complex operations, and C for wrappers and greatly complex operations. Nothing wrong with awk. I just don't currently need it.

I hope this helps some. Just opinions based on my experience.

Daniel

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-19 22:58:30 UTC

thanks, Dan!

so.. a tempfile to work on with all tools within
a shell script - neat! i need to test this a bit..

$ cat ex.txt
/^Posted by:/-1,//+1d
wq
cat ex.txt | ex -s $TMP_FILE # Modify content with ex

i like using the arithmetic of ex (vi;vim), of course.

but how to tackle this "RE1...RE2...RE3" thing?
selecting a range like "/RE1/,/RE3/" is easy -
but how to ensure that "RE2" is in between?
anyone here with some knowledge on awk?
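
One possible sed-only take on "RE1...RE2...RE3", as a sketch: when RE1 matches, keep appending lines to the pattern space until RE3 shows up, and only then test for RE2. The function name is made up, and '^--*$' stands in for both frame patterns.

```shell
# delete_frame_if_posted is a hypothetical name.
delete_frame_if_posted() {
  sed '
/^--*$/{
  :collect
  /\n--*$/!{
    $!{
      N
      b collect
    }
  }
  # Frame is closed (RE3 seen): delete the whole block only if RE2 is inside.
  /\n--*$/{
    /\nPosted by:/d
  }
}'
}

printf '%s\n' 'Hello' '----' 'Posted by: x' '----' 'mid' \
  '----' 'just text' '----' 'Bye' | delete_frame_if_posted
# prints: Hello, mid, ----, just text, ----, Bye (one per line)
```

There is no "max N lines" limit in this sketch: an opening dashes line without a closing one collects everything up to the end of the input and then prints it unchanged.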

Sven

sharma__r@hotmail.com [sed-users]

2016-01-20 13:08:30 UTC

Post by Sven Guckes maillists-***@guckes.net [sed-users]
so.. if you have any ideas - i am really
looking forward to some ideas handling
those RE1...RE2...RE3 problems. :)
Sven

One straightforward way for this is using the sed's range operator.

BEGIN='^--*$'

END='^--*$'
MID='Posted by[:]'

sed -e "
/$BEGIN/,/$END/!d ; # skip non-interesting portion
/$MID/{
# this mid always occurs inside of begin-end portion
s/$/ this is a comment on the mid-portion/ ; # YMMV
}
"

HTH
-Rakesh

sharma__r@hotmail.com [sed-users]

2016-01-19 06:17:35 UTC

One way to accomplish the pruning can be the following:

sed -e '
/^[-][-]*$/{
$q;N;$q;N
/\nPosted by[:] Sven Guckes <maillists-***@guckes[.]net>\n[-][-]*$/d
}

'

YMMV in case of the regex for the "Posted by" line.

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-19 23:11:43 UTC

Post by ***@hotmail.com [sed-users]
sed -e '
/^[-][-]*$/{
$q;N;$q;N
}
'
YMMV in case of the regex for the "Posted by" line.

well, i got this now in my display_filter sed file:

# by Rakesh Sharma ***@hotmail.com
/^--*$/{
$q;N;$q;N
/\nPosted by:.*\n--*$/d
}

that works! now i wont have to see those lines
in sed-users mails until i reply to them, ;)
thanks, Rakesh! :)

now to understand it all.. "$q"?

my biggest gripe with sed syntax is that
you have to use literal line breaks.
so much about "one-liners". *sigh*

ps: how does yahoogroups staff actually
call that "--- Posted by: ---" thing?

Sven

dgoldman@ehdp.com [sed-users]

2016-01-19 23:25:55 UTC

$q means "quit if last line of file".

Here is a similar, simpler way that I think works, even as a somewhat long one-liner:

$ cat input.txt
Hello, Dan!!!
------------------------------------
Posted by: Sven Guckes <***@...>
------------------------------------

$ cat sed.txt
/^-----*$/ {N; /Posted by:/ {N; /------$/ d } }

$ sed -f sed.txt input.txt
Hello, Dan!!!

Daniel

Sven Guckes maillists-yahoo@guckes.net [sed-users]

2016-01-20 00:19:13 UTC

Post by ***@ehdp.com [sed-users]
$q means "quit if last line of file".

ah! :)

Post by ***@ehdp.com [sed-users]
Here is a similar, simpler way that I think
/^-----*$/ {N; /Posted by:/ {N; /------$/ d } }

yay! :)

a one-liner right there. no need to change
to another tool or scripting. happy again. :)

thank you all for contributing!

so my sed script changes+deletes text -
and i got a lot of text colouring for mutt,
and also colouring for text in vim
as well as many commands to fix stuff.

now on to finally adjusting some things
to asciidoc or markdown or pandoc - and
finally put it all together in a book
which is small and yet readable on phones.
stay tuned! :)

Sven

sharma__r@hotmail.com [sed-users]

2016-01-20 08:59:20 UTC

"$q; N" works around the behavior of "sed" when the N command tries to
read the next line at eof. POSIX sed (unlike GNU sed) will terminate at once,
without printing what's there in the pattern space at that point in time, when it executes
the N command on the last line. Hence, "$q; N" works around this limitation.
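
A tiny demonstration of the idiom, as a sketch with an invented function name: the script joins a dashes line with the line after it, and "$q" makes sure a dashes line at the very end is still printed instead of being handed to N.

```shell
# peek_after_dashes is a hypothetical name.
# On a dashes-only line: quit (printing it) if it is the last line,
# otherwise append the next line and join the two with " | ".
peek_after_dashes() {
  sed -e '/^--*$/{' -e '$q' -e 'N' -e 's/\n/ | /' -e '}'
}

printf '%s\n' 'Hello' '----' 'next' | peek_after_dashes   # Hello, then "---- | next"
printf '%s\n' 'Hello' '----' | peek_after_dashes          # Hello, then "----"
```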

sed -e '/^--*$/!b' -e '$q;N' -e '/\nPosted by:/!{P;D;}' -e '$q;N' -e '/\n--*$/d'

It's read from left -> right as:
a ) skip & print any line not comprising only dashes.
b ) in case we have a dashes-only line, just check if it's by any chance the last line.
in case it is the last, then just print it and quit, since it's not interesting for us.
c ) and when it's not the last line, we grab the next line as we enter the interesting phase.
d ) the pattern space now has two lines in it. 1st portion is good, i.e., dashes only.
now we need to check the other portion. in case it's not "Posted by:" then we print the
1st portion, chop it off & go back with the remaining pattern space (since it very well
might be dashes only) to the beginning of the sed code, but without reading the next
line of input.
e ) when the 2nd portion of the pattern space is "Posted by:" then we are well on the
road to a successful match, and just need to read one more line to decide. But, as
before, we check whether we are at eof; in case yes, we just print the pattern space
and quit, otherwise we grab the next line and attach it to the pattern space. Now the
pattern space contains three lines (= 2 \n chars). portions 1 and 2 are just right for
a successful match. In case the 3rd line is a dashes-only line, then yippee, we have
our target. Promptly delete it and start all over again. Else, print the three lines
and start over again.
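
The steps above can be exercised with a few small inputs. The wrapper name is invented; the sed command is the one-liner from this message. The first input has two dashes lines in a row, which is the case step d) handles with P;D.

```shell
# strip_posted_footer is a hypothetical wrapper around the one-liner above.
strip_posted_footer() {
  sed -e '/^--*$/!b' -e '$q;N' -e '/\nPosted by:/!{P;D;}' -e '$q;N' -e '/\n--*$/d'
}

# A stray dashes line directly before the footer: the stray line is kept,
# the three footer lines are removed.
printf '%s\n' 'Hello' '----' '----' 'Posted by: x' '----' 'Bye' | strip_posted_footer
# prints: Hello, ----, Bye (one per line)

# A dashes line as the very last line survives thanks to $q.
printf '%s\n' 'Hello' '----' | strip_posted_footer
# prints: Hello, ----
```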
