I imagine the non-blank lines as islands in a sea of blank lines.
The sed code describes how we travel from island to island. Here is the "verbo-sed" version of the sed code presented earlier.
Notes:
blank line => /^[ \t]*$/
non-blank line => /[^ \t]/
sed -e '
# delete all leading blank lines
/[^ \t]/,$!d
# non-blanks to be displayed
/[^ \t]/b
# as soon as you hit a blank line which is not leading
# (since all the leading blanks have already been taken care of above)
# start accumulating them. Then in that process one of 2 things can
# happen: i) we either run out of lines => it was the bunch of
# trailing blank lines, which need to be deleted per specs. Or,
# ii) we hit a non-blank line, meaning we need to output that bunch
# and need to restart. Now, it's important to realize that when this
# happens sed will NOT delete anything via the first line of code: its
# range (first non-blank line through $) is already active, and the
# negated d only applies to lines outside that range.
# the command duo: N; /\n[ \t]*$/bloop => after adding the next line
# into the pattern space, check whether the just added line was
# blank. If it was, then just loop back to accumulate more,
# otherwise, display the pattern space & start all over.
# the below can be looked upon as the sed version of a do-while loop.
:loop
$d
N
/\n[ \t]*$/bloop
' yourfile
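
To see the loop in action, here is a small sketch that feeds the script a sample with leading, inner and trailing blank lines. Two things are my own assumptions and not part of the script above: the wrapper name trim_blank is made up for illustration, and [[:blank:]] stands in for [ \t] so that no literal TAB has to be typed.

```shell
# trim_blank is a hypothetical wrapper name; [[:blank:]] replaces [ \t].
trim_blank() {
    sed -e '
# delete all leading blank lines
/[^[:blank:]]/,$!d
# non-blanks are displayed as they are
/[^[:blank:]]/b
# accumulate a run of blank lines; drop it if the input ends first
:loop
$d
N
/\n[[:blank:]]*$/bloop
'
}

# 2 leading blanks, an inner run of 2 blanks, 2 trailing blanks.
# "sed -n l" makes the whitespace visible ($ marks each line end).
printf '\n  \nfoo\n\n\t\nbar\n\n\n' | trim_blank | sed -n l
# foo$
# $
# \t$
# bar$
```

The inner blank run survives because the loop hits "bar" and falls through to the automatic print; the trailing run is dropped by $d.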
HTH
Rakesh
Post by Daniel
That is really great.
The first part (remove leading blank lines) is clear to me: "Find the range from the first non-blank line to the end of the stream. Delete the opposite range (any leading blank lines)".
The second part (removes trailing blank lines) is less clear, even after using sedsed some. It works in all the test cases I tried. But can you explain a little how / why it works?
Thanks,
Daniel
Post by RAKESH
After posting the reply I saw the typo (line 2), and while fixing it I rewrote the regex.
sed -e '
/[^ \t]/,$!d
/[^ \t]/b
:a
$d
N
/\n[ \t]*$/ba
' yourfile
Post by RAKESH
Thanks for the detailed distinction between blank lines and empty ones. I was aware of the difference (hence I stated my assumptions upfront) and chose the /^$/ variety since it doesn't get in the way of the legibility of the sed code.
Recapitulating (i.e., /^$/ based):
#-----------------------------
/./,$!d
/./b
:a
$d
N
/\n\n*$/ba
#-----------------------------
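
A hedged sketch of what the /^$/ assumption costs in practice: a line holding only spaces counts as "non-blank" for this script, so it is kept. The function names trim_empty and trim_blank are hypothetical, and the second one uses [[:blank:]] as a stand-in for the space-or-TAB bracket.

```shell
# Empty-line-only version, as recapitulated above (hypothetical name).
trim_empty() {
    sed -e '
/./,$!d
/./b
:a
$d
N
/\n\n*$/ba
'
}

# Whitespace-aware version for comparison (hypothetical name).
trim_blank() {
    sed -e '
/[^[:blank:]]/,$!d
/[^[:blank:]]/b
:a
$d
N
/\n[[:blank:]]*$/ba
'
}

# "foo", an empty line, a line of two spaces, an empty line.
sample() { printf 'foo\n\n  \n\n'; }

n_empty=$(sample | trim_empty | wc -l | tr -d ' ')
n_blank=$(sample | trim_blank | wc -l | tr -d ' ')
echo "empty-only version keeps $n_empty lines, blank-aware version keeps $n_blank"
```

On this sample the empty-only version stops accumulating at the line of spaces and prints it, while the blank-aware version treats the whole tail as trailing blanks.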
Now this is how the above would appear using the /^\s*$/ approach.
Note: replace each \t with a literal TAB, since "sed" doesn't portably support \t
and there's no way to show a literal TAB in here.
#------------------------------------------
/[^ \t]/,$!d
/[^ \t]/b
:a
$d
N
h
s/.*\n//
/^[ \t]*$/{
g;ba
}
g
#------------------------------------------
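
As a rough cross-check, the hold-space variant can be run side by side with the shorter single-regex loop on the same input. This is only a sketch: trim_hold, trim_loop and sample are made-up names, and [[:blank:]] is used where the scripts in this thread have a space and a literal TAB.

```shell
# Leading blanks, an inner blank run, trailing blanks.
sample() { printf '\n  \nfoo\n\n\t\nbar\n\n\n'; }

# Hold-space variant: save the pattern space, inspect only the last
# line read, then restore the saved copy (hypothetical name).
trim_hold() {
    sed -e '
/[^[:blank:]]/,$!d
/[^[:blank:]]/b
:a
$d
N
h
s/.*\n//
/^[[:blank:]]*$/{
g
ba
}
g
'
}

# Single-regex loop variant (hypothetical name).
trim_loop() {
    sed -e '
/[^[:blank:]]/,$!d
/[^[:blank:]]/b
:a
$d
N
/\n[[:blank:]]*$/ba
'
}

hold_out=$(sample | trim_hold | sed -n l)
loop_out=$(sample | trim_loop | sed -n l)
if [ "$hold_out" = "$loop_out" ]; then
    echo "variants agree"
else
    echo "variants differ"
fi
```

The comparison goes through "sed -n l" so that a difference in spaces or TABs would show up in the strings being compared.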
Notice the increase in complexity in this case, which is due to the fact that "sed" doesn't come armed with the \s and \S regexes; otherwise this would have been a breeze, as you yourself state.
I chose the code that I gave just so that the OP gets an idea of a sed-based flow. Once the OP has the hang of that, they can upgrade to the /^\s*$/ approach if needed.
Rakesh
Post by Sven Guckes
Post by RAKESH
Note: blank line => /^$/ i.e.,
one which has no characters,
not even spaces &/or TABs in it.
well, a line with *no* characters in it i'd call "empty".
a "blank line" contains only "blank characters";
"blank characters" are characters
which do not show up with any dots
(unless a font gives them any dots).
an empty line is a special case of a blank line
because all characters contained are "blank".
it is certainly valid for every contained element
as there are none for which it requires such quality.
(okay.. a little bit of set logic here..)
you may define a subset of these blank characters
to be valid only, eg spaces and tabs.
in the editor vim you can use the pattern "\s" for these.
(i think the notation of '\s' has been taken from
"perl compatible regular expressions" aka PCRE.)
so "blank lines" match the pattern "^\s*$"
which also matches with empty lines.
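
the distinction can be checked with a small sketch. it assumes the POSIX class [[:blank:]] (space and tab) as a portable stand-in for "\s", and the helper name "sample" is made up for illustration.

```shell
# Five lines: text, empty, two spaces, one TAB, text.
sample() { printf 'text\n\n  \n\t\nmore\n'; }

# Empty lines: no characters at all.
empty_count=$(sample | grep -c '^$')

# Blank lines: nothing but spaces/TABs, which includes the empty line.
blank_count=$(sample | grep -c '^[[:blank:]]*$')

echo "empty: $empty_count, blank: $blank_count"
# empty: 1, blank: 3
```

every empty line is counted as blank, but not the other way round.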
too bad we don't have "anchors" for
"start of data" and "end of data".
these would probably make the
given problem a breeze.
Sven
--
$ man 7 regex
alnum alpha blank cntrl digit graph
lower print punct space upper xdigit