lines matching N and N+1

Discussion:

Daniel

2013-01-28 20:18:23 UTC

This came up in a recent cross-Atlantic conversation. :)

The problem: How to apply a change when a *combination*
of lines occurs, e.g. when the word "foo" appears in line
N and the word "bar" appears in line N+1. How then to
apply the change in line N (or line N+1)? For example,
change word "foo" to "FOO" and word "bar" to "BAR".

Here's a test file I dreamed up:

food foot foo
bare bark bar
bar bark bare
bark bar bare
foot foo food
bark bar bare
foot foo food
foo foot food
bar bark bare
foot foo food

Here's the desired output:

food foot FOO
bare bark BAR
bar bark bare
bark bar bare
foot FOO food
bark BAR bare
foot foo food
FOO foot food
BAR bark bare
foot foo food
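A small harness makes it easy to check candidate scripts against the sample above. This is an editorial sketch, not from the thread. It assumes GNU sed (for `\<` and `\>`) and `mktemp -d`, and the file names `test.txt`, `expected.txt` and `actual.txt` are hypothetical. The candidate shown is Davide Brini's `N`/`P`/`D` script from later in the thread. Swap in any other script to compare.

```shell
#!/bin/sh
# Hypothetical test harness (editorial sketch); assumes GNU sed and mktemp.
dir=$(mktemp -d)

# Daniel's sample input
printf '%s\n' 'food foot foo' 'bare bark bar' 'bar bark bare' \
  'bark bar bare' 'foot foo food' 'bark bar bare' 'foot foo food' \
  'foo foot food' 'bar bark bare' 'foot foo food' > "$dir/test.txt"

# Daniel's desired output
printf '%s\n' 'food foot FOO' 'bare bark BAR' 'bar bark bare' \
  'bark bar bare' 'foot FOO food' 'bark BAR bare' 'foot foo food' \
  'FOO foot food' 'BAR bark bare' 'foot foo food' > "$dir/expected.txt"

# Candidate: two-line window, capitalize foo (first line) and bar (second line)
sed '$!N; s/\<foo\>\(.*\n.*\)\<bar\>/FOO\1BAR/; P; D' \
  "$dir/test.txt" > "$dir/actual.txt"

# diff prints nothing and succeeds when the outputs are identical
if diff "$dir/expected.txt" "$dir/actual.txt"; then
  result=PASS
else
  result=FAIL
fi
echo "$result"
rm -rf "$dir"
```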

I tried doing this in bash and couldn't make it
work. I do have a possible sed solution.

Thought I'd post this, see if anyone would like
to take a look, either sed or bash (or whatever!).

Daniel

Jim Hill

2013-01-30 23:48:31 UTC

How about

sed -r '/\<foo\>/!{p;d}; $!N; s/\<foo\>(.*\n.*)\<bar\>/FOO\1BAR/'

Jim Hill

2013-01-31 00:21:16 UTC

Aaaand of course that won't work with the foo... foo ... bar sequence.

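sed -r '
# no foo on this line: print it and start a new cycle
/\<foo\>/!{p;d}
:n
# append the next line (unless this is the last line)
$!N
# foo on line 1 and bar on line 2: capitalize both, print the pair
/\<foo\>(.*\n.*)\<bar\>/ {s//FOO\1BAR/;p;d}
# line 2 has no foo, so it cannot start a pair: print both
/\n.*\<foo\>/!{p;d}
# line 2 has foo: print line 1, keep line 2 and try again
P; s/.*\n//
t n
'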
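To see the case Jim mentions, here is an editorial side-by-side run of the one-liner and the loop script, with the `\<` and `\>` word anchors written out in full. It assumes GNU sed (`-r`, `\<`, `\>`), and the three-line input is made up for illustration. On a foo / foo / bar sequence the one-liner leaves everything unchanged, while the loop script pairs the second foo with the bar.

```shell
#!/bin/sh
# Editorial sketch: Jim Hill's one-liner vs. his loop script (GNU sed).
input='foot foo food
foo foot food
bar bark bare'

# One-liner: line 1 is paired with line 2 only, so the foo on line 2
# never gets a chance to pair with the bar on line 3.
one=$(printf '%s\n' "$input" |
  sed -r '/\<foo\>/!{p;d}; $!N; s/\<foo\>(.*\n.*)\<bar\>/FOO\1BAR/')

# Loop script: when line 2 also contains foo, print line 1 and retry
# with line 2 as the new first line.
loop=$(printf '%s\n' "$input" | sed -r '/\<foo\>/!{p;d}
:n
$!N
/\<foo\>(.*\n.*)\<bar\>/ {s//FOO\1BAR/;p;d}
/\n.*\<foo\>/!{p;d}
P; s/.*\n//
t n
')

printf 'one-liner:\n%s\nloop:\n%s\n' "$one" "$loop"
```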

Joel Hammer

2013-01-31 04:18:02 UTC

Brute force is always my way.

sed "
:loop1
N
/^[^\n]*foo[ \n]/{
/\n[a-z ]*bar /{
s/\(foo\)\([ \n]\)/FOO\2/
s/\(\n[a-z ]*\)\(bar \)/\1BAR /
P
s/.*\n//
bloop1
}
/\n[a-z ]*bar$/{
s/\(foo\)\([ \n]\)/FOO\2/
s/\(\n[a-z ]*\)\(bar\)$/\1BAR/
P
s/.*\n//
bloop1
}
}
P
s/.*\n//
bloop1
" yourfile

In English.
Read in one line.
With N, append 2nd line.
Search for foo in first line
If no foo, P the first line, erase it, then N the next line.

If foo in first line, is there bar[blank] in 2nd line? If so, capitalize
foo and bar, P the first line, erase the first line, then N the next
line.

If no bar[blank], is there a bar$ in 2nd line? If so, capitalize, etc.

If no bar is found, P the first line, erase the first line, N in the
next line.

(Note: Matching bar in the 2nd line was tougher than foo in the first
line)

This seems to work for all combinations of foo and bar, whether they
occur on the same line or in the last line of the file, etc. YMMV.

Joel

Davide Brini

2013-01-31 10:13:00 UTC

I assume that you use "word" as meant by GNU sed's \< and \> operators (or
\b). Leaving out some unspecified corner cases:

$!N
s/\<foo\>\(.*\n.*\)\<bar\>/FOO\1BAR/
P
D
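The `$!N` / `P` / `D` idiom keeps a two-line sliding window: `P` prints only the first line, and `D` deletes it and restarts the cycle without reading new input, so every line gets a turn as line N. As an editorial illustration (GNU sed assumed, input made up), a single line can be both the bar line of one pair and the foo line of the next:

```shell
#!/bin/sh
# Editorial sketch: Davide Brini's script on overlapping pairs (GNU sed).
out=$(printf '%s\n' 'foo' 'bar foo' 'bar' | sed '
# append the next line unless this is the last one
$!N
# foo on the first line and bar on the second: capitalize both
s/\<foo\>\(.*\n.*\)\<bar\>/FOO\1BAR/
# print the first line, drop it, restart the cycle with the second line
P
D')
printf '%s\n' "$out"
```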

If "foo" and "bar" are regular expressions and not fixed strings, turning
them to uppercase is more complicated, yet still possible using GNU sed
(which I'm assuming anyway):

$!N
s/\(\<foo\>\)\(.*\n.*\)\(\<bar\>\)/\U\1\E\2\U\3\E/
P
D
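An editorial example of the `\U...\E` form, assuming GNU sed with `-r`. The patterns `fo+` and `ba[rz]` are hypothetical stand-ins for regular expressions whose matched text varies, so the replacement cannot simply be a fixed `FOO` or `BAR`; instead the captured text is uppercased.

```shell
#!/bin/sh
# Editorial sketch: uppercase whatever the (hypothetical) patterns matched.
# Assumes GNU sed (-r, \<, \>, \U, \E).
out=$(printf '%s\n' 'x fooo y' 'z baz w' |
  sed -r '$!N; s/(\<fo+\>)(.*\n.*)(\<ba[rz]\>)/\U\1\E\2\U\3\E/; P; D')
printf '%s\n' "$out"
```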

--
D.

Daniel

2013-02-01 02:10:42 UTC

Yes, I did mean "word" as in GNU sed \<word\> sense.

With the "test file" and "desired output" I mentioned,
here's what I came up with, similar to something I saw
from Y-J Chang. It assumes the -r command-line option.

$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:
t
P; D

# Append next line if NOT last line of file
# Try changing foo in line #1, bar in line #2
# If change happened, print lines and read a new pair
# Else, print & delete line #1, run N to make next pair

This seems similar to some of the other solutions
that were posted.

Thanks,
Daniel

Davide Brini

2013-02-01 10:03:38 UTC

Post by Daniel
Yes, I did mean "word" as in GNU sed \<word\> sense.
With the "test file" and "desired output" I mentioned,
here's what I came up with, similar to something I saw
from Y-J Chang. It assumes using -r command line option.
$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:

Note that you don't need to capture groups \1 and \4, for the same reason
that to replace foo with bar you do

s/foo/bar/

and not

s/(.*)foo(.*)/\1bar\2/

(leaving aside that the latter form can behave differently under certain
circumstances).
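One such difference, shown as an editorial sketch (GNU sed assumed): when a line has more than one match, the greedy leading `(.*)` makes the captured form replace the last occurrence, while plain `s/foo/bar/` replaces the first. In Daniel's script that means the last foo on line N would be capitalized; in the sample data no line contains foo twice as a word, so both forms give the same output there.

```shell
#!/bin/sh
# Editorial sketch: first-match vs. greedy-capture substitution.
plain=$(echo 'foo foo' | sed 's/foo/bar/')
captured=$(echo 'foo foo' | sed -r 's/(.*)foo(.*)/\1bar\2/')
printf '%s\n%s\n' "$plain" "$captured"
```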

Post by Daniel
t

There's no harm in printing the first line of the pattern space, so the "t"
can be removed. In fact, every time "t" is executed, you're effectively
adding two more lines to the pattern space without removing the existing
ones. This can lead to false matches in the s::: replacement (ie, times
where foo and bar are more than one line apart, yet the replacement
succeeds).

--
D.

Davide Brini

2013-02-02 14:48:57 UTC

Post by Davide Brini

There's no harm in printing the first line of the pattern space, so the
"t" can be removed. In fact, every time "t" is executed, you're
effectively adding two more lines to the pattern space without removing
the existing ones. This can lead to false matches in the s::: replacement
(ie, times where foo and bar are more than one line apart, yet the
replacement succeeds).

This is of course wrong. The appropriate remark is that by using "t", the
two lines currently in the buffer are printed at once, and foo/bar pairs
can be missed if "foo" is in the second of the two lines being printed.
Either way, "t" is at best useless.
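A small editorial demonstration of the corrected point, assuming GNU sed with `-r` and a made-up three-line input. With `t`, a matched pair is printed as a unit, so the foo on its second line never gets paired with the bar on the following line; without `t`, `P`/`D` slides the window one line at a time and catches it.

```shell
#!/bin/sh
# Editorial sketch: Daniel's script with and without "t" (GNU sed, -r).
input='foo
bar foo
bar'

with_t=$(printf '%s\n' "$input" | sed -r '$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:
t
P; D')

without_t=$(printf '%s\n' "$input" | sed -r '$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:
P; D')

printf 'with t:\n%s\nwithout t:\n%s\n' "$with_t" "$without_t"
```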

--
D.

Daniel

2013-02-02 18:42:24 UTC

You're right. The t is not necessary. I'll have to
look more at this when I have some time.

I found something else, either a glitch with yahoo
groups or something I'm doing wrong. In my little
script (9152), \< and \> did not get sent correctly.
I apologize that the script I posted ended up mangled.
\< and \> show up when composing a message, but
tend to drop out (get changed to a space) when
the message is sent or previewed, as shown below:

Using \bfoo\b syntax (Seems to show up fine):

$! N
s:(.*)\bfoo\b(.*)\n(.*)\bbar\b(.*):\1FOO\2\n\3BAR\4:
P; D

Using \< and \> syntax (Drops out every time I try):

$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:
P; D

Using HTML entities for GT and LT (Won't know if
drops out or shows up until I read the post):

$! N
s:(.*)\<foo\>(.*)\n(.*)\<bar\>(.*):\1FOO\2\n\3BAR\4:
P; D

Has anyone else noticed this behavior? I'm guessing
this is because yahoo allows HTML. Is there some
setting I might change? Or something I'm doing wrong?
Any work-around, other than entering HTML entities?

Daniel

Davide Brini

2013-02-03 11:24:54 UTC

Post by Daniel
I found something else, either a glitch with yahoo
groups or something I'm doing wrong. In my little
script (9152), \< and \> did not get sent correctly.
I apologize script I posted ended up mangled...

For what it's worth, I saw your script correctly.

--
D.

Daniel

2013-02-03 18:00:15 UTC

Thanks for checking. That's really interesting.

Here is a screenshot (Google Chrome) of what
I see (message 9157) on a Windows XP computer:

http://i1281.photobucket.com/albums/a506/parker-fuzzy/drop-gc_zpse5056428.gif

http://tinyurl.com/b3yr53p is the same screenshot
with a friendlier URL.

Here is a restatement of the problem I see:

#1 (\bfoo\b syntax) script shows fine.
#2 (verbatim GT LT) script mangled (messed up).
#3 (GT LT HTML entities) script shows fine.

By mangled, I mean "foo", "bar", LT, GT are omitted.
LT and GT are "less than" and "greater than". I
hesitate to use the real symbols, because I'm
arguing that yahoo groups deletes them sometimes.

I also tested with Firefox and IE on a Windows
XP PC. Exactly the same problem, same appearance.

Next, I tested on an iPod touch (Safari) and a
Windows 7 PC (IE). Again, same problem. In all
my tests, the #2 script NEVER displayed correctly.

It's hard to believe there could be a technical
glitch with yahoo groups, that it might delete
parts of sed scripts using GT and LT. That would
be downright evil, if yahoo mangled our syntax.

Seems more likely something I'm doing wrong. But
in that case, I can't explain why problem occurs
in so many browsers and hardware platforms.

So, if you see the script correctly in message
9157 (#2 and #3 appear the same), could you
possibly send a screenshot, and how you viewed?

Thanks,
Daniel

Davide Brini

2013-02-04 19:40:00 UTC

Post by Daniel
Thanks for checking. That's really interesting.
Here is a screenshot (google chrome) of what
http://i1281.photobucket.com/albums/a506/parker-fuzzy/drop-gc_zpse5056428.gif
http://tinyurl.com/b3yr53p is the same screenshot
in a more friendly url.
So, if you see the script correctly in message
9157 (#2 and #3 appear the same), could you
possibly send a screenshot, and how you viewed?

Well, I viewed it with a MUA, not using webmail.

Screenshot here: Loading Image...

Looking at the source, I see these headers:

Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hope this helps.

--
D.