Discussion:
reducing redundant lines, partially
Thierry Blanc Thierry.Blanc@gmx.ch [sed-users]
2016-11-20 16:05:31 UTC
I have the output of a script in the form

file1/full/path:<number>
file1/full/path:<number>
....
file2/full/path:<number>

I want the output

file1/full/path:<number>:<number> ...
file2/full/path:<number>:<number> ...

There might be one or many lines with the same file name.

/home/Journal21/weltsicherheitsrat-uneins.html:9
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:9
/home/Journal21/wenn-tote-zu-statistiken-werden.html:20
[Non-text portions of this message have been removed]
Jim Hill gjthill@gmail.com [sed-users]
2016-11-20 16:57:41 UTC
Post by Thierry Blanc ***@gmx.ch [sed-users]
I have the output of a script in the form
file1/full/path:<number>
file1/full/path:<number>
....
file2/full/path:<number>
I want the output
file1/full/path:<number>:<number> ...file2/full/path:<number>:<number> ...
there might be one or many lines with the same file name.
#!/bin/sed -Ef
:a
N
s/(([^:\n]*)[^\n]*)\n\2(:.*)/\1\3/
ta
P
D

This gets you all the `:number`s for a sequence appended on a single line.
Did you want just the endpoints? I'd do that with a pair of substitutions
before the `P`.
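
For illustration only, here is one way the "just the endpoints" variant could look. It is a sketch assuming GNU sed, and it does not use the pair of substitutions mentioned above: instead it folds the idea into the merging substitution itself, so that each merge keeps the first number and replaces whatever tail has accumulated with the newest number. The function name `merge_endpoints`, the `^` anchor, the `:` after the name group, and the use of `$!N` are choices made for this sketch, not part of the script above.

```shell
# Hypothetical helper (not from the thread): keep only the first and the
# last number of each run of lines sharing the same name. Assumes GNU sed.
merge_endpoints() {
  sed -E ':a
$!N
s/^(([^:\n]*):[^:\n]*)(:[^\n]*)?\n\2(:.*)/\1\4/
ta
P
D'
}

# \1 is "name:first", \3 is the tail accumulated so far (dropped),
# \4 is ":number" taken from the newly appended line.
printf 'a:1\na:2\na:3\nb:4\nb:5\nc:6\n' | merge_endpoints
# a:1:3
# b:4:5
# c:6
```

Because nothing has to be cleaned up after the loop, no extra substitution is needed before the `P`.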
Thierry Blanc Thierry.Blanc@gmx.ch [sed-users]
2016-11-20 16:58:59 UTC
Usually I am not good with N;P;D. I guess it is the political
connotation ...

sed -rf sedscript file

sedscript:
:eep;
N;
s|^([^:]*):([^\n]*)\n\1|\1:\2|;
teep;
P;D

The third line starts with s|^
Why is the ^ needed?
sharma__r@hotmail.com [sed-users]
2016-11-22 04:36:11 UTC
In sed's normal flow, a line from the input file is read into the pattern space, and all the sed commands are applied to the pattern space in the order they appear (barring flow-control commands such as b and t). Once the end of the script is reached, the resulting pattern space is printed to stdout (unless -n is given).
Then the cycle starts over with the next line from the input file.
Notice that the pattern space holds just one line the whole time.

The N;P;D method is different.
Think of the N;P;D paradigm in sed as a sliding-window technique in which you hold two lines in the pattern space at a time. You perform some munging on the pattern space, print its head portion (up to the first newline), chop that portion off, and go back for more.

The little-known "list" command [l] can be a great help in sliding-window operations. It prints the pattern space in an unambiguous format in which nonprintable characters are shown as escape sequences, for example the newline as \n.

In this case, place the [l] command at a few locations to monitor the growth of the pattern space, like this:

sed -e '
l
:loop
$!N
l
s/^\([^:]*\)\(.*\)\n\1\(:.*\)$/\1\2\3/
tloop
P;D
' yourfile


Then observe the evolution of the pattern space. This will hopefully answer your queries regarding the regex.
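
To give an idea of what `l` prints, here is a minimal sketch with two short made-up lines (the helper name `show_pair` is hypothetical). After `N` has appended the second line, `l` shows the embedded newline as `\n` and marks the end of the pattern space with `$`. Short lines are used on purpose, since `l` wraps long output.

```shell
# Hypothetical helper: append the next line with N, then list the
# pattern space with l. -n suppresses the normal end-of-cycle print.
show_pair() {
  sed -n 'N
l'
}

printf 'a:1\na:2\n' | show_pair
# a:1\na:2$
```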




Daniel Goldman dgoldman@ehdp.com [sed-users]
2016-11-20 16:50:38 UTC
To clarify exactly what is required...

The transformation is like the following?

$ cat input.txt
/home/Journal21/auf-wiedersehen.html:20
/home/Journal21/guten-tag.html:9
/home/Journal21/guten-tag.html:20
/home/Journal21/weltsicherheitsrat-uneins.html:1
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:9
/home/Journal21/wenn-tote-zu-statistiken-werden.html:20

$ cat output.txt
/home/Journal21/auf-wiedersehen.html:20
/home/Journal21/guten-tag.html:9:20
/home/Journal21/weltsicherheitsrat-uneins.html:1
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0:9
/home/Journal21/wenn-tote-zu-statistiken-werden.html:20

Daniel

------------------------------------
--
------------------------------------

Yahoo Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/sed-users/

<*> Your email settings:
Individual Email | Traditional

<*> To change settings online go to:
http://groups.yahoo.com/group/sed-users/join
(Yahoo! ID required)

<*> To change settings via email:
sed-users-***@yahoogroups.com
sed-users-***@yahoogroups.com

<*> To unsubscribe from this group, send an email to:
sed-users-***@yahoogroups.com

<*> Your use of Yahoo Groups is subject to:
https://info.yahoo.com/legal/us/yahoo/utos/terms/
Thierry Blanc Thierry.Blanc@gmx.ch [sed-users]
2016-11-20 21:33:02 UTC
Thanks for the answers.


$ cat input.txt

/home/Journal21/guten-tag.html:9
/home/Journal21/guten-tag.html:20
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:9

or

$ cat input.txt

/home/Journal21/guten-tag.html:9
/home/Journal21/guten-tag.html:20
/home/Journal21/guten-tag.html:11
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:2
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:9


Each file name occurs on one or more lines, and every file name occurs
the same number of times.

$ cat output.txt

/home/Journal21/guten-tag.html:9:20
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0:9

or
$ cat output.txt
/home/Journal21/guten-tag.html:9:20:11
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0:2:9


Jim's solution works well:

#!/bin/sed -Ef
:a
N
s/(([^:\n]*)[^\n]*)\n\2(:.*)/\1\3/
ta
P
D

My own, which I found a few minutes after Jim's:

sedscript:
:eep;
N;
s|^([^:]*):([^\n]*)\n\1|\1:\2|;
teep;
P;D


What I don't understand:
The third line starts with s|^
Why is the ^ needed? Without it, the script hangs or loops forever.
Jim Hill gjthill@gmail.com [sed-users]
2016-11-20 22:24:08 UTC
Post by Thierry Blanc ***@gmx.ch [sed-users]
:eep;
N;
s|^([^:]*):([^\n]*)\n\1|\1:\2|;
teep;
P;D
The 3. line: s|^
Why is the ^ needed? Without it, the script hangs or loops forever.
I think it's that, without the `^`, `\1` can always match at
least the null string before the colon, so it will start endlessly
duplicating what follows. I don't yet see why it hangs or loops,
though, unless your tests are all large enough that the O(2^n) on line
count (or swapping to store it) gets you, because the `N` in the loop
should finish it eventually.

Mine should verify the `:` terminator, to avoid errors like
`test1:20\ntest:21` producing `test1:20:21`, which can pop up on unsorted
input.

#!/bin/sed -Ef
:a
N
s/(([^:\n]*):[^\n]*)\n\2(:.*)/\1\3/
ta
P
D
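
To make the difference concrete, the two versions can be run side by side on the two-line input mentioned above. This is only a sketch assuming GNU sed (which prints the pattern space when `N` runs out of input); the function names `merge_loose` and `merge_strict` are made up for the comparison.

```shell
# First version: the name group is not tied to a following ":",
# so it can shrink to a prefix ("test") that also starts the next line.
merge_loose() {
  sed -E ':a
N
s/(([^:\n]*)[^\n]*)\n\2(:.*)/\1\3/
ta
P
D'
}

# Corrected version: the name group must be followed by ":".
merge_strict() {
  sed -E ':a
N
s/(([^:\n]*):[^\n]*)\n\2(:.*)/\1\3/
ta
P
D'
}

printf 'test1:20\ntest:21\n' | merge_loose
# test1:20:21
printf 'test1:20\ntest:21\n' | merge_strict
# test1:20
# test:21
```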
Daniel Goldman dgoldman@ehdp.com [sed-users]
2016-11-20 22:49:46 UTC
When I run it without the ^, as follows:

sedscript:
:eep;
N;
s|([^:]*):([^\n]*)\n\1|\1:\2|;
teep;
P;D

it does not hang here, as I see Jim also observed, but produces
incorrect results:

$ sed -r -f sed.txt input.txt
/home/Journal21/guten-tag.html:9:20/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:9

as opposed to correct results with s|^:

$ sed -r -f sed.txt input.txt
/home/Journal21/guten-tag.html:9:20
/home/Journal21/wenig-hoffnung-auf-ein-kriegsende.html:0:9

Here is what I think is happening, spelled out. After doing this, I see
that Jim had already reached the same conclusion. I abbreviate the path names.

With s|^( producing correct results:

- N appends line #2 to PatSpace
- PatSpace is guten:9\nguten:20
- s command matches
- PatSpace is guten:9:20
- t branches
- N appends line #3 to PatSpace
- PatSpace is guten:9:20\nwenig:0
- s command does NOT match
- t does NOT branch
- P prints PatSpace line #1
- D deletes PatSpace line #1
- PatSpace is wenig:0
- Next line is NOT read (D restarts the cycle with the remaining PatSpace)
- N appends line #4 to PatSpace
- PatSpace is wenig:0\nwenig:9
- s command matches
- PatSpace is wenig:0:9
- t branches
- N fails (no more input)
- PatSpace is printed
- sed exits

With s|( producing incorrect results:

- N appends line #2 to PatSpace
- PatSpace is guten:9\nguten:20
- s command matches
- PatSpace is guten:9:20
- t branches
- N appends line #3 to PatSpace
- PatSpace is guten:9:20\nwenig:0
- s command matches # OOPS!!! (\1 is empty, so only the \n is removed)
- PatSpace is guten:9:20wenig:0
- t branches
- N appends line #4 to PatSpace
- PatSpace is guten:9:20wenig:0\nwenig:9
- s command matches # OOPS!!!
- PatSpace is guten:9:20wenig:0wenig:9
- t branches
- N fails (no more input)
- PatSpace is printed
- sed exits
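
The two traces can be reproduced with the abbreviated names. This is a sketch assuming GNU sed; the function names `merge_anchored` and `merge_unanchored` are made up here, and the script is written with one command per line instead of the `;` separators used above.

```shell
# The loop with the ^ anchor.
merge_anchored() {
  sed -E ':eep
N
s|^([^:]*):([^\n]*)\n\1|\1:\2|
teep
P
D'
}

# The same loop without the anchor: the match may start right before
# the first ":", with an empty \1, which just swallows the newline.
merge_unanchored() {
  sed -E ':eep
N
s|([^:]*):([^\n]*)\n\1|\1:\2|
teep
P
D'
}

printf 'guten:9\nguten:20\nwenig:0\nwenig:9\n' | merge_anchored
# guten:9:20
# wenig:0:9
printf 'guten:9\nguten:20\nwenig:0\nwenig:9\n' | merge_unanchored
# guten:9:20wenig:0wenig:9
```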

Daniel