Discussion:
Combine Text Files
Ufuk YILDIRIM yildirim_ufuk@yahoo.co.uk [sed-users]
2016-05-18 14:17:49 UTC
Hi all,


First things first: I am a total noob with sed. I can use it for simple
tasks, but now I need it for a task that is very complex for me. I have 4
files of student data containing the students' responses to 4 different
tests. Some students took two tests; some took only one. I want to
combine the results of each student who took 2 tests into a single line and
get a single file that has all the students and their responses. Can I do
this with sed? And if so, how?


To clarify my question, here is a two-file version.


fileA.txt


ID111;a;a;a;a;a;a;a;a;a
ID222222;a;a;a;a;a;a;a;a;a
ID3333;a;a;a;a;a;a;a;a;a


FileB.txt


ID111;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID44;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID3333;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID555;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b




result.txt
ID111;a;a;a;a;a;a;a;a;a;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID222222;a;a;a;a;a;a;a;a;a
ID3333;a;a;a;a;a;a;a;a;a;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID44;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID555;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b


Thank you very much in advance


Ufuk YILDIRIM
Thierry Blanc Thierry.Blanc@gmx.ch [sed-users]
2016-05-18 16:29:16 UTC
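If you are on a *nix system (console):

cat a b | sort | sed -r ':a;$!N;/^([^;]*;).*\n\1/s|\n[^;]*||;ta;P;D'


a b      your files
cat      concatenate the files
sort     sort the combined file, so lines with the same ID end up next to each other
sed:
:a       start of the merge loop
$!N      add the next line (unless this is the last one)
/^([^;]*;).*\n\1/   both lines start with the same ID
s|\n[^;]*||         delete the newline and the second ID
ta       after a merge, go back and try the following line as well
P;D      otherwise print the first line and continue with the second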
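A sketch for trying a looping form of this pipeline on the sample data from the question. It uses :a, $!N, ta and P;D so that a student who took only one test does not throw off the pairing of the lines that follow. It assumes GNU sed (for -r and for labels ended by ;), and LC_ALL=C is added so that sort orders lines the same way in every locale; merge_sorted is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run the cat | sort | sed merge on the sample data from the question.
# Assumes GNU sed (-r, labels ended by ';'); LC_ALL=C pins the sort order.
set -eu

merge_sorted() {
  cat "$@" | LC_ALL=C sort |
    sed -r ':a;$!N;/^([^;]*;).*\n\1/s|\n[^;]*||;ta;P;D'
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

A='a;a;a;a;a;a;a;a;a'               # answers as in fileA.txt
B='b;b;b;b;b;b;b;b;b;b;b;b;b;b;b'   # answers as in FileB.txt
printf '%s\n' "ID111;$A" "ID222222;$A" "ID3333;$A" > "$dir/fileA.txt"
printf '%s\n' "ID111;$B" "ID44;$B" "ID3333;$B" "ID555;$B" > "$dir/fileB.txt"

merge_sorted "$dir/fileA.txt" "$dir/fileB.txt"
```

With the two sample files this prints the five lines of result.txt from the question.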
Davide Brini dave_br@gmx.com [sed-users]
2016-05-18 16:38:02 UTC
A possible approach with (GNU) sed:

sed '

:a

$!{
  N
  ba
}

s/$/\n/

:loop
s/\(ID[^;]*;\)\([^\n]*\)\n\(.*\)\1\([^\n]*\)\n/\1\2;\4\n\3/
t loop

s/\n$//' filea.txt fileb.txt
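The first part of this script (the :a label and the $!{ N; ba } block) reads every line into the pattern space before the substitution loop runs, which is what lets s/// find a second line with the same ID anywhere in the input. A minimal sketch of just that step, joining the lines with | so the effect is visible; slurp is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the "slurp" idiom at the top of the GNU sed script:
# keep appending the next line (N) and jumping back to label a until the
# last line ($) has been read, then work on the whole input at once.
slurp() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n/|/g'
}

printf '%s\n' 'ID111;a' 'ID44;b' 'ID111;b' | slurp
# prints: ID111;a|ID44;b|ID111;b
```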


Another way is with awk:

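awk -F';' -v OFS=';' '
  NR==FNR {
    id=$1; sub(/^[^;]*;/, "")
    if (!(id in a)) order[++n]=id   # remember first-seen order
    a[id]=$0; next
  }

  {
    id=$1; sub(/^[^;]*;/, "")
    if (!(id in a)) order[++n]=id
    sep=(id in a)?";":""; a[id]=a[id] sep $0
  }

  END {
    # "for (id in a)" has no defined order, so walk order[] instead
    for (i=1; i<=n; i++) {
      print order[i]";"a[order[i]]
    }
  }
' filea.txt fileb.txt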
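Ufuk mentions four files rather than two. A minimal sketch of the same awk idea that handles any number of files with a single rule (so there is no NR==FNR special case, which misbehaves if the first file is empty) and prints students in the order they first appear. merge_by_id and the file names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: merge any number of "ID;answer;..." files by student ID.
# Answers from later files are appended after earlier ones, and students
# are printed in the order they were first seen.
merge_by_id() {
  awk -F';' '
    {
      id = $1
      sub(/^[^;]*;/, "")            # strip the ID, keep the answers
      if (id in ans) {
        ans[id] = ans[id] ";" $0    # student seen before: append answers
      } else {
        order[++n] = id             # first time: remember position
        ans[id] = $0
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] ";" ans[order[i]]
    }
  ' "$@"
}

# Example call with four test files (names are placeholders):
# merge_by_id fileA.txt fileB.txt fileC.txt fileD.txt > result.txt
```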
--
D.
Ufuk Yildirim yildirim_ufuk@yahoo.co.uk [sed-users]
2016-05-19 12:46:14 UTC
With small test files, the awk version Davide Brini suggested seems to work perfectly. Thank you so much.

The sed version, on the other hand, does not work as I wanted. All it does is put all the lines from the .txt files into one file in order, which is a problem since some students then appear twice.

P.S. I am currently on a Mac. I have no idea whether that changes anything. I will try the sed version again on Linux in case that makes a difference.
Thanks everyone for your help.
All the best,
Ufuk


cs@zip.com.au [sed-users]
2016-05-19 23:04:34 UTC
The Mac will have BSD sed; most Linux distros install GNU sed, which has some
extra facilities. You can install GNU sed on your Mac using MacPorts or
Homebrew.
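A small sketch of choosing a GNU sed at run time. It assumes that a GNU sed installed through Homebrew or MacPorts is available under the name gsed (the usual name for those packages), and it relies on GNU sed accepting --version, an option BSD sed rejects. pick_sed is a hypothetical name.

```shell
#!/bin/sh
# Sketch: prefer a GNU sed when one is available.
# Assumption: a Homebrew/MacPorts GNU sed is installed as "gsed";
# GNU sed answers --version, BSD sed fails on that option.
pick_sed() {
  if command -v gsed >/dev/null 2>&1; then
    echo gsed
  elif sed --version >/dev/null 2>&1; then
    echo sed            # the system sed already is GNU sed (typical on Linux)
  else
    echo "no GNU sed found; try: brew install gnu-sed" >&2
    return 1
  fi
}

SED=$(pick_sed) && echo "using $SED"
```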

Personally, although I install lots of extra tools including GNU sed, I try to
make my sed scripts portable, i.e. so that they also work with the system sed.

Cheers,
Cameron Simpson <***@zip.com.au>
sharma__r@hotmail.com [sed-users]
2016-05-20 05:29:48 UTC
The sed version doesn't work for you because it uses nonportable constructs such as [^\n].


However, all is not lost: you can try the script below.
All the data is kept in the pattern space.


sed -e '


# ps = ^IDx;s1;s2;s3;...\nIDy;r1;r2;r3;...\nIDcurr;a1;a2;a3;...$


:loop
$q; N
y/\n_/_\n/
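# join the newest line onto an earlier line with the same ID; the ; after \2
# stops an ID such as ID1 from also matching ID11
s/\(_\([^;]*\);[^_]*\)\(_.*\)_\2\(;[^_]*\)$/\1\4\3/; tdone
s/^\(\([^;]*\);[^_]*\)\(_.*\)_\2\(;[^_]*\)$/\1\4\3/; tdone
s/\(_\([^;]*\);[^_]*\)_\2\(;[^_]*\)$/\1\3/; tdone
s/^\(\([^;]*\);[^_]*\)_\2\(;[^_]*\)$/\1\3/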
:done
y/\n_/_\n/
bloop
' fileA.txt fileB.txt fileC.txt ..........
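The swap with y is what stands in for [^\n] here: newlines become underscores (and underscores become newlines) before the substitutions and are swapped back afterwards, so [^_]* means "anything up to the next record". A minimal sketch of the swap on its own; swap_nl is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the y/\n_/_\n/ trick: join two lines with N, then exchange
# every newline with "_" (and every "_" with a newline) in one step.
swap_nl() {
  sed 'N;y/\n_/_\n/'
}

printf '%s\n' 'ID111;a;a' 'ID44;b;b' | swap_nl
# prints: ID111;a;a_ID44;b;b
```

The same swap also turns any underscore inside the data into a newline, so the trick assumes the answers never contain _.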





-Rakesh


Davide Brini dave_br@gmx.com [sed-users]
2016-05-20 08:28:11 UTC
My sed solution uses GNU sed, which is not installed by default on a Mac.
--
D.
Jim Hill gjthill@gmail.com [sed-users]
2016-05-18 17:36:52 UTC
sort file*.txt | sed -E ':a; N; s/^([^;]*;)(.*)\n\1(.*)/\1\2;\3/; ta; P;D'
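This one-liner and the cat | sort variant both sort whole lines, so for a given student the answer text decides whose answers come first: if fileA had ID1;z;z and fileB had ID1;b;b, the merged line would start with the b answers. A sketch of one way to keep the file order instead, assuming a sort with -s (stable, as in GNU coreutils) and GNU sed: sort on the ID field only and list the files in the wanted order. merge_in_file_order is a hypothetical name.

```shell
#!/bin/sh
# Sketch: merge lines per student while keeping the answers in file order.
# sort -s -t';' -k1,1 compares only the ID field and leaves lines with the
# same ID in input order, so fileA answers stay ahead of fileB answers.
# Assumes GNU sed for the -E one-liner with ';'-separated labels.
merge_in_file_order() {
  cat "$@" |
    LC_ALL=C sort -s -t';' -k1,1 |
    sed -E ':a; N; s/^([^;]*;)(.*)\n\1(.*)/\1\2;\3/; ta; P;D'
}

# Usage with four tests (file names are placeholders):
# merge_in_file_order fileA.txt fileB.txt fileC.txt fileD.txt > result.txt
```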