Combine Text Files

Ufuk YILDIRIM yildirim_ufuk@yahoo.co.uk [sed-users]

2016-05-18 14:17:49 UTC

Hi all,

First things first: I am a total noob with sed. I can use it for simple
tasks, but now I need it for a task that is, for me, very complex. I have 4
files of student data containing students' responses to 4 different
tests. Some students took two tests; some took only one. I want to
combine the results of each student who took 2 tests into a single line and
get a single file that has all the students and their responses.
Can I do it with sed? And, if yes, how do I do it?

To clarify my question, here is a two-file version.

fileA.txt

ID111;a;a;a;a;a;a;a;a;a
ID222222;a;a;a;a;a;a;a;a;a
ID3333;a;a;a;a;a;a;a;a;a

fileB.txt

ID111;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID44;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID3333;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID555;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b

result.txt
ID111;a;a;a;a;a;a;a;a;a;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID222222;a;a;a;a;a;a;a;a;a
ID3333;a;a;a;a;a;a;a;a;a;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID44;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b
ID555;b;b;b;b;b;b;b;b;b;b;b;b;b;b;b

Thank you very much in advance

Ufuk YILDIRIM

Thierry Blanc Thierry.Blanc@gmx.ch [sed-users]

2016-05-18 16:29:16 UTC

If you are on a *nix system (console):

cat a b | sort | sed -r 'N;/([^;]*).*\n\1/s|\n[^;]*||'

a b              your files
cat              concatenates the files
sort             sorts the combined file
sed:
N                appends the next line
/([^;]*).*\n\1/  finds repeated strings
s|\n[^;]*||      deletes the second string
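
Note that a bare N joins input lines in fixed pairs (1-2, 3-4, ...), so a repeated ID that straddles two pairs is never compared. A minimal sketch of that pairing behaviour is below. The helper name pair_lines is hypothetical, and $!N is used instead of N so that the last line of an odd-length input is not lost on seds that discard it.

```shell
# pair_lines is a hypothetical helper name, not part of the thread.
# It joins each pair of input lines with "+", showing how N groups lines.
pair_lines() {
  sed -e '$!N' -e 's/\n/+/'
}

printf '1\n2\n3\n4\n5\n' | pair_lines
# prints:
# 1+2
# 3+4
# 5
```

Lines 2 and 3 are never in the pattern space together, which is why a loop around N is needed for the merge.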

Davide Brini dave_br@gmx.com [sed-users]

2016-05-18 16:38:02 UTC

--
D.

Ufuk Yildirim yildirim_ufuk@yahoo.co.uk [sed-users]

2016-05-19 12:46:14 UTC

With small test files, the awk version Davide Brini suggested seems to work perfectly. Thank you so much.

The sed version, on the other hand, is not working as I wanted. All it does is put all the lines from the txt files into one file in order, which is problematic since some students appear twice.

P.S. I am currently on a Mac. No idea if that changes anything. I will try the sed version again on Linux, in case that makes a difference.
Thanks everyone for your help.
All the best,
Ufuk

On Wednesday, 18 May 2016, 19:38, "Davide Brini ***@gmx.com [sed-users]" <sed-***@yahoogroups.com> wrote:

A possible approach with (GNU) sed:

sed '
# slurp all input into the pattern space
:a
$!{
  N
  ba
}

# make sure every record ends with a newline
s/$/\n/

# move the tail of a later record with the same ID onto the first one
:loop
s/\(ID[^;]*;\)\([^\n]*\)\n\(.*\)\1\([^\n]*\)\n/\1\2;\4\n\3/
t loop

s/\n$//' filea.txt fileb.txt

Another way is with awk:

awk -F';' -v OFS=';' '
  # first file: store each record, keyed by ID
  NR==FNR{
    id=$1; sub(/^[^;]*;/, ""); a[id]=$0; next
  }

  # second file: append to a stored record, or add a new one
  {
    id=$1; sub(/^[^;]*;/, ""); sep=(id in a)?";":""; a[id]=a[id] sep $0
  }

  # print everything (in unspecified order)
  END {
    for (id in a) {
      print id";"a[id]
    }
  }
' filea.txt fileb.txt
--
D.
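
One thing to keep in mind with the awk version: "for (id in a)" visits keys in an unspecified order, so result.txt may not list students in the order shown in the question. A minimal sketch that records first-seen order and accepts any number of input files (the original question mentions 4) is below. The function name merge_by_id and the array names are assumptions made for illustration, and the sketch assumes every record has at least one ';' after the ID.

```shell
# merge_by_id is a hypothetical helper, not from the thread.
# It merges ';'-separated records that share the same first field,
# printing IDs in the order they were first seen.
merge_by_id() {
  awk -F';' '
    {
      id = $1
      rest = $0
      sub(/^[^;]*;/, "", rest)        # drop the ID field from this record
      if (!(id in seen)) {            # first occurrence: remember position
        seen[id] = 1
        order[++n] = id
        data[id] = rest
      } else {                        # later occurrence: append responses
        data[id] = data[id] ";" rest
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] ";" data[order[i]]
    }
  ' "$@"
}
```

It could be called as: merge_by_id fileA.txt fileB.txt > result.txt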

cs@zip.com.au [sed-users]

2016-05-19 23:04:34 UTC

Post by Ufuk Yildirim ***@yahoo.co.uk [sed-users]
With small test files, awk version Davide Brini suggested seems to work perfectly. Thank you so much.
sed version, on the other hand, is not working as I wanted. All it does is put all the lines from the txt files in a file in order, which is problematic since some students appear twice.
ps. I am currently on Mac. No idea if that changes anything. Will try the sed version again on linux, if that makes any difference.

The Mac will have BSD sed; most Linux distros install GNU sed, which has some
extra facilities. You can install GNU sed on your Mac by using MacPorts or
Homebrew.

Personally, although I install lots of extra tools including GNU sed, I try to
make my sed scripts portable, i.e. able to work with the system sed as well.
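
As a small illustration of what portable means in practice: BSD sed reads a label up to the end of the line, so ':a;N;ba' written as one expression may fail there, while splitting the script into separate -e fragments works on both. The sketch below joins all input lines with commas using only constructs that, to my knowledge, both GNU and BSD sed accept. The name join_lines is hypothetical.

```shell
# join_lines is a hypothetical helper name used only for this sketch.
# Each -e fragment ends the label or command cleanly, which BSD sed needs.
join_lines() {
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/,/g'
}

printf 'a\nb\nc\n' | join_lines   # prints a,b,c
```

Matching \n on the left-hand side of s is fine in both; it is [^\n] and \n in the replacement that tend to be GNU-only.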

Cheers,
Cameron Simpson <***@zip.com.au>

sharma__r@hotmail.com [sed-users]

2016-05-20 05:29:48 UTC

The sed version doesn't work because it uses nonportable constructs such as [^\n].

However, all is not lost; you can try the script below.
All the data is accumulated in the pattern space.

sed -e '

# ps = ^IDx;s1;s2;s3;...\nIDy;r1;r2;r3;...\nIDcurr;a1;a2;a3;...$

:loop
$q; N
y/\n_/_\n/
s/\(_\([^;]*\);[^_]*\)\(_.*\)_\2\([^_]*\)$/\1\4\3/; tdone
s/^\(\([^;]*\);[^_]*\)\(_.*\)_\2\([^_]*\)$/\1\4\3/; tdone
s/\(_\([^;]*\);[^_]*\)_\2\([^_]*\)$/\1\3/; tdone
s/^\(\([^;]*\);[^_]*\)_\2\([^_]*\)$/\1\3/
:done
y/\n_/_\n/
bloop
' fileA.txt fileB.txt fileC.txt ..........

-Rakesh
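
The y/\n_/_\n/ lines in the script above swap newlines and underscores, so that [^_] can stand in for the nonportable [^\n]; swapping back afterwards restores any underscores that were in the data. A minimal sketch of the trick on two lines follows. The name drop_first_line is hypothetical, and the sketch assumes exactly two input lines.

```shell
# drop_first_line is a hypothetical helper name for this illustration.
# After the first y, "_" marks the line boundary and data underscores
# are temporarily stored as newlines; the second y undoes the swap.
drop_first_line() {
  sed -e 'N' -e 'y/\n_/_\n/' -e 's/^[^_]*_//' -e 'y/\n_/_\n/'
}

printf 'a;1\nb_c;2\n' | drop_first_line   # prints b_c;2
```

The underscore in b_c survives the round trip, which is the point of swapping rather than simply replacing newlines.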

Davide Brini dave_br@gmx.com [sed-users]

2016-05-20 08:28:11 UTC

On Thu, 19 May 2016 12:46:14 +0000 (UTC), "Ufuk Yildirim

Post by Ufuk Yildirim ***@yahoo.co.uk [sed-users]
With small test files, awk version Davide Brini suggested seems to work
perfectly. Thank you so much.
sed version, on the other hand, is not working as I wanted. All it does
is put all the lines from the txt files in a file in order, which is
problematic since some students appear twice.
ps. I am currently on Mac. No idea if that changes anything. Will try the
sed version again on linux, if that makes any difference. Thanks everyone
for your help. All the best,

My sed solution uses GNU sed, which is not installed by default on Mac.

--
D.

Jim Hill gjthill@gmail.com [sed-users]

2016-05-18 17:36:52 UTC

sort file*.txt | sed -E ':a; N; s/^([^;]*;)(.*)\n\1(.*)/\1\2;\3/; ta; P;D'
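
For reference, a sketch of the same sort-then-merge idea written so that it should also run on BSD sed: the label gets its own -e fragment, and $!N avoids relying on GNU's handling of N at end of input. LC_ALL=C and the stable key sort on the first field are additions not in the original one-liner, intended to keep records with the same ID adjacent and in file order. The name merge_sorted is hypothetical. Backreferences with -E are accepted by GNU and BSD sed but are not required by POSIX.

```shell
# merge_sorted is a hypothetical helper name, not from the thread.
# Sort by ID so duplicates are adjacent, then fold each run of
# same-ID lines into one record.
merge_sorted() {
  LC_ALL=C sort -s -t';' -k1,1 "$@" |
    sed -E \
      -e ':a' \
      -e '$!N' \
      -e 's/^([^;]*;)(.*)\n\1(.*)$/\1\2;\3/' \
      -e 'ta' \
      -e 'P' \
      -e 'D'
}
```

It could be called as: merge_sorted fileA.txt fileB.txt > result.txt. Unlike the approaches that hold everything in the pattern space, the output here is ordered by ID rather than by first appearance.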
