Discussion:
Historical question on y command
George Utley
2014-02-23 09:13:46 UTC
Permalink
People:

I have a historical question about sed's development. Does anyone know why
Lee McMahon included the y (translate) command in its existing form? That
is, for what purposes did he intend it to be used?

People have come up with fiendishly clever ways to use it for mathematics,
but it's pretty clear to me that Lee hadn't intended it for that, given you
have to really fight and bastardise the language to use it for that
purpose. Similarly rot13 sometimes features in examples, but I question how
important rot13 is in real-world uses of sed.

I presume that case conversion and case-insensitive search/replace were the
predominant and predominantly intended use, since early seds had no other
support for these important functions. Indeed, grep's case-insensitivity
switch used to be "-y" rather than "-i" presumably in tribute to sed's y
command. Even with the y command, the frequent and obvious requirement of
case-insensitive search or replace still takes a lot of work that could
have been easily avoided with a simple option to the s command as per
today's gnu sed (although early grep had the same omission). For a language
notable for the brevity of its commands, the case conversion command using
y (i.e. y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/)
is unnecessarily long and tedious to type and error-prone, stripped of the
obvious shortcuts found in the command's predecessors and successors in
other tools (for which see below).
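A minimal sketch of that case conversion, assuming a POSIX sed; the sample input and the variable name `upper` are illustrative only:

```shell
# Upper-case a line with y. Both strings must list all 26 letters,
# in the same order, because y has no range syntax.
upper=$(printf 'hello, world\n' |
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
printf '%s\n' "$upper"
```

Dropping or misordering a single letter in either string silently changes the mapping (or makes the strings unequal in length, which sed rejects), which is the error-proneness complained about above.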

The y command has also been used to insert newlines using "\n" back in the
days when the s command didn't allow "\n" in the replacement string. But
this is silly as an intended purpose, since the developers could easily
have allowed "\n" in the s command's replacement string in the first place.
Does anyone know why "\n" wasn't initially allowed in the replacement
string of the s command?
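A small sketch of that newline trick, assuming a sed whose y command accepts "\n" in its strings (POSIX specifies this; the sample input and variable name are made up for illustration):

```shell
# Turn every comma into a newline using y rather than s.
lines=$(printf 'a,b,c\n' | sed 'y/,/\n/')
printf '%s\n' "$lines"
```

The pattern space then holds three lines, which sed prints as one unit at the end of the cycle.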

The one irreplaceable virtue of the y command is that it allows swapping
of individual characters using a single pass. If the s command is used
instead for this purpose, then an intermediate character is required for
the swap that the user must ensure will not otherwise appear in the text.
The problem gets correspondingly bigger for a three-way, four-way etc. swap.
That is in addition to the added length of needing multiple s commands.
However the biggest failing of the y command for this character-swapping
purpose is its unavoidably global scope, making it difficult or impossible
to apply the y command in specific regexp contexts without convoluted
operations involving the hold space. This limitation renders the y command
much less widely applicable than it might otherwise have been, forcing
users onto the s command instead for these situations.
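To make the comparison concrete, here is a hedged sketch assuming a POSIX sed. The sample text, the "#" placeholder and the variable names are illustrative assumptions, not anything from sed's documentation:

```shell
# One pass with y: every a becomes b and every b becomes a.
swapped_y=$(printf 'abba cab\n' | sed 'y/ab/ba/')

# The same swap with s needs a placeholder character ("#" here),
# which must be known not to occur anywhere in the input.
swapped_s=$(printf 'abba cab\n' | sed 's/a/#/g; s/b/a/g; s/#/b/g')

# A line address can restrict y to certain lines, but on those lines
# it still translates every matching character, not a regex-selected part.
scoped=$(printf '1: ab\n2: ab\n' | sed '/^2:/y/ab/ba/')

printf '%s\n' "$swapped_y" "$swapped_s" "$scoped"
```

The address form is the only scoping y offers; limiting the translation to part of a line is where the hold-space gymnastics start.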

I did some web browsing in an attempt to understand the historical context
in which Lee included the y command. Many of sed's features, syntax and
commands were inherited from ed. However y is not one of these, nor was
there any equivalent command in ed. In fact, judging by Lee's "Kubla Khan"
documentation for the original version of sed, the y command wasn't even
included in that first version. Nor, apart from perl, did any of sed's
descendants or cousins (e.g. awk, vi, vim, emacs) have an equivalent
command. These other programs just used specific commands for case
conversion.

The y command seems to have been inherited from Unix's "tr" file utility,
which in turn was inherited from Multics's "cvc" (convert_character) file
utility, which in turn was inherited from the "translate" function of
programming language PL/1 that ran under Multics.

It appears these predecessors to sed's y command were used primarily for
case conversion, inserting/removing/replacing field separators
and newlines, and in a very limited way for some character map translations
especially between BCD and EBCDIC/ASCII. Both tr and cvc operated on entire
files rather than specific contexts within those files, limiting other
potential uses for these utilities. But both cvc and tr had inbuilt options
to abbreviate and simplify these standard uses of the commands.

The tr utility even in its first version allowed specification of character
ranges in both search and replace strings (e.g. tr '[a-z]' '[A-Z]'), as well
as the useful optional ability to delete, complement the specified search
character set, squeeze multiple repeats to a single character, and place
nonprintable characters into both search and replace strings using octal
escape sequences. sed's y command has none of these options except
(eventually) the last one. cvc is even easier, allowing the user to specify
simple options to the command for commonly used conversion sets, e.g. "cvc
uc" to convert files to upper case, in addition to supporting custom
translation strings as per sed's y command.
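For comparison, a short sketch of those tr features as they appear in a present-day POSIX tr (not the first-version tr described above, whose exact syntax is not reproduced here). The inputs and variable names are illustrative assumptions:

```shell
# Ranges: upper-case without spelling out the alphabet.
up=$(printf 'hello\n' | tr 'a-z' 'A-Z')

# -d deletes the listed characters.
nodigits=$(printf 'a1b22c\n' | tr -d '0-9')

# -s squeezes runs of a repeated character down to one.
squeezed=$(printf 'a   b  c\n' | tr -s ' ')

# -c complements the search set: everything that is not a lower-case
# letter becomes a newline, and -s squeezes the resulting runs.
words=$(printf 'one, two!\n' | tr -cs 'a-z' '\n')

printf '%s\n' "$up" "$nodigits" "$squeezed" "$words"
```

None of these four operations has a direct equivalent in sed's y command.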

PL/1's translate function did not have such explicit shortcuts, but the
language had separate functions for case conversion. Because PL/1 was a
proper programming language, the translation strings could be rapidly and
easily populated anyway for non-printable characters and sequential ranges
using FOR loops. Plus the PL/1 "translate" function operated on individual
string arguments rather than entire files, and therefore could be deployed
in a context-dependent manner.

In short, I am struggling to understand why Lee included (though not
initially) the tr/cvc command in sed, yet stripped it of its most useful
features, and without adding any new useful features to compensate.
Character map translation has been more appropriately and easily achieved
through specific file utilities or inter-system comms drivers. The early
sed didn't even have internal support for specifying control characters
(apart from \n) or high ASCII characters, a showstopper for use in
character mapping. Case conversion could have been more easily achieved
either with a switch for the s command as per today's sed, or as a specific
standalone command as per emacs, vi and vim.

Unlike the s command, the right hand side of the final "/" character in the
y command is unused for parameters. This could have been used for context
control, e.g. make the translation only happen within parenthesised
portions of the regexp on the right hand side of the final "/". Such a
feature would have made the y command vastly more useful and would have fit
in neatly with the regexp-based philosophy of sed. Yet to this day no such
feature exists.
Pedro Izecksohn
2014-02-24 00:28:50 UTC
Permalink
Post by George Utley
I have a historical question about sed's development. Does anyone know why Lee McMahon included the y (translate) command in its existing form? That is, for what purposes did he intend it to be used?
  Before I answer your question you need to know that sed was not developed by Lee McMahon alone: he was helped by 2 other guys whose names are not in sed's first public documentation, and I was one of those guys. (I invented the 'D' command and the backslash followed by the line-feed that follows the commands 'a', 'i' and 'c'. And I regret having suggested all these features.)

  In those days of teletypes, people who used other alphabets together with ASCII used the character encodings specified by ISO 8859.

  I was inside that post-office room where sed was developed when McMahon came with a problem: Someone was writing a book about Jewish culture and to write that text he needed to include Latin characters, Hebrew characters and characters transliterated from Hebrew to Latin on the same line.
Post by George Utley
People have come up with fiendishly clever ways to use it for mathematics, but it's pretty clear to me that Lee hadn't intended it for that, given you have to really fight and bastardise the language to use it for that purpose. Similarly rot13 sometimes features in examples, but I question how important rot13 is in real-world uses of sed.
  These uses were also discussed there.
Post by George Utley
The y command has also been used to insert newlines using "\n" back in the days when the s command didn't allow "\n" in the replacement string. But this is silly as an intended purpose, since the developers could easily have allowed "\n" in the s command's replacement string in the first place. Does anyone know why "\n" wasn't initially allowed in the replacement string of the s command?
  What I remember is this: I asked: "-Which would be the escape character?" The answer was: "-The back-slash as common." Then I rejoined: "-Why not use the character generated by the Esc key?"

  So probably the answer to this question is this: The escape character had not been decided yet.
Post by George Utley
The one irreplaceable virtue of the y command is that it allows swapping of individual characters using a single pass. If the s command is used instead for this purpose, then an intermediate character is required for the swap that the user must ensure will not otherwise appear in the text. The problem gets correspondingly bigger for a three-way, four-way etc. swap. That is in addition to the added length of needing multiple s commands. However the biggest failing of the y command for this character-swapping purpose is its unavoidably global scope, making it difficult or impossible to apply the y command in specific regexp contexts without convoluted operations involving the hold space. This limitation renders the y command much less widely applicable than it might otherwise have been, forcing users onto the s command instead for these situations.
  The original inventor of REGEX was against the name REGEXP. He used to say: "-The final 'P' is unnecessary and the word sounds much better without the final 'P'."
Post by George Utley
Character map translation has been more appropriately and easily achieved through specific file utilities or inter-system comms drivers. The early sed didn't even have internal support for specifying control characters (apart from \n) or high ASCII characters, a showstopper for use in character mapping. Case conversion could have been more easily achieved either with a switch for the s command as per today's sed, or as a specific standalone command as per emacs, vi and vim.
  sed was not intended to be used to modify binary data nor to be used with multibyte character sets. I'm almost sure that the C functions and data types that deal with multibyte character sets had not been standardized yet. And the digitizer was lazy and said to McMahon: "-Do you want to modify what I already typed? Do it yourself. For my purpose (this meant: "For the purpose of the post-office") it is already good enough. If someone wants to use sed for some text that uses a multibyte character set then he should use some other tool to convert the text." McMahon then asked him: "-What if someone wants to write a text that uses 3 different alphabets?" Then the digitizer rejoined: "-I have worked here sending e-mails every day for almost X years and nobody has ever brought me a text like this to type."
Post by George Utley
Unlike the s command, the right hand side of the final "/" character in the y command is unused for parameters. This could have been used for context control, e.g. make the translation only happen within parenthesised portions of the regexp on the right hand side of the final "/". Such a feature would have made the y command vastly more useful and would have fit in neatly with the regexp-based philosophy of sed. Yet to this day no such feature exists.
  If you don't mind writing free software then you may modify GNU sed, sign the paper and post the diff for the FSF.
  I hope that I have answered all your questions. You made me happy by giving me the opportunity to write this message. Thank you.
d***@ehdp.com
2014-02-24 18:19:10 UTC
Permalink
It is pretty incredible to hear from someone who originally helped develop sed. You give a valuable perspective about sed. Thank you for clearing up some of the history, and setting the record straight.


I have no idea about the history of the y command. FWIW, I wrote the following lines concerning the y command:


y is a poor substitute for the Unix 'tr' command, because y does not use character classes (such as '0-9' or [:alpha:]). For complex sed scripts, y is often necessary. But a more practical approach is usually to use short sed scripts from within a shell script, along with tr and other Unix commands.


I have sometimes wondered why y was called y. My suggestion is that it stands for "why". In many years of using sed, thousands of (admittedly utilitarian) sed scripts, I doubt I have ever used y.


Daniel
Tim Chase
2014-02-24 18:56:05 UTC
Permalink
On 2014-02-23 16:28, Pedro Izecksohn wrote:

I agree that it's a delight to see the old guard hanging out on the
list after such a long time.
Post by Pedro Izecksohn
I invented the 'D' command and the back-slash followed by the
line-feed that follows the commands 'a', 'i' and 'c'. And I regret
having suggested all these features.)
While you're welcome to regret the backslash-linefeed for a/i/c,
please don't regret "D", as I use it enough that I would miss it
considerably were it not there, even if it could roughly be replicated
with something like

s/^[^\n]*\n//;n

However, that could alter the successful-substitution flag for t/T,
which is a side effect I would need to work around.

So thanks to you and all the others who make & maintain sed!

-tkc
George Utley
2014-02-25 04:33:52 UTC
Permalink
Thanks a lot for that, Pedro, that was awesome. I never imagined I would
ever get a reply from someone who was actually there at sed's creation, let
alone that the original developers were still alive.

So if I understand you correctly, the y command was initially intended for
transliteration WITHIN a character set between high-ASCII ISO8859 foreign
characters such as Hebrew, and the equivalent low-ASCII Latin characters,
within parts of an otherwise English document? That would certainly explain
the lack of support for character ranges or control characters, as the
equivalent characters would not line up so neatly, and the line-oriented
rather than regex-oriented nature of the command. And this application
certainly wouldn't be catered for by the existing whole-file utility tr.
Now it all makes sense.

But this intended use was thwarted by a lazy programmer who couldn't be
bothered fixing his code for high-ASCII support? Note that by "high ASCII"
I don't mean multibyte, I mean 8-bit characters >127. I have seen the
source code for the version 7 Unix sed command, and
the y command's lookup table strips the high bit of each source character
and stops at offset 0177, apparently permitting translation from low ASCII
to high ASCII but not from high ASCII to low ASCII, which seems the
opposite of Lee's intentions.

And don't apologise for the D command, as I use sed primarily in multiline
mode and need it. But now that you bring it up, what problem was the
obligatory backslashed newline after the i/c/a commands intended to solve?
If it was to allow whitespace in front of the first displayed character,
couldn't that have been achieved just as easily by starting the displayed
text at the second character after the command, regardless of whether or
not the second character was whitespace? This at least would have been
consistent with the similarly finicky r & w commands, which used to
require exactly one space after the command and interpreted the subsequent
characters as the parameter.
Pedro Izecksohn
2014-03-02 03:54:53 UTC
Permalink
Post by George Utley
Thanks a lot for that, Pedro, that was awesome. I never imagined I would ever get a reply from someone who was actually there at sed's creation, let alone that the original developers were still alive.
So if I understand you correctly, the y command was initially intended for transliteration WITHIN a character set between high-ASCII ISO8859 foreign characters such as Hebrew, and the equivalent low-ASCII Latin characters, within parts of an otherwise English document? That would certainly explain the lack of support for character ranges or control characters, as the equivalent characters would not line up so neatly, and the line-oriented rather than regex-oriented nature of the command. And this application certainly wouldn't be catered for by the existing whole-file utility tr. Now it all makes sense.
But this intended use was thwarted by a lazy programmer who couldn't be bothered fixing his code for high-ASCII support? Note that by "high ASCII" I don't mean multibyte, I mean 8 bit characters >127. I have seen the source code for the version 7 Unix sed command, and the y command's lookup table strips the high bit of each source character and stops at offset 0177, apparently permitting translation from low ASCII to high ASCII but not from high ASCII to low ASCII, which seems the opposite of Lee's intentions.
  Could you send that source code to me? I use GNU sed.

  I remember them talking about this, but I don't remember the content of that talk.

  What I remember is that at that time the size in bits of the char type was different on different platforms and I know that the standard does not define if the char type is signed or unsigned.
Post by George Utley
And don't apologise for the D command, as I use sed primarily in multiline mode and need it. But now that you bring it up, what problem was the obligatory backslashed newline after the i/c/a commands intended to solve? If it was to allow whitespace in front of the first displayed character, couldn't that have been achieved just as easily by starting the displayed text at the second character after the command, regardless of whether or not the second character was whitespace? This at least would have been consistent with the similarly finicky r & w commands, which used to require exactly one space after the command and interpreted the subsequent characters as the parameter.
  1) We, Lee McMahon and I, expected that in the future multicharacter commands would be added to the list of sed's commands.
  2) To understand the logic behind the 'a\\n' command a small example is needed:

***@microboard:~/programming/sed/George Utley's question$ cat test.txt
Hello
world.
***@microboard:~/programming/sed/George Utley's question$ cat test.txt | sed '1a insertion'
Hello
insertion
world.

  3) It seemed logical when the specification was written (which was before the source code had been written) that content which must occupy at least a whole line should be separated from the commands by a line-feed.
  4) It was the typist who decided the separator that is needed after the r and w commands. And he was against the use of the backslash after the i, c, and a commands.
  5) Lee E. McMahon was punished with a punch on each of his eyes for not having included our names on sed's specification.

  Consistency may be expected from a single person but not from different people.

  The inconsistencies of sed prove that it was written by a group of people who did not communicate well.

  The lack of communication between us may be explained by some facts: The typist lived and worked in Niterói. I also lived in Niterói but I used to go to the post office every day in the morning and used to stay there for just about an hour. Lee McMahon lived in Rio de Janeiro and was officially allocated to work in Rio de Janeiro but used to go to Niterói every workday to talk to us personally. The sed specification was typed hurriedly by McMahon and it was not reviewed by me or by the typist until it had been published.
George Utley
2014-03-15 11:47:55 UTC
Permalink
Sorry about the delay in replying, I haven't been checking this account
much lately.

Historical Unix source code is available at
http://minnie.tuhs.org/cgi-bin/utree.pl and elsewhere.

Thinking more about Lee's intended application for the y command, it
actually is catered for by the original y command and remains relevant
today. If you only have a US English keyboard and need some lines in
foreign alphabets such as Greek or Hebrew or Cyrillic, you can type them
quickly using your preferred US English equivalents for each character,
either phonetic or graphical, using whatever transliteration makes most
sense to you personally. Then you could transliterate the foreign lines en
masse using the y command. Only nowadays, the target characters would
probably be in UTF-8 rather than single-byte high ASCII.

Fascinating that sed was developed in Brazil. I never would have guessed. I
had always assumed that everything about Research Unix was developed in
Bell Labs' New York office.

Like many others, I pretty much always write my sed scripts as command line
one-liners, sometimes within pipelines, running them and editing them
repeatedly until they give the desired result. Therefore the obligatory
newline for the i/c/a/r commands is a hindrance to this. However I have
come to value these commands for the fact that they let you output text
while leaving the pattern space unchanged. Keeping track of the effect of
each substitution on subsequent regex matches is a headache and would have
to be one of the biggest sources of error in sed scripts. The i/a/r
commands let you avoid this. For me it would have been preferable (albeit
incompatible with ed) if there were no obligatory newlines in either these
commands or their output. By contrast, I would have preferred if the "="
command had put the line number at the start of pattern space followed by a
space, rather than direct to the output on its own line. Still, sed remains
my most used Unix tool, so despite the quibbles, you and Lee must have been
doing something right.
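A minimal sketch of that property, assuming a POSIX sed and using the same two-line input as Pedro's example. The trailing s command and the variable name `out` are illustrative additions:

```shell
# "a" queues its text for output at the end of the cycle, so the later
# s command only sees the pattern space and never touches "insertion".
# This is the portable form, with the obligatory backslash-newline.
out=$(printf 'Hello\nworld.\n' | sed '1a\
insertion
s/o/0/g')
printf '%s\n' "$out"
```

The "o" in "insertion" survives while both input lines are rewritten, which is what makes a/i/r convenient for emitting text without disturbing later regex matches. GNU sed also accepts the one-line form '1a insertion' shown earlier in the thread, as an extension.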
Post by George Utley
Post by George Utley
Thanks a lot for that, Pedro, that was awesome. I never imagined I would
ever get a reply from someone who was actually there at sed's creation, let
alone that the original developers were still alive.
Post by George Utley
So if I understand you correctly, the y command was initially intended
for transliteration WITHIN a character set between high-ASCII ISO8859
foreign characters such as Hebrew, and the equivalent low-ASCII Latin
characters, within parts of an otherwise English document? That would
certainly explain the lack of support for character ranges or control
characters, as the equivalent characters would not line up so neatly, and
the line-oriented rather than regex-oriented nature of the command. And
this application certainly wouldn't be catered for by the existing
whole-file utility tr. Now it all makes sense.
Post by George Utley
But this intended use was thwarted by a lazy programmer who couldn't be
bothered fixing his code for high-ASCII support? Note that by "high ASCII"
I don't mean multibyte, I mean 8 bit characters >127. I have seen the
source code for the version 7 Unix sed command, and the y command's lookup
table strips the high bit of each source character and stops at offset
0177, apparently permitting translation from low ASCII to high ASCII but
not from high ASCII to low ASCII, which seems the opposite of Lee's
intentions.
Could you send that source code to me? I use the Gnu sed.
I remember they talking about this, but I don't remember the content of
that talk.
What I remember is that at that time the size in bits of the char type
was different on different platforms and I know that the standard does not
define if the char type is signed or unsigned.
Post by George Utley
And don't apologise for the D command, as I use sed primarily in
multiline mode and need it. But now that you bring it up, what problem was
the obligatory backslashed newline after the i/c/a commands intended to
solve? If it was to allow whitespace in front of the first displayed
character, couldn't that have been achieved just as easily by starting the
displayed text at the second character after the command, regardless of
whether or not the second character was whitespace? This at least would
have been consistent with the similarly finicky r & w commands, which used
to require exactly one space after the command and interpreted
the subsequent characters as the parameter.
1) We, Lee McMahon and I, expected that in the future multicharacter
commands would be added to the list of sed's commands.
2) To understand the logic behind the 'a\\n' command a small example is
Hello
world.
| sed '1a insertion'
Hello
insertion
world.
3) It seemed logic when the specification was written, that was before
the source code had been written, that a content that must occupy at least
a whole line should be separated from the commands by a line-feed.
4) It was the typist who decided the separator that is needed after the
r and w commands. And he was against the use of the backslash after the i,
c, and a commands.
5) Lee E. McMahon was punished with a punch on each of his eyes for not
have included our names on sed's specification.
Consistency may be expected from a single person but not from different
people.
The inconsistencies of sed prove that it was written by a group of
people who did not communicate well.
The lack of communication between us may be explained by some facts: The
typist lived and worked in Niterói. I also lived in Niterói, but I used to
go to the post office every morning and stay there for just about an hour.
Lee McMahon lived in Rio de Janeiro and was officially allocated to work
in Rio de Janeiro, but used to go to Niterói every workday to talk to us
in person. The sed specification was typed hurriedly by McMahon, and it
was not revised by either me or the typist until after it had been
published.
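As a hedged illustration of point 2 in the reply above: in the portable form of the append command, the text to be appended begins on the line after 'a\'. The sketch below wraps that form in a shell function; append_after_first is a made-up name used only for this example. The one-line '1a insertion' spelling is accepted by GNU sed as an extension, and other seds may reject it.

```shell
# Hypothetical helper: append the line "insertion" after line 1 of stdin.
# The text to append starts on the line following "a\".
append_after_first() {
  sed '1a\
insertion'
}

# Feed the two-line input from the example through the helper.
printf 'Hello\nworld.\n' | append_after_first
```

The backslash-newline is what separates the command letter from text that occupies a whole line of its own, which matches the reasoning given in point 3.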
George Utley
2014-03-18 23:59:26 UTC
Permalink
Looking around further at minnie.tuhs.org, the AUSAM version of Unix, based
on version 6 Unix, includes a pre-release version of sed from 1975,
predating the Kubla Khan doc by years. Although recognisably still sed,
it's very different to the version we know today, much more similar to ed.
It's fascinating to see how it evolved.
Unsurprisingly the y command is missing. Perhaps more surprisingly:
-There is no hold space, and therefore no commands using the hold space
(today's g,h,G,H, and x commands).
-Although g & v commands exist, these are not today's g & v commands.
Rather they are a clone of ed's g & v commands, which were used for
applying multiple commands to lines matching or not matching a specific
regex. As such the g command is redundant in sed.
-There is no provision for ! to negate the addresses. Instead you have to
use the above-mentioned v command.
-The b (branch) command is called the j (jump) command, which I personally
prefer. I guess Lee wanted to avoid confusion with ed's j (join) command in
the days when compatibility with ed was more important.
-The l command is parsed but not implemented. It is just a Do Nothing
command at this stage.
-The p & P commands print nothing unless the -n flag is in effect, unlike
today's sed, which without -n prints the pattern space twice.
-There are mysterious O, W and e commands provided for in the header file
but not parsed or implemented. This remained true for the first official
1979 release.
-There is no semi-colon separator for commands. The commands must be one
per line, so no sed one-liners.
-As per ed, there is no choice of delimiter for regex addresses; only
"/" can be used.
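For comparison, here is a small sketch of how four of the features missing from the 1975 pre-release look in a present-day sed. It only shows modern behaviour and makes no attempt to reproduce the old binary; the function names are invented for illustration.

```shell
# "!" negates an address, so no separate v command is needed.
print_non_matching() { sed -n '/b/!p'; }

# Without -n, p shows each line twice: once from p, once from autoprint.
print_twice() { sed 'p'; }

# ";" separates commands, which is what makes one-liners possible.
upper_a_then_delete_b() { sed 's/a/A/;/b/d'; }

# "\," opens a regex address delimited by "," instead of "/".
match_path() { sed -n '\,/usr/bin,p'; }

printf 'a\nb\nc\n' | print_non_matching
printf 'x\n' | print_twice
printf 'a\nb\n' | upper_a_then_delete_b
printf '/usr/bin/sed\n/bin/sh\n' | match_path
```

The alternate delimiter in the last function saves escaping every "/" in a path, which is the main practical reason the choice of delimiter matters.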