George Utley
2014-02-23 09:13:46 UTC
People:
I have a historical question about sed's development. Does anyone know why
Lee McMahon included the y (translate) command in its existing form? That
is, for what purposes did he intend it to be used?
People have come up with fiendishly clever ways to use it for mathematics,
but it's pretty clear to me that Lee hadn't intended it for that, given you
have to really fight and bastardise the language to use it for that
purpose. Similarly rot13 sometimes features in examples, but I question how
important rot13 is in real-world uses of sed.
I presume that case conversion and case-insensitive search/replace were the
main, and mainly intended, uses, since early seds had no other
support for these important functions. Indeed, grep's case-insensitivity
switch used to be "-y" rather than "-i" presumably in tribute to sed's y
command. Even with the y command, the frequent and obvious requirement of
case-insensitive search or replace still takes a lot of work that could
have been easily avoided with a simple option to the s command as per
today's GNU sed (although early grep had the same omission). For a language
notable for the brevity of its commands, the case conversion command using
y (i.e. y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/)
is unnecessarily long, tedious to type and error-prone, stripped of the
obvious shortcuts found in the command's predecessors and successors in
other tools (for which see below).
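To make the comparison concrete, here is a minimal sketch of case conversion and case-insensitive replacement in portable sed, next to the tr equivalent. The function names are hypothetical, chosen only for illustration, and the GNU-only forms are shown as comments because they are extensions rather than POSIX.

```shell
# Portable (POSIX) sed: upper-case every line with the y command.
# Both strings must spell out all 26 letters, in the same order.
upcase_y() {
    sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}

# The same job with tr, which accepts ranges.
upcase_tr() {
    tr 'a-z' 'A-Z'
}

# Case-insensitive replacement of "foo" without any s-command flag:
# every letter needs its own bracket expression.
replace_foo_anycase() {
    sed 's/[Ff][Oo][Oo]/bar/g'
}

# GNU sed only (extensions, not POSIX):
#   sed 's/foo/bar/Ig'     # I flag: case-insensitive match
#   sed 's/.*/\U&/'        # \U: upper-case the replacement
```

For example, `printf 'Foo fOO\n' | replace_foo_anycase` prints `bar bar`.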
The y command has also been used to insert newlines using "\n" back in the
days when the s command didn't allow "\n" in the replacement string. But
this is silly as an intended purpose, since the developers could easily
have allowed "\n" in the s command's replacement string in the first place.
Does anyone know why "\n" wasn't initially allowed in the replacement
string of the s command?
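For reference, a small sketch of the two portable ways of inserting newlines. It assumes a POSIX-conforming sed, in which "\n" inside y stands for a newline and the s replacement takes a backslash followed by a literal newline. The function names are made up for this example.

```shell
# Turn commas into newlines with y; "\n" here means a newline character.
commas_to_newlines_y() {
    sed 'y/,/\n/'
}

# The portable s form: a backslash followed by a real line break
# inside the replacement string.
commas_to_newlines_s() {
    sed 's/,/\
/g'
}
```

Both functions split `a,b,c` into three lines.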
The one irreplaceable virtue of the y command is that it allows swapping
of individual characters using a single pass. If the s command is used
instead for this purpose, then an intermediate character is required for
the swap that the user must ensure will not otherwise appear in the text.
The problem gets correspondingly bigger for a three-way, four-way, etc. swap.
That is in addition to the added length of needing multiple s commands.
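A short sketch of the difference, swapping "a" and "b". The placeholder "~" is an arbitrary choice for this example, as are the function names.

```shell
# One pass with y: every "a" becomes "b" and every "b" becomes "a".
swap_ab_y() {
    sed 'y/ab/ba/'
}

# With s, three substitutions and a placeholder are needed.
# "~" is assumed not to occur in the input; if it does, the result is wrong.
swap_ab_s() {
    sed -e 's/a/~/g' -e 's/b/a/g' -e 's/~/b/g'
}
```

On the input `abba cab` both give `baab cba`. On `a~b`, however, the y version gives `b~a` while the s version gives `bba`, because the original "~" is mistaken for the placeholder.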
However, the biggest failing of the y command for this character-swapping
purpose is its unavoidably global scope, making it difficult or impossible
to apply the y command in specific regexp contexts without convoluted
operations involving the hold space. This limitation renders the y command
much less widely applicable than it might otherwise have been, forcing
users onto the s command instead for these situations.
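As an illustration of the hold-space workaround, here is one possible sketch that upper-cases only the first word of each line. It is just one way of doing it, using POSIX sed commands only, and the function name is hypothetical.

```shell
# Upper-case only the first word of each line.
#   h              save the original line in the hold space
#   s/ .*//        keep only the first word in the pattern space
#   y/.../.../     upper-case what is left
#   G              append a newline plus the saved original line
#   s/\n[^ ]*//    drop the newline and the original first word
first_word_upper() {
    sed -e 'h' \
        -e 's/ .*//' \
        -e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
        -e 'G' \
        -e 's/\n[^ ]*//'
}
```

Five commands are needed for what would be a one-liner if y accepted a context.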
I did some web browsing in an attempt to understand the historical context
in which Lee included the y command. Many of sed's features, syntax and
commands were inherited from ed. However y is not one of these, nor was
there any equivalent command in ed. In fact, judging by Lee's "Kubla Khan"
documentation for the original version of sed, the y command wasn't even
included in that first version. Nor, apart from Perl, did any of sed's
descendants or cousins (e.g. awk, vi, vim, emacs) have an equivalent
command. These other programs just used specific commands for case
conversion.
The y command seems to have been inherited from Unix's "tr" file utility,
which in turn was inherited from Multics's "cvc" (convert_character) file
utility, which in turn was inherited from the "translate" function of
programming language PL/1 that ran under Multics.
It appears these predecessors to sed's y command were used primarily for
case conversion, inserting/removing/replacing field separators
and newlines, and in a very limited way for some character map translations
especially between BCD and EBCDIC/ASCII. Both tr and cvc operated on entire
files rather than specific contexts within those files, limiting other
potential uses for these utilities. But both cvc and tr had inbuilt options
to abbreviate and simplify these standard uses of the commands.
The tr utility even in its first version allowed specification of character
ranges in both search and replace strings (e.g. tr '[a-z]' '[A-Z]'), as well
as the useful optional ability to delete, complement the specified search
character set, squeeze multiple repeats to a single character, and place
nonprintable characters into both search and replace strings using octal
escape sequences. sed's y command has none of these options except
(eventually) the last one. cvc is even easier, allowing the user to specify
simple options to the command for commonly used conversion sets, e.g. "cvc
uc" to convert files to upper case, in addition to supporting custom
translation strings as per sed's y command.
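The tr options listed above look like this in a current POSIX tr. This sketch only shows today's behaviour, not what the earliest tr accepted, and the function names are invented for the example.

```shell
# -s: squeeze runs of spaces down to a single space.
squeeze_spaces() {
    tr -s ' '
}

# -d: delete every digit.
delete_digits() {
    tr -d '0-9'
}

# -c with -d: delete everything NOT in the set (letters and newline are kept).
keep_only_letters() {
    tr -cd 'a-zA-Z\n'
}

# Octal escape: \011 is the tab character.
tabs_to_spaces() {
    tr '\011' ' '
}
```

None of these has a direct counterpart in sed's y command, which only maps one listed character to another.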
PL/1's translate function did not have such explicit shortcuts, but the
language had separate functions for case conversion. Because PL/1 was a
proper programming language, the translation strings could be rapidly and
easily populated anyway for non-printable characters and sequential ranges
using FOR loops. Plus the PL/1 "translate" function operated on individual
string arguments rather than entire files, and therefore could be deployed
in a context-dependent manner.
In short, I am struggling to understand why Lee included (though not
initially) the tr/cvc command in sed, yet stripped it of its most useful
features, and without adding any new useful features to compensate.
Character map translation has been more appropriately and easily achieved
through specific file utilities or inter-system comms drivers. The early
sed didn't even have internal support for specifying control characters
(apart from \n) or high ASCII characters, a showstopper for use in
character mapping. Case conversion could have been more easily achieved
either with a switch for the s command as per today's sed, or as a specific
standalone command as per emacs, vi and vim.
Unlike the s command, the y command takes no parameters after its final "/"
character. That position could have been used for context
control, e.g. to make the translation happen only within parenthesised
portions of a regexp given after the final "/". Such a
feature would have made the y command vastly more useful and would have fit
in neatly with the regexp-based philosophy of sed. Yet to this day no such
feature exists.