2018-05-06 18:40:26 -0500, Tim Chase ***@thechases.com [sed-users]:
[...]
Post by Tim Chase ***@thechases.com [sed-users]
The violation is that the spec says that a <newline> or <semicolon>
MUST precede a closing brace. GNU sed doesn't adhere to this part of
the spec or enforce it. That means that people can end up writing
sed scripts that they think are POSIX compliant, but aren't; and when
those non-compliant scripts are run in a POSIX-compliant version of
sed, they break.
No again. That's a common misconception when interpreting a
standard specification, and you wouldn't be the first to make
it; you'll see a lot of those on the Austin Group mailing list.
Even implementers have been known to code into their
implementations what they thought was a requirement on *them* as
opposed to the applications using *them* (ending up reproducing
the same limitations/misfeatures of historical implementations).
A very important point about terminology when reading the spec
is *implementation* versus *application*.
POSIX specifies a *Portable* Operating System *Interface*.
That is, it specifies a portable API. It tells *applications*
(the things that use the API; in the case of "sed", sed
scripts/invocations) how they should be written to be portable.
And it occasionally tells *implementations* (in this case, the
"sed" implementations like GNU, BSD, Solaris sed) how to do
things so they interpret portable code correctly.
POSIX doesn't specify what happens if an application doesn't
follow its specification, as when a script does {s/x/y/}. It
doesn't specify what happens if you use the "i" flag (as in
s/x/y/i) or the "x" or "}" flag. It doesn't specify the "v" or
"k" sed commands, the -E, -i or --help options or if you use
"\t", "\+", "\|" in regexps outside of bracket expressions.
That doesn't mean that *implementations* have to report an error
when an *application* uses any of these.
It would be silly if a specification like POSIX did that, if it
forbade extensions. That would mean that the interface would
have no chance of evolving.
POSIX is primarily *descriptive*: it describes a portable
interface, one portable across implementations that already
existed. It is very rarely *prescriptive* (introducing new
APIs that implementations must start implementing to become
compliant).
The -E above is a good example. Since POSIX allows
*implementations* to support any option besides the ones it
specifies (but of course an *application* cannot use them),
several sed *implementations* have added a -E option to support
extended regexps. As a result, that option is being introduced in
a future edition of the POSIX specification
(http://austingroupbugs.net/view.php?id=528, scheduled for issue
8).
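
To illustrate the application side of that distinction, here is a minimal sketch of a script that avoids -E and expresses "one or more" with a basic regular expression interval instead. The sample input and the variable name are made up for the example.

```shell
# Hypothetical sample input; "portable" is only an illustrative variable name.
# The BRE interval \{1,\} stands in for the ERE "+" that would need -E.
portable=$(printf 'caaat\n' | sed 's/a\{1,\}/X/')
printf '%s\n' "$portable"
```

The script stays within basic regular expressions, so it does not depend on whether a given sed has picked up the -E extension.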
Post by Tim Chase ***@thechases.com [sed-users]
Post by Stephane Chazelas ***@gmail.com [sed-users]
POSIX doesn't specify the behaviour for {s/x/y/} so either
failing with an error, or the GNU behaviour (or any other
behaviour) are valid behaviours.
POSIX does define correctness though: it SHALL be preceded by a
semicolon or newline. From RFC2119
"MUST: This word, or the terms "REQUIRED" or "SHALL", mean that
the definition is an absolute requirement of the specification."
(*) https://www.ietf.org/rfc/rfc2119.txt
Note that POSIX has nothing to do with the IETF. It's a bad idea
to look at one specification to explain another specification.
For POSIX, look at
http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap01.html#tag_21_01_05
which refers to the ISO/IEC directives, such as
https://www.iso.org/foreword-supplementary-information.html
The "shall" in the text that was quoted earlier (POSIX generally
doesn't use "must", let alone "MUST") applies to *applications*
(portable applications that make use of that standard API,
shell/sed scripts).
Post by Tim Chase ***@thechases.com [sed-users]
GNU sed does not treat a semicolon/newline-before-close-brace as
required, despite the spec requiring it.
It does treat it as required. sed '{s/x/y/;}' works as specified
in GNU sed. The behaviour for sed '{s/x/y/}', however, is not
specified in the POSIX spec. It is specified in the GNU sed
documentation, though, and again works as documented.
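
As a small sketch of the two forms an *application* can rely on, the closing brace below is preceded once by a semicolon and once by a newline. The input and variable names are only illustrative.

```shell
# Closing brace preceded by a semicolon.
with_semicolon=$(printf 'x\n' | sed '{s/x/y/;}')

# Closing brace preceded by a newline.
with_newline=$(printf 'x\n' | sed '{
s/x/y/
}')

printf '%s\n' "$with_semicolon" "$with_newline"
```

Both variants should give the same result on any sed, whereas '{s/x/y/}' is left to whatever the implementation decides to do.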
[...]
Post by Tim Chase ***@thechases.com [sed-users]
Post by Stephane Chazelas ***@gmail.com [sed-users]
POSIXLY_CORRECT is to have tools align with POSIX when they
don't by default.
Which is exactly what GNU sed is doing here, being misaligned with
POSIX so the best outcome would be for GNU sed to respect
No, POSIX *requires* sed '/[\t]/!d' to match on \ and t.
/[\t]/!d is a perfectly valid sed script whose behaviour is
fully specified by POSIX. An implementation that failed to match
a line containing "t" (as GNU sed does when POSIXLY_CORRECT is
not set) would not be compliant.
On the other hand:
sed '{s/x/y/}' (or sed -E 's/x/y/' currently) is not a valid
POSIX sed invocation. The behaviour is unspecified as the
*application* did not obey the specification (POSIX doesn't
specify a "}" flag to the "s" command and requires the "}" to be
preceded by ";" or newline to be recognised as a closing group).
So sed *implementations* can do what they want here.
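
Coming back to the bracket expression case: since [\t] is specified as matching a backslash or a "t", a script that actually wants to match a tab cannot rely on that escape. One possible approach, sketched here with made-up input and variable names, is to put a literal tab into the bracket expression.

```shell
# Hypothetical sample input: one line with a tab, one with a backslash,
# and one consisting of the letter "t".
tab=$(printf '\t')   # a literal tab character, produced by printf rather than by sed
with_tab=$(printf 'a\tb\nc\\d\nt\n' | sed "/[$tab]/!d")
printf '%s\n' "$with_tab"
```

Only the line that contains a real tab is kept, and that does not depend on how a given sed treats "\t" inside brackets.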
The closest POSIX comes to what you think it says is that
*if* an *implementation* considers it as a syntax error (or if
it doesn't recognise the -E option), then it shall report an
error on stderr (whose text is not specified) and exit with a
non-zero exit status.
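
That non-zero exit status is something a script can test for. As a rough sketch (the variable name is invented, and -E is used only as the example of an option outside the current spec), a script could probe the local sed like this:

```shell
# Probe whether the local sed accepts -E. An implementation that rejects
# the option is expected to write a diagnostic to stderr (discarded here)
# and exit with a non-zero status.
if printf 'aa\n' | sed -E 's/a+/b/' >/dev/null 2>&1; then
  sed_has_E=yes
else
  sed_has_E=no
fi
printf 'sed -E accepted: %s\n' "$sed_has_E"
```

A zero exit status only shows that the option was accepted; what the implementation does with it is still its own business.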
--
Stephane