I found this puzzling too; it was a new one for me.
Looking through the official documentation, here is what the Open Group
specification says about ERE anchoring: "1. A <circumflex> ( '^' ) outside a bracket expression
shall anchor the expression or subexpression it begins to the beginning
of a string; such an expression or subexpression can match only a
sequence starting at the first character of a string. For example, the
EREs "^ab" and "(^ab)" match "ab" in the string "abcdef", but fail to
match in the string "cdefab", and the ERE "a^b" is valid, but can never
match because the 'a' prevents the expression "^b" from matching
starting at the first character."
This makes it clear that, as an ERE, "a^b" can never match. Why they
would allow a pattern that can never match is beyond me. :( Anyway,
that's the way it is.
Here is what I found testing sed, using variations on your examples:
$ sed --version
GNU sed version 4.2.1
///////// First, basic regular expression (BRE) (no -r)
----- As BRE, ^ as char #1 means "begin pattern space"
----- We already knew that.
$ echo 'xaby' | sed 's/^xa/==/'
==by
----- As BRE, ^ NOT char #1 means "literal ^ character"
----- We already knew that.
$ echo 'xa^by' | sed 's/a^b/===/'
x===y
----- As BRE, \^ ALWAYS means "literal ^ character"
----- We already knew that.
$ echo 'xa^by' | sed 's/a\^b/===/'
x===y
$ echo 'xaby' | sed 's/\^xa/==/'
xaby
$ echo '^xaby' | sed 's/\^xa/==/'
==by
///////// Next, extended regular expression (ERE) (-r)
----- As ERE, ^ as char #1 means "begin pattern space"
----- We already knew that. Same as BRE behavior.
$ echo 'xaby' | sed -r 's/^xa/==/'
==by
----- As ERE, ^ NOT char #1 still means "begin pattern space"
----- Big surprise, at least to me. "a^b" NEVER matches as ERE.
$ echo 'xa^by' | sed -r 's/a^b/===/'
xa^by
----- As ERE, \^ ALWAYS means "literal ^ character"
----- We already kind of knew that. Same as BRE behavior.
$ echo 'xa^by' | sed -r 's/a\^b/===/'
x===y
$ echo 'xaby' | sed -r 's/\^xa/==/'
xaby
$ echo '^xaby' | sed -r 's/\^xa/==/'
==by
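For completeness, the two strings from the Open Group example can be run
through the same kind of test. This is only a sketch: it assumes a sed that
accepts -r for ERE, as in the tests above, and it uses printf instead of echo
purely as a habit.

```shell
# ERE with the anchor inside a group, as in the spec's "(^ab)" example.
printf '%s\n' 'abcdef' | sed -r 's/(^ab)/==/'   # "ab" is at the start of the string
printf '%s\n' 'cdefab' | sed -r 's/(^ab)/==/'   # "ab" is not at the start
```

The first line should print ==cdef and the second should print cdefab
unchanged, which is what the spec text describes.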
Assuming I have not missed something or made a mistake, here is what I
learned:
If using ^ as a literal in the middle of a BRE, it is probably better to
write \^ instead, so that the same pattern also works as an ERE. Instead of
matching "a^b" (admittedly ambiguous), always writing "a\^b" makes it clear
that a literal hat character is meant. The same logic applies to the dollar
sign, as "e$f" can never match as an ERE.
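I did not run the dollar sign case above, so here is a small sketch for
checking it. The helper name try is made up for this example, and -r is
assumed to select ERE as in my earlier tests.

```shell
# Hypothetical helper: apply one sed script to one input line,
# first as BRE (no option) and then as ERE (-r), and label the results.
try() {
    # $1 = input text, $2 = sed script
    printf 'BRE: %s\n' "$(printf '%s\n' "$1" | sed "$2")"
    printf 'ERE: %s\n' "$(printf '%s\n' "$1" | sed -r "$2")"
}

try 'xe$fy' 's/e$f/===/'     # unescaped $: the two modes may disagree
try 'xe$fy' 's/e\$f/===/'    # escaped $: literal dollar sign in both modes
try 'xa^by' 's/a\^b/===/'    # escaped ^: literal hat in both modes
```

With the escaped forms, both lines of output should show x===y. The
unescaped call is there only to show the difference between the two modes on
your own sed.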
I don't think this quirky behavior was generally recognized before your
post. I appreciate that you found it.
Daniel
Post by ***@hotmail.com [sed-users]
I am trying to understand the working of ^ and $ in the middle of a
regexp with gnused 4.2.1. The help file says they are considered as
normal characters (except at start and end of sub expressions).
sed -n -e "/a^b/p" input.txt
sed -n -e "/\(a^b\)/p" input.txt
sed -r -n -e "/a^b/p" input.txt
sed -r -n -e "/(a^b)/p" input.txt
This is puzzling. Am I missing something?