Thanks ***@gmail.com.
Could you tell me the official download location of
sed 4.2.2.93-31c8-dirty - windows binary?
Thanks Thierry,
your response about the "interpretation" was very important for me.
Following your suggestion I wrote a sed script (sedcmd.txt) containing
all the desired steps. Help to improve it is still much appreciated.
The real world environment:
A huge server repository of papers from medical journals (free full text).
This server is a Unix server.
It is shared among a number of users (mostly students).
They have the rights to add papers to and download papers from this repository,
often with no sense for simple rules of filename conventions and ordering.
In some rare cases the filenames are too long when copied from the
server to a local Windows PC (259+null chars boundary) and must be
snipped somehow. But this is another task, I think.
We (me in most cases) need to name (or rename :-( ) these files with
a leading year ("four digits_") followed by the word "Review_"
(if this appears somewhere in the filename).
It's not possible to change all naming mistakes but the code below tries
to cover the most common issues here.
Regarding the y command I re-arranged some other commands to work
with sed v407 from the UnxUtils.
Command: DIR /B /S /A:-D "DirPath"|sed -r -f "sedcmd.txt" > "OutFile"
--- start sedcmd.txt---
# copy line to hold space
h
# delete all up to the last backslash
s/.*\\//
# add "e" to all "ÄÖÜäüö"
s/[ÄÖÜ]/\0e/g
s/[äüö]/\0e/g
# replace "ÄÖÜäüö" with "AOUauo"
y/ÄÖÜäüö/AOUauo/
# replace all "ß" with "ss"
s/ß/ss/g
# replace one or more occurrences of
# "PlusMinusHyphenPeriodCommaUnderscoreAmpersandMiddotDashes"
# with one "Hyphen" (all)
s/[+-.,_&\d183\d150\d151]{1,}/-/g
# print filename extension and add one preceding "Dot" for lines
# containing a "Hyphen" and at least "1-3 chars" at end of line
# (dot has been overwritten due to the preceding command)
# (should be improved to cover any filename extension)
s/-(.{1,3}$)/.\1/
# replace two or more occurrences of a "Hyphen" with one "Hyphen" (all)
s/-{2,}/-/g
# print "1-4 digits" (= Year) and add one "Underscore" for lines starting
# with "1-4 digits" followed by zero or more "Hyphens"
s/(^[[:digit:]]{1,4})-*/\1_/
# replace the word "Review" (case insensitive) followed by zero or more
# "Hyphens" with "ReviewUnderscore"
# (sometimes the word "Review" appears somewhere in the filename.
# it would be nice to shift the word "Review" right behind
# the Year_. I don't know how to achieve this)
s/Review-*/Review_/I
# remove any space (incl. horiz. tab) from start and end of line (TrimLR)
# (not needed for listings piped by the DIR command)
# s/^[[:space:]]*//g
# s/[[:space:]]*$//g
# Some of the filenames contain unicode chars that are displayed as
# "QuestionMark" in the command line window. To make this somehow clear
# replace any occurrence of a "QuestionMark" with an "Asterisk"
s/\?/*/g
# exchange the pattern space with the hold space
x
# append the hold space to the pattern space (adding a newline \n)
G
# replace the newline with "QuoteSpaceQuote"
s/\n/\d34\d32\d34/
# add the word "REN" and "SpaceQuote" at start of line
s/^/REN \d34/
# append a "Quote" at end of line
s/$/\d34/
--- end sedcmd.txt---
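The note in the script about covering any filename extension could be handled by protecting the last dot before the separators are collapsed, instead of guessing a 1-3 character extension afterwards. A minimal sketch, assuming GNU sed (for -E and for \n in the replacement) run from a Unix-like shell; the function name collapse_keep_ext is made up for illustration, and only the ASCII separators are shown (middle dot and dashes left out):

```shell
# Hypothetical helper, not part of sedcmd.txt.
collapse_keep_ext() {
  sed -E '
    # mark the last dot with a newline, which cannot occur inside a line
    s/\.([^.]*)$/\n\1/
    # collapse runs of + - . , _ & into one hyphen (hyphen listed first)
    s/[-+.,_&]+/-/g
    # turn the marker back into the extension dot
    s/\n/./
  '
}

printf '%s\n' 'A programme.-.a trial.html' | collapse_keep_ext
# prints: A programme-a trial.html
```

In sedcmd.txt the marker would have to be restored before the x and G commands, because G also uses a newline as its separator.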
Desired result (OutFile) is a renaming script that can be checked and
may be called inside a batch or via command line directly.
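For the open question in the script comments (shifting "Review_" right behind the year), one possible approach is a second substitution with two capture groups. This is only a sketch, assuming GNU sed (the I flag and -E are GNU extensions) and a name that already starts with "Year_"; move_review is a hypothetical name:

```shell
# Hypothetical helper, not part of sedcmd.txt.
move_review() {
  sed -E '
    # normalise the word and any trailing hyphens, as in sedcmd.txt
    s/Review-*/Review_/I
    # group 1 = the Year_ prefix, group 2 = everything before Review_;
    # re-emit them with Review_ moved directly behind the year
    s/^([0-9]{1,4}_)(.*)Review_/\1Review_\2/
  '
}

printf '%s\n' '2011_Low Back Pain-review-Schuerer.pdf' | move_review
# prints: 2011_Review_Low Back Pain-Schuerer.pdf
```

A name that already has "Review_" directly behind the year is left unchanged, because group 2 then matches the empty string.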
Zharif
Post by Thierry Blanc ***@gmx.ch [sed-users]
Yes, your interpretation is correct. I would suggest exporting the
commands to a sed script, one command per line. One command per line is
clearer when you have a long list of replacements.
There was a discussion on the y command recently. I am not sure whether it
has drawbacks, so I would stick to the s command.
Post by ***@arcor.de [sed-users]
Thierry,
thank you very much. This code works.
sed --text -r "h ;s/.*\\// ;s/ü/ue/g ;s/\.-\./-/ ;x ;G ;s/\n/ /" "InFile" > "OutFile"
Just for my understanding (at the risk of needling you), is this what happens?
1. Line 1: copy line to hold space (unchanged, to remember it)
2. Line 1: delete all up to the last backslash
3. Line 1: do some more replacements
4. exchange the pattern space with the hold space
5. append the hold space to the pattern space (adding a newline \n)
6. replace the newline with something (a space) and print
7. repeat this for the next line(s)
Hold buffer: Are "H" and "h" the same?
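On the h versus H question: they are not the same. h overwrites the hold space with the pattern space, while H appends a newline plus the pattern space to whatever is already held. A small sketch that makes the difference visible by printing the hold space after the last line (the function names are invented for the demo):

```shell
# Hypothetical demo helpers: print the hold space once, at the last line.
hold_copy()   { sed -n 'h;${x;p;}'; }   # h: hold space = current line only
hold_append() { sed -n 'H;${x;p;}'; }   # H: hold space grows line by line

printf '%s\n' a b | hold_copy     # prints: b
printf '%s\n' a b | hold_append   # prints an empty line, then a, then b
```

The leading empty line in the second case comes from the hold space being empty at the start, so the first H stores a newline followed by "a".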
your input is also much appreciated.
I simply removed the "y" command but must confess that your code
sometimes gave some unexpected results.
But anyway, thanks.
Zharif
Post by Thierry Blanc ***@gmx.ch [sed-users]
sed -r 'h;s|.*\\||;s|ü|ue|g;s|\.-\.|-|;x;G;s|\n||'
Post by ***@arcor.de [sed-users]
Thierry,
sed --text -r "p;s|.*\\||;s|ü|ue|g;s|\.-\.|-|" "infile" > "outfile"
Line1: D:\dir1\subdir1\2006_A programme.-.a randomized controlled trial.pdf
Line2: 2006_A programme-a randomized controlled trial.pdf
Line3: D:\dir1\subdir1\2011_Schürer-Low Back Pain.pdf2011_Schuerer-Low Back Pain.pdf
Only the third line gives the desired result.
The first filepath is split into two lines.
Your approach fits to the original request of the first mail I sent -
thanks much.
...does exactly what I want. It appends the filename only into the same
line (without any substitution of the filename).
Line1: ren "D:\dir1\subdir1\2006_A programme.-.a randomized controlled trial.pdf" "2006_A programme.-.a randomized controlled trial.pdf"
Line2: ren "D:\dir1\subdir1\2011_Schürer-Low Back Pain.pdf" "2011_Schürer-Low Back Pain.pdf"
sed --text -r -e "h;s/.*\\//;s/[ÄÖÜ]/\0e/g;s/ß/ss/g;y/ÄÖÜäüö&\d183/AOUaou+-/;s/([-+_.\x20]){2,}/\1/g;s/\x20*.?([-+_.]).?\x20*/\1/g;H;x;s/^/ren \x22/;s/$/\x22/;s/\n/\x22\x20\x22/" "infile" > "outfile"
sed: -e expression #1, char 58: strings for y command are different lengths
Removing the "\d183" on the LHS and the "-" on the RHS of the y command gives:
Line1: ren "D:\dir1\subdir1\2006_A programme.-.a randomized controlled trial.pdf" "200_programm.randomized controlled tria.df"
Line2: ren "D:\dir1\subdir1\2011_Schürer-Low Back Pain.pdf" "201_chore-ow Back Pai.df"
Any further help is much appreciated
Zharif
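The mangled names in the output above ("200_programm.randomized ... tria.df") can be traced to the two ".?" in the substitution s/\x20*.?([-+_.]).?\x20*/\1/g: each ".?" matches any single character next to the separator, so one neighbour on each side is deleted along with the spaces. A sketch of the difference, assuming GNU sed with -E; trim_buggy and trim_fixed are names made up for this comparison:

```shell
# Hypothetical helpers comparing the posted expression with a corrected one.
# As posted: the optional dots swallow one character on each side.
trim_buggy() { sed -E 's/ *.?([-+_.]).? */\1/g'; }
# Corrected: only blanks around the separator are removed.
trim_fixed() { sed -E 's/ *([-+_.]) */\1/g'; }

printf '%s\n' '2006_A programme.a randomized controlled trial.pdf' | trim_buggy
# prints: 200_programm.randomized controlled tria.df
printf '%s\n' '2011 - Schuerer _Low Back Pain.pdf' | trim_fixed
# prints: 2011-Schuerer_Low Back Pain.pdf
```

The first input is the name as it looks after the preceding step has already reduced ".-." to a single ".".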
Post by ***@arcor.de [sed-users]
Because no one replied I assume that my request was not clear.
Maybe I used the wrong phrasing, or Yahoo ate some of my code?
So let me try to restate my request with a simple example.
I have an input file containing the paths of files inside a directory.
D:\dir1\subdir1\2006_A programme.-.a randomized controlled trial.pdf
D:\dir1\subdir1\2011_Schürer-Low Back Pain.pdf
...
...
I want sed to print the full filepath+filename+extension,
then to do some substitutions on the FILENAMES ONLY and
append the substituted filename to the same line (separated by a
space).
- replace ".-." with "-"
- replace umlaut "ü" or "Ü" with "ue" or "Ue"
D:\dir1\subdir1\2006_A programme.-.a randomized controlled trial.pdf 2006_A programme-a randomized controlled trial.pdf
D:\dir1\subdir1\2011_Schürer-Low Back Pain.pdf 2011_Schuerer-Low Back Pain.pdf
...
...
Is this somehow possible via oneliner or maybe by a sed script?
Again, thanks for any reply
Zharif
Post by ***@arcor.de [sed-users]
Env: German Win8.1x64 with sed v407 from UnxUtils.
I want to query full file paths and find any filename (+extension)
containing umlauts and some special characters. The goal is to make some
substitutions in the filenames only and to redirect the results into a new
file.
Command to get a list of filenames that contain umlauts and some special
characters:
DIR /B /S /A:-D "E:\"|sed --text -n "s/^\(.*\\\)\(.*[äüöÄÖÜß&\d183]\+[^\\]\+\)$/\1\2/p" >> OutFile1
1. contain a renaming command (REN+space) at start of each line in OutFile1.
2. contain the filename of the filepath only at end of each line.
Example: REN "e:\XYZ\XY\2011 - Schürer _Low Back Pain.pdf" "2011-Schuerer_Low Back Pain.pdf"
This is my loop command in which substitutions are made to the original
FOR /F "tokens=1* delims= " %%a IN (OutFile1) DO (
FOR /F "tokens=1* delims= " %%A IN ('ECHO "%%a"^|sed --text "s/Ä/Ae/g;s/ä/ae/g;s/Ü/Ue/g;s/ü/ue/g;s/Ö/Oe/g;s/ö/oe/g;s/ß/ss/g;s/&/+/g;s/\d183/-/g;s/[[:space:]]*\([-+_.]\)\{1,\}[[:space:]]*/\1/g;s/\([-+_.]\)\{2,\}/\1/g;s/[[:space:]]\{2,\}/ /g"') DO (ECHO REN "%%a" "%%~nxA" >> %2)
)
%%~nxA prints only the filename(n)+extension(x) of the full path(A).
This code is incredibly slow for long files (as FOR loops always are).
1. I'm sure it must be possible to solve this task with sed only, without
using a FOR loop. I'm running out of ideas here.
2. What about the sed commands used? Do they need to be improved? Or are
they even faulty?
3. Another one: is sed able to count and print the characters of each line
of a file (similar to wc)?
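On question 3: sed has no arithmetic, so counting characters per line is clumsy there. The sketch below swaps in awk, whose length() function returns the length of a line; whether that is counted in bytes or characters depends on the awk build and locale. The function names and the default limit of 259 (taken from the Windows path boundary mentioned in this thread) are assumptions for illustration:

```shell
# Hypothetical helpers using awk instead of sed.
# Prefix every line with its length.
line_lengths() { awk '{ print length($0), $0 }'; }
# Print only the lines longer than a given limit (default 259).
too_long() { awk -v max="${1:-259}" 'length($0) > max'; }

printf '%s\n' 'abc' 'hello world' | line_lengths
# prints: 3 abc
#         11 hello world
printf '%s\n' 'abc' 'abcdef' | too_long 4
# prints: abcdef
```

Fed with a path listing, too_long would give the candidates that need snipping before they are copied to a Windows PC.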
Thanks much in advance for any reply
Zharif