I have a set of files, not necessarily of the same extension. These files were created by a Python script that iterates over some files (say x.jpg, y.jpg, and z.jpg), and then numbers them with zero-padding so that the numbers are of length 7 characters (in this example, the filenames become 0000001 - x.jpg, 0000002 - y.jpg, and 0000003 - z.jpg).
I now need a script (any language is fine, but Bash/zsh is preferred) that will increment these numbers by an argument, thereby renaming all the files in the directory. For example, I'd like to call the program as (assuming a shell script):
./rename.sh 5
The numbers in the final filenames should be padded to length 7, and it's guaranteed that there's no file initially whose number is 9999999. So the resulting files should be 0000006 - x.jpg, 0000007 - y.jpg, and 0000008 - z.jpg. It's guaranteed that all the files initially are incremental; that is, there are no gaps in the numbers.
I can't seem to do this easily at all in Bash, and it seems kind of like a chore even in Python. What's the best way to do this?
Edit: Okay so here are my efforts so far. I think the leading 0s are a problem, so I removed them using rename:
rename 's/^0*//' *
Now with just the numbers left, I'd ideally use a loop, something like this, but I'm not exactly familiar with the syntax and why this is wrong:
for file in "(*) - (*)" ; do mv "${file}" "$(($1+5)) - ${2}" ; done
The 5 there is just hard-coded, but I guess changing that to the first argument shouldn't be too big a deal. I can then use another loop to add the 0s back.
import sys, glob, re, os
# Get the offset at the first command-line argument
offset = int(sys.argv[1])
# Go through the files in descending order, so a renamed file never collides with one not yet renamed
for name in sorted(glob.glob('*.jpg'), reverse=True):
    # Extract the number and the rest of the name
    i, rest = re.findall(r"^(\d+)(.+)", name)[0]
    # Construct the new file name
    new_name = "{:07d}{}".format(int(i) + offset, rest)
    # Rename
    os.rename(name, new_name)
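Since the question says the files are not necessarily of the same extension, here is a minimal sketch of the same idea that matches on the number prefix instead of on *.jpg. The function name shift_numbers and the NUMBERED pattern are my own, and it assumes every file to rename starts with exactly seven digits, as in the question.

```python
import os
import re

# Assumed naming scheme from the question: seven digits, then the rest of the name.
NUMBERED = re.compile(r"^(\d{7})(.*)$")

def shift_numbers(directory, offset):
    """Add `offset` to the 7-digit prefix of every matching entry in `directory`."""
    matches = []
    for name in os.listdir(directory):
        m = NUMBERED.match(name)
        if m:
            matches.append((int(m.group(1)), m.group(2), name))
    # For a positive offset, rename the highest number first so that a new name
    # never collides with a file still waiting to be renamed; for a negative
    # offset, go lowest first.
    matches.sort(reverse=offset > 0)
    for number, rest, old_name in matches:
        new_name = "{:07d}{}".format(number + offset, rest)
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
```

Calling shift_numbers('.', 5) would correspond to ./rename.sh 5 in the question.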
I am writing a script to save some images in a folder each time it runs.
I would like to make a new folder each time it runs, with enumerated folder names. For example, the first time I run it, it just saves the images in C:\images\folder1; the next time I run it, it saves the images in C:\images\folder2, then C:\images\folder3, and so on.
And if I delete these folders, and start running again, it would start from the "C:\images\folder1" again.
I found this solution works for file names but not for the folder names:
Create file but if name exists add number
The pathlib library is the standard pythonic way of dealing with any kind of folders or files and is system independent. As far as creating a new folder name, that could be done in a number of ways. You could check for the existence of each file (like Patrick Gorman's answer) or you could save a user config file with a counter that keeps track of where you left off or you could recall your file creation function if the file already exists moving the counter. If you are planning on having a large number of sub-directories (millions), then you might consider performing a binary search for the next folder to create (instead of iterating through the directory).
Anyway, in Windows, creating a file/folder with the same name adds a (2), (3), (4), etc. to the filename. The space and parentheses make it particularly easy to identify the number of the file/folder. If you want the number directly appended, like folder1, folder2, folder3, etc., then that becomes a little tricky to detect. We essentially need to check what integer the folder name ends with. Finding particular expressions within a tricky string is normally done with re (regular expressions). If we had a space and parentheses we probably wouldn't need re to detect the integer in the string.
from pathlib import Path
import re

def create_folder(string_or_path):
    path = Path(string_or_path)
    if not path.exists():
        # You can't create files and folders with the same name in Windows. Hence, check exists.
        path.mkdir()
    else:
        # Check if string ends with numbers and group the first part and the numbers.
        search = re.search('(.*?)([0-9]+$)', path.name)
        if search:
            basename, ending = search.groups()
            newname = basename + str(int(ending)+1)
        else:
            newname = path.name + '1'
        create_folder(path.parent.joinpath(newname))

path = Path(r'C:\images\folder1')
create_folder(path)  # creates folder1
create_folder(path)  # creates folder2, since folder1 exists
create_folder(path)  # creates folder3, since folder1 and 2 exist
path = Path(r'C:\images\space')
create_folder(path)  # creates space
create_folder(path)  # creates space1, since space exists
Note: Be sure to use raw-strings when dealing with windows paths, since "\f" means something in a python string; hence you either have to do "\\f" or tell python it is a raw-string.
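For comparison, here is a non-recursive sketch of the "check each name in turn" approach. It just tries folder1, folder2, ... and lets mkdir report whether the name is taken, so there is no separate exists() check. The function name create_numbered_folder and the prefix parameter are made up for illustration, and it assumes the parent directory already exists.

```python
from itertools import count
from pathlib import Path

def create_numbered_folder(parent, prefix="folder"):
    """Create and return the first free `prefix`N directory under `parent` (N = 1, 2, 3, ...)."""
    parent = Path(parent)
    for n in count(1):
        candidate = parent / "{}{}".format(prefix, n)
        try:
            candidate.mkdir()        # raises FileExistsError if the name is taken
        except FileExistsError:
            continue                 # try the next number
        return candidate
```

If the numbered folders are deleted, the next call starts from folder1 again, which is the behaviour asked for. Note that it also fills gaps: if only folder1 is deleted while folder2 still exists, the next folder created is folder1.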
I feel like you could do something by getting a list of the directories and then looping over numbers 1 to n for the different possible directories until one can't be found.
from pathlib import Path
import os
path = Path('.')
folder = "folder"
i = 1
# Collect the directory names as strings so they can be compared with folder+str(i)
dirs = {e.name for e in path.iterdir() if e.is_dir()}
while True:
    if folder+str(i) not in dirs:
        folder = folder+str(i)
        break
    i = i+1
os.mkdir(folder)
I'm sorry if I made any typos, but that seems like a way that should work.
What I'm trying to do is write code that will delete every second [or third] file in a folder. I have batch-renamed the files so that the names increment like 0.jpg, 1.jpg, 2.jpg... n.jpg and so on. What I had in mind for the every-second-file scenario was to use something like "if %2 == 0", but I couldn't figure out how to actually remove the files from the list object and, obviously, from my folder.
Below is the piece of NON-WORKING code. I guess, it is not working as the file_name is a str.
import os
os.chdir('path_to_my_folder')
for f in os.listdir():
    file_name, file_ext = os.path.splitext(f)
    print(file_name)
    if file_name%2 == 0:
        os.remove();
Yes, that's your problem: you're trying to use an integer operation on a string. Simply convert:
if int(file_name)%2 == 0:
... that should fix your current problem.
Your filename is a string, like '0.jpg', and you can’t % 2 a string. [1]
What you want to do is pull the number out of the filename, like this:
name, ext = os.path.splitext(filename)
number = int(name)
And now, you can use number % 2.
(Of course this only works if every file in the directory is named in the format N.jpg, where N is an integer; otherwise you’ll get a ValueError.)
[1] Actually, you can do that, it just doesn’t do what you want. For strings, % means printf-style formatting, so filename % 2 means “find the %d or similar format spec in filename and replace it with a string version of 2.”
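To make the footnote concrete, here is a small sketch of what % does on strings versus integers. The variable names are only for illustration.

```python
# With a format spec in the string, % substitutes the value.
formatted = "%d.jpg" % 2           # gives "2.jpg"

# Without a format spec, Python raises TypeError instead of computing a remainder.
try:
    "0.jpg" % 2
    raised = False
except TypeError:
    raised = True

# Converting to int first gives the arithmetic meaning of %.
remainder = int("4") % 2           # gives 0
```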
Thanks a lot for the answers! I have amended the code and now it looks like this;
import os
os.chdir('path_to_the_folder')
for f in os.listdir():
    name, ext = os.path.splitext(f)
    number = int(name)
    if number % 2 == 0:
        os.remove()
It doesn't give an error but it also doesn't remove/delete the files from the folder. What in the end I want to achieve is that every file name which is divisible by two will be removed so only 1.jpg, 3.jpg, 5.jpg and so on will remain.
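One thing that stands out in the amended code is that os.remove() is called without a path, so it is never told which file to delete. Below is a minimal sketch that passes the path explicitly. The function name remove_even_numbered is made up, and it assumes the files are named N.ext with N a non-negative integer; anything else is skipped rather than raising ValueError.

```python
import os

def remove_even_numbered(folder):
    """Delete files named N.ext where N is an even integer; return the names removed."""
    removed = []
    for entry in os.listdir(folder):
        name, ext = os.path.splitext(entry)
        full_path = os.path.join(folder, entry)
        if not name.isdecimal() or not os.path.isfile(full_path):
            continue                  # skip anything not named like 0.jpg, 1.jpg, ...
        if int(name) % 2 == 0:
            os.remove(full_path)      # pass the path of the file to delete
            removed.append(entry)
    return sorted(removed)
```

It is probably worth trying this on a copy of the folder first, since the deletion cannot be undone.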
Thanks so much for your time.
A non-Python method, but sharing it for future reference:
cd path_to_your_folder
mkdir odd; mv *[13579].png odd
This also works on OS X. It reverses the file order, but that can easily be corrected. I'd still like to manage this within Python though!
I've got a segment of my script which will create a list of files to scan through for keywords.
The problem is, the log files collectively are around 11gb. When I use grep in the shell to search through them, it takes around 4 or 5 minutes. When I do it with my python script, it just hangs the server to the extent where I need to reboot it.
Doesn't seem right that it would cause the whole server to crash, but in reality I don't need it to scroll through all the files, just those which were modified within the last week.
I've got this so far:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
I assume I will need to add something prior to this to initially filter out the wrong files?
I've been playing with os.path.getmtime in this format:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
for log in logs:
    mtime = os.path.getmtime(log)
    if mtime < "604800":
        do-stuff (create a new list? Or update logs?)
That's kind of where I am now, and it doesn't work but I was hoping there was something more elegant I could do with the list inline?
Depending on how many filenames and how little memory (512MB VPS?), it's possible you're running out of memory creating two lists of all the filenames (one from glob and one from your list comprehension.) Not necessarily the case but it's all I have to go on.
Try switching to iglob (which uses os.scandir under the hood and returns an iterator) and using a generator expression and see if that helps.
Also, getmtime gets a time, not an interval from now.
import os
import glob
import time

week_ago = time.time() - 7 * 24 * 60 * 60
log_files = (
    x for x in glob.iglob('/var/opt/cray/log/p0-current/*')
    if not os.path.isdir(x)
    and os.path.getmtime(x) > week_ago
)
for filename in log_files:
    pass  # do something
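As for the "do something" part, here is a rough sketch that keeps memory use flat by reading each recent file one line at a time instead of loading it whole. The function names recent_files and matching_lines are my own, and I am assuming the search is a plain substring match, since the question does not say how the keywords are matched.

```python
import os
import time

def recent_files(directory, max_age_seconds=7 * 24 * 60 * 60):
    """Yield paths of regular files in `directory` modified within the last `max_age_seconds`."""
    cutoff = time.time() - max_age_seconds
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.stat().st_mtime > cutoff:
                yield entry.path

def matching_lines(paths, keyword):
    """Yield (path, line) for every line containing `keyword`, reading lazily."""
    for path in paths:
        # errors="replace" keeps odd bytes in a log from aborting the scan
        with open(path, errors="replace") as handle:
            for line in handle:
                if keyword in line:
                    yield path, line.rstrip("\n")
```

Both functions are generators, so nothing is read until the results are iterated over, and only one line is held in memory at a time.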
I'm somewhat newish to python (which is the only programming language I know), and I've got a bunch of spectral data saved as .txt files, where each row is a data point, the first number being the wavelength of light used and separated by a tab, the second number is the instrument signal/response to that wavelength of light.
I want to be able to take all the data files I have in a folder, and print a file that's an average of all the signal/response column entries for each wavelength of light (they all contain data for responses from 350-2500nm light). Is there any way to do this? If it weren't for the fact that I need to average together 103 spectra, I'd just do it by hand, but....
EDIT: I realize I worded this terribly. I now realize I can probably just use os to access all the files in a given folder. The thing is that I want to average the signal values for each wavelength. Ie, I want to read all the data from the folder and get an average value for the signal/response at 350nm, 351nm, etc.
I'm thinking this is something I could do with a loop once I get all the files read into Python, but I'm not 100% sure. I'm also hesitant because I'm worried that will slow down the program a lot.
Something like this (assuming all your txt files are formatted the same, and that all files have the same range of wavelength values )
import os
import numpy as np
dat_dir = '/my/dat/dir'
# Join the directory first, then the file name
fnames = [os.path.join(dat_dir, x) for x in os.listdir(dat_dir) if x.endswith('.txt')]
data = [np.loadtxt(f) for f in fnames]
xvals = data[0][:,0]  # wavelengths, should be the same in each file
yvals = [d[:,1] for d in data]  # measurements
y_mean = np.mean(yvals, axis=0)
# savetxt needs an array, so stack the two columns instead of passing zip()
np.savetxt('spectral_ave.txt', np.column_stack((xvals, y_mean)), fmt='%.4f')  # something like that
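If NumPy is not available, the same per-wavelength average can be sketched with just the standard library. The function name average_spectra is made up, and this assumes each row is "wavelength, whitespace, signal" and that a given wavelength is written identically in every file (e.g. always 350, never 350.0 in one file and 350.00001 in another).

```python
import os
from collections import defaultdict

def average_spectra(directory):
    """Return {wavelength: mean signal} over every .txt file in `directory`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name in os.listdir(directory):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory, name)) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue          # skip blank or malformed rows
                wavelength, signal = float(fields[0]), float(fields[1])
                totals[wavelength] += signal
                counts[wavelength] += 1
    # Divide each running total by the number of files that contributed to it
    return {w: totals[w] / counts[w] for w in sorted(totals)}
```

A plain loop like this over 103 files of a couple of thousand rows each should not be noticeably slow.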
import os
dir = "./" # Your directory
lengths = 0
responses = 0
total = 0
for x in os.listdir(dir):
    # Check if x has *.txt extension.
    if os.path.splitext(x)[1]!=".txt": continue
    fullname = os.path.join(dir, x)
    # We don't want directories ending with *.txt to mess up our program (although in your case this is very unlikely)
    if os.path.isdir(fullname): continue
    # Now open and read the file as binary
    file = open(fullname, "rb")
    content = file.read()
    file.close()
    # Take two entries:
    content = content.split()
    l = float(content[0])
    r = float(content[1])
    lengths += l; responses += r
    total += 1
print("Avg of lengths:", lengths/total)
print("Avg of responses:", responses/total)
If you want it to enter the subdirectories put it into function and make it recurse when os.path.isdir(fullname) is True.
Although I wrote you the code, SO is not for that. Mind that in your next question.
If you're on anything but Windows, a common way to do this would be to write a python program that handles all the files you put on the command line. Then you can run it on results/* to process everything, or just on a single file, or just on a few files.
This would be the more Unixy way to go about things. There are many unix programs that can handle multiple input files (cat, sort, awk, etc.), but most of them leave the directory traversal to the shell.
http://www.diveintopython.net/scripts_and_streams/command_line_arguments.html has some examples of getting at the command line args for your program.
import sys
for arg in sys.argv[1:]: # argv[0] is the script's name; skip it
    # print(arg)
    sum_file(arg) # or put the code inline here, so you don't need global variables to keep state between calls.
print("totals ...")
See also this question: What is "argv", and what does it do?
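To flesh out the command-line approach a little, here is a sketch in which the per-file function returns its results so that no global state is needed. The body of sum_file is purely hypothetical (it counts rows and sums the second column); the real one would do whatever processing each file needs.

```python
import sys

def sum_file(path):
    """Hypothetical helper: return (row count, sum of second column) for one file."""
    rows, total = 0, 0.0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                rows += 1
                total += float(fields[1])
    return rows, total

def main(argv):
    """Process every file named on the command line and return the combined totals."""
    rows, total = 0, 0.0
    for path in argv[1:]:             # argv[0] is the script's name; skip it
        r, t = sum_file(path)
        rows += r
        total += t
    return rows, total
```

Ending the script with print(main(sys.argv)) would then let you run it on results/*, on a single file, or on a few files, and the shell does the directory traversal.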
I have a hundred files in a folder, named in the form i.ext where i is an integer (0 <= i). I wrote a script which takes 2 files as input, but I want to use the script with all the files in my folder.
Could I write a script in Python with a loop in such a way that the file name is in a variable, like this:
from difference import *
# I have a module called "difference"
for i in range (0,100):
    for j in range (0,100):
        leven(i+".ext",j+".ext") #script in module which take two files in entries
Obviously my code is wrong, but I don't know how to do it :(
You cannot add a number and a string in Python.
'%d.ext' % (i,)
but i wanted to use the script with all the files of my folder. Could i write a script in Python with a loop such a way that the name file is in a variable like this:
This is most certainly possible, but if you want to use all the files from a directory following a certain pattern, I suggest you glob them.
from glob import glob
import difference
ifile_list = glob('*.iext')
jfile_list = glob('*.jext')
# Flat list of every (ifile, jfile) combination
for i, j in [(ifile, jfile) for ifile in ifile_list for jfile in jfile_list]:
    difference.leven(i, j)
However, I strongly suggest that instead of hardcoding those file patterns, you supply them through command-line parameters.
Use str(i) and str(j) to convert i and j from int to str.
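Putting the str(i) suggestion together with the double loop from the question, a minimal sketch could look like this. The function name numbered_pairs is made up, and itertools.product stands in for the two nested for loops.

```python
from itertools import product

def numbered_pairs(count, ext=".ext"):
    """Return every (name_i, name_j) pair for files named 0.ext .. (count-1).ext."""
    names = [str(i) + ext for i in range(count)]   # str(i) makes the concatenation legal
    return list(product(names, repeat=2))          # same pairs as two nested loops
```

Each pair can then be passed to the two-file function, e.g. for a, b in numbered_pairs(100): leven(a, b), assuming leven is imported from the difference module as in the question.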