How to increment filename from positional argument? - python

How can I modify this function to increment the name when the filename "test.wav" already exists?
def write_float_to_16wav(signal, name="test.wav", samplerate=20000):
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    siow.write(name, samplerate, intsignal)

You can use os.path.exists to check for the existence of the file, and increment when needed:
import os.path

if os.path.exists(name):
    name_, ext = os.path.splitext(name)
    name = f'{name_}1{ext}'
    # For Python < 3.6: '{name_}1{ext}'.format(name_=name_, ext=ext)
The above checks for the file in the current directory; if you want to check in some other directory, join the paths using os.path.join:
if os.path.exists(os.path.join(directory, name)):
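The check above appends a 1 only once; if collisions can happen more than once, a minimal sketch of a loop that keeps counting until it finds a free name (the helper name next_free_name is just illustrative):
import os.path

def next_free_name(name):
    # Return name unchanged if it is free, otherwise append 1, 2, 3, ...
    if not os.path.exists(name):
        return name
    base, ext = os.path.splitext(name)
    counter = 1
    while os.path.exists(f'{base}{counter}{ext}'):
        counter += 1
    return f'{base}{counter}{ext}'

# e.g. next_free_name("test.wav") -> "test1.wav" if test.wav already exists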

There are two main options. The first - and likely better - option is to simply count the number of wav files already created.
num_waves_created = 0

def write_float_to_16wav(signal, name="test.wav", samplerate=20000):
    global num_waves_created  # needed so the increment updates the module-level counter
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    name = "test" + str(num_waves_created) + ".wav"
    siow.write(name, samplerate, intsignal)
    num_waves_created += 1
The second option is to test inside the function whether the file has already been created. This uses a while loop that runs once per existing file, so it is fine for 10 wav files but can seriously slow down if you need to create many more.
from os import path

def write_float_to_16wav(signal, name="test.wav", samplerate=20000):
    # keep appending "1" while a file with that name already exists
    while path.exists(name):
        name_, ext = path.splitext(name)
        name = f'{name_}1{ext}'
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    siow.write(name, samplerate, intsignal)

OK, based on the limited code you have supplied, and assuming you're not already validating whether the filename exists somewhere else:
1) Check if your filename already exists (see here)
2) If the path/filename already exists, extract the current filename (see here); otherwise filename = test.wav
3) From the current filename, extract the last incremented value (using split, a substring, or whatever suits best)
4) Set the new filename with the incremented value (see heemayl's answer)
5) Done. A sketch of these steps follows below.
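A minimal sketch of those steps, under the assumption that incremented names end with _N before the extension (the helper name increment_name and the regular expression are purely illustrative):
import os.path
import re

def increment_name(filename):
    # Steps 1-4: while the file exists, pull out the trailing number and bump it
    while os.path.exists(filename):
        base, ext = os.path.splitext(filename)
        match = re.search(r'_(\d+)$', base)
        if match:
            base = base[:match.start()] + '_' + str(int(match.group(1)) + 1)
        else:
            base = base + '_1'
        filename = base + ext
    return filename

# increment_name("test.wav") -> "test_1.wav" if test.wav exists, "test_2.wav" if that also exists, ...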

Related

How to convert a function to a recursive function

Hey guys, I don't know if I can ask this, but I'm working on files in Google Colab and I wrote a function that sums the sizes of all the files:
import os

def recls_rtsize(argpath):
    sumsize = 0
    for entry in os.scandir(argpath):
        path = argpath + '/' + entry.name
        size = os.path.getsize(path)
        sumsize += size
    return sumsize

print("total:", recls_rtsize('/var/log'))
But I need a way to make this function recursive, or some kind of formula or idea for converting a non-recursive function into a recursive one.
A recursive function is a function that calls itself. For example, if you are trying to calculate the total size of all files inside some directory, you can loop through the files of that directory and sum their sizes. If the directory has subdirectories, you add a condition: when an entry is a subdirectory, call the function itself on that subdirectory.
In your case:
import os

def recls_rtsize(argpath):
    sumsize = 0
    for entry in os.scandir(argpath):
        # entry.is_dir() checks whether this entry is a directory
        if entry.is_dir():
            # then call your function for this directory
            size = recls_rtsize(entry.path)
        else:
            path = argpath + '/' + entry.name
            size = os.path.getsize(path)
        sumsize += size
    return sumsize

print("total:", recls_rtsize('/var/log'))
For example, you could write a helper function to process the listing recursively, although I don't see much point in doing so:
import os

def recls_rtsize(argpath):
    def helper(dirs):
        if not dirs:
            return 0
        path = argpath + '/' + dirs[0].name
        size = os.path.getsize(path)
        return size + helper(dirs[1:])
    return helper(list(os.scandir(argpath)))

print("total:", recls_rtsize('testing_package'))
Explanation:
Let's say argpath contains several files:
argpath = [file1, file2, file3]
Then the calls to helper would be:
size(file1) + helper([file2, file3])  # we pass everything after the first element
size(file1) + size(file2) + helper([file3])
size(file1) + size(file2) + size(file3) + helper([])
There are no elements left, so we return 0 and start backtracking:
size(file1) + size(file2) + size(file3) + 0
size(file1) + size(file2) + (size(file3) + 0)
size(file1) + (size(file2) + (size(file3) + 0))
(size(file1) + (size(file2) + (size(file3) + 0)))  # our result
I hope it makes sense.
To iterate over files in sub-folders (I assume that this is your goal here) you can use os.walk().
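A minimal sketch of that os.walk() approach (the function name walk_rtsize is just illustrative):
import os

def walk_rtsize(argpath):
    # os.walk visits every sub-folder for us, so no explicit recursion is needed
    sumsize = 0
    for dirpath, dirnames, filenames in os.walk(argpath):
        for filename in filenames:
            sumsize += os.path.getsize(os.path.join(dirpath, filename))
    return sumsize

print("total:", walk_rtsize('/var/log'))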

problem with moving files using os.rename

I have this block of code where I try to move all the files in a folder to a different folder.
import os
from os import listdir
from os.path import isfile, join

def run():
    print("Do you want to convert 1 file (0) or do you want to convert all the files in a folder(1)")
    oneortwo = input("")
    if oneortwo == "0":
        filepathonefile = input("what is the filepath of your file?")
        filepathonefilewithoutfullpath = os.path.basename(filepathonefile)
        newfolder = "C:/Users/EL127032/Documents/fileconvertion/files/" + filepathonefilewithoutfullpath
        os.rename(filepathonefile, newfolder)
    if oneortwo == "1":
        filepathdirectory = input("what is the filepath of your folder?")
        filesindirectory = [f for f in listdir(filepathdirectory) if isfile(join(filepathdirectory, f))]
        numberoffiles = len(filesindirectory)
        handlingfilenumber = 0
        while numberoffiles > handlingfilenumber:
            currenthandlingfile = filesindirectory[handlingfilenumber]
            oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
            futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
            os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
But when I run this, it gives:
os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
FileNotFoundError: [WinError 2] System couldn't find the file: 'C:\Users\EL127032\Documents\Eligant - kopie\Klas 1\Stermodules\Basisbiologie/lopen (1).odt' -> 'C:/Users/EL127032/Documents/fileconvertion/files/lopen (1).odt'
Can someone help me, please?
You are trying to move the same file twice.
The bug is in this part:
numberoffiles = len(filesindirectory)
handlingfilenumber = 0
while numberoffiles > handlingfilenumber:
    currenthandlingfile = filesindirectory[handlingfilenumber]
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
The first time you loop, handlingfilenumber will be 0, so you will move the 0-th file from your filesindirectory list.
Then you loop again, handlingfilenumber is still 0, so you try to move it again, but it is not there anymore (you moved it already on the first turn).
You forgot to increment handlingfilenumber. Add handlingfilenumber += 1 on a line after os.rename and you will be fine.
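Concretely, the corrected loop would look like this:
while numberoffiles > handlingfilenumber:
    currenthandlingfile = filesindirectory[handlingfilenumber]
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
    handlingfilenumber += 1  # move on to the next file so the same one is never moved twice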
while loops are more error-prone than simpler for loops; I recommend you use for loops when appropriate.
Here, you want to move each file, so a for loop suffices:
for filename in filesindirectory:
    oldpathcurrenthandling = filepathdirectory + "/" + filename
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + filename
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
No need to call len, initialize a counter, increment it, or fetch the n-th element, and it is fewer lines.
Three other things:
You could have found the cause of the problem yourself with some debugging; there are plenty of resources online that explain how. Just by printing the name of the file about to be moved (oldpathcurrenthandling) you would have seen it twice and noticed the problem causing the OS error.
Your variable names are not very readable. Consider following the standard style guide about variable names (PEP 8) and standard jargon; for example, filepathonefilewithoutfullpath becomes filename, oldpathcurrenthandling becomes source_file_path (following the source/destination convention), ...
When you have an error, include the stack trace that Python gives you. It would have pointed directly to the second os.rename call; the first one (when you move only one file) does not contribute to the problem. It also helps when building a Minimal Reproducible Example.

Data stored in directories based on date. How to get all data from date1 until date2?

I have a Python app that will be storing a large amount of data as custom Python objects through jsonpickle.
Currently my project has a data directory with this structure:
....
data/
    year/
        ...
        month/
            ...
            day/
                ...
                A/
                    data_file_1
                    ...
                    data_file_n
                B/
                    data_file_1
                    ...
                    data_file_n
Here I'm just representing multiple potential dirs or files as '...'.
I would like my user to be able to specify a start date and an end date from which I will parse all the data.
Currently my data set is quite small, and so moving it around is no problem. Furthermore the data doesn't need to be human readable, so whatever format doesn't really matter at this stage.
Is there an easier way to store this data so that I can get the data whenever I need, update it when necessary. Another library, package or just better directory layout?
If not, then my question is a bit more specific.
My solution so far has been:
import os
...

def get_data(path, dates, data):
    """
    #param path: str representing the current path being searched.
    #param dates: list of tuples representing the (min, max) dates to be considered.
    #param data: empty list, used for collecting the paths of the files.
    """
    if len(dates) <= 0:
        # This case occurs when the subdirectories under the current path
        # don't have any constraints, so I can grab all the files without worry.
        for dirpath, dirnames, filenames in os.walk(path):
            for file in filenames:
                data.append(dirpath + '/' + file)
    else:
        min, max = dates[0]
        dirpath, dirs, files = os.walk(path).next()
        for dir in dirs:
            value = int(dir)
            if value > min and value < max:
                # unconstrained case
                get_data(path + '/' + dir, [], data)
            elif value == min:
                pass  # TODO recurse with boundary case minimum
            elif value == max:
                pass  # TODO recurse with boundary case maximum
However, these boundary cases have stumped me. Say I am given some arbitrarily determined dates, for example:
# from 8/21/2011 -> until 12/7/2014
dates = [(2011, 2014), (8, 12), (1, 7)]
The problem is then: how should I set up the dates passed into the recursive call in the boundary cases?
Am I missing a simple solution to this problem?
Use datetime objects to represent the dates, increment with day deltas.
Pseudocode:
import os.path
import datetime

def f(t0, t1):
    t = t0
    while t != t1:
        day_path = os.path.join(str(t.year), str(t.month), str(t.day))
        if os.path.exists(day_path):
            # Do what you need inside the day dir.
            pass
        t += datetime.timedelta(days=1)
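A slightly fuller sketch along those lines, assuming the year/month/day directory names are plain (unpadded) integers as in the layout above and including both end dates (the function name collect_between is just illustrative):
import os
import datetime

def collect_between(root, start, end):
    # start and end are datetime.date objects; both ends are included
    collected = []
    t = start
    while t <= end:
        day_path = os.path.join(root, str(t.year), str(t.month), str(t.day))
        if os.path.exists(day_path):
            # pick up everything under the day dir, including the A/ and B/ subdirs
            for dirpath, dirnames, filenames in os.walk(day_path):
                for name in filenames:
                    collected.append(os.path.join(dirpath, name))
        t += datetime.timedelta(days=1)
    return collected

# e.g. collect_between('data', datetime.date(2011, 8, 21), datetime.date(2014, 12, 7))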

number of columns in xlwt worksheet

I can't seem to find a way to return the number of columns in a worksheet in xlwt.Workbook(). The idea is to take a wad of .xls files in a directory and combine them into one. One problem I am having is changing the column position when writing the next file. This is what I'm working with thus far:
import xlwt, xlrd, os

def cbc(rd_sheet, wt_sheet, rlo=0, rhi=None,
        rshift=0, clo=0, chi=None, cshift=0):
    if rhi is None: rhi = rd_sheet.nrows
    if chi is None: chi = 2  # only first two cols needed
    for row_index in xrange(rlo, rhi):
        for col_index in xrange(clo, chi):
            cell = rd_sheet.cell(row_index, col_index)
            wt_sheet.write(row_index + rshift, col_index + cshift, cell.value)

Dir = '/home/gerg/Desktop/ex_files'
ext = '.xls'
list_xls = [file for file in os.listdir(Dir) if file.endswith(ext)]
files = [Dir + '/%s' % n for n in list_xls]
output = '/home/gerg/Desktop/ex_files/copy_test.xls'
wbook = xlwt.Workbook()
wsheet = wbook.add_sheet('Summary', cell_overwrite_ok=True)  # overwrite just for the repeated testing
for XLS in files:
    rbook = xlrd.open_workbook(XLS)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, cshift=0)
wbook.save(output)
list_xls returns:
['file2.xls', 'file3.xls', 'file1.xls', 'copy_test.xls']
files returns:
['/home/gerg/Desktop/ex_files/file2.xls', '/home/gerg/Desktop/ex_files/file3.xls', '/home/gerg/Desktop/ex_files/file1.xls', '/home/gerg/Desktop/ex_files/copy_test.xls']
My question is how to scoot each file written into xlwt.workbook over by 2 each time. This code gives me the first file saved to .../copy_test.xls. Is there a problem with the file listing as well? I have a feeling there may be.
This is Python2.6 and I bounce between windows and linux.
Thank you for your help,
GM
You are using only the first two columns in each input spreadsheet. You don't need "the number of columns in a worksheet in xlwt.Workbook()". You already have the cshift mechanism in your code, but you are not using it. All you need to do is change the loop in your outer block, like this:
for file_index, file_name in enumerate(files):
    rbook = xlrd.open_workbook(file_name)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, chi=2, cshift=file_index * 2)
For generality, change the line
    if chi is None: chi = 2
in your function to
    if chi is None: chi = rd_sheet.ncols
and pass chi=2 in as an arg as I have done in the above code.
I don't understand your rationale for overriding the overwrite check ... surely in your application, overwriting an existing cell value is incorrect?
You say "This code gives me the first file saved to .../copy_test.xls". First in input order is file2.xls. The code that you have shown is overwriting previous input and will give you the LAST file (in input order), not the first, so perhaps you are mistaken. Note: the last input file, 'copy_test.xls', is quite likely to be a previous OUTPUT file; perhaps your output file should be put in a separate folder.
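Putting the two suggestions together, and filtering the previous output file out of the input list, a sketch of the outer block might look like this (still assuming the files list and the cbc function from the question):
output = '/home/gerg/Desktop/ex_files/copy_test.xls'
# skip the previous output file so it is never read back in as input
files = [f for f in files if f != output]

wbook = xlwt.Workbook()
wsheet = wbook.add_sheet('Summary')  # no overwriting needed once each input has its own columns
for file_index, file_name in enumerate(files):
    rbook = xlrd.open_workbook(file_name)
    rsheet = rbook.sheet_by_index(0)
    # each input gets its own pair of columns: 0-1, 2-3, 4-5, ...
    cbc(rsheet, wsheet, chi=2, cshift=file_index * 2)
wbook.save(output)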

Speed up the creation of pathname

I have 2 folders. The first (called A) contains some images named in the form subject_incrementalNumber.jpg (where incrementalNumber goes from 0 to X).
Then I process each image contained in folder A, extract some pieces from it, and save each piece in folder B with the name subject (the same as the original image in folder A) _incrementalNumber (the same as in folder A) _anotherIncrementalNumber (which distinguishes one piece from another).
Finally, I delete the processed image from folder A.
A/
    subjectA_0.jpg
    subjectA_1.jpg
    subjectA_2.jpg
    ...
    subjectB_0.jpg
B/
    subjectA_0_0.jpg
    subjectA_0_1.jpg
    subjectA_1_0.jpg
    subjectA_2_0.jpg
    ...
Every time I download a new image of one subject and save it in folder A, I have to calculate a new pathname for this image (I have to find the minimum incrementalNumber available for the specific subject). The problem is that when I process an image I delete it from folder A and store only the pieces in folder B, so I have to find the minimum number available in both folders.
Now I use the following function to create the pathname
output_name = chooseName( subject, folderA, folderB )

# Create incremental file.
# If the name already exists, try with an incremental number (0, 1, etc.)
def chooseName( owner, dest_files, faces_files ):
    # find the min number available in both folders
    v1 = seekVersion_folderA( owner, dest_files )
    v2 = seekVersion_folderB( owner, faces_files )
    # select the max from those 2
    version = max( v1, v2 )
    # create name
    base = dest_files + os.sep + owner + "_"
    fname = base + str(version) + ".jpg"
    return fname
# Seek the min number available in folderA
def seekVersion_folderA( owner, dest_files ):
    def f(x):
        if fnmatch.fnmatch(x, owner + '_*.jpg'): return x
    res = filter( f, dest_files )
    def g(x): return int(x[x.find("_")+1:-len(".jpg")])
    numbers = map( g, res )
    if len( numbers ) == 0: return 0
    else: return int(max(numbers))+1
# Seek the min number available in folderB
def seekVersion_folderB( owner, faces_files ):
    def f(x):
        if fnmatch.fnmatch(x, owner + '_*_*.jpg'): return x
    res = filter( f, faces_files )
    def g(x): return int(x[x.find("_")+1:x.rfind("_")])
    numbers = map( g, res )
    if len( numbers ) == 0: return 0
    else: return int(max(numbers))+1
It works, but this process takes about 10 seconds for each image, and since I have a lot of images this is too inefficient.
Is there any workaround to make it faster?
As specified, this is indeed a hard problem with no magic shortcuts. In order to find the minimum available number you need to use trial and error, exactly as you are doing. Whilst the implementation could be sped up, there is a fundamental limitation in the algorithm.
I think I would relax the constraints to the problem a little. I would be prepared to choose numbers that weren't the minimum available. I would store a hidden file in the directory which contained the last number used when creating a file. Every time you come to create another one, read this number from the file, increment it by 1, and see if that name is available. If so you are good to go, if not, start counting up from there. Remember to update the file when you do settle on a name.
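A minimal sketch of that counter-file idea (the helper name next_version and the counter file name .last_number are purely illustrative):
import os

def next_version(directory, owner, counter_name=".last_number"):
    # remember the last number used for this subject, then count up from
    # there until a free name is found, and record the number we settle on
    counter_path = os.path.join(directory, counter_name + "_" + owner)
    try:
        with open(counter_path) as fh:
            version = int(fh.read()) + 1
    except (OSError, ValueError):
        version = 0
    while os.path.exists(os.path.join(directory, "%s_%d.jpg" % (owner, version))):
        version += 1
    with open(counter_path, "w") as fh:
        fh.write(str(version))
    return version

# e.g. fname = os.path.join(folderA, "%s_%d.jpg" % (subject, next_version(folderA, subject)))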
If no humans are reading these names, then you may be better off using randomly generated names.
I've found another solution: use the hash of the file as a unique file name.
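For instance, a minimal sketch of that idea using hashlib (the helper name hashed_name is just illustrative):
import hashlib

def hashed_name(image_path):
    # the file's own content determines its name, so no counter is needed
    with open(image_path, "rb") as fh:
        digest = hashlib.sha1(fh.read()).hexdigest()
    return digest + ".jpg"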
