How to convert a function to a recursive function - python

Hey guys, I don't know if I can ask this here, but I'm working on the original files in Google Colab and I wrote a function that sums the sizes of all the files in a directory:
import os
def recls_rtsize(argpath):
    sumsize = 0
    for entry in os.scandir(argpath):
        path = argpath + '/' + entry.name
        size = os.path.getsize(path)
        sumsize += size
    return sumsize
print("total:", recls_rtsize('/var/log'))
But I need a way to make this function recursive. Is there some kind of formula or general idea for converting a non-recursive function into a recursive one?

A recursive function is a function that calls itself. For example, to calculate the total size of all files inside a directory, you can loop through the entries of that directory and add up the file sizes. If an entry is itself a subdirectory, call the same function on that subdirectory and add its result to the total.
In your case:
import os
def recls_rtsize(argpath):
    sumsize = 0
    for entry in os.scandir(argpath):
        # os.DirEntry.is_dir() checks whether this entry is a directory;
        # follow_symlinks=False avoids looping through symlinked folders
        if entry.is_dir(follow_symlinks=False):
            # then call the function itself for this subdirectory
            size = recls_rtsize(entry.path)
        else:
            size = os.path.getsize(entry.path)
        sumsize += size
    return sumsize
print("total:", recls_rtsize('/var/log'))

For example, you could write a helper function that processes the entries recursively, although I don't understand the purpose:
import os
def recls_rtsize(argpath):
    def helper(dirs):
        if not dirs:
            return 0
        path = argpath + '/' + dirs[0].name
        size = os.path.getsize(path)
        return size + helper(dirs[1:])
    return helper(list(os.scandir(argpath)))
print("total:", recls_rtsize('testing_package'))
Explanation:
Let's say argpath contains several files:
argpath = [file1, file2, file3]
Then the function calls would be:
size(file1) + helper([file2, file3]) # we pass everything after the first element
size(file1) + size(file2) + helper([file3])
size(file1) + size(file2) + size(file3) + helper([])
There are no elements left, so we return 0 and start backtracking:
size(file1) + size(file2) + size(file3) + 0
size(file1) + size(file2) + (size(file3) + 0)
size(file1) + (size(file2) + (size(file3) + 0))
(size(file1) + (size(file2) + (size(file3) + 0))) # our result
I hope it makes sense.

To iterate over files in sub-folders (I assume that this is your goal here) you can use os.walk().
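A minimal sketch of the os.walk() approach, assuming Python 3 and that the goal is the total size of all regular files under a folder. The function name total_size_walk is made up for illustration. os.walk() does not descend into symlinked directories by default, so no recursion is written by hand:

```python
import os

def total_size_walk(root):
    """Sum the sizes of all regular files under root, including subfolders."""
    total = 0
    # os.walk yields (dirpath, dirnames, filenames) for root and every subfolder
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            # skip symbolic links so a link is not counted as its target
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

It would be called like the original function, e.g. total_size_walk('/var/log').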

Related

problem with moving files using os.rename

I have this block of code where I try to move all the files in a folder to a different folder.
import os
from os import listdir
from os.path import isfile, join
def run():
    print("Do you want to convert 1 file (0) or do you want to convert all the files in a folder(1)")
    oneortwo = input("")
    if oneortwo == "0":
        filepathonefile = input("what is the filepath of your file?")
        filepathonefilewithoutfullpath = os.path.basename(filepathonefile)
        newfolder = "C:/Users/EL127032/Documents/fileconvertion/files/" + filepathonefilewithoutfullpath
        os.rename(filepathonefile,newfolder)
    if oneortwo == "1" :
        filepathdirectory = input("what is the filepath of your folder?")
        filesindirectory = [f for f in listdir(filepathdirectory) if isfile(join(filepathdirectory, f))]
        numberoffiles = len(filesindirectory)
        handlingfilenumber = 0
        while numberoffiles > handlingfilenumber:
            currenthandlingfile = filesindirectory[handlingfilenumber]
            oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
            futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
            os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
But when I run this, it gives:
os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
FileNotFoundError: [WinError 2] System couldn't find the file: 'C:\Users\EL127032\Documents\Eligant - kopie\Klas 1\Stermodules\Basisbiologie/lopen (1).odt' -> 'C:/Users/EL127032/Documents/fileconvertion/files/lopen (1).odt'
Can someone help me, please?
You are trying to move the same file twice.
The bug is in this part:
numberoffiles = len(filesindirectory)
handlingfilenumber = 0
while numberoffiles > handlingfilenumber:
    currenthandlingfile = filesindirectory[handlingfilenumber]
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
The first time you loop, handlingfilenumber will be 0, so you will move the 0-th file from your filesindirectory list.
Then you loop again, handlingfilenumber is still 0, so you try to move it again, but it is not there anymore (you moved it already on the first turn).
You forgot to increment handlingfilenumber. Add handlingfilenumber += 1 on a line after os.rename and you will be fine.
while loops are more error-prone than simpler for loops, so I recommend using for loops when appropriate.
Here, you want to move each file, so a for loop suffices:
for filename in filesindirectory:
    oldpathcurrenthandling = filepathdirectory + "/" + filename
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + filename
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
No need to use len, initialize a counter, increment it, get the n-th element, ... And fewer lines.
Three other things:
You could have found the cause of the problem yourself by debugging; there are plenty of resources online that explain how to do it. Just by printing the name of the file about to be moved (oldpathcurrenthandling), you would have seen it twice and noticed the problem causing the OS error.
Your variable names are not very readable. Consider following the standard style guide for variable names (PEP 8) and standard jargon. For example, filepathonefilewithoutfullpath becomes filename, oldpathcurrenthandling becomes source_file_path (following the source/destination convention), ...
When you have an error, include the full stack trace that Python gives you. It would have pointed directly to the second os.rename call; the first one (when you move only one file) does not contribute to the problem. It also helps in finding a Minimal Reproducible Example.
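Putting the for loop and the naming advice together, here is a rough sketch, assuming Python 3. The function name move_files and its parameters are hypothetical, and shutil.move is swapped in for os.rename because it also works when source and destination are on different drives:

```python
import os
import shutil

def move_files(source_dir, destination_dir):
    """Move every regular file in source_dir into destination_dir."""
    os.makedirs(destination_dir, exist_ok=True)
    moved = []
    # iterate over the names directly: no counter to forget to increment
    for filename in sorted(os.listdir(source_dir)):
        source_file_path = os.path.join(source_dir, filename)
        if not os.path.isfile(source_file_path):
            continue  # leave subfolders where they are
        destination_file_path = os.path.join(destination_dir, filename)
        shutil.move(source_file_path, destination_file_path)
        moved.append(filename)
    return moved
```

The returned list of names can be printed as a quick check of what was actually moved.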

How to get average size of a directory with given skip step

I have a directory with lots of files. I want to check the size of every nth file (or of a fixed number of files), then extrapolate it to the total file count in that directory.
I tried something, but my precision and syntax are bad. By no means am I asking you to fix my code; it's just an example of what doesn't work or look good.
I'm on Python 2.7
def get_size2(path):
    files = os.listdir(path)
    filesCount = len(files)
    samples = 5.0
    step = math.ceil(filesCount / samples)
    files = files[0::step]
    reminderCount = filesCount - len(files)
    reminderStep = float(reminderCount / len(files)) + 1
    total_size = 0
    for f in files:
        fp = os.path.join(path, f)
        if not os.path.islink(fp):
            total_size += os.path.getsize(fp) * reminderStep
    return int(total_size)
It's hard to fully understand what you are trying to do from the code, but I think you want an estimated directory size based on the average file size found in a subsample.
You can iterate through the files with a fixed increment by passing a third argument (the step, which must be an integer) to range() in a for loop:
for count in range(0, len(files), samples):
    print(f"On count: {count}")
Also, I'm a bit lost by the reminderCount and reminderStep variables.
Essentially, you want to work out the average size of the files you sampled (the total size seen so far, divided by the number of files looked at). You can then multiply that average by the number of files in the directory to extrapolate the expected directory size. Turning that logic into a function simplifies the problem to the following:
import os
def get_size2(path):
    files = os.listdir(path)
    filesCount = len(files)
    samples = 1  # step between sampled files; 1 looks at every file
    files_counted = 0
    total_size = 0
    for count in range(0, len(files), samples):
        f = files[count]
        fp = os.path.join(path, f)
        if not os.path.islink(fp):
            # only files that were actually measured count toward the average
            files_counted += 1
            total_size += os.path.getsize(fp)
    if files_counted == 0:
        return 0
    # average sampled size, extrapolated to every entry in the directory
    return int(total_size / files_counted * filesCount)
def main():
    print(f'{get_size2("./test/path")}')
if __name__ == "__main__":
    main()
This attempts to keep as many of the variables and as much of the structure you posted as possible, while adjusting the logic of the example. There are changes that I would recommend to the code, such as passing the sample size as a parameter.
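One possible shape for the parameterised version mentioned above, as a sketch only, assuming Python 3. The name estimate_dir_size and the step parameter are made up for illustration, and the estimate is only as good as the assumption that the sampled files are representative:

```python
import os

def estimate_dir_size(path, step=5):
    """Estimate the total size of the files in path from every step-th entry."""
    if step < 1:
        raise ValueError("step must be at least 1")
    names = sorted(os.listdir(path))
    sampled_sizes = []
    # names[::step] takes entries 0, step, 2*step, ...
    for name in names[::step]:
        file_path = os.path.join(path, name)
        if os.path.isfile(file_path) and not os.path.islink(file_path):
            sampled_sizes.append(os.path.getsize(file_path))
    if not sampled_sizes:
        return 0
    average_size = sum(sampled_sizes) / len(sampled_sizes)
    # extrapolate the sampled average to every entry in the directory
    return int(average_size * len(names))
```

With step=1 every file is measured, so the result is the exact total for a folder that contains only regular files.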

How to increment filename from positional argument?

How can I modify this function to increment the name when the filename "test.wav" already exists?
def write_float_to_16wav(signal, name = "test.wav", samplerate=20000):
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    siow.write(name,samplerate,intsignal)
You can use os.path.exists to check for the existence of the file, and increment when needed:
import os.path
if os.path.exists(name):
    name_, ext = os.path.splitext(name)
    name = f'{name_}1{ext}'
    # For <3.6: '{name_}1{ext}'.format(name_=name_, ext=ext)
The above checks for the file in the current directory. If you want to check in some other directory, you can join the paths using os.path.join:
if os.path.exists(os.path.join(directory, name)):
There are two main options. The first, and likely better, option is to simply count the number of wav files already created.
num_waves_created = 0
def write_float_to_16wav(signal, name = "test.wav", samplerate=20000):
    global num_waves_created  # the counter is reassigned below, so declare it global
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    name = "test" + str(num_waves_created) + ".wav"
    siow.write(name,samplerate,intsignal)
    num_waves_created += 1
The second option is to test each time in the function if the file has already been created. This incorporates a while loop that operates at linear complexity, so it's efficient enough for 10 wav files, but can seriously slow down if you need to create more.
from os import path
def write_float_to_16wav(signal, name = "test.wav", samplerate=20000):
    # keep appending "1" before the extension until the name is unused
    while path.exists(name):
        name_, ext = path.splitext(name)
        name = f'{name_}1{ext}'
    signal[signal > 1.0] = 1.0
    signal[signal < -1.0] = -1.0
    intsignal = np.int16((2**15-1)*signal)
    siow.write(name,samplerate,intsignal)
OK, based on the limited code you have supplied, and assuming you're not already checking elsewhere whether the filename exists:
1) Check if your filename already exists (for example with os.path.exists)
2) If the path/filename already exists, extract the current path/filename; else filename = test.wav
3) From the current filename, extract the last incremented value (using split, a substring, or whatever suits best)
4) Set the new filename with the incremented value (see heemayl's answer)
5) Done.
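The steps above could be sketched roughly as follows, assuming Python 3.6+ and that names of the form test1.wav, test2.wav, ... are acceptable. The helper name next_free_name is hypothetical; instead of parsing the last number out of an existing filename, it simply counts upward until it finds an unused name:

```python
import os

def next_free_name(name="test.wav"):
    """Return name if unused, otherwise the first free 'stem<N>.ext' variant."""
    if not os.path.exists(name):
        return name
    stem, ext = os.path.splitext(name)
    counter = 1
    # try test1.wav, test2.wav, ... until one does not exist yet
    while os.path.exists(f"{stem}{counter}{ext}"):
        counter += 1
    return f"{stem}{counter}{ext}"
```

Inside write_float_to_16wav, the first line could then be name = next_free_name(name), with the rest of the function left unchanged.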

loop np.load until file-index exceeds index of available files

I wish to read in all files from a folder using np.load without specifying the total number of files in advance. Currently, after a few loops the index will run out of the range of available files, and the code will terminate.
index = 0
while True:
    a = np.load(file=filepath + 'c_l' + pc_output_layer + '_s0_p' + str(index) + '.npy')
    layer = np.append(layer, a)
    index += 1
How can I keep loading until an error occurs and then continue running the rest of the script? Thank you!
You could catch the exception and break out of the loop that way, but a more 'pythonic' way would be to loop over the filenames themselves, rather than using an index.
The glob library allows you to find files matching a given pattern and return a list you can then iterate over.
E.g.:
import glob
files = glob.glob(filepath + 'c_l*.npy')
for f in files:
    a = np.load(file=f)
    layer = np.append(layer, a)
You could also simplify it further by creating the layers directly using a list comprehension.
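One caveat worth sketching: glob does not promise any particular order, and a plain alphabetical sort puts p10 before p2, whereas the original loop loaded the files in index order. Below is a small helper, assuming Python 3 and filenames that end in a numeric index followed by .npy, as in the question. The name indexed_files and its parameters are made up for illustration:

```python
import glob
import os
import re

def indexed_files(folder, prefix):
    """Return files named '<prefix><index>.npy' in folder, ordered by index."""
    pattern = os.path.join(glob.escape(folder), glob.escape(prefix) + "*.npy")
    index_re = re.compile(re.escape(prefix) + r"(\d+)\.npy$")
    found = []
    for file_path in glob.glob(pattern):
        match = index_re.match(os.path.basename(file_path))
        if match:
            # keep the numeric index so sorting is 0, 1, 2, ..., 10
            found.append((int(match.group(1)), file_path))
    return [file_path for _, file_path in sorted(found)]
```

The list comprehension mentioned above could then be written as [np.load(f) for f in indexed_files(filepath, 'c_l' + pc_output_layer + '_s0_p')], followed by a single np.concatenate if one flat array is wanted.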

Python - Can glob be used multiple times?

I want the user to process files in 2 different folders. The user does this by selecting a folder for First_Directory and another folder for Second_Directory. Each of these has its own function and algorithm, and each works fine if only one directory is selected at a time. If the user selects both, only the First_Directory is processed.
Both functions also use the glob module, as shown in the simplified code below, which is where I think the problem lies. My question is: can the glob module be used multiple times, and if not, is there an alternative?
##Test=name
##First_Directory=folder
##Second_Directory=folder
path_1 = First_Directory
path_2 = Second_Directory
path = path_1 or path_2
os.chdir(path)
def First(path_1):
    output_1 = glob.glob('./*.shp')
    #Do some processing
def Second(path_2):
    output_2 = glob.glob('./*.shp')
    #Do some other processing
if path_1 and path_2:
    First(path_1)
    Second(path_2)
elif path_1:
    First(path_1)
elif path_2:
    Second(path_2)
else:
    pass
You can modify your function to only look for .shp files in the path of interest. Then you can use that function for one path or both.
import glob
import os
def globFolder(path):
    # return the matches so the caller can use them
    return glob.glob(os.path.join(path, '*.shp'))
# raw strings keep backslashes such as \f from being read as escape sequences
path1 = r"C:\folder\data1"
path2 = r"C:\folder\data2"
Then you can use that generic function:
totalResults = globFolder(path1) + globFolder(path2)
This will combine both lists.
I think you can achieve your goal by restructuring your code:
def First(path,check):
    # build the pattern from the folder itself, so no os.chdir is needed
    pattern = os.path.join(path, '*.shp')
    if check:
        output = glob.glob(pattern)
        #Do some processing
    else:
        output = glob.glob(pattern)
        #Do some other processing
    return output
#
#
#
First(path_1,True)
First(path_2,False)
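To answer the literal question: glob can be called as many times as you like. In the code from the question, path = path_1 or path_2 evaluates to path_1 whenever it is set, so os.chdir() moves into the first folder and both './*.shp' patterns search there. A minimal sketch that globs each folder by its own path instead, assuming Python 3 (the function names are hypothetical):

```python
import glob
import os

def shapefiles_in(folder):
    """Return the .shp files directly inside folder, without changing the cwd."""
    return sorted(glob.glob(os.path.join(glob.escape(folder), "*.shp")))

def collect_shapefiles(first_directory=None, second_directory=None):
    """Glob each selected folder separately; unselected folders give []."""
    first = shapefiles_in(first_directory) if first_directory else []
    second = shapefiles_in(second_directory) if second_directory else []
    return first, second
```

Each list can then be passed to its own processing step, whether one folder or both were selected.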
