Python - Can glob be used multiple times? - python

I want the user to process files in 2 different folders. The user does this by selecting a folder for First_Directory and another folder for Second_Directory. Each of these is defined, has its own algorithm, and works fine if only one directory is selected at a time. If the user selects both, only First_Directory is processed.
Both functions use the glob module, as shown in the simplified code below, which is where I think the problem lies. My question is: can the glob module be used multiple times, and if not, is there an alternative?
##Test=name
##First_Directory=folder
##Second_Directory=folder
path_1 = First_Directory
path_2 = Second_Directory
path = path_1 or path_2
os.chdir(path)
def First(path_1):
    output_1 = glob.glob('./*.shp')
    #Do some processing
def Second(path_2):
    output_2 = glob.glob('./*.shp')
    #Do some other processing
if path_1 and path_2:
    First(path_1)
    Second(path_2)
elif path_1:
    First(path_1)
elif path_2:
    Second(path_2)
else:
    pass

You can modify your function to only look for .shp files in the path of interest. Then you can use that function for one path or both.
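def globFolder(path):
    # Search inside the given folder instead of the current working directory
    return glob.glob(os.path.join(path, '*.shp'))
# Raw strings keep the backslashes literal ("\f" would otherwise be a form feed)
path1 = r"C:\folder\data1"
path2 = r"C:\folder\data2"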
Then you can use that generic function:
totalResults = globFolder(path1) + globFolder(path2)
This will combine both lists.

I think you can achieve your goal by restructuring your code:
def First(path, check):
    if check:
        output = glob.glob(os.path.join(path, '*.shp'))
        #Do some processing
    else:
        output = glob.glob(os.path.join(path, '*.shp'))
        #Do some other processing
    return output
#
#
#
First(path_1, True)
First(path_2, False)
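To answer the original question directly: glob.glob can be called as many times as you like. In the question's code, os.chdir is called once with path_1 or path_2, which evaluates to path_1 whenever it is set, so both functions end up globbing the same working directory. Below is a minimal, self-contained sketch of globbing two folders independently. The helper names list_shapefiles and make_demo_folder and the temporary demo folders are assumptions made for illustration only; they are not part of the original script.

```python
import glob
import os
import tempfile


def list_shapefiles(folder):
    """Return the sorted .shp paths found directly inside folder."""
    return sorted(glob.glob(os.path.join(folder, "*.shp")))


def make_demo_folder(parent, name, filenames):
    """Create a folder containing empty placeholder files (demo data only)."""
    folder = os.path.join(parent, name)
    os.makedirs(folder)
    for filename in filenames:
        open(os.path.join(folder, filename), "w").close()
    return folder


with tempfile.TemporaryDirectory() as tmp:
    first = make_demo_folder(tmp, "first", ["a.shp", "b.shp", "notes.txt"])
    second = make_demo_folder(tmp, "second", ["c.shp"])
    # Each call searches its own folder; no os.chdir is needed.
    first_names = [os.path.basename(p) for p in list_shapefiles(first)]
    second_names = [os.path.basename(p) for p in list_shapefiles(second)]

print(first_names)   # ['a.shp', 'b.shp']
print(second_names)  # ['c.shp']
```

Because the folder is passed in explicitly, the result no longer depends on the current working directory.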

Related

How to convert a function to a recursive function

Hey guys, I don't know if I can ask this here, but I'm working on the original files in Google Colab and I wrote a function that sums the sizes of all the files in a directory:
import os
def recls_rtsize(argpath):
    sumsize = 0
    for entry in os.scandir(argpath):
        path= argpath+'/'+entry.name
        size= os.path.getsize(path)
        sumsize+=size
    return sumsize
print("total:",recls_rtsize('/var/log'))
But I need a way to make this function recursive. Is there some kind of formula or general idea for converting a non-recursive function into a recursive one?
A recursive function is a function that calls itself. For example, to calculate the total size of all files inside a directory, you can loop through the files of that directory and add up their sizes. If the directory has subdirectories, add a condition: when an entry is a directory, call the function itself on that subdirectory.
In your case:
import os
def recls_rtsize(argpath):
sumsize = 0
for entry in os.scandir(argpath):
        # os.DirEntry.is_dir() checks whether this entry is a directory
        if entry.is_dir():
            # then call the function itself for this subdirectory
            size = recls_rtsize(entry.path)
        else:
            path = argpath+'/'+entry.name
            size = os.path.getsize(path)
        sumsize += size
    return sumsize
print("total:",recls_rtsize('/var/log'))
For example, you could write a helper function to process the entries recursively, although I don't understand the purpose:
import os
def recls_rtsize(argpath):
    def helper(dirs):
        if not dirs:
            return 0
        path = argpath + '/' + dirs[0].name
        size = os.path.getsize(path)
        return size + helper(dirs[1:])
    return helper(list(os.scandir(argpath)))
print("total:", recls_rtsize('testing_package'))
Explanation:
Let's say argpath contains several files:
argpath = [file1, file2, file3]
Then the function calls would be:
size(file1) + helper([file2, file3])  (we pass everything after the first element)
size(file1) + size(file2) + helper([file3])
size(file1) + size(file2) + size(file3) + helper([])
There are no elements left, so we return 0 and start backtracking:
size(file1) + size(file2) + size(file3) + 0
size(file1) + size(file2) + (size(file3) + 0)
size(file1) + (size(file2) + (size(file3) + 0))
(size(file1) + (size(file2) + (size(file3) + 0))) # our result
I hope that makes sense.
To iterate over files in sub-folders (I assume that this is your goal here), you can use os.walk().
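A minimal sketch of the os.walk() approach, assuming the goal is the total size of every regular file under a root folder. The function name total_size and the temporary demo tree are made up for illustration.

```python
import os
import tempfile


def total_size(root):
    """Sum the sizes of all files under root, including files in subfolders."""
    total = 0
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total


# Demo on a throwaway tree: 5 + 3 + 1 bytes spread over three levels
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    demo_files = [
        ("a.bin", b"12345"),
        (os.path.join("sub", "b.bin"), b"123"),
        (os.path.join("sub", "deeper", "c.bin"), b"1"),
    ]
    for relative_path, payload in demo_files:
        with open(os.path.join(root, relative_path), "wb") as handle:
            handle.write(payload)
    demo_total = total_size(root)

print("total:", demo_total)  # total: 9
```

os.walk descends into the subfolders itself, so no explicit recursion is needed.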

problem with moving files using os.rename

I have this block of code where I try to move all the files in a folder to a different folder.
import os
from os import listdir
from os.path import isfile, join
def run():
    print("Do you want to convert 1 file (0) or do you want to convert all the files in a folder(1)")
    oneortwo = input("")
    if oneortwo == "0":
        filepathonefile = input("what is the filepath of your file?")
        filepathonefilewithoutfullpath = os.path.basename(filepathonefile)
        newfolder = "C:/Users/EL127032/Documents/fileconvertion/files/" + filepathonefilewithoutfullpath
        os.rename(filepathonefile,newfolder)
    if oneortwo == "1" :
        filepathdirectory = input("what is the filepath of your folder?")
        filesindirectory = [f for f in listdir(filepathdirectory) if isfile(join(filepathdirectory, f))]
        numberoffiles = len(filesindirectory)
        handlingfilenumber = 0
        while numberoffiles > handlingfilenumber:
            currenthandlingfile = filesindirectory[handlingfilenumber]
            oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
            futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
            os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
But when I run this, it gives:
os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
FileNotFoundError: [WinError 2] System couldn't find the file: 'C:\Users\EL127032\Documents\Eligant - kopie\Klas 1\Stermodules\Basisbiologie/lopen (1).odt' -> 'C:/Users/EL127032/Documents/fileconvertion/files/lopen (1).odt'
Can someone help me, please?
You are trying to move the same file twice.
The bug is in this part :
numberoffiles = len(filesindirectory)
handlingfilenumber = 0
while numberoffiles > handlingfilenumber:
    currenthandlingfile = filesindirectory[handlingfilenumber]
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
The first time you loop, handlingfilenumber will be 0, so you will move the 0-th file from your filesindirectory list.
Then you loop again, handlingfilenumber is still 0, so you try to move it again, but it is not there anymore (you moved it already on the first turn).
You forgot to increment handlingfilenumber. Add handlingfilenumber += 1 on a line after os.rename and you will be fine.
while loops are more error-prone than simpler for loops, so I recommend using for loops when appropriate.
Here, you want to move each file, so a for loop suffices:
for filename in filesindirectory:
    oldpathcurrenthandling = filepathdirectory + "/" + filename
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + filename
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
No need to use len, initialize a counter, increment it, or get the n-th element. It also takes fewer lines.
Three other things:
You could have found the cause of the problem yourself by debugging; there are plenty of resources online that explain how to do it. Just by printing the name of the file about to be moved (oldpathcurrenthandling), you would have seen it twice and noticed the problem causing the OS error.
Your variable names are not very readable. Consider following the standard style guide for variable names (PEP 8) and standard jargon: for example, filepathonefilewithoutfullpath becomes filename, oldpathcurrenthandling becomes source_file_path (following the source/destination convention), and so on.
When you have an error, include the full stack trace that Python gives you. It would have pointed directly to the second os.rename call; the first one (when you move only one file) does not contribute to the problem. It also helps in producing a Minimal Reproducible Example.
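Putting the advice above together, here is a hedged sketch of the move step as a for loop with PEP 8 style names. It swaps in shutil.move for os.rename, since shutil.move also works when source and destination are on different drives; the function name move_all_files and the temporary demo folders are assumptions for illustration, not the asker's real paths.

```python
import os
import shutil
import tempfile


def move_all_files(source_dir, destination_dir):
    """Move every regular file from source_dir into destination_dir."""
    moved = []
    for filename in sorted(os.listdir(source_dir)):
        source_file_path = os.path.join(source_dir, filename)
        if os.path.isfile(source_file_path):  # skip subfolders
            destination_file_path = os.path.join(destination_dir, filename)
            shutil.move(source_file_path, destination_file_path)
            moved.append(filename)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source")
    destination = os.path.join(tmp, "destination")
    os.makedirs(os.path.join(source, "subfolder"))  # also creates source
    os.makedirs(destination)
    for name in ["lopen (1).odt", "notes.txt"]:
        open(os.path.join(source, name), "w").close()

    moved = move_all_files(source, destination)
    remaining = os.listdir(source)
    arrived = sorted(os.listdir(destination))

print(moved)      # ['lopen (1).odt', 'notes.txt']
print(remaining)  # ['subfolder']
```

Each file name is visited exactly once, so no counter can be forgotten.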

batch copy files, perform operation and copy more files

I want to copy files from a directory in batch mode, perform an operation on the copied files, and then copy more files. To do this, I have come up with this code:
import os
import sys
from shutil import copy2
_, _, filenames = next(os.walk("src/"))
print(filenames)
number_of_files = len(filenames)
batch_number = 2
i = 0
while i < number_of_files:
    i += 1
    j = i + batch_number
    print(filenames[i:j])
and its output is
['file_02', 'file_03']
['file_03', 'file_04']
['file_04', 'file_010']
['file_010', 'file_01']
['file_01', 'file_06']
['file_06', 'file_08']
['file_08', 'file_09']
['file_09', 'file_07']
['file_07']
[]
What I want is:
['file_01', 'file_02']
['file_03', 'file_04']
['file_05', 'file_06']
['file_07', 'file_08']
['file_09', 'file_10']
What would be the best way to go about doing this?
Be careful: os.walk does not return file names in numerical order.
You can sort the list with the sort() method, using a key that orders the names numerically:
filenames.sort(key=lambda name: int(name.split('_')[1]))
Since your file names start with file_, the key strips that prefix and converts the remaining XX number to an integer.
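Sorting alone does not produce the desired batches: the loop in the question advances i by 1 instead of by the batch size, so consecutive slices overlap and index 0 is skipped. A minimal sketch of sorting plus non-overlapping batching follows; the helper names numeric_suffix and batches are hypothetical, and the list simply copies the names visible in the question's output.

```python
def numeric_suffix(name):
    """Return the integer after the last underscore, e.g. 'file_010' -> 10."""
    return int(name.rsplit("_", 1)[1])


def batches(items, batch_size):
    """Yield consecutive, non-overlapping slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Names in the arbitrary order os.walk returned them in the question
filenames = ["file_02", "file_03", "file_04", "file_010", "file_01",
             "file_06", "file_08", "file_09", "file_07"]
filenames.sort(key=numeric_suffix)

result = list(batches(filenames, 2))
for batch in result:
    print(batch)
```

With nine names and a batch size of 2, the last batch holds the single leftover file.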

Opening files from directory in specific order

I have a folder that contains around 500 images that I am rotating at a random angle from 0 to 360. The files are named 00i.jpeg, where i = 0, then i = 1, and so on. For example, I have an image named 009.jpeg, one named 0052.jpeg, and another named 00333.jpeg. My code below works in that it does rotate the images, but the files are not being read in the order I expect.
I would think I need some sort of stepping code that starts at 0 and adds one each time, but I'm not sure where I would put that. os.listdir doesn't allow me to do that because (from my understanding) it just lists the files out. I tried using os.walk, but then I cannot use cv2.imread: I receive the error SystemError: <built-in function imread> returned NULL without setting an error.
Any suggestions?
import cv2
import imutils
from random import randrange
import os
os.chdir("C:\\Users\\name\\Desktop\\training\\JPEG")
j = 0
for infile in os.listdir("C:\\Users\\name\\Desktop\\training\\JPEG"):
    filename = 'testing' + str(j) + '.jpeg'
    i = randrange(360)
    image = cv2.imread(infile)
    rotation_output = imutils.rotate_bound(image, angle=i)
    os.chdir("C:\\Users\\name\\Desktop\\rotate_test")
    cv2.imwrite("C:\\Users\\name\\Desktop\\rotate_test\\" + filename, rotation_output)
    os.chdir("C:\\Users\\name\\Desktop\\training\\JPEG")
    j = j + 1
print(infile)
000.jpeg
001.jpeg
0010.jpeg
00100.jpeg
...
Needs to be:
print(infile)
000.jpeg
001.jpeg
002.jpeg
003.jpeg
...
Get a list of files first, then use sort with key where the key is an integer version of the file name without extension.
files = os.listdir("C:\\Users\\name\\Desktop\\training\\JPEG")
files.sort(key=lambda x:int(x.split('.')[0]))
for infile in files:
...
Practical example:
files = ['003.jpeg','000.jpeg','001.jpeg','0010.jpeg','00100.jpeg','002.jpeg']
files.sort(key=lambda x:int(x.split('.')[0]))
print(files)
Output
['000.jpeg', '001.jpeg', '002.jpeg', '003.jpeg', '0010.jpeg', '00100.jpeg']

Reading images while maintaining folder structure

I have to rewrite a MATLAB script in Python, as apparently what I want to achieve is done much more efficiently in Python.
So the first task is to read all images into Python using OpenCV while maintaining the folder structure. For example, if the parent folder has 50 subfolders and each subfolder has 10 images, then the images variable in Python should mirror that, very much like a cell in MATLAB. I read that Python lists can provide this cell-like behaviour without importing anything, so that's good, I guess.
For example, below is how I coded it in Matlab:
path = '/home/university/Matlab/att_faces';
subjects = dir(path);
subjects = subjects(~strncmpi('.', {subjects.name}, 1)); %remove the '.' and '..' subfolders
img = cell(numel(subjects),1); %initialize the cell equal to number of subjects
for i = 1: numel(subjects)
    path_now = fullfile(path, subjects(i).name);
    contents = dir([path_now, '/*.pgm']);
    for j = 1: numel(contents)
        img{i}{j} = imread(fullfile(path_now,contents(j).name));
        disp([i,j]);
    end
end
The above img will have 50 cells, and each cell will store 10 images. img{1} will be all images belonging to subject 1, and so on.
I'm trying to replicate this in Python but am failing. This is what I have got so far:
import cv2
import os
import glob
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []
for n in sub_f:
    path_now = os.path.join(path, sub_f[n], '*.pgm')
    images[n] = [cv2.imread(file) for file in glob.glob(path_now)]
It's not exactly what I am looking for; some help would be appreciated. Please ignore silly mistakes, as it is my first day writing in Python.
Thanks
edit: directory structure:
The first problem is that n isn't a number or index, it is a string containing the path name. To get the index, you can use enumerate, which gives index, value pairs.
Second, unlike in MATLAB you can't assign to indexes that don't exist. You need to pre-allocate your image array or, better yet, append to it.
Third, it is better not to use the variable file since in python 2 it is a built-in data type so it can confuse people.
So with preallocating, this should work:
images = [None]*len(sub_f)
for n, cursub in enumerate(sub_f):
    path_now = os.path.join(path, cursub, '*.pgm')
    images[n] = [cv2.imread(fname) for fname in glob.glob(path_now)]
Using append, this should work:
for cursub in sub_f:
    path_now = os.path.join(path, cursub, '*.pgm')
    images.append([cv2.imread(fname) for fname in glob.glob(path_now)])
That being said, there is an easier way to do this. You can use the pathlib module to simplify this.
So something like this should work:
from pathlib import Path
mypath = Path('/home/university/Matlab/att_faces')
images = []
for subdir in mypath.iterdir():
    images.append([cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')])
This loops over the subdirectories, then globs each one.
This can even be done in a nested list comprehension:
images = [[cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')]
          for subdir in mypath.iterdir()]
It should be the following:
import os
import cv2
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
print(sub_f) #--- this will print all the files present in this directory ---
#--- this a list to which you will append all the images ---
images = []
#--- iterate through every file in the directory and read those files that end with .pgm format ---
#--- after reading it append it to the list ---
for n in sub_f:
    if n.endswith('.pgm'):
        path_now = os.path.join(path, n)
        print(path_now)
        images.append(cv2.imread(path_now, 1))

import cv2
import os
import glob
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []
#read the images
for folder in sub_f:
    path_now = os.path.join(path, folder, '*.pgm')
    images.append([cv2.imread(file) for file in glob.glob(path_now)])
#display the images
for folder in images:
    for image in folder:
        cv2.imshow('image',image)
        cv2.waitKey(0)
cv2.destroyAllWindows()
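One detail the snippets above leave open is ordering: neither os.listdir nor Path.iterdir guarantees any particular order, so images[0] is not necessarily the first subject. The sketch below sorts both levels and skips entries that are not folders. To stay runnable without OpenCV, it reads raw bytes by default; the function name load_nested and its loader parameter are assumptions for illustration, and the demo tree is throwaway data.

```python
import tempfile
from pathlib import Path


def load_nested(root, pattern="*.pgm", loader=Path.read_bytes):
    """Return one list of loaded files per subfolder, in sorted order."""
    subdirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    return [[loader(f) for f in sorted(subdir.glob(pattern))]
            for subdir in subdirs]


# Demo tree: two subject folders plus a stray file that must be ignored
with tempfile.TemporaryDirectory() as tmp:
    for subject, count in [("s1", 2), ("s2", 1)]:
        folder = Path(tmp, subject)
        folder.mkdir()
        for index in range(1, count + 1):
            (folder / f"{index}.pgm").write_bytes(b"P5 demo")
    Path(tmp, "README").write_text("not a subject folder")
    images = load_nested(tmp)

print([len(group) for group in images])  # [2, 1]
```

With OpenCV installed, passing loader=lambda f: cv2.imread(str(f)) should give the nested list of images instead. Note that the sort is lexicographic, so 10.pgm comes before 2.pgm unless a numeric key is supplied.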
