I have code that finds every file in a directory with a certain extension, and I want the script below to be applied to every PST/DBOUND shapefile pair:
import geopandas as gpd

pst = gpd.read_file(r'C:\Users\user\Desktop\New folder1\PST')  # not needed in the final version; the path will come from the loop
dbound = gpd.read_file(r'C:\Users\user\Desktop\New folder1\DBOUND')  # same here
dbound.reset_index(inplace=True)
dbound = dbound.rename(columns={'index': 'fid'})
wdp = gpd.sjoin(pst, dbound, how="inner", op='within')  # each DBOUND and PST pair from every subfolder
wdp['DEC_ID'] = wdp['fid']
This is the list, grouped_shapefiles, that contains the paths to the shapefiles:
[['C:\\Users\\user\\Desktop\\eff\\20194\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20194\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20042\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20042\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20161\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20161\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20029\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20029\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20008\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20008\\PST\\PST.shp']]
and I want something like this:
results = []
for group in grouped_shapefiles:
    # here the script above should run for each pair; this is the part I need help connecting to the loop
    # and then the export step, which follows
    # o = an output path
    out = o + r'\result.shp'  # it would be nice to add the name of the pair's folder to the output name so it is unique
    data2.to_file(out)
How can I do that?
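For reference, here is a minimal sketch of how the loop and export could be wired together. It assumes each pair in grouped_shapefiles is ordered [DBOUND path, PST path] as in the list above, reuses the sjoin call from the script, and takes the numbered parent folder (e.g. 20194) from the path to make each output name unique; out_folder is a hypothetical output directory.

import os
import geopandas as gpd

out_folder = r'C:\output'  # hypothetical output directory, adjust as needed
results = []
for dbound_path, pst_path in grouped_shapefiles:
    pst = gpd.read_file(pst_path)
    dbound = gpd.read_file(dbound_path)
    dbound.reset_index(inplace=True)
    dbound = dbound.rename(columns={'index': 'fid'})
    wdp = gpd.sjoin(pst, dbound, how="inner", op='within')
    wdp['DEC_ID'] = wdp['fid']
    results.append(wdp)
    # e.g. ...\eff\20194\PST\PST.shp -> '20194'
    folder_name = os.path.basename(os.path.dirname(os.path.dirname(pst_path)))
    wdp.to_file(os.path.join(out_folder, folder_name + '_result.shp'))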
This is my code.
folder_out = []
for a in range(1, 80):
    folder_letter = "/content/drive/MyDrive/project/Dataset/data/"
    folder_out[a] = os.path.join(folder_letter, str(a))
    folder_out.append(folder_out[a])
and this is the error I get (assigning to folder_out[a] when folder_out is an empty list raises IndexError: list assignment index out of range),
and this is what I want: a list of all 79 folder paths.
You are using the os module the wrong way; you want to use os.listdir(<your directory here>) to get a list of everything in that directory:
import os

dir = os.listdir("/content/drive/MyDrive/project/Dataset/data/")
for f in dir:
    print(f)
If you just want a list of all directories, just use os.listdir("/content/drive/MyDrive/project/Dataset/data/")
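Note that os.listdir returns bare entry names rather than full paths; a small sketch (assuming the same Drive path as above) that joins each name back onto the base folder:

import os

base = "/content/drive/MyDrive/project/Dataset/data/"
# os.listdir gives names like '1', '2', ...; join each one with the base to get a full path
folder_out = [os.path.join(base, name) for name in os.listdir(base)]
print(folder_out)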
There is no need to create a separate numbered variable for each folder; you can store everything in lists, dictionaries and so on. Creating new variables inside a loop is very bad practice.
Code correction: save the paths in a list instead and access them with loops or slicing.
import os

folder_letter = "/content/drive/MyDrive/project/Dataset/data/"
folder_out = []
for a in range(1, 80):
    folder = os.path.join(folder_letter, str(a))
    folder_out.append(folder)

print(folder_out)
This gives a list of folder paths:
['/content/drive/MyDrive/project/Dataset/data/1', '/content/drive/MyDrive/project/Dataset/data/2', '/content/drive/MyDrive/project/Dataset/data/3', '/content/drive/MyDrive/project/Dataset/data/4', '/content/drive/MyDrive/project/Dataset/data/5', '/content/drive/MyDrive/project/Dataset/data/6', '/content/drive/MyDrive/project/Dataset/data/7', '/content/drive/MyDrive/project/Dataset/data/8', '/content/drive/MyDrive/project/Dataset/data/9',.....]
If you want to iterate over them:
for element in folder_out:
    print(element)
which prints each path on its own line.
Or, numbering them with enumerate:

for c, x in enumerate(folder_out):
    print(f"folder_out{c}: {x}")
This gives what you want:
folder_out0: /content/drive/MyDrive/project/Dataset/data/1
folder_out1: /content/drive/MyDrive/project/Dataset/data/2
folder_out2: /content/drive/MyDrive/project/Dataset/data/3
folder_out3: /content/drive/MyDrive/project/Dataset/data/4
folder_out4: /content/drive/MyDrive/project/Dataset/data/5
folder_out5: /content/drive/MyDrive/project/Dataset/data/6
folder_out6: /content/drive/MyDrive/project/Dataset/data/7
folder_out7: /content/drive/MyDrive/project/Dataset/data/8
folder_out8: /content/drive/MyDrive/project/Dataset/data/9
folder_out9: /content/drive/MyDrive/project/Dataset/data/10
folder_out10: /content/drive/MyDrive/project/Dataset/data/11
folder_out11: /content/drive/MyDrive/project/Dataset/data/12
folder_out12: /content/drive/MyDrive/project/Dataset/data/13
folder_out13: /content/drive/MyDrive/project/Dataset/data/14
folder_out14: /content/drive/MyDrive/project/Dataset/data/15
folder_out15: /content/drive/MyDrive/project/Dataset/data/16
If you want to create a folder for each path:
import os

for x in folder_out:
    os.mkdir(x)
which will create 79 empty folders
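Note that os.mkdir raises FileExistsError if a folder already exists; if that is a possibility, a variant with os.makedirs and exist_ok=True avoids the error:

import os

for x in folder_out:
    os.makedirs(x, exist_ok=True)  # succeeds even if the folder is already there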
I have code that projects a number of shapefiles in a folder to another coordinate system, and the projected shapefiles are placed in another folder. For the projected shapefiles, I want to append "_projected" at the end of each shapefile name.
What I have so far works for the projection and for setting the output files into a specific folder, but the new output files are not showing the "_projected" at the end.
Here is my code
import arcpy
import os

arcpy.env.workspace = "inputdatafolder"
arcpy.env.overwriteOutput = True
outWorkspace = "outputdatafolder"

for infc in arcpy.ListFeatureClasses():
    dsc = arcpy.Describe(infc)
    if dsc.spatialReference.Name == "Unknown":
        print("skipped this fc due to undefined coordinate system: " + infc)
    else:
        outfc = os.path.join(outWorkspace, infc)
        outCS = arcpy.SpatialReference('NAD 1983 UTM Zone 10N')
        arcpy.Project_management(infc, outfc, outCS)
        infc = infc.replace(".shp", "_projected.shp")
Since the code runs, I am not getting any errors; the file names just aren't getting the ending I want.
Your code is replacing the text of the file path stored in infc, but it is not actually renaming any file.
Furthermore, outfc is the path to the new projected shapefile you are creating, while infc is the path to the original file. Don't you want outfc to have the "_projected.shp" suffix?
The code below changes the output file path to include "_projected.shp" before calling arcpy.Project_management to create the new file.
import arcpy
import os

arcpy.env.workspace = "inputdatafolder"
arcpy.env.overwriteOutput = True
outWorkspace = "outputdatafolder"

for infc in arcpy.ListFeatureClasses():
    dsc = arcpy.Describe(infc)
    if dsc.spatialReference.Name == "Unknown":
        print("skipped this fc due to undefined coordinate system: " + infc)
    else:
        outfc = os.path.join(outWorkspace, infc).replace(".shp", "_projected.shp")
        outCS = arcpy.SpatialReference('NAD 1983 UTM Zone 10N')
        arcpy.Project_management(infc, outfc, outCS)
I'm also not sure if you're using Describe correctly. You may need to use infc.name when constructing the file paths.
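As a side note, building the new name with os.path.splitext instead of str.replace is a little more robust if the extension case ever differs (a sketch only, placed inside the else branch, assuming infc is a name string such as 'roads.shp'):

name, ext = os.path.splitext(infc)
outfc = os.path.join(outWorkspace, name + "_projected" + ext)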
I have to rewrite a MATLAB script in Python, as apparently what I want to achieve is done much more efficiently in Python.
The first task is to read all images into Python using OpenCV while maintaining the folder structure. For example, if the parent folder has 50 subfolders and each subfolder has 10 images, then the images variable should hold them per subfolder, very much like a cell array in MATLAB. I read that Python lists can provide this cell-like behaviour without importing anything, so that's good I guess.
For example, below is how I coded it in Matlab:
path = '/home/university/Matlab/att_faces';
subjects = dir(path);
subjects = subjects(~strncmpi('.', {subjects.name}, 1)); % remove the '.' and '..' subfolders
img = cell(numel(subjects),1); % initialize the cell equal to number of subjects
for i = 1: numel(subjects)
    path_now = fullfile(path, subjects(i).name);
    contents = dir([path_now, '/*.pgm']);
    for j = 1: numel(contents)
        img{i}{j} = imread(fullfile(path_now,contents(j).name));
        disp([i,j]);
    end
end
The above img will have 50 cells, and each cell will store 10 images; img{1} will be all images belonging to subject 1, and so on.
I'm trying to replicate this in Python but am failing. This is what I have got so far:
import cv2
import os
import glob

path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []

for n in sub_f:
    path_now = os.path.join(path, sub_f[n], '*.pgm')
    images[n] = [cv2.imread(file) for file in glob.glob(path_now)]
It's not exactly what I am looking for; some help would be appreciated. Please ignore silly mistakes, as it is my first day writing Python.
Thanks.
Edit: the directory structure is the att_faces parent folder with one subfolder per subject, each containing that subject's .pgm images.
The first problem is that n isn't a number or index; it is a string containing the directory name. To get the index, you can use enumerate, which gives index, value pairs.
Second, unlike in MATLAB, you can't assign to list indexes that don't exist. You need to pre-allocate your image list or, better yet, append to it.
Third, it is better not to use file as a variable name, since in Python 2 it is a built-in type and it can confuse people.
So with preallocating, this should work:
images = [None]*len(sub_f)
for n, cursub in enumerate(sub_f):
    path_now = os.path.join(path, cursub, '*.pgm')
    images[n] = [cv2.imread(fname) for fname in glob.glob(path_now)]
Using append, this should work:
images = []
for cursub in sub_f:
    path_now = os.path.join(path, cursub, '*.pgm')
    images.append([cv2.imread(fname) for fname in glob.glob(path_now)])
That being said, there is an easier way to do this. You can use the pathlib module to simplify this.
So something like this should work:
from pathlib import Path

mypath = Path('/home/university/Matlab/att_faces')
images = []
for subdir in mypath.iterdir():
    images.append([cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')])
This loops over the subdirectories, then globs each one.
This can even be done in a nested list comprehension:
images = [[cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')]
for subdir in mypath.iterdir()]
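One caveat: iterdir() (like os.listdir) yields entries in arbitrary order and also returns plain files; if the subject ordering matters as it does in the MATLAB version, a sorted, directories-only variant could look like this sketch:

from pathlib import Path
import cv2

mypath = Path('/home/university/Matlab/att_faces')
# sort both levels so the subject and image order is stable, and skip any stray files at the top level
images = [[cv2.imread(str(curfile)) for curfile in sorted(subdir.glob('*.pgm'))]
          for subdir in sorted(mypath.iterdir()) if subdir.is_dir()]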
It should be the following:
import os
import cv2

path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
print(sub_f)  # --- this will print all the entries present in this directory ---

# --- this is a list to which you will append all the images ---
images = []

# --- iterate through every file in the directory and read those that end with the .pgm format ---
# --- after reading, append each image to the list ---
for n in sub_f:
    if n.endswith('.pgm'):
        path_now = os.path.join(path, n)
        print(path_now)
        images.append(cv2.imread(path_now, 1))
To keep the per-subject folder structure from the question, read each subfolder separately:

import cv2
import os
import glob

path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []

# read the images
for folder in sub_f:
    path_now = os.path.join(path, folder, '*.pgm')
    images.append([cv2.imread(file) for file in glob.glob(path_now)])

# display the images
for folder in images:
    for image in folder:
        cv2.imshow('image', image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
I have the following list, grouped_shapefiles, which holds the paths to the files (pairs of PST and DBOUND) in each folder:
[['C:\\Users\\user\\Desktop\\eff\\20194\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20194\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20042\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20042\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20161\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20161\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20029\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20029\\PST\\PST.shp'],
['C:\\Users\\user\\Desktop\\eff\\20008\\DBOUND\\DBOUND.shp',
'C:\\Users\\user\\Desktop\\eff\\20008\\PST\\PST.shp']]
I want to make a for loop that runs this piece of code on the corresponding files for each PST and DBOUND pair in every folder (20194, 20042, 20161, etc.) that the list contains.
import geopandas as gpd
import pandas

#pst = gpd.read_file(r'C:\Users\user\Desktop\New folder1\PST')  # not needed in the final version; the path will come from the loop
#dbound = gpd.read_file(r'C:\Users\user\Desktop\New folder1\DBOUND')  # same here
dbound.reset_index(inplace=True)
wdp = gpd.sjoin(pst, dbound, how="inner", op='within')  # each DBOUND and PST pair from every folder
wdp['DEC_ID'] = wdp['index']
I just want to know how to make the for loop that applies this code to the files it should. I have tried for loops using bracketed positions, but it didn't do what it should.
I'm not sure if I understand your question correctly: are you trying to iterate through the items in the list in pairs? If so, it is pretty straightforward:
for i in grouped_shapefiles:
    dbound = gpd.read_file(i[0])  # the DBOUND path is first in each pair
    pst = gpd.read_file(i[1])     # the PST path is second
    if i[0].split("\\")[-3] == i[1].split("\\")[-3]:  # same numbered folder, e.g. 20194
        dbound.reset_index(inplace=True)
        wdp = gpd.sjoin(pst, dbound, how="inner", op='within')
        wdp['DEC_ID'] = wdp['index']
    else:
        print("Folder pairs mismatch")
Edited as per my understanding
I want the user to process files in two different folders. The user does this by selecting a folder for First_Directory and another folder for Second_Directory. Each of these is defined, has its own algorithm, and works fine if only one directory is selected at a time. If the user selects both, only First_Directory is processed.
Both also use the glob module, as shown in the simplified code below, which is where I think the problem lies. My question is: can the glob module be used multiple times, and if not, is there an alternative?
##Test=name
##First_Directory=folder
##Second_Directory=folder

path_1 = First_Directory
path_2 = Second_Directory
path = path_1 or path_2
os.chdir(path)

def First(path_1):
    output_1 = glob.glob('./*.shp')
    #Do some processing

def Second(path_2):
    output_2 = glob.glob('./*.shp')
    #Do some other processing

if path_1 and path_2:
    First(path_1)
    Second(path_2)
elif path_1:
    First(path_1)
elif path_2:
    Second(path_2)
else:
    pass
You can modify your function to only look for .shp files in the path of interest. Then you can use that function for one path or both.
import glob
import os

def globFolder(path):
    # return every shapefile in the given folder
    return glob.glob(os.path.join(path, '*.shp'))

path1 = r"C:\folder\data1"
path2 = r"C:\folder\data2"
Then you can use that generic function:
totalResults = globFolder(path1) + globFolder(path2)
This will combine both lists.
I think that by restructuring your code you can obtain your goal:
import glob
import os

def First(path, check):
    if check:
        output = glob.glob(os.path.join(path, '*.shp'))
        # Do some processing
    else:
        output = glob.glob(os.path.join(path, '*.shp'))
        # Do some other processing
    return output

First(path_1, True)
First(path_2, False)
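For completeness, the reason only First_Directory was processed in the original code is that path = path_1 or path_2 evaluates to path_1 whenever both folders are selected, and both functions then glob './*.shp' relative to that single working directory. Globbing each selected folder directly, as both answers do, removes the dependence on os.chdir; a minimal sketch (variable names are illustrative):

import glob
import os

if path_1:
    first_shapefiles = glob.glob(os.path.join(path_1, '*.shp'))    # Do some processing
if path_2:
    second_shapefiles = glob.glob(os.path.join(path_2, '*.shp'))   # Do some other processing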