I have a main folder like this:
mainf/01/streets/streets.shp
mainf/02/streets/streets.shp #normal files
mainf/03/streets/streets.shp
...
and another main folder like this:
mainfo/01/streets/streets.shp
mainfo/02/streets/streets.shp #empty files
mainfo/03/streets/streets.shp
...
I want to call a function that takes, as its first parameter, a file from the first folder (normal files) and, as its second, the corresponding file from the other folder (empty files).
Files correspond when they share the same folder number at level [-3] of the path (e.g. 01, 02, 03).
Example with a function:
appendfunc(first_file_from_normal_files,first_file_from_empty_files)
How to do this in a loop?
My code:
for i in mainf and j in mainfo:
    appendfunc(i,j)
Update
Correct version:
first = ["mainf/01/streets/streets.shp", "mainf/02/streets/streets.shp", "mainf/03/streets/streets.shp"]
second = ["mainfo/01/streets/streets.shp", "mainfo/02/streets/streets.shp", "mainfo/03/streets/streets.shp"]
final = [(f,s) for f,s in zip(first,second)]
for i, j in final:
    appendfunc(i, j)
Is there a way to automatically collect all the files under a main folder, with their full paths, into a list?
import os
first = []
for dirpath, dirnames, filenames in os.walk(mainf):
    for name in filenames:  # join each file name, not the whole list
        first.append(os.path.join(dirpath, name))
first.sort()  # os.walk does not guarantee an order
second = []
for dirpath, dirnames, filenames in os.walk(mainfo):
    for name in filenames:
        second.append(os.path.join(dirpath, name))
second.sort()
Use zip:
first = ["mainf/01/streets/streets.shp", "mainf/02/streets/streets.shp", "mainf/03/streets/streets.shp"]
second = ["mainfo/01/streets/streets.shp", "mainfo/02/streets/streets.shp", "mainfo/03/streets/streets.shp"]
final = [(f,s) for f,s in zip(first,second)]
print(final)
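You can't write a for ... and loop. A for statement iterates over one iterable; you can nest a second loop over another iterable inside it, but this still won't give you what you want, because it calls appendfunc for every combination of i and j rather than for matching pairs:
for i in mainf:
    for j in mainfo:
        appendfunc(i, j)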
What you probably want is something like this (I'm assuming mainf and mainfo are lists with the same length and order, except that one holds the empty files):
for folder_num in range(len(mainf)):
    appendfunc(mainf[folder_num], mainfo[folder_num])
You haven't said what appendfunc is supposed to do, so I'll leave that to you. I'm also assuming that, depending on how you're accessing the files, you can work out how to adapt mainf[folder_num] and mainfo[folder_num]; for example, you may need to inject the number back into the directory structure with something like "mainf/{}/streets/streets.shp".format(zero_padded(folder_num)), where zero_padded is a helper you would write yourself.
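The index-based loop above relies on both lists having the same length and order. A minimal sketch that instead matches files on the numbered folder at position [-3] of the path is shown here; folder_number and pair_by_folder are hypothetical helper names, and the paths are the ones from the question:

```python
from pathlib import PurePosixPath


def folder_number(path):
    """Return the numbered folder, i.e. the third path component from the end."""
    return PurePosixPath(path).parts[-3]


def pair_by_folder(first, second):
    """Pair paths from two lists that share the same folder number."""
    second_by_number = {folder_number(p): p for p in second}
    pairs = []
    for p in first:
        number = folder_number(p)
        if number in second_by_number:  # skip folders missing on the other side
            pairs.append((p, second_by_number[number]))
    return pairs


first = [
    "mainf/01/streets/streets.shp",
    "mainf/02/streets/streets.shp",
    "mainf/03/streets/streets.shp",
]
# Deliberately out of order and missing folder 02
second = [
    "mainfo/03/streets/streets.shp",
    "mainfo/01/streets/streets.shp",
]

pairs = pair_by_folder(first, second)
for normal, empty in pairs:
    print(normal, empty)  # call appendfunc(normal, empty) here instead
```

Matching on a key costs one dictionary build, but it keeps working when a folder is missing or the two lists come back in different orders.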
Related
I have a function called plot_il_ih that takes two data frames and generates a plot. I also have a set of folders that each contain a .h5 file with the data I need to pass to plot_il_ih. I'm trying to feed the function two datasets at a time, but without success.
I've been using pathlib to do so:
path = Path("files")
for log in path.glob("log*"):
    for file in log.glob("log*.h5"):
        df = pd.read_hdf(file, key="log")
With this loop I can only feed one data frame at a time, but I need two of them.
The folder structure is something like this:
files -> log1 -> log1.h5
         log2 -> log2.h5
         log3 -> log3.h5
         log4 -> log4.h5
I would like to feed the function plot_il_ih the following sequence,
plot_il_ih(dataframeof_log1.h5, dataframeof_log2.h5) then
plot_il_ih(dataframeof_log2.h5, dataframeof_log3.h5) and so on.
I have tried to use zip:
def pairwise(iterable):
    a = iter(iterable)
    return zip(a, a)
for l1, l2 in pairwise(list(path.glob('log*'))):
    plot_il_ih(l1, l2)
but it doesn't move forward; it only opens the first two.
What is wrong with my logic?
Consider something like this; you might have to adjust the indexing:
filelist = list(path.glob('log*'))
for i in range(1, len(filelist)):
    print(filelist[i-1])
    print(filelist[i])
    print('\n')
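To see why the zip attempt in the question skips ahead: zip(a, a) pulls both members of each pair from the same iterator, so it yields (log1, log2), (log3, log4) rather than overlapping pairs. A small sketch contrasting the two behaviours, using placeholder file names instead of real paths:

```python
def non_overlapping_pairs(items):
    """zip(a, a) takes two items per step from one shared iterator."""
    a = iter(items)
    return list(zip(a, a))


def overlapping_pairs(items):
    """Pair each item with its successor by zipping the list with a shifted copy."""
    items = list(items)
    return list(zip(items, items[1:]))


logs = ["log1.h5", "log2.h5", "log3.h5", "log4.h5"]  # placeholder names

print(non_overlapping_pairs(logs))
# [('log1.h5', 'log2.h5'), ('log3.h5', 'log4.h5')]
print(overlapping_pairs(logs))
# [('log1.h5', 'log2.h5'), ('log2.h5', 'log3.h5'), ('log3.h5', 'log4.h5')]
```

On Python 3.10 and later, itertools.pairwise produces the same overlapping pairs. Since glob does not promise a sorted order, wrapping the result in sorted() before pairing is probably a good idea.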
I have assigned different files to variables. Now I want to perform some operations by iterating over those variables. For example:
reduced_file1= 'names.xlsx'
reduced_file2= 'surnames.xlsx'
reduced_file3= 'city.xlsx'
reduced_file4= 'birth.xlsx'
The operations I want to iterate over (with a for loop) are:
xls= pd.ExcelFile(reduced_file1)
xls= pd.ExcelFile(reduced_file2)
xls= pd.ExcelFile(reduced_file3)
xls= pd.ExcelFile(reduced_file4)
...and so on
Basically, only the variable name changes each time: reduced_file(i)
Thanks
files = ['names.xlsx', 'surnames.xlsx', 'city.xlsx', 'birth.xlsx']
for file in files:
    xls = pd.ExcelFile(file)
You can also build the variable part of a string with f-strings:
for i in range(4):
    print(f"this is number {i}")
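Note that the loop above rebinds xls on every pass, so only the last file remains available afterwards. If every result is needed, one option is to keep them in a dictionary keyed by file name. In this sketch load_all is a hypothetical helper, and str.upper stands in for pd.ExcelFile so that the example runs without the actual spreadsheets:

```python
def load_all(files, loader):
    """Apply loader to each file and keep every result, keyed by file name."""
    return {file: loader(file) for file in files}


files = ["names.xlsx", "surnames.xlsx", "city.xlsx", "birth.xlsx"]

# str.upper is a stand-in; with pandas installed you would pass pd.ExcelFile
loaded = load_all(files, str.upper)
print(loaded["city.xlsx"])  # CITY.XLSX
```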
I am looping through a directory and want to store the files of each folder as a list in a dictionary, where each key is a folder and the value is its list of files.
The first print in the loop shows exactly the output I am expecting.
However the second print shows empty values.
The third print after initialization of the class shows the list of the last subfolder as value for every key.
What am I overlooking or doing wrong?
class FileAndFolderHandling() :
    folders_and_files = dict()
    def __init__(self) :
        self.getSubfolderAndImageFileNames()
    def getSubfolderAndImageFileNames(self) :
        subfolder = ""
        files_in_subfolder = []
        for filename in glob.iglob('X:\\Some_Directory\\**\\*.tif', recursive=True) :
            if not subfolder == os.path.dirname(filename) and not subfolder == "" :
                print(subfolder + " / / " + str(files_in_subfolder))
                self.folders_and_files[subfolder] = files_in_subfolder
                files_in_subfolder.clear()
                print(self.folders_and_files)
            subfolder = os.path.dirname(filename) # new subfolder
            files_in_subfolder.append(os.path.basename(filename))
folder_content = FileAndFolderHandling()
print(folder_content.folders_and_files)
The problem is that you are using the same list every time.
files_in_subfolder = [] creates one list and binds the name to a reference to it. When you then assign self.folders_and_files[subfolder] = files_in_subfolder, the dictionary stores that reference (the same one in every iteration), not a copy of the list.
Later, files_in_subfolder.clear() empties the list that the reference points to, and therefore every dictionary entry, since they all refer to that one list.
To solve this, create a new list for each dictionary entry instead of clearing the shared one: replace files_in_subfolder.clear() with files_in_subfolder = [], so that a fresh list is started whenever the subfolder changes.
Hope it helps!
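The effect described above can be reproduced without touching the file system. This is a minimal sketch with made-up folder and file names; it stores one shared list under every key, then repeats the loop with a fresh list per key:

```python
entries = [("a", ["1.tif"]), ("b", ["2.tif", "3.tif"])]  # made-up data

# Shared list: every key ends up referring to the same (emptied) object
shared = []
by_reference = {}
for folder, names in entries:
    shared.extend(names)
    by_reference[folder] = shared  # stores a reference, not a copy
    shared.clear()                 # empties the list behind every key

# Fresh list per key: each entry keeps its own contents
fresh = {}
for folder, names in entries:
    current = []                   # a new list object on every pass
    current.extend(names)
    fresh[folder] = current

print(by_reference)  # {'a': [], 'b': []}
print(fresh)         # {'a': ['1.tif'], 'b': ['2.tif', '3.tif']}
```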
It sounds like you are after defaultdict.
I adapted your code like this:
import glob, os
from collections import defaultdict
class FileAndFolderHandling() :
    folders_and_files = defaultdict(list)
    def __init__(self) :
        self.getSubfolderAndImageFileNames()
    def getSubfolderAndImageFileNames(self) :
        for filename in glob.iglob(r'C:\Temp\T\**\*.txt', recursive=True) :
            # print(filename)
            subfolder = os.path.dirname(filename)
            self.folders_and_files[subfolder].append(os.path.basename(filename))
folder_content = FileAndFolderHandling()
print(dict(folder_content.folders_and_files))
Output:
{'C:\\Temp\\T': ['X.txt'], 'C:\\Temp\\T\\X': ['X1.txt', 'X2.txt'], 'C:\\Temp\\T\\X2': ['X1.txt']}
defaultdict(list) creates a new list for every new key the first time that key is accessed, which is what you seem to want your code to do.
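The same grouping can be tried on plain strings, which makes the behaviour easy to check without a directory tree. In this sketch group_by_folder is a hypothetical helper and the paths are invented:

```python
import os
from collections import defaultdict


def group_by_folder(paths):
    """Group file names by their containing folder."""
    grouped = defaultdict(list)
    for path in paths:
        # A missing key gets a brand-new empty list before append runs
        grouped[os.path.dirname(path)].append(os.path.basename(path))
    return dict(grouped)


paths = ["T/X.txt", "T/X/X1.txt", "T/X/X2.txt", "T/X2/X1.txt"]  # invented paths
print(group_by_folder(paths))
# {'T': ['X.txt'], 'T/X': ['X1.txt', 'X2.txt'], 'T/X2': ['X1.txt']}
```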
From what I can see, you are clearing the list:
files_in_subfolder.clear()
Remove that call and start a new list instead, or make sure a copy of the list (for example files_in_subfolder.copy()) is stored in folders_and_files before any clear operation.
I am new to Python. I need to traverse the files in a directory and build a 2D list that pairs each file (key) with a value. Then I need to sort it by value and delete the files in the lower half. How can I do that?
This is what I have so far. I can't figure out how to create such a 2D list.
dir = "images"
num_files = len(os.listdir(dir))
for file in os.listdir(dir):
    print(file)
    value = my_function(file)
    # this is wrong:
    _list[0][0].append(value)
# and then sorting, and removing the files associated with lower half
Basically, the 2D list should look like [[file1, 0.876], [file2, 0.5], [file3, 1.24]], and it needs to be sorted by the second element of each entry.
Based on the comments, it looks like I have to do this when appending:
mylist.append([file, value])
And for sorting, I have to do this:
mylist.sort(key=lambda item: item[1])
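Putting the append and sort steps together, the selection of the lower half might look like the sketch below. It assumes "lower half" means the first half of the entries after sorting by value; lower_half is a hypothetical helper, the scores are invented, and nothing is deleted here:

```python
def lower_half(pairs):
    """Return the names whose values fall in the lower half after sorting."""
    ordered = sorted(pairs, key=lambda item: item[1])
    half = len(ordered) // 2  # integer division, since slice bounds must be ints
    return [name for name, _ in ordered[:half]]


# Invented [file, value] pairs in the shape described above
scored = [["file1", 0.876], ["file2", 0.5], ["file3", 1.24], ["file4", 0.1]]

to_delete = lower_half(scored)
print(to_delete)  # ['file4', 'file2']
```

Each returned name could then be passed to os.remove(os.path.join(dir, name)), since os.listdir returns bare file names rather than full paths.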
I don't understand what this means:
delete the files with lower half of values
Does it mean that you have to select the files whose value is below the midpoint between the minimum and maximum values, or that you just have to select the lower half of the files?
There is no need for a 2D array, since the second coordinate is derived from the first by my_function. Here is a function that does what you need:
from os import listdir as ls
from os import remove as rm
from os.path import join
def delete_low_score_files(dir, func, criterion="midpoint"):
    """Delete files having a low score according to func.
    Args:
        dir (str): path of the directory;
        func (callable): function that scores a file name;
        criterion (str): can be "midpoint" or "half-list";
    Returns:
        (list) deleted files.
    """
    files = ls(dir)
    sorted_files = sorted(files, key=func)
    if not sorted_files:
        return []
    if criterion == "midpoint":
        # midpoint between the lowest and the highest score
        midpoint = (func(sorted_files[0]) + func(sorted_files[-1])) / 2
        files_to_delete = [f for f in sorted_files if func(f) < midpoint]
    elif criterion == "half-list":
        n = len(sorted_files) // 2  # slice bounds must be integers
        files_to_delete = sorted_files[:n]
    else:
        raise ValueError('criterion must be "midpoint" or "half-list"')
    for f in files_to_delete:
        rm(join(dir, f))  # listdir returns bare names, so prepend dir
    return files_to_delete
What is the most efficient way to check for part of a string within a list?
For example, say I am looking for "4110964_se" (True) or "4210911_sw" (False) in the following list:
files = ['H:\\co_1m_2013\\41108\\m_4110864_se_12_1_20130717.tif',
'H:\\co_1m_2013\\41108\\m_4110864_sw_12_1_20130717.tif',
'H:\\co_1m_2013\\41109\\m_4110964_se_12_1_20130722.tif']
If I use a simple check, the results are not what I would expect:
>>> "4110964_se" in files
False
any('4110964_se' in f for f in files) # check if the string is in any of the list items
Malik Brahimi's answer works, though another way of doing this is to check each file in a for loop like this:
for f in files:
    print("4110964_se" in f)
Your check doesn't work because in, applied to a list, only looks for items that are exactly equal to "4110964_se"; it does not look inside each string to see whether that value appears somewhere within it. For example, if you did:
print("H:\\co_1m_2013\\41108\\m_4110864_se_12_1_20130717.tif" in files)
it would print True, because you gave it the full file name rather than just a piece of it.
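The difference between list membership and substring membership can be checked directly on the list from the question. In this sketch, contains_part and matching_items are hypothetical helper names:

```python
files = [
    "H:\\co_1m_2013\\41108\\m_4110864_se_12_1_20130717.tif",
    "H:\\co_1m_2013\\41108\\m_4110864_sw_12_1_20130717.tif",
    "H:\\co_1m_2013\\41109\\m_4110964_se_12_1_20130722.tif",
]


def contains_part(part, items):
    """True if part occurs inside at least one string; stops at the first hit."""
    return any(part in item for item in items)


def matching_items(part, items):
    """Return every string that contains part."""
    return [item for item in items if part in item]


print("4110964_se" in files)                # False: no item equals it exactly
print(contains_part("4110964_se", files))   # True
print(contains_part("4210911_sw", files))   # False
print(matching_items("4110964_se", files))  # the third path only
```

Because any() short-circuits, it stops scanning as soon as one match is found, which is usually all that a yes/no check needs.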
You can use:
for x in files:
    if '4110964_se' in x:
        print(True)
It prints:
True