Searching a directory for two different file formats - python

I have the following method:
def file_match(self, fundCodes):
    # Get a list of the files
    files = os.listdir(self.unmappedDir)
    # Loop through all the funds and search for a matching file
    for check_fund in fundCodes:
        # Format of the file to look for
        file_match = 'unmapped_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
        # Look in the unmapped dir and see if there's a file with that name
        if file_match in files:
            # If there's a match, load unmapped positions as etl
            return self.read_file(file_match)
The method searches for files matching this kind of format:
unmapped_A-AGEI_2018-07-01_2018-07-09.csv or
unmapped_PWMA_2018-07-01_2018-07-09.csv
NOTE: The fundCodes argument would be an array of "fundCodes"
Now, I want it to be able to look for another type of format, which would be the following:
citco_unmapped_trades_2018-07-01_2018-07-09
I'm having a little trouble figuring out how to rewrite the function so it can look for two possible formats and, if it finds one, move on to the self.read_file(file_match) method. (If it finds both, I might have to do some error handling.) Any suggestions?

There are various approaches to this; which is best depends, in particular, on your possible further enhancements. The easiest and most straightforward way is to make a list of allowed filenames and check them one by one:
for check_fund in fundCodes:
    file_matches = [
        'unmapped_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate),
        'citco_unmapped_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
    ]
    # look in the unmapped dir and see if there's a file with that name
    for file_match in file_matches:
        if file_match in files:
            # if there's a match, load unmapped positions as etl
            return self.read_file(file_match)
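If both formats are present for the same fund, the loop above silently returns whichever comes first in the list. A minimal sketch of the error handling the question anticipates (the function name and the None fallback are illustrative, not from the original code):

```python
def find_unmapped_file(files, fund, start, end):
    """Return the single matching filename, None if absent, or raise if ambiguous."""
    candidates = [
        'unmapped_{fund}_{start}_{end}.csv'.format(fund=fund, start=start, end=end),
        'citco_unmapped_{fund}_{start}_{end}.csv'.format(fund=fund, start=start, end=end),
    ]
    matches = [name for name in candidates if name in files]
    if len(matches) > 1:
        raise ValueError('Ambiguous match for fund %s: %s' % (fund, matches))
    return matches[0] if matches else None
```

The caller can then pass the result to self.read_file when it is not None, and the ambiguous case fails loudly instead of picking one format arbitrarily.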

I came across this while looking for answers about something else. Keep in mind I wrote it in a couple of minutes, so I'm sure it can be improved. You should be able to copy, paste, and run it as-is: just create the files, or drop the script in the same directory as them. Feel free to modify it however you want. It may not be the best solution, but it works, and you can adapt it so it runs correctly in your program. If you need me to elaborate, please comment below.
import os

def file_search(formats, fund_codes):
    files = os.listdir()
    for fund in fund_codes:
        for fmt in formats:
            file_match = fmt.format(fund=fund[0], start=fund[1], end=fund[2])
            if file_match in files:
                print(file_match)

formats = ['unmapped_{fund}_{start}_{end}.csv', 'citco_unmapped_{fund}_{start}_{end}.csv']
fund_codes = [['PWMA', '2018-07-01', '2018-07-09'], ['A-AGEI', '2018-07-01', '2018-07-09'], ['trades', '2018-07-01', '2018-07-09']]
file_search(formats, fund_codes)

Related

Duplicate in list created from filenames (python)

I'm trying to create a list of the Excel files saved to a specific directory, but when the list is generated it contains a duplicate entry for one of the file names (I am absolutely certain there is no actual duplicate of the file).
import glob
# get data file names
path = r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates, I was hoping to figure out why it's happening in the first place.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$ (probably an auto-backup).
Whenever you open an Excel file, Excel creates a ghost copy that works as a temporary backup for that specific file. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
This means the file is open in some program, which is showing you that backup alongside it (usually the backup file is hidden in Explorer as well).
Just find that program and close it. If this is something you want to avoid, you should also add validation so that files matching the "~$*.xlsx" pattern are ignored.
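A glob-based sketch of that validation (a temp directory stands in for D:\larvalSchooling\data so the snippet is self-contained):

```python
import glob
import os
import tempfile

# stand-in for the data directory, seeded with a real file and an Excel owner file
path = tempfile.mkdtemp()
for name in ('trial1.xlsx', '~$trial1.xlsx'):
    open(os.path.join(path, name), 'w').close()

# glob as in the question, then drop Excel's ~$ lock/backup files
filenames = [f for f in glob.glob(os.path.join(path, '*.xlsx'))
             if not os.path.basename(f).startswith('~$')]
print([os.path.basename(f) for f in filenames])  # ['trial1.xlsx']
```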
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped with the following code:
import os

filenames = []
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)

Find file in a directory with python and if multiple files show up matching decide which to open

Basically what the title says: what is the best approach to do this?
I was looking at a few tools like os.walk and scandir, but I'm not sure how I would store the results and decide which file to open if there are multiples. I was thinking I would need to store them in a dictionary and then pick the numbered item I want.
You can use
list_of_files = os.listdir(some_directory)
which returns a list of the names of the files in that directory; you can easily add some of these names to a dictionary keyed by their index in this list.
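For example, a minimal sketch of that index-to-name mapping (the temp directory and filenames are illustrative):

```python
import os
import tempfile

# create a throwaway directory with a couple of files to index
some_directory = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    open(os.path.join(some_directory, name), 'w').close()

list_of_files = sorted(os.listdir(some_directory))  # sort for a stable order
files_by_index = dict(enumerate(list_of_files))
print(files_by_index)  # {0: 'a.txt', 1: 'b.txt'}
```

You could then prompt the user for a number and open `files_by_index[choice]`.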
Here is a function that implements the specifications you have outlined. It may require some tinkering as your specs evolve, but it's an ok start. See the docs for the os builtin package for more info :)
import os

def my_files_dict(directory, filename):
    matching_names = []
    with os.scandir(directory) as myfiles:
        for f in myfiles:
            if f.name == filename and f.is_file():  # is_file is a method, so call it
                matching_names.append(f.name)
    return dict(enumerate(matching_names))

Using a list to find and move specific files - python 2.7

I've seen a lot of people asking questions about searching through folders and creating a list of files, but I haven't found anything that has helped me do the opposite.
I have a csv file with a list of files and their extensions (xxx0.laz, xxx1.laz, xxx2.laz, etc). I need to read through this list and then search through a folder for those files. Then I need to move those files to another folder.
So far, I've taken the csv and created a list. At first I was having trouble with the list: each line had a "\n" at the end, so I removed those. Then, following the only other example I've found (a question about finding and moving certain files based on a list in Excel), I created a set from the list. However, I'm not really sure why, or whether I need it.
So here's what I have:
id = open('file.csv','r')
list = list(id)
list_final = ''.join([item.rstrip('\n') for item in list])
unique_identifiers = set(list_final)
os.chdir(r'working_dir') # I set this as the folder to look through
destination_folder = 'folder_loc' # Folder to move files to
for identifier in unique_identifiers:
    for filename in glob.glob('%s_*' % identifier):
        shutil.move(filename, destination_folder)
I've been wondering about this ('%s_*' % identifier) with the glob function. I haven't found any examples with this, perhaps that needs to be changed?
When I do all that, I don't get anything. No errors and no actual files moved...
Perhaps I'm going about this the wrong way, but that is the only thing I've found so far anywhere.
It's really not hard:
for fname in open("my_file.csv").read().split(","):
    shutil.move(fname.strip(), dest_dir)
You don't need a whole lot of machinery.
Also, if you just want all the *.laz files in a source directory, you don't need a csv at all:
for fname in glob.glob(os.path.join(src_dir, "*.laz")):
    shutil.move(fname, dest_dir)
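One caveat: the question's csv appears to hold one filename per line (hence the "\n" stripping), not comma-separated values, so a line-oriented variant may be closer to what's needed. A self-contained sketch (the filenames and temp directories are illustrative):

```python
import os
import shutil
import tempfile

src_dir = tempfile.mkdtemp()
dest_dir = tempfile.mkdtemp()

# fake a couple of .laz files and a csv listing them, one name per line
names = ['xxx0.laz', 'xxx1.laz']
for name in names:
    open(os.path.join(src_dir, name), 'w').close()
csv_path = os.path.join(src_dir, 'file.csv')
with open(csv_path, 'w') as f:
    f.write('\n'.join(names))

# read one filename per line, strip the newline, and move the file
with open(csv_path) as f:
    for line in f:
        fname = line.strip()
        if fname:
            shutil.move(os.path.join(src_dir, fname), dest_dir)

print(sorted(os.listdir(dest_dir)))  # ['xxx0.laz', 'xxx1.laz']
```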

Matching specific file names

I have some code that looks for network script files, e.g. ifcfg-eth0. The code currently uses the match function available in Augeas to get all the files in the directory, e.g.:
augeas.match("/files/etc/sysconfig/network-scripts/*")
However this code is matching files such as ifcfg-eth0.bak which is not a valid file for my needs. I want to match only the network scripts ranging from eth0 to eth7 (and no backup files etc). What would be a good approach to match only the correct files?
I was able to meet my requirements using the following code:
files = []
for i in range(8):
    try:
        filename = augeas.match('/files/etc/sysconfig/network-scripts/ifcfg-eth' + str(i))[0]
        files.append(filename)
    except IndexError:  # no match for this interface
        continue
print files
If you're absolutely sure you don't want any files that have an extension, you could try this:
augeas.match('etc/sysconfig/network-scripts/*[regexp("[\w-]")]')
Edited to add quotes, as mentioned below.

Best Practices when matching large number of files against large number of regex strings

I have a directory with several thousand files. I want to sort them into directories based on file name, but many of the file names are very similar.
My thinking is that I'm going to have to write a bunch of regex strings and then do some sort of looping. This is my question:
Is one of these two options more optimal than the other? Do I loop over all my files and check each one against my regexes, keeping track of how many match? Or do I do the opposite and loop over the regexes, touching each file?
I had thought to do it in Python, as that's my strongest language, but I'm open to other ideas.
This is some code I use in a program of mine, modified for your purposes. It takes a directory (sort_dir), goes over every file there, creates directories based on the filenames, and then moves the files into those directories. Since you haven't said where or how you want to sort your files, you will have to add that part where I've indicated:
import os
import shutil

def sort_files(sort_dir):
    for f in os.listdir(sort_dir):
        if not os.path.isfile(os.path.join(sort_dir, f)):
            continue
        # this is the folder name to be created -- what do you want it to be?
        destinationPath = os.path.join(sort_dir, f)  # right now it's just the filename...
        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        if os.path.exists(os.path.join(destinationPath, f)):
            at = True
            while at:
                try:
                    shutil.move(os.path.join(sort_dir, f),
                                os.path.join(destinationPath, f))
                    at = False
                except shutil.Error:
                    continue
        else:
            shutil.move(os.path.join(sort_dir, f), destinationPath)
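On the regex part of the question: looping over the files once and checking each name against a list of precompiled patterns keeps it to a single pass over the directory; compiling inside the loop, or re-scanning the directory once per pattern, only adds overhead. A sketch with illustrative patterns and destination folders (the rules and filenames are made up for the example):

```python
import os
import re
import shutil
import tempfile

sort_dir = tempfile.mkdtemp()
for name in ('report_2020.txt', 'image_01.png', 'notes.txt'):
    open(os.path.join(sort_dir, name), 'w').close()

# compile each pattern once; map it to a destination folder name
rules = [
    (re.compile(r'^report_\d+\.txt$'), 'reports'),
    (re.compile(r'\.png$'), 'images'),
]

for f in os.listdir(sort_dir):
    src = os.path.join(sort_dir, f)
    if not os.path.isfile(src):
        continue
    for pattern, folder in rules:
        if pattern.search(f):
            dest = os.path.join(sort_dir, folder)
            os.makedirs(dest, exist_ok=True)  # Python 3.2+
            shutil.move(src, dest)
            break  # first matching rule wins
```

Files matching no rule (here, notes.txt) are simply left in place, which makes it easy to see what your patterns missed.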
