Deleting files based on day within filename - python

I have a directory with files like: data_Mon_15-8-22.csv, data_Tue_16-8-22.csv, data_Mon_22-8-22.csv etc and I am trying to delete all but the Monday files. However, my script doesn't seem to differentiate between the filenames and just deletes everything despite me stating it. Where did I go wrong? Any help would be much appreciated!
My Code:
import os
import pathlib
def file_delete():
    directory = pathlib.Path('/Path/To/Data')
    for file in directory.glob('data_*.csv'):
        if file != 'data_Mon_*.csv':
            os.remove(file)

If all Monday files start with "data_Mon_", you can use str.startswith on the file's name:
import os
import pathlib
def file_delete():
    directory = pathlib.Path('/Path/To/Data')
    for file in directory.glob('data_*.csv'):
        # file.name is the bare file name as a string
        if not file.name.startswith('data_Mon_'):
            os.remove(file)

if file != 'data_Mon_*.csv'
There are two problems here:
file is compared against the string 'data_Mon_*.csv'. Since file is a pathlib.Path object rather than a string, these two objects will never be equal, so the if condition will always be true. To fix this, you need to compare the file's name (file.name) rather than the Path object itself.
Even if you fix this, the string 'data_Mon_*.csv' is a literal. In other words, the * is just a *. Unlike in directory.glob('data_*.csv'), it will only match a literal * rather than "anything" as in a glob expression. To fix this, you need to match the file name against a pattern, for example with a regular expression.
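A minimal sketch of both fixes together, assuming the Monday files always look like data_Mon_<date>.csv as in the question. The helper name delete_non_monday and the exact regular expression are illustrative choices, not part of the original code:

```python
import pathlib
import re

# Assumed shape of a Monday file name, e.g. data_Mon_15-8-22.csv
MONDAY_PATTERN = re.compile(r"data_Mon_.*\.csv")

def delete_non_monday(directory):
    """Delete every data_*.csv in directory that is not a Monday file.

    Returns the sorted names of the files that were deleted.
    """
    deleted = []
    # Materialise the matches first so deleting does not disturb the iteration
    for file in list(pathlib.Path(directory).glob("data_*.csv")):
        # Match against the name (a str), not the Path object itself
        if not MONDAY_PATTERN.fullmatch(file.name):
            file.unlink()
            deleted.append(file.name)
    return sorted(deleted)
```

It may be worth replacing file.unlink() with a print first, to check which files would be removed before deleting anything.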


Rename directory with constantly changing name

I created a script that is supposed to download some data, then run a few processes. The data source (ArcGIS Online) always downloads the data as a zip file, and when it is extracted the folder name is a series of letters and numbers. I noticed that these occasionally change (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes. It returns ['f29a52b8908242f5b1f32c58b74c063b.gdb'] as the folder name, while the folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
import os
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping "+ file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
# removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list of the files and folders that are in a directory.
Lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list in Python, you can use index notation (which also happens to use brackets, to tell Python which item in the list to use by referencing the index, or number, of the item).
Python list indexes start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
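x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Note that os.listdir() returns bare names rather than full paths, so the call above only finds the folder if the Data directory is your current working directory. Otherwise, join the directory onto the name with os.path.join first.
Having said that, I would generally not use x as a variable name in this case. I might write the code a bit differently:
data_dir = "C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data"
files = os.listdir(data_dir)
os.renames(os.path.join(data_dir, files[0]), os.path.join(data_dir, "Test.gdb"))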
Square brackets indicate a list. Try x[0]; that should get rid of the brackets and leave just the name.
The return value from listdir may be a list with only one value or a whole bunch of them.
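Since listdir may return one name or many, a slightly more defensive version can check the count before renaming. This is only a sketch: the helper name rename_only_entry is hypothetical, and it assumes the extracted folder is the only thing in the directory.

```python
import os

def rename_only_entry(data_dir, new_name):
    """Rename the single entry inside data_dir to new_name and return the new path."""
    entries = os.listdir(data_dir)  # bare names, e.g. ['f29a52b8...gdb']
    if len(entries) != 1:
        raise ValueError(f"expected exactly one entry, found {len(entries)}")
    # listdir gives names only, so join the directory back on
    old_path = os.path.join(data_dir, entries[0])
    new_path = os.path.join(data_dir, new_name)
    os.rename(old_path, new_path)
    return new_path
```

Calling rename_only_entry(data_dir, "Test.gdb") would then rename the extracted folder in place, whatever its generated name happens to be.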

How to read a file from a directory and convert it to a table?

I have a class that takes in positional arguments (startDate, endDate, unmappedDir, and fundCodes), and I have the following methods:
The method below is supposed to take in an array of fundCodes, look in a directory, and see if it finds files matching a certain format:
def file_match(self, fundCodes):
    # Get a list of the files in the unmapped directory
    files = os.listdir(self.unmappedDir)
    # loop through all the files and search for matching fund code
    for check_fund in fundCodes:
        # set a file pattern
        file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
        # look in the unmappeddir and see if there's a file with that name
        if file_match in files:
            # if there's a match, load unmapped positions as etl
            return self.read_file(file_match)
        else:
            Logger.error('No file found with those dates/funds')
The other method is simply supposed to create an etl table from that file.
def read_file(self, filename):
    loadDir = Path(self.unmappedDir)
    for file in loadDir.iterdir():
        print('*' *40)
        Logger.info("Found a file : {}".format(filename))
        print(filename)
        unmapped_positions_table = etl.fromcsv(filename)
        print(unmapped_positions_table)
        print('*' * 40)
        return unmapped_positions_table
When running it, I'm able to retrieve the filename:
Found a file : unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
But when trying to create the table, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv'
Is it expecting a full path to the filename or something?
The proximate problem is that you need a full pathname.
The filename that you're trying to call fromcsv on is passed into the function, and ultimately came from listdir(self.unmappedDir). This means it's a path relative to self.unmappedDir.
Unless that happens to also be your current working directory, it's not going to be a valid path relative to the current working directory.
To fix that, you'd want to use os.path.join(self.unmappedDir, filename) instead of just filename. Like this:
return self.read_file(os.path.join(self.unmappedDir, file_match))
Or, alternatively, you'd want to use pathlib objects instead of strings, as you do with the for file in loadDir.iterdir(): loop. If file_match is a Path instead of a dumb string, then you can just pass it to read_file and it'll work.
But, if that's what you actually want, you've got a lot of useless code. In fact, the entire read_file function should just be one line:
def read_file(self, path):
    return etl.fromcsv(path)
What you're doing instead is looping over every file in the directory, then ignoring that file and reading filename, and then returning early after the first one. So, if there's 1 file there, or 20 of them, this is equivalent to the one-liner; if there are no files, it returns None. Either way, it doesn't do anything useful except to add complexity, wasted performance, and multiple potential bugs.
If, on the other hand, the loop is supposed to do something meaningful, then you should be using file rather than filename inside the loop, and you almost certainly shouldn't be doing an unconditional return inside the loop.
With this:
files = os.listdir(self.unmappedDir)
you're getting the bare file names in self.unmappedDir, without the directory.
So when you get a match on the name (the one you generate), you have to read the file by passing the full path (otherwise the routine looks for the file in the current directory):
return self.read_file(os.path.join(self.unmappedDir,file_match))
Aside: use a set here:
files = set(os.listdir(self.unmappedDir))
so each filename lookup is a hash lookup instead of a scan through the whole list.
And your read_file method (which I didn't see earlier) should just open the file, instead of scanning the directory again (and returning on the first iteration anyway, so the loop doesn't make sense):
def read_file(self, filepath):
    print('*' *40)
    Logger.info("Found a file : {}".format(filepath))
    print(filepath)
    unmapped_positions_table = etl.fromcsv(filepath)
    print(unmapped_positions_table)
    print('*' * 40)
    return unmapped_positions_table
Alternately, don't change your main code (except for the set part), and prepend the directory name in read_file since it's an instance method so you have it handy.
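To see the lookup in isolation, here is a rough standalone sketch of the matching step without the class or the etl call. The function name find_unmapped_files and its parameters are made up for illustration; the file name format is the one from the question.

```python
import os

def find_unmapped_files(unmapped_dir, fund_codes, start, end):
    """Return full paths of the expected files that exist in unmapped_dir."""
    names = set(os.listdir(unmapped_dir))  # bare names, stored in a set
    matches = []
    for fund in fund_codes:
        name = f"unmapped_positions_{fund}_{start}_{end}.csv"
        if name in names:
            # Join the directory back on so the path is valid from any working directory
            matches.append(os.path.join(unmapped_dir, name))
    return matches
```

Each returned path could then be handed to a loader such as the one-line read_file above, regardless of the current working directory.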

Use part of non static file name to create new directory

I'm new to python and trying to figure some stuff out.
I'm already learning to use the shutil.copy, .move functions as well as scanning files with glob. However I have a few questions on a scenario I'm facing:
Find a file that gets deposited to the same directory every day, but in which half the file name changes every day, and use it to make a destination folder, or zip it up with zipfile.
Example:
File X110616.Filename_110416.txt comes in today.
Tomorrow it will be X110616.Filename_110423.txt.
Since half or part of the name changes every day, how do I cut/save a specific part of the string for a function or module to create a destination folder, or a zip file?
I can use the glob module to scan for a file with wildcards, and I've tried using rstrip(), but that only seems to remove the last half and not the beginning or center of the string.
I'm also not sure how to save the values it finds and use them elsewhere to create directories or zip files. Bottom line is I know how to tell the script to look for non-static characters in a string, but not what direction to take when using/saving those characters for other things:
import glob
for f in glob.glob("C:\\users\\%username%\\Documents\\Test_Files\\X??????.Filename_??????.txt"):
    Newdir = f
    print(Newdir)
    #or use to make a directory, or zip file...
This will find me the file with any ending, however I can't seem to understand how to save the file's name or path (whatever it may be).
To get a substring in python you use the slice operator.
>>> a = "Hello World"
>>> a[0:5]
'Hello'
str.split is also very powerful.
>>> a.split(" ")
['Hello', 'World']
I will often solve problems like you describe with a combination of the two. But for really tricky parsing problems there are regular expressions.
>>> b = "whatsit__blah.foo"
>>> import re
>>> result = re.search(r"(?P<first>[a-z]+)__(?P<second>[a-z]+)\.(?P<ext>[a-z]+)", b)
>>> result.groups()
('whatsit', 'blah', 'foo')
>>> result.groups("first")
('whatsit', 'blah', 'foo')
>>> result.group("first")
'whatsit'
>>> result.group("second")
'blah'
>>> result.group("ext")
'foo'
As you can see, there is a lot to regular expressions. Because of the added complexity, I avoid them unless I have a very complex problem.
Two more functions you may find useful: the os.path module has split(), which splits a path into the base directory and the filename, and splitext(), which splits a path at the last "." and returns the remainder of the path together with the extension.
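Applied to the file names in the question, a sketch could look like the following. It assumes every name has the shape X<6 digits>.Filename_<6 digits>.txt; the function name changing_parts and the group names are made up for this example.

```python
import os
import re

# Assumed layout from the question: X<6 digits>.Filename_<6 digits>.txt
NAME_PATTERN = re.compile(r"X(?P<first>\d{6})\.Filename_(?P<second>\d{6})\.txt")

def changing_parts(path):
    """Return the two six-digit parts of the file name, or None if it does not match."""
    name = os.path.basename(path)  # drop the directory part of the path
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match.group("first"), match.group("second")
```

Either part can then be stored in a variable and reused, for example as the last component of a path passed to os.makedirs, or as part of a zip file name.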
So here's what I ended up doing, and it seemed to have worked. It made a folder for each file that was found in a specific directory, while only using a specific part of the file name to make a folder reflecting the filename.
import os
destdir = "C:\\Users\\%USERNAME%\\Documents\\Test_Files\\test\\"
srcpath = "C:\\download\\"
for z in os.listdir("C:\\download"):
    if z.endswith("FILE.FILENAME.ZIP"):
        # the first seven characters of the file name become the folder name
        os.mkdir(destdir + z[0:7])
        newdir = destdir + z[0:7]
        print(newdir)
I added print at the end to show what it created.

PYTHON - How to change one character in many file names within a directory

I have a large directory containing image files and need to change one character within the name of each file. The character is in the same place in each file name (17). I think I need to use the 'replace' string function, but as I am very new to Python, I am not sure how to write this in a script (I work in GIS and have just started learning Python). Any help would be greatly appreciated. The character I need to change is the '1' after 'Nepal_Landscape_S' in the file name Nepal_Landscape_S1_LayerStacked_IM9_T44RQR_stack4.tif. I simply need to change this to 2, like: Nepal_Landscape_S2_LayerStacked_IM9_T44RQR_stack4.tif
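You can use the string replace method, as you suspect, along with os.rename. Because os.listdir returns bare file names, join the directory back on before renaming:
import os
folder = "path/to/files"
for src in os.listdir(folder):
    dst = src.replace('S1', 'S2')
    os.rename(os.path.join(folder, src), os.path.join(folder, dst))
If you are able to use your shell for this type of task, there may be simpler solutions, like the rename command found on many Linux systems (util-linux syntax shown): rename S1 S2 *S1*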
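Since the character is always at the same position, slicing is an alternative that cannot touch an "S1" elsewhere in the name. In the example name, the '1' sits at zero-based index 17. This is a sketch only; the function names replace_char_at and rename_char_in_folder are hypothetical.

```python
import os

def replace_char_at(name, index, new_char):
    """Return name with the character at index swapped for new_char."""
    return name[:index] + new_char + name[index + 1:]

def rename_char_in_folder(folder, index, old_char, new_char):
    """Rename every file in folder whose name has old_char at index."""
    renamed = []
    for src in os.listdir(folder):
        # Skip names that are too short or have a different character there
        if len(src) > index and src[index] == old_char:
            dst = replace_char_at(src, index, new_char)
            os.rename(os.path.join(folder, src), os.path.join(folder, dst))
            renamed.append(dst)
    return sorted(renamed)
```

For the files in the question, the call would be rename_char_in_folder("path/to/files", 17, "1", "2").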

Regex in Python to match all the files in a folder

I'm very bad at regex.
I'm trying to locate files in a folder based on the file names. Most of the filenames are in the format GSE1234_series_matrix.txt, hence I've been using os.path.join("files", GSE_num + "_series_matrix.txt"). However, a few files have names like GSE1234-GPL22_series_matrix.txt. I'm not sure how to address all the files starting with a GSE number and ending with _series_matrix.txt together, possibly in one statement. I'd really appreciate any help.
EDIT - I have these series matrix text files in a folder, for which I mention the path using path join. I also input a text file, which has all the GSE numbers. This way it runs the script only for selected GSE numbers. So not everything that's in the folder is in GSE num list AND the list just has GSE numbers and not GPL. For instance the file GSE1234-GPL22_series_matrix.txt would be GSE1234 in the list.
Skip using regexes entirely.
good_filenames = [name for name in filenames if name.startswith("GSE") and name.endswith("_series_matrix.txt")]
You could use glob. Depending on how much of the path you include in the pattern, you wouldn't have to worry about using os.path.join at all.
import glob
good_filenames = glob.glob('/your/path/here/GSE*_series_matrix.txt')
returns:
['/your/path/here/GSE1234_series_matrix.txt',
'/your/path/here/GSE1234-GPL22_series_matrix.txt']
Kevin's answer is great! If you'd like to use a regex, you can do something like this:
^GSE\d+.*_series_matrix\.txt$
That would match anything that starts with GSE and a number, and ends with _series_matrix.txt (the dot is escaped so that it only matches a literal ".").
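Because the script runs from a list of GSE numbers, the regex can also be built per number, so that GSE1234 picks up both GSE1234_series_matrix.txt and GSE1234-GPL22_series_matrix.txt but not GSE12345. This is a sketch under the assumption that the only optional part is a -GPL<digits> suffix; the function name files_for_gse is made up.

```python
import re

def files_for_gse(gse_num, filenames):
    """Return the names in filenames that belong to gse_num, with or without a -GPL suffix."""
    # re.escape guards against special characters in the GSE number
    pattern = re.compile(rf"{re.escape(gse_num)}(-GPL\d+)?_series_matrix\.txt")
    return [name for name in filenames if pattern.fullmatch(name)]
```

The filenames argument could be the result of os.listdir on the folder, with each match joined back onto the folder path using os.path.join.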
