Deleting Contents of a folder selectively with python - python

I would periodically like to delete the contents of a Windows directory, which includes files and subdirectories that contain more files. However, there is one specific file that I do not want to remove (it is the same file every time). I am using shutil.rmtree to delete the contents of the folder, but it deletes the file I wish to keep as well. How would I make an exception to prevent the removal of that file, and is shutil the best method for this?

To get a list of pathnames in a directory, use glob:
Say you want to find only the .gif files and delete those:
import glob
gif_list = glob.glob('your/path/name/'+'*.gif')
For your path, this finds all the files that end in .gif. You can delete them, copy them, do whatever you want.
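For instance, a quick sketch of the delete step (assuming gif_list from above):
import os

for gif in gif_list:
    os.remove(gif)  # delete each matched .gif file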
Or, use a list comprehension:
import glob
# just get a list of all the files/folders in your path:
path_list = glob.glob('your/path/name/' + '*')
gif_list = [x for x in path_list if x[-4:] == '.gif']
Glob will also find all folders, of course. You can filter out which files and/or folders you want to keep, and delete the rest with rmtree. For example, say you wanted to keep the file called keep.me:
import glob
path_list = glob.glob('your/path/name/'+'*')
del_list = [x for x in path_list if x != 'your/path/name/'+'keep.me']
Something like this should work.
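To actually remove everything left in del_list, a sketch along these lines could work (assuming del_list from above; plain files go through os.remove, folders through rmtree):
import os
import shutil

for path in del_list:
    if os.path.isdir(path):
        shutil.rmtree(path)  # remove a subdirectory and all of its contents
    else:
        os.remove(path)      # remove a single file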
Try it out!

rmtree does not appear to have any kind of filtering mechanism that you could use; further, since part of its functionality is to remove the directory itself, and not just its contents, it wouldn't make sense to.
If you could do something to the file so that rmtree's attempt to delete it fails, you can pass ignore_errors=True to have rmtree ignore such failures, thus leaving your file but deleting the others. If you cannot, you can resort to os.walk to loop over the contents of your directory and decide for yourself which items to remove.
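For example, a minimal sketch of the os.walk route (the keep.me name and path are placeholders from the other answer; walking bottom-up so directories are emptied before removal):
import os

KEEP = 'keep.me'          # the one file to preserve
root = 'your/path/name'   # placeholder path

for dirpath, dirnames, filenames in os.walk(root, topdown=False):
    for name in filenames:
        if name != KEEP:
            os.remove(os.path.join(dirpath, name))
    for name in dirnames:
        subdir = os.path.join(dirpath, name)
        if not os.listdir(subdir):  # skip directories still holding keep.me
            os.rmdir(subdir)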

Is it just one file or a ton of files?
Because if it's just this one file, you could just right click - Properties - Security and check 'deny' everywhere:
http://www.technorms.com/27407/prevent-files-from-being-deleted-windows

Related

Why does my loop work correctly on the first iteration but not on the full set I'm looping through?

I am trying to rename folders in bulk based on the folders/files contained within them (then moving the image files at the end of each path into their respective model/color folder directories).
Each folder/file has a similar naming convention of MODEL_COLOR.
The code below partly works: the folders are being renamed correctly, but the last leg of the code takes the folder which contains the images and moves it into the corresponding path, instead of moving just the images there and dropping the folder they were originally in.
On the first folder the loop iterates over, it actually moves the images into the correct Model > Color directory, but on every folder after that it moves the whole folder containing the images into the Model > Color directory, instead of moving the images alone.
After looking at the forums I've seen similar issues where, when changing the directory or deleting certain entries, the loop can't iterate correctly because the underlying set changes during the looping process (i.e. deleting or renaming part of the path while iterating). I'm pretty sure it's a simple fix, but I can't seem to find the solution that'll work best.
Standard FolderNames:
CL4003IN_45F
CL4003IN_56F
CL40157U_01D
CL40157U_52H
import glob, os, shutil
folder = 'C:\\testing'
# create new folder directory based on Model/Color [is working, but moves file_path into base directory]
# arr = []
for file_path in glob.glob(os.path.join(folder, '*_*')):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    try:
        os.mkdir(os.path.join(new_dir, new_subdir))
    except WindowsError:
        # Handle the case where the target dir already exists.
        pass
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
    # arr.append(file_path)
Completing the iteration of glob before the loop by storing it in a list helped avoid some unwanted errors.
# ...
for file_path in list(glob.glob(os.path.join(folder, '*_*'))):
# ...
But modifying my code and removing the following from the loop:
try:
    os.mkdir(os.path.join(new_dir, new_subdir))
except WindowsError:
    pass
allowed the code to iterate through all the folders in the directory without transferring the folder before the file into the new_dir > new_subdir directory.
The new code that works across a multitude of folders within a directory is:
import glob, os, shutil
folder = 'C:\\testing'
# create new folder directory based on Model > Color
for file_path in list(glob.glob(os.path.join(folder, '*_*'), recursive=True)):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
This may not be the most efficient code (and may not work across all instances; that remains to be determined!), but it definitely works as intended for now.
Special thanks to those that helped with their suggestions.
The reason why you don't see the actual error is that you catch too many errors in your except: statement.
You intend to catch FileExistsError, so you should look for just that one. Then you would have noticed that the code in fact throws a FileNotFoundError.
The reason for that is that os.mkdir does not automatically create parent directories. It only creates directories one layer deep, but your code requires two layers of new directories.
For that, you would have to use os.makedirs(...) instead. Conveniently, os.makedirs also accepts an exist_ok flag, which gets rid of your entire try:-except: construct.
A further annotation is that you have a lot of duplicate calculations in your code:
file_path.rpartition('_') gets calculated twice
os.path.join(new_dir, new_subdir) gets calculated twice
I'd suggest storing those in meaningful variables. This speeds up your code, makes it more readable and more maintainable.
Here's a reworked version of your code:
import glob
import os
import shutil

folder = 'C:\\testing'
for file_path in glob.glob(os.path.join(folder, '*_*')):
    (new_dir, _, new_subdir) = file_path.rpartition('_')
    target_path = os.path.join(new_dir, new_subdir)
    os.makedirs(target_path, exist_ok=True)
    shutil.move(file_path, target_path)
Further improvements/fixes
There's still a bunch wrong in your code:
You don't check if the thing found by glob is a file
Splitting by _ does not stop at the directory divider. This means that something like C:\\my_files\\bla will get split into C:\\my and files\\bla.
I think you didn't care about either of those, because you thought 'the user would not use the script like this'. But this is actually a case that will happen:
You have a file C:\\my_files\\CL4003IN_45F, which will get moved to C:\\my_files\\CL4003IN\\45F\\CL4003IN_45F, as expected.
You run the script again. The script will find C:\\my_files\\CL4003IN. It doesn't check if it's a folder, so it will process it anyway. It will then split it into C:\\my and files\\CL4003IN.
The entire folder C:\\my_files\\CL4003IN will get moved to C:\\my\\files\\CL4003IN. Therefore the original file CL4003IN_45F ends up in C:\\my\\files\\CL4003IN\\CL4003IN\\45F\\CL4003IN_45F
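A quick interactive illustration of that split (using the hypothetical path from the example above):
>>> 'C:\\my_files\\CL4003IN'.rpartition('_')
('C:\\my', '_', 'files\\CL4003IN')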
The solution is:
only use rpartition on the filename, not the entire path
check if it actually is a file, or a directory
Most of these tasks are solved more easily with pathlib. I took the liberty of rewriting your code and fixing those issues:
from pathlib import Path

folder = Path('C:\\testing')
for file_path in folder.iterdir():
    file_name = file_path.name
    # Ignore directories
    if not file_path.is_file():
        continue
    # Split the filename by '_'
    (target_dir, _, target_subdir) = file_name.rpartition('_')
    # If no '_' found, ignore
    if not target_dir:
        continue
    # Compute the target path and create it if necessary
    target_path = folder / target_dir / target_subdir
    target_path.mkdir(parents=True, exist_ok=True)
    # Move the file to its new position
    target_file_path = target_path / file_name
    file_path.rename(target_file_path)
One last remark: folder.iterdir() actually does return an iterator. But that shouldn't be a problem in this case, as we explicitly check whether the path is an existing file and not a directory or something that has already been deleted. But if you want to be 100% safe, write list(folder.iterdir()).

Duplicate in list created from filenames (python)

I'm trying to create a list of Excel files that are saved to a specific directory, but I'm having an issue where, when the list is generated, it contains a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path = r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
you'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates I was hoping to figure out why it's happening to begin with.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~ (probably an auto-backup).
Whenever you open an Excel file, it creates a ghost copy that works as a temporary backup for that specific file. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
This means that the file is open in some software, which is showing you that backup (usually that file is hidden in Explorer as well).
Just search for the program and close it. Other actions, such as adding validation so the "~$.*.xlsx" type of file is ignored, should also be implemented if this is something you want to avoid.
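For instance, a small sketch of that validation (assuming the path from the question; the ~$ prefix marks Excel's temporary owner files):
import glob
import os

path = r'D:\larvalSchooling\data'
# drop Excel's ~$ lock/backup files from the match list
filenames = [f for f in glob.glob(path + '/*.xlsx')
             if not os.path.basename(f).startswith('~$')]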
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped using the following code:
import os

filenames = []
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)

Get absolute file path list and ignore dot directories/files python

How do I get absolute file paths within a specified directory while ignoring dot (.) directories and dot (.) files?
I have the solution below, which recursively yields the full path of everything within the directory.
Help me with the fastest way to list files with their full paths while ignoring .directories/ and .files
(the directory may contain 100 to 500 million files)
import os

def absoluteFilePath(directory):
    for dirpath, _, filenames in os.walk(directory):
        for f in filenames:
            yield os.path.abspath(os.path.join(dirpath, f))

for files in absoluteFilePath("/my-huge-files"):
    # use some starts-with-dot logic? or any better solution
    ...
Example:
/my-huge-files/project1/file{1..100} # Consider all files from file1 to 100
/my-huge-files/.project1/file{1..100} # ignore .project1 directory and its files (Do not need any files under .(dot) directories)
/my-huge-files/project1/.file1000 # ignore .file1000, it is starts with dot
os.walk by definition visits every file in a hierarchy, but you can select which ones you actually print with a simple textual filter.
for file in absoluteFilePath("/my-huge-files"):
    if '/.' not in file:
        print(file)
When your starting path is already absolute, calling os.path.abspath on it is redundant, but I guess in the great scheme of things, you can just leave it in.
Don't use os.walk() as it will visit every file
Instead, fall back to .scandir() or .listdir() and write your own implementation
You can use pathlib.Path(test_path).expanduser().resolve() to fully expand a path
import os
from pathlib import Path

def walk_ignore(search_root, ignore_prefixes=(".",)):
    """recursively walk directories, ignoring files with some prefix

    pass search_root as an absolute directory to get absolute results
    """
    for dir_entry in os.scandir(Path(search_root)):
        if dir_entry.name.startswith(ignore_prefixes):
            continue
        if dir_entry.is_dir():
            yield from walk_ignore(dir_entry, ignore_prefixes=ignore_prefixes)
        else:
            yield Path(dir_entry)
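A minimal usage sketch (assuming the /my-huge-files directory from the question):
for path in walk_ignore("/my-huge-files"):
    print(path)  # absolute Path objects; dot-prefixed entries are skipped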
You may be able to save some overhead with a closure, coercing to Path once, yielding only .name, etc., but that's really up to your needs
Also, not about your question but related to it: if the files are very small, you'll likely find that packing them together (several files in one) or tuning the filesystem block size yields tremendously better performance.
Finally, some filesystems come with bizarre caveats specific to them, and you can likely break this with oddities like symlink loops.

Finding File Path in Python

I'd like to find the full path of any given file, but when I tried to use
os.path.abspath("file")
it would only give me the file location as being in the directory where the program is running. Does anyone know why this is or how I can get the true path of the file?
What you are looking to accomplish here is ultimately a search of your filesystem. This does not work out too well, because it is extremely likely that you have multiple files with the same name, so you won't know with certainty whether the first match you get is in fact the file you want.
I will give you an example of how you can start yourself off with something simple that will let you traverse directories and search.
You will have to give some kind of base path from which to initiate the search for the file. Keep in mind that the broader your base path, the more expensive the search is going to be.
You can do this with the os.walk method.
Here is a simple example of using os.walk. What this does is collect all your file paths with matching filenames.
Using os.walk
from os import walk
from os.path import join

d = 'some_file.txt'
paths = []
for i in walk('/some/base_path'):
    if d in i[2]:
        paths.append(join(i[0], d))
So, for each iteration over os.walk you are going to get a tuple that holds:
(path, directories, files)
So that is why I am checking against location i[2] to look at files. Then I join with i[0], which is the path, to put together the full filepath name.
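For illustration, one iteration over a hypothetical tree might yield a tuple like:
('/some/base_path/sub', ['child_dir'], ['other.txt', 'some_file.txt'])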
Finally, you can actually put the above code all into one line and do:
paths = [join(i[0], d) for i in walk('/some/base_path') if d in i[2]]

file renaming using python

I am trying to rename a file that is auto-generated by some local modules, but I was wondering whether os.listdir is the only way for me to filter/narrow down this file.
This file will always be generated before it is removed, and the code will generate the next one (still in the same directory) based on the next item in the list.
Basically, whenever this file is generated, it comes in the following file path:
/user_data/.tmp/tempLibFiles/2015_03_16_192212_182096.con
I only want to rename the 2015_03_16_192212_182096 part to connectionFile while keeping the rest the same.
You can also use the glob module to narrow down the list of files to the one that matches a particular pattern. For example:
import glob
files = glob.glob('/user_data/.tmp/tempLibFiles/*.con')
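From there, a minimal sketch of the rename itself (an assumption on my part: exactly one .con file is present at a time, and connectionFile is the target name from the question):
import glob
import os

files = glob.glob('/user_data/.tmp/tempLibFiles/*.con')
if files:
    src = files[0]
    # keep the directory and extension, swap only the base name
    dst = os.path.join(os.path.dirname(src), 'connectionFile.con')
    os.rename(src, dst)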
