Removing/Copying multiple files with Python - python

I use os.remove() for deleting a file, and shutil.copyfile() for copying a file. Sometimes I need to remove/copy all the files in a directory, and I use the following code.
import glob
import os
# Collect every entry directly inside profilerPath whose name contains a dot
files = glob.glob(os.path.join(profilerPath, "*.*"))
for f in files:
    os.remove(f)
It works fine, but I'd like to ask if you have better code for doing the same thing.

What about shutil.copytree() and shutil.rmtree()? They copy/delete recursively, i.e. everything below a given path.
If you want to copy/delete files only, without traversing into subdirectories, your current solution is fine, though you should check that each match is indeed a file and not a directory, since directory names can also match the pattern *.*.
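A minimal sketch of the "files only" variant with the is-it-a-file check, assuming the destination directory already exists. The helper names remove_files and copy_files are made up for illustration; note that, unlike the *.* pattern, this also touches files without a dot in their name.

```python
import os
import shutil


def remove_files(directory):
    """Delete regular files directly inside `directory`; subdirectories are left alone."""
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, so we do not delete while iterating
    removed = []
    for entry in entries:
        if entry.is_file():  # skips directories, even ones named like "sub.dir"
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


def copy_files(src_dir, dst_dir):
    """Copy regular files directly inside `src_dir` into the existing `dst_dir`."""
    with os.scandir(src_dir) as it:
        entries = list(it)
    copied = []
    for entry in entries:
        if entry.is_file():
            shutil.copyfile(entry.path, os.path.join(dst_dir, entry.name))
            copied.append(entry.name)
    return sorted(copied)
```

Both helpers return the names they handled, which makes it easy to log or double-check what was touched.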

Related

Traversing a file-system directory structure

I am trying to do some work within some files in a directory. The basic structure of what I'm trying to work with is folder -> sub-folders -> files I need to access. data holds hundreds of subfolders, I am trying to access each one, find the file within them that ends in 'params', and for now just read the contents. My code is below:
import os
for sub_folder in os.scandir('data'):
    os.chdir(sub_folder)
    for file in os.scandir(sub_folder):
        print(file.name)
        if(file.name.endswith('params')):
            with open(file.name, 'r') as f:
                data = f.read()
I'm getting a FileNotFoundError, where it's telling me that the path 'data\\\run.0' doesn't exist. I have confirmed that 'run.0' is the first sub folder within data, so where I'm confused is how the path doesn't actually exist.
I know the error is happening when I attempt to change directories, so I'm suspecting the way that I am traversing the data folder is not a correct way of doing so. I understand that os.scandir gives a DirEntry object, which is what the variable sub_folder will be but is this not a valid input for the change directory function?
You can use os.walk, but I prefer to use glob. See "How to use Glob() function to find files recursively in Python?"
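A rough sketch of the glob approach for this layout, assuming the 'params' files sit exactly one level below data. The function name read_params_files is hypothetical. It never calls os.chdir, which is the likely cause of the error: the DirEntry path is relative ('data\run.0'), so once the working directory has changed, that relative path no longer resolves.

```python
import glob
import os


def read_params_files(data_dir):
    """Return {path: contents} for files ending in 'params' one level below data_dir."""
    contents = {}
    # glob.escape guards against special characters in the directory name itself
    pattern = os.path.join(glob.escape(data_dir), "*", "*params")
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            with open(path, "r") as f:
                contents[path] = f.read()
    return contents
```

Calling read_params_files('data') would return one entry per matching file, keyed by its full relative path. Keep in mind that glob's * skips names starting with a dot.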

Different File Paths in Python ZipFile Depending on .write() vs .writestr()

I just wanted to ask quickly if the behavior I'm seeing in Python's zipfile module is expected... I wanted to put together a zip archive. For reasons I don't think I need to get into, I was adding some files using zipfile.writestr() and others using .write(). I was writing some files to zip subdirectory called /scripts and others to a zip subdirectory called /data.
For /data, I originally did this:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        zipFile.write(name, f'/data/{root_name}')
This worked fine and produced a working archive that I could extract. So far, so good. To write text files to the /script subdirectory, I used:
zipFile.writestr(f'/script/{scriptname}', fileBytes)
Again, so far so good.
Now it gets odd... I wanted to extract files in /data/. So I looked for paths in zipFile.namelist() starting with /data. My code kept missing the files in /data/, however. Doing some more digging, I noticed that the files written using .writestr had a slash at the start of the zipfile path like this: "/scripts/myscript.py". The files written using .write did not have a slash at the start of the path, so the data file paths looked like this: "data/mydata.pickle".
I changed my code to use .writestr() for the data files:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        with open(name, mode='rb') as extracted_file:
            zipFile.writestr(f'/data/{root_name}', extracted_file.read())
Voila, the data files now have slashes at the start of the path. I'm not sure why, however, as I'm providing the same file path either way, and I wouldn't expect using one method versus another would change the paths.
Is this supposed to work this way? Am I missing something obvious here?
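For what it's worth, the zipfile documentation describes the archive name used by write() as being stored without a drive letter and with leading path separators removed, while writestr() takes the name as given. Here is a small self-contained sketch that reproduces the difference in memory. The file names and the helper names_under are made up for illustration.

```python
import io
import os
import tempfile
import zipfile


def build_demo_archive():
    """Build an in-memory zip using both write() and writestr(); return its names."""
    buffer = io.BytesIO()
    with tempfile.TemporaryDirectory() as tmpdirname:
        source = os.path.join(tmpdirname, "mydata.pickle")
        with open(source, "wb") as f:
            f.write(b"payload")
        with zipfile.ZipFile(buffer, "w") as zf:
            # write() strips leading separators from the archive name
            zf.write(source, "/data/mydata.pickle")
            # writestr() keeps the name exactly as passed in
            zf.writestr("/script/myscript.py", b"print('hi')")
            names = zf.namelist()
    return names


def names_under(names, prefix):
    """Select archive names under `prefix`, ignoring a leading slash on either side."""
    wanted = prefix.strip("/") + "/"
    return [n for n in names if n.lstrip("/").startswith(wanted)]
```

Matching with names_under(zipFile.namelist(), "/data") works regardless of which method wrote the entry. A simpler fix is probably to drop the leading slash from every archive name, since archive names are meant to be relative to the archive root.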

Check if there are .format files in a directory

I have been trying to figure out for a while how to check if there are .pkl files in a given directory. I checked the website and I could find ways to find if there are files in the directory and list them, but I just want to check if they are there.
In my directory are a total of 7 .pkl files, as soon as I create one, the others are created so to check if the seven of them exist, it will be enough to check if one exists. Therefore, I would like to check if there is any .pkl file.
This is working if I do:
os.path.exists('folder1/folder2/filename.pkl')
But then I have to write one of my file names. I would like to do this without searching for a specific file. I also tried
os.path.exists('folder1/folder2/*.pkl')
but that does not work either, as I don't have any file literally named *.pkl.
You can use the python module glob (https://docs.python.org/3/library/glob.html)
Specifically, glob.glob('folder1/folder2/*.pkl') will return a list of all .pkl files in folder2.
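Since the question only asks whether any .pkl file exists, the list itself is not needed. A minimal sketch, using a hypothetical helper has_pkl_files, that stops at the first match:

```python
import glob
import os


def has_pkl_files(directory):
    """Return True if at least one .pkl file sits directly inside `directory`."""
    # glob.escape protects special characters in the directory name itself
    pattern = os.path.join(glob.escape(directory), "*.pkl")
    # iglob yields matches lazily, so any() stops at the first hit
    return any(glob.iglob(pattern))
```

For the example in the question this would be called as has_pkl_files('folder1/folder2').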
You can use:
for dir_path, dir_names, file_names in os.walk(search_dir):
    # Go over all files in the current folder
    for file_name in file_names:
        if file_name.endswith(".pkl"):
            # do something, e.g. break after the first one you find
            break
Note: this searches the entire directory tree, including subdirectories.
If you only want to search a single directory, you can run the "for" loop over os.listdir(path) instead.

Incorrect file reading when using os.walk in python3

I am crawling through folders using the os.walk() method. In one of the folders, there is a large number of files, around 100,000 of them. The files look like: p_123_456.zip. But they are read as p123456.zip. Indeed, when I open windows explorer to browse the folder, for the first several seconds the files look like p123456.zip, but then change their appearance to p_123_456.zip. This is a strange scenario.
Now, I can't use time.sleep() because all folders and files are read into Python variables in the looping line. Here is a snippet of the code:
for root, dirs, files in os.walk(srcFolder):
    os.chdir(root)
    for file in files:
        shutil.copy(file, storeFolder)
In the last line, I get a file-not-found exception saying that the file p123456.zip does not exist. Has anyone run into this mysterious issue? Is there any way to bypass it? What is the cause? Thank you.
You don't seem to be concatenating the actual folder name with the filenames. Try changing your code to:
for root, dirs, files in os.walk(srcFolder):
    for file in files:
        shutil.copy(os.path.join(root, file), storeFolder)
os.chdir should be avoided like the plague. For one thing, if the change succeeds, the current directory is no longer the one from which you started os.walk, and then a second chdir to another folder will fail (either stopping your program or moving you to an unexpected folder).
Just add the folder name as a prefix, and don't try using chdir.
Moreover, as noted in the comment from ShadowRanger above, os.walk officially breaks if you chdir inside its iteration when it was given a relative path (https://docs.python.org/3/library/os.html#os.walk). That is likely the root of the problem you had.

Python's os.walk() fails in Windows when there are long filenames

I use Python's os.walk() to get the files and dirs in some directories, but when a directory contains files whose names are too long (>300 characters), os.walk() returns nothing. With onerror I get '[Error 234] More data is available'. I also tried using yield, but got nothing either, and it shows 'Traceback: StopIteration'.
The OS is Windows and the code is simple. I tested with one directory: if it contains a long-name file, the problem occurs, while if I rename the long-name files to short names, the code returns the correct result.
I cannot change anything in these directories, such as renaming or moving the long-name files.
Please help me to solve the problem!
import os

def t(a):
    for root, dirs, files in os.walk(a):
        print(root, dirs, files)

t('c:/test/1')
On Windows, a full path is limited by default to 260 characters (MAX_PATH), and a single file or folder name to 255, so the error you're seeing comes from Windows, not from Python: somehow you managed to create such long file names, but now you can't read them. See this post for more details.
The only workaround I can think of is to map a drive letter to the specific directory. This makes the path much shorter, e.g. z:\myfile.xlsx instead of c:\a\b\c\d\e\f\g\myfile.xlsx
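Another option that is sometimes suggested, and not part of the answer above, is the Windows extended-length path prefix \\?\, which tells the Windows API to skip the MAX_PATH check. A rough sketch, assuming an absolute drive-letter path (UNC paths need a different prefix and are not handled here). The helper names are hypothetical, and whether this resolves Error 234 in a given setup is untested.

```python
import ntpath
import os

EXTENDED_PREFIX = "\\\\?\\"  # the literal characters \\?\


def to_extended_path(path):
    """Return `path` with the extended-length prefix (drive-letter paths only)."""
    if path.startswith(EXTENDED_PREFIX):
        return path
    # The prefixed form needs backslashes and no '.' or '..' components
    normalized = ntpath.normpath(path)
    drive, _ = ntpath.splitdrive(normalized)
    if len(drive) != 2 or drive[1] != ":" or not ntpath.isabs(normalized):
        raise ValueError("expected an absolute drive-letter path, got %r" % (path,))
    return EXTENDED_PREFIX + normalized


def walk_long_paths(top):
    """Yield (root, dirs, files) like os.walk, starting from the prefixed form of `top`."""
    for root, dirs, files in os.walk(to_extended_path(top)):
        yield root, dirs, files
```

With this, the loop from the question would iterate over walk_long_paths('c:/test/1') instead of os.walk('c:/test/1'). The roots it yields keep the \\?\ prefix, so paths joined from them stay in extended-length form.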
