Excluding some files from an os.walk()

In Python 3, I want to run the script below to rename the files in all subdirectories of the script's working directory so that each filename is prefixed with the name of its parent folder. But the script also processes the .DS_Store files in those directories, as well as the .py script itself. How can I leave those untouched?
import os
for root, dirs, files in os.walk("."):
    if not files:
        continue
    prefix = os.path.basename(root)
    for f in files:
        print(prefix)
        os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(prefix, f)))

You can also filter the file list with a list comprehension, which makes it a little shorter. Use this as the inner loop:
for f in [i for i in files if not (i.startswith(".") or i.endswith(".py"))]:
    print(prefix)
    os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(prefix, f)))

You can check whether f starts with '.' or ends with '.py'.
Inside the inner loop, something like:
if f.startswith('.'):
    continue
Then compare the extension against a set of extensions to skip (for example extensions_to_ignore = {'.py'}):
name, extension = os.path.splitext(f)
if extension in extensions_to_ignore:
    continue
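Putting both checks together, a minimal self-contained sketch of the renaming script might look like the following. The function name prefix_with_parent and its extensions_to_ignore parameter are hypothetical. It also assumes that files sitting directly in the starting directory should be left alone, because os.path.basename(".") is "." and the original script would otherwise rename, for example, script.py to ._script.py.

```python
import os


def prefix_with_parent(top=".", extensions_to_ignore=frozenset({".py"})):
    """Rename every file below `top` to '<parent folder>_<name>'.

    Hidden files (names starting with '.', e.g. .DS_Store) and files whose
    extension is in `extensions_to_ignore` are left untouched.
    Returns the list of new paths.
    """
    renamed = []
    for root, dirs, files in os.walk(top):
        # Assumption: files directly inside `top` are not renamed, since
        # os.path.basename(".") is "." and would give names like "._x.txt".
        if root == top:
            continue
        prefix = os.path.basename(root)
        for f in files:
            if f.startswith("."):
                continue  # hidden files such as .DS_Store
            _, extension = os.path.splitext(f)
            if extension in extensions_to_ignore:
                continue  # e.g. the .py script itself
            new_path = os.path.join(root, "{}_{}".format(prefix, f))
            os.rename(os.path.join(root, f), new_path)
            renamed.append(new_path)
    return renamed
```

Usage: prefix_with_parent("."). Running it twice prefixes the names a second time, so it is worth trying it on a copy of the folder first.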

Related

Using zipfile to archive directory contents while skipping files from list

I'm using zipfile to create an archive of all files in a directory (recursively, preserving the directory structure including empty folders), and I want the process to skip the filenames specified in a list.
This is the basic function that uses os.walk to traverse a directory and add all the files and directories it contains to an archive:
import os
import zipfile

def zip_dir(path):
    zipname = os.path.basename(os.path.normpath(path)) + '.zip'
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED) as zf:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                for file_or_dir in files + dirs:
                    zf.write(os.path.join(root, file_or_dir),
                             os.path.relpath(os.path.join(root, file_or_dir),
                                             os.path.join(path, os.path.pardir)))
        elif os.path.isfile(path):
            # store a single file under its own name
            zf.write(path, os.path.basename(path))
        zf.printdir()
    # no zf.close() needed: leaving the with block closes the archive
The function can also handle a single file, but it is mainly the directory part that we are interested in.
Now let's say we have a list of filenames that we want to exclude from being added to the zip archive.
skiplist = ['.DS_Store', 'tempfile.tmp']
What is the best and cleanest way to achieve this?
I tried using zip(), which was somewhat successful but also caused empty folders to be excluded (empty folders should be included). I'm not sure why this happens.
skiplist = ['.DS_Store', 'tempfile.tmp']
for root, dirs, files in os.walk(path):
    for (file_or_dir, skipname) in zip(files + dirs, skiplist):
        if skipname not in file_or_dir:
            zf.write(os.path.join(root, file_or_dir),
                     os.path.relpath(os.path.join(root, file_or_dir),
                                     os.path.join(path, os.path.pardir)))
It would also be interesting to see whether anyone has a clean way to skip specific file extensions, perhaps with something like .endswith('.png'), but I'm not sure how to combine that with the existing skiplist.
I would also appreciate any general comments on the function, whether it works as expected without surprises, and any suggestions for optimizations or improvements.

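The zip() attempt goes wrong because zip() stops at its shortest input: it pairs the first entry of files + dirs with the first skip name and the second entry with the second skip name, then stops, so the remaining entries in each directory are never written at all (folders come after files in files + dirs, which is likely why the empty folders disappear). Instead, simply check whether each name is in skiplist:
skiplist = {'.DS_Store', 'tempfile.tmp'}
for root, dirs, files in os.walk(path):
    for file in files + dirs:
        if file not in skiplist:
            zf.write(os.path.join(root, file),
                     os.path.relpath(os.path.join(root, file),
                                     os.path.join(path, os.path.pardir)))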
This ensures that names in skiplist are not added to the archive.
Making skiplist a set is also an optimization in case it grows large: a membership test on a set takes O(1) time on average, versus O(N) for a list.
The TimeComplexity page on the Python wiki lists the time complexities of common operations on Python's built-in data structures.
As for extensions, you can use os.path.splitext() to extract the extension and use the same logic as above:
from os.path import splitext
extensions = {'.png', '.txt'}
for root, dirs, files in os.walk(path):
    for file in files:
        _, extension = splitext(file)
        if extension not in extensions:
            zf.write(os.path.join(root, file),
                     os.path.relpath(os.path.join(root, file),
                                     os.path.join(path, os.path.pardir)))
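Regarding the .endswith('.png') idea from the question: str.endswith() also accepts a tuple of suffixes, which is an alternative to splitext() and can match multi-part extensions such as '.tar.gz' (splitext('a.tar.gz') returns '.gz' as the extension). A small sketch; is_skipped and the two constants are hypothetical names:

```python
SKIP_NAMES = {".DS_Store", "tempfile.tmp"}
# endswith() needs a tuple here; it does not accept a set or a list
SKIP_SUFFIXES = (".png", ".txt", ".tar.gz")


def is_skipped(name):
    """Return True if `name` is listed by exact name or ends with a skipped suffix."""
    return name in SKIP_NAMES or name.endswith(SKIP_SUFFIXES)
```

The comparison is case-sensitive, so 'photo.PNG' is not skipped unless that spelling is added too.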
If you want to combine the above features, then you can handle the logic for files and directories separately:
from os.path import splitext
extensions = {'.png', '.txt'}
skiplist = {'.DS_Store', 'tempfile.tmp'}
for root, dirs, files in os.walk(path):
    # prune skipped folders in place so os.walk does not descend into them
    dirs[:] = [d for d in dirs if d not in skiplist]
    for file in files:
        _, extension = splitext(file)
        if file not in skiplist and extension not in extensions:
            zf.write(os.path.join(root, file),
                     os.path.relpath(os.path.join(root, file),
                                     os.path.join(path, os.path.pardir)))
    for directory in dirs:
        zf.write(os.path.join(root, directory),
                 os.path.relpath(os.path.join(root, directory),
                                 os.path.join(path, os.path.pardir)))
Note: The above code snippets won't work by themselves, and you will need to weave in your current code to use these ideas.
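One way the pieces could fit together as a single self-contained function is sketched below. It is not the answerer's code: the name zip_dir_skipping and the zipname, skip_names and skip_extensions parameters are assumptions. Skipped folders are pruned from dirs, so their contents are left out as well, and folders (including empty ones) are written as directory entries.

```python
import os
import zipfile


def zip_dir_skipping(path, zipname=None, skip_names=frozenset(),
                     skip_extensions=frozenset()):
    """Archive `path` recursively, keeping empty folders, while skipping any
    file or folder whose name is in `skip_names` and any file whose
    extension is in `skip_extensions`. Returns the archive member names."""
    path = os.path.normpath(path)
    if zipname is None:
        zipname = os.path.basename(path) + ".zip"
    # archive names are relative to the parent, so they start with the folder name
    base = os.path.join(path, os.path.pardir)
    with zipfile.ZipFile(zipname, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            # prune in place so os.walk never descends into skipped folders
            dirs[:] = [d for d in dirs if d not in skip_names]
            for d in dirs:
                full = os.path.join(root, d)
                zf.write(full, os.path.relpath(full, base))  # keeps empty folders
            for f in files:
                if f in skip_names or os.path.splitext(f)[1] in skip_extensions:
                    continue
                full = os.path.join(root, f)
                zf.write(full, os.path.relpath(full, base))
        return zf.namelist()
```

Assigning to dirs[:] instead of rebinding dirs is what makes os.walk skip the pruned folders, and it only has that effect with the default topdown=True. The archive should be written outside the folder being zipped, or it may end up archiving itself.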

How to delete all files inside a main folder with many subfolders?

I want to delete only the files, not the folder and its subfolders.
I tried this, but I don't want to rely on a particular character in the condition:
import os
from glob import glob
for i in glob(os.path.join('path', '**', '*'), recursive=True):
    if '.' in i:
        os.remove(i)
I don't like this because some folder names contain '.'. There are also many types of files, so keeping a list of extensions and checking against it would not be practical. What would you suggest?

You can use os.walk:
import os
for root, _, files in os.walk('path'):
    for file in files:
        os.remove(os.path.join(root, file))

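Try something like this, which collects the paths of all files (but not folders) so you can then delete them:
import os

def get_file_paths(folder_path):
    paths = []
    for root, directories, filenames in os.walk(folder_path):
        for filename in filenames:
            paths.append(os.path.join(root, filename))
    return paths

for file_path in get_file_paths('path'):
    os.remove(file_path)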
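The glob attempt from the question can also be repaired directly by testing each match with os.path.isfile() instead of looking for a '.' in the path, so folder names no longer matter. A minimal sketch; delete_files_keep_dirs is a hypothetical name. One difference from os.walk: glob's '*' does not match names that start with '.', so hidden files (and anything inside hidden folders) are left in place.

```python
import os
from glob import glob


def delete_files_keep_dirs(top):
    """Delete every regular file below `top`, leaving the folder tree intact.

    glob's '*' skips names starting with '.', so hidden files survive.
    Returns the list of removed paths.
    """
    removed = []
    for p in glob(os.path.join(top, "**", "*"), recursive=True):
        if os.path.isfile(p):  # decide by type, not by a '.' in the name
            os.remove(p)
            removed.append(p)
    return removed
```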

Ignore files from specific directory using glob or by os.walk()

I want to exclude the directory 'dir3_txt' so that I only capture the .txt files from the other directories. I tried to exclude the directory as shown below, but I can't figure out how to get all files with the .txt extension except those in dir3_txt:
for root, dirs, files in os.walk('.'):
    print(root)
    dirs[:] = [d for d in dirs if not d.startswith('dir3')]
    for file in files:
        print(os.path.join(root, file))
I am also considering glob (the snippet below came from Stack Overflow), but I'm not sure how to adapt it:
import os
from glob import glob
for entry in os.walk('.'):
    for txt in glob(os.path.join(entry[0], '*.txt')):
        print(txt)
I went through "Excluding directories in os.walk", but that solution doesn't help me: it only explains how to skip a directory, and I still need the files from the other directories. Ideally, can this be done with glob alone?

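A simple solution is a string comparison against the directory paths and file names returned by os.walk. Compare whole path components rather than searching for "/dir3_txt/" as a substring: for the folder itself, root is "./dir3_txt" with no trailing slash, so a substring check would still let its own files through.
import os
for root, dirs, files in os.walk('.'):
    if 'dir3_txt' not in os.path.normpath(root).split(os.sep):
        for file in files:
            if file.endswith(".txt"):
                print(os.path.join(root, file))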
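Since the question asks whether glob alone can do this, here is a sketch using glob's recursive '**' pattern (available since Python 3.5) that drops any match with dir3_txt among its folder names. txt_files_outside is a hypothetical name, and it assumes the folder to exclude is matched by its exact name at any depth.

```python
import os
from glob import glob


def txt_files_outside(top, excluded_dir="dir3_txt"):
    """Return .txt files below `top`, skipping anything inside a folder
    named exactly `excluded_dir` (at any depth)."""
    matches = []
    for p in glob(os.path.join(top, "**", "*.txt"), recursive=True):
        # compare whole folder names, so e.g. 'dir3_txt_old' is not excluded by accident
        folders = os.path.relpath(p, top).split(os.sep)[:-1]
        if excluded_dir not in folders:
            matches.append(p)
    return matches
```

Usage: txt_files_outside('.'). Unlike pruning dirs in os.walk, glob still lists the excluded folder's contents before they are filtered out, which only matters for very large trees.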

for root, dirs, files in os.walk('.'):
    print(root)
    # prune in place so os.walk never descends into dir3* folders
    dirs[:] = [d for d in dirs if not d.startswith('dir3')]
    for file in files:
        if file.endswith('.txt'):
            print(os.path.join(root, file))
I don't have a system with me to test this right now, so please comment if there are any problems.
Edit:
It now only matches '.txt' files.

Iterate over a directory and find only files whose names start with a certain string

I have a directory path, and inside it there are several folders. I am trying to build a script that finds all XML files whose names start with report. So far I have been able to iterate over all the directories, but I don't know how to proceed from there. Here is my code:
import os

def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            print(os.path.join(subdir, file))  # print statement just for testing

You can use str.startswith:
import os

def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if file.startswith('report'):
                yield subdir, file

Use str.startswith together with os.path.splitext.
os.path.splitext: Split the extension from a pathname. Extension is everything from the last dot to the end, ignoring leading dots. Returns "(root, ext)"; ext may be empty.
Inside the loop over files, that gives:
if file.startswith('report') and os.path.splitext(file)[1] == '.xml':
    yield os.path.join(subdir, file)
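As an alternative to checking each name by hand, glob's recursive '**' pattern can do both checks at once with the pattern report*.xml. A minimal sketch; find_report_xml is a hypothetical name:

```python
import os
from glob import glob


def find_report_xml(rootdir):
    """Return paths of files named report*.xml anywhere below `rootdir`."""
    # with recursive=True, '**' matches rootdir itself and every subfolder below it
    return glob(os.path.join(rootdir, "**", "report*.xml"), recursive=True)
```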

In python, how to get the path to all the files in a directory, including files in subdirectories, but excluding path to subdirectories

I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print([x[0] for x in os.walk(myDir)])
But this gives me paths to the folders, and I want only the paths to the files. Any ideas?

os.walk(path) yields a 3-tuple for each directory: the directory path, the names of its subdirectories, and the names of its files.
So you can do this:
import os
for dirpath, subdirs, files in os.walk(path):
    for file in files:
        print(os.path.join(dirpath, file))

On each iteration, os.walk gives you the current directory, its subdirectories and its files, so inside the os.walk loop you iterate over the files and combine each one with the directory.
To combine them, use os.path.join on the directory and the file name.
Here is a simple example illustrating how traversal with os.walk works:
from os import walk
from os.path import join
# for each level, the loop unpacks the directory, its subdirectories and its files
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join puts together the directory and the file name
        print(join(dir, file))

If you only want the paths of folders that actually contain files, add a filter to your list comprehension:
import os
myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
This only includes the paths of folders that contain files.

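import os

def get_paths(path, depth=None):
    # Yield file paths below `path`. depth=None means unlimited; depth=0 means
    # only files directly in `path`; depth=1 also includes one level down, etc.
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        elif os.path.isdir(full_path):
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                # pass the reduced depth on, otherwise the limit is lost after one level
                yield from get_paths(full_path, d)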
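None of the answers above writes the text file the question asks for. A minimal sketch, assuming one path per line in UTF-8 and an output file located outside the folder being scanned (otherwise the output file would list itself); write_file_paths is a hypothetical name:

```python
import os


def write_file_paths(top, out_path):
    """Write the path of every file (not folder) below `top` to `out_path`,
    one per line, and return how many paths were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:  # filenames never contains folder names
                out.write(os.path.join(dirpath, name) + "\n")
                count += 1
    return count
```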
