How to delete all files inside a main folder with many subfolders? - python

I want to delete only the files, not the folders and subfolders.
I tried this, but I don't want to rely on a condition that checks for specific characters:
import os
from glob import glob

for i in glob('path' + '**/*', recursive=True):
    if '.' in i:
        os.remove(i)
I don't like this because some folder names contain '.'. There are also many file types in there, so building a list of extensions and checking against it would not be efficient. What do you suggest?

You can use os.walk:
import os

for root, _, files in os.walk('path'):
    for file in files:
        os.remove(os.path.join(root, file))
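If you prefer pathlib, a sketch of the same idea (the helper name remove_files_only is mine): Path.rglob('*') walks everything below the top folder, and is_file() keeps the directories safe without any character checks.

```python
from pathlib import Path

def remove_files_only(top):
    """Delete every regular file under top, leaving all directories intact."""
    for p in Path(top).rglob('*'):
        if p.is_file():
            p.unlink()
```

This sidesteps the original problem entirely: folders with '.' in their names are skipped because the test is on the entry's type, not its name.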

Try something like this:
import os

def get_file_paths(folder_path):
    paths = []
    for root, directories, filenames in os.walk(folder_path):
        for filename in filenames:
            paths.append(os.path.join(root, filename))
    return paths

Related

Iterate over files located in different folders

I'd like to write a function to iterate over Excel files that are in different folders. Parts of the path of each file are the same, for instance:
C:\Main\Division\Reports\Year\Data.xls
The only part of each path that changes is ‘Year’. The files all have the same name.
Is there a way to do this with a placeholder for Year? If not, what approach should I take?
You can use the os.listdir function:
import os

directory = r"C:\Main\Division\Reports"
for data in os.listdir(directory):
    file_name = os.path.join(directory, data, 'Data.xls')
    # do something
You could try os.walk:
import os

parent = r"C:\Main\Division\Reports"
for root, directories, files in os.walk(parent):
    print(root)
    print(directories)
    print(files)
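Since only the Year component of the path varies, a glob wildcard can serve as the placeholder directly. A minimal sketch (the helper name yearly_reports is mine):

```python
import glob
import os

def yearly_reports(base):
    """Return the Data.xls path from every immediate subfolder of base."""
    return sorted(glob.glob(os.path.join(base, '*', 'Data.xls')))
```

For example, yearly_reports(r'C:\Main\Division\Reports') would match the Data.xls inside every Year folder, whatever the folders are named.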

Iterate over a directory and find only files whose names start with a certain string

I have a directory path, and in this path there are several folders. I am trying to build a script that finds all the XML files whose names start with report. So far I can iterate over all the directories, but I do not know how to proceed further. Here is my code:
def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            print(os.path.join(subdir, file))  # print statement just for testing
You can use str.startswith:
def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if file.startswith('report'):
                yield subdir, file
Use str.startswith together with os.path.splitext.
os.path.splitext splits the extension from a pathname: the extension is everything from the last dot to the end, ignoring leading dots. It returns (root, ext); ext may be empty.
if file.startswith('report') and os.path.splitext(file)[-1] == '.xml':
    return file
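Putting the two checks into the walk from the earlier answer gives one complete helper; a sketch (find_reports is my name for it):

```python
import os

def find_reports(rootdir):
    """Yield the full path of every report*.xml file anywhere under rootdir."""
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            if name.startswith('report') and os.path.splitext(name)[-1] == '.xml':
                yield os.path.join(subdir, name)
```

Note the extension test runs on the bare filename, so directory names never interfere with it.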

Excluding some files from an os.walk()

In Python 3, I want to run the script below to rename the files in all subdirectories of the working directory of the script so that their parent folder names are prepended to their filenames. But this script also processes the .DS_Store files in the directories, as well as the .py script file itself. How can I leave those untouched?
import os

for root, dirs, files in os.walk("."):
    if not files:
        continue
    prefix = os.path.basename(root)
    for f in files:
        print(prefix)
        os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(prefix, f)))
You can also use a list comprehension to achieve this, which makes it a little shorter:
for f in [i for i in files if not (i.startswith(".") or i.endswith(".py"))]:
    print(prefix)
    os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(prefix, f)))
You should be able to check whether the f variable starts with '.' or ends with '.py'.
So something like:
f.startswith('.')
And then you can compare the extension with something like this:
name, extension = os.path.splitext(f)
if extension in extensions_to_ignore:
    continue
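The two tests can be bundled into one predicate for the rename loop. A sketch, where the name should_skip and the contents of the two ignore tuples are my assumptions:

```python
import os

IGNORE_PREFIXES = ('.',)       # hidden files such as .DS_Store
IGNORE_EXTENSIONS = ('.py',)   # the running script itself

def should_skip(filename):
    """Return True for files the rename loop should leave untouched."""
    return (filename.startswith(IGNORE_PREFIXES)
            or os.path.splitext(filename)[1] in IGNORE_EXTENSIONS)
```

str.startswith accepts a tuple of prefixes, so extending either list later is a one-line change.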

In python, how to get the path to all the files in a directory, including files in subdirectories, but excluding path to subdirectories

I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print([x[0] for x in os.walk(myDir)])
And it gives the path of all elements (files AND folders), but I want only the paths to the files. Any ideas for it?
os.walk(path) yields a 3-tuple for each directory: the parent folder, its subdirectories, and its files.
So you can do it like this:
for dir, subdir, files in os.walk(path):
    for file in files:
        print(os.path.join(dir, file))
The os.walk method gives you the directory, its subdirectories, and its files on each iteration, so as you loop through os.walk you then iterate over the files and combine each one with the directory.
To perform this combination, use os.path.join between the directory and the file.
Here is a simple example to help illustrate how traversing with os.walk works:
from os import walk
from os.path import join

# each iteration yields, in order: directory, subdirectories, files
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join puts together the directory and the file
        print(join(dir, file))
If you just want the paths, then add a filter to your list comprehension as follows:
import os

myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
This would then only add the path for folders which contain files.
import os

def get_paths(path, depth=None):
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        else:
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                for sub_path in get_paths(full_path, d):
                    yield sub_path
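With pathlib the same filtering is nearly a one-liner; a sketch assuming Python 3.4+ is available (file_paths is my name for the helper):

```python
from pathlib import Path

def file_paths(top):
    """Return every file path under top, excluding the directories themselves."""
    return sorted(str(p) for p in Path(top).rglob('*') if p.is_file())
```

Writing the result to the txt file mentioned in the question is then just '\n'.join(file_paths(myDir)).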

os.walk to crawl through folder structure

I have some code that looks at a single folder and pulls out files.
but now the folder structure has changed, and I need to trawl through the folders looking for files that match.
Here is what the old code looks like:
GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'

def get_module_data():
    module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')
    filenames = [os.path.join(module_folder, f) for f in
                 os.listdir(module_folder)]
    data = [parse_file(f) for f in filenames]
    return data
But now the folder structure has changed to be like this
GSB_FOLDER\data\modules
\folder1\data\modules
\folder2\data\modules
\folder3\data\modules
where folder1, folder2, or folder3 could be any text string.
How do I rewrite the code above to handle this?
I have been told about os.walk, but I'm just learning Python, so any help is appreciated.
Nothing much changes; you just call os.walk and it will recursively go through the directory and return the files, e.g.
for root, dirs, files in os.walk('/tmp'):
    if os.path.basename(root) != 'modules':
        continue
    data = [parse_file(os.path.join(root, f)) for f in files]
Here I am checking files only in folders named 'modules'; you can change that check to do something else, e.g. match paths that have modules somewhere in them: root.find('/modules') >= 0.
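Wrapped up as a helper for the structure in the question, a sketch (module_files is my name for it; parse_file from the original code would then be applied to each returned path):

```python
import os

def module_files(top):
    """Collect the files that live in any folder named 'modules' under top."""
    found = []
    for root, dirs, files in os.walk(top):
        if os.path.basename(root) == 'modules':
            found.extend(os.path.join(root, f) for f in files)
    return sorted(found)
```

Because os.walk visits every level, this picks up data\modules directly under GSB_FOLDER as well as folder1\data\modules and so on, whatever those middle folders are called.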
os.walk is a nice, easy way to get the directory structure of everything inside a directory you pass it.
In your example, you could do something like this:
for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
    # whatever you want to do with these folders
    if "/data/modules/" in dirpath:
        print(dirpath, dirnames, filenames)
Try that out; it should be fairly self-explanatory once you see the output.
Here is a function that serves the general purpose of crawling through a directory structure and returning files and/or paths that match a pattern:
import os
import re

def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
    file_paths = []
    if not os.path.exists(input_dir):
        raise FileNotFoundError("Could not find path: %s" % (input_dir))
    for dirpath, dirnames, filenames in os.walk(input_dir):
        if re.search(path_pattern, dirpath):
            file_list = [item for item in filenames if re.search(file_pattern, item)]
            file_path_list = [os.path.join(dirpath, item) for item in file_list]
            file_paths += file_path_list
            if len(file_paths) > maxResults:
                break
    return file_paths[0:maxResults]
Example usages:
directory_spider('/path/to/find')  # finds up to 500 files under the path, if it exists
directory_spider('/path/to/find', path_pattern="", file_pattern=".py$", maxResults=10)
You can use os.walk like @Anurag has detailed, or you can try my small pathfinder library:
data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER, just_files=True)]
