Python: search multiple directories and grab newest file, deleting others

I'm new to Python and would appreciate a little help.
I would like to go through 10 directories and copy the newest file from each one into a single folder. There may be multiple files in each directory.
I can pull a complete listing from each directory, but I'm not sure how to narrow it down to the newest file. Any direction would be appreciated.
Inside the states directory there is one directory per state (e.g. CA, NY, FL, MI, GA).
Edit: in case it helps, the directory structure looks like this:
'/dat/users/states/CA/'
'/dat/users/states/NY/'
'/dat/users/states/MI/'
import glob
import os
data_dir = '/dat/users/states/*/'
file_dir_extension = os.path.join(data_dir, '*.csv')
for file_name in glob.glob(file_dir_extension):
    if file_name.endswith('.csv'):
        print(file_name)

You can use os.walk() instead of glob.glob() to traverse all of your folders. For each folder you get a list of the filenames in it. This list can be sorted by modification time using os.path.getmtime() as the sort key; with reverse=True, the newest file ends up at the start of the list.
Pop the first element off the list and copy it to your target folder. The remaining elements in the list can then be deleted using os.remove() (commented out in the code so that you can check the output first):
import os
import shutil
root = r'/src/folder/'
copy_to = r'/copy to/folder'
for dirpath, dirnames, filenames in os.walk(root):
    # Filter only csv files
    files = [file for file in filenames if os.path.splitext(file)[1].lower() == '.csv']
    # Sort list by file date
    files = sorted(files, key=lambda x: os.path.getmtime(os.path.join(dirpath, x)), reverse=True)
    if files:
        # Copy the newest file
        copy_me = files.pop(0)
        print("Copying '{}'".format(copy_me))
        shutil.copyfile(os.path.join(dirpath, copy_me), os.path.join(copy_to, copy_me))
        # Remove the remaining files
        for file in files:
            src = os.path.join(dirpath, file)
            print("Removing '{}'".format(src))
            #os.remove(src)
os.path.join() is used to safely join a path and filename together.
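Note: getmtime() returns the last-modification time. To sort by creation time instead, on systems that support it (st_birthtime is not available everywhere), you could use something like: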
os.stat(os.path.join(dirpath, x)).st_birthtime
to sort based on the creation date/time.
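If the goal is only to collect the newest CSV from each state directory, without deleting anything, max() with the same getmtime key avoids sorting the whole list. The following is a minimal sketch under stated assumptions: the function name copy_newest_per_dir is hypothetical, each immediate subdirectory of root is treated as one state folder, the target folder lies outside root, and the subdirectory name is prefixed to the copied file so that identically named files from different states do not overwrite each other.

```python
import os
import shutil


def copy_newest_per_dir(root, dest, ext=".csv"):
    """Copy the most recently modified `ext` file from each immediate
    subdirectory of `root` into `dest`. Returns the list of new paths.

    Hypothetical helper for illustration; nothing is deleted.
    """
    os.makedirs(dest, exist_ok=True)
    copied = []
    for entry in sorted(os.listdir(root)):
        subdir = os.path.join(root, entry)
        if not os.path.isdir(subdir):
            continue
        # Full paths of the matching regular files in this subdirectory
        candidates = [
            os.path.join(subdir, name)
            for name in os.listdir(subdir)
            if name.lower().endswith(ext)
            and os.path.isfile(os.path.join(subdir, name))
        ]
        if not candidates:
            continue
        # The newest file has the largest modification time
        newest = max(candidates, key=os.path.getmtime)
        # Assumption: prefix with the subdirectory name to avoid name clashes
        target = os.path.join(dest, entry + "_" + os.path.basename(newest))
        shutil.copy2(newest, target)  # copy2 also preserves the timestamp
        copied.append(target)
    return copied
```

For the layout in the question, this could be called as copy_newest_per_dir('/dat/users/states', '/copy to/folder'), where the target folder is a placeholder.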

Related

Python loop through directories

I am trying to use Python's os library to loop through all the subdirectories of the root directory, find files with a specific name, and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all its subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectory names.
Can anyone help me understand how to solve this issue?
A more compact way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root,f)) for root,dirs,files in os.walk(source_folder) for f in files if '000000' in f and not f.endswith('.gz')]
for file in files:
    # use the loop variable; f from the comprehension is not visible here
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to each file you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of a certain type, or to remove unwanted files from the selected files.
In the for loop, each file is renamed by adding the .csv extension.
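I would use glob to find the files; with recursive=True, the ** pattern also matches files in nested subdirectories.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/**/000000'.format(zdir), recursive=True)
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))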
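The original attempt fails because str.strip(chars) does not remove a prefix: it removes any of the characters in chars from both ends of the string, which is why parts of the subdirectory names disappear. Since os.walk already yields the full path of each directory, no stripping is needed. A minimal sketch, where the function name add_extension and its parameters are hypothetical:

```python
import os


def add_extension(root, target_name="000000", extension=".csv"):
    """Rename every file called `target_name` under `root` by appending
    `extension`. Returns the new paths. Hypothetical helper for illustration.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == target_name:
                # dirpath is already the complete path to this directory
                old_path = os.path.join(dirpath, name)
                new_path = old_path + extension
                os.rename(old_path, new_path)
                renamed.append(new_path)
    return renamed


# str.strip treats its argument as a set of characters, not as a prefix:
# every leading and trailing character found in "data/" is removed.
print("data/dated".strip("data/"))  # prints "e"
```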

I want to add all the names of the files in a specific folder to a list

I want to add the names of all the files in a specific folder to a list. How can I do that? The path is Dropbox -> a folder called 'UMM' -> a folder called '2018'. Could someone help me with the code for this? I have tried using os.walk() but it doesn't seem to work.
You can use os.walk and append only the names that appear in files.
from os import walk
file_names = list()
path = 'path/of/folder'
for root, dirc, files in walk(path):
    for FileName in files:
        file_names.append(FileName)
print(file_names)
This will append the file names from the specified path and from all of its subdirectories.
Alternatively, this will create a list of the entries in a single folder (note that listdir also returns the names of subdirectories):
from os import listdir
# the path
path = ''
fileList = listdir(path)
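If only the regular files directly inside that one folder are wanted, os.scandir can filter out the subdirectories. A minimal sketch; the function name list_files is hypothetical:

```python
import os


def list_files(path):
    """Return the sorted names of the regular files directly inside `path`
    (subdirectories are skipped and not descended into)."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

For the folder in the question, this might be called as list_files(os.path.join('Dropbox', 'UMM', '2018')), assuming the script runs from the directory that contains the Dropbox folder.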

Finding correct path to files in subfolders with os.walk with python?

I am trying to create a program that copies files with certain file extension to the given folder. When files are located in subfolders instead of the root folder the program fails to get correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use rootfolder as directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to given folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)
dirpath is one of the things provided by walk for a reason: it gives the path to the directory in which the items in filenames are located. You can use it to determine the subfolder you should be using.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it is just a list of strings, and (technically) they are not associated with files in the filesystem at all.
os.path.abspath performs string-only operations and joins the file name onto the current working directory. As a result, the joined path points to a file that does not exist.
What should be done instead is to join the directory path and the base file name (both values are yielded by os.walk):
file_path = os.path.join(dirpath, i)
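With the path built from dirpath, the whole copy routine could look like the sketch below. It is an illustration rather than the original author's code: the function name selective_copy and its parameters are assumptions, os.path.splitext replaces the regular expression for reading the extension, and the destination is assumed to lie outside the source folder.

```python
import os
import shutil


def selective_copy(source_folder, destination_folder, extension="zip"):
    """Copy every file under `source_folder` whose extension matches
    `extension` into `destination_folder`. Returns the copied source paths."""
    os.makedirs(destination_folder, exist_ok=True)
    wanted = "." + extension.lower().lstrip(".")
    copied = []
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            if os.path.splitext(name)[1].lower() == wanted:
                # Join with dirpath so files in subfolders resolve correctly
                file_path = os.path.join(dirpath, name)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied
```

Because all matches land in one flat folder, two files with the same name in different subfolders would overwrite each other.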

In python, how to get the path to all the files in a directory, including files in subdirectories, but excluding path to subdirectories

I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print([x[0] for x in os.walk(myDir)])
And it gives the path of all elements (files AND folders), but I want only the paths to the files. Any ideas for it?
os.walk(path) yields a 3-tuple for each directory it visits: the directory path, the list of its subdirectories, and the list of its files.
So you can do this:
for dir, subdir, files in os.walk(path):
    for file in files:
        print(os.path.join(dir, file))
On each iteration, os.walk gives you a directory, its subdirectories, and its files, so inside the loop you iterate over the files and combine each one with the directory.
To combine them, call os.path.join on the directory and the file name.
Here is a simple example to help illustrate how traversing with os.walk works:
from os import walk
from os.path import join
# specify in your loop in order dir, subdirectory, file for each level
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join will put together the directory and the file
        print(join(dir, file))
If you just want the paths of the folders that contain files, add a filter to your list comprehension as follows:
import os
myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
This only adds the path of a folder if that folder contains files.
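A recursive generator based on os.listdir, with an optional depth limit:
import os
def get_paths(path, depth=None):
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        else:
            # Decrease the remaining depth, or keep None for unlimited recursion
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                # Pass the remaining depth down so the limit is actually enforced
                for sub_path in get_paths(full_path, d):
                    yield sub_path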
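Since the goal in the question is a text file listing every file path, the traversal can be combined with the file writing as in the following sketch. The function name write_file_list is hypothetical, pathlib's rglob is used here in place of os.walk, and the output file is assumed to live outside the scanned directory.

```python
from pathlib import Path


def write_file_list(root, out_file):
    """Write the path of every file under `root` (but no directory paths)
    to `out_file`, one per line. Returns the list of paths written."""
    paths = sorted(str(p) for p in Path(root).rglob("*") if p.is_file())
    with open(out_file, "w") as fh:
        for path in paths:
            fh.write(path + "\n")
    return paths
```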

Copy all files with certain extension, while maintaining directory tree

My problem:
Traverse a directory, and find all the header files, *.h.
Copy all of these files to another location, but maintain the directory tree
What I've tried:
I've been able to gather all the headers using the os library
import os
for root, dirs, files in os.walk(r'D:\folder\build'):
    for f in files:
        if f.endswith('.h'):
            print(os.path.join(root, f))
This correctly prints:
D:\folder\build\a.h
D:\folder\build\b.h
D:\folder\build\subfolder\c.h
D:\folder\build\subfolder\d.h
Where I'm stuck:
With a list of full file paths, how can I copy these files to another location, while maintaining the sub directories? In the above example, I'd want to maintain the directory structure below \build\
For example, I'd want the copy to create the following:
D:\other\subfolder\build\a.h
D:\other\subfolder\build\b.h
D:\other\subfolder\build\subfolder\c.h
D:\other\subfolder\build\subfolder\d.h
You can use shutil.copytree with an ignore callable to filter the files to copy.
"If ignore is given, it must be a callable that will receive as its arguments the directory being visited by copytree(), and a list of its contents (...). The callable must return a sequence of directory and file names relative to the current directory (i.e. a subset of the items in its second argument); these names will then be ignored in the copy process"
So for your specific case, you could write:
from os.path import join, isfile
from shutil import copytree
# ignore any files but files with '.h' extension
ignore_func = lambda d, files: [f for f in files if isfile(join(d, f)) and f[-2:] != '.h']
copytree(src_dir, dest_dir, ignore=ignore_func)
Edit: as @pchiquet shows, it can be done in a single command. Still, I will show how this problem could be approached manually.
You're going to need three things.
You know which directory you were traversing, so to construct the destination path, you need to replace the name of the source root directory with the name of the destination root directory:
walked_directory = r'D:\folder\build'
found_file = r'D:\folder\build\a.h'
destination_directory = r'D:\other\subfolder\build'
destination_file = found_file.replace(walked_directory, destination_directory)
Now that you have a source and a destination, first you need to make sure the destination exists:
os.makedirs(os.path.dirname(destination_file), exist_ok=True)
Once it exists, you can copy the file:
shutil.copyfile(found_file, destination_file)
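The manual steps can be combined into one function. This is a sketch under assumptions, not the answer author's code: the name copy_tree_filtered and its parameters are hypothetical, and os.path.relpath is used instead of str.replace to compute each file's location relative to the source root.

```python
import os
import shutil


def copy_tree_filtered(src_root, dest_root, extension=".h"):
    """Copy files ending in `extension` from `src_root` to `dest_root`,
    recreating the subdirectory layout. Returns the destination paths."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if not name.endswith(extension):
                continue
            source_file = os.path.join(dirpath, name)
            # Location of the file relative to the walked root directory
            relative = os.path.relpath(source_file, src_root)
            destination_file = os.path.join(dest_root, relative)
            # Make sure the destination directory exists before copying
            os.makedirs(os.path.dirname(destination_file), exist_ok=True)
            shutil.copyfile(source_file, destination_file)
            copied.append(destination_file)
    return copied
```

Using relpath avoids accidental replacements when the source root's name happens to appear elsewhere in a path.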
This will recursively copy all the files with '.h' extension from current dir to dest_dir without creating subdirectories inside dest_dir:
import glob
import shutil
from pathlib import Path
# dest_dir must be set to the target folder beforehand
# ignore any files but files with '.h' extension
for file in glob.glob('**/*.h', recursive=True):
    shutil.copyfile(
        Path(file).absolute(),
        (Path(dest_dir)/Path(file).name).absolute()
    )
