Copy all files with certain extension, while maintaining directory tree

Copy all files with certain extension, while maintaining directory tree - python

My problem:
Traverse a directory, and find all the header files, *.h.
Copy all of these files to another location, but maintain the directory tree
What I've tried:
I've been able to gather all the headers using the os library
for root, dirs, files in os.walk(r'D:\folder\build'):
for f in files:
if f.endswith('.h'):
print os.path.join(root, f)
This correctly prints:
D:\folder\build\a.h
D:\folder\build\b.h
D:\folder\build\subfolder\c.h
D:\folder\build\subfolder\d.h
Where I'm stuck:
With a list of full file paths, how can I copy these files to another location, while maintaining the sub directories? In the above example, I'd want to maintain the directory structure below \build\
For example, I'd want the copy to create the following:
D:\other\subfolder\build\a.h
D:\other\subfolder\build\b.h
D:\other\subfolder\build\subfolder\c.h
D:\other\subfolder\build\subfolder\d.h

You can use shutil.copytree with a ignore callable to filter the files to copy.
"If ignore is given, it must be a callable that will receive as its arguments the directory being visited by copytree(), and a list of its contents (...). The callable must return a sequence of directory and file names relative to the current directory (i.e. a subset of the items in its second argument); these names will then be ignored in the copy process"
So for your specific case, you could write:
from os.path import join, isfile
from shutil import copytree
# ignore any files but files with '.h' extension
ignore_func = lambda d, files: [f for f in files if isfile(join(d, f)) and f[-2:] != '.h']
copytree(src_dir, dest_dir, ignore=ignore_func)

Edit: as #pchiquet shows, it can be done in a single command. Yet I will show how this problem could be approached manually.
You're gonna need three things.
You know what directory you were traversing, so to construct the destination path, you need to replace the name of the source root directory with the name of de destination root directory:
walked_directory = 'D:\folder\build'
found_file = 'D:\other\subfolder\build\a.h'
destination_directory = 'D:\other\subfolder'
destination_file = found_file.replace('walked_directory', 'destination_directory')
Now that you have a source and a destination, first you need to make sure the destination exists:
os.makedirs(os.path.dirname(destination_file))
Once it exists, you can copy the file:
shutil.copyfile(found_file, destination_file)

This will recursively copy all the files with '.h' extension from current dir to dest_dir without creating subdirectories inside dest_dir:
import glob
import shutil
from pathlib import Path
# ignore any files but files with '.h' extension
for file in glob.glob('**/*.h', recursive=True):
shutil.copyfile(
Path(file).absolute(),
(Past(dest_dir)/Path(file).name).absolute()
)

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
for file in files:
# Check if file ends with .log, if so print file name
if file.endswith('.log')
print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?

Try this:
import os
files = os.listdir()
for file in files:
if file.endswith('.log'):
print(file)
The problem in your code is os.walk traverses the whole directory tree and not just your current directory. os.listdir returns a list of all filenames in a directory with the default being your current directory which is what you are looking for.
os.walk documentation
os.listdir documentation

By default, os.walk does a root-first traversal of the tree, so you know the first emitted data is the good stuff. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
# Check if file ends with .log, if so print file name
if file.endswith('.log')
print(file)
Its also common to use glob:
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
print(file)
This runs the risk that there is a directory that ends in ".log", so you could also add a testing using os.path.isfile(file).

Move files one by one to newly created directories for each file with Python 3

What I have is an initial directory with a file inside D:\BBS\file.x and multiple .txt files in the work directory D:\
What I am trying to do is to copy the folder BBS with its content and incrementing it's name by number, then copy/move each existing .txt file to the newly created directory to make it \BBS1, \BBS2, ..., BBSn (depends on number of the txt).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
Right now I have reached only creating of a new directory and moving txt in it but all at once, not as I would like to. Here's my code:
from pathlib import Path
from shutil import copy
import shutil
import os
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
count = 0
for content in src.iterdir():
addname = src.name.split('_')[0]
out_folder = wkDir.joinpath(f'!{addname}')
out_folder.mkdir(exist_ok=True)
out_path = out_folder.joinpath(content.name)
copy(content, out_path)
files = os.listdir(wkDir)
for f in files:
if f.endswith(".txt"):
shutil.move(f, out_folder)
I kindly request for assistance with incrementing and copying files one by one to the newly created directory for each as mentioned.
Not much skills with python in general. Python3 OS Windows
Thanks in advance

Now, I understand what you want to accomplish. I think you can do it quite easily by only iterating over the text files and for each one you copy the BBS folder. After that you move the file you are currently at. In order to get the folder_num, you may be able to just access the file name's characters at the particular indexes (e.g. f[4:6]) if the name is always of the pattern TextXX.txt. If the prefix "Text" may vary, it is more stable to use regular expressions like in the following sample.
Also, the function shutil.copytree copies a directory with its children.
import re
import shutil
from pathlib import Path
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
for f in os.listdir(wkDir):
if f.endswith(".txt"):
folder_num = re.findall(r"\d+", f)[0]
target = wkDir.joinpath(f"{src.name}{folder_num}")
# copy BBS
shutil.copytree(src, target)
# move .txt file
shutil.move(f, target)

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do, is to target the directory 942ba loop through all the sub directories and locate the file 000000 and rename it to 000000.csv
the current code I have is as follow:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
for f in files:
print(dirs)
if f == '000000':
dirs = dirs.strip(root)
f_new = f + '.csv'
os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working, because when I run my code, for some reasons the code strips the date from the subduers
can anyone help me to understand how to solve this issue?

A more efficient way to iterate through the folders and only select the files you are looking for is below:
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root,f)) for root,dirs,files in os.walk(source_folder) for f in files if '000000' in f and not f.endswith('.gz')]
for file in files:
os.rename(f, f"{f}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.

I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
os.rename(fly, '{}.csv'.format(fly))

Python: search multiple directories and grab newest file, deleting others

New to python and would appreciate a little help.
I would like to go through 10 directories and copy the newest file from each directory back into a single folder. There may be multiple files in each directory.
I can pull a complete listing from each directory, not sure how to narrow this down. Any direction would be appreciated.
inside the STATES directory will be directories for each state (i.e. CA, NY, FL, MI, GA)
**Edited if it is helpful, the directory structure looks like this:
'/dat/users/states/CA/'
'/dat/users/states/NY/'
'/dat/users/states/MI/'
import glob
import os
data_dir = '/dat/users/states/*/'
file_dir_extension = os.path.join(data_dir, '*.csv')
for file_name in glob.glob(file_dir_extension):
if file_name.endswith('.csv'):
print (file_name)

You can use os.walk() instead of glob.glob() to traverse all of your folders. For each folder you get a list of the filename in it. This can be sorted by date using os.path.getmtime(). This will result in the newest file being at the start of the list.
Pop the first element off the list and copy this to your target folder. The remaining elements in the list could then be deleted using os.remove() as follows:
import os
import shutil
root = r'/src/folder/'
copy_to = r'/copy to/folder'
for dirpath, dirnames, filenames in os.walk(root):
# Filter only csv files
files = [file for file in filenames if os.path.splitext(file)[1].lower() == '.csv']
# Sort list by file date
files = sorted(files, key=lambda x: os.path.getmtime(os.path.join(dirpath, x)), reverse=True)
if files:
# Copy the newest file
copy_me = files.pop(0)
print("Copying '{}'".format(copy_me))
shutil.copyfile(os.path.join(dirpath, copy_me), os.path.join(copy_to, copy_me))
# Remove the remaining files
for file in files:
src = os.path.join(dirpath, file)
print("Removing '{}'".format(src))
#os.remove(src)
os.path.join() is used to safely join a path and filename together.
Note: If it is supported on your system, you might need to use something like:
os.stat(os.path.join(dirpath, x)).st_birthtime
to sort based on the creation date/time.

Finding correct path to files in subfolders with os.walk with python?

I am trying to create a program that copies files with certain file extension to the given folder. When files are located in subfolders instead of the root folder the program fails to get correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use rootfolder as directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to give folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
# create regex to identify file extensions
mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
for dirpath, dirnames, filenames in os.walk(source_folder):
for i in filenames:
if mo.search(i).group(2) == extension:
file_path = os.path.abspath(i)
print("Copying from " + file_path + " to " + destination_folder)
shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)

dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in files is located in. You can use that to determine the subfolder you should be using.

file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames keeps list of base file names. At this point it's just a list of strings and (technically) they are not associated at all with files in filesystem.
os.path.abspath does string-only operations and attempts to merge file name with current working dir. As a result, merged filename points to file that does not exist.
What should be done is merge between root and base file name (both values yield from os.walk):
file_path = os.path.abspath(dirpath, i)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Copy all files with certain extension, while maintaining directory tree - python

Related

Python: Finding files in directory but ignoring folders and their contents

Move files one by one to newly created directories for each file with Python 3

Python loop through directories

Python: search multiple directories and grab newest file, deleting others

Finding correct path to files in subfolders with os.walk with python?

Categories

Resources