Python: How to copy specific files from one location to another and keep directory structure - python

Hi, I'm struggling with some Python code to copy specific files from one folder to another whilst keeping the directory structure.
I'm learning, so this code is put together from various snippets I've found. I couldn't find anything that exactly matched my circumstances, and I don't understand Python well enough yet to see where I've gone wrong.
def filtered_copy(src_dir, dest_dir, filter):
    print 'Copying files named ' + filter + ' in ' + src_dir + ' to ' + dest_dir
    ignore_func = lambda d, files: [f for f in files if isfile(join(d, f)) and f != filter]
    if os.path.exists(dest_dir):
        print 'deleting existing data'
        shutil.rmtree(dest_dir)
    copytree(src_dir, dest_dir, ignore=ignore_func)
Executing this code like this:
filtered_copy(r'c:\foldertosearch', r'c:\foldertocopyto', 'settings.xml')
does copy across the file I want but does not copy across the parent folder i.e. the src_dir, so the result I'm trying to achieve is:
c:\foldertocopyto\foldertosearch\settings.xml
*** Edit - to clarify this is a script that will be used on multiple operating systems
So if the folder structure was more complex i.e.
Parent folder
-subfolder
--subsubfolder
----subsubsubfolder
------settings.xml
and I ran
filtered_copy(subsubsubfolder, foldertocopyto, 'settings.xml')
I would want the new folder structure to be
foldertocopyto (there could be more parent folders above this or not)
--subsubsubfolder
----settings.xml
In other words the folder I search for a specific file in should also be copied across and if that folder already exists it should be deleted before the folder and file is copied across
I assumed copytree() would do this part - but obviously not!
*** end of Edit
*** Latest code changes
This works but I'm sure it's a long-winded way, also it copies blank folders, presumably because of the copytree() execution, I'd prefer just the folder that's being searched and the filtered file...
def filtered_copy(src_dir, dest_dir, filter):
    foldername = os.path.basename(os.path.normpath(src_dir))
    print 'Copying files named ' + filter + ' in ' + src_dir + ' to ' + dest_dir + '/' + foldername
    ignore_func = lambda d, files: [f for f in files if isfile(join(d, f)) and f != filter]
    if os.path.exists(dest_dir + '/' + foldername):
        print 'deleting existing data'
        shutil.rmtree(dest_dir)
    copytree(src_dir, dest_dir + '/' + foldername, ignore=ignore_func)
*** end of latest code changes

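You can use distutils.dir_util.copy_tree. It works just fine and you don't have to pass every argument; only src and dst are mandatory. (Note that distutils was deprecated in Python 3.10 and removed in Python 3.12.)
However, in your case you can't use a similar tool like shutil.copytree, because it behaves differently: by default the destination directory must not already exist, so it can't be used to overwrite the destination's contents. (Since Python 3.8, shutil.copytree accepts dirs_exist_ok=True, which lifts that restriction.)
Try the sample code below: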
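import os
import shutil

def recursive_overwrite(src, dest, ignore=None):
    if os.path.isdir(src):
        # create the destination folder if it is missing
        if not os.path.isdir(dest):
            os.makedirs(dest)
        files = os.listdir(src)
        # ignore(src, files) returns the names to skip, as in shutil.copytree
        if ignore is not None:
            ignored = ignore(src, files)
        else:
            ignored = set()
        for f in files:
            if f not in ignored:
                recursive_overwrite(os.path.join(src, f),
                                    os.path.join(dest, f),
                                    ignore)
    else:
        # src is a file: copy it over whatever is at dest
        shutil.copyfile(src, dest)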
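The question also asks for the searched folder itself to appear under the destination, and for empty folders to be left out. A minimal sketch of that, assuming Python 3 and using os.walk instead of copytree, might look like the following. The parameter name filename and the returned list of copied paths are illustrative choices, not part of the original code.

```python
import os
import shutil


def filtered_copy(src_dir, dest_dir, filename):
    """Copy every file called `filename` under src_dir into
    dest_dir/<name of src_dir>/..., keeping the relative layout."""
    src_dir = os.path.normpath(src_dir)
    target_root = os.path.join(dest_dir, os.path.basename(src_dir))

    # Start from a clean target folder, as the question asks.
    if os.path.exists(target_root):
        shutil.rmtree(target_root)

    copied = []
    for root, _dirs, files in os.walk(src_dir):
        if filename not in files:
            continue
        # Recreate only the folders that actually hold a match,
        # so no empty folders end up in the destination.
        rel = os.path.relpath(root, src_dir)
        target_dir = os.path.normpath(os.path.join(target_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, filename)
        shutil.copy2(os.path.join(root, filename), target)
        copied.append(target)
    return copied
```

Because every path is built with os.path.join, the same call should work on Windows and on POSIX systems. The sketch assumes dest_dir is not located inside src_dir; if no matching file is found, nothing is created.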

Related

Python code to merge multiple .wav files from multiple folders gets hung up

I have a bunch of wave files from an outdoor bird recorder that are broken up into 1-hour segments. Each day's worth of audio is in a single folder, and I have 30 days' worth of folders. I am trying to iterate through the folders, merge each day's audio into one file, and export it under the folder name. But each time I try to run it, either the print statements indicate that each for loop runs to completion before the merge function can be called, or it runs properly and the merge function throws a write error.
import wave
import os
#creates an empty object for the first folder name
rootfiles= ""
#sets the path for the starting location
path = "I:\SwiftOne_000"
#lists all folders in the directory "path"
dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")
#iterates through folders in path
for i in dir_list:
    #adds file name to original path
    rootfiles = ( path + "\\" + i)
    prefix = i
    # define outfiles for waves
    out_name = prefix
print("first loop completed")
for x in rootfiles:
    myfiles= []
    paths = rootfiles
    ext = (".wav")
    #print(paths)
    dir_lists = os.listdir(paths)
    #print(dir_lists)
    #print("Files and directories in '", paths, "' :")
print("second loop completed")
for x in dir_lists:
    myfiles.append( paths + "\\" + x)
    #print (myfiles)
    outfile= "D:\SwiftD\prefix" + prefix + ".wav"
    wav_files = myfiles
print("third loop completed")
from contextlib import closing
with closing(wave.open(outfile, 'wb')) as output:
    # find sample rate from first file
    with closing(wave.open(wav_files[0])) as w:
        output.setparams(w.getparams())
    # write each file to output
    for infile in wav_files:
        with closing(wave.open(infile)) as w:
            output.writeframes(w.readframes(w.getnframes()))
I think you want something like this, assuming your folder structure is:
- Swift (directory)
  - Day1 (directory)
    - File1
    - File2
    - File3
import os, wave
src = r'I:\SwiftOne_000'
output_folder = r'I:\OutputFolder'
input_data = {}
for d_name, d_path in [(d, path) for d in os.listdir(src) if os.path.isdir(path := os.path.join(src, d))]:
    input_data[d_name] = [path for f in os.listdir(d_path) if f.lower().endswith('.wav') and os.path.isfile(path := os.path.join(d_path, f))]
print(input_data)
for d_name, paths in input_data.items():
    with wave.open(os.path.join(output_folder, f'{d_name}.wav'), 'wb') as output:
        params_written = False
        for path in paths:
            with wave.open(path, 'rb') as data:
                if not params_written:
                    output.setparams(data.getparams())
                    params_written = True
                output.writeframes(data.readframes(data.getnframes()))
There are a few issues with your code. It's better to use os.path.join to concatenate paths rather than building the string yourself, as that keeps the code platform independent (although you probably don't care). os.listdir returns both files and folders, so you should check the type with os.path.isfile or os.path.isdir to be sure. File extensions aren't always lower case, so your extension check might not work; calling .lower() first means you can always check for .wav.
I'm pretty sure you don't need contextlib.closing, as wave.open already works as a context manager, so the with block takes care of closing for you.
You use the outfile variable to write to the file; however, you overwrite it on every pass of the third loop, so you will only ever get one file, corresponding to the last directory.
Without seeing the stack trace, I'm not sure what the write error is likely to be.
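The points above can be folded into two small functions. This is only a sketch of the same approach for Python versions without the := operator; the function names merge_wavs and merge_day_folders are made up for illustration, and it assumes all files in a folder share the same channels, sample width, and sample rate, and that sorting by file name gives the right playback order.

```python
import os
import wave


def merge_wavs(paths, out_path):
    """Concatenate the WAV files in `paths` (in order) into out_path."""
    with wave.open(out_path, "wb") as output:
        for index, path in enumerate(paths):
            with wave.open(path, "rb") as data:
                if index == 0:
                    # Take channels, sample width and rate from the first file.
                    output.setparams(data.getparams())
                output.writeframes(data.readframes(data.getnframes()))


def merge_day_folders(src, output_folder):
    """Write one <folder name>.wav per sub-folder of src."""
    os.makedirs(output_folder, exist_ok=True)
    for d_name in sorted(os.listdir(src)):
        d_path = os.path.join(src, d_name)
        if not os.path.isdir(d_path):
            continue  # os.listdir also returns plain files
        wavs = sorted(
            os.path.join(d_path, f)
            for f in os.listdir(d_path)
            if f.lower().endswith(".wav")  # matches .wav and .WAV
        )
        if wavs:  # skip folders without audio
            merge_wavs(wavs, os.path.join(output_folder, d_name + ".wav"))
```

Each folder gets its own output path inside the loop, so no result overwrites another. Reading a whole file with readframes keeps the sketch short; for very long recordings, reading in chunks would use less memory.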

How to Iterate over several directory levels and move files based on condition

I would like some help looping through some directories and subdirectories and extracting data. I have a directory with three levels, with the third level containing several .csv.gz files. The structure is like this
I need to access level 2 (where the subfolders are) of each folder and check for the existence of a specific folder (in my example, this will be subfolder 3; I left the other folders empty for this example, but in real cases they will have data). If the check returns True, I want to rename the files within the target subfolder3 and transfer all of them to another folder.
Below is my code. It is quite cumbersome and there are probably better ways of doing it. I tried using os.walk(), and this is the closest I got to a solution, but it won't move the files.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                if not files.startswith("."):
                    # this is to change the name of the file
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    subject_id = target
                    #make a new name
                    origin = subject_id + "/"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_name = os.path.join(destination_dir, origin)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_name)
Any suggestions on how to make this work and/or make it more efficient?
Simply enough, you can use the built-in os module: os.walk(path) yields the root directory, the subdirectories, and the files found at each level.
import os
for root, _, files in os.walk(path):
    # your code here
    pass
For your problem, do this:
import os
for root, dirs, files in os.walk(parent_directory):
    for file in files:
        # extract the data from the "file"
        pass
See the os.walk() documentation for more information.
If you want to get the name of the file, you can use os.path.basename(path).
You can even check for just the gzipped CSV files you're looking for using the built-in fnmatch module:
import fnmatch, os
def find_csv_files(path):
    result = []
    for root, _, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, "*.csv.gz"):  # match .csv.gz with a shell-style wildcard pattern
                result.append(os.path.join(root, name))
    return list(set(result))  # keep unique paths in case of duplicates
Ok, guys, I was finally able to find a solution. Here it is. Not the cleanest one, but it works in my case. Thanks for the help.
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                #this one because I have several .DS store files in the folder which I don't want to extract
                if not files.startswith("."):
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    origin = target + "/" + files
                    full_folder_name = origin.split("/")
                    #make a new name
                    new_name = full_folder_name[5] + "_"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_path = os.path.join(destination_dir, new_name)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_path)
I guess the problem was that I was passing a variable holding the renamed file (in my example, I wrongly called this variable origin) as the source path to shutil.move(). Since that path does not exist, the files weren't moved.
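The same idea can be written without splitting paths on "/" or relying on a fixed index such as full_folder_name[5]. The sketch below is one possible variant, not the poster's exact method: it assumes the prefix wanted for each file is the name of the folder directly above the target folder, and the returned list of new paths is added for illustration.

```python
import os
import shutil


def organizer(parent_dir, target_dir, destination_dir):
    """Move files from every folder named target_dir into destination_dir,
    prefixing each file with the name of the folder above target_dir."""
    os.makedirs(destination_dir, exist_ok=True)
    moved = []
    for root, _dirs, files in os.walk(parent_dir):
        if os.path.basename(root) != target_dir:
            continue
        # Assumption: the folder containing target_dir identifies the subject.
        subject_id = os.path.basename(os.path.dirname(root))
        for name in files:
            if name.startswith("."):  # skip .DS_Store and similar
                continue
            # The source is the existing path; only the destination is renamed.
            new_path = os.path.join(destination_dir, subject_id + "_" + name)
            shutil.move(os.path.join(root, name), new_path)
            moved.append(new_path)
    return moved
```

Keeping the source path (the file as it exists on disk) separate from the new name avoids the problem described above, where a path that did not exist was passed to shutil.move(). The sketch assumes destination_dir is outside parent_dir.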

How to Copy Sub Folders and files into New folder using Python

Very new to iterating over folder structures in python (and python!)
All the answers I've found on this site seem very confusing to implement to my situation. Hoping someone can assist.
I have a folder called Downloads. ( 1st level )
This folder is stored at "C:\Users\myusername\Desktop\downloads"
Within this folder I have the following subfolders. (2nd level)
Folder path example: C:\Users\myusername\Desktop\downloads\2020-03-13
2020-03-13
2020-03-13
2020-03-15... etc
Within each of these 2nd level folders I have another level of folders with pdf files.
So if I take 2020-03-13 it has a number of folders below: - 3rd level
22105853
22108288
22182889
Example path for third level:
C:\Users\myusername\Desktop\downloads\2020-03-13\22105853
All I am trying to do is create a new folder at the Downloads (1st)level and copy all the folders at the third level into it. Eliminating the second level structure basically.
Desired result.
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22105853
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22108288
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22182889
I started the code below and managed to recreate the file structure within a new file called Downloads: But stuck now and hoping someone can help me.
save_dir='C:\\Users\\myusername\\Desktop\\downloads\\'
localpath = os.path.join(save_dir, 'Repository')
if not os.path.exists(localpath):
    try:
        os.mkdir(localpath, mode=0o777)
        print('MAKE_DIR: ' + localpath)
    except OSError:
        print("directory error occurred")
for root, dirs, files in os.walk(save_dir):
    for dir in dirs:
        path = os.path.join(localpath, dir)
        if '-' not in path and not os.path.exists(path):
            #(Checking for '-' to not create folders at second level)
            os.mkdir(path, mode=0o777)
            print(path)
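This code snippet should work:
import os
from distutils.dir_util import copy_tree  # note: distutils was removed in Python 3.12
root_dir = 'path/to/your/rootdir'
new_dir = os.path.join(root_dir, 'dirname')
os.makedirs(new_dir, exist_ok=True)
for date_folder in os.listdir(root_dir):
    path = os.path.join(root_dir, date_folder)
    # skip plain files and the destination folder itself
    if not os.path.isdir(path) or path == new_dir:
        continue
    for folder_name in os.listdir(path):
        sub_path = os.path.join(path, folder_name)
        if os.path.isdir(sub_path):
            # copy each third-level folder into the new folder under its own name
            copy_tree(sub_path, os.path.join(new_dir, folder_name))
Just replace the directory names with the names you need.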
Using copy_tree is probably the best way to do it; however, I prefer to check whether there are stray files or folders in the wrong place and then create the folders and copy the files myself.
Here is another way to do that.
Be careful, though: if you create the Repository folder inside the root folder and then iterate over the root folder, listdir will also return the Repository folder.
import os
import shutil
def main_copy():
    save_dir='C:\\Users\\myusername\\Desktop\\downloads'
    localpath = os.path.join(save_dir, 'Repository')
    if not os.path.exists(localpath):
        try:
            os.mkdir(localpath, mode=0o777)
            print('MAKE_DIR: ' + localpath)
        except OSError:
            print("directory error occurred")
            return
    for first_level in os.listdir(save_dir):
        subffirstlevel = os.path.join(save_dir, first_level)
        # skip the repository folder
        if subffirstlevel == localpath: continue
        # skip any files
        if os.path.isfile(subffirstlevel): continue
        for folder_name in os.listdir(subffirstlevel):
            subf = os.path.join(subffirstlevel, folder_name)
            # skip any files
            if os.path.isfile(subf): continue
            newsubf = os.path.join(localpath, folder_name)
            if not os.path.exists(newsubf):
                try:
                    os.mkdir(newsubf, mode=0o777)
                    print('MAKE_DIR: ' + newsubf)
                except OSError:
                    print("directory error occurred")
                    continue
            for file_name in os.listdir(subf):
                filename = os.path.join(subf, file_name)
                if os.path.isfile(filename):
                    shutil.copy(filename, os.path.join(newsubf, file_name))
                    print("copy ", file_name)
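On Python 3.8 or later, the same flattening can be sketched more compactly with shutil.copytree and its dirs_exist_ok flag. The function name flatten_second_level and the default folder name are illustrative; unlike the version above, this copies each third-level folder with everything inside it, including nested subfolders.

```python
import os
import shutil


def flatten_second_level(save_dir, new_name="Repository"):
    """Copy every third-level folder of save_dir into save_dir/new_name."""
    localpath = os.path.join(save_dir, new_name)
    os.makedirs(localpath, exist_ok=True)
    for first_level in os.listdir(save_dir):
        date_dir = os.path.join(save_dir, first_level)
        # Skip stray files and the destination folder itself.
        if not os.path.isdir(date_dir) or date_dir == localpath:
            continue
        for folder_name in os.listdir(date_dir):
            subf = os.path.join(date_dir, folder_name)
            if os.path.isdir(subf):
                # dirs_exist_ok (Python 3.8+) merges into an existing folder.
                shutil.copytree(subf, os.path.join(localpath, folder_name),
                                dirs_exist_ok=True)
    return localpath
```

If two date folders contain a third-level folder with the same name, their contents are merged, and files with identical names are overwritten by whichever is copied last.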

why python is dropping errors after execution is being completed

I wrote a program that goes into a folder and checks whether it has subfolders, and whether those contain files. If a subfolder contains files, the program goes into it and deletes every file it finds, so that the folder is empty and ready to be removed. However, when there are many folders and files inside the directory, I get the following error after the program has run:
PermissionError: [WinError 5] Access is denied: 'FileLocation\\Folder'
If a directory name inside that directory has two or more words separated by spaces, the program runs but raises the following error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'FileLocation\\firstName secondName'
The code, which uses only the os module, is as follows:
import os
# parent folder
New = "C:\\Users\\HP\\Desktop\\New"
# check if the path exists
if os.path.exists(New):
    # looping over all files and directories inside the parent folder
    for files in os.listdir(New):
        # check if there is directories
        if os.path.isdir(New + '\\' + files):
            # check if the directories are empty and get ready to remove it
            if len(os.listdir(New + '\\' + files)) <= 0:
                os.rmdir(New + '\\' + files)
            # when directories are not empty
            else:
                # search for file inside the nested directories
                for sub_files in os.listdir(New + '\\' + files):
                    # remove all the files inside the nested directories
                    os.remove(New + '\\' + files + "\\" + sub_files)
                # remove directories after removing files inside it
                os.removedirs(New + '\\' + files)
        # check if there is files
        if os.path.isfile(New + '\\' + files):
            # removing the files inside the parent folder
            os.remove(New + '\\' + files)
    # removing the entire folder after deleting all files and folders inside it
    os.rmdir(New)
else:
    print('Folder doesn\'t exist')
When I write the program as follows, using the shutil and os modules, it works well without any logic or runtime errors:
import shutil
import os
# parent folder
New = "C:\\Users\\HP\\Desktop\\New"
if os.path.exists(New):
    shutil.rmtree(New)
else:
    print('Folder doesn\'t exist')
So I would like to know whether I need any further Python configuration to deal with these errors, whether there are bugs in my code to fix, or whether there is a better way to remove non-empty directories that raises no errors at runtime. Thanks.
I believe it's this part:
if len(os.listdir(New + '\\' + files)) <= 0:
    os.rmdir(New + '\\' + files)
# when directories are not empty
else:
    # search for file inside the nested directories
    for sub_files in os.listdir(New + '\\' + files):
        # remove all the files inside the nested directories
        os.remove(New + '\\' + files + "\\" + sub_files)
You have if len(os.listdir(New + '\\' + files)) <= 0: which means if the folder is empty. You also have else:, which is for non-empty folders. But, if in that folder, there is another folder that is also not empty, you cannot call os.remove(New + '\\' + files + "\\" + sub_files) on that, so it throws a PermissionError: [WinError 5] Access is denied error.
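If the goal is to stay with the os module alone, one way to handle folders nested to any depth is to walk the tree bottom-up, so that every folder is emptied before it is removed. The following is a minimal sketch of that; the function name remove_tree is made up, and it assumes the tree contains no symbolic links or read-only files.

```python
import os


def remove_tree(top):
    """Delete top and everything below it using only the os module."""
    # topdown=False yields the deepest folders first, so each sub-folder
    # has already been emptied by the time os.rmdir is called on it.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
    os.rmdir(top)
```

In practice shutil.rmtree, as used in the question's second version, does the same job and is usually the simpler choice.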
Right-click PyCharm or CMD, whichever you use to run the script, and select Run as administrator.

Python code returns "====== RESTART: <path> ======" when previously working properly, batch renaming folders and files with find and replace

Last week I wrote a simple code that would find "XXXX" in a folder's subfolders and files and replace it with a four digit project number (my office has template folders for new projects with many subfolders/files that all are prenamed with XXXX to be replaced with the project number, and I was tired of seeing many of these remaining unchanged throughout a project...)
Anyway, I used it on several project folders last week after writing it and tried running it again today for a new project folder, and the Shell returns
====== RESTART: ======
I hadn't modified the code nor its file location so am quite confused and related threads did not lead me to resolving this.
Here is the code:
import os
basedir = 'K:\Projects\1702 HANH Stanley Justice'
find = "XXXX"
replace = "1702"
for root, dirs, filenames in os.walk(basedir, topdown=False):
    dirs[:] = [d for d in dirs]
    for filename in filenames:
        filename_split = os.path.splitext(filename)
        filename_zero = filename_split[0]
        extension = filename_split[1]
        if find in filename_zero:
            path1 = os.path.join(root, filename)
            path2 = os.path.join(root, filename_zero.replace(find, replace) + extension)
            os.rename(path1, path2)
            print ("file: " + path1 + " renamed to: " + path2)
for root, dirs, filenames in os.walk(basedir, topdown=False):
    dirs[:] = [d for d in dirs]
    for thedir in dirs:
        if find in thedir:
            path1 = os.path.join(root, thedir)
            path2 = os.path.join(root, thedir.replace(find, replace))
            os.rename(path1, path2)
            print ("dir: " + path1 + " renamed to: " + path2)
Thank you in advance to whoever can advise me on how to fix this!
Haven't had enough coffee yet today and had copied and pasted the new project folder path without replacing the backslashes with forward slashes. It worked properly once I replaced them:
basedir = 'K:/Projects/1702 HANH Stanley Justice'
I'll leave this up for anyone else who experiences this caffeine deprived lapse in memory or for someone searching for a code that will find and replace a string in subfolder and file names (mine was created from combining similarly related codes I found on other threads here, was unable to find an exact one. I'm fairly new to Python so stackoverflow is very helpful!)
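For anyone wondering why the backslashes mattered: in an ordinary Python string literal, a backslash followed by up to three octal digits is an escape sequence, so \170 in the pasted path is read as the single character "x". The path then points at a folder that does not exist, and os.walk silently yields nothing for a directory it cannot list, which would explain why the script printed nothing. The sketch below illustrates this; the shortened path and the variable names are only for demonstration.

```python
import os

# The first backslash is doubled here only to avoid a warning about "\P";
# "\170" is the part that silently changes the path.
plain = 'K:\\Projects\1702 HANH'
raw = r'K:\Projects\1702 HANH'      # raw string keeps every backslash
forward = 'K:/Projects/1702 HANH'   # forward slashes need no escaping

print(repr(plain))  # shows "x2" where "\1702" was intended
print(repr(raw))

# os.walk ignores the error raised for a missing directory by default,
# so a mistyped path produces no iterations and no message.
# (Assumes no folder with this odd name exists in the working directory.)
print(list(os.walk(plain)))
```

Either a raw string (r'...') or forward slashes avoids the problem; forward slashes are accepted in paths on Windows as well.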
