removing substrings from subdirectory names using values held in list - python

I have a parent directory that contains a lot of subdirectories. I want to create a script that loops through all of the subdirectories and removes any key words that I have specified in the list variable.
I am not entirely sure how to achieve this.
Currently I have this:
import os
directory = next(os.walk('.'))[1]
stringstoremove = ['string1','string2','string3','string4','string5']
for folders in directory:
    os.rename
And maybe this type of logic to check whether the string exists within the subdirectory name:
if any(words in inputstring for words in stringstoremove):
    print ("TRUE")
else:
    print ("FALSE")
I'm trying my best to deconstruct the task, but I'm going round in circles now.
Thanks guys

Starting from your existing code:
import os
directory = next(os.walk('.'))[1]
stringstoremove = ['string1','string2','string3','string4','string5']
for folder in directory:
    new_folder = folder
    for r in stringstoremove:
        new_folder = new_folder.replace(r, '')
    if folder != new_folder:  # don't rename if it's the same
        os.rename(folder, new_folder)
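A slightly more defensive variant of the same loop, as a sketch only. The function name strip_substrings_from_dirs is made up for illustration, and skipping a rename when the cleaned name is empty or already taken is an assumption about what you would want, not something the question specifies.

```python
import os


def strip_substrings_from_dirs(parent, strings_to_remove):
    """Rename the immediate subdirectories of `parent`, removing each listed substring.

    Returns a list of (old_name, new_name) pairs for the renames performed.
    """
    renamed = []
    # next(os.walk(parent))[1] is the list of directories directly under parent
    for folder in next(os.walk(parent))[1]:
        new_folder = folder
        for s in strings_to_remove:
            new_folder = new_folder.replace(s, '')
        if new_folder == folder:
            continue  # nothing to strip
        target = os.path.join(parent, new_folder)
        # Assumption: skip rather than overwrite when the result is empty or taken
        if not new_folder or os.path.exists(target):
            print('skipping {!r}: new name {!r} is empty or already exists'.format(folder, new_folder))
            continue
        os.rename(os.path.join(parent, folder), target)
        renamed.append((folder, new_folder))
    return renamed
```

Calling strip_substrings_from_dirs('.', stringstoremove) would behave like the loop above for the current directory, except that it reports name clashes instead of raising.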

If you want to rename the subdirectories whose names contain any of the strings in your stringstoremove list, the following script will help.
import os
import re
path = "./"  # parent directory path
sub_dirs = os.listdir(path)
stringstoremove = ['string1','string2','string3','string4','string5']
for directory_name in sub_dirs:
    if os.path.isdir(os.path.join(path, directory_name)):
        new_name = directory_name
        for string in stringstoremove:
            if string in new_name:
                # re.escape so the string is matched literally, not as a regex pattern
                new_name = re.sub(re.escape(string), "", new_name)
        if new_name != directory_name:
            try:
                os.rename(os.path.join(path, directory_name), os.path.join(path, new_name))  # rename this directory
            except OSError as e:
                print(e)
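If the removal is done with the re module, the strings need escaping, because characters such as "." or "(" have a special meaning in a pattern. A small sketch of one way to do this in a single pass; the helper name remove_all is hypothetical:

```python
import re


def remove_all(name, strings_to_remove):
    """Return `name` with every listed substring removed (matched literally)."""
    # Build one alternation pattern out of the escaped strings
    pattern = re.compile('|'.join(re.escape(s) for s in strings_to_remove))
    return pattern.sub('', name)
```

Without re.escape, a string like 'a.b' would also match 'axb', which is probably not what you want when cleaning folder names.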

Related

How to Iterate over several directory levels and move files based on condition

I would like some help to loop through some directories and subdirectories and extracting data. I have a directory with three levels, with the third level containing several .csv.gz files. The structure is like this
I need to access level 2 (where subfolders are) of each folder and check the existence of a specific folder (in my example, this will be subfolder 3; I left the other folders empty for this example, but in real cases they will have data). If checking returns True, then I want to change the name of files within the target subfolder3 and transfer all files to another folder.
Below is my code. It is quite cumbersome and there are probably better ways of doing it. I tried using os.walk() and this is the closest I got to a solution, but it won't move the files.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                if not files.startswith("."):
                    # this is to change the name of the file
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    subject_id = target
                    #make a new name
                    origin = subject_id + "/"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_name = os.path.join(destination_dir, origin)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_name)
Any suggestions on how to make this work and/or make it more efficient?
Simply enough, you can use the built-in os module: os.walk(path) gives you, for each directory it visits, the directory path (root), its subdirectories, and the files found in it.
import os
for root, _, files in os.walk(path):
    pass  # your code here
For your problem, do this:
import os
for root, dirs, files in os.walk(parent_directory):
    for file in files:
        pass  # extract the data from the "file"
See the os.walk() documentation for more information.
If you want to get just the name of a file, you can use os.path.basename(path).
You can even check for just the gzipped CSV files you're looking for using the built-in fnmatch module:
import fnmatch, os
def find_csv_files(path):
    result = []
    for root, _, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, "*.csv.gz"):  # match .csv.gz using shell-style wildcards (not regex)
                result.append(os.path.join(root, name))
    return list(set(result))  # to get the unique paths if for some reason duplicated
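The same search can also be written with pathlib, which does the recursion for you. This is only a sketch: the function name find_csv_gz is made up, and skipping names that start with "." is an assumption based on the hidden-file check in the question.

```python
from pathlib import Path


def find_csv_gz(parent_dir):
    """Return all *.csv.gz files below parent_dir, sorted, skipping hidden files."""
    return sorted(
        p for p in Path(parent_dir).rglob('*.csv.gz')
        # pathlib patterns also match dot-files, so filter them explicitly
        if p.is_file() and not p.name.startswith('.')
    )
```

Each result is a Path object, so p.name gives the file name and p.parent the folder it lives in.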
Ok, guys, I was finally able to find a solution. Here it is. Not the cleanest one, but it works in my case. Thanks for the help.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                # this check is here because I have several .DS_Store files in the folder which I don't want to extract
                if not files.startswith("."):
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    origin = target + "/" + files
                    full_folder_name = origin.split("/")
                    #make a new name
                    new_name = full_folder_name[5] + "_"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_path = os.path.join(destination_dir, new_name)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_path)
I guess the problem was that I was passing a path built from the renamed file (which I wrongly called origin in my example) as the source path to shutil.move(). Since that path does not exist, the files weren't moved.
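The hard-coded index in full_folder_name[5] depends on how deep parent_dir sits on disk. A sketch of one way around that, assuming the prefix you want is the name of the top-level folder under parent_dir (the question does not say so explicitly); the function name move_from_target_dirs is made up for illustration:

```python
import os
import shutil


def move_from_target_dirs(parent_dir, target_dir, destination_dir):
    """Move files found in any folder named target_dir into destination_dir.

    Each file name is prefixed with the top-level folder under parent_dir
    that it came from (assumed here to be the subject ID).
    """
    os.makedirs(destination_dir, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(parent_dir):
        if os.path.basename(root) != target_dir:
            continue
        # First path component relative to parent_dir, e.g. "folder1"
        subject_id = os.path.relpath(root, parent_dir).split(os.sep)[0]
        for name in files:
            if name.startswith('.'):
                continue  # skip hidden files such as .DS_Store
            new_path = os.path.join(destination_dir, subject_id + '_' + name)
            # The source is the file's real, existing path; only the target is renamed
            shutil.move(os.path.join(root, name), new_path)
            moved.append(new_path)
    return moved
```

The important part is the same as in the fix above: the first argument to shutil.move() has to be the path where the file currently is.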

How to Copy Sub Folders and files into New folder using Python

Very new to iterating over folder structures in python (and python!)
All the answers I've found on this site seem very confusing to implement to my situation. Hoping someone can assist.
I have a folder called Downloads. ( 1st level )
This folder is stored at "C:\Users\myusername\Desktop\downloads"
Within this folder I have the following subfolders. (2nd level)
Folder path example: C:\Users\myusername\Desktop\downloads\2020-03-13
2020-03-13
2020-03-13
2020-03-15... etc
Within each of these 2nd level folders I have another level of folders with pdf files.
So if I take 2020-03-13 it has a number of folders below: - 3rd level
22105853
22108288
22182889
Example path for third level:
C:\Users\myusername\Desktop\downloads\2020-03-13\22105853
All I am trying to do is create a new folder at the Downloads (1st)level and copy all the folders at the third level into it. Eliminating the second level structure basically.
Desired result.
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22105853
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22108288
C:\Users\myusername\Desktop\r3\downloads\NEWFOLDER\22182889
I started the code below and managed to recreate the file structure within a new file called Downloads, but I am stuck now and hoping someone can help me.
import os
save_dir='C:\\Users\\myusername\\Desktop\\downloads\\'
localpath = os.path.join(save_dir, 'Repository')
if not os.path.exists(localpath):
    try:
        os.mkdir(localpath, mode=0o777)  # mode is octal; a plain 777 would be read as decimal
        print('MAKE_DIR: ' + localpath)
    except OSError:
        print("directory error occurred")
for root, dirs, files in os.walk(save_dir):
    for dir in dirs:
        path = os.path.join(localpath, dir)
        if '-' not in path and not os.path.exists(path):
            #(Checking for '-' to not create folders at second level)
            os.mkdir(path, mode=0o777)
            print(path)
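This code snippet should work:
import os
from distutils.dir_util import copy_tree  # distutils was removed in Python 3.12; shutil.copytree(..., dirs_exist_ok=True) is the replacement there
root_dir = 'path/to/your/rootdir'
new_dir = os.path.join(root_dir, 'dirname')
os.makedirs(new_dir, exist_ok=True)
for date_folder in os.listdir(root_dir):
    path = os.path.join(root_dir, date_folder)
    if path == new_dir or not os.path.isdir(path):
        continue  # skip the destination folder and loose files
    for folder_name in os.listdir(path):
        # copy each third-level folder into new_dir under its own name
        copy_tree(os.path.join(path, folder_name), os.path.join(new_dir, folder_name))
Just replace the directory names with the names you need.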
Using copy_tree is probably the best way to do it. However, I prefer to check whether there are stray files or folders in the wrong place and then create the folders and copy the files myself.
Here is another way to do that.
Be careful, though: if you create the Repository folder inside the root folder and then iterate over the root folder, os.listdir will also return the Repository folder.
import os
import shutil
def main_copy():
    save_dir='C:\\Users\\myusername\\Desktop\\downloads'
    localpath = os.path.join(save_dir, 'Repository')
    if not os.path.exists(localpath):
        try:
            os.mkdir(localpath, mode=0o777)  # mode is octal
            print('MAKE_DIR: ' + localpath)
        except OSError:
            print("directory error occurred")
            return
    for first_level in os.listdir(save_dir):
        subffirstlevel = os.path.join(save_dir, first_level)
        # skip repository folder
        if subffirstlevel == localpath: continue
        # skip any files
        if os.path.isfile(subffirstlevel): continue
        for folder_name in os.listdir(subffirstlevel):
            subf = os.path.join(subffirstlevel, folder_name)
            # skip any files
            if os.path.isfile(subf): continue
            newsubf = os.path.join(localpath, folder_name)
            if not os.path.exists(newsubf):
                try:
                    os.mkdir(newsubf, mode=0o777)
                    print('MAKE_DIR: ' + newsubf)
                except OSError:
                    print("directory error occurred")
                    continue
            for file_name in os.listdir(subf):
                filename = os.path.join(subf, file_name)
                if os.path.isfile(filename):
                    shutil.copy(filename, os.path.join(newsubf, file_name))
                    print("copy ", file_name)
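For comparison, here is a shorter sketch of the same flattening step built on shutil.copytree. It assumes Python 3.8 or newer (for the dirs_exist_ok argument), and the function name flatten_second_level and its parameters are made up for illustration.

```python
import os
import shutil


def flatten_second_level(save_dir, new_folder_name):
    """Copy every third-level folder under save_dir into save_dir/new_folder_name."""
    dest_root = os.path.join(save_dir, new_folder_name)
    os.makedirs(dest_root, exist_ok=True)
    copied = []
    for second in sorted(os.listdir(save_dir)):
        second_path = os.path.join(save_dir, second)
        # Skip loose files and the destination folder itself
        if not os.path.isdir(second_path) or second == new_folder_name:
            continue
        for third in sorted(os.listdir(second_path)):
            third_path = os.path.join(second_path, third)
            if os.path.isdir(third_path):
                # dirs_exist_ok merges into a folder that is already there (Python 3.8+)
                shutil.copytree(third_path, os.path.join(dest_root, third), dirs_exist_ok=True)
                copied.append(third)
    return copied
```

With the layout from the question, flatten_second_level(save_dir, 'NEWFOLDER') would leave the dated folders untouched and put copies of 22105853, 22108288 and so on directly under NEWFOLDER.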

python how to collect a specific file from a list of folders and save

I have many folders in a master folder as given below. Each folder contains a .JPG file. I would like to extract all these files and store them in this master folder.
Inside each folder
My present code:
import os
import glob
os.chdir('Master folder')
extension = 'JPG'
jpg_files= [i for i in glob.glob('*.{}'.format(extension))]
This did not work.
To find the images in your tree, I would use os.walk. Below is a complete example of a 'find and move' function that moves all matching files to the given path and creates a new filename when a name is already taken.
The function uses add_index_to_filepath to check whether the target file already exists and, if it does, adds an index (n) to the name. For example, if image.jpg already exists, the next one becomes image (1).jpg, the following one image (2).jpg, and so on.
import os
import re
import shutil
def add_index_to_filepath(path):
    '''
    Check if a file exists, and append '(n)' if true.
    '''
    # If the path exists, adjust it
    if os.path.exists(path):
        # pull apart your path and filenames
        folder, file = os.path.split(path)
        filename, extension = os.path.splitext(file)
        # discover the current index, and correct filename
        try:
            regex = re.compile(r'\(([0-9]+)\)$')
            findex = regex.findall(filename)[0]
            filename = regex.sub('({})'.format(int(findex) + 1), filename)
        except IndexError:
            filename = filename + ' (1)'
        # Glue your path back together.
        new_path = os.path.join(folder, '{}{}'.format(filename, extension))
        # Recursively call the function to keep verifying whether it exists.
        return add_index_to_filepath(new_path)
    return path
def find_and_move_files(path, extension_list):
    '''
    Walk through a given path and move the files from the sub-dirs to the path.
    The case of the extension is ignored. Duplicates get a new filename.
    '''
    files_moved = []
    # First walk through the path, to list all files.
    for root, dirs, files in os.walk(path, topdown=False):
        if os.path.abspath(root) == os.path.abspath(path):
            continue  # files already in the master folder stay where they are
        for file in files:
            # Is your extension wanted?
            extension = os.path.splitext(file)[-1].lower()
            if extension in extension_list:
                # Prepare your old and new path, and move
                old_path = os.path.join(root, file)
                new_path = add_index_to_filepath(os.path.join(path, file))
                shutil.move(old_path, new_path)
                # Let's keep track of what we moved to return it in the end
                files_moved.append(new_path)
    return files_moved
path = '.'  # your filepath for the master-folder
extensions = ['.jpg', '.jpeg']  # There are some variations of a jpeg-file extension.
found_files = find_and_move_files(path, extensions)
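As for why the attempt in the question found nothing: glob.glob('*.JPG') only looks in the current directory, not in the folders below it. If you want to stay with glob, a recursive pattern works too. A minimal sketch; the function name find_jpgs is made up, and the character classes are there so that both .JPG and .jpg match on case-sensitive file systems:

```python
import glob
import os


def find_jpgs(master_folder):
    """Return every .jpg/.JPG file in master_folder and all folders below it."""
    # glob.escape protects any special characters in the folder name itself
    pattern = os.path.join(glob.escape(master_folder), '**', '*.[jJ][pP][gG]')
    # '**' only descends into subfolders when recursive=True is passed
    return sorted(glob.glob(pattern, recursive=True))
```

The returned paths can then be handed to shutil.move() one by one.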

Python - how to change directory

I am doing a school assignment where I have to take input from a user and save it to a text file.
My file structure will be something like:
- Customer register
- Customer ID
- .txt files 1-5
It can be saved in the python folder and I can make the folders like this:
os.makedirs("Customer register/Customer ID")
My question is, how do I set the path the text files are to be stored in, in the directory when I don't know the directory? So that no matter where the program is run it is saved in the "Customer ID" folder I create (but on the computer the program is run on)?
Also, how do I make this work on both windows and mac?
I also want the program to be able to be executed several times, check whether the folder is there, and save to the "Customer ID" folder if it already exists. Is there a way to do that?
EDIT:
This is the code I am trying to use:
try:
    dirs = os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print (os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()
        unique_filename = str(uuid.uuid4())
        customerpath = os.getcwd()
        os.chdir(customerpath + "/Folder/" + unique_filename)
I am able to create a folder and change the directory (everything in "try" works as I want).
When this folder is created I want to create a second folder with a random generated folder name (used for saving customer files). I can't get this to work in the same way.
Error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\Users\48736\PycharmProjects\tina/Folder/979b9026-b2f6-4526-a17a-3b53384f60c4'
EDIT 2:
try:
    os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print (os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()
        os.chdir(os.path.join(path, 'Folder'))
        print(os.getcwd())
def userId(folderid):
    try:
        if not os.path.exists(folderid):
            os.makedirs(folderid)
    except:
        if os.path.exists(folderid):
            os.chdir(path + "/Folder/" + folderid)
userId(str(uuid.uuid4()))
print(os.getcwd())
So I can now create a folder, change directory to the folder I have created and create a new folder with a unique filename within that folder.
But I can't change the directory again to the folder with the unique filename.
Any suggestions?
I have tried:
os.chdir(path + "/Folder/" + folderid)
os.chdir(path, 'Folder', folderid)
os.chdir(os.path.join(path, 'Folder', folderid))
But it still just stays in: C:\Users\47896\PycharmProjects\tina\Folder
You can use relative paths in your create-directory command, e.g.
os.makedirs("./Customer register/Customer ID")
to create the folder in the current working directory (the directory the program is run from), or
os.makedirs("../Customer register/Customer ID")
to create it in the parent directory.
You can, of course, traverse the file tree as you need.
For the specific options mentioned in your question, please see the makedirs documentation in the Python 3 docs.
Here is a solution:
import os
import shutil
import uuid
path_on_system = os.getcwd()  # directory where you want to save data
path = r'Folder'  # your working directory
dir_path = os.path.join(path_on_system, path)
if not os.path.exists(dir_path):
    os.makedirs(dir_path)
file_name = str(uuid.uuid4())  # file which you have created
if os.path.exists(file_name) and os.path.exists(dir_path):
    shutil.move(file_name, os.path.join(dir_path, file_name))
else:
    print(" {} does not exist".format(file_name))
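A side note on EDIT 2 in the question: the os.chdir call in userId appears to sit in the except branch, so it only runs when os.makedirs raises, which would explain why the working directory never changes. It may be simpler to avoid os.chdir altogether and build full paths instead. A minimal sketch, assuming Python 3 with pathlib; the function name save_customer_file and the base_dir parameter are made up for illustration:

```python
import uuid
from pathlib import Path


def save_customer_file(base_dir, filename, text, customer_id=None):
    """Write text to base_dir/Customer register/<customer_id>/filename."""
    if customer_id is None:
        customer_id = str(uuid.uuid4())  # random folder name for a new customer
    folder = Path(base_dir) / 'Customer register' / customer_id
    # exist_ok=True makes repeated runs safe when the folder is already there
    folder.mkdir(parents=True, exist_ok=True)
    file_path = folder / filename
    file_path.write_text(text, encoding='utf-8')
    return file_path
```

For base_dir you could pass Path.home() or Path(__file__).resolve().parent (the folder the script is in). pathlib picks the right separator on both Windows and macOS, so no "/" needs to be hard-coded.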

how to get a folder name and file name in python

I have a python program named myscript.py which would give me the list of files and folders in the path provided.
import os
import sys
def get_files_in_directory(path):
    for root, dirs, files in os.walk(path):
        print(root)
        print(dirs)
        print(files)
path=sys.argv[1]
get_files_in_directory(path)
The path I provided is D:\Python\Test, and it contains some folders and subfolders, as you can see in the output below:
C:\Python34>python myscript.py "D:\Python\Test"
D:\Python\Test
['D1', 'D2']
[]
D:\Python\Test\D1
['SD1', 'SD2', 'SD3']
[]
D:\Python\Test\D1\SD1
[]
['f1.bat', 'f2.bat', 'f3.bat']
D:\Python\Test\D1\SD2
[]
['f1.bat']
D:\Python\Test\D1\SD3
[]
['f1.bat', 'f2.bat']
D:\Python\Test\D2
['SD1', 'SD2']
[]
D:\Python\Test\D2\SD1
[]
['f1.bat', 'f2.bat']
D:\Python\Test\D2\SD2
[]
['f1.bat']
I need to get the output this way :
D1-SD1-f1.bat
D1-SD1-f2.bat
D1-SD1-f3.bat
D1-SD2-f1.bat
D1-SD3-f1.bat
D1-SD3-f2.bat
D2-SD1-f1.bat
D2-SD1-f2.bat
D2-SD2-f1.bat
How do I get the output this way? (Keep in mind that the directory structure here is just an example. The program should be flexible enough to work for any path.)
Is there any os function for this? Can you please help me solve this? (Additional information: I am using Python 3.4.)
You could try using the glob module instead (note the raw strings, so the backslashes are not treated as escapes):
import glob
glob.glob(r'D:\Python\Test\*\*\*.bat')
Or, to get just the filenames:
import os
import glob
[os.path.basename(x) for x in glob.glob(r'D:\Python\Test\*\*\*.bat')]
To get what you want, you could do the following:
import os
import sys
def get_files_in_directory(path):
    # Get the root dir (in your case: test)
    rootDir = path.split('\\')[-1]
    # Walk through all subfolder/files
    for root, subfolder, fileList in os.walk(path):
        for file in fileList:
            # Skip empty dirs
            if file != '':
                # Get the full path of the file
                fullPath = os.path.join(root,file)
                # Split the path and the file (may do this and the step above in one go)
                path, file = os.path.split(fullPath)
                # For each subfolder in the path (in REVERSE order)
                subfolders = []
                for subfolder in path.split('\\')[::-1]:
                    # As long as it isn't the root dir, append it to the subfolders list
                    if subfolder == rootDir:
                        break
                    subfolders.append(subfolder)
                # Print the list of subfolders (joined by '-')
                # + '-' + file
                # NOTE: the innermost folder comes first; reverse subfolders to get D1-SD1-...
                print('{}-{}'.format( '-'.join(subfolders), file) )
path=sys.argv[1]
get_files_in_directory(path)
Output for my test folder (the subfolders are listed innermost first):
SD1-D1-f1.bat
SD1-D1-f2.bat
SD2-D1-f1.bat
SD3-D1-f1.bat
SD3-D1-f2.bat
It may not be the best way to do it, but it will get you what you want.
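If you need the outermost folder first (D1-SD1-f1.bat, as in the question) and do not want to depend on the Windows separator, os.path.relpath can do the path arithmetic. A sketch only; the function name list_files_with_folders is made up, and the sorting is there just to make the order predictable:

```python
import os


def list_files_with_folders(path):
    """Return 'D1-SD1-f1.bat'-style names for every file below path."""
    names = []
    for root, dirs, files in os.walk(path):
        dirs.sort()  # sorting in place makes os.walk visit folders in a fixed order
        # Folder path relative to the starting directory, e.g. 'D1\\SD1' or '.'
        rel = os.path.relpath(root, path)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        for name in sorted(files):
            names.append('-'.join(parts + [name]))
    return names
```

Printing each entry of list_files_with_folders(sys.argv[1]) on its own line gives the layout asked for, and it works for any depth of nesting.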
