I have a folder structure that looks something like this:
/Forecasting/as_of_date=20220201/type=full/export_country=Spain/import_country=France/000.parquet
and there are approx 2500 such structures.
I am trying to rename only certain subfolders, mainly export_country as exp_cty and import_country as imp_cty.
So far I tried this but it doesn't seem to work. I never had to deal with such complex folder structures and I'm a bit unsure how to go about it. My script is below:
import os
from pathlib import Path

path = r"/datasets/local/Trade_Data"  # Path to directory that will be searched
old = "import_country*"
new = "imp_cty"

for root, dirs, files in os.walk(path, topdown=False):  # os.walk will return all files and folders in the directory
    for name in dirs:  # only concerned with dirs since I only want to change subfolders
        directoryPath = os.path.join(root, name)  # create a path to subfolder
        if old in directoryPath:  # if the 'export_country' is found in my path then
            parentDirectory = Path(directoryPath).parent  # save the parent directory path
            os.chdir(parentDirectory)  # set parent to working directory
            os.rename(old, new)
The code you have proposed has two issues.
First, if old in directoryPath: checks whether the literal string import_country* occurs in the path. The asterisk is not treated as a wildcard here, so the test never matches.
From your question I understand that you want to rename every directory whose name starts with "import_country",
so you can use str.startswith for that.
Second, os.rename(old, new) tries to rename a directory literally named import_country*, which doesn't exist. Use the name variable instead.
Here is your code with slight changes. Note that you must use topdown=False, because you are renaming directories while walking through them:
import os

path = "/datasets/local/Trade_Data"
old_prefix = "import_country"
new_prefix = "imp_cty"

# topdown=False walks bottom-up, so a rename never breaks a path that is still to be visited
for root, dirs, files in os.walk(path, topdown=False):
    for name in dirs:
        if name.startswith(old_prefix):
            # keep whatever follows the prefix (e.g. "=France") so sibling folders don't collide
            new_name = new_prefix + name[len(old_prefix):]
            os.rename(os.path.join(root, name), os.path.join(root, new_name))
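Since the question asks for both export_country and import_country to be renamed, the same loop can handle several prefixes in one pass. The following is only a sketch: the function name rename_partition_dirs and the RENAMES mapping are made up for illustration, and it assumes the part of the folder name after the prefix (such as "=France") should be kept.

```python
import os

# Hypothetical mapping of old prefix -> new prefix; adjust to your layout.
RENAMES = {"export_country": "exp_cty", "import_country": "imp_cty"}


def rename_partition_dirs(base, renames=RENAMES):
    """Rename directories under `base` whose names start with a key of `renames`.

    Returns a list of (old_path, new_path) pairs for the renames performed.
    """
    renamed = []
    # topdown=False visits children before their parents, so a rename never
    # invalidates a path that os.walk still has to visit.
    for root, dirs, _files in os.walk(base, topdown=False):
        for name in dirs:
            for old, new in renames.items():
                if name.startswith(old):
                    src = os.path.join(root, name)
                    # Keep whatever follows the prefix, e.g. "=France".
                    dst = os.path.join(root, new + name[len(old):])
                    os.rename(src, dst)
                    renamed.append((src, dst))
                    break
    return renamed
```

Calling rename_partition_dirs("/datasets/local/Trade_Data") would then rename both levels. It is probably worth trying it on a copy of a few folders first, since the renames are not undone if something fails halfway.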
I want to add the names of all the files in a specific folder to a list. How can I do that? The path is Dropbox -> a folder called 'UMM' -> a folder called '2018'. Could someone help me with the code for this? I have tried using os.walk() but it doesn't seem to work.
You can use os.walk and append only names which are in files.
from os import walk

file_names = []
path = 'path/of/folder'

for root, dirc, files in walk(path):
    for FileName in files:
        file_names.append(FileName)

print(file_names)
This appends the names of all files in every directory and sub-directory under the specified path.
This will create a list of the names in a folder (os.listdir returns files and subfolders alike):
from os import listdir
# the path
path = ''
fileList = listdir(path)
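If only the files directly inside the folder are wanted, without subfolder names and without descending into subfolders, os.scandir can filter the entries. This is a small sketch; the function name list_files is an assumption made for the example.

```python
import os


def list_files(folder):
    """Return the sorted names of regular files directly inside `folder`."""
    with os.scandir(folder) as entries:
        # entry.is_file() is False for subdirectories, so they are skipped.
        return sorted(entry.name for entry in entries if entry.is_file())
```

For the layout in the question, the argument would be the path to the '2018' folder, wherever Dropbox is located on your machine.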
I have a path like this:
/my/path/to/important_folder
At the same level I have other files and folders, which I want to list once I reach important_folder.
My folder could be nested deeper, so I need to traverse the folders until I find important_folder. Once it is found, I want to list all files and folders at that level.
How can I achieve this?
With os.walk, you can do this:
import os

for path, dirs, files in os.walk('/my/path'):
    # path is the full directory path, so compare only its last component
    if os.path.basename(path) == 'important_folder':
        for name in dirs + files:
            print(os.path.join(path, name))
Or you can use glob.iglob with recursive=True:
import glob

for name in glob.iglob('/my/path/**/important_folder/*', recursive=True):
    print(name)
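Both snippets above list what is inside important_folder. If the question is read as asking for the entries that sit next to important_folder, one possible variant is to look for the directory whose subfolder list contains it. This is a sketch under that reading; list_siblings is a made-up name, and the result includes important_folder itself.

```python
import os


def list_siblings(top, target="important_folder"):
    """Return sorted paths of everything in the first directory containing `target`."""
    for path, dirs, files in os.walk(top):
        if target in dirs:
            # dirs and files are the entries at the same level as `target`.
            return sorted(os.path.join(path, name) for name in dirs + files)
    return []
```

The function stops at the first match, so if several folders named important_folder exist under top, only the first one found is reported.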
I have a question about copying a folder structure. I need to convert pdf files to text files. The pdf files are read from a folder structure like this:
D:/f/subfolder1/subfolder2/a.pdf
I would like to create exactly the same folder structure under "D:/g/subfolder1/subfolder2/", but without the pdf file, since that is where the converted text file should go. So after the conversion function runs, it should give me
D:/g/subfolder1/subfolder2/a.txt
I would also like to add an if check to make sure the same folder structure does not already exist under "D:/g/" before creating it.
Here is my current code. So how can I create the same folder structure without the file?
Thank you!
import converter as c
import os

inputpath = 'D:/f/'
outputpath = 'D:/g/'

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
The following works fine for me:
1. Iterate over the existing folders.
2. Build the path of each new folder from the existing one.
3. Check whether the new folder already exists.
4. If it does not, create the new folder (without any files).
Code:
import os

inputpath = 'D:/f/'
outputpath = 'D:/g/'

for dirpath, dirnames, filenames in os.walk(inputpath):
    # strip the input prefix and re-root the remainder under outputpath
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    else:
        print("Folder already exists!")
Documentation:
os.walk
os.mkdir
os.path.isdir
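A possible variant of the same idea uses os.path.relpath instead of slicing the string, so it does not depend on whether the input path ends with a slash, and os.makedirs with exist_ok=True, which makes the explicit existence check unnecessary. This is only a sketch; mirror_tree is a name invented for the example.

```python
import os


def mirror_tree(src, dst):
    """Recreate the directory structure of `src` under `dst`, without files."""
    for dirpath, _dirnames, _filenames in os.walk(src):
        # Path of the current directory relative to the source root ("." for the root).
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        # exist_ok=True: no error if the folder is already there.
        os.makedirs(target, exist_ok=True)
```

With the paths from the question this would be called as mirror_tree('D:/f/', 'D:/g/'). Unlike os.mkdir, os.makedirs also creates missing intermediate folders, including 'D:/g/' itself.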
How about using shutil.copytree()?
import os
import shutil

def ig_f(dir, files):
    # ignore every regular file, so only the directories are copied
    return [f for f in files if os.path.isfile(os.path.join(dir, f))]

shutil.copytree(inputpath, outputpath, ignore=ig_f)
The directory you want to create should not exist before calling this function. You can add a check for that.
Taken from shutil.copytree without files
A minor tweak to your code for skipping pdf files:
for root, dirs, files in os.walk('.', topdown=False):
    for name in files:
        if name.find(".pdf") >=0:
            continue
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
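Putting the pieces of this thread together, the conversion loop could create each output folder on demand and write the text file into it. The sketch below makes several assumptions: convert_tree is an invented name, the convert argument stands in for the asker's c.convert_pdf_to_txt (whose exact behaviour is not shown here), and only files ending in .pdf are converted.

```python
import os


def convert_tree(src, dst, convert):
    """Write convert(pdf_path) to a mirrored .txt path under `dst`.

    `convert` is any callable that takes a pdf path and returns text.
    Returns the list of text files written.
    """
    written = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # only pdf files are converted
            # Same relative folder as the source file, re-rooted under dst.
            out_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
            os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.join(out_dir, stem + ".txt")
            with open(out_path, mode="w", encoding="utf-8") as newfile:
                newfile.write(convert(os.path.join(root, name)))
            written.append(out_path)
    return written
```

With the asker's module this might be called as convert_tree('D:/f/', 'D:/g/', c.convert_pdf_to_txt), assuming that function returns a string.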
I am trying to create a program that copies files with a certain file extension to a given folder. When files are located in subfolders instead of the root folder, the program fails to get the correct path. In its current state the program works perfectly for files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use the root folder as the directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to a given folder
import shutil
import os
import re

# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"

def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)")  # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)

Selective_copy(source_folder)
dirpath is one of the values provided by os.walk for a reason: it is the path to the directory in which the items in filenames are located. You can use it to build the path to each file in a subfolder.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames is a list of base file names. At this point they are just strings and are (technically) not associated with any files in the filesystem.
os.path.abspath performs string-only operations: it joins the file name with the current working directory. As a result, the resulting path points to a file that does not exist.
What you should do instead is join dirpath and the base file name (both values are yielded by os.walk):
file_path = os.path.join(dirpath, i)
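Applied to the whole function, the fix might look like the sketch below. It also swaps the regex for os.path.splitext, which is a substitution on my part and not something the original code used; the lowercase function name and the extra parameters are likewise assumptions for the example.

```python
import os
import shutil


def selective_copy(source_folder, destination_folder, extension="zip"):
    """Copy every file under `source_folder` that has the given extension."""
    copied = []
    suffix = "." + extension.lower()
    for dirpath, _dirnames, filenames in os.walk(source_folder):
        for filename in filenames:
            # splitext returns (stem, ".ext"); compare case-insensitively.
            if os.path.splitext(filename)[1].lower() == suffix:
                # dirpath is the folder os.walk is currently in, so this
                # path is valid for files in subfolders as well.
                file_path = os.path.join(dirpath, filename)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied
```

Two things to keep in mind: files with the same name in different subfolders overwrite each other in the destination, and the destination folder should not be located inside the source folder.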
I have a directory containing folders and subfolders. At the end of each path there are files. I want to make a txt file containing the path to all the files, but excluding the path to folders.
I tried this suggestion from Getting a list of all subdirectories in the current directory, and my code looks like this:
import os
myDir = '/path/somewhere'
print([x[0] for x in os.walk(myDir)])
And it gives the path of all elements (files AND folders), but I want only the paths to the files. Any ideas for it?
os.walk(path) yields a 3-tuple for every directory it visits: the directory path, the names of its subdirectories, and the names of its files.
So you can do this:
for dir, subdir, files in os.walk(path):
    for file in files:
        print(os.path.join(dir, file))
On each iteration, os.walk gives you the current directory, its subdirectories and its files, so inside the loop you have to iterate over the files and combine each one with the directory.
To do this, use os.path.join on the directory and the file name.
Here is a simple example that illustrates how traversing with os.walk works:
from os import walk
from os.path import join

# specify in your loop in order dir, subdirectory, file for each level
for dir, subdir, files in walk('path'):
    # iterate over each file
    for file in files:
        # join will put together the directory and the file
        print(join(dir, file))
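Since the goal in the question is a txt file containing the paths, the same loop can write each path to a file instead of printing it. A minimal sketch, with write_file_paths as a made-up name:

```python
import os


def write_file_paths(top, out_file):
    """Write the path of every file under `top` to `out_file`, one per line.

    Returns the number of paths written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(top):
            for filename in filenames:
                # Only entries of `filenames` are written, so folders are excluded.
                out.write(os.path.join(dirpath, filename) + "\n")
                count += 1
    return count
```

If the output file is placed inside the folder being walked, it will most likely show up in its own listing, so it is simpler to write it somewhere else.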
If you just want the paths of the folders that contain files, add a filter to your list comprehension as follows:
import os

myDir = '/path/somewhere'
print([dirpath for dirpath, dirnames, filenames in os.walk(myDir) if filenames])
Folders that contain no files are then left out of the list.
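import os

def get_paths(path, depth=None):
    # yield file paths under path; depth=None means no limit, depth=0 means this folder only
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path
        else:
            d = depth - 1 if depth is not None else None
            if d is None or d >= 0:
                # pass the reduced depth on, otherwise the limit is lost after one level
                for sub_path in get_paths(full_path, d):
                    yield sub_path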