How to copy folder structure under another directory? - python

I have a question about copying a folder structure. I need to convert PDF files to text files, and the PDFs I read are stored in a folder structure like this:
D:/f/subfolder1/subfolder2/a.pdf
I would like to create the exact same folder structure under "D:/g/subfolder1/subfolder2/", but without the PDF file, since the converted text file goes there instead. After the conversion function runs, the result should be:
D:/g/subfolder1/subfolder2/a.txt
I would also like to add an if check so that a folder under "D:/g/" is only created when it does not already exist.
Here is my current code. How can I create the same folder structure without the files?
Thank you!
import converter as c
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

For me the following works fine:
Iterate over the existing folders.
Build the path of each new folder from the existing one.
Check whether the new folder already exists.
If it does not, create the new folder (without files).
Code:
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for dirpath, dirnames, filenames in os.walk(inputpath):
    # Part of the current path below inputpath, re-rooted under outputpath
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    else:
        print("Folder already exists!")
Documentation:
os.walk
os.mkdir
os.path.isdir
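Putting the pieces together for the original goal, a minimal sketch that mirrors the folder tree and writes one .txt file per .pdf file might look like the following. The function name `mirror_and_convert` and its `convert` parameter are hypothetical; `convert` stands in for the asker's `c.convert_pdf_to_txt`. `os.makedirs(..., exist_ok=True)` creates missing folders and accepts existing ones, so no separate existence check is needed.

```python
import os


def mirror_and_convert(inputpath, outputpath, convert):
    """Recreate the folder tree of inputpath under outputpath and write
    one .txt file per .pdf file. `convert` is a hypothetical callable that
    takes a PDF path and returns its text."""
    written = []
    for dirpath, dirnames, filenames in os.walk(inputpath):
        # Folder path relative to the input root ('.' for the root itself)
        rel = os.path.relpath(dirpath, inputpath)
        target_dir = os.path.normpath(os.path.join(outputpath, rel))
        # Creates intermediate folders; no error if the folder exists
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # only PDFs are converted
            target = os.path.join(target_dir, stem + ".txt")
            with open(target, mode="w", encoding="utf-8") as newfile:
                newfile.write(convert(os.path.join(dirpath, name)))
            written.append(target)
    return written
```

With the question's names, a call would be `mirror_and_convert('D:/f/', 'D:/g/', c.convert_pdf_to_txt)`.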

How about using shutil.copytree()?
import os
import shutil
def ig_f(dir, files):
    # Ignore every entry that is a file, so only folders are copied
    return [f for f in files if os.path.isfile(os.path.join(dir, f))]
shutil.copytree(inputpath, outputpath, ignore=ig_f)
The directory you want to create should not exist before calling this function. You can add a check for that.
Taken from shutil.copytree without files
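If the target folder may already exist, one option on Python 3.8 or later is the `dirs_exist_ok=True` argument of `shutil.copytree`. The sketch below is a variation on the answer above, not a replacement for it; the helper names `ignore_files` and `copy_tree_without_files` are made up for illustration.

```python
import os
import shutil


def ignore_files(directory, names):
    """Tell copytree to skip every entry that is a regular file."""
    return [n for n in names if os.path.isfile(os.path.join(directory, n))]


def copy_tree_without_files(inputpath, outputpath):
    # dirs_exist_ok=True (Python 3.8+) lets copytree reuse existing folders
    # instead of raising FileExistsError.
    return shutil.copytree(inputpath, outputpath,
                           ignore=ignore_files, dirs_exist_ok=True)
```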

A minor tweak to your code for skipping pdf files:
for root, dirs, files in os.walk('.', topdown=False):
    for name in files:
        if name.find(".pdf") >=0: continue
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

Related

Rename certain subfolders in a directory using python

I have a folder structure that looks something like this:
/Forecasting/as_of_date=20220201/type=full/export_country=Spain/import_country=France/000.parquet'
and there are approx 2500 such structures.
I am trying to rename only certain subfolders, mainly export_country as exp_cty and import_country as imp_cty.
So far I tried this but it doesn't seem to work. I never had to deal with such complex folder structures and I'm a bit unsure how to go about it. My script is below:
import os
from pathlib import Path
path = r"/datasets/local/Trade_Data" # Path to directory that will be searched
old = "import_country*"
new = "imp_cty"
for root, dirs, files in os.walk(path, topdown=False): # os.walk will return all files and folders in the directory
    for name in dirs: # only concerned with dirs since I only want to change subfolders
        directoryPath = os.path.join(root, name) # create a path to subfolder
        if old in directoryPath: # if the 'export_country' is found in my path then
            parentDirectory = Path(directoryPath).parent # save the parent directory path
            os.chdir(parentDirectory) # set parent to working directory
            os.rename(old, new)
The code you posted has two issues.
First, if old in directoryPath: checks whether the literal string import_country* (including the asterisk) occurs in the path, which it never does.
From your question I understand that you want to rename all directories whose names start with "import_country", so you can use startswith for that.
Second, os.rename(old, new) tries to rename a directory literally named import_country*, which doesn't exist; use the name variable instead.
Here is your code with slight changes that make it work. Note that you must use topdown=False, because you are renaming directories while walking through them:
import os
from pathlib import Path
path = "/datasets/local/Trade_Data"
old_prefix = "import_country"
new_name = "imp_cty"
for root, dirs, files in os.walk(path, topdown=False):
    for name in dirs:
        if name.startswith(old_prefix):
            directoryPath = os.path.join(root, name)
            parentDirectory = Path(directoryPath).parent
            os.chdir(parentDirectory)
            os.rename(name, new_name)
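Note that the folders in the question are named like `import_country=France`, so renaming each one to plain `imp_cty` would drop the value and make sibling folders collide. Assuming the goal is `imp_cty=France`, a sketch that swaps only the prefix and handles both renames in one pass could look like this. The function name `rename_prefixes` and the `PREFIX_MAP` dictionary are illustrative and not part of the original answer.

```python
import os

# Old prefix -> new prefix (taken from the question; adjust as needed)
PREFIX_MAP = {"export_country": "exp_cty", "import_country": "imp_cty"}


def rename_prefixes(path, prefix_map=PREFIX_MAP):
    """Rename subfolders whose names start with a key of prefix_map,
    keeping the rest of the name (e.g. '=France') unchanged."""
    renamed = []
    # topdown=False visits children before parents, so renaming a folder
    # never invalidates a path that still has to be visited.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in dirs:
            for old, new in prefix_map.items():
                if name.startswith(old):
                    src = os.path.join(root, name)
                    dst = os.path.join(root, new + name[len(old):])
                    os.rename(src, dst)  # full paths, so no os.chdir needed
                    renamed.append(dst)
                    break
    return renamed
```

Working with full paths instead of `os.chdir` also leaves the process's working directory untouched.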

read all files in sub folder with pandas

My notebook is in the home folder where I also have another folder "test". In the test folder, I have 5 sub folders. Each of the folder contains a .shp file. I want to iterate in all sub folders within test and open all .shp files. It doesn't matter if they get overwritten.
data = gpd.read_file("./test/folder1/file1.shp")
data.head()
How can I do so? I tried this
path = os.getcwd()
files = glob.glob(os.path.join(path + "/test/", "*.shp"))
print(files)
but this would only go in 1 layer deep.
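You can use the os.walk function from the os module. Note that read_file belongs to geopandas, not pandas:
import os
import geopandas as gpd
for root, dirs, files in os.walk("./test"):
    for name in files:
        if name.endswith(".shp"):  # skip the other files that accompany a shapefile
            fpath = os.path.join(root, name)
            data = gpd.read_file(fpath)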
Just do os.chdir(path), and then use glob.glob(os.path.join('*.shp')). It should work.
You have already given the string to join 'os.path'.
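Another option, given that the original `glob` call only looked one level deep, is a recursive pattern: with `recursive=True`, the `**` component matches any number of nested folders. The helper name `find_shapefiles` is hypothetical, and the reading step is left as a comment because it needs `geopandas`.

```python
import glob
import os


def find_shapefiles(base):
    """Return every .shp file under base, at any depth."""
    pattern = os.path.join(base, "**", "*.shp")
    return sorted(glob.glob(pattern, recursive=True))


# Usage (assumes geopandas is installed and imported as gpd):
# for fpath in find_shapefiles("./test"):
#     data = gpd.read_file(fpath)
```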

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all its subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectories.
Can anyone help me understand how to solve this issue?
A more efficient way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
# Full paths of all matching files
files = [os.path.normpath(os.path.join(root,f)) for root,dirs,files in os.walk(source_folder) for f in files if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.
I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
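As for why the date disappeared in the question's code: `str.strip(chars)` does not remove a prefix. It removes any leading and trailing characters that occur in `chars`, so digits and dashes that also appear in the root path get eaten. A small illustration with made-up paths, plus a prefix-safe alternative via `os.path.relpath`:

```python
import os

root = "data/942ba/"             # made-up root folder
dirpath = "data/942ba/2024-09"   # made-up subfolder named after a date

# strip() treats its argument as the character set {d, a, t, /, 9, 4, 2, b}
# and trims those characters from both ends, which damages the date.
stripped = dirpath.strip(root)

# relpath() removes the root as a path prefix and leaves the rest intact.
relative = os.path.relpath(dirpath, root)

print(repr(stripped))   # '024-0'
print(repr(relative))   # '2024-09'
```

In the question's loop the `strip` call can simply be dropped, since the directory path yielded by `os.walk` already starts with `root` and can be passed to `os.path.join` as it is.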

I want to add all the names of the files in a specific folder to a list

I want to add the names of all the files in a specific folder to a list. How can I do that? The path goes from Dropbox to a folder called 'UMM' and then to a folder called '2018'. Could someone help me with the code for this? I have tried using os.walk(), but it doesn't seem to work.
You can use os.walk and append only names which are in files.
from os import walk
file_names = list()
path = 'path/of/folder'
for root, dirc, files in walk(path):
    for FileName in files:
        file_names.append(FileName)
print(file_names)
This will append the names of all files from all directories and sub-directories of the specified path.
This will create a list of the files in a folder:
from os import listdir
# the path to the folder
path = ''
fileList = listdir(path)
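Note that `os.listdir` returns subfolder names as well as file names. If only files are wanted, a sketch along these lines filters them out; the function name `list_files` is illustrative.

```python
import os


def list_files(path):
    """Return the sorted names of regular files directly inside path."""
    return sorted(
        entry.name for entry in os.scandir(path) if entry.is_file()
    )
```

Unlike the `os.walk` version, this looks only at the given folder and does not descend into subfolders.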

Iterate over files located in different folders

I’d like to write a function to iterate over excel files that are in different folders. Parts of the path of each file are the same, for instance:
C:\Main\Division\Reports\Year\Data.xls
The only part of each path that changes is ‘Year’. The files all have the same name.
Is there a way to do this with a placeholder for Year? If not, what approach should I take?
You can use the os.listdir function:
import os
directory = r"C:\Main\Division\Reports"
for year in os.listdir(directory):
    # Join with the Reports folder itself so the path matches Reports\Year\Data.xls
    file_name = os.path.join(directory, year, 'Data.xls')
    # do something
You could try os.walk:
import os
parent = r"C:\Main\Division\Reports"
for root, directory, files in os.walk(parent):
    print(root)
    print(directory)
    print(files)
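To answer the placeholder part of the question directly: `glob` accepts `*` as a wildcard within a single path component, so it can stand in for the Year folder. A minimal sketch, where the function name `find_yearly_files` is hypothetical:

```python
import glob
import os


def find_yearly_files(base, filename="Data.xls"):
    """Return base/<anything>/filename for every matching subfolder."""
    # '*' matches exactly one folder level, i.e. the Year folder
    return sorted(glob.glob(os.path.join(base, "*", filename)))


# Usage: a raw string keeps the backslashes in a Windows path literal
# for path in find_yearly_files(r"C:\Main\Division\Reports"):
#     ...  # open the workbook here
```

Year folders that do not contain the file are skipped automatically, because the pattern only matches paths that exist.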
