Looking for thoughts on how to read all csv files inside a folder in a project.
As an example, the following code is part of my current working code, where 'ProjectFolder' is on the Desktop and I am hardcoding the path. Inside the project folder, I have 'csvFolder', which holds all my csv files.
However, if I move "ProjectFolder" to a different hard drive or another location, the path breaks and I have to provide a new one. Is there a smart way to avoid depending on the location of the project folder?
import glob
import pandas as pd

path = r'C:\Users\XXX\Desktop\ProjectFolder\csvFolder'  # use your path
all_files = glob.glob(path + "/*.csv")
df_mm = pd.concat((pd.read_csv(f, usecols=["[mm]"]) for f in all_files),
                  axis=1, ignore_index=True)
There are two kinds of paths, absolute and relative (search for "absolute vs relative path" for background). In your case, if your Python file is in ProjectFolder, you can simply try this:
from os import listdir
from os.path import dirname, realpath, join

def main():
    # This is your project directory
    current_directory_path = dirname(realpath(__file__))
    # This is your csv directory
    csv_files_directory_path = join(current_directory_path, "csvFolder")
    for each_file_name in listdir(csv_files_directory_path):
        if each_file_name.endswith(".csv"):
            each_csv_file_full_path = join(csv_files_directory_path, each_file_name)
            # Do whatever you want with each_csv_file_full_path

if __name__ == '__main__':
    main()
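The same idea can be sketched with pathlib, which also covers the "*.csv" pattern from the question. This is only a sketch under assumptions: find_csv_files is a hypothetical helper name, and it assumes the CSV folder sits next to the script.

```python
from pathlib import Path


def find_csv_files(script_file, folder_name="csvFolder"):
    """Return the sorted CSV paths in `folder_name` next to `script_file`."""
    # Resolve the script's own directory so the result does not depend on
    # the current working directory or on where the project was moved to.
    project_dir = Path(script_file).resolve().parent
    csv_dir = project_dir / folder_name
    # glob() yields paths in arbitrary order, so sort for reproducibility.
    return sorted(csv_dir.glob("*.csv"))
```

Inside the script, all_files = find_csv_files(__file__) could then replace the hardcoded path, and the resulting list can be passed to pd.read_csv as in the question.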
I am writing Python code that automatically processes and combines CSV datasets.
Each folder I access contains two CSV datasets, for example:
download/2019/201901/dragon.csv
download/2019/201901/kingdom.csv
The file names are the same across all folders; in other words, each folder contains two datasets with the names above.
The 'download' folder contains four folders (2019, 2020, 2021, 2022), and
each year folder contains one folder per month, e.g., 2019/201901, 2019/201902, and so on.
In this situation, I want to process only the 'dragon.csv' files. How can I do that? My current code is:
import os
import pandas as pd
import numpy as np

path = 'download/2019'
save_path = 'download'

class Preprocess:
    def __init__(self, path, save_path):
        self.path = path
        self.save_path = save_path
after finishing processing,
def save_dataset(path, save_path):
    for dir in os.listdir(path):
        for file in os.listdir(os.path.join(path, dir)):
            if file[-3:] == 'csv':
                df = pd.read_csv(os.path.join(path, dir, file))
                print(f'Reading data from {os.path.join(path, dir, file)}')
                print('Start Preprocessing...')
                df = preprocessing(df)
                print('Finished!')
                if not os.path.exists(os.path.join(save_path, dir)):
                    os.makedirs(os.path.join(save_path, dir))
                df.to_csv(os.path.join(save_path, dir, file), index=False)

save_dataset(path, save_path)
If I understand your question, you only want to process files whose names include the substring "dragon". You can do this by adding a condition to your if statement: instead of if file[-3:] == 'csv', write if file[-3:] == 'csv' and 'dragon' in file.
You can use pathlib's glob method:
from pathlib import Path
p = Path()  # the current directory; point this at the folder containing `download` if you are elsewhere
dragon_paths = p.glob("download/**/dragon.csv")
dragon_paths is a generator that yields every dragon.csv file under the download folder.
PS. You should avoid shadowing dir, maybe call your variable dir_ or d.
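To tie the recursive search back to the save step in the question, here is a minimal sketch that pairs each matching file with an output path mirroring the year/month folders. The helper name plan_outputs and its parameters are hypothetical, and no pandas processing is included.

```python
from pathlib import Path


def plan_outputs(source_root, save_root, file_name="dragon.csv"):
    """Pair every `file_name` under `source_root` with a mirrored output path."""
    source_root = Path(source_root)
    save_root = Path(save_root)
    pairs = []
    # rglob searches source_root and all of its subfolders.
    for src in sorted(source_root.rglob(file_name)):
        # Keep the year/month part of the path when building the target.
        dst = save_root / src.relative_to(source_root)
        pairs.append((src, dst))
    return pairs
```

For each (src, dst) pair, dst.parent.mkdir(parents=True, exist_ok=True) could create the target folder before calling df.to_csv(dst, index=False).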
I have my data in the file 'Climate\Data\Raw_Flooding\CSV\input\addresses.csv'
My code which needs to access the file, however, is in the folder 'Climate\Code\Flooding\python_code\code.py'
How do I access the data subdirectory when I am in the code subdirectory?
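import os

cwd = os.getcwd()
print(cwd)
path = cwd.split('\\')
new_path = ''
# Rebuild the path up to and including the 'Climate' folder
for i in range(path.index('Climate') + 1):
    new_path += path[i] + '\\'
new_path += r'Data\Raw_Flooding\CSV\input\addresses.csv'
print(new_path)
This will give you the path to the csv file, provided the script is run on Windows from somewhere inside the Climate folder.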
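A variant that does not depend on the current working directory or on backslash separators could walk up from the script's own location. This is a sketch only; find_ancestor is a hypothetical helper name, not something from the answer above.

```python
from pathlib import Path


def find_ancestor(start, folder_name):
    """Return the nearest directory named `folder_name` at or above `start`."""
    start = Path(start).resolve()
    # Check `start` itself first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if candidate.name == folder_name:
            return candidate
    raise FileNotFoundError(f"No folder named {folder_name!r} above {start}")
```

In code.py, the data file could then be addressed as find_ancestor(__file__, "Climate") / "Data" / "Raw_Flooding" / "CSV" / "input" / "addresses.csv".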
I am doing a school assignment where I have to take input from a user and save it to a text file.
My file structure will be something like:
- Customer register
  - Customer ID
    - .txt files 1-5
It can be saved in the python folder and I can make the folders like this:
os.makedirs("Customer register/Customer ID")
My question is: how do I set the path where the text files are stored when I don't know in advance which directory the program will run from? No matter where the program is run, the files should be saved in the "Customer ID" folder I create (on the computer the program is run on).
Also, how do I make this work on both Windows and Mac?
I also want the program to be able to run several times, check whether the folder is there, and save to the "Customer ID" folder if it already exists. Is there a way to do that?
EDIT:
This is the code I am trying to use:
try:
    dirs = os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print (os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()
        unique_filename = str(uuid.uuid4())
        customerpath = os.getcwd()
        os.chdir(customerpath + "/Folder/" + unique_filename)
I am able to create a folder and change the directory (everything in "try" works as I want).
When this folder is created I want to create a second folder with a random generated folder name (used for saving customer files). I can't get this to work in the same way.
Error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\Users\48736\PycharmProjects\tina/Folder/979b9026-b2f6-4526-a17a-3b53384f60c4'
EDIT 2:
try:
    os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print (os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()
        os.chdir(os.path.join(path, 'Folder'))
        print(os.getcwd())

def userId(folderid):
    try:
        if not os.path.exists(folderid):
            os.makedirs(folderid)
    except:
        if os.path.exists(folderid):
            os.chdir(path + "/Folder/" + folderid)

userId(str(uuid.uuid4()))
print(os.getcwd())
So I can now create a folder, change directory to the folder I have created and create a new folder with a unique filename within that folder.
But I can't change the directory again to the folder with the unique filename.
Any suggestions?
I have tried:
os.chdir(path + "/Folder/" + folderid)
os.chdir(path, 'Folder', folderid)
os.chdir(os.path.join(path, 'Folder', folderid))
But it still just stays in: C:\Users\47896\PycharmProjects\tina\Folder
You can use relative paths in your makedirs call, e.g.
os.makedirs("./Customer register/Customer ID")
to create the folders in the current working directory (the directory the program is run from, which is often but not always the project root), or
os.makedirs("../Customer register/Customer ID")
to create them in its parent directory.
You can, of course, traverse the file tree as needed.
For the specific options mentioned in your question (such as exist_ok for folders that already exist), see the os.makedirs documentation in the Python 3 docs.
Here is a solution:
import os
import shutil
import uuid

path_on_system = os.getcwd()  # directory where you want to save data
path = r'Folder'  # your working directory
dir_path = os.path.join(path_on_system, path)
if not os.path.exists(dir_path):
    os.makedirs(dir_path)
file_name = str(uuid.uuid4())  # file which you have created
if os.path.exists(file_name) and os.path.exists(dir_path):
    shutil.move(file_name, os.path.join(dir_path, file_name))
else:
    print(" {} does not exist".format(file_name))
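An alternative that avoids os.chdir entirely is to build the full path and write to it directly. The following is a minimal sketch under assumptions: save_customer_text and the file name "1.txt" are hypothetical, and a random UUID stands in for the customer ID as in the question.

```python
import uuid
from pathlib import Path


def save_customer_text(base_dir, text, register="Customer register"):
    """Write `text` into a new, uniquely named customer folder; return the file path."""
    # A random UUID stands in for the customer ID, as in the question.
    customer_dir = Path(base_dir) / register / str(uuid.uuid4())
    # parents=True creates the register folder on the first run;
    # exist_ok=True keeps repeated runs from failing on existing folders.
    customer_dir.mkdir(parents=True, exist_ok=True)
    target = customer_dir / "1.txt"  # hypothetical file name
    target.write_text(text, encoding="utf-8")
    return target
```

Here base_dir could be Path.cwd() or Path(__file__).resolve().parent. Because pathlib joins the parts with the separator of the platform it runs on, the same call should work on both Windows and Mac.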
I have a Python script that creates a PDF and saves it in a subfolder of the folder where the script is saved. I have the following code that saves the file to the subfolder:
outfilename = "Test" + ".pdf" #in real code there is a var that holds the name of the file
outfiledir = 'C:/Users/JohnDoe/Desktop/dev/PARENTFOLDER/SUBFOLDER/' #parent folder is where the script is - subfolder is where the PDFs get saved to
outfilepath = os.path.join(outfiledir, outfilename)
Is there a way I can save the PDFs to the subfolder without having to specify the full path? Let's say I wanted to make this script an exe that multiple computers could use: how would I specify the path so that the PDFs are just saved in the subfolder?
Thanks!
Try this:
import os
dir_name = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'subdir')
path = os.path.join(dir_name, 'filename')
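Since the question also mentions turning the script into an exe, here is a hedged sketch that handles both cases. It assumes the freezing tool sets sys.frozen (PyInstaller does) and that the PDFs should then go next to the executable; application_dir and output_path are hypothetical helper names.

```python
import sys
from pathlib import Path


def application_dir(script_file):
    """Return the folder of the running script, or of the executable when frozen."""
    # Assumption: the freezing tool sets sys.frozen (PyInstaller does).
    if getattr(sys, "frozen", False):
        return Path(sys.executable).resolve().parent
    return Path(script_file).resolve().parent


def output_path(script_file, filename, subfolder="SUBFOLDER"):
    """Build the path of `filename` inside `subfolder`, creating the folder if needed."""
    out_dir = application_dir(script_file) / subfolder
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename
```

In the script, outfilepath = output_path(__file__, outfilename) could then replace the hardcoded outfiledir.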
When I open a file, I have to specify the directory that it is in. Is there a way to specify using the current directory instead of writing out the path name? I'm using:
source = os.listdir("../mydirectory")
But the program will only work if it is placed in a directory called "mydirectory". I want the program to work in the directory it is in, no matter what the name is.
def copyfiles(servername):
    source = os.listdir("../mydirectory")  # directory where original configs are located
    destination = '//' + servername + r'/c$/remotedir/'  # destination server directory
    for files in source:
        if files.endswith("myfile.config"):
            try:
                os.makedirs(destination, exist_ok=True)
                shutil.copy(files, destination)
            except:
                pass  # handler body was not included in the original post
This is a pathlib version:
from pathlib import Path
HERE = Path(__file__).parent
source = list((HERE / "../mydirectory").iterdir())
If you prefer os.path:
import os.path
HERE = os.path.dirname(__file__)
source = os.listdir(os.path.join(HERE, "../mydirectory"))
Note: this will often be different from the current working directory:
os.getcwd()  # or '.'
__file__ is the filename of your current Python file. HERE is now the path of the directory where your Python file lives.
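Applied to the copy loop from the question, the directory listing and the copy can both work on full paths. The sketch below is illustrative only: copy_matching is a hypothetical helper, and the destination is treated as an ordinary directory path rather than the network share from the question.

```python
import shutil
from pathlib import Path


def copy_matching(source_dir, destination, suffix="myfile.config"):
    """Copy files in `source_dir` whose names end with `suffix` into `destination`."""
    source_dir = Path(source_dir)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source_dir.iterdir()):
        if entry.is_file() and entry.name.endswith(suffix):
            # Pass the full path: a bare file name would be looked up
            # relative to the current working directory instead.
            copied.append(Path(shutil.copy(entry, destination)))
    return copied
```

Calling copy_matching(Path(__file__).resolve().parent, destination) would then use whatever directory the script lives in, regardless of its name.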
'.' stands for the current working directory.
Try:
os.listdir('./')
or:
os.listdir(os.getcwd())