I set up a Spyder project where my code lives in one folder and my data in several other folders. Basically, I would like to read those files using a relative path or some other simple approach. Let's take the project tree below as an example:
Project Tree
I am trying to read "dummy_csv.csv" using "dummy_code.py".
What I am currently doing is this:
import pandas as pd
filepath = "../../../../../dummy_folder02/untitled folder/untitled folder/untitled folder/dummy_data/dummy_csv.csv"
pd.read_csv(filepath)
I wonder if there is a more elegant/cleaner way of doing this...
You can include the root dir of your data in the system path variable, and then use just the relative path:
import sys
sys.path.append(<absolute path to root data dir>)
filepath = "<relative path to csv file, in relation to the absolute path added to sys.path>"
For example:
sys.path.append("C:/my_datasets/dummy_folder02")
filepath = "untitled folder/untitled folder/untitled folder/dummy_data/dummy_csv.csv"
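One caveat with the snippet above: sys.path is only consulted when Python imports modules, while pd.read_csv and open() resolve a relative path against the current working directory. A minimal sketch that joins the root and the relative part explicitly instead; DATA_ROOT and data_file are hypothetical names, and the root path is just the example value used above:

```python
from pathlib import Path

# Assumption: the same example root directory as above; adjust it to your machine.
DATA_ROOT = Path("C:/my_datasets/dummy_folder02")

def data_file(relative_path, root=DATA_ROOT):
    """Return the full path of a data file given its path relative to root."""
    return Path(root) / relative_path

filepath = data_file("untitled folder/untitled folder/untitled folder/dummy_data/dummy_csv.csv")
print(filepath)
# The result can be passed straight to pandas: pd.read_csv(filepath)
```

This keeps the machine-specific part in one place, so only DATA_ROOT changes when the data moves.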
I have a Python script and I want to import a CSV from another folder. How can I do this? (For example, my .py file is in one folder and I want to reach data on the desktop.)
First of all, you need to understand how relative and absolute paths work.
Here is an example using relative paths. I have two folders on my desktop: scripts, which contains the Python files, and csvs, which contains the CSV files. So, the code would be:
df = pd.read_csv('../csvs/file.csv')
The path means:
.. (the parent folder, in this case the desktop folder).
/csvs (csvs folder).
/file.csv (the csv file).
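To see how the .. step resolves, here is a small sketch. The folder /home/user/Desktop/scripts is a hypothetical location, and posixpath is used only so the output looks the same on every OS. Keep in mind that Python resolves a relative path against the current working directory, which is the scripts folder only if you run the script from there.

```python
import posixpath

# Hypothetical folder containing the script (an assumption for illustration).
script_dir = "/home/user/Desktop/scripts"

# Join the starting folder with the relative path, then collapse the ".." step.
resolved = posixpath.normpath(posixpath.join(script_dir, "../csvs/file.csv"))
print(resolved)  # /home/user/Desktop/csvs/file.csv
```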
If you are on Windows:
Right-click on the file on your desktop, and go to its properties.
You should see a Location: tag that has a structure similar to this: C:\Users\<user_name>\Desktop
Then you can define the file path as a variable in Python as:
file_path = r'C:\Users\<your_user_name>\Desktop\<your_file_name>.csv'
To read it:
df = pd.read_csv(file_path)
That said, prefer relative paths over absolute paths like this in your code wherever you can. Investing some time in learning the pathlib module will help a lot.
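As a starting point with pathlib, here is a minimal sketch that builds the desktop path without hard-coding the user name. It assumes the desktop folder sits directly under the home directory and is called Desktop, which is not true on every setup (for example when it is redirected to a cloud folder); desktop_file is a hypothetical helper name.

```python
from pathlib import Path

def desktop_file(file_name):
    """Build the path to a file on the current user's desktop.

    Assumption: the desktop is the "Desktop" folder directly under the home directory.
    """
    return Path.home() / "Desktop" / file_name

file_path = desktop_file("file.csv")  # "file.csv" is a placeholder name
print(file_path)
# pandas accepts Path objects directly: df = pd.read_csv(file_path)
```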
What I have is an initial directory with a file inside (D:\BBS\file.x) and multiple .txt files in the work directory D:\
What I am trying to do is copy the BBS folder with its contents, incrementing its name by a number for each copy, and then copy/move each existing .txt file into its own newly created directory, giving \BBS1, \BBS2, ..., \BBSn (n depends on the number of .txt files).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
So far I have only managed to create one new directory and move the .txt files into it all at once, which is not what I want. Here's my code:
from pathlib import Path
from shutil import copy
import shutil
import os
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
count = 0
for content in src.iterdir():
    addname = src.name.split('_')[0]
    out_folder = wkDir.joinpath(f'!{addname}')
    out_folder.mkdir(exist_ok=True)
    out_path = out_folder.joinpath(content.name)
    copy(content, out_path)
files = os.listdir(wkDir)
for f in files:
    if f.endswith(".txt"):
        shutil.move(f, out_folder)
I would appreciate help with incrementing the folder name and copying the files one by one, each into its own newly created directory, as described above.
I don't have much experience with Python in general. Python 3, OS: Windows.
Thanks in advance
Now I understand what you want to accomplish. I think you can do it quite easily by iterating over the text files only and, for each one, copying the BBS folder and then moving the current file into the copy. To get folder_num, you could simply slice the file name at fixed indexes (e.g. f[4:6]) if the name always follows the pattern TextXX.txt. If the prefix "Text" may vary, it is more robust to use a regular expression, as in the following sample.
Also, the function shutil.copytree copies a directory together with everything inside it.
import os
import re
import shutil
from pathlib import Path
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
for f in os.listdir(wkDir):
    if f.endswith(".txt"):
        # first run of digits in the file name, e.g. "01" for Text01.txt
        folder_num = re.findall(r"\d+", f)[0]
        target = wkDir.joinpath(f"{src.name}{folder_num}")
        # copy BBS
        shutil.copytree(src, target)
        # move .txt file
        shutil.move(f, target)
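The same idea can be wrapped in a function that uses pathlib throughout, which makes it easier to try out on a scratch folder first. This is only a sketch under a few assumptions: distribute_txt_files is a hypothetical name, the template folder is called BBS, each .txt name contains a number, and none of the numbered target folders exist yet (shutil.copytree raises an error if the target already exists).

```python
import re
import shutil
from pathlib import Path

def distribute_txt_files(work_dir, template_name="BBS"):
    """Copy the template folder once per .txt file and move the file into its copy."""
    work_dir = Path(work_dir)
    src = work_dir / template_name
    created = []
    # sorted() materialises the listing before any files are moved
    for txt in sorted(work_dir.glob("*.txt")):
        match = re.search(r"\d+", txt.stem)
        if match is None:
            continue  # no number in the name, so leave this file where it is
        target = work_dir / f"{template_name}{match.group()}"
        shutil.copytree(src, target)  # copies the folder with all its contents
        shutil.move(str(txt), str(target / txt.name))  # then move the .txt file in
        created.append(target)
    return created
```

Calling distribute_txt_files(Path.cwd()) from the work folder would mirror the script above.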
The goal is to walk a path that is partly fixed and partly variable.
I am trying to walk a path (down to the lowest folder, which is called Archive) and fill a list with the files that have a certain ending. This works quite well for a fixed path such as this:
fileInPath = r'\\server123456789\provider\COUNTRY\CATEGORY\Archive'
My code runs through the path (recursive) and lists all files that have a certain ending. This works well. For simplicity I will just print the file name in the following code.
import csv
import os
fileInPath = '\\\\server123456789\\provider\\COUNTRY\\CATEGORY\\Archive'
fileOutPath = 'some path'  # placeholder for the output location
csvSeparator = ';'
fileList = []
# walk the tree recursively and print every file whose name ends in PAR
for subdir, dirs, files in os.walk(fileInPath):
    for file in files:
        if file[-3:].upper() == 'PAR':
            print(file)
The problem is that I can't manage to make COUNTRY and CATEGORY variable, e.g. by using *.
The standard library module pathlib provides a simple way to do this.
Your file list can be obtained with
from pathlib import Path
list(Path("//server123456789/provider/").glob("*/*/Archive/*.PAR"))
Note that I'm using / instead of \\; pathlib handles the conversion for you on Windows.
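If the files may also sit in subfolders below Archive, as in the recursive os.walk version, a recursive pattern can be used. A minimal sketch, assuming exactly two variable folder levels (country and category) above Archive; find_par_files is a hypothetical helper, and the suffix check is done in Python so that .par and .PAR both match regardless of OS:

```python
from pathlib import Path

def find_par_files(root):
    """Return all *.PAR files (any letter case) under <root>/<country>/<category>/Archive."""
    root = Path(root)
    return sorted(
        p for p in root.glob("*/*/Archive/**/*")
        if p.is_file() and p.suffix.upper() == ".PAR"
    )

# Example call with the server path from the question:
# file_list = find_par_files("//server123456789/provider/")
```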
So I've started down the path of trying to automate something again. My end goal is to combine data from Excel files that have "Clean Up" in the file name, specifically the data on a tab named LOV in each of those files. The script has to go into a folder containing folders, which in turn contain folders that each hold 2 files; one of them is an .xlsx file with the words Clean Up in its name. I need to read only those files and pull the data from the LOV tab into one large file. I have only just started and am nowhere near that, but now you know the end goal.
Currently I'm stuck just getting a list of folder names in the master folder, so I at least know it's getting there lol.
import os
import glob
import pandas as pd
# assigns directory location to PCC Folder
os.chdir('V:/PCC Clean Up Project 2017/_DCS Data SWAT Project/PCC Files Complete Ready to Submit/Brake System Parts')
FolderList = glob.glob('')
print(FolderList)
Any help is appreciated, thanks guys!
EDITED
Firstly, it's hard to understand your question. But from what I understand, you need to iterate over folders and subfolders. You can do that with:
import os
for root, dirs, files in os.walk(source):  # set source to your top-level path
    for file in files:
        if file.endswith(".xlsx"):  # you can check for any file extension
            filename = os.path.join(root, file)  # full path to the file
            dirname = os.path.basename(root)  # name of the containing directory
            print(dirname)
If you only want the list of folders directly inside a given directory, you can use os.walk. Here is how it works:
import os
directory = "V:/PCC Clean Up Project 2017/_DCS Data SWAT Project/PCC Files Complete Ready to Submit/Brake System Parts"
# the first item yielded by os.walk is (path, subfolder names, file names) for the top directory
childDirectories = next(os.walk(directory))[1]
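This will give you a list of all folders directly inside that directory.
Read more about os.walk here.
You can then go into one of the child directories by using os.chdir. The entries are bare folder names, so join them with the parent directory first:
os.chdir(os.path.join(directory, childDirectories[i]))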
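For the end goal described in the question, a possible outline is sketched below. It has not been run against the real folder layout: find_cleanup_workbooks and combine_lov_sheets are hypothetical helper names, the "Clean Up" name filter and the LOV sheet name are taken from the question, and reading .xlsx files with pandas additionally needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path

def find_cleanup_workbooks(root):
    """Return every .xlsx file below root whose name contains 'Clean Up'."""
    return sorted(
        p for p in Path(root).rglob("*.xlsx")
        if "Clean Up" in p.name and not p.name.startswith("~$")  # skip Excel lock files
    )

def combine_lov_sheets(workbooks, sheet_name="LOV"):
    """Read the given sheet from each workbook and stack the rows into one DataFrame."""
    import pandas as pd  # imported here so the file search works without pandas

    frames = []
    for path in workbooks:
        frame = pd.read_excel(path, sheet_name=sheet_name)
        frame["source_file"] = path.name  # remember where each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Example (paths are placeholders):
# books = find_cleanup_workbooks("path/to/master folder")
# combine_lov_sheets(books).to_excel("combined_lov.xlsx", index=False)
```

Printing the result of find_cleanup_workbooks first is a cheap way to confirm the right files are picked up before any Excel data is read.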
This problem has been solved before (cannot write file with full path in Python); however, I followed the advice in that answer and it didn't work, which is why I'm posting this.
I'm trying to access a csv file to load into a pandas DataFrame.
import os
output_path = os.path.join('Desktop/My_project_folder', 'train.csv')
This is returning:
IOError: File Desktop/My_project_folder/train.csv does not exist
edit: I don't understand because the train.csv file exists in my project folder.
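The os.path.join() function is platform-agnostic: it joins path components with the correct separator for the current OS (Windows, Mac, Linux), so you don't have to write forward or back slashes between directories yourself. Hence, simply pass the directory and the file name as separate arguments. Start from the full path to the folder, because a relative path such as Desktop/My_project_folder is resolved against the current working directory: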
myDir = '/path/to/Desktop/My_project_folder'
output_path = os.path.join(myDir, 'train.csv')
However, if the Python script resides in the same directory as the data, have the script detect its own location and load the data frame from there, which avoids hard-coding whole path names:
import os
import pandas as pd
# SET CURRENT DIRECTORY
cd = os.path.dirname(os.path.abspath(__file__))
traindf = pd.read_csv(os.path.join(cd, 'train.csv'))
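The same approach can be written with pathlib, as a sketch. data_path is a hypothetical helper; it takes the script location as an argument so it can be called with __file__ from a script (__file__ is not defined in an interactive session).

```python
from pathlib import Path

def data_path(script_file, file_name):
    """Return the path of file_name located next to the given script file."""
    return Path(script_file).resolve().parent / file_name

# Inside a script you would call it like this:
# traindf = pd.read_csv(data_path(__file__, "train.csv"))
```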