Best way to join two paths? - Python

I'm trying to understand the best way to join two paths in Python. I'm able to get my expected result by using string concatenation, but I understand that is not the preferred way of working with paths. I'm trying to preserve the folder structure of a file, but move it to a new defined output directory.
For example -
import os
orig_file = r"F:\Media\Music\test_doc.txt"
output_dir = r"D:\output_dir"
## preferred method, but unexpected result
new_file = os.path.join(output_dir, os.path.splitdrive(orig_file)[1])
print(new_file)
## new file = D:\Media\Music\test_doc.txt
## What I want
new_file = output_dir + os.path.splitdrive(orig_file)[1]
print(new_file)
## new file = D:\output_dir\Media\Music\test_doc.txt
As you can see, when I use os.path.join() it seems to discard the "output_dir" folder on the D: drive.

This is os.path.join()'s documented behavior: when a later component is rooted (starts with a separator, as the result of os.path.splitdrive() does), everything before it is discarded. In Python 3.4+, the best way to do this is with the more modern, object-oriented pathlib module.
from pathlib import Path
orig_file = Path(r"F:\Media\Music\test_doc.txt")
output_dir = Path(r"D:\output_dir")
new_file = output_dir.joinpath(*orig_file.parts[1:])
print(f'{new_file=}') # -> new_file=WindowsPath('D:/output_dir/Media/Music/test_doc.txt')
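If you'd rather stay with os.path, the same result can be had by stripping the leading separator that splitdrive() leaves on the tail. A small sketch, using ntpath explicitly so the Windows path semantics apply regardless of the host OS:

```python
import ntpath  # os.path's Windows flavor; behaves the same on any host OS

orig_file = r"F:\Media\Music\test_doc.txt"
output_dir = r"D:\output_dir"

# splitdrive() leaves a leading backslash on the tail; a rooted second
# argument makes join() discard output_dir, so strip it first.
tail = ntpath.splitdrive(orig_file)[1].lstrip("\\")
new_file = ntpath.join(output_dir, tail)
print(new_file)  # D:\output_dir\Media\Music\test_doc.txt
```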

Related

How to load all .txt files in a folder with separate names - Python

I have to create a bunch of numpy arrays by importing all the .txt files from a folder. This is the way I'm doing it right now:
wil_davide_noIA_IST_nz300 = np.genfromtxt("%s/davide/wil_davide_noIA_IST_nz300.txt" %path)
wig_davide_multiBinBias_IST_nz300 = np.genfromtxt("%s/davide/wig_davide_multiBinBias_IST_nz300.txt" %path)
wig_davide_multiBinBias_PySSC_nz300 = np.genfromtxt("%s/davide/wig_davide_multiBinBias_PySSC_nz300.txt" %path)
wig_davide_noBias_PySSC_nz300 = np.genfromtxt("%s/davide/wig_davide_noBias_PySSC_nz300.txt" %path)
wig_davide_IST_nz300 = np.genfromtxt("%s/davide/wig_davide_IST_nz300.txt" %path)
wig_davide_noBias_IST_nz300 = np.genfromtxt("%s/davide/wig_davide_noBias_IST_nz300.txt" %path)
wig_davide_PySSC_nz300 = np.genfromtxt("%s/davide/wig_davide_PySSC_nz300.txt" %path)
...
from the folder
Can I automate the process somehow? Note that I'd like the array to have the same name as the imported file (without the .txt, of course).
Thank you very much
Creating a variable number of variables, with names determined at runtime, is never a good idea. Instead, keep your NumPy arrays in some kind of collection. In this case I would suggest a dictionary, so that you can access the individual arrays by key, where the key is the name of the corresponding file as a string - without the extension, of course. In modern Python you would use the pathlib or glob standard-library modules to iterate over all text files in a given directory. I haven't tested this, but it should work; the only thing you'll have to change is the hardcoded "dir/to/files" string, which should be the path to the directory containing your text files:
import numpy as np
from pathlib import Path

def get_kv_pairs():
    for path in Path("dir/to/files").glob("*.txt"):
        yield path.stem, np.genfromtxt(str(path))

arrays = dict(get_kv_pairs())
print(arrays["wil_davide_noIA_IST_nz300"])  # access an individual array by key
This is how you can get all the filenames; path is whatever your directory path is ("./" for the current directory).
import os
path = "./"
filenames = []
for filename in os.listdir(path):
    filenames.append(filename)
...
Also, you can filter the files with an if statement. Here I'm selecting only .txt files; ext is whatever filter you want to apply (use "." not in filename to get directories).
import os
path = "./"
ext = ".txt"
txt_filenames = []
for filename in os.listdir(path):
    if ext in filename:
        txt_filenames.append(filename)
...
A heads up: you should not do this if there are any security concerns with the files or their names.
Otherwise you can use the exec function to create the variables you want.
For a list approach, see the other solution.
import os
path_to_folder = "."
command = ""
for f in os.listdir(path_to_folder):
    if f.endswith(".txt"):
        command += f.replace(".txt", "") + f" = np.genfromtxt('{path_to_folder}/{f}')\n"
exec(command)
Depending on your folder/file structure you may need to change the code a bit, but that should work as asked.

Moving unsupported file extensions by python

I'm trying to move files from one directory to another using Python - spyder.
My file extension is *.OD which python does not support or read.
I have tried using the wildcard and leaving out the file extension (which does not work). Another file extension cannot be used for this particular file.
Moving python supported extensions such as .txt and .csv works fine.
import shutil
source = '//Original_Filepath/Extract*.od'
target = '//NewFilePath/Extract_*.od'
shutil.copy(source, target)
There are no errors, it just doesn't move/copy the file.
Thanks,
There are a couple of basic mistakes in how you're trying to copy the files: shutil.copy does not accept glob patterns; you must give it an exact source and an exact destination.
If instead you want to copy a set of files from one directory to another while changing their names (presuming the added underscore isn't a mistake), try using pathlib in combination with shutil (and re if needed).
pathlib - Object-oriented filesystem paths
Try adapting this:
import pathlib
import shutil
import re
source = pathlib.Path('//Original_Filepath') # pathlib takes care of end slash
source_glob = 'Extract*.od'
target = pathlib.Path('//NewFilePath')
for filename in source.glob(source_glob):
    # filename here is a Path object as well
    glob_match = re.match(r'Extract(.*)\.od', filename.stem).group(1)
    new_filename = "Extract_{}.od".format(glob_match)
    shutil.copy(str(filename), str(target / new_filename))  # `/` creates a new Path
If you're not interested in editing the target nor using any other advanced feature that pathlib provides then see Xukrao's comment.
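If the underscore was in fact a typo and you just want to copy the matching files unchanged, the loop shrinks to a glob plus shutil.copy. A minimal, self-contained sketch (temporary directories stand in for the real source and target paths, which are assumptions here):

```python
import pathlib
import shutil
import tempfile

# stand-in directories for the example; substitute your real paths
source = pathlib.Path(tempfile.mkdtemp())
target = pathlib.Path(tempfile.mkdtemp())
(source / "Extract1.od").write_text("dummy data")

for filename in source.glob("Extract*.od"):
    shutil.copy(filename, target / filename.name)  # same name, new directory

print([p.name for p in target.iterdir()])  # ['Extract1.od']
```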
Thank you all for your help. Much appreciated! :)
I was also able to copy the file with the code below (a bit simpler).
I left out the * and used a date string instead.
import os
import shutil
from datetime import datetime

now = datetime.now()
org_from = os.path.abspath('//Original FilePath')
new_to = os.path.abspath('//New Path')
filename = 'File_' + now.strftime("%Y%m%d") + '.od'
shutil.copy(os.path.join(org_from, filename), os.path.join(new_to, filename))
Cheers,
Jen

How to normalize a relative path using pathlib

I'm trying to use relative paths in Python, and I want to put my csv files in a separate folder from my python code.
My python program is in the following folder:
G:\projects\code
I want to read this file which is one level up:
G:\projects\data\sales.csv
How do I specify a path using pathlib that is one level up from my current working folder? I don't want to change the current working folder.
I tried this:
from pathlib import Path
file = Path.cwd() /'..'/'data'/'sales.csv'
But now the 'file' variable equals this:
'G:/projects/code/../data/sales.csv'
I read through the docs and either it isn't explained there or I'm just missing it.
Although it's not a problem that your path includes '..' (you can still use this path to open files, etc. in Python), you can normalize the path using resolve():
from pathlib import Path
path = Path.cwd() / '..' / 'data' / 'sales.csv'
print(path)            # G:\projects\code\..\data\sales.csv
print(path.resolve())  # G:\projects\data\sales.csv
NB: I personally would name a variable that contains a path path, not file. So you could later on do file = open(path).
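Note that resolve() also makes the path absolute and resolves symlinks. If you only want the '..' collapsed, os.path.normpath does a purely textual cleanup; a small sketch with a hardcoded example string (separators in the output follow the host OS):

```python
import os

path = 'G:/projects/code/../data/sales.csv'
# normpath collapses '..' without touching the filesystem
print(os.path.normpath(path))  # G:/projects/data/sales.csv (backslashes on Windows)
```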
from pathlib import Path

print(
    Path(__file__).parent,         # the folder
    Path(__file__).parent.parent,  # the folder's parent
    sep='\n'
)
print(
    Path(Path(__file__).parent.parent, 'hello.py')
)
results in
C:\Users\isik\Desktop\Python\MessAround\project\module
C:\Users\isik\Desktop\Python\MessAround\project
C:\Users\isik\Desktop\Python\MessAround\project\hello.py
with this file structure
-project
    -module
        -__init__.py
    -hello.py
    -__init__.py
while the code is located inside project.module.__init__.py
Do you mean "read my csv files"?
The import keyword has a different meaning in Python (you import only other Python modules).
In any case, in order to read a file located one folder above your Python file, you can use this (fileName here is the name of your CSV file):
import os
filePath = os.path.join(os.path.dirname(__file__), '..', fileName)
fileDesc = open(filePath)
fileData = fileDesc.read()
fileDesc.close()
...
...
here is an example I used:
import json
from pathlib import Path

def read_files(folder_name, file_name):
    base_path = Path.cwd().joinpath('configs', 'resources')
    path = base_path.joinpath(folder_name, file_name)
    with open(path, 'r') as open_file:
        return json.load(open_file)  # json.load takes the file object, not a string
This is pretty old, but I happened on it looking for something else.
Oddly you never got a direct, obvious, answer -- you want the parent property:
from pathlib import Path
file = Path.cwd().parent / 'data' / 'sales.csv'
Note that some of the answers that say you want __file__ rather than the current working directory may be correct (depending on your use case), in which case it's:
from pathlib import Path
file = Path(__file__).parent.parent / 'data' / 'sales.csv'
(The parent of the Python file is the code dir; the parent of that is the projects dir.)
However, it's not great practice to refer to your data by its relative path to your code. I think using the cwd is a better option, though what you should really do is pass the path to the data into the script via sys.argv.
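That last suggestion can be sketched like this, with a hypothetical script name and a cwd-relative fallback when no argument is given:

```python
import sys
from pathlib import Path

# hypothetical usage: python report.py G:/projects/data/sales.csv
if len(sys.argv) > 1:
    data_path = Path(sys.argv[1])
else:
    # fall back to a path relative to the current working directory
    data_path = Path.cwd().parent / 'data' / 'sales.csv'

print(data_path)
```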

Is there a more efficient way to store Pandas DataFrames to CSV?

I've written a small Python script that performs operations on a CSV file and stores the newly modified one. I'm wondering if there are any functions or modules that I could take advantage of to make it more efficient. Here's the script:
import pandas as pd
import os
print("Current directory is:\n" + os.getcwd() + '\n')
csv = input("Please enter csv file name: ")
csv_list = csv.split('/')
df = pd.read_csv(csv)
df.drop(df[df['is_reply_to'] == 1].index, inplace=True)
df.to_csv('./' + csv_list[-2] + '/' + 'new_' + csv_list[-1])
Example input: ./upper_directory/testing.csv
Example output: new_testing.csv
The method that I'm using is very specific in the sense that I'm assuming that the target CSV file is located in a directory inside the current directory. I was wondering if there was any way to make it more general in the sense that I don't have to do things like csv_list[-2] + '/' + ....
Thank you!
You can create better looking paths like this:
import os
# Directory path of input, then actual file name of path.
out_path = os.path.join(os.path.dirname(csv), 'new_{}'.format(os.path.basename(csv)))
df.to_csv(out_path)
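With pathlib the same idea reads even more cleanly, since with_name() swaps just the final path component. A sketch using the example input from the question (pandas' to_csv accepts path-like objects, so the result can be passed straight to it):

```python
from pathlib import Path

csv = "./upper_directory/testing.csv"  # the example input from the question
p = Path(csv)
out_path = p.with_name("new_" + p.name)  # same directory, prefixed name
print(out_path)  # upper_directory/new_testing.csv
```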

os.path moving through directories

I have this structure in my code:
app
    src
        script.py
data
    data.ptk
I need to open the file "data.ptk" from "script.py". Using os.path I'm able to extract the script's directory:
my_path = os.path.abspath(os.path.dirname(__file__))
But following the structure, I need to go up two directories and then enter the "data" directory to open the file.
The easy way would be to decompose the string my_path with split("/"), remove the last two parts and add "data"...
But I don't think that's the right way.
script.py needs to be independent of the OS; that is the reason I can't hard-code the directory where the .ptk file is placed.
Any suggestions? Thanks.
To elaborate on my comment more, you can see the documentation for pathlib here: https://docs.python.org/3/library/pathlib.html?highlight=pathlib#module-pathlib. It has been part of the standard library since Python 3.4 (for Python 2 there is the pathlib2 backport). I think the following would work:
from pathlib import Path
scriptPath = Path(__file__).absolute() # absolute() ensures the .parent chain walks the full path, not a relative one
srcPath = scriptPath.parent
appPath = srcPath.parent
commonDirectory = appPath.parent # this could have been shortened with scriptPath.parent.parent.parent
dataPath = commonDirectory / 'data'
dtkFile = dataPath / 'data.ptk'
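For what it's worth, pathlib also exposes parents as an indexable sequence, so the three .parent hops can be written in one step. A sketch with a hypothetical absolute location (PurePosixPath is used only to make the example deterministic on any OS):

```python
from pathlib import PurePosixPath

script_path = PurePosixPath("/common/app/src/script.py")  # hypothetical location
# parents[0] is src, parents[1] is app, parents[2] is the common directory
dtk_file = script_path.parents[2] / "data" / "data.ptk"
print(dtk_file)  # /common/data/data.ptk
```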
