python: transforming os.path code into pathlib code

I have the following Python function that adds a dict as a row to a pandas DataFrame and also creates an empty DataFrame file first if one does not exist yet.
I use the os library, but I would like to switch to pathlib: a software developer at my company told me I should use pathlib rather than os.path for this kind of thing. (As an aside, I don't have a CS background.)
import os
import pandas as pd

def myfunc(dictt, filename, folder='', extension='csv'):
    if folder == '':
        folder = os.getcwd()  # ---> folder = Path.cwd()
    filename = filename + '.' + extension
    total_file = os.path.join(folder, filename)  # <--- this is what I don't get translated
    # check if file exists, otherwise create it
    if not os.path.isfile(total_file):  # <----- if total_file is a Path object: total_file.exists()
        df_empty = pd.DataFrame()
        if extension == 'csv':
            df_empty.to_csv(total_file)
        elif extension == 'pal':
            df_empty.to_pickle(total_file)
        else:
            # raise error
            pass
    # code to append the dict as row
    # ...
First, I don't understand why pathlib is supposed to be better. Second, I don't understand how to translate the line marked above, i.e. how to do os.path.join(folder_path, filename) in pathlib notation.
pathlib also seems to have different approaches for Windows and other machines, and I can't find an explanation of what a POSIX path is (docs here).
Can anyone help me with those two lines?
Insights as to why to use pathlib instead of os.path are welcome.
Thanks

First I don't understand why pathlib is supposed to be better.
pathlib provides an object-oriented interface to the same functionality os.path gives. There is nothing inherently wrong with using os.path. We (the Python community) had been using os.path happily before pathlib came on the scene.
However, pathlib does make life simpler. Firstly, as mentioned in the comment by Henry Ecker, you're dealing with path objects, not strings, so you have less error checking to do after a path has been constructed. Secondly, the path objects' instance methods are right there to be used.
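As a rough illustration of that difference, the sketch below pulls the same pieces out of a path once with string-based os.path functions and once with path-object attributes. The path is made up for the example, and posixpath / PurePosixPath are used only so the output is identical on every platform.

```python
import posixpath  # the POSIX flavour of os.path, used for reproducible output
from pathlib import PurePosixPath

raw = "/home/user/data/report.csv"  # hypothetical path

# String-based approach: every operation is a separate function call.
name_from_os = posixpath.basename(raw)
folder_from_os = posixpath.dirname(raw)
suffix_from_os = posixpath.splitext(raw)[1]

# Object-based approach: the same information is an attribute of the path.
p = PurePosixPath(raw)
name_from_pathlib = p.name
folder_from_pathlib = str(p.parent)
suffix_from_pathlib = p.suffix

print(name_from_os, name_from_pathlib)      # report.csv report.csv
print(folder_from_os, folder_from_pathlib)  # /home/user/data /home/user/data
print(suffix_from_os, suffix_from_pathlib)  # .csv .csv
```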
Can anyone help me with those two lines?
Using your example:
import pathlib as pl
import pandas as pd

def mypathlibfunc(dictt, filename, folder='', extension='csv'):
    if folder == '':
        folder = pl.Path.cwd()
    else:
        folder = pl.Path(folder)
    total_file = folder / f'{filename}.{extension}'
    if not total_file.exists():
        # do your thing
        df_empty = pd.DataFrame()
        if extension == 'csv':
            df_empty.to_csv(total_file)
        elif extension == 'pal':
            df_empty.to_pickle(total_file)
Notes:
If your function is called with folder != '', a Path object is built from it. This ensures that folder has a consistent type in the rest of the function.
Child Path objects can be constructed using the division operator /, which is what I did for total_file. I didn't need to wrap f'{filename}.{extension}' in a Path object. Pretty neat! (reference)
The pandas.DataFrame.to_[filetype] methods all accept a Path object in addition to a path string, so you don't have to worry about modifying that part of your code.
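To make the translation of os.path.join explicit, here is a small sketch showing that the / operator, joinpath() and passing several segments to Path() all build the same path as os.path.join. The folder and file names are placeholders, not names from the question.

```python
import os
from pathlib import Path

folder = "some_folder"  # placeholder folder name
filename = "data.csv"   # placeholder file name

joined_with_os = os.path.join(folder, filename)

via_operator = Path(folder) / filename          # division operator
via_joinpath = Path(folder).joinpath(filename)  # explicit method
via_constructor = Path(folder, filename)        # several segments at once

same = via_operator == via_joinpath == via_constructor == Path(joined_with_os)
print(same)                                 # True
print(str(via_operator) == joined_with_os)  # True
```

Which of the three spellings to use is mostly a matter of taste; the / operator tends to read best when several segments are chained.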
In pathlib it seems to be different approaches for Windows and other machines, and also I don't see an explanation as to what is a POSIX path
If you use the Path object, it will be cross-platform, and you needn't worry about Windows & POSIX paths: Path automatically creates a WindowsPath on Windows and a PosixPath everywhere else.
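A minimal sketch of what the two flavours look like: the "pure" classes below can be instantiated on any operating system, so they are handy for seeing how the same join is rendered in each style. The directory and file names are invented for the example.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

file_name = "results.csv"  # placeholder file name

posix_file = PurePosixPath("/home/user/data") / file_name
windows_file = PureWindowsPath(r"C:\Users\user\data") / file_name

print(posix_file)    # /home/user/data/results.csv
print(windows_file)  # C:\Users\user\data\results.csv

# Path() picks the flavour that matches the running OS, so this line is portable.
flavour = type(Path("data") / file_name).__name__
print(flavour)  # PosixPath or WindowsPath, depending on the OS
```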

Related

How to modify a path which's in a variable (python)?

After finding the path of the Python file I am working on with os.getcwd() and __file__, I want to modify it. I put the path in a variable named r and would like to delete one part of it. For example, if the path is 'C:\\Users\\Shadow\\Desktop\\213.py', how can I delete \\213.py from r?
You can manipulate the string:
r = 'C:\\Users\\Shadow\\Desktop\\213.py'
r.rsplit('\\', 1)[0]
Output:
'C:\\Users\\Shadow\\Desktop'
You may also want to have a look at pathlib.Path.
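For comparison, a small pathlib sketch of the same operation. PureWindowsPath is used here only so the example behaves the same on any operating system; in a real script running on Windows, Path(__file__).parent would be the usual form.

```python
from pathlib import PureWindowsPath

r = PureWindowsPath('C:\\Users\\Shadow\\Desktop\\213.py')

parent = r.parent  # everything except the last component
print(parent)      # C:\Users\Shadow\Desktop
print(r.name)      # 213.py
```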
You extract the directory name in your example. This is easily achieved with os.path.dirname.
import os
os.path.dirname(__file__)
This solution is cross-platform (mostly), and avoids most pitfalls that arise from treating paths as strings.
If you need the value stored in a variable:
import os
r = os.path.dirname(__file__)

Os.path gives unexpected output

Lately I started working with the os module in Python, and I eventually got to os.path. Here is my question. I ran the method below in one of my Kivy projects just for testing, and it didn't return the correct output. The method checks whether a directory exists and returns a list of the image files in it; otherwise it prints "Invalid Path..." and returns -1. I passed in an existing directory and it returned -1. The weird part is that when I run a similar program outside my Kivy project, using the same path, located in the same folder as my Python file, it returns the desired output. Here is an image showing the Python file and the directory name I tested, which returns invalid path.
And here is my code snippet:
def get_imgs(self, img_path):
    if not os.path.exists(img_path):
        print("Invalid Path...")
        return -1
    else:
        all_files = os.listdir(img_path)
        imgs = []
        for f in all_files:
            if (
                f.endswith(".png")
                or f.endswith(".PNG")
                or f.endswith(".jpg")
                or f.endswith(".JPG")
                or f.endswith(".jpeg")
                or f.endswith(".JPEG")
            ):
                imgs.append("/".join([img_path, f]))
        return imgs
It's tough to tell without seeing the code with your function call. Whatever argument you're passing must not be a valid path. I use the os module regularly and have slowly learned a lot of useful methods. I always print out the paths I'm reading from or writing to before doing so; that way, if anything unexpected happens, I can see the value of a variable like img_path. Copy and paste the path into the file explorer, up to the directory, and make sure that's all good.
Some other os and os.path functions you will find useful, based on your code:
os.path.join(<directory>, <file_name.ext>) is much more intuitive than imgs.append("/".join([img_path, f]))
os.getcwd() gets your working directory (which I print at the start of scripts in dev to quickly address issues before debugging). I typically use full paths to play it safe, because the working directory can differ when running from cmd vs. PyCharm
os.path.basename(f) gives you the file name, while os.path.dirname(f) gives you the directory.
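A short sketch of those helpers in action. The directory and file names are placeholders, and a relative path is used only to keep the output predictable.

```python
import os

img_path = "images"  # placeholder directory name
f = "cat.png"        # placeholder file name

# Build the path with the separator of the current platform.
full_path = os.path.join(img_path, f)

file_part = os.path.basename(full_path)
dir_part = os.path.dirname(full_path)

print(os.getcwd())  # the directory that relative paths are resolved against
print(file_part)    # cat.png
print(dir_part)     # images
```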
A better approach here seems to be pathlib and glob, which let you iterate over directories and use wildcards.
Look at these:
Iterating over directories: How can I iterate over files in a given directory?
Different file types: Python glob multiple filetypes
Then you don't even need to check whether os.path.exists(img_path), because glob only returns paths that actually exist on your file system. The glob module also supports more wildcards, such as * for anything of any length, ? for any single character, and [0-9] for any single digit, documented here: https://docs.python.org/3/library/glob.html
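One possible pathlib version of get_imgs is sketched below. It is an illustration rather than a drop-in replacement: it is a plain function instead of a method, it returns an empty list instead of -1 for a missing directory, and the IMAGE_SUFFIXES name is made up for the example.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # hypothetical helper constant

def get_imgs(img_path):
    """Return a sorted list of image paths in img_path (empty if it is not a directory)."""
    folder = Path(img_path)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.glob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in get_imgs(tmp)]
    missing = get_imgs(Path(tmp) / "no_such_folder")

print(found)    # ['a.png', 'b.JPG']
print(missing)  # []
```

Comparing the lower-cased suffix against a set replaces the six endswith checks and also accepts mixed-case extensions such as ".Png".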

How to deal with multiple dots in a file name with Python pathlib?

I am having an issue with pathlib: when I try to construct a file path whose file name contains a ".", the pathlib module drops everything after that dot.
Here are example lines (I tried multiple versions, all resulted the same issue)
The issue is that the original file name will be coming from another application, so it is not like I can edit the name myself. I also do not want to do string replacement work arounds, if possible.
path = r"c:\temp"
# 1
p = Path(path).joinpath("myfile.001").with_suffix(".bat")
# 2
p = Path(path, "myfile.001").with_suffix(".bat")
# 3
p = Path(path).with_name("myfile.001").with_suffix(".bat")
All these lines yield
WindowsPath('C:/temp/myfile.bat')
So how do I make pathlib.Path construct the full path properly? The final path has to be
WindowsPath('C:/temp/myfile.001.bat')
Not
WindowsPath('C:/temp/myfile.bat')
Naturally I am looking for a way to do it through pathlib itself, otherwise I can just use os.
thanks
You are telling pathlib to replace the suffix .001 with the suffix .bat. pathlib complies.
Tell pathlib to add .bat to the existing suffix.
p = Path(path, 'myfile.001')
p = p.with_suffix(p.suffix + '.bat')
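The difference between replacing and appending a suffix can be checked with the short sketch below. PureWindowsPath is used instead of Path so the example runs the same way on any operating system; the with_name variant is an alternative I am adding for illustration, not part of the answer above.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"c:\temp", "myfile.001")

replaced = p.with_suffix(".bat")               # swaps ".001" for ".bat"
appended = p.with_suffix(p.suffix + ".bat")    # keeps ".001" and adds ".bat"
also_appended = p.with_name(p.name + ".bat")   # same result via the full file name

print(replaced.name)       # myfile.bat
print(appended.name)       # myfile.001.bat
print(also_appended.name)  # myfile.001.bat
```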

How to format the beginning a loop correctly?

I have a program that I designed for both myself and my colleague to use, with all the data stored in a directory. I want to set up the loop so that it works for both of us. I tried all of these:
file_location = glob.glob('/../*.nc')
file_location = glob('/../*.nc')
But none of them are picking up any files. How can I fix this?
You can get a directory relative to a user's home (called ~ in the function call) using os.path.expanduser(). In your case, the line would be
file_location = glob.glob(os.path.expanduser('~/Dropbox/Argo/Data/*.nc'))
It is usually good practice not to hardcode paths. If you're going to use your paths for other tasks that need well-formed paths (e.g. subprocess, writing paths to shell scripts), I'd recommend managing them with the os.path module instead, for example:
import os, glob
home_path = os.path.expanduser("~")
dropbox_path = os.path.join(home_path, "Dropbox")
good_paths = glob.glob(os.path.join(dropbox_path, "Argo", "Data", "*.nc"))
bad_paths = glob.glob(dropbox_path + "/Argo\\Data/*.nc")
print(len(good_paths) == len(bad_paths))
print(all(os.path.exists(p) for p in good_paths))
print(all(os.path.exists(p) for p in bad_paths))
The example compares badly formed and well-formed paths. On Windows both of them will work, but good_paths will be more flexible and portable in the long term.
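The same idea can be written with pathlib, which this sketch assumes is acceptable for your setup. The function name find_nc_files and its base parameter are made up for the example; the Dropbox/Argo/Data layout is taken from the answer above, and the demonstration uses a temporary directory in place of a real home folder.

```python
import tempfile
from pathlib import Path

def find_nc_files(base=None):
    """Return sorted *.nc files under <base>/Dropbox/Argo/Data (base defaults to the home directory)."""
    base = Path.home() if base is None else Path(base)
    data_dir = base / "Dropbox" / "Argo" / "Data"
    return sorted(data_dir.glob("*.nc"))

# Demonstration with a temporary directory standing in for the home folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "Dropbox" / "Argo" / "Data"
    data_dir.mkdir(parents=True)
    (data_dir / "profile_1.nc").touch()
    (data_dir / "readme.txt").touch()
    names = [p.name for p in find_nc_files(tmp)]

print(names)  # ['profile_1.nc']
```

Calling find_nc_files() without an argument uses Path.home(), so the same script works for each user without editing the path.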

normalize non-existing path using pathlib only

python has recently added the pathlib module (which i like a lot!).
there is just one thing i'm struggling with: is it possible to normalize a path to a file or directory that does not exist? i can do that perfectly well with os.path.normpath. but wouldn't it be absurd to have to use something other than the library that should take care of path related stuff?
the functionality i would like to have is this:
from os.path import normpath
from pathlib import Path
pth = Path('/tmp/some_directory/../i_do_not_exist.txt')
pth = Path(normpath(str(pth)))
# -> /tmp/i_do_not_exist.txt
but without having to resort to os.path and without having to type-cast to str and back to Path. also pth.resolve() does not work for non-existing files.
is there a simple way to do that with just pathlib?
is it possible to normalize a path to a file or directory that does not exist?
Starting from 3.6, it's the default behavior. See https://docs.python.org/3.6/library/pathlib.html#pathlib.Path.resolve
Path.resolve(strict=False)
...
If strict is False, the path is resolved as far as possible and any remainder is appended without checking whether it exists
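A small sketch of that behaviour, assuming Python 3.6 or later. A temporary directory is used as the base so the example does not depend on what exists under /tmp; the component names mirror the ones in the question.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()  # resolve first so symlinked temp dirs compare equal
    pth = base / "some_directory" / ".." / "i_do_not_exist.txt"

    exists = pth.exists()
    resolved = pth.resolve()  # strict=False is the default since Python 3.6
    expected = base / "i_do_not_exist.txt"

print(exists)                # False
print(resolved == expected)  # True
```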
As of Python 3.5: No, there's not.
PEP 0428 states:
Path resolution
The resolve() method makes a path absolute, resolving
any symlink on the way (like the POSIX realpath() call). It is the
only operation which will remove " .. " path components. On Windows,
this method will also take care to return the canonical path (with the
right casing).
Since resolve() is the only operation to remove the ".." components, and it fails when the file doesn't exist, there won't be a simple means using just pathlib.
Also, the pathlib documentation gives a hint as to why:
Spurious slashes and single dots are collapsed, but double dots ('..')
are not, since this would change the meaning of a path in the face of
symbolic links:
PurePath('foo//bar') produces PurePosixPath('foo/bar')
PurePath('foo/./bar') produces PurePosixPath('foo/bar')
PurePath('foo/../bar') produces PurePosixPath('foo/../bar')
(a naïve approach would make PurePosixPath('foo/../bar') equivalent to PurePosixPath('bar'), which is wrong if foo is a symbolic link to another directory)
All that said, you could create a 0 byte file at the location of your path, and then it'd be possible to resolve the path (thus eliminating the ..). I'm not sure that's any simpler than your normpath approach, though.
If this fits your use case (e.g. the file's directory already exists), you might try to resolve the path's parent and then re-append the file name, e.g.:
from pathlib import Path
p = Path()/'hello.there'
print(p.parent.resolve()/p.name)
Old question, but here is another solution, in particular if you want POSIX paths across the board (e.g. *nix-style paths on Windows too).
In my experience pathlib's resolve() was broken as of Python 3.10, and this method is not exposed by PurePosixPath.
What worked for me was posixpath.normpath(). I also found PurePosixPath.joinpath() to be broken, i.e. it would not join ".." with "myfile.txt" as I expected and returned just "myfile.txt", whereas posixpath.join() works perfectly and returns "../myfile.txt".
Note that these functions work on path strings, but the result is easily wrapped back into pathlib.Path(my_posix_path) et al. for an OOP container.
Constructing the path this way also makes it easy to convert to Windows platform paths, as the module takes care of the platform independence for you.
This might be the solution for others with Python file path woes.
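A minimal sketch of the posixpath approach described here. Both functions operate purely on strings and never touch the disk, which also means the normalization is lexical only: as the pathlib documentation quoted earlier warns, collapsing ".." can change the meaning of a path if a symbolic link is involved.

```python
import posixpath
from pathlib import PurePosixPath

raw = "/tmp/some_directory/../i_do_not_exist.txt"

normalized = posixpath.normpath(raw)         # lexical clean-up of ".." and "."
joined = posixpath.join("..", "myfile.txt")  # string join with "/" on every platform

as_path = PurePosixPath(normalized)          # wrap the string back into a path object

print(normalized)    # /tmp/i_do_not_exist.txt
print(joined)        # ../myfile.txt
print(as_path.name)  # i_do_not_exist.txt
```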
