Easiest way to append non-directory component to Python Path? - python

Several SO questions ask how to append a directory to a pathlib.Path object. That's not this question.
I would like to use a Path object as a prefix for a series of files in a single directory, like this:
2022-01-candidates.csv
2022-01-resumes.zip
2022-02-candidates.csv
2022-02-resumes.zip
Ideally, I would construct Path objects for the 2022-01 and 2022-02 components, and then append -candidates.csv and -resumes.zip to each.
Unfortunately, Path appears to only understand appending subdirectories, not extensions to existing path names.
The only workaround that I see is something like p.parent / (p.name + "-candidates.csv"). Although that's not so bad, it's clumsy and this pattern is common for me. I wonder whether I'm missing a more streamlined method. (For example, why isn't there a + concatenation operator?)
Path.with_suffix() requires that the suffix start with a dot, so that doesn't work.

As you mentioned, the division operator always adds a new path component, and with_suffix is only for extensions. You could use with_name to edit the filename:
import pathlib
path = pathlib.Path("2022-01")
path.with_name(f"{path.name}-candidates.csv")
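Since this pattern comes up often, the with_name call can be wrapped in a small helper. The following is only a sketch: the function name append_to_name and the reports directory are made up for illustration and are not part of pathlib.

```python
from pathlib import Path


def append_to_name(path: Path, suffix: str) -> Path:
    """Return a sibling path whose final component has `suffix` appended."""
    # with_name replaces only the last component, so the parent is preserved.
    return path.with_name(path.name + suffix)


# "reports" is a hypothetical directory used only for this example.
prefix = Path("reports") / "2022-01"
candidates = append_to_name(prefix, "-candidates.csv")
resumes = append_to_name(prefix, "-resumes.zip")
print(candidates)
print(resumes)
```

Note that with_name raises ValueError when the path has no final component (for example Path("/")), so the helper inherits that behavior.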

Related

Uppercase the names of multiple files in a directory in Python

I'm working on a small project that requires that I use Python to uppercase all the names of files in a certain directory "ex: input: Brandy.jpg , output: BRANDY.jpg".
The thing is, I've never done this on multiple files before. What I've done so far is the following:
universe = os.listdir('parallel_universe/')
universe = [os.path.splitext(x)[0].upper() for x in universe]
But this only uppercased the names in the list, not the files in the directory itself. The output looked like this:
['ADAM SANDLER','ANGELINA JULIE','ARIANA GRANDE','BEN AFFLECK','BEN STILLER','BILL GATES', 'BRAD PITT','BRITNEY SPEARS','BRUCE LEE','CAMERON DIAZ','DWAYNE JOHNSON','ELON MUSK','ELTON JOHN','JACK BLACK','JACKIE CHAN','JAMIE FOXX','JASON SEGEL', 'JASON STATHAM']
What am I missing here? And since I don't have much experience in Python, I'd love if your answers include explanations for each step, and thanks in advance.
Right now, you are converting the strings to uppercase, but that's it. There is no actual renaming being done. In order to rename, you need to use os.rename.
If you were to wrap your code with os.rename, it should solve your problem, like so:
[os.rename("parallel_universe/" + x, "parallel_universe/" + os.path.splitext(x)[0].upper() + os.path.splitext(x)[1]) for x in universe]
I have removed the assignment universe= because this line no longer returns a useful list; you would instead get a bunch of None objects.
Docs for os.rename: https://docs.python.org/3/library/os.html#os.rename
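The same renaming can also be written as an explicit loop, which may be easier to follow step by step than a list comprehension used for its side effects. This is a sketch using pathlib; the function name uppercase_stems and the demo file names are assumptions for illustration, and the demo runs in a temporary directory rather than in parallel_universe/.

```python
import tempfile
from pathlib import Path


def uppercase_stems(directory: Path) -> None:
    """Rename every regular file in `directory` so its stem is uppercase."""
    # Take a snapshot of the listing first so renames do not disturb iteration.
    for entry in list(directory.iterdir()):
        if not entry.is_file():
            continue  # leave subdirectories alone
        # stem is the name without the extension; suffix is the extension.
        target = entry.with_name(entry.stem.upper() + entry.suffix)
        if target != entry:
            entry.rename(target)


# Demonstration in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "Brandy.jpg").touch()
    (demo / "brad pitt.png").touch()
    uppercase_stems(demo)
    renamed = sorted(p.name for p in demo.iterdir())
    print(renamed)
```

To apply it to the directory from the question, the call would be uppercase_stems(Path("parallel_universe")).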

Quickly check for subdirectories in list

I have two sets of paths, with maybe 5000 files in the first set and 10000 files in the second. The first set is contained in the second set. I need to check if any of the entries in the second set is a child of any entry in the first set (i.e. if it's a subdirectory or file in another directory from the first set). There are some additional requirements:
No operations on the file system, it should be done only on the path strings (except for dealing with symlinks if needed).
Platform independent (e.g. upper/lower case, different separators)
It should be robust with respect to different ways of expressing the same path.
It should deal with both symlinks and their targets.
Some paths will be absolute and some relative.
This should be as fast as possible!
I'm thinking along the lines of getting both os.path.abspath() and os.path.realpath() for each entry and then comparing them with os.path.commonpath([parent]) == os.path.commonpath([parent, child]). I can't come up with a good way of running this fast though. Or is it safe to just compare the strings directly? That would make it much much easier. Thanks!
EDIT: I was a bit unclear about the platform independence. It should work for all platforms, but there won't be for example Windows and Unix style paths mixed.
You can first calculate the real path of all paths using os.path.realpath and then use os.path.commonprefix to check whether a path is a child of one of the paths in the first set.
Example:
import os
first = ['a', 'b/x', '/r/c']
second = ['e', 'b/x/t', 'f']
first = set(os.path.realpath(p) for p in first)
second = set(os.path.realpath(p) for p in second)
for s in second:
    if any(os.path.commonprefix([s, f]) == f
           for f in first):
        print(s)
You get:
/full/path/to/b/x/t
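One caveat: os.path.commonprefix compares strings character by character, so a sibling such as b/xy would also be reported as a child of b/x. The sketch below is an alternative, not the answer's method: it compares whole path components and looks up each candidate's ancestors in a set, which avoids comparing every pair. The name find_children is made up for this example. It uses os.path.abspath, which only consults the current working directory; os.path.realpath could be swapped in if symlinks need resolving, at the cost of touching the file system.

```python
import os
from pathlib import PurePath


def find_children(parents, candidates):
    """Return the candidates that lie strictly below some entry in `parents`."""

    def norm(p):
        # abspath collapses "..", redundant separators and relative paths;
        # normcase folds case and separators on platforms where that matters.
        return PurePath(os.path.normcase(os.path.abspath(p)))

    parent_set = {norm(p) for p in parents}
    result = []
    for c in candidates:
        # PurePath.parents yields every ancestor of the path (not the path
        # itself), so each check is a handful of set lookups.
        if any(ancestor in parent_set for ancestor in norm(c).parents):
            result.append(c)
    return result


first = ['a', 'b/x', '/r/c']
second = ['e', 'b/x/t', 'b/xy', 'f']
print(find_children(first, second))
```

Because the comparison works on components, b/xy is not mistaken for a child of b/x, and an entry that appears in both sets is not reported as its own child.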

Getting just the current directory without the full path in python

I apologize if this is a question that has already been resolved. I want to get the current directory when running a Python script or within Python. The following will return the full path including the current directory:
os.getcwd()
I can also get the path all the way up to the current directory:
os.path.dirname(os.getcwd())
Using os.path.split will return the same thing as the above, plus the current folder, but then I end up with an object I don't want alongside the one I do:
(thing_I_dont_want, thing_I_want) = os.path.split(os.getcwd())
Is there a way I can get just the thing I want, the current folder, without creating any objects I don't want around? Alternately, is there something I can put in place of the variable thing_I_dont_want that will prevent it from being created (e.g. (*, thing_I_want))?
Thanks!
Like this:
os.path.split(os.getcwd())[1]
Although os.path.split returns a tuple, you don't need to unpack it. You can simply select the item that you need and ignore the one that you don't need.
Use os.path.split:
>>> os.path.split(os.getcwd())
('/home/user', 'py')
>>> os.path.split(os.getcwd())[-1]
'py'
help on os.path.split:
>>> print(os.path.split.__doc__)
Split a pathname. Returns tuple "(head, tail)" where "tail" is
everything after the final slash. Either part may be empty.
You could try this, though it's not safe (like all the given solutions) if the pathname ends with a / for some reason:
os.path.basename(os.getcwd())
The standard pythonic way of denoting that "this is a thing I don't want" is to call it _ - as in:
_, thing_I_want = os.path.split(os.getcwd())
Note that this doesn't do anything special. The object is being created inside os.path.split(), and it's still being returned and given the name _ - but this does make it clear to people reading your code that you don't care about that particular element.
As well as being a signal to other people, most IDEs and code validators will understand that the variable called _ is to be ignored, and they won't do things like warn you about it never being used.
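For comparison, pathlib offers another way to get only the last component, with no tuple to unpack. This is a sketch rather than one of the answers above; the path /home/user/py/ is just the example path from earlier with a trailing slash added.

```python
import os
from pathlib import Path, PurePosixPath

# Name of the current working directory, without the leading path.
current_name = Path.cwd().name
print(current_name)

# Trailing slashes: os.path.basename returns an empty string here,
# while pathlib drops the trailing slash before taking the name.
trailing = "/home/user/py/"
print(repr(os.path.basename(trailing)))
print(repr(PurePosixPath(trailing).name))
```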

Checking a file exists (and ignoring case) in Python

I have a Python script and I want to check if a file exists, but I want to ignore case
eg.
path = '/Path/To/File.log'
if os.path.isfile(path):
    return True
The directory may look like this "/path/TO/fILe.log". But the above should still return True.
Generate, once, a set S of all absolute paths in the filesystem using os.walk, lowercasing them as you collect them with str.lower.
Iterate through your large list of paths and check each one for existence with if my_path.lower() in S.
(Optional) Go and interrogate whoever provided you the list with inconsistent cases. It sounds like an XY problem, there may be some strange reason for this and an easier way out.
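The first two steps above can be sketched as follows. The helper names build_lowercase_index and exists_ignoring_case are made up for this example, and the index is built from a chosen root directory (here a temporary one) rather than the whole filesystem.

```python
import os
import tempfile


def build_lowercase_index(root):
    """Collect the lowercased absolute path of every file under `root`."""
    index = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        base = os.path.abspath(dirpath)
        for name in filenames:
            index.add(os.path.join(base, name).lower())
    return index


def exists_ignoring_case(path, index):
    """Check a path against the index, ignoring case."""
    return os.path.abspath(path).lower() in index


# Demonstration in a throwaway directory that mimics the question's layout.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "path", "TO"))
    open(os.path.join(tmp, "path", "TO", "fILe.log"), "w").close()

    index = build_lowercase_index(tmp)
    found = exists_ignoring_case(os.path.join(tmp, "Path", "To", "File.log"), index)
    missing = exists_ignoring_case(os.path.join(tmp, "Path", "To", "Other.log"), index)
    print(found, missing)
```

Building the index costs one walk of the tree, after which each lookup is a single set membership test.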

Is there a one-liner to list a directory two levels deep where the second level is an only-child, but not known?

Suppose you have a directory structure like this:
A/
    B/
        a.1
        b.2
        c.3
I'm wondering if there's a way, knowing that B has no siblings AND NOT KNOWING B's NAME, to do an os.listdir operation in one swoop (that is, without calling os.listdir twice), instead of in three commands like so:
root = "A"
secondLevel = os.listdir(root)[0]
listing = os.listdir(os.path.join(root,secondLevel))
import glob
os.listdir(glob.glob('A/*')[0])
or maybe even
glob.glob('A/*/*')
You are looking for os.walk.
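A sketch of how os.walk could be used here: in its default top-down mode it yields the root first and then descends, so when the root has exactly one subdirectory the second item it yields describes that subdirectory. The function name list_only_child is made up for this example, and the demo recreates the A/B layout in a temporary directory.

```python
import os
import tempfile


def list_only_child(root):
    """Return the entries inside the single subdirectory of `root`."""
    walker = os.walk(root)
    # First item: the root itself, which reveals the subdirectory's name.
    _top, dirnames, _files = next(walker)
    if len(dirnames) != 1:
        raise ValueError("expected exactly one subdirectory in %r" % root)
    # Second item: the contents of that only child.
    _child, subdirs, filenames = next(walker)
    return sorted(subdirs + filenames)


# Demonstration: rebuild the A/B/{a.1,b.2,c.3} layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "A")
    os.makedirs(os.path.join(root, "B"))
    for name in ("a.1", "b.2", "c.3"):
        open(os.path.join(root, "B", name), "w").close()
    listing = list_only_child(root)
    print(listing)
```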
