I have a Python script and I want to check if a file exists, but I want to ignore case
e.g.
path = '/Path/To/File.log'
if os.path.isfile(path):
    return True
The path on disk may look like this: "/path/TO/fILe.log". But the above should still return True.
Generate, one time only, a set S of all absolute paths in the filesystem using os.walk, lowercasing each one with str.lower as you collect it.
Iterate through your large list of paths, checking each one for existence with if my_path.lower() in S.
(Optional) Go and interrogate whoever provided you the list with inconsistent cases. It sounds like an XY problem, there may be some strange reason for this and an easier way out.
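The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names build_lowercase_index and exists_ignoring_case are made up for this example, and it indexes only regular files under a root directory you choose rather than the whole filesystem.

```python
import os


def build_lowercase_index(root):
    """Walk root once and collect the lowercased absolute path of every file."""
    index = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.abspath(os.path.join(dirpath, name))
            index.add(full_path.lower())
    return index


def exists_ignoring_case(path, index):
    """Return True if path matches an indexed file when case is ignored."""
    return os.path.abspath(path).lower() in index
```

Building the index costs one walk of the tree. After that each lookup is a set membership test, which is what makes this approach reasonable for a long list of paths.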
I'm a newbie at Python. I'm trying to write Python code that renames multiple files, giving each one a different name depending on which pattern it matches. Here's my code:
for i, file in enumerate(os.listdir(inputpath)):
    if(match(".*"716262_2.*$"),file):
        dstgco="Test1"+"DateHere"+".xls"
        gnupgCommandOp=gnupgCommand(os.rename(os.path.join(inputpath,file),os.path.join(inputpath,dstgco)))
        returnCode = call(gnupgCommandop)
    if(match(".*"270811_2.*$"),file):
        dstgmo="Test2"+"DateHere"+".xls"
        gnupgCommandOp=gnupgCommand(os.rename(os.path.join(inputpath,file),os.path.join(inputpath,dstgmo)))
        returnCode = call(gnupgCommandop)
Currently only one file gets renamed, the Test2 DateHere one, and then I get a "str object is not callable" error. My requirement is to rename all the files at that location according to the different match cases. Am I writing the for loop or the if statements incorrectly?
Things I have tried:
used an incremental count
used glob
used only os.listdir and not enumerate
It seems to match the first statement and break on the next retrieval; maybe I wrote the if statements wrong.
I can't debug this since I'm calling this code from an internal tool using a bat file.
Can someone please help me out with this? I know only a single gnupgCommandOp should be used. Is my syntax wrong? What would be a better way to achieve this?
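One possible way to structure this is sketched below, under assumptions. Two things stand out in the code above: the name gnupgCommandOp is later spelled gnupgCommandop, and os.rename returns None, so passing its result to another call does nothing useful. The error message also suggests that gnupgCommand may be a string rather than a function, but that is a guess. The sketch leaves the GnuPG step out entirely, and RENAME_RULES and rename_matching are hypothetical names chosen for illustration.

```python
import os
import re

# Hypothetical rules: a pattern found in the old name, and the new file name.
RENAME_RULES = [
    (re.compile(r"716262_2"), "Test1DateHere.xls"),
    (re.compile(r"270811_2"), "Test2DateHere.xls"),
]


def rename_matching(inputpath, rules=RENAME_RULES):
    """Rename each file in inputpath according to the first rule it matches."""
    renamed = []
    # os.listdir returns a list, so renaming inside the loop is safe.
    for name in sorted(os.listdir(inputpath)):
        for pattern, new_name in rules:
            if pattern.search(name):
                os.rename(os.path.join(inputpath, name),
                          os.path.join(inputpath, new_name))
                renamed.append((name, new_name))
                break  # at most one rename per file
    return renamed
```

Keeping the patterns in a table means adding a third case is one extra line rather than another copied if block.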
I use Path.iterdir() to go through a list of directories and do some work.
from pathlib import Path
for folder in repository.iterdir():
    # do some work
After completing that task, if it's successful the directories should be empty. I'm aware that you shouldn't modify a list you're iterating over, so doing this I assume would be bad practice, correct?
for folder in repository.iterdir():
    folder.rmdir()
Instead what I did was collect a list of directories that are empty and then use that list to remove the directories, like this:
# Get a list of all the directories that are empty.
directories_to_remove = [folder for folder in repository.iterdir() if not os.listdir(folder)]
# Remove all the directories that are empty
for folder in directories_to_remove:
    folder.rmdir()
The folder paths still exist in directories_to_remove, so I'm not modifying a list I'm iterating over. Is this the correct way to delete the directories from the drive?
That is a correct way to remove the actual directories, and you are right that you are not removing them from directories_to_remove.
If you'd like to, you could simply change:
for folder in directories_to_remove:
    folder.rmdir()
to
# Iterate over a copy so that removing items does not skip entries.
for folder in list(directories_to_remove):
    folder.rmdir()
    directories_to_remove.remove(folder)
But, if your intentions are to not remove it from the list while iterating, you are succeeding, good job :)
iterdir doesn't return a list, so you're good. Moreover, even if it did return a list, your code doesn't delete any items from it, so it would have been fine anyway, as long as the directories are actually deleted, of course.
What happens when iterating over data in Python is determined by the iterator, and problems are likely to occur when you alter the iterable that the iterator refers to. If you iterate over a list that was created once, you can manipulate the data it was created from without any harm. But since iterdir "yields" values, it appears to be a generator, so it fetches the data on the fly, and whether or not bad things happen depends on its implementation (and/or the implementation of the underlying system calls).
Unless you have severe performance or memory troubles, I would not take the chance. I would fetch all the data before processing it:
for folder in list(repository.iterdir()):
    folder.rmdir()
Note though that this is not 100% safe either, since the directories could be filled, altered or removed before you reach folder.rmdir(), but this is highly unlikely unless you use multiprocessing.
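As a small sketch of the snapshot idea, the function below materializes the directory listing with list() before removing anything and skips directories that still have content. The name remove_empty_dirs is made up for this example, and it assumes nothing else modifies the tree while it runs.

```python
from pathlib import Path


def remove_empty_dirs(repository):
    """Remove the empty subdirectories of repository and return their names."""
    removed = []
    # list() takes a snapshot, so deleting entries cannot disturb the iteration.
    for folder in list(Path(repository).iterdir()):
        # A directory is empty when its own listing yields nothing.
        if folder.is_dir() and next(folder.iterdir(), None) is None:
            folder.rmdir()
            removed.append(folder.name)
    return sorted(removed)
```

Path.rmdir() raises OSError on a non-empty directory, so the emptiness check mainly keeps the loop from stopping at the first directory that still has content.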
I have two sets of paths, with maybe 5000 files in the first set and 10000 files in the second. The first set is contained in the second set. I need to check if any of the entries in the second set is a child of any entry in the first set (i.e. if it's a subdirectory or file in another directory from the first set). There are some additional requirements:
No operations on the file system, it should be done only on the path strings (except for dealing with symlinks if needed).
Platform independent (e.g. upper/lower case, different separators)
It should be robust with respect to different ways of expressing the same path.
It should deal with both symlinks and their targets.
Some paths will be absolute and some relative.
This should be as fast as possible!
I'm thinking along the lines of getting both os.path.abspath() and os.path.realpath() for each entry and then comparing them with os.path.commonpath([parent]) == os.path.commonpath([parent, child]). I can't come up with a good way of running this fast though. Or is it safe to just compare the strings directly? That would make it much much easier. Thanks!
EDIT: I was a bit unclear about the platform independence. It should work for all platforms, but there won't be for example Windows and Unix style paths mixed.
You can first calculate the real path of every entry using os.path.realpath and then use os.path.commonpath to check whether a path is a child of one of the paths in the first set. (os.path.commonprefix compares character by character, so it would wrongly treat /a/bc as a child of /a/b.)
Example:
import os
first = ['a', 'b/x', '/r/c']
second = ['e', 'b/x/t', 'f']
first = set(os.path.realpath(p) for p in first)
second = set(os.path.realpath(p) for p in second)
for s in second:
    # s is a child of f when f is their common path and s is not f itself
    if any(s != f and os.path.commonpath([s, f]) == f
           for f in first):
        print(s)
You get:
/full/path/to/b/x/t
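With about 5000 parents and 10000 candidates, comparing every pair means up to 50 million checks. A possibly faster variant, sketched here under assumptions, stores the parents in a set and looks up each candidate's ancestors instead. It assumes all paths have already been normalized to one form (for example with os.path.realpath and os.path.normcase). The name find_children is made up for this example, and PurePosixPath is used only to keep the example reproducible; pathlib.PurePath would follow the conventions of the current platform.

```python
from pathlib import PurePosixPath


def find_children(parents, candidates):
    """Return the candidates that lie strictly below any path in parents."""
    parent_set = {PurePosixPath(p) for p in parents}
    found = []
    for candidate in candidates:
        path = PurePosixPath(candidate)
        # .parents yields every ancestor of path, but never path itself.
        if any(ancestor in parent_set for ancestor in path.parents):
            found.append(candidate)
    return found


print(find_children(['a', 'b/x', '/r/c'], ['e', 'b/x/t', 'f', '/r/c']))
```

Each candidate costs one set lookup per ancestor, so the total work grows with the number of candidates times their path depth rather than with the product of the two set sizes.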
While trying to get my build to use single compilation units I have found it necessary to test whether, given a certain filename, the respective file has changed since the last build, and then based on whether it has or not, to treat it differently (add it to a particular scu or not).
I have tried constructing a file object, and calling Node.changed() on it, but this always returns False, even when the file has changed.
How can I test a file to see if SCons thinks it has changed?
Did you try the AddPostAction command?
http://www.scons.org/doc/2.1.0/HTML/scons-user/a10706.html
It basically executes an action after a target has been built.
I apologize if this is a question that has already been resolved. I want to get the current directory when running a Python script or within Python. The following will return the full path including the current directory:
os.getcwd()
I can also get the path all the way up to the current directory:
os.path.dirname(os.getcwd())
Using os.path.split will return the same thing as the above, plus the current folder, but then I end up with an object I don't want:
(thing_I_dont_want, thing_I_want) = os.path.split(os.getcwd())
Is there a way I can get just the thing I want, the current folder, without creating any objects I don't want around? Alternatively, is there something I can put in place of the variable thing_I_dont_want that will prevent it from being created (e.g. (*, thing_I_want))?
Thanks!
Like this:
os.path.split(os.getcwd())[1]
Although os.path.split returns a tuple, you don't need to unpack it. You can simply select the item that you need and ignore the one that you don't need.
Use os.path.split:
>>> os.path.split(os.getcwd())
('/home/user', 'py')
>>> os.path.split(os.getcwd())[-1]
'py'
Help on os.path.split:
>>> print(os.path.split.__doc__)
Split a pathname. Returns tuple "(head, tail)" where "tail" is
everything after the final slash. Either part may be empty.
You could try this, though like all the given solutions it is not safe if the pathname ends with a / for some reason:
os.path.basename(os.getcwd())
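To illustrate the trailing-slash caveat, here is a small sketch. It uses posixpath so that the result is the same on every platform; in real code os.path would be the usual choice. The helper name current_folder_name is made up for this example.

```python
import posixpath


def current_folder_name(path):
    """Return the last component of path, ignoring any trailing slash."""
    # normpath strips the trailing slash, so basename sees the folder name.
    return posixpath.basename(posixpath.normpath(path))


# basename alone returns an empty string when the path ends with a slash.
print(repr(posixpath.basename('/home/user/py/')))  # ''
print(current_folder_name('/home/user/py/'))       # py
```

In practice os.getcwd() does not normally return a trailing slash, so this mostly matters for paths that come from elsewhere.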
The standard pythonic way of denoting that "this is a thing I don't want" is to call it _ - as in:
_, thing_I_want = os.path.split(os.getcwd())
Note that this doesn't do anything special. The object is still created inside os.path.split(), and it's still returned and given the name _, but this does make it clear to people reading your code that you don't care about that particular element.
As well as being a signal to other people, most IDEs and code validators will understand that the variable called _ is to be ignored, and they won't do things like warn you about it never being used.