Can joblib.Memory be used to write in a thread-safe manner to a common cache across multiple processes. In what situations, if any will this fail or cause an error?
The library first writes to a temporary file and then moves the temporary file to the destination. Source code:
def _concurrency_safe_write(self, to_write, filename, write_func):
"""Writes an object into a file in a concurrency-safe way."""
temporary_filename = concurrency_safe_write(to_write,
filename, write_func)
self._move_item(temporary_filename, filename)
Writing to the temporary file seems safe among processes in the same operating system because it includes the pid in the file name. Additionally, it seems safe among threads in the same process because it includes the thread id. Source:
def concurrency_safe_write(object_to_write, filename, write_func):
"""Writes an object into a unique file in a concurrency-safe way."""
thread_id = id(threading.current_thread())
temporary_filename = '{}.thread-{}-pid-{}'.format(
filename, thread_id, os.getpid())
write_func(object_to_write, temporary_filename)
return temporary_filename
Moving the temporary file to the destination has shown problems on Windows. Source:
if os.name == 'nt':
# https://github.com/joblib/joblib/issues/540
access_denied_errors = (5, 13)
from os import replace
def concurrency_safe_rename(src, dst):
"""Renames ``src`` into ``dst`` overwriting ``dst`` if it exists.
On Windows os.replace can yield permission errors if executed by two
different processes.
"""
max_sleep_time = 1
total_sleep_time = 0
sleep_time = 0.001
while total_sleep_time < max_sleep_time:
try:
replace(src, dst)
break
except Exception as exc:
if getattr(exc, 'winerror', None) in access_denied_errors:
time.sleep(sleep_time)
total_sleep_time += sleep_time
sleep_time *= 2
else:
raise
else:
raise
else:
from os import replace as concurrency_safe_rename # noqa
From this source code you can see that on Windows it could fail after having failed to move the temporary file to the destination because of access denied errors during a total time of 1 s and having retried with exponential backoff.
The same source code has a link to the issue #540 that describes the Windows errors and was closed with the comment:
Fixed by #541 (hopefully).
The "(hopefully)" in the comment seems to indicate that the author could not guarantee that the fix was definitive, but the issue has not been reopened, so it probably has not happened again.
For other operating systems there is no special logic or retries and just the standard os.replace() is used. The description mentions cases where it "may fail" and also that it "will be an atomic operation":
Rename the file or directory src to dst. If dst is a directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).
If no one is changing permissions in the destination directories, you should be less worried about the probability of failure of this operation. The scenario of "if src and dst are on different filesystems" seems not feasible because the source path (temporary file) is built just by adding a suffix to the destination path, so they should be in the same directory.
Other questions that deal with the atomicity of rename:
Is rename() atomic?
Is mv atomic on my fs?
Is rename required by standard to be atomic?
Related
I am using Python on Ubuntu (Linux). I would also like this code to work on modern-ish Windows PCs. I can take or leave Macs, since I have no plan to get one, but I am hoping to make this code as portable as possible.
I have written some code that is supposed to traverse a directory and run a test on all of its subdirectories (and later, some code that will do something with each file, so I need to know how to detect links there too).
I have added a check for symlinks, but I do not know how to protect against hardlinks that could cause infinite recursion. Ideally, I'd like to also protect against duplicate detections in general (root: [A,E], A: [B,C,D], E: [D,F,G], where D is the same file or directory in both A and E).
My current thought is to check if the path from the root directory to the current folder is the same as the path being tested, and if it isn't, skip it as an instance of a cycle. However, I think that would take a lot of extra I/O or it might just retrace the (actually cyclic) path that was just created.
How do I properly detect cycles in my filesystem?
def find(self) -> bool:
if self._remainingFoldersToSearch:
current_folder = self._remainingFoldersToSearch.pop()
if not current_folder.is_symlink():
contents = current_folder.iterdir()
try:
for item in contents:
if item.is_dir():
if item.name == self._indicator:
potentialArchive = [x.name for x in item.iterdir()]
if self._conf in potentialArchive:
self._archives.append(item)
if self._onArchiveReadCallback:
self._onArchiveReadCallback(item)
else:
self._remainingFoldersToSearch.append(item)
self._searched.append(item)
if self._onFolderReadCallback:
self._onFolderReadCallback(item)
except PermissionError:
logging.info("Invalid permissions accessing folder:", exc_info=True)
return True
else:
return False
What is the most elegant way to solve this:
open a file for reading, but only if it is not already opened for writing
open a file for writing, but only if it is not already opened for reading or writing
The built-in functions work like this
>>> path = r"c:\scr.txt"
>>> file1 = open(path, "w")
>>> print file1
<open file 'c:\scr.txt', mode 'w' at 0x019F88D8>
>>> file2 = open(path, "w")
>>> print file2
<open file 'c:\scr.txt', mode 'w' at 0x02332188>
>>> file1.write("111")
>>> file2.write("222")
>>> file1.close()
scr.txt now contains '111'.
>>> file2.close()
scr.txt was overwritten and now contains '222' (on Windows, Python 2.4).
The solution should work inside the same process (like in the example above) as well as when another process has opened the file.
It is preferred, if a crashing program will not keep the lock open.
I don't think there is a fully crossplatform way. On unix, the fcntl module will do this for you. However on windows (which I assume you are by the paths), you'll need to use the win32file module.
Fortunately, there is a portable implementation (portalocker) using the platform appropriate method at the python cookbook.
To use it, open the file, and then call:
portalocker.lock(file, flags)
where flags are portalocker.LOCK_EX for exclusive write access, or LOCK_SH for shared, read access.
The solution should work inside the same process (like in the example above) as well as when another process has opened the file.
If by 'another process' you mean 'whatever process' (i.e. not your program), in Linux there's no way to accomplish this relying only on system calls (fcntl & friends). What you want is mandatory locking, and the Linux way to obtain it is a bit more involved:
Remount the partition that contains your file with the mand option:
# mount -o remount,mand /dev/hdXY
Set the sgid flag for your file:
# chmod g-x,g+s yourfile
In your Python code, obtain an exclusive lock on that file:
fcntl.flock(fd, fcntl.LOCK_EX)
Now even cat will not be able to read the file until you release the lock.
EDIT: I solved it myself! By using directory existence & age as a locking mechanism! Locking by file is safe only on Windows (because Linux silently overwrites), but locking by directory works perfectly both on Linux and Windows. See my GIT where I created an easy to use class 'lockbydir.DLock' for that:
https://github.com/drandreaskrueger/lockbydir
At the bottom of the readme, you find 3 GITplayers where you can see the code examples execute live in your browser! Quite cool, isn't it? :-)
Thanks for your attention
This was my original question:
I would like to answer to parity3 (https://meta.stackoverflow.com/users/1454536/parity3) but I can neither comment directly ('You must have 50 reputation to comment'), nor do I see any way to contact him/her directly. What do you suggest to me, to get through to him?
My question:
I have implemented something similiar to what parity3 suggested here as an answer: https://stackoverflow.com/a/21444311/3693375 ("Assuming your Python interpreter, and the ...")
And it works brilliantly - on Windows. (I am using it to implement a locking mechanism that works across independently started processes. https://github.com/drandreaskrueger/lockbyfile )
But other than parity3 says, it does NOT work the same on Linux:
os.rename(src, dst)
Rename the file or directory src to dst. ... On Unix, if dst exists
and is a file,
it will be replaced silently if the user has permission.
The operation may fail on some Unix flavors if src and dst
are on different filesystems. If successful, the renaming will
be an atomic operation (this is a POSIX requirement).
On Windows, if dst already exists, OSError will be raised
(https://docs.python.org/2/library/os.html#os.rename)
The silent replacing is the problem. On Linux.
The "if dst already exists, OSError will be raised" is great for my purposes. But only on Windows, sadly.
I guess parity3's example still works most of the time, because of his if condition
if not os.path.exists(lock_filename):
try:
os.rename(tmp_filename,lock_filename)
But then the whole thing is not atomic anymore.
Because the if condition might be true in two parallel processes, and then both will rename, but only one will win the renaming race. And no exception raised (in Linux).
Any suggestions? Thanks!
P.S.: I know this is not the proper way, but I am lacking an alternative. PLEASE don't punish me with lowering my reputation. I looked around a lot, to solve this myself. How to PM users in here? And meh why can't I?
Here's a start on the win32 half of a portable implementation, that does not need a seperate locking mechanism.
Requires the Python for Windows Extensions to get down to the win32 api, but that's pretty much mandatory for python on windows already, and can alternatively be done with ctypes. The code could be adapted to expose more functionality if it's needed (such as allowing FILE_SHARE_READ rather than no sharing at all). See also the MSDN documentation for the CreateFile and WriteFile system calls, and the article on Creating and Opening Files.
As has been mentioned, you can use the standard fcntl module to implement the unix half of this, if required.
import winerror, pywintypes, win32file
class LockError(StandardError):
pass
class WriteLockedFile(object):
"""
Using win32 api to achieve something similar to file(path, 'wb')
Could be adapted to handle other modes as well.
"""
def __init__(self, path):
try:
self._handle = win32file.CreateFile(
path,
win32file.GENERIC_WRITE,
0,
None,
win32file.OPEN_ALWAYS,
win32file.FILE_ATTRIBUTE_NORMAL,
None)
except pywintypes.error, e:
if e[0] == winerror.ERROR_SHARING_VIOLATION:
raise LockError(e[2])
raise
def close(self):
self._handle.close()
def write(self, str):
win32file.WriteFile(self._handle, str)
Here's how your example from above behaves:
>>> path = "C:\\scr.txt"
>>> file1 = WriteLockedFile(path)
>>> file2 = WriteLockedFile(path) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
LockError: ...
>>> file1.write("111")
>>> file1.close()
>>> print file(path).read()
111
I prefer to use filelock, a cross-platform Python library which barely requires any additional code. Here's an example of how to use it:
from filelock import FileLock
lockfile = r"c:\scr.txt"
lock = FileLock(lockfile + ".lock")
with lock:
file = open(path, "w")
file.write("111")
file.close()
Any code within the with lock: block is thread-safe, meaning that it will be finished before another process has access to the file.
Assuming your Python interpreter, and the underlying os and filesystem treat os.rename as an atomic operation and it will error when the destination exists, the following method is free of race conditions. I'm using this in production on a linux machine. Requires no third party libs and is not os dependent, and aside from an extra file create, the performance hit is acceptable for many use cases. You can easily apply python's function decorator pattern or a 'with_statement' contextmanager here to abstract out the mess.
You'll need to make sure that lock_filename does not exist before a new process/task begins.
import os,time
def get_tmp_file():
filename='tmp_%s_%s'%(os.getpid(),time.time())
open(filename).close()
return filename
def do_exclusive_work():
print 'exclusive work being done...'
num_tries=10
wait_time=10
lock_filename='filename.lock'
acquired=False
for try_num in xrange(num_tries):
tmp_filename=get_tmp_file()
if not os.path.exists(lock_filename):
try:
os.rename(tmp_filename,lock_filename)
acquired=True
except (OSError,ValueError,IOError), e:
pass
if acquired:
try:
do_exclusive_work()
finally:
os.remove(lock_filename)
break
os.remove(tmp_filename)
time.sleep(wait_time)
assert acquired, 'maximum tries reached, failed to acquire lock file'
EDIT
It has come to light that os.rename silently overwrites the destination on a non-windows OS. Thanks for pointing this out # akrueger!
Here is a workaround, gathered from here:
Instead of using os.rename you can use:
try:
if os.name != 'nt': # non-windows needs a create-exclusive operation
fd = os.open(lock_filename, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
os.close(fd)
# non-windows os.rename will overwrite lock_filename silently.
# We leave this call in here just so the tmp file is deleted but it could be refactored so the tmp file is never even generated for a non-windows OS
os.rename(tmp_filename,lock_filename)
acquired=True
except (OSError,ValueError,IOError), e:
if os.name != 'nt' and not 'File exists' in str(e): raise
# akrueger You're probably just fine with your directory based solution, just giving you an alternate method.
To make you safe when opening files within one application, you could try something like this:
import time
class ExclusiveFile(file):
openFiles = {}
fileLocks = []
class FileNotExclusiveException(Exception):
pass
def __init__(self, *args):
sMode = 'r'
sFileName = args[0]
try:
sMode = args[1]
except:
pass
while sFileName in ExclusiveFile.fileLocks:
time.sleep(1)
ExclusiveFile.fileLocks.append(sFileName)
if not sFileName in ExclusiveFile.openFiles.keys() or (ExclusiveFile.openFiles[sFileName] == 'r' and sMode == 'r'):
ExclusiveFile.openFiles[sFileName] = sMode
try:
file.__init__(self, sFileName, sMode)
finally:
ExclusiveFile.fileLocks.remove(sFileName)
else:
ExclusiveFile.fileLocks.remove(sFileName)
raise self.FileNotExclusiveException(sFileName)
def close(self):
del ExclusiveFile.openFiles[self.name]
file.close(self)
That way you subclass the file class. Now just do:
>>> f = ExclusiveFile('/tmp/a.txt', 'r')
>>> f
<open file '/tmp/a.txt', mode 'r' at 0xb7d7cc8c>
>>> f1 = ExclusiveFile('/tmp/a.txt', 'r')
>>> f1
<open file '/tmp/a.txt', mode 'r' at 0xb7d7c814>
>>> f2 = ExclusiveFile('/tmp/a.txt', 'w') # can't open it for writing now
exclfile.FileNotExclusiveException: /tmp/a.txt
If you open it first with 'w' mode, it won't allow anymore opens, even in read mode, just as you wanted...
What is the most elegant way to solve this:
open a file for reading, but only if it is not already opened for writing
open a file for writing, but only if it is not already opened for reading or writing
The built-in functions work like this
>>> path = r"c:\scr.txt"
>>> file1 = open(path, "w")
>>> print file1
<open file 'c:\scr.txt', mode 'w' at 0x019F88D8>
>>> file2 = open(path, "w")
>>> print file2
<open file 'c:\scr.txt', mode 'w' at 0x02332188>
>>> file1.write("111")
>>> file2.write("222")
>>> file1.close()
scr.txt now contains '111'.
>>> file2.close()
scr.txt was overwritten and now contains '222' (on Windows, Python 2.4).
The solution should work inside the same process (like in the example above) as well as when another process has opened the file.
It is preferred, if a crashing program will not keep the lock open.
I don't think there is a fully crossplatform way. On unix, the fcntl module will do this for you. However on windows (which I assume you are by the paths), you'll need to use the win32file module.
Fortunately, there is a portable implementation (portalocker) using the platform appropriate method at the python cookbook.
To use it, open the file, and then call:
portalocker.lock(file, flags)
where flags are portalocker.LOCK_EX for exclusive write access, or LOCK_SH for shared, read access.
The solution should work inside the same process (like in the example above) as well as when another process has opened the file.
If by 'another process' you mean 'whatever process' (i.e. not your program), in Linux there's no way to accomplish this relying only on system calls (fcntl & friends). What you want is mandatory locking, and the Linux way to obtain it is a bit more involved:
Remount the partition that contains your file with the mand option:
# mount -o remount,mand /dev/hdXY
Set the sgid flag for your file:
# chmod g-x,g+s yourfile
In your Python code, obtain an exclusive lock on that file:
fcntl.flock(fd, fcntl.LOCK_EX)
Now even cat will not be able to read the file until you release the lock.
EDIT: I solved it myself! By using directory existence & age as a locking mechanism! Locking by file is safe only on Windows (because Linux silently overwrites), but locking by directory works perfectly both on Linux and Windows. See my GIT where I created an easy to use class 'lockbydir.DLock' for that:
https://github.com/drandreaskrueger/lockbydir
At the bottom of the readme, you find 3 GITplayers where you can see the code examples execute live in your browser! Quite cool, isn't it? :-)
Thanks for your attention
This was my original question:
I would like to answer to parity3 (https://meta.stackoverflow.com/users/1454536/parity3) but I can neither comment directly ('You must have 50 reputation to comment'), nor do I see any way to contact him/her directly. What do you suggest to me, to get through to him?
My question:
I have implemented something similiar to what parity3 suggested here as an answer: https://stackoverflow.com/a/21444311/3693375 ("Assuming your Python interpreter, and the ...")
And it works brilliantly - on Windows. (I am using it to implement a locking mechanism that works across independently started processes. https://github.com/drandreaskrueger/lockbyfile )
But other than parity3 says, it does NOT work the same on Linux:
os.rename(src, dst)
Rename the file or directory src to dst. ... On Unix, if dst exists
and is a file,
it will be replaced silently if the user has permission.
The operation may fail on some Unix flavors if src and dst
are on different filesystems. If successful, the renaming will
be an atomic operation (this is a POSIX requirement).
On Windows, if dst already exists, OSError will be raised
(https://docs.python.org/2/library/os.html#os.rename)
The silent replacing is the problem. On Linux.
The "if dst already exists, OSError will be raised" is great for my purposes. But only on Windows, sadly.
I guess parity3's example still works most of the time, because of his if condition
if not os.path.exists(lock_filename):
try:
os.rename(tmp_filename,lock_filename)
But then the whole thing is not atomic anymore.
Because the if condition might be true in two parallel processes, and then both will rename, but only one will win the renaming race. And no exception raised (in Linux).
Any suggestions? Thanks!
P.S.: I know this is not the proper way, but I am lacking an alternative. PLEASE don't punish me with lowering my reputation. I looked around a lot, to solve this myself. How to PM users in here? And meh why can't I?
Here's a start on the win32 half of a portable implementation, that does not need a seperate locking mechanism.
Requires the Python for Windows Extensions to get down to the win32 api, but that's pretty much mandatory for python on windows already, and can alternatively be done with ctypes. The code could be adapted to expose more functionality if it's needed (such as allowing FILE_SHARE_READ rather than no sharing at all). See also the MSDN documentation for the CreateFile and WriteFile system calls, and the article on Creating and Opening Files.
As has been mentioned, you can use the standard fcntl module to implement the unix half of this, if required.
import winerror, pywintypes, win32file
class LockError(StandardError):
pass
class WriteLockedFile(object):
"""
Using win32 api to achieve something similar to file(path, 'wb')
Could be adapted to handle other modes as well.
"""
def __init__(self, path):
try:
self._handle = win32file.CreateFile(
path,
win32file.GENERIC_WRITE,
0,
None,
win32file.OPEN_ALWAYS,
win32file.FILE_ATTRIBUTE_NORMAL,
None)
except pywintypes.error, e:
if e[0] == winerror.ERROR_SHARING_VIOLATION:
raise LockError(e[2])
raise
def close(self):
self._handle.close()
def write(self, str):
win32file.WriteFile(self._handle, str)
Here's how your example from above behaves:
>>> path = "C:\\scr.txt"
>>> file1 = WriteLockedFile(path)
>>> file2 = WriteLockedFile(path) #doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
LockError: ...
>>> file1.write("111")
>>> file1.close()
>>> print file(path).read()
111
I prefer to use filelock, a cross-platform Python library which barely requires any additional code. Here's an example of how to use it:
from filelock import FileLock
lockfile = r"c:\scr.txt"
lock = FileLock(lockfile + ".lock")
with lock:
file = open(path, "w")
file.write("111")
file.close()
Any code within the with lock: block is thread-safe, meaning that it will be finished before another process has access to the file.
Assuming your Python interpreter, and the underlying os and filesystem treat os.rename as an atomic operation and it will error when the destination exists, the following method is free of race conditions. I'm using this in production on a linux machine. Requires no third party libs and is not os dependent, and aside from an extra file create, the performance hit is acceptable for many use cases. You can easily apply python's function decorator pattern or a 'with_statement' contextmanager here to abstract out the mess.
You'll need to make sure that lock_filename does not exist before a new process/task begins.
import os,time
def get_tmp_file():
filename='tmp_%s_%s'%(os.getpid(),time.time())
open(filename).close()
return filename
def do_exclusive_work():
print 'exclusive work being done...'
num_tries=10
wait_time=10
lock_filename='filename.lock'
acquired=False
for try_num in xrange(num_tries):
tmp_filename=get_tmp_file()
if not os.path.exists(lock_filename):
try:
os.rename(tmp_filename,lock_filename)
acquired=True
except (OSError,ValueError,IOError), e:
pass
if acquired:
try:
do_exclusive_work()
finally:
os.remove(lock_filename)
break
os.remove(tmp_filename)
time.sleep(wait_time)
assert acquired, 'maximum tries reached, failed to acquire lock file'
EDIT
It has come to light that os.rename silently overwrites the destination on a non-windows OS. Thanks for pointing this out # akrueger!
Here is a workaround, gathered from here:
Instead of using os.rename you can use:
try:
if os.name != 'nt': # non-windows needs a create-exclusive operation
fd = os.open(lock_filename, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
os.close(fd)
# non-windows os.rename will overwrite lock_filename silently.
# We leave this call in here just so the tmp file is deleted but it could be refactored so the tmp file is never even generated for a non-windows OS
os.rename(tmp_filename,lock_filename)
acquired=True
except (OSError,ValueError,IOError), e:
if os.name != 'nt' and not 'File exists' in str(e): raise
# akrueger You're probably just fine with your directory based solution, just giving you an alternate method.
To make you safe when opening files within one application, you could try something like this:
import time
class ExclusiveFile(file):
openFiles = {}
fileLocks = []
class FileNotExclusiveException(Exception):
pass
def __init__(self, *args):
sMode = 'r'
sFileName = args[0]
try:
sMode = args[1]
except:
pass
while sFileName in ExclusiveFile.fileLocks:
time.sleep(1)
ExclusiveFile.fileLocks.append(sFileName)
if not sFileName in ExclusiveFile.openFiles.keys() or (ExclusiveFile.openFiles[sFileName] == 'r' and sMode == 'r'):
ExclusiveFile.openFiles[sFileName] = sMode
try:
file.__init__(self, sFileName, sMode)
finally:
ExclusiveFile.fileLocks.remove(sFileName)
else:
ExclusiveFile.fileLocks.remove(sFileName)
raise self.FileNotExclusiveException(sFileName)
def close(self):
del ExclusiveFile.openFiles[self.name]
file.close(self)
That way you subclass the file class. Now just do:
>>> f = ExclusiveFile('/tmp/a.txt', 'r')
>>> f
<open file '/tmp/a.txt', mode 'r' at 0xb7d7cc8c>
>>> f1 = ExclusiveFile('/tmp/a.txt', 'r')
>>> f1
<open file '/tmp/a.txt', mode 'r' at 0xb7d7c814>
>>> f2 = ExclusiveFile('/tmp/a.txt', 'w') # can't open it for writing now
exclfile.FileNotExclusiveException: /tmp/a.txt
If you open it first with 'w' mode, it won't allow anymore opens, even in read mode, just as you wanted...
Quite simply, I am cycling through all sub folders in a specific location, and collecting a few numbers from three different files.
def GrepData():
import glob as glob
import os as os
os.chdir('RUNS')
RUNSDir = os.getcwd()
Directories = glob.glob('*.*')
ObjVal = []
ParVal = []
AADVal = []
for dir in Directories:
os.chdir(dir)
(X,Y) = dir.split(sep='+')
AADPath = glob.glob('Aad.out')
ObjPath = glob.glob('fobj.out')
ParPath = glob.glob('Par.out')
try:
with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
for line in ObjFile:
ObjVal.append(list([X,Y,line.split()[0]]))
ObjFile.close()
except(IndexError):
ObjFile.close()
try:
with open(os.path.join(os.getcwd(),ParPath[0])) as ParFile:
for line in ParFile:
ParVal.append(list([X,Y,line.split()[0]]))
ParFile.close()
except(IndexError):
ParFile.close()
try:
with open(os.path.join(os.getcwd(),AADPath[0])) as AADFile:
for line in AADFile:
AADVal.append(list([X,Y,line.split()[0]]))
AADFile.close()
except(IndexError):
AADFile.close()
os.chdir(RUNSDir)
Each file open command is placed in a try - except block, as in a few cases the file that is opened will be empty, and thus appending the line.split() will lead to an index error since the list is empty.
However when running this script i get the following error: "OSError: [Errno 24] Too Many open files"
I was under the impression that the idea of the "with open..." statement was that it took care of closing the file after use? Clearly that is not happening.
So what I am asking for is two things:
The answer to: "Is my understanding of with open correct?"
How can I correct whatever error is inducing this problem?
(And yes i know the code is not exactly elegant. The whole try - except ought to be a single object that is reused - but I will fix that after figuring out this error)
Try moving your try-except inside the with like so:
with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
for line in ObjFile:
try:
ObjVal.append(list([X,Y,line.split()[0]]))
except(IndexError):
pass
Notes: there is no need to close your file manually, this is what with is for. Also, there is no need to use as os in your imports if you are using the same name.
"Too many open files" has nothing to do with writing semantically incorrect python code, and you are using with correctly. The key is the part of your error that says "OSError," which refers to the underlying operating system.
When you call open(), the python interpreter will execute a system call. The details of the system call vary a bit by which OS you are using, but on linux this call is open(2). The operating system kernel will handle the system call. While the file is open, it has an entry in the system file table and takes up OS resources -- this means effectively it is "taking up space" whilst it is open. As such the OS has a limit to the number of files that can be opened at any one time.
Your problem is that while you call open(), you don't call close() quickly enough. In the event that your directory structure requires you to have many thousands files open at once that might approach this cap, it can be temporarily changed (at least on linux, I'm less familiar with other OSes so I don't want to go into too many details about how to do this across platforms).
I am writing a Python script in which I write output to a temporary file and then move that file to its final destination once it is finished and closed. When the script finishes, I want the output file to have the same permissions as if it had been created normally through open(filename,"w"). As it is, the file will have the restrictive set of permissions used by the tempfile module for temp files.
Is there a way for me to figure out what the "default" file permissions for the output file would be if I created it in place, so that I can apply them to the temp file before moving it?
For the record, I had a similar issue, here is the code I have used:
import os
from tempfile import NamedTemporaryFile
def UmaskNamedTemporaryFile(*args, **kargs):
fdesc = NamedTemporaryFile(*args, **kargs)
# we need to set umask to get its current value. As noted
# by Florian Brucker (comment), this is a potential security
# issue, as it affects all the threads. Considering that it is
# less a problem to create a file with permissions 000 than 666,
# we use 666 as the umask temporary value.
umask = os.umask(0o666)
os.umask(umask)
os.chmod(fdesc.name, 0o666 & ~umask)
return fdesc
There is a function umask in the os module. You cannot get the current umask per se, you have to set it and the function returns the previous setting.
The umask is inherited from the parent process. It describes, which bits are not to be set when creating a file or directory.
This way is slow but safe and will work on any system that has a 'umask' shell command:
def current_umask() -> int:
"""Makes a best attempt to determine the current umask value of the calling process in a safe way.
Unfortunately, os.umask() is not threadsafe and poses a security risk, since there is no way to read
the current umask without temporarily setting it to a new value, then restoring it, which will affect
permissions on files created by other threads in this process during the time the umask is changed.
On recent linux kernels (>= 4.1), the current umask can be read from /proc/self/status.
On older systems, the safest way is to spawn a shell and execute the 'umask' command. The shell will
inherit the current process's umask, and will use the unsafe call, but it does so in a separate,
single-threaded process, which makes it safe.
Returns:
int: The current process's umask value
"""
mask: Optional[int] = None
try:
with open('/proc/self/status') as fd:
for line in fd:
if line.startswith('Umask:'):
mask = int(line[6:].strip(), 8)
break
except FileNotFoundError:
pass
except ValueError:
pass
if mask is None:
import subprocess
mask = int(subprocess.check_output('umask', shell=True).decode('utf-8').strip(), 8)
return mask
import os
def current_umask() -> int:
tmp = os.umask(0o022)
os.umask(tmp)
return tmp
This function is implemented in some python packages, e.g. pip and setuptools.