with open inside try-except block, too many files open? - Python

Quite simply, I am cycling through all subfolders in a specific location and collecting a few numbers from three different files.
def GrepData():
    import glob as glob
    import os as os
    os.chdir('RUNS')
    RUNSDir = os.getcwd()
    Directories = glob.glob('*.*')
    ObjVal = []
    ParVal = []
    AADVal = []
    for dir in Directories:
        os.chdir(dir)
        (X,Y) = dir.split(sep='+')
        AADPath = glob.glob('Aad.out')
        ObjPath = glob.glob('fobj.out')
        ParPath = glob.glob('Par.out')
        try:
            with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
                for line in ObjFile:
                    ObjVal.append(list([X,Y,line.split()[0]]))
            ObjFile.close()
        except(IndexError):
            ObjFile.close()
        try:
            with open(os.path.join(os.getcwd(),ParPath[0])) as ParFile:
                for line in ParFile:
                    ParVal.append(list([X,Y,line.split()[0]]))
            ParFile.close()
        except(IndexError):
            ParFile.close()
        try:
            with open(os.path.join(os.getcwd(),AADPath[0])) as AADFile:
                for line in AADFile:
                    AADVal.append(list([X,Y,line.split()[0]]))
            AADFile.close()
        except(IndexError):
            AADFile.close()
        os.chdir(RUNSDir)
Each file open is wrapped in a try/except block because in a few cases the file that is opened will be empty, and indexing the result of line.split() then raises an IndexError since the list is empty.
However, when running this script I get the following error: "OSError: [Errno 24] Too many open files".
I was under the impression that the whole idea of the with open(...) statement was that it takes care of closing the file after use? Clearly that is not happening here.
So what I am asking for is two things:
The answer to: "Is my understanding of with open correct?"
How can I correct whatever is inducing this problem?
(And yes, I know the code is not exactly elegant. The whole try/except ought to be a single reusable piece of code, but I will fix that after figuring out this error.)

Try moving your try/except inside the with, like so:
with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
    for line in ObjFile:
        try:
            ObjVal.append(list([X,Y,line.split()[0]]))
        except(IndexError):
            pass
Notes: there is no need to close your file manually; that is what with is for. Also, there is no need to write import os as os when you are binding the same name; a plain import os does the same thing.
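For what it's worth, the repetition the asker mentions could later be folded into one small helper along these lines (a sketch; the name collect and its signature are mine, not from the original code):
import glob

def collect(pattern, X, Y, dest):
    # Open the first file matching pattern in the current directory and
    # append [X, Y, first-token] for every non-empty line.
    matches = glob.glob(pattern)
    if not matches:
        return
    with open(matches[0]) as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines instead of raising IndexError
                dest.append([X, Y, tokens[0]])

# inside the loop over Directories:
# collect('fobj.out', X, Y, ObjVal)
# collect('Par.out', X, Y, ParVal)
# collect('Aad.out', X, Y, AADVal)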

"Too many open files" has nothing to do with writing semantically incorrect python code, and you are using with correctly. The key is the part of your error that says "OSError," which refers to the underlying operating system.
When you call open(), the python interpreter will execute a system call. The details of the system call vary a bit by which OS you are using, but on linux this call is open(2). The operating system kernel will handle the system call. While the file is open, it has an entry in the system file table and takes up OS resources -- this means effectively it is "taking up space" whilst it is open. As such the OS has a limit to the number of files that can be opened at any one time.
Your problem is that while you call open(), you don't call close() quickly enough. In the event that your directory structure requires you to have many thousands files open at once that might approach this cap, it can be temporarily changed (at least on linux, I'm less familiar with other OSes so I don't want to go into too many details about how to do this across platforms).
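If you do need to inspect or raise the limit from within Python on a Unix-like system, the standard-library resource module exposes it; a minimal sketch (the value 4096 is an arbitrary example):
import resource

# Query the current per-process limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# A process may raise its own soft limit, but only up to the hard limit.
if hard == resource.RLIM_INFINITY or hard >= 4096:
    resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))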

Related

Is there a way to check what part of my code leaves file handles open

Is there a way to track the Python process to check where a file is being opened? Using lsof on my running process shows too many open files, but I'm not sure where they are being opened. I can count them with:
ls /proc/$pid/fd/ | wc -l
I suspect one of the libraries I'm using might not be handling its files properly. Is there a way to isolate exactly which line in my Python code is opening the files?
In my code I work with third-party libraries to process thousands of media files, and since they are being left open I receive the error
OSError: [Errno 24] Too many open files
after running for a few minutes. I know raising the limit on open files is an option, but that would just postpone the error.
The easiest way to trace the open calls is to use an audit hook in Python. Note that this method would only trace Python open calls and not the system calls.
Let fdmod.py be a module file with a single function foo:
def foo():
    return open("/dev/zero", mode="r")
Now the main code in file fd_trace.py, which traces all open calls and imports fdmod, is defined as follows:
import sys
import inspect
import fdmod

def open_audit_hook(name, *args):
    if name == "open":
        print(name, *args, "was called:")
        caller = inspect.currentframe()
        while caller := caller.f_back:
            print(f"\tFunction {caller.f_code.co_name} "
                  f"in {caller.f_code.co_filename}:"
                  f"{caller.f_lineno}")

sys.addaudithook(open_audit_hook)

# main code
fdmod.foo()
with open("/dev/null", "w") as dev_null:
    dev_null.write("hi")
fdmod.foo()
When we run fd_trace.py, the call stack is printed whenever some component calls open:
% python3 fd_trace.py
open ('/dev/zero', 'r', 524288) was called:
Function foo in /home/tkrennwa/fdmod.py:2
Function <module> in fd_trace.py:17
open ('/dev/null', 'w', 524865) was called:
Function <module> in fd_trace.py:18
open ('/dev/zero', 'r', 524288) was called:
Function foo in /home/tkrennwa/fdmod.py:2
Function <module> in fd_trace.py:20
See sys.addaudithook and inspect.currentframe for details.
You might also get useful information using strace. This shows all system calls made by a process, including calls to open(2). It will not directly show you where in the Python code those calls occur, but you may be able to deduce some information from context.
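If you only care about the open calls, filtering helps; a sketch (myscript.py stands in for your own program, and on modern Linux the C library usually issues openat rather than open):
strace -f -e trace=open,openat python3 myscript.py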
Seeing open file handles is easy on Linux:
import os

open_file_handles = os.listdir('/proc/self/fd')
print('open file handles: ' + ', '.join(map(str, open_file_handles)))
You can also use the following on any OS (e.g. Windows, Mac):
import errno, os, resource

open_file_handles = []
for fd in range(resource.getrlimit(resource.RLIMIT_NOFILE)[0]):
    try:
        os.fstat(fd)
    except OSError as e:
        if e.errno == errno.EBADF:
            continue
    open_file_handles.append(fd)
print('open file handles: ' + ', '.join(map(str, open_file_handles)))
Note: this should always work, assuming you're actually (occasionally) running out of file handles. The default soft limit is often modest (for example, 256 on macOS and 1024 on many Linux systems), but the loop might take a long time if the limit (set by the OS or user policy) is something huge like a billion.
Note also: there will almost always be at least three file handles open, for STDIN, STDOUT, and STDERR (file descriptors 0, 1, and 2).

Running Fortran executable within Python script

I am writing a Python script with the following objectives:
1. Starting from the current working directory, change directory to child directory 'A'.
2. Make slight adjustments to a fort.4 file.
3. Run a Fortran binary (invoked via a path of the form ../../../../ continuing until I hit the folder containing the binary); return to 2. until my particular objective is complete, then
4. Back out of the child directory to the parent, then enter another child directory and return to 2. until I have iterated through all the folders in question.
The code is coming along well. I am relying heavily on Python's os module for the directory work. However, I have never had any experience (a) making minor adjustments to a file using Python or (b) running an executable. Could you give me some ideas on Python modules, direct me to a similar Stack Overflow question, or suggest other ways this can be accomplished? I understand this is a vague question, so please ask if you do not understand what I am asking and I will elaborate. Also, the changes I have to make to this fort.4 file are repetitive in nature; they all happen at the same position in the file.
Cheers
EDIT:
entire fort.4 file:
file_name
movie1.dat !name of a general file the binary reads
nbr_box ! line3-8 is general info
2
this_box
1
lrdf_bead
.true.
beadid1
C1 !this is the line I must change
beadid2
F4 !This is a second line I must change
lrdf_com
.false.
bin_width
0.04
rcut
7
So really, I need to change "C1" to "C2", for example. The changes are trivial to make, but I must emphasize that the main Fortran executable reads this fort.4 as well as the movie1.dat file that I have already created. Hope this helps.
OK, so there are a few important things here. First, we need to be able to manage our current working directory, and for that we will use the os module:
import os
Whenever a method operates on a folder, it is important to change into that folder and back to the parent afterwards. This can also be achieved with the os module:
def operateOnFolder(folder):
    os.chdir(folder)
    ...
    os.chdir("..")
Now we need to apply some method to each directory; that can be done like this:
for k in os.listdir("."):
    if os.path.isdir(k):
        operateOnFolder(k)
Finally, in order to operate on a preexisting Fortran input file we can use the built-in file operations:
fileSource = open("someFile.f","r")
fileText = fileSource.read()
fileSource.close()
fileLines = fileText.split("\n")
# change a line in the file with -> fileLines[42] = "the 42nd line"
fileText = "\n".join(fileLines)
fileOutput = open("someFile.f","w")
fileOutput.write(fileText)
fileOutput.close()
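Applied to the asker's fort.4, the same read/modify/write pattern might look like this (a sketch assuming the layout shown in the edit above, where the value to change sits on the line after the beadid1 label):
# Read fort.4, swap the bead id that follows the "beadid1" label, write back.
with open("fort.4", "r") as f:
    lines = f.read().split("\n")

for i, line in enumerate(lines):
    if line.strip() == "beadid1":
        lines[i + 1] = "C2 !this is the line I must change"
        break

with open("fort.4", "w") as f:
    f.write("\n".join(lines))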
You can compile and run your executable output.fx from source.f90 using the subprocess module:
import subprocess

subprocess.call(["gfortran","-o","output.fx","source.f90"])  # compile
subprocess.call(["./output.fx"])                             # execute

Can't access temporary files created with tempfile

I am using tempfile.NamedTemporaryFile() to store some text until the program ends. On Unix this works without any issues, but on Windows the returned file isn't accessible for reading or writing: Python gives Errno 13 (permission denied). The only workaround I've found is to set delete=False and manually delete the file with os.remove(). Why?
This causes the IOError, because the file cannot be opened a second time while it is still open after creation.
The reason is that NamedTemporaryFile creates the file with the FILE_SHARE_DELETE flag on Windows. On Windows, when a file has been created/opened with a specific share flag, all subsequent open operations have to pass this share flag. That is not the case with Python's open function, which does not pass FILE_SHARE_DELETE. See my answer on the question How to create a temporary file that can be read by a subprocess? for more details and a workaround.
Take a look: http://docs.python.org/2/library/tempfile.html
tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])
This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the file object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later). If delete is true (the default), the file is deleted as soon as it is closed.
Thanks to @Rnhmjoj, here is a working solution:
from tempfile import NamedTemporaryFile

file = NamedTemporaryFile(delete=False)
file.close()
You have to create the file with the delete flag disabled and then close it right after creation. This way, Windows unlocks the file and you can do stuff with it by name; just remember to remove it yourself with os.remove() when you are done.
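Putting it together, a sketch of the full round trip (with manual cleanup, since delete=False):
import os
from tempfile import NamedTemporaryFile

tmp = NamedTemporaryFile(delete=False)
tmp.write(b'some text')
tmp.close()  # Windows unlocks the file here

# The name can now be reopened like any ordinary file.
with open(tmp.name) as f:
    print(f.read())

os.remove(tmp.name)  # manual cleanup, because delete=False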

Detecting broken stream in python when file is deleted

My problem is that logging stops for a Python program when its log file is rotated.
I have tracked it down to the stream itself. I don't see any way to tell from Python whether the stream is broken: after the file is deleted, it still accepts writes without any issue.
import os
FILE = 'testing.txt'
fs = open(FILE, 'a')
fs.write('word')
os.remove(FILE)
fs.write('Nothing....') # Nothing breaks
print(fs.errors) # No errors
So, how can I find out if the file stream is still valid?
And checking to see if the file exists will not help since the file will always exist regardless of whether or not the stream is still valid.
Upon much more inspection, I found the solution. It is an OS-specific problem. When a file is deleted on Linux (or macOS) it is merely unlinked; the data stays around as long as some process still holds the file open. (I was not aware of this.)
So if you run lsof on the machine, it still shows the file as open:
[user#machine]$ lsof | grep --color -i "testing.txt"
python26 26495 user 8w REG 8,33 23474 671920 /home/user/temp/testing.txt (deleted)
The solution is to stat the stream from Python:
stat = os.fstat(fs.fileno())
This gives you the number of hard links the file has:
if stat.st_nlink < 1:
    # the file has been deleted
And there you go. Now you know if you should reload it or not. Hopefully this helps someone else.
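Wrapped up as a small helper, the check might look like this (a sketch; the function name is mine):
import os

def stream_is_valid(fs):
    # The directory entry is gone once the hard-link count drops to zero.
    return os.fstat(fs.fileno()).st_nlink > 0

fs = open('testing.txt', 'a')
os.remove('testing.txt')
if not stream_is_valid(fs):
    fs.close()
    fs = open('testing.txt', 'a')  # reopen, recreating the file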
Try exception handling:
import os
FILE = 'testing.txt'
try:
    fs = open(FILE, 'a')
    fs.write('word')
    os.remove(FILE)
    fs.write('Nothing....') # Nothing breaks
except Exception, e:
    print "Error:", e
print(fs.errors) # No errors
There are Python bindings for inotify if you need more intelligence than just a try/except clause, but I think it is only pertinent to Linux (I'm not sure of your platform).
Another solution I found is to add the copytruncate flag to the logrotate config.
See man logrotate for more info.

Dropping a file onto a script to run as argument causes exception in Vista

edit: OK, I could swear that the way I'd tested it showed that the getcwd call was also causing the exception, but now it appears it's just the file creation. When I move the try-except blocks, it actually does catch the exception like you'd think it would. So chalk that up to user error.
Original Question:
I have a script I'm working on that I want to be able to drop a file on it to have it run with that file as an argument. I checked in this question, and I already have the mentioned registry keys (apparently the Python 2.6 installer takes care of it.) However, it's throwing an exception that I can't catch. Running it from the console works correctly, but when I drop a file on it, it throws an exception then closes the console window. I tried to have it redirect standard error to a file, but it threw the exception before the redirection occurred in the script. With a little testing, and some quick eyesight I saw that it was throwing an IOError when I tried to create the file to write the error to.
import sys
import os
#os.chdir("C:/Python26/Projects/arguments")
try:
    print sys.argv
    raw_input()
    os.getcwd()
except Exception,e:
    print sys.argv + '\n'
    print e
    f = open("./myfile.txt", "w")
If I run this from the console with any or no arguments, it behaves as one would expect. If I run it by dropping a file on it, for instance test.txt, it runs, prints the arguments correctly, then when os.getcwd() is called, it throws the exception, and does not perform any of the stuff from the except: block, making it difficult for me to find any way to actually get the exception text to stay on screen. If I uncomment the os.chdir(), the script doesn't fail. If I move that line to within the except block, it's never executed.
I'm guessing running by dropping the file on it, which according to the other linked question, uses the WSH, is somehow messing up its permissions or the cwd, but I don't know how to work around it.
Seeing as this is probably not Python related, but a Windows problem (I for one could not reproduce the error given your code), I'd suggest attaching a debugger to the Python interpreter when it is started. Since you start the interpreter implicitly by a drag&drop action, you need to configure Windows to auto-attach a debugger each time Python starts. If I remember correctly, this article has the needed information to do that (you can substitute another debugger if you are not using Visual Studio).
Apart from that, I would take a snapshot with ProcMon while dragging a file onto your script, to get an idea of what is going on.
As pointed out in my edit above, the errors were caused by the working directory changing to C:\windows\system32, where the script isn't allowed to create files. I don't know how to prevent the working directory from changing when the script is started that way, but I was able to work around it like this:
if len(sys.argv) == 1:
    files = [filename for filename in os.listdir(os.getcwd())
             if filename.endswith(".txt")]
else:
    files = [filename for filename in sys.argv[1:]]
Fixing the working directory itself can be managed this way, I guess:
exepath = sys.argv[0]
os.chdir(exepath[:exepath.rfind('\\')])
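A slightly more portable variant of the same idea, using os.path instead of slicing on backslashes (a sketch):
import os
import sys

# Change into the directory that contains the script itself.
os.chdir(os.path.dirname(os.path.abspath(sys.argv[0])))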
