Why can you close() a file object more than once?

This question is purely out of curiosity. It was brought up in a recent discussion on another question here: I've often wondered why the context manager (with) does not throw an error when people explicitly close the file anyway through misunderstanding... and then I found that you can call close() on a file multiple times with no error, even without using with.
The only thing we can find relating to this is here and it just blandly says (emphasis mine):
close( )
Close the file. A closed file cannot be read or written any more. Any operation which requires that the file be open will raise a ValueError after the file has been closed. Calling close() more than once is allowed.
It would appear that this is intentional by design but, if you can't perform any operations on the closed file without an exception, we can't work out why closing the file multiple times is permitted. Is there a use case?

Thinking about this in terms of with is misleading: the behaviour predates context managers and has been in Python forever, so it's worth keeping for backward compatibility.
Because it would serve no purpose to raise an exception. If you have an actual bug in your code where you might close the file before finishing using it, you'll get an exception when using the read or write operations anyway, and thus you'll never reach the second call to close.
Allowing this also occasionally makes code easier to write, since you avoid sprinkling guards like if not the_file.closed: the_file.close() everywhere.
The BDFL designed the file objects in that way and we are stuck with that behaviour since there's no strong reason to change it.
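A quick demonstration of both behaviours – the harmless second close, and the ValueError on an actual misuse (the filename is arbitrary):

f = open('example.txt', 'w')
f.write('hello')
f.close()
f.close()         # allowed: closing an already-closed file does nothing
f.write('world')  # ValueError: I/O operation on closed file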

Resource management is a big deal. It's part of the reason why we have context managers in the first place.
I'm guessing that the core development team thought it was better to make it "safe" to call close more than once, to encourage people to close their files. Otherwise, you could get yourself into a situation where you're asking "Was this closed before?". If .close() couldn't be called multiple times, the only option would be to wrap every file.close() call in a try/except clause. That makes the code uglier and, frankly, lots of people would probably just delete the call to file.close() rather than handle it properly. In this case, it's simply convenient to be able to call file.close() without worrying about the consequences, since it's almost certain to succeed and leaves you with a file that you know is closed afterward.

The function call fulfilled its promise: after calling it, the file is closed. It's not that something failed; you simply didn't have to do anything at all to ensure the condition you were asking for.

Related

Big exception handler or lots of try...except clauses

I have a question which is about code design in python.
I'm working on a project where I have to handle the same few types of errors in many places, which results in lots of try...except clauses that repeat themselves.
Now the question is: would it be preferable to create one exception handler (a decorator) and decorate all those functions that share these repeating errors with it?
The trade-off is that if I create this exception-handler decorator, it will become quite a big class/function, and anyone reading the code will then have to work through another piece of (possibly) complicated logic to understand how an error is handled; whereas if I don't use the decorator, it's immediately clear to the reader how each error is handled.
Another option is to create multiple decorators for each of the types of the errors.
Or maybe just leave all those try...except clauses even though they are being repeated.
Any opinions on the matter and maybe other solutions? Thanks!
A lot of this is subjective, but I personally think it's better for exception handling code to be close to where the error is occurring, for the sake of readability and debugging ease. So:
The trade-off is that if I create this exception-handler decorator, it will become quite a big class/function
I would recommend against the monolithic handler (call it the "Mr. Fixit" class). When an error occurs and the debugger drops you into Mr. Fixit, you have to walk back quite a bit before you can figure out why the error happened and what needs to change to make it go away. Likewise, an unfamiliar developer reading your code loses the ability to understand just the small snippet pertaining to a particular error, and now has to work through a large class. As an added issue, much of what's in Mr. Fixit is irrelevant to the one error they're looking at, and the place where the error handling occurs is somewhere else entirely. With decorators especially, I feel you are sacrificing readability (particularly for someone less familiar with decorators than you) while gaining little.
If written with some care, try/except blocks are not very performance-intensive and do not clutter up code too much. I would suggest erring on the side of more try/excepts, with every handler close to what it's handling, so that you can tell at a glance how errors are handled for any given piece of code (without having to go to a different file).
If you are repeating code a lot, you can refactor either by moving the code inside the except block into a small method that each call site invokes, or by moving the code inside the try into its own method that does its error handling inside its body (see the sketch below). When in doubt, keep it simple.
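To illustrate the first of those refactors, here is a minimal sketch (all names are hypothetical): the repeated except-body lives in one helper, while the try/except itself stays visible at each call site:

import logging

def handle_io_error(exc):
    # Hypothetical shared handler: the repeated except-body lives here,
    # so each call site stays short but the handling is still local.
    logging.error("I/O failure: %s", exc)

def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        handle_io_error(exc)
        return None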
Also, I hate being a close/off-topic stickler so I won't flag, but I do think this question is more suited to Programmers.SE (being an abstract philosophy/conceptual question) and you might get better responses on that site.

Can I change the behaviour of "raise" or "Exception"? [duplicate]

This question already has answers here:
Calling a hook function every time an Exception is raised
(4 answers)
Closed 6 years ago.
My project's code is full of blocks like the following:
try:
    execute_some_code()
except Exception:
    print(datetime.datetime.now())
    raise
simply because, if I get an error message, I'd like to know when it happened. I find it rather silly to repeat this code over and over, and I'd like to factor it away.
I don't want to decorate execute_some_code with something that does the error capturing (because sometimes it's just a block of code rather than a function call, and sometimes I don't need the exact same function to be decorated like that). I also don't want to divert stdout to some different stream that logs everything, because that would affect every other thing that gets sent to stdout as well.
Ideally, I'd like to override the behaviour of either the raise statement (to also print datetime.datetime.now() on every execution) or the Exception class, to prepend all of its messages with the time. I can easily subclass Exception, but then I'd have to make sure my functions raise an instance of this subclass, and I'd have just as much code duplication as I do currently.
Is either of these options possible?
You might be able to modify Python itself (I'd have to read the source to be sure how complex that would be), but:
You do not want to replace raise with different behaviour: try/except is a very Pythonic approach to problem solving, so there's lots of code that works by calling a method, letting that method raise an exception, and catching it under perfectly normal circumstances. That rules this approach out – you only want to know about the exceptions you care about, not the ones that are routine during operation.
The same goes for triggering some action whenever an Exception instance is created – but:
You might be able to overwrite the global namespace, at least for things that get initialized after you declare your own Exception class, and add a message property that includes a timestamp. Don't do that, though: some code actually relies on the message to react to exceptions automatically (bad style, but sadly not that rare).
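If all you actually need is a timestamp on uncaught exceptions, a hook is a cleaner alternative to touching raise. A minimal sketch using sys.excepthook (note this only fires for exceptions that propagate to the top level, not for ones you catch):

import datetime
import sys

def timestamped_excepthook(exc_type, exc_value, exc_traceback):
    # Print the time, then delegate to the default handler.
    print(datetime.datetime.now())
    sys.__excepthook__(exc_type, exc_value, exc_traceback)

sys.excepthook = timestamped_excepthook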

Most pythonic way to call dependant methods

I have a class with a few methods – each one sets some internal state, and usually requires some other method to have been called first to set the stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in the wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using the second approach, as it allows me to write less code (I can just call c.computeResults() and know that the two other methods will be called if necessary). Plus, when I call the methods multiple times, I don't have to keep track of what was already called, and so I avoid multiple reconnections or downloads.
On the other hand, the first approach seems more predictable from the caller's perspective, and possibly less error-prone.
And of course, there is the possibility of a hybrid solution: throw an exception, and add another layer of methods with internal state detection and proper calling of previous ones. But that seems like a bit of overkill.
Your suggestions?
They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
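A minimal sketch of the explicit approach, reusing the question's method names (the method bodies are hypothetical placeholders): each step raises if its prerequisite has not run.

class MyMysteryClass:
    def __init__(self):
        self._connection = None
        self._data = None

    def connectToServer(self):
        self._connection = object()  # stands in for a real connection

    def downloadData(self):
        if self._connection is None:
            raise RuntimeError("connectToServer() must be called before downloadData()")
        self._data = b"..."  # stands in for real downloaded data

    def computeResults(self):
        if self._data is None:
            raise RuntimeError("downloadData() must be called before computeResults()")
        return len(self._data)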
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.
You should raise exceptions. It is good programming practice to raise exceptions, because it makes your code easier to understand, for the following reasons:
What you describe fits the literal meaning of "exception" – it is an exception to normal proceedings.
If you build in some kind of workaround instead, you will likely end up with spaghetti code.
When you, or someone else, goes back and reads this code later, it will be difficult to understand if you do not provide the hint that executing these methods out of order is exceptional.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.
If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.

How do I dump an entire Python process for later debugging inspection?

I have a Python application in a strange state. I don't want to do live debugging of the process. Can I dump it to a file and examine its state later? I know I've restored corefiles of C programs in gdb later, but I don't know how to examine a Python application in a useful way from gdb.
(This is a variation on my question about debugging memleaks in a production system.)
There is no built-in way other than aborting (with os.abort(), causing a core dump if resource limits allow it) – although you can certainly build your own 'dump' function that dumps relevant information about the data you care about. There are no ready-made tools for it.
As for handling the corefile of a Python process, the Python source has a gdbinit file that contains useful macros. It's still a lot more painful than somehow getting into the process itself (with pdb or the interactive interpreter) but it makes life a little easier.
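For completeness, a hedged sketch of forcing that core dump on Unix (assuming the OS and its hard limits allow raising the core-size limit):

import os
import resource

# Remove the core-size limit for this process, then abort so the
# kernel writes a core file (Unix only, subject to system policy).
resource.setrlimit(resource.RLIMIT_CORE,
                   (resource.RLIM_INFINITY, resource.RLIM_INFINITY))
os.abort()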
If you only care about storing the traceback object (which is all you need to start a debugging session), you can use debuglater (a fork of pydump). It works with recent versions of Python and has IPython/Jupyter integration.
If you want to store the entire session, look at dill. It has dump_session and load_session functions.
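A minimal sketch of the dill approach (the path is arbitrary):

import dill

# In the ailing process: snapshot the interpreter session to disk.
dill.dump_session('session.pkl')

# Later, in a fresh interpreter, restore the saved globals for inspection:
# import dill
# dill.load_session('session.pkl')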
Here are two other relevant projects:
python-checkpointing2
pycrunch-trace
If you're looking for a language-agnostic solution, you want to create a core dump file. Here's an example with Python.
Someone above said that there is no built-in way to do this, but that's not entirely true. For an example, you could take a look at the Pylons debugging tools. When there is an exception, the exception handler saves the stack trace and prints a URL on the console that can be used to retrieve the debugging session over HTTP.
While they're probably keeping these sessions in memory, they're just python objects, so there's nothing to stop you from pickling a stack dump and restoring it later for inspection. It would mean some changes to the app, but it should be possible...
After some research, it turns out the relevant code is actually coming from Paste's EvalException module. You should be able to look there to figure out what you need.
It's also possible to write something that would dump all the data from the process, e.g.:
A pickler that skips the objects it can't pickle, replacing them with something else (e.g. Python: Pickling a dict with some unpicklable items) – a minimal sketch follows this list.
A method that recursively converts everything into serializable values (like this, except it would also need to detect infinitely recursing objects and do something with them; it could additionally try dir() and getattr() to process some of the unknown objects, e.g. extension classes).
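The first idea might look something like this rough sketch (helper name and dump path are hypothetical, and it is nowhere near a complete tool):

import pickle

def picklable_subset(namespace):
    # Keep only the values pickle can handle; replace the rest with a
    # placeholder so the dump still records what was there.
    out = {}
    for key, value in namespace.items():
        try:
            pickle.dumps(value)
            out[key] = value
        except Exception:
            out[key] = '<unpicklable: %s>' % type(value).__name__
    return out

with open('dump.pkl', 'wb') as f:
    pickle.dump(picklable_subset(globals()), f)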
But leaving a way into the running process with manhole or Pylons or something like that certainly seems more convenient when possible.
(Also, I wonder whether something more convenient has been written since this question was first asked.)
This answer suggests making your program core dump and then continuing execution on another sufficiently similar box.

Overhead of passing around a file object instead of file name?

I have a method that detects what file it should open based on input, opens the file, then returns the file object.
def find_and_open(search_term):
    # ... logic to find file
    return open(filename, 'r')
I like this approach because it hides as much of the implementation as possible from the caller. You give it your criteria, it spits out the file object. Why bother with string paths if you're just going to open the file anyway?
However, in other Python projects I tend to see such a method return a string of the filepath, instead of the fileobject itself. The file is then opened at the very last minute, read/edited, and closed.
My questions are:
From a performance standpoint, does passing around file objects carry more overhead? I suppose a reference is a reference no matter what it points to, but perhaps there's something going on in the interpreter that makes a string reference faster to pass around than a file reference?
From a purely "Pythonic" standpoint, does it make more sense to return the file object, or the string path (and then open the file as late as possible)?
Performance is unlikely to be an issue. Reading and writing to disk are orders of magnitude slower than reading from RAM, so passing a reference around is very unlikely to be the performance bottleneck.
From the python docs:
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks ...
Note that you can both use with to open a file and pass the file object around to other functions, either by nesting functions or using yield. Though I would consider this less pythonic than passing a file string around in most cases.
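For example (a small sketch with hypothetical names, assuming data.txt exists), the caller keeps ownership of the file's lifetime while the helper just consumes the object:

def line_count(fileobj):
    # Consumes an already-open file object; never opens or closes it.
    return sum(1 for _ in fileobj)

with open('data.txt', 'r') as f:
    print(line_count(f))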
Simple is better than complex.
Flat is better than nested.
You might also be interested in pathlib.
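With pathlib, the finder could return a Path and let the caller open the file as late as possible (a sketch with hypothetical names and directory layout):

from pathlib import Path

def find_file(search_term):
    # Hypothetical: locate the file and return its path without opening it.
    return Path('data') / (search_term + '.txt')

with find_file('report').open('r') as f:
    contents = f.read()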
