It is possible for a generator to manage a resource, e.g. by yield'ing from inside a context manager.
The resource is freed as soon as the close() method of the generator is called (or an exception is raised).
Since it's easy to forget to call close() at the end, it seems natural to use a context manager for that as well (and to handle potential exceptions).
I know that I can use contextlib.closing for that, but wouldn't it be much nicer to directly use the generator in the with statement?
Is there a reason why a generator should not be a context manager?
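To make this concrete, here is a sketch of what I mean (the filename and function name are made up):

import contextlib

def read_lines(filename):
    # The file stays open across yields and is released when
    # close() is called on the generator (or it is exhausted).
    with open(filename) as f:
        for line in f:
            yield line.rstrip('\n')

# contextlib.closing guarantees the generator's close() is called:
with contextlib.closing(read_lines('data.txt')) as lines:
    for line in lines:
        print(line)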
In general, the reason you don't see more generators as context managers and vice versa is that they're aimed at solving different problems. Context managers came about because they provide a clean and concise way of scoping executable code to a resource.
There is one very good reason you might want to keep a class that implements __iter__() from also being a context manager: the Single Responsibility Principle. Single Responsibility boils down to one idea:
Make a class do one thing and do it well
Lists are iterable, but that's because they're a collection. They manage no state other than what they hold, and iteration is just another way of accessing that state. Unless you need iteration as a means of accessing the state of a contained object, I can't see a reason to mix the two together. Even then, I would go to great lengths to separate it out in true OO style.
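If you do need both, here is a sketch of how I would keep the roles apart (all names are hypothetical): one class owns the resource and acts as the context manager, while iteration lives on the object it hands out.

class DataSource(object):
    """Owns the resource: acquisition and cleanup only."""
    def __enter__(self):
        self._file = open('records.txt')
        return Records(self._file)

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._file.close()

class Records(object):
    """Iteration only; knows nothing about acquiring or releasing."""
    def __init__(self, file):
        self._file = file

    def __iter__(self):
        for line in self._file:
            yield line.strip()

with DataSource() as records:
    for record in records:
        print(record)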
Like Wheaties said, you want to have classes do only "one thing and do it well". In particular with context managers, they are managing a context. So ask yourself, what is the context here? Most of the time, it will be having a resource open. A while ago I asked about using a queue with a context manager, and the response was basically that a queue did not make sense as a context. However, "in a task" was the real context that I was in and it made sense to make a context manager for that.
Additionally, there is no iterated with statement. For example, I cannot open a file and iterate through it in one statement like this:
for line in file with open(filename) as file:
    ...
It has to be done in two lines:
with open(filename) as file:
    for line in file:
        ...
This is good because the context being managed is not "we are iterating through the file", it is "we have a file open". So again, what is the context? What are you really doing? Most likely, your managed context is not actually the iteration through the resource. However, if you look at your specific problem you might discover that you do indeed have a situation in which the generator is managing a context. Hopefully understanding what the context really is should give you some ideas on how to appropriately manage it.
Related
I have a class with few methods - each one is setting some internal state, and usually requires some other method to be called first, to prepare stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using second approach, as it allows me to write less code (I can just write c.computeResults() and know that two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called and so I avoid multiple reconnecting or downloading.
On the other hand, first approach seems more predictable from the caller perspective, and possibly less error prone.
And of course, there is a possibility for a hybrid solution: throw an exception, and add another layer of methods with internal state detection and proper calling of previous ones. But that seems to be a bit of overkill.
Your suggestions?
They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order, that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
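A minimal sketch of the explicit approach, reusing the method names from the question (the guard checks and RuntimeError are my choices; a custom exception class would work just as well):

class MyMysteryClass(object):
    def __init__(self):
        self._connection = None
        self._data = None

    def connectToServer(self):
        self._connection = ...  # acquire the connection here

    def downloadData(self):
        if self._connection is None:
            raise RuntimeError("downloadData() called before connectToServer()")
        self._data = ...  # fetch the data here

    def computeResults(self):
        if self._data is None:
            raise RuntimeError("computeResults() called before downloadData()")
        return ...  # compute from self._data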
If these methods are usually called immediately one after another, you could consider combining them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting the order wrong.
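For example (the method name run is made up):

def run(self):
    """Convenience wrapper that enforces the correct call order."""
    self.connectToServer()
    self.downloadData()
    return self.computeResults()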
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.
You should raise exceptions. It is good programming practice to raise exceptions to make your code easier to understand, for the following reasons:
What you are describing fits the literal meaning of "exception": it is an exception to normal proceedings.
If you build in some kind of workaround, you will likely end up with "spaghetti code", which is bad.
When you, or someone else, comes back and reads this code later, it will be difficult to understand unless you provide the hint that executing these methods out of order is an exception.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.
If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.
I have learned that python does not guarantee that __del__ is called whenever an object is deleted.
In other words, del x does not necessarily invoke its destructor x.__del__().
If I want to ensure proper object cleanup, I should use a context manager (in a with statement).
I know it's stupid, but for a couple of reasons (please don't ask why) I am tied to a system with Python 2.4; therefore context managers are out of the question (they were introduced in Python 2.5).
So I need an alternative solution, and hence my question: are there best practices that would help me use __del__ reliably? I am thinking along the lines of "if Python provides such functionality, there must be a way it can be used effectively (I'm just too stupid to figure out how)"...
Or am I just being naive, and should I forget about __del__ and move on to a completely different approach?
In short: No, there is no way to ensure it gets called.
The answer is to implement context managers yourself. A with statement roughly translates to:
x.__enter__()
try:
    ...
finally:
    x.__exit__()
So just do it manually. It is a little more complex than that, so I recommend reading PEP 343 to fully understand how context managers work.
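For reference, something closer to the real expansion looks like this (still simplified; PEP 343 has the authoritative version):

import sys

mgr = x
value = mgr.__enter__()
try:
    ...  # body of the with statement, using `value`
except:
    # __exit__ sees the exception; a true return value suppresses it
    if not mgr.__exit__(*sys.exc_info()):
        raise
else:
    mgr.__exit__(None, None, None)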
One option is to call your cleaning up function close(), and then in future versions of python, people can easily use contextlib.closing to turn it into a real context manager.
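That way, on Python 2.5 and later, callers can write something like this (MyResource and do_work are placeholders):

from contextlib import closing  # Python >= 2.5

with closing(MyResource()) as r:
    r.do_work()  # r.close() is called automatically on exit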
Instead of __del__, give your class a method called something like close, then call that explicitly:
foo = Foo()
try:
    foo.do_interesting_stuff()
finally:
    foo.close()
For extra safety and forward-compatibility, have __exit__ and __del__ call close as well.
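Sketched out, that looks roughly like this (the body of close is up to you; make it safe to call more than once):

class Foo(object):
    def close(self):
        # Release the underlying resource; should be idempotent.
        pass

    # Forward compatibility: context-manager protocol (Python >= 2.5).
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    # Best-effort safety net; __del__ is NOT guaranteed to run.
    def __del__(self):
        self.close()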
I have been searching a bit for a python module that offers a memoize decorator with the following capabilities:
Stores cache on disk to be reused among subsequent program runs.
Works for any pickle-able arguments, most importantly numpy arrays.
(Bonus) checks whether arguments are mutated in function calls.
I found a few small code snippets for this task and could probably implement one myself, but I would prefer having an established package for this task. I also found incpy, but that does not seem to work with the standard python interpreter.
Ideally, I would like to have something like functools.lru_cache plus cache storage on disk. Can someone point me to a suitable package for this?
I don't know of any memoize decorator that takes care of all that, but you might want to have a look at ZODB. It's a persistence system built on top of pickle that provides some additional features, including being able to move objects from memory to disk when they aren't being used, and the ability to save only objects that have been modified.
Edit: As a follow-up for the comment. A memoization decorator isn't supported out of the box by ZODB. However, I think you can:
Implement your own persistent class
Use a memoization decorator in the methods you need (any standard implementation should work, but it probably needs to be modified to make sure that the dirty bit is set)
After that, if you create an object of that class and add it to a ZODB database, when you execute one of the memoized methods, the object will be marked as dirty and changes will be saved to the database in the next transaction commit operation.
I realize this is a 2-year-old question, and that this wouldn't count as an "established" decorator, but…
This is simple enough that you really don't need to worry about only using established code. The functools docs link to the lru_cache source because, in addition to being useful in its own right, it works as sample code.
So, what do you need to add? Add a filename parameter. At decoration time, pickle.load the cache from that file, falling back to {} if it fails. Add a cache_save function that just pickle.dumps the cache to the file under the lock. Attach that function to wrapper the same as the existing ones (cache_info, etc.).
If you want to save the cache automatically, instead of leaving it up to the caller, that's easy; it's just a matter of when to do so. Any option you come up with—atexit.register, adding a save_every argument so it saves every save_every misses, …—is trivial to implement. In this answer I showed how little work it takes. Or you can get a complete working version (to customize, or to use as-is) on GitHub.
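To sketch the whole thing in one place (the name disk_memoize is made up, the key scheme is one choice among many, and a production version needs the same locking lru_cache uses):

import atexit
import functools
import os
import pickle

def disk_memoize(filename):
    def decorator(func):
        # Load the cache from disk, falling back to an empty dict.
        if os.path.exists(filename):
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pickling the arguments yields a hashable key even for
            # unhashable-but-picklable values such as numpy arrays.
            key = pickle.dumps((args, sorted(kwargs.items())))
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]

        def cache_save():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        wrapper.cache_save = cache_save
        atexit.register(cache_save)  # save automatically at exit
        return wrapper
    return decorator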
There are other ways you could extend it—put some save-related statistics (last save time, hits and misses since last save, …) in the cache_info, copy the cache and save it in a background thread instead of saving it inline, etc. But I can't think of anything that would be worth doing that wouldn't be easy.
I have a class where I create a file object in the constructor. This class also implements a finish() method as part of its interface, and in this method I close the file object. The problem is that if I get an exception before this point, the file will not be closed. The class in question has a number of other methods that use the file object. Do I need to wrap all of these in a try/finally clause, or is there a better approach?
Thanks,
Barry
You could make your class a context-manager, and then wrap object creation and use of that class in a with-statement. See PEP 343 for details.
To make your class a context-manager, it has to implement the methods __enter__() and __exit__(). __enter__() is called when you enter the with-statement, and __exit__() is guaranteed to be called when you leave it, no matter how.
You could then use your class like this:
with MyClass() as foo:
    # use foo here
If you acquire your resources in the constructor, you can make __enter__() simply return self without doing anything. __exit__() should just call your finish() method.
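Put together, a minimal sketch (assuming, as in your question, that the file is opened in the constructor and closed in finish()):

class MyClass(object):
    def __init__(self):
        self.f = open('somefile.txt')  # resource acquired up front

    def finish(self):
        self.f.close()

    def __enter__(self):
        return self  # resources were already acquired in __init__

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.finish()  # runs no matter how the with block exits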
For short lived file objects, a try/finally pair or the more succinct with-statement is recommended as a clean way to make sure the files are flushed and the related resources are released.
For long-lived file objects, you can register a handler with atexit for an explicit close, or just rely on the interpreter cleaning up before it exits.
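For example (the filename is made up):

import atexit

log_file = open('app.log', 'a')   # long-lived file object
atexit.register(log_file.close)   # closed when the interpreter exits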
At the interactive prompt, most people don't bother for simple experiments where there isn't much of a downside to leaving files unclosed or relying on refcounting or GC to close for you.
Closing your files is considered good technique. In reality though, not explicitly closing files rarely has any noticeable effects.
You can either have a try...finally pair, or make your class a context manager suitable for use in the with statement.
I'm creating a basic database utility class in Python. I'm refactoring an old module into a class. I'm now working on an executeQuery() function, and I'm unsure of whether to keep the old design or change it. Here are the 2 options:
(The old design:) Have one generic executeQuery method that takes the query to execute and a boolean commit parameter that indicates whether to commit (insert, update, delete) or not (select), and determines with an if statement whether to commit or to select and return.
(This is the way I'm used to, but that might be because you can't have a function that sometimes returns something and sometimes doesn't in the languages I've worked with:) Have 2 functions, executeQuery and executeUpdateQuery (or something equivalent). executeQuery will execute a simple query and return a result set, while executeUpdateQuery will make changes to the DB (insert, update, delete) and return nothing.
Is it accepted to use the first way? It seems unclear to me, but maybe it's more Pythonic...? Python is very flexible; maybe I should take advantage of this feature, which can't really be accomplished this way in stricter languages...
And a second part of this question, unrelated to the main idea - what is the best way to return query results in Python? Using which function to query the database, in what format...?
It's probably just me and my FP fetish, but I think a function executed solely for its side effects is very different from a non-destructive function that fetches some data, and the two should therefore have different names. Especially if the generic function would do something different depending on exactly that (which is what the commit parameter seems to imply).
As for how to return results... I'm a huge fan of generators, but if the library you use for database connections returns a list anyway, you might as well pass this list on - a generator wouldn't buy you anything in this case. But if it allows you to iterate over the results (one at a time), seize the opportunity to save a lot of memory on larger queries.
I don't know how to answer the first part of your question, it seems like a matter of style more than anything else. Maybe you could invoke the Single Responsibility Principle to argue that it should be two separate functions.
When you're going to return a sequence of indeterminate length, it's best to use a Generator.
I'd have two methods, one which updates the database and one which doesn't. Both could delegate to a common private method, if they share a lot of code.
By separating the two methods, it becomes clear to callers what the different semantics are between the two, makes documenting the different methods easier, and clarifies what return types to expect. Since you can pull out shared code into private methods on the object, there's no worry about duplicating code.
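A sketch of that split, assuming a DB-API connection stored on the object (all names here are hypothetical):

class Database(object):
    def __init__(self, connection):
        self._connection = connection

    def _execute(self, query, params=()):
        cursor = self._connection.cursor()
        cursor.execute(query, params)
        return cursor

    def execute_query(self, query, params=()):
        """Run a SELECT and return the rows."""
        return self._execute(query, params).fetchall()

    def execute_update(self, query, params=()):
        """Run an INSERT/UPDATE/DELETE and commit; returns nothing."""
        self._execute(query, params)
        self._connection.commit()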
As for returning query results, it'll depend on whether you're loading all the results from the database before returning, or returning a cursor object. I'd be tempted to do something like the following:
with db.executeQuery('SELECT * FROM my_table') as results:
    for row in results:
        print row['col1'], row['col2']
... so the executeQuery method returns a ContextManager object (which cleans up any open connections, if needed), which also acts as a Generator. And the results from the generator act as read-only dicts.
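One way to sketch that is with contextlib.contextmanager (the signature and the dict-per-row choice are assumptions, not a definitive implementation):

import contextlib

@contextlib.contextmanager
def executeQuery(connection, query, params=()):
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)
        columns = [col[0] for col in cursor.description]
        # Lazily generate one dict per row.
        yield (dict(zip(columns, row)) for row in cursor)
    finally:
        cursor.close()  # always release the cursor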