I've been trying to find RAII in Python.
Resource Acquisition Is Initialization is a pattern in C++ whereby
an object acquires its resources as it is constructed. If that fails, it throws
an exception. In this way, the programmer knows that
the object will never be left in a half-constructed state. Python
can do this much.
But RAII also works with the scoping rules of C++
to ensure the prompt destruction of the object. As soon as the variable
pops off the stack it is destroyed. This may happen in Python, but only
if there are no external or circular references.
More importantly, a name for an object still exists until the function it is
in exits (and sometimes longer). Variables at the module level will
stick around for the life of the module.
I'd like to get an error if I do something like this:
for x in some_list:
    ...

... 100 lines later ...

for i in x:
    # Oops! Forgot to define x first, but... where's my error?
    ...
I could manually delete the names after I've used them,
but that would be quite ugly, and require effort on my part.
And I'd like it to Do-What-I-Mean in this case:
for x in some_list:
    surface = x.getSurface()
    new_points = []
    for x, y, z in surface.points:
        ...  # Do something with the points
        new_points.append((x, y, z))
    surface.points = new_points
    x.setSurface(surface)
Python does some scoping, but not at the indentation level, just at
the functional level. It seems silly to require that I make a new function
just to scope the variables so I can reuse a name.
Python 2.5 has the "with" statement,
but that requires that I explicitly define __enter__ and __exit__ methods,
and it generally seems more oriented towards cleaning up resources like files
and mutex locks regardless of the exit vector. It doesn't help with scoping.
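For concreteness, here's a minimal sketch of the protocol "with" relies on (Scoped is a made-up name, not a standard class):

class Scoped:
    def __enter__(self):
        print("acquired")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("released")  # runs on normal exit and on exceptions alike
        return False       # don't swallow exceptions

with Scoped() as s:
    pass
# "released" has been printed here, but the name s is still in scope.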
Or am I missing something?
I've searched for "Python RAII" and "Python scope" and I wasn't able to find anything that
addressed the issue directly and authoritatively.
I've looked over all the PEPs. The concept doesn't seem to be addressed
within Python.
Am I a bad person because I want to have scoped variables in Python?
Is that just too un-Pythonic?
Am I not grokking it?
Perhaps I'm trying to take away the benefits of the dynamic aspects of the language.
Is it selfish to sometimes want scope enforced?
Am I lazy for wanting the compiler/interpreter
to catch my negligent variable reuse mistakes? Well, yes, of course I'm lazy,
but am I lazy in a bad way?
tl;dr RAII is not possible; you're mixing it up with scoping in general, and when you miss those extra scopes you're probably writing bad code.
Perhaps I don't get your question(s), or you don't get some very essential things about Python... First off, deterministic object destruction tied to scope is impossible in a garbage collected language. Variables in Python are merely references. You wouldn't want a malloc'd chunk of memory to be free'd as soon as a pointer pointing to it goes out of scope, would you? There's a practical exception in some circumstances if you happen to use ref counting - but no language is insane enough to set the exact implementation in stone.
And even if you have reference counting, as in CPython, it's an implementation detail. Generally (including in Python, which has various implementations that don't use ref counting) you should code as if every object hangs around until memory runs out.
As for names existing for the rest of a function invocation: You can remove a name from the current or global scope via the del statement. However, this has nothing to do with manual memory management. It just removes the reference. That may or may not happen to trigger the referenced object to be GC'd and is not the point of the exercise.
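For illustration, a minimal sketch of del unbinding a name (the object itself is only collected if nothing else refers to it):

x = [1, 2, 3]
y = x      # a second reference to the same list
del x      # removes the name x from the current scope
print(y)   # [1, 2, 3] - the list lives on through y
print(x)   # NameError: name 'x' is not defined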
If your code is long enough for this to cause name clashes, you should write smaller functions. And use more descriptive, less likely-to-clash names. Same for nested loops overwriting the outer loop's iteration variable: I've yet to run into this issue, so perhaps your names are not descriptive enough or you should factor these loops apart?
You are correct, with has nothing to do with scoping, just with deterministic cleanup (so it overlaps with RAII in the ends, but not in the means).
Perhaps I'm trying to take away the benefits of the dynamic aspects of the language. Is it selfish to sometimes want scope enforced?
No. Decent lexical scoping is a merit independent of dynamic-/staticness. Admittedly, Python 2 has weaknesses in this regard (Python 3 pretty much fixed them), although they're more in the realm of closures.
But to explain "why": Python must be conservative with where it starts a new scope because, without a declaration saying otherwise, assignment to a name makes it local to the innermost/current scope. So e.g. if a for loop had its own scope, you couldn't easily modify variables outside of the loop.
Am I lazy for wanting the compiler/interpreter to catch my negligent variable reuse mistakes? Well, yes, of course I'm lazy, but am I lazy in a bad way?
Again, I imagine that accidental reuse of a name (in a way that introduces errors or pitfalls) is rare and a small problem anyway.
Edit: To state this again as clearly as possible:
There can't be stack-based cleanup in a language using GC. It's just not possible, by definition: a variable is one of potentially many references to objects on the heap that neither know nor care about when variables go out of scope, and all memory management lies in the hands of the GC, which runs when it likes to, not when a stack frame is popped. Resource cleanup is solved differently, see below.
Deterministic cleanup happens through the with statement. Yes, it doesn't introduce a new scope (see below), because that's not what it's for. It doesn't matter that the name the managed object is bound to isn't removed - the cleanup happened nonetheless, and what remains is a "don't touch me, I'm unusable" object (e.g. a closed file stream).
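A quick sketch of that (assuming a writable example.txt path):

with open("example.txt", "w") as f:
    f.write("hello")
print(f.closed)    # True - f is still bound, but the file was closed at block exit
f.write("again")   # ValueError: I/O operation on closed file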
Python has a scope per function, class, and module. Period. If you want or "need" more fine-grained scoping, break the code into more fine-grained functions. You might wish for finer scoping, but there isn't any, for the reasons pointed out earlier in this answer (three paragraphs above the "Edit:"). Like it or not, that's how the language works.
You are right about with -- it is completely unrelated to variable scoping.
Avoid global variables if you think they are a problem. This includes module level variables.
The main tool to hide state in Python are classes.
Generator expressions (and in Python 3 also list comprehensions) have their own scope.
If your functions are long enough for you to lose track of the local variables, you should probably refactor your code.
But RAII also works with the scoping rules of C++ to ensure the prompt destruction of the object.
This is considered unimportant in GC languages, which are based on the idea that memory is fungible. There is no pressing need to reclaim an object's memory as long as there's enough memory elsewhere to allocate new objects. Non-fungible resources like file handles, sockets, and mutexes are considered a special case to be dealt with specially (e.g., with). This contrasts with C++'s model that treats all resources the same.
As soon as the variable pops off the stack it is destroyed.
Python doesn't have stack variables. In C++ terms, everything is a shared_ptr.
Python does some scoping, but not at the indentation level, just at the functional level. It seems silly to require that I make a new function just to scope the variables so I can reuse a name.
It also does scoping at the generator expression level (and in 3.x, in all comprehensions).
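A quick sketch of the difference (in 3.x both forms keep the loop variable private; in 2.x only the generator expression does):

x = "outer"
squares = [x * x for x in range(5)]          # 3.x: comprehension has its own scope
gen_squares = list(x * x for x in range(5))  # own scope even in 2.4+
print(x)   # "outer" in 3.x - the loop variable did not leak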
If you don't want to clobber your for loop variables, don't use so many for loops. In particular, it's un-Pythonic to use append in a loop. Instead of:
new_points = []
for x, y, z in surface.points:
    ...  # Do something with the points
    new_points.append((x, y, z))
write:
new_points = [do_something_with(x, y, z) for (x, y, z) in surface.points]
or
# Can be used in Python 2.4-2.7 to reduce scope of variables.
new_points = list(do_something_with(x, y, z) for (x, y, z) in surface.points)
Basically you are probably using the wrong language. If you want sane scoping rules and reliable destruction then stick with C++ or try Perl. The GC debate about when memory is released seems to miss the point. It's about releasing other resources like mutexes and file handles. C# makes a distinction between Dispose(), which is called deterministically when the object is no longer needed, and the finalizer, which runs whenever the GC decides to recycle the memory. People aren't that concerned about the memory recycling but do want to know as soon as an object is no longer referenced. It's a pity, as Python had real potential as a language. But its unconventional scoping and unreliable destructors (or at least implementation-dependent ones) mean that one is denied the power you get with C++ and Perl.
Interesting, the comment made about just using new memory if it's available rather than recycling the old in GC. Isn't that just a fancy way of saying it leaks memory? :-)
When switching to Python after years of C++, I have found it tempting to rely on __del__ to mimic RAII-type behavior, e.g. to close files or connections. However, there are situations (e.g. observer pattern as implemented by Rx) where the thing being observed maintains a reference to your object, keeping it alive! So, if you want to close the connection before it is terminated by the source, you won't get anywhere by trying to do that in __del__.
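A minimal sketch of the pitfall (all names here are made up):

class Observer:
    def __del__(self):
        print("cleaned up")     # we'd like this to close a connection

subscribers = []                # stand-in for the source's subscription list

def make_observer():
    obs = Observer()
    subscribers.append(obs)     # the source now holds a reference
    # obs goes out of scope here, but the object survives

make_observer()
# The instance is still alive via subscribers; "cleaned up" was never printed.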
The following situation arises in UI programming:
class MyComponent(UiComponent):
    def add_view(self, model):
        view = TheView(model)        # observes model
        self.children.append(view)

    def remove_view(self, index):
        del self.children[index]     # model keeps the child alive
So, here is a way to get RAII-type behavior: create a container with add and remove hooks:
import collections.abc

class ScopedList(collections.abc.MutableSequence):
    def __init__(self, iterable=(), add_hook=lambda item: None, del_hook=lambda item: None):
        self._items = list()
        self._add_hook = add_hook
        self._del_hook = del_hook
        self += iterable

    def __del__(self):
        del self[:]    # fires del_hook for every item still held

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, item):
        self._del_hook(self._items[index])
        self._add_hook(item)
        self._items[index] = item

    def __delitem__(self, index):
        if isinstance(index, slice):
            for item in self._items[index]:
                self._del_hook(item)
        else:
            self._del_hook(self._items[index])
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return "ScopedList({})".format(self._items)

    def insert(self, index, item):
        self._add_hook(item)
        self._items.insert(index, item)
If UiComponent.children is a ScopedList, which calls acquire and dispose methods on the children, you get the same guarantee of deterministic resource acquisition and disposal as you are used to in C++.
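A quick usage sketch (the hooks and values are made up):

children = ScopedList(
    add_hook=lambda view: print("acquire", view),
    del_hook=lambda view: print("dispose", view),
)
children.append("view-1")   # prints: acquire view-1
children[0] = "view-2"      # prints: dispose view-1, then acquire view-2
del children[0]             # prints: dispose view-2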
Related
What is the most preferred way to pass object attributes to a function in Python?
A class
class Test:
    def __init__(self, model, type, version):
        self.model = model
        self.type = type
        self.version = version
        ...

test = Test("something", "something", "something")
Functions
def get_type_1(test):
    if test.model == "something" and test.type == "something" and test.version == "something":
        return "value"

def get_type_2(model, type, version):
    if model == "something" and type == "something" and version == "something":
        return "value"
From the perspective of "clean code", which type of function should I use? I catch myself using type_1 when there are more arguments and type_2 when there are only 1-2 of them, which is making a logical mess in my program. Do I need to worry in Python about speed and memory when passing the whole object around all the time?
Prefer the 1st form, for three reasons:

1. You're not shadowing the type builtin. (Trivial - you could use the alternate spelling type_.)
2. It's more convenient for the caller, and for folks reading the calling code.
3. Those three attributes go together; better to show that with the representation.
When we speak of (model, type, version),
they could be nearly anything.
There's no clear relationship among them,
and no name to hang documentation upon.
OTOH the object may have well-understood constraints,
perhaps "model is never Edsel when version > 3".
We can consult the documentation,
and the implementation,
to understand the class invariants.
Sometimes mutability is a concern.
That is, a caller might have passed in an
object with foo(test), and then we're
worried that library routine foo might possibly have
changed model "Colt" to "Bronco".
Often the docs, implicit or explicit,
will make clear that such mutations
are out of bounds; they will not happen.
To make things very obvious with
minimal documentation burden, consider
using a named tuple
for those three fields in the example.
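For instance, a sketch using typing.NamedTuple (TestInfo is a hypothetical name):

from typing import NamedTuple

class TestInfo(NamedTuple):
    model: str
    type: str
    version: str

info = TestInfo(model="Colt", type="coupe", version="3")
# Immutable, self-documenting, and still unpackable if a caller wants the pieces:
model, type_, version = info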
Do I need to worry in Python about speed and memory passing the class all the time?
No.
Python is about clarity of communicating a technical
idea to other humans. It is not about speed.
Recall Knuth's advice. If speed was a principal
concern, you would have already used
cProfile
to identify the hot spots that should be
implemented in e.g. Rust, cython, or C++.
Usually that only becomes important when you
notice you're often looping more than a thousand
or a million times.
Use dis.dis()
to disassemble your two functions.
Notice that caller1 pushed a single reference
to test, while caller2 spent more time and
more stack memory pushing three references.
Down in the target code, we still need to
chase three references, so that's mostly a wash.
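A sketch of that comparison (get_type_1 and get_type_2 stubbed out from the question; caller names are made up):

import dis

def get_type_1(test): ...
def get_type_2(model, type, version): ...

def caller1(test):
    get_type_1(test)

def caller2(test):
    get_type_2(test.model, test.type, test.version)

dis.dis(caller1)   # one LOAD_FAST for test before the call
dis.dis(caller2)   # three LOAD_FAST/LOAD_ATTR pairs pushed before the call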
If you pass an object with a dozen attributes,
of which just three will be used, that's no
burden on the bytecode interpreter, the other
nine are simply never touched.
It can be an intellectual burden on an engineer
maintaining the code, who might need to reason
about those nine and dismiss them as not a concern.
Another concern that a paranoid caller might have
about called library code relates to references.
Typically we expect the called routine will not
permanently hold a reference (or weakref) on
the passed test object, nor on attributes
such as test.version or test.version.history_dict.
If the library routine will store a reference for a
long time, or pass a reference to someone that will
store it, well, that's worth documenting.
Caller will want to understand memory consumption,
leaks, and object lifetime.
While waiting for a long running function to finish executing, I began thinking about whether the garbage collector will clean up references to variables which will no longer be used.
Say for example, I have a function like:
def long_running_function():
    x = MemoryIntensiveObject()
    print(id(x))
    # lots of hard work done here which does not reference x
    return
I'm intrigued whether the interpreter is smart enough to realize that x is no longer used and can be dereferenced. It's somewhat hard to test, as I can write code to check its reference count, but that implicitly references it, which defeats the purpose of doing so.
My thought is, perhaps when the function is parsed and the bytecode is generated, it may be generated in such a way that will allow it to clean up the object when it can no longer be referenced.
Or, is the answer just simpler that, as long as we're still within a scope where it "could" be used, it won't be cleaned up?
No, CPython will not garbage collect an object as long as a name that references that object is still defined in the current scope.
This is because, even if there are no references to the name x as literals in the code, calls to vars() or locals() could still grab a copy of the locals namespace dictionary (either before or after the last reference to x) and therefore the entire locals namespace effectively "roots" the values it references until execution leaves its scope.
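A sketch demonstrating this under CPython's reference counting (MemoryIntensiveObject stands in for the question's class):

import weakref

class MemoryIntensiveObject:
    pass

def long_running_function():
    x = MemoryIntensiveObject()
    probe = weakref.ref(x)    # watch the object without keeping it alive
    print(probe() is None)    # False - the name x still roots the object
    del x                     # explicitly unbinding the name is the only early release
    print(probe() is None)    # True under CPython - the refcount hit zero immediately

long_running_function()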
I don't know for certain how other implementations do this. In particular, in a JIT-compiled implementation like PyPy, Jython, or IronPython, it is possible at least in theory for this optimization to be performed. The JVM and CLR JITs actually do perform this optimization in practice on other languages. Whether Python on those platforms would be able to take advantage or not depends entirely on the bytecode that the Python code gets compiled into.
I was writing some Python code and, as usual, I try to make my functions small and give them a clear name (although sometimes a little too long). I get to the point where there are no global variables and everything a function needs is passed to it.
But I thought: in this case, every function has access to every other function. Why not limit their access to other functions, just like we limit access to variables?
I was thinking to use nested functions but that implies closures and that's even worse for my purpose.
I was also thinking about using objects and I think this is the point of OOP, although it'll be a little too much boilerplate in my case.
Has anyone else had this problem on her/his mind, and what's the solution?
It is not a good idea to have global mutable data, e.g. variables. The mutability is the key here. You can have constants and functions to your heart's content.
But as soon as you write functions that rely on globally mutable state it limits the reusability of your functions - they're always bound to that one shared state.
For the sake of everyone reading your code, grouping the functions into classes will help to mentally categorize them. Using the class self parameter helps to organize the variables, too, by grouping them in a class.
You can limit their access with a single leading underscore at the beginning of the function name.
Global variables are discouraged because they make it hard to keep track of the state of the program. If I'm debugging a 1,000-line file, and somewhere in the middle of a function I see some_well_named_flag = False, I'm going to have a lot of hunting to do to see what else it affects elsewhere in the program.
Functions don't have state. The places where they can modify the program are more or less limited to the parameters and return value.
If you're still concerned about controlling access to functions, there are other languages like Java or C++ that can help you do that. One convention with Python is to prefix functions that shouldn't be used outside of the class with an underscore, and then trust people not to call them from outside the class.
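A sketch of that convention (module and function names are made up):

# mymodule.py
def _helper(data):
    """Internal: not part of the module's public API."""
    return sorted(data)

def public_api(data):
    return _helper(data)[:10]

# "from mymodule import *" skips _helper, and linters flag outside use,
# but nothing actually prevents a determined caller from reaching it.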
I'm trying to find out why the use of global is considered to be bad practice in python (and in programming in general). Can somebody explain? Links with more info would also be appreciated.
This has nothing to do with Python; global variables are bad in any programming language.
However, global constants are not conceptually the same as global variables; global constants are perfectly harmless. In Python the distinction between the two is purely by convention: CONSTANTS_ARE_CAPITALIZED and globals_are_not.
The reason global variables are bad is that they enable functions to have hidden (non-obvious, surprising, hard to detect, hard to diagnose) side effects, leading to an increase in complexity, potentially leading to Spaghetti code.
However, sane use of global state is acceptable (as is local state and mutability) even in functional programming, either for algorithm optimization, reduced complexity, caching and memoization, or the practicality of porting structures originating in a predominantly imperative codebase.
All in all, your question can be answered in many ways, so your best bet is to just google "why are global variables bad". Some examples:
Global Variables Are Bad - Wiki Wiki Web
Why is Global State so Evil? - Software Engineering Stack Exchange
Are global variables bad?
If you want to go deeper and find out what side effects are all about, and many other enlightening things, you should learn Functional Programming:
Side effect (computer science) - Wikipedia
Why are side-effects considered evil in functional programming? - Software Engineering Stack Exchange
Functional programming - Wikipedia
Yes, in theory, globals (and "state" in general) are evil. In practice, if you look into your Python installation's packages directory you'll find that most modules there start with a bunch of global declarations. Obviously, people have no problem with them.
Specifically to Python, globals' visibility is limited to a module, so there are no "true" globals that affect the whole program - that makes them way less harmful. Another point: there is no const, so when you need a constant you have to use a global.
In my practice, if I happen to modify a global in a function, I always declare it with global, even if there is technically no need for that, as in:

cache = {}

def foo(args):
    global cache
    cache[args] = ...
This makes globals' manipulations easier to track down.
A personal opinion on the topic: having global variables used in a function's logic means that other code can alter the logic and the expected output of that function, which makes debugging very hard (especially in big projects) and makes testing harder as well.
Furthermore, if you consider other people reading your code (the open-source community, colleagues etc.), they will have a hard time trying to understand where the global variable is set, where it has been changed, and what to expect from it, as opposed to an isolated function whose behavior can be determined by reading the function definition itself.
(Probably) Violating Pure Function definition
I believe that a clean and (nearly) bug-free code should have functions that are as pure as possible (see pure functions). A pure function is the one that has the following conditions:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below).
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
Having global variables violates at least one of the above, if not both, as external code can cause unexpected results.
Another clear definition of pure functions: "Pure function is a function that takes all of its inputs as explicit arguments and produces all of its outputs as explicit results." [1]. Having global variables violates the idea of pure functions since an input and maybe one of the outputs (the global variable) is not explicitly being given or returned.
(Probably) Violating Unit testing F.I.R.S.T principle
Further on that, if you consider unit testing and the F.I.R.S.T principles (Fast tests, Independent tests, Repeatable, Self-Validating and Timely), using global variables will probably violate the Independent tests principle (which means that tests don't depend on each other).
A global variable, in most of the cases (at least from what I have seen so far), is used to prepare and pass results to other functions. This violates the principle as well. If the global variable has been used in that way (i.e. the global variable used in function X has to be set in a function Y first), it means that to unit test function X you have to run/test function Y first.
Globals as constants
On the other hand, and as other people have already mentioned, using a global variable as a "constant" can be slightly better, since the language does not support constants. However, I always prefer working with classes and having the "constants" as class members rather than using a global variable at all. If you have code in which two different classes need to share a global variable, you probably need to refactor your solution and make the classes independent.
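A sketch of that preference (class and values are made up):

class Limits:
    # "constants" grouped as class attributes instead of bare module globals
    MAX_RETRIES = 5
    TIMEOUT_SECONDS = 30

def fetch(url):
    for _ in range(Limits.MAX_RETRIES):
        ...  # attempt the request within Limits.TIMEOUT_SECONDS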
I don't believe that globals shouldn't be used. But if they are used, the authors should consider some principles (perhaps the ones mentioned above, along with other software engineering principles and good practices) for cleaner and nearly bug-free code.
They are essential, the screen being a good example. However, in a multithreaded environment or with many developers involved, in practice the question often arises: who (erroneously) set or cleared it? Depending on the architecture, that analysis can be costly and frequently required. Reading the global var can be OK, but writing to it must be controlled, for example by a single thread or a thread-safe class. Global vars thus raise the fear of high development costs from such consequences, which is why they are considered evil. Therefore, in general, it's good practice to keep the number of global vars low.