I'm trying to find out why the use of global is considered to be bad practice in python (and in programming in general). Can somebody explain? Links with more info would also be appreciated.
This has nothing to do with Python; global variables are bad in any programming language.
However, global constants are not conceptually the same as global variables; global constants are perfectly harmless. In Python the distinction between the two is purely by convention: CONSTANTS_ARE_CAPITALIZED and globals_are_not.
The reason global variables are bad is that they enable functions to have hidden (non-obvious, surprising, hard to detect, hard to diagnose) side effects, leading to an increase in complexity, potentially leading to Spaghetti code.
However, sane use of global state is acceptable (as is local state and mutability) even in functional programming, either for algorithm optimization, reduced complexity, caching and memoization, or the practicality of porting structures originating in a predominantly imperative codebase.
All in all, your question can be answered in many ways, so your best bet is to just google "why are global variables bad". Some examples:
Global Variables Are Bad - Wiki Wiki Web
Why is Global State so Evil? - Software Engineering Stack Exchange
Are global variables bad?
If you want to go deeper and find out why side effects are all about, and many other enlightening things, you should learn Functional Programming:
Side effect (computer science) - Wikipedia
Why are side-effects considered evil in functional programming? - Software Engineering Stack Exchange
Functional programming - Wikipedia
Yes, in theory, globals (and "state" in general) are evil. In practice, if you look into your python's packages directory you'll find that most modules there start with a bunch of global declarations. Obviously, people have no problem with them.
Specifically to python, globals' visibility is limited to a module, therefore there are no "true" globals that affect the whole program - that makes them a way less harmful. Another point: there are no const, so when you need a constant you have to use a global.
In my practice, if I happen to modify a global in a function, I always declare it with global, even if there technically no need for that, as in:
cache = {}
def foo(args):
global cache
cache[args] = ...
This makes globals' manipulations easier to track down.
A personal opinion on the topic is that having global variables being used in a function logic means that some other code can alter the logic and the expected output of that function which will make debugging very hard (especially in big projects) and will make testing harder as well.
Furthermore, if you consider other people reading your code (open-source community, colleagues etc) they will have a hard time trying to understand where the global variable is being set, where has been changed and what to expect from this global variable as opposed to an isolated function that its functionality can be determined by reading the function definition itself.
(Probably) Violating Pure Function definition
I believe that a clean and (nearly) bug-free code should have functions that are as pure as possible (see pure functions). A pure function is the one that has the following conditions:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below).
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
Having global variables is violating at least one of the above if not both as an external code can probably cause unexpected results.
Another clear definition of pure functions: "Pure function is a function that takes all of its inputs as explicit arguments and produces all of its outputs as explicit results." [1]. Having global variables violates the idea of pure functions since an input and maybe one of the outputs (the global variable) is not explicitly being given or returned.
(Probably) Violating Unit testing F.I.R.S.T principle
Further on that, if you consider unit-testing and the F.I.R.S.T principle (Fast tests, Independent tests, Repeatable, Self-Validating and Timely) will probably violate the Independent tests principle (which means that tests don't depend on each other).
Having a global variable (not always) but in most of the cases (at least of what I have seen so far) is to prepare and pass results to other functions. This violates this principle as well. If the global variable has been used in that way (i.e the global variable used in function X has to be set in a function Y first) it means that to unit test function X you have to run test/run function Y first.
Globals as constants
On the other hand and as other people have already mentioned, if the global variable is used as a "constant" variable can be slightly better since the language does not support constants. However, I always prefer working with classes and having the "constants" as a class member and not use a global variable at all. If you have a code that two different classes require to share a global variable then you probably need to refactor your solution and make your classes independent.
I don't believe that globals shouldn't be used. But if they are used the authors should consider some principles (the ones mentioned above perhaps and other software engineering principles and good practices) for a cleaner and nearly bug-free code.
They are essential, the screen being a good example. However, in a multithreaded environment or with many developers involved, in practice often the question arises: who did (erraneously) set or clear it? Depending on the architecture, analysis can be costly and be required often. While reading the global var can be ok, writing to it must be controlled, for example by a single thread or threadsafe class. Hence, global vars arise the fear of high development costs possible by the consequences for which themselves are considered evil. Therefore in general, it's good practice to keep the number of global vars low.
Related
I was writing some Python code and, as usual, I try to make my functions small and give them a clear name (although sometimes a little too long). I get to the point where there are no global variables and everything a function needs is passed to it.
But I thought, in this case, every function has access to any other function. Why not limit their access to other functions just like we limit the access to other variables.
I was thinking to use nested functions but that implies closures and that's even worse for my purpose.
I was also thinking about using objects and I think this is the point of OOP, although it'll be a little too much boilerplate in my case.
Has anyone got this problem on her/his mind and what's the solution.
It is not a good idea to have global mutable data, e.g. variables. The mutability is the key here. You can have constants and functions to your hearts content.
But as soon as you write functions that rely on globally mutable state it limits the reusability of your functions - they're always bound to that one shared state.
For the sake of everyone reading your code, grouping the functions into classes will help to mentally categorize them. Using the class self parameter helps to organize the variables, too, by grouping them in a class.
You can limit their access with a single leading underscore at the beginning of the function name.
Global variables are discouraged because they make it hard to keep track of the state of the program. If I'm debugging a 1,000-line file, and somewhere in the middle of a function I see some_well_named_flag = False, I'm going to have a lot of hunting to do to see how else it affects what else in the program.
Functions don't have state. The places where they can modify the program are more or less limited to the parameters and return value.
If you're still concerned about controlling access to functions, there are other languages like Java or C++ that can help you do that. One convention with Python is to prefix functions that shouldn't be used outside of the class with an underscore, and then trust people not to call them from outside the class.
I'm trying to find out why the use of global is considered to be bad practice in python (and in programming in general). Can somebody explain? Links with more info would also be appreciated.
This has nothing to do with Python; global variables are bad in any programming language.
However, global constants are not conceptually the same as global variables; global constants are perfectly harmless. In Python the distinction between the two is purely by convention: CONSTANTS_ARE_CAPITALIZED and globals_are_not.
The reason global variables are bad is that they enable functions to have hidden (non-obvious, surprising, hard to detect, hard to diagnose) side effects, leading to an increase in complexity, potentially leading to Spaghetti code.
However, sane use of global state is acceptable (as is local state and mutability) even in functional programming, either for algorithm optimization, reduced complexity, caching and memoization, or the practicality of porting structures originating in a predominantly imperative codebase.
All in all, your question can be answered in many ways, so your best bet is to just google "why are global variables bad". Some examples:
Global Variables Are Bad - Wiki Wiki Web
Why is Global State so Evil? - Software Engineering Stack Exchange
Are global variables bad?
If you want to go deeper and find out why side effects are all about, and many other enlightening things, you should learn Functional Programming:
Side effect (computer science) - Wikipedia
Why are side-effects considered evil in functional programming? - Software Engineering Stack Exchange
Functional programming - Wikipedia
Yes, in theory, globals (and "state" in general) are evil. In practice, if you look into your python's packages directory you'll find that most modules there start with a bunch of global declarations. Obviously, people have no problem with them.
Specifically to python, globals' visibility is limited to a module, therefore there are no "true" globals that affect the whole program - that makes them a way less harmful. Another point: there are no const, so when you need a constant you have to use a global.
In my practice, if I happen to modify a global in a function, I always declare it with global, even if there technically no need for that, as in:
cache = {}
def foo(args):
global cache
cache[args] = ...
This makes globals' manipulations easier to track down.
A personal opinion on the topic is that having global variables being used in a function logic means that some other code can alter the logic and the expected output of that function which will make debugging very hard (especially in big projects) and will make testing harder as well.
Furthermore, if you consider other people reading your code (open-source community, colleagues etc) they will have a hard time trying to understand where the global variable is being set, where has been changed and what to expect from this global variable as opposed to an isolated function that its functionality can be determined by reading the function definition itself.
(Probably) Violating Pure Function definition
I believe that a clean and (nearly) bug-free code should have functions that are as pure as possible (see pure functions). A pure function is the one that has the following conditions:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below).
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
Having global variables is violating at least one of the above if not both as an external code can probably cause unexpected results.
Another clear definition of pure functions: "Pure function is a function that takes all of its inputs as explicit arguments and produces all of its outputs as explicit results." [1]. Having global variables violates the idea of pure functions since an input and maybe one of the outputs (the global variable) is not explicitly being given or returned.
(Probably) Violating Unit testing F.I.R.S.T principle
Further on that, if you consider unit-testing and the F.I.R.S.T principle (Fast tests, Independent tests, Repeatable, Self-Validating and Timely) will probably violate the Independent tests principle (which means that tests don't depend on each other).
Having a global variable (not always) but in most of the cases (at least of what I have seen so far) is to prepare and pass results to other functions. This violates this principle as well. If the global variable has been used in that way (i.e the global variable used in function X has to be set in a function Y first) it means that to unit test function X you have to run test/run function Y first.
Globals as constants
On the other hand and as other people have already mentioned, if the global variable is used as a "constant" variable can be slightly better since the language does not support constants. However, I always prefer working with classes and having the "constants" as a class member and not use a global variable at all. If you have a code that two different classes require to share a global variable then you probably need to refactor your solution and make your classes independent.
I don't believe that globals shouldn't be used. But if they are used the authors should consider some principles (the ones mentioned above perhaps and other software engineering principles and good practices) for a cleaner and nearly bug-free code.
They are essential, the screen being a good example. However, in a multithreaded environment or with many developers involved, in practice often the question arises: who did (erraneously) set or clear it? Depending on the architecture, analysis can be costly and be required often. While reading the global var can be ok, writing to it must be controlled, for example by a single thread or threadsafe class. Hence, global vars arise the fear of high development costs possible by the consequences for which themselves are considered evil. Therefore in general, it's good practice to keep the number of global vars low.
I came upon a question which says that Python code runs faster in functions. So I thought that breaking code down into as many parts as I can will be the faster approach. But when timing some functions I found that it wasn't really correct.
I won't post the code here as it is currently placed for review at codereview. I am still figuring out the best way to do timing as that code is also placed for review at codereview and I am not getting many answer despite a bounty.
I figured out that the performance benefit cannot be infinite and there has to be a limit where breaking functions down would stop giving performance benefit.
So what is the the limit where performance benefit stops by breaking Python code into various functions? From the point of performance, when is breaking code into functions no longer useful?
It is true that code runs faster in a function than in the global scope because of access times. A function has its own local scope which is implemented as an array, whereas the global scope is really just a dictionary. Arrays are accessed faster than dicts and this is happening under the hood, on the C level.
This doesn't imply that breaking the code up into many functions will make it quicker, it only implies that moving code from global to inside a function will improve access times, which might result in an overall boost in speed.
Even if you were to shove all your code inside one function just for the sake of using local scope instead of global, it doesn't guarantee a performance improvement, which can only be determined by actually profiling the code. This is because function calls have a relatively high overhead in Python which might dwarf any performance gain from faster local access times.
There's a lot of info on the page at the first link you provided that confirms this.
As I understand it, that performance boost only applies when global variables are moved into a local scope of a function, as it improves their access time. Further breaking your code down into more functions won't provide any similar boost.
I've been trying to find RAII in Python.
Resource Allocation Is Initialization is a pattern in C++ whereby
an object is initialized as it is created. If it fails, then it throws
an exception. In this way, the programmer knows that
the object will never be left in a half-constructed state. Python
can do this much.
But RAII also works with the scoping rules of C++
to ensure the prompt destruction of the object. As soon as the variable
pops off the stack it is destroyed. This may happen in Python, but only
if there are no external or circular references.
More importantly, a name for an object still exists until the function it is
in exits (and sometimes longer). Variables at the module level will
stick around for the life of the module.
I'd like to get an error if I do something like this:
for x in some_list:
...
... 100 lines later ...
for i in x:
# Oops! Forgot to define x first, but... where's my error?
...
I could manually delete the names after I've used it,
but that would be quite ugly, and require effort on my part.
And I'd like it to Do-What-I-Mean in this case:
for x in some_list:
surface = x.getSurface()
new_points = []
for x,y,z in surface.points:
... # Do something with the points
new_points.append( (x,y,z) )
surface.points = new_points
x.setSurface(surface)
Python does some scoping, but not at the indentation level, just at
the functional level. It seems silly to require that I make a new function
just to scope the variables so I can reuse a name.
Python 2.5 has the "with" statement
but that requires that I explicitly put in __enter__ and __exit__ functions
and generally seems more oriented towards cleaning up resources like files
and mutex locks regardless of the exit vector. It doesn't help with scoping.
Or am I missing something?
I've searched for "Python RAII" and "Python scope" and I wasn't able to find anything that
addressed the issue directly and authoritatively.
I've looked over all the PEPs. The concept doesn't seem to be addressed
within Python.
Am I a bad person because I want to have scoping variables in Python?
Is that just too un-Pythonic?
Am I not grokking it?
Perhaps I'm trying to take away the benefits of the dynamic aspects of the language.
Is it selfish to sometimes want scope enforced?
Am I lazy for wanting the compiler/interpreter
to catch my negligent variable reuse mistakes? Well, yes, of course I'm lazy,
but am I lazy in a bad way?
tl;dr RAII is not possible, you mix it up with scoping in general and when you miss those extra scopes you're probably writing bad code.
Perhaps I don't get your question(s), or you don't get some very essential things about Python... First off, deterministic object destruction tied to scope is impossible in a garbage collected language. Variables in Python are merely references. You wouldn't want a malloc'd chunk of memory to be free'd as soon as a pointer pointing to it goes out of scope, would you? Practical exception in some circumstances if you happen to use ref counting - but no language is insane enough to set the exact implementation in stone.
And even if you have reference counting, as in CPython, it's an implementation detail. Generally, including in Python which has various implementations not using ref counting, you should code as if every object hangs around until memory runs out.
As for names existing for the rest of a function invocation: You can remove a name from the current or global scope via the del statement. However, this has nothing to do with manual memory management. It just removes the reference. That may or may not happen to trigger the referenced object to be GC'd and is not the point of the exercise.
If your code is long enough for this to cause name clashes, you should write smaller functions. And use more descriptive, less likely-to-clash names. Same for nested loops overwriting the out loop's iteration variable: I'm yet to run into this issue, so perhaps your names are not descriptive enough or you should factor these loops apart?
You are correct, with has nothing to do with scoping, just with deterministic cleanup (so it overlaps with RAII in the ends, but not in the means).
Perhaps I'm trying to take away the benefits of the dynamic aspects of the language. Is it selfish to sometimes want scope enforced?
No. Decent lexical scoping is a merit independent of dynamic-/staticness. Admittedly, Python (2 - 3 pretty much fixed this) has weaknesses in this regard, although they're more in the realm of closures.
But to explain "why": Python must be conservative with where it starts a new scope because without declaration saying otherwise, assignment to a name makes it a local to the innermost/current scope. So e.g. if a for loop had it's own scope, you couldn't easily modify variables outside of the loop.
Am I lazy for wanting the compiler/interpreter to catch my negligent variable reuse mistakes? Well, yes, of course I'm lazy, but am I lazy in a bad way?
Again, I imagine that accidential resuse of a name (in a way that introduces errors or pitfalls) is rare and a small anyway.
Edit: To state this again as clearly as possible:
There can't be stack-based cleanup in a language using GC. It's just not possibly, by definition: a variable is one of potentially many references to objects on the heap that neither know nor care about when variables go out of scope, and all memory management lies in the hands of the GC, which runs when it likes to, not when a stack frame is popped. Resource cleanup is solved differently, see below.
Deterministic cleanup happens through the with statement. Yes, it doesn't introduce a new scope (see below), because that's not what it's for. It doesn't matter the name the managed object is bound to isn't removed - the cleanup happened nonetheless, what remains is a "don't touch me I'm unusable" object (e.g. a closed file stream).
Python has a scope per function, class, and module. Period. That's how the language works, whether you like it or not. If you want/"need" more fine-grained scoping, break the code into more fine-grained functions. You might wish for more fine-grained scoping, but there isn't - and for reasons pointed out earlier in this answer (three paragraphs above the "Edit:"), there are reasons for this. Like it or not, but this is how the language works.
You are right about with -- it is completely unrelated to variable scoping.
Avoid global variables if you think they are a problem. This includes module level variables.
The main tool to hide state in Python are classes.
Generator expressions (and in Python 3 also list comprehensions) have their own scope.
If your functions are long enough for you to lose track of the local variables, you should probably refactor your code.
But RAII also works with the scoping
rules of C++ to ensure the prompt
destruction of the object.
This is considered unimportant in GC languages, which are based on the idea that memory is fungible. There is no pressing need to reclaim an object's memory as long as there's enough memory elsewhere to allocate new objects. Non-fungible resources like file handles, sockets, and mutexes are considered a special case to be dealt with specially (e.g., with). This contrasts with C++'s model that treats all resources the same.
As soon as the variable pops off the
stack it is destroyed.
Python doesn't have stack variables. In C++ terms, everything is a shared_ptr.
Python does some scoping, but not at
the indentation level, just at the
functional level. It seems silly to
require that I make a new function
just to scope the variables so I can
reuse a name.
It also does scoping at the generator comprehension level (and in 3.x, in all comprehensions).
If you don't want to clobber your for loop variables, don't use so many for loops. In particular, it's un-Pythonic to use append in a loop. Instead of:
new_points = []
for x,y,z in surface.points:
... # Do something with the points
new_points.append( (x,y,z) )
write:
new_points = [do_something_with(x, y, z) for (x, y, z) in surface.points]
or
# Can be used in Python 2.4-2.7 to reduce scope of variables.
new_points = list(do_something_with(x, y, z) for (x, y, z) in surface.points)
Basically you are probably using the wrong language. If you want sane scoping rules and reliable destruction then stick with C++ or try Perl. The GC debate about when memory is released seems to miss the point. It's about releasing other resources like mutexes and file handles. I believe C# makes the distinction between a destructor that is called when the reference count goes to zero and when it decides to recycle the memory. People aren't that concerned about the memory recycling but do want to know as soon as it is no longer referenced. It's a pity as Python had real potential as a language. But it's unconventional scoping and unreliable destructors (or at least implementation dependent ones) means that one is denied the power you get with C++ and Perl.
Interesting the comment made about just using new memory if it's available rather than recycling old in GC. Isn't that just a fancy way of saying it leaks memory :-)
When switching to Python after years of C++, I have found it tempting to rely on __del__ to mimic RAII-type behavior, e.g. to close files or connections. However, there are situations (e.g. observer pattern as implemented by Rx) where the thing being observed maintains a reference to your object, keeping it alive! So, if you want to close the connection before it is terminated by the source, you won't get anywhere by trying to do that in __del__.
The following situation arises in UI programming:
class MyComponent(UiComponent):
def add_view(self, model):
view = TheView(model) # observes model
self.children.append(view)
def remove_view(self, index):
del self.children[index] # model keeps the child alive
So, here is way to get RAII-type behavior: create a container with add and remove hooks:
import collections
class ScopedList(collections.abc.MutableSequence):
def __init__(self, iterable=list(), add_hook=lambda i: None, del_hook=lambda i: None):
self._items = list()
self._add_hook = add_hook
self._del_hook = del_hook
self += iterable
def __del__(self):
del self[:]
def __getitem__(self, index):
return self._items[index]
def __setitem__(self, index, item):
self._del_hook(self._items[index])
self._add_hook(item)
self._items[index] = item
def __delitem__(self, index):
if isinstance(index, slice):
for item in self._items[index]:
self._del_hook(item)
else:
self._del_hook(self._items[index])
del self._items[index]
def __len__(self):
return len(self._items)
def __repr__(self):
return "ScopedList({})".format(self._items)
def insert(self, index, item):
self._add_hook(item)
self._items.insert(index, item)
If UiComponent.children is a ScopedList, which calls acquire and dispose methods on the children, you get the same guarantee of deterministic resource acquisition and disposal as you are used to in C++.
I try to avoid "global" statements in python and Do you use the "global" statement in Python? suggests this is a common view. Values go into a function through its arguments and come out through its return statement (or reading/writing files or exceptions or probably something else I'm forgetting).
Within a class, self.variable statements are in effect global to each instance of the class. You can access the variable in any method in the class.
Do the same reasons we should avoid globals apply within classes, so that we should only use values in methods that come in through its arguments? I'm especially thinking about long classes that can be just about an entire program. Does the encapsulation inherent in a class eliminate the concern? In any case, we should make inputs, outputs and side effects clear in comments?
Do the same reasons we should avoid
globals apply within classes, so that
we should only use values in methods
that come in through its arguments?
I'm especially thinking about long
classes that can be just about an
entire program.
Classes exist to couple behaviour with state. If you take away the state part (which is what you're suggesting) then you have no need for classes. Nothing wrong with that, of course - much good software has been written without object-orientation.
Generally, if you're following the Single Responsibility Principle when making your classes, then these variables will be typically used together by a class that needs access to most or all of them in each method. You don't pass them in explicitly because the class exclusively works with behaviour that could reasonably access the entire state.
To put it another way, if you find yourself with a class that doesn't use half of its variables in a lot of its methods, that's probably a sign that you should split it into two classes.
self.variable is not global to the class, it's global to the instance. There's a big difference:
class MyClass:
def __init__(self, a):
self.a = a
mc1 = MyClass(1)
mc2 = MyClass(2)
assert mc1.a == 1
assert mc2.a == 2
You should definitely use self to encapsulate data in your classes.
That said, it is definitely possible to create huge overgrown classes that abuse instance variables in all the ways regular variables can be abused. This is where skill and craftsmanship come into play: properly dividing up your code into manageable chunks.
Ideally, no instance-wide variables would be used and everything would be passed as a parameter and well-documented in comments. That being said, it can get very tedious to comment every little thing and method parameter lists can start to look ridiculous (unless you have a hierarchy of partially-applied methods). Pragmatically, a balance should be sought between using non-local variables and making everything excruciatingly explicit.
There is at least one case where you have to have instance- or class-level variables and that's when an implementation-specific value has to be retained between method calls.
Scalability and concurrency depend on minimization if not complete elimination of state and side effects except for the most local and exclusive of runtime scopes. OOP without objects or display classes (i.e., closures) would be procedural, yes. Languages are increasingly becoming multiparadigm, but a lot of them have a primary paradigm. C# is object oriented with functional features. F# is functional with objects.
If the data is immutable, then instance variables are always okay in my books.