I'm currently extending a project that does not implement any classes. I need to call a function from one module that has side effects on global variables in that module. If I simply import the module and call the function, this has side effects on the rest of the program, which I don't want.
Solutions that I thought about so far:
Modify the module so that it's a class: This would break compatibility with the existing code, which I want to avoid.
Save the state of the module at the beginning of the call and restore it at the end: This could have side effects because there is multithreading involved.
Copy the whole module: Probably the best option, but I want to avoid code duplication.
Is there a better option to achieve what I want to do?
This situation sounds like a classic XY problem (https://en.wikipedia.org/wiki/XY_problem)
Assume that there exists a module, that the module has a function, and that the function is hard-coded to change state maintained in module-level variables.
In this situation there is probably no sensible way to programmatically prevent the function from changing the module-level variables, unless the function explicitly supports being given a custom context.
You can create an ad-hoc solution that fixes the problem for a known variable, but a general-purpose solution sounds impractical compared to restructuring the code.
Without knowing more details it is hard to suggest anything specific, except for restructuring the module.
Some options:
Use classes
Make the function support custom context by providing locals() from elsewhere
Make a clone of the function without the side effects
Save the state of known module-level variables, call the function, restore the state
Interestingly enough, you can automate the last option if you can be sure that the side effects exist within a known scope. dir(your_module) (or vars(your_module)) gives you programmatic access to the module's attributes.
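For example, here is a minimal sketch of that save/call/restore automation, assuming a module named your_module. Note that it is not thread-safe: another thread touching the module during the call will race with the restore.

import contextlib

@contextlib.contextmanager
def preserve_module_state(module):
    # Shallow snapshot of every module-level attribute.
    saved = dict(vars(module))
    try:
        yield
    finally:
        # Drop attributes created during the call...
        for name in list(vars(module)):
            if name not in saved:
                delattr(module, name)
        # ...and put the original values back.
        for name, value in saved.items():
            setattr(module, name, value)

# Usage:
# import your_module
# with preserve_module_state(your_module):
#     your_module.function_with_side_effects()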
Ultimately I think the problem in and of itself is proof that you should consider restructuring. This is coming from someone who is hell-bent on the ability to abuse Python by doing stuff like walking up the call stack to change variables in the parent-of-parent-of-parent-of-parent to force program state. I even made a debugger capable of doing that (https://github.com/hirsimaki-markus/SEAPIE)
Yet I still think breaking compatibility and restructuring is the more sensible long-term solution. Of the ad-hoc solutions, I would suggest creating side-effect-free versions of the necessary functions.
Related
I have an application that dynamically generates a lot of Python modules with class factories, to eliminate redundant boilerplate that makes the code hard to debug across similar implementations. It works well, except that the dynamic generation of the classes across the modules (hundreds of them) takes more time at load than simply importing from a file would. So I would like to find a way to save the modules to a file after generation (unless reset), then load from those files to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later? I already know that pickling or exporting as a JSON object won't work, because they make use of thread locks and other dynamic state variables, and because the classes must be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas or knowledge of how to do this, I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
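For concreteness, here is a minimal sketch of writing such a file, assuming CPython 3.7+ and the timestamp-based (non-hash) header layout; the code object would come from compile() or from your own construction:

import importlib.util
import marshal
import struct
import time

def write_pyc(code_obj, path):
    # CPython 3.7+ .pyc layout: magic number, a flags word, then the
    # source mtime and size (for timestamp-based pycs), then the
    # marshalled code object.
    with open(path, 'wb') as f:
        f.write(importlib.util.MAGIC_NUMBER)
        f.write(struct.pack('<I', 0))                 # flags: 0 = timestamp-based
        f.write(struct.pack('<I', int(time.time()))) # "source" mtime
        f.write(struct.pack('<I', 0))                 # "source" size
        f.write(marshal.dumps(code_obj))

# e.g. write_pyc(compile(source, 'generated.py', 'exec'), 'generated.pyc')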
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global = (lambda: open(os.devnull, 'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x):
    # Build a cell object by capturing x in a throwaway closure.
    return (lambda: x).__closure__[0]
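A quick check of what that helper produces:

c = cell(42)
print(c.cell_contents)  # -> 42; a real cell object, ready for use in building closures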
I was writing some Python code and, as usual, I try to make my functions small and give them a clear name (although sometimes a little too long). I get to the point where there are no global variables and everything a function needs is passed to it.
But then I thought: in this case, every function has access to every other function. Why not limit their access to other functions, just as we limit access to other variables?
I was thinking to use nested functions but that implies closures and that's even worse for my purpose.
I was also thinking about using objects and I think this is the point of OOP, although it'll be a little too much boilerplate in my case.
Has anyone had this problem on their mind, and what's the solution?
It is not a good idea to have global mutable data, e.g. mutable global variables. The mutability is the key here. You can have constants and functions to your heart's content.
But as soon as you write functions that rely on globally mutable state it limits the reusability of your functions - they're always bound to that one shared state.
For the sake of everyone reading your code, grouping the functions into classes will help to mentally categorize them. Using the class's self parameter helps to organize the variables, too.
You can limit access by prefixing a function name with a single leading underscore.
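A minimal sketch of both ideas together (the names here are hypothetical):

class Stats:
    def __init__(self, values):
        self._values = list(values)  # underscore: internal by convention

    def mean(self):
        return self._total() / len(self._values)

    def _total(self):
        # The leading underscore signals "not part of the public interface".
        return sum(self._values)

print(Stats([1, 2, 3]).mean())  # 2.0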
Global variables are discouraged because they make it hard to keep track of the state of the program. If I'm debugging a 1,000-line file and somewhere in the middle of a function I see some_well_named_flag = False, I have a lot of hunting to do to see where else that flag is used and what else in the program it affects.
Functions don't have state. The places where they can modify the program are more or less limited to their parameters and return value.
If you're still concerned about controlling access to functions, other languages like Java or C++ can enforce that. One convention in Python is to prefix functions that shouldn't be used outside of a class with an underscore, and then trust people not to call them from outside the class.
I use IPython Notebooks extensively in my research. I find them to be a wonderful tool.
However, on more than one occasion, I have been bitten by subtle bugs stemming from variable scope. For example, I will be doing some exploratory analysis:
foo = 1
bar = 2
foo + bar
And I decide that foo + bar is a useful algorithm for my purposes, so I encapsulate it in a function to make it easier to apply to a wider range of inputs:
def the_function(foo, bar):
    return foo + bar
Inevitably, somewhere down the line, after building a workflow from the ground up, I will have a typo somewhere (e.g. def the_function(fooo, bar):) that causes a global variable to be used (and/or modified) in a function call. This causes unseen side effects and leads to spurious results. But because it typically returns a result, it can be difficult to find where the problem actually occurs.
Now, I recognize that this behavior is a feature, which I deliberately use often (for convenience, or for necessity i.e. function closures or decorators). But as I keep running into bugs, I'm thinking I need a better strategy for avoiding such problems (current strategy = "be careful").
For example, one strategy might be to always prepend '_' to local variable names. But I'm curious whether there are other strategies - even "pythonic" strategies, or community-encouraged strategies.
I know that Python 2.x differs from Python 3.x in some scoping regards - I use Python 3.x.
Also, strategies should consider the interactive nature of scientific computing, as would be used in an IPython Notebook venue.
Thoughts?
EDIT: To be more specific, I am looking for IPython Notebook strategies.
I was tempted to flag this question as too broad, but perhaps the following will help you.
When you decide to wrap some useful code in a function, write some tests. If you think the code is useful, you must have used it with some examples. Write the test first lest you 'forget'.
My personal policy for a library module is to run the tests under an if __name__ == '__main__': guard, whether the test code is in the same file or a different one. I also execute the file to run the tests multiple times during a programming session, after every small unit of change (trivial in IDLE or a similar IDE).
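A minimal sketch of that pattern, reusing the_function from the question:

def the_function(foo, bar):
    return foo + bar

if __name__ == '__main__':
    # Runs when the file is executed directly, not when it is imported.
    assert the_function(1, 2) == 3
    assert the_function(-1, 1) == 0
    print('all tests passed')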
Use a code checker program, which will catch some typo-based errors - e.g. a report like "'fooo' set but never used".
Keep track of the particular kinds of errors you make, analyze them and think about personal countermeasures, or at least learn to recognize the symptoms.
Looking at your example: when you do write a function, don't use the same names for global objects and parameters. In your example, delete or change the globals foo and bar, or use something else for the parameter names.
I would suggest that you separate your concerns. For your exploratory analysis, write your code in the IPython notebook; but when you've decided that some functions are useful, open up an editor and put them into a Python file which you can then import.
You can use IPython magics to auto-reload things you've imported. So once you've tested your functions in IPython, you can simply copy them to your module. This way, the scope of your functions is isolated from your notebook. An additional advantage is that when you're ready to run things in a headless environment, you already have your entire codebase in one place.
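For example, with the standard autoreload extension (my_analysis here is a hypothetical module file sitting next to the notebook):

%load_ext autoreload
%autoreload 2

from my_analysis import the_function  # edits to my_analysis.py are picked up
the_function(1, 2)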
In the end, I made my own solution to the problem. It builds on both answers given so far.
You can find my solution, which is a cell magic extension, on github: https://github.com/brazilbean/modulemagic
In brief, this extension gives you the ability to create %%module cells in the notebook. These cells are saved as a file and imported back into your session. It effectively accomplishes what @shadanan suggested, but lets you keep all your work in the same place (convenient, and in line with the Notebook philosophy of providing code and results in the same place).
Because the import process sandboxes the code, it solves all of the scope-shadowing errors that motivated my original question. It also involves little to no overhead to use - no renaming of variables, no keeping other editors open, etc.
Answer Credit to Lennart Regebro:
Apparently my instincts were right, and this is impossible - obvious as well, since there was only one answer in two hours.
Thanks for all the comments as well.
My Google-fu is failing me.
I am writing a library of custom exceptions as a module, for use in multiple projects under a single publisher. I may have no say in the other projects, or I may have a say. So it could be in use both by me and others. The "and others" is the problem here. Within my exceptions module there will be specific functions for outputting tracebacks etc. to log files using the logging module. This is fine for me, because I use the logging module.
But if someone else, not using logging, uses the exceptions library, I need to skip the logging part. A try...except resolves this problem - but what if they ARE using logging? In that case I need to be able to determine their logging scheme (console/file/stream, file names, etc.) so that I can create a sub-logger which will write to their file (or console, or what have you):
<snip>
their_logger = THE_FUNCTION_I_CANNOT_FIGURE_OUT_HOW_TO_WRITE()
temp_var = their_logger.name + ".ExceptionLogger"
myLogger = logging.getLogger(temp_var)
</snip>
Obviously I could create a separate class or function to initialize my module and have it receive a parameter of type logging.Logger, but I would prefer to idiot-proof this, if that is even possible.
I can't even check a global or the globals() dict for a value that I know of, because the other programmer might not use one.
Is there any way to do this? (Assuming my library has been imported, and possibly not by the top-level application...) I personally have never tried to make data from upstream namespaces available in a lower namespace without explicit passing, and I doubt it is even possible - but there are a lot of programmers out there; has anyone ever achieved this?
It's a bad idea to include optional configuration by default. Instead of adding the logging specifics by default and then making some sort of wild guess, hidden by a try/except, to exclude it, put that part of the code into a function and call it from your code explicitly.
You can not idiot-proof things. In fact, the more magic and hidden logic you have, the less idiot-proof it will be, and soon it will instead be intelligence-proof where it becomes really difficult to understand the magic.
So go with your idea of making a function and passing in the logger instead.
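A minimal sketch of that explicit approach (the names are illustrative, not from the question):

import logging

# Default: a logger of our own; it emits nothing unless the caller
# configures the logging module.
_logger = logging.getLogger('ExceptionLogger')

def use_logger(their_logger):
    # Callers who use logging hand their logger in explicitly.
    global _logger
    _logger = their_logger.getChild('ExceptionLogger')

def log_traceback(exc):
    # Library code writes through whichever logger is configured.
    _logger.error('unhandled error', exc_info=exc)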
I'm programming a game in Python, where all IO activities are done by an IO object (in the hope that it will be easy to swap that object out for another which implements a different user interface). Nearly all the other objects in the game need to access the IO system at some point (e.g. printing a message, updating the position of the player, showing a special effect caused by an in-game action), so my question is this:
Does it make sense for a reference to the IO object to be available globally?
The alternative is passing a reference to the IO object into the __init__() of every object that needs to use it. I understand that this is good from a testing point of view, but is this worth the resulting "function signature pollution"?
Thanks.
Yes, this is a legitimate use of a global variable. If you'd rather not, passing around a context object that is equivalent to this global is another option, as you mentioned.
Since I assume you're using multiple files (modules), why not do something like:
import io  # your own io module here, not the standard library's io module
io.print('hello, world')
io.clear()
This is a common way programs that have more complex I/O needs than simple printing do things like logging.
Yes, I think so.
Another possibility would be to create a module loggerModule that has functions like print() and write(), but this would be only marginally better.
Nope.
Variables are too specific to be passed around in the global namespace. Hide them behind accessor functions/classes instead, which can do magic things to them at run time (or call into other implementations entirely); see the sketch after these considerations.
Consider what happens if the IO can periodically change state or if it needs to block for a while (like many sockets do).
Consider what happens if the same block of code is included multiple times. Does the variable instance get duplicated as well?
Consider what happens if you want to have a version 2 of the same variable. What if you want to change its interface? Do you have to modify all the code that references it?
Does it really make sense to infect all the code that uses the variable with knowledge of all the ways it can go bad?
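A minimal sketch of that accessor idea (names hypothetical):

class ConsoleIO:
    # Hypothetical default backend: just writes to the terminal.
    def show(self, message):
        print(message)

_io_backend = None

def get_io():
    # Callers never touch the global directly; the backend is created
    # lazily and can be swapped at run time without touching call sites.
    global _io_backend
    if _io_backend is None:
        _io_backend = ConsoleIO()
    return _io_backend

def set_io(backend):
    # e.g. set_io(CursesIO()) to move the whole game to a different UI.
    global _io_backend
    _io_backend = backend

get_io().show('hello, world')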