PyCharm procedural __all__ generation and syntax highlighting - python

I'm using this decorator to manage __all__ in a DRY manner:

import sys

def export(obj):
    mod = sys.modules[obj.__module__]
    if hasattr(mod, '__all__'):
        mod.__all__.append(obj.__name__)
    else:
        mod.__all__ = [obj.__name__]
    return obj
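For context, a module using the decorator might look like this (module and function names are hypothetical):

# shapes.py -- assumes the export decorator is defined or imported here
@export
def area(r):
    return 3.14159 * r * r  # gets added to this module's __all__

def _helper():
    pass  # never decorated, so never exported

# __all__ is now ['area'], and `from shapes import *` only pulls in area.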
For names imported with import *, PyCharm issues an unresolved reference error, which is understandable, since it doesn't run the code before analysis. But it is an obvious inconvenience.
How would you solve it (or maybe already solved)?
My assumptions:
Adding some automatic linter plugin or altering PyCharm's existing inspection code would be fine.
Something that's actually editing a .py source is viable, but not fine.
This method is probably not the best one, so suggesting another convenient technique for dealing with exports is fine too.

You may be interested in an alternative approach to managing __all__:
https://pypi.org/project/auto-all/
This provides start_all() and end_all() functions to place in your module around the items you want to make accessible. This approach works with PyCharm's code inspection.
from auto_all import start_all, end_all

# Imports outside the start and end function calls are not included in __all__.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects.
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects.
end_all(globals())
I feel like this is a reasonable approach to managing __all__, and one that I have used on more complex packages. The source code for the package is small, so it could easily be included directly in your code if you need to avoid external dependencies.
The reason I use this is that I have some modules where lots of items need to be "exported", and I want to keep imported items out of the export list. I have multiple developers working on the code, and it's easy to add new items and forget to include them in __all__, so automating this helps.

Related

How to add, delete and edit some lines inside the functions without completely rewriting in Python?

I want to make minor changes (~10 lines, including editing and deleting the existing lines, and adding new lines) in the middle of a long function (~200 lines) offered by an installed pypi package.
For example:
def func(*args, **kwargs):
    ...  # other lines remain unchanged
    # some lines which I want to edit
    ...  # other lines remain unchanged
AFAIK, decorators can only add code at the beginning/end of a function, and inheritance is a basic solution, but it may require a lot of unnecessary copying since I only want to change a relatively small amount of code. I cannot directly edit the package either, since the package is read-only for me, and IMHO that would be an inelegant solution anyway.
So are there any simple and elegant solutions (i.e., implemented with a small amount of code and good readability) to achieve this goal?
In short, no. There is no "pretty" solution for this. Options are:
Copy-paste the whole function into a file of your own. Monkey-patch it onto the source module when your "correction" is imported.
Import the function and patch the bytecode somehow (however, this is likely to work on one Python release only).
If it is an open-source module, fork it and make a custom build (this is arguably the cleanest solution).
(Edit): ... or send a pull-request to the original project ;-)
In any case, document it well.
Indeed, changing the contents of installed packages is a no-go. You cannot change a portion of a function's code dynamically, as I understand from your question. Since this is a function, you could redefine it in your own project and assign it to the package's function after importing it, e.g.
# in your mylib.py
def your_slightly_modified_func(*args, **kwargs):
    # your awesome implementation
    pass

# in your usage location
import mylib
import lib_from_pypi

lib_from_pypi.func = mylib.your_slightly_modified_func
Now when you call the function from the installed package, it is going to use your own version of the function.
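If you only want to tweak behaviour around the original instead of replacing it wholesale, a common variant is to keep a reference to the original and delegate to it. A minimal sketch, reusing the hypothetical lib_from_pypi.func from above:

import lib_from_pypi

_original_func = lib_from_pypi.func  # keep a handle on the real implementation

def patched_func(*args, **kwargs):
    # pre-processing of the arguments could go here
    result = _original_func(*args, **kwargs)
    # post-processing of the result could go here
    return result

lib_from_pypi.func = patched_func

One caveat: code that already did from lib_from_pypi import func before the patch still holds the original function object, so apply the patch as early as possible.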

Choose Python classes to instantiate at runtime based on either user input or on command line parameters

I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes at runtime. In Python there is no compilation time so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great that the code determines whether the user is running sequentially or in parallel to select the classes automatically. So if the user runs mpirun -np 4 python __main__.py the code should behave entirely different than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime, there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitely imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule - serial or parallel - and using it directly) OR - still explicitly - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
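A minimal sketch of such an "automagic" bridge module, assuming the selection is driven by a hypothetical MYPACKAGE_MODE environment variable (the actual detection mechanism is up to you):

# myPackage/automagic.py
import os

if os.environ.get("MYPACKAGE_MODE") == "parallel":
    from myPackage.parallel import *  # re-export the parallel implementation
else:
    from myPackage.serial import *    # default to the serial implementation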
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (which can be built directly from Python source code, or from compiled C code, or even dynamically created - remember, at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once for a given process (and then cached), so importing the same module twice in the same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkey-patch them, of course, but that's not the point).
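A quick check, assuming the two files above are saved as A.py and B.py on the import path:

import A
import B

print(A.foo())  # 84   - A.foo resolved bar inside module A
print(B.foo())  # 16.5 - B.foo resolved bar inside module B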
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np 4 python main.py
or
python main.py serial
vs
mpirun -np 4 python main.py parallel
(whichever works best for your needs / is most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
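For illustration, the dispatch in the main script could be as simple as this sketch (SOMEFLAG as above; run() is a hypothetical shared entry point):

# main.py
import os
import sys

mode = os.environ.get("SOMEFLAG") or (sys.argv[1] if len(sys.argv) > 1 else "serial")

if mode == "parallel":
    from myPackage import parallel as impl
else:
    from myPackage import serial as impl

impl.run()  # assuming both submodules expose the same entry point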
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's very useful depends on your classes and the effort it takes to make sure they have the same API, so that they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to hold all the shared code. But that's just the same as in C++.
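A sketch of that common-base-class idea, with hypothetical names:

from abc import ABC, abstractmethod

class BaseObject(ABC):
    # code shared by both execution modes lives here
    @abstractmethod
    def compute(self):
        ...

class SerialObject(BaseObject):
    def compute(self):
        return "computed sequentially"

class ParallelObject(BaseObject):
    def compute(self):
        return "computed across workers"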

Circular imports hell

Python is an extremely elegant language. Well, except... except imports. I still can't get them to work the way that seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many question here regarding this issue, but none seems to give satisfactory answer. From what I can read in all the answers:
1) Reorganize your modules; you are doing it wrong! But I do not know how to organize my modules better, even in a case as simple as the one I presented.
2) Try just import ... rather than from ... import ... (personally I hate to write, and potentially refactor, all the full name qualifiers; I love to see exactly what I am importing into the module from the outside world). Would that help? I am not sure; there are still circular imports.
3) Do hacks like importing something in the inner scope of a function body, just one line before you use something from the other module.
I am still hoping there is solution number 4) which would be Pythonic in the sense of being functional and elegant and simple and working. Or is there not?
Note: I am primarily a C++ programmer; the example above is so easily solved by including the corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body, it's an absolutely valid pattern:
def some_function():
    import logging  # imported lazily, only when the function runs
    logging.info("do some logging")
Usually, ImportErrors are only raised because of the way __import__() evaluates the top-level statements of the entire file when called.
In case you do not have a logical circular dependency...
...nothing is impossible in Python...
There is a way around it if you positively want your imports at the top:
From David Beazley's excellent talk Modules and Packages: Live and Let Die! - PyCon 2015, 1:54:00, here is a way to deal with circular imports in Python:
try:
    from images.serializers import SimplifiedImageSerializer
except ImportError:
    import sys
    SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and, if an ImportError is raised (due to a circular import or the name not existing), pulls it from the import cache (sys.modules) instead.
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils in your main module; it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta, the code from that module is executed, and you don't need to import anything into your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by allowing the innermost import to complete without finishing. Thus, the utils import completes without executing the rest of the file, and your import statement fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
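For reference, a sketch of that plain-import style in utils.py - the attribute lookup is deferred until call time, after both modules have finished executing:

# mypackage/utils.py
import mypackage.myobjecta  # binds the module, resolves no names yet

def util_func1():
    return mypackage.myobjecta.MyObjectA()  # looked up at call time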
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
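Concretely, under the question's layout, that ordering would look like this sketch:

# mypackage/utils.py
def util_func1():
    return MyObjectA()  # the name is bound by the import below before any call

def util_func2():
    ...

from mypackage.myobjecta import MyObjectA  # after the definitions

# mypackage/myobjecta.py
class MyObjectA:
    ...

from mypackage.utils import util_func1, util_func2  # after the class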
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
Pythonistas frown upon importing from a function. Pythonistas usually frown upon global variables. Yet I saw both, and I don't think the projects that used them were any worse than others done by some strict Pythonistas. The feature does exist; I'm not going into a long argument over its utility.
There's an alternative to the problem of importing from a function: when you import at the top of a file (or the bottom, really), the import takes some time (a small amount, but some), and then Python caches the module, so if another file needs the same import, Python can retrieve it quickly without re-importing. Whereas if you import inside a function, Python has to process the import line each time the function is called, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently. Okay, this uses imports inside function bodies AND global variables. Wow!
_MODULEA = None

def util1():
    global _MODULEA  # without this, the import below would bind a local name
    if _MODULEA is None:
        from mymodule import modulea as _MODULEA
    obj = _MODULEA.ClassYouWant
    return obj
I saw this strategy adopted with a project using a flat API. Whether you like it or not (and I'm not sure about that myself), it works and is fast, because the import line is executed only once (when the function first executes). Still, I would recommend restructuring: problems with circular imports show a problem in structure, usually, and this is always worth fixing. I do agree, though, it would be nice if Python provided more useful errors when this kind of situation happens.

python workaround for circular import

Ok so it is like this.
I'd rather not give away my code, but if you really need it I will. I have two modules that need a bit from each other. The modules are called webhandler and datahandler.
In webhandler I have a line:
import datahandler
and in datahandler I have another line:
import webhandler
Now I know this is terrible code, and a circular import like this causes the code to run twice (which is what I'm trying to avoid).
However, the datahandler module needs to access several functions from the webhandler module, and the webhandler module needs access to several variables that are generated in the datahandler module. I don't see any workaround other than moving functions to different modules, but that would ruin the organisation of my program and make no logical sense with the module naming.
Any help?
Circular dependencies are a form of code smell. If you have two modules that depend on each other, then that’s a very bad sign, and you should restructure your code.
There are a few different ways to do this; which one is best depends on what you are doing, and what parts of each module are actually used by another.
A very simple solution would be to just merge both modules, so you only have a single module that only depends on itself, or rather on its own contents. This is simple, but since you had separated modules before, it’s likely that you are introducing new problems that way because you no longer have a separation of concerns.
Another solution would be to make sure that the dependencies are actually required. If there are only a few parts of a module that depend on the other, maybe you could move those bits around in a way that the circular dependency is no longer required, or utilize the way imports work to make the circular dependencies no longer a problem.
The better solution would probably be to move the dependencies into a separate new module. If naming is really the hardest problem about that, then you’re probably doing it right. It might “ruin the organisation of [your] program” but since you have circular dependencies, there is something inherently wrong with your setup anyway.
What others have said about not doing circular imports is the best solution, but if you end up absolutely needing them (possibly for backwards compatibility or clarity of code), it's usually within just one method or function of one of the modules. Thus you can safely do this:
# modA.py
import modB

# modB.py
def functionDependingOnA():
    import modA
    ...
There's a slight overhead to doing the import each time the function is called, but it is rather low unless it's called all the time (about 400 ns in my testing).
You could also do something like this to avoid even that lookup:
# modA -- same as above.

# modB.py
_imports = {}

def _importA():
    import modA
    _imports['modA'] = modA
    return modA

def functionDependingOnA():
    modA = _imports.get('modA') or _importA()
This version only added 40ns of time on the second and subsequent calls, or about the same amount of time as an empty local function call.
SQLAlchemy uses the dependency injection pattern, where the required module is passed to a function by a decorator:
@util.dependencies("sqlalchemy.orm.util")
def identity_key(cls, orm_util, *args, **kwargs):
    return orm_util.identity_key(*args, **kwargs)
This approach is basically the same as doing an import inside a function, but has slightly better performance.
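A toy version of such a decorator - not SQLAlchemy's actual implementation, just a sketch of the idea - might look like this:

import functools
import importlib

def dependencies(*module_names):
    # inject lazily imported modules as leading arguments
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            mods = [importlib.import_module(name) for name in module_names]
            return fn(*mods, *args, **kwargs)
        return wrapper
    return decorator

@dependencies("sqlalchemy.orm.util")
def identity_key(orm_util, *args, **kwargs):
    return orm_util.identity_key(*args, **kwargs)

Since importlib.import_module consults sys.modules first, the injection is little more than a dict lookup after the first call.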
the webhandler module needs access to several variables that are generated in the datahandler module
It might make sense to push any "generated" data to a third location. So datahandler functions call config.setvar(name, value) when appropriate, and webhandler functions call config.getvar(name) when they need to. config would be a third sub-module, containing simple setvar and getvar functions that you write (wrappers around setting/getting elements of a global dictionary would be the simplest approach).
Then the datahandler code would import webhandler, config but webhandler would only need to import config.
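A minimal sketch of such a config.py:

# config.py
_vars = {}

def setvar(name, value):
    _vars[name] = value

def getvar(name, default=None):
    return _vars.get(name, default)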
I agree with poke however, that the need for such a question betrays the fact that you probably haven't yet got the design finalized as neatly and logically as you thought. If it were me, I would re-think the way modules are divided up.

How to properly handle a circular module dependency in Python?

Trying to find a good and proper pattern to handle a circular module dependency in Python. Usually, the solution is to remove it (through refactoring); however, in this particular case we would really like to have the functionality that requires the circular import.
EDIT: According to answers below, the usual angle of attack for this kind of issue would be a refactor. However, for the sake of this question, assume that is not an option (for whatever reason).
The problem:
The logging module requires the configuration module for some of its configuration data. However, for some of the configuration functions I would really like to use the custom logging functions that are defined in the logging module. Obviously, importing the logging module in configuration raises an error.
The possible solutions we can think of:
Don't do it. As I said before, this is not a good option, unless all other possibilities are ugly and bad.
Monkey-patch the module. This doesn't sound too bad: load the logging module dynamically into configuration after the initial import, and before any of its functions are actually used. This implies defining global, per-module variables, though.
Dependency injection. I've read and run into dependency injection alternatives (particularly in the Java Enterprise space) and they remove some of this headache; however, they may be too complicated to use and manage, which is something we'd like to avoid. I'm not aware of how the panorama is about this in Python, though.
What is a good way to enable this functionality?
Thanks very much!
As already said, there's probably some refactoring needed. Going by the names, it might be OK for a logging module to use configuration; but when thinking about what belongs in configuration, one thinks of configuration parameters, and then a question arises: why is that configuration code doing any logging at all?
Chances are that the parts of the code under configuration that use logging do not belong in the configuration module: it seems like they are doing some kind of processing and logging either results or errors.
Without inner knowledge, and using only common sense, a "configuration" module should be something simple without much processing and it should be a leaf in the import tree.
Hope it helps!
Will this work for you?
# MODULE a (file a.py)
import b

HELLO = "Hello"

# MODULE b (file b.py)
try:
    import a
    # All the code for b goes here, for example:
    print("b done", a.HELLO)
except:
    if hasattr(a, 'HELLO'):
        raise
    else:
        pass
Now I can do an import b. When the circular import (caused by the import b statement in a) throws an exception, it gets caught and discarded. Of course, your entire module b will have to be indented one extra block, and you have to have inside knowledge of where the variable HELLO is declared in a.
If you don't want to modify b.py by inserting the try:except: logic, you can move the whole b source to a new file, call it c.py, and make a simple file b.py like this:
# new module b.py
try:
    from c import *
    print("b done", a.HELLO)
except:
    if hasattr(a, "HELLO"):
        raise
    else:
        pass

# The c.py file is now a copy of the original b.py:
import a
# All the code from the original b, for example:
print("b done", a.HELLO)
This will import the entire namespace from c to b, and paper over the circular import as well.
I realize this is gross, so don't tell anyone about it.
A cyclic module dependency is usually a code smell.
It indicates that part of the code should be re-factored so that it is external to both modules.
So if I'm reading your use case right, logging accesses configuration to get configuration data. However, configuration has some functions that, when called, require that stuff from logging be imported in configuration.
If that is the case (that is, configuration doesn't really need logging until you start calling functions), the answer is simple: in configuration, place all the imports from logging at the bottom of the file, after all the class, function and constant definitions.
Python reads things from top to bottom: when it comes across an import statement in configuration, it runs it, but at this point, configuration already exists as a module that can be imported, even if it's not fully initialized yet: it only has the attributes that were declared before the import statement was run.
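A sketch of that layout, using a plain import at the bottom and a hypothetical mylogging name (to avoid clashing with the stdlib logging):

# configuration.py
DEFAULTS = {"log_level": "DEBUG"}  # constants and functions are defined first

def get(key):
    return DEFAULTS[key]

def reload():
    # the attribute lookup happens at call time, when mylogging is fully loaded
    mylogging.log_event("configuration reloaded")

# placed after all the definitions above, as described
import mylogging

Using a plain import (rather than from mylogging import ...) keeps this safe in both import orders, because nothing from mylogging is resolved until a function is actually called.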
I do agree with the others though, that circular imports are usually a code smell.
