I use a package which has a structure somewhat similar to the following when communicating with a piece of hardware:
channel
    __init__.py
    transport
        __init__.py
        flow.py
        multiplex.py
    network
        __init__.py
        header.py
        addressing.py
I now wish to be able to configure my package so that I can use it to communicate with two very similar pieces of hardware. For example, when communicating with hw1 I want the equivalent of the following in addressing.py:
from collections import namedtuple
PacketSize = namedtuple('PacketSize', ('header', 'body'))
packet_size = PacketSize(16,256)
while when testing hw2, I want the equivalent of:
from collections import namedtuple
PacketSize = namedtuple('PacketSize', ('header', 'body'))
packet_size = PacketSize(8,256)
Almost all of the modules in the package are the same for both hw1 and hw2, but I might also need slightly different flavours of certain functions and classes within the package.
I was thinking I could solve this by having this structure:
channel
    __init__.py
    transport
        __init__.py
        flow.py
        multiplex.py
    network
        __init__.py
        header.py
        addressing.py
        hw1
            __init__.py
            addressing.py
        hw2
            __init__.py
            addressing.py
So each subpackage will contain hw1 and hw2 subpackages where the hardware-specific code is placed. I have programmed channel/network/addressing.py as follows:
from collections import namedtuple

PacketSize = namedtuple('PacketSize', ('header', 'body'))

if hardware == "hw1":
    from .hw1.addressing import *
elif hardware == "hw2":
    from .hw2.addressing import *
And channel/network/hw1/addressing.py like this:
from ..addressing import PacketSize
packet_size = PacketSize(16,256)
Does this make sense? To be honest, I think channel/network/addressing.py is ugly: I do an import, then define a namedtuple, then continue with the conditional imports. Could I do this better?
Is the general approach above the best way to flavor a package?
Is there a standard way to configure the package so that it knows whether it is concerned with hw1 or hw2? At the moment I just have a global called hardware, as seen above where I do if hardware == "hw1".
You should try to abstract away the hardware/flavor-dependent features behind some kind of common interface. There are many different ways to do this, such as class inheritance, composition, passing around an object, or, as you may be looking to do, even a global Python module or object.
I personally tend to favor composition, because class inheritance often isn't a natural fit and can blow up into multiple inheritance or MixInMadness.
A global Python module (or a singleton object) is attractive, but I would steer away from it unless there really, really has to be only one of them in a single process. A good example of where this design works is when it is tied to the underlying platform: for instance, the Python os module has much the same interface on Windows and Linux but works very differently underneath. Compare this to your hw1 and hw2. Another good example is the Twisted reactor, of which there really can only be one running at a time. Even then, a large part of the Twisted code passes around a reactor object, e.g. for composition. This is partly to make unit testing possible.
For your example, if hw1 or hw2 refers to the hardware your program is running on, then a global Python module does make sense. If it instead refers to hardware your program is communicating with, e.g. over a serial port or the network, then a global module is the wrong approach: you might have two serial ports and want to speak to hw1 and hw2 in the same process.
For an example on how to use a global module, or actually a global object, I recommend looking at how Twisted does it. Then your modules would do something like
from mypackage import hardware # hw1/hw2 automatically detected
print(hardware.packet_size)  # different for different hardware
or
# main.py
from mypackage import hw1
hw1.init()
# other.py
from mypackage import hardware # initialized to hw1 in main.py
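To make that second variant concrete, here is a minimal sketch of what mypackage/hw1.py could look like; the init() mechanics are hypothetical, just one way to wire up such a package-level object, and the packet sizes are the ones from the question:

# mypackage/__init__.py would simply start with:  hardware = None
# mypackage/hw1.py -- hypothetical sketch
import sys
from collections import namedtuple

PacketSize = namedtuple('PacketSize', ('header', 'body'))
packet_size = PacketSize(16, 256)

def init():
    """Make this flavor the package-wide 'hardware' object."""
    import mypackage
    mypackage.hardware = sys.modules[__name__]  # i.e. this very module

other.py then sees the hw1 values via from mypackage import hardware, provided main.py has called hw1.init() first.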
Composition, or passing around an object, on the other hand would look something like:
hw = mypackage.hw1.hw1factory()
send_data(hw, 'foo') # uses hw.packet_size
hw = mypackage.hw2.hw2factory()
frob = Frobnicator(hw)
frob.frobnicate('foo') # uses hw.packet_size internally
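As a rough sketch of how composition applies to the packet sizes from the question (the HwProfile type and the Frobnicator body are made up for illustration):

from collections import namedtuple

PacketSize = namedtuple('PacketSize', ('header', 'body'))
HwProfile = namedtuple('HwProfile', ('name', 'packet_size'))

HW1 = HwProfile('hw1', PacketSize(16, 256))
HW2 = HwProfile('hw2', PacketSize(8, 256))

class Frobnicator:
    def __init__(self, hw):
        self.hw = hw  # composition: the hardware profile is just an attribute

    def frobnicate(self, payload):
        # e.g. split the payload according to this hardware's body size
        body = self.hw.packet_size.body
        return [payload[i:i + body] for i in range(0, len(payload), body)]

frob = Frobnicator(HW2)
frob.frobnicate(b'x' * 1000)  # chunks of 256 bytes for hw2

Both flavors can live in the same process this way, which is exactly what the global-module approach cannot give you.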
Your approach is wrong. You should not have multiple modules to distinguish cases. In the case you describe, the module color.py might contain a function to which you pass a list of items to be tested and colors to test the items with. How you organize this depends on the data source and destination and the nature of the items you are testing.
You should consider using py.test and fixtures (which are NOT like Django fixtures). These do exactly what you want and py.test can handle UnitTest and Nose style tests as well.
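For illustration, a parametrized fixture along these lines runs every test once per hardware flavor (the fixture name and the packet sizes are just an example):

import pytest
from collections import namedtuple

PacketSize = namedtuple('PacketSize', ('header', 'body'))

# Each test that requests this fixture runs once for hw1 and once for hw2.
@pytest.fixture(params=[PacketSize(16, 256), PacketSize(8, 256)],
                ids=['hw1', 'hw2'])
def packet_size(request):
    return request.param

def test_packet_fits_mtu(packet_size):
    assert packet_size.header + packet_size.body <= 1500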
Related
I'm maintaining several open source projects and I want to write code at work that nudges people to do the right thing.
I have a situation where I see people importing stuff in module_a from module_b, but that should not happen. There are two reasons for it:
Production code importing stuff from test code: I hope I don't need to explain why that is a bad idea.
Import Cycles: Some modules are so basic that they should not import any other modules from the package (e.g. constants.py / errors.py / utils.py).
For this question, you can assume that all imports happen on module level (hence not inside a function).
Is it possible to enforce via CI (e.g. mypy / pytest / flake8) that module_a does not import anything from module_b?
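For concreteness, a check along these lines in a pytest test is roughly the kind of thing I have in mind; the FORBIDDEN table and the package path are placeholders, not an existing tool:

import ast
import pathlib

# Illustrative rule: module_a must not import anything from module_b.
FORBIDDEN = {'module_a': {'module_b'}}

def imported_modules(path):
    tree = ast.parse(path.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            yield node.module

def test_no_forbidden_imports():
    for module, banned in FORBIDDEN.items():
        path = pathlib.Path('mypackage') / (module + '.py')
        bad = [m for m in imported_modules(path) if m.split('.')[0] in banned]
        assert not bad, f'{module} imports forbidden modules: {bad}'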
I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate the other classes. In Python there is no compile time, so I'm looking for the most Pythonic way to accomplish this. Ideally, the code would determine whether the user is running sequentially or in parallel and select the classes automatically. So if the user runs mpirun -np 4 python __main__.py, the code should behave entirely differently than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
As for the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that (1) both implementations can still be explicitly imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and (2) the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule - serial or parallel - and using it directly) OR - still explicitly - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
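A minimal sketch of what such a myPackage/automagic.py bridge module could look like; the environment variable and the detection logic are placeholders, since how you detect the context is up to you:

# myPackage/automagic.py -- hypothetical bridge module
import os

# Selection criterion is up to you: env var, config file, command line...
if os.environ.get('MYPACKAGE_MODE', 'serial') == 'parallel':
    from myPackage.parallel import *  # noqa: F401,F403
else:
    from myPackage.serial import *    # noqa: F401,F403

Client code then writes from myPackage import automagic and uses automagic.SomeClass, while tests keep importing myPackage.serial and myPackage.parallel directly.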
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (which can be built directly from Python source code, from compiled C code, or even created dynamically - remember, at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once per process (and then cached), so importing the same module twice in the same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkeypatch them, of course, but that's not the point).
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as they are wildly different languages with wildly different object models, execution models and idioms. A couple of interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with MPI, so I can't help with that - but if the general consensus is that there's no reliable, portable way to detect this, then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user would for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np 4 python main.py
or
python main.py serial
vs
mpirun -np 4 python main.py parallel
(whichever works best for your needs / is most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
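A tiny sketch of what reading such a flag could look like in the main script; SOMEFLAG is the placeholder name used above, and run() is an assumed entry point exposed by both implementations:

# main.py -- hypothetical entry point
import os
import sys

mode = os.environ.get('SOMEFLAG') or (sys.argv[1] if len(sys.argv) > 1 else 'serial')

if mode == 'parallel':
    from myPackage import parallel as impl
else:
    from myPackage import serial as impl

impl.run()  # assumed entry function with the same signature in both modules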
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's very useful depends on your classes and on the effort it takes to make sure they have the same API, so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to hold all the shared code (sketched below). But that's just the same as in C++.
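As a sketch of that "common (virtual) base class" idea, assuming hypothetical SerialObject/ParallelObject classes that share most of their code:

from abc import ABC, abstractmethod

class BaseObject(ABC):
    """Shared driver code for both flavors lives here."""

    def run(self, data):
        return self._compute(data)  # common logic

    @abstractmethod
    def _compute(self, data):
        ...  # flavor-specific part

class SerialObject(BaseObject):
    def _compute(self, data):
        return [x * 2 for x in data]

class ParallelObject(BaseObject):
    def _compute(self, data):
        # a real implementation would distribute the work, e.g. via mpi4py
        return [x * 2 for x in data]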
I'm writing a small package for internal use and have come to a design problem. I define a few classes and constants (e.g., a server IP address) in some file, let's call it mathfunc.py. Now, some of these classes and constants will be used in other files in the same package. My current setup is like this:
/mypackage
    __init__.py
    mathfunc.py
    datefunc.py
So, at the moment I think I have to import mathfunc in datefunc.py to use the classes defined there (or alternatively import both of them all the time). This sounds wrong to me, because then I'll be in a lot of pain importing lots of files everywhere. Is it a proper design at all, or is there some other way? Maybe I can put all definitions in some file which will not be a subpackage on its own, but will be used by all other files?
Nope, that's pretty much how Python works. If you want to use objects declared in another file, you have to import from it.
Tips:
You can keep your namespace clean by only importing the things you need, rather than using from foo import *.
If you really need to do a "circular import" (where A needs things in B, and B needs things in A) you can solve that by only importing inside the functions where you need the object, not at the top of a file.
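A minimal sketch of that trick, using the mathfunc/datefunc modules from the question (the scale helper is invented for illustration):

# datefunc.py
def scaled_days_between(a, b):
    # Imported inside the function rather than at module level, to break
    # a circular dependency between datefunc and mathfunc.
    from mathfunc import scale
    return scale((b - a).days)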
My module is all in one big file that is getting hard to maintain. What is the standard way of breaking things up?
I have one module in a file my_module.py, which I import like this:
import my_module
"my_module" will soon be a thousand lines, which is pushing the limits of my ability to keep everything straight. I was thinking of adding files my_module_base.py, my_module_blah.py, etc. And then, replacing my_module.py with
from my_module_base import *
from my_module_blah import *
# etc.
Then, the user code does not need to change:
import my_module # still works...
Is this the standard pattern?
It depends on what your module is actually doing. It is usually a good idea to make your module a directory with an '__init__.py' file inside. So you would first transform your your_module.py into something like your_module/__init__.py.
After that you can continue according to your business logic. Here are some examples:
If you have utility functions which are not directly used by the module's API, put them in a file called utils.py.
If you have classes dealing with the database or representing your database models, put them in models.py.
If you have some internal configuration, it might make sense to put it into an extra file called settings.py or config.py.
These are just examples (a little bit stolen from the Django approach of reusable apps ^^). As said, it depends a lot on what your module does. If it is still too big afterwards, it also makes sense to create submodules (as subdirectories with their own __init__.py).
I'm sure there are lots of opinions on this, but I'd say you break it into more well-defined functional units (modules), contained in a package. Then you use:
from mypackage import modulex
Then use the package name to reference the object:
modulex.MyClass()
etc.
You should (almost) never use
from mypackage import *
Since that can introduce bugs (duplicate names from different modules will end up clobbering one another).
No, that is not the standard pattern. from something import * is usually not good practice, as it will import a lot of things you did not intend to. Instead, follow the same layout you proposed, but import names specifically from one module into another.
For example, if base.py has def myfunc, then in main.py use from base import myfunc, so that for your users main.myfunc works too. Of course, you need to take care that you don't end up with a circular import.
Also, if you do find that from something import * is required, then control the exported names using the __all__ construct.
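A short sketch of how __all__ could be used in a re-exporting my_module/__init__.py; the submodule and function names here are made up for illustration:

# my_module/__init__.py
from .base import base_func   # formerly part of my_module.py
from .blah import blah_func

# Only these names are re-exported by `from my_module import *`
__all__ = ['base_func', 'blah_func']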
Is there a super global (like in PHP) in Python? I have certain variables I want to use throughout my whole project in separate files, classes, and functions, and I don't want to have to keep declaring them in each file.
In theory yes, you can start spewing crud into builtins (called __builtin__ in Python 2):
>>> import builtins
>>> builtins.rubbish = 3
>>> rubbish
3
But, don't do this; it's horrible evilness that will give your applications programming-cancer.
classes and functions and I don't want to have to keep declaring
Put them in modules and ‘import’ them when you need to use them.
I have certain variables I want to use throughout my whole project
If you must have unqualified values, just put them in a file called something like “mypackage/constants.py” then:
from mypackage.constants import *
If they really are ‘variables’ in that you change them during app execution, you need to start encapsulating them in objects.
Create an empty superglobal.py module.
In your files do:
import superglobal

superglobal.whatever = localWhatever
other = superglobal.other
Even if there were, you should not use such a construct EVER. Consider using the Borg pattern to hold this kind of stuff.
class Config:
    """
    Borg singleton config object
    """
    __we_are_one = {}
    __myvalue = ""

    def __init__(self):
        # implement the Borg pattern (we are one)
        self.__dict__ = self.__we_are_one

    def myvalue(self, value=None):
        if value:
            self.__myvalue = value
        return self.__myvalue

conf = Config()
conf.myvalue("Hello")
conf2 = Config()
print(conf2.myvalue())
Here we use the Borg pattern to create a singleton-like object. No matter where you use this in the code, 'myvalue' will be the same, no matter what module or class you instantiate Config in.
In years of practice, I've grown quite disappointed with Python's import system: it is complicated and difficult to handle correctly. Also, I have to maintain scores of imports in each and every module I write, which is a pain.
Namespaces are a very good idea, and they're indispensable; PHP doesn't have proper namespaces, and it's a mess.
Conceptually, part of writing an application consists in defining a suitable vocabulary, the words that you'll use to do the things you want to. Yet in the classical way, it's exactly these words that won't come easily, as you have to first import this and import that to get access to them.
When namespaces came into focus in the JavaScript community, John Resig of jQuery fame decided that providing a single $ variable in the global namespace was the way to go: it would only affect the global namespace minimally, and provide easy access to everything in jQuery.
Likewise, I experimented with a global variable g, and it worked to an extent. Basically, you have two options here: either have a startup module that must be run prior to any other module in your app, and which defines what things should be available in g, so it is ready to go when needed; or make g lazy and have it react with custom imports when a new name is required, so that whenever you call g.foo.frob(42) for the first time, the mechanism tries something like import foo; g.foo = foo behind the scenes. The second approach was considerably more difficult to do right.
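As a rough sketch (not my actual implementation), the lazy variant can be built on importlib and a __getattr__ hook:

import importlib

class LazyNamespace:
    """Imports modules on first attribute access and caches them."""

    def __getattr__(self, name):
        module = importlib.import_module(name)  # e.g. g.math -> import math
        setattr(self, name, module)             # cache for subsequent lookups
        return module

g = LazyNamespace()
print(g.math.sin(0.0))  # first access triggers `import math` behind the scenes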
These days, I've ditched the import system almost completely except for the standard library and site packages. Most of the time I write wrappers for those libraries, as 90% of them have inanely convoluted interfaces anyhow. Those wrappers I then publish in the global namespace, using spelling conventions to keep the risk of collisions to a minimum.
I only tell this to alleviate the impression that modifying the global namespace is something inherently evil, which the other answers seem to state. Not so. What is evil is to do it thoughtlessly, or to be compelled by language or package design to do so.
Let me add one remark, as I will almost certainly draw some fire here: 99% of all imports done by people who religiously defend namespace purity are wrong. Proof? You'll read in the opening lines of any module foo.py that needs to do trigonometry something like from math import sin. Now when you correctly import foo and have a look at that namespace, what are you going to find? Something named foo.sin. But that sin isn't part of the interface of foo, it is just a helper; it shouldn't clutter that namespace. Hence, from math import sin as _sin or some such would have been correct. However, almost nobody does it that way.
I'm sure these views will arouse some heated comments, so go ahead.
The reason it wasn't obvious to you is that Python intentionally doesn't try to support such a thing. Namespaces are a feature, and using them is to your advantage. If you want something you defined in another file, import it. That way, from reading your source code you can figure out where everything came from, and your code is easier to test and modify.