How can a Python program know if it is being tested? For example:
def foo():
    if foo_being_tested:
        pseudorandom()
    else:
        random()
When under test, the program should use a pseudorandom sequence so the output can be compared with the C version of the program; in regular execution, random from numpy should be used.
You can't, not without inspecting the call stack.
Generally speaking, you should not do this at all; by altering your code when tested you are not correctly testing your code.
Instead, you'd use mocking to replace any parts your code uses (anything used by the code under test but not part of it). For your specific example, you'd mock out random(); on Python 3.3 and up you can use unittest.mock, available as mock on PyPI for other Python versions, or you can just manually swap out module_under_test.random for the duration of the test.
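For illustration, a minimal sketch of that approach, assuming the function under test lives in a module called module_under_test and calls a module-level random() (both names are placeholders):
import unittest
from unittest import mock

import module_under_test  # hypothetical module containing foo()

class FooTest(unittest.TestCase):
    def test_foo_uses_known_sequence(self):
        # Replace random() in the namespace where foo() looks it up,
        # for the duration of this test only.
        with mock.patch.object(module_under_test, 'random', side_effect=[0.1, 0.2]):
            result = module_under_test.foo()
        # assert against the now-deterministic result here
The production code stays untouched; only the test swaps out the source of randomness.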
You could also set an environment variable in your unittests to make it explicit you are running a test, but ideally that should be avoided.
Related
I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two code paths that it makes sense to have a unified codebase and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes. In Python there is no compile time, so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great if the code determined whether the user is running sequentially or in parallel and selected the classes automatically. So if the user runs mpirun -np 4 python __main__.py, the code should behave entirely differently than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume it to be a rather "pythonic" solution.
Regarding the "automagic" selection depending on execution context: if you decide to go for it, by all means make sure that (1) both implementations can still be explicitly imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and (2) the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and possibly as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule - serial or parallel - and using it directly) OR - still explicitly - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can run either serially or in parallel without the user having to care about it, but with still the ability to force one or the other where it makes sense.
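A minimal sketch of such a bridge module, assuming an environment variable is the selection mechanism (myPackage, MYPACKAGE_MODE, serial and parallel are all placeholder names; the actual trigger is up to you):
# myPackage/automagic.py - hypothetical bridge module
import os

# Pick an implementation once, at import time; both submodules expose
# the same API, so client code is unaffected by the choice.
if os.environ.get('MYPACKAGE_MODE') == 'parallel':
    from myPackage.parallel import *
else:
    from myPackage.serial import *
Client code would then import from myPackage.automagic, while tests import myPackage.serial or myPackage.parallel directly.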
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (which can be built directly from Python source code, from compiled C code, or even created dynamically - remember, at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once for a given process (and then cached), so importing the same module twice in the same process will yield the same object (IOW, a module is a singleton).
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkeypatch them, of course, but that's not the point).
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np 4 python main.py
or
python main.py serial
vs
mpirun -np 4 python main.py parallel
(whichever works best for your needs / is most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
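A minimal sketch of the dispatch on the program side, assuming the SOMEFLAG variable / positional argument from the commands above (all names are placeholders):
# main.py - hypothetical entry point
import os
import sys

# Prefer an explicit command-line argument, fall back to the environment
# variable, and default to serial execution.
mode = sys.argv[1] if len(sys.argv) > 1 else os.environ.get('SOMEFLAG', 'serial')

if mode == 'parallel':
    from myPackage import parallel as impl
else:
    from myPackage import serial as impl

impl.run()  # stand-in for whatever your program's entry point is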
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's very useful depends on your classes and the effort it takes to make sure they have the same API, so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to hold all the shared code. But that's just the same as in C++.
I can’t find any examples, I will try to make the question specific:
Given that MicroPython has some form of unit test library, how can I monkey patch (or similar) to replace system objects' output or input within test cases?
The desire is to write test cases without altering the actual implementation code just for testing, i.e. network or file system objects replaced with mocks using patch - or a similar manual way of overriding the system objects for test purposes.
You can try the technique I laid out in https://forum.micropython.org/viewtopic.php?t=4475#p25925
import sys

# Load in the module for patching (make sure this runs before other imports).
import modulename

# Create a modified version of modulename, cherry-picking from the real thing.
patchedmodule = ...

# Replace the module with your modified one for all future imports.
sys.modules['modulename'] = patchedmodule
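For illustration, a sketch of one way to build that patched module; modulename and recv are placeholders, and this uses type(sys) to create a fresh module object since MicroPython may not ship a full types module:
import sys
import modulename  # the real module, imported first

# type(sys) is the module type itself, so calling it creates a new module.
patchedmodule = type(sys)('modulename')

# Cherry-pick everything from the real module...
patchedmodule.__dict__.update(modulename.__dict__)

# ...then override just what the tests need to control.
patchedmodule.recv = lambda n: b'\x00' * n  # hypothetical stubbed function

sys.modules['modulename'] = patchedmodule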
So I have some code which uses gphoto2 to capture some images and stuff. I figured the best way to test this would be to wrap the gphoto2 code in something like an if TESTING: that returns fake data, and otherwise does the gphoto2 stuff.
Does anyone know how I would achieve this? I've tried googling some things, but I've not had any luck with specifically detecting if unit tests are being run or not.
I'd assume it would be something like if unittest:, but maybe there is a better way to do this altogether?
EDIT:
So based on the comments and answers so far, I tried out the unittest.mock package. It didn't work as I'd hoped; let me explain.
I have method A, which calls the capture-image method (method B), then saves the image and a few other bits. I've managed to mock method B so that it returns either the image or None, which works fine when I call method B directly, but when I call method A, it doesn't use the mock of method B; it uses the actual method B.
How do I make method A use the mock method B?
The mock package exists for this very reason.
It's a standalone, pip-installable package for Python 2; it has been incorporated into the standard library for Python versions >= 3.3 (as unittest.mock).
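Regarding the edit about method A still calling the real method B: patch the name in the namespace where method A looks it up, not where method B is defined. A minimal sketch with placeholder names (a camera module whose save_photo() calls capture_image()):
import unittest
from unittest import mock

import camera  # hypothetical module: save_photo() internally calls capture_image()

class SavePhotoTest(unittest.TestCase):
    def test_save_photo_with_fake_capture(self):
        # Patch capture_image as seen from inside the camera module, so
        # save_photo() (method A) picks up the mock instead of the real thing.
        with mock.patch.object(camera, 'capture_image', return_value=b'fake-image'):
            result = camera.save_photo()
        # assertions on result go here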
Just use a mocking library from within your test code. This way you'd mask the external APIs (hardware calls in your case) and return predictable values.
I would recommend flexmock (https://pypi.python.org/pypi/flexmock); it's super easy.
In the beginning of your test code, you'll write something like:
flexmock(SomeObject).should_receive('some_method').and_return('some', 'values')
I use IPython Notebooks extensively in my research. I find them to be a wonderful tool.
However, on more than one occasion, I have been bitten by subtle bugs stemming from variable scope. For example, I will be doing some exploratory analysis:
foo = 1
bar = 2
foo + bar
And I decide that foo + bar is a useful algorithm for my purposes, so I encapsulate it in a function to make it easier to apply to a wider range of inputs:
def the_function(foo, bar):
    return foo + bar
Inevitably, somewhere down the line, after building a workflow from the ground up, I will have a typo somewhere (e.g. def the_function(fooo, bar):) that causes a global variable to be used (and/or modified) in a function call. This causes unseen side effects and leads to spurious results. But because it typically returns a result, it can be difficult to find where the problem actually occurs.
Now, I recognize that this behavior is a feature, which I deliberately use often (for convenience, or for necessity i.e. function closures or decorators). But as I keep running into bugs, I'm thinking I need a better strategy for avoiding such problems (current strategy = "be careful").
For example, one strategy might be to always prepend '_' to local variable names. But I'm curious if there are not other strategies - even "pythonic" strategies, or community encouraged strategies.
I know that python 2.x differs in some regards to python 3.x in scoping - I use python 3.x.
Also, strategies should consider the interactive nature of scientific computing, as would be used in an IPython Notebook venue.
Thoughts?
EDIT: To be more specific, I am looking for IPython Notebook strategies.
I was tempted to flag this question as too broad, but perhaps the following will help you.
When you decide to wrap some useful code in a function, write some tests. If you think the code is useful, you must have used it with some examples. Write the test first lest you 'forget'.
My personal policy for a library module is to run the test in an if __name__ == '__main__': block, whether the test code is in the same file or a different file. I also execute the file to run the tests multiple times during a programming session, after every small unit of change (trivial in IDLE or a similar IDE).
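For example, a minimal sketch of that policy using the question's the_function:
def the_function(foo, bar):
    return foo + bar

if __name__ == '__main__':
    # Quick self-test: runs whenever the file is executed directly,
    # but not when the module is imported.
    assert the_function(1, 2) == 3
    assert the_function(-1, 1) == 0
    print('all tests passed')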
Use a code checker program, which will catch some typo-based errors, e.g. "'fooo' set but never used".
Keep track of the particular kinds of errors you make, analyze them and think about personal countermeasures, or at least learn to recognize the symptoms.
Looking at your example, when you do write a function, don't use the same names for both global objects and parameters. In your example, delete or change the global 'foo' and 'bar' or use something else for parameter names.
I would suggest that you separate your concerns. For your exploratory analysis, write your code in the IPython notebook, but once you've decided that some functions are useful, open up an editor and put them into a Python file which you can then import.
You can use IPython magics to auto-reload things you've imported. So once you've tested them in IPython, you can simply copy them to your module. This way, the scope of your functions is isolated from your notebook. An additional advantage is that when you're ready to run things in a headless environment, you already have your entire codebase in one place.
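For reference, the relevant magics (my_analysis is a placeholder module name):
# In the first notebook cell: reload imported modules automatically
# whenever their source files change on disk.
%load_ext autoreload
%autoreload 2

from my_analysis import the_function  # hypothetical module you edit externally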
In the end, I made my own solution to the problem. It builds on both answers given so far.
You can find my solution, which is a cell magic extension, on github: https://github.com/brazilbean/modulemagic
In brief, this extension gives you the ability to create %%module cells in the notebook. These cells are saved as a file and imported back into your session. It effectively accomplishes what @shadanan had suggested, but allows you to keep all your work in the same place (convenient, and in line with the Notebook philosophy of providing code and results in the same place).
Because the import process sandboxes the code, it solves all of the scope shadowing errors that motivated my original question. It also involves little to no overhead to use - no renaming of variables, having other editors open, etc.
After reading the Software Carpentry essay on Handling Configuration Files I'm interested in their Method #5: put parameters in a dynamically-loaded code module. Basically I want the power to do calculations within my input files to create my variables.
Based on this SO answer on how to import a string as a module, I've written the following function to import a string, open file object, or StringIO as a module. I can then access variables using the . operator:
import imp

def make_module_from_text(reader):
    """Make a module from a file, StringIO, text, etc.

    Parameters
    ----------
    reader : file-like object or str
        Object to get text from.

    Returns
    -------
    m : module
        Text as module.
    """
    # For making a module out of strings/files see
    # https://stackoverflow.com/a/7548190/2530083
    mymodule = imp.new_module('mymodule')  # may need to randomise the name; not sure
    exec(reader, mymodule.__dict__)  # exec() call form works on both Python 2 and 3
    return mymodule
then
import textwrap
reader = textwrap.dedent("""\
import numpy as np
a = np.array([0,4,6,7], dtype=float)
a_normalise = a/a[-1]
""")
mymod = make_module_from_text(reader)
print(mymod.a_normalise)
gives
[ 0. 0.57142857 0.85714286 1. ]
All well and good so far, but having looked around it seems using Python's eval and exec introduces security holes if I don't trust the input. A common response is "Never use eval or exec; they are evil", but I really like the power and flexibility of executing the code. Using {'__builtins__': None} I don't think will work for me, as I will want to import other modules (e.g. import numpy as np in my above code). A number of people (e.g. here) suggest using the ast module, but I am not at all clear on how to use it (can ast be used with exec?). Are there simple ways to whitelist/allow specific functionality (e.g. here)? Are there simple ways to blacklist/disallow specific functionality? Is there a magic way to say "execute this, but don't do anything nasty"?
Basically what are the options for making sure exec doesn't run any nasty malicious code?
EDIT:
My example above of normalising an array within my input/configuration file is perhaps a bit simplistic as to what computations I would want to perform within my input/configuration file (I could easily write a method/function in my program to do that). But say my program calculates a property at various times. The user needs to specify the times in some way. Should I only accept a list of explicit time values, so the user has to do some calculations before preparing the input file? (Note: even using a list as a configuration variable is not trivial; see here.) I think that is very limiting. Should I allow start-end-step values and then use numpy.linspace within my program? I think that is limiting too; what if I want to use numpy.logspace instead? What if I have some function that can accept a list of important time values and then nicely fills in other times to get well-spaced time values? Wouldn't it be good for the user to be able to import that function and use it? What if I want to input a list of user-defined objects?
The thing is, I don't want to code for all these specific cases when the functionality of Python is already there for me and my user to use. Once I accept that I do indeed want the power and functionality of executing code in my input/configuration file, I wonder if there is actually any difference, security-wise, in using exec vs importlib vs imp.load_source and so on. To me there is the limited standard configparser or the all-powerful, all-dangerous exec. I just wish there was some middle ground with which I could say "execute this... without stuffing up my computer".
"Never use eval or exec; they are evil". This is the only answer that works here, I think. There is no fully safe way to use exec/eval on an untrusted string string or file.
The best you can do is to come up with your own language, and either interpret it yourself or turn it into safe Python code before handing it to exec. Be careful to proceed from the ground up --- if you allow the whole Python language minus specific things you thought of as dangerous, it will never be really safe.
For example, you can use the ast module if you want Python-like syntax; and then write a small custom ast interpreter that only recognizes a small subset of all possible nodes. That's the safest solution.
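For illustration, a minimal sketch of that idea, assuming Python 3.8+; the whitelist below is deliberately tiny and only a starting point, not a vetted security boundary:
import ast

# Only these node types are allowed; anything else (Call, Import,
# Attribute, ...) is rejected before the code ever runs.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
)

def check_config(source):
    """Parse source and reject any node outside the whitelist."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError('disallowed syntax: %s' % type(node).__name__)
    return tree

namespace = {}
exec(compile(check_config('a = 2 + 3\nb = a * 4'), '<config>', 'exec'), namespace)
print(namespace['b'])  # 20
# check_config("__import__('os').system('...')")  # raises ValueError: Call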
If you are willing to use PyPy, then its sandboxing feature is specifically designed for running untrusted code, so it may be useful in your case. Note that the documentation mentions some CPython interoperability issues that you may need to check.
Additionally, there is a link on this page to an abandoned project called pysandbox, explaining the problems with sandboxing directly within python.