I have a class like the following:
class A:
    def __init__(self, arg1, arg2, arg3):
        self.a = arg1
        self.b = arg2
        self.c = arg3
        # ...
        self.x = do_something(arg1, arg2, arg3)
        self.y = do_something(arg1, arg2, arg3)
        self.m = self.func1(self.x)
        self.n = self.func2(self.y)
        # ...

    def func1(self, arg):
        # do something here

    def func2(self, arg):
        # do something here
As you can see, initializing the class requires feeding in arg1, arg2, and arg3. However, testing func1 and func2 does not directly require such inputs; they are simply input/output logic.
In my test, I can of course instantiate and initialize a test object in the regular way, and then test func1 and func2 individually. But the initialization requires arg1, arg2, and arg3 as inputs, which are really not relevant to testing func1 and func2.
Therefore, I want to test func1 and func2 individually, without first calling __init__. So I have the following 2 questions:
What's the best way of designing such tests? (preferably in py.test)
I want to test func1 and func2 without invoking __init__. I read from here that A.__new__() can skip invoking __init__ while still instantiating the class. Is there a better way to achieve what I need without doing this?
NOTE:
There have been 2 questions regarding my ask here:
Is it necessary to test individual member functions?
(for testing purposes) Is it necessary to instantiate a class without initializing the object with __init__?
For question 1, I did a quick Google search and found some relevant studies and discussion on this:
Unit Testing Non Public Member Functions
(PDF) Incremental Testing of Object-Oriented Class Structures.
We initially test base classes having no parents by designing a test
suite that tests each member function individually and also tests the
interactions among member functions.
For question 2, I'm not sure. But I think it is necessary: as shown in the sample code, func1 and func2 are called in __init__. I feel more comfortable testing them on a class A object whose __init__ has not been called (and therefore with no previous calls to func1 and func2).
Of course, one could just instantiate a class A object by regular means (testobj = A()) and then perform individual tests on func1 and func2. But is that good? :) I'm just discussing what the best way to test such a scenario is, and what the pros and cons are.
On the other hand, one might also argue that, from a design perspective, one should NOT put calls to func1 and func2 in __init__ in the first place. Is this a reasonable design option?
It is not usually useful or even possible to test methods of a class without instantiating the class (including running __init__). Typically your class methods will refer to attributes of the class (e.g., self.a). If you don't run __init__, those attributes won't exist, so your methods won't work. (If your methods don't rely on the attributes of their instance, then why are they methods and not just standalone functions?) In your example, it looks like func1 and func2 are part of the initialization process, so they should be tested as part of that.
In theory it is possible to "quasi-instantiate" the class by using __new__ and then adding just the members that you need, e.g.:
obj = A.__new__(A)   # __new__ takes the class itself, not the init arguments
obj.a = "test value"
obj.func1(obj.a)
However, this is probably not a very good way to do tests. For one thing, it results in you duplicating code that presumably already exists in the initialization code, which means your tests are more likely to get out of sync with the real code. For another, you may have to duplicate many initialization calls this way, since you'll have to manually re-do what would otherwise be done by any base-class __init__ methods called from your class.
As for how to design tests, you can take a look at the unittest module and/or the nose module. That gives you the basics of how to set up tests. What to actually put in the tests obviously depends on what your code is supposed to do.
Edit: The answer to your question 1 is "definitely yes, but not necessarily every single one". The answer to your question 2 is "probably not". Even at the first link you give, there is debate about whether methods that are not part of the class's public API should be tested at all. If your func1 and func2 are purely internal methods that are just part of the initialization, then there is probably no need to test them separately from the initialization.
This gets to your last question about whether it's appropriate to call func1 and func2 from within __init__. As I've stated repeatedly in my comments, it depends on what these functions do. If func1 and func2 perform part of the initialization (i.e., do some "setting-up" work for the instance), then it's perfectly reasonable to call them from __init__; but in that case they should be tested as part of the initialization process, and there is no need to test them independently. If func1 and func2 are not part of the initialization, then yes, you should test them independently; but in that case, why are they in __init__?
Methods that form an integral part of instantiating your class should be tested as part of testing the instantiation of your class. Methods that do not form an integral part of instantiating your class should not be called from within __init__.
If func1 and func2 are "simply an input/output logic" and do not require access to the instance, then they don't need to be methods of the class at all; they can just be standalone functions. If you want to keep them in the class you can mark them as staticmethods and then call them on the class directly without instantiating it. Here's an example:
>>> class Foo(object):
...     def __init__(self, num):
...         self.numSquared = self.square(num)
...
...     @staticmethod
...     def square(num):
...         return num**2
...
>>> Foo.square(2)  # you can test the square "method" this way without instantiating Foo
4
>>> Foo(8).numSquared
64
It is just about imaginable that you might have some monster class which requires a hugely complex initialization process. In such a case, you might find it necessary to test parts of that process individually. However, such a giant init sequence would itself be a warning sign of an unwieldy design.
If you have a choice, I'd go for declaring your initialization helper functions as staticmethods and just calling them from the tests.
If you have different input/output values to assert on, you could look into some parametrizing examples with py.test.
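For example, a minimal parametrized test might look like this; it assumes func1 has been made a staticmethod as suggested above, and the input/expected pairs are made up for illustration:

import pytest

# hypothetical input/output pairs, purely for illustration
@pytest.mark.parametrize("arg, expected", [
    (1, 10),
    (2, 20),
])
def test_func1(arg, expected):
    assert A.func1(arg) == expected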
If your class instantiation is somewhat heavy you might want to look into dependency injection and cache the instance like this:
# content of test_module.py
def pytest_funcarg__a(request):
    return request.cached_setup(lambda: A(...), scope="class")

class TestA:
    def test_basic(self, a):
        assert ...  # check properties/non-init functions
This would re-use the same "a" instance across each test class. Other possible scopes are "session", "function" or "module". You can also define a command line option to set the scope, so that for quick development you use more caching and for Continuous Integration you use more isolated resource setup, without needing to change the test source code.
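Note that pytest_funcarg__ and cached_setup come from older pytest versions; a rough modern equivalent is a scoped fixture. This is only a sketch, with A's constructor arguments left as placeholders:

import pytest

@pytest.fixture(scope="class")
def a():
    return A(...)  # fill in whatever arguments A actually needs

class TestA:
    def test_basic(self, a):
        assert a.m is not None  # check properties/non-init functions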
Personally, over the last 12 years I went from fine-grained unit testing to more functional/integration types of testing, because it eases refactoring and seemed to make better use of my time overall. It's of course crucial to have good support and reports when failures occur, like dropping to PDB, concise tracebacks, etc. And for some intricate algorithms I still write very fine-grained unit tests, but then I usually separate the algorithm out into a very independently testable thing.
HTH, holger
I agree with previous comments that it is generally better to avoid this problem by reducing the amount of work done at instantiation, e.g. by moving the func1 etc. calls into a configure(self) method which should be called after instantiation.
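A minimal sketch of that refactoring, reusing the question's class A; the name configure is an invented example:

class A:
    def __init__(self, arg1, arg2, arg3):
        self.a = arg1
        self.b = arg2
        self.c = arg3

    def configure(self):
        # work moved out of __init__ so tests can skip it
        self.x = do_something(self.a, self.b, self.c)
        self.y = do_something(self.a, self.b, self.c)
        self.m = self.func1(self.x)
        self.n = self.func2(self.y)

Production code constructs the object and then calls configure() on it, while tests can construct A and exercise func1 and func2 directly.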
If you have strong reasons for keeping calls to self.func1 etc in __init__, there is an approach in pytest which might help.
(1) Put this in the module:
_called_from_test = False
(2) Put the following in conftest.py
import your_module

def pytest_configure(config):
    your_module._called_from_test = True
with the appropriate name for your_module.
(3) Insert an if statement to end the execution of __init__ early when you are running tests,
if _called_from_test:
    pass
else:
    self.func1(...)
You can then step through the individual function calls, testing them as you go.
The same could be achieved by making _called_from_test an optional argument of __init__.
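A sketch of that variant, with the module flag reused as a keyword parameter:

class A:
    def __init__(self, arg1, arg2, arg3, _called_from_test=False):
        self.a = arg1
        self.b = arg2
        self.c = arg3
        if not _called_from_test:
            # only run the derived setup outside of tests
            self.x = do_something(arg1, arg2, arg3)
            self.m = self.func1(self.x)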
More context is given in the Detect if running from within a pytest run section of pytest documentation.
I am writing a package for interacting with datasets and have code that looks something like this:
from abc import ABC, ABCMeta, abstractmethod
from functools import cache
from pathlib import Path
from warnings import warn


class DatasetMetaClass(ABCMeta):
    r"""Meta Class for Datasets"""

    @property
    @cache
    def metaclass_property(cls):
        r"""Compute an expensive property (for example: dataset statistics)."""
        warn("Caching metaclass property...")
        return "result"

    # def __dir__(cls):
    #     return list(super().__dir__()) + ['metaclass_property']


class DatasetBaseClass(metaclass=DatasetMetaClass):
    r"""Base Class for datasets that all datasets must subclass"""

    @classmethod
    @property
    @cache
    def baseclass_property(cls):
        r"""Compute an expensive property (for example: dataset statistics)."""
        warn("Caching baseclass property...")
        return "result"


class DatasetExampleClass(DatasetBaseClass, metaclass=DatasetMetaClass):
    r"""Some Dataset Example."""
Now, the problem is that during make html, sphinx actually executes the baseclass_property which is a really expensive operation. (Among other things: checks if dataset exists locally, if not, downloads it, preprocesses it, computes dataset statistics, mows the lawn and takes out the trash.)
I noticed that this does not happen if I make it a metaclass property, because the metaclass property does not appear in the class's __dir__ call, which may or may not be a bug. Manually adding it to __dir__ by uncommenting the two lines causes Sphinx to also process the metaclass property.
Questions:
Is this a bug in Sphinx? Given that @property is usually handled fine, it seems unintended that it breaks for @classmethod combined with @property.
What is the best option, at the moment, to avoid this problem? Can I somehow tell Sphinx not to parse this function? I hope there is a way to either disable Sphinx for a function via a comment, similarly to # noqa, # type: ignore, # pylint: disable=, etc., or via some kind of @nodoc decorator.
Everything is working as it should, and there is no "bug" there, neither in Sphinx, nor in the ABC machinery, and even less in the language.
Sphinx uses the language's introspection capabilities to retrieve a class's members and then introspects them for methods. What happens when you combine @classmethod and @property is that (besides the somewhat nice surprise that it actually works) when the class member thus created is accessed by Sphinx, as it must do in its search for docstrings, the code is triggered and runs.
It would actually be less surprising if property and classmethod could not be used in combination, since both decorators use the descriptor protocol to create a new object with the appropriate methods for the feature they implement.
I think the least surprising thing to do here is to put an explicit guard inside your "classmethod property cache" functions so that they do not run when the file is being processed by Sphinx. Since Sphinx does not have this feature itself, you can use an environment variable for that, say GENERATING_DOCS (this does not exist, it can be any name), and then a guard inside your methods like:
...
def baseclass_property(self):
    if os.environ.get("GENERATING_DOCS", False):
        return
And then you either set this variable manually before running the script, or set it inside Sphinx' conf.py file itself.
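For example, a one-line setting near the top of conf.py would do; this is a sketch, and GENERATING_DOCS remains an arbitrary name:

# conf.py
import os

os.environ["GENERATING_DOCS"] = "1"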
If you have several such methods, and don't want to write the guard code in all of them, you could do a decorator, and while at that, just use the same decorator to apply the other 3 decorators at once:
from functools import cache, wraps
import os

def cachedclassproperty(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if os.environ.get("GENERATING_DOCS", False):
            return
        return func(*args, **kwargs)
    return classmethod(property(cache(wrapper)))
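Usage would then be a single decorator on each such method, e.g. (a sketch based on the classes above):

class DatasetBaseClass(metaclass=DatasetMetaClass):
    r"""Base Class for datasets that all datasets must subclass"""

    @cachedclassproperty
    def baseclass_property(cls):
        warn("Caching baseclass property...")
        return "result"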
Now, as for using the property on the metaclass: I advise against it. Metaclasses are for when you really need to customize your class creation process, and it is almost by chance that a property on a metaclass works as a class property as well. All that happens in this case, as you have investigated, is that the property is hidden from the class's __dir__, and therefore won't be hit by Sphinx's introspection. But even if you are using a metaclass for some other purpose, simply adding a guard as I suggested might not even prevent Sphinx from properly documenting the class property, if it has a docstring. If you hide it from Sphinx, it will obviously go undocumented.
I must be tired, because surely there is an easy way to do this.
But I've read over the pytest docs and can't figure out this simple use case.
I have a little package I want to test:
class MyClass:
    def __init__(self):
        pass

    def my_method(self, arg):
        pass

def the_main_method():
    m = MyClass()
    m.my_method(123)
I would like to ensure that (1) an instance of MyClass is created, and that (2) my_method is called, with the proper arguments.
So here's my test:
from unittest.mock import patch

@patch('mypkg.MyClass', autospec=True)
def test_all(mocked_class):
    # Call the real production code, with the class mocked.
    import mypkg
    mypkg.the_main_method()

    # Ensure an instance of MyClass was created.
    mocked_class.assert_called_once_with()

    # But how do I ensure that "my_method" was called?
    # I want something like mocked_class.get_returned_values() ...
I understand that each time the production code calls MyClass() the unittest framework whips up a new mocked instance.
But how do I get my hands on those instances?
I want to write something like:
the_instance.assert_called_once_with(123)
But where do I get the_instance from?
Well, to my surprise, there is only one mock instance created, no matter how many times you call the constructor (:
What I can write is:
mocked_class.return_value.my_method.assert_called_once_with(123)
The return_value does not represent one return value, though — it accumulates information for all created instances.
It's a rather abstruse approach, in my mind. I assume it was copied from some crazy Java mocking library (:
If you want to capture individual returned objects, you can use .side_effect to return whatever you want, and record it in your own list, etc.
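A sketch of that approach; created_instances and record_instance are made-up names for illustration:

from unittest.mock import MagicMock

created_instances = []

def record_instance(*args, **kwargs):
    # hand back a fresh mock per constructor call and remember it
    instance = MagicMock()
    created_instances.append(instance)
    return instance

mocked_class.side_effect = record_instance
mypkg.the_main_method()
created_instances[0].my_method.assert_called_once_with(123)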
I'm looking at the source code for a trie implementation
On lines 80-85:
def keys(self, prefix=[]):
    return self.__keys__(prefix)

def __keys__(self, prefix=[], seen=[]):
    result = []
    # etc.
What is def __keys__? Is that a magic object that is self-created? If so, is this poor code? Or does __keys__ exist as a standard Python magic method? I can't find it anywhere in the Python documentation, though.
Why is it legal for the function to call self.__keys__ before def __keys__ is even defined? Wouldn't def __keys__ have to go before def keys (since keys calls __keys__)?
For your second question: it is legal. The functions of a class are defined when the class gets defined, so you can be sure both functions are defined before keys() is called. The same logic also applies to normal functions; we can do:
>>> def a():
...     b()
...
>>> def b():
...     print("In B()")
...
>>> a()
In B()
This is legal because both a() and b() are defined before a() is called. It would only be illegal if you tried to call a() before b() gets defined. Please note that defining a function does not automatically call it, and Python does not validate, at the time a function is defined, whether the functions used inside it exist; that check only happens at runtime, when the function is called, and in that case it throws a NameError.
For your first question, I do not know of any magic method called __keys__(), and cannot find it in the documentation either.
All of the real "magic methods" are in the data model documentation; __keys__ isn't one of them. The style guide says:
Never invent such names; only use them as documented.
so yes, making up a new one is bad form (the convention would have been to call it _keys).
The second part of your question doesn't make sense; even if this wasn't a class, there is no need to define methods and functions in the order they're called. As long as they exist by the time the call actually gets made, it's not a problem. I tend to define public methods before private ones, even though the former may call the latter, simply for the reader's convenience.
There is no magic method named __keys__(), so as you suspected this is just poor naming.
The code in the class definition can be in any order. All that matters is that the definition has been made by the time the actual call is made downstream.
There is no magic method named __keys__, so it's just a poor naming choice. Looking at the code, the author just wanted a private method which is used internally, and also called from the public method keys. As you can see, __keys__ accepts an additional argument.
About the second question: there is no need to define the functions in the same order in which they are called. They will be available by the time the call is actually made.
The compilation of a class in Python is done way before the class is instantiated.
Whenever the class type is created, the body of the class block is compiled and executed, and all the functions defined there (plain functions as well as classmethod/staticmethod objects) become attributes of the class. When instance.keys() is later called, attribute lookup finds the function on the class and binds it to the instance as a method.
Therefore, at the moment of calling instance.keys(), both keys and __keys__ are already available.
Also, there is no __keys__ method in the data model, as far as I know.
I have a one to many class inheritance structure as follows:
class SuperClass:
    def func1():
        print 'hello'

    def func2():
        print 'ow'

class SubClass1(SuperClass):
    def func1():
        print 'hi'

class SubClass2(SuperClass):
    def func1():
        print 'howdy'
...
I want to add functionality to SuperClass so that I can use it when I create the subclasses (SubClass1, SubClass2, etc.), but I cannot edit the code for SuperClass directly. My current solution is:
def func3():
    print 'yes!'

SuperClass.func3 = func3
Is there a better and/or more pythonic way to achieve this?
This is called "monkeypatching", and is perfectly reasonable in some cases.
For example if you have to use someone else's code (that you can't modify) that depends on SuperClass, and you need to change that code's behavior, your only real choice is to replace methods on SuperClass.
However, in your case, there doesn't seem to be any good reason to do this. You're defining all of the subclasses of SuperClass, so why not just add another class in between?
class Intermediate(SuperClass):
    def func3():
        pass

class SubClass1(Intermediate):
    def func1():
        print 'hi'
This isn't good enough for "functionality that should have been in SuperClass but wasn't" if other code you can't control needs that functionality… but when it's only your code that needs that functionality, it's just as good, and a lot simpler.
If even the subclasses aren't under your control, often you can just derive a new class from each one that is. For example:
class Func3Mixin(object):
    def func3():
        pass

class F3SubClass1(SubClass1, Func3Mixin):
    pass

class F3SubClass2(SubClass2, Func3Mixin):
    pass
Now you just construct instances of F3SubClass1 instead of SubClass1. Code that was expecting a SubClass1 instance can use an F3SubClass1 just fine. And Python's duck typing makes this kind of "mixin-oriented programming" especially simple: inside the implementation of Func3Mixin.func3, you can use attributes and methods of SuperClass, despite the fact that Func3Mixin itself isn't statically related to SuperClass in any way, because you know that any runtime object that is a Func3Mixin will also be a SuperClass.
Meanwhile, even when monkeypatching is appropriate, it isn't necessarily the best answer. For example, if you're patching to work around a bug in some third-party code, and that code has a nice license and a source repository that makes it easy to maintain your own patches, you can just fork it, create a fixed copy, and use that instead of the original.
Also, it's worth pointing out that none of your classes are actually usable as written—any attempt to call any of the methods will raise a TypeError because they're missing the self argument. But the way you've monkeypatched in func3, it will fail in exactly the same way as func1. (And the same is true for the alternatives I sketched above.)
Finally, all of your classes here are classic classes rather than new-style, because you forgot to make SuperClass inherit from object. If you can't change SuperClass, of course, that's not your fault—but you may want to fix it anyway by making your subclasses (or Intermediate) multiply inherit from object and SuperClass. (If you've been paying attention: yes, this means you can mix-in new-style-classness. Although under the covers you have to understand metaclasses to understand why.)
I am writing a framework to be used by people who know some Python. I have settled on some syntax, and it makes sense to me and them to use something like this, where Base is the Base class that implements the framework.
class A(Base):
    @decorator1
    @decorator2
    @decorator3
    def f(self):
        pass

    @decorator4
    def f(self):
        pass

    @decorator5
    def g(self):
        pass
All my framework is implemented via Base's metaclass. This setup is appropriate for my use case, because all these user-defined classes have a rich inheritance graph. I expect the user to implement some of the methods, or just leave it with pass. Much of the information that the user is giving here is in the decorators. This allows me to avoid other solutions where the user would have to monkey-patch, give less structured dictionaries, and things like that.
My problem here is that f is defined twice by the user (with good reason), and this should be handled by my framework. Unfortunately, by the time this gets to the metaclass's __new__ method, the dictionary of attributes contains only one key f. My idea was to use yet another decorator, such as @duplicate, for the user to signal this is happening, and for the two f's to be wrapped differently so they don't overwrite each other. Can something like this work?
You should use a namespace to distinguish the different fs.
Heed the advice of the "Zen of Python":
Namespaces are one honking great idea -- let's do more of those!
Yes, you could, but only with an ugly hack, and only by storing the function with a new name.
You'd have to use the sys._getframe() function to retrieve the local namespace of the calling frame. That local namespace is the class-in-construction, and adding items to that namespace means they'll end up in the class dictionary passed to your metaclass.
The following retrieves that namespace:
callframe = sys._getframe(1)
namespace = callframe.f_locals
namespace is a dict-like object, just like locals(). You can now store something in that namespace (like __function_definitions__ or similar) to add extra references to the functions.
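A rough sketch of what a @duplicate decorator could look like on top of this hack; duplicate and __function_definitions__ are invented names, and sys._getframe() makes this CPython-specific and fragile:

import sys

def duplicate(func):
    # the frame one level up is the class body currently being executed
    class_namespace = sys._getframe(1).f_locals
    stash = class_namespace.setdefault('__function_definitions__', {})
    stash.setdefault(func.__name__, []).append(func)
    return func

Your metaclass's __new__ could then read __function_definitions__ out of the class dictionary to recover every version of f, since the class dict itself only keeps the last definition.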
You might be thinking of Java-style method overloading and argument signatures, but this is Python and you cannot do this. The second f() will override the first f(), and you end up with only one f(). The namespace is a dictionary, and you cannot have duplicate keys.
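A quick interpreter session illustrating that overwriting:

>>> class C:
...     def f(self):
...         return "first"
...     def f(self):
...         return "second"
...
>>> C().f()
'second'
>>> [name for name in vars(C) if not name.startswith('__')]
['f']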