Turning a Python function into a class without breaking the API - python

So I created a Python library for computing error metrics between time series (here). When I first created the library, I was a beginner programmer with pretty much zero prior experience, so I wrote every error metric as a plain function. Today I was thinking it might be nice if each error metric were represented as a class, so a user could do something like the following.
# Name of the package
import HydroErr as he

he.r_squared.description  # Would return a brief metric description
I want to keep the old API syntax intact; otherwise it would break all legacy code. It would have to look something like this when simulated and observed data are passed in.
import HydroErr as he
import numpy as np
he.r_squared(np.array([1, 2, 3]), np.array([1.1, 1.21, 1.3]))
# Out: 0.9966777408637874
I'm not really sure how to do this, and more importantly whether I should do this. Any help would be appreciated.

To turn a function into a class you can use the __call__ method:
def function(param):
    pass

# Becomes

class MyClass:
    def __call__(self, param):
        pass

    def other_method(self):
        pass

function = MyClass()
Both can be used like this: function(42)
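Applied to the question's r_squared case, a minimal sketch of the callable-class approach (the squared-Pearson formula here is an assumption about what HydroErr computes, not its actual implementation):

```python
import numpy as np

class RSquared:
    """Callable metric object: old call syntax plus a description attribute."""

    description = "Coefficient of determination (squared Pearson correlation)."

    def __call__(self, simulated, observed):
        # Squared Pearson correlation between the two series
        r = np.corrcoef(simulated, observed)[0, 1]
        return r ** 2

# A module-level instance keeps the legacy he.r_squared(...) syntax working
r_squared = RSquared()

print(r_squared(np.array([1, 2, 3]), np.array([1.1, 1.21, 1.3])))
# → 0.9966777408637874, same as the old function-based API
print(r_squared.description)
```

Because the instance is assigned under the old name, existing code that calls r_squared(sim, obs) keeps working unchanged, while the attribute access the question asks for becomes available.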

You don't have to turn the functions into classes for this to work:
def r_squared(x, y):
    """ Do things... """
    return 56

r_squared.description = r_squared.__doc__
You can write a decorator if there are many functions like that:
def add_description(fn):
    fn.description = fn.__doc__
    return fn

@add_description
def r_squared(x, y):
    """ Do things... """
    return 56
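A quick self-contained check of the decorator approach; note that add_description must return fn, otherwise the decorated name would be rebound to None:

```python
def add_description(fn):
    """Copy a function's docstring to a .description attribute."""
    fn.description = fn.__doc__
    return fn  # returning fn keeps the decorated name bound to the function

@add_description
def r_squared(x, y):
    """Coefficient of determination."""
    return 56

@add_description
def mean_error(x, y):
    """Mean error."""
    return 0

print(r_squared(1, 2))         # still callable: 56
print(mean_error.description)  # Mean error.
```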

Related

How to avoid parameter type in function's name?

I have a function foo that takes a parameter stuff. stuff can be something in a database, and I'd like to create a function that takes a stuff_id, gets the stuff from the db, and executes foo.
Here's my attempt to solve it:
1/ Create a second function with the suffix from_stuff_id:
def foo(stuff):
    # do something
    ...

def foo_from_stuff_id(stuff_id):
    stuff = get_stuff(stuff_id)
    foo(stuff)
2/ Modify the first function:
def foo(stuff=None, stuff_id=None):
    if stuff_id:
        stuff = get_stuff(stuff_id)
    # do something
    ...
I don't like either way.
What's the most Pythonic way to do it?
Assuming foo is the main component of your application, go with your first way. Each function should have a single purpose. The moment you combine multiple purposes into one function, you can easily get lost in long streams of code.
If, however, some other function can also provide stuff, then go with the second.
The only thing I would add is: make sure you add docstrings (PEP 257) to each function to explain in words the role of the function. If necessary, you can also add comments to your code.
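For instance, the first approach with PEP 257 docstrings might look like this (get_stuff's body and the return values are stand-ins, since the question doesn't show them):

```python
def get_stuff(stuff_id):
    """Fetch the stuff record with the given id from the database (stubbed)."""
    return {"id": stuff_id, "name": "example"}

def foo(stuff):
    """Process a stuff record and return the result."""
    return stuff["name"].upper()

def foo_from_stuff_id(stuff_id):
    """Look up the stuff record for stuff_id, then run foo on it."""
    return foo(get_stuff(stuff_id))

print(foo_from_stuff_id(42))  # EXAMPLE
```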
I'm not a big fan of type overloading in Python, but this is one of the cases where I might go for it if there's really a need:
def foo(stuff):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
With type annotations it would look like this:
from typing import Union

def foo(stuff: Union[int, Stuff]):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
It basically depends on how you've defined all these functions. If you're importing get_stuff from another module, the second approach is more Pythonic: from an OOP perspective you create functions for one particular purpose, and since get_stuff is already defined you don't need to wrap it in another function.
If get_stuff is not defined in another module, then it depends on whether you are using classes or not. If you're using a class and you want to use all these functions together, you can add a method for accessing or connecting to the database and use that method within other methods like foo.
Example:
from some_module import get_stuff

class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        self.stuff_id = kwargs['stuff_id']

    def foo(self):
        stuff = get_stuff(self.stuff_id)
        # do stuff
Or, if the functionality of foo depends on the existence of stuff, you can store stuff on the instance and simply check its validity:
class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        _stuff_id = kwargs['stuff_id']
        self.stuff = get_stuff(_stuff_id)  # can return None

    def foo(self):
        if self.stuff:
            # do stuff
            ...
        else:
            # do other stuff
            ...
Or another neat design pattern for such situations might be using a dispatcher function (or method in class) that delegates the execution to different functions based on the state of stuff.
def delegator(stuff, stuff_id):
    if stuff:  # or other condition
        foo(stuff)
    else:
        get_stuff(stuff_id)

Python class: inner functions; access via nested dots

Below is a simple class I made. I would like to access the inner function like
obj = TestClass.TestClass(data)
obj.preprocess.gradient()
It is clear that such a call would not work, because preprocess is a function. How can I achieve what I want? (I hope it is clear to you.)
EDIT: This is a simplified case. I hope that other users who are not into machine learning will find it easier to apply the proper functions in the correct order (first the preprocessing, then e.g. clustering, afterwards plotting). If I just remove the outer functions (preprocessing etc.), it works fine. Still, I wonder whether such an approach is reasonable.
import numpy as np
from sklearn.preprocessing import StandardScaler

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data

    # should not be a function but rather a "chapter" which
    # separates preprocessing from analysis methods
    def preprocessing(self):
        def gradient(self):
            self._preprocessed = np.gradient(self._preprocessed, 2)[1]
        def normalize(self):
            self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    def cluster_analysis(self):
        def pca(self):
            pass
A first approach (probably better than the second one that follows) would be to return a class instance possessing those two methods, thus favoring composition.
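A minimal sketch of that composition idea (the Preprocessor name and the plain-Python min-max normalize body are placeholders for the question's numpy/sklearn steps):

```python
class Preprocessor:
    """Groups preprocessing steps and holds a reference back to its owner."""

    def __init__(self, owner):
        self._owner = owner

    def normalize(self):
        # Min-max scaling as a stand-in for StandardScaler
        data = self._owner._preprocessed
        lo, hi = min(data), max(data)
        self._owner._preprocessed = [(x - lo) / (hi - lo) for x in data]

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data

    @property
    def preprocessing(self):
        return Preprocessor(self)

obj = TestClass([1.0, 2.0, 3.0])
obj.preprocessing.normalize()
print(obj._preprocessed)  # [0.0, 0.5, 1.0]
```

The dotted access obj.preprocessing.normalize() now works, and each "chapter" (preprocessing, cluster_analysis, ...) can be its own small class.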
Otherwise, what about returning a type built from a dict of callables, as follows:
# ...
@property
def preprocessing(self):
    def gradient(self):
        self._preprocessed = np.gradient(self._preprocessed, 2)[1]

    def normalize(self):
        self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    callables_dict = {
        'gradient': gradient,
        'normalize': normalize,
    }
    return type('submethods', (object,), callables_dict)
# ...
Then you can call your submethod as (I guess) you want, doing
>>> TestClass(data).preprocessing.gradient
<unbound method submethods.gradient>
or
>>> TestClass(data).preprocessing.normalize
<unbound method submethods.normalize>
Depending on what you want to do, it may be a good idea to cache preprocessing so as not to redefine its inner functions each time it is called.
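One way to do that caching, assuming Python 3.8+, is functools.cached_property (the inner bodies are elided here):

```python
from functools import cached_property

class TestClass:
    def __init__(self, data):
        self._preprocessed = data

    @cached_property
    def preprocessing(self):
        def gradient(self):
            ...  # numpy-based step elided

        def normalize(self):
            ...  # sklearn-based step elided

        # The submethods type is built once, then cached on the instance
        return type('submethods', (object,), {
            'gradient': gradient,
            'normalize': normalize,
        })

obj = TestClass([1, 2, 3])
print(obj.preprocessing is obj.preprocessing)  # True: built only once
```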
But as already said, this is probably not the best way to go.

Python: mocks in unittests

I have a situation similar to:
class BaseClient(object):
    def __init__(self, api_key):
        self.api_key = api_key
        # Doing some stuff.

class ConcreteClient(BaseClient):
    def get_some_basic_data(self):
        # Doing something.
        ...

    def calculate(self):
        # some stuff here
        self.get_some_basic_data(param)
        # some calculations
Then I want to test the calculate function, mocking the get_some_basic_data function. I'm doing something like this:
import unittest
from unittest.mock import Mock, patch

from my_module import ConcreteClient

def my_fake_data(param):
    return [{"key1": "val1"}, {"key2": "val2"}]

class ConcreteClientTest(unittest.TestCase):
    def setUp(self):
        self.client = Mock(ConcreteClient)

    def test_calculate(self):
        patch.object(ConcreteClient, 'get_some_basic_data',
                     return_value=my_fake_data).start()
        result = self.client.calculate(42)
But it doesn't work as I expect. I thought self.get_some_basic_data(param) would return the list from my_fake_data, but it looks like it's still a Mock object, which is not what I expected.
What is wrong here?
There are two main problems here. The primary issue, and the one causing the behaviour you are seeing, is how you are actually mocking. Since you are patching the object for ConcreteClient, you want to make sure that you are still using the real ConcreteClient and only mocking the attributes of the instance that you want mocked while testing. You can see this illustrated in the documentation. Unfortunately there is no explicit anchor for the exact line, but if you follow this link:
https://docs.python.org/3/library/unittest.mock-examples.html
The section that states:
Where you use patch() to create a mock for you, you can get a
reference to the mock using the “as” form of the with statement:
The code in reference is:
class ProductionClass:
    def method(self):
        pass

with patch.object(ProductionClass, 'method') as mock_method:
    mock_method.return_value = None
    real = ProductionClass()
    real.method(1, 2, 3)

mock_method.assert_called_with(1, 2, 3)
The critical thing to notice here is how everything is being called. Notice that a real instance of the class is created. In your example, when you do this:
self.client = Mock(ConcreteClient)
you are creating a Mock object that is specced on ConcreteClient. Ultimately this is just a Mock object that holds the attributes of your ConcreteClient; you are not actually holding a real instance of ConcreteClient.
To solve this problem, simply create a real instance after you patch your object. Also, to make your life easier so you don't have to manually start/stop your patch.object, use the context manager; it will save you a lot of hassle.
Finally, your second problem is your return_value. It is actually returning the uncalled my_fake_data function. You want the data itself, i.e. the return value of that function, so you can just put the data itself as your return_value.
With these two corrections in mind, your test should now just look like this:
class ConcreteClientTest(unittest.TestCase):
    def test_calculate(self):
        with patch.object(ConcreteClient, 'get_some_basic_data',
                          return_value=[{"key1": "val1"}, {"key2": "val2"}]):
            concrete_client = ConcreteClient(Mock())
            result = concrete_client.calculate()

        self.assertEqual(
            result,
            [{"key1": "val1"}, {"key2": "val2"}]
        )
I took the liberty of actually returning the result of get_some_basic_data in calculate, just to have something to compare against. I'm not sure what your real code looks like, but ultimately the structure of how you should be writing this test is illustrated above.

Issue when I try to use serialized objects in Python: when I get them back they lose their type

I'm working on a computing program where, after each step, I serialize data to be able to restore from a backup in case of a crash.
In the last step I have to write reports through xlsxwriter.
So, when I unserialize the data (a list of beams) I can't use its methods directly.
Here is the class used to serialize and un-serialize data:
import pickle

class Serializer(object):
    @staticmethod
    def serialize(obj):
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def unserialize(serializedDatas):
        return pickle.loads(serializedDatas)

    @staticmethod
    def serializeToFile(obj, file):
        return pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def unserializeFile(file):
        return pickle.load(file)
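For what it's worth, a pickle round trip through this Serializer does preserve the type, as this self-contained check shows (SimplePoint is a stand-in for the question's classes; pickle only requires that the class be importable under the same name at load time):

```python
import pickle

class Serializer(object):
    @staticmethod
    def serialize(obj):
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    @staticmethod
    def unserialize(serializedDatas):
        return pickle.loads(serializedDatas)

class SimplePoint(object):
    """Stand-in for Point/Beam: methods should survive the round trip."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def getName(self):
        return "P(%s, %s)" % (self.x, self.y)

p2 = Serializer.unserialize(Serializer.serialize(SimplePoint(1, 2)))
print(type(p2).__name__)  # SimplePoint
print(p2.getName())       # P(1, 2)
```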
Here is my beam class:
from TypeControl import TypeControl
from Point import Point
from Section import Section
import copy
import numpy as np

class Beam(TypeControl):
    def __init__(self, number, Name, Section, pt1, pt2):
        super(self.__class__, self).__init__([], Name)
        self.get().append(TypeControl(pt1, pt1.getName()))
        self.get().append(TypeControl(pt2, pt2.getName()))
        self.get().append(TypeControl(copy.deepcopy(Section), Section.getName()))
        self.get().append(TypeControl(number, "Number"))

    def getPointA(self):
        return self.get()[0].get()

    def getPointB(self):
        return self.get()[1].get()

    def getSection(self):
        return self.get()[2].get()

    def getNumber(self):
        return self.get()[3].get()

    def getLength(self):
        vect = self.getPointA() - self.getPointB()
        return np.sqrt(vect.getx()*vect.getx() + vect.gety()*vect.gety() + vect.getz()*vect.getz())
TypeControl is a class used to ensure that the type of a piece of data does not change during the execution of the program.
Now let's talk about my problem:
- the problem occurs randomly
- when it works, b.getName() returns a name; when it doesn't, it executes like a pass instruction and doesn't raise any error
- I've used the debugger to get more information:
b is supposed to be my Beam object
(Pdb) b.getName()
*** The specified object '.getName()' is not a function or was not found along sys.path.
I expected a name; getName() is a method of the TypeControl class, so it should work.
(Pdb) Beam.getName(b)
'ExtColumnHP2_M2'
You might think that I'm trying to use the wrong object due to a confusion, and that the call above only works through inheritance or something else.
Look at the following:
(Pdb) b
it returns nothing in the shell; I expected something like:
class 'Beam.Beam'
The question is what is the type of b?
(Pdb) type(b)
class 'Beam.Beam'
The type is the good one. Then:
(Pdb) b.__dict__
*** The specified object '.__dict__' is not a function or was not found along sys.path.
In C++, to solve my problem I would have tried: b = (Beam)b
Does someone know why I lost the methods linked to this object?
Can someone give me a solution to use b as a normal object?
Why did everything work well before the serialization?
The obvious workaround would be something like b = Beam(b), but that does not seem very smart.

Redefine a python function based on another class function based on type

I'm more of an engineer and less of a coder, but I know enough python and C++ to be dangerous.
I'm creating a python vector/matrix class as a helper class based upon numpy as well as cvxopt. The overall goal (which I've already obtained... the answer to this question will just make the class better) is to make dot products and other processes more unified and easier for numerical methods.
However, I'd like to make my helper class even more transparent. What I'd like to do is to redefine the cvxopt.matrix() constructor based upon the variable it is called with. That is to say, if I have a custom matrix "cstmat", I'd like the call "cvxopt.matrix(cstmat)" to be handled by my own methods instead of what is written in the cvxopt class.
In short, I'd like to "intercept" the other function call and use my own function.
The kicker, though, is that I don't want to take over cvxopt.matrix(any_other_type). I just want to redefine the function when it's called upon my own custom class. Is this possible?
Thanks,
Jon
You can do this, but it's not pretty.
You can probably do something along these lines:
cvxopt._orig_matrix = cvxopt.matrix

def my_matrix(*args, **kwargs):
    if isinstance(args[0], cstmat):
        # do your stuff here
        ...
    else:
        return cvxopt._orig_matrix(*args, **kwargs)

cvxopt.matrix = my_matrix
But you're probably better off finding a less weird way. And there are no guarantees that this won't forget who "self" is.
Better would be to use inheritance! Kinda like this:
class Cstmat(cvxopt.matrix):
    def __init__(self, ...):
        pass

    def matrix(self, arg):
        if isinstance(arg, Cstmat):
            # do your stuff here
            ...
        else:
            return cvxopt.matrix(arg)
