Below is a simple class I made. I would like to access the inner function like
obj = TestClass().TestClass(data)
obj.preprocess.gradient()
It is clear that such a call would not work because preprocess is a function. How can I achieve what I want? (I hope it is clear to you.)
EDIT: This is a simplified case. I hope that other users who are not into machine learning will find it easier to apply the proper functions in the correct order (first the preprocessing, then e.g. clustering, afterwards plotting). If I just remove the outer functions (preprocessing etc.) it works fine. Still I wonder whether such an approach might be reasonable.
import numpy as np
from sklearn.preprocessing import StandardScaler

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data

    # should not be a function but rather a "chapter" which
    # separates preprocessing from the analysis methods
    def preprocessing(self):
        def gradient(self):
            self._preprocessed = np.gradient(self._preprocessed, 2)[1]

        def normalize(self):
            self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    def cluster_analysis(self):
        def pca(self):
            pass
A first approach (probably better than the second one that follows) would be to return a class instance possessing those two methods, thus favoring composition.
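For illustration, a minimal sketch of that composition-based approach could look like the following (the Preprocessing helper class and the way it is wired up are assumptions for this example, not code from the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

class Preprocessing:
    """Helper bound to the owning TestClass instance."""
    def __init__(self, owner):
        self._owner = owner

    def gradient(self):
        self._owner._preprocessed = np.gradient(self._owner._preprocessed, 2)[1]

    def normalize(self):
        self._owner._preprocessed = StandardScaler().fit_transform(self._owner._preprocessed)

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data
        self.preprocessing = Preprocessing(self)  # composition: expose the "chapter" as an attribute

# usage: TestClass(data).preprocessing.normalize()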
Otherwise, what about returning a dynamically built type from a dict of callables, as follows
#...

@property
def preprocessing(self):
    def gradient(self):
        self._preprocessed = np.gradient(self._preprocessed, 2)[1]

    def normalize(self):
        self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    callables_dict = {
        'gradient': gradient,
        'normalize': normalize,
    }

    return type('submethods', (object,), callables_dict)

#...
Then you can call your submethods as (I guess) you want, doing
>>> TestClass(data).preprocessing.gradient
<unbound method submethods.gradient>
or
>>> TestClass(data).preprocessing.normalize
<unbound method submethods.normalize>
Depending on what you want to do, it may be a good idea to cache preprocessing so as not to redefine its inner functions each time it is accessed.
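A minimal caching sketch, reusing the imports from the question and assuming Python 3.8+ for functools.cached_property (the staticmethod wrapping is an addition so the inner functions, which close over self, can be called through the returned class):

from functools import cached_property  # Python 3.8+

class TestClass:
    def __init__(self, data):
        self._preprocessed = data

    @cached_property
    def preprocessing(self):
        # the inner functions and the 'submethods' type are built only on
        # first access; the result is then cached on the instance
        def gradient():
            self._preprocessed = np.gradient(self._preprocessed, 2)[1]

        def normalize():
            self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

        return type('submethods', (object,), {
            'gradient': staticmethod(gradient),
            'normalize': staticmethod(normalize),
        })

With that in place, TestClass(data).preprocessing.gradient() works and repeated accesses reuse the same cached object.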
But as already said, this is probably not the best way to go.
Related
So I created a python library for computing error metrics between time series (here). When I was first creating the library, I was a beginner programmer with pretty much zero previous experience, so for every error metric, I just wrote it as a function. Today, I was thinking it might be nice if each error metric was represented as a class, so a user could do something like the following.
# Name of the package
import HydroErr as he
he.r_squared.description # Would return a brief metric description
I would want to keep the old API syntax intact, or it would break all legacy code. It would have to look something like this when simulated and observed data was passed in.
import HydroErr as he
import numpy as np
he.r_squared(np.array([1, 2, 3]), np.array([1.1, 1.21, 1.3]))
# Out: 0.9966777408637874
I'm not really sure how to do this, and more importantly if I should do this. Any help would be appreciated.
To turn a function into a class you can use the __call__ method:
def function(param):
    pass

# Becomes

class MyClass:
    def __call__(self, param):
        pass

    def other_method(self):
        pass

function = MyClass()
Both can be used like this: function(42)
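Applied to the question, a hedged sketch could look like the following (the RSquared class name and the computation inside __call__ are illustrative placeholders, not the actual HydroErr implementation):

import numpy as np

class RSquared:
    description = "Coefficient of determination between simulated and observed data."

    def __call__(self, simulated, observed):
        # placeholder computation for the sketch
        corr = np.corrcoef(simulated, observed)[0, 1]
        return corr ** 2

# a module-level instance keeps the legacy call syntax intact:
# he.r_squared(sim, obs) still works, and he.r_squared.description is now available
r_squared = RSquared()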
You don't have to turn the functions into classes for this to work:
def r_squared(x, y):
    """ Do things... """
    return 56

r_squared.description = r_squared.__doc__
You can write a decorator if there are many functions like that:
def add_description(fn):
    fn.description = fn.__doc__
    return fn  # return the function so the decorator doesn't replace it with None

@add_description
def r_squared(x, y):
    """ Do things... """
    return 56
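Either way, the legacy call syntax is untouched; for example (using the placeholder body above):

print(r_squared(1, 2))        # 56, unchanged behaviour
print(r_squared.description)  # ' Do things... '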
I have a function foo that takes a parameter stuff
Stuff can be something in a database, and I'd like to create a function that takes a stuff_id, gets the stuff from the db, and executes foo.
Here's my attempt to solve it:
1/ Create a second function with suffix from_stuff_id
def foo(stuff):
    ...  # do something

def foo_from_stuff_id(stuff_id):
    stuff = get_stuff(stuff_id)
    foo(stuff)
2/ Modify the first function
def foo(stuff=None, stuff_id=None):
    if stuff_id:
        stuff = get_stuff(stuff_id)
    # do something
I don't like both ways.
What's the most pythonic way to do it?
Assuming foo is the main component of your application, go with your first way. Each function should have a different purpose. The moment you combine multiple purposes into a single function, you can easily get lost in long streams of code.
If, however, some other function can also provide stuff, then go with the second.
The only thing I would add is make sure you add docstrings (PEP-257) to each function to explain in words the role of the function. If necessary, you can also add comments to your code.
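For example, a one-line docstring on the wrapper (a sketch; the wording is just an assumption) already makes its role clear:

def foo_from_stuff_id(stuff_id):
    """Fetch the stuff with the given id from the database and run foo on it."""
    foo(get_stuff(stuff_id))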
I'm not a big fan of type overloading in Python, but this is one of the cases where I might go for it if there's really a need:
def foo(stuff):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
With type annotations it would look like this:
from typing import Union

def foo(stuff: Union[int, Stuff]):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
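For completeness, the standard library offers functools.singledispatch for exactly this dispatch-on-type idea; the following is a sketch (assuming Stuff is a class from the question, get_stuff is available, and Python 3.7+ for annotation-based registration), not something taken from the answers above:

from functools import singledispatch

@singledispatch
def foo(stuff):
    ...  # do something with a Stuff instance

@foo.register
def _(stuff_id: int):
    return foo(get_stuff(stuff_id))  # look the stuff up, then reuse the generic path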
It basically depends on how you've defined all these functions. If you're importing get_stuff from another module, the second approach is more Pythonic, because from an OOP perspective each function should serve one particular purpose, and since get_stuff is already defined you don't need to wrap it in another function.
If get_stuff isn't defined in another module, then it depends on whether you are using classes or not. If you're using a class and you want to use all these functions together, you can use a method for accessing or connecting to the database and use that method within other methods like foo.
Example:
from some_module import get_stuff  # wherever get_stuff is defined

class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        self.stuff_id = kwargs['stuff_id']

    def foo(self):
        stuff = get_stuff(self.stuff_id)
        # do stuff
Or, if the functionality of foo depends on the existence of stuff, you can fetch stuff once in __init__ and simply check whether it is valid:
class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        _stuff_id = kwargs['stuff_id']
        self.stuff = get_stuff(_stuff_id)  # can return None

    def foo(self):
        if self.stuff:
            ...  # do stuff
        else:
            ...  # do other stuff
Another neat design pattern for such situations is a dispatcher function (or a method in a class) that delegates the execution to different functions based on the state of stuff:
def delegator(stuff, stuff_id):
    if stuff:  # or some other condition
        foo(stuff)
    else:
        get_stuff(stuff_id)
I have a class that looks something like the following:
# Class violates the Single Responsibility Principle
class Baz:
    data = [42]

    def do_foo_to_data(self):
        ...  # call a dozen functions that do complicated stuff to data

    def do_bar_to_data(self):
        ...  # call other functions that do different stuff to data
I want to break it into two separate classes because it violates the SRP. The functions called by do_foo_to_data() are completely distinct from those called by do_bar_to_data(). Yet they must operate on the same data.
I've come up with a bunch of solutions, but they're all ugly. Is there a way to do this cleanly, preferably in Python 3 (though 2.7 is OK too)?
The best of my "solutions" is below:
# I find this hard to read and understand
class Baz:
    data = [42]

    def create_foo(self):
        return Baz.Foo()

    def create_bar(self):
        return Baz.Bar()

    class Foo:
        def do_foo_to_data(self):
            ...  # call foo functions

    class Bar:
        def do_bar_to_data(self):
            ...  # call bar functions
Note: It's not essential to me that the data member be a class member.
I only expect to create one instance of Baz; but I didn't want to ask two questions in one post and start a discussion about singletons.
That is not an elegant solution. You had better pass a reference to the object you want them to operate on. So something like:
class Foo:
    def __init__(self, data):
        self.data = data

    def do_foo_to_data(self):
        # ...
        self.data[0] = 14

class Bar:
    def __init__(self, data):
        self.data = data

    def do_bar_to_data(self):
        # ...
        self.data.append(15)
(I added sample manipulations like self.data[0] = 14 and self.data.append(15))
And now you construct the data. For instance:
data = [42]
Next you construct a Foo and a Bar and pass a reference to data like:
foo = Foo(data)
bar = Bar(data)
__init__ is what most programming languages call the constructor and as you have seen in the first fragment, it requires an additional parameter data (in this case it is a reference to our constructed data).
and then you can for instance call:
foo.do_foo_to_data()
which will set data to [14] and
bar.do_bar_to_data()
which will result in data being equal to [14,15].
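Assembling the fragments above into one runnable snippet (no new behaviour, just the pieces put together):

data = [42]

foo = Foo(data)
bar = Bar(data)

foo.do_foo_to_data()
print(data)  # [14]

bar.do_bar_to_data()
print(data)  # [14, 15]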
Mind that you cannot state self.data = ['a','new','list'] or something equivalent in do_foo_to_data or do_bar_to_data, because that would only rebind self.data to a new object; the other instance would still reference the old list. Instead you could, for instance, .clear() the list and append new elements to it:
def do_foo_to_data(self):  # alternative version
    # ...
    self.data.clear()
    self.data.append('a')
    self.data.append('new')
    self.data.append('list')
Finally to answer your remark:
preferably in Python 3 (though 2.7 is OK too)?
The technique demonstrated is almost universal (meaning it is available in nearly every programming language), so this will work in both python-3.x and python-2.7 (note that list.clear() was only added in Python 3.3; under 2.7 use del self.data[:] instead).
Why do you even need a class for that? All you want is two separate functions which do some job on some data.
data = [42]

def foo(data):
    data.append('sample operation foo')

def bar(data):
    data.append('sample operation bar')
Problem solved.
You can pull out the distinct groups of functionality to separate mix-in classes:
class Foo:
    """Mixin class.

    Requires self.data (must be provided by classes extending this class).
    """
    def do_foo_to_data(self):
        ...  # call a dozen functions that do complicated stuff to data

class Bar:
    """Mixin class.

    Requires self.data (must be provided by classes extending this class).
    """
    def do_bar_to_data(self):
        ...  # call other functions that do different stuff to data

class Baz(Foo, Bar):
    data = [42]
This relies on Python's duck-typing behavior. You should only apply the Foo and Bar mix-ins to classes that actually provide self.data, like the Baz class here does.
This might be suitable where certain classes are by convention required to provide certain attributes anyway, such as customized view classes in Django. However, when such conventions aren't already in place, you might not want to introduce new ones. It's too easy to miss the documentation and then get AttributeErrors at runtime. So let's make the dependency explicit, rather than only documenting it. How? With a mix-in for the mix-ins!
class Data:
    """Mixin class"""
    data = [42]

class Foo(Data):
    """Mixin class"""
    def do_foo_to_data(self):
        ...  # call a dozen functions that do complicated stuff to data

class Bar(Data):
    """Mixin class"""
    def do_bar_to_data(self):
        ...  # call other functions that do different stuff to data

class Baz(Foo, Bar):
    pass
Whether this is appropriate for your use-case is difficult to say at this level of abstraction. As RayLuo's answer shows, you might not need classes at all. Instead, you could put the different groups of functions into different modules or packages, to organize them.
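A minimal sketch of that module-level organization (the module names foo_ops and bar_ops are placeholders, not from the answer):

# foo_ops.py
def do_foo_to_data(data):
    data[0] = 14  # the dozen foo-related functions would live in this module

# bar_ops.py
def do_bar_to_data(data):
    data.append(15)  # the other group of functions would live here

# main.py
import bar_ops
import foo_ops

data = [42]
foo_ops.do_foo_to_data(data)
bar_ops.do_bar_to_data(data)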
That is a kind of best practices question.
I have a class structure with some methods defined. In some cases I want to override a particular part of a method. My first thought is to split the method into more atomic pieces and override the related parts, like below.
class myTest(object):
    def __init__(self):
        pass

    def myfunc(self):
        self._do_atomic_job()
        ...
        ...

    def _do_atomic_job(self):
        print("Hello")
That is a practical-looking way to solve the problem. But since there are too many parameters that need to be passed to and received back from _do_atomic_job(), I do not want to pass and retrieve tons of parameters. The other option is setting these parameters as instance variables with self.param_var etc., but those parameters are used in only a small part of the code, and using self is not my preferred way of solving this.
The last option I thought of is using inner functions. (I know I will have problems with variable scopes, but as I said, this is a best-practices question, so just ignore them and assume the scope and everything else about the inner functions work as expected.)
class MyTest2(object):
    mytext = ""

    def myfunc(self):
        def _do_atomic_job():
            mytext = "Hello"
        _do_atomic_job()
        print(mytext)
Let's assume that works as expected. What I want to do is override the inner function _do_atomic_job().
class MyTest3(MyTest2):
    def __init__(self):
        super(MyTest3, self).__init__()
        self.myfunc._do_atomic_job = self._alt_do_atomic_job  # Of course this does not work!

    def _alt_do_atomic_job(self):
        mytext = "Hollla!"
So what I want to achieve is overriding the inherited class's method's inner function _do_atomic_job.
Is it possible?
Either factoring _do_atomic_job() into a proper method, or maybe factoring it into its own class, seems like the best approach to take. Overriding an inner function can't work, because you won't have access to the local variables of the containing method.
You say that _do_atomic_job() takes a lot of parameters and returns lots of values. Maybe you can group some of these parameters into reasonable objects:
_do_atomic_job(start_x, start_y, end_x, end_y) # Separate coordinates
_do_atomic_job(start, end) # Better: start/end points
_do_atomic_job(rect) # Even better: rectangle
If you can't do that, and _do_atomic_job() is reasonably self-contained,
you could create helper classes AtomicJobParams and AtomicJobResult.
An example using namedtuples instead of classes:
from collections import namedtuple

AtomicJobParams = namedtuple('AtomicJobParams', ['a', 'b', 'c', 'd'])

jobparams = AtomicJobParams(a, b, c, d)
_do_atomic_job(jobparams)  # Returns AtomicJobResult
Finally, if the atomic job is self-contained, you can even factor it into its
own class AtomicJob.
class AtomicJob:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self._do_atomic_job()

    def _do_atomic_job(self):
        ...
        self.result_1 = 42
        self.result_2 = 23
        self.result_3 = 443
Overall, this seems more like a code factorization problem. Aim for rather lean
classes that delegate work to helpers where appropriate. Follow the single responsibility principle. If values belong together, bundle them up in a value class.
As David Miller (a prominent Linux kernel developer) recently said:
If you write interfaces with more than 4 or 5 function arguments, it's
possible that you and I cannot be friends.
Inner functions see the variables of the scope where they are defined, not the scope where they are executed. This prints "hello":
class MyTest2(object):
    def __init__(self):
        localvariable = "hello"

        def do_atomic_job():
            print(localvariable)

        self.do_atomic_job = do_atomic_job

    def myfunc(self):
        localvariable = "hollla!"
        self.do_atomic_job()

MyTest2().myfunc()
So I can't see any way you could use the local variables without passing them, which is probably the best way to do it.
Note: passing locals() will get you a dict of the variables, though this is considered quite bad style.
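For illustration only, the (discouraged) locals() idea would look roughly like this sketch (function names are placeholders):

def do_atomic_job(scope):
    print(scope['localvariable'])

def myfunc():
    localvariable = "hollla!"
    do_atomic_job(locals())  # receives {'localvariable': 'hollla!'}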
I have a class for my project and I want some other functions to be able to accept it as a parameter. What I'm trying to do is pack the data (a numpy.ndarray object) and some important info about the data into one object, so I don't have to store them separately and cause possible confusion in the future. However, I also want this class to behave like a numpy.ndarray as much as possible. I overrode operators such as +, -, etc. But for functions other than the operators (e.g. min(), matplotlib.pylab.plot(), etc.) there doesn't seem to be a good way to do so.
My class is:
import numpy

class MyCls(object):
    def __init__(self):
        # Pack data and info into one place
        self.data = numpy.ones((10, 10))
        self.info = 'some info on the data'
I'd like to make it acceptable to some other functions, e.g.
mycls = MyCls()
x = min(mycls) # I was hoping this could return min(mycls.data)
or
from matplotlib import pylab
pylab.plot(x) # I was hoping this could plot x.data
I tried to override __getitem__() of MyCls:
def __getitem__(self, k):
    return self.data[k]
It works for min(). I guess it's because min() calls __getitem__() and each element in self.data is returned. But it won't work for another function which doesn't call __getitem__(), like pylab.plot().
I was wondering: is it possible that, for general use of x, x.data would be used instead of x itself? Or is there another good way to do the same thing?