Have my own class behave like numpy.ndarray - python

I have a class in my project and want some other methods to be able to accept it as a parameter. What I'm trying to do is pack the data (a numpy.ndarray object) and some important info about the data into one object, so I don't have to store them separately and cause possible confusion in the future. However, I also want this class to behave like a numpy.ndarray as much as possible. I overrode operators such as +, -, etc. But for functions other than the operators (e.g. min(), matplotlib.pylab.plot(), etc.) there doesn't seem to be a good way to do so.
My class is:
import numpy

class MyCls(object):
    def __init__(self):
        # Pack data and info into one place
        self.data = numpy.ones((10, 10))
        self.info = 'some info on the data'
I'd like to make it acceptable to some other functions, e.g.
mycls = MyCls()
x = min(mycls) # I was hoping this could return min(mycls.data)
or
from matplotlib import pylab
pylab.plot(x) # I was hoping this could plot x.data
I tried to override __getitem__() of MyCls:
def __getitem__(self, k):
    return self.data[k]
It works for min(). I guess that's because min() calls __getitem__() and each element of self.data is returned. But it won't work for a function that doesn't call __getitem__(), like pylab.plot().
I was wondering whether it is possible that, for a general use of x, x.data would be used instead of x itself? Or is there another good way to do the same thing?
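(For reference, numpy's array-protocol hook is one way to get much of this behavior. If MyCls defines an __array__ method, numpy.asarray(mycls) returns the wrapped array, and functions that coerce their input through asarray should accept the object directly. A minimal sketch:
import numpy

class MyCls(object):
    def __init__(self):
        self.data = numpy.ones((10, 10))
        self.info = 'some info on the data'

    def __array__(self, dtype=None):
        # numpy.asarray(mycls) calls this and gets the wrapped array
        if dtype is None:
            return self.data
        return self.data.astype(dtype)

    def __getitem__(self, k):
        # keeps min(), iteration, and indexing working
        return self.data[k]
With that in place, pylab.plot(x) should also work, since plot converts its arguments to arrays internally.)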

Related

How to set default behaviors of magic methods in python?

Suppose I want to have a wrapper class Image for a numpy array. My goal is to let it behave just like a 2D array but with some additional functionality (which is not important here). I am doing this because inheriting from numpy array is far more troublesome.
import numpy as np

class Image(object):
    def __init__(self, data: np.ndarray):
        self._data = np.array(data)

    def __getitem__(self, item):
        return self._data.__getitem__(item)

    def __setitem__(self, key, value):
        self._data.__setitem__(key, value)

    def __getattr__(self, item):
        # delegates the array's attributes and methods, except dunders
        try:
            return getattr(self._data, item)
        except AttributeError:
            raise AttributeError(item)

    # binary operations
    def __add__(self, other):
        return Image(self._data.__add__(other))

    def __sub__(self, other):
        return Image(self._data.__sub__(other))

    # many more follow ... How to avoid this redundancy?
As you can see, I want to have all the magic methods for numeric operations, just like a normal numpy array, but with the return values being Image type. Since the implementations of these magic methods (__add__, __sub__, __truediv__, and so on) are almost identical, it feels silly. My question is whether there is a way to avoid this redundancy.
Beyond what I am specifically doing here, is there a way to code up the magic methods in one place via some metaprogramming technique, or is it just impossible? I have read a little about Python metaclasses, but it's still not clear to me.
Note that __getattr__ won't handle delegation for magic methods. See this.
Edit
Just to clarify: I understand that inheritance is the general solution for a problem like this, though my experience is very limited. But I feel that inheriting from numpy array really isn't a good idea, because an ndarray subclass needs to handle view casting and ufuncs (see this), and when you use your subclass in other Python libraries, you also need to think about how your array subclass gets along with other array subclasses. See my stupid gh-issue. That's why I am looking for alternatives.
The magic methods are always looked up on the class and bypass __getattribute__ entirely, so you must define them on the class. https://docs.python.org/3/reference/datamodel.html#special-lookup
However, you can save yourself some typing:
import operator

def make_bin_op(oper):
    def op(self, other):
        if isinstance(other, Image):
            return Image(oper(self._data, other._data))
        else:
            return Image(oper(self._data, other))
    return op

class Image:
    ...
    __add__ = make_bin_op(operator.add)
    __sub__ = make_bin_op(operator.sub)
If you want, you could make a dict of operator dunder names and the corresponding operators and add them with a class decorator, e.g.
OPER_DICT = {'__add__': operator.add, '__sub__': operator.sub, ...}

def add_operators(cls):
    for k, v in OPER_DICT.items():
        setattr(cls, k, make_bin_op(v))
    return cls  # a class decorator must return the class

@add_operators
class Image:
    ...
You could use a metaclass to do the same thing. However, you probably don't want to use a metaclass unless you really understand what's going on.
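For illustration, a minimal sketch of that metaclass route, reusing make_bin_op and OPER_DICT from above (ImageMeta is a hypothetical name):
class ImageMeta(type):
    # attaches one dunder per entry in OPER_DICT at class-creation time
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for dunder, oper in OPER_DICT.items():
            setattr(cls, dunder, make_bin_op(oper))
        return cls

class Image(metaclass=ImageMeta):
    def __init__(self, data):
        self._data = data
This buys nothing over the decorator except that subclasses of Image get the operators automatically, which is why the decorator is usually preferable.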
What you're after is the concept called inheritance, a key part of object-oriented programming (see Wikipedia here).
When you define your class with class Image(object):, that means Image is a subclass of object, which is a built-in type that does very little. Your functionality is added on to that more-or-less blank concept. But if instead you defined your class with class Image(np.ndarray): (note that np.ndarray is the array type; np.array is just a factory function), then Image would be a subclass of ndarray, which means it would inherit all the default functionality of that class. Essentially, any method you want to leave as-is you simply shouldn't redefine. If you don't write a __getitem__ method, it uses the one defined in ndarray.
If you need to add functionality in any of those methods, you can still redefine them (called overriding) and then use super().__getitem__ (or whatever) to access the version defined in the parent class. This often happens with __init__, for example.
For a more thorough explanation, take a look at the chapter on inheritance in Think Python.
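If you do go the subclassing route with numpy, note that ndarray is set up via __new__ rather than __init__; numpy's documented pattern ("Subclassing ndarray" in the numpy docs) looks roughly like this sketch:
import numpy as np

class Image(np.ndarray):
    def __new__(cls, data, info=None):
        # view-cast the input array into the subclass
        obj = np.asarray(data).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # runs on construction, view casting, and slicing;
        # propagate the extra attribute in all cases
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)
This is exactly the view-casting machinery the question's Edit complains about, so whether it is worth it depends on how array-like the wrapper really needs to be.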

How to avoid parameter type in function's name?

I have a function foo that takes a parameter stuff. Stuff can be something in a database, and I'd like to create a function that takes a stuff_id, gets the stuff from the DB, and executes foo.
Here's my attempt to solve it:
1/ Create a second function with the suffix from_stuff_id
def foo(stuff):
    ...  # do something

def foo_from_stuff_id(stuff_id):
    stuff = get_stuff(stuff_id)
    foo(stuff)
2/ Modify the first function
def foo(stuff=None, stuff_id=None):
    if stuff_id:
        stuff = get_stuff(stuff_id)
    ...  # do something
I don't like either way.
What's the most Pythonic way to do it?
Assuming foo is the main component of your application, go with your first way. Each function should have a single purpose; the moment you combine multiple purposes into one function, you can easily get lost in long streams of code.
If, however, some other function can also provide stuff, then go with the second.
The only thing I would add is to make sure you add docstrings (PEP 257) to each function, explaining its role in words. If necessary, you can also add comments to your code.
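For example, the first variant with PEP 257 docstrings might look like this (a sketch reusing the question's hypothetical names):
def foo(stuff):
    """Do something with a stuff record."""
    ...

def foo_from_stuff_id(stuff_id):
    """Look up the stuff with the given id, then run foo on it."""
    foo(get_stuff(stuff_id))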
I'm not a big fan of type overloading in Python, but this is one of the cases where I might go for it if there's really a need:
def foo(stuff):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
With type annotations it would look like this:
from typing import Union

def foo(stuff: Union[int, Stuff]):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
It basically depends on how you've defined all these functions. If you're importing get_stuff from another module, the second approach is more Pythonic: from an OOP perspective each function serves one particular purpose, and since get_stuff is already defined, there's no need to wrap it in yet another function.
If get_stuff isn't defined in another module, then it depends on whether you are using classes or not. If you're using a class and want to use all these pieces together, you can add a method for accessing or connecting to the database and use that method within other methods like foo.
Example:
from some_module import get_stuff

class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        self.stuff_id = kwargs['stuff_id']

    def foo(self):
        stuff = get_stuff(self.stuff_id)
        # do stuff
Or, if the functionality of foo depends on the existence of stuff, you can store stuff on the instance and simply check its validity:
class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        _stuff_id = kwargs['stuff_id']
        self.stuff = get_stuff(_stuff_id)  # can return None

    def foo(self):
        if self.stuff:
            ...  # do stuff
        else:
            ...  # do other stuff
Another neat design pattern for such situations is a dispatcher function (or method in a class) that delegates the execution to different functions based on the state of stuff.
def delegator(stuff, stuff_id):
    if stuff:  # or some other condition
        foo(stuff)
    else:
        get_stuff(stuff_id)

python class: inner functions; access via nested dots

Below is a simple class I made. I would like to access the inner function like
obj = TestClass(data)
obj.preprocessing.gradient()
It is clear that such a call would not work, because preprocessing is a function. How can I achieve what I want? (I hope it is clear to you.)
EDIT: This is a simplified case. I hope that other users who are not into machine learning find it easier to apply the proper functions in the correct order (first the preprocessing, then e.g. clustering, afterwards plotting). If I just remove the outer functions (preprocessing etc.), it works fine. Still, I wonder whether such an approach might be reasonable.
import numpy as np
from sklearn.preprocessing import StandardScaler

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data

    # should not be a function but rather a "chapter" which
    # separates preprocessing from analysis methods
    def preprocessing(self):
        def gradient(self):
            self._preprocessed = np.gradient(self._preprocessed, 2)[1]

        def normalize(self):
            self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    def cluster_analysis(self):
        def pca(self):
            pass
A first approach (probably better than the second one that follows) would be to return an instance of a class possessing those two methods, thus favoring composition.
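A sketch of that first approach, with a hypothetical Preprocessing helper class holding a reference back to its owner, so obj.preprocessing.gradient() works as the question asks:
import numpy as np
from sklearn.preprocessing import StandardScaler

class Preprocessing:
    def __init__(self, owner):
        self._owner = owner

    def gradient(self):
        self._owner._preprocessed = np.gradient(self._owner._preprocessed, 2)[1]

    def normalize(self):
        self._owner._preprocessed = StandardScaler().fit_transform(self._owner._preprocessed)

class TestClass:
    def __init__(self, data):
        self.data = data
        self._preprocessed = data
        # composition: the "chapter" is an object, not a function
        self.preprocessing = Preprocessing(self)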
Otherwise, what about returning a dynamically created class holding those callables, as follows:
# ...
@property
def preprocessing(self):
    def gradient(self):
        self._preprocessed = np.gradient(self._preprocessed, 2)[1]

    def normalize(self):
        self._preprocessed = StandardScaler().fit_transform(self._preprocessed)

    callables_dict = {
        'gradient': gradient,
        'normalize': normalize,
    }
    return type('submethods', (object,), callables_dict)
# ...
Then you can call your submethod as (I guess) you want, doing
>>> TestClass(data).preprocessing.gradient
<unbound method submethods.gradient>
or
>>> TestClass(data).preprocessing.normalize
<unbound method submethods.normalize>
Depending on what you want to do, it may be a good idea to cache preprocessing so as not to redefine its inner functions each time it is called.
But as already said, this is probably not the best way to go.
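On the caching point, one option (assuming Python 3.8+) is functools.cached_property, which builds the submethods object only on first access and then stores it on the instance; a sketch:
import numpy as np
from functools import cached_property
from sklearn.preprocessing import StandardScaler

class TestClass:
    def __init__(self, data):
        self._preprocessed = data

    @cached_property
    def preprocessing(self):
        outer = self

        class Submethods:
            # plain functions closing over `outer`, so no self is needed
            @staticmethod
            def gradient():
                outer._preprocessed = np.gradient(outer._preprocessed, 2)[1]

            @staticmethod
            def normalize():
                outer._preprocessed = StandardScaler().fit_transform(outer._preprocessed)

        return Submethods()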

Assign attribute to function

This is probably very much a beginner question, but I have a question about attributes.
I have a module responsible for Google Docs API actions which contains functions for retrieving information. I would like to be able to refer to certain variables from these functions as attributes.
Here is an example:
Gdocs.py
def getRows():
    rows = ...      # action for getting rows
    rowsText = ...  # action for converting to text
General.py
import Gdocs
text = Gdocs.getRows.rowsText
I know the basic effect of passing variables can be achieved by just returning the values, but I would like to refer to them as attributes if possible. Simply put, my question is: how can I create an attribute on a function that I can reference in another .py document?
Thanks, and sorry if it has already been answered; I did try to search but kept running into very specific problems.
It sounds as if you want to return a result consisting of multiple parts. Don't use function attributes for this; return a new object that can be addressed via attributes instead. That'd make it thread-safe as well, since function attributes live in the same scope as the function object itself: as a global.
The standard library has a helpful class factory function for just such return values, collections.namedtuple():
from collections import namedtuple

Point = namedtuple('Point', 'x y')

def calculate_coordinates(foo, bar, baz):
    return Point(42, 81)
The return value is a tuple subclass, so it can be addressed like a tuple (with indexing), it can be unpacked into separate values, or you can use attributes:
result = calculate_coordinates(spam, ham, eggs)
print result.x, result.y
or
res_x, res_y = calculate_coordinates(spam, ham, eggs)
all work.
While I understand what you said about not wanting a class for each function...
When you have a class, you can apply the @property decorator to its methods.
This has the effect of letting you create functions that exhibit behavior but can be accessed just like attributes. In the following, if you wanted to produce a list of squares based on the input list, you could create a function with a verb-like name such as create_list_of_squares(). But if you really want to keep the API simple, abstract away the mechanics behind the method, and simply let users access the attribute squares, you can use a property, like this...
class SquareList(list):
    @property
    def squares(self):
        return [x ** 2 for x in self]

s = SquareList([1, 2, 3, 4])
print s.squares
which will yield:
[1, 4, 9, 16]
It's a little weird, but you can use staticmethod and classes to get what you want. To wit:
source: zattr2.py
class getRows(object):
    @staticmethod
    def rows(arg):
        return [arg, arg]

    @staticmethod
    def rowsText(arg):
        return repr(arg)
usage:
>>> import zattr2
>>> zattr2.getRows.rowsText('beer')
"'beer'"
See: https://docs.python.org/2/library/functions.html#staticmethod

python class keyword arguments

I'm writing a class for something, and I keep stumbling across the same tiresome-to-type-out construction. Is there some simple way I can set up the class so that every parameter in the constructor gets initialized as an attribute of the same name, i.e. fish=0 -> self.fish = fish?
class Example(object):
    def __init__(self, fish=0, birds=0, sheep=0):
        self.fish = fish
        self.birds = birds
        self.sheep = sheep
Short answer: no. You are not required to initialize everything in the constructor (you could do it lazily), unless you need it immediately or expose it (meaning that you don't control access). But since in Python you don't declare data fields, it becomes much more difficult to track them all if they appear in different parts of the code.
More comprehensive answer: you could do some magic with **kwargs (which holds a dictionary of argument name/value pairs), but that is highly discouraged, because it makes documenting the parameters almost impossible and makes it difficult for users to check whether a certain argument is accepted or not. Use it only for optional, internal flags. It could be useful when passing 20 or more parameters, but in that case I would suggest rethinking the design and clustering the data.
In case you need a simple key/value storage, consider using a builtin, such as dict.
You could use the inspect module:
import inspect

class Example(object):
    def __init__(self, fish=0, birds=0, sheep=0):
        frame = inspect.currentframe()
        args, _, _, values = inspect.getargvalues(frame)
        for i in args:
            setattr(self, i, values[i])
This works, but it is more complicated than just setting them manually. It should be possible to hide this with a decorator:
@set_attributes
def __init__(self, fish=0, birds=0, sheep=0):
    pass
but defining set_attributes gets tricky because the decorator inserts another stack frame into the mix, and I can't quite get the details right.
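For what it's worth, the frame trouble can be sidestepped by binding the call against the wrapped function's own signature instead of inspecting the stack; a sketch of such a set_attributes decorator:
import functools
import inspect

def set_attributes(init):
    sig = inspect.signature(init)

    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        # bind the actual call against __init__'s signature,
        # fill in defaults, and assign everything except self
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            if name != 'self':
                setattr(self, name, value)
        return init(self, *args, **kwargs)
    return wrapper
With that definition, the @set_attributes example above works as intended: Example(fish=2).fish == 2.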
For Python 3.7+, you can try using data classes in combination with type annotations.
https://docs.python.org/3/library/dataclasses.html
Import the module and use the decorator. Type-annotate your class variables, and there's no need to define an __init__ method, because it will automatically be created for you.
from dataclasses import dataclass

@dataclass
class Example:
    fish: int = 0
    birds: int = 0
    sheep: int = 0
