I am creating a data provider class that will hold data, perform transformations and make it available to other classes.
If the user creates an instance of this class and passes some data at instantiation, I would like to store it twice: once for all transformations and once as a copy of the original data. Let's assume the data itself has a copy method.
I am using the attrs package to create classes, but would also be interested in best approaches to this in general (perhaps there is a better way of getting what I am after?)
Here is what I have so far:
import attr

@attr.s
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes."""

    data = attr.ib(default=attr.Factory(list))
    data_copy = data.copy()

    def my_func(self, param1='all'):
        """Do something useful"""
        return param1
This doesn't work; it raises AttributeError: '_CountingAttr' object has no attribute 'copy'.
I also cannot write data_copy = self.data.copy(); that raises NameError: name 'self' is not defined.
The working equivalent without the attrs package would be:
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes."""

    def __init__(self, data):
        """Init method, saving passed data and a backup copy"""
        self.data = data
        self.data_copy = data
EDIT:
As pointed out by @hynek, my simple init method above needs to be corrected to make an actual copy of the data, i.e. self.data_copy = data.copy(). Otherwise both self.data and self.data_copy would point to the same object.
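For completeness, the corrected plain-Python version would then be:

class DataContainer(object):
    def __init__(self, data):
        """Save the passed data and an independent backup copy."""
        self.data = data
        self.data_copy = data.copy()  # an actual copy, so the backup is a separate object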
You can do two things here.
The first one you've found yourself: you use __attrs_post_init__.
The second one is to have a default:
>>> import attr
>>> @attr.s
... class C:
...     x = attr.ib()
...     _x_backup = attr.ib()
...     @_x_backup.default
...     def _copy_x(self):
...         return self.x.copy()
>>> l = [1, 2, 3]
>>> i = C(l)
>>> i
C(x=[1, 2, 3], _x_backup=[1, 2, 3])
>>> i.x.append(4)
>>> i
C(x=[1, 2, 3, 4], _x_backup=[1, 2, 3])
JFTR, your example of

def __init__(self, data):
    self.data = data
    self.data_copy = data

is wrong, because you'd assign the same object twice, which means that modifying self.data also modifies self.data_copy and vice versa.
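A quick REPL check with a plain list (a minimal sketch) makes the aliasing visible:

>>> data = [1, 2, 3]
>>> backup = data          # same object, not a copy
>>> data.append(4)
>>> backup
[1, 2, 3, 4]
>>> backup is data
True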
After looking through the documentation a little more deeply (scroll right to the bottom), I found that there is a kind of post-init hook for classes that are created by attrs.
You can just include a special __attrs_post_init__ method that can do the more complicated things one might want to do in an __init__ method, beyond simple assignment.
Here is my final working code:
In [1]: import attr
   ...: import numpy as np
   ...:
   ...: @attr.s
   ...: class DataContainer(object):
   ...:     """Interface for managing data. Reads and writes data,
   ...:     acts as a provider to other classes.
   ...:     """
   ...:
   ...:     data = attr.ib()
   ...:
   ...:     def __attrs_post_init__(self):
   ...:         """Perform additional init work on instantiation.
   ...:         Make a copy of the raw input data.
   ...:         """
   ...:         self.data_copy = self.data.copy()
In [2]: some_data = np.array([[1, 2, 3], [4, 5, 6]])

In [3]: foo = DataContainer(some_data)

In [4]: foo.data
Out[4]:
array([[1, 2, 3],
       [4, 5, 6]])

In [5]: foo.data_copy
Out[5]:
array([[1, 2, 3],
       [4, 5, 6]])
Just to be doubly sure, I checked to see that the two attributes are not referencing the same object. In this case they are not, which is likely thanks to the copy method on the NumPy array.
In [6]: foo.data[0, 0] = 999

In [7]: foo.data
Out[7]:
array([[999,   2,   3],
       [  4,   5,   6]])

In [8]: foo.data_copy
Out[8]:
array([[1, 2, 3],
       [4, 5, 6]])
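For NumPy arrays specifically, np.shares_memory gives a more direct check than mutating an element (a small aside, not part of the original session):

In [9]: foo.data is foo.data_copy
Out[9]: False

In [10]: np.shares_memory(foo.data, foo.data_copy)
Out[10]: False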
Related
Here is what I want to achieve:
I have a class MyData that holds some sort of data.
Now in class A I store a sequence of data with attribute name data_list and type List[MyData].
Suppose MyData instances are data items with different type indices. Class A is a management class. I need A to hold all the data so it can sample uniformly from all of it.
But some type-specific operations also need to be done. So a base class B with derived classes B1, B2, ... is designed to account for each type of data. An instance of class A has a list of B instances as a member, each storing the data points of one type. Code that illustrates this: B.data_list = A.data_list[start_index:start_index+offset].
A has methods that return some of the data, and B has methods that may modify some of the data.
Now here is the problem: I need to pass the data by reference, so that any modification by a member function of B is also visible from the side of A.
If I use a Python built-in list to store the data, modifications by B won't be visible to A. I did some experimenting with np.array(data_list, dtype=object), and it seemed to work. But I'm not familiar with that kind of usage and am not sure whether it works for data of any type, whether there are performance concerns, etc.
Any suggestions or alternatives? Thanks!!
Illustrating code:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length = data_count // n_segment
        self.segments = [self.data_list[segment_length*i:segment_length*(i+1)]
                         for i in range(n_segment)]
        self.Bs = [B(segment) for segment in self.segments]

    def __getitem__(self, item):
        return self.data_list[item]


class B:
    def __init__(self, data_list):
        self.data_list = data_list

    def modify(self, index, data):
        self.data_list[index] = data


A_data_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
A_instance = A(A_data_list, n_segment=3)
print(A_instance[0])           # get 1
A_instance.Bs[0].modify(0, 2)  # modify A[0] to be 2
print(A_instance[0])           # still get 1
Note that in the above example, changing A_data_list to a NumPy array would solve my problem, but in my case the elements of the list are objects that cannot be stacked into NumPy arrays.
In class A, the segments are all copies of portions of data_list, and thus so are the items in Bs. When you try to modify values, A.Bs are modified, but not the corresponding elements in A.data_list.
With NumPy, slicing gives you memory views instead. So when a value is modified, it affects both A.Bs and A.data_list. It is still bad form, though.
Here is how to fix your classes so that the proper values are modified:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length = data_count // n_segment
        r = range(0, (n_segment + 1) * segment_length, segment_length)
        slices = [slice(i, j) for i, j in zip(r, r[1:])]
        self.Bs = [B(self.data_list, slice_) for slice_ in slices]

    def __getitem__(self, item):
        return self.data_list[item]


class B:
    def __init__(self, data_list, slice_):
        self.data_list = data_list
        self.data_slice = slice_

    def modify(self, index, data):
        # translate the segment-local index into an index into the shared list
        a_ix = list(range(*self.data_slice.indices(len(self.data_list))))[index]
        self.data_list[a_ix] = data
Test:

>>> A_data_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a = A(A_data_list, n_segment=3)
>>> a[0]
1
>>> a.Bs[0].modify(0, 2)   # modify A[0] to be 2
>>> a[0]
2
>>> a.Bs[1].modify(1, -5)
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, 9],
 ... }
>>> a.Bs[2].modify(-1, -1)  # modify last element of segment #2
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, -1],
 ... }
>>> a.Bs[0].modify(3, 0)
IndexError: ... list index out of range
Note: This updated answer would also deal with arbitrary slices, including, hypothetically, ones with a step greater than 1.
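To see why slice.indices makes that possible (a small aside, not part of the original answer): it expands a slice into a concrete (start, stop, step) triple for a sequence of a given length, so even stepped slices map segment-local indices back to the right absolute positions.

>>> s = slice(0, 9, 2)
>>> s.indices(9)            # (start, stop, step), clamped to length 9
(0, 9, 2)
>>> list(range(*s.indices(9)))
[0, 2, 4, 6, 8]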
I am creating a class that inherits from collections.UserList that has some functionality very similar to NumPy's ndarray (just for exercise purposes). I've run into a bit of a roadblock regarding recursive functions involving the modification of class attributes:
Let's take the flatten method, for example:
from collections import UserList

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    def flatten(self):
        # recursive function
        ...
Above, you can see that there is a singular parameter in the flatten method, being the required self parameter. Ideally, a recursive function should take a parameter which is passed recursively through the function. So, for example, it might take a lst parameter, making the signature:
Array.flatten(self, lst)
This solves the problem of having to set lst to self.data, which would not work recursively, because self.data itself won't be changed. However, having that extra parameter in the signature is ugly in use and hinders the experience of an end user calling the function.
So, this is the solution I've come up with:
def flatten(self):
    self.data = self.__flatten(self.data)

def __flatten(self, lst):
    ...
    return result
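The question elides the recursive body; one possible implementation of the helper, assuming nested plain lists, might look like:

def __flatten(self, lst):
    result = []
    for item in lst:
        if isinstance(item, list):
            result.extend(self.__flatten(item))  # recurse into sublists
        else:
            result.append(item)
    return result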
Another solution could be to nest __flatten in flatten, like so:
def flatten(self):
    def __flatten(lst):
        ...
        return result
    self.data = __flatten(self.data)
However, I'm not sure if nesting would be the most readable option, as flatten is not the only recursive function in my class, so it could get messy pretty quickly.
Does anyone have any other suggestions? I'd love to know your thoughts, thank you!
A recursive method need not take any extra parameters that are logically unnecessary for the method to work from the caller's perspective; the self parameter is enough for recursion on a "child" element to work, because when you call the method on the child, the child is bound to self in the recursive call. Here is an example:
from itertools import chain

class MyArray:
    def __init__(self, data):
        self.data = [
            MyArray(x) if isinstance(x, list) else x
            for x in data]

    def flatten(self):
        return chain.from_iterable(
            x.flatten() if isinstance(x, MyArray) else (x,)
            for x in self.data)
Usage:
>>> a = MyArray([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> list(a.flatten())
[1, 2, 3, 4, 5, 6, 7, 8]
Since UserList is an iterable, you can use a helper function to flatten nested iterables, which can deal likewise with lists and Array objects:
from collections import UserList
from collections.abc import Iterable

def flatten_iterable(iterable):
    for item in iterable:
        if isinstance(item, Iterable):
            yield from flatten_iterable(item)
        else:
            yield item

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    def flatten(self):
        self.data = list(flatten_iterable(self.data))
a = Array([[1, 2], [3, 4]])
a.flatten(); print(a) # prints [1, 2, 3, 4]
b = Array([Array([1, 2]), Array([3, 4])])
b.flatten(); print(b) # prints [1, 2, 3, 4]
barrier of abstraction
Write array as a separate module. flatten can be generic like the example implementation here. This differs from a_guest's answer in that only lists are flattened, not all iterables. This is a choice you get to make as the module author -
# array.py
from collections import UserList

def flatten(t):  # generic function
    if isinstance(t, list):
        for v in t:
            yield from flatten(v)
    else:
        yield t

class array(UserList):
    def flatten(self):
        return list(flatten(self.data))  # specialization of generic function
why modules are important
Don't forget you are the module user too! You get to reap the benefits from both sides of the abstraction barrier created by the module -
As the author, you can easily expand, modify, and test your module without worrying about breaking other parts of your program
As the user, you can rely on the module's features without having to think about how the module is written or what the underlying data structures might be
# main.py
from array import array
t = array([1,[2,3],4,[5,[6,[7]]]]) # <- what is "array"?
print(t.flatten())
[1, 2, 3, 4, 5, 6, 7]
As the user, we don't have to answer "what is array?" any more than you have to answer "what is dict?" or "what is iter?". We use these features without having to understand their implementation details. Their internals may change over time, but if the interface stays the same, our programs will continue to work without requiring change.
reusability
Good programs are reusable in many ways. See Python's built-in functions for proof of this, or see the guiding principles of the Unix philosophy -
Write programs that do one thing and do it well.
Write programs to work together.
If you wanted to use flatten in other areas of your program, you can reuse it easily -
# otherscript.py
from array import flatten
result = flatten(something)
Typically, all methods of a class have at least one argument which is called self in order to be able to reference the actual object this method is called on.
If you don't need self in your function, but you still want to include it in a class, you can use @staticmethod and just include a normal function, like this:

from collections import UserList

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    @staticmethod
    def flatten():
        # recursive function
        ...
Basically, @staticmethod allows you to make any function a method that can be called on a class or on an instance of that class (an object). So you can do this:

arr = Array([1, [2, 3]])
arr.flatten()

as well as this:

Array.flatten()
Here is some further reference from the Python docs: https://docs.python.org/3/library/functions.html#staticmethod
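A runnable variant of that idea (a sketch; note that the static method must then receive the list explicitly, since it has no self to read self.data from):

from collections import UserList

class Array(UserList):
    @staticmethod
    def flatten(lst):
        """Recursively flatten nested plain lists."""
        result = []
        for item in lst:
            if isinstance(item, list):
                result.extend(Array.flatten(item))
            else:
                result.append(item)
        return result

>>> Array.flatten([1, [2, [3, 4]]])
[1, 2, 3, 4]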
I have a class that handles a NumPy matrix and some additional info.
import numpy as np

class MyClass:
    def __init__(self, v):
        self.values = v

plop = MyClass(np.matrix([[1, 2], [3, 4]]))
The matrix being named values, to access it, I write:
plop.values[1, 1] # Returns 4
Is it possible to access it directly? I mean, doing:
plop[1, 1] # Should returns 4 too
I saw this post, but it seems that this solution allows only one level of [].
Thanks!
Just add this method to your class:

def __getitem__(self, indices):
    return self.values[indices]
Also, given the opportunity, it would be useful to see how __getitem__ and slice objects work
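Putting it together (a minimal sketch based on the question's class):

import numpy as np

class MyClass:
    def __init__(self, v):
        self.values = v

    def __getitem__(self, indices):
        # delegate indexing to the wrapped matrix; a multi-axis index
        # like [1, 1] arrives here as the tuple (1, 1)
        return self.values[indices]

plop = MyClass(np.matrix([[1, 2], [3, 4]]))
print(plop[1, 1])  # 4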
You can access it directly, I think:

plop = np.matrix([[1, 2], [3, 4]])
plop[1, 1]
I have a property of a Python object that returns an array.
Now, I can define the setter of that property so that the whole array is settable.
However, I'm missing how to make the individual elements settable through the property as well.
I would expect from a user perspective (given an empty SomeClass class):
>>> x = SomeClass()
>>> x.array = [1, 2, 3]
>>> x.array[1] = 4
>>> print (x.array)
[1, 4, 3]
Now, suppose that SomeClass.array is a property defined as
class SomeClass(object):
    def __init__(self, a):
        self._a = a

    @property
    def array(self):
        return self._a

    @array.setter
    def array(self, a):
        self._a = a
Everything still works as above, also if I force plain NumPy arrays in the setter.
However, if I replace the return self._a with a NumPy function (one that goes through the elements in a vectorised way) and replace self._a = a with the inverse function, then of course the entry no longer gets set.
Example:
import numpy as np

class SomeClass(object):
    def __init__(self, a):
        self._a = np.array(a)

    @property
    def array(self):
        return np.sqrt(self._a)

    @array.setter
    def array(self, a):
        self._a = np.power(a, 2)
Now, the user sees the following output:
>>> x = SomeClass([1, 4, 9])
>>> print (x.array)
array([1., 2., 3.])
>>> x.array[1] = 13
>>> print (x.array) # would expect an array([1., 13., 3.]) now!
array([1., 2., 3.])
I think I understand where the problem comes from: the array that NumPy creates during the operation gets its element changed, but that has no effect on the stored array.
What would be a proper implementation of SomeClass to make single elements of the array write-accessible individually and thus settable as well?
Thanks a lot for your hints and help,
TheXMA
The points @Jaime made below his answer helped me a lot! Thanks!
Since arrays are mutable objects, the individual items are settable even without a setter function:
>>> class A(object):
...     def __init__(self, a):
...         self._a = np.asarray(a)
...     @property
...     def arr(self):
...         return self._a
...
>>> a = A([1, 2, 3])
>>> a.arr
array([1, 2, 3])
>>> a.arr = [4, 5, 6]  # There is no setter...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> a.arr[1] = 7  # ...but the array is mutable
>>> a.arr
array([1, 7, 3])
This is one of the uses of tuples vs. lists, since the latter are mutable, but the former aren't. Anyway, to answer your question: making individual items settable is easy, as long as your getter returns the object itself.
The fancier behaviour in your second example doesn't seem easy to get in any simple way. I think you could make it happen by making your SomeClass.array attribute be a custom class that either subclasses ndarray or wraps an instance of it. Either way would be a lot of nontrivial work.
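To give a flavour of the wrapping approach (a rough sketch, not a complete solution: it only handles item access and assignment, assuming the sqrt/power pair from the question):

import numpy as np

class SqrtView:
    """Wraps a backing array stored as squares; exposes square roots."""
    def __init__(self, backing):
        self._backing = backing  # the np.power(a, 2) storage

    def __getitem__(self, idx):
        return np.sqrt(self._backing[idx])

    def __setitem__(self, idx, value):
        # write through the inverse transform to the stored array
        self._backing[idx] = np.power(value, 2)

class SomeClass(object):
    def __init__(self, a):
        self._a = np.power(np.asarray(a, dtype=float), 2)

    @property
    def array(self):
        return SqrtView(self._a)

Usage:

>>> x = SomeClass([1, 4, 9])
>>> x.array[1] = 13
>>> x.array[1]
13.0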
Possible Duplicate:
“Least Astonishment” in Python: The Mutable Default Argument
I ran into something quite strange when I tried to write the class below. On line 3 of the class, I must assign a copy of the argument newdata to self.data; otherwise, when I create a new instance, the value from the previous one is still remembered by the class. See the example below, and note that the only difference between the two versions of the code is on line 3.
class Pt(object):
    def __init__(self, newdata={}):
        self.data = newdata.copy()
        if self.data == {}:
            self._taglist = []
        else:
            self._taglist = self.data.keys()

    def add_tag(self, tag=None):
        self.data[tag] = {'x': [0, 1, 2, 3, 4]}
        self._taglist.append(tag)
In [49]: pt = Pt()
In [50]: pt.add_tag('b')
In [51]: pt.add_tag('a')
In [52]: pt.data
Out[52]: {'a': {'x': [0, 1, 2, 3, 4]}, 'b': {'x': [0, 1, 2, 3, 4]}}
In [53]: pt2 = Pt()
In [54]: pt2._taglist
Out[54]: []
class Pt(object):
    def __init__(self, newdata={}):
        self.data = newdata
        if self.data == {}:
            self._taglist = []
        else:
            self._taglist = self.data.keys()

    def add_tag(self, tag=None):
        self.data[tag] = {'x': [0, 1, 2, 3, 4]}
        self._taglist.append(tag)
In [56]: pt = Pt()
In [57]: pt.add_tag('a')
In [58]: pt.add_tag('b')
In [59]: pt._taglist
Out[59]: ['a', 'b']
In [60]: pt2 = Pt()
In [61]: pt2._taglist
Out[61]: ['a', 'b']
In [62]: pt2.data
Out[62]: {'a': {'x': [0, 1, 2, 3, 4]}, 'b': {'x': [0, 1, 2, 3, 4]}}
I guess the second case happens because both newdata and self.data refer to the same object (but how can that happen? the value should be assigned from right to left, not the reverse), so when I use the add_tag method to update self.data, newdata is updated as well. Yet I'd think that when I create a new instance with pt2 = Pt(), newdata should use the default value ({}); how can it still keep the old value from pt?
The why of it all is already explained very well in “Least Astonishment” in Python: The Mutable Default Argument. So here is just a quick note on how to fix your problem.
As default parameter objects are created once and then kept, you should never use mutable objects as default parameter values. Instead, you should define a fixed sentinel, usually None, as the default value, and initialize the real default object when that sentinel is seen. So your constructor would look like this:
def __init__(self, newdata=None):
    if newdata is None:
        newdata = {}
    # ...
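Filling in the rest of the constructor from the question (a sketch of one way to apply the fix):

class Pt(object):
    def __init__(self, newdata=None):
        if newdata is None:
            newdata = {}  # a fresh dict per instance, never shared
        self.data = newdata
        self._taglist = list(self.data.keys())

    def add_tag(self, tag=None):
        self.data[tag] = {'x': [0, 1, 2, 3, 4]}
        self._taglist.append(tag)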
Your issue is that when you copy the dictionary, you create a new dictionary with the same values (a shallow copy). This means that the lists inside your dictionaries are still the same objects, so when you modify them, the change shows up in every dictionary that shares them.
What you want is copy.deepcopy(): it copies not just the dictionary but the nested lists as well.
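A quick illustration of the difference (a minimal sketch):

import copy

d = {'a': {'x': [0, 1]}}
shallow = d.copy()
deep = copy.deepcopy(d)

d['a']['x'].append(2)     # mutate a nested list
print(shallow['a']['x'])  # [0, 1, 2]  -- shares the nested objects
print(deep['a']['x'])     # [0, 1]     -- fully independent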