Here is what I want to achieve:
I have a class MyData that holds some sort of data.
Now in class A I store a sequence of data in an attribute named data_list of type List[MyData].
Suppose the MyData instances carry different type indices. Class A is a management class: it needs to hold all the data so that it can sample uniformly across all of it.
But some type-specific operations also need to be done, so I designed a base class B with derived classes B1, B2, ... to handle each type of data. An instance of class A has a list of B instances as a member, each storing the data points of one type. Code that illustrates this: B.data_list = A.data_list[start_index:start_index+offset].
A has methods that return some of the data, and B has methods that may modify some of the data.
Now here is the problem: I need to pass the data by reference, so that any modification made by a member function of B is also visible on A's side.
If I use a Python built-in list to store the data, modifications by B won't be visible to A. I experimented with np.array(data_list, dtype=object), and it seemed to work. But I'm not familiar with that kind of usage and am not sure whether it works for data of any type, whether there are performance concerns, and so on.
Any suggestions or alternatives? Thanks!!
Illustrating code:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length = data_count // n_segment
        self.segments = [self.data_list[segment_length * i:segment_length * (i + 1)]
                         for i in range(n_segment)]
        self.Bs = [B(segment) for segment in self.segments]

    def __getitem__(self, item):
        return self.data_list[item]

class B:
    def __init__(self, data_list):
        self.data_list = data_list

    def modify(self, index, data):
        self.data_list[index] = data
A_data_list = [1,2,3,4,5,6,7,8,9]
A_instance = A(A_data_list, n_segment=3)
print(A_instance[0]) # get 1
A_instance.Bs[0].modify(0,2) # modify A[0] to be 2
print(A_instance[0]) # still get 1
Note that in the above example changing A_data_list to a numpy array would solve my problem, but in my case the elements of the list are objects that cannot be stacked into numpy arrays.
In class A, the segments are all copies of portions of data_list, and so are the items of Bs. When you try to modify values, the copies held by A.Bs are modified, but not the corresponding elements in A.data_list.
With numpy, you most likely get memory views instead, so when a value is modified, the change is visible through both A.Bs and A.data_list. It is still bad form, though.
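To see the difference concretely, here is a small sketch (assuming numpy is available) contrasting list slicing with numpy slicing:

```python
import numpy as np

# Slicing a plain list copies, so writes through the slice don't reach the original:
data = [1, 2, 3, 4]
seg = data[0:2]
seg[0] = 99
print(data)  # [1, 2, 3, 4] -- unchanged

# Slicing a numpy array (object dtype included) returns a view onto the same buffer:
arr = np.array([1, 2, 3, 4], dtype=object)
view = arr[0:2]
view[0] = 99
print(arr)   # [99 2 3 4] -- the change is visible through the original
```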
Here is how to fix your classes so that the proper values are modified:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length = data_count // n_segment
        r = range(0, (n_segment + 1) * segment_length, segment_length)
        slices = [slice(i, j) for i, j in zip(r, r[1:])]
        self.Bs = [B(self.data_list, slice_) for slice_ in slices]

    def __getitem__(self, item):
        return self.data_list[item]

class B:
    def __init__(self, data_list, slice_):
        self.data_list = data_list
        self.data_slice = slice_

    def modify(self, index, data):
        a_ix = list(range(*self.data_slice.indices(len(self.data_list))))[index]
        self.data_list[a_ix] = data
Test:
>>> A_data_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a = A(A_data_list, n_segment=3)
>>> a[0]
1
>>> a.Bs[0].modify(0, 2)  # modify a[0] to be 2
>>> a[0]
2
>>> a.Bs[1].modify(1, -5)
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, 9],
 ...}
>>> a.Bs[2].modify(-1, -1)  # modify the last element of segment #2
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, -1],
 ...}
>>> a.Bs[0].modify(3, 0)
Traceback (most recent call last):
  ...
IndexError: list index out of range
Note: This updated answer would also deal with arbitrary slices, including, hypothetically, ones with a step greater than 1.
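The normalization that makes arbitrary slices work is slice.indices, which B.modify uses to map a segment-local index to a global one; a quick sketch of what it does:

```python
# slice.indices(length) resolves a slice against a sequence of the given length,
# returning a concrete (start, stop, step) triple with open-ended and negative
# bounds normalized.
s = slice(1, None, 2)  # the slice object behind seq[1::2]
print(s.indices(9))    # (1, 9, 2)

# Mapping a segment-local index to a global one, as B.modify does:
global_indices = list(range(*s.indices(9)))
print(global_indices)     # [1, 3, 5, 7]
print(global_indices[2])  # 5 -- the segment's third element lives at global index 5
```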
I am creating a class that inherits from collections.UserList and has some functionality very similar to NumPy's ndarray (just for exercise purposes). I've run into a bit of a roadblock with recursive functions that modify class attributes:
Let's take the flatten method, for example:
from collections import UserList

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    def flatten(self):
        # recursive function
        ...
Above, you can see that the flatten method takes a single parameter, the required self. Ideally, a recursive function takes a parameter that is passed down through the recursive calls. So, for example, it might take a lst parameter, making the signature:
Array.flatten(self, lst)
This solves the problem of having to set lst to self.data, which consequently will not work recursively, because self.data won't be changed. However, having that parameter in the function is going to be ugly in use and hinder the user experience of an end user who may be using the function.
So, this is the solution I've come up with:
def flatten(self):
    self.data = self.__flatten(self.data)

def __flatten(self, lst):
    ...
    return result
Another solution could be to nest __flatten in flatten, like so:
def flatten(self):
    def __flatten(lst):
        ...
        return result
    self.data = __flatten(self.data)
However, I'm not sure if nesting would be the most readable as flatten is not the only recursive function in my class, so it could get messy pretty quickly.
Does anyone have any other suggestions? I'd love to know your thoughts, thank you!
A recursive method need not take extra parameters that are logically unnecessary from the caller's perspective; the self parameter is enough for recursion on a "child" element to work, because calling the method on the child binds the child to self in the recursive call. Here is an example:
from itertools import chain

class MyArray:
    def __init__(self, data):
        self.data = [MyArray(x) if isinstance(x, list) else x
                     for x in data]

    def flatten(self):
        return chain.from_iterable(
            x.flatten() if isinstance(x, MyArray) else (x,)
            for x in self.data)
Usage:
>>> a = MyArray([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> list(a.flatten())
[1, 2, 3, 4, 5, 6, 7, 8]
Since UserList is an iterable, you can use a helper function to flatten nested iterables, which can deal likewise with lists and Array objects:
from collections import UserList
from collections.abc import Iterable

def flatten_iterable(iterable):
    for item in iterable:
        # note: str is also Iterable; add a guard here if your data may contain strings
        if isinstance(item, Iterable):
            yield from flatten_iterable(item)
        else:
            yield item

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    def flatten(self):
        self.data = list(flatten_iterable(self.data))
a = Array([[1, 2], [3, 4]])
a.flatten(); print(a) # prints [1, 2, 3, 4]
b = Array([Array([1, 2]), Array([3, 4])])
b.flatten(); print(b) # prints [1, 2, 3, 4]
barrier of abstraction
Write array as a separate module. flatten can be generic like the example implementation here. This differs from a_guest's answer in that only lists are flattened, not all iterables. This is a choice you get to make as the module author -
# array.py
from collections import UserList

def flatten(t):  # generic function
    if isinstance(t, list):
        for v in t:
            yield from flatten(v)
    else:
        yield t

class array(UserList):
    def flatten(self):
        return list(flatten(self.data))  # specialization of the generic function
why modules are important
Don't forget you are the module user too! You get to reap the benefits from both sides of the abstraction barrier created by the module -
As the author, you can easily expand, modify, and test your module without worrying about breaking other parts of your program
As the user, you can rely on the module's features without having to think about how the module is written or what the underlying data structures might be
# main.py
from array import array
t = array([1,[2,3],4,[5,[6,[7]]]]) # <- what is "array"?
print(t.flatten())
[1, 2, 3, 4, 5, 6, 7]
As the user, we don't have to answer "what is array?" any more than we have to answer "what is dict?" or "what is iter?" We use these features without having to understand their implementation details. Their internals may change over time, but if the interface stays the same, our programs will continue to work without requiring changes.
reusability
Good programs are reusable in many ways. See Python's built-in functions for proof of this, or see the guiding principles of the Unix philosophy -
Write programs that do one thing and do it well.
Write programs to work together.
If you want to use flatten in other areas of your program, you can reuse it easily -
# otherscript.py
from array import flatten
result = flatten(something)
Typically, all methods of a class have at least one argument which is called self in order to be able to reference the actual object this method is called on.
If you don't need self in your function but you still want to include it in a class, you can use @staticmethod and just include a normal function, like this:
from collections import UserList

class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist

    @staticmethod
    def flatten():
        # recursive function
        ...
Basically, @staticmethod allows you to make any function a method that can be called either on a class or on an instance of that class (an object). So you can do this:
arr = Array([1, [2, 3]])
arr.flatten()
as well as this:
Array.flatten()
Here is some further reference from the Python docs: https://docs.python.org/3/library/functions.html#staticmethod
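To make the pattern concrete (this sketch is mine, not from the answer above): the static helper still needs the data passed in explicitly, so it is usually paired with a thin instance method that delegates to it:

```python
from collections import UserList

class Array(UserList):
    @staticmethod
    def _flatten(lst):
        # Recursive static helper: no self needed, the data arrives as an argument.
        result = []
        for item in lst:
            if isinstance(item, list):
                result.extend(Array._flatten(item))
            else:
                result.append(item)
        return result

    def flatten(self):
        # Thin instance-facing wrapper keeps the public signature clean.
        self.data = Array._flatten(self.data)

a = Array([1, [2, [3, 4]], 5])
a.flatten()
print(a)  # [1, 2, 3, 4, 5]
```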
This feels like a fairly simple concept I'm trying to do.
Just as an example:
Say I have a list [1, 2, 3, 4]
That changes to [2, 3, 4, 1]
I need to be able to identify the change so that I can represent and update the data in JSON without updating the entire list.
Bit of background - This is for use in MIDI, the actual lists can be quite a bit longer than this, and the JSON can be nested with varying complexity. There also may be more than a single change occurring at once. It's not going to be possible to update the entire JSON or nested lists due to time complexity. I am doing it this way currently but in order to expand I need to be able to identify when a specific change occurs and have some way of representing this. It needs to be doable in Python 2 WITHOUT any external packages as it's being used in a Python installation that's embedded within a DAW (Ableton Live).
Does anyone know of anything that may help with this problem? Any help or reading material would be greatly appreciated.
EDIT:
I've tried looping over both lists and comparing the values, but for a change like the rotation above this reports every position as changed, which is no faster than just resending the whole list. It is potentially much slower, since I run two nested for loops first and THEN still send the entire list out over MIDI.
How about this: make a class that tracks its changes. For example:
# from collections.abc import MutableSequence  # this for python 3.3+
from collections import MutableSequence

class TrackingList(MutableSequence):
    """list that tracks its changes"""

    def __init__(self, iterable=()):
        self.data = list(iterable)
        self.changes = []

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __setitem__(self, index, value):
        self.data[index] = value
        self.changes.append(("set", index, value))

    def __delitem__(self, index):
        del self.data[index]
        self.changes.append(("del", index))

    def insert(self, index, value):
        self.data.insert(index, value)
        self.changes.append(("insert", index, value))

    def __str__(self):
        return str(self.data)
example use
>>> tl=TrackingList([1,2,3,4])
>>> print(tl)
[1, 2, 3, 4]
>>> tl.changes
[]
>>> tl[0],tl[-1] = tl[-1],tl[0]
>>> print(tl)
[4, 2, 3, 1]
>>> tl.changes
[('set', 0, 4), ('set', -1, 1)]
>>> tl.append(32)
>>> tl.changes
[('set', 0, 4), ('set', -1, 1), ('insert', 4, 32)]
>>> print(tl)
[4, 2, 3, 1, 32]
collections.abc makes it easy to write container classes, and you get a bunch of methods for free; in the case of MutableSequence, those are append, reverse, extend, pop, remove, __iadd__, __contains__, __iter__, __reversed__, index, and count.
I am creating a data provider class that will hold data, perform transformations and make it available to other classes.
If the user creates an instance of this class and passes some data at instantiation, I would like to store it twice: once for all transformations and once as a copy of the original data. Let's assume the data itself has a copy method.
I am using the attrs package to create classes, but would also be interested in best approaches to this in general (perhaps there is a better way of getting what I am after?)
Here is what I have so far:
import attr

@attr.s
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """
    data = attr.ib(default=attr.Factory(list))
    data_copy = data.copy()

    def my_func(self, param1='all'):
        """Do something useful"""
        return param1
This doesn't work: AttributeError: '_CountingAttr' object has no attribute 'copy'
Nor can I call data_copy = self.data.copy(); that raises NameError: name 'self' is not defined.
The working equivalent without the attrs package would be:
class DataContainer(object):
    """Interface for managing data. Reads and writes data, acts as a provider to other classes.
    """
    def __init__(self, data):
        """Init method, saving the passed data and a backup copy"""
        self.data = data
        self.data_copy = data
EDIT:
As pointed out by @hynek, my simple init method above needs to be corrected to make an actual copy of the data, i.e. self.data_copy = data.copy(). Otherwise both self.data and self.data_copy would point to the same object.
You can do two things here.
The first one you've found yourself: you use __attrs_post_init__.
The second one is to have a default:
>>> import attr
>>> @attr.s
... class C:
...     x = attr.ib()
...     _x_backup = attr.ib()
...     @_x_backup.default
...     def _copy_x(self):
...         return self.x.copy()
>>> l = [1, 2, 3]
>>> i = C(l)
>>> i
C(x=[1, 2, 3], _x_backup=[1, 2, 3])
>>> i.x.append(4)
>>> i
C(x=[1, 2, 3, 4], _x_backup=[1, 2, 3])
JFTR, your example of

def __init__(self, data):
    self.data = data
    self.data_copy = data

is wrong, because you'd assign the same object twice, which means that modifying self.data also modifies self.data_copy and vice versa.
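An equivalent way to spell that default, if you prefer to keep it inline, is attr.Factory with takes_self=True (a real attrs feature; the class here is just for illustration):

```python
import attr

@attr.s
class C:
    x = attr.ib()
    # takes_self=True hands the partially initialized instance to the factory,
    # so the default can be computed from attributes declared before this one.
    _x_backup = attr.ib(
        default=attr.Factory(lambda self: self.x.copy(), takes_self=True))

i = C([1, 2, 3])
i.x.append(4)
print(i.x)          # [1, 2, 3, 4]
print(i._x_backup)  # [1, 2, 3]
```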
After looking through the documentation a little more deeply (scroll right to the bottom), I found that there is a kind of post-init hook for classes that are created by attrs.
You can just include a special __attrs_post_init__ method that can do the more complicated things one might want to do in an __init__ method, beyond simple assignment.
Here is my final working code:
In [1]: @attr.s
   ...: class DataContainer(object):
   ...:     """Interface for managing data. Reads and writes data,
   ...:     acts as a provider to other classes.
   ...:     """
   ...:
   ...:     data = attr.ib()
   ...:
   ...:     def __attrs_post_init__(self):
   ...:         """Perform additional init work on instantiation.
   ...:         Make a copy of the raw input data.
   ...:         """
   ...:         self.data_copy = self.data.copy()
In [2]: some_data = np.array([[1, 2, 3], [4, 5, 6]])

In [3]: foo = DataContainer(some_data)

In [4]: foo.data
Out[4]:
array([[1, 2, 3],
       [4, 5, 6]])

In [5]: foo.data_copy
Out[5]:
array([[1, 2, 3],
       [4, 5, 6]])
Just to be doubly sure, I checked to see that the two attributes are not referencing the same object. In this case they are not, which is likely thanks to the copy method on the NumPy array.
In [6]: foo.data[0, 0] = 999

In [7]: foo.data
Out[7]:
array([[999,   2,   3],
       [  4,   5,   6]])

In [8]: foo.data_copy
Out[8]:
array([[1, 2, 3],
       [4, 5, 6]])
I'm trying to implement a matrix class for simple operations in plain Python (no numpy, etc.).
Here is part of it:
class Matrix(list):
    def __getitem__(self, item):
        try:
            return list.__getitem__(self, item)
        except TypeError:
            rows, cols = item
            return [row[cols] for row in self[rows]]
It allows doing things like this:
m = Matrix([[i+j for j in [0,1,2,3]] for i in [0,4,8,12]])
print(m[0:2, 0:2])
will print: [[0, 1], [4, 5]]
I also want to be able to add to/multiply all submatrix elements by a given value, like:
m[0:2, 0:2] += 1
print(m[0:2, 0:2])
should print: [[1, 2], [5, 6]]
Which magic methods should I implement to make this work?
First, inheriting from list is a bad move here. A matrix doesn't support the kinds of operations a list does; for example, you can't append to or extend a matrix, and item assignment is completely different. Your matrix should contain a list, not be a list.
As for what magic methods you need, m[0:2, 0:2] += 1 roughly translates to the following:
temp = m.__getitem__((slice(0, 2), slice(0, 2)))
temp = operator.iadd(temp, 1)
m.__setitem__((slice(0, 2), slice(0, 2)), temp)
where operator.iadd tries temp.__iadd__, temp.__add__, and (1).__radd__ to perform the addition.
You need to implement __getitem__ and __setitem__ to retrieve the submatrix and assign the new submatrix. Additionally, __getitem__ will need to return a matrix, rather than a list.
You should probably implement both __add__ and __iadd__; while __add__ alone would be sufficient for this case, __iadd__ will be necessary for operations like m += 1 to work in-place instead of replacing m with a new matrix object.
No: __iadd__ alone would do the trick if the statement were
m += 2
but here the operation is executed on m[0:2, 0:2]. You need to ensure that slicing your matrix returns another Matrix object rather than a plain list of lists, since a list of lists does not support elementwise __iadd__.