I have been thinking about a class which could be useful for list transformations.
Here is my current implementation:
class ListTransform(object):
    """Specs: stores original list + transformations.
    Transformations are stored in a list.
    Every transformation is a func call, with
    one parameter, transformations are done in place.
    """
    def __init__(self, _list):
        self.orig_list = _list
        self.reset()
    def addtransform(self,t):
        self.transforms.append(t)
    def reset(self, ts = []):
        self.transforms = ts
    def getresult(self):
        li = self.orig_list[:] # start from a copy from the original
        # call all the in-place transform functions in order
        for transform in self.transforms:
            transform(li)
        return li
def pick_transform(pickindexes):
    """Only includes elements with specific indexes
    """
    def pt(li):
        newli = []
        for idx in pickindexes:
            newli.append(li[idx])
        del li[:] # clear all the elements
        li.extend(newli)
    return pt
def map_transform(fn_for_every_element):
    """Creates a transformation, which will call a specific
    function for every element in a list
    """
    def mt(li):
        newli = map(fn_for_every_element, li)
        del li[:] # clear
        li.extend(newli)
    return mt
# example:
# the object which stores the original list and the transformations
li = ListTransform([0,10,20,30,40,50,60,70,80,90])
# transformations
li.addtransform(map_transform(lambda x: x + (x/10)))
li.addtransform(pick_transform([5,6,7]))
# getting result, prints 55, 66, 77
print li.getresult()
This works well, however, the feeling of implementing something in a substandard manner bothers me.
Which Python features that I haven't used would you apply in this implementation? How would you improve the overall design and the ideas behind this class? How would you improve the code?
Also, since reinventing the wheel feels awkward: what are the standard tools replacing this class?
Thanks.
Having a general scope and not a particular use case in mind, I would look at this in a more "functional" way:
Don't make the transformations in place -- rather return new lists. This is how standard functions in functional programming work (and also map(), filter() and reduce() in Python).
Concentrate on the transformations rather than on the data. In particular, I would not create a class like your ListTransform at all, but rather only have some kind of transformation objects that can be chained.
To code this having functional programming in mind, the transforms would simply be functions, just like in your design. All you would need in addition is some kind of composition for the transforms:
def compose(f, g):
    return lambda lst: f(g(lst))
(For the sake of simplicity the given implementation has only two parameters instead of an arbitrary number.) Your example would now be very simple:
from functools import partial
map_transform = partial(map, lambda x: x + (x/10))
pick_transform = lambda lst: [lst[i] for i in (5,6,7)]
transform = compose(pick_transform, map_transform)
print transform([0,10,20,30,40,50,60,70,80,90])
# [55, 66, 77]
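As a sketch of the arbitrary-arity version mentioned above, the composition can be folded over any number of functions with functools.reduce. This is written for Python 3, so the example uses // for integer division and list comprehensions instead of relying on map returning a list. The name compose_all is an assumption made for illustration and is not part of the original answer.

```python
from functools import reduce

def compose_all(*funcs):
    """Compose right to left: compose_all(f, g, h)(x) == f(g(h(x)))."""
    def composed(value):
        # Apply the last function first, then work back to the first one.
        return reduce(lambda acc, fn: fn(acc), reversed(funcs), value)
    return composed

map_transform = lambda lst: [x + x // 10 for x in lst]
pick_transform = lambda lst: [lst[i] for i in (5, 6, 7)]

transform = compose_all(pick_transform, map_transform)
result = transform([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
print(result)  # [55, 66, 77]
```

With no functions at all, compose_all() returns the identity transform, which makes it convenient as a starting point for building pipelines incrementally.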
An alternative would be to implement the transforms as classes instead of functions.
Do not use an empty list as default argument. Use None and test for it:
def some_method(self, arg=None):
    if arg is None:
        arg = []
    do_your_thing_with(arg)
It's a well-known Python beginner pitfall.
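To see why this matters for the reset() method in the question, here is a minimal sketch in Python 3. The class names Buggy and Fixed are hypothetical and only mirror the relevant part of ListTransform. The default list is created once, when the function is defined, so every instance that falls back to the default ends up sharing it.

```python
class Buggy:
    def __init__(self):
        self.reset()

    def reset(self, ts=[]):      # the default list is created once, at def time
        self.transforms = ts

class Fixed:
    def __init__(self):
        self.reset()

    def reset(self, ts=None):    # None is a safe sentinel
        self.transforms = [] if ts is None else ts

a, b = Buggy(), Buggy()
a.transforms.append("double")
print(b.transforms)   # ['double'], because b shares a's list

c, d = Fixed(), Fixed()
c.transforms.append("double")
print(d.transforms)   # []
```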
You could extend the list class itself, and apply the transforms lazily as the elements are needed. Here is a short implementation - it does not allow for index manipulation on the transforms, but you can apply any mapping transform in a stack.
class ListTransform(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self.transforms = []
    def __getitem__(self, index):
        return reduce(lambda item, t: t(item), self.transforms, list.__getitem__(self, index))
    def __iter__(self):
        for index in xrange(len(self)):
            yield self[index]
    def __repr__(self):
        return "'[%s]'" % ", ".join(repr(item) for item in self)
    __str__ = lambda s: repr(s).strip("'")
And you are ready to go:
>>> a = ListTransform( range(10))
>>> a
'[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> a.transforms.append(lambda x: 2 * x)
>>> a
'[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]'
>>> a.transforms.append(lambda x: x + 5)
>>> a
'[5, 7, 9, 11, 13, 15, 17, 19, 21, 23]'
>>> a.append(0)
>>> a
'[5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 5]'
Ok - I may have overreached with the "reduce" call in the getitem method - but that is the fun part. :-)
Feel free to rewrite it in more lines for readability:
def __getitem__(self, index):
    item = list.__getitem__(self, index)
    for t in self.transforms:
        item = t(item)
    return item
If you like the idea, you could include a "filter" member to create filtering functions for the items, and check for the number of parameters on the transforms to allow them to work with indexes, and even reach other list items.
Here is what I want to achieve:
I have a class MyData that holds some sort of data.
Now in class A I store a sequence of data with attribute name data_list and type List[MyData].
Suppose MyData instances are data with different type index. Class A is a management class. I need A to hold all the data to implement sampling uniformly from all data.
But some other operations that are type-specific also need to be done. So a base class B and derived classes B1, B2, ... are designed to account for each type of data. An instance of class A has a list of B instances as a member, each storing the data points of one type. Code that illustrates this: B.data_list = A.data_list[start_index:start_index+offset].
A has methods that return some of the data, and B has methods that may modify some of the data.
Now here is the problem: I need to pass data by reference, so that any modification by member function of B is also visible from the side of A.
If I use python builtin List to store data, modifications by B won't be visible for A. I did some experiment using np.array(data_list, dtype=object), it seemed to work. But I'm not familiar with such kind of usage and not sure if it works for data of any type, and whether there will be performance concerns, etc.
Any suggestions or alternatives? Thanks!!
Illustrating code:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length=data_count // n_segment
        self.segments = [self.data_list[segment_length*i:segment_length*(i+1)] for i in range(n_segment)]
        self.Bs = [B(segment) for segment in self.segments]
    def __getitem__(self, item):
        return self.data_list[item]
class B:
    def __init__(self, data_list):
        self.data_list = data_list
    def modify(self, index, data):
        self.data_list[index]=data
A_data_list = [1,2,3,4,5,6,7,8,9]
A_instance = A(A_data_list, n_segment=3)
print(A_instance[0]) # get 1
A_instance.Bs[0].modify(0,2) # modify A[0] to be 2
print(A_instance[0]) # still get 1
Note that in the above example changing A_data_list to numpy array will solve my problem, but in my case elements in list are objects which cannot be stacked into numpy arrays.
In class A, the segments are all copies of portions of data_list. Thus, so are Bs items. When you try to modify values, A.Bs are modified, but not the corresponding elements in A.data_list.
With NumPy, basic slicing returns views onto the same memory instead of copies, so when a value is modified through a slice, it affects both A.Bs and A.data_list. Relying on that is still bad form, though.
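The copy made by list slicing is a shallow one, which is worth spelling out because the question stores objects rather than plain numbers. A minimal sketch in Python 3, where the Box class is a hypothetical stand-in for MyData: mutating an object reached through the slice is visible from the original list, while replacing a slot in the slice is not.

```python
class Box:
    """Hypothetical stand-in for MyData."""
    def __init__(self, value):
        self.value = value

data = [Box(1), Box(2), Box(3), Box(4)]
segment = data[0:2]          # a new list holding the same Box objects

segment[0].value = 99        # mutates the shared object: visible through data
print(data[0].value)         # 99

segment[1] = Box(-1)         # rebinds a slot in the copy only: not visible
print(data[1].value)         # 2
```

So if B only ever mutates MyData instances in place, plain slices may already be enough. Replacing whole elements is the case that needs a shared reference to the parent list.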
Here is how to fix your classes so that the proper values are modified:
class A:
    def __init__(self, data_list, n_segment):
        self.data_list = data_list
        data_count = len(data_list)
        segment_length = data_count // n_segment
        r = range(0, (n_segment + 1) * segment_length, segment_length)
        slices = [slice(i, j) for i, j in zip(r, r[1:])]
        self.Bs = [B(self.data_list, slice_) for slice_ in slices]
    def __getitem__(self, item):
        return self.data_list[item]
class B:
    def __init__(self, data_list, slice_):
        self.data_list = data_list
        self.data_slice = slice_
    def modify(self, index, data):
        a_ix = list(range(*self.data_slice.indices(len(self.data_list))))[index]
        self.data_list[a_ix] = data
Test:
A_data_list = [1,2,3,4,5,6,7,8,9]
a = A(A_data_list, n_segment=3)
>>> a[0]
1
a.Bs[0].modify(0, 2) # modify A[0] to be 2
>>> a[0]
2
a.Bs[1].modify(1, -5)
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, 9],
... }
a.Bs[2].modify(-1, -1) # modify last element of segment #2
>>> vars(a)
{'data_list': [2, 2, 3, 4, -5, 6, 7, 8, -1],
... }
>>> a.Bs[0].modify(3, 0)
IndexError: ... list index out of range
Note: This updated answer would also deal with arbitrary slices, including, hypothetically, ones with a step greater than 1.
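To illustrate the stepped-slice case, here is a small sketch in Python 3 of the same idea packaged as a reusable window onto a parent list. SegmentView is a hypothetical name, not something from the answer above. It relies on the fact that range objects in Python 3 support indexing directly, so no intermediate list is needed.

```python
class SegmentView:
    """Hypothetical helper: a writable window onto a parent list."""
    def __init__(self, parent, slice_):
        self.parent = parent
        self.slice_ = slice_

    def _indices(self):
        # Recomputed on every access so the view follows changes in length.
        return range(*self.slice_.indices(len(self.parent)))

    def __len__(self):
        return len(self._indices())

    def __getitem__(self, index):
        return self.parent[self._indices()[index]]

    def __setitem__(self, index, value):
        # Indexing the range raises IndexError for out-of-window positions.
        self.parent[self._indices()[index]] = value

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
evens = SegmentView(data, slice(0, None, 2))   # parent positions 0, 2, 4, 6, 8
evens[1] = -3
evens[-1] = 0
print(data)  # [1, 2, -3, 4, 5, 6, 7, 8, 0]
```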
I am creating a class that inherits from collections.UserList that has some functionality very similar to NumPy's ndarray (just for exercise purposes). I've run into a bit of a roadblock regarding recursive functions involving the modification of class attributes:
Let's take the flatten method, for example:
class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist
    def flatten(self):
        # recursive function
        ...
Above, you can see that there is a singular parameter in the flatten method, being the required self parameter. Ideally, a recursive function should take a parameter which is passed recursively through the function. So, for example, it might take a lst parameter, making the signature:
Array.flatten(self, lst)
This solves the problem of having to set lst to self.data, which consequently will not work recursively, because self.data won't be changed. However, having that parameter in the function is going to be ugly in use and hinder the user experience of an end user who may be using the function.
So, this is the solution I've come up with:
def flatten(self):
    self.data = self.__flatten(self.data)
def __flatten(self, lst):
    ...
    return result
Another solution could be to nest __flatten in flatten, like so:
def flatten(self):
    def __flatten(lst):
        ...
        return result
    self.data = __flatten(self.data)
However, I'm not sure if nesting would be the most readable as flatten is not the only recursive function in my class, so it could get messy pretty quickly.
Does anyone have any other suggestions? I'd love to know your thoughts, thank you!
A recursive method need not take any extra parameters that are logically unnecessary for the method to work from the caller's perspective; the self parameter is enough for recursion on a "child" element to work, because when you call the method on the child, the child is bound to self in the recursive call. Here is an example:
from itertools import chain
class MyArray:
    def __init__(self, data):
        self.data = [
            MyArray(x) if isinstance(x, list) else x
            for x in data]
    def flatten(self):
        return chain.from_iterable(
            x.flatten() if isinstance(x, MyArray) else (x,)
            for x in self.data)
Usage:
>>> a = MyArray([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> list(a.flatten())
[1, 2, 3, 4, 5, 6, 7, 8]
Since UserList is an iterable, you can use a helper function to flatten nested iterables, which can deal likewise with lists and Array objects:
from collections import UserList
from collections.abc import Iterable
def flatten_iterable(iterable):
    for item in iterable:
        if isinstance(item, Iterable):
            yield from flatten_iterable(item)
        else:
            yield item
class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist
    def flatten(self):
        self.data = list(flatten_iterable(self.data))
a = Array([[1, 2], [3, 4]])
a.flatten(); print(a) # prints [1, 2, 3, 4]
b = Array([Array([1, 2]), Array([3, 4])])
b.flatten(); print(b) # prints [1, 2, 3, 4]
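One caveat with testing for Iterable: strings are iterable too, and a one-character string yields itself when iterated, so the helper above recurses without end if the data contains any str. A minimal variant that treats str and bytes as atoms could look like this. Which types count as atoms is a design choice, and this particular choice is an assumption rather than part of the answer above.

```python
from collections.abc import Iterable

def flatten_iterable(iterable):
    for item in iterable:
        # Treat text and bytes as single values rather than sequences.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_iterable(item)
        else:
            yield item

flat = list(flatten_iterable([1, ["ab", [2, "c"]], (3, 4)]))
print(flat)  # [1, 'ab', 2, 'c', 3, 4]
```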
barrier of abstraction
Write array as a separate module. flatten can be generic like the example implementation here. This differs from a_guest's answer in that only lists are flattened, not all iterables. This is a choice you get to make as the module author -
# array.py
from collections import UserList
def flatten(t): # generic function
    if isinstance(t, list):
        for v in t:
            yield from flatten(v)
    else:
        yield t
class array(UserList):
    def flatten(self):
        return list(flatten(self.data)) # specialization of generic function
why modules are important
Don't forget you are the module user too! You get to reap the benefits from both sides of the abstraction barrier created by the module -
As the author, you can easily expand, modify, and test your module without worrying about breaking other parts of your program
As the user, you can rely on the module's features without having to think about how the module is written or what the underlying data structures might be
# main.py
from array import array
t = array([1,[2,3],4,[5,[6,[7]]]]) # <- what is "array"?
print(t.flatten())
[1, 2, 3, 4, 5, 6, 7]
As the user, we don't have to answer "what is array?" anymore than you have to answer "what is dict?" or "what is iter?" We use these features without having to understand their implementation details. Their internals may change over time, but if the interface stays the same, our programs will continue to work without requiring change.
reusability
Good programs are reusable in many ways. See Python's built-in functions for proof of this, or see the guiding principles of the Unix philosophy -
Write programs that do one thing and do it well.
Write programs to work together.
If you wanted to use flatten in other areas of our program, we can reuse it easily -
# otherscript.py
from array import flatten
result = flatten(something)
Typically, all methods of a class have at least one argument which is called self in order to be able to reference the actual object this method is called on.
If you don't need self in your function, but you still want to include it in a class, you can use @staticmethod and just include a normal function like this:
class Array(UserList):
    def __init__(self, initlist):
        self.data = initlist
    @staticmethod
    def flatten():
        # recursive function
        ...
Basically, @staticmethod allows you to make any function a method that can be called on a class or an instance of a class (object). So you can do this:
arr = Array()
arr.flatten()
as well as this:
Array.flatten()
Here is some further reference from the Python docs: https://docs.python.org/3/library/functions.html#staticmethod
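As a concrete sketch of how a static method could fit the flatten case from the question: the recursive helper takes the list to flatten as its only parameter, and the public method keeps the clean signature. The helper name _flatten and the choice to flatten only list and UserList items are assumptions made for this example.

```python
from collections import UserList

class Array(UserList):
    @staticmethod
    def _flatten(lst):
        # Recursive helper: needs no access to self, only to its argument.
        result = []
        for item in lst:
            if isinstance(item, (list, UserList)):
                result.extend(Array._flatten(item))
            else:
                result.append(item)
        return result

    def flatten(self):
        # Public method: no extra parameter for the caller to supply.
        self.data = self._flatten(self.data)

arr = Array([1, [2, [3, 4]], 5])
arr.flatten()
print(arr)                           # [1, 2, 3, 4, 5]
print(Array._flatten([[1], [2]]))    # [1, 2]
```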
I have a class MyList that inherits from list. When I pass a list instance, e.g. [0, 3, 5, 1], to MyList, how can I construct MyList so that it avoids a copy and self is a no-copy reference to the contents of the other list?
I have tried with:
other.__class__ = MyList : gives TypeError
and with
super(MyList, cls).__new__(other) : gives TypeError
and with
super(MyList, other) : gives TypeError
lastly with
self[:] = other[:] : gives id(self) != id(other)
Also simple MyList([0, 1, 3, 4]) would not solve problem when I do some operations in-place inside MyList.
class MyList(list):
    def __new__(cls, other):
        other.__class__ = MyList
        return other
    # add bunch of methods that work inplace on list
    def merge(self,):
        pass
    def sort(self,):
        pass
    def find(self, x):
        pass
    def nextNonMember(self, x):
        pass
Alternative way that I want to avoid is:
class MyNotSoFancyList(object):
    def __init__(self, other):
        self.list = other
I expect to have this behavior:
t = [0, 1, 3, 100, 20, 4]
o = MyList(t)
o.sort()
assert(t == o)
The question is probably not trivial for me, since I don't know Python at a "low" level. It seems it's not possible, so I wanted to ask whether someone knows some trick xD.
EDIT
Until now there was one hint in message to be deleted. Need some time to digest it, so will keep it here:
@RobertGRZELKA I think I kinda got to a conclusion with myself that this simply can't be done. As when you create an object of the class, it instantiates a new list in memory and references it. So if want to reference another list, there is no point in the new object. Bottom line I believe you will have to have the reference as an attribute of the class, implement your methods, and then override the list methods you are going to use so that they work on the referenced list. Tell me when you read that and I will delete this answer – Tomerikoo 2 hours ago
Try this
class MyList(list):
    def __init__(self,value):
        self.extend(value)
I don't really understand why you would want this unless you want to add more methods to the list object, but that should give you a list.
t = [0, 1, 3, 100, 20, 4]
o = MyList(t)
o.sort()
t.sort()
assert(t==o)
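For comparison, here is a sketch of the "reference as an attribute" idea from the quoted comment, using collections.UserList so that the list methods do not have to be forwarded by hand. UserList keeps its contents in self.data and its own constructor copies the input, so this sketch rebinds self.data to the original list afterwards. SharedList is a hypothetical name, and the trade-off is that instances are not subclasses of list.

```python
from collections import UserList

class SharedList(UserList):
    def __init__(self, other):
        super().__init__()
        self.data = other    # share the caller's list instead of copying it

    # in-place helper methods would go here and operate on self.data

t = [0, 1, 3, 100, 20, 4]
o = SharedList(t)
o.sort()                     # UserList.sort delegates to self.data.sort()
print(t)                     # [0, 1, 3, 4, 20, 100]
assert t == o
```

Because o.data is the very same object as t, changes made through either name are visible through the other, and no second t.sort() call is needed.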
Let's say we have a function add as follows
def add(x, y):
    return x + y
we want to apply map function for an array
map(add, [1, 2, 3], 2)
The semantics are I want to add 2 to every element of the array. But the map function requires a list in the third argument as well.
Note: I am putting the add example for simplicity. My original function is much more complicated. And of course option of setting the default value of y in add function is out of question as it will be changed for every call.
One option is a list comprehension:
[add(x, 2) for x in [1, 2, 3]]
More options:
a = [1, 2, 3]
import functools
map(functools.partial(add, y=2), a)
import itertools
map(add, a, itertools.repeat(2, len(a)))
The docs explicitly suggest this is the main use for itertools.repeat:
Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified. Used as argument to map() for invariant parameters to the called function. Also used with zip() to create an invariant part of a tuple record.
And there's no reason to pass len([1,2,3]) as the times argument; map stops as soon as the first iterable is consumed, so an infinite iterable is perfectly fine:
>>> from operator import add
>>> from itertools import repeat
>>> list(map(add, [1,2,3], repeat(4)))
[5, 6, 7]
In fact, this is equivalent to the example for repeat in the docs:
>>> list(map(pow, range(10), repeat(2)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
This makes for a nice lazy-functional-language-y solution that's also perfectly readable in Python-iterator terms.
Use a list comprehension.
[x + 2 for x in [1, 2, 3]]
If you really, really, really want to use map, give it an anonymous function as the first argument:
map(lambda x: x + 2, [1,2,3])
map can take multiple iterables; the standard way is
map(add, a, b)
In your question, it should be
map(add, a, [2]*len(a))
The correct answer is simpler than you think.
Simply do:
map(add, [(x, 2) for x in [1,2,3]])
And change the implementation of add to take a tuple i.e
def add(t):
    x, y = t
    return x+y
This can handle any complicated use case where both add parameters are dynamic.
Sometimes I resolved similar situations (such as using pandas.apply method) using closures
In order to use them, you define a function which dynamically defines and returns a wrapper for your function, effectively making one of the parameters a constant.
Something like this:
def add(x, y):
    return x + y
def add_constant(y):
    def f(x):
        return add(x, y)
    return f
Then, add_constant(y) returns a function which can be used to add y to any given value:
>>> add_constant(2)(3)
5
Which allows you to use it in any situation where parameters are given one at a time:
>>> map(add_constant(2), [1,2,3])
[3, 4, 5]
edit
If you do not want to have to write the closure function somewhere else, you always have the possibility to build it on the fly using a lambda function:
>>> map(lambda x: add(x, 2), [1, 2, 3])
[3, 4, 5]
If you have it available, I would consider using numpy. It's very fast for these types of operations:
>>> import numpy
>>> numpy.array([1,2,3]) + 2
array([3, 4, 5])
This is assuming your real application is doing mathematical operations (that can be vectorized).
If you really really need to use map function (like my class assignment here...), you could use a wrapper function with 1 argument, passing the rest to the original one in its body; i.e. :
extraArguments = value
def myFunc(arg):
    # call the target function
    return Func(arg, extraArguments)
map(myFunc, iterable)
Dirty & ugly, still does the trick
I believe starmap is what you need:
from itertools import starmap
def test(x, y, z):
    return x + y + z
list(starmap(test, [(1, 2, 3), (4, 5, 6)]))
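Applied to the add example from the question, starmap can be combined with zip and itertools.repeat so that the constant second argument is paired with every element. This is only a sketch of one way to build the argument tuples, written for Python 3.

```python
from itertools import repeat, starmap

def add(x, y):
    return x + y

# zip stops at the shortest input, so the infinite repeat(2) is safe here.
pairs = zip([1, 2, 3], repeat(2))      # (1, 2), (2, 2), (3, 2)
result = list(starmap(add, pairs))
print(result)  # [3, 4, 5]
```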
def func(a, b, c, d):
    return a + b * c % d
map(lambda x: func(*x), [[1,2,3,4], [5,6,7,8]])
By wrapping the function call with a lambda and using the star unpack, you can do map with arbitrary number of arguments.
You can include lambda along with map:
list(map(lambda a: a+2, [1, 2, 3]))
To pass multiple arguments to a map function.
def q(x,y):
    return x*y
print map (q,range(0,10),range(10,20))
Here q is function with multiple argument that map() calls.
Make sure that both ranges have the same length, i.e. that
len(range(a,a')) and len(range(b,b')) are equal.
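What happens when the lengths differ depends on the Python version. In Python 2, map pads the shorter sequence with None, which would make q fail on the multiplication. In Python 3, map simply stops at the shortest iterable, as this small sketch shows:

```python
def q(x, y):
    return x * y

# Python 3: the result has as many items as the shorter input.
result = list(map(q, range(0, 3), range(10, 20)))
print(result)  # [0, 11, 24]
```

If silently dropping the unmatched items would hide a bug, comparing the two lengths before calling map is a cheap safeguard.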
In :nums = [1, 2, 3]
In :map(add, nums, [2]*len(nums))
Out:[3, 4, 5]
Another option is:
results = []
for x in [1,2,3]:
    z = add(x,2)
    ...
    results += [f(z,x,y)]
This format is very useful when calling multiple functions.
#multi argument
def joke(r):
    if len(r)==2:
        x, y = r
        return x + y
    elif len(r)==3:
        x,y,z=r
        return x+y+z
#using map
print(list(map(joke,[[2,3],[3,4,5]])))
output = [5,12]
If the case is like the one above and you just want to use a function:
def add(x,y):
    ar =[]
    for xx in x:
        ar.append(xx+y)
    return ar
print(list(map(add,[[3,2,4]],[2]))[0])
output = [5,4,6]
Note: you can modify this as you want.
I am befuddled over what I think is a very simple and straight forward subclass of a list in Python.
Suppose I want all the functionality of list. I want to add several methods to the default set of a list.
The following is an example:
class Mylist(list):
    def cm1(self):
        self[0]=10
    def cm2(self):
        for i,v in enumerate(self):
            self[i]=self[i]+10
    def cm3(self):
        self=[]
    def cm4(self):
        self=self[::-1]
    def cm5(self):
        self=[1,2,3,4,5]
ml=Mylist([1,2,3,4])
ml.append(5)
print "ml, an instance of Mylist: ",ml
ml.cm1()
print "cm1() works: ",ml
ml.cm2()
print "cm2() works: ",ml
ml.cm3()
print "cm3() does NOT work as expected: ",ml
ml.cm4()
print "cm4() does NOT work as expected: ",ml
ml.cm5()
print "cm5() does NOT work as expected: ",ml
The output:
Mylist: [1, 2, 3, 4, 5]
cm1() works: [10, 2, 3, 4, 5]
cm2() works: [20, 12, 13, 14, 15]
cm3() does NOT work as expected: [20, 12, 13, 14, 15]
cm4() does NOT work as expected: [20, 12, 13, 14, 15]
cm5() does NOT work as expected: [20, 12, 13, 14, 15]
So it seems that a scalar assignment works as I expect and understand. List or slices do not work as I understand. By 'does not work,' I mean that the code in the method does not change the instance of ml as the first two methods do.
What do I need to do so that cm3() cm4() and cm5() work?
The problem here is that in cm3, cm4, and cm5, you are not modifying the object! You are creating a new one in the scope of the member function, and then assigning it to self. The outer scope doesn't respect this. In cm1 and cm2, you are modifying the same object, so the object stays the same.
Try using the id function to debug this:
def cm4(self):
    self=self[::-1]
    print 'DEBUG', id(self)
...
ml.cm4()
print 'DEBUG', id(ml)
You'll see that the id is different.
So, you might ask, well how do I do this? You are lucky that with lists you can assign into a slice. This might not be as easy with other data structures. What this does is keep the same list but replace the items. To do this, do:
self[:] = ...
So, for example:
self[:] = self[::-1]
Your misunderstanding is that there is something special about the word self. In those methods it is just a name in scope like any other name in python, so when you reassign it, you are just rebinding the name self to some other object - not mutating the parent object. In fact that argument doesn't even need to be named self, that is only a convention that python programmers use.
Here is reimplementation of your members to mutate properly:
def cm3(self):
    self[:] = []
def cm4(self):
    self.reverse()
def cm5(self):
    self[:] = [1,2,3,4,5]
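The difference between rebinding the name and mutating the object can be checked directly with id. Below is a minimal sketch in Python 3 syntax; the method names rebind and reverse_in_place are made up for the illustration.

```python
class Mylist(list):
    def rebind(self):
        # Only the local name self now points at a new list;
        # the caller's object is untouched.
        self = self[::-1]

    def reverse_in_place(self):
        # Slice assignment replaces the contents of the existing object.
        self[:] = self[::-1]

ml = Mylist([1, 2, 3])
before = id(ml)

ml.rebind()
print(ml)                  # [1, 2, 3]

ml.reverse_in_place()
print(ml)                  # [3, 2, 1]
print(id(ml) == before)    # True: still the same object
```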
Use self[:] instead of self. self = ... only rebinds the name self to another object; it does not change the list object.
def cm3(self):
    self[:]=[]
def cm4(self):
    self[:]=self[::-1]
def cm5(self):
    self[:]=[1,2,3,4,5]