Custom Indexing Python Data Structure - python

I have a class that wraps the Python deque from collections. When I create a deque, x = deque(), and want to reference the first element...
In[78]: x[0]
Out[78]: 0
My question is: how can I use [] for indexing in the following wrapper class?
from collections import deque

class deque_wrapper:
    def __init__(self):
        self.data_structure = deque()

    def newCustomAddon(self, x):
        return len(self.data_structure)

    def __repr__(self):
        return repr(self.data_structure)
I.e., continuing from the above example:
In[75]: x[0]
Out[76]: TypeError: 'deque_wrapper' object does not support indexing
I want to customize my own referencing, is that possible?

You want to implement the __getitem__ method:
class DequeWrapper:
    def __init__(self):
        self.data_structure = deque()

    def newCustomAddon(self, x):
        return len(self.data_structure)

    def __repr__(self):
        return repr(self.data_structure)

    def __getitem__(self, index):
        # etc
Whenever you do my_obj[x], Python will actually call my_obj.__getitem__(x).
You may also want to consider implementing the __setitem__ method, if applicable. (When you write my_obj[x] = y, Python will actually run my_obj.__setitem__(x, y).)
The documentation on the Python data model contains more information on which methods you need to implement in order to build custom data structures in Python.
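For example, a minimal sketch that simply delegates indexing (and length) to the wrapped deque could look like this:
from collections import deque

class DequeWrapper:
    def __init__(self):
        self.data_structure = deque()

    def __repr__(self):
        return repr(self.data_structure)

    def __getitem__(self, index):
        # delegate reads to the wrapped deque
        return self.data_structure[index]

    def __setitem__(self, index, value):
        # delegate writes to the wrapped deque
        self.data_structure[index] = value

    def __len__(self):
        return len(self.data_structure)

x = DequeWrapper()
x.data_structure.append(42)
print(x[0])    # 42
x[0] = 7
print(len(x))  # 1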

Related

Create Nothing from falsey values using Returns library

Using the Returns library, I have a function that filters a list. I want it to return Nothing if the list is empty (i.e. falsey) or Some([...]) if the list has values.
Maybe seems to be focused mostly on a "true" nothing, i.e. None. But I'm wondering if there's a way to get Nothing from a falsey value without doing something like:
from returns.maybe import Nothing, Some

data = []
result = Some(data) if len(data) > 0 else Nothing
It looks like you have at least a few options: (1) create a new class that inherits from Maybe and override any methods you like, (2) create a simple function that returns Nothing if data is falsey and otherwise returns Maybe.from_optional(data) (or whichever Maybe constructor you prefer), or (3) create your own container as per the returns documentation at https://returns.readthedocs.io/en/latest/pages/create-your-own-container.html.
Here is a class called Possibly that inherits from Maybe and overrides the from_optional class method. You can add similar overrides for other methods following this pattern.
from typing import Optional

from returns.maybe import Maybe, _NewValueType, _Nothing, Some

class Possibly(Maybe):
    def __init__(self):
        super().__init__()

    @classmethod
    def from_optional(
        cls, inner_value: Optional[_NewValueType],
    ) -> 'Maybe[_NewValueType]':
        """
        Creates new instance of ``Maybe`` container based on an optional value.
        """
        if not inner_value or inner_value is None:
            return _Nothing(inner_value)
        return Some(inner_value)

data = [1, 2, 3]
empty_data = []

print(Possibly.from_optional(data))
print(Possibly.from_optional(empty_data))
Here are two equivalent functions:
from returns.maybe import Maybe, _Nothing

data = [1, 2, 3]
empty_data = []

def my_from_optional(anything):
    if not anything:
        return _Nothing(anything)
    else:
        return Maybe.from_optional(anything)

def my_from_optional(anything):
    return Maybe.from_optional(anything) if anything else _Nothing(anything)

print(my_from_optional(data))
print(my_from_optional(empty_data))

PySpark applyInPandas/grouped_map pandas_udf too many arguments

I'm trying to use the pyspark applyInPandas in my python code. Problem is, the function that I want to pass to it exists in the same class, and so it is defined as def func(self, key, df). This becomes an issue because applyInPandas will error out saying I'm passing too many arguments to the underlying func (at most it allows a key and df params, so the self is causing the issue). Is there any way around this?
The underlying goal is to process a pandas function on dataframe groups in parallel.
As OP mentioned, one way is to just use @staticmethod, which may not be desirable in some cases.
The pyspark source code for creating pandas_udf uses inspect.getfullargspec().args (lines 386, 436), and this includes self even when the method is called from an instance. I would think this is a bug on their part (maybe worthwhile to raise a ticket).
To overcome this, the easiest way is to use functools.partial, which can help change the argspec, i.e. remove the self argument and restore the number of args to 2.
This is based on the idea that calling an instance method is the same as calling the method directly from the class and supply the instance as the first argument (because of the descriptor magic):
A.func(A(), *args, **kwargs) == A().func(*args, **kwargs)
As a concrete example:
import functools
import inspect

class A:
    def __init__(self, y):
        self.y = y

    def sum(self, a: int, b: int):
        return (a + b) * self.y

    def x(self):
        # calling the method via the class and supplying the self argument explicitly
        f = functools.partial(A.sum, self)
        print(f(1, 2))
        print(inspect.getfullargspec(f).args)

A(2).x()
This will print
6 # can still use 'self.y'
['a', 'b'] # 2 arguments (without 'self')
Then, in OP's case, one can simply do the same for key, df parameters:
class A:
    def __init__(self):
        ...

    def func(self, key, df):
        ...

    def x(self):
        f = functools.partial(A.func, self)
        self.df.groupby(...).applyInPandas(f)
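For completeness, here is a self-contained sketch of the same pattern; the Processor class, column names, factor and schema string below are illustrative assumptions, not taken from OP's code:
import functools

import pandas as pd
from pyspark.sql import SparkSession

class Processor:
    def __init__(self, factor):
        self.factor = factor

    def func(self, key, pdf: pd.DataFrame) -> pd.DataFrame:
        # uses instance state, so a plain @staticmethod would not work here
        out = pdf.copy()
        out["value"] = out["value"] * self.factor
        return out

    def run(self, sdf):
        # bind self so the wrapped callable only exposes (key, pdf)
        f = functools.partial(Processor.func, self)
        return sdf.groupby("group").applyInPandas(f, schema="group string, value double")

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([("a", 1.0), ("a", 2.0), ("b", 3.0)], ["group", "value"])
Processor(factor=10).run(sdf).show()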

apply python class methods on list of instances

I recently moved from Matlab to Python and want to transfer some Matlab code to Python. However an obstacle popped up.
In Matlab you can define a class with its methods and create nd-arrays of instances. The nice thing is that you can apply the class methods to the array of instances, as long as the method is written so it can deal with arrays. In Python I found that this is not possible: when applying a class method to a list of instances it will not find the class method. Below is an example of how I would write the code:
class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)

classlist = [testclass(1), testclass(10), testclass(100)]
times5(classlist)
This will give an error on the times5(classlist) line. Now this is a simple example explaining what I want to do (the final class will have multiple numpy arrays as variables).
What is the best way to get this kind of functionality in Python? The reason I want to do this is because it allows batch operations and they make the class a lot more powerful. The only solution I can think of is to define a second class that has a list of instances of the first class as variables. The batch processing would need to be implemented in the second class then.
thanks!
UPDATE:
In your comment, I noticed this sentence:
For example a function that takes the data of the first class in the list and subtracts the data of all following classes.
This can be solved with the reduce function.
from functools import reduce

class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)

classlist = [x.data for x in [testclass(1), testclass(10), testclass(100)]]
result = reduce(lambda x, y: x - y, classlist[1:], classlist[0])
print(result)
ORIGINAL ANSWER:
In fact, what you need is a list comprehension.
Here is the code:
class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)

classlist = [testclass(1), testclass(10), testclass(100)]
results = [x.times5() for x in classlist]
print(results)
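Note that print(results) will show the default object representations (something like <__main__.testclass object at 0x...>). If you want readable output, a small sketch adding a __repr__ to testclass would do it:
class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)

    def __repr__(self):
        return 'testclass({})'.format(self.data)

classlist = [testclass(1), testclass(10), testclass(100)]
print([x.times5() for x in classlist])   # [testclass(5), testclass(50), testclass(500)]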

How to implement a list of references in python?

I'm trying to model a collection of objects in python (2). The collection should make a certain attribute (an integer, float or any immutable object) of the objects available via a list interface.
(1)
>>> print (collection.attrs)
[1, 5, 3]
>>> collection.attrs = [4, 2, 3]
>>> print (object0.attr == 4)
True
I especially expect this list interface in the collection to allow for reassigning a single object's attribute, e.g.
(2)
>>> collection.attrs[2] = 8
>>> print (object2.attr == 8)
True
I am sure this is a quite frequently occurring situation; unfortunately I was not able to find a satisfying answer on how to implement it on Stack Overflow / Google etc.
Behind the scenes, I expect the object.attr to be implemented as a mutable object. Somehow I also expect the collection to hold a "list of references" to the object.attr and not the respectively referenced (immutable) values themselves.
I would appreciate suggestions on how to solve this in an elegant and flexible way.
A possible implementation that allows for (1) but not for (2) is
class Component(object):
    """One of many components."""
    def __init__(self, attr):
        self.attr = attr

class System(object):
    """One System object contains and manages many Component instances.

    System is the main interface to adjusting the components.
    """
    def __init__(self, attr_list):
        self._components = []
        for attr in attr_list:
            new = Component(attr)
            self._components.append(new)

    @property
    def attrs(self):
        # !!! this breaks (2):
        return [component.attr for component in self._components]

    @attrs.setter
    def attrs(self, new_attrs):
        for component, new_attr in zip(self._components, new_attrs):
            component.attr = new_attr
The !!! line breaks (2) because we create a new list whose entries are references to the values of all Component.attr and not references to the attributes themselves.
Thanks for your input.
TheXMA
Just add another proxy in between:
class _ListProxy:
    def __init__(self, system):
        self._system = system

    def __getitem__(self, index):
        return self._system._components[index].attr

    def __setitem__(self, index, value):
        self._system._components[index].attr = value

class System:
    ...

    @property
    def attrs(self):
        return _ListProxy(self)
You can make the proxy fancier by implementing all the other list methods, but this is enough for your use-case.
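A quick usage sketch (assuming the Component and System classes from the question, with the attrs property replaced by the proxy version above):
system = System([1, 5, 3])

print(system.attrs[0])              # 1, read through _ListProxy.__getitem__
system.attrs[2] = 8                 # written through _ListProxy.__setitem__
print(system._components[2].attr)   # 8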
@filmor thanks a lot for your answer, this solves the problem perfectly! I made it a bit more general:
class _ListProxy(object):
    """Is a list of object attributes. Accessing _ListProxy entries
    evaluates the object attributes each time it is accessed,
    i.e. this list "proxies" the object attributes.
    """
    def __init__(self, list_of_objects, attr_name):
        """Provide a list of object instances and a name of a commonly
        shared attribute that should be proxied by this _ListProxy
        instance.
        """
        self._list_of_objects = list_of_objects
        self._attr_name = attr_name

    def __getitem__(self, index):
        return getattr(self._list_of_objects[index], self._attr_name)

    def __setitem__(self, index, value):
        setattr(self._list_of_objects[index], self._attr_name, value)

    def __repr__(self):
        return repr(list(self))

    def __len__(self):
        return len(self._list_of_objects)
Are there any important list methods missing?
And what if I want some of the components (objects) to be garbage collected?
Do I need to use something like a WeakList to prevent memory leakage?

How to implement an array-like property wrapper in python?

I have a class in Python that acts as a front-end to a C library. This library performs simulations and handles very large arrays of data. It passes forward a ctypes array, and my wrapper converts it into a proper numpy.ndarray.
import numpy

class SomeClass(object):
    @property
    def arr(self):
        return numpy.array(self._lib.get_arr())
However, in order to make sure that memory problems don't occur, I keep the ndarray data separate from the library data, so changing the ndarray does not cause a change in the true array being used by the library. I can, however, pass along a new array of the same shape and overwrite the library's held array.
    @arr.setter
    def arr(self, new_arr):
        self._lib.set_arr(new_arr.ctypes)
So, I can interact with the array like so:
x = SomeClass()
a = x.arr
a[0] += 1
x.arr = a
My desire is to simplify this even more by allowing the syntax to simply be x.arr[0] += 1, which would be more readable and involve fewer variables. I am not exactly sure how to go about creating such a wrapper (I have very little experience making wrapper classes/functions) that mimics a property but allows item access as in my example.
How would I go about making such a wrapper class? Is there a better way to accomplish this goal? If you have any advice or reading that could help I would appreciate it very much.
This could work. Array is a proxy for the Numpy/C array:
import numpy

class Array(object):
    def __init__(self):
        # self._lib = ...
        self.np_array = numpy.array(self._lib.get_arr())

    def __getitem__(self, key):
        self.np_array = numpy.array(self._lib.get_arr())
        return self.np_array.__getitem__(key)

    def __setitem__(self, key, value):
        self.np_array.__setitem__(key, value)
        self._lib.set_arr(self.np_array.ctypes)  # push the change back to the library

    def __getattr__(self, name):
        """Delegate to NumPy array."""
        try:
            return getattr(self.np_array, name)
        except AttributeError:
            raise AttributeError(
                "'Array' object has no attribute {}".format(name))
Should behave like this:
>>> a = Array()
>>> a[1]
1
>>> a[1] = 10
>>> a[1]
10
The 10 should end up in your C array too.
I think your descriptor should return an instance of a list-like class that knows about self._lib and updates it during normal operations (append, __setitem__, __getitem__, etc.).
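That idea could look roughly like this (a sketch only, reusing the hypothetical get_arr()/set_arr() interface of self._lib from the question):
import numpy

class _ArrView(object):
    """List-like view that reads from and writes through to the library's array."""
    def __init__(self, lib):
        self._lib = lib

    def __getitem__(self, key):
        # always re-read from the library so the view never goes stale
        return numpy.array(self._lib.get_arr())[key]

    def __setitem__(self, key, value):
        arr = numpy.array(self._lib.get_arr())
        arr[key] = value
        self._lib.set_arr(arr.ctypes)  # write the modified array back

class SomeClass(object):
    @property
    def arr(self):
        return _ArrView(self._lib)
With that in place, x.arr[0] += 1 turns into a __getitem__ followed by a __setitem__ on the view, so the update reaches the library without keeping a separate copy of the data alive.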
