Change the underlying data representation with the descriptor protocol - python

Suppose I have an existing class, for example doing some mathematical stuff:
import math
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def norm(self):
        return math.sqrt(math.pow(self.x, 2) + math.pow(self.y, 2))
Now, for some reason, I don't want Python to store the members x and y as ordinary attributes. I would rather have Python store them internally as strings, or in a dedicated buffer, perhaps for interoperability with some C code. So (for the string case) I built the following descriptor:
class MyStringMemory(object):
    def __init__(self, convert):
        self.convert = convert
    def __get__(self, obj, objtype):
        print('Read')
        return self.convert(self.prop)
    def __set__(self, obj, val):
        print('Write')
        self.prop = str(val)
    def __delete__(self, obj):
        print('Delete')
And I wrap the existing vector class in a new class where members x and y become MyStringMemory:
class StringVector(Vector):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    x = MyStringMemory(float)
    y = MyStringMemory(float)
Finally, some driving code:
v = StringVector(1, 2)
print(v.norm())
v.x, v.y = 10, 20
print(v.norm())
In the end, I have changed the internal representation of x and y to strings without any change to the original class, while keeping its full functionality.
I just wonder: will that concept work universally, or will I run into serious pitfalls? As I said, the main idea is to store the data in a specific buffer location that is later accessed by C code.
Edit: The intention behind this is as follows. Currently, I have a nicely working program in which some physical objects, all of type MyPhysicalObj, interact with each other. The code inside the objects is vectorized with Numpy. Now I'd also like to vectorize some code over all objects. For example, each object has an energy that is computed by complicated, vectorized, per-object code. Now I'd like to sum up all the energies. I can iterate over all objects and sum them up, but that's slow. So I'd rather have each object's energy property stored automatically in a globally predefined buffer, so that I can just use numpy.sum over that buffer.

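There is one pitfall with Python descriptors here.
A descriptor is a class attribute, so a single MyStringMemory object is shared by all StringVector instances. With your code, every instance therefore reads and writes the same value, stored in the prop attribute of the descriptors StringVector.__dict__['x'] and StringVector.__dict__['y'] respectively: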
v1 = StringVector(1, 2)
print('current StringVector "x": ', StringVector.__dict__['x'].prop)
v2 = StringVector(3, 4)
print('current StringVector "x": ', StringVector.__dict__['x'].prop)
print(v1.x)
print(v2.x)
will have the following output:
Write
Write
current StringVector "x": 1
Write
Write
current StringVector "x": 3
Read
3.0
Read
3.0
I suppose this is not what you want. To store a separate value on each object, make the following changes:
class MyNewStringMemory(object):
    def __init__(self, convert, name):
        self.convert = convert
        self.name = '_' + name
    def __get__(self, obj, objtype):
        print('Read')
        return self.convert(getattr(obj, self.name))
    def __set__(self, obj, val):
        print('Write')
        setattr(obj, self.name, str(val))
    def __delete__(self, obj):
        print('Delete')
class StringVector(Vector):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    x = MyNewStringMemory(float, 'x')
    y = MyNewStringMemory(float, 'y')
v1 = StringVector(1, 2)
v2 = StringVector(3, 4)
print(v1.x, type(v1.x))
print(v1._x, type(v1._x))
print(v2.x, type(v2.x))
print(v2._x, type(v2._x))
Output:
Write
Write
Write
Write
Read
Read
1.0 <class 'float'>
1 <class 'str'>
Read
Read
3.0 <class 'float'>
3 <class 'str'>
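Since Python 3.6 a descriptor can also learn the attribute name it is bound to through __set_name__, so the name does not have to be passed a second time. The following is only a minimal sketch of that variant, assuming the same string-backed storage; the names StringBacked and Vec are made up for illustration and the print calls are left out:

```python
class StringBacked:
    """Descriptor that stores each value as a string on the instance."""

    def __init__(self, convert):
        self.convert = convert

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created (Python 3.6+).
        self.name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: hand back the descriptor.
            return self
        return self.convert(getattr(obj, self.name))

    def __set__(self, obj, val):
        setattr(obj, self.name, str(val))


class Vec:  # hypothetical stand-in for StringVector
    x = StringBacked(float)
    y = StringBacked(float)

    def __init__(self, x, y):
        self.x = x
        self.y = y


v1 = Vec(1, 2)
v2 = Vec(3, 4)
print(v1.x, v2.x)    # each instance keeps its own value
print(repr(v1._x))   # the stored representation is a string
```

Returning the descriptor itself when obj is None is a common convention; it keeps class-level access such as Vec.x from failing.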
Also, you could certainly save the data in a centralized store from within the descriptor's __set__ method.
Refer to this document: https://docs.python.org/3/howto/descriptor.html
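To make the centralized-store idea concrete, here is a rough sketch in which the descriptor keeps one float per object in a single shared buffer. It uses the standard library's array module as a stand-in for a NumPy array, and the names BufferSlot, PhysicalObj and _index, as well as the slot-allocation scheme, are assumptions made for this example only (objects are never removed, for instance):

```python
from array import array


class BufferSlot:
    """Descriptor that keeps one float per instance in a shared buffer."""

    def __init__(self):
        # One contiguous buffer of C doubles shared by all instances.
        self.buffer = array('d')

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.buffer[obj._index]

    def __set__(self, obj, val):
        self.buffer[obj._index] = float(val)


class PhysicalObj:  # hypothetical stand-in for MyPhysicalObj
    energy = BufferSlot()
    _count = 0

    def __init__(self, energy):
        # Reserve this object's slot in the shared buffer (assumed scheme).
        self._index = PhysicalObj._count
        PhysicalObj._count += 1
        PhysicalObj.energy.buffer.append(0.0)
        self.energy = energy


objs = [PhysicalObj(e) for e in (1.5, 2.5, 4.0)]
objs[1].energy = 10.0

# Sum over the buffer directly instead of looping over the objects.
total = sum(PhysicalObj.energy.buffer)
print(total)
```

With NumPy one would presumably preallocate the array and call numpy.sum on it; the descriptor part stays the same.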

If you need a generic converter ('convert') like the one you wrote, this is the way to go.
The biggest downside will be performance when you need to create a lot of instances (I assume you might, since the class is called Vector). This will be slow, since instantiating Python classes is slow.
In that case you might consider using collections.namedtuple; its documentation shows a scenario similar to yours.
As a side note: if possible, why not create a dict holding the string representations of x and y in the __init__ method, and then keep using x and y as normal variables without all the converting?

Related

Why do we need to return Point object in __add__ function instead of just returning x and y?

I'm studying operator overloading in Python and came across this chunk of code. It is not clear to me why we return Point(x, y) in the __add__ function instead of just returning x and y.
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __str__(self):
        return "({0},{1})".format(self.x, self.y)
    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Point(x, y)  # here if we remove Point object and use return(x,y) it does not cause any errors
p1 = Point(1, 5)
p2 = Point(2, 5)
print(p1 + p2)
The (x,y) syntax creates a tuple object, while Point(x,y) creates an instance of the Point class and sets its x and y attributes.
There is a difference between these two types of Python objects. A tuple is a sequence type, which is formal talk for an ordered collection of values. A tuple, by itself, only has the two values and the methods that apply to that type of collection. You can read more about tuples here: https://docs.python.org/3.3/library/stdtypes.html?highlight=tuple#tuple
On the other hand, your Point class, while still quite simple, can gain much additional functionality via other methods. For example, a tuple does not have the __add__() behaviour you are defining in your Point class: tuples do define __add__(), but it concatenates, so (1, 5) + (2, 5) gives (1, 5, 2, 5) rather than an element-wise sum. Hope this clears things up.
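One place where the difference shows up is chained addition. The sketch below compares the original Point with a hypothetical subclass, TuplePoint, whose __add__ returns a plain tuple; the subclass name is invented for this illustration:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)


class TuplePoint(Point):
    def __add__(self, other):
        # Returns a plain tuple instead of a new point.
        return (self.x + other.x, self.y + other.y)


# The result of the first + is a Point again, so the second + works.
total = Point(1, 5) + Point(2, 5) + Point(3, 0)
print(total.x, total.y)

try:
    TuplePoint(1, 5) + TuplePoint(2, 5) + TuplePoint(3, 0)
except TypeError as exc:
    # The first sum is a tuple, and tuple + TuplePoint is not defined.
    print('chained addition failed:', exc)
```

A single p1 + p2 prints fine either way, which is why removing Point(...) appears to cause no errors at first.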

Why do Sets allow multiple of the same object with changing hashCodes?

According to my understanding, sets can only have one of each object in them. But I have found the following example where a set has two of the same object
class myObject:
    def __init__(self, x):
        self.x = x
    def set(self, x):
        self.x = x
    def __hash__(self):
        return self.x
    def __eq__(self, o):
        return self.x == o.x
    def __str__(self):
        return str(self.x)
    def __repr__(self):
        return str(self.x)
When I run the following:
x = myObject(1)
mySet = {x}
x.set(2)
mySet.add(x)
print(mySet)
x.set(3)
print(mySet)
I get the following output:
{2, 2}
{3, 3}
If I remove the __str__ and __repr__ methods it shows there are two objects in the set with the same memory address:
{<__main__.myObject object at 0x10e3a10d0>, <__main__.myObject object at 0x10e3a10d0>}
I am aware Python doesn't allow things like lists to be hashed, because the hash code can change, causing an error similar to the one shown above. Why does Python allow this here but not for things like lists? Surely Python should also have some way of managing changing hashes.
I have tested this same example in Java and the same thing happens. Why do these languages allow this?
The docs address the hash directly:
What is a hash: https://docs.python.org/3/reference/datamodel.html#object.hash
What is hashable (IMPORTANT): https://docs.python.org/3/glossary.html#term-hashable
"An object is hashable if it has a hash value which never changes
during its lifetime..."
Look at the set object itself after you have modified x.
s = set()
x = myObject(2)
Then look at the set member's hash:
Then:
x.set(4)
No change. In fact, if you continue to use that set in other places (e.g. fs = frozenset(s)) you will continue to pass around the old hash.
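The effect can be observed without looking inside the set. The following sketch assumes CPython, where a set records each member's hash at insertion time and does not recompute it later; MyObject is a trimmed copy of the class from the question:

```python
class MyObject:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return self.x

    def __eq__(self, other):
        return self.x == other.x


x = MyObject(2)
s = {x}
# The current hash still matches the one used at insertion.
found_before = x in s

x.x = 4  # mutate the object so that its hash changes
# The lookup now uses the new hash and misses the stored entry...
found_after = x in s
# ...although the very same object is still a member of the set.
still_member = any(member is x for member in s)

s.add(x)  # filed a second time, under the new hash
print(found_before, found_after, still_member, len(s))
```

So the set never notices the change: the old entry stays where the old hash put it, and the mutated object is treated as a new element.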

Subclassing and extending numpy.ndarray

I need some basic data class representations and I want to use existing numpy classes, since they already offer great functionality.
However, I'm not sure if this is the way to do it (although it works so far). So here is an example:
The Position class should act like a simple numpy.array, but it should map the attributes .x, .y and .z to the three array components. I overrode the __new__ method, which returns an ndarray view of the initial array. To allow access to and modification of the array, I defined a property with a setter for each component.
import numpy as np
class Position(np.ndarray):
    """Represents a point in a 3D space
    Adds setters and getters for x, y and z to the ndarray.
    """
    def __new__(cls, input_array=(np.nan, np.nan, np.nan)):
        obj = np.asarray(input_array).view(cls)
        return obj
    @property
    def x(self):
        return self[0]
    @x.setter
    def x(self, value):
        self[0] = value
    @property
    def y(self):
        return self[1]
    @y.setter
    def y(self, value):
        self[1] = value
    @property
    def z(self):
        return self[2]
    @z.setter
    def z(self, value):
        self[2] = value
However, this seems like a bit too much code for such basic logic, and I'm wondering whether I'm doing it the "correct" way. I also need a bunch of other classes like Direction, which will have quite a few other features (auto-norm on change etc.), and before I start integrating numpy, I thought I'd ask you…
I would argue ndarray is the wrong choice here; you probably want a simple namedtuple.
>>> import collections
>>> Position = collections.namedtuple('Positions', 'x y z')
>>> p = Position(1, 2, 3)
>>> p
Positions(x=1, y=2, z=3)
You could get the unpacking like so
>>> x, y, z = p
>>> x, y, z
(1, 2, 3)
>>>
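One thing to keep in mind with this approach is that namedtuple instances are immutable, so the setters from the question have no direct equivalent. A small sketch of what that looks like in practice (the values are arbitrary):

```python
import collections

Position = collections.namedtuple('Position', 'x y z')

p = Position(1, 2, 3)

try:
    p.x = 10
except AttributeError:
    print('namedtuple fields are read-only')

# _replace returns a new tuple and leaves the original untouched.
q = p._replace(x=10)
print(p, q)
```

If in-place modification is a requirement, a regular class or the ndarray subclass may still be the better fit.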

How can I make a class method return a new instance of itself?

I have a Python class which has a few lists and variables (initialized in __init__).
I want to have a method which operates on this particular instance's data and returns a new instance (with new data). In the end, this method should return a new instance with modified data while leaving the original instance's data intact.
What is a pythonic way to do this?
EDIT:
I have a method in the class called complement() which modifies the data in a particular way. I would like to add a __invert__() method which returns an instance of the class with complement()ed data.
Example: Suppose I have a class A.
a=A()
a.complement() would modify the data in instance a.
b = ~a would leave the data in instance a unchanged but b will contain complement()ed data.
I like to implement a copy method that creates an identical instance of the object. Then I can modify the values of that new instance as I please.
import math
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def copy(self):
        """
        create a new instance of Vector,
        with the same data as this instance.
        """
        return Vector(self.x, self.y)
    def normalized(self):
        """
        return a new instance of Vector,
        with the same angle as this instance,
        but with length 1.
        """
        ret = self.copy()
        ret.x /= self.magnitude()
        ret.y /= self.magnitude()
        return ret
    def magnitude(self):
        return math.hypot(self.x, self.y)
So in your case, you might define a method like:
def complemented(self):
    ret = self.copy()
    ret.complement()  # the in-place method from the question
    return ret
The copy module can make a copy of an instance exactly as you wish:
import copy
def __invert__(self):
    ret = copy.deepcopy(self)
    ret.complement()  # modify the copy in place, not self
    return ret
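Put together with the class from the question, this could look roughly as follows. The bits attribute and the way complement() flips it are invented here purely to have some data to work on; only complement() and __invert__() come from the question:

```python
import copy


class A:
    def __init__(self, bits):
        self.bits = list(bits)  # hypothetical payload

    def complement(self):
        """Flip every bit in place."""
        self.bits = [1 - b for b in self.bits]

    def __invert__(self):
        """Return a complemented copy; the original is left unchanged."""
        ret = copy.deepcopy(self)
        ret.complement()
        return ret


a = A([1, 0, 1])
b = ~a
print(a.bits, b.bits)
```

copy.deepcopy is the cautious choice when the instance holds lists, because a shallow copy would share those lists with the original.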
I think you mean an implementation of the Factory design pattern in Python.

In Python, can I bind a variable to a function/expression so that it automatically updates?

Let's say I've got a variable A that is the result of a function/expression F. F in turn has a number of other variables in it, let's say X, Y and Z.
Is it possible to bind A to F so that whenever X, Y or Z changes, A will be updated automatically?
What I want to avoid is that every time X, Y or Z changes, I have to remember to update A explicitly in the code. I also don't want to call the function every time I want to use A.
Example (as per requested): I've got the following function:
def calcHits():
    return sum(hitDiceRolls, level*modList['con'])
and in my program (outside of the function), I've got a variable called hitPoints (yes, it's a roleplaying game program). Whenever any of the variables used in the function changes, I want hitPoints to change as well.
The typical way to do this in Python would be to use a class:
class ExpressionBinder:
    def __init__(self, f):
        self.f = f
        self.x = 0
        self.y = 0
        self.z = 0
    @property
    def result(self):
        return self.f(self.x, self.y, self.z)
You can use it like this:
def f(x, y, z):
    return x**3 + y**2 + z
b = ExpressionBinder(f)
b.x = 1
b.y = 2
b.z = 3
print(b.result)
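Applied to the hit-points example from the question, the same idea might look like the sketch below. The Character class and its attribute names are assumptions made for this illustration; only the formula is taken from calcHits:

```python
class Character:
    def __init__(self, hit_dice_rolls, level, con_modifier):
        self.hit_dice_rolls = hit_dice_rolls
        self.level = level
        self.con_modifier = con_modifier

    @property
    def hit_points(self):
        # Recomputed on every access, so it can never go stale.
        return sum(self.hit_dice_rolls, self.level * self.con_modifier)


hero = Character(hit_dice_rolls=[6, 4], level=2, con_modifier=1)
before = hero.hit_points

hero.level = 3
hero.hit_dice_rolls.append(5)
after = hero.hit_points

print(before, after)
```

Reading hero.hit_points looks like plain attribute access, so no explicit function call appears at the point of use, and the value always reflects the current inputs.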
There is no way in Python to automatically rebind a name in global or local scope in response to other names being rebound. However, it should be possible to make a class that can keep track of some values and have a member function that returns the value you called A. And, as @Alok pointed out, you can use property descriptors to make a member name that implicitly calls a function to return its value, so you can hide the function and treat the name like a plain old name.
class Trk(object):
    """Track some values and compute a function if any change"""
    def __init__(self, name, fn, **objects_to_track):
        self.original_objects = dict(objects_to_track)  # snapshot of the tracked values
        self.__dict__.update(objects_to_track)  # expose them as trk.x, trk.y, ...
        self.name = name
        self.saved_fn = fn
        self.value = self._compute()
    def _compute(self):
        # call the saved function with the current tracked values as keyword arguments
        return self.saved_fn(**{k: self.__dict__[k] for k in self.original_objects})
    def fn(self):
        if any(self.__dict__[k] != self.original_objects[k] for k in self.original_objects):
            self.value = self._compute()
            # now that self.value is updated, also update self.original_objects
            for k in self.original_objects:
                self.original_objects[k] = self.__dict__[k]
        return self.value
I'm sorry but I am very tired right now, and I cannot finish this example. I didn't test it either. But this shows one way to track values, and if they are different, do something different. You use it like this:
# want to track x, y, z; some_fn is whatever function computes A from them
trk = Trk('A', some_fn, x=x, y=y, z=z)
trk.fn() # returns up-to-date value
trk.x = new_value
trk.fn() # detects that trk.x changed and computes new trk.value
If the above works, you can use the property descriptor stuff to bind a name such that an attempt to read a value from the name will call self.fn()
EDIT: Oh, it's important that when self.value is updated, self.original_objects should be updated. I've added code to do that.
And now I'm going to sleep!
