update metadata in objects when changing shared subobjects - python

I need advice from experienced programmers, as I am failing to wrap my head around this.
I have the following data structure:
class obj(object):
    def __init__(self, data=[], meta=[]):
        self.data = data
        self.meta = meta

class subobj(object):
    def __init__(self, data=[]):
        self.data = data
Say, I'm creating the following objects from it.
sub1=subobj([0,1,5])
sub2=subobj([0,1,6])
sub3=subobj([0,1,7])
objA=obj(data=[sub1,sub2], meta=[3,3])
objB=obj(data=[sub3,sub1], meta=[3,3])
Now I am changing sub1 by operating on the second object, as well as its metadata. For simplicity, I'm writing here via the objects' attributes directly, without setters/getters:
objB.data[1].data+=[10,11]
objB.meta[1]=5
Now objA.data[0] has (obviously) changed, but objA.meta[0] stayed the same. I want some func(objB.meta[1]) to be triggered right after the value in objA.data changes (caused through objB.data) and to change objA.meta as well. Important: this func() uses the metadata of the changed sub1 from objB.
I simply don't know how to make every obj aware of all the other objs that share the same subobj as it does. With that knowledge I could trigger func(). I would appreciate any hints.
Notes:
I want to pass those subobjs around between objs without metadata and let them be changed by those objs. Metadata is supposed to store information that is defined within objs, not subobjs. Hence the value of func() depends on the obj itself, but its definition is the same for all objs of the class obj.
For simplicity, this func(metadata) can be something like multiply3(metadata).
I will have thousands of those objects, so I am looking for rather an abstract solution that is not constrained by a small number of objects.
Is that possible in the current design? I am lost as to how to implement this.

Assuming that an obj's data property can only contain subobjs and never changes, this code should work.
class obj(object):
    __slots__ = ("data", "meta")

    def __init__(self, data=(), meta=None):
        self.data = data
        self.meta = meta or []
        # register this obj with every subobj it uses
        for so in data:
            so._objs_using.append(self)

class subobj(object):
    __slots__ = ("_data", "_objs_using")

    def __init__(self, data=()):
        self._objs_using = []
        self._data = data

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, val):
        self._data = val
        # notify every obj that shares this subobj
        for o in self._objs_using:
            metadata_changed(o.meta)
I called the function that you want to call on the metadata metadata_changed. This works by keeping track, in each subobj, of the list of objs it is used by, and turning data into a property whose setter notifies each of those objs whenever the value changes.
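To see the notification fire, here is a compact, runnable version of the classes above together with a hypothetical metadata_changed that simply triples each metadata entry (the multiply-by-3 rule is just the placeholder the question suggested):

```python
# Hypothetical notification target: triple each metadata entry in place.
def metadata_changed(meta):
    meta[:] = [m * 3 for m in meta]

class subobj(object):
    def __init__(self, data=()):
        self._objs_using = []
        self._data = list(data)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, val):
        self._data = val
        for o in self._objs_using:   # notify every obj sharing this subobj
            metadata_changed(o.meta)

class obj(object):
    def __init__(self, data=(), meta=None):
        self.data = data
        self.meta = list(meta or [])
        for so in data:              # register with each subobj
            so._objs_using.append(self)

sub1 = subobj([0, 1, 5])
objA = obj(data=[sub1], meta=[3])
objB = obj(data=[sub1], meta=[3])

# Augmented assignment goes through the property setter,
# so both objs' metadata get updated.
objB.data[0].data += [10, 11]
print(objA.meta, objB.meta)  # -> [9] [9]
```

Note that the notification only fires on assignment through the property; an in-place `sub1.data.append(x)` would bypass it.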

Related

Stale pickling of class objects - best practice

I have a basic ETL workflow that grabs data from an API, builds a class object, and performs various operations that result in storing the data in the applicable tables in a DB; ultimately I pickle the object and store that in the DB as well. The reason for pickling is to save these events and reuse the data for new features.
The problem is how best to implement adding attributes for new features. Of course when a new attribute is added, pickled objects are now stale and need to be checked (AttributeError, etc). This is simple with one or two changes but over time it seems like it will be problematic.
Any design tips? Pythonic best practices for inherently updating pickled objects? Seems like a common problem in database design?!
You can define an update method for the class. The update method takes one object of the same class (but an older version, as you specified) and copies all the data from the object passed in over to the new class object.
Here is an example:
class MyClass:
    def __init__(self):
        self.data = []

    def add_data(self, data):
        self.data.append(data)

    def update(self, obj):
        self.data = obj.data

my_class = MyClass()
my_class.add_data(34)  # class object then gets pickled...

class MyClass2:
    def __init__(self):
        self.data = []

    def add_data(self, data):
        self.data.append(data)

    def new_attr(self):
        print('This is a new attribute.')

    def update(self, obj):
        self.data = obj.data

my_class2 = MyClass2()
my_class2.update(my_class)  # remember to unpickle the class object first
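Another common approach, sketched below under the assumption that plain pickle is used: give the class a __setstate__ hook that fills in defaults for attributes that did not exist when the object was pickled, so old pickles upgrade themselves on load. MyClass and new_field are illustrative names, not from the question:

```python
import pickle

class MyClass:
    # grows with the schema: every attribute and its default
    _defaults = {'data': None, 'new_field': 0}

    def __init__(self):
        self.data = []
        self.new_field = 0

    def __setstate__(self, state):
        # start from defaults, then overlay whatever the old pickle had
        self.__dict__.update(self._defaults)
        self.__dict__.update(state)

old = MyClass()
del old.__dict__['new_field']     # simulate a pickle from an older version
restored = pickle.loads(pickle.dumps(old))
print(restored.new_field)         # -> 0, filled in by __setstate__
```

This keeps the upgrade logic in one place instead of scattering AttributeError checks through the code.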

Switch case like mapping of a dictionary (values = methods)

As I'm fairly new to Python, I can't decide which of the following two solutions makes more sense, or whether either makes sense at all.
Let's say my abstracted object class looks like:
class SimpleData(object):
    def __init__(self, data):
        self.__data = data

    def __getData(self):
        return self.__data

    def __setData(self, data):
        self.__data = data

    data = property(__getData, __setData)

    @classmethod
    def create_new(cls, data):
        return cls(data)
Objects of this class that I need frequently (having a 'predefined object payload') I'd like to create simply by 'assigning' a preset_name to them. Using the preset_name, I can repeatedly create new copies of those specific objects with that predefined payload.
I could use a dictionary:
class PresetDict(object):
    @classmethod
    def get_preset(cls, preset_name):
        return {
            'preset_111': SimpleData.create_new('111'),
            'preset_222': SimpleData.create_new('222'),
            'preset_333': SimpleData.create_new('333')
        }.get(preset_name, None)
or map methods, using getattr:
class PresetMethod(object):
    @classmethod
    def get_preset(cls, preset_name):
        return getattr(cls, preset_name, lambda: None)()

    @classmethod
    def preset_111(cls):
        return SimpleData.create_new('111')

    @classmethod
    def preset_222(cls):
        return SimpleData.create_new('222')

    @classmethod
    def preset_333(cls):
        return SimpleData.create_new('333')
Both solutions do basically the same:
print(PresetDict.get_preset("preset_111").data)
print(PresetDict.get_preset("preset_333").data)
print(PresetDict.get_preset("not present"))
print(PresetMethod.get_preset("preset_111").data)
print(PresetMethod.get_preset("preset_333").data)
print(PresetMethod.get_preset("not present"))
I strongly prefer the dictionary solution, as it is easier to 'read', extend and will be easier to maintain in the future, especially with a big list of presets.
Here's the BUT:
Performance is of importance. Here I have absolutely no insight into which of the two solutions will perform better, especially if the preset list grows. The dictionary in PresetDict.get_preset in particular looks 'dodgy' to me. Will it create only the SimpleData instance specified via preset_name when called, or will it create all the instances specified in the dictionary, return the one specified via preset_name, and then discard all the others?
Hope you can enlighten me on this matter. Maybe you know of possible improvements or even a better solution of what I'd like to achieve?
Thx in advance!
You're right: PresetDict.get_preset will create all three objects and then return one. You could instead add a class variable to SimpleData that holds the dictionary, so it is only created once, and have get_preset return instances from that:
class SimpleData(object):
    _presets = {}

    @classmethod
    def get_preset(cls, preset_name):
        return cls._presets.get(preset_name, None)

# the dict has to be filled in after the class statement, since
# SimpleData is not yet defined inside its own class body
SimpleData._presets = {
    'preset_111': SimpleData('111'),
    'preset_222': SimpleData('222'),
    'preset_333': SimpleData('333')
}
Note that this isn't really any more efficient; it just makes it easier to get at commonly used instances.
Also see functools.lru_cache
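If laziness matters, and if you want fresh copies rather than one shared instance per preset, another option is to store factories (callables) instead of instances. This is only a sketch of that idea:

```python
class SimpleData(object):
    def __init__(self, data):
        self.data = data

    # factories are not called at class-definition time, only on lookup,
    # so nothing is created until a preset is actually requested
    _preset_factories = {
        'preset_111': lambda: SimpleData('111'),
        'preset_222': lambda: SimpleData('222'),
        'preset_333': lambda: SimpleData('333'),
    }

    @classmethod
    def get_preset(cls, preset_name):
        factory = cls._preset_factories.get(preset_name)
        return factory() if factory else None

a = SimpleData.get_preset('preset_111')
b = SimpleData.get_preset('preset_111')
print(a.data, a is b)  # -> 111 False  (each call builds a fresh object)
```

Each lookup costs one object construction, but presets can never be mutated behind your back, which matches the "new copies" wording in the question.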

Python class variables or #property

I am writing a Python class to store data, and then another class will create an instance of that class to print different variables. Some class variables require a lot of formatting, which may take multiple lines of code to get them into their "final state".
Is it bad practice to just access the variables from outside the class with this structure?
class Data():
    def __init__(self):
        self.data = "data"
Or is it better practice to use a @property method to access variables?
class Data:
    @property
    def data(self):
        return "data"
Be careful: if you do
class Data:
    @property
    def data(self):
        return "data"

d = Data()
d.data = "try to modify data"
you will get the error:
AttributeError: can't set attribute
And as I see from your question, you want to be able to transform the data until its final state, so go for the other option:
class Data2():
    def __init__(self):
        self.data = "data"

d2 = Data2()
d2.data = "now I can be modified"
or modify the previous:
class Data:
    def __init__(self):
        self._data = "data"

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        self._data = value

d = Data()
d.data = "now I can be modified"
Common Practice
The normal practice in Python is to expose attributes directly. A property can be added later if additional actions are required when getting or setting.
Most of the modules in the standard library follow this practice. Public variables (not prefixed with an underscore) typically don't use property() unless there is a specific reason (such as making an attribute read-only).
Rationale
Normal attribute access (without property) is simple to implement, simple to understand, and runs very fast.
The possibility of using property() afterwards means that we don't have to practice defensive programming. We can avoid prematurely implementing getters and setters, which bloats the code and makes access slower.
Basically, you can hide a lot of complexity in a property and make it look like an attribute. This increases code readability.
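As a sketch of that migration path: the class below started life with a plain data attribute and later grew a property with the same name, so callers never had to change (the string-only validation rule is purely an example):

```python
class Data:
    def __init__(self):
        self._data = "data"

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # validation added later, invisible to existing callers
        if not isinstance(value, str):
            raise TypeError("data must be a string")
        self._data = value

d = Data()
d.data = "final state"   # same syntax as a plain attribute
print(d.data)            # -> final state
```

This is why Python code rarely needs getters and setters up front: the attribute access syntax stays identical either way.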
Also, you need to understand the difference between property and attribute.
Please refer What's the difference between a Python "property" and "attribute"?

How to change the behavior of append for a class attribute

I am using a python package (simpy), which provides several classes that I need. One of the classes is called Event, with the following constructor:
def __init__(self, env):
    self.env = env
    """The :class:`~simpy.core.Environment` the event lives in."""
    self.callbacks = []
    """List of functions that are called when the event is processed."""
    self._value = PENDING
At many different places in the code, objects are added to the callbacks of an event, using the Event.callbacks.append method.
What I need is a new class (which I call Zombie), which is actually an Event class, except for three modifications. Firstly, it should contain an additional attribute Zombie.reset_callbacks and a method Zombie.reset() to reset Zombie.callbacks to a previous state (this is why I need the Zombie.reset_callbacks attribute). All of this I can do by subclassing Event.
However, for this to work, I would need that every time Zombie.callbacks.append(x) is called, x is not only appended to Zombie.callbacks, but also to Zombie.reset_callbacks. I have been looking into decorators to see if I could do this, but I do not see the light at the end of the tunnel. I currently feel this is not possible, or I might be looking in the wrong direction.
Is such thing possible (changing the append behavior for a class attribute) in Python? And if so, how?
Thanx for your effort in advance!
B.
Whoops. Misread this. If you're really dedicated to maintaining this interface, you can define a helper class.
class SplitLists(object):
    def __init__(self, *append_methods):
        self._append_methods = append_methods

    def append(self, value):
        for method in self._append_methods:
            method(value)

a = []
b = []
split_list = SplitLists(a.append, b.append)
split_list.append(1)
a  # [1]
b  # [1]

class Zombie(Event):
    def __init__(self, *args, **kwargs):
        super(Zombie, self).__init__(*args, **kwargs)
        self._real_callbacks = []
        self._reset_callbacks = []
        self.callbacks = SplitLists(self._real_callbacks.append,
                                    self._reset_callbacks.append)
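Putting the pieces together, here is a runnable sketch of the pattern; since simpy may not be installed, a minimal stand-in Event is used in place of simpy's, but the mechanics are the same:

```python
class SplitLists(object):
    """Fans out append() calls to several underlying append methods."""
    def __init__(self, *append_methods):
        self._append_methods = append_methods

    def append(self, value):
        for method in self._append_methods:
            method(value)

class Event(object):
    """Stand-in for simpy's Event, just enough for the demo."""
    def __init__(self, env=None):
        self.env = env
        self.callbacks = []

class Zombie(Event):
    def __init__(self, *args, **kwargs):
        super(Zombie, self).__init__(*args, **kwargs)
        self._real_callbacks = []
        self._reset_callbacks = []
        # every append now lands in both backing lists
        self.callbacks = SplitLists(self._real_callbacks.append,
                                    self._reset_callbacks.append)

z = Zombie()
z.callbacks.append('cb1')
print(z._real_callbacks, z._reset_callbacks)  # -> ['cb1'] ['cb1']
```

A Zombie.reset() could then restore callbacks from the _reset_callbacks snapshot; that part is straightforward list copying.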

Printing all instances of a class

With a class in Python, how do I define a function to print every single instance of the class in a format defined in the function?
I see two options in this case:
Garbage collector
import gc

for obj in gc.get_objects():
    if isinstance(obj, some_class):
        do_something(obj)
This has the disadvantage of being very slow when you have a lot of objects, but works with types over which you have no control.
Use a mixin and weakrefs
from collections import defaultdict
import weakref

class KeepRefs(object):
    __refs__ = defaultdict(list)

    def __init__(self):
        self.__refs__[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        for inst_ref in cls.__refs__[cls]:
            inst = inst_ref()
            if inst is not None:
                yield inst

class X(KeepRefs):
    def __init__(self, name):
        super(X, self).__init__()
        self.name = name

x = X("x")
y = X("y")
for r in X.get_instances():
    print(r.name)
del y
for r in X.get_instances():
    print(r.name)
In this case, all the references get stored as a weak reference in a list. If you create and delete a lot of instances frequently, you should clean up the list of weakrefs after iteration, otherwise there's going to be a lot of cruft.
Another problem in this case is that you have to make sure to call the base class constructor. You could also override __new__, but only the __new__ method of the first base class is used on instantiation. This also works only on types that are under your control.
Edit: The method for printing all instances according to a specific format is left as an exercise, but it's basically just a variation on the for-loops.
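For the "exercise", one possible formatter over the KeepRefs/X classes above; the format string is just an example, not anything prescribed:

```python
from collections import defaultdict
import weakref

class KeepRefs(object):
    __refs__ = defaultdict(list)

    def __init__(self):
        self.__refs__[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        for inst_ref in cls.__refs__[cls]:
            inst = inst_ref()
            if inst is not None:
                yield inst

class X(KeepRefs):
    def __init__(self, name):
        super(X, self).__init__()
        self.name = name

def print_all(cls, fmt="{0.__class__.__name__}(name={0.name})"):
    # iterate the live instances and print each in the given format
    for inst in cls.get_instances():
        print(fmt.format(inst))

x = X("x")
y = X("y")
print_all(X)  # -> X(name=x) then X(name=y)
```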
You'll want to create a static list on your class, and add a weakref to each instance so the garbage collector can clean up your instances when they're no longer needed.
import weakref

class A:
    instances = []

    def __init__(self, name=None):
        self.__class__.instances.append(weakref.proxy(self))
        self.name = name

a1 = A('a1')
a2 = A('a2')
a3 = A('a3')
a4 = A('a4')

for instance in A.instances:
    print(instance.name)
You don't need to import ANYTHING! Just use "self". Here's how you do it:
class A:
    instances = []

    def __init__(self):
        self.__class__.instances.append(self)

print('\n'.join(str(i) for i in A.instances))  # this line was suggested by @anvelascos
It's this simple. No modules or libraries imported.
Very nice and useful code, but it has a big problem: the list only ever grows and is never cleaned up. To test it, just add print(len(cls.__refs__[cls])) at the end of the get_instances method.
Here is a fix for the get_instances method:
__refs__ = defaultdict(list)

@classmethod
def get_instances(cls):
    refs = []
    for ref in cls.__refs__[cls]:
        instance = ref()
        if instance is not None:
            refs.append(ref)
            yield instance
    # print(len(refs))
    cls.__refs__[cls] = refs
or alternatively it could be done using a WeakSet (the __init__ of KeepRefs must then use self.__refs__[self.__class__].add(self) instead of appending a weakref.ref):
from weakref import WeakSet

__refs__ = defaultdict(WeakSet)

@classmethod
def get_instances(cls):
    return cls.__refs__[cls]
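Spelled out as a complete class, the WeakSet variant looks like the sketch below; dead instances drop out of the set automatically, so no cleanup pass is needed:

```python
from collections import defaultdict
from weakref import WeakSet
import gc

class KeepRefs(object):
    __refs__ = defaultdict(WeakSet)

    def __init__(self):
        # WeakSet holds the instance weakly; no explicit ref objects needed
        self.__refs__[self.__class__].add(self)

    @classmethod
    def get_instances(cls):
        return cls.__refs__[cls]

class X(KeepRefs):
    def __init__(self, name):
        super(X, self).__init__()
        self.name = name

x = X("x")
y = X("y")
del y           # once collected, y vanishes from the set by itself
gc.collect()    # make collection deterministic for the demo
print(sorted(i.name for i in X.get_instances()))  # -> ['x']
```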
Same as in almost all other OO languages: keep all instances of the class in a collection of some kind.
You can try this kind of thing.
class MyClassFactory(object):
    theWholeList = []

    def __call__(self, *args, **kw):
        x = MyClass(*args, **kw)   # MyClass is whatever class you are tracking
        self.theWholeList.append(x)
        return x

Now you can do this.
factory = MyClassFactory()
x = factory(args, ...)
print(MyClassFactory.theWholeList)
Python doesn't have an equivalent to Smalltalk's #allInstances, as the architecture doesn't have this type of central object table (although modern Smalltalks don't really work like that either).
As the other poster says, you have to explicitly manage a collection. His suggestion of a factory method that maintains a registry is a perfectly reasonable way to do it. You may wish to do something with weak references so you don't have to explicitly keep track of object disposal.
It's not clear if you need to print all class instances at once or when they're initialized, nor if you're talking about a class you have control over vs a class in a 3rd party library.
In any case, I would solve this by writing a class factory using Python metaclass support. If you don't have control over the class, manually update the __metaclass__ for the class or module you're tracking.
See http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html for more information.
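A minimal sketch of the metaclass approach (names here are illustrative): the metaclass records a weak reference to every instance at creation time, so any class using it gets tracking for free:

```python
import weakref

class InstanceTracker(type):
    def __new__(mcls, name, bases, ns):
        cls = super(InstanceTracker, mcls).__new__(mcls, name, bases, ns)
        cls._instances = []          # one registry per class
        return cls

    def __call__(cls, *args, **kwargs):
        # intercept instance creation and remember a weak reference
        inst = super(InstanceTracker, cls).__call__(*args, **kwargs)
        cls._instances.append(weakref.ref(inst))
        return inst

class Tracked(metaclass=InstanceTracker):
    def __init__(self, name):
        self.name = name

a = Tracked("a")
b = Tracked("b")
live = [r() for r in Tracked._instances if r() is not None]
print([t.name for t in live])  # -> ['a', 'b']
```

This uses the Python 3 metaclass syntax; the `__metaclass__` attribute mentioned above is the Python 2 spelling of the same idea.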
In my project, I faced a similar problem and found a simple solution that may also work for you for listing and printing your class instances. The solution worked smoothly in Python 3.7; it gave partial errors in Python 3.5.
I will copy-paste the relevant code blocks from my recent project.
instances = []

class WorkCalendar:
    def __init__(self, day, patient, worker):
        self.day = day
        self.patient = patient
        self.worker = worker

    def __str__(self):
        return f'{self.day} : {self.patient} : {self.worker}'
In Python, the __str__ method determines how the object is rendered in its string form. I added the : in between the curly brackets; that is purely my preference, for a "Pandas DataFrame" kind of reading. With this small __str__ method you will not see machine-readable object-type descriptions, which make no sense to human eyes. After adding this __str__ method you can append your objects to your list and print them as you wish.
appointment = WorkCalendar("01.10.2020", "Jane", "John")
instances.append(appointment)
For printing, the format defined in __str__ is used by default. But it is also possible to access all the attributes separately:
for instance in instances:
    print(instance)
    print(instance.worker)
    print(instance.patient)
For detailed reading, you may look at the source: https://dbader.org/blog/python-repr-vs-str
