MPI broadcast changes the address of Python class instances

I am developing a Python framework for a scientific application that I recently wanted to speed up a little using mpi4py. The framework works with, say, 3-4 class instances that interact with each other, and each one stores references to the other instances in its own attributes.
What troubles me is that the broadcast function bcast changes the address (identity) of the instances each time it is called. To make this work, I have to reassign the new reference of each instance back to all the other instances, which makes me think I am doing something wrong.
I can reproduce this with the following piece of code, where each time the bcast function is called the instance obj changes address.
from mpi4py import MPI

commMPI = MPI.COMM_WORLD
rankMPI = commMPI.Get_rank()
sizeMPI = commMPI.Get_size()

class Dummy():
    def __init__(self):
        pass

if rankMPI == 0:
    obj = Dummy()
else:
    obj = None

obj = commMPI.bcast(obj, root=0)
obj1 = obj
obj = commMPI.bcast(obj, root=0)
print(rankMPI, obj is obj1)
I have just started using mpi4py so I only know the very basics. Do I have to implement an intrinsic class method to update the instances while keeping their primary address? What am I doing wrong? Thanks in advance.
Update
I found that I can retain the root's (rank 0) original instance if I write it like this:
tmp = commMPI.bcast(obj, root=0)
if rankMPI != 0:
    obj = tmp
Is that how I should do it, or is there a clearer way to write it?
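For what it's worth, here is a minimal sketch of that pattern wrapped in a small helper, so the reassignment logic lives in one place. The helper name bcast_keep_root is my own invention, not part of mpi4py:

from mpi4py import MPI

def bcast_keep_root(comm, obj, root=0):
    # Broadcast obj, but keep the root's original instance on the root rank.
    received = comm.bcast(obj, root=root)
    # Non-root ranks get the freshly unpickled copy; the root keeps its own object,
    # so its identity (and any references other objects hold to it) is preserved there.
    return obj if comm.Get_rank() == root else received

comm = MPI.COMM_WORLD
obj = {"state": 42} if comm.Get_rank() == 0 else None
obj = bcast_keep_root(comm, obj)

This only preserves identity on the root; on the other ranks the instances are necessarily new objects created by unpickling, so cross-references there still need to be re-wired, or the related instances broadcast together inside one container so that pickle keeps their references consistent within that graph.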

Related

Python Class with ctypes attributes has side effects

I create a class for an image buffer. The class looks like this:
import ctypes

class ImgBuf():
    def __init__(self, bufnr=ctypes.c_int(-1)):
        self.bufnr = bufnr
The attribute 'bufnr' is handed over to a shared library by reference and changed for buffer management. I want to have multiple instances of this class (to manage several image buffers). In the small example
from imgBuf import ImgBuf

buf1 = ImgBuf()
buf2 = ImgBuf()
sharedDLL.allocateBuffer(buf1)
the bufnr attribute is changed in both instances. How can I make the instances independent?
A ctypes.c_int, like a list, is mutable and should not be used as a default argument; this is a very common mistake. You should create a new object each time the function is called:
class ImgBuf():
    def __init__(self, bufnr=None):
        if bufnr is None:
            bufnr = ctypes.c_int(-1)  # create a fresh c_int on every call
        self.bufnr = bufnr
Default values for keyword arguments like bufnr are evaluated only once, when the def statement is executed. So in this case a single c_int is created, and that same instance is then shared by every ImgBuf that doesn't specify bufnr explicitly.
To avoid this, you would typically use None as the default, then check for is None inside the function body and instantiate the default value there, which creates a new instance on every call.
The same applies to any other mutable object used as a default value, such as lists, dicts and so on.
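A quick illustration of the same pitfall with a plain list (a toy example of my own), showing why the None-default idiom matters:

def append_item(item, bucket=[]):      # the [] is created once, at def time
    bucket.append(item)
    return bucket

print(append_item(1))   # [1]
print(append_item(2))   # [1, 2]  <- the same list is reused across calls

def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []                    # a fresh list on every call
    bucket.append(item)
    return bucket

print(append_item_fixed(1))  # [1]
print(append_item_fixed(2))  # [2]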
See the following code snippet:
from imgBuf import ImgBuf
buf1 = ImgBuf()
buf2 = ImgBuf()
print('id(buf1) == id(buf2): ', id(buf1) == id(buf2))
print('id(buf1.bufnr) == id(buf2.bufnr): ', id(buf1.bufnr) == id(buf2.bufnr))
The output is
id(buf1) == id(buf2): False
id(buf1.bufnr) == id(buf2.bufnr): True
Even though Python correctly creates two instances of the class, the ctypes values have the same identity. Usually this is not a problem, since Python manages the storage, but that does not hold for the shared library in use.
The default value has been copied to ensure independent instances. The class now looks like this:
import ctypes
from copy import copy

class ImgBuf():
    def __init__(self, bufnr=ctypes.c_int(-1)):
        self.bufnr = copy(bufnr)  # copy the (shared) default so each instance gets its own c_int
Now the objects should be independent (run the code snippet again to check).

Python: how to install a pass-through copy/deepcopy hook

I have a library which stores additional data for foreign user objects in a WeakKeyDictionary:
import weakref

extra_stuff = weakref.WeakKeyDictionary()

def get_extra_stuff_for_obj(o):
    return extra_stuff[o]
When a user object is copied, I want the copy to have the same extra stuff. However, I have limited control over the user objects. I would like to define a class decorator for user-object classes, to be used in this manner:
def has_extra_stuff(klass):
    def copy_with_hook(self):
        new = magic_goes_here(self)
        extra_stuff[new] = extra_stuff[self]
        return new
    klass.__copy__ = copy_with_hook
    return klass
This is easy if klass already defines __copy__, because I can close copy_with_hook over the original and call it. However, typically it's not defined. What to call here? This obviously can't be copy.copy, because that would result in infinite recursion.
I found this question which appears to ask the exact same question, but afaict the answer is wrong because this results in a deepcopy, not a copy. I would also be unable to do this, as I need to install hooks for both deepcopy and copy. (Incidentally, I would have continued the discussion in that question, but having no reputation I am not able to do this.)
I looked at what the copy module does, which is a bunch of voodoo involving __reduce_ex__(). I can obviously cut and paste this into my code, or call its private methods directly, but I would consider that an absolute last resort. This seems like such a simple thing that I'm convinced I'm missing a simple solution.
Essentially, you need to (A) preserve the original __copy__ if present (and delegate to it), or otherwise (B) trick copy.copy into not using your newly-added __copy__ (and delegate to copy.copy).
So, for example...:
import copy
import threading

copylock = threading.RLock()

def has_extra_stuff(klass):
    def simple_copy_with_hook(self):
        with copylock:
            new = original_copy(self)
            extra_stuff[new] = extra_stuff[self]
        return new  # __copy__ must return the copy

    def tricky_case(self):
        with copylock:
            try:
                klass.__copy__ = None
                new = copy.copy(self)
            finally:
                klass.__copy__ = tricky_case
            extra_stuff[new] = extra_stuff[self]
        return new  # __copy__ must return the copy

    original_copy = getattr(klass, '__copy__', None)
    if original_copy is None:
        klass.__copy__ = tricky_case
    else:
        klass.__copy__ = simple_copy_with_hook
    return klass
Not the most elegant code ever written, but at least it just plays around with klass, without monkey-patching or copy-and-pasting copy.py itself :-)
Added: since the OP mentioned in a comment that he can't use this solution because the app is multi-threaded, I added appropriate locking to make it actually usable. It uses a single global re-entrant lock to guard against deadlocks from out-of-order acquisition of multiple locks across threads, and it is perhaps over-locked "just in case"; I suspect the simple case and the dict assignment in the tricky case probably don't need the lock... but when threading threatens, better safe than sorry :-)
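A quick usage sketch of the decorator, assuming extra_stuff is the WeakKeyDictionary from the question and that it lives in the same module as the decorator; the Widget class is just an illustration of mine:

import copy
import weakref

extra_stuff = weakref.WeakKeyDictionary()

@has_extra_stuff
class Widget(object):
    pass

w = Widget()
extra_stuff[w] = {"color": "red"}

w2 = copy.copy(w)       # goes through tricky_case, since Widget defines no __copy__ of its own
print(extra_stuff[w2])  # {'color': 'red'} -- the copy carries the same extra data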
After some playing I've come up with the following:
import copy_reg, copy

# Library
def hook(new):
    print "new object: %s" % new

def pickle_hooked(o):
    pickle = o.__reduce_ex__(2)
    creator = pickle[0]
    def creator_hook(*args, **kwargs):
        new = creator(*args, **kwargs)
        hook(new)
        return new
    return (creator_hook,) + pickle[1:]

def with_copy_hook(klass):
    copy_reg.pickle(klass, pickle_hooked)
    return klass

# Application
@with_copy_hook
class A(object):
    def __init__(self, value):
        self.value = value
This registers a pass-through copy hook, which also has the advantage of working for both copy and deepcopy. The only detail of the __reduce_ex__ return value it needs to concern itself with is that the first element of the tuple is a creator function; all other details are handed off to existing library code. It is not perfect, because I still don't see a way of detecting whether the target class has already registered a pickler.
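A short usage check, under the same Python 2 setup as above (on Python 3 the module is spelled copyreg instead of copy_reg):

a = A(1)
b = copy.copy(a)        # the registered reducer fires, so hook() prints the new object
c = copy.deepcopy(a)    # the same hook fires for deepcopy
print(b.value)          # 1
print(c.value)          # 1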

Thread in Python : class attribute (list) not thread-safe?

I am trying to understand Threads in Python.
The code
I have a problem, which I have narrowed down to one simple class:
# -*- coding: utf-8 -*-
import threading

class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
    __id = None

    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__id = id

    def run(self):
        while 1:
            self.dummy1()
            self.dummy2()

    def dummy1(self):
        if self.__id == 2:
            self.__propertyThatShouldNotBeShared.append("Test value")

    def dummy2(self):
        for data in self.__propertyThatShouldNotBeShared:
            print self.__id
            print data
            self.__propertyThatShouldNotBeShared.remove(data)

obj1 = myClassWithThread(1)
obj2 = myClassWithThread(2)
obj3 = myClassWithThread(3)
obj1.start()
obj2.start()
obj3.start()
Description
Here is what the class does :
The class has two attributes :
__id which is an identifier for the object, given when the constructor is called
__propertyThatShouldNotBeShared is a list and will contain a text value
Now the methods
run() contains an infinite loop in which I call dummy1() and then dummy2()
dummy1() appends the value "Test value" to the list __propertyThatShouldNotBeShared, but only IF the object's __id is equal to 2
dummy2() goes through __propertyThatShouldNotBeShared; for each value it prints the object's id and the value, then removes that value from the list
Here is the output that I get when I launch the program :
21
Test valueTest value
2
Test value
Exception in thread Thread-2:
Traceback (most recent call last):
File "E:\PROG\myFace\python\lib\threading.py", line 808, in __bootstrap_inner
self.run()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 15, in run
self.dummy2()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 27, in dummy2
self.__propertyThatShouldNotBeShared.remove(data)
ValueError: list.remove(x): x not in list
The problem
As you can see in the first line of the output I get this "1"...which means that, at some point, the object with the id "1" tries to print something on the screen...and actually it does!
But this should be impossible!
Only object with id "2" should be able to print anything!
What is the problem in this code ? Or what is the problem with my logic?
The problem is this:
class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
It defines one list that is shared by all instances. You should do this instead:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
        # the other code goes here
There are two problems here: the one you asked about, thread-safety, and the one you didn't, the difference between class and instance attributes.
It's the latter that's causing your actual problem. A class attribute is shared by all instances of the class. It has nothing to do with whether those instances are accessed on a single thread or on multiple threads; there's only one __propertyThatShouldNotBeShared that's shared by everyone. If you want an instance attribute, you have to define it on the instance, not on the class. Like this:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
Once you do that, each instance has its own copy of __propertyThatShouldNotBeShared, and each lives on its own thread, so there is no thread-safety issue possible.
However, your original code does have a thread-safety problem.
Almost nothing is automatically thread-safe (aka "synchronized"); the exceptions (like queue.Queue) say so explicitly and are meant specifically for threaded programming.
You can avoid this in three ways:
Don't share anything.
Don't mutate anything you share.
Don't mutate anything you share unless it's protected by an appropriate synchronization object.
The last one is of course the most flexible, but also the most complicated. In fact, it's at the center of why people consider threaded programming hard.
The short version is, everywhere you modify or access shared mutable data like self.__propertyThatShouldNotBeShared, you need to be holding some kind of synchronization object, like a Lock. For example:
class myClassWithThread(threading.Thread):
    __lock = threading.Lock()
    # etc.
    def dummy1(self):
        if self.__id == 2:
            with self.__lock:
                self.__propertyThatShouldNotBeShared.append("Test value")
If you stick to CPython, and to built-in types, you can often get away with ignoring locks. But "often" in threaded programming is just a synonym for "always during testing and debugging, right up until the release or big presentation, when it suddenly begins failing". Unless you want to learn the rules for how the Global Interpreter Lock and the built-in types work in CPython, don't rely on this.
Class variables in Python are just that: shared by all instances of the class. You need an instance variable, which you usually define inside __init__. Remove the class-level declarations (and the double leading underscores, they're for name mangling which you don't need here.)
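Putting the two answers together, here is a minimal sketch of a corrected class (my own rewrite, not a drop-in replacement for the original): each instance gets its own list in __init__, and a lock is shown only to illustrate how you would guard the list if it ever were shared deliberately.

import threading

class MyThread(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.id = id
        self.items = []                 # per-instance, so nothing is shared
        self.lock = threading.Lock()    # only needed if other threads touch self.items

    def run(self):
        for _ in range(3):
            if self.id == 2:
                with self.lock:
                    self.items.append("Test value")
            with self.lock:
                while self.items:
                    print(self.id, self.items.pop())

threads = [MyThread(i) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()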

How to call a different instance of the same class in python?

I am new to Python. I am writing a simulation in SimPy to model a production line, which looks like: Machine 1 -> Buffer 1 -> Machine 2 -> Buffer 2 -> and so on..
My question:
I have a class, Machine, of which there are several instances. Suppose that the current instance is Machine 2. The methods of this instance affect the states of Machines 1 and 3. For example: if Buffer 2 is empty, then Machine 3 is idle; but when Machine 2 puts a part in Buffer 2, Machine 3 should be activated.
So, what is the way to refer to different instances of the same class from any given instance of that class?
Also, slightly different question: What is the way to call an object (Buffers 1 and 2, in this case) from the current instance of another class?
Edit: Edited to add more clarity about the system.
It is not common for instances of a class to know about other instances of the class.
I would recommend you keep some sort of collection of instances in your class itself, and use the class to look up the instances:
class Machine(object):
    lst = []
    def __init__(self, name):
        self.name = name
        self.id = len(Machine.lst)
        Machine.lst.append(self)

m0 = Machine("zero")
m1 = Machine("one")
print(Machine.lst[1].name)  # prints "one"
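Building on that registry, an instance can then look up its neighbours by index; the neighbour method below is my own addition to illustrate the idea:

class Machine(object):
    lst = []

    def __init__(self, name):
        self.name = name
        self.id = len(Machine.lst)
        Machine.lst.append(self)

    def neighbour(self, offset):
        # Return the machine `offset` positions away in the line, or None if out of range
        idx = self.id + offset
        if 0 <= idx < len(Machine.lst):
            return Machine.lst[idx]
        return None

m0, m1, m2 = Machine("zero"), Machine("one"), Machine("two")
print(m1.neighbour(-1).name, m1.neighbour(+1).name)  # zero two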
This is a silly example that I cooked up where you put some data into the first machine which then moves it to the first buffer which then moves it to the second machine ...
Each machine just tags the data with its ID number and passes it along, but you could make the machines do anything. You could even register a function to be called at each machine when it gets data.
class Machine(object):
    def __init__(self, number, next=None):
        self.number = number
        self.register_next(next)

    def register_next(self, next):
        self.next = next

    def do_work(self, data):
        # do some work here
        newdata = '%s %d' % (str(data), self.number)
        if self.next is not None:
            self.next.do_work(newdata)

class Buffer(Machine):
    def __init__(self, number, next=None):
        Machine.__init__(self, number, next=next)
        self.data = None

    def do_work(self, data):
        if self.next is not None:
            self.next.do_work(data)
        else:
            self.data = data

# Now, create an assembly line
assembly = [Machine(0)]
for i in xrange(1, 20):
    machine = not i % 2
    assembly.append(Machine(i) if machine else Buffer(i))
    assembly[-2].register_next(assembly[-1])

assembly[0].do_work('foo')
print(assembly[-1].data)
EDIT
Buffers are now Machines too.
Now that you added more info about the problem, I'll suggest an alternate solution.
After you have created your machines, you might want to link them together.
class Machine(object):
    def __init__(self):
        self.handoff = None

    def input(self, item):
        item = do_something(item)  # machine processes item
        self.handoff(item)         # machine hands off item to next machine

m0 = Machine()
m1 = Machine()
m0.handoff = m1.input

m2 = Machine()
m1.handoff = m2.input

def output(item):
    print(item)

m2.handoff = output
Now when you call m0.input(item) it will do its processing, then hand off the item to m1, which will do the same and hand off to m2, which will do its processing and call output(). This example shows synchronous processing (an item will go all the way through the chain before the function calls return) but you could also have the .input() method put the item on a queue for processing and then return immediately; in this way, you could make the machines process in parallel.
With this system, the connections between the machines are explicit, and each machine only knows about the one that follows it (the one it needs to know about).
I use the word "threading" to describe the process of linking objects together like this. An item being processed follows the thread from machine to machine before arriving at the output. It's slightly ambiguous because it has nothing to do with threads of execution, so the term isn't perfect.
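For completeness, a sketch of the queue-based variant mentioned above, where .input() only enqueues and each machine processes items on its own worker thread (the QueuedMachine name and the process callable are my own illustration):

import queue
import threading
import time

class QueuedMachine:
    def __init__(self, process):
        self.process = process            # callable that transforms an item
        self.handoff = None               # set to the next machine's input after creation
        self._q = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def input(self, item):
        self._q.put(item)                 # returns immediately; work happens on the worker thread

    def _worker(self):
        while True:
            item = self.process(self._q.get())
            if self.handoff is not None:
                self.handoff(item)

m0 = QueuedMachine(lambda x: x + " ->m0")
m1 = QueuedMachine(lambda x: x + " ->m1")
m0.handoff = m1.input
m1.handoff = print                        # final stage just prints the finished item

m0.input("part")
time.sleep(0.2)                           # give the daemon workers a moment before the script exits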

Printing all instances of a class

With a class in Python, how do I define a function to print every single instance of the class in a format defined in the function?
I see two options in this case:
Garbage collector
import gc

for obj in gc.get_objects():
    if isinstance(obj, some_class):
        do_something(obj)
This has the disadvantage of being very slow when you have a lot of objects, but works with types over which you have no control.
Use a mixin and weakrefs
from collections import defaultdict
import weakref

class KeepRefs(object):
    __refs__ = defaultdict(list)

    def __init__(self):
        self.__refs__[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        for inst_ref in cls.__refs__[cls]:
            inst = inst_ref()
            if inst is not None:
                yield inst

class X(KeepRefs):
    def __init__(self, name):
        super(X, self).__init__()
        self.name = name

x = X("x")
y = X("y")
for r in X.get_instances():
    print r.name
del y
for r in X.get_instances():
    print r.name
In this case, all the references get stored as a weak reference in a list. If you create and delete a lot of instances frequently, you should clean up the list of weakrefs after iteration, otherwise there's going to be a lot of cruft.
Another problem in this case is that you have to make sure to call the base class constructor. You could also override __new__, but only the __new__ method of the first base class is used on instantiation. This also works only on types that are under your control.
Edit: The method for printing all instances according to a specific format is left as an exercise, but it's basically just a variation on the for-loops.
You'll want to create a static list on your class, and add a weakref to each instance so the garbage collector can clean up your instances when they're no longer needed.
import weakref

class A:
    instances = []
    def __init__(self, name=None):
        self.__class__.instances.append(weakref.proxy(self))
        self.name = name

a1 = A('a1')
a2 = A('a2')
a3 = A('a3')
a4 = A('a4')

for instance in A.instances:
    print(instance.name)
You don't need to import ANYTHING! Just use "self". Here's how you do this
class A:
    instances = []
    def __init__(self):
        self.__class__.instances.append(self)

print('\n'.join(str(obj) for obj in A.instances))  # this line was suggested by @anvelascos
It's this simple. No modules or libraries imported
Very nice and useful code, but it has a big problem: the list keeps growing and is never cleaned up. To test it, just add print(len(cls.__refs__[cls])) at the end of the get_instances method.
Here is a fix for the get_instances method:
__refs__ = defaultdict(list)

@classmethod
def get_instances(cls):
    refs = []
    for ref in cls.__refs__[cls]:
        instance = ref()
        if instance is not None:
            refs.append(ref)
            yield instance
    # print(len(refs))
    cls.__refs__[cls] = refs
or alternatively it could be done using WeakSet:
from weakref import WeakSet

__refs__ = defaultdict(WeakSet)

@classmethod
def get_instances(cls):
    return cls.__refs__[cls]
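For completeness, here is the same idea assembled into a self-contained sketch of the mixin using WeakSet (my own arrangement of the pieces above):

from collections import defaultdict
from weakref import WeakSet

class KeepRefs(object):
    __refs__ = defaultdict(WeakSet)

    def __init__(self):
        self.__refs__[self.__class__].add(self)

    @classmethod
    def get_instances(cls):
        return cls.__refs__[cls]

class X(KeepRefs):
    def __init__(self, name):
        super(X, self).__init__()
        self.name = name

x = X("x")
y = X("y")
print(sorted(inst.name for inst in X.get_instances()))  # ['x', 'y']
del y
# Dead instances drop out of the WeakSet automatically (immediately on CPython,
# where the object is freed as soon as its last reference goes away).
print(sorted(inst.name for inst in X.get_instances()))  # ['x']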
Same as almost all other OO languages, keep all instances of the class in a collection of some kind.
You can try this kind of thing.
class MyClassFactory( object ):
    theWholeList = []
    def __call__( self, *args, **kw ):
        x = MyClass( *args, **kw )
        self.theWholeList.append( x )
        return x
Now you can do this.
factory = MyClassFactory()
x = factory( args, ... )
print MyClassFactory.theWholeList
Python doesn't have an equivalent to Smalltalk's #allInstances, as the architecture doesn't have this type of central object table (although modern Smalltalks don't really work like that either).
As the other poster says, you have to explicitly manage a collection. His suggestion of a factory method that maintains a registry is a perfectly reasonable way to do it. You may wish to do something with weak references so you don't have to explicitly keep track of object disposal.
It's not clear if you need to print all class instances at once or when they're initialized, nor if you're talking about a class you have control over vs a class in a 3rd party library.
In any case, I would solve this by writing a class factory using Python metaclass support. If you don't have control over the class, manually update the __metaclass__ for the class or module you're tracking.
See http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html for more information.
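A hedged sketch of what such a metaclass could look like (my own illustration, not taken from the linked article; shown with the Python 3 metaclass= spelling rather than the Python 2 __metaclass__ attribute):

import weakref

class InstanceTracker(type):
    def __init__(cls, name, bases, namespace):
        super(InstanceTracker, cls).__init__(name, bases, namespace)
        cls._instances = weakref.WeakSet()   # one registry per tracked class

    def __call__(cls, *args, **kwargs):
        instance = super(InstanceTracker, cls).__call__(*args, **kwargs)
        cls._instances.add(instance)         # record every new instance
        return instance

class Widget(metaclass=InstanceTracker):
    def __init__(self, name):
        self.name = name

w1, w2 = Widget("a"), Widget("b")
for w in Widget._instances:
    print(w.name)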
In my project, I faced a similar problem and found a simple solution that may also work for you for listing and printing your class instances. The solution worked smoothly in Python 3.7; it gave partial errors in Python 3.5.
I will copy-paste the relevant code blocks from my recent project.
instances = []

class WorkCalendar:
    def __init__(self, day, patient, worker):
        self.day = day
        self.patient = patient
        self.worker = worker

    def __str__(self):
        return f'{self.day} : {self.patient} : {self.worker}'
In Python, the __str__ method determines how the object is rendered in its string form. The colons between the curly-bracket fields are just my preference, for a "Pandas DataFrame" kind of reading. With this small __str__ method you will no longer see the machine-readable default object descriptions, which make no sense to human eyes. After adding it, you can append your objects to your list and print them as you wish.
appointment= WorkCalendar("01.10.2020", "Jane", "John")
instances.append(appointment)
For printing, the format defined in __str__ is used by default, but it is also possible to access the attributes separately:
for instance in instances:
    print(instance)
    print(instance.worker)
    print(instance.patient)
For detailed reading, you may look at the source: https://dbader.org/blog/python-repr-vs-str
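As a small follow-up (my addition, not from the linked article): defining __repr__ as well makes printing the list itself readable, because containers use the repr() of their elements:

class WorkCalendar:
    def __init__(self, day, patient, worker):
        self.day = day
        self.patient = patient
        self.worker = worker

    def __str__(self):
        return f'{self.day} : {self.patient} : {self.worker}'

    def __repr__(self):
        return f'WorkCalendar({self.day!r}, {self.patient!r}, {self.worker!r})'

instances = [WorkCalendar("01.10.2020", "Jane", "John")]
print(instances)      # [WorkCalendar('01.10.2020', 'Jane', 'John')]  (uses __repr__)
print(instances[0])   # 01.10.2020 : Jane : John  (uses __str__)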
