Working with parallel python and classes - python

I was shocked to learn how little tutorials and guides there is to be found on the internet regarding parallel python (PP) and handling classes. I've ran into a problem where I want to initiate a couple of instances of the same class and after that retreive some variables (for instances reading 5 datafiles in parallel, and then retreive their data). Here's a simple piece of code to illustrate my problem:
import pp
class TestClass:
def __init__(self, i):
self.i = i
def doSomething(self):
print "\nI'm being executed!, i = "+str(self.i)
self.j = 2*self.i
print "self.j is supposed to be "+str(self.j)
return self.i
class parallelClass:
def __init__(self):
job_server = pp.Server()
job_list = []
self.instances = [] # for storage of the class objects
for i in xrange(3):
TC = TestClass(i) # initiate a new instance of the TestClass
self.instances.append(TC) # store the instance
job_list.append(job_server.submit(TC.doSomething, (), ())) # add some jobs to the job_list
results = [job() for job in job_list] # execute order 66...
print "\nIf all went well there's a nice bunch of objects in here:"
print self.instances
print "\nAccessing an object's i works ok, but accessing j does not"
print "i = "+str(self.instances[2].i)
print "j = "+str(self.instances[2].j)
if __name__ == '__main__' :
parallelClass() # initiate the program
I've added comments for your convenience. What am I doing wrong here?

You should use callbacks
A callbacks is a function that you pass to the submit call. That function will be called with the result of the job as argument (have a look at the API for more arcane usage).
In your case
Set up a callback:
class TestClass:
def doSomething(self):
j = 2 * self.i
return j # It's REQUIRED that you return j here.
def set_j(self, j):
self.j = j
Add the callback to the job submit call
class parallellClass:
def __init__(self):
#your code...
job_list.append(job_server.submit(TC.doSomething, callback=TC.set_j))
And you're done.
I made some improvements to the code to avoid using self.j in the doSomething call, and only use a local jvariable.
As mentioned in the comments, in pp, you only communicate the result of your job. That's why you have to return this variable, it will be passed to the callback.

Related

Incremented dict does not keep the value

I have the following situation:
class Test:
cities_visited: dict
#staticmethod
def prepare_city_dict(persons):
Test.cities_visited = {}
for i in range(len(persons)):
name = persons[i].surname
Test.cities_visited[name] = Test.create_visit()
#staticmethod
def create_visit():
counter: dict = {"City1": 0, "City2": 0, "City3": 0}
return counter
#staticmethod
def increment_visit(surname: str, key):
counter_visit = Test.cities_visited[surname]
current_value = counter_visit[key]
print(current_value)
counter_visit[key] = current_value + 1
Test.cities_visited[surname] = counter_visit
At start-up I am calling Test.prepare_city_dict, and then I create a thread and do a lock and call other stuff, at some point I try to increment 2 cities:
Test.increment_visit("Dummy", "City1")
Test.increment_visit("Dummy", "City2")
If I am trying to log how many times a city was visited, only the 'City1' is correctly implemented.
I am coming from a different language (which is pretty obvious I think :D), running my code in a docker container on the Windows OS, everything is incremented properly.
Running the same configuration (container) under Linux OS, only the first 'City1' is properly incremented.
I taught it was a race condition, but unfortunately I cannot reproduce it and I cannot figure out what is going on.
+++ UPDATE:
class TestClass:
def main():
Test.prepare_city_dict(persons)
lock = threading.Lock()
thread = threading.Thread(target=TestClass.process_message,
args=(lock, persons,))
thread.start()
def process_message(lock, persons):
lock.acquire()
Test.increment_visit("Dummy", "City1")
..... -> lots of calculations
Test.increment_visit("Dummy", "City2")
lock.release()

Manager / Container class, how to?

I am currently designing a software which needs to manage a certain hardware setup.
The hardware setup is as following :
System - The system contains two identical devices, and has certain functionality relative to the entire system.
Device - Each device contains two identical sub devices, and has certain functionality relative to both sub devices.
Sub device - Each sub device has 4 configurable entities (Controlled via the same hardware command - thus I don't count them as a sub-sub device).
What I want to achieve :
I want to control all configurable entities via the system manager (the entities are counted in a serial way), meaning I would be able to do the following :
system_instance = system_manager_class(some_params)
system_instance.some_func(0) # configure device_manager[0].sub_device_manager[0].entity[0]
system_instance.some_func(5) # configure device_manager[0].sub_device_manager[1].entity[1]
system_instance.some_func(8) # configure device_manager[1].sub_device_manager[1].entity[0]
What I have thought of doing :
I was thinking of creating an abstract class, which contains all sub device functions (with a call to a conversion function) and have the system_manager, device_manager and sub_device_manager inherit it. Thus all classes will have the same function name and I will be able to access them via the system manager.
Something around these lines :
class abs_sub_device():
#staticmethod
def convert_entity(self):
sub_manager = None
sub_entity_num = None
pass
def set_entity_to_2(entity_num):
sub_manager, sub_manager_entity_num = self.convert_entity(entity_num)
sub_manager.some_func(sub_manager_entity_num)
class system_manager(abs_sub_device):
def __init__(self):
self.device_manager_list = [] # Initiliaze device list
self.device_manager_list.append(device_manager())
self.device_manager_list.append(device_manager())
def convert_entity(self, entity_num):
relevant_device_manager = self.device_manager_list[entity_num // 4]
relevant_entity = entity_num % 4
return relevant_device_manage, relevant_entity
class device_manager(abs_sub_device):
def __init__(self):
self.sub_device_manager_list = [] # Initiliaze sub device list
self.sub_device_manager_list.append(sub_device_manager())
self.sub_device_manager_list.append(sub_device_manager())
def convert_entity(self, entity_num):
relevant_sub_device_manager = self.sub_device_manager_list[entity_num // 4]
relevant_entity = entity_num % 4
return relevant_sub_device_manager, relevant_entity
class sub_device_manager(abs_sub_device):
def __init__(self):
self.entity_list = [0] * 4
def set_entity_to_2(self, entity_num):
self.entity_list[entity_num] = 2
The code is for generic understanding of my design, not for actual functionality.
The problem :
It seems to me that the system I am trying to design is really generic and that there must be a built-in python way to do this, or that my entire object oriented look at it is wrong.
I would really like to know if some one has a better way of doing this.
After much thinking, I think I found a pretty generic way to solve the issue, using a combination of decorators, inheritance and dynamic function creation.
The main idea is as following :
1) Each layer dynamically creates all sub layer relevant functions for it self (Inside the init function, using a decorator on the init function)
2) Each function created dynamically converts the entity value according to a convert function (which is a static function of the abs_container_class), and calls the lowers layer function with the same name (see make_convert_function_method).
3) This basically causes all sub layer function to be implemented on the higher level with zero code duplication.
def get_relevant_class_method_list(class_instance):
method_list = [func for func in dir(class_instance) if callable(getattr(class_instance, func)) and not func.startswith("__") and not func.startswith("_")]
return method_list
def make_convert_function_method(name):
def _method(self, entity_num, *args):
sub_manager, sub_manager_entity_num = self._convert_entity(entity_num)
function_to_call = getattr(sub_manager, name)
function_to_call(sub_manager_entity_num, *args)
return _method
def container_class_init_decorator(function_object):
def new_init_function(self, *args):
# Call the init function :
function_object(self, *args)
# Get all relevant methods (Of one sub class is enough)
method_list = get_relevant_class_method_list(self.container_list[0])
# Dynamically create all sub layer functions :
for method_name in method_list:
_method = make_convert_function_method(method_name)
setattr(type(self), method_name, _method)
return new_init_function
class abs_container_class():
#staticmethod
def _convert_entity(self):
sub_manager = None
sub_entity_num = None
pass
class system_manager(abs_container_class):
#container_class_init_decorator
def __init__(self):
self.device_manager_list = [] # Initiliaze device list
self.device_manager_list.append(device_manager())
self.device_manager_list.append(device_manager())
self.container_list = self.device_manager_list
def _convert_entity(self, entity_num):
relevant_device_manager = self.device_manager_list[entity_num // 4]
relevant_entity = entity_num % 4
return relevant_device_manager, relevant_entity
class device_manager(abs_container_class):
#container_class_init_decorator
def __init__(self):
self.sub_device_manager_list = [] # Initiliaze sub device list
self.sub_device_manager_list.append(sub_device_manager())
self.sub_device_manager_list.append(sub_device_manager())
self.container_list = self.sub_device_manager_list
def _convert_entity(self, entity_num):
relevant_sub_device_manager = self.sub_device_manager_list[entity_num // 4]
relevant_entity = entity_num % 4
return relevant_sub_device_manager, relevant_entity
class sub_device_manager():
def __init__(self):
self.entity_list = [0] * 4
def set_entity_to_value(self, entity_num, required_value):
self.entity_list[entity_num] = required_value
print("I set the entity to : {}".format(required_value))
# This is used for auto completion purposes (Using pep convention)
class auto_complete_class(system_manager, device_manager, sub_device_manager):
pass
system_instance = system_manager() # type: auto_complete_class
system_instance.set_entity_to_value(0, 3)
There is still a little issue with this solution, auto-completion would not work since the highest level class has almost no static implemented function.
In order to solve this I cheated a bit, I created an empty class which inherited from all layers and stated to the IDE using pep convention that it is the type of the instance being created (# type: auto_complete_class).
Does this solve your Problem?
class EndDevice:
def __init__(self, entities_num):
self.entities = list(range(entities_num))
#property
def count_entities(self):
return len(self.entities)
def get_entity(self, i):
return str(i)
class Device:
def __init__(self, sub_devices):
self.sub_devices = sub_devices
#property
def count_entities(self):
return sum(sd.count_entities for sd in self.sub_devices)
def get_entity(self, i):
c = 0
for index, sd in enumerate(self.sub_devices):
if c <= i < sd.count_entities + c:
return str(index) + " " + sd.get_entity(i - c)
c += sd.count_entities
raise IndexError(i)
SystemManager = Device # Are the exact same. This also means you can stack that infinite
sub_devices1 = [EndDevice(4) for _ in range(2)]
sub_devices2 = [EndDevice(4) for _ in range(2)]
system_manager = SystemManager([Device(sub_devices1), Device(sub_devices2)])
print(system_manager.get_entity(0))
print(system_manager.get_entity(5))
print(system_manager.get_entity(15))
I can't think of a better way to do this than OOP, but inheritance will only give you one set of low-level functions for the system manager, so it wil be like having one device manager and one sub-device manager. A better thing to do will be, a bit like tkinter widgets, to have one system manager and initialise all the other managers like children in a tree, so:
system = SystemManager()
device1 = DeviceManager(system)
subDevice1 = SubDeviceManager(device1)
device2 = DeviceManager(system)
subDevice2 = SubDeviceManager(device2)
#to execute some_func on subDevice1
system.some_func(0, 0, *someParams)
We can do this by keeping a list of 'children' of the higher-level managers and having functions which reference the children.
class SystemManager:
def __init__(self):
self.children = []
def some_func(self, child, *params):
self.children[child].some_func(*params)
class DeviceManager:
def __init__(self, parent):
parent.children.append(self)
self.children = []
def some_func(self, child, *params):
self.children[child].some_func(*params)
class SubDeviceManager:
def __init__(self, parent):
parent.children.append(self)
#this may or may not have sub-objects, if it does we need to make it its own children list.
def some_func(self, *params):
#do some important stuff
Unfortunately, this does mean that if we want to call a function of a sub-device manager from the system manager without having lots of dots, we will have to define it again again in the system manager. What you can do instead is use the built-in exec() function, which will take in a string input and run it using the Python interpreter:
class SystemManager:
...
def execute(self, child, function, *args):
exec("self.children[child]."+function+"(*args)")
(and keep the device manager the same)
You would then write in the main program:
system.execute(0, "some_func", 0, *someArgs)
Which would call
device1.some_func(0, someArgs)
Here's what I'm thinking:
SystemManager().apply_to_entity(entity_num=7, lambda e: e.value = 2)
class EntitySuperManagerMixin():
"""Mixin to handle logic for managing entity managers."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs) # Supports any kind of __init__ call.
self._entity_manager_list = []
def apply_to_entity(self, entity_num, action):
relevant_entity_manager = self._entity_manager_list[index // 4]
relevant_entity_num = index % 4
return relevant_entity_manager.apply_to_entity(
relevant_entity_num, action)
class SystemManager(EntitySuperManagerMixin):
def __init__(self):
super().__init__()
# An alias for _entity_manager_list to improve readability.
self.device_manager_list = self._entity_manager_list
self.device_manager_list.extend(DeviceManager() for _ in range(4))
class DeviceManager(EntitySuperManagerMixin):
def __init__(self):
super().__init__()
# An alias for _entity_manager_list to improve readability.
self.sub_device_manager_list = self._entity_manager_list
self.sub_device_manager_list.extend(SubDeviceManager() for _ in range(4))
class SubDeviceManager():
"""Manages entities, not entity managers, thus doesn't inherit the mixin."""
def __init__(self):
# Entities need to be classes for this idea to work.
self._entity_list = [Entity() for _ in range(4)]
def apply_to_entity(self, entity_num, action):
return action(self._entity_list[entity_num])
class Entity():
def __init__(self, initial_value=0):
self.value = initial_value
With this structure:
Entity-specific functions can stay bound to the Entity class (where it belongs).
Manager-specific code needs to be updated in two places: EntitySuperManagerMixin and the lowest level manager (which would need custom behavior anyway since it deals with the actual entities, not other managers).
The way i see it if you want to dynamically configure different part of system you need some sort of addressing so if you input an ID or address with some parameter the system will know with address on which sub sistem you are talking about and then configure that system with parameter.
OOP is quite ok for that and then you can easily manipulate such data via bitwise operators.
So basic addressing is done via binary system , so to do that in python you need first to implement an address static attribute to your class with perhaps some basic further detailing if system grows.
Basic implementation of addres systems is as follows:
bin(71)
1010 1011
and if we divide it into nibbles
1010 - device manager 10
1011 - sub device manager 11
So in this example we have system of 15 device managers and 15 sub device menagers, and every device and sub device manager has its integer address.So let's say you want to access device manager no10 with sub device manager no11. You would need their address which is in binary 71 and you would go with:
system.config(address, parameter )
Where system.config funcion would look like this:
def config(self,address, parameter):
device_manager = (address&0xF0)>>4 #10
sub_device_manager = address&0xf # 11
if device_manager not in range(self.devices): raise LookupError("device manager not found")
if sub_device_manager not in range(self.devices[device_manager].device): raise LookupError("sub device manager not found")
self.devices[device_manager].device[sub_device_manager].implement(parameter)
In layman you would tell system that sub_device 11 from device 10 needs configuration with this parameter.
So how would this setup look in python inheritance class of some base class of system that could be then composited/inherited to different classes:
class systems(object):
parent = None #global parent element, defaults to None well for simplicity
def __init__(self):
self.addrMASK = 0xf # address mask for that nibble
self.addr = 0x1 # default address of that element
self.devices = [] # list of instances of device
self.data = { #some arbitrary data
"param1":"param_val",
"param2":"param_val",
"param3":"param_val",
}
def addSubSystem(self,sub_system): # connects elements to eachother
# checks for valiability
if not isinstance(sub_system,systems):
raise TypeError("defined input is not a system type") # to prevent passing an integer or something
# appends a device to system data
self.devices.append(sub_system)
# search parent variables from sub device manager to system
obj = self
while 1:
if obj.parent is not None:
obj.parent.addrMASK<<=4 #bitshifts 4 bits
obj.parent.addr <<=4 #bitshifts 4 bits
obj = obj.parent
else:break
#self management , i am lazy guy so i added this part so i wouldn't have to reset addresses manualy
self.addrMASK <<=4 #bitshifts 4 bits
self.addr <<=4 #bitshifts 4 bits
# this element is added so the obj address is coresponding to place in list, this could be done more eloquently but i didn't know what are your limitations
if not self.devices:
self.devices[ len(self.devices)-1 ].addr +=1
self.devices[ len(self.devices)-1 ].parent = self
# helpful for checking data ... gives the address of system
def __repr__(self):
return "system at {0:X}, {1:0X}".format(self.addr,self.addrMASK)
# extra helpful lists data as well
def __str__(self):
data = [ '{} : {}\n'.format(k,v) for k,v in self.data.items() ]
return " ".join([ repr(self),'\n',*data ])
#checking for data, skips looping over sub systems
def __contains__(self,system_index):
return system_index-1 in range(len(self.data))
# applying parameter change -- just an example
def apply(self,par_dict):
if not isinstance(par_dict,dict):
raise TypeError("parameter must be a dict type")
if any( key in self.data.keys() for key in par_dict.keys() ):
for k,v in par_dict.items():
if k in self.data.keys():
self.data[k]=v
else:pass
else:pass
# implementing parameters trough addresses
def implement(self,address,parameter_dictionary):
if address&self.addrMASK==self.addr:
if address-self.addr!=0:
item = (address-self.addr)>>4
self.devices[item-1].implement( address-self.addr,parameter_dictionary )
else:
self.apply(parameter_dictionary)
a = systems()
b = systems()
a.addSubSystem(b)
c = systems()
b.addSubSystem(c)
print('a')
print(a)
print('')
print('b')
print(b)
print('')
print('c')
print(c)
print('')
a.implement(0x100,{"param1":"a"})
a.implement(0x110,{"param1":"b"})
a.implement(0x111,{"param1":"c"})
print('a')
print(a)
print('')
print('b')
print(b)
print('')
print('c')
print(c)
print('')

Threading, multiprocessing shared memory and asyncio

I am having trouble implementing the following scheme :
class A:
def __init__(self):
self.content = []
self.current_len = 0
def __len__(self):
return self.current_len
def update(self, new_content):
self.content.append(new_content)
self.current_len += 1
class B:
def __init__(self, id):
self.id = id
And I also have these 2 functions that will be called later in the main :
async def do_stuff(first_var, second_var):
""" this function is ideally called from the main in another
process. Also, first_var and second_var are not modified so it
would be nice if they could be given by reference without
having to copy them """
### used for first call
yield None
while len(first_var) < CERTAIN_NUMBER:
time.sleep(10)
while True:
## do stuff
if condition_met:
yield new_second_var ## which is a new instance of B
## continue doing stuff
def do_other_stuff(first_var, second_var):
while True:
queue = multiprocessing.JoinableQueue()
results = multiprocessing.Queue()
### do stuff
first_var.update(results)
The main looks like this at the moment :
first_var = A()
second_var = B()
while True:
async for new_second_var in do_stuff(first_var, second_var):
if new_second_var:
## stop the do_other_stuff that is currently running
## to re-launch it with the updated new_var
do_other_stuff(first_var, new_second_var)
else: ## used for the first call
do_other_stuff(first_var, second_var)
Here are my questions :
Is there a better solution to make this scheme work?
How can I implement the "stopping" part since there is a while True loop that fills first_var by reference?
Will the instance of A (first_var) be passed by reference to do_stuff if first_var doesn't get modified inside it?
Is it even possible to have an asynchronous generator in another process?
Is it even possible at all?
This is using Python 3.6 for the async generators.
I hope this is somewhat clear! Thanks a lot!

Python Multiprocessing with shared data source and multiple class instances

My program needs to spawn multiple instances of a class, each processing data that is coming from a streaming data source.
For example:
parameters = [1, 2, 3]
class FakeStreamingApi:
def __init__(self):
pass
def data(self):
return 42
pass
class DoStuff:
def __init__(self, parameter):
self.parameter = parameter
def run(self):
data = streaming_api.data()
output = self.parameter ** 2 + data # Some CPU intensive task
print output
streaming_api = FakeStreamingApi()
# Here's how this would work with no multiprocessing
instance_1 = DoStuff(parameters[0])
instance_1.run()
Once the instances are running they don't need to interact with each other, they just have to get the data as it comes in. (and print error messages, etc)
I am totally at a loss how to make this work with multiprocessing, since I first have to create a new instance of the class DoStuff, and then have it run.
This is definitely not the way to do it:
# Let's try multiprocessing
import multiprocessing
for parameter in parameters:
processes = [ multiprocessing.Process(target = DoStuff, args = (parameter)) ]
# Hmm, this doesn't work...
We could try defining a function to spawn classes, but that seems ugly:
import multiprocessing
def spawn_classes(parameter):
instance = DoStuff(parameter)
instance.run()
for parameter in parameters:
processes = [ multiprocessing.Process(target = spawn_classes, args = (parameter,)) ]
# Can't tell if it works -- no output on screen?
Plus, I don't want to have 3 different copies of the API interface class running, I want that data to be shared between all the processes... and as far as I can tell, multiprocessing creates copies of everything for each new process.
Ideas?
Edit:
I think I may have got it... is there anything wrong with this?
import multiprocessing
parameters = [1, 2, 3]
class FakeStreamingApi:
def __init__(self):
pass
def data(self):
return 42
pass
class Worker(multiprocessing.Process):
def __init__(self, parameter):
super(Worker, self).__init__()
self.parameter = parameter
def run(self):
data = streaming_api.data()
output = self.parameter ** 2 + data # Some CPU intensive task
print output
streaming_api = FakeStreamingApi()
if __name__ == '__main__':
jobs = []
for parameter in parameters:
p = Worker(parameter)
jobs.append(p)
p.start()
for j in jobs:
j.join()
I came to the conclusion that it would be necessary to use multiprocessing.Queues to solve this. The data source (the streaming API) needs to pass copies of the data to all the different processes, so they can consume it.
There's another way to solve this using the multiprocessing.Manager to create a shared dict, but I didn't explore it further, as it looks fairly inefficient and cannot propagate changes to inner values (e.g if you have a dict of lists, changes to the inner lists will not propagate).

Python Multiprocessing.Pool in a class

I am trying to change my code to a more object oriented format. In doing so I am lost on how to 'visualize' what is happening with multiprocessing and how to solve it. On the one hand, the class should track changes to local variables across functions, but on the other I believe multiprocessing creates a copy of the code which the original instance would not have access to. I need to figure out a way to manipulate classes, within a class, using multiprocessing, and have the parent class retain all manipulated values in the nested classes.
A simple version (OLD CODE):
function runMultProc():
...
dictReports = {}
listReports = ['reportName1.txt', 'reportName2.txt']
tasks = []
pool = multiprocessing.Pool()
for report in listReports:
if report not in dictReports:
dictReports[today][report] = {}
tasks.append(pool.apply_async(worker, args=([report, dictReports[today][report]])))
else:
continue
for task in tasks:
report, currentReportDict = task.get()
dictReports[report] = currentFileDict
function worker(report, currentReportDict):
<Manipulate_reports_dict>
return report, currentReportDict
NEW CODE:
class Transfer():
def __init__(self):
self.masterReportDictionary[<todays_date>] = [reportObj1, reportObj2]
def processReports(self):
self.pool = multiprocessing.Pool()
self.pool.map(processWorker, self.masterReportDictionary[<todays_date>])
self.pool.close()
self.pool.join()
def processWorker(self, report):
# **process and manipulate report, currently no return**
report.name = 'foo'
report.path = '/path/to/report'
class Report():
def init(self):
self.name = ''
self.path = ''
self.errors = {}
self.runTime = ''
self.timeProcessed = ''
self.hashes = {}
self.attempts = 0
I don't think this code does what I need it to do, which is to have it process the list of reports in parallel AND, as processWorker manipulates each report class object, store those results. As I am fairly new to this I was hoping someone could help.
The big difference between the two is that the first one build a dictionary and returned it. The second model shouldn't really be returning anything, I just need for the classes to finish being processed and they should have relevant information within them.
Thanks!

Categories

Resources