Parallelize tree creation with dask


I need help with a problem that I'm pretty sure dask can solve, but I don't know how to tackle it.
I need to construct a tree recursively.
For each node, if a criterion is met, a computation (compute_val) is done; otherwise two new children are created. The same treatment is then applied to the children (build).
Once all the children of a node have performed a computation, we can proceed to a merge (merge). The merge either fuses the children (if they both meet a criterion) or does nothing.
So far I have only been able to parallelize the first level, and I don't know which dask tools I should use to be more effective.
This is a simplified sequential MRE of what I want to achieve:
import numpy as np
import time

class Node:
    def __init__(self, level):
        self.level = level
        self.val = None

def merge(node, childs):
    values = [child.val for child in childs]
    if all(values) and sum(values)<0.1:
        node.val = np.mean(values)
    else:
        node.childs = childs
    return node

def compute_val():
    time.sleep(0.1)
    return np.random.rand(1)

def build(node):
    print(node.level)
    if (np.random.rand(1) < 0.1 and node.level>1) or node.level>5:
        node.val = compute_val()
    else:
        childs = [build(Node(level=node.level+1)) for _ in range(2)]
        node = merge(node, childs)
    return node

tree = build(Node(level=0))

As I understand, the way you tackle recursion (or any dynamic computation) is to create tasks within a task.
I was experimenting with something similar, so below is my 5-minute illustrative solution. You'd have to optimise it according to the characteristics of the algorithm.
Keep in mind that each task adds scheduling overhead, so you'd want to chunk the computations (do more work per task) for optimal results.
Relevant doc:
https://distributed.dask.org/en/latest/task-launch.html
API reference:
https://distributed.dask.org/en/latest/api.html#distributed.worker_client
https://distributed.dask.org/en/latest/api.html#distributed.Client.gather
https://distributed.dask.org/en/latest/api.html#distributed.Client.submit
import numpy as np
import time
from dask.distributed import Client, worker_client

# Create a dask client
# For convenience, I'm creating a localcluster.
client = Client(threads_per_worker=1, n_workers=8)
client

class Node:
    def __init__(self, level):
        self.level = level
        self.val = None
        self.childs = None  # This was missing

def merge(node, childs):
    values = [child.val for child in childs]
    if all(values) and sum(values)<0.1:
        node.val = np.mean(values)
    else:
        node.childs = childs
    return node

def compute_val():
    time.sleep(0.1)  # Is this required?
    return np.random.rand(1)

def build(node):
    print(node.level)
    if (np.random.rand(1) < 0.1 and node.level>1) or node.level>5:
        node.val = compute_val()
    else:
        # Launch the child builds as new tasks from inside this task
        with worker_client() as client:
            child_futures = [client.submit(build, Node(level=node.level+1)) for _ in range(2)]
            childs = client.gather(child_futures)
        node = merge(node, childs)
    return node

tree_future = client.submit(build, Node(level=0))
tree = tree_future.result()

Related

How to write a thread-safe program for an n-ary tree to be consistent in python

I recently had an interview question about lockable trees, in which a node of a tree can be locked only if none of its ancestors are locked and none of its children are locked.
The problem I have now is that I have to make this program thread-safe.
I tried many approaches, but the interviewer gave me a hint that I can solve this problem using some extra variables.
My code is given below. I couldn't find any relevant articles related to this problem on the internet. I hope you have something to help me.
import threading

lock = threading.Lock()

class Lockable(threading.Thread):
    def __init__(self,parent):
        self.parent = parent
        self.locked = False
        threading.Thread.__init__(self)
        self.lockedDescendents = 0
    def mutate_lock(self,delim):
        with lock:
            self.locked = delim
    def mutate_locked_descendents(self,delim,ancestor):
        with lock:
            ancestor.lockedDescendents+=delim
    def lock(self):
        if(self.locked):
            return True
        if(self.lockedDescendents >0):
            return False
        ancestor = self.parent
        while(ancestor):
            with lock:
                if(ancestor.locked):
                    return False
            ancestor = ancestor.parent
        while(ancestor):
            self.mutate_locked_descendents(1,ancestor)
            ancestor = ancestor.parent
        self.mutate_lock(True)
        return True
    def unlock(self):
        if(not self.locked):
            return False
        ancestor = self.parent
        while(ancestor):
            self.mutate_locked_descendents(-1,ancestor)
            ancestor = ancestor.parent
        self.mutate_lock(False)
        return True

c = Lockable(None)
a = Lockable(c)
b = Lockable(c)
d = Lockable(a)
e = Lockable(b)
print(a.lock())
print(b.lock())
print(a.unlock())
print(e.lock())
print(d.lock())
The tree needs to stay consistent, and if two threads collide we can kill both threads or kill one of them.
Both threads can fail together, but they must never both succeed concurrently.
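One way to read the requirement above is that the check ("no locked ancestor, no locked descendant") and the update must happen as a single atomic step. The following is only a minimal sketch of that idea, not the interviewer's intended answer: it assumes one coarse lock shared by the whole tree, and the names LockableNode, _tree_lock and locked_descendants are hypothetical. Unlike the code in the question, lock() here returns False when the node is already locked, so that two threads can never both succeed on the same node.

```python
import threading

# Assumption: a single coarse lock guards the whole tree (hypothetical name).
_tree_lock = threading.Lock()

class LockableNode:
    def __init__(self, parent=None):
        self.parent = parent
        self.locked = False
        self.locked_descendants = 0  # extra variable: locked nodes below this one

    def _ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def lock(self):
        # The whole check-then-update sequence runs under one lock,
        # so no other thread can change the tree in between.
        with _tree_lock:
            if self.locked or self.locked_descendants > 0:
                return False
            if any(a.locked for a in self._ancestors()):
                return False
            for a in self._ancestors():
                a.locked_descendants += 1
            self.locked = True
            return True

    def unlock(self):
        with _tree_lock:
            if not self.locked:
                return False
            for a in self._ancestors():
                a.locked_descendants -= 1
            self.locked = False
            return True
```

A single lock serialises all lock/unlock calls, which is simple to reason about but limits concurrency; finer-grained schemes are possible and are not shown here.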

Trying to save node or reference to node in a list

I have a tree class whose instances are initialized with data, left, and right attributes.
In the same class I have a "save" method.
I am using a list as a queue.
I am attempting to create a "save" method which takes only one argument, "data".
The purpose of this save method is to dequeue from my list and check that node to see if it's empty; if it is, my data is saved there. Otherwise it enqueues the two children of that node into the list.
The goal is to save data into the tree in level order.
Because the class gets initialized, there is always at least one element in the tree, which is the root node.
The issue I keep running into is that whenever I append self.data (the root node, not the data I'm currently trying to add) to my list at the beginning of the save method, it only saves the data there.
Obviously, when I then try to append the left and right children of this int, I get an error because the int has no left or right attributes.
I am wondering how to save the node in the list instead of the data at the node.
class Tree():
    aqueue = []
    def __init__(self, item):
        self.item = item
        self.leftchild = None
        self.rightchild = None
        self.aqueue.append(self.item)
    def add(self, newitem):
        temp = self.myqueue.pop(0)
        if temp is None:
            temp = Tree(newitem)
        else:
            self.aqueue.append(temp.leftchild)
            self.aqueue.append(temp.rightcild)
            temp.add(newitem)
        self.aqueue.clear() #this is meant to clear queue of all nodes after the recursions are complete
        self.aqueue.append(self.item) #this is meant to return the root node to the queue so that it is the only item for next time

There are a couple of obvious issues with your code: both the if and else branches return, so the code after them will never run, and temp == newitem is an equality expression, but even if it were an assignment it wouldn't do anything:
def add(self, newitem):
    temp = self.myqueue.pop(0)
    if temp == None: # should use temp is None
        temp == newitem # temp = newitem still wouldn't do anything
        return True
    else:
        self.aqueue.append(temp.leftchild)
        self.aqueue.append(temp.rightcild)
        return temp.add(newitem)
    # you will never get here, since both branches of the if return
    self.aqueue.clear() # delete everything in the list..?
    self.aqueue.append(self.item)
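To address the original question of keeping nodes rather than their data in the queue, here is a minimal sketch of level-order insertion. It is one possible approach, not the asker's code: it assumes a local collections.deque created on each call instead of a shared class-level list, and it fills the first free child slot it finds.

```python
from collections import deque

class Tree:
    def __init__(self, item):
        self.item = item
        self.leftchild = None
        self.rightchild = None

    def add(self, newitem):
        """Insert newitem at the first free child slot in level order."""
        queue = deque([self])  # the queue holds nodes, not their data
        while queue:
            node = queue.popleft()
            if node.leftchild is None:
                node.leftchild = Tree(newitem)
                return
            if node.rightchild is None:
                node.rightchild = Tree(newitem)
                return
            # Both slots are taken: visit the children later, left first
            queue.append(node.leftchild)
            queue.append(node.rightchild)
```

Because the queue is rebuilt from the root on every call, there is nothing to clear or reset afterwards; the cost is that each insertion walks the tree from the top.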

Confused about the size() function in python

I'm going through the code to write a circular queue in Python:
class CircularQueue:
    # constructor for the class
    # taking input for the size of the Circular queue
    # from user
    def __init__(self, maxSize):
        self.queue = list()
        # user input value for maxSize
        self.maxSize = maxSize
        self.head = 0
        self.tail = 0
    # add element to the queue
    def enqueue(self, data):
        # if queue is full
        if self.size() == (self.maxSize - 1):
            return("Queue is full!")
        else:
            # add element to the queue
            self.queue.append(data)
            # increment the tail pointer
            self.tail = (self.tail+1) % self.maxSize
            return True
The part that confuses me is the self.size() call in the enqueue method.
I looked through the Python docs and don't see any size() function, only references to size() in numpy.
Normally you'd call len() to get the size of a list, but I know you can't do self.len().
Any clarity or explanation of the syntax and logic behind writing something like this would be helpful!

You need to define your own size() method and just return the number of items currently held in the queue.
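As a rough illustration, size() is just an ordinary method defined on the class, which is why it is called as self.size(). The sketch below assumes the count is derived from the head and tail pointers, as circular-queue tutorials commonly do; the wrap-around branch is an assumption about how a dequeue method (not shown in the question) would move head.

```python
class CircularQueue:
    def __init__(self, maxSize):
        self.queue = list()
        self.maxSize = maxSize
        self.head = 0
        self.tail = 0

    def size(self):
        """Number of items currently held, derived from the two pointers."""
        if self.tail >= self.head:
            return self.tail - self.head
        # tail has wrapped around past the end of the buffer
        return self.maxSize - (self.head - self.tail)

    def enqueue(self, data):
        if self.size() == (self.maxSize - 1):
            return "Queue is full!"
        self.queue.append(data)
        self.tail = (self.tail + 1) % self.maxSize
        return True
```

With this definition, a queue created with maxSize 4 accepts three items before reporting that it is full, because one slot is kept free to tell a full queue from an empty one.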

Manager / Container class, how to?

I am currently designing software which needs to manage a certain hardware setup.
The hardware setup is as follows:
System - The system contains two identical devices, and has certain functionality relative to the entire system.
Device - Each device contains two identical sub devices, and has certain functionality relative to both sub devices.
Sub device - Each sub device has 4 configurable entities (controlled via the same hardware command, thus I don't count them as a sub-sub device).
What I want to achieve:
I want to control all configurable entities via the system manager (the entities are numbered serially), meaning I would be able to do the following:
system_instance = system_manager_class(some_params)
system_instance.some_func(0) # configure device_manager[0].sub_device_manager[0].entity[0]
system_instance.some_func(5) # configure device_manager[0].sub_device_manager[1].entity[1]
system_instance.some_func(8) # configure device_manager[1].sub_device_manager[0].entity[0]
What I have thought of doing:
I was thinking of creating an abstract class which contains all sub device functions (with a call to a conversion function) and having the system_manager, device_manager and sub_device_manager inherit from it. Thus all classes will have the same function names and I will be able to access them via the system manager.
Something along these lines:
class abs_sub_device():
    @staticmethod
    def convert_entity(self):
        sub_manager = None
        sub_entity_num = None
        pass
    def set_entity_to_2(self, entity_num):
        sub_manager, sub_manager_entity_num = self.convert_entity(entity_num)
        sub_manager.set_entity_to_2(sub_manager_entity_num)

class system_manager(abs_sub_device):
    def __init__(self):
        self.device_manager_list = [] # Initialize device list
        self.device_manager_list.append(device_manager())
        self.device_manager_list.append(device_manager())
    def convert_entity(self, entity_num):
        # Each device holds 2 sub devices x 4 entities = 8 entities
        relevant_device_manager = self.device_manager_list[entity_num // 8]
        relevant_entity = entity_num % 8
        return relevant_device_manager, relevant_entity

class device_manager(abs_sub_device):
    def __init__(self):
        self.sub_device_manager_list = [] # Initialize sub device list
        self.sub_device_manager_list.append(sub_device_manager())
        self.sub_device_manager_list.append(sub_device_manager())
    def convert_entity(self, entity_num):
        # Each sub device holds 4 entities
        relevant_sub_device_manager = self.sub_device_manager_list[entity_num // 4]
        relevant_entity = entity_num % 4
        return relevant_sub_device_manager, relevant_entity

class sub_device_manager(abs_sub_device):
    def __init__(self):
        self.entity_list = [0] * 4
    def set_entity_to_2(self, entity_num):
        self.entity_list[entity_num] = 2
The code is for generic understanding of my design, not for actual functionality.
The problem :
It seems to me that the system I am trying to design is really generic and that there must be a built-in Python way to do this, or that my entire object-oriented view of it is wrong.
I would really like to know if someone has a better way of doing this.

After much thinking, I think I found a pretty generic way to solve the issue, using a combination of decorators, inheritance and dynamic function creation.
The main idea is as follows:
1) Each layer dynamically creates all the relevant sub-layer functions for itself (inside the init function, using a decorator on the init function).
2) Each dynamically created function converts the entity value according to a convert function (which is a static function of the abs_container_class), and calls the lower layer's function with the same name (see make_convert_function_method).
3) This basically causes all sub-layer functions to be implemented on the higher level with zero code duplication.
def get_relevant_class_method_list(class_instance):
    method_list = [func for func in dir(class_instance) if callable(getattr(class_instance, func)) and not func.startswith("__") and not func.startswith("_")]
    return method_list

def make_convert_function_method(name):
    def _method(self, entity_num, *args):
        sub_manager, sub_manager_entity_num = self._convert_entity(entity_num)
        function_to_call = getattr(sub_manager, name)
        function_to_call(sub_manager_entity_num, *args)
    return _method

def container_class_init_decorator(function_object):
    def new_init_function(self, *args):
        # Call the init function :
        function_object(self, *args)
        # Get all relevant methods (Of one sub class is enough)
        method_list = get_relevant_class_method_list(self.container_list[0])
        # Dynamically create all sub layer functions :
        for method_name in method_list:
            _method = make_convert_function_method(method_name)
            setattr(type(self), method_name, _method)
    return new_init_function

class abs_container_class():
    @staticmethod
    def _convert_entity(self):
        sub_manager = None
        sub_entity_num = None
        pass

class system_manager(abs_container_class):
    @container_class_init_decorator
    def __init__(self):
        self.device_manager_list = [] # Initialize device list
        self.device_manager_list.append(device_manager())
        self.device_manager_list.append(device_manager())
        self.container_list = self.device_manager_list
    def _convert_entity(self, entity_num):
        # Each device holds 2 sub devices x 4 entities = 8 entities
        relevant_device_manager = self.device_manager_list[entity_num // 8]
        relevant_entity = entity_num % 8
        return relevant_device_manager, relevant_entity

class device_manager(abs_container_class):
    @container_class_init_decorator
    def __init__(self):
        self.sub_device_manager_list = [] # Initialize sub device list
        self.sub_device_manager_list.append(sub_device_manager())
        self.sub_device_manager_list.append(sub_device_manager())
        self.container_list = self.sub_device_manager_list
    def _convert_entity(self, entity_num):
        # Each sub device holds 4 entities
        relevant_sub_device_manager = self.sub_device_manager_list[entity_num // 4]
        relevant_entity = entity_num % 4
        return relevant_sub_device_manager, relevant_entity

class sub_device_manager():
    def __init__(self):
        self.entity_list = [0] * 4
    def set_entity_to_value(self, entity_num, required_value):
        self.entity_list[entity_num] = required_value
        print("I set the entity to : {}".format(required_value))

# This is used for auto completion purposes (Using pep convention)
class auto_complete_class(system_manager, device_manager, sub_device_manager):
    pass

system_instance = system_manager() # type: auto_complete_class
system_instance.set_entity_to_value(0, 3)
There is still a small issue with this solution: auto-completion will not work, since the highest-level class has almost no statically implemented functions.
To solve this I cheated a bit: I created an empty class which inherits from all layers and told the IDE, using the PEP type-comment convention, that it is the type of the instance being created (# type: auto_complete_class).

Does this solve your problem?
class EndDevice:
    def __init__(self, entities_num):
        self.entities = list(range(entities_num))
    @property
    def count_entities(self):
        return len(self.entities)
    def get_entity(self, i):
        return str(i)

class Device:
    def __init__(self, sub_devices):
        self.sub_devices = sub_devices
    @property
    def count_entities(self):
        return sum(sd.count_entities for sd in self.sub_devices)
    def get_entity(self, i):
        c = 0
        for index, sd in enumerate(self.sub_devices):
            if c <= i < sd.count_entities + c:
                return str(index) + " " + sd.get_entity(i - c)
            c += sd.count_entities
        raise IndexError(i)

SystemManager = Device # They are exactly the same class, so levels can be nested to any depth
sub_devices1 = [EndDevice(4) for _ in range(2)]
sub_devices2 = [EndDevice(4) for _ in range(2)]
system_manager = SystemManager([Device(sub_devices1), Device(sub_devices2)])
print(system_manager.get_entity(0))
print(system_manager.get_entity(5))
print(system_manager.get_entity(15))

I can't think of a better way to do this than OOP, but inheritance will only give you one set of low-level functions for the system manager, so it will be like having one device manager and one sub-device manager. A better approach, a bit like tkinter widgets, is to have one system manager and initialise all the other managers as children in a tree, so:
system = SystemManager()
device1 = DeviceManager(system)
subDevice1 = SubDeviceManager(device1)
device2 = DeviceManager(system)
subDevice2 = SubDeviceManager(device2)
#to execute some_func on subDevice1
system.some_func(0, 0, *someParams)
We can do this by keeping a list of 'children' of the higher-level managers and having functions which reference the children.
class SystemManager:
    def __init__(self):
        self.children = []
    def some_func(self, child, *params):
        self.children[child].some_func(*params)

class DeviceManager:
    def __init__(self, parent):
        parent.children.append(self)
        self.children = []
    def some_func(self, child, *params):
        self.children[child].some_func(*params)

class SubDeviceManager:
    def __init__(self, parent):
        parent.children.append(self)
        #this may or may not have sub-objects; if it does, we need to give it its own children list.
    def some_func(self, *params):
        #do some important stuff
        pass
Unfortunately, this does mean that if we want to call a function of a sub-device manager from the system manager without lots of dots, we will have to define it again in the system manager. What you can do instead is use the built-in exec() function, which takes a string and runs it using the Python interpreter:
class SystemManager:
    ...
    def execute(self, child, function, *args):
        exec("self.children[child]."+function+"(*args)")
(and keep the device manager the same)
You would then write in the main program:
system.execute(0, "some_func", 0, *someArgs)
Which would call
device1.some_func(0, *someArgs)
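As a side note, the same string-based dispatch can be done with the built-in getattr() instead of exec(), which avoids evaluating arbitrary text as code. This is a swapped-in alternative rather than the approach described above, and the return values and the calls list below are added purely for illustration.

```python
class SystemManager:
    def __init__(self):
        self.children = []

    def execute(self, child, function, *args):
        # Look the method up by name on the chosen child, then call it
        return getattr(self.children[child], function)(*args)

class DeviceManager:
    def __init__(self, parent):
        parent.children.append(self)
        self.children = []

    def some_func(self, child, *params):
        return self.children[child].some_func(*params)

class SubDeviceManager:
    def __init__(self, parent):
        parent.children.append(self)
        self.calls = []  # hypothetical: records calls so the effect is visible

    def some_func(self, *params):
        self.calls.append(params)
        return params

system = SystemManager()
device1 = DeviceManager(system)
subDevice1 = SubDeviceManager(device1)
result = system.execute(0, "some_func", 0, "a", "b")
```

If the named method does not exist, getattr() raises AttributeError, which is usually easier to diagnose than an error raised from inside an exec() string.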

Here's what I'm thinking:
# A lambda cannot contain an assignment statement, so setattr() is used here
SystemManager().apply_to_entity(7, lambda e: setattr(e, "value", 2))

class EntitySuperManagerMixin():
    """Mixin to handle logic for managing entity managers."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs) # Supports any kind of __init__ call.
        self._entity_manager_list = []
    def apply_to_entity(self, entity_num, action):
        relevant_entity_manager = self._entity_manager_list[entity_num // 4]
        relevant_entity_num = entity_num % 4
        return relevant_entity_manager.apply_to_entity(
            relevant_entity_num, action)

class SystemManager(EntitySuperManagerMixin):
    def __init__(self):
        super().__init__()
        # An alias for _entity_manager_list to improve readability.
        self.device_manager_list = self._entity_manager_list
        self.device_manager_list.extend(DeviceManager() for _ in range(4))

class DeviceManager(EntitySuperManagerMixin):
    def __init__(self):
        super().__init__()
        # An alias for _entity_manager_list to improve readability.
        self.sub_device_manager_list = self._entity_manager_list
        self.sub_device_manager_list.extend(SubDeviceManager() for _ in range(4))

class SubDeviceManager():
    """Manages entities, not entity managers, thus doesn't inherit the mixin."""
    def __init__(self):
        # Entities need to be classes for this idea to work.
        self._entity_list = [Entity() for _ in range(4)]
    def apply_to_entity(self, entity_num, action):
        return action(self._entity_list[entity_num])

class Entity():
    def __init__(self, initial_value=0):
        self.value = initial_value
With this structure:
Entity-specific functions can stay bound to the Entity class (where they belong).
Manager-specific code needs to be updated in two places: EntitySuperManagerMixin and the lowest-level manager (which would need custom behavior anyway, since it deals with the actual entities, not other managers).

The way I see it, if you want to dynamically configure different parts of the system you need some sort of addressing, so that when you pass an ID or address together with some parameter, the system knows which subsystem you are talking about and configures that subsystem with the parameter.
OOP is quite OK for that, and you can then easily manipulate such data via bitwise operators.
Basic addressing is done in binary, so to do that in Python you first need to give your class an address attribute, perhaps with some further detailing if the system grows.
A basic implementation of such an address system is as follows:
bin(171)
1010 1011
and if we divide it into nibbles
1010 - device manager 10
1011 - sub device manager 11
So in this example we have a system of 15 device managers and 15 sub device managers, and every device manager and sub device manager has its own integer address. So let's say you want to access device manager no. 10 with sub device manager no. 11. You would need their address, which is 1010 1011 in binary (171 in decimal), and you would go with:
system.config(address, parameter )
Where the system.config function would look like this:
def config(self,address, parameter):
    device_manager = (address&0xF0)>>4 #10
    sub_device_manager = address&0xf # 11
    if device_manager not in range(len(self.devices)): raise LookupError("device manager not found")
    if sub_device_manager not in range(len(self.devices[device_manager].device)): raise LookupError("sub device manager not found")
    self.devices[device_manager].device[sub_device_manager].implement(parameter)
In layman's terms, you would tell the system that sub device 11 of device 10 needs to be configured with this parameter.
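To make the nibble arithmetic concrete, here is a small standalone sketch. The helper names split_address and join_address are hypothetical and exist only for this illustration; they assume an 8-bit address with the device manager in the high nibble and the sub device manager in the low nibble.

```python
def split_address(address):
    """Split an 8-bit address into (device_manager, sub_device_manager)."""
    device_manager = (address & 0xF0) >> 4   # high nibble
    sub_device_manager = address & 0x0F      # low nibble
    return device_manager, sub_device_manager

def join_address(device_manager, sub_device_manager):
    """Pack two 4-bit indices back into one 8-bit address."""
    return ((device_manager & 0x0F) << 4) | (sub_device_manager & 0x0F)

device, sub_device = split_address(0b10101011)
address = join_address(10, 11)
```

Here split_address(0b10101011) gives device manager 10 and sub device manager 11, and join_address(10, 11) gives back 171.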
So here is how this setup could look in Python, as a base class of the system that could then be composited into or inherited by different classes:
class systems(object):
    parent = None #global parent element, defaults to None for simplicity
    def __init__(self):
        self.addrMASK = 0xf # address mask for that nibble
        self.addr = 0x1 # default address of that element
        self.devices = [] # list of instances of device
        self.data = { #some arbitrary data
            "param1":"param_val",
            "param2":"param_val",
            "param3":"param_val",
        }
    def addSubSystem(self,sub_system): # connects elements to each other
        # checks for validity
        if not isinstance(sub_system,systems):
            raise TypeError("defined input is not a system type") # to prevent passing an integer or something
        # appends a device to system data
        self.devices.append(sub_system)
        # search parent variables from sub device manager to system
        obj = self
        while 1:
            if obj.parent is not None:
                obj.parent.addrMASK<<=4 #bitshifts 4 bits
                obj.parent.addr <<=4 #bitshifts 4 bits
                obj = obj.parent
            else:break
        #self management, I am a lazy guy so I added this part so I wouldn't have to reset addresses manually
        self.addrMASK <<=4 #bitshifts 4 bits
        self.addr <<=4 #bitshifts 4 bits
        # this element is added so the obj address corresponds to its place in the list; this could be done more elegantly but I didn't know what your limitations are
        if not self.devices:
            self.devices[ len(self.devices)-1 ].addr +=1
        self.devices[ len(self.devices)-1 ].parent = self
    # helpful for checking data ... gives the address of system
    def __repr__(self):
        return "system at {0:X}, {1:0X}".format(self.addr,self.addrMASK)
    # extra helpful, lists data as well
    def __str__(self):
        data = [ '{} : {}\n'.format(k,v) for k,v in self.data.items() ]
        return " ".join([ repr(self),'\n',*data ])
    #checking for data, skips looping over sub systems
    def __contains__(self,system_index):
        return system_index-1 in range(len(self.data))
    # applying parameter change -- just an example
    def apply(self,par_dict):
        if not isinstance(par_dict,dict):
            raise TypeError("parameter must be a dict type")
        if any( key in self.data.keys() for key in par_dict.keys() ):
            for k,v in par_dict.items():
                if k in self.data.keys():
                    self.data[k]=v
                else:pass
        else:pass
    # implementing parameters through addresses
    def implement(self,address,parameter_dictionary):
        if address&self.addrMASK==self.addr:
            if address-self.addr!=0:
                item = (address-self.addr)>>4
                self.devices[item-1].implement( address-self.addr,parameter_dictionary )
            else:
                self.apply(parameter_dictionary)

a = systems()
b = systems()
a.addSubSystem(b)
c = systems()
b.addSubSystem(c)
print('a')
print(a)
print('')
print('b')
print(b)
print('')
print('c')
print(c)
print('')
a.implement(0x100,{"param1":"a"})
a.implement(0x110,{"param1":"b"})
a.implement(0x111,{"param1":"c"})
print('a')
print(a)
print('')
print('b')
print(b)
print('')
print('c')
print(c)
print('')

extracting the root of a tree in python

Hey everybody,
I've been trying to find a built-in function for extracting the root of a tree in Python.
I haven't found anything like that, and I've been trying to build one of my own, but I couldn't come up with something generic enough to fit all my needs.
Does anyone have something prepared, or know how to extract this information from a tree structure in Python?
Thanks

You have to roll your own:
class Node(object):
    def __init__(self, p=None):
        self.parent = p
        self.children = []

n1 = Node()
n2 = Node()
n1.children.append(n2)
n2.parent = n1
Of course you would want to have methods like addChild that would manage the .children and .parent attributes of the involved objects automatically.
Then you could write a function
def findRoot(node):
    p = node
    while p.parent is not None:
        p = p.parent
    return p
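The addChild idea mentioned above could look roughly like this. It is a sketch under the assumption that every node keeps a parent reference; the method names add_child and find_root are illustrative choices, not an established API.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        """Attach child to this node, keeping both links consistent."""
        child.parent = self
        self.children.append(child)

    def find_root(self):
        """Follow parent links upwards until a node without a parent is reached."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

root = Node()
middle = Node(root)
leaf = Node(middle)
```

With this setup, leaf.find_root() walks from leaf through middle up to root, and calling find_root() on root returns root itself.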
