I am new to Python. I am writing a simulation in SimPy to model a production line, which looks like: Machine 1 -> Buffer 1 -> Machine 2 -> Buffer 2 -> and so on..
My question:
I have a class, Machine, of which there are several instances. Suppose that the current instance is Machine 2. The methods of this instance affect the states of Machines 1 and 3. For example, if Buffer 2 is empty, Machine 3 is idle, but when Machine 2 puts a part in Buffer 2, Machine 3 should be activated.
So, what is the way to refer to different instances of the same class from any given instance of that class?
Also, slightly different question: What is the way to call an object (Buffers 1 and 2, in this case) from the current instance of another class?
Edit: Edited to add more clarity about the system.
It is not common for instances of a class to know about other instances of the class.
I would recommend you keep some sort of collection of instances in your class itself, and use the class to look up the instances:
class Machine(object):
    lst = []

    def __init__(self, name):
        self.name = name
        self.id = len(Machine.lst)
        Machine.lst.append(self)

m0 = Machine("zero")
m1 = Machine("one")
print(Machine.lst[1].name)  # prints "one"
This is a silly example that I cooked up where you put some data into the first machine which then moves it to the first buffer which then moves it to the second machine ...
Each machine just tags the data with its ID number and passes it along, but you could make the machines do anything. You could even register a function to be called at each machine when it gets data.
class Machine(object):
    def __init__(self, number, next=None):
        self.number = number
        self.register_next(next)

    def register_next(self, next):
        self.next = next

    def do_work(self, data):
        # do some work here
        newdata = '%s %d' % (str(data), self.number)
        if self.next is not None:
            self.next.do_work(newdata)

class Buffer(Machine):
    def __init__(self, number, next=None):
        Machine.__init__(self, number, next=next)
        self.data = None  # holds the item when nothing follows this buffer

    def do_work(self, data):
        if self.next is not None:
            self.next.do_work(data)
        else:
            self.data = data

# Now, create an assembly line
assembly = [Machine(0)]
for i in range(1, 20):
    machine = not i % 2  # even positions are machines, odd positions are buffers
    assembly.append(Machine(i) if machine else Buffer(i))
    assembly[-2].register_next(assembly[-1])

assembly[0].do_work('foo')
print(assembly[-1].data)
EDIT
Buffers are now Machines too.
Now that you added more info about the problem, I'll suggest an alternate solution.
After you have created your machines, you might want to link them together.
class Machine(object):
    def __init__(self):
        self.handoff = None

    def input(self, item):
        item = do_something(item)  # machine processes item
        self.handoff(item)  # machine hands off item to next machine

m0 = Machine()
m1 = Machine()
m0.handoff = m1.input

m2 = Machine()
m1.handoff = m2.input

def output(item):
    print(item)

m2.handoff = output
Now when you call m0.input(item) it will do its processing, then hand off the item to m1, which will do the same and hand off to m2, which will do its processing and call output(). This example shows synchronous processing (an item will go all the way through the chain before the function calls return) but you could also have the .input() method put the item on a queue for processing and then return immediately; in this way, you could make the machines process in parallel.
With this system, the connections between the machines are explicit, and each machine only knows about the one that follows it (the one it needs to know about).
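The queued variant mentioned above could look roughly like the following. This is only a sketch: QueuedMachine, its work argument, and close() are names made up for illustration, and it uses one worker thread per machine with queue.Queue as the inbox, which is just one of several ways to do it.

```python
import queue
import threading

class QueuedMachine:
    """Hypothetical variant of Machine whose input() only enqueues the item."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, name, work):
        self.name = name
        self.work = work        # function applied to every item
        self.handoff = None     # callable that receives the processed item
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def input(self, item):
        # Returns immediately; the worker thread does the processing.
        self._inbox.put(item)

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is self._STOP:
                break
            self.handoff(self.work(item))

    def close(self):
        # Process everything already queued, then stop the worker.
        self._inbox.put(self._STOP)
        self._thread.join()


results = []
m0 = QueuedMachine("m0", lambda x: x + 1)
m1 = QueuedMachine("m1", lambda x: x * 10)
m0.handoff = m1.input
m1.handoff = results.append

for part in (1, 2, 3):
    m0.input(part)

# Shut down in pipeline order so each stage drains before the next one stops.
m0.close()
m1.close()
print(results)  # [20, 30, 40]
```

Each machine still only knows about its handoff target; the only change is that input() no longer blocks while the item is being processed.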
I use the word "threading" to describe the process of linking together objects like this. An item being processed follows the thread from machine to machine before arriving at the output. It's slightly ambiguous because it has nothing to do with threads of execution, so the term isn't perfect.
Related
I am developing a Python framework for a scientific application that I recently wanted to speed up a little using mpi4py. The framework works with, say, 3-4 class instances that do stuff with each other, and each one stores a reference to the other instances in a self.variable.
What troubles me is that the broadcasting function bcast changes the address of the instances each time. To make this work, I have to reassign the variable name (the new reference), of each instance, back to all the other instances. That makes me think that I am doing things wrong.
I reproduce this with the following piece of code where, each time the bcast function is called, the instance 'obj' changes address.
from mpi4py import MPI

commMPI = MPI.COMM_WORLD
rankMPI = commMPI.Get_rank()
sizeMPI = commMPI.Get_size()

class Dummy():
    def __init__(self):
        pass

if rankMPI == 0:
    obj = Dummy()
else:
    obj = None

obj = commMPI.bcast(obj, root=0)
obj1 = obj
obj = commMPI.bcast(obj, root=0)
print(rankMPI, obj is obj1)
I have just started using mpi4py so I only know the very basics. Do I have to implement an intrinsic class method to update the instances while keeping their primary address? What am I doing wrong? Thanks in advance.
Update
I found that I can retain the address of the root (rank 0) if I write this like
tmp = commMPI.bcast(obj, root=0)
if rankMPI != 0:
    obj = tmp
Is that how I should do it, or is there a clearer way to write that?
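For context, the lowercase mpi4py methods such as bcast transfer generic Python objects by pickling them, so a receiver always ends up with a rebuilt object rather than the one it already held. The sketch below imitates that with a plain pickle round trip (no MPI involved) and shows one possible way to keep an existing object's identity by copying state into it. The update_from helper and the holder dict are made up for illustration, and the approach assumes the object's state lives in its __dict__.

```python
import pickle

class Dummy:
    def __init__(self, value):
        self.value = value

    def update_from(self, other):
        # Hypothetical helper: copy the other object's state into this one.
        self.__dict__.update(other.__dict__)

local = Dummy(1)
holder = {"ref": local}   # some other object that keeps a reference to local

# Stand-in for the result of a bcast: a new object rebuilt from pickled bytes.
incoming = pickle.loads(pickle.dumps(Dummy(2)))
is_same = incoming is local   # False, the round trip always builds a new object

# Copying the state in place keeps every existing reference valid.
local.update_from(incoming)
print(holder["ref"].value)    # 2
```

With this pattern the other instances never need to be handed a new reference, because the object they already point to is the one that gets updated.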
If class B is a child of A, would the following code cause a circular dependency?
(For simplicity, I don't distinguish between A, B and A(), B().)
def random_function(B):
    # get parent
    par = B.A
    do_stuff(par, B)

class A:
    def method(self):
        # get B
        b = A.B
        random_function(b)
        # random_function will reference A now, but not via self, is this a problem?
I know, that is easily solved with
def random_function(A, B):
    do_stuff(A, B)

class A:
    def method(self):
        # get B
        b = A.B
        random_function(self, b)
... but I wonder whether I can still keep the signature with only one argument.
If this is too abstract, here is why I am asking: in Blender, there is a socket class and a node class, with sockets being children of a node. I have a subclass of node and a function like the following:
def func(socket):
    # get parent
    node = socket.node
    do_stuff(node, socket)

class sub_class(Node):
    def method(self):
        func(self.Socket)
Is there a way to keep the function signature with only one argument and not run into a circular dependency here?
EDIT: an edit to supply some actual code and context, following suggestions in comments
I'll start with the description of the classes that, as I believe, are causing the circular reference.
Class ShaderNodeCustomGroup represents a node, a thing from visual programming. It has sockets and it listens to what is happening on those sockets. The user connects inputs to input sockets and can get output from output sockets. It is basically a visual representation of a function for non-programming users. Unreal has this as well, but mine is from Blender.
Class Socket, already mentioned above - this is what can take input and what gives you output.
I do not know exactly how they interact (that's defined in the source code, and I am just using the API), but somehow I crash the entire application if I do the following.
Function causing the crash:
def get_socket_neighbor(socket, offset):
    """
    get the neighbor of this socket, whose position is specified by the offset
    RETURNS: a socket or OUT_OF_RANGE
    NOTE: offset is not for the Blender socket order, it's actual movement by 'offset' socket up or down in world coordinates
    """
    sock_ind = get_socket_index(socket, socket.node)
    desired_position = sock_ind - offset  # <--- the actual position in 'blender coordinates'
    if desired_position > (len(socket.node.inputs)-1) or desired_position < 0:
        return OUT_OF_RANGE
    else:
        return socket.node.inputs[desired_position]
For a socket in a node, this function finds its neighbor: for example, a socket one position up or one position down (see the illustration for nodes and sockets; sockets are the small circles). It takes a socket, gets the node to which it belongs, deduces the position of the socket (sock_ind = get_socket_index(socket, socket.node)), and returns a neighbor socket.
Now, I construct a subclass of ShaderNodeCustomGroup (in the following SUBCLASS). This subclass can spawn its UI and have sockets and do all the stuff that a node can do. It just has several additional features that I can define. From this class's method, the above function is called. SUBCLASS gives one of its sockets as input to this function. The function is then getting the socket's node (this is simply SUBCLASS itself). This crashes the application.
class LayerStackNode(bpy.types.ShaderNodeCustomGroup):
    bl_label = "Layer Stack"
    .
    .
    .
    def insert_link(self, link):
        # runs on the insert of a link,
        # this is a preferred method when
        # it gets repaired by blender devs

        # check if the user is connecting mask
        if link.to_socket.name == "mask_layer":
            # simply draw a link from this socket to its mixer
            my_index = mf.get_socket_index(link.to_socket, self)
            relevant_layer = mf.get_socket_neighbor(link.to_socket, 1)
            relevant_mixer = mf.get_node_by_type(self.node_tree.nodes, 'GROUP_INPUT').outputs[my_index - 1].links[0].to_node
            self.node_tree.links.new(relevant_mixer.inputs[0], mf.get_node_by_type(self.node_tree.nodes, 'GROUP_INPUT').outputs[my_index])
            return

        def is_insert_input(self, link):
            # determines whether the inserted link is input
            if link.to_socket.name == 'layer':
                return True
            else:
                return False

        if not is_insert_input(self, link):
            return

        # if trying to insert on top of an empty layer, cancel and return
        if mf.get_socket_neighbor(link.to_socket, -1) != mf.OUT_OF_RANGE and \
                mf.get_socket_neighbor(link.to_socket, -1).is_NULL_layer:
            # TODO: find a way to undo the operation
            return

        if self.is_NULL_layer(link.to_socket):  # proceed only if it's empty
            if not mf.get_socket_index(link.to_socket, self) == (len(self.inputs)-1):  # <--- private case for the lowest layer in the stack
                if not self.is_NULL_layer(mf.get_socket_neighbor(link.to_socket, -1)):
                    self.initialize_layer(link.to_socket)
                else:
                    print("CANNOT MIX ONTO NULL LAYER")
            else:
                # when base layer, create a single link from input to output
                sock_index = mf.get_socket_index(link.to_socket, self)
                out = mf.get_node_by_type(self.node_tree.nodes, 'GROUP_INPUT').outputs[sock_index]
                inp = mf.get_node_by_type(self.node_tree.nodes, 'GROUP_OUTPUT').inputs[0]
                self.node_tree.links.new(inp, out)
        else:
            print("no action needed, the layer is not NULL")
    .
    .
    .
Relevant for you is mf.get_socket_neighbor(link.to_socket, -1).is_NULL_layer.
Here the class of the node gives one of its sockets to the function, which then tries to find the node of this socket, which is the calling class itself.
To sum up, SUBCLASS's method gives SUBCLASS's socket to a function, which then tries to access the socket's node, which is SUBCLASS itself. If I rebuild the code so that the function simply takes self and does not have to access socket.node, everything works.
So, my question is: is this a circular reference? If yes, are there ways to overcome it? I care because it makes using classes less sensible: for example, I cannot define a property of socket which accesses its node, because if I use this property from inside a node method, it will crash the application as well.
There is no parent/child relationship here, either explicitly shown or implied. B.A and A.B do not suggest parent/child relationships. They just say that B contains an attribute named 'A' and A contains an attribute named 'B'. There's nothing wrong with this at all (naming conventions aside). The two names have nothing to do with each other, nor do they have anything to do with a parent/child relationship. In the case of B.A, the 'A' has nothing to do with the class named 'A'.
Regarding:
# random_function will reference A now, but not via self, is this a problem?
No, random_function will not reference A. random_function will reference whatever the A class's B attribute contains.
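To illustrate that a back-reference by itself is ordinary Python, here is a minimal sketch in which each socket stores its owning node and a one-argument function reaches the node through it. Node and Socket here are plain stand-in classes written for this example, not Blender's types, so the sketch says nothing about why the Blender API crashes in the situation described in the question.

```python
class Socket:
    def __init__(self, name, node):
        self.name = name
        self.node = node   # back-reference to the owning node

class Node:
    def __init__(self, name, socket_names):
        self.name = name
        self.inputs = [Socket(n, self) for n in socket_names]

def get_socket_neighbor(socket, offset):
    """Return the socket `offset` positions away, or None if out of range."""
    inputs = socket.node.inputs
    position = inputs.index(socket) + offset
    if 0 <= position < len(inputs):
        return inputs[position]
    return None

node = Node("stack", ["layer", "mask_layer", "base"])
neighbor = get_socket_neighbor(node.inputs[0], 1)
print(neighbor.name)  # mask_layer
```

The node refers to its sockets and each socket refers back to its node; the function keeps a single-argument signature and nothing special is needed to make that work in pure Python.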
Suppose I have a python class with a large overhead
class some_class:
    def __init__(self):
        self.overhead = large_overhead

    # Get new data
    def read_new_data(self, data):
        self.new_data = data

    def do_something(self):
        # DO SOMETHING.
        pass
Suppose I want to have it listen to the output of another program, or multiple programs, and I have a way to maintain this steady stream of inputs. Given the overhead, how do I avoid creating a new instance every time? Do I create a new script and package the class to keep it alive? And if so, how do I capture the output of the programs if they cannot communicate directly with the script I'm running, without going through intermediate storage such as SQL or a file?
You can use a class variable:
class some_class:
    overhead = large_overhead

    # Get new data
    def read_new_data(self, data):
        self.new_data = data

    def do_something(self):
        # DO SOMETHING.
        pass
Now overhead is only evaluated once, when the class is defined, and you can use self.overhead within any instance of the class.
Lacking specifics... Use asyncio to set up listeners/watchers and register your object's methods as callbacks for when the data comes in, then run the whole thing in an event loop.
While that was easy to say and pretty abstract, I'm sure I would have a pretty steep learning curve to implement that, especially considering I'd want to implement some testing infrastructure. But it seems pretty straightforward.
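A very rough sketch of that asyncio idea follows. The fake_source coroutine is a made-up stand-in for a real listener (a socket, a pipe, another program's output), and SomeClass is a simplified version of the class from the question, so treat this as an outline of the structure rather than a working integration.

```python
import asyncio

class SomeClass:
    def __init__(self):
        self.overhead = "expensive setup"   # placeholder for the large overhead
        self.seen = []

    def read_new_data(self, data):
        self.seen.append(data)

async def fake_source(name, count, queue):
    # Stand-in for a real listener; it just emits a few labelled messages.
    for i in range(count):
        await asyncio.sleep(0)
        await queue.put(f"{name}-{i}")

async def main():
    worker = SomeClass()          # created once, reused for every message
    queue = asyncio.Queue()
    sources = [asyncio.create_task(fake_source(n, 2, queue)) for n in ("a", "b")]

    async def consume():
        while True:
            worker.read_new_data(await queue.get())
            queue.task_done()

    consumer = asyncio.create_task(consume())
    await asyncio.gather(*sources)   # all sources have finished producing
    await queue.join()               # every queued item has been handled
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return worker

worker = asyncio.run(main())
print(sorted(worker.seen))  # ['a-0', 'a-1', 'b-0', 'b-1']
```

The point is that the expensive object is constructed once and lives for as long as the event loop runs, while any number of sources feed it.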
I am trying to understand Threads in Python.
The code
And now I have a problem, which I have surrounded in one simple class:
# -*- coding: utf-8 -*-
import threading

class myClassWithThread(threading.Thread):

    __propertyThatShouldNotBeShared = []
    __id = None

    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__id = id

    def run(self):
        while 1:
            self.dummy1()
            self.dummy2()

    def dummy1(self):
        if self.__id == 2:
            self.__propertyThatShouldNotBeShared.append("Test value")

    def dummy2(self):
        for data in self.__propertyThatShouldNotBeShared:
            print self.__id
            print data
            self.__propertyThatShouldNotBeShared.remove(data)

obj1 = myClassWithThread(1)
obj2 = myClassWithThread(2)
obj3 = myClassWithThread(3)

obj1.start()
obj2.start()
obj3.start()
Description
Here is what the class does:
The class has two attributes:
* __id, which is an identifier for the object, given when the constructor is called
* __propertyThatShouldNotBeShared, which is a list and will contain a text value
Now the methods:
* run() contains an infinite loop in which I call dummy1() and then dummy2()
* dummy1() adds the value "Test value" to the attribute (list) __propertyThatShouldNotBeShared only if the __id of the object is equal to 2
* dummy2() checks whether the size of the list __propertyThatShouldNotBeShared is strictly greater than 0; then, for each value in __propertyThatShouldNotBeShared, it prints the id of the object and the value, and then removes the value
Here is the output that I get when I launch the program :
21
Test valueTest value
2
Test value
Exception in thread Thread-2:
Traceback (most recent call last):
File "E:\PROG\myFace\python\lib\threading.py", line 808, in __bootstrap_inner
self.run()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 15, in run
self.dummy2()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 27, in dummy2
self.__propertyThatShouldNotBeShared.remove(data)
ValueError: list.remove(x): x not in list
The problem
As you can see in the first line of the output I get this "1"...which means that, at some point, the object with the id "1" tries to print something on the screen...and actually it does!
But this should be impossible!
Only object with id "2" should be able to print anything!
What is the problem in this code ? Or what is the problem with my logic?
The problem is this:
class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
It defines one list for all objects which is shared. You should do this:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
        # the other code goes here
There are two problems here: the one you asked about, thread-safety, and the one you didn't, the difference between class and instance attributes.
It's the latter that's causing your actual problem. A class attribute is shared by all instances of the class. It has nothing to do with whether those instances are accessed on a single thread or on multiple threads; there's only one __propertyThatShouldNotBeShared that's shared by everyone. If you want an instance attribute, you have to define it on the instance, not on the class. Like this:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__id = id
        self.__propertyThatShouldNotBeShared = []
Once you do that, each instance has its own copy of __propertyThatShouldNotBeShared, and each lives on its own thread, so there is no thread-safety issue possible.
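The difference is easy to see without any threads at all. In this small sketch, SharedList and OwnList are made-up class names used only to contrast the two kinds of attribute:

```python
class SharedList:
    items = []                 # class attribute: one list for every instance

class OwnList:
    def __init__(self):
        self.items = []        # instance attribute: a new list per instance

a, b = SharedList(), SharedList()
a.items.append("Test value")
print(b.items)                 # ['Test value']

c, d = OwnList(), OwnList()
c.items.append("Test value")
print(d.items)                 # []
```

In the first case b sees the value that was appended through a, which is exactly what happens between the objects with id 1 and id 2 in the question.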
However, your original code does have a thread-safety problem.
Almost nothing is automatically thread-safe (aka "synchronized"); exceptions (like queue.Queue) will say so explicitly, and be meant specifically for threaded programming.
You can avoid this in three ways:
Don't share anything.
Don't mutate anything you share.
Don't mutate anything you share unless it's protected by an appropriate synchronization object.
The last one is of course the most flexible, but also the most complicated. In fact, it's at the center of why people consider threaded programming hard.
The short version is, everywhere you modify or access shared mutable data like self.__propertyThatShouldNotBeShared, you need to be holding some kind of synchronization object, like a Lock. For example:
class myClassWithThread(threading.Thread):
    __lock = threading.Lock()
    # etc.
    def dummy1(self):
        if self.__id == 2:
            with self.__lock:
                self.__propertyThatShouldNotBeShared.append("Test value")
If you stick to CPython, and to built-in types, you can often get away with ignoring locks. But "often" in threaded programming is just a synonym for "always during testing and debugging, right up until the release or big presentation, when it suddenly begins failing". Unless you want to learn the rules for how the Global Interpreter Lock and the built-in types work in CPython, don't rely on this.
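As a self-contained illustration of the third option (mutating shared data only while holding a lock), here is a minimal sketch. The Counter class and worker function are invented for this example and are not part of the code in the question:

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is only safe while the lock is held.
        with self._lock:
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```

Every access to the shared value goes through the same lock, so the total is the same on every run regardless of how the threads are scheduled.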
Class variables in Python are just that: shared by all instances of the class. You need an instance variable, which you usually define inside __init__. Remove the class-level declarations (and the double leading underscores, they're for name mangling which you don't need here.)
I am having trouble figuring out how to make a synchronized Python object. I have a class called Observation and a class called Variable that basically look like this (the code is simplified to show the essence):
class Observation:
    def __init__(self, date, time_unit, id, meta):
        self.date = date
        self.time_unit = time_unit
        self.id = id
        self.count = 0
        self.data = 0

    def add(self, value):
        if isinstance(value, list):
            if self.count == 0:
                self.data = []
            self.data.append(value)
        else:
            self.data += value
        self.count += 1


class Variable:
    def __init__(self, name, time_unit, lock):
        self.name = name
        self.lock = lock
        self.obs = {}
        self.time_unit = time_unit

    def get_observation(self, id, date, meta):
        self.lock.acquire()
        try:
            obs = self.obs.get(id, Observation(date, self.time_unit, id, meta))
            self.obs[id] = obs
        finally:
            self.lock.release()
        return obs

    def add(self, date, value, meta={}):
        self.lock.acquire()
        try:
            obs = self.get_observation(id, date, meta)
            obs.add(value)
            self.obs[id] = obs
        finally:
            self.lock.release()
This is how I setup the multiprocessing part:
plugin = ...  # function defined somewhere else
tasks = JoinableQueue()
result = JoinableQueue()
mgr = Manager()
lock = mgr.RLock()
var = Variable('foobar', 'year', lock)

for person in persons:
    tasks.put(Task(plugin, var, person))
Example of how the code is supposed to work:
I have an instance of Variable called var and I want to add an observation to var:
today = datetime.datetime.today()
var.add(today, 1)
So, the add function of Variable checks whether an observation already exists for that date; if it does, it returns that observation, otherwise it creates a new instance of Observation. Once an observation has been found, the actual value is added by the call obs.add(value). My main concern is that I want to make sure that different processes are not creating multiple instances of Observation for the same date; that's why I lock it.
One instance of Variable is created and is shared between different processes using the multiprocessing library and is the container for numerous instances of Observation. The above code does not work, I get the error:
RuntimeError: Lock objects should only be shared between processes through inheritance
However, if I instantiate a Lock object before launching the different processes and supply it to the constructor of Variable then it seems that I get a race condition as all processes seem to be waiting for each other.
The ultimate goal is that different processes can update the obs variable in the object Variable. I need this to be threadsafe because I am not just modifying the dictionary in place but adding new elements and incrementing existing variables. The obs variable is a dictionary that contains a bunch of instances of Observation.
How can I make this synchronized where I share one single instance of Variable between numerous multiprocessing processes? Thanks so much for your cognitive surplus!
UPDATE 1:
* I am using multiprocessing Locks and I have changed the source code to show this.
* I have changed the title to more accurately capture the problem
* I have replaced threadsafe with synchronization where I was confusing the two terms.
Thanks to Dmitry Dvoinikov for pointing this out!
One question that I am still not sure about is where do I instantiate Lock? Should this happen inside the class or before initializing the multiprocesses and give it as an argument? ANSWER: Should happen outside the class.
UPDATE 2:
* I fixed the 'Lock objects should only be shared between processes through inheritance' error by moving the initialization of the Lock outside the class definition and using a manager.
* Final question: now everything works, except that when I put my Variable instance in the queue it does not seem to get updated, and every time I get it from the queue it does not contain the observation I added in the previous iteration. This is the only thing that is confusing me :(
UPDATE 3:
The final solution was to set the var.obs dictionary to an instance of mgr.dict() and then to use a custom serializer. Happy to share the code with anybody who is struggling with this as well.
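The behaviour described in UPDATE 2 follows from how multiprocessing queues work: an object put on a queue is pickled, and whoever gets it receives a rebuilt copy. The sketch below imitates that with a plain pickle round trip instead of real processes; the Variable class is cut down to the two attributes that matter here and the key "2011" is arbitrary.

```python
import pickle

class Variable:
    def __init__(self, name):
        self.name = name
        self.obs = {}

var = Variable("foobar")

# Stand-in for tasks.put(...) followed by tasks.get() in a worker process:
# the object is pickled on one side and rebuilt on the other.
worker_copy = pickle.loads(pickle.dumps(var))
worker_copy.obs["2011"] = 1

print(var.obs)          # {}
print(worker_copy.obs)  # {'2011': 1}
```

Changes made to the copy never reach the original, which is why a proxy such as the mgr.dict() mentioned in UPDATE 3 is needed when several processes must see the same dictionary.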
You are talking not about thread safety but about synchronization between separate processes, and that's an entirely different thing. Anyway, to start:
different processes can update the obs variable in the object Variable.
implies that Variable is in shared memory, and you have to explicitly store objects there; a local instance does not magically become visible to a separate process. Here:
Data can be stored in a shared memory map using Value or Array
Then, your code snippet is missing the crucial import section, so there is no way to tell whether you instantiate the right multiprocessing.Lock rather than threading.Lock. Your code also doesn't show how you create processes and pass data around.
Therefore, I'd suggest that you make sure you understand the difference between threads and processes, consider whether you truly need a shared memory model for an application that contains multiple processes, and examine the spec.