Thread in Python : class attribute (list) not thread-safe?


I am trying to understand Threads in Python.
The code
And now I have a problem, which I have reduced to one simple class:
# -*- coding: utf-8 -*-
import threading

class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
    __id = None

    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__id = id

    def run(self):
        while 1:
            self.dummy1()
            self.dummy2()

    def dummy1(self):
        if self.__id == 2:
            self.__propertyThatShouldNotBeShared.append("Test value")

    def dummy2(self):
        for data in self.__propertyThatShouldNotBeShared:
            print self.__id
            print data
            self.__propertyThatShouldNotBeShared.remove(data)

obj1 = myClassWithThread(1)
obj2 = myClassWithThread(2)
obj3 = myClassWithThread(3)
obj1.start()
obj2.start()
obj3.start()
Description
Here is what the class does :
The class has two attributes :
__id which is an identifier for the object, given when the constructor is called
__propertyThatShouldNotBeShared is a list and will contain a text value
Now the methods
run() contains an infinite loop in which I call dummy1() and then dummy2()
dummy1() adds the value "Test value" to the attribute (list) __propertyThatShouldNotBeShared, but only if the __id of the object is equal to 2
dummy2() checks whether the size of the list __propertyThatShouldNotBeShared is strictly greater than 0; then, for each value in __propertyThatShouldNotBeShared, it prints the id of the object and the value, and then removes the value from the list
Here is the output that I get when I launch the program :
21
Test valueTest value
2
Test value
Exception in thread Thread-2:
Traceback (most recent call last):
  File "E:\PROG\myFace\python\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 15, in run
    self.dummy2()
  File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 27, in dummy2
    self.__propertyThatShouldNotBeShared.remove(data)
ValueError: list.remove(x): x not in list
The problem
As you can see in the first line of the output I get this "1"...which means that, at some point, the object with the id "1" tries to print something on the screen...and actually it does!
But this should be impossible!
Only object with id "2" should be able to print anything!
What is the problem in this code ? Or what is the problem with my logic?

The problem is this:
class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
This defines a single list on the class, which all instances share. Create the list per instance instead:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__propertyThatShouldNotBeShared = []
        # the other code goes here

There are two problems here—the one you asked about, thread-safety, and the one you didn't, the difference between class and instance attributes.
It's the latter that's causing your actual problem. A class attribute is shared by all instances of the class. It has nothing to do with whether those instances are accessed on a single thread or on multiple threads; there's only one __propertyThatShouldNotBeShared that's shared by everyone. If you want an instance attribute, you have to define it on the instance, not on the class. Like this:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
Once you do that, each instance has its own copy of __propertyThatShouldNotBeShared, and each lives on its own thread, so there is no thread-safety issue possible.
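To see the class-versus-instance difference in isolation, here is a minimal sketch with no threads at all. It uses Python 3 syntax, and the class names Shared and PerInstance are made up for illustration; they are not from the question.

```python
class Shared:
    items = []                      # class attribute: one list for every instance

    def add(self, value):
        self.items.append(value)    # the lookup falls through to the class's list


class PerInstance:
    def __init__(self):
        self.items = []             # instance attribute: a fresh list per object

    def add(self, value):
        self.items.append(value)


a, b = Shared(), Shared()
a.add("Test value")
shared_same_list = a.items is b.items       # both names reach the same list

c, d = PerInstance(), PerInstance()
c.add("Test value")
separate_lists = c.items is not d.items     # each object owns its own list

print(b.items)   # ['Test value'], even though only `a` appended
print(d.items)   # []
```

The first pair behaves like the question's class: whatever one object appends, every other object sees.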
However, your original code does have a thread-safety problem.
Almost nothing is automatically thread-safe (aka "synchronized"); exceptions (like queue.Queue) will say so explicitly, and be meant specifically for threaded programming.
You can avoid thread-safety problems in three ways:
Don't share anything.
Don't mutate anything you share.
Don't mutate anything you share unless it's protected by an appropriate synchronization object.
The last one is of course the most flexible, but also the most complicated. In fact, it's at the center of why people consider threaded programming hard.
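As a rough sketch of handing data between threads through an object that is documented as thread-safe, here is queue.Queue in use (Python 3 syntax). The worker function, the doubling "work" and the None sentinel convention are assumptions chosen for illustration only.

```python
import queue
import threading

work = queue.Queue()       # thread-safe by design; no explicit lock needed
results = queue.Queue()


def worker():
    while True:
        item = work.get()
        if item is None:           # sentinel: no more work for this worker
            break
        results.put(item * 2)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

for n in range(10):
    work.put(n)
for _ in threads:
    work.put(None)                 # one sentinel per worker

for t in threads:
    t.join()

# All workers have finished, so exactly ten results are waiting.
doubled = sorted(results.get() for _ in range(10))
print(doubled)
```

Nothing here is mutated by two threads at once except the queues themselves, which do their own locking internally.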
The short version is, everywhere you modify or access shared mutable data like self.__propertyThatShouldNotBeShared, you need to be holding some kind of synchronization object, like a Lock. For example:
class myClassWithThread(threading.Thread):
    __lock = threading.Lock()
    # etc.
    def dummy1(self):
        if self.__id == 2:
            with self.__lock:
                self.__propertyThatShouldNotBeShared.append("Test value")
If you stick to CPython, and to built-in types, you can often get away with ignoring locks. But "often" in threaded programming is just a synonym for "always during testing and debugging, right up until the release or big presentation, when it suddenly begins failing". Unless you want to learn the rules for how the Global Interpreter Lock and the built-in types work in CPython, don't rely on this.
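For completeness, here is a sketch of what locking both the append and the removal could look like if the list really were meant to be shared. It is written for Python 3, and the names Worker, shared_items, shared_lock, rounds and seen are hypothetical. The loop is bounded so that the program terminates.

```python
import threading


class Worker(threading.Thread):
    shared_lock = threading.Lock()   # one lock, shared by all instances
    shared_items = []                # deliberately shared class-level list

    def __init__(self, worker_id, rounds):
        super().__init__()
        self.worker_id = worker_id
        self.rounds = rounds
        self.seen = 0                # how many items this thread removed

    def run(self):
        for _ in range(self.rounds):
            with self.shared_lock:
                self.shared_items.append(self.worker_id)
            with self.shared_lock:
                # Drain while holding the lock, so no other thread can
                # remove an item between the emptiness check and the pop.
                while self.shared_items:
                    self.shared_items.pop()
                    self.seen += 1


workers = [Worker(i, 1000) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

total_seen = sum(w.seen for w in workers)
print(total_seen)            # 3000: every appended item was removed exactly once
print(Worker.shared_items)   # []
```

Any thread may end up removing an item that another thread appended, which is exactly the sharing the question did not want, but nothing is lost and nothing raises.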

Class variables in Python are just that: shared by all instances of the class. You need an instance variable, which you usually define inside __init__. Remove the class-level declarations (and the double leading underscores, they're for name mangling which you don't need here.)

Related

Do mutable class attributes require a lock when reading or updating?

I'm using a couple of class attributes to keep track of aggregate task completion across multiple instances of class. When reading or updating the class attributes do I need to use a lock of some sort?
class ClassAttrExample:
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        # do I need a lock context here???
        self.of_type_list.append(self.name)
        self.of_type_int += 1

If no threads are involved, no locks are required just because class instances share data. As long as the operations are performed in the same thread, everything is safe.
If threads are involved, you'll want locks.
For the specific case of CPython (the reference interpreter), as an implementation detail, the .append call does not require a lock. The GIL can only be switched out between bytecodes (or when a bytecode calls into C code that explicitly releases it, which list never does), and list.append is effectively atomic as a result (all the work it does occurs within a single CALL_METHOD bytecode which never calls back into Python level code, so the GIL is definitely held the whole time).
By contrast, += involves reading the input operand, then performing the increment, then reassigning the input, and the GIL can be swapped between those operations, leading to missed increments when two threads read the value before either writes back to it.
So if multithreaded access is possible, for the int case, the lock is required. And given you need the lock anyway, you may as well lock around the append call too, ensuring the code remains portable to GIL-free Python interpreters.
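The missed increment is easier to see when the interleaving is written out by hand. The following is only a single-threaded replay of one unlucky schedule, not real threading code, and the variable names are made up.

```python
counter = 0

# Both "threads" read the value before either one writes it back.
read_by_a = counter          # thread A loads 0
read_by_b = counter          # thread B loads 0 (A has not stored yet)
counter = read_by_a + 1      # thread A stores 1
counter = read_by_b + 1      # thread B also stores 1, overwriting A's update

print(counter)   # 1, although two increments were performed
```

Holding a lock around the whole read-modify-write makes this interleaving impossible.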
A fully portable thread-safe version of your class would look something like:
import threading

class ClassAttrExample:
    _lock = threading.Lock()
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        with self._lock:
            # Can't use bare name to refer to class attribute, must access
            # through class or instance thereof
            self.of_type_list.append(self.name)  # Load-only access to of_type_list
                                                 # can use self directly
            type(self).of_type_int += 1  # Must use type(self) to avoid creating
                                         # instance attribute that shadows class
                                         # attribute on store
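A small harness along these lines can be used to exercise the locked version from several threads. The class body is repeated so the snippet runs on its own; run_many, the instance names and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading


class ClassAttrExample:
    _lock = threading.Lock()
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        with self._lock:
            self.of_type_list.append(self.name)
            type(self).of_type_int += 1


def run_many(instance, times):
    # Hypothetical helper: call do_task repeatedly from one thread.
    for _ in range(times):
        instance.do_task()


instances = [ClassAttrExample("task-%d" % i) for i in range(4)]
threads = [threading.Thread(target=run_many, args=(inst, 500))
           for inst in instances]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ClassAttrExample.of_type_int)         # 2000
print(len(ClassAttrExample.of_type_list))   # 2000
```

With the lock in place both totals always match the number of calls; a passing run of the unlocked version would not prove that it is safe.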

Is it a good practice to keep reference in a class variable to the current instance of it?

I have a class that will always have only 1 object at the time. I'm just starting OOP in python and I was wondering what is a better approach: to assign an instance of this class to the variable and operate on that variable or rather have this instance referenced in the class variable instead. Here is an example of what I mean:
Referenced instance:
class Transaction(object):
    current_transaction = None
    in_progress = False

    def __init__(self):
        self.__class__.current_transaction = self
        self.__class__.in_progress = True
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

Transaction()
if Transaction.in_progress:
    Transaction.current_transaction.update()
    print Transaction.current_transaction.name
    print Transaction.current_transaction.value
Instance in a variable:
class Transaction(object):
    def __init__(self):
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

current_transaction = Transaction()
in_progress = True
if in_progress:
    current_transaction.update()
    print current_transaction.name
    print current_transaction.value

It's possible to see that you've encapsulated too much in the first case just by comparing the overall readability of the code: the second is much cleaner.
A better way to implement the first option is to use class methods: decorate all your methods with @classmethod and then call them with Transaction.method().
There's no practical difference in code quality between these two options. However, assuming that the class is final, that is, without derived classes, I would go for a third choice: use the module as a singleton and kill the class. This would be the most compact and most readable choice. You don't need classes to create singletons.

I think the first version doesn't make much sense, and the second version of your code would be better in almost all situations. It can sometimes be useful to write a Singleton class (where only one instance ever exists) by overriding __new__ to always return the saved instance (after it's been created the first time). But usually you don't need that unless you're wrapping some external resource that really only ever makes sense to exist once.
If your other code needs to share a single instance, there are other ways to do so (e.g. a global variable in some module or a constructor argument for each other object that needs a reference).
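For reference, a bare-bones version of the __new__-based singleton mentioned above might look like this (Python 3 syntax). The _instance attribute is an assumed name, and the state is set up inside __new__ because an __init__ method would run again on every call.

```python
class Transaction:
    _instance = None     # assumed name for the saved instance

    def __new__(cls):
        if cls._instance is None:
            # First call: create the object and set up its state once.
            cls._instance = super().__new__(cls)
            cls._instance.name = "abc"
            cls._instance.value = 50
        return cls._instance


first = Transaction()
second = Transaction()
same_object = first is second

first.value = 75
print(same_object)     # True
print(second.value)    # 75: both names refer to the one instance
```

This sketch is not thread-safe by itself; two threads calling Transaction() at the same moment could both see _instance as None, so the check would need a lock if that can happen.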
Note that if your instances have a very well defined life cycle, with specific events that should happen when they're created and destroyed, and unknown code running and using the object in between, the context manager protocol may be something you should look at, as it lets you use your instances in with statements:
with Transaction() as trans:
    trans.whatever()  # the Transaction will be notified if anything raises
    other_stuff()     # an exception that is not caught within the with block
    trans.foo()       # (so it can do a rollback if it wants to)
foo()  # the Transaction will be cleaned up (e.g. committed) when the indented with block ends
Implementing the context manager protocol requires an __enter__ and __exit__ method.
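A minimal sketch of those two methods, assuming a Transaction that only tracks a hypothetical state string instead of talking to a real database (Python 3 syntax):

```python
class Transaction:
    def __init__(self):
        self.state = "new"             # hypothetical bookkeeping attribute

    def __enter__(self):
        self.state = "in progress"
        return self                    # this becomes the `as` target

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.state = "committed"   # the block finished normally
        else:
            self.state = "rolled back" # the block raised
        return False                   # do not swallow the exception


with Transaction() as ok:
    pass

try:
    with Transaction() as failed:
        raise ValueError("boom")
except ValueError:
    pass

print(ok.state)       # committed
print(failed.state)   # rolled back
```

Returning False (or None) from __exit__ lets the exception keep propagating after the cleanup has run.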

Display an item from a object that has been put in a self variable

Basically I have a Process class that takes three arguments: process (the name), time and io.
I put that process into a ready queue; then, in another class, I put the first object from the ready queue into self.__current. Usually, when I put the object into a variable, e.g. current = process('hi', 6, False), I can say current._process.__time and it returns >>> 6.
Can anyone explain how I can do this with self.__current?
class Process:
    def __init__(self,process,time,io = False):
        self.__process = process
        self.__time = time
        self.__io = io

class cpu(Queue):
    def __init__(self):
        self.timequantum = 1
        self.__current = []

The issue you have is because you're using attribute names that start (and don't end) with two underscores. When you do that in a method, it invokes Python's "name mangling" feature, which transforms a name like __process into _Process__process. That makes it awkward to access from other code.
The purpose of the name mangling system is to allow mixin classes to assign attributes on objects that may have any number of other attributes. Since the mangling adds the name of the class where the attribute is being used (not the name of the class of the object), the mixin author can be fairly confident that no other class will accidentally use the same attribute name.
You probably don't want name mangling, so you should not use __ at the start of your attribute names. If you want the attributes to be "private", use a single underscore, which indicates that the attribute is not part of the public API of the class. This is a convention, so for debugging or testing you can still access the _ prefixed attributes if you need to.
Or if you want other code to use the attributes, just use bare names. In Python, it's perfectly acceptable to have public attributes as part of an object's API. (Other programming languages discourage that, but in Python you can reimplement an attribute-API with getter and setter functions using a property later if you need to.)
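Here is a short sketch of how the three naming styles behave (Python 3 syntax). It reuses the Process class from the question, but deliberately mixes the styles in one class, which is not something the question does.

```python
class Process:
    def __init__(self, process, time, io=False):
        self.__process = process   # mangled: stored as _Process__process
        self._time = time          # single underscore: private by convention
        self.io = io               # bare name: public


current = Process("hi", 6)

mangled = current._Process__process               # works, but awkward
by_convention = current._time                     # works, signals "internal"
has_plain_name = hasattr(current, "__process")    # no attribute by that name

print(sorted(vars(current)))   # ['_Process__process', '_time', 'io']
```

From another class such as cpu, self.__current._time or self.__current.io reads naturally, while the mangled attribute has to be spelled out with the _Process prefix.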

I don't understand this python del behaviour

Can someone explain why the following code behaves the way it does:
import types

class Dummy():
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print "delete",self.name

d1 = Dummy("d1")
del d1
d1 = None
print "after d1"

d2 = Dummy("d2")
def func(self):
    print "func called"
d2.func = types.MethodType(func, d2)
d2.func()
del d2
d2 = None
print "after d2"

d3 = Dummy("d3")
def func(self):
    print "func called"
d3.func = types.MethodType(func, d3)
d3.func()
d3.func = None
del d3
d3 = None
print "after d3"
The output (note that the destructor for d2 is never called) is this (python 2.7)
delete d1
after d1
func called
after d2
func called
delete d3
after d3
Is there a way to "fix" the code so the destructor is called without deleting the method added? I mean, the best place to put the d2.func = None would be in the destructor!
Thanks
[edit] Based on the first few answers, I'd like to clarify that I'm not asking about the merits (or lack thereof) of using __del__. I tried to create the shortest function that would demonstrate what I consider to be non-intuitive behavior. I'm assuming a circular reference has been created, but I'm not sure why. If possible, I'd like to know how to avoid the circular reference....

You cannot assume that __del__ will ever be called - it is not a place to hope that resources are automagically deallocated. If you want to make sure that a (non-memory) resource is released, you should make a release() or similar method and then call that explicitly (or use it in a context manager as pointed out by Thanatos in comments below).
At the very least you should read the __del__ documentation very closely, and then you should probably not try to use __del__. (Also refer to the gc.garbage documentation for other bad things about __del__)

I'm providing my own answer because, while I appreciate the advice to avoid __del__, my question was how to get it to work properly for the code sample provided.
Short version: The following code uses weakref to avoid the circular reference. I thought I'd tried this before posting the question, but I guess I must have done something wrong.
import types, weakref

class Dummy():
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print "delete",self.name

d2 = Dummy("d2")
def func(self):
    # Note: self is the weakref here; call self() to get the Dummy instance
    print "func called"
d2.func = types.MethodType(func, weakref.ref(d2)) #This works
#d2.func = func.__get__(weakref.ref(d2), Dummy) #This works too
d2.func()
del d2
d2 = None
print "after d2"
Longer version:
When I posted the question, I did search for similar questions. I know you can use with instead, and that the prevailing sentiment is that __del__ is BAD.
Using with makes sense, but only in certain situations. Opening a file, reading it, and closing it is a good example where with is a perfectly good solution. You've got a specific block of code where the object is needed, and you want to clean up the object at the end of the block.
A database connection seems to be used often as an example that doesn't work well using with, since you usually need to leave the section of code that creates the connection and have the connection closed in a more event-driven (rather than sequential) timeframe.
If with is not the right solution, I see two alternatives:
You make sure __del__ works (see this blog for a better description of weakref usage)
You use the atexit module to run a callback when your program closes. See this topic for example.
While I tried to provide simplified code, my real problem is more event-driven, so with is not an appropriate solution (with is fine for the simplified code). I also wanted to avoid atexit, as my program can be long-running, and I want to be able to perform the cleanup as soon as possible.
So, in this specific case, I find it to be the best solution to use weakref and prevent circular references that would prevent __del__ from working.
This may be an exception to the rule, but there are use-cases where using weakref and __del__ is the right implementation, IMHO.

Instead of del, you can use the with statement.
http://effbot.org/zone/python-with-statement.htm
Just like with file objects, you could do something like
with Dummy('d1') as d:
    pass  # stuff
# d's __exit__ method is guaranteed to have been called

del doesn't call __del__
del in the way you are using removes a local variable. __del__ is called when the object is destroyed. Python as a language makes no guarantees as to when it will destroy an object.
CPython, the most common implementation of Python, uses reference counting. As a result, del will often work as you expect. However, it will not work when you have a reference cycle:
d2 -> d2.func -> d2
Python doesn't detect this and so won't clean it up right away. And it's not just reference cycles. If an exception is thrown, you probably still want your destructor to be called. However, Python will typically hold on to the local variables as part of its traceback.
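The cycle can be made visible with a few lines of code. This sketch is written for Python 3, where (since 3.4) the cycle collector is allowed to finalize objects that define __del__; under Python 2.7, as used in the question, such objects are instead left in gc.garbage and __del__ is not called. The deleted list is just a stand-in for the print in the question.

```python
import gc
import types

deleted = []          # records which objects were finalized


class Dummy:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        deleted.append(self.name)


def func(self):
    return "func called"


d2 = Dummy("d2")
d2.func = types.MethodType(func, d2)      # bound method stored on the instance
cycle_exists = d2.func.__self__ is d2     # the method refers back to d2

del d2            # the cycle keeps the reference count above zero
gc.collect()      # the cycle collector finds and finalizes the object

print(cycle_exists)   # True
print(deleted)        # ['d2'] on Python 3.4 and later
```

So the object is reclaimed eventually, but only when the collector runs, not at the del statement.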
The solution is not to depend on the __del__ method. Rather, use a context manager.
class Dummy:
    def __enter__(self):
        return self
    def __exit__(self, type, value, traceback):
        print "Destroying", self

with Dummy() as dummy:
    # Do whatever you want with dummy in here
    pass
# __exit__ will be called before you get here
This is guaranteed to work, and you can even check the parameters to see whether you are handling an exception and do something different in that case.

A full example of a context manager.
class Dummy(object):
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, exct_type, exce_value, traceback):
        print 'cleanup:', self
    def __repr__(self):
        return 'Dummy(%r)' % (self.name,)

with Dummy("foo") as d:
    print 'using:', d
print 'later:', d

It seems to me the real heart of the matter is here:
adding the functions is dynamic (at runtime) and not known in advance
I sense that what you are really after is a flexible way to bind different functionality to an object representing program state, also known as polymorphism. Python does that quite well, not by attaching/detaching methods, but by instantiating different classes. I suggest you look again at your class organization. Perhaps you need to separate a core, persistent data object from transient state objects. Use the has-a paradigm rather than is-a: each time state changes, you either wrap the core data in a state object, or you assign the new state object to an attribute of the core.
If you're sure you can't use that kind of pythonic OOP, you could still work around your problem another way by defining all your functions in the class to begin with and subsequently binding them to additional instance attributes (unless you're compiling these functions on the fly from user input):
class LongRunning(object):
    def bark_loudly(self):
        print("WOOF WOOF")
    def bark_softly(self):
        print("woof woof")

while True:
    d = LongRunning()
    d.bark = d.bark_loudly
    d.bark()
    d.bark = d.bark_softly
    d.bark()

An alternative solution to using weakref is to dynamically bind the function to the instance only when it is called by overriding __getattr__ or __getattribute__ on the class to return func.__get__(self, type(self)) instead of just func for functions bound to the instance. This is how functions defined on the class behave. Unfortunately (for some use cases) python doesn't perform the same logic for functions attached to the instance itself, but you can modify it to do this. I've had similar problems with descriptors bound to instances. Performance here probably isn't as good as using weakref, but it is an option that will work transparently for any dynamically assigned function with the use of only python builtins.
If you find yourself doing this often, you might want a custom metaclass that does dynamic binding of instance-level functions.
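One possible shape for the __getattribute__ approach is sketched below (Python 3 syntax). The class name DynamicBinder is made up, and the rule "bind any plain function found in the instance dictionary" is an assumption about what you would want; a real implementation may need to be more selective.

```python
import types


class DynamicBinder:
    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        instance_dict = object.__getattribute__(self, "__dict__")
        if name in instance_dict and isinstance(value, types.FunctionType):
            # Bind on the fly, the way functions found on the class are bound.
            return value.__get__(self, type(self))
        return value


def func(self):
    return "func called on %s" % type(self).__name__


obj = DynamicBinder()
obj.func = func                    # a plain function is stored, not a bound method
result = obj.func()                # bound to obj only for the duration of the call
stored_plain = obj.__dict__["func"] is func

print(result)         # func called on DynamicBinder
print(stored_plain)   # True
```

Because the instance only holds the unbound function, there is no reference from the stored attribute back to the instance, and therefore no cycle.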
Another alternative is to add the function directly to the class, which will then properly perform the binding when it's called. For a lot of use cases, this would have some headaches involved: namely, properly namespacing the functions so they don't collide. The instance id could be used for this, though, since the id in cPython isn't guaranteed unique over the life of the program, you'd need to ponder this a bit to make sure it works for your use case... in particular, you probably need to make sure you delete the class function when an object goes out of scope, and thus its id/memory address is available again. __del__ is perfect for this :). Alternatively, you could clear out all methods namespaced to the instance on object creation (in __init__ or __new__).
Another alternative (rather than messing with python magic methods) is to explicitly add a method for calling your dynamically bound functions. This has the downside that your users can't call your function using normal python syntax:
class MyClass(object):
    def dynamic_func(self, func_name):
        return getattr(self, func_name).__get__(self, type(self))

    def call_dynamic_func(self, func_name, *args, **kwargs):
        return getattr(self, func_name).__get__(self, type(self))(*args, **kwargs)

    """
    Alternate without using descriptor functionality:
    def call_dynamic_func(self, func_name, *args, **kwargs):
        return getattr(self, func_name)(self, *args, **kwargs)
    """
Just to make this post complete, I'll show your weakref option as well:
import weakref

inst = MyClass()
def func(self):
    print 'My func'
# You could also use the types modules, but the descriptor method is cleaner IMO
inst.func = func.__get__(weakref.ref(inst), type(inst))

use eval()
In [1]: int('25.0')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-67d52e3d0c17> in <module>
----> 1 int('25.0')
ValueError: invalid literal for int() with base 10: '25.0'
In [2]: int(float('25.0'))
Out[2]: 25
In [3]: eval('25.0')
Out[3]: 25.0

Why do new instances of a class share members with other instances?

class Ball:
    a = []
    def __init__(self):
        pass
    def add(self,thing):
        self.a.append(thing)
    def size(self):
        print len(self.a)

for i in range(3):
    foo = Ball()
    foo.add(1)
    foo.add(2)
    foo.size()
I would expect a return of :
2
2
2
But I get :
2
4
6
Why is this? I've found that by doing a=[] in the init, I can route around this behavior, but I'm less than clear why.

doh
I just figured out why.
In the above case, the a is a class attribute, not a data attribute - those are shared by all Balls(). Commenting out the a=[] and placing it into the init block means that it's a data attribute instead. (And, I couldn't access it then with foo.a, which I shouldn't do anyhow.) It seems like the class attributes act like static attributes of the class, they're shared by all instances.
Whoa.
One question though : CodeCompletion sucks like this. In the foo class, I can't do self.(variable), because it's not being defined automatically - it's being defined by a function. Can I define a class variable and replace it with a data variable?

What you probably want to do is:
class Ball:
    def __init__(self):
        self.a = []
If you use just a = [], it creates a local variable in the __init__ function, which disappears when the function returns. Assigning to self.a makes it an instance variable which is what you're after.
For a semi-related gotcha, see how you can change the value of default parameters for future callers.
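That default-parameter gotcha looks roughly like this (Python 3 syntax; the function names add_item and add_item_safe are invented for the example). The default list is created once, when the def statement runs, so it plays the same role as the shared class attribute above.

```python
def add_item(thing, bucket=[]):      # the default list is created once, at def time
    bucket.append(thing)
    return bucket


first = add_item(1)
second = add_item(2)                 # reuses the very same default list


def add_item_safe(thing, bucket=None):
    if bucket is None:
        bucket = []                  # a fresh list on every call
    bucket.append(thing)
    return bucket


print(second)                               # [1, 2]
print(first is second)                      # True
print(add_item_safe(1), add_item_safe(2))   # [1] [2]
```

Using None as the default and creating the list inside the function is the usual way around it.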

"Can I define a class variable and replace it with a data variable?"
No. They're separate things. A class variable exists precisely once -- in the class.
You could -- to finesse code completion -- start with some class variables and then delete those lines of code after you've written your class. But every time you forget to do that nothing good will happen.
Better is to try a different IDE. Komodo Edit's code completions seem to be sensible.
If you have so many variables with such long names that code completion is actually helpful, perhaps you should make your classes smaller or use shorter names. Seriously.
I find that when you get to a place where code completion is more helpful than annoying, you've exceeded the "keep it all in my brain" complexity threshold. If the class won't fit in my brain, it's too complex.
