Do mutable class attributes require a lock when reading or updating?

I'm using a couple of class attributes to keep track of aggregate task completion across multiple instances of class. When reading or updating the class attributes do I need to use a lock of some sort?
class ClassAttrExample:
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        # do I need a lock context here???
        self.of_type_list.append(self.name)
        self.of_type_int += 1

If no threads are involved, no locks are required just because class instances share data. As long as all operations are performed in the same thread, everything is safe.
If threads are involved, you'll want locks.
For the specific case of CPython (the reference interpreter), as an implementation detail, the .append call does not require a lock. The GIL can only be switched out between bytecodes (or when a bytecode calls into C code that explicitly releases it, which list never does), and list.append is effectively atomic as a result (all the work it does occurs within a single CALL_METHOD bytecode which never calls back into Python level code, so the GIL is definitely held the whole time).
By contrast, += involves reading the input operand, then performing the increment, then reassigning the input, and the GIL can be swapped between those operations, leading to missed increments when two threads read the value before either writes back to it.
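You can see the window for yourself by disassembling an augmented assignment. A minimal sketch (the exact bytecode names vary across CPython versions):

import dis

counter = 0

def increment():
    global counter
    # Compiles to separate load, add and store instructions; with threads,
    # the GIL can be handed off between any two of them.
    counter += 1

dis.dis(increment)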
So if multithreaded access is possible, for the int case, the lock is required. And given you need the lock anyway, you may as well lock around the append call too, ensuring the code remains portable to GIL-free Python interpreters.
A fully portable thread-safe version of your class would look something like:
import threading

class ClassAttrExample:
    _lock = threading.Lock()
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        with self._lock:
            # Can't use a bare name to refer to a class attribute; must access
            # it through the class or an instance thereof
            self.of_type_list.append(self.name)  # load-only access to of_type_list
                                                 # can use self directly
            type(self).of_type_int += 1          # must use type(self) to avoid creating
                                                 # an instance attribute that shadows the
                                                 # class attribute on store
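For example, driving the locked version from several threads now gives consistent totals (a small usage sketch, assuming the class above):

import threading

instances = [ClassAttrExample('worker-%d' % i) for i in range(4)]
threads = [threading.Thread(target=inst.do_task) for inst in instances]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ClassAttrExample.of_type_int)   # 4
print(ClassAttrExample.of_type_list)  # all four names, in some order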

Related

Undoing a decade of singleton pattern and class-level configuration

Overview
I need to duplicate a whole inheritance tree of classes. Simply deep-copying the class objects does not work; a proper factory pattern involves a huge amount of code changes; I'm not sure how to use metaclasses to accomplish this.
Background
The software I work on implements support for specialized external hardware, connected to the host computer via USB. Many years ago, it was assumed that there would only ever be one type of hardware in use at a time. Consequently, the hardware object is used as a singleton. Over the years, secondary classes were configured based on the currently active hardware class.
At the moment, it is impossible to use this library with two types of hardware at the same time, since the class objects cannot be configured for both kinds of hardware at once.
In recent years, we have avoided this issue by creating one Python process per hardware type, but this is becoming untenable.
Here is an extremely simplified example of the architecture:
# ----------
# Hardware classes
class HwBase():
    def customizeComponent(self, compDict):
        compDict['ComponentBase'].hardware = self

class HwA(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(1, 2, 3)

class HwB(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(4, 5, 6)

# ----------
# Property classes
class SpecialProperty(property):
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate
        # handle fset, fget, etc. here.
        # super().__init__()

# ----------
# Component classes
class ComponentBase():
    hardware = None
    def validateProp(self, val):
        return val < self.maxVal
    prop = SpecialProperty(fvalidate=validateProp)

class SomeComponent():
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    def validateThing(self, val):
        return isinstance(val, ComponentBase)
    thing = SpecialProperty(fvalidate=validateThing)

class AnotherComponent():
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    maxVal = 15

# ----------
# Initialization
def initialize():
    """This is only called once per Python instance."""
    # activeCls = HwA
    activeCls = HwB
    allComponents = {
        'ComponentBase': ComponentBase,
        'SomeComponent': SomeComponent,
        'AnotherComponent': AnotherComponent
    }
    hwInstance = activeCls()
    hwInstance.customizeComponent(allComponents)
    return allComponents

components = initialize()

# ----------
# User code goes here
someInstance1 = components['SomeComponent']()
someInstance2 = components['SomeComponent']()
someInstance1.prop = 10
someInstance2.prop = 10
The overarching goal would be to interact with both HwA and HwB at the same time. Since most interactions are done via components instead of the Hw objects themselves, I believe the solution involves having multiple versions of the components, e.g.: two separate inheritance trees, for a total of 6 final components, one tree/set configured for each hardware. This is what I need help with.
Potential solutions
Consider that I have around ten different hardware types to configure for. Furthermore, there are hundreds of different leaf component classes, plus many extra base and mixin classes.
Move all configuration steps into the component's __init__ method
Not possible due to the use of properties; these need to be set on the class.
Deep-copy the class objects
Copy all class objects and swap in the appropriate __bases__. Mutable class variables need to be handled carefully. However, I'm not sure how to deal with properties here, since class-body references held by the property objects (such as fvalidate) need to be updated to those of the copied class.
This requires a significant amount of manual intervention to work. Not impossible, but prone to breaking in the long term.
Factory pattern
Wrap all component definitions in a factory function:

def ComponentBaseFactory(hw):
    class SomeComponent(cache[hw].ComponentBase):
        pass

and have some sort of component cache which would handle creating all class objects during initialize().
This is what I consider the most architecturally correct option available. Since the class body is re-executed on every factory call, the attributes of the properties will reference the appropriate class object.
Downside: huge code footprint. I am familiar with doing codebase-wide changes via sed or python scripts, but this would be quite a lot.
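For illustration, a minimal sketch of what such a factory plus cache might look like; make_components and _component_cache are hypothetical names, and the class bodies are abbreviated versions of the example above:

# Hypothetical sketch: re-execute the class bodies once per hardware
# instance, so every property object and class attribute is created fresh.
_component_cache = {}

def make_components(hw):
    if hw in _component_cache:
        return _component_cache[hw]

    class ComponentBase():
        hardware = hw
        def validateProp(self, val):
            return val < self.maxVal
        prop = SpecialProperty(fvalidate=validateProp)

    class AnotherComponent(ComponentBase):
        maxVal = 15

    components = {
        'ComponentBase': ComponentBase,
        'AnotherComponent': AnotherComponent,
    }
    _component_cache[hw] = components
    return components

# one independent class tree per hardware
hwa_components = make_components(HwA())
hwb_components = make_components(HwB())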
Add metaclasses on components
I am not sure how to proceed for this. Based on the python data model (py3.7), the following happens at class creation (which happens right after the class definition indentation ends):
MRO entries are resolved;
the appropriate metaclass is determined;
the class namespace is prepared;
the class body is executed;
the class object is created.
I would need to redo these steps after the class has been defined (like a factory function!), but I'm not sure how to redo step 4. Specifically, the Python documentation states in section 3.3.3.5 that the class body is executed via (approximately) a special form of the exec() builtin. How can I re-exec the class body with a different set of locals/globals? Even if I access the class body's code with inspect shenanigans, I'm not sure I'll be able to reproduce the module environment properly.
Even if I mess with __prepare__ and __new__, I don't see how I can fix the cross-references introduced in the class code block regarding the property instantiation.
Components as metaclasses
A metaclass is a class factory, just like a class is an object factory. SomeComponent and AnotherComponent could be declared as metaclasses, then get instantiated with the Hw object during initialize():
SomeComponent = SomeComponentMeta(hw)
This is similar to the factory pattern, but would also require quite a few code changes: a lot of class code would have to be moved to the metaclass __init__.
I'd have to spend a lot more time here to properly understand what you need, but if your TL;DR of executing the class body with different global/nonlocal variables is the bottom line, the factory approach is a very clean and readable way to do it, as you had considered.
At first, I don't think a metaclass would be a good approach here, although it could be used to customize your special properties (on my first read, I could not figure out what they actually do and how they should differ between your final classes). If the function acting as a class factory can specialize your properties, it would work nonetheless.
If what you need is for the properties to be independent for HwA and HwB, as in accessing a different list object in HwA than is accessed in HwB, then yes, a metaclass could take care of that by automatically recreating any properties when creating a subclass (so that the property objects themselves are not shared with the super-classes and across the hierarchy).
If that is what you need, leave a comment and I can write some proof-of-concept code.
Anyway, it is possible to create a metaclass that, upon instantiating a subclass, looks through the hierarchy for all SpecialProperty instances and creates new instances of those for the subclass, so that a base value set on a superclass remains valid for the subclasses, while each class gets an independent configuration when configuration runs. (As it turns out, no metaclass is needed: we are covered by __init_subclass__.)
Another thing to take care of is that subclasses of property cannot simply be copied with Python's copy.copy (tested empirically), so we need a way to create reliable copies of those. I include one function below, but it might need to be improved to work with the actual SpecialProperty class.
from copy import copy

def copy_property(prop):
    cls = prop.__class__
    new_prop = cls.__new__(cls)
    # Initialize the attributes that can't be set from Python code, in place:
    property.__init__(new_prop, prop.fget, prop.fset, prop.fdel)
    if hasattr(prop, "__dict__"):  # only exists for subclasses of property
        # Possible adaptation needed: it may be that for some attributes of
        # SpecialProperty, a deepcopy would be needed.
        # But for the given example attribute of "fvalidate" a simple copy is better:
        new_prop.__dict__ = copy(prop.__dict__)
    return new_prop

# Python 3.6 introduced `__init_subclass__`, which is called at subclass _creation_
# time. With it, the logic can be inserted in ComponentBase and there is no need
# for a metaclass.

class ComponentBase():
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for attrname in dir(cls):
            attr = getattr(cls, attrname)
            if not isinstance(attr, SpecialProperty):
                continue
            new_prop = copy_property(attr)
            setattr(cls, attrname, new_prop)
    hardware = None
    ...
As you can see, some workarounds were needed because your project opted for subclassing property. I am leaving this remark here as a reminder: unless property fits one's needs exactly, it is cleaner to write a new class implementing the descriptor protocol, simply by implementing __set__, __get__ and __delete__ directly.
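For reference, a minimal sketch of such a stand-alone descriptor; the name ValidatedAttribute is made up for illustration:

class ValidatedAttribute:
    """Plain data descriptor with validation; no property subclassing needed."""
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate

    def __set_name__(self, owner, name):
        # Python 3.6+: records the attribute name at class-creation time
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not self.fvalidate(instance, value):
            raise ValueError("invalid value for %s: %r" % (self.name, value))
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]

Since this is an ordinary class, you fully control its attributes, and copying it (if you still want per-subclass copies) holds no surprises.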

Is it good practice to keep a reference to the current instance of a class in a class variable?

I have a class that will only ever have one instance at a time. I'm just starting OOP in Python and I was wondering what the better approach is: to assign an instance of this class to a variable and operate on that variable, or rather to keep the instance referenced in a class variable. Here is an example of what I mean:
Referenced instance:
class Transaction(object):
    current_transaction = None
    in_progress = False

    def __init__(self):
        self.__class__.current_transaction = self
        self.__class__.in_progress = True
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

Transaction()

if Transaction.in_progress:
    Transaction.current_transaction.update()

print Transaction.current_transaction.name
print Transaction.current_transaction.value
Instance in a variable:
class Transaction(object):
    def __init__(self):
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

current_transaction = Transaction()
in_progress = True

if in_progress:
    current_transaction.update()

print current_transaction.name
print current_transaction.value
It's possible to see that you've encapsulated too much in the first case just by comparing the overall readability of the code: the second is much cleaner.
A better way to implement the first option is to use class methods: decorate all your methods with @classmethod and then call them as Transaction.method().
There's no practical difference in code quality between these two options. However, assuming the class is final, that is, without derived classes, I would go for a third choice: use the module as a singleton and drop the class. That would be the most compact and most readable choice. You don't need classes to create singletons.
I think the first version doesn't make much sense, and the second version of your code would be better in almost all situations. It can sometimes be useful to write a Singleton class (where only one instance ever exists) by overriding __new__ to always return the saved instance (after it's been created the first time). But usually you don't need that unless you're wrapping some external resource that really only ever makes sense to exist once.
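A minimal sketch of that __new__-based approach; the class name SharedResource is made up for illustration:

class SharedResource(object):
    _instance = None

    def __new__(cls):
        # Create the single instance on first call; return the saved one after.
        # Note that __init__ still runs on every call.
        if cls._instance is None:
            cls._instance = super(SharedResource, cls).__new__(cls)
        return cls._instance

assert SharedResource() is SharedResource()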
If your other code needs to share a single instance, there are other ways to do so (e.g. a global variable in some module or a constructor argument for each other object that needs a reference).
Note that if your instances have a well-defined life cycle, with specific events that should happen when they're created and destroyed, and unknown code running and using the object in between, the context manager protocol may be something you should look at, as it lets you use your instances in with statements:
with Transaction() as trans:
    trans.whatever()   # the Transaction will be notified if anything raises
    other_stuff()      # an exception that is not caught within the with block
    trans.foo()        # (so it can do a rollback if it wants to)
foo()                  # the Transaction will be cleaned up (e.g. committed) when the with block ends
Implementing the context manager protocol requires an __enter__ and __exit__ method.
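A minimal sketch of what that could look like for this class; commit and rollback are hypothetical methods:

class Transaction(object):
    def commit(self):
        print('committed')

    def rollback(self):
        print('rolled back')

    def __enter__(self):
        # whatever is returned here is bound by "as" in the with statement
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.commit()
        else:
            self.rollback()
        return False  # don't suppress the exception; let it propagate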

Is a Python dictionary thread-safe when keys are thread IDs?

Is a Python dictionary thread-safe when each thread only reads and writes the key given by its own thread ID? Like this:
import thread
import threading

class Thread(threading.Thread):
    def __init__(self, data):
        super(Thread, self).__init__()
        self.data = data

    def run(self):
        data = self.data[thread.get_ident()]
        # ...
If data is a standard Python dictionary, the __getitem__ call is implemented entirely in C, as is the __hash__ method on the integer value returned by thread.get_ident(). At that point the data.__getitem__(<thread identifier>) call is thread safe. The same applies to writing to data; the data.__setitem__() call is entirely handled in C.
The moment any of these hooks are implemented in Python code, the GIL can be released between bytecodes and all bets are off.
This all assumes you are using CPython; Jython, IronPython, PyPy and other Python implementations may make different decisions about when to switch threads.
You'd be better off using the threading.local() mapping object instead, as that is guaranteed to provide you with a thread-local namespace. It only supports attribute access, though.
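For illustration, a quick sketch of threading.local() in use:

import threading

local_ns = threading.local()

def worker(value):
    # each thread gets its own independent 'data' attribute
    local_ns.data = value
    print('%s sees %r' % (threading.current_thread().name, local_ns.data))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()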

Thread in Python: class attribute (list) not thread-safe?

I am trying to understand Threads in Python.
The code
And now I have a problem, which I have boiled down to one simple class:
# -*- coding: utf-8 -*-
import threading

class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
    __id = None

    def __init__(self, id):
        threading.Thread.__init__(self)
        self.__id = id

    def run(self):
        while 1:
            self.dummy1()
            self.dummy2()

    def dummy1(self):
        if self.__id == 2:
            self.__propertyThatShouldNotBeShared.append("Test value")

    def dummy2(self):
        for data in self.__propertyThatShouldNotBeShared:
            print self.__id
            print data
            self.__propertyThatShouldNotBeShared.remove(data)

obj1 = myClassWithThread(1)
obj2 = myClassWithThread(2)
obj3 = myClassWithThread(3)
obj1.start()
obj2.start()
obj3.start()
Description
Here is what the class does:
The class has two attributes:
__id, an identifier for the object, given when the constructor is called
__propertyThatShouldNotBeShared, a list that will contain text values
Now the methods:
run() contains an infinite loop in which I call dummy1() and then dummy2()
dummy1() adds the value "Test value" to the list __propertyThatShouldNotBeShared, but only IF the __id of the object is equal to 2
dummy2() checks whether the list __propertyThatShouldNotBeShared is non-empty; for each value in it, it prints the id of the object and the value, then removes the value
Here is the output I get when I launch the program:
21
Test valueTest value
2
Test value
Exception in thread Thread-2:
Traceback (most recent call last):
File "E:\PROG\myFace\python\lib\threading.py", line 808, in __bootstrap_inner
self.run()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 15, in run
self.dummy2()
File "E:\PROG\myFace\myProject\ghos2\src\Tests\threadDeMerde.py", line 27, in dummy2
self.__propertyThatShouldNotBeShared.remove(data)
ValueError: list.remove(x): x not in list
The problem
As you can see in the first line of the output I get this "1"...which means that, at some point, the object with the id "1" tries to print something on the screen...and actually it does!
But this should be impossible!
Only object with id "2" should be able to print anything!
What is the problem in this code ? Or what is the problem with my logic?
The problem is this:
class myClassWithThread(threading.Thread):
    __propertyThatShouldNotBeShared = []
It defines one list shared by all objects. You should do this instead:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
        # the other code goes here
There are two problems here: the one you asked about, thread-safety, and the one you didn't, the difference between class and instance attributes.
It's the latter that's causing your actual problem. A class attribute is shared by all instances of the class. It has nothing to do with whether those instances are accessed on a single thread or on multiple threads; there's only one __propertyThatShouldNotBeShared that's shared by everyone. If you want an instance attribute, you have to define it on the instance, not on the class. Like this:
class myClassWithThread(threading.Thread):
    def __init__(self, id):
        self.__propertyThatShouldNotBeShared = []
Once you do that, each instance has its own copy of __propertyThatShouldNotBeShared, and each lives on its own thread, so there is no thread-safety issue possible.
However, your original code does have a thread-safety problem.
Almost nothing is automatically thread-safe (aka "synchronized"); the exceptions (like queue.Queue) say so explicitly and are designed specifically for threaded programming.
You can avoid this in three ways:
Don't share anything.
Don't mutate anything you share.
Don't mutate anything you share unless it's protected by an appropriate synchronization object.
The last one is of course the most flexible, but also the most complicated. In fact, it's at the center of why people consider threaded programming hard.
The short version is, everywhere you modify or access shared mutable data like self.__propertyThatShouldNotBeShared, you need to be holding some kind of synchronization object, like a Lock. For example:
class myClassWithThread(threading.Thread):
    __lock = threading.Lock()
    # etc.
    def dummy1(self):
        if self.__id == 2:
            with self.__lock:
                self.__propertyThatShouldNotBeShared.append("Test value")
If you stick to CPython, and to built-in types, you can often get away with ignoring locks. But "often" in threaded programming is just a synonym for "always during testing and debugging, right up until the release or big presentation, when it suddenly begins failing". Unless you want to learn the rules for how the Global Interpreter Lock and the built-in types work in CPython, don't rely on this.
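As an example of the "use a purpose-built synchronized type" route, a sketch with queue.Queue, whose put() and get() do their own locking:

import queue
import threading

q = queue.Queue()

def producer():
    for i in range(5):
        q.put(i)    # put/get are internally synchronized
    q.put(None)     # sentinel telling the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        print('got %d' % item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()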
Class variables in Python are just that: shared by all instances of the class. You need an instance variable, which you usually define inside __init__. Remove the class-level declarations (and the double leading underscores; they're for name mangling, which you don't need here).

How to create a synchronized object with Python multiprocessing?

I am having trouble figuring out how to make a synchronized Python object. I have a class called Observation and a class called Variable that basically look like this (the code is simplified to show the essence):
class Observation:
    def __init__(self, date, time_unit, id, meta):
        self.date = date
        self.time_unit = time_unit
        self.id = id
        self.count = 0
        self.data = 0

    def add(self, value):
        if isinstance(value, list):
            if self.count == 0:
                self.data = []
            self.data.append(value)
        else:
            self.data += value
        self.count += 1

class Variable:
    def __init__(self, name, time_unit, lock):
        self.name = name
        self.lock = lock
        self.obs = {}
        self.time_unit = time_unit

    def get_observation(self, id, date, meta):
        self.lock.acquire()
        try:
            obs = self.obs.get(id, Observation(date, self.time_unit, id, meta))
            self.obs[id] = obs
        finally:
            self.lock.release()
        return obs

    def add(self, date, value, meta={}):
        self.lock.acquire()
        try:
            obs = self.get_observation(id, date, meta)
            obs.add(value)
            self.obs[id] = obs
        finally:
            self.lock.release()
This is how I setup the multiprocessing part:
plugin = ...  # function defined somewhere else
tasks = JoinableQueue()
result = JoinableQueue()
mgr = Manager()
lock = mgr.RLock()
var = Variable('foobar', 'year', lock)

for person in persons:
    tasks.put(Task(plugin, var, person))
Example of how the code is supposed to work:
I have an instance of Variable called var and I want to add an observation to var:
today = datetime.datetime.today()
var.add(today, 1)
So, the add function of Variable checks whether an observation already exists for that date; if it does, it returns that observation, otherwise it creates a new instance of Observation. Having found an observation, the actual value is added by the call obs.add(value). My main concern is that I want to make sure different processes are not creating multiple instances of Observation for the same date; that's why I lock it.
One instance of Variable is created and is shared between different processes using the multiprocessing library and is the container for numerous instances of Observation. The above code does not work, I get the error:
RuntimeError: Lock objects should only be shared between processes through inheritance
However, if I instantiate a Lock object before launching the different processes and supply it to the constructor of Variable then it seems that I get a race condition as all processes seem to be waiting for each other.
The ultimate goal is that different processes can update the obs variable in the Variable object. I need this to be thread-safe because I am not just modifying the dictionary in place but adding new elements and incrementing existing variables. The obs variable is a dictionary that contains a bunch of instances of Observation.
How can I make this synchronized where I share one single instance of Variable between numerous multiprocessing processes? Thanks so much for your cognitive surplus!
UPDATE 1:
* I am using multiprocessing Locks and I have changed the source code to show this.
* I have changed the title to more accurately capture the problem
* I have replaced thread-safe with synchronization where I was confusing the two terms.
Thanks to Dmitry Dvoinikov for pointing this out!
One question I am still not sure about: where should I instantiate the Lock? Inside the class, or before launching the processes, passing it in as an argument? ANSWER: It should happen outside the class.
UPDATE 2:
* I fixed the 'Lock objects should only be shared between processes through inheritance' error by moving the initialization of the Lock outside the class definition and using a manager.
* Final question: now everything works except that when I put my Variable instance in the queue, it does not get updated, and every time I get it from the queue it does not contain the observation I added in the previous iteration. This is the only thing that is confusing me :(
UPDATE 3:
The final solution was to set the var.obs dictionary to an instance of mgr.dict() and then to write a custom serializer. Happy to share the code with anybody who is struggling with this as well.
You are talking not about thread safety but about synchronization between separate processes, and that's an entirely different thing. Anyway, to start:
different processes can update the obs variable in the object Variable.
implies that Variable is in shared memory, and you have to explicitly store objects there; a local instance does not become visible to a separate process by magic. Here:
Data can be stored in a shared memory map using Value or Array
Then, your code snippet is missing the crucial import section, so there is no way to tell whether you instantiate the right lock: multiprocessing.Lock, not threading.Lock. Your code also doesn't show how you create processes and pass data around.
Therefore, I'd suggest that you make sure you understand the difference between threads and processes, check whether you truly need a shared-memory model for an application consisting of multiple processes, and examine the spec.
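As a rough sketch of the Manager-based direction UPDATE 3 describes (the names add_observation and shared_obs are made up; a manager dict plus a manager lock stand in for the real Variable/Observation classes):

from multiprocessing import Manager, Process

def add_observation(shared_obs, lock, key):
    # The manager proxies forward every operation to the owning server
    # process; the lock makes the read-modify-write atomic across processes.
    with lock:
        shared_obs[key] = shared_obs.get(key, 0) + 1

if __name__ == '__main__':
    mgr = Manager()
    shared_obs = mgr.dict()
    lock = mgr.Lock()
    procs = [Process(target=add_observation, args=(shared_obs, lock, 'today'))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(dict(shared_obs))  # {'today': 4}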
