With reference to the following link: http://docs.python.org/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
I wanted to know if the following:
(x, y) = (y, x)
will be guaranteed atomic in CPython (x and y are both Python variables).
Let's see:
>>> x = 1
>>> y = 2
>>> def swap_xy():
...     global x, y
...     (x, y) = (y, x)
...
>>> import dis
>>> dis.dis(swap_xy)
  3           0 LOAD_GLOBAL              0 (y)
              3 LOAD_GLOBAL              1 (x)
              6 ROT_TWO
              7 STORE_GLOBAL             1 (x)
             10 STORE_GLOBAL             0 (y)
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE
It doesn't appear that they're atomic: the values of x and y could be changed by another thread between the LOAD_GLOBAL bytecodes, before or after the ROT_TWO, and between the STORE_GLOBAL bytecodes.
If you want to swap two variables atomically, you'll need a lock or a mutex.
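For example, a minimal sketch of a lock-guarded swap (the lock and function names are illustrative, not from the original post); note that every thread touching x and y must use the same lock for this to help:

import threading

x, y = 1, 2
xy_lock = threading.Lock()

def locked_swap_xy():
    global x, y
    with xy_lock:  # no other thread holding xy_lock can observe a half-done swap
        x, y = y, x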
For those desiring empirical proof:
>>> def swap_xy_repeatedly():
...     while 1:
...         swap_xy()
...         if x == y:
...             # If all swaps are atomic, there will never be a time when x == y.
...             # (Of course, this depends on "if x == y" being atomic, which it isn't;
...             # but if "if x == y" isn't atomic, what hope have we for the more complex
...             # "x, y = y, x"?)
...             print 'non-atomic swap detected'
...             break
...
>>> import threading
>>> t1 = threading.Thread(target=swap_xy_repeatedly)
>>> t2 = threading.Thread(target=swap_xy_repeatedly)
>>> t1.start()
>>> t2.start()
>>> non-atomic swap detected
Yes, yes it will.
I stand corrected.
Kragen Sitaker writes:
Someone recommended using the idiom spam, eggs = eggs, spam to get a thread-safe swap. Does this really work? (...)
So if this thread loses control anywhere between the first LOAD_FAST and the last STORE_FAST, a value could get stored by another thread into "b" which would then be lost. There isn't anything keeping this from happening, is there?
Nope. In general not even a simple assignment is necessarily thread safe, since performing the assignment may invoke special methods on an object which themselves may require a number of operations. Hopefully the object will have internally locked its "state" values, but that's not always the case.
But it's really dictated by what "thread safety" means in a particular application, because to my mind there are many levels of granularity of such safety, so it's hard to talk about "thread safety". About the only thing the Python interpreter is going to give you for free is that a built-in data type should be safe from internal corruption even with native threading. In other words, if two threads have a=0xff and a=0xff00, a will end up with one or the other, but not accidentally 0xffff as might be possible in some other languages if a isn't protected.
With that said, Python also tends to execute in such a fashion that you can get away with an awful lot without formal locking, if you're willing to live on the edge a bit and have implied dependencies on the actual objects in use. There was a decent discussion along those lines here in c.l.p a while back - search groups.google.com for the "Critical sections and mutexes" thread among others.
Personally, I explicitly lock shared state (or use constructs designed for exchanging shared information properly amongst threads, such as Queue.Queue) in any multi-threaded application. To my mind it's the best protection against maintenance and evolution down the road.
-- David
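As a rough illustration of that last suggestion (a sketch, not from the original discussion; Queue.Queue is the Python 2 name, queue.Queue in Python 3):

import threading
import Queue  # "import queue" in Python 3

results = Queue.Queue()

def worker(n):
    # put() and get() are thread-safe, so no explicit lock is needed here
    results.put(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print sorted(results.get() for _ in threads)  # Python 2 print statement

The point is that the queue, not the threads, owns the shared data, so the locking is handled for you.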
Python atomic for shared data types.
https://sharedatomic.top
The module can be used for atomic operations across multiple processes and multiple threads. It aims at high concurrency and high performance.
Example of the atomic API with multiprocessing and multiple threads:
You need the following steps to utilize the module:
Create the function used by the child processes (refer to UIntAPIs, IntAPIs, BytearrayAPIs, StringAPIs, SetAPIs, ListAPIs). In each process you can create multiple threads.
from threading import Thread

def process_run(a):
    def subthread_run(a):
        a.array_sub_and_fetch(b'\x0F')

    threadlist = []
    for t in range(5000):
        threadlist.append(Thread(target=subthread_run, args=(a,)))

    for t in range(5000):
        threadlist[t].start()

    for t in range(5000):
        threadlist[t].join()
Create the shared bytearray (atomic_bytearray is provided by the module linked above):
a = atomic_bytearray(b'ab', length=7, paddingdirection='r', paddingbytes=b'012', mode='m')
Start processes/threads that use the shared bytearray:
from multiprocessing import Process

processlist = []
for p in range(2):
    processlist.append(Process(target=process_run, args=(a,)))

for p in range(2):
    processlist[p].start()

for p in range(2):
    processlist[p].join()

assert a.value == int.to_bytes(27411031864108609, length=8, byteorder='big')
Intro:
Hello. I am exploring the Python rxpy library for my use case, where I am building an execution pipeline using reactive programming concepts. This way I expect I will not have to manipulate too much state. My solution seems to be functional, but I am having trouble composing a new Observable from other Observables.
The problem is that the way I am composing my observables causes some expensive calculations to be performed twice. For performance, I really want to avoid triggering the expensive calculations more than once.
I am very new to reactive programming. I have scratched my head and looked through internet resources and the reference documentation, but they seem a little too terse for me to grasp. Please advise.
Following is a toy example which illustrates what I am doing:
import rx
from rx import operators as op
from rx.subject import Subject
root = Subject()
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r)))
)

bar_foo = foo.pipe(
    op.map(lambda x: x * 2),
    op.do_action(lambda r: print("bar(foo(x)) = %s" % str(r)))
)

bar_foo.pipe(
    op.zip(foo),
    op.map(lambda i: i[0] + i[1]),
    op.do_action(lambda r: print("foo(x) + bar(foo(x)) = %s" % str(r)))
).subscribe()
print("-------------")
root.on_next(10)
print("-------------")
Output:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) = 11 (expensive)
foo(x) + bar(foo(x)) = 33
-------------
You can think of foo() and bar() as expensive and complex operations. I first build an observable foo, then compose a new observable bar_foo that incorporates foo. Later both are zipped together to calculate the final result foo(x) + bar(foo(x)).
Question:
What can I do to prevent foo() from getting triggered more than once for a single input?
I have really strong reasons to keep foo() and bar() separate. I also do not want to explicitly memoize foo().
I would also appreciate it if anyone with experience using rxpy in production could share their experiences: will using rxpy lead to better performance or to slowdowns compared to equivalent hand-crafted (but unmaintainable) code?
Adding op.share() right after the expensive calculation in the foo pipeline could be useful here. So changing the foo pipeline to:
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.share()  # added to the pipeline
)
will result in:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) + bar(foo(x)) = 33
-------------
I believe that .share() causes the events emitted by the expensive operation to be shared among downstream subscribers, so that the result of a single expensive calculation can be used multiple times.
Regarding your second question: I am new to RxPY as well, so I am also interested in the answers of more experienced users. So far I've noticed that as a beginner you can easily create (bad) pipelines where messages and calculations are repeated in the background. .share() seems to reduce this to some extent, but I am not sure what is happening under the hood.
For some reason, the execution time is still the same as without threading.
But if I add something like time.sleep(secs), the threads clearly do run concurrently inside the target function d.
import math
import threading

# Point and LineString are presumably shapely.geometry classes; ReturnArray,
# Arraylock, angales and calculateDistance are defined elsewhere in the
# original program.

def d(CurrentPos, polygon, angale, id):
    Returnvalue = 0
    lock = True
    steg = 0.0005
    distance = 0
    x = 0
    y = 0

    while lock == True:
        x = math.sin(math.radians(angale)) * distance + CurrentPos[0]
        y = math.cos(math.radians(angale)) * distance + CurrentPos[1]
        Localpoint = Point(x, y)
        inout = polygon.contains(Localpoint)
        distance = distance + steg
        if inout == False:
            lock = False
            l = LineString([[CurrentPos[0], CurrentPos[1]], [x, y]])
            Returnvalue = list(l.intersection(polygon).coords)[0]
            Returnvalue = calculateDistance(CurrentPos[0], CurrentPos[1],
                                            Returnvalue[0], Returnvalue[1])

    with Arraylock:
        ReturnArray.append(Returnvalue)
        ReturnArray.append(id)

def Main(CurrentPos, Map):
    threads = []
    for i in range(8):
        t = threading.Thread(target=d, name='thread{}'.format(i),
                             args=(CurrentPos, Map, angales[i], i))
        threads.append(t)
        t.start()

    for i in threads:
        i.join()
Welcome to the world of the Global Interpreter Lock, a.k.a. the GIL. Your function looks like CPU-bound code (calculations, loops, ifs, memory access, etc.). You can't use threads to increase the performance of CPU-bound tasks, sorry; that is a limitation of CPython.
There are operations in Python that release the GIL, e.g. disk I/O, network I/O, and the one you've actually tried: sleep. And indeed, threads do increase the performance of I/O-bound tasks. But arithmetic and/or memory access won't run in parallel in pure Python.
The standard workaround is to use processes instead of threads. But this is often painful due to not-that-easy interprocess communication. You may also want to consider using low-level libraries like numpy that actually release the GIL in certain situations (you can only do that at the C level; the GIL is not accessible from Python itself), or using some other language without this limitation, e.g. C#, Java, C, C++ and so on.
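As a rough sketch of that standard workaround (illustrative only; the worker below is a stand-in for the original d function, not taken from the question):

import math
from multiprocessing import Pool

def worker(angle):
    # stand-in for the expensive, CPU-bound geometry loop in d()
    return sum(math.sin(math.radians(angle)) * i for i in range(10**6))

if __name__ == '__main__':
    angles = [0, 45, 90, 135, 180, 225, 270, 315]
    pool = Pool(processes=8)   # each process has its own interpreter and its own GIL
    results = pool.map(worker, angles)
    pool.close()
    pool.join()
    print(results)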
In Python 2, I would like to fill a global array by having parallel processes (or threads) fill different sub-arrays (there are 16 blocks in total). I should point out that the blocks don't depend on each other, i.e. when I perform the assignment of the cells of the current block.
1) From what I have found, I would benefit greatly from a multi-core CPU by using different "processes", but it seems a little complicated to share the global array between all the processes.
2) From another point of view, I can use "threads" instead of "processes", since the implementation is easier. I have found out that the "ThreadPool" class from "multiprocessing.dummy" allows this global array to be shared by all the concurrent threads.
For example, with Python 2.7, the following code works:
import numpy as np
from multiprocessing.dummy import Pool as ThreadPool

# kMIN, kMAX, zrange, P_obs_cross and P_m are defined elsewhere in the
# original program.

# Number of blocks along each dimension (dimBlocks*dimBlocks = 16 blocks in total)
dimBlocks = 4
# Size of each block along the k and mu axes
dimPoints = 100
# Dimension along one axis of the global arrayFullCross
dimMatCovCross = dimBlocks*dimPoints

## Discretization along the k-axis and mu-axis for each block
arrayCross_k = np.linspace(kMIN, kMAX, dimPoints)
arrayCross_mu = np.linspace(-1, 1, dimPoints)

# Build the big matrix with dimBlocks*dimBlocks = 16 blocks
arrayFullCross = np.zeros((dimBlocks, dimBlocks, arrayCross_k.size, arrayCross_mu.size))

# Build cross-correlation matrix
def buildCrossMatrix_loop(params_array):
    # row index
    xb = params_array[0]
    # column index
    yb = params_array[1]
    # current redshift
    z = zrange[params_array[2]]
    # loop inside the block
    for ub in range(dimPoints):
        for vb in range(dimPoints):
            # diagonal blocks
            if (xb == yb):
                # fill the (xb, yb) sub-block of the global array
                arrayFullCross[xb][xb][ub][vb] = 2*P_obs_cross(arrayCross_k[ub], arrayCross_mu[vb], z, 10**P_m(np.log10(arrayCross_k[ub])),
                ...
                ...
# End of function buildCrossMatrix_loop

# Main loop
i = 0
while i < len(zrange):

    def generatorCrossMatrix(index):
        for igen in range(dimBlocks):
            for lgen in range(dimBlocks):
                yield igen, lgen, index

    if __name__ == '__main__':
        # Use 20 threads
        pool = ThreadPool(20)
        pool.map(buildCrossMatrix_loop, generatorCrossMatrix(i))

    # Increment index "i"
    i = i + 1
But unfortunately, even when using 20 threads, I notice that the cores of my CPU are not fully used (with the 'top' or 'htop' command I only see a single process at 100%).
3) What strategy do I have to choose if I want to fully exploit the 16 cores of my CPU (as should be the case with pool.map(function, generator)) while also sharing the global array?
4) Some people told me to do I/O for each sub-array (basically, write each block to a file, then gather all the sub-arrays by reading them back to get the full array filled). This solution is handy, but I would like to avoid I/O (unless there really is no other solution).
5) I have used the MPI library with C, and the operation of filling sub-arrays and finally gathering them to build a big array is not very complicated. However, I would rather not use MPI with Python (I don't even know whether it exists).
6) I also tried to use Process with the target equal to my filling function (buildCrossMatrix_loop), like this inside the main while loop above:
from multiprocessing import Process

# Main loop on the z range
while i < len(zrange):
    params_p = []
    for ip in range(4):
        for jp in range(4):
            params_p.append(ip)
            params_p.append(jp)
            params_p.append(i)
            p = Process(target=buildCrossMatrix_loop, args=(params_p,))
            params_p = []
            p.start()
            # Finished: wait for everybody
            p.join()
    ...
    ...
    i = i + 1
# End of main while loop
But the final 2D global array is filled only with zeros, so I must conclude that Process does not share the array to be filled?
7) So which strategy should I look at?
1. Use a "pool of processes" and find a way to share the global array, knowing that all my 16 cores will be running.
2. Use "threads" and share the global array, although at first sight the performance seems to be worse than with a "pool of processes". Maybe there is a way to increase the power of each thread, as there is with a "pool of processes"?
I tried to follow the different examples at https://docs.python.org/2/library/multiprocessing.html but without success, that is to say, without any relevant speed-up.
I think that in my case the major issue is either the gathering of all the sub-arrays, or the fact that the global array arrayFullCross is not shared by the other processes or threads.
If someone has a simple example of sharing a global variable in a multi-threading context (here an array), it would be nice to post it here.
UPDATE 1: I tested with threading (and not multiprocessing), but performance remains rather bad. The GIL is apparently not released, i.e. only one process appears in htop (maybe the threading library is not the right tool here).
So I am going to try to handle my issue with the "return" approach.
Naively, I tried to return the whole array at the end of the function to which I apply the map function, like this:
# Build cross-correlation matrix
def buildCrossMatrix_loop(params_array):
    # row index
    xb = params_array[0]
    # column index
    yb = params_array[1]
    # current redshift
    z = zrange[params_array[2]]
    # loop inside the block
    for ub in range(dimPoints):
        for vb in range(dimPoints):
            # diagonal blocks
            if (xb == yb):
                arrayFullCross[xb][xb][ub][vb] = 2*P_obs_cross(arrayCross_k[ub], arrayCross_mu[vb])
                ...
                ...  # other assignments on arrayFullCross elements
    # Return the global array to the main process
    return arrayFullCross
Then I tried to receive this global array from map like this:
if __name__ == '__main__':
    pool = Pool(16)
    outputArray = pool.map(buildCrossMatrix_loop, generatorCrossMatrix(i))
    pool.terminate()
    ## Print outputArray
    print 'outputArray = ', outputArray
    ## Reshape 4D outputArray to a 2D array
    arrayFullCross2D_swap = np.array(outputArray).swapaxes(1,2).reshape(dimMatCovCross, dimMatCovCross)
Unfortunately, when I print outputArray, I get:
outputArray = [None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
This is not the 4D outputArray expected, just a list of 16 None values (I think the number 16 corresponds to the number of tasks provided by generatorCrossMatrix(i)).
How can I get back the whole 4D array once map has been launched and has finished?
First of all, I believe multiprocessing.pool.ThreadPool is essentially undocumented, so you should be careful about relying on it. More importantly, multiprocessing.dummy is just a thread-based replica of the multiprocessing API: it uses threads, not processes, so for CPU-bound pure-Python code the GIL prevents any speed-up, which is why you don't see any benefit. You should use the "plain" multiprocessing module.
The second code does not work because it uses multiple processes. Processes do not share memory, so the changes you make in a subprocess are not reflected in the other subprocesses or in the main process. You either want to:
Return the values and combine them in the main process, for example using multiprocessing.Pool.map (a rough sketch is given at the end of this answer), or
Use threading instead of multiprocessing: just replace import multiprocessing with import threading and multiprocessing.Process with threading.Thread, and the code should work.
Note that the threading version will work only because numpy releases the GIL during computations; otherwise it would be stuck at 1 CPU.
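For illustration, a minimal sketch of the threading variant (illustrative names; the per-block work here is a stand-in for buildCrossMatrix_loop):

import threading
import numpy as np

dimBlocks, dimPoints = 4, 100
arrayFullCross = np.zeros((dimBlocks, dimBlocks, dimPoints, dimPoints))

def fill_block(xb, yb):
    # stand-in for the real per-block computation; threads share process
    # memory, so they can all write into the same array
    arrayFullCross[xb, yb] = np.outer(np.arange(dimPoints), np.arange(dimPoints))

threads = [threading.Thread(target=fill_block, args=(xb, yb))
           for xb in range(dimBlocks) for yb in range(dimBlocks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

This only gives a real speed-up if the per-block work spends its time in operations that release the GIL (large numpy calls, not Python-level element loops).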
You may want to look at this similar question which I answered a couple of minutes ago.
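For reference, here is a rough sketch of the first option: each worker returns only its own block and the parent process puts the blocks back into the full array, so no shared memory is needed (again with a stand-in for the real computation):

import numpy as np
from multiprocessing import Pool

dimBlocks, dimPoints = 4, 100

def build_block(params):
    xb, yb = params
    # stand-in for the real per-block computation
    return xb, yb, np.full((dimPoints, dimPoints), xb * 10 + yb, dtype=float)

if __name__ == '__main__':
    arrayFullCross = np.zeros((dimBlocks, dimBlocks, dimPoints, dimPoints))
    jobs = [(xb, yb) for xb in range(dimBlocks) for yb in range(dimBlocks)]
    pool = Pool(processes=16)
    for xb, yb, block in pool.map(build_block, jobs):
        arrayFullCross[xb, yb] = block
    pool.close()
    pool.join()
    print(arrayFullCross[2, 3, 0, 0])  # -> 23.0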
This may be a very easy question, but it has definitely worn me out.
To use multiprocessing, I wrote the following code. The main function creates two processes which both use the same function, called prepare_input_data(), but process different input datasets. This function must return multiple objects and values for each input, to be used in the next steps of the code (not included here).
What I want is to get more than one value or object back from the function I am using with multiprocessing.
import time
from multiprocessing import Process, Queue, current_process

# loading_layer, Preprocessing and find_roundabouts come from the asker's own
# project code.

def prepare_input_data(inputdata_address, temporary_address, output):
    p = current_process()
    name = p.name
    data_address = inputdata_address
    layer = loading_layer(data_address)
    preprocessing_object = Preprocessing(layer)
    nodes = preprocessing_object.node_extraction(layer)
    tree = preprocessing_object.index_nodes()
    roundabouts_dict, roundabouts_tree = find_roundabouts(layer.address, layer, temporary_address)
    #return layer, nodes, tree, roundabouts_dict, roundabouts_tree
    #return [layer, nodes, tree, roundabouts_dict, roundabouts_tree]
    output.put([layer, nodes, tree, roundabouts_dict, roundabouts_tree])

if __name__ == '__main__':
    print "the data preparation in multi processes starts here"
    output = Queue()
    start_time = time.time()
    processes = []
    #outputs = []
    ref_process = Process(name="reference", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/NVDB_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/", output))
    cor_process = Process(name="corresponding", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/OSM_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/", output))
    #outputs.append(ref_process.start)
    #outputs.append(cor_process.start)
    ref_process.start()
    cor_process.start()
    processes.append(ref_process)
    processes.append(cor_process)
    for p in processes:
        p.join()
    print "the whole data preparation took ", time.time() - start_time
    results = {}
    for p in processes:
        results[p.name] = output.get()
    ########################
    #ref_info = outputs[0]
    # ref_nodes = ref_info[0]
Previous ERROR: when I used return, ref_info[0] had NoneType.
ERROR: based on the answer here, I changed the code to pass a Queue object to the function, then used put() to add the results and get() to retrieve them for further processing. Now I get:
Traceback (most recent call last):
File "C:\Python27\ArcGISx6410.2\Lib\multiprocessing\queues.py", line 262, in _feed
send(obj)
UnpickleableError: Cannot pickle <type 'geoprocessing spatial reference object'> objects
Could you please help me figure out how to return more than one value from a function used with multiprocessing?
Parallel programming with shared state is a rocky road that even experienced programmers get wrong. A much more beginner-friendly method is to copy data around. This is the only way to move data between subprocesses (not quite true, but that's an advanced topic).
Citing https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes, you'll want to set up a multiprocessing.Queue to be filled with the returned data from each of your subprocesses. Afterward you can pass the queue to be read from to the next stage.
For multiple different datasets, such as your layer, nodes, tree, etc, you can use multiple queues to differentiate each return value. It may seem a bit cluttered to use a queue for each, but it's simple and understandable and safe.
Hope that helps.
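A minimal sketch of that idea (illustrative names, not the asker's real objects; everything put on a queue must be picklable):

from multiprocessing import Process, Queue

def prepare(name, layer_q, nodes_q):
    # stand-ins for the real layer/nodes objects
    layer_q.put("layer-for-%s" % name)
    nodes_q.put(["node1", "node2"])

if __name__ == '__main__':
    layer_q, nodes_q = Queue(), Queue()
    workers = [Process(target=prepare, args=(n, layer_q, nodes_q))
               for n in ("reference", "corresponding")]
    for w in workers:
        w.start()
    # drain the queues before join() so the queue feeder threads can flush
    results = [(layer_q.get(), nodes_q.get()) for _ in workers]
    for w in workers:
        w.join()
    print(results)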
If you use jpe_types.paralel's Process, it will return the return value of the process's target function, like so:
import jpe_types.paralel

def fun():
    return 4, 23.4, "hi", None

if __name__ == "__main__":
    p = jpe_types.paralel.Process(target=fun)
    p.start()
    print(p.join())
Otherwise you could:
import multiprocessing as mp

def fun(returner):
    returner.send((1, 23, "hi", None))

if __name__ == "__main__":
    processes = []
    for i in range(2):
        sender, recever = mp.Pipe()
        p = mp.Process(target=fun, args=(sender,))
        p.start()
        processes.append((p, recever))

    resses = []
    for p, rcver in processes:
        p.join()
        resses.append(rcver.recv())

    print(resses)
Using the connection will guarantee that the returns don't get scrambled.
If you are looking to get multiple return values from multiprocessing, then you can do that. Here's a simple example, first in serial Python, then with multiprocessing:
>>> a,b = range(10), range(10,0,-1)
>>> import math
>>> map(math.modf, (1.*i/j for i,j in zip(a,b)))
[(0.0, 0.0), (0.1111111111111111, 0.0), (0.25, 0.0), (0.42857142857142855, 0.0), (0.6666666666666666, 0.0), (0.0, 1.0), (0.5, 1.0), (0.3333333333333335, 2.0), (0.0, 4.0), (0.0, 9.0)]
>>>
>>> from multiprocessing import Pool
>>> res = Pool().imap(math.modf, (1.*i/j for i,j in zip(a,b)))
>>> for i,ai in enumerate(a):
... x,y = res.next()
... print("{x},{y} = modf({u}/{d})").format(x=x,y=y,u=ai,d=b[i])
...
0.0,0.0 = modf(0/10)
0.111111111111,0.0 = modf(1/9)
0.25,0.0 = modf(2/8)
0.428571428571,0.0 = modf(3/7)
0.666666666667,0.0 = modf(4/6)
0.0,1.0 = modf(5/5)
0.5,1.0 = modf(6/4)
0.333333333333,2.0 = modf(7/3)
0.0,4.0 = modf(8/2)
0.0,9.0 = modf(9/1)
So to get multiple values in the return from a function with multiprocessing, you only need to have a function that returns multiple values… you will just get the values back as a list of tuples.
The major issue with multiprocessing, as you can see from your error, is that most functions don't serialize. So, if you really want to do what it seems like you want to do, I'd strongly suggest you use pathos (as discussed below). The largest barrier you will have with multiprocessing is that the functions you pass as the target must be serializable. There are several modifications you can make to your prepare_input_data function, the first of which is to make sure it is encapsulated. If your function is not fully encapsulated (e.g. it has name-reference lookups outside of its own scope), then it probably won't pickle with pickle. That means you need to include all imports inside the target function and pass any other variables in through the function's inputs. The error you are seeing (UnpickleableError) is due to your target function and its dependencies not being serializable, not to your being unable to return multiple values from multiprocessing.
While I'd encapsulate the target function anyway as a matter of good practice, it can be a bit tedious and could slow your code down a hair. I also suggest that you convert your code to use dill and pathos.multiprocessing -- dill is an advanced serializer that can pickle almost all Python objects, and pathos provides a multiprocessing fork that uses dill. That way, you can pass most Python objects through the pipe (i.e. apply) or the map available from the Pool object, and not sweat too hard refactoring your code to make sure plain old pickle and multiprocessing can handle it.
Also, I'd use an asynchronous map instead of doing what you are doing above. pathos.multiprocessing has the ability to take multiple arguments in the map function, so you don't need to wrap them in a tuple as you've done above. The interface should be much cleaner with an asynchronous map, and you can return multiple values if you need to… just pack them in a tuple.
Here are some examples that demonstrate what I'm referring to above.
Return multiple values:
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> def addsub(x,y):
... return x+y, x-y
...
>>> a,b = range(10),range(-10,10,2)
>>> res = Pool().imap(addsub, a, b)
>>>
>>> for i,ai in enumerate(a):
... add,sub = res.next()
... print("{a} + {b} = {p}; {a} - {b} = {m}".format(a=ai,b=b[i],p=add,m=sub))
...
0 + -10 = -10; 0 - -10 = 10
1 + -8 = -7; 1 - -8 = 9
2 + -6 = -4; 2 - -6 = 8
3 + -4 = -1; 3 - -4 = 7
4 + -2 = 2; 4 - -2 = 6
5 + 0 = 5; 5 - 0 = 5
6 + 2 = 8; 6 - 2 = 4
7 + 4 = 11; 7 - 4 = 3
8 + 6 = 14; 8 - 6 = 2
9 + 8 = 17; 9 - 8 = 1
>>>
Asynchronous map:
Python multiprocessing - tracking the process of pool.map operation
pathos:
Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()
pathos:
What can multiprocessing and dill do together?
We still can't run your code… but if you post code that can be run, it might be possible to help edit your code (using the pathos fork and the asynchronous map, or otherwise).
FYI: A release for pathos is a little bit overdue (i.e. late), so if you want to try it, it's best to get the code here: https://github.com/uqfoundation
I need to use an atomic compare-and-set operation in my Python program, but I didn't find a reference on how to use one.
Does Python provide such an atomic function?
Thank you.
From the atomics library:
import atomics
a = atomics.atomic(width=4, atype=atomics.INT)
# set to 5 if a.load() compares == to 0
res = a.cmpxchg_strong(expected=0, desired=5)
print(res.success)
Note: I am the author of this library
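For instance, a compare-and-set retry loop could look like this (a sketch based only on the calls shown above; cas_increment is an illustrative helper, not part of the library's API):

import atomics

a = atomics.atomic(width=4, atype=atomics.INT)

def cas_increment(atomic_int):
    while True:
        current = atomic_int.load()
        # only succeeds if no other thread changed the value in the meantime
        res = atomic_int.cmpxchg_strong(expected=current, desired=current + 1)
        if res.success:
            return current + 1

print(cas_increment(a))  # -> 1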