from threading import Thread
def test_first(*args):
    '''
    some code
    :param args:
    :return:
    '''
td_num = Thread(target=test_first([1,2,3,4,5]))
td_char = Thread(target=test_first(['A','B','C','D','E']))
td_welcome = Thread(target=test_first("Welcome"))
td_num.start()
td_char.start()
td_welcome.start()
td_num.join()
td_char.join()
td_welcome.join()
I have one function which I'm calling from multiple threads, but instead of executing in parallel it executes sequentially. Any suggestions as to what is wrong here?
You are calling the functions when you pass them to Thread.
This:
Thread(target=test_first([1,2,3,4,5]))
first calls the function test_first(), then passes the result of that call to Thread(). That means the function is called and completes before the thread is even created.
The docs are pretty clear:
target is the callable object to be invoked by the run() method.
Defaults to None, meaning nothing is called.
You should instead pass the callable directly to the Thread and pass the arguments in separately:
td_num = Thread(target=test_first, args=([1,2,3,4,5],))
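Applied to the whole example, the corrected version might look like this (a sketch; the print is a stand-in for the real work):

from threading import Thread

def test_first(*args):
    print(args)  # stand-in for the real work

td_num = Thread(target=test_first, args=([1, 2, 3, 4, 5],))
td_char = Thread(target=test_first, args=(['A', 'B', 'C', 'D', 'E'],))
td_welcome = Thread(target=test_first, args=("Welcome",))

# now the threads call test_first(...) themselves when started
for td in (td_num, td_char, td_welcome):
    td.start()
for td in (td_num, td_char, td_welcome):
    td.join()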
Related
I was writing some multithreading code and had a syntax issue in it, and found that the code was executing sequentially rather than in parallel. I fixed the issue by passing the function and its arguments to submit() separately instead of calling the function inline, but I couldn't figure out why Python was behaving that way and couldn't find documentation for it. Does anyone know why?
import time
from concurrent.futures import ThreadPoolExecutor
def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)
count = 0
executor = ThreadPoolExecutor(max_workers=2)
while count < 5:
    print("Starting work")
    executor.submit(do_work(count))
    print("Work submitted")
    count += 1
Fixed this line to make it go parallel.
executor.submit(do_work, count)
You were telling Python to execute the function do_work() right away, and to then pass whatever that function returned to executor.submit():
executor.submit(do_work(count))
It might be easier for you to see this if you used a variable to hold the result of do_work(). The following is functionally equivalent to the above:
do_work_result = do_work(count)
executor.submit(do_work_result)
In Python, functions are first-class objects; using just the name do_work you are referencing the function object. Only adding (...) to an expression that produces a function object (or another callable object type) causes something to be executed.
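A quick way to see the distinction (a sketch):

def do_work(i):
    return i * 2

ref = do_work        # no parentheses: ref is the function object itself
result = do_work(5)  # parentheses: the function runs right now; result is 10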
In the form
executor.submit(do_work, count)
you do not call the function. You are passing in the function object itself as the first argument, and count as the second. The executor.submit() method accepts a callable object and its arguments, to later run that callable with the arguments provided.
This allows the ThreadPoolExecutor to take that function reference and the single argument and only call the function in a new thread, later on.
Because you were calling the function first, each call had to run to completion, sequentially, before it could be submitted. And because the function returns None, you were passing those None references to executor.submit(), and would have seen a TypeError exception later on telling you that 'NoneType' object is not callable. That happens because the thread pool executor tried to call None(), which doesn't work because None is indeed not callable.
Under the hood, the library essentially does this:
def submit(self, fn, *args, **kwargs):
    # record the function to be called as a work item, with other information
    w = _WorkItem(..., fn, args, kwargs)
    self._work_queue.put(w)
so a work item referencing the function and arguments is added to a queue. Worker threads take items from that queue; when an item is taken (in another thread, or a child process), its _WorkItem.run() method is called, which runs your function:
result = self.fn(*self.args, **self.kwargs)
Only then is the (...) call syntax used. Because there are multiple threads, the code is executed concurrently.
You do want to read up on how pure Python code can't run in parallel, only concurrently: Does Python support multithreading? Can it speed up execution time?
Your do_work() functions only run 'faster' because time.sleep() doesn't have to do any actual work; it merely tells the kernel not to give the thread any execution time for the requested duration, so you end up with a bunch of threads that are all asleep. If your workers had to execute actual Python instructions, the total time spent running these functions concurrently versus sequentially would not differ all that much.
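For completeness, a corrected version of the loop might look like this sketch; with two workers, the start messages appear in overlapping pairs instead of one per second:

import time
from concurrent.futures import ThreadPoolExecutor

def do_work(i):
    print("{} {} - Command started".format(i, time.time()))
    time.sleep(1)

executor = ThreadPoolExecutor(max_workers=2)
for count in range(5):
    executor.submit(do_work, count)  # pass the function object and its argument separately
executor.shutdown(wait=True)  # block until all submitted work has finished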
I have a function which accepts both regular and asynchronous functions (not coroutines, but functions returning coroutines).
Internally it uses asyncio.iscoroutinefunction() test to see which type of function it got.
Recently it broke down when I attempted to create a partial async function.
In this demonstration, ptest is not recognized as a coroutine function, even though it returns a coroutine, i.e. ptest() is a coroutine.
import asyncio
import functools
async def test(arg): pass
print(asyncio.iscoroutinefunction(test)) # True
ptest = functools.partial(test, None)
print(asyncio.iscoroutinefunction(ptest)) # False!!
print(asyncio.iscoroutine(ptest())) # True
The problem cause is clear, but the solution is not.
How to dynamically create a partial async func which passes the test?
OR
How to test the func wrapped inside a partial object?
Either answer would solve the problem.
On Python versions before 3.8 you can't make a partial() object pass that test, because the test requires there to be a __code__ object attached directly to the object you pass to inspect.iscoroutinefunction().
You should instead test the function object that partial wraps, accessible via the partial.func attribute:
>>> asyncio.iscoroutinefunction(ptest.func)
True
If you also need to test for partial() objects, then test against functools.partial:
import functools
import inspect

def iscoroutinefunction_or_partial(object):
    while isinstance(object, functools.partial):
        object = object.func
    return inspect.iscoroutinefunction(object)
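With that helper, the partial object from the question passes the test:

>>> iscoroutinefunction_or_partial(ptest)
True
>>> iscoroutinefunction_or_partial(functools.partial(ptest))  # nested partials unwrap too
True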
In Python 3.8 (and newer), the relevant code in the inspect module (that asyncio.iscoroutinefunction() delegates to) was updated to handle partial() objects, and you no longer have to unwrap partial() objects yourself. The implementation uses the same while isinstance(..., functools.partial) loop.
I solved this by replacing all instances of partial with async_partial:
def async_partial(f, *args):
    async def f2(*args2):
        result = f(*args, *args2)
        if asyncio.iscoroutinefunction(f):
            result = await result
        return result
    return f2
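Because the f2 wrapper is itself defined with async def, the result now passes the original test:

>>> asyncio.iscoroutinefunction(async_partial(test, None))
True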
I'm currently writing code in Python 2.7 that involves creating an object with two class methods and other regular methods. I need to use this specific combination of methods because of the larger context of the code I am writing; it's not relevant to this question, so I won't go into depth.
Within my __init__ method, I am creating a Pool (a multiprocessing object). In the creation of that, I call a setup function. This setup function is a @classmethod. I define a few variables in this setup function using the cls.variablename syntax. As I mentioned, I call this setup function within my __init__ method (inside the Pool creation), so these variables should be getting created, based on what I understand.
Later in my code, I call a few other functions, which eventually leads to me calling another @classmethod within the same object I was talking about earlier (the same object as the first @classmethod). Within this @classmethod, I try to access the cls.variables I created in the first @classmethod. However, Python is telling me that my object doesn't have an attribute "cls.variable" (using general names here; obviously my actual names are specific to my code).
ANYWAYS...I realize that's probably pretty confusing. Here's some (very) generalized code example to illustrate the same idea:
from multiprocessing import Pool

class General(object):
    def __init__(self, A):
        # this is correct syntax based on the resources I'm using,
        # so the format of the argument isn't the issue, in case anyone
        # initially thinks that's the issue
        self.pool = Pool(processes=4, initializer=self._setup, initargs=(A,))

    @classmethod
    def _setup(cls, A):
        cls.A = A

    # leaving out other functions here that are NOT class methods, just regular methods

    @classmethod
    def get_results(cls):
        print cls.A
The error I'm getting when I get to the equivalent of the print cls.A line is this:
AttributeError: type object 'General' has no attribute 'A'
edit to show usage of this code:
The way I'm calling this in my code is as such:
G = General(5)
G.get_results()
So, I'm creating an instance of the object (in which I create the Pool, which calls the setup function), and then calling get_results.
What am I doing wrong?
The reason General.A does not get defined in the main process is that multiprocessing.Pool only runs General._setup in the subprocesses. This means that it will not be called in the main process (where you call Pool).
You end up with 4 worker processes, in each of which General.A is defined, but not in the main process. You don't actually initialize a Pool like that (see this answer to the question How to use initializer to set up my multiprocess pool?).
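If the goal is simply to make A available inside the worker processes, the usual pattern (a sketch, using a module-level initializer and global rather than the class methods from the question) looks like this:

from multiprocessing import Pool

A = None  # set by the initializer, once per worker process

def _setup(a):
    global A
    A = a  # runs in each worker, not in the parent

def get_results(x):
    return A + x  # every worker sees its own copy of A

if __name__ == '__main__':
    pool = Pool(processes=4, initializer=_setup, initargs=(5,))
    print(pool.map(get_results, [1, 2, 3]))  # [6, 7, 8]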
If instead you want an Object Pool, that is not natively implemented in Python. There's a Python Implementation of the Object Pool Design Pattern question here on Stack Overflow, but you can find a bunch by just searching online.
In Python, what do you do if you are using multiprocessing and you need to give the function an extra argument?
Example:
if value == "Y":
    pool = multiprocessing.Pool(processes=8)
    pool.map(verify_headers, url_list)  # <- need to give a parameter for a password
    pool.close()
    pool.join()
    print "Done..."
and the function would be something like:
def verify_headers(url, password):
    pass
Pool.map takes a function of one argument and an iterable producing the values for that argument. We can turn your function of two arguments into a function of one argument by wrapping it in another function body:
def verify_headers_with_password(url):
    return verify_headers(url, 'secret_password')
And pass that to pool.map instead:
pool.map(verify_headers_with_password, url_list)
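In full, that might look like the following sketch (the url_list contents are hypothetical; the wrapper lives at module level so multiprocessing can pickle it):

import multiprocessing

def verify_headers(url, password):
    pass  # stand-in for the real check

def verify_headers_with_password(url):
    return verify_headers(url, 'secret_password')

if __name__ == '__main__':
    url_list = ['http://example.com/a', 'http://example.com/b']  # hypothetical
    pool = multiprocessing.Pool(processes=8)
    pool.map(verify_headers_with_password, url_list)
    pool.close()
    pool.join()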
So long as verify_headers can take password as a keyword argument, we can shorten that a little by using functools.partial:
pool.map(functools.partial(verify_headers, password='secret_password'), url_list)
Edit: as Bakuriu points out, multiprocessing passes data around by pickling, so the following doesn't work:
pool.map(lambda url: verify_headers(url, 'secret_password'), url_list)
since lambdas are functions without a name, and pickle serializes functions by name.
I believe
from functools import partial
and
pool.map(partial(verify_headers, password=password), url_list)
should work?
Edit: fixed based on recommendations below.
You can define a function, right after the original, that accepts a 2-element tuple as its argument:

import itertools as it

def verify_headers_tuple(url_passwd):
    return verify_headers(*url_passwd)

Then you can zip the original url_list with itertools.repeat(password):

pool.map(verify_headers_tuple, it.izip(url_list, it.repeat(password)))
Note that the function passed to Pool.map must be defined at the top level of a module (due to pickling restrictions), which means you cannot use partial or lambda to create a "curried function".
I'm trying to run some simple threading in Python using:
t1 = threading.Thread(analysis("samplequery"))
t1.start()
other code runs in here
t1.join()
Unfortunately I'm getting the error:
"AssertionError: group argument must be None for now"
I've never implemented threading in Python before, so I'm a bit unsure as to what's going wrong. Does anyone have any idea what the problem is?
I'm not sure if it's relevant at all, but analysis is a method imported from another file.
I had one follow-up query as well: analysis returns a dictionary; how would I go about assigning that for use in the original method?
Thanks
You want to specify the target keyword parameter instead:
t1 = threading.Thread(target=analysis("samplequery"))
You probably meant to make analysis the run target, and 'samplequery' the argument to use when the thread is started:
t1 = threading.Thread(target=analysis, args=("samplequery",))
The first parameter to Thread() is the group argument, and it currently only accepts None as the argument.
From the threading.Thread() documentation:
This constructor should always be called with keyword arguments. Arguments are:
group should be None; reserved for future extension when a ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
You need to provide the target attribute:
t1 = threading.Thread(target=analysis, args=('samplequery',))
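As for the follow-up about the dictionary that analysis returns: Thread.start() cannot hand a return value back, so one common pattern (a sketch; the wrapper and container names are illustrative) is to store the result from inside the thread and read it after join():

results = {}

def run_analysis():
    results['samplequery'] = analysis('samplequery')

t1 = threading.Thread(target=run_analysis)
t1.start()
# ... other code runs here ...
t1.join()
# after join(), results['samplequery'] holds the dictionary analysis returned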