Python - Why does AWS Lambda run multiple threads so slowly?

I am using AWS Lambda to run the following Python code:
import threading

def get_likers(link):
    # scrapes a site
    ...

def lambda_handler(event, context):
    text = ...  # gets the link from a Telegram bot message
    checkt = threading.Thread(target=get_likers, args=[text])
    checkt1 = threading.Thread(target=get_likers, args=["here's a link"])
    checkt2 = threading.Thread(target=get_likers, args=["here's a link"])
    checkt3 = threading.Thread(target=get_likers, args=["here's a link"])
    checkt4 = threading.Thread(target=get_likers, args=["here's a link"])

    checks = []
    checks.append(checkt)
    checks.append(checkt1)
    checks.append(checkt2)
    checks.append(checkt3)
    checks.append(checkt4)

    for thread in checks:
        thread.start()
    for thread in checks:
        thread.join()

    return {'statusCode': 200}
It should run the threads simultaneously and finish quickly, but while a single thread takes about 3 seconds, 5 threads take 7 seconds and 20 threads take 60+ seconds. Why is this happening? Each thread is fairly lightweight and the data to scrape is the same for each thread.

CPython threads I/O-bound tasks well, but CPU-bound tasks poorly.
And if you add one CPU-bound thread to an otherwise I/O-bound set of threads, they all start having problems.
I don't know the AWS Lambda specifics, but this could be what you're seeing.
Note that Python, the language, threads fine. It's implementations like CPython and PyPy that do not thread well. Jython and IronPython thread well.
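A quick way to see this effect locally (a minimal sketch, not the asker's scraping code; cpu_bound is just a stand-in for CPU-heavy work such as parsing pages): under CPython the threaded run is usually no faster than the serial one, and often a bit slower.

import threading
import time

def cpu_bound(n=2000000):
    # stand-in for CPU-heavy work; all of this holds the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.time()
for _ in range(4):
    cpu_bound()
print("serial: %.2fs" % (time.time() - start))

start = time.time()
threads = [threading.Thread(target=cpu_bound) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("threaded: %.2fs" % (time.time() - start))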

Related

Using Multithreaded queue in python the correct way?

I am trying to use the Queue in Python in a multithreaded setup. I just want to know whether the approach I am using is correct or not, whether I am doing something redundant, and whether there is a better approach that I should use.
I am trying to get new requests from a table and schedule them, using some logic, to perform some operation like running a query.
So here, from the main thread, I spawn a separate thread for the queue.
if __name__ == '__main__':
    request_queue = SetQueue(maxsize=-1)
    worker = Thread(target=request_queue.process_queue)
    worker.setDaemon(True)
    worker.start()

    while True:
        try:
            # Connect to the database, get all the new requests to be verified
            db = Database(username_testschema, password_testschema, mother_host_testschema,
                          mother_port_testschema, mother_sid_testschema, 0)
            # Get new requests for verification
            verify_these = db.query("SELECT JOB_ID FROM %s.table WHERE JOB_STATUS='%s' ORDER BY JOB_ID" %
                                    (username_testschema, 'INITIATED'))
            # If there are some requests to be verified, put them in the queue.
            if len(verify_these) > 0:
                for row in verify_these:
                    print "verifying : %s" % row[0]
                    verify_id = row[0]
                    request_queue.put(verify_id)
        except Exception as e:
            logger.exception(e)
        finally:
            time.sleep(10)
Now in the SetQueue class I have a process_queue function which is used to process the top two requests added to the queue on every run.
'''
Overriding the Queue class to use a set as all_items instead of a list,
to ensure unique items are added and processed all the time.
'''
class SetQueue(Queue.Queue):
    def _init(self, maxsize):
        Queue.Queue._init(self, maxsize)
        self.all_items = set()

    def _put(self, item):
        if item not in self.all_items:
            Queue.Queue._put(self, item)
            self.all_items.add(item)

    '''
    The multithreaded queue for the verification process. Takes the top two items,
    verifies them in separate threads and sleeps for 10 seconds.
    This way at most two requests per run will be processed.
    '''
    def process_queue(self):
        while True:
            scheduler_obj = Scheduler()
            try:
                if self.qsize() > 0:
                    for i in range(2):
                        job_id = self.get()
                        t = Thread(target=scheduler_obj.verify_func, args=(job_id,))
                        t.start()
                    for i in range(2):
                        t.join(timeout=1)
                        self.task_done()
            except Exception as e:
                logger.exception(
                    "QUEUE EXCEPTION : Exception occurred while processing requests in the VERIFICATION QUEUE")
            finally:
                time.sleep(10)
I want to see if my understanding is correct and if there can be any issues with it.
So the main thread, running in the while True loop in the main function, connects to the database, gets new requests and puts them in the queue. The worker (daemon) thread for the queue keeps getting new requests from the queue and forks non-daemon threads which do the processing; since the timeout for the join is 1 second, the worker thread will keep taking new requests without getting blocked, and its child threads will keep processing in the background. Correct?
So if the main process exits, these child threads won't be killed until they finish their work, but the worker daemon thread would exit.
Doubt: if the parent is a daemon thread and the child is non-daemon, and the parent exits, does the child exit?
I also read this: David Beazley on multiprocessing.
In the "Using a Pool as a Thread Coprocessor" section, David Beazley is trying to solve a similar problem. So should I follow his steps:
1. Create a pool of processes.
2. Open a thread like I am doing for request_queue.
3. In that thread:
def process_verification_queue(self):
    while True:
        try:
            if self.qsize() > 0:
                job_id = self.get()
                pool.apply_async(Scheduler.verify_func, args=(job_id,))
        except Exception as e:
            logger.exception("QUEUE EXCEPTION : Exception occurred while processing requests in the VERIFICATION QUEUE")
Then use a process from the pool to run verify_func in parallel. Will this give me better performance?
While it's possible to create a new independent thread for the queue and process that data separately the way you are doing it, I believe it is more common for each independent worker thread to post messages to a queue that they already "know" about. That queue is then processed from some other thread by pulling messages out of it.
Design Idea
The way I envision your application is three threads: the main thread and two worker threads. One worker thread would get requests from the database and put them in the queue; the other worker thread would process the data from the queue.
The main thread would just wait for the other threads to finish by using the thread function .join().
You would protect the queue that the threads have access to and make it thread-safe by using a mutex. I have seen this pattern in many other designs in other languages as well.
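A minimal sketch of that three-thread layout (fetch_new_jobs and verify are hypothetical placeholders for the database query and the verification work, and the module names are Python 3's). Note that Python's queue.Queue already does its own locking internally, so it can serve as the protected, thread-safe queue:

import queue
import threading
import time

job_queue = queue.Queue()

def fetch_new_jobs():
    # hypothetical placeholder for the SELECT ... WHERE JOB_STATUS='INITIATED' query
    return []

def verify(job_id):
    # hypothetical placeholder for Scheduler.verify_func
    print("verifying %s" % job_id)

def db_poller():
    # worker thread 1: poll the database and feed the queue
    while True:
        for job_id in fetch_new_jobs():
            job_queue.put(job_id)
        time.sleep(10)

def verifier():
    # worker thread 2: drain the queue and do the verification work
    while True:
        job_id = job_queue.get()
        try:
            verify(job_id)
        finally:
            job_queue.task_done()

producer = threading.Thread(target=db_poller)
consumer = threading.Thread(target=verifier)
producer.start()
consumer.start()

# main thread just waits for the workers (they run forever in this sketch)
producer.join()
consumer.join()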
Suggested Reading
"Effective Python" by Brett Slatkin has a great example of this very question.
Instead of inheriting from Queue, he just creates a wrapper around it in his class called MyQueue and adds get() and put(message) functions.
He even provides the source code at his GitHub repo:
https://github.com/bslatkin/effectivepython/blob/master/example_code/item_39.py
I'm not affiliated with the book or its author, but I highly recommend it as I learned quite a few things from it :)
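Roughly, the wrapper idea looks like this (a sketch of the approach, not the book's actual code; shown with Python 3 module names):

import queue

class MyQueue(object):
    """Thin wrapper: expose only put/get instead of subclassing Queue."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, message):
        self._queue.put(message)

    def get(self):
        return self._queue.get()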
I like this explanation of the advantages & differences between using threads and processes -
".....But there's a silver lining: processes can make progress on multiple threads of execution simultaneously. Since a parent process doesn't share the GIL with its child processes, all processes can execute simultaneously (subject to the constraints of the hardware and OS)...."
He has some great explanations of how to get around the GIL and how to improve performance.
Read more here:
http://jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/

Python threading with queue: how to avoid to use join?

I have a scenario with 2 threads:
a thread waiting for messages from a socket (embedded in a C library - blocking call is "Barra.ricevi") then putting an element on a queue
a thread waiting to get element from the queue and do something
Sample code
import Barra
import Queue
import threading

posQu = Queue.Queue(maxsize=0)

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print (canMsg)
        else:
            print ("Enqueued message"), canMsg
            posQu.put(canMsg)

thCan = threading.Thread(target = threadCAN)
thCan.daemon = True
thCan.start()

while True:
    posMsg = posQu.get()
    print ("Message from the queue"), posMsg
The result is that every time a new message comes from the socket, a new element is added to the queue, BUT the main thread that should get items from the queue never wakes up.
The output is as follows:
Enqueued message
Enqueued message
Enqueued message
Enqueued message
I expected to have:
Enqueued message
Message from the queue
Enqueued message
Message from the queue
The only way to solve this issue seems to be to add the line:
posQu.join()
at the end of the thread waiting for messages from the socket, and the line:
posQu.task_done()
at the end of the main thread.
In this case, after a new message has been received from the socket, the thread blocks waiting for the main thread to process the enqueued item.
Unfortunately this isn't the desired behavior, since I would like a thread that is always ready to get messages from the socket, not one waiting for a job to be completed by another thread.
What am I doing wrong?
Thanks
Andrew
(Italy)
This is likely because your Barra module does not release the global interpreter lock (GIL) during the Barra.ricevi call. You may want to check this, though.
The GIL ensures that only one thread can run at any one time (limiting the usefulness of threads on a multi-processor system). The GIL switches threads every 100 "ticks" -- a tick loosely maps to a bytecode instruction. See here for more details.
In your producer thread, not much happens outside of the C-library call. This means the producer thread will get to call Barra.ricevi a great many times before the GIL switches to another thread.
Solutions to this, in order of increasing complexity, are:
Call time.sleep(0) after adding an item to the queue. This yields the thread so that another thread can run (a short sketch of this option follows the list).
Use sys.setcheckinterval() to lower the number of "ticks" executed before switching threads. This comes at the cost of making the program much more computationally expensive.
Use multiprocessing rather than threading. This includes using multiprocessing.Queue instead of Queue.Queue.
Modify Barra so that it releases the GIL when its functions are called.
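A minimal illustration of the first option (this just reuses the shape of the asker's threadCAN; Barra and posQu are the same objects as in the question, and import time is needed):

import time

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print(canMsg)
        else:
            print("Enqueued message", canMsg)
            posQu.put(canMsg)
            time.sleep(0)  # yield the GIL so the main thread can drain the queue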
Example using multiprocessing. Be aware that when using multiprocessing, your processes no longer have an implied shared state. You will need to have a look at multiprocessing to see how to pass information between processes.
import Barra
import multiprocessing

def threadCAN(posQu):
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print(canMsg)
        else:
            print("Enqueued message", canMsg)
            posQu.put(canMsg)

if __name__ == "__main__":
    posQu = multiprocessing.Queue(maxsize=0)
    procCan = multiprocessing.Process(target=threadCAN, args=(posQu,))
    procCan.daemon = True
    procCan.start()

    while True:
        posMsg = posQu.get()
        print("Message from the queue", posMsg)

How to reduce thread switching latency in Python

I have a Python 2.7 app that has 3 producer threads and 1 consumer thread connected to a Queue.Queue. I'm using get and put, and the producer threads spend most of their time blocked in I/O (reading from serial ports) - basically doing nothing but calling serial.read().
However, I seem to have what I would call high latency between the time a producer thread puts to the queue and the time the consumer thread gets from the queue, around 25 ms (I'm running on a single-processor BeagleBone Black (1 GHz) on Angstrom Linux).
I would think that if all the threads are blocked, then the elapsed time between put and get should be really small, a few microseconds or so, not tens of milliseconds, except when the consumer thread is actually busy (which is not the case here).
I've read some things online suggesting that Python is guilty of busy spinning and that the GIL is to blame. I guess I would rather not know the reason and just get something that is more responsive. I'm fine with the actual latency of serial transmission (about 1-2 ms).
The code looks basically like
q = Queue.Queue()

def a1():
    while True:
        p = read_serial_packet("/dev/ttyO1")
        p.timestamp = time.time()
        q.put(p)

def a2():
    while True:
        p = read_serial_packet("/dev/ttyO2")
        p.timestamp = time.time()
        q.put(p)

def a3():
    while True:
        p = read_serial_packet("/dev/ttyO3")
        p.timestamp = time.time()
        q.put(p)

def main():
    while True:
        p = q.get()
        d = time.time() - p.timestamp
        print str(d)

and there are 4 threads running a1, a2, a3 and main.
Here are some sample times
0.0119640827179
0.0178141593933
0.0154139995575
0.0192430019379
0.0185649394989
0.0225830078125
0.018187046051
0.0234098434448
0.0208261013031
0.0254039764404
0.0257620811462
Is this something that is "fixed" in Python 3?
As @fileoffset hinted, the answer seems to be switching from threading (which suffers from the fact that the Python GIL prevents threads from actually running in parallel) to multiprocessing, which uses several Python processes instead of threads.
The conversion from threading to multiprocessing looks like this:
useMP = True          # or False if you want threading

if useMP:
    import multiprocessing
    import multiprocessing.queues
    import Queue      # to import the Queue.Empty exception, but don't use Queue.Queue
else:
    import threading
    import Queue

...

if useMP:
    self.event_queue = multiprocessing.queues.Queue()
    t1 = multiprocessing.Process(target=self.upstream_thread)
    t2 = multiprocessing.Process(target=self.downstream_thread)
    t3 = multiprocessing.Process(target=self.scanner_thread)
else:
    self.event_queue = Queue.Queue()
    t1 = threading.Thread(target=self.upstream_thread)
    t2 = threading.Thread(target=self.downstream_thread)
    t3 = threading.Thread(target=self.scanner_thread)
The rest of the API looks the same.
There is one other important issue, though, that was not easy to migrate and is left as an exercise: catching Unix signals, such as SIGINT (Ctrl-C handlers). Previously, the master thread caught the signal and all the other threads ignored it. Now, the signal is sent to all processes. So you have to be careful about catching KeyboardInterrupt and installing signal handlers. I don't think I did it the right way, so I am not going to elaborate... :)
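One common pattern for this (a sketch under my own assumptions, not necessarily how it was solved in the original code) is to have the child processes ignore SIGINT and let only the parent catch KeyboardInterrupt:

import multiprocessing
import signal
import time

def worker():
    # children ignore Ctrl-C; the parent decides when to stop them
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while True:
        time.sleep(1)

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker) for _ in range(3)]
    for p in procs:
        p.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        # only the parent sees Ctrl-C; shut the children down explicitly
        for p in procs:
            p.terminate()
        for p in procs:
            p.join()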
You might try playing around with the value of the "check interval":
sys.setcheckinterval(50)
A brief explanation of the general concept can be found in these slides, starting around page 10.

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another one working on the next task.
I have tried to use the multiprocessing module, which offers a method to run a pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.
Is it possible to have an individual timeout after which only a single process that takes too long is killed, and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?
Here's a simple example to illustrate my problem:
def Check(n):
    import time
    if n % 2 == 0:  # select some (arbitrary) subset of processes
        print "%d timeout" % n
        while 1:
            # loop forever to simulate some process getting stuck
            pass
    print "%d done" % n
    return 0

from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)
After the timeout all workers are killed and the program exits. I would like it instead to continue with the next subtask. Do I have to implement this behavior myself, or are there existing solutions?
Update
It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:
jobs = pool.map_async(Check, range(10))
while 1:
    try:
        print "Waiting for result"
        result = jobs.get(timeout=1)
        break  # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        for c in multiprocessing.active_children():
            c.terminate()
print result
The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.
The pebble Pool module has been built to solve these types of issues. It supports timeouts on given tasks, allowing them to be detected and easily recovered from.
from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool() as pool:
    future = pool.schedule(function, args=[1, 2], timeout=5)
    try:
        result = future.result()
    except TimeoutError as error:
        print "Function took longer than %d seconds" % error.args[1]
For your specific example:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

results = []

with ProcessPool(max_workers=4) as pool:
    future = pool.map(Check, range(10), timeout=5)
    iterator = future.result()

    # iterate over all results; if a computation timed out,
    # print it and continue to the next result
    while True:
        try:
            result = next(iterator)
            results.append(result)
        except StopIteration:
            break
        except TimeoutError as error:
            print "function took longer than %d seconds" % error.args[1]

print results
Currently Python does not provide native means to control the execution time of each distinct task in the pool outside the worker itself.
So the easy way is to use wait_procs from the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement your own pool on top of the subprocess module, with the working cycle in the main process, poll()-ing the execution of each worker and performing the required actions.
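A sketch of the wait_procs idea (the sleeping command lines are only placeholders for the real per-task work, and the 5-second timeout is arbitrary):

import subprocess
import psutil

# each task runs as its own subprocess
cmds = [["python", "-c", "import time; time.sleep(30)"] for _ in range(4)]
popens = [subprocess.Popen(cmd) for cmd in cmds]
procs = [psutil.Process(p.pid) for p in popens]

# wait up to 5 seconds, then kill whatever is still running
gone, alive = psutil.wait_procs(procs, timeout=5)
for p in alive:
    p.kill()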
As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (it is a bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes non-joinable.
You have to terminate the whole pool and then recreate it again for the remaining tasks:
import multiprocessing
from multiprocessing import Pool

while True:
    pool = Pool(processes=4)
    jobs = pool.map_async(Check, range(10))
    print "Waiting for result"
    try:
        result = jobs.get(timeout=1)
        break  # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        pool.terminate()
        pool.join()

print result
UPDATE
Pebble is an excellent and handy library which solves the issue. Pebble is designed for the asynchronous execution of Python functions, whereas PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.
One more aspect is when third-party dependencies are not desirable: then PyExPool can be a good choice. It is a single-file, lightweight implementation of a multi-process execution pool with per-job and global timeouts, the ability to group jobs into tasks, and other features.
PyExPool can be embedded into your sources and customized. It has a permissive Apache 2.0 license and production quality, and is used in the core of one high-load scientific benchmarking framework.
Try a construction where each process is joined with a timeout on a separate thread. That way the main program never gets stuck, and processes which do get stuck are killed due to the timeout. This technique is a combination of the threading and multiprocessing modules.
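A compact sketch of that construction (the work function and the 3-second timeout here are only placeholders):

import multiprocessing
import threading
import time

def work(n):
    time.sleep(n)  # stand-in for a task that may hang

def supervise(proc, timeout):
    proc.join(timeout)
    if proc.is_alive():      # still running after the timeout: kill it
        proc.terminate()
        proc.join()

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=work, args=(n,)) for n in (1, 10)]
    watchers = [threading.Thread(target=supervise, args=(p, 3)) for p in procs]
    for p in procs:
        p.start()
    for w in watchers:
        w.start()
    for w in watchers:
        w.join()  # the main program waits at most roughly the timeout per stuck process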
Here is my way to maintain a minimum number of threads in memory. It is a combination of the threading and multiprocessing modules. It may be unusual compared to other techniques, like the ones respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.
So here it is:
# importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata + "\n")
    pass
Next is the threadController function. This function controls the flow of threads to main memory. It keeps activating threads to maintain the threadNum "minimum" limit, i.e. 5. It also won't exit until all active threads (activeCount) have finished.
It maintains a minimum of threadNum (5) startProcess function threads (these threads will eventually start the processes from processList while joining them with a timeout of 60 seconds). After starting threadController, there are 2 threads which are not included in the above limit of 5, i.e. the main thread and the threadController thread itself. That is why threading.activeCount() != 2 has been used.
def threadController():
    print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # starting the first thread. This will make activeCount = 3
    Thread(target = startProcess).start()

    # loop while the process list is not empty OR active threads have not finished up.
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and  # if the count of active threads is less than the minimum AND
                len(processList) != 0):                    # processList is not empty
            Thread(target = startProcess).start()          # this starts the startProcess function as a separate thread **
The startProcess function, as a separate thread, starts processes from the process list. The purpose of this function (** started as a different thread) is that it becomes a parent thread for the processes. So when it joins them with a timeout of 60 seconds, this stops the startProcess thread from moving ahead, but it does not stop threadController from performing. This way, threadController works as required.
def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00)  # joining the process with a timeout of 60 seconds as a float.

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt", "r").read().split("\n")
    output = open("test.txt", "a")

    processList = []  # process list
    threadNum = 5     # number of thread-initiated processes to be run at one time

    # making the process list
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p)  # making a list of worker processes.

    # starting threadController as a separate thread.
    mt = Thread(target = threadController)
    mt.start()
    mt.join()  # won't move on until the threadController thread finishes.

    output.close()
    print "Done"
Besides maintaining a minimum number of threads in memory, my aim was also to have something that could avoid stuck threads or processes in memory. I did this using the timeout. My apologies for any typing mistakes.
I hope this construction helps anyone in this world.
Regards,
Vikas Gautam

Python producer/consumer blocking indefinitely

I can't seem to figure out why my queue-based producer/consumer process is blocking and executing indefinitely:
def producer(q, urls):
    for url in urls:
        thread = ThreadChild(Profile.collection, Profile.collection, url, True)
        thread.follow_links = follow
        thread.start()
        q.put(thread, True)
    log.info('Done making threads')

def consumer(q, total_urls):
    thread = q.get(True)
    thread.join(timeout=40.0)
    q.task_done()

q = Queue(2)
prod_thread = threading.Thread(target=producer, args=(q, urls))
cons_thread = threading.Thread(target=consumer, args=(q, len(urls)))
prod_thread.start()
cons_thread.start()
prod_thread.join(timeout=60.0)
cons_thread.join(timeout=60.0)
I've tried putting timeouts on both the producer and consumer threads, as well as on the child process the producer spawns, and the process still runs on and on indefinitely.
ThreadChild is a process that does some simple network jobs until it runs out of urls to process. The threads should not take long to execute. The var urls is just a list of urls for the thread to process. It's worth noting that the log line 'Done making threads' never prints (log is a standard Python logger bound to a StreamHandler).
Shouldn't the timeouts defined for the producer and consumer threads terminate everything after 60 seconds, regardless of what's left in the queue? Have I misunderstood the use of these methods and structured the way things are added to the queue wrong?
