Using threads in a Flask web app - Python

I am developing a web app that has a service/task which might take a long time to finish. I am new to Python and have read that Python has a GIL, which means only one thread can run at a time, irrespective of the number of cores.
My pseudocode looks like this:
def service_xxx(self, data):
    thread = ThreadXXX(data)
    thread.start()
    self.threads[data.id] = thread
My question is: what happens when 100 requests come in? Will Flask run 100 user threads concurrently across all cores, or will it run 100 threads on a single core?

CPython is not optimized for heavy use of threads. You can keep allocating more resources and it will keep spawning and queuing new threads, overloading the cores, but the GIL still lets only one thread execute Python bytecode at a time. You need to make a design change here:
Process-based design:
Either use the multiprocessing module
Use a task queue such as RabbitMQ and run the task in a separate worker
Spawn a subprocess
Or, if you still want to stick to threads:
Switch to PyPy (faster than CPython, though it still has a GIL)
Switch to PyPy-STM (an experimental build that does away with the GIL entirely)
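To make the process-based option concrete, here is a minimal sketch of the pseudocode above rewritten with multiprocessing; run_task is a hypothetical stand-in for whatever long-running work ThreadXXX does.

# Hedged sketch: process-based variant of the pseudocode above.
# `run_task` is a placeholder for the real long-running job.
import multiprocessing

def run_task(data):
    # stand-in for the actual work; note that `data` must be picklable
    return sum(i * i for i in range(10_000_000))

class Service:
    def __init__(self):
        self.procs = {}

    def service_xxx(self, data):
        proc = multiprocessing.Process(target=run_task, args=(data,))
        proc.start()                  # own interpreter, own GIL
        self.procs[data.id] = proc    # keep a handle so you can join() later

Each started process has its own interpreter, so the GIL no longer serializes the work.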

Related

Python: Do threads within a multiprocessing function run on the same core as that process, or on the parent's?

import multiprocessing
import threading
import time

def multiprocess_function():
    while True:
        # This will initiate 100 threads
        for i in range(100):
            threading.Thread(target=sum, args=((i, 0),)).start()
        time.sleep(10)

p1 = multiprocessing.Process(target=multiprocess_function)
p1.start()
In the above code snippet, I am starting a new process (say on a separate core, #2) that runs an infinite loop. Within this function, I launch 100 threads. Will the threads run on the same core #2, or will they run on the core the main Python process is on?
Also, how many threads can you run on one core?
All threads run within the process they started from. In this case, the process p1.
With regard to how many threads you can run in a process, you have to keep in mind that in CPython, only one thread at a time can be executing Python bytecode. This is enforced by the Global Interpreter Lock ("GIL"). So for jobs that require a lot of calculations it is generally better to use processes and not threads.
If you look at the documentation for concurrent.futures.ThreadPoolExecutor, by default the number of worker threads it uses is five times the number of processors (the exact default formula has changed across Python versions). That seems a reasonable amount for the kinds of workloads ThreadPoolExecutor is meant for.
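For reference, a small hedged sketch of ThreadPoolExecutor in use; the sleep stands in for the blocking I/O these pools are designed around, and with no max_workers argument the executor picks its own default.

import concurrent.futures
import time

def fetch(i):
    time.sleep(0.5)   # stands in for a blocking I/O call
    return i

# With no max_workers argument, the executor chooses a default based
# on the CPU count (the exact formula varies across Python versions).
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(fetch, range(20)))
print(results)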

Multiprocessing inside a child thread

I was learning about multi-processing and multi-threading.
From what I understand, threads run on the same core, so I was wondering if I create multiple processes inside a child thread will they be limited to that single core too?
I'm using Python, so this is a question about that specific language, but I would like to know whether it is the same in other languages.
I'm not a Python expert, but I expect this works like it does in other languages, because it is an OS feature in general.
Process
A process is executed by the OS and owns at least one thread, which will be executed. This is, in general, your program. You can start more threads inside your process to do heavy calculations or whatever else you need.
But they belong to the process.
Thread
One or more threads are owned by a process, and their execution will be distributed across all cores.
Now to your question
When you create a given number of threads, these threads should in general be distributed across all your cores. They are not limited to the core that is executing the Python interpreter.
Even when you create a subprocess from your Python code, the process can and should run on other cores.
You can read more about the general concept here:
Preemptive multitasking
There are libraries in various languages that abstract a thread into something often called a Task or similar.
For these special cases it is possible that they just run inside the thread they were created in.
For example, in the .NET world there is a Thread and a Task. People often misuse the term thread when they are talking about a Task, which in general runs inside the thread it was created in.
Every program is represented by one process. A process is the execution context that one or multiple threads operate in. All threads in one process share the same virtual memory assigned to that process.
Python (referring to CPython; implementations like Jython and IronPython have no GIL) is special because it has the global interpreter lock (GIL), which prevents threaded Python code from running on multiple cores in parallel. Only code that releases the GIL can operate truly in parallel (I/O operations and some C extensions like numpy). That's why you have to use the multiprocessing module for CPU-bound Python code you need to run in parallel. Processes started with the multiprocessing module each run their own Python interpreter instance, so code can run truly in parallel.
Note that even a single-threaded Python application can run on different cores, not in parallel but sequentially, if the OS reschedules execution onto another core after a context switch.
Back to your question:
if I create multiple processes inside a child thread will they be limited to that single core too?
You don't really create processes "inside" a thread; you spawn new, independent Python processes with the same limitations as the original one, and which cores the new processes' threads execute on is up to the OS (...as long as you don't manipulate the core affinity of a process, but let's not go there).
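A small sketch of that point, assuming nothing beyond the standard library: a worker thread spawns a process, and the OS is free to schedule that process on any core.

import multiprocessing
import threading

def cpu_work(n):
    return sum(i * i for i in range(n))   # CPU-bound, runs in its own process

def thread_body():
    # The new process is independent of the thread that started it;
    # the OS schedules it on whatever core is free.
    p = multiprocessing.Process(target=cpu_work, args=(10_000_000,))
    p.start()
    p.join()

if __name__ == '__main__':
    t = threading.Thread(target=thread_body)
    t.start()
    t.join()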

Does Python support multithreading? Can it speed up execution time?

I'm slightly confused about whether multithreading works in Python or not.
I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on Stack Overflow, that multithreading is indeed possible in Python. So why does everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say threads are still useful, because they work simultaneously and thus get the combined workload done faster. I mean, why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some I/O tasks, while for CPU-bound work only one thread can run at a time, regardless of the number of cores.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: any task that tries to get a speed boost from parallel execution using pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations), the C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks, or delegate to a dedicated external library.
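A quick way to see the CPU-bound case for yourself; this is just an illustrative sketch, and the exact numbers depend on the machine, but on CPython the two-thread run takes roughly as long as the sequential one (often slightly longer).

import threading
import time

def count(n):
    while n > 0:   # pure-Python CPU work; holds the GIL while running
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print("sequential:", time.perf_counter() - start)

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print("two threads:", time.perf_counter() - start)   # about the same, or worse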
Yes. :)
You have the low-level thread module and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Threading is allowed in Python; the only problem is that the GIL will make sure just one thread is executed at a time (no parallelism).
So basically, if you multithread code to speed up calculation it won't get any faster, since just one thread executes at a time; but if you use threads to interact with a database, for example, it will.
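A hedged sketch of that I/O case, with time.sleep standing in for a blocking database call: the waits overlap, so ten half-second "queries" finish in roughly half a second instead of five.

import threading
import time

def query(i):
    time.sleep(0.5)   # stands in for a blocking database call
    return i

start = time.perf_counter()
threads = [threading.Thread(target=query, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("elapsed:", time.perf_counter() - start)   # ~0.5 s, not ~5 s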
I feel for the poster, because the answer is invariably "it depends on what you want to do". However, parallel speed-up in Python has always been terrible in my experience, even for multiprocessing.
For example, check out this tutorial (the second-to-top result on Google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2, 4, 8, 16) for the pool map function, and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
import multiprocessing as mp
import time
import numpy as np

arr = np.random.randint(0, 10, size=[2000000, 600])

.... more code ....

tasks = [2, 4, 8, 16]
for task in tasks:
    tic = time.perf_counter()
    pool = mp.Pool(task)
    results = pool.map(howmany_within_range_rowonly, [row for row in data])
    pool.close()
    toc = time.perf_counter()
    time1 = toc - tic
    print(f"parallel {time1} tasks {task}")
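One plausible reason the speed-up above is so modest: each row is a tiny task, so per-task pickling and queueing overhead dominates. multiprocessing.Pool.map accepts a chunksize parameter that batches many small tasks into one round trip; the sketch below (with a hypothetical square function) shows the idea.

import multiprocessing as mp

def square(x):   # must be module-level so it can be pickled
    return x * x

if __name__ == '__main__':
    data = list(range(1_000_000))
    with mp.Pool(8) as pool:
        # chunksize batches many tiny tasks into one IPC round trip,
        # amortizing the pickling/queueing cost per item
        results = pool.map(square, data, chunksize=10_000)
    print(results[:5])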

What are the advantages of multithreaded programming in Python?

When I hear about multithreaded programming, I think of the opportunity to accelerate my program, but that is not the case, is it?
import eventlet
from eventlet.green import socket
from iptools import IpRangeList

class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = self._get_scaned_range()

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

    def scan(self, address):
        try:
            return bool(socket.create_connection(address))
        except Exception:
            return False

    def run(self):
        pool = eventlet.GreenPool(self.workers_num)
        for status in pool.imap(self.scan, self.scaned_range):
            if status:
                yield True

    def run_std(self):
        for status in map(self.scan, self.scaned_range):
            if status:
                yield True

if __name__ == '__main__':
    s = Scanner(('127.0.0.1'), (1, 65000), 100000)
    import time
    now = time.time()
    open_ports = [i for i in s.run()]
    # note: now - time.time() is negative, which is why the timings
    # reported below print with a minus sign
    print 'Eventlet time: %s (sec) open: %s' % (now - time.time(),
                                                len(open_ports))
    del s
    s = Scanner(('127.0.0.1'), (1, 65000), 100000)
    now = time.time()
    open_ports = [i for i in s.run()]
    print 'CPython time: %s (sec) open: %s' % (now - time.time(),
                                               len(open_ports))
and results:
Eventlet time: -4.40343403816 (sec) open: 2
CPython time: -4.48356699944 (sec) open: 2
And my question is: if I run this code not on my laptop but on a server, and set a higher number of workers, will it run faster than the plain CPython version?
What are the advantages of threads?
ADD:
And so I rewrote the app to use plain CPython threads:
import socket
from threading import Thread
from Queue import Queue
from iptools import IpRangeList

class Scanner(object):
    def __init__(self, ip_range, port_range, workers_num):
        self.workers_num = workers_num or 1000
        self.ip_range = self._get_ip_range(ip_range)
        self.port_range = self._get_port_range(port_range)
        self.scaned_range = [i for i in self._get_scaned_range()]

    def _get_ip_range(self, ip_range):
        return [ip for ip in IpRangeList(ip_range)]

    def _get_port_range(self, port_range):
        return [r for r in range(*port_range)]

    def _get_scaned_range(self):
        for ip in self.ip_range:
            for port in self.port_range:
                yield (ip, port)

    def scan(self, q):
        while True:
            try:
                r = bool(socket.create_connection(q.get()))
            except Exception:
                r = False
            q.task_done()

    def run(self):
        queue = Queue()
        for address in self.scaned_range:
            queue.put(address)
        for i in range(self.workers_num):
            worker = Thread(target=self.scan, args=(queue,))
            worker.setDaemon(True)
            worker.start()
        queue.join()

if __name__ == '__main__':
    s = Scanner(('127.0.0.1'), (1, 65000), 5)
    import time
    now = time.time()
    s.run()
    print time.time() - now
and the result is:
CPython threads: 1.4 sec
I think this is a very good result. As a baseline, I take nmap's scanning time:
$ nmap 127.0.0.1 -p1-65000
Starting Nmap 5.21 ( http://nmap.org ) at 2012-10-22 18:43 MSK
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00021s latency).
Not shown: 64986 closed ports
PORT STATE SERVICE
53/tcp open domain
80/tcp open http
443/tcp open https
631/tcp open ipp
3306/tcp open mysql
6379/tcp open unknown
8000/tcp open http-alt
8020/tcp open unknown
8888/tcp open sun-answerbook
9980/tcp open unknown
27017/tcp open unknown
27634/tcp open unknown
28017/tcp open unknown
39900/tcp open unknown
Nmap done: 1 IP address (1 host up) scanned in 0.85 seconds
And my question now is: how are threads implemented in Eventlet? As I understand it, these are not real threads but something special to Eventlet, so why don't they speed up tasks?
Eventlet is used by many major projects, like OpenStack.
But why? Just to do heavy queries to a DB in an asynchronous manner, or something else?
CPython threads:
Each CPython thread maps to an OS-level thread (a lightweight process/pthread in user space).
If there are many CPython threads executing Python code concurrently, then due to the global interpreter lock only one CPython thread can interpret Python at a time. The remaining threads will be blocked on the GIL whenever they need to interpret Python instructions. When there are many Python threads, this slows things down a lot.
Now, if your Python code spends most of its time in networking operations (send, connect, etc.), there will be fewer threads fighting for the GIL to interpret code, so the effect of the GIL is not so bad.
Eventlet/Green threads:
From the above we know that CPython has a performance limitation with threads. Eventlet tries to solve the problem by using a single thread running on a single core and non-blocking I/O for everything.
Green threads are not real OS-level threads. They are a user-space abstraction for concurrency. Most importantly, N green threads map to 1 OS thread. This avoids the GIL problem.
Green threads cooperatively yield to each other instead of being preemptively scheduled.
For networking operations, the socket libraries are patched at run time (monkey patching) so that all calls are non-blocking.
So even when you create a pool of eventlet green threads, you are actually creating only one OS-level thread. This single OS-level thread will execute all the eventlets. The idea is that if all the networking calls are non-blocking, this should be faster than Python threads, in some cases.
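For reference, a condensed sketch of the same idea using real eventlet APIs (GreenPool and the green socket module); everything below runs on a single OS thread, and the port range is just an illustrative choice.

import eventlet
from eventlet.green import socket   # cooperative, non-blocking socket

def check(port):
    try:
        conn = socket.create_connection(('127.0.0.1', port), timeout=1)
        conn.close()
        return port
    except Exception:
        return None

pool = eventlet.GreenPool(500)   # 500 green threads, still one OS thread
open_ports = [p for p in pool.imap(check, range(1, 1025)) if p]
print(open_ports)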
Summary
For your program above, "true" concurrency happens to be faster (the CPython version, with 5 threads running on multiple processors) than the eventlet model (a single thread running on one processor).
There are some CPython workloads that will perform badly across many threads/cores (e.g. if you have 100 clients connecting to a server, with one thread per client). Eventlet is an elegant programming model for such workloads, so it's used in several places.
The title of your question is "What are the advantages of multithreaded programming in Python?", so I am giving you an example rather than trying to solve your problem. I have a Python program, running on a Pentium Core Duo I bought in 2005 under Windows XP, that downloads 500 CSV files from finance.yahoo.com, each about 2K bytes, one for each stock in the S&P 500. It uses urllib2. If I do not use threads it takes over 2 minutes; using standard Python threads (40 of them) it takes 3 to 4 seconds, averaging around 1/4 second each (this is wall-clock time and includes compute and I/O). When I look at the start and stop times of each thread (wall clock), there is tremendous overlap. I have the same thing running as a Java program, and the performance is almost identical between Python and Java. Also the same in C++ using curllib, though curllib is just a tad slower than Java or Python. I am using standard Python version 2.2.6.
Python has a Global Interpreter Lock http://en.wikipedia.org/wiki/Global_Interpreter_Lock which prevents two threads from ever executing Python code at the same time.
If you're using something like Cython, the C portions can execute concurrently, which is why you see the speed-up.
In pure python programs there's no performance benefit (in terms of amount of computation you can get done), but it's sometimes the easiest way to write code which does a lot of IO (e.g. leave a thread waiting for a socket read to finish while you do something else).
The main advantages of multithreaded programming, regardless of programming language, are:
If you have a system with multiple CPUs or cores, then all CPUs can execute application code at the same time. For example, on a system with four CPUs, a process could potentially run up to 4 times faster with multithreading (though it is highly unlikely to be that fast in most cases, since typical applications require threads to synchronize their access to shared resources, creating contention).
If the process needs to block for some reason (disk I/O, user input, network I/O) then while a thread or threads are blocked waiting for I/O completion other thread(s) can be doing other work. Note that for this type of concurrency you do not need multiple CPUs or cores, a process running on a single CPU can also benefit greatly from threading.
Whether these benefits apply to your process will largely depend on what your process does. In some cases you will get considerable performance improvements; in other cases you won't, and the threaded version might even be slower. Note that writing good and efficient multithreaded apps is hard.
Now, since you are asking about Python in particular, let's discuss how these benefits apply to Python.
Due to the Global Interpreter Lock that is present in Python, running code in parallel in multiple CPUs is not possible. The GIL ensures that only one thread is interpreting Python code at a time, so there isn't really a way to take full advantage of multiple CPUs.
If a Python thread performs a blocking operation, another thread will get the CPU and continue to run, while the first thread is blocked waiting. When the blocking event completes, the blocked thread will resume. So this is a good reason to implement multithreading in a Python script (though it isn't the only way to achieve this type of concurrency, non-blocking I/O can achieve similar results).
Here are some examples that benefit from using multiple threads:
a GUI program that is doing a lengthy operation can have a thread that continues to keep the application window refreshed and responsive, maybe even showing a progress report on the long operation and a cancel button.
a process that needs to repeatedly read records from disk, do some processing on them, and write them back to disk can benefit from threading: while one thread is blocked waiting to read a record from disk, another thread can be processing a record that was already read, and yet another can be writing a record back to disk (see the sketch below). Without threads, while the process is reading or writing to disk nothing else can happen. In a language without a GIL (say C++) the benefit is even greater, as you can also have multiple threads, each running on a different core, all processing different records.
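A hedged sketch of that read/process/write example using queue.Queue to hand records between three threads; the sleeps stand in for disk I/O, and None is used as an end-of-stream sentinel.

import queue
import threading
import time

def reader(out_q):
    for rec in range(10):
        time.sleep(0.1)    # stands in for a disk read
        out_q.put(rec)
    out_q.put(None)        # sentinel: no more records

def processor(in_q, out_q):
    while True:
        rec = in_q.get()
        if rec is None:
            out_q.put(None)
            break
        out_q.put(rec * 2)   # the "processing" step

def writer(in_q):
    while True:
        rec = in_q.get()
        if rec is None:
            break
        time.sleep(0.1)    # stands in for a disk write
        print("wrote", rec)

q1, q2 = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=reader, args=(q1,)),
    threading.Thread(target=processor, args=(q1, q2)),
    threading.Thread(target=writer, args=(q2,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()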
I hope this helps!
The threading and multiprocessing modules are how you take advantage of the multiple cores that are prevalent in modern CPUs.
This comes at a price: added complexity in your program to regulate access to shared data (especially writing). If one thread were iterating over a list while another thread was updating it, the result would be undetermined. This also applies to the internal data of the Python interpreter.
Therefore standard CPython has an important limitation with regard to using threads: only one thread at a time can be executing Python bytecode.
If you want to parallelize a job that doesn't require a lot of communication between instances, multiprocessing (and especially multiprocessing.Pool) is often a better choice than threads, because those jobs run in different processes that do not influence each other.
Adding threads won't necessarily make a process faster, as there is overhead associated with managing the threads, which may outweigh any performance gain.
If you run this on a machine with few CPUs, as opposed to one with many, you may well find that it runs slower as it swaps each thread in and out of execution. There may be other factors at play as well. If the threads need access to some other subsystem or hardware that can't handle concurrent requests (a serial port, for example), then multithreading won't help you improve performance.
