I've just started with Python gevent and I was wondering about the CPU / multicore usage of the library.
Trying some examples that make many requests via the monkey-patched urllib, I noticed that they were running on just one core, at 99% load.
How can I use all cores with gevent in Python?
Is there a best practice? Are there any side effects of combining multiple processes with gevent?
Gevent gives you the ability to deal with blocking requests. It does not give you the ability to run across multiple cores.
There's only one greenlet (gevent's coroutine) running in a Python process at any time. The real benefit of gevent is that it is very powerful when it deals with I/O bottlenecks, which is usually the case for general web apps, apps serving API endpoints, web-based chat apps, backends and networked apps in general. When we do CPU-heavy computations, there will be no performance gain from using gevent. When an app is I/O bound, gevent is pure magic.
There is one simple rule: greenlets get switched away whenever an I/O operation would block, or when you do the switch explicitly (e.g. with gevent.sleep()).
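A minimal sketch of that rule (names and step counts are arbitrary): the two greenlets below interleave only because each iteration yields explicitly.

```python
import gevent

def worker(name):
    for i in range(3):
        print(f"{name}: step {i}")
        # gevent.sleep(0) yields control back to the hub, letting the
        # other greenlet run; without it, this loop would never switch.
        gevent.sleep(0)

# Spawn two greenlets and wait for both; their output interleaves.
gevent.joinall([gevent.spawn(worker, "a"), gevent.spawn(worker, "b")])
```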
The built-in Python threads actually behave in the same (pseudo) "concurrent" way as gevent's greenlets.
The key difference is this: greenlets use cooperative multitasking, whereas threads use preemptive multitasking. What this means is that a greenlet will never stop executing and "yield" to another greenlet unless it uses certain "yielding" functions (like gevent.socket.socket.recv or gevent.sleep).
Threads, on the other hand, will yield to other threads (sometimes unpredictably) based on when the operating system decides to swap them out.
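For contrast, a roughly equivalent sketch with plain threads: there is no explicit yield anywhere, because the OS decides when to preempt each thread, so the interleaving can vary from run to run.

```python
import threading

def worker(name):
    for i in range(3):
        # No yield call here: the OS may preempt this thread at any
        # point, which is exactly what preemptive multitasking means.
        print(f"{name}: step {i}")

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```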
And finally, to utilize multiple cores in Python, if that's what you want, we have to depend on the multiprocessing module (which is built into Python). This gets around the GIL. Other alternatives include using Jython, or executing tasks in parallel on different CPUs via a task queue built on something like ZeroMQ.
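A minimal sketch of the multiprocessing route (the worker count and workload are arbitrary): each worker is a separate OS process with its own interpreter and its own GIL, so CPU-bound work really does run on multiple cores.

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Deliberately CPU-bound: gevent would not speed this up, but
    # separate processes can run it on separate cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(cpu_heavy, [10**6] * 4))
```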
I wrote a very long explanation here, if you care to dive into the details: http://learn-gevent-socketio.readthedocs.org/en/latest/ :-D
Related
I'm currently working on a Python project that receives a lot of AWS SQS messages (more than 1 million each day), processes them, and sends them to another SQS queue with additional data. Everything works fine, but now we need to speed this process up a lot!
From what we have seen, our biggest bottleneck is the HTTP requests used to send and receive messages from the AWS SQS API. So basically, our code is mostly I/O bound due to these HTTP requests.
We are trying to scale this process up using one of the following methods:
Using Python's multiprocessing: this seems like a good idea, but our workers run on small machines, usually with a single core. So creating different processes may still give some benefit, since the CPU will probably switch between processes as one or another is stuck in an I/O operation. But still, that seems like a lot of process-management overhead and resource cost for operations that don't need to run in parallel, only concurrently.
Using Python's threading: since the GIL keeps all threads on a single core, and threads have less overhead than processes, this seems like a good option. As one thread is stuck waiting for an HTTP response, the CPU can pick up another thread, and so on; this would get us our desired concurrent execution. But my question is: how does Python's threading know that it can switch one thread for another? Does it know that a thread is currently in an I/O operation and can be swapped for another one? Will this approach really maximize CPU usage and avoid busy waiting? Do I specifically have to give up control of the CPU inside a thread, or is this done automatically in Python?
Recently, I also read about a concept called green threads, via Eventlet in Python. From what I saw, they seem like a perfect match for my project. They have little overhead and don't create OS threads the way threading does. But will we have the same problems as with threading regarding CPU control? Does a green thread need to signal that another one may take over the CPU? I saw in some examples that Eventlet offers green versions of built-in libraries like urlopen, but not of requests.
The last option we considered was using Python's asyncio and async libraries such as aiohttp. I have done some basic experimenting with asyncio and wasn't very pleased, though I understand that most of that comes from the fact that Python is not a natively asynchronous language. From what I saw, it would behave much like Eventlet.
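For reference, a minimal sketch of the asyncio/aiohttp style I mean (the URLs are placeholders, not our real SQS endpoints):

```python
import asyncio
import aiohttp

async def fetch(session, url):
    # Each await is a switch point: while this request waits on the
    # network, the event loop runs the other pending requests.
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    urls = ["https://example.com"] * 10  # placeholder targets
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

results = asyncio.run(main())
```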
So what do you think would be the best option here? Which library would allow me to maximize performance on a single-core machine, avoiding busy waits as much as possible?
I'm trying to decide if I should use gevent or threading to implement concurrency for web scraping in Python.
My program should be able to support a large (~1000) number of concurrent workers. Most of the time, the workers will be waiting for requests to come back.
Some guiding questions:
What exactly is the difference between a thread and a greenlet? What is the max number of threads/greenlets I should create in a single process (with regard to the spec of the server)?
A Python thread is an OS thread, controlled by the OS, which makes it a lot heavier, since every switch needs an OS context switch. Green threads are lightweight: they live in userspace, and the OS does not create or manage them.
I think you can use gevent. Gevent = event loop (libev) + coroutines (greenlet) + monkey patching. Gevent gives you thread-like workers without using real threads: you write normal-looking code but get async I/O.
Make sure you don't have CPU-bound stuff in your code.
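A minimal sketch of that pattern, sized for the ~1000 workers mentioned in the question (the URL is a placeholder):

```python
from gevent import monkey
monkey.patch_all()  # patch the stdlib first so sockets cooperate

import urllib.request

import gevent
from gevent.pool import Pool

def fetch(url):
    # The monkey-patched socket yields to other greenlets while this
    # request waits on the network, so workers overlap their I/O.
    return urllib.request.urlopen(url).read()

pool = Pool(1000)  # cap the number of concurrent greenlets
jobs = [pool.spawn(fetch, "https://example.com") for _ in range(100)]
gevent.joinall(jobs)
```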
I don't think you have thought this whole thing through. I have done some considerable lightweight-thread apps with greenlets created from the gevent framework. As long as you allow control to switch between greenlets with appropriate sleeps or switches, everything tends to work fine. Rather than blocking while waiting for a reply, it is recommended that the wait or blocking call have a timeout that raises; catch the exception, sleep in the except branch of your code, and then loop again. Otherwise you will not switch greenlets readily.
Also, take care to join and/or kill all greenlets, since otherwise you could end up with zombies that cause all sorts of effects you do not want.
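A minimal sketch of that timeout-and-cleanup housekeeping (the sleep calls stand in for real blocking work):

```python
import gevent

def worker():
    while True:
        try:
            # gevent.Timeout raises if the block runs too long; in real
            # code the body would be a blocking wait for a reply.
            with gevent.Timeout(5):
                gevent.sleep(10)  # stand-in for a blocking call
        except gevent.Timeout:
            gevent.sleep(0.1)  # yield before looping again

g = gevent.spawn(worker)
gevent.joinall([g], timeout=3)
g.kill()  # don't leave zombie greenlets behind
```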
However, I would not recommend this for your application. Rather, consider one of the following WebSocket extensions that use gevent. See this link:
Websockets in Flask
and this link
https://www.shanelynn.ie/asynchronous-updates-to-a-webpage-with-flask-and-socket-io/
I have implemented a very nice app with Flask-SocketIO
https://flask-socketio.readthedocs.io/en/latest/
It runs through Gunicorn with Nginx very nicely from a Docker container, and Socket.IO interfaces cleanly with JavaScript on the client side.
(Be careful with the web scraping: use something like Scrapy with the appropriate ethical scraping settings enabled.)
I have created an application that uses the multiprocessing library. In it, I spin up two processes (instances of the same class) that consume data from Kafka and put it into a Python Queue.
This is the library I used:
Python multiprocessing
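Roughly, the structure looks like this (with the Kafka consumer replaced by a stub list, since the real consumer isn't relevant to the question):

```python
from multiprocessing import Process, Queue

def consume(queue, source):
    # Stand-in for the Kafka consumer loop described above.
    for msg in source:
        queue.put(msg)

if __name__ == "__main__":
    q = Queue()
    # Two instances of the same consumer, as described.
    workers = [Process(target=consume, args=(q, [f"msg-{i}-{j}" for j in range(3)]))
               for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not q.empty():
        print(q.get())
```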
Q1. Is it concurrency or is it parallelism?
Q2. Is it multithreading or is it multiprocessing?
Q3. How does Python map processes to CPUs? (Does this question even make sense?)
My understanding is that in order to speak about multithreading, I need to use separate/multiple CPUs (so that separate threads are mapped to separate CPU threads).
And that in order to speak about multiprocessing, I need to use a separate memory space for each process? Is that correct?
I assume that if I spin up two processes within one application instance, we are talking about concurrency.
And if I spin up multiple instances of the above application, we would be talking about parallelism (multiple CPUs, separate memory spaces)?
I see that the Python multiprocessing library documentation defines it as follows:
The multiprocessing package offers both local and remote concurrency
...
Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.
...
A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
First, separate threads are not mapped to separate CPUs. That's up to the OS, and in Python, due to the GIL, only one thread in a process executes Python code at a time, so all of a process's threads effectively run on one CPU.
1) It's both concurrency, in that the order of execution is not set, and parallelism, since the multiprocessing package can run on multiple processors, bypassing the GIL limitations.
2) Since the threading package is another story entirely, it's definitely multiprocessing.
3) I may be speaking out of line, but in my opinion Python does NOT map processes to CPUs; it leaves this detail to the OS.
Q1: It is at least concurrency, and it can be parallelism as well (terms intended as defined in the answer to this question). Clearly, if you have only one processor, true parallelism cannot be achieved, because only one process can use the CPU at any given time. In that case, however, the multiprocessing library still allows you to define multiple tasks that run in separate processes; it is the OS's scheduler that decides which process runs when.
Q2: Multiprocessing (...which is kind of implied by the library name). Due to the Global Interpreter Lock present in most Python interpreter implementations, parallelism with threads is impossible. Multiprocessing offers a threading-like interface that makes use of processes under the hood.
Q3: It doesn't. Python spawns the processes; the OS scheduler decides who runs where and when. There are some ways to execute processes on specific CPUs, but this is not the default behaviour of multiprocessing (and I'm not aware of any way to force the library itself to pin processes to CPUs).
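For completeness, one such OS-level mechanism (not a multiprocessing feature, and Linux-only) is os.sched_setaffinity, which a worker can call on itself:

```python
import os
from multiprocessing import Process

def pinned_worker(cpu):
    # Linux-only: restrict this process to a single CPU. This is an OS
    # facility; multiprocessing itself does no CPU placement.
    os.sched_setaffinity(0, {cpu})
    print("pinned to CPUs:", os.sched_getaffinity(0))

if __name__ == "__main__":
    p = Process(target=pinned_worker, args=(0,))
    p.start()
    p.join()
```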
In the documentation I see the following:
There is only one limiting factor regarding scaling in Flask which are the context local proxies. They depend on context which in Flask is defined as being either a thread, process or greenlet. If your server uses some kind of concurrency that is not based on threads or greenlets, Flask will no longer be able to support these global proxies. However the majority of servers are using either threads, greenlets or separate processes to achieve concurrency which are all methods well supported by the underlying Werkzeug library.
My question: What other concurrent mechanisms are there other than these 3 methods?
One pretty interesting concurrency mechanism is the asynchronous model. You have a single process with a single thread running the whole show, with all the I/O or otherwise lengthy tasks being asynchronous and callback-based. This method scales really well for I/O-bound services; servers in this category easily handle the C10K problem.
See Tornado or node.js for examples.
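A minimal sketch of that model using Python's standard selectors module (the port is arbitrary): one process, one thread, and a loop that dispatches callbacks as sockets become ready.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    # Register a callback to run when this client sends data.
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 8000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# The event loop never blocks on any single client, so one thread
# can service thousands of connections.
while True:
    for key, _ in sel.select():
        key.data(key.fileobj)
```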
I have a web service that is required to handle significant concurrent utilization and volume, and I need to test it. Since the service is fairly specialized, it does not lend itself well to a typical testing framework. The test would need to simulate multiple clients concurrently posting to a URL, parsing the resulting HTTP response, checking that a database has been appropriately updated, and making sure certain emails have been correctly sent/received.
The current opinion at my company is that I should write this framework using Python. I have never used Python with multiple threads before and as I was doing my research I came across the Global Interpreter Lock which seems to be the basis of most of Python's concurrency handling. It seems to me that the GIL would prevent Python from being able to achieve true concurrency even on a multi-processor machine. Is this true? Does this scenario change if I use a compiler to compile Python to native code? Am I just barking up the wrong tree entirely and is Python the wrong tool for this job?
The Global Interpreter Lock prevents threads from simultaneously executing Python code. This doesn't change when Python is compiled to bytecode, because the bytecode is still run by the Python interpreter, which enforces the GIL. threading works by switching threads every sys.getcheckinterval() bytecodes (in Python 3 the switch interval is time-based instead; see sys.getswitchinterval()).
This doesn't apply to multiprocessing, because it creates multiple Python processes instead of threads. You can have as many of those as your system will support, running truly concurrently.
So yes, you can do this with Python, either with threading or multiprocessing.
You can use Python's multiprocessing library to achieve this:
http://docs.python.org/library/multiprocessing.html
Assuming typical network conditions, as long as you have sufficient system resources, Python's regular threading module will allow you to simulate a concurrent workload at a higher rate than any real workload would produce.
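A minimal sketch of that approach (the URL, client count, and payload are placeholders): each thread blocks on network I/O, during which the GIL is released, so the simulated clients overlap their requests.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/endpoint"  # placeholder target

def simulate_client(i):
    # POST a small payload; the GIL is released while the thread
    # waits on the socket, so other clients proceed concurrently.
    data = f"client={i}".encode()
    with urllib.request.urlopen(URL, data=data) as resp:
        return resp.status

with ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(simulate_client, range(200)))

print(statuses.count(200), "successful requests")
```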