For example - I have some class that internally uses Gevent for concurrency, and it takes a callback, what if user of my class will use Python threading or multiprocessing modules? Will this lead the program to disaster?
UPDATE
Some more details:
My class is a custom protocol implementation, so it involves lots of reading/writing with sockets. Protocol includes such feature as option to send multiple requests (within single TCP connection) in order to load multiple cores on a server, and then wait for responses. That's why I need concurrency. This class is a library which will be part of a large project, and it can be used in every possible crazy way. I'm also new to Gevent and I barely aware about monkey-patching feature.
Related
I have the following scenario:
There is one thread that manages long-polling HTTP connection (non-stop) from an API. When a new message arrives, it must be processed within the special process() method.
I just want to design it in a way that incoming messages will be processed concurrently, but there is another important point: in the end of each processing an answer should be passed to the outcoming queue, which is organized in a separated thread. From there the answers will be sent via HTTP.
Here is a scheme:
Let's consider that it can be 30-50 messages in a second, and procces method will work from 1 up to 10 seconds.
The question is: what library or framework can I use to implement this architecture?
As far as I have researched, Python Tornado have good benchmarks, but here I do not need a web framework, just a tool that can provide a concurrent running of message processors.
Your message rate is pretty low. So you may freely use "standard" tools like RabbitMQ/Redis, Celery ("Celery Project") and asyncio.
RabbitMQ/Redis with Celery - are great tools to implement queues and manage your tasks and processes.
Asyncio is faster than Tornado but it doesn't matter for your task. What is more important is that asyncio gives you all the benefits of modern async/await coroutine technique.
I want some explanation about a something to do with sockets...
Suppose I create a Socket (Server and Client) for chatting, every client of this socket will receive data from the server and send data to the server, which will send data to all the clients, simultaneously. How can the server accept all the connections simultaneously?
I know that with the module "socket" there are 3 method:
create more threads with module "threading" but it isn't the best way
create more processes with module multiprocessing
use the module select
What is the best way?
what is the difference between using select and using multiprocessing?
Just some generalities based on my very limited experience with socket programming.
They are two completely different ways of handling IO.
select is often used to achieve non blocking IO, usually in a single thread. Tornado is a mature example of a framework around this.
http://www.tornadoweb.org/en/stable/, Tornado uses select (or equivalent internally)
Using select has the advantage of not having to worry about multithreaded/process programming, using os to notify of file descriptor changes allows a single thread to handle many hundreds or thousands or tens of thousands of open sockets.
Threading is a great way of dealing with io as well. Because the thread will not be cpu bound it is often acceptable and performant to spawn many io bound threads. Since the threads will be spending most of their time waiting on IO there *shouldn't be much overhead.
I would def look at tornado, as it has a chat example that is trivial to create
There are many many examples, blogs and tutorials of chat servers, performant python webservers and socket programming in python on google
I have implemented a server program using Twisted. I am using twisted.protocols.basic.LineReceiver along with twisted.internet.protocol.ServerFactory.
I would like to have each client that connects to the server run a set of functions in parallel (I'm thinking of multi-threading for this).
I have some confusion with using twisted.internet.threads.deferToThread for this problem.
Should I call deferToThread in the ServerFactory for this purpose?
Are twisted threads, thread-safe with respect to race conditions?
Previously, I tried using multiprocessing in my server program but it seemed not to work in combination with the Twisted reactor, while deferToThread did the job.
I'm wondering how are Twisted threads implemented? Don't they utilize multiprocessing?
Previously, I tried using multiprocessing in my server program but it seemed not to work in combination with the Twisted reactor, while deferToThread did the job. I'm wondering how are Twisted threads implemented? Don't they utilize multiprocessing?
You didn't say whether you used the multi-threaded version of multiprocessing or the multi-process version of multiprocessing.
You can read about mixing Twisted and multiprocessing on Stack Overflow, though:
Mix Python Twisted with multiprocessing?
Twisted network client with multiprocessing workers?
is twisted incompatible with multiprocessing events and queues?
(And more)
To answer the shorter part of this question - no, Twisted does not use the stdlib multiprocessing package to implement its threading APIs. It uses the stdlib threading module.
Are twisted threads, thread-safe with respect to race conditions?
The answer to this is implied by the above answer: no. "Twisted threads" aren't really a thing. Twisted's threading APIs are just a layer on top of the stdlib threading module (which is really just a Python API for POSIX threads (or something kind of similar but different on Windows). Twisted's threading APIs don't magically eliminate the possibility of race conditions (if there is any magic in Twisted, it is the ability to do certain things concurrently without using threads at all - which helps reduce the number of race conditions in your program, though it doesn't entirely eliminate the possibility of creating them).
Should I call deferToThread in the ServerFactory for this purpose?
I'm not quite sure what the point of this question is. Are you wondering if a method on your ServerFactory subclass is the best place to put your calls to deferToThread? That probably depends on the details of your implementation approach. It probably doesn't make a huge difference overall, though. If you like the pattern of having the factory provide services to protocol instances - go for it.
In the documentation I see the following:
There is only one limiting factor regarding scaling in Flask which are
the context local proxies. They depend on context which in Flask is
defined as being either a thread, process or greenlet. If your server
uses some kind of concurrency that is not based on threads or
greenlets, Flask will no longer be able to support these global
proxies. However the majority of servers are using either threads,
greenlets or separate processes to achieve concurrency which are all
methods well supported by the underlying Werkzeug library.
My question: What other concurrent mechanisms are there other than these 3 methods?
One pretty interesting concurrency mechanism is the asynchronous model. You have a single process with a single thread running the whole show, with all the I/O or otherwise lengthy tasks being asynchronous and callback based. This method scales really well for I/O bound services, servers in this category easily handle the C10K problem.
See Tornado or node.js for examples.
Short version: How can I prevent blocking Pika in a Remote Procedure Call situation?
Long version:
None of the Pika examples demonstrate my use case.
I have a Tornado server which communicates with other processes/machines over AMQP (RabbitMQ, Pika). These other processes are not very well-defined, but they will, for the most part, be returning data (see the RPC example on RabbitMQ's website). Sometimes, a process might need to take an extremely long time to process a large amount of information, but it shouldn't completely block smaller requests from being taken by the process. Or maybe the remote server is blocking because it sent out a web request. Think of it like a web server, but using AMQP instead of HTTP.
Since Pika documentation claims that it's not thread-safe, I cannot pass the connection to multiple threads (or processes, for that matter). What I want to do is start a new process, and add a socket event (for the pipe to that program) to the Pika IOLoop, as I would be able to do with Tornado. The Pika IOLoop is much different from the Tornado IOLoop, and it doesn't seem to support adding multiple handlers; it seems to operate using one "poller" on one socket.
I'd like to avoid requiring the Tornado package for this package, because I would only be using the IOLoop. It's not out of the question, but I want to see what my other options are, or if there is a solution to my problem by somehow connecting multiple Pika IOLoops/Pollers. RabbitMQ's documentation says that workers can often be "scaled up" by adding more. I'd like to avoid creating a connection for every request that comes in (if they're coming in fast).
From what you described, I believe you unfortunately either need a different communication model or need multiple Pika IOLoops/Pollers/Redundant Connections.
It sounds like from documentation and from other sites that RPC in Pika is always a blocking statement and unable to be passed around between threads. See http://www.rabbitmq.com/tutorials/tutorial-six-python.html where the author points out that RPC in Pika is inherently blocking once you actually call the ioloop.
"When in doubt avoid RPC. If you can, you should use an asynchronous pipeline - instead of RPC-like blocking"
If you want to keep sending multiple RPC calls on the same connection before one completes, you'll need a different Asynchronous model. Multiple RPC calls on the same connection before completion isn't the usual implementation of the RPC model, though it's not technically forbidden ( http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.progcomm%2Fdoc%2Fprogcomc%2Frpc_mod.htm ). I don't think Pika operates with this model, though it does have asynchronous support via callbacks (not what you are looking for I think).
If you just want to easily be able to generate new connections on the fly you could use a thread or process wrapper on a connection, where you create and block on the RPC in the other context and push to a common Queue which the main thread can monitor. Tornado might give you this, but I agree that it's a bit of overkill, and making such a connection wrapper shouldn't be all that difficult as I've done something similar for other I/O ops in less than 100 lines of Python (see Queue package for Threaded wrapper version). I think you already saw this possibility though based on your talk of multiple IOLoops.