Socket Python Select and Multiprocessing


I would like an explanation of something to do with sockets.
Suppose I create a socket-based chat (server and client). Every client receives data from the server and sends data to the server, which forwards it to all the clients. How can the server handle all the connections simultaneously?
I know that, alongside the "socket" module, there are three approaches:
create more threads with the "threading" module (but it isn't the best way)
create more processes with the "multiprocessing" module
use the "select" module
What is the best way?
What is the difference between using select and using multiprocessing?

Just some generalities based on my very limited experience with socket programming.
select and multiprocessing are two completely different ways of handling I/O.
select is often used to achieve non-blocking I/O, usually in a single thread. Tornado (http://www.tornadoweb.org/en/stable/) is a mature example of a framework built around this; internally it uses select or an equivalent.
Using select has the advantage that you don't have to worry about multithreaded or multiprocess programming: the operating system notifies you of file descriptor changes, which lets a single thread handle hundreds, thousands, or even tens of thousands of open sockets.
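To make the single-threaded select approach concrete, here is a minimal sketch of one iteration of a chat loop. The function name serve_once and its parameters are my own invention for illustration, and it assumes blocking sockets and small messages; it is not how Tornado does it internally.

```python
import select
import socket


def serve_once(listener, clients, timeout=1.0):
    """Run one iteration of a select-based chat loop.

    listener is a listening socket and clients is a list of connected
    sockets (both names are illustrative). New connections are accepted
    and every received message is relayed to all other clients.
    Returns the number of messages relayed in this iteration.
    """
    readable, _, _ = select.select([listener] + clients, [], [], timeout)
    relayed = 0
    for sock in readable:
        if sock is listener:
            # select reported the listener as readable, so accept() returns at once.
            conn, _addr = listener.accept()
            clients.append(conn)
            continue
        data = sock.recv(4096)
        if not data:
            # An empty read means the peer closed the connection.
            clients.remove(sock)
            sock.close()
            continue
        for other in clients:
            if other is not sock:
                other.sendall(data)
        relayed += 1
    return relayed
```

A real server would call serve_once in a `while True:` loop. Note that sendall can still block if one client is slow to read, so a more careful version would also watch sockets for writability and buffer outgoing data.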
Threading is a great way of dealing with I/O as well. Because the threads are not CPU-bound, it is often acceptable and performant to spawn many I/O-bound threads. Since they spend most of their time waiting on I/O, there shouldn't be much overhead.
I would definitely look at Tornado, as it has a chat example that is trivial to set up.
There are many examples, blog posts, and tutorials on chat servers, performant Python web servers, and socket programming in Python to be found on Google.

Related

Python3 Asyncio shared resources between concurrent tasks

I've got a network application written in Python 3.5 that uses Python's asyncio to handle each incoming connection concurrently.
For every connection, I want to store the connected client's data in a list. I'm worried that if two clients connect at the same time (which is a possibility), both tasks will attempt to write to the list at the same time, which will surely cause problems. How would I solve this?

asyncio switches context only at yield points (await expressions), so two concurrent tasks are never executed at the same time.
If race conditions are still possible (it depends on the concrete code structure), you can use asyncio synchronization primitives and queues.

A lot of information is missing from your question.
Is your app threaded? If yes, then you have to guard your list with a threading.Lock.
Do you switch context (e.g. use await) between writes to the list in the request handler? If yes, then you have to guard your list with an asyncio.Lock.
Do you use multiprocessing? If yes, then you have to use a multiprocessing.Lock.
Is your app spread across multiple machines? Then you have to use an external shared database (e.g. Redis).
If the answer to all of those questions is no, then you don't have to do anything, since a single-threaded async app cannot update a shared resource in parallel.
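To illustrate the difference between the "no await between writes" case and the "await in the middle" case, here is a small sketch. The function names and the asyncio.sleep(0) standing in for real I/O are assumptions for illustration, and asyncio.run needs Python 3.7 or later.

```python
import asyncio


async def register(clients, name):
    # No await between reading and writing the list, so no other task
    # can run in the middle. No lock is needed here.
    clients.append(name)


async def register_unique(clients, name, lock):
    # Check-then-act with an await in between: another task could run
    # during the await, so the whole sequence is guarded by a lock.
    async with lock:
        if name not in clients:
            await asyncio.sleep(0)  # stands in for real I/O
            clients.append(name)


async def main():
    clients = []  # shared by all tasks
    lock = asyncio.Lock()
    await asyncio.gather(*(register(clients, "c%d" % i) for i in range(3)))
    await asyncio.gather(*(register_unique(clients, "dup", lock) for _ in range(5)))
    return clients
```

Running `asyncio.run(main())` should leave exactly one "dup" entry in the list. Without the lock, several tasks could pass the `not in` check before any of them appends.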

What's the best way to implement a Python TCP client?

I need to write a Python script that performs several tasks:
read commands from the console and send them to a server over TCP/IP
receive the server's response, process it, and print the output to the console.
What is the best way to create such a script? Do I have to create a separate thread to listen for server responses while interacting with the user in the main thread? Are there any good examples?

Calling for a best way or code examples is rather off topic, but this is too long to be a comment.
There are three general ways to build such terminal-emulator-like applications:
multiple processes - the way the good old Unix cu worked, with a fork
multiple threads - a variant of the above using lightweight threads instead of processes
the select system call with multiplexed I/O
Generally, the first two methods are considered more straightforward to code, with one thread (or process) handling the upward communication while the other handles the downward one. The third, while trickier to code, is generally considered more efficient.
As Python supports multithreading, multiprocessing, and the select call, you can choose any method, with a slight preference for multithreading over multiprocessing, because threads are lighter than processes and I cannot see a reason to use processes.
The following is just my opinion:
Unless you are writing a prototype to be rewritten later in a lower-level language, I assume that performance is not the key issue, and my advice would be to use threads here.
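As a rough sketch of the threaded variant, one thread can read server responses while the main thread sends commands. The names start_reader and run_client are made up for this example, and it assumes an already connected socket; error handling is left out.

```python
import socket
import threading


def start_reader(sock, on_data):
    """Start a daemon thread that passes everything received to on_data."""
    def loop():
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            on_data(data)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def run_client(sock, lines, on_data=lambda d: print(d.decode(), end="")):
    """Send each command line while a second thread handles the replies."""
    reader = start_reader(sock, on_data)
    for line in lines:  # pass sys.stdin here for interactive use
        sock.sendall(line.encode())
    sock.shutdown(socket.SHUT_WR)  # tell the server no more commands follow
    reader.join()  # wait until the server has closed its side
```

For interactive use you would connect with socket.create_connection((host, port)) and call run_client(sock, sys.stdin), where host and port are whatever your server uses.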

Is it ok to use Gevent and threading together?

For example, I have a class that internally uses Gevent for concurrency and takes a callback. What if a user of my class uses Python's threading or multiprocessing modules? Will this lead the program to disaster?
UPDATE
Some more details:
My class is a custom protocol implementation, so it involves a lot of reading from and writing to sockets. The protocol includes an option to send multiple requests (within a single TCP connection) in order to load multiple cores on a server, and then wait for the responses. That's why I need concurrency. This class is a library that will be part of a large project, and it can be used in every possible crazy way. I'm also new to Gevent and am barely aware of its monkey-patching feature.

How can I offer concurrency with Pika in long-working consumers?

Short version: How can I prevent blocking Pika in a Remote Procedure Call situation?
Long version:
None of the Pika examples demonstrate my use case.
I have a Tornado server which communicates with other processes/machines over AMQP (RabbitMQ, Pika). These other processes are not very well-defined, but they will, for the most part, be returning data (see the RPC example on RabbitMQ's website). Sometimes, a process might need to take an extremely long time to process a large amount of information, but it shouldn't completely block smaller requests from being taken by the process. Or maybe the remote server is blocking because it sent out a web request. Think of it like a web server, but using AMQP instead of HTTP.
Since Pika documentation claims that it's not thread-safe, I cannot pass the connection to multiple threads (or processes, for that matter). What I want to do is start a new process, and add a socket event (for the pipe to that program) to the Pika IOLoop, as I would be able to do with Tornado. The Pika IOLoop is much different from the Tornado IOLoop, and it doesn't seem to support adding multiple handlers; it seems to operate using one "poller" on one socket.
I'd like to avoid requiring the Tornado package for this package, because I would only be using the IOLoop. It's not out of the question, but I want to see what my other options are, or if there is a solution to my problem by somehow connecting multiple Pika IOLoops/Pollers. RabbitMQ's documentation says that workers can often be "scaled up" by adding more. I'd like to avoid creating a connection for every request that comes in (if they're coming in fast).

From what you described, I believe you unfortunately need either a different communication model or multiple Pika IOLoops/pollers/redundant connections.
From the documentation and other sites, it sounds like RPC in Pika is always a blocking call and cannot be passed around between threads. See http://www.rabbitmq.com/tutorials/tutorial-six-python.html, where the author points out that RPC in Pika is inherently blocking once you actually call the ioloop.
"When in doubt avoid RPC. If you can, you should use an asynchronous pipeline - instead of RPC-like blocking"
If you want to keep sending multiple RPC calls on the same connection before one completes, you'll need a different asynchronous model. Issuing multiple RPC calls on the same connection before completion isn't the usual implementation of the RPC model, though it's not technically forbidden ( http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.progcomm%2Fdoc%2Fprogcomc%2Frpc_mod.htm ). I don't think Pika operates with this model, though it does have asynchronous support via callbacks (not what you are looking for, I think).
If you just want to be able to create new connections on the fly easily, you could use a thread or process wrapper around a connection: you create the connection and block on the RPC in the other context, then push the result to a common queue that the main thread can monitor. Tornado might give you this, but I agree that it's a bit of overkill, and writing such a connection wrapper shouldn't be all that difficult; I've done something similar for other I/O operations in less than 100 lines of Python (see the Queue module, named queue in Python 3, for the threaded-wrapper version). I think you already saw this possibility, though, based on your talk of multiple IOLoops.
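The thread-wrapper idea can be sketched roughly as follows. This does not use Pika at all: fake_rpc is a hypothetical stand-in for whatever blocking call you make, and run_in_thread and collect are names I made up. In a real version each worker would open and own its own connection, since the connection cannot be shared between threads.

```python
import queue
import threading
import time


def fake_rpc(delay, value):
    """Hypothetical stand-in for a blocking RPC call."""
    time.sleep(delay)
    return value


def run_in_thread(blocking_call, results, tag):
    """Run blocking_call() in a worker thread and put (tag, result) on results."""
    def worker():
        results.put((tag, blocking_call()))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def collect(results, count, timeout=5.0):
    """Main-thread side: take count results in the order they complete."""
    return [results.get(timeout=timeout) for _ in range(count)]
```

With this, a long-running call does not hold up a short one: start both with run_in_thread against the same queue.Queue() and the short one shows up on the queue first. Exceptions raised inside the worker are not handled in this sketch.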

Multi-Threading and Asynchronous sockets in python

I'm quite new to Python threading/network programming, but have an assignment involving both.
One of the requirements of the assignment is that I spawn a new thread for each new request, but I need to both send to and receive from the browser at the same time.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread for each request, and I was wondering whether using both threads and asyncore is overkill or the correct way to do it.
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a proxy server, and I'm not sure if my client is persistent. My client is my browser (using Firefox for simplicity).
It seems to reconnect for each request. My problem is that if I open one tab with http://www.google.com in it and another with http://www.stackoverflow.com in it, I only get one request at a time from each tab, instead of multiple requests from Google and from SO.

I answered a question that sounds amazingly similar to yours, where someone had a homework assignment to create a client-server setup, with each connection being handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
An extra bit about asyncore vs threading
From the asyncore docs:
"There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however."
As this quote suggests, using asyncore and threading should be for the most part mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) does a blocking call to accept a new client. And when it gets one, it spawns a thread which will then continue to handle the communication, and the server goes back into a blocking call again.
In the pattern of using asyncore, you would instead use its async loop which will in turn call your own registered callbacks for various activity that occurs. There is no threading here, but rather a polling of all the open file handles for activity. You get the sense of doing things all concurrently, but under the hood it is scheduling everything serially.
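For reference, the threading pattern described above (a blocking accept loop that hands each connection to its own thread) might look something like this. It is only a sketch: the handler simply echoes data back, and the names handle and serve, as well as the max_connections parameter, are made up so that the loop can be stopped in a small demo.

```python
import socket
import threading


def handle(conn):
    """Per-connection thread: echo data back until the client closes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def serve(listener, max_connections=None):
    """Blocking accept loop that hands each new connection to its own thread."""
    accepted = 0
    while max_connections is None or accepted < max_connections:
        conn, _addr = listener.accept()  # blocks until a client connects
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
        accepted += 1
```

In a proxy, handle would be where you read the browser's request, forward it, and send the response back, with each browser connection getting its own thread.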
