How to ensure that the protocol's send method is thread safe - python

I'm working on a TCP client-server application using the IntNReceiver protocol. The server accepts multiple TCP connections from clients. I would like to let other threads use the protocol's sendString method, on both the client and the server. I tried using a synchronized queue, monitored in a separate thread, and reactor.callFromThread() to call sendString from there. This seems to work, but there is a weird delay of about 20 seconds before sendString actually sends the string. It does not block and returns immediately. I ran strace and the send() system call is definitely delayed. What is the proper way to do this kind of thing with twisted?

Just use callFromThread directly as your queue. The reactor is already synchronizing on and monitoring it. Anywhere you want to call foo.sendString() from a non-reactor thread, just do reactor.callFromThread(foo.sendString). Building additional infrastructure to do this (your own custom synchronized queues, for example) is just additional code that might break – as you have already discovered.
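For example, a minimal sketch of that pattern (here proto stands for whatever connected protocol instance your factory keeps a reference to, and data is the bytes payload to send):

from twisted.internet import reactor

def send_from_worker(proto, data):
    # Called from any non-reactor thread; the actual write happens on the
    # reactor thread, which is the only thread allowed to touch the transport.
    reactor.callFromThread(proto.sendString, data)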

Related

Ensuring socket messages were sent using flask-socketio, redis

I have a flask-socketio server running on multiple pods, using redis as a message queue. I want to ensure that emits from external processes reach their destination 100% of the time, or to know when they have failed.
When process A emits an event to a socket that's connected to process B, the event goes through the message queue to process B, and on to the client. Is there any way I can intercept the outgoing emit on process B? Ideally I'd then use a worker to check after a few seconds whether the message reached the client (via a confirm event emitted from the client), and emit it again if not.
This code runs on process A:
@app.route('/ex')
def ex_route():
    socketio.emit('external', {'text': f'sender: {socket.gethostname()}, welcome!'}, room='some_room')
    return jsonify(f'sending message to room "some_room" from {socket.gethostname()}')
This is the output from process A
INFO:socketio.server:emitting event "external" to some_room [/]
INFO:geventwebsocket.handler:127.0.0.1 - - [2019-01-11 13:33:44] "GET /ex HTTP/1.1" 200 177 0.003196
This is the output from process B
INFO:engineio.server:9aab2215a0da4816a45e3fdc1e449fce: Sending packet MESSAGE data 2["external",{"text":"sender: *******, welcome!"}]
There is currently no mechanism to do what you ask, unfortunately.
I think you basically have two approaches to go about this:
Always run your emits from the main server(s). If you need to emit from an auxiliary process, use an IPC mechanism to notify the server so that it can run the emit on its behalf. That way you can also use callbacks.
Ignore the callbacks, and instead have the client acknowledge receipt of the event by emitting back to the server.
Adding callback support for auxiliary processes should not be terribly difficult, by the way. I never needed that functionality myself and you are the first to ask about it. Maybe I should look into that at some point.
Edit: after some thought, I came up with a 3rd option:
You can connect your external process to the server as a client, instead of using the "emit-only" option. If this process is a client, it can emit an event to the server, which the server can in turn relay to the external client. When that client replies to the server, the server can once again relay the response to the external process, which is now just another client with full send and receive capabilities.
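A rough sketch of that third option, assuming the external process uses the python-socketio client package and that the server exposes hypothetical 'relay' and 'relay_response' events (both names are made up for illustration):

import socketio

# The external process connects as a normal Socket.IO client instead of
# using the write-only message-queue mode.
sio = socketio.Client()

@sio.on('relay_response')
def on_relay_response(data):
    # The server forwarded the end client's reply back to this process.
    print('client acknowledged:', data)

sio.connect('http://localhost:5000')

# Ask the server to relay an event to the real client; as a full client,
# this process can also use a per-emit callback for acknowledgement.
sio.emit('relay', {'text': 'welcome!'},
         callback=lambda ack: print('server accepted relay:', ack))
sio.wait()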
Using IPC is not very robust: if the server is handling a lot of requests, you can end up receiving a message and failing to relay it, and that message might be vital.
Use celery, zmq, or redis itself for the interconnect. The most natural choice is using socketio itself, as Miguel mentioned, since it is already waiting for requests, has the environment, and can emit at any time.
I've used a greenlet hack instead of threads: a greenlet is lighter than a thread and runs in the same environment, which lets it send the message while your main thread waits on the socket in non-blocking mode. Basically, you write a thread, then apply eventlet or gevent to the whole code via monkey patching, and the thread becomes a greenlet (an in-between function call). You put a sleep in it so it doesn't hog all resources, and you have your sender. Greenlets share the environment easily; they are not bound by I/O, only by CPU (which is also true for threads in Python, but greenlets are even more lightweight because there is no OS-level context switch at all).
But as soon as CPU load increased I switched over to client/server. Introducing IPC would have required massive rewrites from the ground up.
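As a very rough illustration of that greenlet hack, assuming an eventlet-based flask-socketio server and an arbitrary queue and sleep interval:

import eventlet
eventlet.monkey_patch()   # patch before other imports so threads/sockets become green

import queue
import threading
import time

outbox = queue.Queue()

def sender_loop(socketio):
    # After monkey patching, this "thread" is really a greenlet in the same
    # event loop as the server, so it can call socketio.emit directly.
    while True:
        try:
            event, data = outbox.get_nowait()
            socketio.emit(event, data)
        except queue.Empty:
            pass
        time.sleep(0.1)   # yields to the event loop instead of hogging the CPU

# started once at application startup:
# threading.Thread(target=sender_loop, args=(socketio,), daemon=True).start()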

How to handle a burst of connection to a port?

I've built a server listening on a specific port using Python (asyncore and sockets), and I was curious to know whether there is anything I can do when too many people connect at once.
The code itself cannot be changed, but will adding more processes work? Or is this a hardware matter, and should I focus on putting a load balancer in front and spreading the requests across multiple servers?
This question is borderline Stack Overflow (code/Python) and Server Fault (server management). I decided to go with SO because of the code, but if you think Server Fault is better, let me know.
1.
asyncore relies on the operating system for all of the connection handling, so what you are asking is OS dependent and has very little to do with Python. Using twisted instead of asyncore wouldn't solve your problem.
On Windows, for example, you can listen for only 5 incoming connections at a time.
So, the first requirement is to run it on a *nix platform.
The rest depends on how long your handlers take and on your bandwidth.
2.
What you can do is combine asyncore and threading to speed up waiting for the next connection.
I.e. you can make handlers that run in separate threads. It will be a little messy, but it is one possible solution.
When the server accepts a connection, instead of creating a traditional handler (which would slow down the check for the following connection, because asyncore waits until that handler has done at least a little bit of its job), you create a handler that treats reads and writes as non-blocking.
I.e. it starts a thread to do the job and, only when the data is ready, sends it on a following loop() pass.
This way, you allow asyncore.loop() to check the server's socket more often.
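A rough sketch of such a handler (the worker job and buffer handling here are simplified placeholders):

import asyncore
import threading

class ThreadedHandler(asyncore.dispatcher):
    # Does the real work in a thread and only writes from handle_write()
    # once the result is ready, so asyncore.loop() never blocks in here.

    def __init__(self, sock):
        asyncore.dispatcher.__init__(self, sock)
        self.out_buffer = b''
        self.lock = threading.Lock()

    def handle_read(self):
        data = self.recv(4096)
        if data:
            # Hand the request to a worker thread and return immediately.
            threading.Thread(target=self._work, args=(data,), daemon=True).start()

    def _work(self, data):
        result = data.upper()          # placeholder for the slow job
        with self.lock:
            self.out_buffer += result  # picked up on a later loop() pass

    def writable(self):
        return bool(self.out_buffer)

    def handle_write(self):
        with self.lock:
            sent = self.send(self.out_buffer)
            self.out_buffer = self.out_buffer[sent:]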
3.
Or you can use two different socket_maps with two different asyncore.loop()s.
You use one map (dictionary), say the default asyncore.socket_map, to check the server, and one asyncore.loop(), say in the main thread, only for the server.
And you start the second asyncore.loop() in a thread, using your custom dictionary for the client handlers.
So, one loop checks only the server that accepts connections; when a connection arrives, it creates a handler that goes into a separate map for handlers, which is checked by another asyncore.loop() running in a thread.
This way, you do not mix the server connection checks with client handling. The server is checked immediately after it accepts a connection, and the other loop balances between clients.
If you are determined to go even faster, you can exploit multiprocessor machines by having more maps for handlers.
For example, one per CPU, and as many threads running asyncore.loop()s.
Note that sockets are I/O operations using system calls, and select() is one too, so the GIL is released while asyncore.loop() is waiting for results. This means you get real benefit from multithreading and each CPU deals with its share of clients practically in parallel.
What you would have to do is make the server distribute the load and start the threaded loops as connections arrive.
Don't forget that asyncore.loop() ends when its map empties, so the loop() in the thread that manages clients must be started when a new connection is accepted and restarted if at some point no more connections are present.
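A sketch of that two-map layout (the port, the echo handler, and the restart policy are only illustrative):

import asyncore
import socket
import threading

client_map = {}          # handlers for accepted connections live here

def client_loop():
    # Runs in a background thread; returns as soon as client_map empties,
    # so it must be (re)started when a new connection arrives.
    asyncore.loop(timeout=1, map=client_map)

class Handler(asyncore.dispatcher_with_send):
    def handle_read(self):
        self.send(self.recv(4096))   # trivial echo as a placeholder

class Server(asyncore.dispatcher):
    def __init__(self, port):
        asyncore.dispatcher.__init__(self)            # uses asyncore.socket_map
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind(('', port))
        self.listen(5)
        self.loop_thread = None

    def handle_accepted(self, sock, addr):
        Handler(sock, map=client_map)                 # goes into the second map
        if self.loop_thread is None or not self.loop_thread.is_alive():
            self.loop_thread = threading.Thread(target=client_loop, daemon=True)
            self.loop_thread.start()

Server(8000)
asyncore.loop(map=asyncore.socket_map)                # main thread: server socket only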
4.
If you want to be able to run your server on multiple computers and use them as a cluster, then you install a process balancer in front.
I do not see a serious need for it if you wrote the asyncore server correctly and want to run it on a single computer only.

Making Tornado websocket handler thread safe

I am randomly getting error 1006 ("I failed the WebSocket connection by dropping the TCP connection") when trying to write messages from threads using Tornado's websocket server handler.
I created N threads and passed my ws_handler to them.
But when I start using
self.ws_handler.write_message(jsondata)
for a large number of threads, I keep getting the same error.
From what I understand, 1006 means the TCP connection was dropped, e.g. when a 'heartbeat' message between the websocket peers is missed. I am guessing this happens because of threads running in parallel and trying to send messages. I tested it with 2-3 threads and it works fine, but with a large number it doesn't.
I wonder if there's any way to send messages safely from within threads (meaning the locking would be handled internally by ws_handler and the sends serialized accordingly).
One solution I am thinking of is to push jsondata into a queue and have another single thread push the messages, but I fear that would create a bottleneck.
My client is AutobahnPython.
Tornado is based on a single-threaded event loop; all interactions with Tornado objects must be on the event loop's thread. Use IOLoop.current().add_callback() from another thread when you need to transfer control back to the event loop.
See also http://www.tornadoweb.org/en/stable/web.html#thread-safety-notes
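A minimal sketch of that pattern (ws_handler and jsondata are placeholders; the important detail is that IOLoop.current() is captured on the event-loop thread and handed to the worker):

import threading
from tornado.ioloop import IOLoop

class Sender(threading.Thread):
    def __init__(self, ws_handler, io_loop, jsondata):
        super().__init__(daemon=True)
        self.ws_handler = ws_handler
        self.io_loop = io_loop          # IOLoop.current() from the main thread
        self.jsondata = jsondata

    def run(self):
        # Never call write_message here; schedule it on the event loop instead.
        self.io_loop.add_callback(self.ws_handler.write_message, self.jsondata)

# on the event-loop thread, when spawning workers:
# Sender(ws_handler, IOLoop.current(), jsondata).start()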

How can I offer concurrency with Pika in long-working consumers?

Short version: How can I prevent blocking Pika in a Remote Procedure Call situation?
Long version:
None of the Pika examples demonstrate my use case.
I have a Tornado server which communicates with other processes/machines over AMQP (RabbitMQ, Pika). These other processes are not very well-defined, but they will, for the most part, be returning data (see the RPC example on RabbitMQ's website). Sometimes, a process might need to take an extremely long time to process a large amount of information, but it shouldn't completely block smaller requests from being taken by the process. Or maybe the remote server is blocking because it sent out a web request. Think of it like a web server, but using AMQP instead of HTTP.
Since Pika documentation claims that it's not thread-safe, I cannot pass the connection to multiple threads (or processes, for that matter). What I want to do is start a new process, and add a socket event (for the pipe to that program) to the Pika IOLoop, as I would be able to do with Tornado. The Pika IOLoop is much different from the Tornado IOLoop, and it doesn't seem to support adding multiple handlers; it seems to operate using one "poller" on one socket.
I'd like to avoid requiring the Tornado package for this package, because I would only be using the IOLoop. It's not out of the question, but I want to see what my other options are, or if there is a solution to my problem by somehow connecting multiple Pika IOLoops/Pollers. RabbitMQ's documentation says that workers can often be "scaled up" by adding more. I'd like to avoid creating a connection for every request that comes in (if they're coming in fast).
From what you described, I believe you unfortunately either need a different communication model or need multiple Pika IOLoops/Pollers/Redundant Connections.
From the documentation and from other sites it sounds like RPC in Pika is always blocking and cannot be passed around between threads. See http://www.rabbitmq.com/tutorials/tutorial-six-python.html, where the author points out that RPC in Pika is inherently blocking once you actually call the ioloop.
"When in doubt avoid RPC. If you can, you should use an asynchronous pipeline - instead of RPC-like blocking"
If you want to keep sending multiple RPC calls on the same connection before one completes, you'll need a different asynchronous model. Multiple in-flight RPC calls on the same connection isn't the usual implementation of the RPC model, though it is not technically forbidden ( http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.progcomm%2Fdoc%2Fprogcomc%2Frpc_mod.htm ). I don't think Pika operates with this model, though it does have asynchronous support via callbacks (not what you are looking for, I think).
If you just want to be able to easily generate new connections on the fly, you could use a thread or process wrapper around a connection, where you create and block on the RPC in the other context and push results onto a common Queue which the main thread can monitor. Tornado would give you this, but I agree that it is a bit of overkill, and writing such a connection wrapper shouldn't be all that difficult; I've done something similar for other I/O operations in less than 100 lines of Python (see the Queue package for a threaded wrapper version). I think you already saw this possibility, though, given your mention of multiple IOLoops.
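A rough sketch of that thread-plus-Queue wrapper, assuming the 'rpc_queue' from the RabbitMQ RPC tutorial and a pika 1.x BlockingConnection (all names are illustrative):

import threading
import queue
import pika

def rpc_worker(request_body, results):
    # One blocking RPC call on its own connection in its own thread; the
    # reply is pushed onto a thread-safe queue the main thread can poll.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
    channel = connection.channel()
    reply_queue = channel.queue_declare(queue='', exclusive=True).method.queue

    def on_response(ch, method, properties, body):
        results.put(body)
        ch.stop_consuming()

    channel.basic_consume(queue=reply_queue, on_message_callback=on_response,
                          auto_ack=True)
    channel.basic_publish(exchange='', routing_key='rpc_queue',
                          properties=pika.BasicProperties(reply_to=reply_queue),
                          body=request_body)
    channel.start_consuming()            # blocks only this worker thread
    connection.close()

results = queue.Queue()
threading.Thread(target=rpc_worker, args=(b'big request', results), daemon=True).start()
# the main loop (e.g. Tornado's IOLoop) stays free and can poll:
# reply = results.get_nowait()           # raises queue.Empty until the worker is done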

Controlling http streams with python threads

I am implementing an app consuming a few http streams at the same time.
All threads (a pycurl object each) are spawned in the same loop.
The trick is how to build a proper architecture for handling reconnects.
Is it good practice to create a separate controller thread that somehow checks which connections are not alive or need a forced reconnect? Or maybe such a task should be done in separate processes?
I would suggest having one controlling thread which spawns the HTTP streaming threads, where each streaming thread implements proper handling of a connection loss or timeout (e.g. either terminating itself or telling the controlling thread that a new streaming thread should be spawned for a reconnect). Depending on your HTTP serving peer, you could also try to continue an interrupted stream by using the HTTP Content-Range feature.
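A rough sketch of that controller-thread idea (the stream URLs, the data handling, and the restart interval are only placeholders):

import threading
import time
import pycurl

def stream_worker(url):
    # One blocking pycurl stream; returns when the connection drops.
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, lambda chunk: None)   # process chunks here
    try:
        curl.perform()
    except pycurl.error:
        pass             # connection lost or timed out; the controller restarts us
    finally:
        curl.close()

def controller(urls):
    # Keeps one streaming thread per URL alive, respawning dead ones.
    workers = {}
    while True:
        for url in urls:
            thread = workers.get(url)
            if thread is None or not thread.is_alive():
                thread = threading.Thread(target=stream_worker, args=(url,), daemon=True)
                thread.start()
                workers[url] = thread
        time.sleep(5)     # polling interval before checking for dead streams

controller(['http://example.com/stream1', 'http://example.com/stream2'])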
