Python Websocket-Client Package Frequent Ping-Pong Timeouts Upon Too Many Subscriptions

Python Websocket-Client Package Frequent Ping-Pong Timeouts Upon Too Many Subscriptions - python

I am using the python websocket-client package (https://pypi.org/project/websocket-client/) to subscribe to multiple websocket channels. I implemented the basic ping-pong logic using the websocket client package as below. The ping-pong logic is working mostly fine under normal circumstances and I am able to run the code for hours without many disconnect/reconnect instances.
However, when I try to increase the number of channels I subscribe to, the program will encounter many "websocket._exceptions.WebSocketTimeoutException: ping/pong timed out" errors. It looks like the websocket package is not sending the 'ping' messages properly when they are too busy. Will get even more timed out errors when I try to add even more subscriptions.
Maximum stable websocket connection I could establish is around 4 separate websocket threads to different places with ~12 subscriptions each. Anything more than that will bring many timed out exception. I wonder if there is anything I could do to cater this issue or this is the limit of the websocket-client package/ my computer? Does anyone else encountered this problem before? Thanks.
Tried to set the 'skip_utf8_validation' to True to try to enhance the performance but it did not help.
def run(self):
# Setup the thread running WebSocketApp.
wst = threading.Thread(target=self._run, name='{}Raw'.format(self.name))
wst.daemon = True
wst.start()
connected = self.check_connected()
self.post_connect(connected)
def _run(self):
self.ws.run_forever(
ping_interval=20,
ping_timeout=10,
ping_payload=self.ping_payload,
skip_utf8_validation=True,
)
error: Traceback (most recent call last):
File "~/lib/python3.10/site-packages/websocket/_app.py", line 383, in run_forever
dispatcher.read(self.sock.sock, read, check)
File "~/lib/python3.10/site-packages/websocket/_app.py", line 68, in read
check_callback()
File "~/lib/python3.10/site-packages/websocket/_app.py", line 380, in check
raise WebSocketTimeoutException("ping/pong timed out")
websocket._exceptions.WebSocketTimeoutException: ping/pong timed out

Found one possible cause - one of the websocket threads is having some codes that are running slow and making all other websocket threads slower and slower while running. Trying to fix and see if that is the root clause. Thanks guys first.

Related

Can websockets "lag" when a lot of messages are received?

I'm trying to create a script using asyncio and websocket that should connect to 4-5 cryptocurrency exchange websockets and receive trades in real time from those exchanges. My code works and it's very simple, it looks like this:
import asyncio
import websockets
import json
subscriptions = ['btcusdt#trade', 'ethusdt#trade', 'bchusdt#trade', 'xrpusdt#trade', 'eosusdt#trade', 'ltcusdt#trade', 'trxusdt#trade', 'etcusdt#trade', 'linkusdt#trade', 'xlmusdt#trade', 'adausdt#trade', 'xmrusdt#trade', 'dashusdt#trade', 'zecusdt#trade', 'xtzusdt#trade', 'bnbusdt#trade', 'atomusdt#trade', 'ontusdt#trade', 'iotausdt#trade', 'batusdt#trade', 'vetusdt#trade', 'neousdt#trade', 'qtumusdt#trade', 'iostusdt#trade', 'thetausdt#trade', 'algousdt#trade', 'zilusdt#trade', 'kncusdt#trade', 'zrxusdt#trade', 'compusdt#trade', 'omgusdt#trade', 'dogeusdt#trade', 'sxpusdt#trade', 'kavausdt#trade', 'bandusdt#trade', 'rlcusdt#trade', 'wavesusdt#trade', 'mkrusdt#trade', 'snxusdt#trade', 'dotusdt#trade', 'defiusdt#trade', 'yfiusdt#trade', 'balusdt#trade', 'crvusdt#trade', 'trbusdt#trade', 'yfiiusdt#trade', 'runeusdt#trade', 'sushiusdt#trade', 'srmusdt#trade', 'bzrxusdt#trade', 'egldusdt#trade', 'solusdt#trade', 'icxusdt#trade', 'storjusdt#trade', 'blzusdt#trade', 'uniusdt#trade', 'avaxusdt#trade', 'ftmusdt#trade', 'hntusdt#trade', 'enjusdt#trade', 'flmusdt#trade', 'tomousdt#trade', 'renusdt#trade', 'ksmusdt#trade', 'nearusdt#trade', 'aaveusdt#trade', 'filusdt#trade', 'rsrusdt#trade', 'lrcusdt#trade', 'maticusdt#trade', 'oceanusdt#trade', 'cvcusdt#trade', 'belusdt#trade', 'ctkusdt#trade', 'axsusdt#trade', 'alphausdt#trade', 'zenusdt#trade', 'sklusdt#trade']
async def connect():
while True:
async with websockets.client.connect('wss://fstream.binance.com/ws/trade') as ws:
tradeStr = {"method": "SUBSCRIBE", "params": subscriptions, 'id': 1}
await ws.send(json.dumps(tradeStr))
while True:
try:
msg = await asyncio.wait_for(ws.recv(), 5)
message = json.loads(msg)
try:
print(message)
except Exception as e:
print(e)
except asyncio.TimeoutError:
break
asyncio.get_event_loop().run_until_complete(connect())
In the example above, i'm connecting to Binance and i'm receiving trades for all the markets available. I do this for more exchanges at once, but the problem will happen with one too as long as i'm receiving a lot of messages per second.
Each message looks like this {"rate": "xx", "market": "xx", "amount": "xx", "side": "xx"}, so very small.
The big problem i'm noticing is that after a while the script is running, i start receiving less messages, a lot of them will come after a lot of seconds and i don't even receive a lot others, as if they get lost or as if the connection is freezing.
Now, i know that it's not a very specific question, but what could be the problem here?
Is it possible that when websockets receive a lot of messages per second there could be problems of this kind? I tried to test this system from my local and from a vps, and in both cases i encountered the same issues. Is it possible that this is a resource problem? Or is it most likely related to the server, and not the client which is me? I tried to be as specific as possible, i can be more detailed if needed.
I read that websockets stores received messages in a buffer. Is it possible that the problem is with the buffer getting filled?
Any kind of advice is appreciated!

From what you explained and in my experience seems to be related to resources management, yes the WebSockets are affected if you receive a good amount of messages per second and yes this causes issues in your server. Why? because the buffer of course is limited and also the amount of memory available to process all those messages at the same time, as they mention in the official docs from the WebSocket library (version 8.1), I think that your issue here is that you are opening a lot number of connections at the same time and this causes memory exhaustion of course these depends on the size of the messages and the resources of your server, this can be easily tested you can try it with 2 VPS with different amount of resources if the 2 servers have different times to run into that state definitely one of the issues are resources (which I think should be the expected result fewer resources should run into the issue first and by resources means memory). Here are some links to the official docs for the WebSockets library where they make reference to these issues caused by memory and an approach to optimize the use of the memory. Hope this helps you 👍.
Memory Usage
Optimizations

RabbitMQ closes connection when processing long running tasks and timeout settings produce errors

I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
From researching I understand that either a heartbeat or an increased connection timeout can be used to solve this. Both these solutions raise errors when attempting them. In reading answers to similar posts I've also learned that many changes have been implemented to RabbitMQ since the answers were posted (e.g. the default heartbeat timeout has changed to 60 from 580 prior to RabbitMQ 3.5.5).
When specifying a heartbeat and blocked connection timeout:
credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters('XXX.XXX.XXX.XXX', port, '/', credentials, blocked_connection_timeout=2000)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
The following error is displayed:
TypeError: __init__() got an unexpected keyword argument 'blocked_connection_timeout'
When specifying heartbeat_interval=1000 in the connection parameters a similar error is shown: TypeError: __init__() got an unexpected keyword argument 'heartbeat_interval'
And similarly for socket_timeout = 1000 the following error is displayed: TypeError: __init__() got an unexpected keyword argument 'socket_timeout'
I am running RabbitMQ 3.6.1, pika 0.10.0 and python 2.7 on Ubuntu 14.04.
Why are the above approaches producing errors?
Can a heartbeat approach be used where there is a long running continuous task? For example can heartbeats be used when performing large database joins which take 30+ mins? I am in favour of the heartbeat approach as many times it is difficult to judge how long a task such as database join will take.
I've read through answers to similar questions
Update: running code from the pika documentation produces the same error.

I've run into the same problem with my systems, that you are seeing, with dropped connection during very long tasks.
It's possible the heartbeat might help keep your connection alive, if your network setup is such that idle TCP/IP connections are forcefully dropped. If that's not the case, though, changing the heartbeat won't help.
Changing the connection timeout won't help at all. This setting is only used when initially creating the connection.
I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
there are two reasons for this, both of which you have run into already:
Connections drop randomly, even under the best of circumstances
Re-starting a process because of a re-queued message can cause problems
Having deployed RabbitMQ code with tasks that range from less than a second, out to several hours in time, I found that acknowledging the message immediately and updating the system with status messages works best for very long tasks, like this.
You will need to have a system of record (probably with a database) that keeps track of the status of a given job.
When the consumer picks up a message and starts the process, it should acknowledge the message right away and send a "started" status message to the system of record.
As the process completes, send another message to say it's done.
This won't solve the dropped connection problem, but nothing will 100% solve that anyways. Instead, it will prevent the message re-queueing problem from happening when a connection is dropped.
This solution does introduce another problem, though: when the long running process crashes, how do you resume the work?
The basic answer is to use the system of record (your database) status for the job to tell you that you need to pick up that work again. When the app starts, check the database to see if there is work that is unfinished. If there is, resume or restart that work in whatever manner is appropriate.

I've already see this issue. The reason is you declare to use this queue. but you didn't bind the queue in the exchange.
for example:
#Bean(name = "test_queue")
public Queue testQueue() {
return queue("test_queue");
}
#RabbitListener(queues = "test_queue_1")
public void listenCreateEvent(){
}
if you listen a queue didn't bind to the exchange. it will happen.

Broken pipe error and connection reset by peer 104

I'm using Bottle server to implement my own server using an implementation not so far away from the simple "hello world" here , my own implementation is (without the routing section of course):
bottleApp =bottle.app()
bottleApp.run(host='0.0.0.0',port=80, debug=true)
My server is keep getting unresponsive all the time and then I get in the Browser: Connection reset by peer, broken pipe errno 32
The logs give me almost exactly the same stack traces such as in question.
Here are my own logs:
What I tried so far, without success:
Wrapping the server run line with try except, something like, shown here the answer of "mhawke".
This stopped the error messages in logs, apparently because I caught them in except clause, but problem is that when catching the exception like that it means that we have been thrown out of the run method context, and I want to catch it in a way it will not cause my server to fall.
I don't know if its possible without touching the inner implementations files of bottle.
Adding this before server run line:
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
As suggested here, but it seems that it didn't had any impact on not getting Broken pipe\connection reset errors and server responsiveness.
I thought of trying also the second answer here, but I don't have any idea where to locate this code in the context of the bottle server.

This sounds like a permissions issue or a firewall.
if you really need to listen on port 80, then you need to run with a privileged account. Also you will probably need to open port 80 for tcp traffic.
I can see your using something that appears to be Posix (Linux/Unix/OSx) If you post what OS you are using I can edit this answer to be more specific as to how to open the firewall and execute privileged commands (probably sudo but who knows).

Recover from dropped connection in redis pub/sub

I am running client that is connecting to a redis db. The client is on a WiFi connection and will drop the connection at times. Unfortunately, when this happens, the program just keeps running without throwing any type of warning.
r = redis.StrictRedis(host=XX, password=YY...)
ps = r.pubsub()
ps.subscribe("12345")
for items in ps.listen():
if items['type'] == 'message':
data = items['data']
Ideally, what I am looking for is a catch an event when the connection is lost, try and reestablish the connection, do some error correcting, then get things back up and running. Should this be done in the python program? Should I have an external watchdog?

Unfortunately, one have to 'ping' Redis to check if it is available. If You try to put a value to Redis storage, it will raise an ConnectionError exception if connection is lost. But the listen() generator will not close automatically when connection is lost.
I think that hacking Redis' connection pool could help, give it a try.
P.S. In is very insecure to connect to redis in an untrusted network environment.

This is an old, old question but I linked one of my own questions to it and happened to run across it again. It turned out there was a bug in the redis library that caused the client to enter an infinite loop attempting to reconnect if it lost connection to the redis server. I debugged the issue and PR'd the change. it was merged a long time ago now. Once surfaced the maintainer also knew of a second location that had the same issue.
This problem shouldn't occur anymore.
To fully answer the question, I can't remember which error it is given the time since I fixed this but there is now a specific error returned you can catch and reconnect on.

Python RabbitMQ and kombu: heartbeats

I have a Twisted application (Python 2.7) that uses the kombu module to communicate with a RabbitMQ message server.
We're having problems with closing connections (probably firewall related) and I'm trying to implement the heartbeat_check() method to handle this. I've got a heartbeat value of 10 set on the connection and I've got a Twisted LoopingCall that calls the heartbeat_check(rate=2) method every 5 seconds.
However, once things get rolling I'm getting an exception thrown every other call to heartbeat_check() (based on the logging information I've got in the function that LoopingCall calls, which includes the call to heartbeat_check). I've tried all kinds of variations of heartbeat and rate values and nothing seems to help. When I debug into the heartbeat_call() it looks like some minor message traffic is being sent back and forth, am I supposed to respond to that in my message consumer?
Any hints or pointers would be very helpful, thanks in advance!!
Doug

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.