Can websockets "lag" when a lot of messages are received? - python

I'm trying to create a script using asyncio and websockets that should connect to 4-5 cryptocurrency exchange websockets and receive trades in real time from those exchanges. My code works and it's very simple; it looks like this:
import asyncio
import websockets
import json
subscriptions = ['btcusdt#trade', 'ethusdt#trade', 'bchusdt#trade', 'xrpusdt#trade', 'eosusdt#trade', 'ltcusdt#trade', 'trxusdt#trade', 'etcusdt#trade', 'linkusdt#trade', 'xlmusdt#trade', 'adausdt#trade', 'xmrusdt#trade', 'dashusdt#trade', 'zecusdt#trade', 'xtzusdt#trade', 'bnbusdt#trade', 'atomusdt#trade', 'ontusdt#trade', 'iotausdt#trade', 'batusdt#trade', 'vetusdt#trade', 'neousdt#trade', 'qtumusdt#trade', 'iostusdt#trade', 'thetausdt#trade', 'algousdt#trade', 'zilusdt#trade', 'kncusdt#trade', 'zrxusdt#trade', 'compusdt#trade', 'omgusdt#trade', 'dogeusdt#trade', 'sxpusdt#trade', 'kavausdt#trade', 'bandusdt#trade', 'rlcusdt#trade', 'wavesusdt#trade', 'mkrusdt#trade', 'snxusdt#trade', 'dotusdt#trade', 'defiusdt#trade', 'yfiusdt#trade', 'balusdt#trade', 'crvusdt#trade', 'trbusdt#trade', 'yfiiusdt#trade', 'runeusdt#trade', 'sushiusdt#trade', 'srmusdt#trade', 'bzrxusdt#trade', 'egldusdt#trade', 'solusdt#trade', 'icxusdt#trade', 'storjusdt#trade', 'blzusdt#trade', 'uniusdt#trade', 'avaxusdt#trade', 'ftmusdt#trade', 'hntusdt#trade', 'enjusdt#trade', 'flmusdt#trade', 'tomousdt#trade', 'renusdt#trade', 'ksmusdt#trade', 'nearusdt#trade', 'aaveusdt#trade', 'filusdt#trade', 'rsrusdt#trade', 'lrcusdt#trade', 'maticusdt#trade', 'oceanusdt#trade', 'cvcusdt#trade', 'belusdt#trade', 'ctkusdt#trade', 'axsusdt#trade', 'alphausdt#trade', 'zenusdt#trade', 'sklusdt#trade']
async def connect():
    while True:
        async with websockets.client.connect('wss://fstream.binance.com/ws/trade') as ws:
            tradeStr = {"method": "SUBSCRIBE", "params": subscriptions, 'id': 1}
            await ws.send(json.dumps(tradeStr))
            while True:
                try:
                    msg = await asyncio.wait_for(ws.recv(), 5)
                    message = json.loads(msg)
                    try:
                        print(message)
                    except Exception as e:
                        print(e)
                except asyncio.TimeoutError:
                    break

asyncio.get_event_loop().run_until_complete(connect())
In the example above, I'm connecting to Binance and receiving trades for all the available markets. I do this for several exchanges at once, but the problem happens even with a single one, as long as I'm receiving a lot of messages per second.
Each message looks like this: {"rate": "xx", "market": "xx", "amount": "xx", "side": "xx"}, so it's very small.
The big problem I'm noticing is that after the script has been running for a while, I start receiving fewer messages; many of them arrive several seconds late, and many others never arrive at all, as if they get lost or as if the connection is freezing.
Now, I know this isn't a very specific question, but what could be the problem here?
Is it possible that when a websocket receives a lot of messages per second, problems of this kind occur? I tested this setup both from my local machine and from a VPS, and in both cases I ran into the same issues. Could this be a resource problem? Or is it more likely related to the server rather than to the client (me)? I tried to be as specific as possible; I can add more detail if needed.
I read that websockets stores received messages in a buffer. Is it possible that the problem is the buffer getting filled?
Any kind of advice is appreciated!

From what you explained, and in my experience, this does seem to be related to resource management. Yes, websockets are affected when you receive a large number of messages per second, and yes, this can cause issues on your server. Why? Because the buffer is of course limited, and so is the amount of memory available to process all those messages at the same time, as mentioned in the official docs of the websockets library (version 8.1). I think the issue here is that you are opening a large number of connections at the same time, which causes memory exhaustion; of course this depends on the size of the messages and the resources of your server. This can be easily tested: try it with two VPS with different amounts of resources. If the two servers take different amounts of time to run into that state, then resources (meaning memory) are definitely part of the problem, and the expected result is that the server with fewer resources runs into it first. Here are some links to the official docs for the websockets library where they discuss these memory issues and an approach to optimizing memory usage. Hope this helps you 👍.
Memory Usage
Optimizations
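As a hedged, minimal sketch of the knobs those docs describe (not a tuned recommendation): connect() in the websockets library accepts max_size, max_queue and read_limit arguments that bound how much memory a connection may use, and the receive loop should stay as cheap as possible so the internal queue doesn't fill up. The values below are illustrative, and the subscriptions list is the one from your question.

import asyncio
import json
import websockets

async def connect():
    # Keyword arguments from the websockets memory-usage docs; the values are
    # illustrative assumptions, not tuned recommendations.
    async with websockets.connect(
        'wss://fstream.binance.com/ws/trade',
        max_size=2 ** 16,    # reject frames larger than 64 KiB
        max_queue=32,        # buffer at most 32 incoming messages in the library
        read_limit=2 ** 16,  # bytes buffered by the transport before back-pressure kicks in
    ) as ws:
        await ws.send(json.dumps({"method": "SUBSCRIBE", "params": subscriptions, "id": 1}))
        async for msg in ws:
            # Keep this handler cheap; hand heavy work off to another task or process.
            print(json.loads(msg))

asyncio.get_event_loop().run_until_complete(connect())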

Related

Will a websocket server "save and queue" messages if the client is not currently receiving them or will they get lost?

I am using the python library websocket-client to connect to a server and receive data
from websocket import create_connection
api_data = {...}
connection = create_connection(DOMAIN)
connection.send(api_data)
while True:
    data = connection.recv()
    print(data)
This works and all, but in order to write a more sophisticated, fail-safe application I need to understand how a websocket receive command works. If my program is busy doing other things, for example if I add a time.sleep(10) call in my while loop, will the updates "queue" so that when I finally call connection.recv() I get them all, or will messages from the server be lost? Any links that explain practical things like this are very welcome, because I feel like I lack some fundamental knowledge and I don't see questions like this posed anywhere.

Google PubSub message duplication

I am using the Python client (the one that comes as part of google-cloud 0.30.0) to process messages.
Sometimes (about 10% of the time) my messages are being duplicated. I will get the same message again and again, up to 50 instances within a few hours.
My subscription is set up with a 600-second ack deadline, but a message may be resent a minute after its predecessor.
While running, I occasionally get 503 errors (which I log with my policy_class).
Has anybody experienced this behavior? Any ideas?
My code looks like:
c = pubsub_v1.SubscriberClient(policy_class)
subscription = c.subscribe(c.subscription_path(my_proj, my_topic))
res = subscription.open(callback=callback_func)
res.result()

def callback_func(msg):
    try:
        log.info('got %s', msg.data)
        ...
    finally:
        msg.ack()
The client library you are using relies on a newer Pub/Sub API for subscribing called StreamingPull. One effect of this is that the subscription deadline you have set is no longer used; instead, the client library calculates one itself, and it also automatically extends the deadlines of messages for you.
When you get these duplicate messages - have you already ack'd the message when it is redelivered, or is this while you are still processing it? If you have already ack'd, are there some messages you have avoided acking? Some messages may be duplicated if they were ack'd but messages in the same batch needed to be sent again.
Also keep in mind that some duplicates are expected currently if you take over a half hour to process a message.
This seems to be an issue with the google-cloud-pubsub Python client; I upgraded to version 0.29.4 and ack() works as expected.
In general, duplicates can happen given that Google Cloud Pub/Sub offers at-least-once delivery. Typically, this rate should be very low. A rate of 10% would be very high. In this particular instance, it was likely an issue in the client libraries that resulted in excessive duplicates, which was fixed in April 2018.
For the general case of excessive duplicates there are a few things to check to determine if the problem is on the user side or not. There are two places where duplication can happen: on the publish side (where there are two distinct messages that are each delivered once) or on the subscribe side (where there is a single message delivered multiple times). The way to distinguish the cases is to look at the messageID provided with the message. If the same ID is repeated, then the duplication is on the subscribe side. If the IDs are unique, then duplication is happening on the publish side. In the latter case, one should look at the publisher to see if it is getting errors that are resulting in publish retries.
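As an illustrative way to check which case you are in with the Python client (a sketch, not part of the original answer), you can record the message_id attribute of each received message and watch for repeats, reusing the log object from your callback:

seen_ids = set()

def callback_func(msg):
    # A repeated message_id means the duplication is on the subscribe side;
    # all-unique IDs point to duplicate publishes instead.
    if msg.message_id in seen_ids:
        log.warning('duplicate delivery of message %s', msg.message_id)
    seen_ids.add(msg.message_id)
    msg.ack()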
If the issue is on the subscriber side, then one should check to ensure that messages are being acknowledged before the ack deadline. Messages that are not acknowledged within this time will be redelivered. If this is the issue, then the solution is to either acknowledge messages faster (perhaps by scaling up with more subscribers for the subscription) or by increasing the acknowledgement deadline. For the Python client library, one sets the acknowledgement deadline by setting the max_lease_duration in the FlowControl object passed into the subscribe method.
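A minimal sketch of that last point, assuming a recent google-cloud-pubsub release where subscribe() accepts a flow_control argument; the project and subscription names and the numeric values are placeholders:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'my-subscription')

# max_lease_duration caps how long the client library keeps extending the
# ack deadline of a message that is still being processed (in seconds).
flow_control = pubsub_v1.types.FlowControl(max_messages=100, max_lease_duration=600)

future = subscriber.subscribe(subscription_path, callback=callback_func, flow_control=flow_control)
future.result()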

What all possible times 'kazoo.exceptions.ConnectionLoss' is raised?

I am using Apache ZooKeeper and the Kazoo framework for one of my requirements. I have a simple ZooKeeper cluster set up and a few clients connecting to the server cluster to read node information. I am facing kazoo.exceptions.ConnectionLoss randomly (about once in fifty times).
My concern is: in which situations is this exception raised? Below are the cases I thought of.
Connection to server was lost
Server didn't respond back within timeout set in server configuration
Can there be any other reasons for this exception? I don't see documentation explaining this in detail.
I fear I don't have a ready answer, but looking at the Kazoo code I think this can come up in the following conditions (see the sketch after this list):
Socket read timeout
Socket write timeout
Deserialization failure because of timeout issues
Client creation with a high initial bytes value for the node
I tried to gather this from the Kazoo unit test code: test_connection, test_client.
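Whichever of those is the cause, the usual way to cope is to treat ConnectionLoss as retryable. A minimal sketch, assuming a reachable ensemble address and an existing znode path (both placeholders):

from kazoo.client import KazooClient
from kazoo.exceptions import ConnectionLoss

client = KazooClient(hosts='127.0.0.1:2181')
client.start()

try:
    # client.retry() re-runs the operation if it fails with a retryable error
    # such as ConnectionLoss, instead of surfacing it on the first attempt.
    data, stat = client.retry(client.get, '/my/znode')
except ConnectionLoss:
    # Still possible once the built-in retries are exhausted.
    print('gave up after repeated connection loss')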

RabbitMQ closes connection when processing long running tasks and timeout settings produce errors

I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
From researching, I understand that either a heartbeat or an increased connection timeout can be used to solve this. Both of these solutions raise errors when I attempt them. In reading answers to similar posts I've also learned that many changes have been made to RabbitMQ since those answers were posted (e.g. the default heartbeat timeout changed from 580 to 60 in RabbitMQ 3.5.5).
When specifying a heartbeat and blocked connection timeout:
credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters('XXX.XXX.XXX.XXX', port, '/', credentials, blocked_connection_timeout=2000)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
The following error is displayed:
TypeError: __init__() got an unexpected keyword argument 'blocked_connection_timeout'
When specifying heartbeat_interval=1000 in the connection parameters a similar error is shown: TypeError: __init__() got an unexpected keyword argument 'heartbeat_interval'
And similarly for socket_timeout = 1000 the following error is displayed: TypeError: __init__() got an unexpected keyword argument 'socket_timeout'
I am running RabbitMQ 3.6.1, pika 0.10.0 and python 2.7 on Ubuntu 14.04.
Why are the above approaches producing errors?
Can a heartbeat approach be used where there is a long running continuous task? For example can heartbeats be used when performing large database joins which take 30+ mins? I am in favour of the heartbeat approach as many times it is difficult to judge how long a task such as database join will take.
I've read through answers to similar questions
Update: running code from the pika documentation produces the same error.
I've run into the same problem you are describing with my own systems: dropped connections during very long tasks.
It's possible the heartbeat might help keep your connection alive, if your network setup is such that idle TCP/IP connections are forcefully dropped. If that's not the case, though, changing the heartbeat won't help.
Changing the connection timeout won't help at all. This setting is only used when initially creating the connection.
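On the TypeErrors themselves: blocked_connection_timeout only appeared in pika releases after 0.10.0, and the heartbeat keyword argument has also changed name across releases, so this is most likely a version mismatch rather than a usage error. If you do want to experiment with the heartbeat anyway, a hedged sketch for a recent pika (1.x) looks roughly like this; host, port and the values are placeholders:

import pika

credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters(
    host='XXX.XXX.XXX.XXX',
    port=5672,
    virtual_host='/',
    credentials=credentials,
    heartbeat=600,                    # seconds between heartbeats; 0 disables them
    blocked_connection_timeout=300,   # give up if the broker blocks the connection this long
)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()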
I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
There are two reasons for this, both of which you have run into already:
Connections drop randomly, even under the best of circumstances
Re-starting a process because of a re-queued message can cause problems
Having deployed RabbitMQ code with tasks ranging from less than a second to several hours, I found that acknowledging the message immediately and updating the system with status messages works best for very long tasks like this.
You will need to have a system of record (probably with a database) that keeps track of the status of a given job.
When the consumer picks up a message and starts the process, it should acknowledge the message right away and send a "started" status message to the system of record.
As the process completes, send another message to say it's done.
This won't solve the dropped connection problem, but nothing will 100% solve that anyway. Instead, it will prevent the message re-queueing problem from happening when a connection is dropped.
This solution does introduce another problem, though: when the long running process crashes, how do you resume the work?
The basic answer is to use the system of record (your database) status for the job to tell you that you need to pick up that work again. When the app starts, check the database to see if there is work that is unfinished. If there is, resume or restart that work in whatever manner is appropriate.
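A minimal sketch of that pattern with pika's BlockingConnection; job_store and run_long_task are placeholders for your system of record and your actual work, not part of the original answer:

import pika

def on_message(channel, method, properties, body):
    job_id = job_store.create(body, status='started')    # placeholder: record the job as started
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack right away, before the long work

    try:
        run_long_task(body)                               # placeholder: the 30+ minute job
        job_store.update(job_id, status='done')
    except Exception:
        job_store.update(job_id, status='failed')         # resumed or restarted on next startup

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_consume(queue='long_tasks', on_message_callback=on_message)
channel.start_consuming()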
I've seen this issue before. The reason is that you declared the queue, but you didn't bind the queue to the exchange.
For example:
@Bean(name = "test_queue")
public Queue testQueue() {
    return queue("test_queue");
}

@RabbitListener(queues = "test_queue_1")
public void listenCreateEvent() {
}
If you listen on a queue that isn't bound to the exchange, this will happen.

How to store real-time chat messages in database?

I am currently using MySQLdb for my database, and I need to integrate a real-time messaging feature. The chat demo that Tornado provides does not implement a database (whereas the blog demo does).
This messaging service will also double as email in the future (like how Facebook's messaging service works: the chat platform is also email). Regardless, I would like to make sure that my current, first chat version can later be expanded to function as email, and overall I need to store messages in a database.
Is something like this as simple as: for every chat message sent, query the database and display the message on the users' screens? Or is this method prone to high server load and poor optimization? How exactly should I structure the "infrastructure" to make this work?
(I apologize for some of the inherent subjectivity in this question; however, I prefer to "measure twice, code once.")
Input, examples, and resources appreciated.
Regards.
Tornado is a single-threaded, non-blocking server.
What this means is that if you make any blocking calls on the main thread, you will eventually kill performance. You might not notice this at first because each database call might only block for 20 ms, but once you are making more than 200 database calls per second your application will effectively be locked up.
However, that's quite a few DB calls. In your case that would be 200 people hitting send on their chat message in the same second.
What you probably want to do is use a queue with a non-blocking API. Tornado receives a chat message; you put it on the queue to be saved to the database by another process, then you send the chat message back out to the other chat members.
When someone connects to a chat session you also need to send off a request to the queue for all the previous messages; when the queue responds, you send those to the newly connected user.
That's how I would approach the problem anyway.
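A rough, hedged sketch of that shape using Tornado's own coroutines and an in-process queue; save_to_db and the URL wiring are placeholders rather than a complete chat server:

import tornado.ioloop
import tornado.web
import tornado.websocket
from tornado.queues import Queue

pending_writes = Queue()   # messages waiting to be persisted
clients = set()            # currently connected chat sockets

class ChatSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        clients.add(self)

    def on_close(self):
        clients.discard(self)

    async def on_message(self, message):
        await pending_writes.put(message)   # hand persistence off, don't block here
        for client in clients:              # fan the message out immediately
            client.write_message(message)

async def db_writer():
    # Drains the queue in the background; save_to_db is a placeholder for a
    # non-blocking database call (or a hand-off to a worker process).
    async for message in pending_writes:
        await save_to_db(message)
        pending_writes.task_done()

if __name__ == "__main__":
    app = tornado.web.Application([(r"/chat", ChatSocket)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().spawn_callback(db_writer)
    tornado.ioloop.IOLoop.current().start()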
Also see this question and answer: Any suggestion for using non-blocking MySQL api on Tornado in Python3?
Just remember, Tornado is single-threaded. It's amazing and can handle thousands of simultaneous connections, but if code in one of those connections blocks for 1 second, then NOTHING else will be done for any other connection during that second.
