I have noticed that under high load Pub/Sub gives great throughput with pretty low latency. But if I want to send a single message, the latency can often be several seconds. I have used the publish_time in the incoming message to see how long the message spent in the queue, and it is usually pretty low. I can't tell whether, under very low traffic conditions, a published message doesn't actually get sent by the client libraries right away, or whether the libraries don't deliver it to the application immediately. I am using asynchronous pull in Python.
There can be several factors that impact latency of low-throughput Pub/Sub streams. First of all, the publish-side client library does wait a period of time to try to batch messages by default. You can get a little bit of improvement by setting the max_messages property of the pubsub_v1.types.BatchSettings to 1, which will ensure that every message is sent as soon as it is ready.
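A minimal sketch of that setting (the project and topic names here are placeholders):

from google.cloud import pubsub_v1

# max_messages=1 effectively disables batching so each publish is sent immediately;
# BatchSettings also has a max_latency field that can be lowered for a similar effect.
batch_settings = pubsub_v1.types.BatchSettings(max_messages=1)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)

topic_path = publisher.topic_path("my-project", "my-topic")  # placeholder names
future = publisher.publish(topic_path, b"payload")
future.result()  # block until the publish completes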
In general, there is also the issue of cold caches in the Pub/Sub service. If your publish rate is infrequent, say, O(1) publish call every 10-15 minutes, then the service may have to load state on each publish that can delay the delivery. If low latency for these messages is very important, our current recommendation is to send a heartbeat message every few seconds to keep all of the state active. You can add an attribute to the messages to indicate it is a heartbeat message and have your subscriber ignore them.
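A rough sketch of the heartbeat approach, reusing the publisher and topic_path from the sketch above (the attribute name "heartbeat" is an arbitrary choice, and process() is a hypothetical application handler; the publisher and subscriber just have to agree on the attribute):

import time

# Subscriber side: drop heartbeats before doing real work.
def callback(message):
    if message.attributes.get("heartbeat") == "true":
        message.ack()
        return
    process(message)   # hypothetical application handler
    message.ack()

# Publisher side: publish an empty heartbeat message every few seconds
# to keep the Pub/Sub service state warm.
def send_heartbeats():
    while True:
        publisher.publish(topic_path, b"", heartbeat="true")
        time.sleep(5)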
Is there a simple method or library that allows a websocket to drop certain messages if bandwidth doesn't allow it? Or to do any one of the following?
to measure the queue size of outgoing messages that haven't yet reached a particular client
to measure the approximate bitrate that a client has been receiving recent messages at
to measure the time that a particular write_message finished being transmitted to the client
I'm using Tornado on the server side (tornado.websocket.WebSocketHandler) and vanilla JS on the client side. In my use case it's really only important that the server realizes when a client is slow and then throttles its messages (or uses lossier compression).
You can implement this on top of what you have by having the client confirm every message it gets and then use that information on the server to adapt the sending of messages to each client.
This is the only way you will know which outgoing messages haven't yet reached the client, be able to approximate the bitrate, or figure out the time it took for a message to reach the client. You must also consider that the message back to the server takes time, and that if you use timestamps on the client, they will likely not match your server's, since clients have their time set incorrectly more often than not.
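A rough sketch of that idea with Tornado (the JSON envelope, the {"ack": id} reply format, and the in-flight threshold are all arbitrary choices for illustration, not a standard protocol):

import json
import time

import tornado.websocket


class ThrottledHandler(tornado.websocket.WebSocketHandler):
    """Tracks client acks to estimate backlog and round-trip time."""

    MAX_IN_FLIGHT = 10  # arbitrary threshold: beyond this, treat the client as slow

    def open(self):
        self.next_id = 0
        self.in_flight = {}      # message id -> send timestamp
        self.latest_rtt = None   # rough per-client latency estimate

    def send_payload(self, payload):
        # Skip (or downgrade) the payload when the client is falling behind.
        if len(self.in_flight) >= self.MAX_IN_FLIGHT:
            return False
        msg_id = self.next_id
        self.next_id += 1
        self.in_flight[msg_id] = time.monotonic()
        self.write_message(json.dumps({"id": msg_id, "payload": payload}))
        return True

    def on_message(self, message):
        # The client echoes {"ack": <id>} for every message it receives.
        acked = json.loads(message).get("ack")
        sent_at = self.in_flight.pop(acked, None)
        if sent_at is not None:
            # Round-trip time and queue depth approximate the client's bandwidth.
            self.latest_rtt = time.monotonic() - sent_at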
I'm writing a video streaming service, and was thinking of streaming video via websockets.
A problem I foresee is that the client has insufficient bandwidth to receive the stream, so I want to be able to detect if I'm getting too far ahead of my client, and throttle down the messages to either a lower framerate or quality.
Can you detect when tornado is sending too much for the client to receive?
You don't have to worry about a slow network, but you do have to worry about a fast network.
You won't be able to write more data to the network than the client is able to accept. So you will not get ahead.
Let's say you're reading and sending the video in chunks. This is what your code may look like:
while True:
    # read the next chunk of video data into `chunk`
    self.write(chunk)
    await self.flush()  # write chunk to network
The await self.flush() statement will pause the loop until the chunk has been written to the network. So if it's a slow network, it will pause for longer. As you can see you don't have to worry about getting far ahead of the client.
However, if your client's network is fast, then the flush operation will also be very fast and this may block your server because this loop will keep running until all the data has been sent and the IOLoop won't get a chance to serve other clients.
For those cases, Ben Darnell, tornado's maintainer, offered a very clever solution in a google forums thread which he calls:
serve each client at a "fair" rate instead of allowing a single client to consume as much bandwidth as you can give it.
Here's the code (adapted from Ben Darnell's post):
while True:
    # Start the clock to ensure a steady maximum rate
    deadline = IOLoop.current().time() + 0.1

    # Read a 1MB chunk into `chunk`, then send it
    self.write(chunk)
    await self.flush()

    # This sleep will be instant if the deadline has already passed;
    # otherwise we'll sleep long enough to keep the transfer
    # rate around 10MB/sec (adjust the numbers above as needed
    # for your desired transfer rate)
    await gen.sleep(max(0, deadline - IOLoop.current().time()))
Now, even if the flush operation is fast, the loop will sleep until the deadline in the next statement, thereby allowing the server to serve other clients.
I am using the Python client (the one that comes as part of google-cloud 0.30.0) to process messages.
Sometimes (about 10% of the time) my messages are duplicated. I will get the same message again and again, up to 50 instances within a few hours.
My subscription is set up with a 600-second ack deadline, but a message may be resent a minute after its predecessor.
While running, I occasionally get 503 errors (which I log with my policy_class).
Has anybody experienced this behavior? Any ideas?
My code looks like this:
def callback_func(msg):
    try:
        log.info('got %s', msg.data)
        ...
    finally:
        msg.ack()

c = pubsub_v1.SubscriberClient(policy_class)
subscription = c.subscribe(c.subscription_path(my_proj, my_topic))
res = subscription.open(callback=callback_func)
res.result()
The client library you are using uses a new Pub/Sub API for subscribing called StreamingPull. One effect of this is that the subscription deadline you have set is no longer used, and instead one calculated by the client library is. The client library also automatically extends the deadlines of messages for you.
When you get these duplicate messages - have you already ack'd the message when it is redelivered, or is this while you are still processing it? If you have already ack'd, are there some messages you have avoided acking? Some messages may be duplicated if they were ack'd but messages in the same batch needed to be sent again.
Also keep in mind that some duplicates are expected currently if you take over a half hour to process a message.
This seems to be an issue with the google-cloud-pubsub Python client; I upgraded to version 0.29.4 and ack() works as expected.
In general, duplicates can happen given that Google Cloud Pub/Sub offers at-least-once delivery. Typically, this rate should be very low. A rate of 10% would be very high. In this particular instance, it was likely an issue in the client libraries that resulted in excessive duplicates, which was fixed in April 2018.
For the general case of excessive duplicates there are a few things to check to determine if the problem is on the user side or not. There are two places where duplication can happen: on the publish side (where there are two distinct messages that are each delivered once) or on the subscribe side (where there is a single message delivered multiple times). The way to distinguish the cases is to look at the messageID provided with the message. If the same ID is repeated, then the duplication is on the subscribe side. If the IDs are unique, then duplication is happening on the publish side. In the latter case, one should look at the publisher to see if it is getting errors that are resulting in publish retries.
If the issue is on the subscriber side, then one should check to ensure that messages are being acknowledged before the ack deadline. Messages that are not acknowledged within this time will be redelivered. If this is the issue, then the solution is to either acknowledge messages faster (perhaps by scaling up with more subscribers for the subscription) or by increasing the acknowledgement deadline. For the Python client library, one sets the acknowledgement deadline by setting the max_lease_duration in the FlowControl object passed into the subscribe method.
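A minimal sketch of both checks with the Python client (the project and subscription names and the 600-second lease value are placeholders; logging message_id is what makes subscribe-side duplicates visible):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")  # placeholders

def callback(message):
    # Repeated message_id values indicate subscribe-side duplicates;
    # unique ids point to retries on the publish side.
    print(message.message_id, message.publish_time)
    message.ack()

# Allow up to 600 seconds of lease extension before a message is redelivered.
flow_control = pubsub_v1.types.FlowControl(max_lease_duration=600)
future = subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
future.result()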
I'm using confluent-kafka-python (https://github.com/confluentinc/confluent-kafka-python) to send some messages to Kafka, using Python. I send messages infrequently, so want the latency to be really really low.
If I do this, I can get messages to appear to my consumer with about a 2ms delay:
conf = {"bootstrap.servers": "kafka-test-10-01",
        "queue.buffering.max.ms": 0,
        "batch.num.messages": 1,
        "queue.buffering.max.messages": 100,
        "default.topic.config": {"acks": 0}}

p = confluent_kafka.Producer(**conf)
p.produce(...)
BUT: the latency only drops to near zero after I've sent a first message with this new producer. Subsequent messages have latency near the 2ms mark.
The first message though has around a 1 second latency. Why?
Magnus Edenhill, the author of librdkafka, documented some useful parameters to set to decrease latency in any librdkafka client:
https://github.com/edenhill/librdkafka/wiki/How-to-decrease-message-latency
You don't show your consumer parameters, but from your description it sounds like the consumer is polling and (rightly) getting nothing (null messages) before the first message is published, so it then waits the default 500 ms fetch.error.backoff.ms interval before polling again and receiving the first message. After that, the messages are probably coming fast enough that the error backoff is not triggered. Perhaps try setting fetch.error.backoff.ms lower and see if that helps.
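For example, a consumer configured roughly like this (a sketch only: the group id and topic name are placeholders, and fetch.wait.max.ms is an extra knob worth trying alongside fetch.error.backoff.ms):

import confluent_kafka

consumer_conf = {
    "bootstrap.servers": "kafka-test-10-01",   # same cluster as the producer above
    "group.id": "low-latency-test",            # placeholder group id
    "fetch.wait.max.ms": 10,                   # don't let the broker hold fetches for long
    "fetch.error.backoff.ms": 10,              # retry quickly after an empty/failed fetch
}

c = confluent_kafka.Consumer(consumer_conf)
c.subscribe(["my-topic"])                      # placeholder topic name
while True:
    msg = c.poll(1.0)
    if msg is None or msg.error():
        continue
    print(msg.value())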
I am sending 20000 messages from a DEALER to a ROUTER using pyzmq.
When I pause 0.0001 seconds between messages they all arrive, but if I send them 10x faster by pausing only 0.00001 seconds per message, only around half of the messages arrive.
What is causing the problem?
What is causing the problem?
The default setup of the ZMQ IO-thread, which is responsible for this mode of operation.
I would hesitate to call it a problem, though, the more so once you invest your time and dive deeper into the excellent ZMQ concept and architecture.
Since the early versions of the ZMQ library, there have been some important parameters that help the central masterpiece (the IO-thread) keep things both stable and scalable, and thus give you this powerful framework.
Zero SHARING / Zero COPY / (almost) Zero LATENCY are maxims that do not come at zero cost.
The ZMQ.Context instance has quite a rich internal parametrisation that can be modified via API methods.
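For example, one such parameter is the number of IO threads, which can be set when the Context is instantiated (a minimal pyzmq sketch; the thread count here is arbitrary):

import zmq

# More IO threads can help when a single IO thread becomes the bottleneck
# at very high message rates (one thread is the default and is usually enough).
ctx = zmq.Context(io_threads=2)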
Let me quote from a marvelous and precious source -- Pieter HINTJENS' book, Code Connected, Volume 1.
(It is definitely worth spending the time to step through the PDF copy. The C-language code snippets will not hurt anyone's pythonic state of mind, as the key messages are in the text and stories that Pieter has crafted into his 300+ thrilling pages.)
High-Water Marks
When you can send messages rapidly from process to process, you soon discover that memory is a precious resource, and one that can be trivially filled up. A few seconds of delay somewhere in a process can turn into a backlog that blows up a server unless you understand the problem and take precautions.
...
ØMQ uses the concept of HWM (high-water mark) to define the capacity of its internal pipes. Each connection out of a socket or into a socket has its own pipe, and HWM for sending, and/or receiving, depending on the socket type. Some sockets (PUB, PUSH) only have send buffers. Some (SUB, PULL, REQ, REP) only have receive buffers. Some (DEALER, ROUTER, PAIR) have both send and receive buffers.
In ØMQ v2.x, the HWM was infinite by default. This was easy but also typically fatal for high-volume publishers. In ØMQ v3.x, it’s set to 1,000 by default, which is more sensible. If you’re still using ØMQ v2.x, you should always set a HWM on your sockets, be it 1,000 to match ØMQ v3.x or another figure that takes into account your message sizes and expected subscriber performance.
When your socket reaches its HWM, it will either block or drop data depending on the socket type. PUB and ROUTER sockets will drop data if they reach their HWM, while other socket types will block. Over the inproc transport, the sender and receiver share the same buffers, so the real HWM is the sum of the HWM set by both sides.
Lastly, the HWM-s are not exact; while you may get up to 1,000 messages by default, the real buffer size may be much lower (as little as half), due to the way libzmq implements its queues.
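Applied to the DEALER/ROUTER pair from the question, a minimal pyzmq sketch of raising the HWMs might look like this (the endpoint and the HWM value are placeholders; remember from the quote above that a ROUTER socket silently drops once its HWM is reached):

import zmq

ctx = zmq.Context()

# Sending side: raise the send HWM so bursts are buffered instead of
# hitting the default 1,000-message pipe limit.
dealer = ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.SNDHWM, 100000)
dealer.connect("tcp://127.0.0.1:5555")   # placeholder endpoint

# Receiving side: raise the receive HWM symmetrically; ROUTER drops
# (rather than blocks) anything beyond this limit.
router = ctx.socket(zmq.ROUTER)
router.setsockopt(zmq.RCVHWM, 100000)
router.bind("tcp://127.0.0.1:5555")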