We created a Lambda function that moves messages from a dead-letter (DL) SQS queue to a target SQS queue on a schedule. As part of that, I want to implement transactional behavior.
Basically, the function copies each message to the target queue and then deletes it from the DL (source) queue. If the copy to the target queue succeeds but the delete from the source queue fails, the message should be removed from the target queue again.
Here is my source code:
import json
import sys

import boto3


def get_messages_from_queue(sqs_client, queue_url, max_message_count):
    """Generates messages from an SQS queue.

    Note: this continues to generate messages until the queue is empty
    or max_message_count messages have been processed.
    Every message that is yielded will be deleted from the queue.

    :param queue_url: URL of the SQS queue to read.

    See https://alexwlchan.net/2018/01/downloading-sqs-queues/
    """
    processed_message_count = 0
    while processed_message_count < max_message_count:
        #print("Max Message Count: " + str(max_message_count))
        remaining_message_count = max_message_count - processed_message_count
        #print("Remaining messages: " + str(remaining_message_count))
        # SQS returns at most 10 messages per receive call
        receive_message_count = min(10, remaining_message_count)
        get_resp = sqs_client.receive_message(
            QueueUrl=queue_url, AttributeNames=["All"], MaxNumberOfMessages=receive_message_count
        )
        #print("Actual response:")
        #print("Number of messages received: " + str(len(get_resp["Messages"])))
        try:
            yield from get_resp["Messages"]
        except KeyError:
            # no "Messages" key means the queue returned nothing
            return
        entries = [
            {"Id": msg["MessageId"], "ReceiptHandle": msg["ReceiptHandle"]}
            for msg in get_resp["Messages"]
        ]
        resp = sqs_client.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        if len(resp["Successful"]) != len(entries):
            raise RuntimeError(
                f"Failed to delete messages: entries={entries!r} resp={resp!r}"
            )
        processed_message_count += len(get_resp["Messages"])
        print("After deleting, number of processed messages are: " + str(processed_message_count))


def lambda_handler(event, context):
    max_message_count = event['MSG_TRANSFER_LIMIT']
    src_queue_url = event["SRC_QUEUE_URL"]
    dst_queue_url = event["DEST_QUEUE_URL"]
    if src_queue_url == dst_queue_url:
        sys.exit("Source and destination queues cannot be the same.")
    sqs_client = boto3.client("sqs")
    #while processed_message_count < max_message_count:
    for message in get_messages_from_queue(sqs_client, src_queue_url, max_message_count):
        response = sqs_client.send_message(QueueUrl=dst_queue_url, MessageBody=message["Body"])
    return {
        'ProcessedMessageCount': max_message_count
    }
It is not possible to retrieve a specific message from an Amazon SQS queue. The code will call receive_message() and get whatever is in the queue. There is no capability to select or filter which message(s) will be returned.
Frankly, if you are worried that the source message won't delete, then I would recommend implementing re-try code that attempts the deletion again. The inability to delete would most likely be due either to a transient networking error (which a re-try should fix), or the fact that the message is already deleted.
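The retry suggestion could look roughly like the sketch below. It handles one message at a time rather than in batches, and the function name move_message, the attempt count, and the backoff values are assumptions for illustration, not part of the original code. The only SQS calls used are send_message and delete_message on a boto3 SQS client passed in by the caller.

```python
import time


def move_message(sqs_client, src_queue_url, dst_queue_url, message,
                 max_attempts=3, backoff_seconds=0.5):
    """Copy one message to the destination queue, then delete it from the source.

    Returns True if the source message was deleted, False if every delete
    attempt failed. In that case the message stays in the source queue and
    becomes visible again after its visibility timeout.
    """
    sqs_client.send_message(QueueUrl=dst_queue_url, MessageBody=message["Body"])

    for attempt in range(1, max_attempts + 1):
        try:
            sqs_client.delete_message(
                QueueUrl=src_queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            return True
        except Exception as exc:  # assumption: narrow this to botocore's ClientError in real code
            print(f"Delete attempt {attempt} failed: {exc!r}")
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return False
```

If the function returns False, the message exists in both queues, so the consumer of the target queue should tolerate receiving the same body twice.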
I'm implementing a telegram bot that will serve users. Initially, it used to get any new message sequentially, even in the middle of an ongoing session with another user. Because of that, anytime 2 or more users tried to use the bot, it used to get all jumbled up. To solve this I implemented a queue system that put users on hold until the ongoing conversation was finished. But this queue system is turning out to be a big hassle. I think my problems would be solved with just a method to get the new messages from a specific chat_id or user. This is the code that I'm using to get any new messages:
import json

import requests


def get_next_message_result(self, update_id: int, chat_id: str):
    """
    Get the next message of a given chat.
    If the next message is from another user, put that user on the queue and wait again for the
    expected one.
    """
    update_id += 1
    link_requisicao = f'{self.url_base}getUpdates?timeout={message_timeout}&offset={update_id}'
    result = json.loads(requests.get(link_requisicao).content)["result"]
    if len(result) == 0:
        return result, update_id  # timeout
    # read the chat id first so the reply below goes to the right chat
    message_chat_id = result[0]["message"]["chat"]["id"]
    if "text" not in result[0]["message"]:
        self.responder(speeches.no_text_speech, message_chat_id)
        return [], update_id  # message without text
    while message_chat_id != chat_id:
        self.responder(speeches.wait_speech, message_chat_id)
        if message_chat_id not in self.current_user_queue:
            print("Queuing user with the following chat_id:", message_chat_id)
        update_id += 1
        link_requisicao = f'{self.url_base}getUpdates?timeout={message_timeout}&offset={update_id}'
        result = json.loads(requests.get(link_requisicao).content)["result"]
        if len(result) == 0:
            return result, update_id  # timeout
        message_chat_id = result[0]["message"]["chat"]["id"]
        if "text" not in result[0]["message"]:
            self.responder(speeches.no_text_speech, message_chat_id)
            return [], update_id  # message without text
    return result, update_id
On another note: I use the queue so that the moment the current conversation ends, it calls the next user in line. Should I just drop the queue feature and tell the concurrent users to wait a few minutes? While ignoring any messages not from the current chat_id?
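One possible alternative to a waiting queue is to keep separate state per chat and route every incoming update to the state of the chat it came from, so no user has to be put on hold. The sketch below assumes updates shaped like the "result" array of getUpdates (update_id, message.chat.id, message.text). The name route_updates and the sessions mapping are hypothetical, not part of the bot above.

```python
from collections import defaultdict


def route_updates(updates, sessions):
    """Append each text message to the session of the chat it came from.

    `updates` is a list shaped like the "result" array of getUpdates.
    `sessions` maps chat_id -> list of texts received so far.
    Returns the offset for the next getUpdates call, or None if
    `updates` was empty.
    """
    next_offset = None
    for update in updates:
        # the next offset must be one greater than the last update_id seen
        next_offset = update["update_id"] + 1
        message = update.get("message")
        if not message or "text" not in message:
            continue  # this sketch ignores updates without text
        sessions[message["chat"]["id"]].append(message["text"])
    return next_offset


def new_sessions():
    """Create an empty chat_id -> messages mapping."""
    return defaultdict(list)
```

Each conversation can then be advanced from its own entry in sessions, independently of what the other chats are doing.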
I'm using SQS service in my application which pushes jobs to the queue based on client request.
I have the following polling mechanism setup in a separate job server.
def get_message(self):
    response = self.queue.receive_message(
        # ...
    )
    if response and ('Messages' in response) and (len(response["Messages"]) > 0):
        return response['Messages'][0]
    return None

def poll(self):
    while True:
        message = self.get_message()
        if message is None:
            continue
        try:
            config.logger.info(f'Processing message {message}')
            message_receipt_handle = message["ReceiptHandle"]
            # ...
        except Exception:
            config.logger.error(f'Error in processing message from queue: {str(traceback.format_exc())}')
// From job server
From the logs I observed that the messages are pushed to the queue, but it receives just one message and rest of the messages are just deleted without ever calling the action() method.
I'm pretty new to SQS and not sure what I'm doing wrong here. The polling takes place in a single thread.
SQS Attributes set,
ReceiveMessageWaitTimeSeconds: 20
VisibilityTimeout: 300
MaxReceiveCount: 5
I changed get_message() to return 10 messages (maximum) instead of 1 and modified poll() to iterate over the messages and now it seems to be working alright.
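A rough sketch of that change is shown below. It assumes a boto3 SQS client is passed in. The function name poll_once and the handle callback are placeholders for illustration, with handle standing in for whatever action() does in the job server.

```python
def poll_once(sqs_client, queue_url, handle):
    """Receive up to 10 messages, process each, and delete the ones that succeed.

    Returns the number of messages processed successfully.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message)
        except Exception as exc:
            # Leave the message in the queue: it becomes visible again after
            # the visibility timeout and can be retried.
            print(f"Error processing {message.get('MessageId')}: {exc!r}")
            continue
        # delete only after the message was handled successfully
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed
```

Calling poll_once in a `while True` loop gives the same overall behavior as poll(), with each message deleted individually once it has been handled.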
For my project I am using Firebase messaging to send push notifications. I have the users' Firebase tokens stored in the database, and I use them to send a push to each user. Sending takes about 100 seconds in total for 100 users. Is there a way to send pushes asynchronously (I mean, to send many push notifications at one time)?
# Code works synchronously
for user in users:
    message = messaging.Message(
        notification=messaging.Notification(
            title="Push title",
            body="Push body"
        ),
        token = user['fcmToken']
    )
    response = messaging.send(message)
Sure, you could use one of the python concurrency libraries. Here's one option:
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

from firebase_admin import messaging


def send_message(user):
    message = messaging.Message(
        notification=messaging.Notification(
            title="Push title",
            body="Push body"),
        token=user['fcmToken'])
    return messaging.send(message)


with ThreadPoolExecutor(max_workers=10) as executor:  # may want to try more workers
    future_list = []
    for u in users:
        future_list.append(executor.submit(send_message, u))
    wait(future_list, return_when=ALL_COMPLETED)

# each result is the value returned by messaging.send() for one user
print([future.result() for future in future_list])
If you want to send the same message to all tokens, you can use a single API call with a multicast message. The GitHub repo has this sample of sending a multicast message in Python:
def send_multicast():
    # [START send_multicast]
    # Create a list containing up to 500 registration tokens.
    # These registration tokens come from the client FCM SDKs.
    registration_tokens = [
        # ...
    ]

    message = messaging.MulticastMessage(
        data={'score': '850', 'time': '2:45'},
        tokens=registration_tokens,
    )
    response = messaging.send_multicast(message)
    # See the BatchResponse reference documentation
    # for the contents of response.
    print('{0} messages were sent successfully'.format(response.success_count))
    # [END send_multicast]
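Because the sample mentions a limit of 500 registration tokens per multicast message, a longer token list would need to be split first. A minimal sketch of such a split follows. The helper name chunk_tokens is an assumption for illustration and is not part of the Firebase SDK.

```python
def chunk_tokens(tokens, size=500):
    """Yield consecutive slices of `tokens`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]
```

Each yielded slice could then be passed as the tokens argument of one messaging.MulticastMessage, giving one API call per 500 users.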
I have a Google Cloud Function triggered by Pub/Sub. The documentation states that messages are acknowledged when the function ends successfully.
But randomly, the function retries (same execution ID) exactly 10 minutes after execution, which is the maximum Pub/Sub ack deadline.
I also tried to get the message ID and acknowledge it programmatically in the function code, but the Pub/Sub API responds that there is no message to ack with that ID.
In Stackdriver monitoring, I see some messages not being acknowledged.
Here is my code: main.py
import base64
import logging
import traceback
from google.api_core import exceptions
from google.cloud import bigquery, error_reporting, firestore, pubsub
from sql_runner.runner import orchestrator
def main(event, context):
bigquery_client = bigquery.Client()
firestore_client = firestore.Client()
publisher_client = pubsub.PublisherClient()
subscriber_client = pubsub.SubscriberClient()
query_id = base64.b64decode(event.get('data',b'')).decode('utf-8')
# inject dependencies
sub_path = (context.resource['name']
.replace('topics', 'subscriptions')
.replace('function-sql-runner', 'gcf-sql-runner-europe-west1-function-sql-runner')
# explicitly ack message to avoid duplicates invocations
[context.event_id] # message_id to ack
'message_id %s acknowledged (FORCED)',
except exceptions.InvalidArgument as err:
# google.api_core.exceptions.InvalidArgument: 400 You have passed an invalid ack ID to the service (ack_id=982967258971474).
'message_id %s already acknowledged',
except Exception as err:
# catch all exceptions and log to prevent cold boot
# report with error_reporting
'Internal error : %s -> %s',
if __name__ == '__main__': # for testing
from collections import namedtuple # use namedtuple to avoid Class creation
Context = namedtuple('Context', 'event_id resource')
context = Context('666', {'name': 'projects/my-dev/topics/function-sql-runner'})
script_to_start = b' ' # launch the 1st script
script_to_start = b'060-cartes.sql'
event={"data": base64.b64encode(script_to_start)},
Here is my code : runner.py
import logging
import os
from retry import retry
PROJECT_ID = os.getenv('GCLOUD_PROJECT') or 'my-dev'
def orchestrator(query_id, bigquery_client, firestore_client, publisher_client):
if query_id empty, start the first sql script
else, call the given query_id.
Anyway, call the next script.
If the sql script is the last, no call
retrieve SQL queries from FireStore
run queries on BigQuery
docs_refs = [
doc_ref.get() for doc_ref in
sorted_queries = sorted(docs_refs, key=lambda x: x.id)
if not bool(query_id.strip()) : # first execution
current_index = 0
# find the query to run
query_ids = [ query_doc.id for query_doc in sorted_queries]
current_index = query_ids.index(query_id)
query_doc = sorted_queries[current_index]
query_doc.to_dict()['request'], # sql query
'Query %s executed',
# exit if the current query is the last
if len(sorted_queries) == current_index + 1:
logging.info('All scripts were executed.')
next_query_id = sorted_queries[current_index+1].id.encode('utf-8')
publish(publisher_client, next_query_id)
def publish(publisher_client, next_query_id):
send a message in pubsub to call the next query
this mechanism allow to run one sql script per Function instance
so as to not exceed the 9min deadline limit
logging.info('Calling next query %s', next_query_id)
future = publisher_client.publish(
# ensure publish is successfull
message_id = future.result()
logging.info('Published message_id = %s', message_id)
It looks like the Pub/Sub message is not acknowledged on success.
I do not think I have any background activity in my code.
My question: why is my function randomly retrying even when it succeeds?
Cloud Functions does not guarantee that your functions will run exactly once. According to the documentation, background functions, including pubsub functions, are given an at-least-once guarantee:
Background functions are invoked at least once. This is because of the
asynchronous nature of handling events, in which there is no caller
that waits for the response. The system might, in rare circumstances,
invoke a background function more than once in order to ensure
delivery of the event. If a background function invocation fails with
an error, it will not be invoked again unless retries on failure are
enabled for that function.
Your code will need to expect that it could possibly receive an event more than once. As such, your code should be idempotent:
To make sure that your function behaves correctly on retried execution
attempts, you should make it idempotent by implementing it so that an
event results in the desired results (and side effects) even if it is
delivered multiple times. In the case of HTTP functions, this also
means returning the desired value even if the caller retries calls to
the HTTP function endpoint. See Retrying Background Functions for more
information on how to make your function idempotent.
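As a rough illustration of idempotent handling, the wrapper below skips any event whose ID has already been processed. It is only a sketch: the name make_idempotent is hypothetical, and the in-memory set stands in for a durable store, because separate function instances do not share memory. A real implementation would also need an atomic check-and-set (for example a database transaction) to be safe under concurrent deliveries.

```python
def make_idempotent(handler, seen_event_ids):
    """Wrap a background-function handler so repeated event IDs are skipped.

    `seen_event_ids` is a plain set here for illustration only; it is an
    assumption standing in for a durable store shared across instances.
    """
    def wrapper(event, context):
        if context.event_id in seen_event_ids:
            # duplicate delivery: do nothing and finish successfully
            return None
        result = handler(event, context)
        # Record the ID only after the work succeeded, so a failed run
        # can still be retried.
        seen_event_ids.add(context.event_id)
        return result
    return wrapper
```

With this shape, a second delivery of the same event_id returns without running the SQL script again.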
I'm trying to create a simple Kafka producer based on confluent_kafka. My code is the following:
#!/usr/bin/env python
from confluent_kafka import Producer
import json


def delivery_report(err, msg):
    """Called once for each message produced to indicate delivery result.
    Triggered by poll() or flush().
    see https://github.com/confluentinc/confluent-kafka-python/blob/master/README.md"""
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print('Message delivered to {} [{}]'.format(
            msg.topic(), msg.partition()))


class MySource:
    """Kafka producer"""
    def __init__(self, kafka_hosts, topic):
        """
        :kafka_host list(str): hostnames or 'host:port' of Kafka
        :topic str: topic to produce messages to
        """
        self.topic = topic
        # see https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
        config = {
            'metadata.broker.list': ','.join(kafka_hosts),
            'group.id': 'mygroup',
        }
        self.producer = Producer(config)


def main():
    topic = 'my-topic'
    message = json.dumps({
        'measurement': [1, 2, 3]})
    mys = MySource(['kafka'], topic)
    mys.producer.produce(
        topic, message, on_delivery=delivery_report)
    mys.producer.flush()


if __name__ == "__main__":
    main()
The first time I use a topic (here: "my-topic"), Kafka does react with "Auto creation of topic my-topic with 1 partitions and replication factor 1 is successful (kafka.server.KafkaApis)". However, the callback function (on_delivery=delivery_report) is never called, neither the first time nor on subsequent runs, and the program hangs at flush() (it terminates only if I set a timeout for flush). The Kafka logs do not show anything if I use an existing topic.