Pubsub from a Function published in the background? - python

The whole reason I use PubSub is in order to not make my Cloud Function to wait, so things happens auto. To publish a topic from a Function, the code Google Docs show is :
# Publishes a message to a Cloud Pub/Sub topic.
def publish(topic_name,message):
# Instantiates a Pub/Sub client
publisher = pubsub_v1.PublisherClient()
if not topic_name or not message:
return ('Missing "topic" and/or "message" parameter.', 400)
# References an existing topic
topic_path = publisher.topic_path(PROJECT_ID, topic_name)
message_json = json.dumps({
'data': {'message': message},
})
message_bytes = message_json.encode('utf-8')
# Publishes a message
try:
publish_future = publisher.publish(topic_path, data=message_bytes)
publish_future.result() # Verify the publish succeeded
return 'Message published.'
except Exception as e:
print(e)
return (e, 500)
Which means Function is waiting for respond, but i want my Function to spend 0 seconds on this. How can I publish and forget ? not wait for respond? (without more dependencies?)

As you can see from the comments in the code, it is waiting to make sure that the publish succeeded. It's not waiting for any sort of response from any of the subscribers on that topic. It's extremely important the code wait until the publish succeeds, otherwise the message might not actually be sent at all, and you risk losing that data entirely. This is because Cloud Functions terminates the code and locks down CPU and I/O after the function returns.
If you really want to risk it, you could try removing the call to result(), but I don't think it's a good idea if you want a reliable system.

You can schedule your functions to run at certain times of the day or every 'interval' time. In this example, this would go into your index.js file and deployed to your functions.
The code would run 'every minute' in the background. The error would simply return to your logs in google cloud console.
If you are using firestore and need to manage documents, you can make the function run on specific events like on document create or update etc..
https://firebase.google.com/docs/functions/firestore-events
EDIT: Not exactly sure if this example matches your use case but hope this example helps
exports.scheduledFx = functions.pubsub.schedule('every minute').onRun(async (context) => {
// Cron time string Description
// 30 * * * * Execute a command at 30 minutes past the hour, every hour.
// 0 13 * * 1 Execute a command at 1:00 p.m. UTC every Monday.
// */5 * * * * Execute a command every five minutes.
// 0 */2 * * * Execute a command every second hour, on the hour.
try {
//your code here
} catch (error) {
return error
}
})

Related

Facebook Marketing API - how to handle rate limit for retrieving *all* ad sets through campaign ids?

I've recently started working with the Facebook Marketing API, using the facebook_business SDK for Python (running v3.9 on Ubuntu 20.04). I think I've mostly wrapped my head around how it works, however, I'm still kind of at a loss as to how I can handle the arbitrary way in which the API is rate-limited.
Specifically, what I'm attempting to do is to retrieve all Ad Sets from all the campaigns that have ever run on my ad account, regardless of whether their effective_status is ACTIVE, PAUSED, DELETED or ARCHIVED.
Hence, I pulled all the campaigns for my ad account. These are stored in a dict, whereby the key indicates the effective_status, like so, called output:
{'ACTIVE': ['******************',
'******************',
'******************'],
'PAUSED': ['******************',
'******************',
'******************'}
Then, I'm trying to pull the Ad Set ids, like so:
import pandas as pd
import json
import re
import time
from random import *
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount # account-level info
from facebook_business.adobjects.campaign import Campaign # campaign-level info
from facebook_business.adobjects.adset import AdSet # ad-set level info
from facebook_business.adobjects.ad import Ad # ad-level info
# auth init
app_id = open(APP_ID_PATH, 'r').read().splitlines()[0]
app_secret = open(APP_SECRET_PATH, 'r').read().splitlines()[0]
token = open(APP_ACCESS_TOKEN, 'r').read().splitlines()[0]
# init the connection
FacebookAdsApi.init(app_id, app_secret, token)
campaign_types = list(output.keys())
ad_sets = {}
for status in campaign_types:
ad_sets_for_status = []
for campaign_id in output[status]:
# sleep and wait for a random time
sleepy_time = uniform(1, 3)
time.sleep(sleepy_time)
# pull the ad_sets for this particular campaign
campaign_ad_sets = Campaign(campaign_id).get_ad_sets()
for entry in campaign_ad_sets:
ad_sets_for_status.append(entry['id'])
ad_sets[status] = ad_sets_for_status
Now, this crashes at different times whenever I run it, with the following error:
FacebookRequestError:
Message: Call was not successful
Method: GET
Path: https://graph.facebook.com/v11.0/23846914220310083/adsets
Params: {'summary': 'true'}
Status: 400
Response:
{
"error": {
"message": "(#17) User request limit reached",
"type": "OAuthException",
"is_transient": true,
"code": 17,
"error_subcode": 2446079,
"fbtrace_id": "***************"
}
}
I can't reproduce the time at which it crashes, however, it certainly doesn't take ~600 calls (see here: https://stackoverflow.com/a/29690316/5080858), and as you can see, I'm sleeping ahead of every API call. You might suggest that I should just call the get_ad_sets method on the AdAccount endpoint, however, this pulls fewer ad sets than the above code does, even before it crashes. For my use-case, it's important to pull ads that are long over as well as ads that are ongoing, hence it's important that I get as much data as possible.
I'm kind of annoyed with this -- seeing as we are paying for these ads to run, you'd think FB would make it as easy as possible to retrieve info on them via API, and not introduce API rate limits similar to those for valuable data one doesn't necessarily own.
Anyway, I'd appreciate any kind of advice or insights - perhaps there's also a much better way of doing this that I haven't considered.
Many thanks in advance!
The error with 'code': 17 means that you reach the limit of call and in order to get more nodes you have to wait.
Firstly I would handle the error in this way:
from facebook_business.exceptions import FacebookRequestError
...
for status in campaign_types:
ad_sets_for_status = []
for campaign_id in output[status]:
# keep trying until the request is ok
while True:
try:
campaign_ad_sets = Campaign(campaign_id).get_ad_sets()
break
except FacebookRequestError as error:
if error.api_error_code() in [17, 80000]:
time.sleep(sleepy_time) # sleep for a period of time
for entry in campaign_ad_sets:
ad_sets_for_status.append(entry['id'])
ad_sets[status] = ad_sets_for_status
I'd like to suggest you moreover to fetch the list of nodes from the account (by using the 'level': node param in params) and by using the batch calls: I can assure you that this will help you a lot and it will decrease the program run time.
I hope I was helpful.

Telegram bot api how to schedule a notification?

I've made a bot that gets today football matches and if the user wants he can get a reminder 10 min before a selected match.
while current_time != new_hour:
now = datetime.now()
current_time = now.strftime("%H:%M")
#return notification
text_caps = "Your match starts in 10 minutes"
context.bot.send_message(chat_id=update.effective_chat.id, text=text_caps)
Obviously while the loop runs i can not use another command . I am new to programming how could i implement this so i still get the notification but while that runs i can use other commands?
Thank you!
Try to use an aiogram and you can make scheduled tasks with aiocron (store users who wants to get notification in database or in global dict)
You can schedule a job.
Let's say you have a CommandHandler("watch_match", watch_match) that listens for a /watch_match conmmand and 10 minutes later a message is supposed to arrive
def watch_match(update: Update, context: CallbackContext):
chat_id = update.effective_chat.id
ten_minutes = 60 * 10 # 10 minutes in seconds
context.job_queue.run_once(callback=send_match_info, when=ten_minutes, context=chat_id)
# Whatever you pass here as context is available in the job.context variable of the callback
def send_match_info(context: CallbackContext):
chat_id = context.job.context
context.bot.send_message(chat_id=chat_id, text="Yay")
A more detailed example in the official repository
And in the official documentation you can see the run_once function

Get Latest Message for a Confluent Kafka Topic in Python

Here's what I've tried so far:
from confluent_kafka import Consumer
c = Consumer({... several security/server settings skipped...
'auto.offset.reset': 'beginning',
'group.id': 'my-group'})
c.subscribe(['my.topic'])
msg = poll(30.0) # msg is of None type.
msg almost always ends up being None though. I think the issue might be that 'my-group' has already consumed all the messages for 'my.topic'... but I don't care whether a message has already been consumed or not - I still need the latest message. Specifically, I need the timestamp from that latest message.
I tried a bit more, and from this it looks like there are probably 25 messages in the topic, but I have no idea how to get at them:
a = c.assignment()
print(a) # Outputs [TopicPartition{topic=my.topic,partition=0,offset=-1001,error=None}]
offsets = c.get_watermark_offsets(a[0])
print(offsets) # Outputs: (25, 25)
If there are no messages because the topic has never had anything written to it at all, how can I determine that? And if that's the case, how can I determine how long the topic has existed for? I'm looking to write a script that automatically deletes any topics that haven't been written to in the past X days (14 initially - will probably tweak it over time.)
I run into the same issue, and no example on this. In my case there is one partition, and I need to read the last message, to know the some info from that message to setup the consumer/producer component I have.
Logic is that start Consumer, subscribe to topic, poll for message -> this triggers on_assign, where the rewinding happens, by assigning the modified partitions back. After on_assign finishes, the poll for msg continues and reads the last message from topic.
settings = {
"bootstrap.servers": "my.kafka.server",
"group.id": "my-work-group",
"client.id": "my-work-client-1",
"enable.auto.commit": False,
"session.timeout.ms": 6000,
"default.topic.config": {"auto.offset.reset": "largest"},
}
consumer = Consumer(settings)
def on_assign(a_consumer, partitions):
# get offset tuple from the first partition
last_offset = a_consumer.get_watermark_offsets(partitions[0])
# position [1] being the last index
partitions[0].offset = last_offset[1] - 1
consumer.assign(partitions)
consumer.subscribe(["test-topic"], on_assign=on_assign)
msg = consumer.poll(6.0)
Now msg is having the last message inside.
If anyone still needs an example for case with multiple partitions; this is how I did it:
from confluent_kafka import OFFSET_END, Consumer
settings = {
'bootstrap.servers': "my.kafka.server",
'group.id': "my-work-group",
'auto.offset.reset': "latest"
}
def on_assign(consumer, partitions):
for partition in partitions:
partition.offset = OFFSET_END
consumer.assign(partitions)
consumer = Consumer(settings)
consumer.subscribe(["test-topic"], on_assign=on_assign)
msg = consumer.poll(1.0)

Google Cloud Functions randomly retrying on success

I have a Google Cloud Function triggered by a PubSub. The doc states messages are acknowledged when the function end with success.
link
But randomly, the function retries (same execution ID) exactly 10 minutes after execution. It is the PubSub ack max timeout.
I also tried to get message ID and acknowledge it programmatically in Function code but the PubSub API respond there is no message to ack with that id.
In StackDriver monitoring, I see some messages not being acknowledged.
Here is my code : main.py
import base64
import logging
import traceback
from google.api_core import exceptions
from google.cloud import bigquery, error_reporting, firestore, pubsub
from sql_runner.runner import orchestrator
logging.getLogger().setLevel(logging.INFO)
def main(event, context):
bigquery_client = bigquery.Client()
firestore_client = firestore.Client()
publisher_client = pubsub.PublisherClient()
subscriber_client = pubsub.SubscriberClient()
logging.info(
'event=%s',
event
)
logging.info(
'context=%s',
context
)
try:
query_id = base64.b64decode(event.get('data',b'')).decode('utf-8')
logging.info(
'query_id=%s',
query_id
)
# inject dependencies
orchestrator(
query_id,
bigquery_client,
firestore_client,
publisher_client
)
sub_path = (context.resource['name']
.replace('topics', 'subscriptions')
.replace('function-sql-runner', 'gcf-sql-runner-europe-west1-function-sql-runner')
)
# explicitly ack message to avoid duplicates invocations
try:
subscriber_client.acknowledge(
sub_path,
[context.event_id] # message_id to ack
)
logging.warning(
'message_id %s acknowledged (FORCED)',
context.event_id
)
except exceptions.InvalidArgument as err:
# google.api_core.exceptions.InvalidArgument: 400 You have passed an invalid ack ID to the service (ack_id=982967258971474).
logging.info(
'message_id %s already acknowledged',
context.event_id
)
logging.debug(err)
except Exception as err:
# catch all exceptions and log to prevent cold boot
# report with error_reporting
error_reporting.Client().report_exception()
logging.critical(
'Internal error : %s -> %s',
str(err),
traceback.format_exc()
)
if __name__ == '__main__': # for testing
from collections import namedtuple # use namedtuple to avoid Class creation
Context = namedtuple('Context', 'event_id resource')
context = Context('666', {'name': 'projects/my-dev/topics/function-sql-runner'})
script_to_start = b' ' # launch the 1st script
script_to_start = b'060-cartes.sql'
main(
event={"data": base64.b64encode(script_to_start)},
context=context
)
Here is my code : runner.py
import logging
import os
from retry import retry
PROJECT_ID = os.getenv('GCLOUD_PROJECT') or 'my-dev'
def orchestrator(query_id, bigquery_client, firestore_client, publisher_client):
"""
if query_id empty, start the first sql script
else, call the given query_id.
Anyway, call the next script.
If the sql script is the last, no call
retrieve SQL queries from FireStore
run queries on BigQuery
"""
docs_refs = [
doc_ref.get() for doc_ref in
firestore_client.collection(u'sql_scripts').list_documents()
]
sorted_queries = sorted(docs_refs, key=lambda x: x.id)
if not bool(query_id.strip()) : # first execution
current_index = 0
else:
# find the query to run
query_ids = [ query_doc.id for query_doc in sorted_queries]
current_index = query_ids.index(query_id)
query_doc = sorted_queries[current_index]
bigquery_client.query(
query_doc.to_dict()['request'], # sql query
).result()
logging.info(
'Query %s executed',
query_doc.id
)
# exit if the current query is the last
if len(sorted_queries) == current_index + 1:
logging.info('All scripts were executed.')
return
next_query_id = sorted_queries[current_index+1].id.encode('utf-8')
publish(publisher_client, next_query_id)
#retry(tries=5)
def publish(publisher_client, next_query_id):
"""
send a message in pubsub to call the next query
this mechanism allow to run one sql script per Function instance
so as to not exceed the 9min deadline limit
"""
logging.info('Calling next query %s', next_query_id)
future = publisher_client.publish(
topic='projects/{}/topics/function-sql-runner'.format(PROJECT_ID),
data=next_query_id
)
# ensure publish is successfull
message_id = future.result()
logging.info('Published message_id = %s', message_id)
It looks like the pubsub message is not ack on success.
I do not think I have background activity in my code.
My question : why my Function is randomly retrying even when success ?
Cloud Functions does not guarantee that your functions will run exactly once. According to the documentation, background functions, including pubsub functions, are given an at-least-once guarantee:
Background functions are invoked at least once. This is because of the
asynchronous nature of handling events, in which there is no caller
that waits for the response. The system might, in rare circumstances,
invoke a background function more than once in order to ensure
delivery of the event. If a background function invocation fails with
an error, it will not be invoked again unless retries on failure are
enabled for that function.
Your code will need to expect that it could possibly receive an event more than once. As such, your code should be idempotent:
To make sure that your function behaves correctly on retried execution
attempts, you should make it idempotent by implementing it so that an
event results in the desired results (and side effects) even if it is
delivered multiple times. In the case of HTTP functions, this also
means returning the desired value even if the caller retries calls to
the HTTP function endpoint. See Retrying Background Functions for more
information on how to make your function idempotent.

PyAPNs and the need to Sleep between Sends

I am using PyAPNs to send notifications to iOS devices. I am often sending groups of notifications at once. If any of the tokens is bad for any reason, the process will stop. As a result I am using the enhanced setup and the following method:
apns.gateway_server.register_response_listener
I use this to track which token was the problem and then I pick up from there sending the rest. The issue is that when sending the only way to trap these errors is to use a sleep timer between token sends. For example:
for x in self.retryAPNList:
apns.gateway_server.send_notification(x, payload, identifier = token)
time.sleep(0.5)
If I don't use a sleep timer no errors are caught and thus my entire APN list is not sent to as the process stops when there is a bad token. However, this sleep timer is somewhat arbitrary. Sometimes the .5 seconds is enough while other times I have had to set it to 1. In no case has it worked without some sleep delay being added. Doing this slows down web calls and it feels less than bullet proof to enter random sleep times.
Any suggestions for how this can work without a delay between APN calls or is there a best practice for the delay needed?
Adding more code due to the request made below. Here are 3 methods inside of a class that I use to control this:
class PushAdmin(webapp2.RequestHandler):
retryAPNList=[]
channelID=""
channelName = ""
userName=""
apns = APNs(use_sandbox=True,cert_file="mycert.pem", key_file="mykey.pem", enhanced=True)
def devChannelPush(self,channel,name,sendAlerts):
ucs = UsedChannelStore()
pus = PushUpdateStore()
channelName = ""
refreshApnList = pus.getAPN(channel)
if sendAlerts:
alertApnList,channelName = ucs.getAPN(channel)
if not alertApnList: alertApnList=[]
if not refreshApnList: refreshApnList=[]
pushApnList = list(set(alertApnList+refreshApnList))
elif refreshApnList:
pushApnList = refreshApnList
else:
pushApnList = []
self.retryAPNList = pushApnList
self.channelID = channel
self.channelName = channelName
self.userName = name
self.retryAPNPush()
def retryAPNPush(self):
token = -1
payload = Payload(alert="A message from " +self.userName+ " posted to "+self.channelName, sound="default", badge=1, custom={"channel":self.channelID})
if len(self.retryAPNList)>0:
token +=1
for x in self.retryAPNList:
self.apns.gateway_server.send_notification(x, payload, identifier = token)
time.sleep(0.5)
Below is the calling class (abbreviate to reduce non-related items):
class ChannelStore(ndb.Model):
def writeMessage(self,ID,name,message,imageKey,fileKey):
notify = PushAdmin()
notify.devChannelPush(ID,name,True)
Below is the slight change I made to the placement of the sleep timer that seems to have resolved the issue. I am, however, still concerned for whether the time given will be the right amount in all circumstances.
def retryAPNPush(self):
identifier = 1
token = -1
payload = Payload(alert="A message from " +self.userName+ " posted to "+self.channelName, sound="default", badge=1, custom={"channel":self.channelID})
if len(self.retryAPNList)>0:
token +=1
for x in self.retryAPNList:
self.apns.gateway_server.send_notification(x, payload, identifier = token)
time.sleep(0.5)
Resolution:
As noted in the comments at bottom, the resolution to this problem was to move the following statement to the module level outside the class. By doing this there is no need for any sleep statements.
apns = APNs(use_sandbox=True,cert_file="mycert.pem", key_file="mykey.pem", enhanced=True)
In fact, PyAPNS will auto resend dropped notifications for you, please see PyAPNS
So you don't have to retry by yourself, you can just record what notifications have bad tokens.
The behavior of your code might be result from APNS object kept in local scope (within if len(self.retryAPNList)>0:)
I suggest you to pull out APNS object to class or module level, so that it can complete its error handling procedure and reuse the TCP connection.
Please kindly let me know if it helps, thanks :)

Categories

Resources