Create multiple Google Docs using task queue - python

I'm trying to create multiple Google Docs in a background task.
I'm trying to use the task queue from Google App Engine, but I must be misunderstanding something, because I keep getting this message:
INFO 2016-05-17 15:38:46,393 module.py:787] default: "POST /update_docs HTTP/1.1" 302 -
WARNING 2016-05-17 15:38:46,393 taskqueue_stub.py:1981] Task task1 failed to execute. This task will retry in 0.800 seconds
Here is my code. I make multiple calls to the UpdateDocs handler, which needs to be executed from the queue.
# Create a GDoc in the queue (called by the queue)
class UpdateDocs(BaseHandler):
    @decorator.oauth_required
    def post(self):
        try:
            http = decorator.http()
            service = discovery.build("drive", "v2", http=http)
            # Create the file
            docs_name = self.request.get('docs_name')
            body = {
                'mimeType': DOCS_MIMETYPE,
                'title': docs_name,
            }
            service.files().insert(body=body).execute()
        except AccessTokenRefreshError:
            self.redirect("/")
# Create multiple GDocs by calling the queue
class QueueMultiDocsCreator(BaseHandler):
    def get(self):
        try:
            for i in range(5):
                name = "File_n" + str(i)
                taskqueue.add(
                    url='/update_docs',
                    params={
                        'docs_name': name,
                    })
            self.redirect('/files')
        except AccessTokenRefreshError:
            self.redirect('/')
I can see the push queue in the App Engine console, and every task is inside it, but they can't run, and I don't get why.

Kindly try to specify the worker module in your code.
As shown in Creating a new task, after calling the taskqueue.add() function, the task targets the module named worker and invokes its handler by setting the URL to /update_counter.
class EnqueueTaskHandler(webapp2.RequestHandler):
    def post(self):
        amount = int(self.request.get('amount'))
        task = taskqueue.add(
            url='/update_counter',
            target='worker',
            params={'amount': amount})
        self.response.write(
            'Task {} enqueued, ETA {}.'.format(task.name, task.eta))
And from what I have read in this blog, a worker is an important piece of a task queue: it is a Python process that reads jobs from the queue and executes them one at a time.
I hope that helps.
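Applied to the question's code, that would mean pointing the enqueued tasks at the worker module. A minimal sketch, assuming the module is actually named worker and that it routes POST /update_docs to the UpdateDocs handler:
# Enqueue the docs-creation tasks against a dedicated 'worker' module (assumed name).
class QueueMultiDocsCreator(BaseHandler):
    def get(self):
        for i in range(5):
            taskqueue.add(
                url='/update_docs',
                target='worker',  # dispatch the task to the worker module
                params={'docs_name': "File_n" + str(i)})
        self.redirect('/files')
The target value has to match the module name declared in that module's .yaml configuration.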

Related

Celery response to be displayed in flask app

Views.py
class UserCreate(Resource):
def post(self):
# try:
celery=create_user_employee.delay(auth_header, form_data, employee_id, employee_response)
# from celery import current_task
if "exc" in celery.get:
# print(celery)
return deliver_response(False, 'Celery task failed'), 400
return deliver_response(True, message="User Creation in Progress", status=200, data=data), 200
Task.py
@celery.task(name='accounts.tasks.create_user_employee')
def create_user_employee(auth_header, form_data, employee_id, employee_response):
    try:
        # add_employee(form_data, employee_id, eid)
        if "streetLane" in form_data:
            user_id = form_data.get("personalEmail", "")
            employee_address = address_post__(form_data, auth_header, user_id)
        return deliver_response(True, message="success", status=200), 200
    except Exception as e:
        return deliver_response(False, str(e), status=500), 500
Note: I am not able to return the response from tasks.py to the Flask app. The objective is to break out of the views.py function if there is any error response from tasks.py, but I am not able to import or print the result in views.py. Any help would be great.
Since a Celery task runs asynchronously, you will not get an immediate response from the job.
So this line
celery = create_user_employee.delay(auth_header, form_data, employee_id, employee_response)
does not give you the result of the task, and the celery variable doesn't tell you anything meaningful about whether the task has completed. Also, don't use celery as a variable name.
To fetch the result of a Celery task:
from celery import result
res = result.AsyncResult(job_id)
The res variable has res.status and res.result to fetch the status and the result of the task with that job_id. You can set your own task id when you call the Celery task.
When to fetch those results is up to you: either poll the job status periodically, or check it only when the result is actually required.
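For instance, here is a minimal sketch of the blocking variant in the view (the names come from the question; the 30-second timeout is an assumption):
async_result = create_user_employee.delay(auth_header, form_data, employee_id, employee_response)
try:
    # Wait at most 30 seconds for the worker to finish; get() re-raises
    # any exception raised inside the task by default.
    task_result = async_result.get(timeout=30)
except Exception:
    return deliver_response(False, 'Celery task failed'), 400
return deliver_response(True, message="User creation finished", status=200, data=task_result), 200
Blocking like this defeats part of the purpose of a background task, so polling res.status from a separate status endpoint is usually the nicer option.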
I would expect an error to be thrown at the place where you have if "exc" in celery.get: ... something like TypeError: argument of type 'function' is not iterable.
Did you try to slightly change that line to if "exc" in celery.get(): (notice the parentheses)?
Also, giving the result of delay() the name celery is misleading: whoever reads your code may think it is an instance of the Celery class, which it is not. It is an instance of the AsyncResult type, and get() is just one of its methods.

Python: Flask simple task queue without external libraries not working

I'm trying to do a simple task queue with Flask and without any DB.
In the simplest version, I have two endpoints: submit job and check status.
Submit job will add the request to the queue, and check status will return the status of a given job id (queued, running, failed, finished).
The workflow is as follows:
user submits a job
job is added to queue
user will check status of the job every 5 seconds
every status check will trigger a function that checks whether the number of running jobs is smaller than the maximum number of jobs (from config). If it is smaller, the function will spawn another thread with the job on top of the queue.
This is the simplified code:
app = Flask(__name__)

def finish_job(job_id):
    finished.append(job_id)
    last = running.pop(job_id)
    last.close()

def remove_finished():
    for j in list(running.keys()):
        if not running[j].is_alive():
            finish_job(j)

def start_jobs():
    while len(running) < config.threads and len(queue_list) > 0:
        print('running now', len(running))
        next_job = queue.pop()
        queue_list.remove(next_job[0])
        start_job(*next_job)

@app.route("/Simulation", methods=['POST'])
@authenticate
def submit_job():
    # create id
    job_id = str(uuid.uuid4())
    job_data = request.data.decode('utf-8')
    queue.append((job_id, job_data))
    queue_list.add(job_id)
    return 'QUEUED', 200

@app.route("/Simulation/<uuid:job_id>", methods=['GET'])
@authenticate
def check_status(job_id: uuid):
    job_id = str(job_id)
    remove_finished()
    start_jobs()
    if job_id in running:
        r = 'RUNNING'
    elif job_id in queue_list:
        r = 'QUEUED'
    elif job_id in finished:
        r = 'COMPLETED'
    else:
        r = 'FAILED'
    return status_response(r), 200

running = {}
finished = []
queue = []
queue_list = set()

app.run()
Now the problem is that if multiple users submit a check-status request at the same time, and there is only one slot free for running a task, both requests will spawn the job.
Is there some way to force Flask to only run one instance of a function at a time?
Thank you
After much searching, I have finally found an answer for this.
As of Flask 1.0, the built-in WSGI server runs threaded by default.
So I just needed to add a parameter to disable threading:
app.run(threaded=False)
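If you would rather keep the server threaded, an alternative (a sketch of my own, not part of the original answer) is to serialize just the scheduling step with a lock, so two concurrent check_status requests cannot both claim the last free slot:
import threading

scheduler_lock = threading.Lock()

def check_and_start_jobs():
    # Only one request at a time may inspect and update the shared queues.
    with scheduler_lock:
        remove_finished()
        start_jobs()
check_status() would then call check_and_start_jobs() instead of calling remove_finished() and start_jobs() directly.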

Python 3.7+: Wait until result is produced - api/task system

In the following code, an API gives a task to a task broker, which puts it in a queue, where it is picked up by a worker. The worker then executes the task and notifies the task broker (using a Redis message channel) that it is done, after which the task broker removes it from its queue. This works.
What I'd like is for the task broker to then return the result of the task to the API. But I'm unsure how to do so, since this is asynchronous code, and I'm having difficulty figuring it out. Can you help?
Simplified, the code is roughly as follows, but incomplete.
The API code:
@router.post('', response_model=BaseDocument)
async def post_document(document: BaseDocument):
    """Create the document with a specific type and an optional name given in the payload"""
    task = DocumentTask({ <SNIP>
    })
    task_broker.give_task(task)
    result = await task_broker.get_task_result(task)
    return result
The task broker code: the first part gives the task, the second part removes the task, and the final part is what I assume should be a blocking call on the status of the removed task.
def give_task(self, task_obj):
    self.add_task_to_queue(task_obj)
    <SNIP>
    self.message_channel.publish(task_obj)

# ...

def remove_task_from_queue(self, task):
    id_task_to_remove = task.id
    for i in range(len(task_queue)):
        if task_queue[i]["id"] == id_task_to_remove:
            removed_task = task_queue.pop(i)
            logger.debug(
                f"[TaskBroker] Task with id '{id_task_to_remove}' successfully removed!"
            )
            removed_task["status"] = "DONE"
            return

# ...

async def get_task_result(self, task):
    return task.result
My intuition is to implement get_task_result so that it blocks on task.result until it is modified, and to modify it in remove_task_from_queue when the task is removed from the queue (and thus done).
Any idea how to do this asynchronously?
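One way to express that intuition is to attach an asyncio.Event to each task and await it in get_task_result. This is only a sketch under assumptions: the done_event attribute is invented here, and it assumes the message-channel callback runs on the same event loop.
import asyncio

def give_task(self, task_obj):
    task_obj.done_event = asyncio.Event()  # hypothetical attribute, created per task
    self.add_task_to_queue(task_obj)
    self.message_channel.publish(task_obj)

def remove_task_from_queue(self, task):
    # ... existing removal logic from the question ...
    task.done_event.set()  # signal that task.result is now available

async def get_task_result(self, task, timeout=60):
    # Cooperatively block until the worker reports completion, or time out.
    await asyncio.wait_for(task.done_event.wait(), timeout)
    return task.result
If the Redis callback runs on a different thread, the event would have to be set via loop.call_soon_threadsafe(task.done_event.set) instead.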

Google Cloud Functions randomly retrying on success

I have a Google Cloud Function triggered by Pub/Sub. The docs state that messages are acknowledged when the function ends with success.
link
But randomly, the function retries (same execution ID) exactly 10 minutes after the execution, which is the Pub/Sub ack max timeout.
I also tried to get the message ID and acknowledge it programmatically in the function code, but the Pub/Sub API responds that there is no message to ack with that ID.
In Stackdriver monitoring, I see some messages not being acknowledged.
Here is my code: main.py
import base64
import logging
import traceback

from google.api_core import exceptions
from google.cloud import bigquery, error_reporting, firestore, pubsub

from sql_runner.runner import orchestrator

logging.getLogger().setLevel(logging.INFO)


def main(event, context):
    bigquery_client = bigquery.Client()
    firestore_client = firestore.Client()
    publisher_client = pubsub.PublisherClient()
    subscriber_client = pubsub.SubscriberClient()

    logging.info(
        'event=%s',
        event
    )
    logging.info(
        'context=%s',
        context
    )

    try:
        query_id = base64.b64decode(event.get('data', b'')).decode('utf-8')
        logging.info(
            'query_id=%s',
            query_id
        )
        # inject dependencies
        orchestrator(
            query_id,
            bigquery_client,
            firestore_client,
            publisher_client
        )

        sub_path = (context.resource['name']
                    .replace('topics', 'subscriptions')
                    .replace('function-sql-runner', 'gcf-sql-runner-europe-west1-function-sql-runner')
                    )
        # explicitly ack the message to avoid duplicate invocations
        try:
            subscriber_client.acknowledge(
                sub_path,
                [context.event_id]  # message_id to ack
            )
            logging.warning(
                'message_id %s acknowledged (FORCED)',
                context.event_id
            )
        except exceptions.InvalidArgument as err:
            # google.api_core.exceptions.InvalidArgument: 400 You have passed an invalid ack ID to the service (ack_id=982967258971474).
            logging.info(
                'message_id %s already acknowledged',
                context.event_id
            )
            logging.debug(err)
    except Exception as err:
        # catch all exceptions and log to prevent cold boot
        # report with error_reporting
        error_reporting.Client().report_exception()
        logging.critical(
            'Internal error : %s -> %s',
            str(err),
            traceback.format_exc()
        )


if __name__ == '__main__':  # for testing
    from collections import namedtuple  # use a namedtuple to avoid class creation

    Context = namedtuple('Context', 'event_id resource')
    context = Context('666', {'name': 'projects/my-dev/topics/function-sql-runner'})

    script_to_start = b' '  # launch the 1st script
    script_to_start = b'060-cartes.sql'

    main(
        event={"data": base64.b64encode(script_to_start)},
        context=context
    )
Here is my code: runner.py
import logging
import os

from retry import retry

PROJECT_ID = os.getenv('GCLOUD_PROJECT') or 'my-dev'


def orchestrator(query_id, bigquery_client, firestore_client, publisher_client):
    """
    If query_id is empty, start the first sql script;
    else, run the given query_id.
    Either way, call the next script.
    If the sql script is the last one, no call.

    Retrieve SQL queries from Firestore,
    run queries on BigQuery.
    """
    docs_refs = [
        doc_ref.get() for doc_ref in
        firestore_client.collection(u'sql_scripts').list_documents()
    ]
    sorted_queries = sorted(docs_refs, key=lambda x: x.id)

    if not bool(query_id.strip()):  # first execution
        current_index = 0
    else:
        # find the query to run
        query_ids = [query_doc.id for query_doc in sorted_queries]
        current_index = query_ids.index(query_id)

    query_doc = sorted_queries[current_index]

    bigquery_client.query(
        query_doc.to_dict()['request'],  # sql query
    ).result()

    logging.info(
        'Query %s executed',
        query_doc.id
    )

    # exit if the current query is the last one
    if len(sorted_queries) == current_index + 1:
        logging.info('All scripts were executed.')
        return

    next_query_id = sorted_queries[current_index + 1].id.encode('utf-8')
    publish(publisher_client, next_query_id)


@retry(tries=5)
def publish(publisher_client, next_query_id):
    """
    Send a message to Pub/Sub to call the next query.
    This mechanism allows running one sql script per Function instance,
    so as to not exceed the 9 min deadline limit.
    """
    logging.info('Calling next query %s', next_query_id)
    future = publisher_client.publish(
        topic='projects/{}/topics/function-sql-runner'.format(PROJECT_ID),
        data=next_query_id
    )
    # ensure publish is successful
    message_id = future.result()
    logging.info('Published message_id = %s', message_id)
It looks like the Pub/Sub message is not acked on success.
I do not think I have background activity in my code.
My question: why is my function randomly retrying even on success?
Cloud Functions does not guarantee that your functions will run exactly once. According to the documentation, background functions, including pubsub functions, are given an at-least-once guarantee:
Background functions are invoked at least once. This is because of the
asynchronous nature of handling events, in which there is no caller
that waits for the response. The system might, in rare circumstances,
invoke a background function more than once in order to ensure
delivery of the event. If a background function invocation fails with
an error, it will not be invoked again unless retries on failure are
enabled for that function.
Your code will need to expect that it could possibly receive an event more than once. As such, your code should be idempotent:
To make sure that your function behaves correctly on retried execution
attempts, you should make it idempotent by implementing it so that an
event results in the desired results (and side effects) even if it is
delivered multiple times. In the case of HTTP functions, this also
means returning the desired value even if the caller retries calls to
the HTTP function endpoint. See Retrying Background Functions for more
information on how to make your function idempotent.
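For a Pub/Sub-triggered function like this one, a common way to get idempotency is to record processed event IDs and skip duplicates. A minimal sketch, not from the original code, using a hypothetical processed_events Firestore collection:
import logging
from google.cloud import firestore

firestore_client = firestore.Client()

def main(event, context):
    # Idempotency guard: remember which event_ids were already handled.
    event_ref = firestore_client.collection('processed_events').document(context.event_id)
    if event_ref.get().exists:
        logging.info('event_id %s already processed, skipping', context.event_id)
        return
    # ... run the original orchestration here ...
    event_ref.set({'processed_at': firestore.SERVER_TIMESTAMP})
For a stricter guard, event_ref.create() could be used instead of the get()/set() pair, since create() fails if the document already exists.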

Python multithreading is mixing the data of different requests in Django

I am using Python multithreading to run a task that takes around 2 to 3 minutes; I have made one API endpoint for it in a Django project.
Here is my code:
from threading import Thread

def myendpoint(request):
    print("hello")
    lis = [ *args ]
    obj = Model.objects.get(name="jax")
    T1 = MyThreadClass(lis, obj)
    T1.start()
    T1.deamon = True
    return HttpResponse("successful", status=200)

class MyThreadClass(Thread):
    def __init__(self, lis, obj):
        Thread.__init__(self)
        self.lis = lis
        self.obj = obj

    def run(self):
        for i in self.lis:
            res = Func1(i)
            self.obj.someattribute = res
            self.obj.save()

def Func1(i):
    '''Some big codes'''
    context = func2(*args)
    return context

def func2(*args):
    ''' some codes '''
    return res
With this multithreading I get a quick response from the Django server when calling the endpoint, as the big task is thrown into another thread and the endpoint thread terminates at its return statement without keeping track of the spawned thread.
This works correctly if I hit the URL once, but if I hit the URL a second time as soon as the first execution starts, I can see the second request on the console but I don't get any response from it.
And if I hit the same URL from two different clients at the same time, the individual data gets mixed up and I see some records of one client's request in the other client's data.
I am testing it on my local Django runserver.
Please help. I know about Celery, so don't recommend Celery; my task is not long enough to justify it. Just tell me why this is happening, or whether it can be fixed. I want to achieve this with multithreading.
