I'm trying to create a background task for my Django app to translate some text.
I've managed to write code that works when I execute it directly as a .py file:
import os

from google.cloud import translate

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"xxxxx\GoogleCloudKey.json"
client = translate.TranslationServiceClient()


def translate_text(source_lang, target_lang, text):
    output = client.translate_text(
        contents=[text],
        target_language_code=target_lang,
        source_language_code=source_lang,
        parent='projects/xxxx',
    )
    translated_text = ''
    for translation in output.translations:
        translated_text += translation.translated_text
    print(translated_text)
    return translated_text


def translate_table(target_lang):
    print('Received task!')
    print('Target lang:', target_lang)
    data = ['Wood and Wood Residuals', 'test application']
    # data = ModelXXX.objects.all()
    for row in data:
        if row:
            row = translate_text('en-US', target_lang, row)
but when I add the @shared_task decorator to translate_table and execute it from my Django app (using translate_table.delay(target_lang)), it throws google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded right after Making request: POST https://oauth2.googleapis.com/token. Do you have any idea what could cause this error?
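For clarity, this is roughly what the decorated version looks like (a minimal sketch; the module layout is my assumption):

# tasks.py -- hedged sketch of the decorated task; the module name is assumed
from celery import shared_task

@shared_task
def translate_table(target_lang):
    # body identical to the function shown above
    print('Received task!')
    print('Target lang:', target_lang)

# elsewhere in the Django app (e.g. a view) the task is enqueued with:
#   translate_table.delay(target_lang)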
I'm starting the Celery worker using celery -A responsiblee worker -l info -P eventlet --loglevel=DEBUG. I have also tried another pool: celery -A responsiblee worker -l info -P gevent --loglevel=DEBUG
Related
I have a Flask app which sends emails/SMSs to users at a specific time using the ETA/countdown Celery functions, with Redis as the broker. The issue is that the email and SMS tasks duplicate randomly: sometimes users get 10 emails/SMSs, sometimes 20+, even though each task is only supposed to run once. The data flow:
The initial function schedule_event_main calls the ETA tasks with the notifications:
date_event = datetime.combine(day, time.max)
schedule_ratings_email.apply_async([str(event[0])], eta=date_event)
schedule_ratings_sms.apply_async([str(event[0])], eta=date_event)
Inside schedule_ratings_email & schedule_ratings_sms, a .delay call creates the individual Celery tasks that send out the emails and SMSs to the various guests for an event.
@app.task(bind=True)
def schedule_ratings_email(self, event_id):
    """Fetch feed of URLs to crawl and queue up a task to grab and process
    each url."""
    try:
        url = SITE_URL + 'user/dashboard'
        guests = db.session.query(EventGuest).filter(EventGuest.event_id == int(event_id)).all()
        event_details = db.session.query(Event).filter(Event.id == event_id).first()
        if guests:
            if event_details.status == "archived":
                for guest in guests:
                    schedule_individual_ratings_emails.delay(guest.user.first_name, guest.event.host.first_name, guest.user.email, url)
    except Exception as e:
        log.error("Error processing ratings email for %s" % event_id, exc_info=e)
        # self.retry()
This is the final task, invoked via .delay, that sends the notifications:
@app.task()
def schedule_individual_ratings_emails(guest_name, host, guest, url):
    try:
        email_rating(guest_name, host, guest, url)
    except Exception as e:
        log.error("Error processing ratings email for %s", guest, exc_info=e)
I've tried multiple SO answers and tweaked a lot of variables, including Celery settings, but the notifications are still duplicating. It only affects the ETA/countdown tasks, and only the ones that hit 3rd-party servers; I have other ETA tasks that write to the DB and those don't have any issues.
This is an issue both locally and on Heroku (production). Current tech stack:
Flask==1.0.2
celery==4.1.0
Redis 4.0.9
Celery startup: worker: celery worker --app openseat.tasks --beat --concurrency 1 --loglevel info
Celery config details:
CELERY_ACKS_LATE = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TIMEZONE = 'Africa/Johannesburg'
CELERY_ENABLE_UTC = True
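One setting that is not in the list above but is worth checking (this is an assumption on my part, not something from the post): with the Redis transport, ETA/countdown tasks scheduled further ahead than the broker's visibility_timeout get redelivered and run again, which produces exactly this kind of duplication. A minimal sketch of raising it past the longest ETA:

# Hedged sketch: make the Redis visibility_timeout exceed the farthest ETA.
# The one-day value below is an arbitrary example, not a recommendation.
BROKER_TRANSPORT_OPTIONS = {
    'visibility_timeout': 60 * 60 * 24,  # seconds
}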
I have an SQS queue on a LocalStack server and I'm trying to consume messages from it with a Celery consumer.
It seems that the consumer is properly attached to the queue (sqs-test-queue in this example), but it does not receive any messages when I try to send one with the aws command.
My celeryconfig.py looks like this:
from kombu import (
    Exchange,
    Queue
)

broker_transport_options = {'region': REGION}
broker_transport = 'sqs'

accept_content = ['application/json']
result_serializer = 'json'
content_encoding = 'utf-8'
task_serializer = 'json'

worker_enable_remote_control = False
worker_send_task_events = False
result_backend = None

task_queues = (
    Queue('sqs-test-queue', exchange=Exchange(''), routing_key='sqs-test-queue'),
)
and my tasks.py module looks like this:
from celery import Celery
from kombu.utils.url import quote

AWS_ACCESS_KEY = quote("AWS_ACCESS_KEY")
AWS_SECRET_KEY = quote("AWS_SECRET_KEY")
LOCALSTACK = "<IP>:<PORT>"

broker_url = "sqs://{access}:{secret}@{host}".format(access=AWS_ACCESS_KEY,
                                                     secret=AWS_SECRET_KEY,
                                                     host=LOCALSTACK)

app = Celery('tasks', broker=broker_url, backend=None)
app.config_from_object('celeryconfig')


@app.task(bind=True, name='tasks.consume', acks_late=True, ignore_result=True)
def consume(self, msg):
    # DO SOMETHING WITH THE RECEIVED MESSAGE
    return True
I tried to execute it with celery -A tasks worker -l INFO -Q sqs-test-queue and everything seems OK:
...
[tasks]
. tasks.consume
[... INFO/MainProcess] Connected to sqs://AWS_ACCESS_KEY:**#<IP>:<PORT>//
[... INFO/MainProcess] celery#local ready
but when I try to send a message with aws sqs send-message --endpoint-url=http://<IP>:<PORT> --queue-url=http://localhost:<PORT>/queue/sqs-test-queue --message-body="Test message", nothing happens.
What am I doing wrong? Have I perhaps missed something in the configuration?
PS: If I try to run the command aws sqs receive-message --endpoint-url=http://<IP>:<PORT> --queue-url=http://localhost:<PORT>/queue/sqs-test-queue, I'm able to get the message.
NOTE:
I'm using Python 3.7.0 and my pip freeze looks like this:
boto3==1.10.16
botocore==1.13.16
celery==4.3.0
kombu==4.6.6
pycurl==7.43.0.3
...
I am going through the same thing as you. To fix it I did a couple of things:
I set the HOSTNAME_EXTERNAL and HOSTNAME env variables in LocalStack.
I set broker_url to sqs://{access}:{secret}@{host}:{port} (as you have it).
I made sure that the Celery worker's broker_transport_options does not include the config item wait_time_seconds, since this causes errors with LocalStack as of February 7th, 2020 (check the issue here).
Once I did those things, it started working. Hope it helps.
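A minimal sketch of what those pieces look like; the hostname, port, region and dummy credentials below are my assumptions, so adjust them to your setup:

# LocalStack container environment (e.g. in docker-compose):
#   HOSTNAME_EXTERNAL=localstack
#   HOSTNAME=localstack

AWS_ACCESS_KEY = "x"  # dummy credentials are fine for LocalStack
AWS_SECRET_KEY = "x"

broker_url = "sqs://{access}:{secret}@localstack:4566".format(
    access=AWS_ACCESS_KEY, secret=AWS_SECRET_KEY)

broker_transport_options = {
    'region': 'us-east-1',
    # note: no 'wait_time_seconds' entry -- it caused errors with LocalStack
}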
Celery can't publish or consume arbitrary messages to/from any message queue system. Use kombu for that - that is what Celery uses behind the scenes too.
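As an illustration, here is a minimal sketch of consuming raw messages from that queue with kombu directly; the broker URL, region and queue name are taken from the question and are assumptions on my side:

from kombu import Connection, Exchange, Queue

queue = Queue('sqs-test-queue', exchange=Exchange(''), routing_key='sqs-test-queue')

def handle_message(body, message):
    # process the raw message body, then acknowledge it
    print('Received:', body)
    message.ack()

with Connection('sqs://AWS_ACCESS_KEY:AWS_SECRET_KEY@<IP>:<PORT>',
                transport_options={'region': 'us-east-1'}) as conn:
    with conn.Consumer(queue, callbacks=[handle_message]):
        # block until a message arrives (or the timeout expires)
        conn.drain_events(timeout=10)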
I am building a webapp using Flask and using Celery to send mails periodically.
The problem is that whenever there is a new entry in the database, Celery doesn't see it and continues to use the old entries. I have to restart the Celery worker each time to make it work properly. Celery beat is running and I am using Redis as the broker.
Celery related functions:
@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(30.0, appointment_checkout, name='appointment_checkout')


@celery.task(name='app.Blueprints.periods.appointment_checkout')
def appointment_checkout():
    from app.Blueprints.api.routes import fetchAllAppointments, fetch_user_email, fetch_user_phone
    from app.Blueprints import db

    dt = datetime.now() + timedelta(minutes=10)
    # fa = Appointment.query.filter_by(date = dt.strftime("%Y-%m-%d"))
    fa = fetchAllAppointments()

    for i in fa:
        # send emails to clients and counsellors
        try:
            if(str(i.date.year) != dt.strftime("%Y") or str(i.date.month) != dt.strftime("%m") or str(i.date.day) != dt.strftime("%d")):
                continue
        except:
            continue

        if(i.reminderFlag == 1):
            continue

        if(int(dt.strftime("%H")) == int(i.time.hour) and int(dt.strftime("%M")) == int(i.time.minute)):
            client = fetch_user_email(i.user)
            counsellor = fetch_user_email(i.counsellor)
            client_phone = fetch_user_phone(i.user)
            counsellor_phone = fetch_user_phone(i.counsellor)

            i.reminderFlag = 1
            db.session.add(i)
            db.session.commit()

            # client email
            subject = "appointment notification"
            msg = "<h1>Greetings</h1><p>This is to notify you that your appointment is about to begin soon.</p>"

            sendmail.delay(subject, msg, client)
            sendmail.delay(subject, msg, counsellor)
            sendmsg.delay(client_phone, msg)
            sendmsg.delay(counsellor_phone, msg)
When I add something to the appointment table, Celery doesn't see the new entry; after restarting the Celery worker it does.
I am running beat and the worker using the following commands:
celery -A periods beat --loglevel=INFO
celery -A periods worker --loglevel=INFO --concurrency=2
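One hedged guess, purely an assumption on my part: if fetchAllAppointments reads through a long-lived SQLAlchemy session inside the worker process, that session can keep returning stale rows; expiring it at the start of the task forces fresh reads. A minimal sketch:

# Hedged sketch -- assumes fetchAllAppointments uses the same db.session.
@celery.task(name='app.Blueprints.periods.appointment_checkout')
def appointment_checkout():
    from app.Blueprints import db
    db.session.expire_all()  # drop cached objects so the next query hits the DB
    # ... rest of the task as above ...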
We have Python 3.6.1 set up with Django, Celery, and RabbitMQ on Ubuntu 14.04. Right now I'm using the Django debug server (for dev; Apache isn't working). My current problem is that the Celery workers get launched from Python and immediately die -- the processes show as defunct. If I use the same command in a terminal window, the worker gets created and picks up the task if there is one waiting in the queue.
Here's the command:
celery worker --app=myapp --loglevel=info --concurrency=1 --maxtasksperchild=20 -n celery_1 -Q celery
The same behaviour occurs for whichever queues are being set up.
In the terminal, we see the output myapp.settings - INFO - Loading... followed by output that describes the queue and lists the tasks. When running from Python, the last thing we see is the Loading...
In the code, we do have a check to be sure we are not running the celery command as root.
These are the Celery settings from our settings.py file:
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_IMPORTS = ('api.tasks',)
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_CONCURRENCY = 1
BROKER_POOL_LIMIT = 120  # Note: I tried this set to None but it didn't seem to make any difference
CELERYD_LOG_COLOR = False
CELERY_LOG_FORMAT = '%(asctime)s - %(processName)s - %(levelname)s - %(message)s'
CELERYD_HIJACK_ROOT_LOGGER = False
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(psconf.BASE_DIR, 'myapp_static/')
BROKER_URL = psconf.MQ_URI
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = True

CELERY_ROUTES = {}
for entry in os.scandir(psconf.PLUGIN_PATH):
    if not entry.is_dir() or entry.name == '__pycache__':
        continue
    plugin_dir = entry.name
    settings_file = f'{plugin_dir}.settings'
    try:
        plugin_tasks = importlib.import_module(settings_file)
        queue_name = plugin_tasks.QUEUENAME
    except ModuleNotFoundError as e:
        logging.warning(e)
    except AttributeError:
        logging.debug(f'The plugin {plugin_dir} will use the general worker queue.')
    else:
        CELERY_ROUTES[f'{plugin_dir}.tasks.run'] = {'queue': queue_name}
        logging.debug(f'The plugin {plugin_dir} will use the {queue_name} queue.')
Here is the part that kicks off the worker:
class CeleryWorker(BackgroundProcess):
    def __init__(self, n, q):
        self.name = n
        self.worker_queue = q
        cmd = (f'celery worker --app=myapp --loglevel=info --concurrency=1 '
               f'--maxtasksperchild=20 -n {self.name} -Q {self.worker_queue}')
        super().__init__(cmd, cwd=str(psconf.BASE_DIR))


class BackgroundProcess(subprocess.Popen):
    def __init__(self, args, **kwargs):
        super().__init__(args, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True, **kwargs)
Any suggestions as to how to get this working from Python are appreciated. I'm new to Rabbitmq/Celery.
Just in case someone else needs this... It turns out that the problem was that the shell script which kicks off this whole app is now being launched with sudo, and even though I thought I was checking so we wouldn't launch the Celery worker with sudo, I'd missed something and we were trying to launch it as root. That is a no-no. I'm now explicitly using 'sudo -u ' and the workers are starting properly.
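A hedged sketch of the change; 'celeryuser' is a made-up placeholder, since the actual user isn't named above:

# Hypothetical sketch: prefix the worker command with sudo -u so it never runs
# as root, even when the parent script itself was started with sudo.
name, worker_queue = 'celery_1', 'celery'
cmd = (
    f'sudo -u celeryuser celery worker --app=myapp --loglevel=info '
    f'--concurrency=1 --maxtasksperchild=20 -n {name} -Q {worker_queue}'
)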
I want to accept multiple concurrent requests in a Flask API. The API currently receives a "company name" through a POST request and calls the crawler engine, and each crawl takes 5-10 minutes to finish. I want to run many crawler engines in parallel, one for each incoming request. I followed this, but could not get it working. Currently, the second request cancels the first request. How can I achieve this parallelism?
Current API implementation:
app.py
app = Flask(__name__)
app.debug = True


@app.route("/api/v1/crawl", methods=['POST'])
def crawl_end_point():
    if not request.is_json:
        abort(415)
    inputs = CompanyNameSchema(request)
    if not inputs.validate():
        return jsonify(success=False, errors=inputs.errors)
    data = request.get_json()
    company_name = data.get("company_name")
    print(company_name)
    if company_name is not None:
        search = SeedListGenerator(company_name)
        search.start_crawler()
        scrap = RunAllScrapper(company_name)
        scrap.start_all()
        subprocess.call(['/bin/bash', '-i', '-c', 'myconda;scrapy crawl company_profiler;'])
    return 'Data Pushed successfully to Solr Index!', 201


if __name__ == "__main__":
    app.run(host="10.250.36.52", use_reloader=True, threaded=True)
gunicorn.sh
#!/bin/bash
NAME="Crawler-API"
FLASKDIR=/root/Public/company_profiler
SOCKFILE=/root/Public/company_profiler/sock
LOG=./logs/gunicorn/gunicorn.log
PID=./guincorn.pid
user=root
GROUP=root
NUM_WORKERS=10 # generally in the 2-4 x $(NUM_CORES)+1 range
TIMEOUT=1200
#preload_apps = False
# The maximum number of requests a worker will process before restarting.
MAX_REQUESTS=0
echo "Starting $NAME"
# Create the run directory if it doesn't exist
RUNDIR=$(dirname $SOCKFILE)
test -d $RUNDIR || mkdir -p $RUNDIR
# Start your gunicorn
exec gunicorn app:app -b 0.0.0.0:5000 \
--name $NAME \
--worker-class gevent \
--workers 5 \
--keep-alive 900 \
--graceful-timeout 1200 \
--worker-connections 5 \
--user=$USER --group=$GROUP \
--bind=unix:$SOCKFILE \
--log-level info \
--backlog 0 \
--pid=$PID \
--access-logformat='%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s' \
--error-logfile $LOG \
--log-file=-
Thanks in advance!
A better way is to use a job queue with Redis or something similar.
You can create queues for jobs, fetch results, and organize the exchange with the frontend via API requests. Every job then runs in a separate process without blocking the main application; otherwise you will need to resolve bottlenecks at every step.
A good implementation is the RQ lib or Flask-RQ for Redis.
http://python-rq.org/
Start an instance of Redis (I'm using Docker for it).
Write your own worker like this:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
Start the workers via Flask or via the console (better for debugging), then create jobs in the queue and track the results:
from flask import session
from redis import Redis
from rq import Queue
from rq.job import Job

conn = Redis()
q = Queue(connection=conn)

def crawl_end_point():
    ...

# add the task to the queue
result = q.enqueue(crawl_end_point, timeout=3600)

# simplest way: save the id of the job
session['j_id'] = result.get_id()

# get the job status
job = Job.fetch(session['j_id'], connection=conn)
job.get_status()

# get the job result
job.result
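For the frontend exchange mentioned above, here is a hedged sketch of a polling endpoint; the route name is made up:

from flask import Flask, jsonify
from redis import Redis
from rq.job import Job

app = Flask(__name__)
conn = Redis()

# Hypothetical status route for the frontend to poll (name is an assumption).
@app.route('/api/v1/status/<job_id>')
def job_status(job_id):
    job = Job.fetch(job_id, connection=conn)
    # job.result must be JSON-serializable for this to work as-is
    return jsonify(status=job.get_status(), result=job.result)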
You can also check out Celery for this purpose:
https://stackshare.io/stackups/celery-vs-redis