Celery. RuntimeError: can't start new thread - python

I use Celery (Redis as broker) to send Telegram messages (using the Telegram Bot API).
My VPS has 4 shared CPUs.
I divide the list of recipients into groups of 5 people; every 1-3 seconds, the bot sends messages to these groups.
The problem is that when testing with ~2000 recipients, at a certain point my tasks start failing with RuntimeError: can't start new thread (while CPU load does not exceed 15%, and ulimit -u reports 7795).
Please tell me what the problem is and how it can be solved.
models.py
from django.db.models.signals import post_save
from django.dispatch import receiver

class Post(models.Model):
    text = models.TextField(max_length=4096, blank=True, null=True, default=None, verbose_name="Text")

@receiver(post_save, sender=Post)
def instance_created(sender, instance, created, **kwargs):
    if created:
        pre_send_tg.apply_async((instance.id,), countdown=5)
tasks.py
import datetime
import random

import telebot
from django.apps import apps

# celery_app and token are defined elsewhere in the project (omitted in the question)

def chunks(lst, n):
    res = []
    for i in range(0, len(lst), n):
        res.append(lst[i:i + n])
    return res

@celery_app.task(ignore_result=True)
def pre_send_tg(post_id):
    try:
        Post = apps.get_model('news.post')
        TelegramUser = apps.get_model('tools.telegramuser')
        post = Post.objects.get(id=post_id)
        users = [x.tg_id for x in TelegramUser.objects.all()]
        _start = datetime.datetime.now() + datetime.timedelta(seconds=5)
        count = 0
        for i in chunks(users, 5):
            for tg_id in i:
                send_message.apply_async((tg_id, post.text), eta=_start)
            # longer pause every 100 chunks, otherwise 2-3 seconds between chunks
            if count % 100 == 0:
                _start += datetime.timedelta(seconds=random.randint(50, 60))
            else:
                _start += datetime.timedelta(seconds=random.randint(2, 3))
            count += 1
    except Exception as e:
        print(e)

@celery_app.task(ignore_result=True, time_limit=10, autoretry_for=(Exception,),
                 retry_backoff=1800, retry_kwargs={'max_retries': 2})
def send_message(tg_id, text):
    bot = telebot.TeleBot(token)
    try:
        bot.send_message(chat_id=tg_id, text=text)
    except Exception as e:
        if e.args == (
                'A request to the Telegram API was unsuccessful. Error code: 403. Description: Forbidden: bot was blocked by the user',):
            pass
        elif e.args[0].startswith(
                "A request to the Telegram API was unsuccessful. Error code: 429."):
            # re-raise so autoretry_for schedules a retry on rate limiting
            raise
        else:
            pass
celery commands
celery -A Bot worker --loglevel=INFO --concurrency=10 -n worker1@%h --purge
celery -A Bot worker --loglevel=INFO --concurrency=10 -n worker2@%h --purge
celery -A Bot worker --loglevel=INFO --concurrency=10 -n worker3@%h --purge

Related

Python stomp.py connection gets disconnected and listener stops working

I am writing a python script using the python stomp library to connect and subscribe to an ActiveMQ message queue.
My code is very similar to the examples in the documentation "Dealing with disconnects" with the addition of the timer being placed in a loop for a long running listener.
The listener class works to receive and process messages. However, after a few minutes the connection gets disconnected and the listener stops picking up messages.
Problem:
The on_disconnected method is getting called, which runs the connect_and_subscribe() method; however, it seems the listener stops working after this happens. Perhaps the listener needs to be re-initialized? When the script is run again, the listener is re-created and starts picking up messages again, but it is not practical to keep re-running the script periodically.
Question 1: How can I set this up to re-connect and re-create the listener automatically?
Question 2: Is there a better way to initialize a long-running listener rather than the timeout loop?
import os, time, datetime, stomp

_host = os.getenv('MQ_HOST')
_port = os.getenv('MQ_PORT')
_user = os.getenv('MQ_USER')
_password = os.getenv('MQ_PASSWORD')
_queue = os.getenv('QUEUE_NAME')

# Subscription id is unique to the subscription; in this case there is only one subscription per connection
sub_id = 1

def connect_and_subscribe(conn):
    conn.connect(_user, _password, wait=True)
    conn.subscribe(destination=_queue, id=sub_id, ack='client-individual')
    print('connect_and_subscribe connecting {} to with connection id {}'.format(_queue, sub_id), flush=True)

class MqListener(stomp.ConnectionListener):
    def __init__(self, conn):
        self.conn = conn
        self.sub_id = sub_id
        print('MqListener init')

    def on_error(self, frame):
        print('received an error "%s"' % frame.body)

    def on_message(self, headers, body):
        print('received a message headers "%s"' % headers)
        print('message body "%s"' % body)
        time.sleep(1)
        print('processed message')
        print('Acknowledging')
        self.conn.ack(headers['message-id'], self.sub_id)

    def on_disconnected(self):
        print('disconnected! reconnecting...')
        connect_and_subscribe(self.conn)

def initialize_mqlistener():
    conn = stomp.Connection([(_host, _port)], heartbeats=(4000, 4000))
    conn.set_listener('', MqListener(conn))
    connect_and_subscribe(conn)
    # https://github.com/jasonrbriggs/stomp.py/issues/206
    while conn.is_connected():
        time.sleep(2)
    conn.disconnect()

if __name__ == '__main__':
    initialize_mqlistener()
I was able to solve this issue by refactoring the retry-attempts loop and the on_error handler. I also installed and configured supervisor in the Docker container to run and manage the listener process. That way, if the listener program stops, it is automatically restarted by the supervisor process manager.
Updated python stomp listener script
init_listener.py
import os, json, time, datetime, stomp

_host = os.getenv('MQ_HOST')
_port = os.getenv('MQ_PORT')
_user = os.getenv('MQ_USER')
_password = os.getenv('MQ_PASSWORD')

# The listener will listen for messages that are relevant to this specific worker
# Queue name must match the 'worker_type' in job tracker file
_queue = os.getenv('QUEUE_NAME')

# Subscription id is unique to the subscription; in this case there is only one subscription per connection
_sub_id = 1
_reconnect_attempts = 0
_max_attempts = 1000

def connect_and_subscribe(conn):
    global _reconnect_attempts
    _reconnect_attempts = _reconnect_attempts + 1
    if _reconnect_attempts <= _max_attempts:
        try:
            conn.connect(_user, _password, wait=True)
            print('connect_and_subscribe connecting {} to with connection id {} reconnect attempts: {}'.format(_queue, _sub_id, _reconnect_attempts), flush=True)
        except Exception as e:
            print('Exception on disconnect. reconnecting...')
            print(e)
            connect_and_subscribe(conn)
        else:
            conn.subscribe(destination=_queue, id=_sub_id, ack='client-individual')
            _reconnect_attempts = 0
    else:
        print('Maximum reconnect attempts reached for this connection. reconnect attempts: {}'.format(_reconnect_attempts), flush=True)

class MqListener(stomp.ConnectionListener):
    def __init__(self, conn):
        self.conn = conn
        self._sub_id = _sub_id
        print('MqListener init')

    def on_error(self, headers, body):
        print('received an error "%s"' % body)

    def on_message(self, headers, body):
        print('received a message headers "%s"' % headers)
        print('message body "%s"' % body)
        message_id = headers.get('message-id')
        message_data = json.loads(body)
        task_name = message_data.get('task_name')
        prev_status = message_data.get('previous_step_status')
        if prev_status == "success":
            print('CALLING DO TASK')
            resp = True
        else:
            print('CALLING REVERT TASK')
            resp = True
        if (resp):
            print('Ack message_id {}'.format(message_id))
            self.conn.ack(message_id, self._sub_id)
        else:
            print('NON Ack message_id {}'.format(message_id))
            self.conn.nack(message_id, self._sub_id)
        print('processed message message_id {}'.format(message_id))

    def on_disconnected(self):
        print('disconnected! reconnecting...')
        connect_and_subscribe(self.conn)

def initialize_mqlistener():
    conn = stomp.Connection([(_host, _port)], heartbeats=(4000, 4000))
    conn.set_listener('', MqListener(conn))
    connect_and_subscribe(conn)
    # https://github.com/jasonrbriggs/stomp.py/issues/206
    while True:
        time.sleep(2)
        if not conn.is_connected():
            print('Disconnected in loop, reconnecting')
            connect_and_subscribe(conn)

if __name__ == '__main__':
    initialize_mqlistener()
Supervisor installation and configuration
Dockerfile
Some details removed for brevity
# Install supervisor
RUN apt-get update && apt-get install -y supervisor
# Add the supervisor configuration file
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Start supervisor with the configuration file
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
supervisord.conf
[supervisord]
nodaemon=true
logfile=/home/exampleuser/logs/supervisord.log
[program:mqutils]
command=python3 init_listener.py
directory=/home/exampleuser/mqutils
user=exampleuser
autostart=true
autorestart=true

"Runtime error event loop already running" during asyncio

I am trying out some asyncio examples found on the web:
Proxybroker example
When I run this first example:
"""Find and show 10 working HTTP(S) proxies."""
import asyncio
from proxybroker import Broker
async def show(proxies):
while True:
proxy = await proxies.get()
if proxy is None: break
print('Found proxy: %s' % proxy)
proxies = asyncio.Queue()
broker = Broker(proxies)
tasks = asyncio.gather(
broker.find(types=['HTTP', 'HTTPS'], limit=10),
show(proxies))
loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)
I get the error:
RuntimeError: This event loop is already running
But the loop completes as expected.
I'm new to concurrent code so any explanation / pseudo code of what is occurring would be appreciated.
I installed this package and ran it; it passed with no error. Are you using an IDE? Try running it from the CLI, or move it to another directory.
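For what it's worth, this error typically appears when the snippet runs inside an environment whose event loop is already running (Jupyter, Spyder and some other IDE consoles), which is why running it from the CLI works. If the IDE cannot be avoided, one commonly used workaround (not part of the original answer) is the nest_asyncio package; a minimal sketch, assuming it is installed:
# Sketch only: assumes "pip install nest_asyncio" and the "tasks" object from the
# question's snippet. nest_asyncio patches the already-running loop so that
# run_until_complete() can be called on it without the RuntimeError.
import asyncio
import nest_asyncio

nest_asyncio.apply()

loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)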

Asynchronous Service Calls Python

I am working on a service in which we have to store the user data in a DB, send an email to the user as a notification, and return a success response; completing this whole process takes some time.
Since the Python script executes synchronously, I want to run part of this process asynchronously: the user details are stored and the success response is returned, and then the mail step runs asynchronously (after the response has been returned), so that the overall response does not depend on the mail execution.
def userregistration(details):
    # store user details
    result = storeuserdb(details)
    print("storeuserdb result", result)
    if result["status"] == True:
        sendmailtouser()  # has to run asynchronously
    return result

def storeuserdb(details):
    # store code goes here
    ...

def sendmailtouser():
    # email code goes here
    ...
Is there any way to run it after returning the response from the service?
Like this:
def userregistration(details):
    # store user details
    result = storeuserdb(details)
    print("storeuserdb result", result)
    return result
    # unreachable as written -- this is the part I want to run after the response
    if result["status"] == True:
        sendmailtouser()  # has to be asynchronous
You can use Threads. Here is a little example:
import threading

def worker(num):
    """thread worker function"""
    print('Worker: %s\n' % num)
    if num == 1:
        print('I am number 1')
        input("Press Enter to continue...")
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
Output:
Worker: 3
Worker: 0
Worker: 1
Worker: 2
Worker: 4
I am number 1
Press Enter to continue...
The program does not stop while worker 1 is handled, even though an input is expected.
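Applied to the question's registration flow, a minimal sketch might look like the following; it reuses the question's storeuserdb and sendmailtouser names (their bodies are assumed) and returns the response immediately while the mail is sent on a background thread:
import threading

def storeuserdb(details):
    # store code goes here (assumed), returning a status as in the question
    return {"status": True}

def sendmailtouser():
    # email code goes here (assumed)
    pass

def userregistration(details):
    result = storeuserdb(details)
    print("storeuserdb result", result)
    if result["status"] == True:
        # fire-and-forget: the email is sent in the background,
        # so the success response does not wait for it
        threading.Thread(target=sendmailtouser, daemon=True).start()
    return result
Note that daemon=True lets the process exit without waiting for the mail thread; drop it (or join the thread at shutdown) if the mail must always be sent.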

How to check in python rabbitmq if messages not delivered in last 5 seconds

Hello, I'm trying to write code in Python with RabbitMQ. I have a queue to which I send messages, but on the consumer I have to check whether a message has been sent in the last 5 seconds, and if not, I should terminate the process. I tried to search the internet for such a function but found no relevant answer. Can you suggest something?
RabbitMQ includes a heartbeat to detect unresponsive peers/failed messages
From the docs:
Detecting Dead TCP Connections with Heartbeats
In some types of network failure, packet loss can mean that disrupted
TCP connections take a moderately long time (about 11 minutes with
default configuration on Linux, for example) to be detected by the
operating system. AMQP 0-9-1 offers a heartbeat feature to ensure that
the application layer promptly finds out about disrupted connections
(and also completely unresponsive peers). Heartbeats also defend
against certain network equipment which may terminate "idle" TCP
connections.
To enable heartbeats with the Java client:
ConnectionFactory cf = new ConnectionFactory();
// set the heartbeat timeout to 5 seconds
cf.setRequestedHeartbeat(5);
Similarly with .NET Client:
var cf = new ConnectionFactory();
// set the heartbeat timeout to 5 seconds
cf.RequestedHeartbeat = 5;
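Since the question is about Python, the same heartbeat can be requested from pika when the connection is created; a minimal sketch, assuming pika and a local broker (in pika 1.x the argument is heartbeat, some older releases used heartbeat_interval):
import pika

# Request a 5-second heartbeat from the broker, mirroring the Java/.NET snippets above
params = pika.ConnectionParameters(host='localhost', heartbeat=5)
connection = pika.BlockingConnection(params)
channel = connection.channel()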
Hope this helps.
(There's more about dead-letter exchanges in the RabbitMQ docs, and also about nack and ack (negative/positive) delivery confirms on this page, but configuring heartbeats should do the trick.)
EDIT: Sorry, there's also a Python remote procedure call (RPC) example in the docs! It requires 'pika'.. missed that!
Server Code Example:
#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='rpc_queue')

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def on_request(ch, method, props, body):
    n = int(body)
    print(" [.] fib(%s)" % n)
    response = fib(n)
    ch.basic_publish(exchange='',
                     routing_key=props.reply_to,
                     properties=pika.BasicProperties(correlation_id=props.correlation_id),
                     body=str(response))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(on_request, queue='rpc_queue')
print(" [x] Awaiting RPC requests")
channel.start_consuming()
Client Code example:
#!/usr/bin/env python
import pika
import uuid

class FibonacciRpcClient(object):
    def __init__(self):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        self.channel = self.connection.channel()
        result = self.channel.queue_declare(exclusive=True)
        self.callback_queue = result.method.queue
        self.channel.basic_consume(self.on_response, no_ack=True,
                                   queue=self.callback_queue)

    def on_response(self, ch, method, props, body):
        if self.corr_id == props.correlation_id:
            self.response = body

    def call(self, n):
        self.response = None
        self.corr_id = str(uuid.uuid4())
        self.channel.basic_publish(exchange='',
                                   routing_key='rpc_queue',
                                   properties=pika.BasicProperties(
                                       reply_to=self.callback_queue,
                                       correlation_id=self.corr_id,
                                   ),
                                   body=str(n))
        while self.response is None:
            self.connection.process_data_events()
        return int(self.response)

fibonacci_rpc = FibonacciRpcClient()
print(" [x] Requesting fib(30)")
response = fibonacci_rpc.call(30)
print(" [.] Got %r" % response)

Thread polling sqs and adding it to a python queue for processing dies

I have a piece of multi-threaded code: 3 threads that poll data from SQS and add it to a Python queue, and 5 threads that take the messages from the Python queue, process them, and send them to a back-end system.
Here is the code:
import time
import threading
import Queue

from urllib3 import PoolManager  # PoolManager presumably comes from urllib3

# sqs_queue, backend_endpoint, etc. are set up elsewhere in the original script

python_queue = Queue.Queue()

class GetDataFromSQS(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, python_queue):
        threading.Thread.__init__(self)
        self.python_queue = python_queue

    def run(self):
        while True:
            time.sleep(0.5)  # sleep for a few secs before querying again
            try:
                msgs = sqs_queue.get_messages(10)
                if msgs == None:
                    print "sqs is empty now!"
                for msg in msgs:
                    # place each message block from sqs into python queue for processing
                    self.python_queue.put(msg)
                    print "Adding a new message to Queue. Queue size is now %d" % self.python_queue.qsize()
                    # delete from sqs
                    sqs_queue.delete_message(msg)
            except Exception as e:
                print "Exception in GetDataFromSQS :: " + str(e)

class ProcessSQSMsgs(threading.Thread):
    def __init__(self, python_queue):
        threading.Thread.__init__(self)
        self.python_queue = python_queue
        self.pool_manager = PoolManager(num_pools=6)

    def run(self):
        while True:
            # grabs the message to be parsed from sqs queue
            python_queue_msg = self.python_queue.get()
            try:
                processMsgAndSendToBackend(python_queue_msg, self.pool_manager)
            except Exception as e:
                print "Error parsing:: " + str(e)
            finally:
                self.python_queue.task_done()

def processMsgAndSendToBackend(msg, pool_manager):
    if msg != "":
        ###### All the code related to processing the msg
        for individualValue in processedMsg:
            try:
                response = pool_manager.urlopen('POST', backend_endpoint, body=individualValue)
                if response == None:
                    print "Error"
                else:
                    response.release_conn()
            except Exception as e:
                print "Exception! Post data to backend: " + str(e)

def startMyPython():
    # spawn a pool of threads, and pass them queue instance
    for i in range(3):
        sqsThread = GetDataFromSQS(python_queue)
        sqsThread.start()

    for j in range(5):
        parseThread = ProcessSQSMsgs(python_queue)
        # parseThread.setDaemon(True)
        parseThread.start()

    # wait on the queue until everything has been processed
    python_queue.join()
    # python_queue.close() -- should i do this?

startMyPython()
The problem:
3 Python workers die randomly (monitored using top -p -H) once every few days, and everything is fine again if I kill the process and start the script anew. I suspect the workers that vanish are the 3 GetDataFromSQS threads. And because GetDataFromSQS dies, the other 5 workers, although running, always sleep as there is no data in the Python queue. I am not sure what I am doing wrong here, as I am pretty new to Python and followed this tutorial for creating this queuing logic and the threads - http://www.ibm.com/developerworks/aix/library/au-threadingpython/
Thanks in advance for your help. Hope I have explained my problem clear.
The problem with the thread hanging was related to getting a handle on the SQS queue. I used IAM for managing credentials and the boto SDK for connecting to SQS.
The root cause of this issue was that the boto package was reading the auth metadata from AWS and it was failing once in a while.
The fix is to edit the boto config, increasing the attempts that are made to perform the auth call to AWS.
[Boto]
metadata_service_num_attempts = 5
( https://groups.google.com/forum/#!topic/boto-users/1yX24WG3g1E )
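If editing the config file is inconvenient, boto's global config object is a ConfigParser subclass, so the same option can presumably be set at the top of the script before the SQS connection is created; a sketch under that assumption (not part of the original answer):
import boto

# Programmatic equivalent of the [Boto] file setting above (assumption: boto 2's
# boto.config behaves like a ConfigParser). Set this before connecting to SQS so
# the instance-metadata auth call is retried more times.
if not boto.config.has_section('Boto'):
    boto.config.add_section('Boto')
boto.config.set('Boto', 'metadata_service_num_attempts', '5')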
