Communicating between processes with multiprocessing - Python

I'm working on a project that will handle thousands of accounts at a time, with a limit of x workers running at once. I have a Handler class that's going to be the core of the program.
When the Handler class is initialized, it creates a new IMAP process that monitors an email inbox for incoming mail. It also checks how many created accounts are stored locally; if that number is < x, it starts x workers running an account creator and stores the created accounts locally.
When the IMAP process receives a new email, it checks the contents of the email to determine the required processing, and it should alert the Handler process what to do with the received email.
I'm still new to threading, and especially multiprocessing. I have used a queue from a parent thread to send commands to a child thread before, but in this use case I need the child IMAP process to be able to tell the Handler when a new email arrives and what to do with it.
How can I start x workers with multiprocessing while being able to communicate child->parent and child->child?
Example of my IMAP class (my other classes use the same format):
import configparser
import imaplib

class IMAP(object):
    def __init__(self):
        # Note: ConfigParser.read() returns the list of files it parsed,
        # not the parser, so build the parser first and index it afterwards.
        parser = configparser.ConfigParser()
        parser.read('config.ini')
        self._config = parser['imap']
        self._mail = imaplib.IMAP4_SSL(self._config['host'])
        self._mail.login(self._config['username'], self._config['password'])
        self._mail.list()

    def run(self):
        while True:
            # login to email and monitor
            pass
EDIT
I apologize for the confusion; I'm bad at describing my thoughts. I'm using multiprocessing, not threading. Here's a rough diagram (image not shown; the x in the account-creation step stands for x workers).
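A minimal sketch of one way to wire this up, assuming a multiprocessing.Queue shared between the Handler and its children for child->parent messages (every name below is hypothetical, not taken from the project):

import multiprocessing

def imap_worker(events):
    # Child -> parent: report each new email on the shared queue.
    # A real version would poll the inbox in a loop.
    events.put(('new_mail', 'details of the message'))

def account_worker(events, worker_id):
    # Workers report back on the same queue (child -> parent); a second
    # shared Queue between workers would give you child -> child messaging.
    events.put(('account_created', worker_id))

if __name__ == '__main__':
    events = multiprocessing.Queue()
    multiprocessing.Process(target=imap_worker, args=(events,)).start()
    x = 3  # number of workers
    for i in range(x):
        multiprocessing.Process(target=account_worker, args=(events, i)).start()
    # Handler main loop: react to whatever the children report.
    for _ in range(x + 1):  # one imap event + x worker events in this sketch
        kind, payload = events.get()
        print(kind, payload)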

Related

Telegram bot - blocking synchronous function

I have a telegram bot written in python that lets users create EC2 instances in AWS. The code is the following:
# We create the new EC2 instance for the new user
instance_id, ec2 = generateInstanceForUser(user_id)
i = ec2.Instance(id=instance_id) # instance id
i.start()
i.wait_until_running()
i.load()
time.sleep(45)
# Create account in DB
createAccountDB(user_id, username, user.mail, instance_id)
# Now that the instance and the account have been created, the settings have to be updated too
updateSettings(user_id, dictChange)
The problem is that the function generateInstanceForUser(user_id) blocks the workflow, as do the following five lines (obviously, given the time.sleep() call). The last function, updateSettings(), connects via SSH to the just-created machine and performs some operations. Delays aside, this workflow works well.
HOWEVER, since I am using a Telegram bot, the bot freezes for about two minutes while this portion of the code runs. As a result, if other users send commands, the bot does not respond, which is obviously not desirable.
NOTE: the functions used come from the boto3 library.
QUESTION
Do you know of an alternative that avoids blocking the workflow while this code executes, so that the Telegram bot's UX doesn't suffer? Thank you.
I found the answer myself. I just encapsulated the blocking portion of the code inside another function and used threading to run it in a parallel thread. This way, the main thread does not block and the bot keeps working normally:
import threading

threads = []
t = threading.Thread(target=workerFunc,
                     args=(apiKey, apiSecret, user_id, startStepValue,
                           username, user, bot, update, leverageValue))
threads.append(t)
t.start()
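For reference, a minimal sketch of the encapsulation pattern, with the blocking EC2 calls moved into the worker function (workerFunc's real signature is the poster's; the simplified body below is an assumption):

import threading
import time

def workerFunc(user_id, username, user, dictChange):
    # All of the blocking work now runs off the main thread.
    instance_id, ec2 = generateInstanceForUser(user_id)  # poster's helper
    i = ec2.Instance(id=instance_id)
    i.start()
    i.wait_until_running()
    i.load()
    time.sleep(45)
    createAccountDB(user_id, username, user.mail, instance_id)  # poster's helper
    updateSettings(user_id, dictChange)  # poster's helper

t = threading.Thread(target=workerFunc, args=(user_id, username, user, dictChange))
t.start()  # returns immediately; the bot keeps responding to other users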

(Django) Running an asynchronous server task continuously in the background

I want a class to run on my server that holds a connected Bluetooth socket and continuously checks for incoming data, which can then be interpreted. In principle the class structure would look like this:
Interpreter:
-> connect (initializes the class and starts the loop)
-> loop (runs continuously in the background)
-> disconnect (stops the loop)
This class should be initialized at some point and then run continuously in the background; from time to time an HTTP request might need data from the class's attributes, but it should run on its own.
I don't know how to accomplish this, and I don't want a full step-by-step description, but I would like to know where to start, such as what this kind of process is called.
Django on its own doesn't support background processes - everything is based on the request-response cycle.
I don't know if what you're trying to do even has a dedicated name, but it is most certainly possible. Just don't tie the solution to Django.
The way I would accomplish this is to run a separate Python process that is responsible for keeping the connection to the device and, upon request, returning the required data in some way.
The only difficulty is deciding how to communicate with that process from Django. Since, as I said, Django is request-based, the secondary app could expose data to your Django app in any of the following ways (a sketch of the first option follows the list):
Expose a dead-simple HTTP REST API
Expose a UNIX socket that returns data immediately after connection
Continuously dump data to some file/database/mmap/queue that Django can read
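A minimal sketch of the first option, using only the standard library in the secondary process (the port and payload are made up for illustration):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

latest_reading = {'value': None}  # updated by the Bluetooth loop elsewhere

class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Django would call e.g. GET http://localhost:8001/ and parse the JSON.
        body = json.dumps(latest_reading).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('localhost', 8001), DataHandler).serve_forever()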

Python producer / consumer with data persistence in database?

I'm writing a producer/consumer to suit my needs at work.
Generally there's a producer thread that fetches some log data from a remote server and puts it in the queue, and one or more consumer threads that read data from the queue and do some work on it. After that, both the data and the result need to be saved (e.g. in a sqlite3 db) for later analysis.
To make sure that each piece of log is processed only once, I have to query the database before consuming each item to check whether it has already been handled. I wonder if there is a better way to accomplish this; with more than one consumer thread, database locking seems to be a problem.
Relevant code:
import Queue
import threading
import time

import requests

out_queue = Queue.Queue()

class ProducerThread(threading.Thread):
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # Read the remote log, place a chunk into the out queue,
            # and sleep for some time.
            resp = requests.get("http://example.com")
            self.out_queue.put(resp)
            time.sleep(10)

class ConsumerThread(threading.Thread):
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # Consume the data.
            chunk = self.out_queue.get()
            # Check whether the chunk has been consumed before: query the database.
            flag = query_database(chunk)
            if not flag:
                do_something_with(chunk)
                # Persist the data and other info: insert into the database.
                data_persist()
            else:
                print("data has been consumed before.")
            # task_done() must run once per get(), in both branches,
            # or out_queue.join() would block forever on duplicate chunks.
            self.out_queue.task_done()

def main():
    # Just one producer thread.
    t = ProducerThread(out_queue)
    t.setDaemon(True)
    t.start()

    for i in range(3):
        ct = ConsumerThread(out_queue)
        ct.setDaemon(True)
        ct.start()

    # Wait on the queue until everything has been processed.
    out_queue.join()

main()
If the logs read from the remote server are not duplicated/repeated, then there is no need to check whether a log has been processed before: the Queue class implements all the required locking semantics, so Queue.get() guarantees that a given item is retrieved by exactly one ConsumerThread.
If the logs could be duplicated (I guess not), then you should do the checking in ProducerThread (before adding the logs to the queue) rather than in ConsumerThread. That way you don't need to think about locking, since only one thread ever performs the check; a sketch follows.
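A minimal sketch of producer-side deduplication, assuming a chunk can be reduced to a hashable key (the key derivation below is made up for illustration):

class DedupProducerThread(ProducerThread):
    def __init__(self, out_queue):
        ProducerThread.__init__(self, out_queue)
        self.seen = set()  # keys of chunks already enqueued

    def run(self):
        while True:
            resp = requests.get("http://example.com")
            key = hash(resp.content)  # stand-in for a real chunk identity
            # Only the single producer touches `seen`, so no locking is needed.
            if key not in self.seen:
                self.seen.add(key)
                self.out_queue.put(resp)
            time.sleep(10)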
Update, based on @dofine's confirmation of my understanding of the requirements in the comments below:
For points #2 and #3, you may need a lightweight persistent queue such as FifoDiskQueue in queuelib. To be honest, I haven't used this lib before, but I think it should work for you; please check it out (a small usage sketch follows this list).
For point #1, I guess you can achieve it with any (non-memory) database, in combination with a second FifoDiskQueue:
The second queue serves to immediately re-queue a log if one consumer thread fails to process it (see my first comment below for the idea).
There is a single table in the db. The producer thread only ever adds new records to it and never updates any; the consumer threads only update the records they have picked from the queue.
With the above logic, you should never need to lock the table.
On application startup (prior to starting the consumers), you can have the producer query the db for logs whose processing was "lost" due to an unexpected termination of the application.
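A minimal usage sketch of queuelib's FifoDiskQueue (items must be bytes; the path names a directory the queue manages on disk):

from queuelib import FifoDiskQueue

q = FifoDiskQueue("retry_queue")  # persisted on disk, survives restarts
q.push(b"a log chunk that failed processing")
chunk = q.pop()  # oldest pushed bytes, or None when the queue is empty
q.close()        # flushes queue state to disk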

Initiate a parallel process from within a python script?

I'm building a Telegram bot, and to start I used the structure from one of the API wrapper's examples. In the script there is an infinite loop that polls the Telegram API for new messages for the bot and processes each new message one by one.
while True:
    for update in bot.getUpdates(offset=LAST_UPDATE_ID, timeout=10):
        chat_id = update.message.chat.id
        update_id = update.update_id
        if update.message.text:
            pass  # do things with the message / start other functions and so on
What I can foresee already is that some messages/requests will take longer to process, and other messages that arrive at the same time will have to wait. For the user this looks like a delay in answering, and it boils down to a simple dependency: more users chatting = more delay.
I was thinking: can I have this main script, bot.py, run and check for new messages, and each time a message arrives, have it kick off another script, answer.py, to process the message and reply?
And start as many of those answer.py scripts in parallel as needed.
I could also use bot.py to log all incoming messages into a DB along with reference data about the sending user, and then have another process work through all newly logged data and mark it as answered - but that process, too, should handle each new entry in parallel.
I'm not a guru in Python and am asking for ideas and guidance on how to approach this. Thank you!
What you need are threads, or a framework that can handle many requests asynchronously, e.g. Twisted, Tornado, or asyncio in Python 3.4+.
Here is an implementation using threads:
import threading

def handle(message):
    pass  # do your response here

offset = None
while True:
    for update in bot.getUpdates(offset=offset, timeout=10):
        if update.message.text:
            t = threading.Thread(target=handle, args=(update.message,))
            t.start()
        offset = update.update_id + 1
        # log the message if you want
This way, the call to handle() does not block, and the loop can go on handling the next message.
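If spawning one thread per message becomes a concern, the standard library's concurrent.futures gives you a bounded pool instead; a minimal sketch, assuming the same bot object and handle() function as above:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)  # at most 8 messages handled at once
offset = None
while True:
    for update in bot.getUpdates(offset=offset, timeout=10):
        if update.message.text:
            pool.submit(handle, update.message)  # queued if all workers are busy
        offset = update.update_id + 1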
For more complicated situations, for example if you have to maintain state across messages from the same chat_id, I recommend taking a look at telepot and this answer:
Handle multiple questions for Telegram bot in python
In short, telepot spawns threads for you, freeing you from worrying about the low-level details and letting you focus on the problem at hand.

Syncing messages to Twitter in the background in a web application

I'm writing a web app. Users can post text, and I need to store it in my DB as well as sync it to a Twitter account.
The problem is that I'd like to respond to the user immediately after inserting the message into the DB, and run the "sync to Twitter" step in the background.
How can I do that? Thanks.
Either you choose zrxq's solution, or you can do it with a thread, as long as you take care of two things:
you don't tamper with objects from the main thread (be careful with iterators),
you take good care of ending your thread once the job is done.
Something like this:
import threading

class TwitterThreadQueue(threading.Thread):
    queue = []

    def run(self):
        # Fixed from the original `while len(self.queue!=0):`,
        # which compared the list to 0 before taking its length.
        while len(self.queue) != 0:
            post_on_twitter(self.queue.pop())  # your code to post on Twitter

    def add_to_queue(self, msg):
        self.queue.append(msg)
And then you instantiate it in your code:
tweetQueue = TwitterThreadQueue()
# ...
tweetQueue.add_to_queue(message)
tweetQueue.start()  # you can check whether it's already started
# ...
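A more robust variant keeps one long-lived daemon worker draining a thread-safe queue.Queue, which avoids both races on the bare list and the problem that a finished Thread cannot be restarted; a minimal sketch (post_on_twitter is assumed, as above):

import queue
import threading

tweet_queue = queue.Queue()

def tweet_worker():
    while True:
        msg = tweet_queue.get()  # blocks until a message arrives
        post_on_twitter(msg)     # your code to post on Twitter
        tweet_queue.task_done()

# Start once at app startup; daemon=True so it never blocks interpreter exit.
threading.Thread(target=tweet_worker, daemon=True).start()

# In the request handler: enqueue and respond to the user immediately.
tweet_queue.put(message)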
