Exchangelib with multithreading in Python 3

I am trying to delete the recoverable items in many Outlook/Hotmail mailboxes as quickly as possible.
I tried multithreading, but judging by the timestamps in the logs, the process of connecting to a mail account and deleting its mails still runs in a single thread.
from exchangelib import DELEGATE, Account, Credentials
import threading
import time

def main(login, password):
    print('started', time.time())
    credentials = Credentials(
        username=login,  # Or myusername@example.com for O365
        password=password
    )
    a = Account(
        primary_smtp_address=login,
        credentials=credentials,
        autodiscover=True,
        access_type=DELEGATE
    )
    a.protocol.TIMEOUT = 15
    ### Tried adding this, but no results
    # a.protocol.SESSION_POOLSIZE = 50
    # a.protocol.CONNECTIONS_PER_SESSION = 50
    a.recoverable_items_deletions.empty()
    a.protocol.close()
    print('done', time.time())

for login, password in logins_passwords_dict.items():
    threading.Thread(target=main, args=(login, password)).start()
The code works, but everything runs in a single thread (the time values are just an example):
started 17000
started 17000
started 17000
done 17010
done 17020
done 17030
I am looking to speed up the process of deleting recoverable items

Check out the max_connections option: https://ecederstrand.github.io/exchangelib/#optimizing-connections
That will allow exchangelib to create a larger connection pool. If you use that approach, then you should only create one Account object and pass that to your main() method.
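A minimal sketch of that pooled setup, assuming a recent exchangelib release where Configuration accepts a max_connections argument (check the linked docs for your version); the server name is only a placeholder used to skip autodiscover:

from exchangelib import DELEGATE, Account, Configuration, Credentials

credentials = Credentials(username=login, password=password)
config = Configuration(
    credentials=credentials,
    server='outlook.office365.com',  # assumption: known server, so autodiscover can be skipped
    max_connections=10,              # size of the connection pool shared by this account's protocol
)
a = Account(
    primary_smtp_address=login,
    config=config,
    autodiscover=False,
    access_type=DELEGATE,
)
a.recoverable_items_deletions.empty()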
However, .empty() sends a single command to the server to delete the folder. The server does the actual work of emptying the folder. It will not help to run the command in parallel.
Finally, it's possible that you are not allowed to empty the recoverable items folder. Many servers have policies set up to prevent this, due to required archiving.

Related

Update single database value on a website with many users

For this question, I'm particularly struggling with how to structure this:
User accesses website
User clicks button
Value x in database increments
My issue is that multiple people could be on the website at the same time and click the button. I want to make sure each user can click the button, update the value, and read the incremented value too, but I don't know how to avoid synchronisation/concurrency issues.
I'm using flask to run my website backend, and I'm thinking of using MongoDB or Redis to store my single value that needs to be updated.
Please comment if there is any lack of clarity in my question, but this is a problem I've really been struggling with how to solve.
Thanks :)
Redis: you can use the Redis HINCRBY command, or create a distributed lock so that there is only one writer at a time and only the lock-holding writer can make the update from your Flask app. Make sure you release the lock after a certain period of time, or as soon as the writer is done with it.
MySQL: you can start a transaction, make the update, and commit the change to keep the data consistent.
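As a minimal sketch of the Redis option, assuming redis-py and a counter kept in a Redis hash (the key and field names are illustrative):

import redis
from flask import Flask

app = Flask(__name__)
r = redis.Redis(host='localhost', port=6379, db=0)

@app.route('/click', methods=['POST'])
def click():
    # HINCRBY is atomic on the Redis server, so concurrent clicks cannot lose updates
    new_value = r.hincrby('counters', 'button_clicks', 1)
    return {'clicks': new_value}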
To solve this problem I would suggest you follow a microservice architecture.
A service called worker would handle the flask route that's called when the user clicks on the link/button on the website. It would generate a message to be sent to another service called queue manager that maintains a queue of increment/decrement messages from the worker service.
There can be multiple worker service instances running concurrently, but the queue manager is a singleton service that takes the messages from each worker and adds them to the queue. If the queue manager is busy, the worker service will either time out and retry, or return a failure message to the user. If the queue is full, a response is sent back to the worker telling it to retry up to n times, counting n down on each attempt.
A third service called storage manager is run every time the queue is not empty, this service sends the messages to the storage solution (whatever mongo, redis, good ol' sql) and it will ensure the increment/decrement messages are handled in the order they were received in the queue. You could also include a time stamp from the worker service in the message if you wanted to use that to sort the queue.
Generally whatever hosting environment for flask will use gunicorn as the production web server and support multiple concurrent worker instances to handle the http requests, and this would naturally be your worker service.
How you build and coordinate the queue manager and storage manager is down to implementation preference, for instance you could use something like Google Cloud pub/sub system to send messages between different deployed services but that's just off the top of my head. There's a load of different ways to do it, and you're in the best position to decide that.
Without knowing more details about what you're trying to achieve and what's the requirements for concurrent traffic I can't go into greater detail, but that's roughly how I've approached this type of problem in the past. If you need to handle more concurrent users at the website, you can pick a hosting solution with more concurrent workers. If you need the queue to be longer, you can pick a host with more memory, or else write the queue to an intermediate storage. This will slow it down but will make recovering from a crash easier.
You also need to consider handling when messages fail between different services, how to recover from a service crashing or the queue filling up.
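A rough sketch of the worker -> queue -> storage flow described above, here using a Redis list as the queue between the two services (the route, key names and message format are all made up for illustration):

# worker service: a Flask route that enqueues a click message onto a Redis list
import json
import time

import redis
from flask import Flask

app = Flask(__name__)
queue = redis.Redis()

@app.route('/click', methods=['POST'])
def click():
    queue.rpush('click_queue', json.dumps({'op': 'increment', 'ts': time.time()}))
    return {'status': 'queued'}

# storage manager: a separate process that drains the queue in arrival order
def storage_manager():
    r = redis.Redis()
    while True:
        item = r.blpop('click_queue', timeout=5)  # blocks until a message arrives
        if item is None:
            continue
        message = json.loads(item[1])
        if message['op'] == 'increment':
            r.incr('click_total')  # apply the change to the stored value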
EDIT: Been thinking about this over the weekend and a much simpler solution is to just create a new record in a table directly from the flask route that handles user clicks. Then to get your total you just get a count from this table. Your bottlenecks are going to be how many concurrent workers your flask hosting environment supports and how many concurrent connections your storage supports. Both of these can be solved by throwing more resources at them.
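A sketch of that simpler approach, using sqlite3 purely for illustration (swap in whatever storage you actually deploy; the table name and route are made up for this example):

import sqlite3
from flask import Flask

app = Flask(__name__)
DB = 'clicks.db'

with sqlite3.connect(DB) as conn:
    conn.execute('CREATE TABLE IF NOT EXISTS clicks '
                 '(id INTEGER PRIMARY KEY AUTOINCREMENT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)')

@app.route('/click', methods=['POST'])
def click():
    # every click is its own INSERT, so concurrent requests never contend on a single row
    with sqlite3.connect(DB) as conn:
        conn.execute('INSERT INTO clicks DEFAULT VALUES')
        total = conn.execute('SELECT COUNT(*) FROM clicks').fetchone()[0]
    return {'total': total}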

Python Boto3 Firehose Client cannot connect to endpoint after a while

I have a Python service running on AWS ECS. Among other things, it performs an AWS Firehose put_record for every operation. These "operations" are processed in parallel in many threads. Each thread eventually does the following:
import json
import logging

import boto3

try:
    boto3.client("firehose").put_record(
        DeliveryStreamName="my-delivery-stream",
        Record={
            'Data': json.dumps(observation) + "\n"
        }
    )
except Exception as error:
    logging.error(error.args[0])
I get the following exception more often than not for the put_record call:
'Could not connect to the endpoint URL: "https://firehose.us-east-1.amazonaws.com/'
What am I doing wrong?
What happens here is that the retry mechanism goes into full bore and, apparently, holds onto the file descriptors. Sooner rather than later, I get resource exhaustion on everything else, caused by "Too many open files." So all new connections start failing, not just the ones to Firehose. This situation, of course, renders the entire service useless.
Any insight in to what may be a solution?
At first I thought it was the boto3 Firehose client, since I had only one for the entire process, initialized at startup. So I thought that maybe boto3 was not thread safe, and I switched to the approach above, getting a new client for each put_record call. That new-client-per-operation approach was no better than the single-client-per-process approach.
Note that it does work sometimes and some put_record calls do get through. So it is NOT a firewall or AWS security group issue. Nor is it an AWS credentials issue.
Is there some restriction or anomaly with Firehose or boto3 of which I am unaware?
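For reference, a sketch of a one-client-per-thread pattern (not from the original post): boto3's documentation recommends a separate Session per thread, and caching the client avoids opening new connections, and thus file descriptors, on every call. The stream name is the one from the snippet above.

import json
import threading

import boto3

thread_local = threading.local()

def get_firehose_client():
    # build one client per thread and reuse it for every put_record call
    if not hasattr(thread_local, "firehose"):
        thread_local.firehose = boto3.session.Session().client("firehose")
    return thread_local.firehose

def send(observation):
    get_firehose_client().put_record(
        DeliveryStreamName="my-delivery-stream",
        Record={"Data": json.dumps(observation) + "\n"},
    )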

How to open a new pyghmi Session via pyghmi.ipmi.command.Command after the previous one has timed out?

I'm having some issues with the pyghmi python library, which is used for sending IPMI commands with python scripts. My goal is to implement an HTTP API to send IPMI commands through HTTP requests.
I am already able to create a Session and send a few commands with the library, but if the Session stays idle for 30 seconds, it logs itself out.
Once the Session is logged out, I can't create a new one: I get a "Session is logged out" error, or a deadlock.
How can I have a server that is always up and creates a Session whenever it receives a request, if I can't create a new Session after the previous one has been logged out?
What I've tried:
from pyghmi.ipmi import command

ipmi = command.Command(ip, user, passwd)
res = ipmi.get_power()
print(res)
# wait 30 seconds
res2 = ipmi.get_power()  # get "Session logged out" error
ipmi2 = command.Command(ip, user, passwd)  # Deadlock if wait < 30 seconds, else no error
res3 = ipmi2.get_power()  # get "Session logged out" error
# Impossible to create a new command.Command() Session, every command will give a "logged out" error
The other problem is that I can't use the asynchronous approach of passing an "onlogon" callback function to the command.Command() call, because I need the callback's return value in the caller, and that's not possible with this sort of thread behavior.
Edit: I already tried some of the examples provided here, but they are always one-shot scripts, whereas I'm looking for something that can stay "up" forever.
So I finally reached a sort of solution. I emailed pyghmi's main contributor, and he said that this library is not suited for a multi-Session, reusable-Session implementation (there is currently an open "Session reuse" issue on the pyghmi repository).
First "solution": use processes
My goal was to create an HTTP API. To avoid the Session timeout issue, I create a new process (not thread) for every new request. That works fine, but I did not keep this solution because it is too heavy and consumes too many sockets. Since processes do not share memory (that is the whole point of processes), the pyghmi state is not shared either, so every Session use is a fresh creation rather than a reuse.
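For illustration, a rough sketch of that process-per-request workaround (the helper names are made up, and this is the approach the author ultimately dropped):

import multiprocessing

from pyghmi.ipmi import command

def _get_power(ip, user, passwd, result_queue):
    # a fresh pyghmi Session lives and dies inside this short-lived process
    ipmi = command.Command(ip, user, passwd)
    result_queue.put(ipmi.get_power())

def get_power_via_process(ip, user, passwd):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=_get_power, args=(ip, user, passwd, q))
    p.start()
    p.join(timeout=30)
    return q.get() if not q.empty() else None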
Second "solution" : use Confluent
Confluent is a tool developed by Lenovo that allows controlling hardware via HTTP. It uses a sort of patched version of pyghmi as the backend for IPMI calls. Confluent documentation here.
Once installed and configured on a server, Confluent worked well to control IPMI devices via HTTP. I packaged it in a Docker image along with an ipmi_simulator for testing purposes: confluent dockerized.
The solution today is to run Command.eventloop() after creating the connection. It is documented in ipmi/command.py, which has a very trivial Housekeeper class that, in the current version 1.5.53, is just a renamed Thread class with no additional features. It merely runs the event loop.
The implementation looks like this. One of the housekeeping tasks it enables is sending keepalive messages; this is on by default and can be controlled by supplying keepalive=True (or False) when instantiating Command:
class Housekeeper(threading.Thread):
    """A Maintenance thread for housekeeping
    Long lived use of pyghmi may warrant some recurring asynchronous behavior.
    This stock thread provides a simple minimal context for these housekeeping
    tasks to run in. To use, do 'pyghmi.ipmi.command.Maintenance().start()'
    and from that point forward, pyghmi should execute any needed ongoing
    tasks automatically as needed. This is an alternative to calling
    wait_for_rsp or eventloop in a thread of the callers design.
    """

    def run(self):
        Command.eventloop()
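Putting that together, a minimal usage sketch based on this answer (ip, user and passwd are placeholders; the Housekeeper class and the keepalive argument are the ones described above):

from pyghmi.ipmi import command

# start the housekeeping thread once per process; it runs Command.eventloop()
# in the background so keepalives are sent for long-lived sessions
command.Housekeeper().start()

ipmi = command.Command(ip, user, passwd, keepalive=True)
print(ipmi.get_power())
# the session should now survive idle periods longer than 30 seconds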

Data buffering/storage - Python

I am writing an embedded application that reads data from a set of sensors and uploads it to a central server. The application is written in Python and runs on a Raspberry Pi.
The data needs to be collected every minute; however, the Internet connection is unstable, so I need to buffer the data to non-volatile storage (SD card, etc.) whenever there is no connection. The buffered data should be uploaded as and when the connection comes back.
Presently, I'm thinking about storing the buffered data in an SQLite database and writing a cron job that can continuously read the data from this database and upload it.
Is there a python module that can be used for such feature?
Is there a python module that can be used for such feature?
I'm not aware of any readily available module; however, it should be quite straightforward to build one. Given your requirement:
the Internet connection is unstable and I need to buffer the data to a non volatile storage (SD-card) etc. whenever there is no connection. The buffered data should be uploaded as and when the connection comes back.
The algorithm looks something like this (pseudocode):
# buffering module
data = read(sensors)
db.insert(data)

# upload module
# e.g. scheduled every 5 minutes via cron
data = db.read(created > last_successful_upload)
success = upload(data)
if success:
    last_successful_upload = max(data.created)
The key is to separate the buffering and uploading concerns, i.e. when reading data from the sensors, don't attempt to upload immediately; always upload from the scheduled module. This keeps the two modules simple and stable.
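A minimal runnable sketch of that split, using sqlite3 for the buffer and a placeholder HTTP endpoint for the upload (the table, columns and URL are all illustrative; schedule buffer() every minute and upload_pending() from cron):

import sqlite3

import requests

DB = 'buffer.db'
UPLOAD_URL = 'https://example.com/ingest'  # placeholder endpoint

def buffer(reading):
    with sqlite3.connect(DB) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS readings '
                     '(id INTEGER PRIMARY KEY, value REAL, uploaded INTEGER DEFAULT 0)')
        conn.execute('INSERT INTO readings (value) VALUES (?)', (reading,))

def upload_pending():
    with sqlite3.connect(DB) as conn:
        rows = conn.execute('SELECT id, value FROM readings WHERE uploaded = 0').fetchall()
        if not rows:
            return
        try:
            requests.post(UPLOAD_URL, json=[{'id': i, 'value': v} for i, v in rows], timeout=10)
        except requests.RequestException:
            return  # connection is down; rows stay buffered for the next run
        conn.executemany('UPDATE readings SET uploaded = 1 WHERE id = ?', [(i,) for i, _ in rows])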
There are a few edge cases, however, that you need to handle to make this work reliably:
inserting data while an upload is in progress
SQLite doesn't handle being accessed from multiple processes well
To solve this, you might want to consider another database, or create multiple SQLite databases or even flat files for each batch of uploads.
If you mean a module to work with an SQLite database, check out SQLAlchemy.
If you mean a module which can do what cron does, check out sched, a python event scheduler.
However, this looks like a perfect place to implement a task queue, using either a dedicated task broker (RabbitMQ, Redis, ZeroMQ, ...) or Python's threads and queues. In general, you submit an upload task, a worker thread picks it up and executes it, while the task broker handles retries and failures. All this happens asynchronously, without blocking your main app.
UPD: Just to clarify, you don't need the database if you use a task broker, because a task broker stores the tasks for you.
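A minimal sketch of the broker-less variant (Python threads and queues); upload() here is a placeholder for the real upload call, and note that pending tasks are lost if the process dies, which is exactly what a dedicated broker would prevent:

import queue
import threading
import time

tasks = queue.Queue()

def uploader():
    # a worker thread that drains the queue and retries failed uploads
    while True:
        data = tasks.get()
        try:
            upload(data)  # placeholder for the real upload call
        except Exception:
            time.sleep(60)  # wait for the connection to come back
            tasks.put(data)  # re-queue the failed upload
        finally:
            tasks.task_done()

threading.Thread(target=uploader, daemon=True).start()

# main app: submit an upload task without blocking
tasks.put({'sensor': 'temperature', 'value': 21.5})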
This is purely database work. You can create master and slave databases in different locations, and if one is not on the network, it keeps running with the last synced data.
When the connection comes back, it merges all the data.
Take a look at this answer and search for master and slave databases.

GAE Request Timeout when user uploads csv file and receives new csv file as response

I have an app on GAE that takes CSV input from a web form and stores it to a blob, does some work to derive new information from the CSV input, then uses csv.writer on self.response.out to write a new CSV file and prompt the user to download it. It works well, but my problem is that if it takes over 60 seconds it times out. I've tried setting up the processing part as a task in the task queue, and it would work, except that I can't make the user wait while it runs: there is no way to automatically call the handler that writes out the new CSV file when the task is complete, and having the user periodically push a button to check whether it is done is less than optimal.
Is there a better solution to a problem like this, other than using the task queue and having the user manually push a button periodically to see whether the task is complete?
You have many options:
Use a timer in your client to check periodically (e.g. every 15 seconds) whether the file is ready. This is the simplest option and requires only a few lines of code; see the sketch after this list.
Use the Channel API. It's elegant, but it's overkill unless you face similar problems frequently.
Email the results to the user.
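A rough sketch of the server side of the polling option, assuming the webapp2/ndb stack of that App Engine generation; the Result model, the job parameter and the /download handler are all made up for illustration, and the client-side timer simply polls this endpoint until done is true:

import json

import webapp2
from google.appengine.ext import ndb

class Result(ndb.Model):
    # written by the task queue task when it finishes building the CSV
    done = ndb.BooleanProperty(default=False)
    csv_blob_key = ndb.StringProperty()

class StatusHandler(webapp2.RequestHandler):
    def get(self):
        job = ndb.Key(urlsafe=self.request.get('job')).get()
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json.dumps({
            'done': job.done,
            'download_url': '/download?key=%s' % job.csv_blob_key if job.done else None,
        }))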
If your problem is the 60-second limit for requests, you could consider using App Engine Modules, which allow you to control the scaling type of a module/version. Basically, there are three scaling types available.
Manual Scaling
Such a module runs continuously. Requests can run indefinitely.
Basic Scaling
Such a module creates an instance when the application receives a request. The instance will be turned down when the app becomes idle. Requests can run indefinitely.
Automatic Scaling
The same scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other application metrics. There is 60-second deadline for HTTP requests.
You can find more details here.
