I have a following architecture:
Main Control Unit(MCU) here must: run TCP/IP server for communication with robot; run Redis Database; be able to run several data processing programs (where data sent by robot or obtained from Redis is used). External Control Unit is connected to Redis DB.
I have to write program for MCU. Ideally it must be able to perform asynchronously following tasks:
Get request from robot and pass it to Redis DB, so External Control Unit can react to the signal appearance and start acquiring data from sensor (Then publishing this data to Redis DB).
Get request from robot to start receiving data from robot (Then publish this data to Redis DB).
React to the appearance of data from External Control Unit in Redis DB. This must force Main Control Unit to start data processing program using obtained sensor data.
Get request from robot to send resultant data to robot.
This is a simplified version of the system, since there will be more External Control Units with different sensors. But most of MCU tasks are described.
For now I can ensure data transmission between MCU and robot. And I'm pretty familiar with publishing/subscribing techniques of Redis DB.
But I'm struggling with choosing proper technology/paradigm to program MCU, asynchronous/multithreading/multiprocessing programming? Where to dig?
Addition to question:
Which paradigm (asynchronous/multithreading/multiprocessing programming) is better to use to implement such a behavior of MCU:
MCU receive request from robot to start up computer vision routine (it takes around 30-40 seconds to be finished). The routine can be started only if necessary data are found in redis DB. So, may be MCU must wait until ExtXU end publishing this data into DB, may be not. After the routine has been started, during 30-40 seconds of CV routine run, another request from robot can come, and it must be processed while CV routine is running.
I studied today an asyncio python module, and in my mind it's not suitable to implement what I want. It is ideal for processing multiple client requests on server, or trying to get some data from different servers from one client. A coroutines lock/wait is essential, so program can do something else while coroutine waits something. But my CV routine do not wait anything, it just runs.
Related
I've built a simple application using Django Channels where a Consumer receives data from an external service, the consumer than sends this data to some subscribers on another consumer.
I'm new to websockets and i had a little concern: the consumer is receiving a lot of data, in the order of 100 (or more) JSON records per second. At what point should i be worried about this service crashing or running into performance issues? Is there some sort of limit for what i'm doing?
there is not explicit limit, however it is worth nothing that for each instances (open connection) of the consumer you can only process one WS message at once.
So if you have a single websocket connection and are sending lots and lots of WS messages down that connection if the consumer does work on these (eg write them to the db) the queue of messages might fill up and you will get an error.
their are a few solutions to this,
Open multiple ws connections and share out the load
in your consumer before doing any work that will take time put it onto a work queue and have some background tasks (that you do not await) consume it.
For this second option it is probably a good idea to create this background queue in your on_connect method and handle shutting it down/waiting for it to flush everything in the on disconnect method.
--
if you are expecting a massive amount of data and don't want to fork out for a costly (high memory) VM you might be better of using a server that is no written in python.
My suggestion for a python developer would be https://docs.vapor.codes/4.0/websockets/ this is a Swift server framework, Swift linguistically if very close to TypeAnotated python so is easier than other high performance options to pick up for a python dev.
I need your opinion on a challenge that I'm facing. I'm building a website that uses Django as a backend, PostgreSQL as my DB, GraphQL as my API layer and React as my frontend framework. Website is hosted on Heroku. I wrote a python script that logs me in to my gmail account and parse few emails, based on pre-defined conditions, and store the parsed data into Google Sheet. Now, I want the script to be part of my website in which user will specify what exactly need to be parsed (i.e. filters) and then display the parsed data in a table to review accuracy of the parsing task.
The part that I need some help with is how to architect such workflow. Below are few ideas that I managed to come up with after some googling:
generate a graphQL mutation that stores a 'task' into a task model. Once a new task entry is stored, a Django Signal will trigger the script. Not sure yet if Signal can run custom python functions, but from what i read so far, it seems doable.
Use Celery to run this task asynchronously. But i'm not sure if asynchronous tasks is what i'm after here as I need this task to run immediately after the user trigger the feature from the frontend. But i'm might be wrong here. I'm also not sure if I need Redis to store the task details or I can do that on PostgreSQL.
What is the best practice in implementing this feature? The task can be anything, not necessarily parsing emails; it can also be importing data from excel. Any task that is user generated rather than scheduled or repeated task.
I'm sorry in advance if this question seems trivial to some of you. I'm not a professional developer and the above project is a way for me to sharpen my technical skills and learn new techniques.
Looking forward to learn from your experiences.
You can dissect your problem into the following steps:
User specifies task parameters
System executes task
System displays result to the User
You can either do all of these:
Sequentially and synchronously in one swoop; or
Step by step asynchronously.
Synchronously
You can run your script when generating a response, but it will come with the following downsides:
The process in the server processing your request will block until the script is finished. This may or may not affect the processing of other requests by that same server (this will depend on the number of simultaneous requests being processed, workload of the script, etc.)
The client (e.g. your browser) and even the server might time out if the script takes too long. You can fix this to some extent by configuring your server appropriately.
The beauty of this approach however is it's simplicity. For you to do this, you can just pass the parameters through the request, server parses and does the script, then returns you the result.
No setting up of a message queue, task scheduler, or whatever needed.
Asynchronously
Ideally though, for long-running tasks, it is best to have this executed outside of the usual request-response loop for the following advantages:
The server responding to the requests can actually serve other requests.
Some scripts can take a while, some you don't even know if it's going to finish
Script is no longer dependent on the reliability of the network (imagine running an expensive task, then your internet connection skips or is just plain intermittent; you won't be able to do anything)
The downside of this is now you have to set more things up, which increases the project's complexity and points of failure.
Producer-Consumer
Whatever you choose, it's usually best to follow the producer-consumer pattern:
Producer creates tasks and puts them in a queue
Consumer takes a task from the queue and executes it
The producer is basically you, the user. You specify the task and the parameters involved in that task.
This queue could be any datastore: in-memory datastore like Redis; a messaging queue like RabbitMQ; or an relational database management system like PostgreSQL.
The consumer is your script executing these tasks. There are multiple ways of running the consumer/script: via Celery like you mentioned which runs multiple workers to execute the tasks passed through the queue; via a simple time-based job scheduler like crontab; or even you manually triggering the script
The question is actually not trivial, as the solution depends on what task you are actually trying to do. It is best to evaluate the constraints, parameters, and actual tasks to decide which approach you will choose.
But just to give you a more relevant guideline:
Just keep it simple, unless you have a compelling reason to do so (e.g. server is being bogged down, or internet connection is not reliable in practice), there's really no reason to be fancy.
The more blocking the task is, or the longer the task takes or the more dependent it is to third party APIs via the network, the more it makes sense to push this to a background process add reliability and resiliency.
In your email import script, I'll most likely push that to the background:
Have a page where you can add a task to the database
In the task details page, display the task details, and the result below if it exists or "Processing..." otherwise
Have a script that executes tasks (import emails from gmail given the task parameters) and save the results to the database
Schedule this script to run every few minutes via crontab
Yes the above has side effects, like crontab running the script in multiple times at the same time and such, but I won't go into detail without knowing more about the specifics of the task.
I have just started with Python, although I have been programming in other languages over the past 30 years. I wanted to keep my first application simple, so I started out with a little home automation project hosted on a Raspberry Pi.
I got my code to work fine (controlling a valve, reading a flow sensor and showing some data on a display), but when I wanted to add some web interactivity it came to a sudden halt.
Most articles I have found on the subject suggest to use the Flask framework to compose dynamic web pages. I have tried, and understood, the basics of Flask, but I just can't get around the issue that Flask is blocking once I call the "app.run" function. The rest of my python code waits for Flask to return, which never happens. I.e. no more water flow measurement, valve motor steering or display updating.
So, my basic question would be: What tool should I use in order to serve a simple dynamic web page (with very low load, like 1 request / week), in parallel to my applications main tasks (GPIO/Pulse counting)? All this in the resource constrained environment of a Raspberry Pi (3).
If you still suggest Flask (because it seems very close to target), how should I arrange my code to keep handling the real-world events, such as mentioned above?
(This last part might be tough answering without seeing the actual code, but maybe it's possible answering it in a "generic" way? Or pointing to existing examples that I might have missed while searching.)
You're on the right track with multithreading. If your monitoring code runs in a loop, you could define a function like
def monitoring_loop():
while True:
# do the monitoring
Then, before you call app.run(), start a thread that runs that function:
import threading
from wherever import monitoring_loop
monitoring_thread = threading.Thread(target = monitoring_loop)
monitoring_thread.start()
# app.run() and whatever else you want to do
Don't join the thread - you want it to keep running in parallel to your Flask app. If you joined it, it would block the main execution thread until it finished, which would be never, since it's running a while True loop.
To communicate between the monitoring thread and the rest of the program, you could use a queue to pass messages in a thread-safe way between them.
The way I would probably handle this is to split your program into two distinct separately running programs.
One program handles the GPIO monitoring and communication, and the other program is your small Flask server. Since they run as separate processes, they won't block each other.
You can have the two processes communicate through a small database. The GIPO interface can periodically record flow measurements or other relevant data to a table in the database. It can also monitor another table in the database that might serve as a queue for requests.
Your Flask instance can query that same database to get the current statistics to return to the user, and can submit entries to the requests queue based on user input. (If the GIPO process updates that requests queue with the current status, the Flask process can report that back out.)
And as far as what kind of database to use on a little Raspberry Pi, consider sqlite3 which is a very small, lightweight file-based database well supported as a standard library in Python. (It doesn't require running a full "database server" process.)
Good luck with your project, it sounds like fun!
Hi i was trying the connection with dronekit_sitl and i got the same issue , after 30 seconds the connection was closed.To get rid of that , there are 2 solutions:
You use the decorator before_request:in this one you define a method that will handle the connection before each request
You use the decorator before_first_request : in this case the connection will be made once the first request will be called and the you can handle the object in the other route using a global variable
For more information https://pythonise.com/series/learning-flask/python-before-after-request
I am trying to build a temperature control module that can be controlled over a network or with manual controls. the individual parts of my program all work but I'm having trouble figuring out how to make them all work together.also my temperature control module is python and the client is C#.
so far as physical components go i have a keypad that sets a temperature and turns the heater on and off and an lcd screen that displays temperature data and of course a temperature sensor.
for my network stuff i need to:
constantly send temperature data to the client.
send a list of log files to the client.
await prompts from the client to either set the desired temperature or send a log file to the client.
so far all the hardware works fine and each individual part of the network functions work but not together. I have not tried to use both physical and network components.
I have been attempting to use threads for this but was wondering if i should be using something else?
EDIT:
here is the basic logic behind what i want to do:
Hardware:
keypad takes a number inputs until '*' it then sets a temp variable.
temp variable is compared to sensor data and the heater is turned on or off accordingly.
'#' turns of the heater and sets temp variable to 0.
sensor data is written to log files while temp variable is not 0
Network:
upon client connect the client is sent a list of log files
temperature sensor data is continuously sent to client.
prompt handler listens for prompts.
if client requests log file the temperature data is halted and the file sent after which the temperature data is resumed.
client can send a command to the prompt handler to set the temp variable to trigger the heater
client can send a command to the prompt handler to stop the heater and set temp variable to 0
commands from either the keypad or client should work at all times.
Multiprocessing is generally for when you want to take advantage of the computational power of multiple processing cores. Multiprocessing limits your options on how to handle shared state between components of your program, as memory is copied initially on process creation, but not shared or updated automatically. Threads execute from the same region of memory, and do not have this restriction, but cannot take advantage of multiple cores for computational performance. Your application does not sound like it would require large amounts of computation, and simply would benefit from concurrency to be able to handle user input, networking, and a small amount of processing at the same time. I would say you need threads not processes. I am not experienced enough with asyncio to give a good comparison of that to threads.
Edit: This looks like a fairly involved project, so don't expect it to go perfectly the first time you hit "run", but definitely very doable and interesting.
Here's how I would structure this project...
I see effectively four separate threads here (maybe small ancillary dameon threads for stupid little tasks)
I would have one thread acting as your temperature controller (PID control / whatever) that has sole control of the heater output. (other threads get to make requests to change setpoint / control mode (duty cycle / PID))
I would have one main thread (with a few dameon threads) to handle the data logging: Main thead listens for logging commands (pause, resume, get, etc.) dameon threads to poll thermometer, rotate log files, etc..
I am not as familiar with networking, and this will be specific to your client application, but I would probably get started with http.server just for prototyping, or maybe something like websockets and a little bit of asyncio. The main thing is that it would interact with the data logger and temperature controller threads with getters and setters rather than directly modifying values
Finally, for the keypad input, I would likely just make up a quick tkinter application to grab keypresses, because that's what I know. Again, form a request with the tkinter app, but don't modify values directly; use getters and setters when "talking" between threads. It just keeps things better organized and compartmentalized.
I am writing an embedded application that reads data from a set of sensors and uploads to a central server. This application is written in Python and runs on a Rasberry Pi unit.
The data needs to be collected every 1 minute, however, the Internet connection is unstable and I need to buffer the data to a non volatile storage (SD-card) etc. whenever there is no connection. The buffered data should be uploaded as and when the connection comes back.
Presently, I'm thinking about storing the buffered data in a SQLite database and writing a cron job that can read the data from this database continuously and upload.
Is there a python module that can be used for such feature?
Is there a python module that can be used for such feature?
I'm not aware of any readily available module, however it should be quite straight forward to build one. Given your requirement:
the Internet connection is unstable and I need to buffer the data to a non volatile storage (SD-card) etc. whenever there is no connection. The buffered data should be uploaded as and when the connection comes back.
The algorithm looks something like this (pseudo code):
# buffering module
data = read(sensors)
db.insert(data)
# upload module
# e.g. scheduled every 5 minutes via cron
data = db.read(created > last_successful_upload)
success = upload(data)
if success:
last_successful_upload = max(data.created)
The key is to seperate the buffering and uploading concerns. I.e. when reading data from the sensor don't attempt to immediately upload, always upload from the scheduled module. This keeps the two modules simple and stable.
There are a few edge cases however that you need to concern yourself with to make this work reliably:
insert data while uploading is in progress
SQLlite doesn't support being accessed from multiple processes well
To solve this, you might want to consider another database, or create multiple SQLite databases or even flat files for each batch of uploads.
If you mean a module to work with SQLite database, check out SQLAlchemy.
If you mean a module which can do what cron does, check out sched, a python event scheduler.
However, this looks like a perfect place to implemet a task queue --using a dedicated task broker (rabbitmq, redis, zeromq,..), or python's threads and queues. In general, you want to submit an upload task, and worker thread will pick it up and execute, while the task broker handles retries and failures. All this happens asynchronously, without blocking your main app.
UPD: Just to clarify, you don't need the database if you use a task broker, because a task broker stores the tasks for you.
This is only database work. You can create a master and slave databases in different locations and if one is not on the network, will run with the last synched info.
And when the connection came back hr merge all the data.
Take a look in this answer and search for master and slave database