I am wondering what solutions are available for my problem. I need to retrieve data from an API every 200ms (preferably) and save it to a database, since it will then be processed by another service. I wanted to base my solution on RabbitMQ and task queuing: my API would let you create or delete a task that fetches the data every 200ms and adds it to the database. There may be several such tasks, though not many. While I know the latency associated with the database cannot be avoided, I don't know whether the RabbitMQ solution is optimal in this case. Maybe someone has experience and can suggest a better approach? My API is based on Python and FastAPI.
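To make the idea concrete, here is a rough sketch of the kind of task I have in mind, run inside the FastAPI process itself. It assumes httpx for the HTTP calls; `save_to_db` and the endpoint paths are placeholders, not a finished implementation:

```python
# Sketch only: start/stop a 200 ms polling task via the API.
# save_to_db() and the endpoint paths are placeholder assumptions.
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI()
tasks: dict[str, asyncio.Task] = {}

async def save_to_db(name: str, payload: dict) -> None:
    ...  # e.g. an async insert with SQLAlchemy, or a publish to RabbitMQ

async def poll(name: str, url: str, interval: float = 0.2) -> None:
    loop = asyncio.get_running_loop()
    async with httpx.AsyncClient() as client:
        while True:
            started = loop.time()
            resp = await client.get(url)
            await save_to_db(name, resp.json())
            # Keep roughly a 200 ms cadence by subtracting the time spent fetching.
            await asyncio.sleep(max(0.0, interval - (loop.time() - started)))

@app.post("/tasks/{name}")
async def create_task(name: str, url: str):
    tasks[name] = asyncio.create_task(poll(name, url))
    return {"status": "started"}

@app.delete("/tasks/{name}")
async def delete_task(name: str):
    task = tasks.pop(name, None)
    if task:
        task.cancel()
    return {"status": "stopped"}
```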
Related
We are thinking about creating a dynamic, scheduled data-fetching application that pulls from a number of data sources (REST API calls). The considerations are as follows:
The user shall be able to configure API/web service endpoints, the fetching frequency, and the response content type (JSON or CSV).
Once the user completes the configuration part, a job queue will be created programmatically.
A scheduler framework shall be used to make requests to the endpoints and push the responses into the respective queues. We are thinking of a queue here to preserve the order of the responses and also as intermediate storage for the raw responses from the endpoints.
The items stored in the queues shall be processed using Python/pandas. We are planning to use a NoSQL database for storage.
Question
For this purpose, is it better to use Celery or RabbitMQ? We are thinking of using Celery as it has a relatively simple implementation (a rough sketch of what we have in mind follows below).
Any thoughts on this are greatly appreciated.
Thank you.
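For illustration, a minimal Celery beat sketch of the scheduler-plus-queue idea; note that Celery would typically sit on top of RabbitMQ (or Redis) as its broker, so the two are not strictly alternatives. The broker URL, endpoint configuration, task names and `save_to_nosql` are assumptions, not a worked-out design:

```python
# Sketch of the scheduler/queue idea using Celery beat with RabbitMQ as the broker.
# Broker URL, endpoint config, task names and save_to_nosql() are assumptions.
import io
import json

import httpx
import pandas as pd
from celery import Celery

app = Celery("fetcher", broker="amqp://guest@localhost//")

# One beat entry per user-configured endpoint (hard-coded here for illustration).
app.conf.beat_schedule = {
    "fetch-prices-every-60s": {
        "task": "tasks.fetch_endpoint",
        "schedule": 60.0,
        "args": ("https://example.com/api/prices", "json"),
    },
}

@app.task(name="tasks.fetch_endpoint")
def fetch_endpoint(url: str, content_type: str) -> None:
    resp = httpx.get(url, timeout=30)
    # Hand the raw response to a separate queue so order and raw payloads are preserved.
    process_response.apply_async(args=(resp.text, content_type), queue="processing")

@app.task(name="tasks.process_response")
def process_response(raw: str, content_type: str) -> None:
    if content_type == "csv":
        df = pd.read_csv(io.StringIO(raw))
    else:
        df = pd.json_normalize(json.loads(raw))
    save_to_nosql(df)

def save_to_nosql(df: pd.DataFrame) -> None:
    ...  # e.g. insert df.to_dict("records") into MongoDB/DynamoDB
```

Assuming this lives in tasks.py, something like `celery -A tasks worker -B` would start a worker with the embedded beat scheduler.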
A bit of context:
I want to write an algorithm that accepts tickets from the client, sorts them according to some constraints, handles them, and replies back to the client with the results.
I did some research and thought a Python REST API was a good idea, but as I explored it I found out that it is usually built to handle one request at a time.
Is there a way to add tasks (REST API requests) to a queue, sort them, execute them with workers, and reply back to the clients once processing is done?
I can suggest three ways to do that.
Use a database to store the request content, the constraint, and a status of 'pending'. Later, when you want to trigger processing of the requests, just retrieve them sorted by your constraint and update the status to 'processed' (see the sketch after this list).
You can use a Redis task queue with Flask. See this article: https://realpython.com/flask-by-example-implementing-a-redis-task-queue/
You can also use the Celery module with Flask. See the documentation: https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
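To make the first option concrete, here is a minimal sketch using SQLite; the table layout, the priority column used as the sorting constraint, and `handle_ticket` are placeholders:

```python
# Minimal sketch of option 1: a DB-backed queue with a 'pending'/'processed' status.
# Table name, columns and the priority ordering are placeholder assumptions.
import json
import sqlite3

conn = sqlite3.connect("tickets.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tickets (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        priority INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'
    )
""")

def enqueue(payload: dict, priority: int) -> None:
    # Called from the request handler: store the ticket and return immediately.
    conn.execute(
        "INSERT INTO tickets (payload, priority) VALUES (?, ?)",
        (json.dumps(payload), priority),
    )
    conn.commit()

def process_pending() -> None:
    # Called by a worker loop: handle tickets sorted by the constraint (priority here).
    rows = conn.execute(
        "SELECT id, payload FROM tickets WHERE status = 'pending' ORDER BY priority"
    ).fetchall()
    for ticket_id, payload in rows:
        handle_ticket(json.loads(payload))  # hypothetical ticket handler
        conn.execute(
            "UPDATE tickets SET status = 'processed' WHERE id = ?", (ticket_id,)
        )
        conn.commit()

def handle_ticket(payload: dict) -> None:
    ...  # the actual sorting/handling logic lives here
```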
I don't know if this is the right place to ask, but I am desperate for an answer.
The problem at hand is not the number of requests but the amount of time a single request takes. For each request, the server has to query about 12 different sources for data, and it can take up to 6 hours to get it all (let's leave request timeouts out of this, because this server does not communicate with the client directly; it fetches messages from Kafka and then starts collecting the data from the sources). I am supposed to come up with a scalable solution. Can anyone help me with this?
The problem doesn't end here:
Once the server gets the data, it has to push it to Kafka for further computation using Spark; the streaming API will be used in this part.
I am open to any web framework or any scaling solution in Python (a rough sketch of the current flow is below).
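For reference, the flow so far looks roughly like this. The sketch assumes kafka-python and a process pool; topic names, the list of sources and `query_source` are placeholders:

```python
# Rough sketch of the described flow, assuming kafka-python and a process pool.
# Topic names, the ~12 sources and query_source() are placeholder assumptions.
import json
from concurrent.futures import ProcessPoolExecutor

from kafka import KafkaConsumer, KafkaProducer

SOURCES = [f"source_{i}" for i in range(12)]  # the ~12 upstream data sources

def query_source(source: str, request: dict) -> dict:
    ...  # hypothetical call to one upstream source (can take hours)

def handle_request(request: dict) -> dict:
    # Long-running part: gather data from every source for this request.
    return {src: query_source(src, request) for src in SOURCES}

def main() -> None:
    consumer = KafkaConsumer(
        "incoming-requests",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode(),
    )

    # Hand each request to a pool so one multi-hour request does not block the consumer.
    with ProcessPoolExecutor(max_workers=8) as pool:
        for msg in consumer:
            future = pool.submit(handle_request, msg.value)
            future.add_done_callback(
                lambda f: producer.send("results-for-spark", f.result())
            )

if __name__ == "__main__":
    main()
```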
I have a desktop Python application whose data backend is a MySQL database, but whose previous backend was one or more network-accessed XML files. When it was XML-powered, I spawned a thread at application launch that simply checked the XML file for changes; whenever the modification date changed (because some user updated it), the app would refresh itself, so multiple users could see each other's changes as they went about their business.
Now the program has matured and is venturing toward an online presence so it can be used anywhere. XML is out the window and I'm using MySQL, with SQLAlchemy as the database access method. The plot thickens, however, because the information is no longer stored in one XML file but is split across multiple tables in the SQL database. This complicates the idea of a single 'last modified' value or structure. Hence the question: how do you inform the users that the data has changed and the app needs to refresh? Here are some of my thoughts:
Each table needs a last-modified column (this seems like the worst option ever)
A separate table that holds some sort of last-modified value?
Some sort of push notification through a server?
It should be mentioned that I could run a very small Python script on the same server hosting the SQL database, which the app could connect to (through sockets?) to pass information to and from all connected clients.
Some extra information:
The information passed back and forth would be pretty low-bandwidth: mostly text, with the potential for some images (rarely over 50 KB).
The number of clients at present is very small, in the tens, but the project could be picked up by some bigger companies, with client numbers possibly getting into the hundreds. Even then, bandwidth shouldn't be a problem for the foreseeable future.
Anyway, somewhat new territory for me, so what would you do? Thanks in advance!
As I understand it, this is not a client-server application, but rather an application with common remote storage.
One idea would be to change to web services (this would solve most of your problems in the long run).
Another idea (if you don't want to switch to the web) is to refresh the data in your interface periodically using a timer (see the sketch at the end of this answer).
Another, more complicated, way would be to have a server that receives all the updates, stores them in the database, and then pushes the changes to the other connected clients.
The first 2 ideas you mentioned will have maintenance, scalability and design ugliness issues.
The last 2 are a lot better in my opinion, but I still consider web services the best.
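To make the timer idea concrete, here is a minimal sketch of the polling side, assuming a single `app_version` table whose counter every writer bumps; the table/column names, engine URL and 5-second interval are assumptions:

```python
# Sketch of the timer-based refresh: clients poll a single version counter.
# Table/column names, the engine URL and the 5-second interval are assumptions.
import threading

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:pass@dbhost/appdb")
_last_seen = None

def bump_version(conn) -> None:
    # Call this in the same transaction as any write, from any client.
    conn.execute(text("UPDATE app_version SET counter = counter + 1"))

def refresh_ui() -> None:
    ...  # hypothetical hook: re-query the tables the current screen depends on

def poll_for_changes(interval: float = 5.0) -> None:
    global _last_seen
    with engine.connect() as conn:
        counter = conn.execute(text("SELECT counter FROM app_version")).scalar()
    if counter != _last_seen:
        _last_seen = counter
        refresh_ui()
    # Schedule the next check on a timer thread.
    threading.Timer(interval, poll_for_changes, args=(interval,)).start()

poll_for_changes()
```

The same counter could also be bumped by MySQL triggers on the individual tables, which would avoid touching every write path in the application.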
I was wondering what the 'best' way of passing data between views is. Is it better to create hidden fields and pass the data using POST, or should I encode it in my URLs? Or is there a better/easier way of doing this? Sorry if this question is stupid; I'm pretty new to web programming :)
Thanks
There are different ways to pass data between views. Actually, this is not much different from the problem of passing data between two different scripts, and of course some concepts of inter-process communication come in as well. Some things that come to mind are:
GET request - the first request hits view1 -> view1 sends data to the browser -> the browser redirects to view2 (e.g. with the data encoded in the URL)
POST request - (as you suggested) the same flow as above, but more suitable when more data is involved
Django session variables - this is the simplest to implement (see the sketch at the end of this answer)
Client-side cookies - can be used, but there are limitations on how much data can be stored
Shared memory at the web server level - tricky, but it can be done
REST APIs - if you can have a stand-alone server, then that server can expose REST APIs to invoke the views
Message queues - again, if a stand-alone server is possible, message queues would work too: the first view (API) takes requests and pushes them onto a queue, and some other process pops messages off and hits your second view (another API). This decouples the first and second view APIs and can manage load better
Cache - a cache like memcached can act as a mediator, but if one is going this route it's better to use Django sessions, as they hide a whole lot of implementation details; if scale is a concern, memcached or Redis are good options
Persistent storage - store the data in some persistent storage mechanism like MySQL. This decouples the request-taking part (probably a client-facing API) from the processing part by putting a DB in the middle
NoSQL storage - if write rates are on the order of hundreds of thousands per second, MySQL performance would become a bottleneck (there are ways around this by tweaking the MySQL config, but it's not easy). Then NoSQL DBs could be an alternative, e.g. DynamoDB, Redis, HBase, etc.
Stream processing - something like Storm or AWS Kinesis could be an option if your use case is real-time computation. In fact, you could use AWS Lambda in the middle as a serverless compute module that reads messages off and calls your second view API
Write data into a file - the next view can then read from that file (really ugly). This should probably never be done, but I'm listing it here as something to avoid
Can't think of any more; I'll update this if I do. Hope this helps in some way.
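To make the Django-sessions option above concrete, here is a minimal sketch; the view names, URL names and session key are placeholders:

```python
# Minimal sketch of passing data between views via Django sessions.
# View names, URL names and the session key are placeholder assumptions.
from django.shortcuts import redirect, render

def first_view(request):
    # Stash the data in the session (backed by the configured session engine).
    request.session["handoff"] = {"choice": request.POST.get("choice")}
    return redirect("second-view")

def second_view(request):
    # Pop it so it is consumed exactly once.
    data = request.session.pop("handoff", None)
    return render(request, "second.html", {"data": data})
```

With Django's default database-backed sessions, the data itself stays server-side and only the session ID travels in the cookie.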