I am building an application in Django that collects hotel information from various sources and formats this data into a uniform format. Thereafter I need to expose an API, using django-rest-framework, to give web apps and devices access to the hotel data.
For example, suppose I have four sources:
[HotelPlus, xHotelService, HotelSignup, HotelSource]
Please let me know the best implementation practice for this in Django. Coming from PHP, I would prefer to do it by writing custom third-party services that implement a common interface, so that adding more sources stays easy. That way I only need to call an execute() method from the cron task, and the rest is handled by the service controller (fetching the feed and populating it into the database).
But I am new to Python and Django, so I don't have a clear idea of whether creating services, or middleware, is the right fit for this task.
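Roughly, this is the shape I have in mind, translated to Python with an abstract base class (a sketch only; the class and method names are made up for illustration):

```python
# Sketch: each hotel source implements a common interface with execute().
from abc import ABC, abstractmethod


class BaseHotelSource(ABC):
    """Common interface that every feed source implements."""

    @abstractmethod
    def fetch(self):
        """Pull the raw feed from the remote source."""

    @abstractmethod
    def normalize(self, raw):
        """Convert the raw feed into the uniform hotel format."""

    def execute(self):
        """Fetch, normalize, and hand each record to the ORM."""
        for record in self.normalize(self.fetch()):
            # e.g. Hotel.objects.update_or_create(**record) in the real app
            print("would save:", record)


class HotelPlusSource(BaseHotelSource):
    """One concrete source; xHotelService, HotelSignup, etc. would look similar."""

    def fetch(self):
        # call the HotelPlus API here (requests.get(...), for example)
        return [{"name": "Demo Hotel", "city": "Berlin"}]

    def normalize(self, raw):
        # map HotelPlus fields onto the uniform format
        return [{"name": item["name"], "city": item["city"]} for item in raw]
```

A cron task (or scheduler) would then just loop over the registered source classes and call execute() on each.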
For fetching data from the sources you will need dedicated worker processes and a message broker, so that your main Django process isn't blocked. You can use Celery for that; it already has Django support.
After writing the tasks that fetch and format the data, you will need a scheduler to call these tasks periodically. You can use Celery Beat for that.
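For illustration, the wiring could look roughly like this (module, task, and schedule names are assumptions, not something prescribed by Celery or by your question):

```python
# tasks.py -- sketch of a periodic task that runs every registered source.
from celery import shared_task

SOURCES = []  # e.g. [HotelPlusSource, XHotelServiceSource, ...] from your services module


@shared_task
def sync_hotel_sources():
    """Fetch and normalize every configured source; scheduled by Celery Beat."""
    for source_cls in SOURCES:
        source_cls().execute()


# settings.py -- assuming the usual Django/Celery setup with the CELERY_ namespace.
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "sync-hotel-sources": {
        "task": "hotels.tasks.sync_hotel_sources",
        "schedule": crontab(minute=0),  # top of every hour; adjust to your feeds
    },
}
```

The worker (celery -A proj worker) and the beat process (celery -A proj beat) run alongside your Django process, so a slow feed never blocks a web request ("proj" here is a placeholder for your project name).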
Related
We are thinking about creating a dynamic, scheduled data-fetching application that pulls from a number of data sources (REST API calls). The considerations are as follows:
The user shall be able to configure API/web service endpoints, the frequency of fetching, and the response content type (JSON or CSV).
Once the user completes the configuration part, a job queue will be created programmatically.
A scheduler framework shall be used to make requests to the endpoints and push the responses into the respective queues. We are thinking of a queue here to preserve the order of the responses and also to serve as intermediate storage for the raw responses from the endpoints (see the sketch after this list).
The items stored in the queues shall be processed using Python/pandas. We are planning to use a NoSQL database for storage.
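To make the configuration-to-schedule flow concrete, this is roughly what we picture, assuming Celery plus django-celery-beat for database-backed schedules (the task and function names below are placeholders):

```python
# Sketch using django-celery-beat's database-backed scheduler.
import json

import requests
from celery import shared_task
from django_celery_beat.models import IntervalSchedule, PeriodicTask


@shared_task
def fetch_endpoint(url, content_type):
    """Fetch one configured endpoint and hand the raw payload to processing."""
    response = requests.get(url, timeout=30)
    raw = response.json() if content_type == "json" else response.text
    # push `raw` onto the processing queue / NoSQL store here
    return raw


def register_user_feed(url, content_type, every_minutes):
    """Called when the user completes the configuration step."""
    schedule, _ = IntervalSchedule.objects.get_or_create(
        every=every_minutes, period=IntervalSchedule.MINUTES
    )
    PeriodicTask.objects.create(
        interval=schedule,
        name=f"fetch-{url}",
        task="feeds.tasks.fetch_endpoint",  # placeholder module path
        args=json.dumps([url, content_type]),
    )
```

(As we understand it, Celery would still need a broker such as RabbitMQ or Redis underneath, so the choice is not strictly either/or.)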
Question
For this purpose is it better to use celery or RabbitMQ? We are thinking of using Celery as it has a relatively simple implementation.
Any thoughts on this are greatly appreciated.
Thank you.
I need some direction as to how to achieve the following functionality using Django.
I want my application to enable multiple users to submit jobs to make calls to an API.
Each user job will require multiple API calls and will store the results in a db or a file.
Each user should be able to submit multiple jobs.
In case of a failure, such as the network being blocked or the API not returning results, I want the application to pause for a while and then resume completing that job.
Basically, I want the application to pick up from where it left off.
Any ideas on how I could implement this, any technologies such as Celery I should be looking at, or even an open-source project where I can learn how this is done would be a great help.
You can do this with RabbitMQ and Celery.
This post might be helpful.
https://medium.com/@ffreitasalves/executing-time-consuming-tasks-asynchronously-with-django-and-celery-8578eebab356
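The pause-and-resume part maps well onto Celery's retry mechanism; a minimal sketch (the task name, retry limit, and timings are placeholders, not from your spec):

```python
# Sketch: a job that retries itself after network/API failures.
import requests
from celery import shared_task


@shared_task(bind=True, max_retries=10)
def run_user_job(self, job_id, urls):
    """Make the API calls for one user job, pausing and retrying on failure."""
    results = []
    for url in urls:
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            results.append(response.json())
        except requests.RequestException as exc:
            # Network blocked or API down: pause 5 minutes, then retry the task.
            raise self.retry(exc=exc, countdown=300)
    # store `results` in the database or a file, keyed by job_id
    return {"job_id": job_id, "count": len(results)}
```

To truly pick up where a job left off instead of redoing completed calls, persist each call's result as soon as it completes (for example one row per URL) and have the retried task skip URLs that are already stored.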
I am trying to develop a multithreaded web server. It has the following tasks:
Collect data from various data sources (API calls); I was planning to do this using multiple threads.
Store the collected data in an in-memory data structure
Do some processing on the data structure using another thread
This data structure would be queried by multiple clients; maybe I could also create a separate thread for each client request.
Regarding language and platform, I was considering either Python or Java. I did some research on the Flask framework for Python, but I do not know how it will accommodate the multithreaded nature of the web server.
Please suggest how I could achieve the above functionality in my project.
Flask, with some of the available add-ons, is well suited to what you want to do. Keep in mind that Flask is pure Python, so you can use any of the excellent Python libraries that are available.
As far as I understand what you have in mind, you can:
1- define a URL that, when visited, executes the data gathering from the external sources by means of, e.g., python-requests (http://docs.python-requests.org/en/latest/)
2- do the same periodically by scheduling the function above
3- store the collected data in, e.g., a Redis database (which is memory based) or one of the many other available databases (all of the NoSQL databases have Python bindings that you can access from a Flask application)
4- define URLs for the visiting clients to access the latest version of the data. You will just need to write the data-extraction functions (from Redis or whatever you decide to use) and design a nice template to display the results.
Flask/Werkzeug will take care of the multithreading necessary to handle simultaneous requests from different clients.
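A compressed sketch of points 1-4, assuming Redis for storage and APScheduler for the periodic part (both are just one possible choice, and the endpoint list is a placeholder):

```python
# Sketch: Flask app that fetches data periodically and serves it to clients.
import json

import redis
import requests
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask, jsonify

app = Flask(__name__)
store = redis.Redis()  # memory-based store for the latest collected data

SOURCES = ["https://example.com/api/data"]  # placeholder endpoints


def gather_data():
    """Points 1/2: pull every source with requests and cache the result (point 3)."""
    collected = [requests.get(url, timeout=30).json() for url in SOURCES]
    store.set("latest", json.dumps(collected))


@app.route("/refresh")
def refresh():
    """Point 1: trigger a gather manually by visiting a URL."""
    gather_data()
    return jsonify(status="ok")


@app.route("/data")
def data():
    """Point 4: clients read the latest version of the data."""
    cached = store.get("latest")
    return jsonify(json.loads(cached) if cached else [])


if __name__ == "__main__":
    scheduler = BackgroundScheduler()
    scheduler.add_job(gather_data, "interval", minutes=10)  # point 2
    scheduler.start()
    app.run(threaded=True)  # Werkzeug serves concurrent client requests
```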
I am trying to design a web based app at the moment, that involves requests being made by users to trigger analysis of their previously entered data. The background analysis could be done on the same machine as the web server or be run on remote machines, and should not significantly impede the performance of the website, so that other users can also make analysis requests while the background analysis is being done. The requests should go into some form of queueing system, and once an analysis is finished, the results should be returned and viewable by the user in their account.
Please could someone advise me of the most efficient framework to handle this project? I am currently working on Linux, the analysis software is written in Python, and I have previously designed dynamic sites using Django. Is there something compatible with this that could work?
Given your background and the analysis code already being written in Python, Django + Celery seems like an obvious candidate here. We're currently using this solution for a very processing-heavy app, with one front-end Django server, one dedicated database server, and two distinct Celery servers for the background processing. Having the Celery processes on distinct servers keeps the Django front end responsive whatever the load on the Celery servers (and we can add new Celery servers if required).
So well, I don't know if it's "the most efficient" solution but it does work.
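For concreteness, the shape of that setup is roughly the following; the model, task, and field names are invented for the sketch, and the analysis itself is stubbed out:

```python
# models.py -- a job record so users can see results in their account.
from django.conf import settings
from django.db import models


class AnalysisJob(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    status = models.CharField(max_length=20, default="queued")
    result = models.JSONField(null=True, blank=True)  # Django 3.1+


# tasks.py -- the heavy analysis runs on the Celery workers, not the web server.
from celery import shared_task


@shared_task
def run_analysis(job_id):
    job = AnalysisJob.objects.get(pk=job_id)
    job.status = "running"
    job.save(update_fields=["status"])
    job.result = {"summary": "..."}  # call your existing analysis code here
    job.status = "done"
    job.save(update_fields=["status", "result"])


# views.py -- the request just enqueues the job and returns immediately.
from django.http import JsonResponse


def submit_analysis(request):
    job = AnalysisJob.objects.create(user=request.user)
    run_analysis.delay(job.pk)
    return JsonResponse({"job_id": job.pk, "status": job.status})
```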
I am a data scientist and database veteran but a total rookie in web development, and I have just finished developing my first Ruby on Rails app. This app accepts data from users who submit it through my front-end web page and returns stats on the data submitted. Some users have been submitting far too much data: the app is getting slow, and I think I had better push the data crunching to a backend Python or Java app rather than the database. I don't even know where to start. Any ideas on how best to architect this application? The job flow is: data is submitted from the front-end app, which pushes it to the backend for my server app to process and then send back to my Ruby on Rails page. Any good tutorials that cover this? Please help!
What should I be reading up on?
It doesn't look like you need another app, but rather a different approach to how you process data. How about processing it in the background? There are several gems that accomplish that.
Are you sure your database is well maintained and efficient (good indexes, normalised, clean, etc.)?
Or you could make use of messaging queues: keep your Rails CRUD app, and just add the jobs to a queue. Python scripts on the backend (or on a different machine) read from the queue, process the job, then insert the results back into the database, or push them to a results queue or wherever you want to read them from.
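For example, the Python side could be a small standalone consumer; a rough sketch using RabbitMQ via pika (the queue names and the processing step are placeholders):

```python
# Sketch: standalone worker that crunches jobs queued by the Rails app.
import json

import pika


def process(payload):
    """Placeholder for the actual number crunching (pandas, numpy, ...)."""
    return {"rows": len(payload.get("rows", []))}


def on_message(channel, method, properties, body):
    job = json.loads(body)
    result = process(job)
    # Write back: either insert into the shared database here, or publish
    # the result to a results queue that the Rails app reads from.
    channel.basic_publish(
        exchange="", routing_key="results", body=json.dumps(result)
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs", durable=True)
channel.queue_declare(queue="results", durable=True)
channel.basic_consume(queue="jobs", on_message_callback=on_message)
channel.start_consuming()
```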