Is it efficient to use Celery for every task? - python

I'm making an API server with Python Flask.
In my case it's for real production use, so I have to be careful when developing the server.
After some Googling, I found that Celery with Redis is suitable for task queueing.
So I installed them via pip3 install 'celery[redis]', defined a task, and ran it.
Everything was fine, but I have a question about it.
Assume there is a user model, with CRUD operations like this:
Register user (with photo)
Delete user
Get a single user
In my opinion, only "Register user" needs Celery and Redis, because uploading a photo can take a long time, so it has to be treated as asynchronous work.
"Delete user" and "Get a single user" just query the database and return the result, so they don't take long. (That means they don't need to go through Celery.)
Is that right? Or is there something I'm missing?
To summarize my question: is there any standard for when to use Celery?
Thanks!

You have it about right. You can put whatever processing you want in Celery, but the rule you just used (use Celery for things that take a long time) is the one we use most in our production environment. You can also use Celery when you want to scale an operation out across servers more easily. For example, when scraping a large number of pages, you might want to execute that in parallel to speed up what would otherwise be a long-running task.
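As an illustration of that split for the user model above, here is a minimal sketch assuming a local Redis broker; create_user, save_upload, and fetch_user are hypothetical helpers standing in for your own code:

    # tasks.py -- the slow photo processing runs in a Celery worker
    from celery import Celery

    celery_app = Celery("tasks", broker="redis://localhost:6379/0")

    @celery_app.task
    def process_photo(user_id, photo_path):
        # Resize the image, push it to storage, update the user row, etc.
        # This is the long-running part that should not block the request.
        ...

    # app.py -- the Flask views
    from flask import Flask, jsonify, request
    from tasks import process_photo

    app = Flask(__name__)

    @app.route("/users", methods=["POST"])
    def register_user():
        user_id = create_user(request.form)               # hypothetical helper
        photo_path = save_upload(request.files["photo"])  # hypothetical helper
        process_photo.delay(user_id, photo_path)          # enqueue and return
        return jsonify({"id": user_id}), 202

    @app.route("/users/<int:user_id>", methods=["GET"])
    def get_user(user_id):
        # A plain, fast DB read -- no Celery needed here.
        return jsonify(fetch_user(user_id))               # hypothetical helper

The request returns as soon as the task is enqueued; a worker started with celery -A tasks worker picks the job up from Redis.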

There is a great tutorial on this topic:
using-celery-with-flask
And you can also check out this repo.

Related

How do I deploy this app for my job: EC2, Elastic Beanstalk, something else entirely?

I'm tasked with creating a web app (I think?) for my job that will track something in our system. It'll be an internal tool that staff use to keep track of the status of one of the things we do. It should look like Trello, with cards that drag from step to step. That frontend exists, but my job is to make the system update when the cards are dragged. This requires using an API in Python and isn't that complicated to grab from/update. I have no idea how to put all of this together. My job is almost completely nontechnical and there's no one internally who knows what I'm doing except for me. I'm in so over my head here and have no idea where to begin. Is this something I should deploy on Elastic Beanstalk? EC2? How do I tie this together and put it somewhere?
Are you trying to pull in live data from Trello or from your company's own internal project management tool?
An EC2 instance might be useful, but honestly it may be completely unnecessary if your company has its own servers. EC2 is basically just a collection of rental computers to help with scaling. I have never used Beanstalk, so my input would be useless there.
From what I can gather from the question, you could have a Python script running to pull from the API and make the changes without an EC2 instance.
The first thing you should do is gather as much information as you can about what the end product should look like. From your question, I have the feeling that you have only a vague idea of what the stakeholders want. Don't be afraid to ask for more clarification about an unclear task. It's better to spend 30 minutes discussing and taking notes than to show the end product after a month and realize that's not what your boss/team wanted.
Questions I would ask
Who is going to be using this app? (technical or non-technical person)
For what purpose is this being developed?
Does it need to be on the web or can it be used locally?
How many users need to have access to this application?
Are we handling sensitive information with this application?
Will this need to be augmented with other functionality at some point?
This is just a sample of what I would ask; during the conversation with the stakeholders a lot more will pop up for sure.
What I think you have to do
You need to build a monitoring system for the tasks that need to be done by your development team (like a Kanban board).
What I think you already have
A frontend with cards that are draggable into each bin. I also assume that you can create a new card and delete one in the frontend. The frontend is most likely written in React, Angular, or Vue.js. You might also have no frontend framework (a mix of jQuery and vanilla JS), but frontend developers usually end up picking a framework of some sort to help development.
A backend API in Python (most likely in Flask or with Django REST framework) that communicates with a SQL database like PostgreSQL or a document database like MongoDB.
I'm making a lot of assumptions here, but your aim should be to understand the technology you will be working with in order to check which hosting would work best. For instance, if the database that is set up is a MySQL database, you might have trouble with some hosting providers.
What I think you are missing
Currently the frontend and the backend don't communicate with each other. When you drag a card, the move won't persist if you refresh the page. Also, all of this is sitting on your computer and cannot be used by anyone on your staff. You first need to connect the frontend with the backend so that the application has persistence, then deploy the application somewhere it is reachable by your staff.
What I would do first is work locally to make sure the persistence layer works. This implies having the API server, the frontend server, and the database server running simultaneously on your computer while you develop. You should then fetch data from the API to find out which cards are in the database, and create them visually in your frontend at the right spot.
Dropping a card into a new spot after dragging it should trigger a POST request to your API server to update the status of that particular card (look at your API's documentation to check what you need to send).
The server should send back an updated version of the card's status if the POST request was successful, and your application should then just redraw the card at the right spot (it won't make a visible difference, since the card is already there, and your frontend framework most likely won't act on the response since the state hasn't changed). A minimal sketch of the backend side of this exchange follows.
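This sketch assumes Flask; the route shape and the in-memory CARDS dict are made up, standing in for your real API and database:

    from flask import Flask, abort, jsonify, request

    app = Flask(__name__)

    # Stand-in for the database: card id -> the bin/column it sits in.
    CARDS = {1: "todo", 2: "in progress"}

    @app.route("/api/cards/<int:card_id>", methods=["POST"])
    def update_card_status(card_id):
        if card_id not in CARDS:
            abort(404)
        CARDS[card_id] = request.get_json()["status"]
        # Send the updated state back so the frontend can redraw from it.
        return jsonify({"id": card_id, "status": CARDS[card_id]})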
I would then move to the deployment phase to make sure that whatever you did locally still works online. I would use Heroku to start with instead of jumping directly to AWS. Heroku is a service built on top of AWS which manages a lot of AWS's complexity for you. That is great for prototyping, and it means that when your stuff is ready you can migrate to AWS easily, confident that a setup exists to make your app work. You might also be tied to your company's servers, which is another thing I would ask the stakeholders about (i.e. where can I put this application and where can't I).
The flow for a frontend + API + database application on Heroku is usually as follows. You create a GitHub repo for your frontend (make it private) and create an app on Heroku that watches that repository for changes. Whenever it sees a change, it re-deploys the application for you at a specific Heroku subdomain. You will need to configure a Procfile that tells Heroku what to do with a given application type. This is where you need to double-check which frontend you are using, since that may change the Procfile. It's most likely a Node.js-based frontend (React, Angular, or Vue), so head over here for the documentation on how to put that online.
You will also need a repo for the backend, separate from the frontend; the two are distinct entities that only communicate through HTTP requests (frontend->backend) and JSON (backend->frontend). Follow the same idea as with the frontend to deploy; head over here. A sketch of a typical Python backend Procfile follows.
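For the Python backend the Procfile is typically a single line; this sketch assumes gunicorn as the WSGI server and a Flask app object living in app.py:

    web: gunicorn app:app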
Once you have these two online, you need to create a database on Heroku. This is done by adding a datastore to your API; head over here. There is some framework-specific configuration required to make the API talk to an online database, which you will need to find in your framework's documentation. The database could also already be up and living on your own server; in that case you just need to configure your online backend to talk to that particular database at a particular address. One common way to wire the online database in is sketched below.
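Heroku's Postgres add-on exposes the connection string as the DATABASE_URL environment variable, so the backend reads it from the environment instead of hard-coding credentials. A minimal sketch (the local fallback value is made up):

    import os

    # Heroku Postgres injects DATABASE_URL; fall back to a local DB in dev.
    DATABASE_URL = os.environ.get("DATABASE_URL", "postgresql://localhost/kanban_dev")

    # Hand this to your framework: an SQLAlchemy engine for Flask, or
    # Django's DATABASES setting (e.g. via the dj-database-url package).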
Once all of the above is done, re-test your application to check that you get the same behavior as before. This is a usable MVP, but there is no security layer yet: anyone with the right URL could fetch your frontend and start messing around with your data.
There is more engineering that needs to be done to make this a viable end product, which leads to my final remark: why are you not using an existing product like Trello, Jira, or even GitHub Projects? If it is to save money on a subscription, I think you should factor in the cost of developing, securing, and maintaining this application.
Hope it helps!
One simple option is Heroku for deploying both your API and your frontend application.

GAE: Best practice for dynamically generated projects

Let's say I am creating a python-based CMS on GAE (similar to Squarespace/Shopify) which allows users to create a website.
The platform will (automatically?) create a subdomain for each new user and duplicate the application.
Now there are two options:
1) Create a new database for each new user WITHIN the master GAE project. (I'm worried that if one user gets a lot of traffic it might slow down ALL the websites.)
2) Duplicate the entire project for each user. (This method seems difficult to accomplish, because either I have to manually create an instance of the application for each user, or I have to figure out how to drive gcloud.py (or appcfg.py) somehow and store my login credentials in the code.)
Which choice will most likely provide the most performance for the price? Is choice 2 allowed by Google (or even possible)?
Edit:
I've done some more research on this, and it isn't documented much. I found this page in the docs, https://cloud.google.com/sdk/docs/scripting-gcloud, which talks about running gcloud from scripts, although I don't think that means from Python. I am looking into appengine-jenkins to see if it will work for my purpose. Let me know if you have any additional information about this.
Also, it seems gcloud is adding a create command under the projects command group, which might be useful if I can figure out how to run gcloud from my script: https://cloud.google.com/sdk/gcloud/reference/alpha/projects/create
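Running gcloud from Python mostly comes down to shelling out to the CLI. A minimal sketch, assuming the gcloud SDK is installed and already authenticated, using the alpha command from the link above (the project id is made up):

    import subprocess

    def create_tenant_project(project_id):
        # Equivalent to typing `gcloud alpha projects create <id>` in a shell.
        subprocess.run(
            ["gcloud", "alpha", "projects", "create", project_id],
            check=True,  # raise CalledProcessError if gcloud exits non-zero
        )

    create_tenant_project("tenant-site-12345")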

Django registration alternatives

I'm looking at django-registration. It's at alpha 0.8 and hasn't been updated for 12-13 months, but it seems to be what most people use? I'm just wondering whether there is a production-standard package out there for managing users on a Django site, or whether people tend to roll their own.
It hasn't been updated because it works very well ;)
Frankly, you really should use this package, along with django-profiles, django-invitation...
The only problem (for me) is the lack of example templates in django-registration, but you can look at this repository to get some.
Try django-registration-redux, a maintained fork of django-registration:
https://github.com/macropin/django-registration
https://django-registration-redux.readthedocs.org/en/latest/
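A minimal sketch of wiring the redux fork in, following its documented quick start (the seven-day activation window is just an example value):

    # settings.py
    INSTALLED_APPS = [
        # ... your existing apps ...
        "registration",
    ]
    ACCOUNT_ACTIVATION_DAYS = 7  # how long activation links stay valid

    # urls.py
    from django.conf.urls import include, url

    urlpatterns = [
        url(r"^accounts/", include("registration.backends.default.urls")),
    ]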
I am working on a GAE app that is going to use its own registration module. It works using AJAX and can create new users in the database, send codes for verifying new users, and recover existing users by email. I am almost done with this module, and I am sure it won't take much time to configure it to use Django models for the database work. The important thing you could take from it is the concept of the registration/verification/recovery process. Please feel free to clone the module from here and participate in developing it. You can also ping me if you have any questions; I will help with pleasure! Thanks.

I18N Using Django/Python

I'm looking to optimize our translation workflow for a Django/Python-based project.
Currently we run a command to export our gettext files, send them to the translators, and receive them back. Well, you get the drill.
What, in your opinion, is the best way to optimize this workflow? Are there tools that integrate nicely and allow translations to be pushed to and pulled from the system?
Options I've seen so far:
http://trac.transifex.org/ (supported in Django 1.3)
Transifex was designed for pretty much this. It doesn't pull the strings from the project/app automatically yet, but it can be extended to do so if desired.
Transifex has two ways to automate this: if your POT file is on a public server, you can set up a resource to auto-fetch the POT file frequently and update your resource.
The second option is to use the client app and run it every time you build/deploy/commit; a sketch of that round trip follows.
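With the client app, the round trip can be scripted. A sketch of the usual commands, assuming a .tx/config has already been set up with tx init (run these on each build/deploy):

    django-admin makemessages --all   # regenerate the gettext source files
    tx push -s                        # push updated source strings to Transifex
    tx pull -a                        # pull completed translations back down
    django-admin compilemessages      # compile .mo files for deployment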

Is there a production-safe way to measure time spent in production with Python?

I want to be able to instrument Python applications so that I know:
Page generation time.
Percentage of time spent in external requests (MySQL, API calls).
Number of MySQL queries, and what those queries were.
I want this data from production (not offline profiling) - because the time spent in various places will be different under load.
In PHP I can do this with XHProf or instrumentation-for-php. In Ruby on Rails, .NET, and Java, I can do this with New Relic.
Is there such a package recommended for Python or Django?
Yes, it's perfectly possible. For example, use some magic switch in the URL, like "?profile-me", which triggers profiling in Django middleware.
There are a number of snippets on the Internet, like this one: http://djangosnippets.org/snippets/70/ or modules like this one: http://code.google.com/p/django-profiling/ - but I haven't used any of them, so I cannot recommend anything.
Anyway, the approach they take is similar to what I do, i.e. use Python's Hotshot profiler module in a middleware that wraps your view. For the MySQL part, you can just use connection.queries from Django.
The nice thing about Hotshot is that its output can be browsed using KCachegrind, like here: http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/ A sketch of such a middleware follows.
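A rough sketch of that middleware idea, using cProfile (the maintained stand-in for the long-deprecated Hotshot); the ?profile-me switch name is arbitrary, and connection.queries is only populated when DEBUG is on:

    import cProfile
    import io
    import pstats

    from django.db import connection
    from django.http import HttpResponse

    class ProfileMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            if "profile-me" not in request.GET:
                return self.get_response(request)
            profiler = cProfile.Profile()
            # Run the view under the profiler; discard the real response
            # and return the timing report instead.
            profiler.runcall(self.get_response, request)
            out = io.StringIO()
            pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(20)
            out.write("\n%d SQL queries during this request\n" % len(connection.queries))
            return HttpResponse(out.getvalue(), content_type="text/plain")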
New Relic now has a package for Python, including Django support through mod_wsgi.
https://support.newrelic.com/help/kb/python
django-prometheus is a good choice for handling production workloads, especially in a container environment like Kubernetes. Out of the box it has middleware for tracking request latencies and counts (by view method), as well as database and cache access times. It wouldn't be a good solution for tracking which queries are actually executing; that's where a logging solution like ELK would come into play. If it helps, I've written a post which walks through how to add custom metrics to a Django application. A minimal setup sketch follows.
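A minimal setup sketch, following the package's documented configuration (the Before/After middleware pair must bracket your existing middleware so the timers cover it):

    # settings.py
    INSTALLED_APPS = [
        "django_prometheus",
        # ... your apps ...
    ]
    MIDDLEWARE = [
        "django_prometheus.middleware.PrometheusBeforeMiddleware",
        # ... your existing middleware ...
        "django_prometheus.middleware.PrometheusAfterMiddleware",
    ]

    # urls.py -- exposes the /metrics endpoint for the Prometheus scraper
    from django.urls import include, path

    urlpatterns = [
        path("", include("django_prometheus.urls")),
    ]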
