I am developing a website using Google App Engine and Django 1.0 (app-engine-patch)
A major part of my program has to run in the background, changing local data and also posting to a remote URL.
Can someone suggest an effective way of doing this?
Check out the Task Queue Python API.
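Enqueuing work with it is nearly a one-liner. A minimal sketch, assuming you have mapped a /worker handler in app.yaml to do the actual processing (the URL and params here are placeholders):

    from google.appengine.api import taskqueue

    # Enqueue a unit of background work; App Engine will POST it to /worker
    # outside of the user-facing request.
    taskqueue.add(url='/worker', params={'key': 'some-entity-key'})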
Without using a third-party system, I think currently your only option is to use the cron functionality.
You'd still be bound by the usual GAE script-execution-time limitations, but it wouldn't happen on a page load.
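For reference, cron jobs are declared in a cron.yaml file next to app.yaml. A minimal sketch, with a placeholder URL and schedule:

    cron:
    - description: background processing job
      url: /tasks/background
      schedule: every 10 minutes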
There are plans for background processing; see App Engine issue #6 and this roadmap update.
I second dbr's recommendation of http://code.google.com/appengine/docs/python/config/cron.html (and I share the hopes for better future approaches, such as the promised "task queues").
Nevertheless, I suspect that if you do indeed need major (as in CPU-heavy) background processing, GAE may not be the most hospitable environment for it. You may want to consider running those heavy background tasks in another environment and having them communicate with GAE proper, e.g., via the "bulk load/download" APIs; see http://code.google.com/appengine/docs/python/tools/uploadingdata.html (and http://code.google.com/appengine/docs/python/tools/uploadingdata.html#Downloading_Data_from_App_Engine for the downloading part).
Google's documentation only describes the usage of the command-line appcfg.py for these purposes (I can't find proper documentation of the APIs it uses!), but if you do need more programmatic access to these APIs, it's not hard to infer them from appcfg.py's sources.
I have a project in which the backend is written in FastAPI and the frontend uses React. My goal is to add a component that monitors some of the PC/server's performance in real time. Right now the project is still under development in a local environment, so basically I'll need to fetch the CPU/GPU usage and RAM (from my PC) in Python and then send them to my React app. My question is: what is the cheapest way to accomplish this? Is setting up an API and fetching it with a GET request every ten seconds a good approach, or are there better ones?
Explanation & Tips:
I know EXACTLY what you're describing; I also made a mobile app using Flutter and Python. I have been trying to get multiple servers to host the API instead of one server, and I personally think Node.js is worth checking out, since it allows clustering, which is extremely powerful. If you want to stick with Python, the best way to get memory usage is psutil, like this: memory = psutil.virtual_memory().percent. For the CPU usage, though, you would have to do some sort of caching or multithreading, because you cannot get the CPU usage without a delay: cpu = psutil.cpu_percent(interval=1). A sketch of that approach follows the key points below.

If you want your API to be fast, then the periodic approach is bad: it will slow down your server, and if you do anything wrong on the client side, you could end up DDoSing your own API, which is an embarrassing thing I did when I first published my app. The best approach is to only call the API when it is needed. Flutter, for example, has cached widgets, which were very useful because I only had to fetch a given piece of data once every few hours.
Key Points:
- Only call the API when it is crucial to do so.
- psutil cannot measure CPU usage instantaneously; cpu_percent needs a sampling interval (or a prior call to compare against).
- Node performed better than my Flask API (not FastAPI).
- Use client-side caching if possible.
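Here is a minimal sketch of the caching/multithreading approach described above: a background thread absorbs psutil's one-second sampling delay, and a FastAPI endpoint returns the latest cached reading instantly (the /stats route name is my own invention):

    import threading

    import psutil
    from fastapi import FastAPI

    app = FastAPI()
    stats = {"cpu": 0.0, "memory": 0.0}  # cache shared with request handlers

    def sample_loop():
        while True:
            # cpu_percent(interval=1) blocks for a second while it measures,
            # which is why it runs here and not inside a request handler.
            stats["cpu"] = psutil.cpu_percent(interval=1)
            stats["memory"] = psutil.virtual_memory().percent

    threading.Thread(target=sample_loop, daemon=True).start()

    @app.get("/stats")
    def get_stats():
        return stats  # returns immediately; no sampling delay per request

Your React app can poll that endpoint, or you could push readings over a WebSocket if ten-second polling ever feels wasteful.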
I have been looking at daemons for Linux such as httpd and have also looked at some code that can be used as a skeleton. I have done a fair amount of research and now I want to practice writing one. However, I'm not sure what I can use a daemon for. Any good examples/ideas that I can try to execute?
I was thinking of using a daemon along with libnotify on Ubuntu to have pop-up notifications of select tweets.
Is this a bad example for implementing a daemon?
Do I even need a daemon for this?
Can this be implemented as a service rather than a daemon?
First: PEP 3143 tries to enumerate all of the fiddly details you have to get right to write a daemon in Python. And it specifies a library that takes care of those details for you.
The PEP was deferred, at least in part because the community felt it was more the responsibility of POSIX or some Linux standards group to first define exactly what is essential to being a daemon before Python could take its own position on how to implement one. But it's still a great guide, and the reference implementation of the proposed library lives on as python-daemon, which you can install from PyPI.
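Using it is pleasantly small. A minimal sketch, where watch_tweets stands in for your own polling loop:

    import daemon

    def watch_tweets():
        pass  # poll Twitter and fire libnotify notifications here

    # Inside the context, the process is detached from its terminal, runs in
    # its own session, and has its stdio redirected, per PEP 3143.
    with daemon.DaemonContext():
        watch_tweets()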
Meanwhile, the really interesting question for this project isn't so much service vs. daemon, as root vs. user. Do you want a single process that keeps track of all users' twitter accounts, and sends notifications to anyone who's logged in? Just a per-user process? Or maybe both, a single process watching all the tweets, then sending notifications via user processes?
Of course you don't really need a daemon or service for this. For example, it could be a GUI app whose main window is a configuration dialog, which keeps running (maybe with a traybar thingy) even when you close the config dialog, and it would work just as well. The question isn't whether you need a daemon, but whether it's more appropriate. Which really is a design choice.
I'm a newbie to developing with Python and I'm piecing together the information I need to make intelligent choices in two other open questions. (This isn't a duplicate.)
I'm not developing using a framework but building a web app from scratch using the gevent library. As far as front-end web servers go, it seems I have three choices: nginx, apache, and lighttpd.
From all accounts that I've read, nginx's mod_wsgi isn't suitable.
That leaves two choices: lighttpd and Apache. Under heavy load, am I going to see major differences in performance and memory consumption? I'm under the impression that Apache tends to be memory-hungry even when not using prefork, but I don't know how suitable lighttpd is for Python apps.
Are there any caveats or benefits to using lighttpd over apache? I really want to hear all the information you can possibly bore me with!
Apache...
Apache is by far the most widely used web server out there, which is a good thing: there is much more information on how to do stuff with it, and when something goes wrong there are a lot of people who know how to fix it. But it is also the slowest out of the box, requiring a lot of tweaking and a beefier server than Lighttpd. In your case, it will be a lot easier to get off the ground using Apache and Python; there are countless AMP packages out there and many guides on how to set up Python and make your application work, and a quick Google search will get you on your way. Under heavy load, Lighttpd will outshine Apache, but Apache is like a train: it just keeps chugging along.
Pros
Wide User Base
Universal support
A lot of plugins
Cons
Slow out of the box
Requires performance tweaking
Memory hog (no way you could get it working on a 64MB VPS)
Lighttpd...
Lighttpd is the new kid on the block. It is fast, powerful, and kicks ass performance-wise (not to mention it uses hardly any memory). Out of the box, Lighttpd wipes the floor with Apache. But not as many people know Lighttpd, so getting it to work is harder. Yes, it is the second most used web server, but it does not have as much community support behind it; if you look here on Stack Overflow, there is a guy who keeps asking how to get his Python app working, and nobody has helped him. Under heavy load, if configured correctly, Lighttpd will outperform Apache (I did some tests a while back, and you might see a 200-300% increase in requests per second).
Pros
Fast out of the box
Uses very little memory
Cons
Not as much support as Apache
Sometimes just does not work
Nginx
If you were running a static website, then you would use nginx. You are correct in saying nginx's mod_wsgi isn't suitable.
Conclusion
Benefits? They are both web servers, designed to be able to replace one another. If both are tuned correctly and you have ample hardware, there is no real benefit to using one over the other. You should try each and see which one meets your needs, but if you're asking me, I would say go with Lighttpd. It is, in my opinion, easier to configure and it just works.
Also, you should look at the Cherokee Web Server; it's mad easy to set up, and the performance ain't half bad. And you should ask this on Server Fault as well.
That you have mentioned gevent is important. Does that mean you are specifically trying to implement a long-polling application? If you are, and that functionality is the bulk of the application, then you will need to put your gevent server behind a front-end web server that is implemented using async techniques rather than a process/thread model. Lighttpd is an async server and fits that bill, whereas Apache isn't, so Apache is no good as a front-end proxy for a long-polling application. If that is the criterion, though, I would actually suggest you use nginx rather than Lighttpd.
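To make that concrete, the gevent side of such a setup might look like the minimal pywsgi sketch below, with nginx (or Lighttpd) proxying to it; the handler is just a placeholder:

    from gevent.pywsgi import WSGIServer

    def application(environ, start_response):
        # A real long-polling handler would block here on a gevent event or
        # queue; greenlets make thousands of such concurrent waits cheap.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    # Bind to localhost only; the async front-end server faces the internet.
    WSGIServer(('127.0.0.1', 8000), application).serve_forever()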
Now, if you are not doing long polling or anything else that needs high concurrency for long-running requests, then you aren't necessarily going to gain much by using gevent, especially if the intention is to use a WSGI layer on top. For WSGI applications, the performance difference between servers is ultimately minimal, because your application is unlikely to be the hello-world program that the benchmarks all use. The real bottlenecks are not the server but your application code, database, external callouts, lack of caching, and so on. In light of that, you should just use whatever WSGI hosting mechanism you find easier to use initially, and once you have properly worked out the hosting requirements for your application, based on testing an actual real application, you can switch to something more appropriate if necessary.
In summary, you are just wasting your time trying to prematurely optimize by hunting for the theoretically best server when, in practice, your application is what you should concentrate on first. After that, you should also look at application-monitoring tools, because without them how are you even going to determine whether one hosting solution is better than another?
I would like to execute some long-running JRuby scripts [nothing to do with web requests and URL Fetch] on Google App Engine. There is a 30-second limit on URL Fetch requests. Does the same apply to plain JRuby/Python scripts? If yes, is there a workaround?
The 30-second limit applies to everything that happens on App Engine; it's really not an ideal platform for hosting long-running processes. There are some techniques you can use to simulate what you want. Task Queues, for example, can be set up to perform work in the background.
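One common technique, sketched below for the Python runtime (the Java/JRuby API is analogous, and process_chunk is hypothetical), is a task that handles one slice of work and re-enqueues itself, so no single request exceeds the limit:

    from google.appengine.api import taskqueue
    from google.appengine.ext import webapp

    class Worker(webapp.RequestHandler):
        def post(self):
            cursor = self.request.get('cursor')
            next_cursor = process_chunk(cursor)  # hypothetical: do one slice
            if next_cursor:
                # Chain the next slice as a fresh task, resetting the clock.
                taskqueue.add(url='/worker', params={'cursor': next_cursor})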
Still, you might want to look into one of the vast variety of hosting alternatives that will let you simply launch a process and let it keep running.
I am looking for a Python webserver which is multithreaded instead of multi-process (as is the case with mod_python for Apache). I want it to be multithreaded because I want to have an in-memory object cache that will be used by various HTTP threads. My webserver does a lot of expensive stuff and computes some large arrays which need to be cached in memory for future use, to avoid recomputing them. This is not possible in a multi-process web server environment. Storing this information in memcache is also not a good idea, because the arrays are large, and storing them in memcache would mean deserializing the data coming back from memcache, on top of the additional overhead of IPC.
I implemented a simple webserver using BaseHTTPServer; it gives good performance, but it gets stuck after a few hours. I need a more mature webserver. Is it possible to configure Apache to use mod_python under a threaded model so that I can do some object caching?
CherryPy. Features, as listed from the website (a minimal example follows the list):
A fast, HTTP/1.1-compliant, WSGI thread-pooled webserver. Typically, CherryPy itself takes only 1-2ms per page!
Support for any other WSGI-enabled webserver or adapter, including Apache, IIS, lighttpd, mod_python, FastCGI, SCGI, and mod_wsgi
Easy to run multiple HTTP servers (e.g. on multiple ports) at once
A powerful configuration system for developers and deployers alike
A flexible plugin system
Built-in tools for caching, encoding, sessions, authorization, static content, and many more
A native mod_python adapter
A complete test suite
Swappable and customizable...everything.
Built-in profiling, coverage, and testing support.
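And the minimal example promised above, to show how little ceremony the thread-pooled server needs:

    import cherrypy

    class Root(object):
        @cherrypy.expose
        def index(self):
            return "Hello from CherryPy's thread-pooled server"

    # quickstart() mounts the app and starts the built-in HTTP server.
    cherrypy.quickstart(Root())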
Consider rethinking your design. Maintaining that much state in your webserver is probably a bad idea. Multi-process is a much better way to go for stability.
Is there another way to share state between separate processes? What about a service? Database? Index?
It seems unlikely that maintaining a huge array of data in memory and relying on a single multi-threaded process to serve all your requests is the best design or architecture for your app.
Twisted can serve as such a web server. While Twisted itself is not multithreaded, there is a (not yet released) multithreaded WSGI container in the current trunk. You can check out the SVN repository and then run:
twistd web --wsgi=your.wsgi.application
It's hard to give a definitive answer without knowing what kind of site you are working on and what kind of load you are expecting. Sub-second performance may be a serious requirement, or it may not. If you really need to save that last millisecond, then you absolutely need to keep your arrays in memory. However, as others have suggested, it is more than likely that you don't, and could get by with something else.

Your usage pattern for the data in the array may also affect your choices. You probably don't need access to the entire data set at once, so you could break your data up into smaller chunks and put those chunks in the cache instead of one big lump. Depending on how often your array data needs to be updated, you might choose between memcached, a local db (Berkeley DB, SQLite, a small MySQL installation, etc.), or a remote db. I'd say memcached for fairly frequent updates, a local db for something on the order of hourly, and a remote db for daily.

One more thing to consider is what happens after a cache miss. If 50 clients suddenly get a cache miss and all of them decide to start regenerating those expensive arrays at the same time, your box(es) will quickly be reduced to 8086s, so you have to take into consideration how you will handle that. Many articles out there cover how to recover from cache misses; one simple mitigation is sketched below. Hope this is helpful.
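As a minimal sketch of that mitigation, for the in-process multithreaded case (regenerate stands in for your expensive array computation):

    import threading

    _lock = threading.Lock()

    def get_or_build(cache, key, regenerate):
        value = cache.get(key)
        if value is None:
            with _lock:
                # Re-check inside the lock: another thread may have finished
                # regenerating while we were waiting.
                value = cache.get(key)
                if value is None:
                    value = regenerate()  # only one thread pays this cost
                    cache[key] = value
        return value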
Not multithreaded, but Twisted might serve your needs.
You could instead use a distributed cache that is accessible from each process, memcached being the example that springs to mind.
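A minimal sketch with the python-memcached client, caching one chunk of the array so every worker process can read it back:

    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])
    # Cache one chunk for an hour; memcached handles cross-process sharing.
    mc.set('array_chunk_0', [0.0] * 100000, time=3600)
    chunk = mc.get('array_chunk_0')  # visible from any process on any host

This does reintroduce the serialization cost you mention, so it only pays off if the chunks are kept reasonably sized.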
web.py has made me happy in the past. Consider checking it out.
But it does sound like an architectural redesign might be the proper, though more expensive, solution.
Perhaps you have a problem with your implementation in Python using BaseHTTPServer. There's no reason for it to "get stuck", and implementing a simple threaded server using BaseHTTPServer and threading shouldn't be difficult.
Also, see http://pymotw.com/2/BaseHTTPServer/index.html#module-BaseHTTPServer for implementing a simple multithreaded server with HTTPServer and ThreadingMixIn.
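That combination is only a few lines. A minimal sketch (Python 2 module names, matching the link above), including the kind of shared in-memory cache you're after:

    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
    from SocketServer import ThreadingMixIn

    cache = {}  # shared by all handler threads in this single process

    class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
        """Handles each request in a new thread."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write('cache holds %d entries\n' % len(cache))

    if __name__ == '__main__':
        ThreadedHTTPServer(('', 8000), Handler).serve_forever()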
I use CherryPy both personally and professionally, and I'm extremely happy with it. I even do the kinds of thing you're describing, such as having global object caches, running other threads in the background, etc. And it integrates well with Apache; simply run CherryPy as a standalone server bound to localhost, then use Apache's mod_proxy and mod_rewrite to have Apache transparently forward your requests to CherryPy.
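The Apache side of that arrangement can be as small as this, assuming CherryPy is bound to 127.0.0.1:8080 and mod_proxy/mod_proxy_http are enabled (the port is a placeholder; match whatever CherryPy uses):

    <VirtualHost *:80>
        # Transparently forward everything to the local CherryPy server.
        ProxyPass        / http://127.0.0.1:8080/
        ProxyPassReverse / http://127.0.0.1:8080/
    </VirtualHost>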
The CherryPy website is http://cherrypy.org/
I actually had the same issue recently. Namely, we wrote a simple server using BaseHTTPServer and found that its not being multithreaded was a big drawback.
My solution was to port the server to Pylons (http://pylonshq.com/). The port was fairly easy, and one benefit was that it's very easy to create a GUI using Pylons, so I was able to throw a status page on top of what's basically a daemon process.
I would summarize Pylons this way:
- It's similar to Ruby on Rails in that it aims to make web apps very easy to deploy.
- Its default templating language, Mako, is very nice to work with.
- It uses a very convenient system of routing URLs.
- For us, performance is not an issue, so I can't guarantee that Pylons would perform adequately for your needs.
- You can use it with Apache and lighttpd, though I've not tried this.
We also run an app with Twisted and are happy with it. Twisted has good performance, but I find Twisted's single-threaded/defer-to-thread programming model fairly complicated. It has lots of advantages, but would not be my choice for a simple app.
Good luck.
Just to point out something different from the usual suspects...
Some years ago, while I was using Zope 2.x, I read about Medusa, which was the web server used for the platform. It was advertised to work well under heavy load, and it can provide the functionality you asked for.