Python App Engine debug/dev mode

I'm working on an App Engine project (Python) where we'd like to make certain changes to the app's behavior when debugging/developing (most often locally). For example, when debugging, we'd like to disable our rate-limiting decorators, turn on the debug param in the WSGIApplication, maybe add some asserts.
As far as I can tell, App Engine doesn't naturally have any concept of a global dev-mode or debug-mode, so I'm wondering how best to implement such a mode. The options I've been able to come up with so far:
1. Use google.appengine.api.app_identity.get_default_version_hostname() to get the hostname and check if it begins with localhost. This seems... unreliable, and doesn't allow for using the debug mode in a deployed app instance.
2. Use os.environ.get('APPLICATION_ID') to get the application id, which according to this page is automatically prepended with dev~ by the development server. Worryingly, the very source of this information is in a box warning:
    Do not get the App ID from the environment variable. The development
    server simulates the production App Engine service. One way in which
    it does this is to prepend a string (dev~) to the APPLICATION_ID
    environment variable, which is similar to the string prepended in
    production for applications using the High Replication Datastore. You
    can modify this behavior with the --default_partition flag, choosing a
    value of "" to match the master-slave option in production. Google
    recommends always getting the application ID using get_application_id,
    as described above.
   Not sure if this is an acceptable use of the environment variable. Either way it's probably equally hacky, and suffers the same problem of only working with a locally running instance.
3. Use a custom app-id for development (locally and deployed), use the -A flag in dev_appserver.py, and use google.appengine.api.app_identity.get_application_id() in the code. I don't like this for a number of reasons (namely having to have two separate app engine projects).
4. Use a dev app engine version for development and detect with os.environ.get('CURRENT_VERSION_ID').split('.')[0] in code. When deployed this is easy, but I'm not sure how to make dev_appserver.py use a custom version without modifying app.yaml. I suppose I could sed app.yaml to a temp file in /tmp/ with the version replaced and the relative paths resolved (or just create a persistent dev-app.yaml), then pass that into dev_appserver.py. But that seems also kinda dirty and prone to error/sync issues.
Am I missing any other approaches? Any considerations I failed to acknowledge? Any other advice?

With regard to "detecting" localhost development, we use the following in our application's settings / config file.
IS_DEV_APPSERVER = 'development' in os.environ.get('SERVER_SOFTWARE', '').lower()
That used in conjunction with the debug flag should do the trick for you.
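As a minimal sketch of how that flag might be wired up, the snippet below combines the SERVER_SOFTWARE check with a decorator that skips rate limiting while debugging. The names debug_enabled, rate_limited and the APP_DEBUG override variable are hypothetical, and the limiter is a toy in-memory counter, not an App Engine API.

```python
import functools
import os


def is_dev_appserver(environ=None):
    """True when SERVER_SOFTWARE looks like the local development server."""
    environ = os.environ if environ is None else environ
    return 'development' in environ.get('SERVER_SOFTWARE', '').lower()


def debug_enabled(environ=None):
    """Debug mode: local dev server, or a hypothetical APP_DEBUG=1 override."""
    environ = os.environ if environ is None else environ
    return is_dev_appserver(environ) or environ.get('APP_DEBUG') == '1'


def rate_limited(max_calls, debug=False):
    """Toy decorator: allow at most max_calls calls unless debug is set."""
    def decorator(func):
        if debug:
            return func  # debugging: leave the function unwrapped
        calls = {'count': 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls['count'] += 1
            if calls['count'] > max_calls:
                raise RuntimeError('rate limit exceeded')
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

The same value could then be computed once at import time (DEBUG = debug_enabled()) and passed both to the decorator and to the debug param of the WSGIApplication. Setting the override variable on a deployed version would be one way to get debug behavior outside the local server, assuming that is acceptable for your app.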

Related

Hot reloading properties in a Python Flask/Django app

Gurus, Wizards, Geeks
I am tasked with providing Python Flask apps (more generally, webapps written in python) a way to reload properties on the fly.
Specifically, my team and I currently deploy python apps with a {env}.properties file that contains various environment specific configurations in a key value format (yaml for instance). Ideally, these properties are reloaded by the app when changed. Suppose a secondary application existed that updates the previously mentioned {env}.properties file, the application should ALSO be able to read and use new values.
Currently, we read the {env}.properties at startup and the values are accessed via functions stored in a context.py. I could write a function that could periodically update the variables. Before starting an endeavor like this, I thought I would consult the collective to see if someone else has solved this for Django or Flask projects (as it seems like a reasonable request for feature flags, etc).

One such pattern is the WSGI application factory pattern.
In short, you define a function that instantiates the application object. This pattern works with all WSGI-based frameworks.
The Flask docs explain application factories pretty well.
This allows you to define the application dynamically on-the-fly, without the need to redeploy or deploy many configurations of an application. You can change just about anything about the app this way, including configuration, routes, middlewares, and more.
A simple example of this would be something like:
from flask import Flask

def get_settings(env):
    """get the (current, updated) application settings"""
    ...
    return settings

def create_app(env: str):
    if env not in ('dev', 'staging', 'production'):
        raise ValueError(f'{env} is not a valid environment')
    app = Flask(__name__)
    app.config.update(get_settings(env))
    return app
Then, you could set FLASK_APP environment variable to something like "myapp:create_app('dev')" and that would do it. This is also the same way you could specify this for servers like gunicorn.
The get_settings function should be written to return the newest settings. It could even do something like retrieve settings from an external source like S3, a config service, or anything.
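For the "reload when the file changes" part of the question, one possible sketch is to re-read the file whenever its modification time changes. This assumes a simple key=value properties format and a hypothetical ReloadingSettings class; it is not a Flask or Django feature, and a YAML file would need a YAML parser in place of _parse.

```python
import os


class ReloadingSettings(object):
    """Re-read a key=value properties file whenever its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._values = {}

    def _parse(self, text):
        values = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue  # skip blanks, comments and malformed lines
            key, _, value = line.partition('=')
            values[key.strip()] = value.strip()
        return values

    def get(self, key, default=None):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:  # file changed since the last read
            with open(self.path) as handle:
                self._values = self._parse(handle.read())
            self._mtime = mtime
        return self._values.get(key, default)
```

Checking the mtime on every lookup costs one stat call per access, which is usually cheap; a periodic background refresh is an alternative if even that is too much.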

Why is App Engine Returning the Wrong Application ID?

The App Engine Dev Server documentation says the following:
The development server simulates the production App Engine service. One way in which it does this is to prepend a string (dev~) to the APPLICATION_ID environment variable. Google recommends always getting the application ID using get_application_id.
In my application, I use different resources locally than I do on production. As such, I have the following for when I startup the App Engine instance:
import logging
from google.appengine.api.app_identity import app_identity
# ...
# other imports
# ...
DEV_IDENTIFIER = 'dev~'
application_id = app_identity.get_application_id()
is_development = DEV_IDENTIFIER in application_id
logging.info("The application ID is '%s'", application_id)
if is_development:
    logging.warning("Using development configuration")
    # ...
    # set up application for development
    # ...
# ...
Nevertheless, when I start my local dev server via the command line with dev_appserver.py app.yaml, I get the following output in my console:
INFO: The application ID is 'development-application'
WARNING: Using development configuration
Evidently, the dev~ identifier that the documentation claims will be prepended to my application ID is absent. I have also tried to use the App Engine Launcher UI to see if that changed anything, but it did not.
Note that 'development-application' is the name of my actual application, but I expected it to be 'dev~development-application'.

Google recommends always getting the application ID using get_application_id
But, that's if you cared about the application ID -- you don't: you care about the partition. Check out the source -- it's published at https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/api/app_identity/app_identity.py .
get_application_id uses os.getenv('APPLICATION_ID') then passes that to the internal function _ParseFullAppId -- which splits it by _PARTITION_SEPARATOR = '~' (thus removing again the dev~ prefix that dev_appserver.py prepended to the environment variable). The prefix is returned as the "partition" to get_application_id (which ignores it, only returning the application ID in the strict sense).
Unfortunately, there is no architected way to get just the partition (which is in fact all you care about).
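To illustrate what the prefix stripping amounts to, here is a simplified stand-in that splits the raw value on '~'. The function names are hypothetical and this is not the SDK's implementation (it ignores domain prefixes, for one); it also reads the environment variable the docs warn against, so treat it as an illustration only.

```python
import os

PARTITION_SEPARATOR = '~'


def split_partition(full_app_id):
    """Return (partition, app_id); partition is '' when there is no prefix."""
    partition, sep, rest = full_app_id.partition(PARTITION_SEPARATOR)
    if not sep:
        return '', full_app_id
    return partition, rest


def current_partition(environ=None):
    """Partition of APPLICATION_ID in the given (or current) environment."""
    environ = os.environ if environ is None else environ
    return split_partition(environ.get('APPLICATION_ID', ''))[0]
```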
I would recommend that, to distinguish whether you're running locally or "in production" (i.e, on Google's servers at appspot.com), in order to access different resources in each case, you take inspiration from the way Google's own example does it -- specifically, check out the app.py example at https://cloud.google.com/appengine/docs/python/cloud-sql/#Python_Using_a_local_MySQL_instance_during_development .
In that example, the point is to access a Cloud SQL instance if you're running in production, but a local MySQL instance instead if you're running locally. But that's secondary -- let's focus instead on, how does Google's own example tell which is the case? The relevant code is...:
if (os.getenv('SERVER_SOFTWARE') and
    os.getenv('SERVER_SOFTWARE').startswith('Google App Engine/')):
    ...snipped: what to do if you're in production!...
else:
    ...snipped: what to do if you're in the local server!...
So, this is the test I'd recommend you use.
Well, as a Python guru, I'm actually slightly embarrassed that my colleagues are using this slightly-inferior Python code (with two calls to os.getenv) -- me, I'd code it as follows...:
in_prod = os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine/')
if in_prod:
    ...whatever you want to do if we're in production...
else:
    ...whatever you want to do if we're in the local server...
but, this is exactly the same semantics, just expressed in more elegant Python (exploiting the second optional argument to os.getenv to supply a default value).
I'll be trying to get this small Python improvement into that example and to also place it in the doc page you were using (there's no reason anybody just needing to find out if their app is being run in prod or locally should ever have looked at the docs about Cloud SQL use -- so, this is a documentation goof on our part, and, I apologize). But, while I'm working to get our docs improved, I hope this SO answer is enough to let you proceed confidently.

That documentation seems wrong, when I run the commands locally it just spits out the name from app.yaml.
That being said, we use
import os
os.getenv('SERVER_SOFTWARE', '').startswith('Dev')
to check if it is the dev appserver.

GAE development server keep full text search indexes after restart?

Is there any way of forcing the GAE dev server to keep full text search indexes after restart? I am finding that the index is lost whenever the dev server is restarted.
I am already using a static datastore path when I launch the dev server (the --datastore_path option).

This functionality was added a few releases ago (in either 1.7.1 or 1.7.2, I think). If you're using an SDK from the last few months it should be working. You can try explicitly setting the --search_indexes_path flag on dev_appserver.py; it's possible that the default location (/tmp/) isn't writable. Could you post the first few lines of the logs from when you start dev_appserver?

in case anyone else comes looking for this, it looks like the simple solution is now to specify
--storage_path=/not/the/tmp/dir
you can still override this with --datastore_path etc.
https://developers.google.com/appengine/docs/python/tools/devserver
(at the bottom of the page..)

Looks like this is not an issue anymore. According to the documentation (and my tests):
"The development web server simulates the App Engine datastore using a
file on your computer. This file persists between invocations of the
web server, so data you store will still be available the next time
you run the web server."
Please let me know if it is otherwise and I will follow up on that.

Using CherryPy/Cherryd to launch multiple Flask instances

Per suggestions on SO/SF and other sites, I am using CherryPy as the WSGI server to launch multiple instances of a Python web server I built with Flask. Each instance runs on its own port and sits behind Nginx. I should note that the below does work for me, but I'm troubled that I have gone about things the wrong way and it works "by accident".
Here is my current cherrypy.conf file:
[global]
server.socket_host = '0.0.0.0'
server.socket_port = 8891
request.dispatch: cherrypy.dispatch.MethodDispatcher()
tree.mount = {'/':my_flask_server.app}
Without diving too far into my Flask server, here's how it starts:
import flask
app = flask.Flask(__name__)
@app.route('/')
def hello_world():
    return "hello"
And here is the command I issue on the command line to launch with Cherryd:
cherryd -c cherrypy.conf -i my_flask_server
Questions are:
1. Is wrapping Flask inside CherryPy still the preferred method of using Flask in production? (See https://stackoverflow.com/questions/4884541/cherrypy-vs-flask-werkzeug)
2. Is this the proper way to use a .conf file to launch CherryPy and import the Flask app? I have scoured the CherryPy documentation, but I cannot find any use cases that match what I am trying to do here specifically.
3. Is the proper way to launch multiple CherryPy/Flask instances on a single machine to execute multiple cherryd commands (daemonizing with -d, etc) with unique .conf files for each port to be used (8891, 8892, etc)? Or is there a better "CherryPy" way to accomplish this?
Thanks for any help and insight.

I can't speak for Flask, but I can for CherryPy. That looks like the "proper way"...mostly. That line about a MethodDispatcher is a no-op since it only affects CherryPy Applications, and you don't appear to have mounted any (just a single Flask app instead).
Regarding point 3, you have it right. CherryPy allows you to run multiple Server objects in the same process in order to listen on multiple ports (or protocols), but it doesn't have any sugar for starting up multiple processes. As you say, multiple cherryd commands with varying config files is how to do it (unless you want to use a more integrated cluster/config management tool like eggmonster).
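If you go the "one cherryd per port" route, a small script can generate the per-port config text and the matching command lines. This is only a sketch: the file naming scheme and helper names are hypothetical, the config keys and cherryd flags are the ones from the question, and nothing is written to disk or launched here.

```python
CONF_TEMPLATE = (
    "[global]\n"
    "server.socket_host = '0.0.0.0'\n"
    "server.socket_port = {port}\n"
    "tree.mount = {{'/': my_flask_server.app}}\n"
)


def render_conf(port):
    """Config text for one instance listening on the given port."""
    return CONF_TEMPLATE.format(port=port)


def cherryd_command(conf_path, module='my_flask_server'):
    """Argument list for a daemonized cherryd using the given config."""
    return ['cherryd', '-d', '-c', conf_path, '-i', module]


def plan_instances(ports):
    """Return (conf filename, conf text, command) for each port."""
    plan = []
    for port in ports:
        conf_path = 'cherrypy-{0}.conf'.format(port)  # hypothetical naming
        plan.append((conf_path, render_conf(port), cherryd_command(conf_path)))
    return plan
```

Each tuple from plan_instances([8891, 8892]) could then be written out and started by whatever process manager you already use.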

Terminology: Mounting vs Grafting
In principle this is a proper way to serve a flask app through cherrypy, just a quick note on your naming:
It is worth noting here that tree.mount is not a configuration key by itself - tree will lead to cherrypy._cpconfig._tree_config_handler(k, v) being called with the arguments 'mount', {'/': my_flask_server.app}.
The key parameter is not used at all by the _tree_config_handler, so in your config "mount" is just an arbitrary label for that specific dict of path mappings. It also does not "mount" the application (it's not a CherryPy app after all). By that I mean it does not call cherrypy.tree.mount(…) on it, but rather uses cherrypy.tree.graft to attach an arbitrary WSGI handler to your "script-name" (paths, but in CherryPy terminology) namespace.
CherryPy's log message somewhat misleadingly says "Mounted <app as string> on /".
This is a somewhat important point since with graft, unlike mount, you cannot specify further options such as static file service for your app or streaming responses on that path.
So I would recommend changing the tree.mount config key to something descriptive that does not invite reading too much semantics about what happens within CherryPy (since there is the cherrypy.tree.mount method) due to that config. E.g., tree.flask_app_name if you're just mapping that one app in that dict (there can be many tree directives, all of them just getting merged into the paths namespace) or tree.wsgi_delegates if you map many apps in that dict.
Using CherryPy to serve additional content without making an app of it
Another side note: if you want CherryPy to e.g. provide static file service for your app, you don't have to create a boilerplate CherryPy app to hold that configuration. You just have to mount None with the appropriate additional config. The following files would suffice to have CherryPy serve static content from the subdirectory 'static', provided they are placed in the directory where you launch cherryd (invoke cherryd as cherryd -c cherrypy.conf -i my_flask_server -i static):
static.py
import cherrypy
# next line could also have config as an inline dict, but
# file config is often easier to handle
cherrypy.tree.mount(None, '/static-path', 'static.conf')
static.conf
# static.conf
[/]
tools.staticdir.on = True
tools.staticdir.root = os.getcwd()
tools.staticdir.dir = 'static'
tools.staticdir.index = 'index.html'

In Python, how can I test if I'm in Google App Engine SDK?

Whilst developing I want to handle some things slightly differently than I will when I eventually upload to the Google servers.
Is there a quick test that I can do to find out if I'm in the SDK or live?

See: https://cloud.google.com/appengine/docs/python/how-requests-are-handled#Python_The_environment
The following environment variables are part of the CGI standard, with special behavior in App Engine:
SERVER_SOFTWARE:
In the development web server, this value is "Development/X.Y" where "X.Y" is the version of the runtime.
When running on App Engine, this value is "Google App Engine/X.Y.Z".
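Based on the two documented prefixes, a check could look like the sketch below. The function name and the 'unknown' fallback are my own additions, meant to cover code running outside both the dev server and App Engine (unit tests, for example).

```python
import os


def runtime_environment(environ=None):
    """Classify the runtime from the documented SERVER_SOFTWARE prefixes."""
    environ = os.environ if environ is None else environ
    software = environ.get('SERVER_SOFTWARE', '')
    if software.startswith('Google App Engine/'):
        return 'production'
    if software.startswith('Development/'):
        return 'development'
    return 'unknown'  # variable missing or set by something else
```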

In app.yaml, you can add an IS_APP_ENGINE environment variable:
env_variables:
  IS_APP_ENGINE: 1
and in your Python code check whether it has been set:
import os
if os.environ.get("IS_APP_ENGINE"):
    print("The app is being run in App Engine")
else:
    print("The app is being run locally")

Based on the same trick, I use this function in my code:
import os
def isLocal():
    return os.environ["SERVER_NAME"] in ("localhost", "www.lexample.com")
I have customized my /etc/hosts file in order to be able to access the local version by prepending a "l" to my domain name, that way it is really easy to pass from local to production.
Example:
production url is www.example.com
development url is www.lexample.com

I just check the httplib (which is a wrapper around appengine fetch)
def _is_gae():
    import httplib
    return 'appengine' in str(httplib.HTTP)

A more general solution
A more general solution, which does not depend on being on a Google server, detects whether the code is running on your local machine.
I use the code below regardless of the hosting server:
import socket
if socket.gethostname() == "your local computer name":
    DEBUG = True
    ALLOWED_HOSTS = ["127.0.0.1", "localhost", ]
    ...
else:
    DEBUG = False
    ALLOWED_HOSTS = [".your_site.com",]
    ...
If you use macOS you could write more generic code:
if socket.gethostname().endswith(".local"): # True on your local computer
    ...
Django developers must put this sample code in the file settings.py of the project.
EDIT:
According to Jeff O'Neill in macOS High Sierra socket.gethostname() returns a string ending in ".lan".
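Given that the suffix varies between macOS versions, one option is to check against a small set of suffixes plus any explicitly listed machine names. This is a rough sketch with hypothetical names; hostname conventions differ across networks, so the suffix list is an assumption you would adjust.

```python
import socket

# Suffixes reported above for macOS; extend for your own network.
LOCAL_SUFFIXES = ('.local', '.lan')


def looks_local(hostname=None, extra_names=()):
    """Guess whether the hostname belongs to a developer machine."""
    hostname = socket.gethostname() if hostname is None else hostname
    return hostname.endswith(LOCAL_SUFFIXES) or hostname in extra_names
```

In settings.py this could become DEBUG = looks_local(), with extra_names holding the names of any non-macOS development machines.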

The current suggestion from Google Cloud documentation is:
import os
if os.getenv('GAE_ENV', '').startswith('standard'):
    # Production in the standard environment
    pass
else:
    # Local execution.
    pass

Update on October 2020:
I tried using os.environ["SERVER_SOFTWARE"] and os.environ["APPENGINE_RUNTIME"] but neither worked, so I just logged all the keys in os.environ.
In these keys, there was GAE_RUNTIME which I used to check if I was in the local environment or cloud environment.
The exact key might change or you could add your own in app.yaml but the point is, log os.environ, perhaps by adding to a list in a test webpage, and use its results to check your environment.
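A small helper for that kind of inspection might look like the following. The function name and the default prefixes are my own choice; it returns only the key names, since the values may contain secrets you would not want on a test page.

```python
import os


def describe_environment(environ=None, prefixes=('GAE_', 'SERVER_')):
    """Sorted names of environment variables starting with any prefix."""
    environ = os.environ if environ is None else environ
    return sorted(key for key in environ if key.startswith(prefixes))
```

Comparing the output locally and in the deployed app shows which keys actually differ in your setup.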
