Concepts, use and testing Cloud Datastore in local

Concepts, use and testing Cloud Datastore in local - python

I'm really confused with the way to try Datastore in local. Please, give me a minute to explain.
I'm developing a app composed to few microservices like a only gae app. In a parte of the app, I use the datastore. So when I run my app, I use the development server and when I save something in the datastore calling some method I can see perfectly the entity in the gae's admin web portal.
Well, now, instead of calling directly to ndb library and his methods I've built a small library over ndb to abstract his functionallity, then I can call insertUser() instead of work directly with ndb. So, the problems appear when I try test this small library that I built (I've written a test.py file to do this).
At first, I thought that this does not can work because this test was executing without the deveserver running. After I searched info about how simulated the datastore in the local and I found this, but after I found too the unittest in local with the stubs, and now I don't understand nothing.
I've tried both (gcloud datastore emulator and stub with unittest) and I don't get do simple example:
I want test that a entity is saved in Datastore and after I want test that I can read this entity
I suppose that dev_server (in SDK) emulate the datastore (because I can see the list of my entities there), but then, why use the datastore emulator in local dev?, and then, why is necesary uses the stub to datastore if we have a datastore emulator to do all test that I want? I don't understand.
I understand that maybe my question is more of concepts than code but I need understand really right how is the best way to work with this.

Finally I think I solved and understood my problem. If I were working with other system that I want connect to Cloud Datastore, I would need use the "emulator". But isn't my case. So, I need use the stubs with unittest because there are not a simple way (I think is imposible) to do this with the dev_server (when he is running).
But i found two mainly problems:
The first, the way to import google_appengine libraries, because in the documentation isn't very clear, (in my view), finally searching user opinions I found that "my solution was something like this":
sys.path.insert(1, '../../../../google_appengine')
if 'google' in sys.modules:
del sys.modules['google']
from google.appengine.ext import ndb
from google.appengine.ext import testbed
The second was that when I execute a test (one of few of I had) the next unittest failed, for example, when in the first unittest, I save the data and in the second I test if the data is saved correctly with a read method.
When I initialized datastore_v3_stub I use save_changes=True to specify that I want the changes be permanent, but when I use it, don't work and I see that the changes maybe don't be saved.
After, I found in the tesbed docs the param datastore_file, when I used this and specify a file where save temporarily the database, all tests began to work fine.
self.testbed.init_datastore_v3_stub(enable=True, save_changes=True, datastore_file='./dbFile')
Besides, I added a final condition (unittest library) to delete this file, so, I erase the file when the test ends. (Avoiding errors in the next execution).
#classmethod
def tearDownClass(self):
"""
Elimina el fichero de la bd temporal tras la ejecución de todos los tests.
"""
os.remove('./dbFile')
I think that GAE and all Google Cloud Platform is a very good solution to develop fast apps but I think too that they need revise and extend his docs, specially to no-experts programmers (like me).
I hope that this solution maybe help someone, if you think that I have some error please comment it.

Related

gcloud monitoring_v3 query fails on AttributeError 'WhichOneof'

Im using this lib for a while, everything was working great. Using it to query cpu utilization of gcloud machines.
this is my code:
query_obj = Query(metric_service_client, project, "compute.googleapis.com/instance/cpu/utilization",
minutes=mins_backward_check)
metric_res = query_obj.as_dataframe()
Everything was working fine until lately it started to fail.
I'm getting:
{AttributeError}'WhichOneof'
Deubbing it, i see it fails inside "as_dataframe()" code, specifically in this part:
data=[_extract_value(point.value) for point in time_series.points]
When it tries to extract the value from the point object.
The _extract_value values code seems to use the WhichOneof attribute which seems to be related to protobuff lib.
I didn't change any of those libs versions, anyone has any clue what causes it to fail now?

If you're confident (!) that you've not changed anything, then this would appear to be Google breaking its API and you may wish to file an issue on Google's issue tracker on one of these components:
https://issuetracker.google.com/issues/new?component=187228&template=1162638
https://issuetracker.google.com/issues/new?component=187143&template=800102
I think Cloud Monitoring is natively a gRPC-based API which would explain the protobuf reference.
A good sanity check is to use APIs Explorer and check the method you're using there to see whether you can account for the request|response, perhaps:
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/query
NOTE Your question may be easy to parse for someone familiar with the Cloud Monitoring Python SDK but isn't easy to repro. Please consider providing a simple repro of your issue, including requirements.txt and a full code snippet.

How to get url for sibling modules in app engine

I have an App Engine application that has multiple independent modules.
When deployed, these modules become available at http://<module>.<app_id>.appspot.com, and when testing locally with dev_appserver.py mod1.yaml mod2.yaml --port-9000, then mod1 runs in http://localhost:9000 and mod2 runs in http://localhost:9001. All the modules are running on the same project. So far so good.
Let's assume mod1 needs to talk to mod2. Is there a way I can get a url for mod2 within mod1, dynamically?
# In mod1's code
import google.some.magic
url_for_mod2 = magic.get_url_for_module('mod2') # http://localhost:9001 or http://mod2.id.appspot.com

It is possible to build cross-module urls that work on both GAE and on the development server using the modules.get_hostname() API. You can find an example in this answer: https://stackoverflow.com/a/31145647/4495081

As long as I know it is not possible. As you have 2 services, and maybe more in the future, it seems you are using a distributed microservices architecture and therefore you need to register your services for them to know where the other service is. For that I recommend etcd it is really pretty easy to use.
A nice fellow also written a python-client for it!

Why is App Engine Returning the Wrong Application ID?

The App Engine Dev Server documentation says the following:
The development server simulates the production App Engine service. One way in which it does this is to prepend a string (dev~) to the APPLICATION_IDenvironment variable. Google recommends always getting the application ID using get_application_id
In my application, I use different resources locally than I do on production. As such, I have the following for when I startup the App Engine instance:
import logging
from google.appengine.api.app_identity import app_identity
# ...
# other imports
# ...
DEV_IDENTIFIER = 'dev~'
application_id = app_identity.get_application_id()
is_development = DEV_IDENTIFIER in application_id
logging.info("The application ID is '%s'")
if is_development:
logging.warning("Using development configuration")
# ...
# set up application for development
# ...
# ...
Nevertheless, when I start my local dev server via the command line with dev_appserver.py app.yaml, I get the following output in my console:
INFO: The application ID is 'development-application'
WARNING: Using development configuration
Evidently, the dev~ identifier that the documentation claims will be preprended to my application ID is absent. I have also tried to use the App Engine Launcher UI to see if that changed anything, but it did not.
Note that 'development-application' is the name of my actual application, but I expected it to be 'dev~development-application'.

Google recommends always getting the application ID using get_application_id
But, that's if you cared about the application ID -- you don't: you care about the partition. Check out the source -- it's published at https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/api/app_identity/app_identity.py .
get_app_identity uses os.getenv('APPLICATION_ID') then passes that to internal function _ParseFullAppId -- which splits it by _PARTITION_SEPARATOR = '~' (thus removing again the dev~ prefix that dev_appserver.py prepended to the environment variable). That's returned as the "partition" to get_app_identity (which ignores it, only returning the application ID in the strict sense).
Unfortunately, there is no architected way to get just the partition (which is in fact all you care about).
I would recommend that, to distinguish whether you're running locally or "in production" (i.e, on Google's servers at appspot.com), in order to access different resources in each case, you take inspiration from the way Google's own example does it -- specifically, check out the app.py example at https://cloud.google.com/appengine/docs/python/cloud-sql/#Python_Using_a_local_MySQL_instance_during_development .
In that example, the point is to access a Cloud SQL instance if you're running in production, but a local MySQL instance instead if you're running locally. But that's secondary -- let's focus instead on, how does Google's own example tell which is the case? The relevant code is...:
if (os.getenv('SERVER_SOFTWARE') and
os.getenv('SERVER_SOFTWARE').startswith('Google App Engine/')):
...snipped: what to do if you're in production!...
else:
...snipped: what to do if you're in the local server!...
So, this is the test I'd recommend you use.
Well, as a Python guru, I'm actually slightly embarassed that my colleagues are using this slightly-inferior Python code (with two calls to os.getenv) -- me, I'd code it as follows...:
in_prod = os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine/')
if in_prod:
...whatever you want to do if we're in production...
else:
...whatever you want to do if we're in the local server...
but, this is exactly the same semantics, just expressed in more elegant Python (exploiting the second optional argument to os.getenv to supply a default value).
I'll be trying to get this small Python improvement into that example and to also place it in the doc page you were using (there's no reason anybody just needing to find out if their app is being run in prod or locally should ever have looked at the docs about Cloud SQL use -- so, this is a documentation goof on our part, and, I apologize). But, while I'm working to get our docs improved, I hope this SO answer is enough to let you proceed confidently.

That documentation seems wrong, when I run the commands locally it just spits out the name from app.yaml.
That being said, we use
import os
os.getenv('SERVER_SOFTWARE', '').startswith('Dev')
to check if it is the dev appserver.

Is there a better way to access my public api?

I am new to Flask.
I have a public api, call it api.example.com.
#app.route('/api')
def api():
name = request.args.get('name')
...
return jsonify({'address':'100 Main'})
I am building an app on top of my public api (call it www.coolapp.com), so in another app I have:
#app.route('/make_request')
def index():
params = {'name':'Fred'}
r = requests.get('http://api.example.com', params=params)
return render_template('really_cool.jinja2',address=r.text)
Both api.example.com and www.coolapp.com are hosted on the same server. It seems inefficient the way I have it (hitting the http server when I could access the api directly). Is there a more efficient way for coolapp to access the api and still be able to pass in the params that api needs?

Ultimately, with an API powered system, it's best to hit the API because:
It's user testing the API (even though you're the user, it's what others still access);
You can then scale easily - put a pool of API boxes behind a load balancer if you get big.
However, if you're developing on the same box you could make a virtual server that listens on localhost on a random port (1982) and then forwards all traffic to your api code.
To make this easier I'd abstract the API_URL into a setting in your settings.py (or whatever you are loading in to Flask) and use:
r = requests.get(app.config['API_URL'], params=params)
This will allow you to make a single change if you find using this localhost method isn't for you or you have to move off one box.
Edit
Looking at your comments you are hoping to hit the Python function directly. I don't recommend doing this (for the reasons above - using the API itself is better). I can also see an issue if you did want to do this.
First of all we have to make sure the api package is in your PYTHONPATH. Easy to do, especially if you're using virtualenvs.
We from api import views and replace our code to have r = views.api() so that it calls our api() function.
Our api() function will fail for a couple of reasons:
It uses the flask.request to extract the GET arg 'name'. Because we haven't made a request with the flask WSGI we will not have a request to use.
Even if we did manage to pass the request from the front end through to the API the second problem we have is using the jsonify({'address':'100 Main'}). This returns a Response object with an application type set for JSON (not just the JSON itself).
You would have to completely rewrite your function to take into account the Response object and handle it correctly. A real pain if you do decide to go back to an API system again...

Depending on how you structure your code, your database access, and your functions, you can simply turn the other app into package, import the relevant modules and call the functions directly.
You can find more information on modules and packages here.
Please note that, as Ewan mentioned, there's some advantages to using the API. I would advise you to use requests until you actually need faster requests (this is probably premature optimization).
Another idea that might be worth considering, depending on your particular code, is creating a library that is used by both applications.

Get current, requested URL in Python without a framework

I'm trying to get the URL that has been requested in Python without using a web framework.
For example, on a page (let's say /main/index.html), the user clicks on a URL to go to /main/foo/bar (/foo/bar doesn't exist). Apache (with mod_wsgi) then redirects the user to a PHP script at /main/, which then gets the url and searches MySQL for any matching fields. Then the rest of the field is returned. This helped in PHP:
$_SERVER["REQUEST_URI"];
I'd rather not use PHP since it's becoming increasingly difficult to maintain the PHP code whilst the database keeps changing in structure.
I'm pretty sure there's a better way altogether and any mention would be greatly appreciated. For the sake of relevancy, is this even possible (to get the requested URL in Python)? Should I just use a framework, although it seems quite simple?
Thanks in advance,
Jamie
Note: I don't want to use GET for security purposes.

Well, if you run your program as a CGI script, you can get the same information in os.environ. However, if I recall correctly, REQUEST_URI as such is not part of the CGI standard and you need to use os.environ['SCRIPT_NAME'], os.environ['PATH_INFO'] and os.environ['QUERY_STRING'] to get the equivalent data.
However, I seriously urge you to see some lightweight framework, such as Pyramid. Plain CGI with Python is slow and generally just pain in the ass.

Unlike PHP, Python is a general purpose language and doesn't have this built-in.
The way you can gather this information depends on the deployment solution:
CGI (mostly Apache with mod_python, deprecated): see #Antti Haapala solution
WSGI (most other deployment solutions): see #gurney alex solution
But you will encouter much more problems: session hanling, url management, cookies, and even juste simple POST/GET parsing. All of this need to be done manually if you don't use a framework.
Now, if you feel like a framework is overkill (but really, incredible tools like Django are worth it), you can use a micro framework like bottle.
Microframeworks will typically make this heavy lifting for you, but without the complicated setup or the additional advanced features. Bottle has actually zero setup an is a one file lib.
Hello word with bottle:
from bottle import route, run, request
#route('/hello/:name')
def index(name='World'):
return '<b>Hello %s! You are at %s</b>' % (name, request.path)
run(host='localhost', port=8080)
request.path contains what you want, and if you visit http://127.0.0.1:8080/hello/you, you will get:
Hello you! You are at /hello/you

When I want to get a URL outside of any framework using Apache2 and Mod_WSGI I use
environ.get('PATH_INFO')
inside of my application() function.

When using mod_python, if I recall correctly you can use something like:
from mod_python import util
def handler(request):
parameters = util.FieldStorage(request)
url = parameters.get("url", "/")
See http://www.modpython.org/live/current/doc-html/pyapi-util.html for more info on the mod_python.util module and the FieldStorage class (including examples)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.