I just deployed a site on GAE which requires me to stage some data for dropdown fields (e.g. US states, statuses, etc.).
In development, I have created an entity for each type of data (US State entity for example) and was able to preload the data using the interactive console by creating the entity and then calling the put() method.
Now that the application is deployed I don’t know of a way to preload this data. How would you recommend doing this in a deployed instance?
I am using SDK version 1.7.0, python 2.7, High Replication Datastore (HRD), and memcache when data is retrieved.
Thanks in advance for your help!
If you want to do it programmatically, you may use the interactive console in production. Check out How do I activate the Interactive Console on App Engine?
You may also create a temporary request handler that does the job, deploy it (e.g. as a different version of the app to make it easy to delete later), and hit the corresponding URL in your browser.
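A minimal sketch of such a handler (assuming webapp2 on the Python 2.7 runtime and a hypothetical USState model; adapt the names to your own app):

import webapp2
from google.appengine.ext import db

class USState(db.Model):  # hypothetical lookup model
    name = db.StringProperty(required=True)
    abbreviation = db.StringProperty(required=True)

class SeedHandler(webapp2.RequestHandler):
    def get(self):
        # Only seed when the kind is still empty, so repeated hits are harmless
        if USState.all().count(1) == 0:
            db.put([USState(name='Alabama', abbreviation='AL'),
                    USState(name='Alaska', abbreviation='AK')])
            self.response.write('Seeded.')
        else:
            self.response.write('Already seeded, nothing to do.')

app = webapp2.WSGIApplication([('/admin/seed', SeedHandler)])

Mapping the route with login: admin in app.yaml keeps random visitors from triggering it.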
You can use the bulkloader to upload your entities to your deployed version. See the doc Uploading and Downloading Data for details and examples.
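Roughly, the upload command looks like this (app ID and file names are placeholders; the config file can be generated with appcfg.py create_bulkloader_config, and the remote_api builtin must be enabled in your app.yaml):

appcfg.py upload_data --config_file=bulkloader.yaml --filename=us_states.csv --kind=USState --url=http://your-app-id.appspot.com/_ah/remote_api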
I have tried to follow Google's documentation on how to set up local development using a database (https://cloud.google.com/appengine/docs/standard/python/tools/using-local-server#Python_Using_the_Datastore). However, I don't have the experience level to follow along, and I'm not even sure that was the right guide. The application is a Django project that uses Python 2.7. To run the local server, I usually type dev_appserver.py --host 127.0.0.1.
My questions are:
How do I download the Datastore database from Google Cloud? I don't want to download the entire database, just enough data to populate localhost so I can run tests.
Once the database is downloaded, what do I need to do to connect it to localhost? Do I have to change a parameter somewhere?
Do I need to download the Datastore at all? Can I just make a duplicate on the cloud and then connect to that duplicate?
When I run localhost, shouldn't it already be connected to the Datastore, since the site works when it's running on the cloud? Where can I find the connection URI?
Thanks for the help
The development server is meant to simulate the whole App Engine environment; if you examine the output of the dev_appserver.py command you'll see something like Starting Cloud Datastore emulator at: http://localhost:PORT. Your app will interact with that bundled Datastore automatically, pushing and retrieving data as your code dictates. The data is saved to a file on local storage and persists across different runs of the development server unless it's explicitly deleted.
This option doesn't provide facilities to import data from your existing Cloud Datastore instance, but it's a ready-to-go solution if your testing procedures can afford populating the local database with mock data through a custom script that does so programmatically. If you decide on this approach, just write the data creation script and execute it before running the tests.
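As one possible sketch of such a script (assuming the google-cloud-datastore client library and that DATASTORE_EMULATOR_HOST points at the emulator address printed by dev_appserver.py; the kind and project ID are made up):

# seed.py -- run with DATASTORE_EMULATOR_HOST=localhost:PORT python seed.py
import os
from google.cloud import datastore

# With DATASTORE_EMULATOR_HOST set, the client talks to the local emulator
client = datastore.Client(project=os.environ.get('GOOGLE_CLOUD_PROJECT', 'my-dev-project'))

for name, abbr in [('Alabama', 'AL'), ('Alaska', 'AK')]:
    entity = datastore.Entity(key=client.key('USState'))
    entity.update({'name': name, 'abbreviation': abbr})
    client.put(entity)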
Now, there is another option to simulate local Datastore using the Cloud SDK that comes with handy features for your purposes. You can find the available information for it under Running the Datastore Emulator documentation page. This emulator has support to import entities downloaded from your production Cloud Datastore as well as for exporting them into files.
Back to your questions:
Export data from the Cloud instance into a GCS bucket following this, then download the data from the bucket to your filesystem following this, finally import the data into the emulator with the command shown here.
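Put together, the three steps look roughly like this (bucket name, project ID and paths are placeholders, and the exact metadata file name comes from the export output):

gcloud datastore export gs://your-bucket/dev-export
gsutil -m cp -r gs://your-bucket/dev-export .
curl -X POST localhost:8081/v1/projects/your-project-id:import -H 'Content-Type: application/json' -d '{"input_url": "file:///absolute/path/to/dev-export/dev-export.overall_export_metadata"}'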
To use the emulator you need to first run gcloud beta emulators datastore start in one terminal and then, in a separate terminal, run dev_appserver.py --support_datastore_emulator=true --datastore_emulator_port=8081 app.yaml.
The development server uses one of the two aforementioned emulators; in both cases it is not connected to your Cloud Datastore. Alternatively, you could create another project dedicated to development purposes with a copy of your database and deploy your application there, so you don't use an emulator at all.
Requests to Datastore are made through the endpoint https://datastore.googleapis.com/v1/projects/project-id, although this is not related to how the emulators manage connections in your local server.
Hope this helps.
I am a newbie who wants to deploy his Flask app using Google Cloud Functions. When I search online, people tell me to deploy it as a Flask app. I want to ask if there is any difference between the two.
A Cloud instance (deploying the Flask app on a Google Cloud VM) vs. a serverless Cloud Function
As described by John and Kolban, Cloud Functions is a single-purpose endpoint: you want to perform one thing, so you deploy one function.
However, if you want to have many consistent things, like a microservice, you will have to deploy several endpoints that allow you to perform CRUD operations on the same data object. In that case you should prefer to deploy several endpoints (CRUD) and have the ability to easily reuse class and object definitions and business logic. For this, a Flask webserver is what I recommend (and what I prefer; I wrote an article on this).
Packaging it in Cloud Run is the best way to get a serverless platform with a pay-per-use pricing model (and automatic scaling, and more).
There is an additional great thing: the Cloud Functions request object is based on the Flask request object. Because of this, and it's something I also present in my article, it's easy to switch from one platform to the other. You only have to choose according to your requirements and your skills. I also wrote another article on this.
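To illustrate that portability, here is a hypothetical handler that works both as a Cloud Functions HTTP entry point (the runtime hands it a flask.Request) and inside a regular Flask app:

from flask import Flask, request, jsonify

def hello(req):
    # Cloud Functions HTTP entry point: `req` is a flask.Request
    name = req.args.get('name', 'world')
    return jsonify(message='Hello, %s!' % name)

# The same logic reused in a plain Flask app (e.g. on Cloud Run)
app = Flask(__name__)

@app.route('/hello')
def hello_route():
    return hello(request)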
If you deploy your Flask app as an application on a Compute Engine VM instance, you are basically configuring a computer and an application to run your code. The notion of Cloud Functions relieves you of the chore and toil of creating and managing the environment in which your program runs. A marketing mantra is "You bring the code, we bring the environment". When using Cloud Functions, all you need to do is code your application logic; maintaining the server, scaling up as load increases, keeping the server available, and much more is taken care of for you. When you run your code on your own VM instance, it is your responsibility to manage the whole environment.
References:
HTTP Functions
Deploying a Python serverless function in minutes with GCP
I have the following situation: a Python Flask app running on Google App Engine; this app serves predictions from a spaCy machine learning model. Throughout the day, a workflow adds new training data for this model, and the app has a cron job that retrains the model every evening, taking the new training data into account.
The problem is that I want each App instance to reference this newly trained model after it becomes available. I can upload the model somewhere (say, Google Cloud Storage) but, ultimately, each instance needs to find out about the existence of this new model, download it, and load it into memory/initialize it; this takes time, so I'd like to only do this once per day/on start up.
I'm currently wondering - is there a way to auto-redeploy the App once a day or automatically restart the instances? Is there a different way I should be going about this?
(Note: I would prefer to stick with Google App engine for now.)
It sounds like you should be deploying a new version of your app daily, and then warming up the new instances before migrating traffic to them. This assumes initial startup is slow because your app has to load the new model, so you can't restart the running version without disrupting your traffic.
To deploy versions, follow the official guide here and then to warm up and migrate traffic use the guide here.
To automate this process you can use the Admin API -- the question will be how you get the model into a specific location for the new version. I would recommend using the same file name for the model so that your actual code stays identical from version to version. With that, you should be able to build the deployment directory and deploy the new version programmatically every day -- but it depends on the rest of your setup and how you store and automate the other parts of the process.
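As one possible sketch, a daily job could shell out to gcloud (which wraps the Admin API); the version naming scheme and service name here are assumptions:

gcloud app deploy app.yaml --version=v20200101 --no-promote --quiet
gcloud app services set-traffic default --splits=v20200101=1 --migrate

The --no-promote flag deploys the version without sending it traffic, and set-traffic with --migrate then moves traffic over (gradually, provided your app handles warmup requests).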
This sounds like a complex process, but what I can tell you is that, once everything else is set up correctly, you can also use Cloud Build to automate the deployments on App Engine. You can see in this quickstart how the process works.
Basically, you store your application inside a repository, and with every new commit a trigger deploys a new version of your App Engine application. You can also use git as the repository to achieve that, following the steps in this guide.
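A minimal cloudbuild.yaml for such a trigger could be as small as this (assuming the Cloud Build service account has been granted the App Engine Admin role):

steps:
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['app', 'deploy', 'app.yaml', '--quiet']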
If you want the whole process to be fully automated, you could think of a way to generate automated commits on a schedule, for example with Cloud Scheduler.
What are the different options, with pros and cons, for periodically adding records to a Django app hosted on GAE?
Use a custom Django management command on the remote datastore
Write an API in Django that exposes the datastore to be updated
Use a cron task on GAE to update
(am I missing anything else?)
1: Custom Django management command on "remote"
I'm currently using #1: django-nonrel on GAE with custom management/django-admin commands for my models. For example, this is how I call my custom management command on the remote datastore:
manage.py remote mycommand
The advantage of this approach is ease of development: I can test the management command locally and simply add "remote" to use it on GAE.
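For reference, a bare-bones sketch of such a command (the file path, app and model names are hypothetical):

# myapp/management/commands/mycommand.py
from django.core.management.base import BaseCommand
from myapp.models import USState  # hypothetical lookup model

class Command(BaseCommand):
    help = 'Seed the US states lookup table'

    def handle(self, *args, **options):
        for name, abbr in [('Alabama', 'AL'), ('Alaska', 'AK')]:
            USState.objects.get_or_create(name=name, abbreviation=abbr)
        self.stdout.write('Done.')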
2: Write an API in Django that exposes the datastore
I would have to use an extra server with cron to update.
3: Use a cron task on GAE
I don't know how GAE likes having its users run a scraper periodically. Also, GAE doesn't have a real cron -- it simply hits a URL at set intervals.
Use a cron job. That's what they're designed for. Whether or not scraping is okay depends on the terms of service of the site you're scraping.
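For what it's worth, a GAE cron entry is just a schedule plus a URL your app already handles; a sketch of cron.yaml (the path and schedule are examples):

cron:
- description: nightly scrape
  url: /tasks/scrape
  schedule: every day 03:00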
I have a Python script that is intended to run on my local machine every night. Its goal is to pull data from a third-party server, do some processing on it, and execute a bulk upload to the GAE datastore.
My issue, though, is how to run a bulk upload from a Python script. All the examples I have seen (including Google's documentation) use the command line "appcfg.py upload_data ...", and as far as I can see, appcfg.py and bulkloader.py do not expose any API that is guaranteed not to change.
My two options, as I see them now, are to either execute the "appcfg.py upload_data ..." command from my Python script, which seems a roundabout way of doing things, or to call appcfg.py's internal methods directly, which means I have to recode things in case they change.
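For the record, the first option boils down to a couple of lines (flags as in the upload_data docs; the file names and app ID are placeholders):

import subprocess

subprocess.check_call([
    'appcfg.py', 'upload_data',
    '--config_file=bulkloader.yaml',
    '--filename=data.csv',
    '--kind=MyKind',
    '--url=http://your-app-id.appspot.com/_ah/remote_api',
])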
App Engine can run cron jobs. All you need to write is a single script that pulls the data from the third-party server and uploads it to App Engine; App Engine will do the rest for you. Appengine cron has everything you need to know about running a cron job on App Engine.
This answer is now outdated. Please see the link below for my latest answer on bulk-uploading data to App Engine.
How to upload data in bulk to the appengine datastore? Older methods do not work