appengine set up local host with datastore for testing - python

I have tried to follow Google's documentation on how to set up local development using a database (https://cloud.google.com/appengine/docs/standard/python/tools/using-local-server#Python_Using_the_Datastore). However, I do not have the experience to follow along, and I am not even sure that was the right guide. The application is a Django project that uses Python 2.7. To run the local host, I usually type dev_appserver.py --host 127.0.0.1 .
My questions are:
1. How do I download the Datastore database from Google Cloud? I do not want to download the entire database, just enough data to populate the local host so I can run tests.
2. Once the database is downloaded, what do I need to do to connect it to the local host? Do I have to change a parameter somewhere?
3. Do I need to download the Datastore at all? Can I just make a duplicate on the cloud and then connect to that Datastore?
4. When I run the local host, should it not already be connected to the Datastore, since the site works when it is running on the cloud? Where can I find the connection URI?
Thanks for the help

The development server is meant to simulate the whole App Engine environment. If you examine the output of the dev_appserver.py command, you'll see something like Starting Cloud Datastore emulator at: http://localhost:PORT. Your code interacts with that bundled Datastore automatically, pushing and retrieving data according to the code you wrote. Your data is saved to a file in local storage and persists across runs of the development server unless it is explicitly deleted.
This option provides no facility for importing data from your existing Cloud Datastore instance. It is still a ready-to-go solution if your tests can work with mock data: write a script that populates the local database programmatically and execute it before running the tests.
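A minimal sketch of such a population script follows. The record fields (username, email, score) and the put callable are hypothetical; inside the real application, put would wrap whatever write your models use (for instance an ndb model's put() call), which is an assumption about your code and not something the development server requires.

```python
import random


def make_mock_users(count, seed=0):
    """Return `count` deterministic fake user records as plain dicts."""
    rng = random.Random(seed)  # fixed seed so every test run sees the same data
    return [
        {
            "username": "user{:03d}".format(i),
            "email": "user{:03d}@example.com".format(i),
            "score": rng.randint(0, 100),
        }
        for i in range(count)
    ]


def populate(records, put):
    """Store each record through `put` and return how many were written."""
    written = 0
    for record in records:
        put(record)
        written += 1
    return written


if __name__ == "__main__":
    # Stand-in for a real datastore write: collect the records in a list.
    store = []
    print(populate(make_mock_users(5), store.append))
```

Keeping the write behind a callable means the same generator can feed the local datastore during development and a plain list in unit tests.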
There is another option for simulating Datastore locally: the Datastore emulator that ships with the Cloud SDK, which has features suited to your purposes. The available information is on the Running the Datastore Emulator documentation page. This emulator can import entities exported from your production Cloud Datastore and can also export them to files.
Back to your questions:
1. Export data from the Cloud instance into a GCS bucket following this, then download the data from the bucket to your filesystem following this, and finally import the data into the emulator with the command shown here.
2. To use the emulator, first run gcloud beta emulators datastore start in a Cloud Shell, then in a separate tab run dev_appserver.py --support_datastore_emulator=true --datastore_emulator_port=8081 app.yaml.
3. The development server uses one of the two emulators described above; in both cases it is not connected to your Cloud Datastore. You could create a second project for development purposes, with a copy of your database, and deploy your application there so that you don't use an emulator at all.
4. Requests to Datastore are made through the endpoint https://datastore.googleapis.com/v1/projects/project-id, although this is unrelated to how the emulators manage connections in your local server.
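The two commands for the Cloud SDK emulator must agree on the port, so it can help to build them from one value. The sketch below only assembles the command lines and does not run them. The helper names are hypothetical, and the --host-port flag is an addition not mentioned above, used here on the assumption that your gcloud version accepts it for pinning the emulator's port.

```python
def emulator_command(port=8081):
    """Command line that starts the Cloud SDK Datastore emulator on `port`."""
    return [
        "gcloud", "beta", "emulators", "datastore", "start",
        "--host-port=localhost:{}".format(port),  # assumed flag, see note above
    ]


def dev_appserver_command(port=8081, config="app.yaml"):
    """Command line that points dev_appserver.py at that emulator."""
    return [
        "dev_appserver.py",
        "--support_datastore_emulator=true",
        "--datastore_emulator_port={}".format(port),
        config,
    ]


if __name__ == "__main__":
    # Print both commands; run each one in its own terminal tab.
    for cmd in (emulator_command(), dev_appserver_command()):
        print(" ".join(cmd))
```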
Hope this helps.

Related

How to move mysql database online with python

I have built an app that uses a MySQL database with Python. I would like to share some functionality with other applications, which calls for an online database. Please give me some insight into how I can move a Python MySQL database online and how to make calls to it, so that data can be shared between different applications.
I don't know exactly what you mean by a Python database, but here are some options you might want to consider.
First, you can use Heroku to host your app and Heroku Postgres to host your database. Or you can use an AWS EC2 machine to host both your app and its database (in case it's custom code that you can't call from a browser using Heroku). With both options you can access your database and the app; with the second one you can also install other services such as SSH.

Where to host pub sub publisher on GCP?

I'm looking to create a publisher that streams and sends tweets containing a certain hashtag to a pub/sub topic.
The tweets will then be ingested with Cloud Dataflow and loaded into a BigQuery database.
In the following article they do something similar where the publisher is hosted on a docker image on a Google Compute Engine instance.
Can anyone recommend alternative Google Cloud resources that could host the publisher code more simply and avoid the need to create a Dockerfile, etc.?
The publisher would need to run constantly. Would Cloud Run, for example, be a suitable alternative?
There are some workarounds I can think of:
A quick way to avoid a container architecture is to keep the on_data method inside a loop, for example with something like while(true), or to start a Stream as explained in Create your Python script, and then run the code on a Compute Engine instance in the background with nohup python -u myscript.py. Alternatively, follow the steps described in Script on GCE to capture tweets, which uses tweepy.Stream to start the streaming.
You might want to reconsider the Dockerfile option, since its configuration need not be difficult. See Tweets & pipelines, where a script reads the data and publishes it to Pub/Sub: the Dockerfile is 9 lines long and is deployed to App Engine using Cloud Build. Another implementation with a Dockerfile is twitter-for-bigquery, which involves more specific steps and more configuration.
Cloud Functions is another option. In the guide Serverless Twitter with Google Cloud, check the Design section to see whether it fits your use case.
Airflow with Twitter Scraper could work for you, since Cloud Composer is a managed service for Airflow and lets you create an Airflow environment quickly. It uses the Twint library; check the Technical section in the link for more details.
Stream Twitter Data into BigQuery with Cloud Dataprep is a workaround that sets aside complex configuration. In this case the job won't run constantly, but it can be scheduled to run within minutes.
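Whichever host you pick, the publisher's core logic stays small. The sketch below shows the filter-and-publish step on its own. The function names and the tweet shape (a dict with a "text" key) are hypothetical, and publish is an injected callable; with the google-cloud-pubsub client it could plausibly be something like lambda data: publisher.publish(topic_path, data), which is an assumption and is not exercised here.

```python
import json


def matches_hashtag(text, hashtag):
    """True if `text` contains `hashtag` as a whole token, ignoring case."""
    wanted = "#" + hashtag.lstrip("#").lower()
    return any(
        token.lower().rstrip(".,!?:;") == wanted for token in text.split()
    )


def publish_matching(tweets, hashtag, publish):
    """Send every matching tweet to `publish` as UTF-8 JSON bytes."""
    sent = 0
    for tweet in tweets:
        if matches_hashtag(tweet["text"], hashtag):
            publish(json.dumps(tweet).encode("utf-8"))
            sent += 1
    return sent


if __name__ == "__main__":
    outbox = []  # stand-in for a Pub/Sub topic
    sample = [{"text": "Learning #GCP today"}, {"text": "nothing to see"}]
    print(publish_matching(sample, "gcp", outbox.append))
```

In a streaming setup, the same publish_matching call would sit inside the stream callback (the on_data method mentioned above) instead of iterating over a list.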

Running One Instance of Google App Engine with frontend in nodejs and backend server in python

I'm getting my feet wet with GCP and GAE, also nodejs and python and networking (I know).
[+] What I have:
Basically I have some nodejs code that takes in some input and is supposed to then send that input to some python code that will do more stuff to it. My first idea was to deploy the nodejs code via GAE, then host the python code in a python server, then make post requests from the nodejs front-end to the python server backend.
[+] What I would like to be able to do:
I would like to deploy both my nodejs code and my python code in the same project and instance of GAE, so that the nodejs is the frontend that people see while the python server runs in the same environment and can communicate with the nodejs without sending anything online.
[+] What I have read
https://www.netguru.co/blog/use-node-js-backend
Google App Engine - Front and Backend Web Development
and countless other google searches for this type of setup but to no avail.
If anyone can point me in the right direction I would really appreciate it.
You can't have both python and nodejs running in the same instance, but they can run as separate services, each with their own instance(s) inside the same GAE app/project. See Service isolation and maybe Deploying different languages services to the same Application [Google App Engine]
Using post requests can work pretty well, but will likely take some effort to ensure no outside access.
Since you intend to use the nodejs service as the frontend, you're limited to the flexible environment for it, which restricts the inter-service communication options: you can't use push queues (properly supported only in the standard environment), which IMHO would be a better and more secure solution than post requests.
Another secure communication option would be for the nodejs service to place the data into the datastore and have the python service pick it up from there - the datastore is shared by all instances/versions/services inside the same GAE app. Also more loosely coupled IMHO - each service can function (at least for a while) without the other being alive (not possible if using the post requests).
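For the post-request route, the python service only needs a small handler that accepts the data from the nodejs service. The sketch below is a bare WSGI application using only the standard library. The payload shape (a JSON object with a "text" field) and the processing step are hypothetical, and it does nothing to restrict outside access, which would still need to be handled separately.

```python
import json


def application(environ, start_response):
    """Minimal WSGI app: accept a JSON POST and return a processed result."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain")])
        return [b"POST only"]

    # Read exactly the number of bytes the client announced.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))

    # Placeholder for the real processing: report the text length.
    result = {"received": payload, "length": len(payload.get("text", ""))}
    body = json.dumps(result).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The nodejs service would then send its input as a JSON POST body to the python service's URL.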
Maybe of interest: How to tell if a Google App Engine documentation page applies to the standard or the flexible environment
UPDATE:
Node.JS is currently available in the standard environment as well, so you can use those features, see:
Now, you can deploy your Node.js app to App Engine standard environment
Google App Engine Node.js Standard Environment Documentation

AppEngine Initial Datastore

I have a python application that I've been running with the devserver, and everything seems to work fine except that I am having problems initializing my datastore. Basically I need to set up datastore values from a bunch of files that are on my local drive but that I don't want to upload to Google. I set up a simple python script inside my app directory that does all of the data creation, but now I'm having a lot of problems deploying my app. How do I get a dump of the data that dev_appserver is using and upload it to my application?
Thanks for any insights.
Download the data using appcfg.py (after enabling the remote_api), then re-'upload' it to local devappserver.
http://blog.mfabrik.com/2011/03/14/mirroring-app-engine-production-data-to-development-server-using-appcfg-py/
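The download-then-upload step can be sketched as a pair of appcfg.py command lines. This assumes an SDK version that still ships appcfg.py and has remote_api enabled on both ends. The helper name, the dump filename, and the example URLs are hypothetical; swapping the two URLs reverses the direction of the copy.

```python
def transfer_commands(source_url, target_url, dump_file="dump.sqlite3"):
    """Build the download and upload commands for one datastore copy."""
    download = [
        "appcfg.py", "download_data",
        "--url=" + source_url,
        "--filename=" + dump_file,
    ]
    upload = [
        "appcfg.py", "upload_data",
        "--url=" + target_url,
        "--filename=" + dump_file,
    ]
    return download, upload


if __name__ == "__main__":
    # Example: copy from a deployed app (placeholder id) to the local server.
    for cmd in transfer_commands(
        "http://your-app-id.appspot.com/_ah/remote_api",
        "http://localhost:8080/_ah/remote_api",
    ):
        print(" ".join(cmd))
```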

GAE bulk upload programmatically

I have a python script that is intended to run on my local machine every night. Its goal is to pull data from a third-party server, do some processing on it, and execute a bulk upload to the GAE datastore.
My issue, though, is how to run the bulk upload from a python script. All examples I have seen (including Google's documentation) use the command line "appcfg.py upload_data ...", and as far as I can see appcfg.py and bulkloader.py do not expose any API that is guaranteed not to change.
My two options, as I see them now, are either to execute the "appcfg.py upload_data ..." command from my python script, which seems a roundabout way of doing things, or to call appcfg.py's internal methods directly, which means I have to recode things in case they change.
App Engine can run cron jobs. All you need to write is a single script that pulls the data from the third-party server and uploads it to App Engine; App Engine will do the rest for you. Appengine cron: this has everything you need to know about running a cron job in App Engine.
This answer is now outdated. Please see the below link for my latest answer for bulk upload data to app engine.
How to upload data in bulk to the appengine datastore? Older methods do not work
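For reference, the first option from the question (shelling out to appcfg.py from the nightly script) can be kept in a single function so that any future change to the command line touches one place. This is a sketch assuming an SDK that still provides appcfg.py upload_data; the wrapper name and default config filename are hypothetical, and run is injectable so the function can be exercised without the SDK installed.

```python
import subprocess


def upload_data(csv_path, kind, url, config_file="bulkloader.yaml",
                run=subprocess.call):
    """Run `appcfg.py upload_data` and raise if it exits with an error."""
    cmd = [
        "appcfg.py", "upload_data",
        "--config_file=" + config_file,
        "--filename=" + csv_path,
        "--kind=" + kind,
        "--url=" + url,
    ]
    code = run(cmd)  # subprocess.call returns the process exit status
    if code != 0:
        raise RuntimeError(
            "appcfg.py upload_data exited with status {}".format(code))
    return cmd
```

Raising on a non-zero exit status lets the nightly job fail loudly instead of silently skipping an upload.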
