I've created a shopping site with a backend and a frontend.
The backend is Python (3.6.5) with Flask.
I want to deploy my site to Google App Engine (GAE).
When in development, everything works fine.
When deployed (in production), each RPC gets its own 'thread' and everything is a mess.
I tried slapping gunicorn on it with the sync and gevent worker classes, but to no avail.
In deployment, how can I make each connection/session remember its own 'instance of the backend',
instead of GAE/Flask/gunicorn serving a new instance of the backend for each request?
I need each user connection to be consistent and 'its own'/'private'.
It's not possible to do this. App Engine will spread the request load to your application across all instances, regardless of which one previously handled a request from a specific IP. A specific instance may also come online or go offline due to load or underlying changes to App Engine (e.g., a data center needs maintenance).
If you need to maintain session state between multiple requests to your app, you have a couple options depending on the architecture:
Keep session state in cookies with flask.session
Keep session state in storage with Memorystore
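For the cookie-based option, a minimal sketch using Flask's built-in session (the route and secret key are illustrative, not from the question):

from flask import Flask, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # signs the session cookie

@app.route("/cart/add/<item_id>")
def add_to_cart(item_id):
    # Each client's state travels in its own signed cookie, so any
    # instance or worker that handles the request sees that client's data.
    cart = session.get("cart", [])
    cart.append(item_id)
    session["cart"] = cart  # reassign so Flask knows the session changed
    return {"cart": cart}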
Related
My GCP app has been abused by some users. To stop their usage I have attempted to eliminate features that can be abused, and have employed firewall rules to block certain users. But bad users continue to try to access my app via certain legacy URLs such as myapp.appspot.com/badroute. Of course, I still want users to use the default URL myapp.appspot.com .
I have altered my code in the following manner, but requests to those URLs still cause instances to start, and I do not want instances started in such cases. What can I do differently to prevent the unwanted instances from starting, OR is there anything I can do to force such instances to stop quickly instead of after about 15 minutes?
import logging
import webapp2

class Dummy(webapp2.RequestHandler):
    def get(self):
        logging.info("Dummy")
        self.redirect("/")

# MainPage is the existing handler for the main site
app = webapp2.WSGIApplication(
    [('/', MainPage),
     ('/badroute', Dummy)], debug=True)
(I may be referring to Instances when I should be referring to Requests.)
So what's the objective? Do you want users that visit /badroute to be redirected to some /goodroute, or do you want /badroute to not hit GAE and incur cost?
Putting a Google Cloud load balancer in front could help.
For the first case you could set up a redirect rule (although you can do this directly within App Engine too, like you did in your code example).
If you just want it to not hit App Engine, you could set up the Google Cloud load balancer to route /badroute to a file in a GCS bucket instead of your GAE service:
https://cloud.google.com/load-balancing/docs/https/ext-load-balancer-backend-buckets
However, you wouldn't be able to use your *.appspot.com base URL. You'd get a static IP, to which you should then map a custom domain.
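As a rough sketch of the backend-bucket part (command per the Cloud Load Balancing docs; verify the flags against your SDK version, and the bucket names are illustrative):

$ gcloud compute backend-buckets create badroute-backend --gcs-bucket-name=my-static-bucket

Then, in the load balancer's URL map, add a path rule sending /badroute to badroute-backend (see the linked doc for the url-maps commands).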
DISCLAIMER: I'm not 100% sure if this would work.
Create a new service dummy.
Create and deploy a dispatch.yaml (GAE Standard // GAE Flex)
Add the links you want to block to the dispatch.yaml and point them to the dummy service.
Set up the Identity Aware Proxy (IAP) and enable it for the dummy service.
???
Profit
The idea is that the IAP will block the requests before they hit the dummy service. Since the requests never actually get forwarded to the service dummy you will not have an instance start. The bots will get a nice 403 page from Google's own infrastructure instead.
EDIT: Be sure to create the dummy service with 0 instances as the idea is to not have it cost money.
EDIT2:
So let me expand a bit on this answer.
You can have multiple GAE services running within one GCP project. Each service is its own app. You can have one service running a Python Flask app and another running a Java Spring Boot app. Each can be either GAE Standard or GAE Flex. See this doc.
Normally all traffic gets routed to the default service. Using dispatch.yaml you can make requests to certain endpoints go to a specific service.
If you create the dummy service as a GAE Standard app, and you don't actually need it to do anything, you can then route all the endpoints that get abused to this dummy service using the dispatch.yaml. Using GAE Standard you can have the service use 0 instances (and 0 costs).
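A minimal dispatch.yaml sketch (the paths are placeholders; the service name matches the dummy service above):

# dispatch.yaml -- send abused paths to the idle dummy service
dispatch:
  - url: "*/badroute"
    service: dummy
  - url: "*/badroute/*"
    service: dummy

Deploy it with gcloud app deploy dispatch.yaml.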
Using the IAP you can then make sure only your own Google account can access this app (which you won't do). In effect the abusers cannot really reach the service: the IAP blocks their requests before they hit it, since only your Google account is allowed through.
Note, the dispatch.yaml is separate from any service; it's one of the per-project configuration files for GAE and is not tied to a specific service.
As stated, the dummy app doesn't actually need to do anything, but you do need to deploy it once, as that is what creates the service.
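A sketch of what the dummy service's app.yaml could look like (runtime and scaling values are illustrative, and a trivial main.py is still needed for the deploy to succeed):

# app.yaml for the dummy service (GAE Standard)
service: dummy
runtime: python39
automatic_scaling:
  min_instances: 0   # scale to zero when idle, so it costs nothing
  max_instances: 1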
Consider using Cloudflare to mitigate bot abuse: customize firewall rules for route access, rate-limit IPs, etc. This can be combined with Google Cloud Load Balancer if you'd like, as mentioned in https://stackoverflow.com/a/69165767/806876.
References
Cloudflare GCP integration: https://www.cloudflare.com/integrations/google-cloud/
There is a little information I did not provide in my question about my app.yaml:
handlers:
- url: /.*
  script: mainapp.app
By simply removing .* from the url specification, no instance is started. The user gets Error: Not Found, though. So that satisfies my needs.
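For clarity, the handler after that change looks like this (only the url pattern differs):

handlers:
- url: /
  script: mainapp.app

Requests to any other path then match no handler and are rejected without an instance being started.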
Edo Akse's Answer pushed me to this answer by reading here, so I am accepting his answer. I am still not clear how to implement Edo's Answer, though.
I use an API to synchronise a lot of data to my app every night. The API itself uses a callback system, so I send a request to their API (including a webhook URL) and when the data is ready they will send a notification to my webhook to tell me to query the API again for the data.
The problem is, these callbacks flood in at a high rate (thousands per minute) to my webhook, which puts an awful lot of strain on my Flask web app (hosted on a Heroku dyno) and causes errors for end users. The webhook itself has been reduced down to forwarding the message on to a RabbitMQ queue (running on separate dynos), which then works through them progressively at its own pace. Unfortunately, this does not seem to be enough.
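For context, a pass-through webhook of this kind might look roughly like the following sketch (Flask plus pika; the queue name and environment variable are illustrative):

import os
import pika
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    # Do no real work here; just enqueue the raw callback for the consumers.
    conn = pika.BlockingConnection(pika.URLParameters(os.environ["RABBITMQ_URL"]))
    channel = conn.channel()
    channel.queue_declare(queue="callbacks", durable=True)
    channel.basic_publish(exchange="", routing_key="callbacks", body=request.get_data())
    conn.close()
    return "", 204

(A real implementation would reuse the connection rather than open one per request.)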
Is there something else I can do to manage this load? Is it possible to run a particular URL (or set of URLs) on a separate dyno from the public facing parts of the app? i.e. having two web dynos?
Thanks
You can deploy your application with the SAME code on more than one dyno using the free tier. For example, your application is called rob1 and hosted at https://rob1.herokuapp.com, with source code accessible from https://git.heroku.com/rob1.git. You can create an application rob2, accessible from https://rob2.herokuapp.com and with source code hosted at https://git.heroku.com/rob2.git.
Then you can push code to the 2nd application.
$ cd projects/rob1
$ git remote add heroku2 https://git.heroku.com/rob2.git
$ git push heroku2 master
As a result, you have a single repo on your machine and two identical Heroku applications running your project's code. You'll probably need to copy the environment parameters of the 1st app to the 2nd one, for example:
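A sketch with the Heroku CLI (verify the flags against your CLI version; app names as above):

$ heroku config --shell --app rob1 > rob1.env
$ cat rob1.env | xargs heroku config:set --app rob2

Note this simple pipeline assumes no spaces in the config values.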
But anyway, you'll have 2 identical apps on the free tier.
Later, if you have obtained a domain name, for example robsapp.example.org, you can give it two CNAME DNS records pointing to your Heroku apps to load balance, like this:
rob1.herokuapp.com
rob2.herokuapp.com
As a result, your application's webhooks are available on robsapp.example.org and requests are automatically load balanced between the rob1 and rob2 apps.
I'm using the Python WSGI framework Falcon to build an app backend, and Beaker to handle session management. In production we're going to use Gunicorn on AWS.
There's something I've been unable to understand:
Gunicorn will run several workers, so does that mean the environment variables persist for different clients that have made requests? To put it another way, is a beaker session for one client only, or will it be available to several clients making requests in the same Gunicorn worker instance?
This is how I understand sessions to work from my reading:
A person logs into my app, and the user_id is added to the session with Beaker. Future requests from the same client will have this user_id stored in the session dict, so they will be able to access the variables stored in the session. Each client has its own session data.
Have I understood this properly?
The current method is to return an id to the client (on successful login) to pass to the backend when more user information is required.
Is this an acceptable way to do it?
Yes, you have, for the most part.
Gunicorn will run several workers, so does that mean the environment variables persist for different clients that have made requests? To put it another way, is a beaker session for one client only, or will it be available to several clients making requests in the same Gunicorn worker instance?
Beaker saves session data on the server side, in a dedicated datastore, keyed by a unique session ID. The client sends the session ID back via a cookie, and the server (whichever Gunicorn worker handles the request) uses it to retrieve that client's session data, so a session belongs to one client only, regardless of which worker serves it.
I recommend reading a more detailed explanation of how web sessions work, like this one: http://machinesaredigging.com/2013/10/29/how-does-a-web-session-work/
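For illustration, a minimal sketch of Beaker's WSGI middleware around a Falcon app (the resource, route, and storage settings are illustrative):

import falcon
from beaker.middleware import SessionMiddleware

class LoginResource:
    def on_post(self, req, resp):
        session = req.env['beaker.session']  # this client's session dict, found via its cookie
        session['user_id'] = 42              # illustrative value
        session.save()
        resp.media = {'status': 'logged in'}

api = falcon.App()  # falcon.API() in older Falcon releases
api.add_route('/login', LoginResource())

session_opts = {
    'session.type': 'file',            # server-side store shared by all workers
    'session.data_dir': './session_data',
    'session.auto': False,
}
application = SessionMiddleware(api, session_opts)  # point gunicorn at "application"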
I have two development servers written with Python/Django: one is an API server (not solely an API server; it also has a UI, etc.), and the other is a demo app that serves data by communicating with the API server. I embed the demo app in the API server with an iframe. After successfully getting a response from the demo app, the original user session of the API server is lost (there are supposed to be two sessions: one from the user of the API server, and one from communication between the demo app and the API server).
Any idea what happened?
If you are running both on the same server, the session cookie might be overwritten, since they both expect a sessionid cookie. If a sessionid doesn't exist, a new one is generated; so when you access the outer app you get a sessionid cookie, and that gets passed to the iframe app, which doesn't recognize it and generates a new one. Try giving each app its own unique SESSION_COOKIE_NAME.
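For example, in each project's settings.py (the cookie names are illustrative):

# API server settings.py
SESSION_COOKIE_NAME = 'api_sessionid'

# demo app settings.py
SESSION_COOKIE_NAME = 'demo_sessionid'

With distinct names, the two apps' cookies no longer overwrite each other in the browser.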
I am new to Python and Google App Engine. I have installed an existing Python application on localhost, and it runs fine for the static pages. But when I try to open a page that fetches data, it shows an error message:
BadRequestError: app "dev~sample-app" cannot access app "dev~template-builder"'s data
template-builder is the name of my online application. I think there is some problem with accessing the Google App Engine data on localhost. What should I do to get this to work?
Since you are just getting started, I assume you don't care much about what is in your local datastore. Therefore, when starting your app, pass the --clear_datastore flag to dev_appserver.py.
What is happening? As daemonfire300 said, you have conflicting application IDs here. The app you are trying to run has the ID "sample-app", while your datastore holds data for "template-builder". The easiest way to deal with it is clearing the datastore (as described above).
If you indeed want to keep both sets of data, pass --default_partition=dev~sample-app to dev_appserver.py (or the other way around, depending on which app ID you want to use).
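For example (flag spellings are those of the classic Python dev_appserver; check your SDK version):

$ dev_appserver.py --clear_datastore=yes app.yaml
$ dev_appserver.py --default_partition=dev~sample-app app.yaml

The first wipes the local datastore on startup; the second keeps the data and picks which app ID the local datastore runs under.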