I have an architecture where a Cloud Function gets triggered whenever a file is uploaded to a bucket and sends the task to an API built with Flask and deployed on App Engine.
I want to make this process internal so that only the Cloud Function can access the App Engine endpoints, but I am struggling with the process.
As these two services are serverless, I can't just filter the traffic in the App Engine firewall since the Cloud Function will have a different IP each time a new instance is created.
I have tried to follow this guide, which recommends associating all of the function's egress traffic with a Serverless VPC Connector assigned to a subnet, and then controlling all the traffic of that subnet with a Cloud NAT that has a static IP address. This way I could filter in my App Engine firewall by the NAT IP, which will always be the same.
After following all the steps, I am still not able to filter the traffic. With this configuration in place, if I open the traffic to everyone and print the IP addresses reported in the App Engine X-Forwarded-For header when I send a simple GET request from the Cloud Function, it returns the following: 0.0.0.0, 169.254.1.1 (it is a list since this header records the client IP and the proxies involved in the route). The static IP address assigned to my NAT is 34.78.XX.XX, so it seems that the function is not using the NAT to route the traffic.
I have read somewhere that when the destination IP is hosted on Google Cloud, the traffic does not go through the NAT gateway, so maybe this solution won't work for my use case.
Any idea what I am doing wrong, or whether there are any alternatives for making the process private?
There are two solutions to this problem, and the choice depends on what you believe in!
Network based solution
If you want to keep your App Engine internal only, I mean from a network point of view, you can set the ingress control to internal-only so that it accepts only traffic coming from the VPCs of your project.
From there, you need to deploy your Cloud Function with a VPC connector (to route the traffic to the VPC) and set the egress control to all, so that all traffic, public and private, is routed through the VPC.
Indeed, even if you set your App Engine to internal ingress mode, the service is still publicly accessible, but there is an additional check on the request origin to make sure that it comes from the project's VPCs. Therefore, when you call App Engine from your Cloud Function, you call a public endpoint, and you need to route the public traffic through your VPC so that it is accepted by App Engine's internal-only ingress.
This solution only works with VPCs in the same project; a cross-project setup is not possible.
Identity based solution
Google says: Don't trust the network
So, based on that, Google trusts the identity behind the traffic and the request. You can keep your service private (not accessible by anyone except authorized identities) simply by controlling the authentication of the connection.
For that, you need to activate IAP on your App Engine service and authorize only the service account of your Cloud Function.
Then, in your Cloud Function, you need to generate an identity token manually and add it to the Authorization header of your request.
Be careful, there is a trap here: the audience of the token must be the IAP client ID (which you can find on the APIs & Services -> Credentials page).
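A minimal sketch of that token generation inside a Python Cloud Function, assuming the function's service account has been granted the IAP-secured Web App User role, and with IAP_CLIENT_ID and APP_ENGINE_URL as hypothetical placeholders for your own values:

    import requests
    import google.auth.transport.requests
    import google.oauth2.id_token

    # Hypothetical placeholders: replace with your IAP client ID and App Engine URL.
    IAP_CLIENT_ID = "1234567890-xxxx.apps.googleusercontent.com"
    APP_ENGINE_URL = "https://my-project.appspot.com/tasks"

    def call_app_engine(payload):
        # Fetch an identity token for the function's service account,
        # with the IAP client ID as the audience.
        auth_request = google.auth.transport.requests.Request()
        token = google.oauth2.id_token.fetch_id_token(auth_request, IAP_CLIENT_ID)

        # Pass the token in the Authorization header of the request.
        response = requests.post(
            APP_ENGINE_URL,
            json=payload,
            headers={"Authorization": "Bearer " + token},
        )
        return response.status_code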
Only the valid requests, checked by IAP, will trigger your App Engine service. In case of attacks, IAP will absorb the bad traffic, not App Engine.
So now, what do you trust?
Related
My GCP app has been abused by some users. To stop their usage I have attempted to eliminate features that can be abused, and have employed firewall rules to block certain users. But bad users continue to try to access my app via certain legacy URLs such as myapp.appspot.com/badroute. Of course, I still want users to use the default URL myapp.appspot.com .
I have altered my code in the following manner, but requests to these URLs are still causing Instances to start, and I do not want Instances in such cases. What can I do differently to avoid the bad Instances starting, OR is there anything I can do to force such Instances to stop quickly instead of after about 15 minutes?
import logging
import webapp2

class Dummy(webapp2.RequestHandler):
    def get(self):
        logging.info("Dummy: ")
        self.redirect("/")

# MainPage is defined elsewhere in the app.
app = webapp2.WSGIApplication(
    [('/', MainPage),
     ('/badroute', Dummy)], debug=True)
(I may be referring to Instances when I should be referring to Requests.)
So what's the objective? Do you want users that visit /badroute to be redirected to some /goodroute, or do you want /badroute to not hit GAE and incur cost?
Putting a Google Cloud load balancer in front could help.
For the first case you could set up a redirect rule (although you can do this directly within App Engine too, like you did in your code example).
If you just want it to not hit App Engine, you could set up the Google Cloud load balancer to route /badroute to some file in a GCS bucket instead of your GAE service:
https://cloud.google.com/load-balancing/docs/https/ext-load-balancer-backend-buckets
However, you wouldn't be able to use your *.appspot.com base URL. You'd get a static IP, to which you should then map a custom domain.
DISCLAIMER: I'm not 100% sure if this would work.
Create a new service dummy.
Create and deploy a dispatch.yaml (GAE Standard // GAE Flex)
Add the links you want to block to the dispatch.yaml and point them to the dummy service.
Set up the Identity Aware Proxy (IAP) and enable it for the dummy service.
???
Profit
The idea is that IAP will block the requests before they hit the dummy service. Since the requests never actually get forwarded to the dummy service, you will not have an instance start. The bots will get a nice 403 page from Google's own infrastructure instead.
EDIT: Be sure to create the dummy service with 0 instances as the idea is to not have it cost money.
EDIT2:
So let me expand a bit on this answer.
You can have multiple GAE services running within one GCP project. Each service is its own app. You can have one service running a Python Flask app and another running a Java Spring Boot app. You can have each be either GAE Standard or GAE Flex. See this doc.
Normally all traffic gets routed to the default service. Using dispatch.yaml you can make requests to certain endpoints go to a specific service.
If you create the dummy service as a GAE Standard app, and you don't actually need it to do anything, you can then route all the endpoints that get abused to this dummy service using the dispatch.yaml. Using GAE Standard you can have the service use 0 instances (and 0 cost).
Using IAP you can then make sure only your own Google account can access this app (which you won't do). In effect this means the abusers cannot really reach the service: IAP blocks the request before it ever hits the service, since you've set it up so only your Google account can access it.
Note that the dispatch.yaml is separate from any service; it's one of the per-project configuration files for GAE and is not tied to a specific service.
As stated, the dummy app doesn't actually need to do anything, but you do need to deploy it once, as that is what creates the service.
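A minimal sketch of such a dummy service, assuming a Python 3 standard runtime with Flask; the route and response are illustrative only, since IAP rejects unauthorized requests before they ever reach this code:

    # main.py - minimal placeholder service. It should never do real work,
    # because IAP blocks unauthorized requests before they reach it.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def catch_all(path):
        # Anything that somehow gets through simply receives a 404.
        return "Not Found", 404

You would deploy it once with an app.yaml that declares the runtime and the service name, and list the abused routes in dispatch.yaml pointing at that service.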
Consider using Cloudflare to mitigate bot abuse, customize firewall rules regarding route access, rate-limit IPs, etc. This can be combined with the Google Cloud load balancer if you'd like, as mentioned in https://stackoverflow.com/a/69165767/806876.
References
Cloudflare GCP integration: https://www.cloudflare.com/integrations/google-cloud/
There is a little information I did not provide in my question about my app.yaml:
handlers:
- url: /.*
  script: mainapp.app
By simply removing .* from the url specification (so the handler only matches /), no Instance start occurs for the bad routes. The user gets Error: Not Found, though. So that satisfies my needs.
Edo Akse's answer pushed me to this solution after reading here, so I am accepting his answer. I am still not clear how to implement Edo's answer, though.
There is an issue with getting the real user request IP address inside a web application running on a Cloud Run service. For some reason the web application sees the same IP address for all user requests: 169.254.8.129. I'm assuming a load balancer in front of the Cloud Run service overrides the request IPs with its own.
I have already double-checked this with different apps built on Flask, FastAPI and ASP.NET Core in Cloud Run. All apps return the same result and have the same issue.
But when I check those apps on a VM, everything works fine there.
How can I get the user's IP address in my Cloud Run Flask app?
I have found part of the answer, but still cannot do the same for FastAPI.
The address 169.254.8.129 is the address of the proxy sitting in front of your Cloud Run service.
You can extract the list of IP addresses from the HTTP header X-Forwarded-For. This list usually includes the client and each proxy or load balancer in between the client and your application.
X-Forwarded-For
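For example, a minimal Flask sketch; it assumes the first entry in X-Forwarded-For is the original client, which is the usual convention for the proxy in front of Cloud Run:

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/")
    def whoami():
        # X-Forwarded-For is a comma-separated list: client, proxy1, proxy2, ...
        forwarded_for = request.headers.get("X-Forwarded-For", "")
        client_ip = forwarded_for.split(",")[0].strip() if forwarded_for else request.remote_addr
        return "client ip: " + client_ip

Flask (Werkzeug) also exposes the same list as request.access_route, and in FastAPI the equivalent lookup is request.headers.get("x-forwarded-for").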
I am testing out a very basic Pub/Sub subscription. I have the push endpoint set to an App I have deployed through a Python Flex service in App Engine. The service is in a project with Identity-Aware Proxy enabled. The IAP is configured to allow through users authenticated with our domain.
I do not see any of the push requests being processed by my app.
I turned off the IAP protection and then I see that the requests are processed. I turn it back on and they are no longer processed.
I had similar issues with IAP when trying to get a Cron service running; that issue resolved itself after I deployed a new test app in the same project.
Has anyone had success with configuring a push subscription through IAP? I also experimented with putting different service accounts on the IAP access list and none of them worked.
I'm not aware of a way to get Pub/Sub push subscriptions + Flex + IAP working. I wonder... it might work if the subscriber is on Standard.
Some other potential workarounds:
- Switch to a Pull subscriber.
- Set up a Cloud Functions function as your Pub/Sub subscriber -- https://cloud.google.com/functions/docs/writing/background -- and then in that function pass the request on to the GAE app, using https://cloud.google.com/iap/docs/authentication-howto to authenticate as a service account.
Sorry, I wish I had a better answer for you, but AFAIK those are the options that work today.
--Matthew, IAP engineering lead
I had a pretty similar issue: a GAE 2nd generation standard application in project A, protected by IAP, that cannot receive the pushed Pub/Sub message from project B.
My workaround is:
Set up a Cloud Function (HTTP triggered) in project A;
Set up the subscription of the project B Pub/Sub topic to push the message to the above Cloud Function endpoint;
The above Cloud Function works like a proxy, filtering (needed in my case, YMMV) and forwarding the Pub/Sub message in an HTTP request to the GAE app;
Since the Cloud Function is within the same project as the GAE app, you only need to add IAP authentication to that HTTP request (fetching an identity token for the function's service account), as sketched below;
Project A's service account should be set up in project B's IAM with at least the Pub/Sub Subscriber and Pub/Sub Viewer roles.
Hope this could be an option for your case.
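A rough sketch of such a proxy function in Python, assuming the function's service account has the IAP-secured Web App User role on the GAE app; IAP_CLIENT_ID and GAE_ENDPOINT are hypothetical placeholders:

    import base64
    import requests
    import google.auth.transport.requests
    import google.oauth2.id_token

    # Hypothetical placeholders: the IAP client ID of the GAE app and its push endpoint.
    IAP_CLIENT_ID = "1234567890-xxxx.apps.googleusercontent.com"
    GAE_ENDPOINT = "https://my-project.appspot.com/pubsub/push"

    def pubsub_proxy(request):
        # Decode the pushed Pub/Sub message.
        envelope = request.get_json(silent=True) or {}
        message = envelope.get("message", {})
        data = base64.b64decode(message.get("data", "")).decode("utf-8")

        # Any filtering of unwanted messages would happen here.

        # Authenticate to the IAP-protected GAE app with an identity token
        # whose audience is the IAP client ID.
        auth_request = google.auth.transport.requests.Request()
        token = google.oauth2.id_token.fetch_id_token(auth_request, IAP_CLIENT_ID)
        resp = requests.post(
            GAE_ENDPOINT,
            json={"data": data, "attributes": message.get("attributes", {})},
            headers={"Authorization": "Bearer " + token},
        )
        return ("", resp.status_code)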
Is there a way to specify a proxy server when using urlfetch on Google App Engine?
Specifically, every time I make a call using urlfetch, I want GAE to go through a proxy server. I want to do this on production, not just dev.
I want to use a proxy because there are problems with using google's outbound IP addresses (rate limiting, no static outbound IP, sometimes blacklisted, etc.). Setting a proxy is normally easy if you can edit the http message itself, but GAE's API does not appear to let you do this.
You can always roll your own:
In the case of a fixed destination: just set up fixed port forwarding on a proxy server, then send requests from GAE to the proxy. If you have multiple destinations, set up forwarding on separate ports, one for each destination.
In the case of dynamic destinations (too many to handle via fixed port forwarding), your GAE app adds a custom HTTP header (X-Something) containing the final destination and then connects to the custom proxy. The custom proxy inspects this field and forwards the request to the destination.
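A minimal sketch of the dynamic-destination variant from the GAE side, assuming a custom proxy at proxy.example.com that has been written to honor a hypothetical X-Target-Url header:

    # Python GAE Standard sketch; the proxy hostname and the X-Target-Url
    # header are hypothetical and must match whatever your proxy expects.
    from google.appengine.api import urlfetch

    def fetch_via_proxy(target_url, payload=None):
        return urlfetch.fetch(
            url="https://proxy.example.com/forward",
            payload=payload,
            method=urlfetch.POST if payload else urlfetch.GET,
            headers={"X-Target-Url": target_url},
        )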
We ran into this issue and reached out to Google Cloud support. They suggested we use Google App Engine flexible with some app.yaml settings, custom network, and an ip-forwarding NAT gateway instance.
This didn't work for us because many core features from App Engine Standard are not implemented in App Engine Flexible. In essence we would need to rewrite our product.
So, to make applicable URL fetch requests appear to have a static IP we made a custom proxy: https://github.com/csgactuarial/app-engine-proxy
For redundancy reasons, I suggest implementing this as a multi region, multi zone, load balanced system.
I need to get an App Engine app talking to and sharing data with an external database.
The best option I can come up with is outputting the external database data to an XML file and then processing this in my App Engine app and storing it inside the datastore.
However, the data being shared is sensitive data such as login details, so outputting this to an XML file is not exactly a great idea. Is it possible for the App Engine app to directly query the database? Or is there a secure option for using XML files?
Oh, and I'm using Python/Django, and the external database will be hosted on another domain.
Google Apps' Secure Data Connector (SDC) is designed for this kind of task -- indeed, it even works when the "other database" lives behind a firewall (a common case for enterprise data), and for other Google Apps (Docs, Spreadsheets, ...) as well as App Engine.
As the docs summarize things, the flow is:
- Google Apps forwards authorized data requests from users who are within the Google Apps domain to the Google tunnel protocol servers.
- The tunnel servers validate that a user is authorized to make the request to the specified resource.
- Google tunnel servers are connected by an encrypted tunnel to SDC, which runs within a company's internal network.
- The tunnel protocol allows SDC to connect to a Google tunnel server, authenticate, and encrypt the data that flows across the Internet.
- SDC uses resource rules to validate if a user is authorized to make a request to a specified resource.
- An optional intranet firewall can be used to provide extra network security.
- SDC performs a network request to the specified resource or services. The service validates the signed request, checks the credentials, and if the user is authorized, returns the data.
If you don't have to worry about firewalls, and have no security worries whatsoever, you can simplify things (as Daniel's answer suggests) by just using urlfetch directly (no tunnels, no validation, no encryption, no filtering, ...) -- but your worry about "the data being shared is sensitive data such as login details" suggests that this is not the case.
It's not a problem of XML vs other formats -- the problem is that sensitive data should not travel "in clear" over unprotected channels, nor be made available to all and sundry, and it's often nicer to have specialized infrastructure deal with encryption, filtering, and authorization problems, as the SDC does, rather than having to code all of this (and make it totally secure and locked-down) in your own app or specialized infrastructure middleware. For these purposes, the SDC can be very helpful, even if you only need a fraction of its functionality.
You may want to consider exposing a set of web services on the external domain where your database is hosted, and then use the App Engine's URL Fetch API to communicate with your external domain via HTTPS.
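For instance, a hedged sketch using the URL Fetch API against a hypothetical endpoint on that external domain, with certificate validation enabled; the URL and API-key header are placeholders, to be replaced by whatever web services you actually expose:

    # Sketch only: the endpoint URL and X-Api-Key header are hypothetical and
    # depend on the web services you expose on the external domain.
    import json
    from google.appengine.api import urlfetch

    def get_user_record(user_id):
        result = urlfetch.fetch(
            url="https://external-domain.example.com/api/users/%s" % user_id,
            method=urlfetch.GET,
            headers={"X-Api-Key": "my-shared-secret"},
            validate_certificate=True,  # make sure we are really talking to our domain
        )
        if result.status_code == 200:
            return json.loads(result.content)
        return None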