Does my default app have to be deployed to appspot.com?

Does my default app have to be deployed to appspot.com? - python

I asked a question about the default app as it related to microservices on app engine and got a great response here, but I have another related question.
Does my default app have to be accessible via appspot.com? When I run the deploy command that's where it puts it, but I'd rather have it not accessible like that. I really just want a semi-empty (like hello world sized) app that satisfies the default app requirement.
It does seem like Google is shoehorning multi-app/microservices into an environment that was originally setup to only serve a single web facing app backed by other modules. It seems very ungraceful and hacky.

You can customize your app to perform differently based on the URL that was used.
For example, you can use domain specific routes with webapp2 or you can check the domain in your handler by checking the value of self.request.url and responding accordingly.
You could for example, have myapp.appspot.com return a 404 but have www.mydomain.com provide content to users.

It depends what you mean by "accessible".
Yes, the app will have a presence on appspot.com, in the sense that requests can make it to some instance of some version of some service inside your app, based on the Routing via URL rules, the most generic ones being:
Sends a request to the named service, version, and instance:
https://instance-dot-version-dot-service-dot-app-id.appspot.com
http://instance.version.service.my-custom-domain.com
Also, from Default service:
The default service is defined by explicitly giving a service the name
"default," or by not including the name parameter in the service's
config file. Requests that specify no service or an invalid service
are routed to the default service. You can designate a default version
for a service, when appropriate, in the Google Cloud Platform Console
versions tab.
But what your app code responds to such requests is really up to you. Nothing stops, for example, your default service handler from simply returning a 404 or your "Hello world" page, for example, if you don't want it to do anything else. As if it wouldn't be there. Yet it still serves the role of the default service.

Related

Eliminating nuisance Instance starts

My GCP app has been abused by some users. To stop their usage I have attempted to eliminate features that can be abused, and have employed firewall rules to block certain users. But bad users continue to try to access my app via certain legacy URLs such as myapp.appspot.com/badroute. Of course, I still want users to use the default URL myapp.appspot.com .
I have altered my code in the following manner, but I am still getting Instances to start from them, and I do not want Instances in such cases. What can I do differently to avoid the bad Instances starting OR is there anything I can do to force such Instances to stop quickly instead of after about 15 minutes?
class Dummy(webapp2.RequestHandler):
def get(self):
logging.info("Dummy: " )
self.redirect("/")
app = webapp2.WSGIApplication(
[('/', MainPage),
('/badroute', Dummy)],debug=True)
(I may be referring to Instances when I should be referring to Requests.)

So whats the objective? you want users that visit /badroute to be redirected to some /goodroute ? or you want /badroute to not hit GAE and incur cost?
Putting a google cloud load balancer in front could help.
For the first case you could setup a redirect rule (although you can do this directly within App Engine too, like you did in your code example).
If you just want it to not hit app engine you could setup google cloud load balancer to have the /badroute route to some file in a GCS bucket instead of your GAE service
https://cloud.google.com/load-balancing/docs/https/ext-load-balancer-backend-buckets
However you wouldnt be able to use your *.appsot.com base url. You'd get a static IP which you should then map a custom domain to it

DISCLAIMER: I'm not 100% sure if this would work.
Create a new service dummy.
Create and deploy a dispatch.yaml (GAE Standard // GAE Flex)
Add the links you want to block to the dispatch.yaml and point them to the dummy service.
Set up the Identity Aware Proxy (IAP) and enable it for the dummy service.
???
Profit
The idea is that the IAP will block the requests before they hit the dummy service. Since the requests never actually get forwarded to the service dummy you will not have an instance start. The bots will get a nice 403 page from Google's own infrastructure instead.
EDIT: Be sure to create the dummy service with 0 instances as the idea is to not have it cost money.
EDIT2:
So let me expand a bit on this answer.
You can have multiple GAE services running within one GCP project. Each service is it's own app. You can have one service running a python Flask app and another running a Java Springboot app. You can have each be either GAE Standard or GAE Flex. See this doc.
Normally all traffic gets routed to the default service. Using dispatch.yaml you can make request to certain endpoints go to a specific service.
If you create the dummy service as a GAE Standard app, and you don't actually need it to do anything, you can then route all the endpoints that get abused to this dummy service using the dispatch.yaml. Using GAE Standard you can have the service use 0 instances (and 0 costs).
Using the IAP you can then make sure only your own Google account can access this app (which you won't do). In effect this means that the abusers cannot really access the service, as the IAP blocks it before hitting the service, as you've set it up so only your Google account can access it.
Note, the dispatch.yaml is separate from any services, it's one of the per-project configuration files for GAE. It's not tied to a specific service.
As stated, the dummy app doesn't actually need to do anything, but you need to deploy it once though, as this basically creates the service.

Consider using cloudflare to mitigate bot abuse, customize firewall rules regarding route access, rate limit ips, etc. This can be combined with Google cloud load balancer, if you’d like—as mentioned in https://stackoverflow.com/a/69165767/806876.
References
Cloudflare GCP integration: https://www.cloudflare.com/integrations/google-cloud/

There is a little information I did not provide in my question about my app.yaml:
handlers:
- url: /.*
script: mainapp.app
By simply removing .* from the url specification, no Instance start is created. The user gets Error: Not Found, though. So that satisfies my needs.
Edo Akse's Answer pushed me to this answer by reading here, so I am accepting his answer. I am still not clear how to implement Edo's Answer, though.

How to secure my Azure WebApp with the built-in authentication mechanism

I created a Flask-Webservice with Python that runs independently inside a docker container. I then uploaded the docker image to an Azure Container Registry. From there I can create a WebService (for Containers) with some few clicks in the Azure Portal, that runs this container. So far so good. It behaves just as I want it to.
But of course I don't want anyone to access the service. So I need some kind if authentication. Luckily (or so I thought) there is a built-in authentication-mechanism (I think it is based on OAuth ... I am not that well versed in security issues). Its documentation is a bit sparse on what actually happens and also concentrates on solutions in C#.
I first created a project with Google as described here and then configured the WebApp-Authentication with the Client-Id and Secret. I of course gave Google a java script source and callback-url, too.
When I now log off my Google account and try a GET-Request to my Webservice in the Browser (the GET should just return a "hello world"-String), I am greeted with a Login Screen ... just as I expected.
When I now login to Google again, I am redirected to the callback-url in the browser with some kind of information in the parameters.
a token perhaps? It looks something like this:
https://myapp.azurewebsites.net/.auth/login/google/callback?state=redirxxx&code=xxx&authuser=xxx&session_state=xxx&prompt=xxx).
Here something goes wrong, because an error appears.
An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
Faithfully yours, nginx.
As far as I now, nginx is a server software that hosts my code. I can imagine that it also should handle the authentication process. It obviously lets all requests through to my code when authentication is turned off, but blocks un-authenticated accesses otherwise and redirects to the google login. Google then checks if your account is authorized for the application and redirects you to the callback with the access token along with it. This then returns a cookie which should grant my browser access to the app. (I am just reproducing the documentation here).
So my question is: What goes wrong. Does my Browser not accept the cookie. Did I something wrong when configuring Google+ or the Authentication in the WebApp. Do I have to use a certain development stack to use the authentication. Is it not supported for any of the technologies I use (Python, Flask...).
EDIT
#miknik:
In Microsofts documentation of the authentication/authorization it says
The authentication and authorization module runs in the same sandbox
as your application code. When it's enabled, every incoming HTTP
request passes through it before being handled by your application
code.
...
The module runs separately from your application code and is
configured using app settings. No SDKs, specific languages, or changes
to your application code are required.
So while you are probably right that the information in the callback-redirect is the authorization grant/code and that after that this code should now be used to get an access token from Google, I don't quite understand how this would work in my situation.
As far as I can see it Microsofts WebApp for Container-Resource on Azure should take care of getting the token automatically and return it as part of the response to the callback-request. The documentation states 4 steps:
Sign user in: Redirects client to /.auth/login/.
Post-authentication: Provider redirects client to /.auth/login//callback.
Establish authenticated session: App Service adds authenticated cookie to response.
Serve authenticated content: Client includes authentication cookie in subsequent requests (automatically handled by browser).
It seems to me that step 2 fails and that that would be exactly what you wrote: that the authorization grant is to be used by the server to get the access token but isn't.
But I also don't have any control over that. Perhaps someone could clear things up by correcting me on some other things:
First I can't quite figure out which parts of my problem represent which role in the OAuth-scheme.
I think I am the Owner, and by adding users to the list in the Google+-Project I authorize them to use my service.
Google is obviously the authorization server
my WebService (or better yet my WebApp for Containers) is the resource server
and finally an application or postman that does the requests is the Client
In the descriptions of OAuth I read the problematic step boils down to: the resource server gets the access token from the authorization server and passes it along to the client. And Azures WebApps Resource is prompted (and enabled) to do so by being called with the callback-url. Am I right somewhere in this?
Alas, I agree that I don't quite understand the whole protocol. But I find most descriptions on the net less than helpful because they are not specific to Azure. If anyone knows a good explanation, general or Azure-specific, please make a comment.

I found a way to make it work and I try to explain what went wrong as good as I can. Please correct me if I go wrong or use the wrong words.
As I suspected the problem wasn't so much that I didn't understand OAuth (or at least how Azure manages it) but the inner workings of the Azure WebApp Service (plus some bad programming on my part). Azure runs an own Server and is not using the built-in server of flask. The actual problem was that my flask-program didn't implement a WSGI-Interface. As I could gather this is another standard for python scripts to interact with any server. So while rudimentary calls from the server (I think Azure uses nginx) were possible, more elaborate calls, like the redirect to the callback url went to dev/null.
I build a new app following this tutorial and then secured it by following the authentication/authorization-tutorial and everything worked fine. The code in the tutorial implements WSGI and is probably more conform to what Azure expects. My docker solution was too simple.
My conclusion: read up on this WSGI-standard that flask always warned me about and I didn't listen and implement it in any code that goes beyond fiddeling around in development.

Google App Engine inter module communication authorization

In the Google Docs it says
You can configure any manual or basic scaling module to accept requests from other modules in your app by restricting its handler to only allow administrator accounts, specifying login: admin for the appropriate handler in the module's configuration file. With this restriction in place, any URLFetch from any other module in the app will be automatically authenticated by App Engine, and any request that is not from the application will be rejected.
so i did that, but unfortunately it does not work. I am requesting a url from module A on module B which is protected by the login: admin property
I can fetch that url in the browser which shows me the login page and after i continue as admin i can fetch my route.
How is it supposed to work? As far as i understand it should add a header to the request which includes some kind of authorization token.
If i fetch that same url within a request on module A i get the same redirect. urllib2 follows the 302 status code by default and the result is the login page.
I am running the environment using the gcloud preview app run command. Module A is a default module and module B is a Managed VM Container, might this be the problem here?

I can confirm this is occurring, and I've reproduced the issue. The issue is being tracked over in the App Engine public issue tracker. Follow there for any updates.
For now, I think it's much better to be manually-inspecting the X-Appengine-Inbound-Appid header, as this is managed by the infrastructure and can't be spoofed.
You could also implement OAuth, but that adds overhead you may not want or need on a small app.

GAE: Can't Use Google Server Side API's (Whitelisting Issue)

To use Google API's, after activating them from the Google Developers Console, one needs to generate credentials. In my case, I have a backend that is supposed to consume the API server side. For this purpose, there is an option to generate what the Google page calls "Key for server applications". So far so good.
The problem is that in order to generate the key, one has to mention IP addresses of servers that would be whitelisted. But GAE has no static IP address that I could use there.
There is an option to manually get the IP's by executing:
dig -t TXT _netblocks.google.com #ns1.google.com
However there is no guarantee that the list is static (further more, it is known to change from time to time), and there is no programatic way I could automate the use of adding IP's that I get from dig into the Google Developers Console.
This leaves me with two choices:
Forget about GAE for this project, ironically, GAE cannot be used as a backend for Google API's (better use Amazon or some other solution for that). or
Program something like a watchdog over the output of the dig command that would notify me if there's a change, and then I would manually update the whitelist (no way I am going to do this - too dangerous), or allow all IP's to use the Google API granted it has my API key. Not the most secure solution but it works.
Is there any other workaround? Can it be that GAE does not support consuming Google API's server side?

You can use App Identity to access Google's API from AppEngine. See: https://developers.google.com/appengine/docs/python/appidentity/. If you setup your app using the cloud console, it should have already added your app's identity with permission to your project, but you can always check that out. From the "Permissions" Tab in cloud console for your project, make sure your service account is added under "Service Accounts" (in the form of your_app_id#appspot.gserviceaccount.com)
Furthermore, if you use something like the JSON API Libs available for python, you can use the bundled oauth2 library to do all of this for you using AppAssertionCredentials to authorize the API you wish to use. See: https://developers.google.com/api-client-library/python/guide/google_app_engine#ServiceAccounts

Yes, you should use App Identity. Forget about getting an IP or giving up on GAE :-) Here is an example of how to use Big Query, for example, inside a GAE application:
static {
// initializes Big Query
JsonFactory jsonFactory = new JacksonFactory();
HttpTransport httpTransport = new UrlFetchTransport();
AppIdentityCredential credential = new AppIdentityCredential(Arrays.asList(Constants.BIGQUERY_SCOPE));
bigquery = new Bigquery.Builder(httpTransport, jsonFactory, credential)
.setApplicationName(Constants.APPLICATION_NAME).setHttpRequestInitializer(credential)
.setBigqueryRequestInitializer(new BigqueryRequestInitializer(Constants.API_KEY)).build();
}

Heroku router logging format

I've recently deployed a Flask app on Heroku. It provides an API on top of an existing API and requires a confidential API key for the original service from the user. The app is really just a few forms, the values of which are passed with ajax to a specific URL on the server. Nothing fancy. I take steps to not store confidential information in the app and want no traces of it anywhere within the app.
Looking at the logs from heroku logs --source heroku, the heroku router process stores all HTTP requests for the app, including those requests that include the confidential information.
Is there a way to specify the log format for the heroku process so as to not store the URL served?

As other commenters mentioned, it is a bad practice to put confidential info in a URL. These could get cached or logged by a number of systems (e.g. routers, proxy servers, caches) on the roundtrip to the server. There are a couple ways to solve this:
Put them in a the Authorization header. This is probably the most common way authentication is handled for REST-based APIs.
Put them in the POST body. This works to get it out of the URL, but is a little weird semantically to say that your are POSTing the credentials to some resource (if this a REST API), unless it is a login call.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.