Eliminating nuisance Instance starts - Python

My GCP app has been abused by some users. To stop them I have removed the features that can be abused and added firewall rules to block certain users. But bad actors keep trying to reach my app via legacy URLs such as myapp.appspot.com/badroute. Of course, I still want legitimate users to use the default URL myapp.appspot.com.
I have altered my code as follows, but these requests still cause Instances to start, and I do not want Instances in such cases. What can I do differently to prevent these Instances from starting, or is there a way to force them to shut down quickly instead of after about 15 minutes?
import logging
import webapp2

class Dummy(webapp2.RequestHandler):
    def get(self):
        logging.info("Dummy: ")
        self.redirect("/")

# MainPage (not shown) is the existing handler for the default route.
app = webapp2.WSGIApplication(
    [('/', MainPage),
     ('/badroute', Dummy)], debug=True)
(I may be referring to Instances when I should be referring to Requests.)

So what's the objective? Do you want users that visit /badroute to be redirected to some /goodroute, or do you want /badroute to not hit GAE and incur cost at all?
Putting a Google Cloud Load Balancer in front could help.
For the first case you could set up a redirect rule (although you can do this directly within App Engine too, like you did in your code example).
If you just want it to not hit App Engine, you could set up the Google Cloud Load Balancer to route /badroute to some file in a GCS bucket instead of your GAE service:
https://cloud.google.com/load-balancing/docs/https/ext-load-balancer-backend-buckets
However, you wouldn't be able to use your *.appspot.com base URL. You'd get a static IP, to which you should then map a custom domain.

DISCLAIMER: I'm not 100% sure if this would work.
Create a new service dummy.
Create and deploy a dispatch.yaml (GAE Standard / GAE Flex).
Add the routes you want to block to the dispatch.yaml and point them to the dummy service.
Set up the Identity Aware Proxy (IAP) and enable it for the dummy service.
???
Profit
The idea is that the IAP will block the requests before they hit the dummy service. Since the requests never actually get forwarded to the service dummy you will not have an instance start. The bots will get a nice 403 page from Google's own infrastructure instead.
EDIT: Be sure to create the dummy service with 0 instances as the idea is to not have it cost money.
EDIT2:
So let me expand a bit on this answer.
You can have multiple GAE services running within one GCP project. Each service is its own app. You can have one service running a Python Flask app and another running a Java Spring Boot app, and each can be either GAE Standard or GAE Flex. See this doc.
Normally all traffic gets routed to the default service. Using dispatch.yaml you can make requests to certain endpoints go to a specific service.
If you create the dummy service as a GAE Standard app (it doesn't actually need to do anything), you can route all the abused endpoints to it using the dispatch.yaml. With GAE Standard the service can run with 0 instances (and 0 cost).
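To make that concrete, here is a minimal dispatch.yaml sketch (my illustration, not part of the original answer); it assumes the abused path is /badroute and that the idle service was deployed under the name dummy:
# dispatch.yaml - project-level routing rules, deployed with:
#   gcloud app deploy dispatch.yaml
dispatch:
  # Send the abused legacy route, on any domain, to the idle "dummy" service.
  - url: "*/badroute"
    service: dummy
# Everything else keeps following the normal routing to the default service.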
Using IAP you can then make sure only your own Google account can access this service (which you won't actually do). In effect the abusers cannot really reach the service: IAP blocks the request before it hits the service, because you've set it up so that only your Google account is allowed.
Note that the dispatch.yaml is separate from any service; it is one of the per-project configuration files for GAE and is not tied to a specific service.
As stated, the dummy app doesn't actually need to do anything, but you do need to deploy it once, as that is what creates the service.

Consider using Cloudflare to mitigate bot abuse: customize firewall rules for route access, rate-limit IPs, etc. This can be combined with a Google Cloud Load Balancer if you'd like, as mentioned in https://stackoverflow.com/a/69165767/806876.
References
Cloudflare GCP integration: https://www.cloudflare.com/integrations/google-cloud/

There is a piece of information about my app.yaml that I did not provide in my question:
handlers:
- url: /.*
  script: mainapp.app
By simply removing .* from the url specification, no Instance is started for such requests. The user gets Error: Not Found, though. That satisfies my needs.
Edo Akse's answer pushed me to this solution after reading here, so I am accepting his answer. I am still not clear how to implement Edo's answer, though.
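For reference, this is what the narrowed handler described above looks like (a sketch; mainapp.app is taken from the snippet, and it assumes / is the only route that should still match):
handlers:
- url: /
  script: mainapp.app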


How do I deploy this app for my job: EC2, Elastic Beanstalk, something else entirely?

I'm tasked with creating a web app (I think?) for my job that will track something in our system. It'll be an internal tool that staff use to keep track of the status of one of the things we do. It should look like Trello, with cards that drag from step to step. That frontend exists, but my job is to make the system update when the cards are dragged. This requires using an API in Python and isn't that complicated to grab from/update. I have no idea how to put all of this together. My job is almost completely nontechnical and there's no one internally who knows what I'm doing except for me. I'm in so over my head here and have no idea where to begin. Is this something I should deploy on Elastic Beanstalk? EC2? How do I tie this together and put it somewhere?
Are you trying to pull in live data from Trello or from your company's own internal project management tool?
An EC2 instance might be useful, but honestly, it may be completely unnecessary if your company has its own servers. EC2 is basically just a collection of rentable computers to help with scaling. I have never used Beanstalk, so my input would be useless there.
From what I can gather from the question, you could have a Python script running to pull from the API and make the changes without an EC2 instance.
The first thing you should do is gather as much information as possible about what the end product should look like. From your question, I get the feeling that you have only a vague idea of what the stakeholders want. Don't be afraid to ask for more clarification on an unclear task. It's better to spend 30 minutes discussing and taking notes than to show the end product after a month and realize that's not what your boss/team wanted.
Questions I would ask
Who is going to be using this app? (technical or non-technical person)
For what purpose is this being developed?
Does it need to be on the web or can it be used locally?
How many users need to have access to this application?
Are we handling sensitive information with this application?
Will this need to be augmented with other functionality at some point?
This is just a sample of what I would ask; during the conversation with the stakeholders a lot more will pop up for sure.
What I think you have to do
You need to make a monitoring system for the tasks that need to be done by your development team (like a Kanban)
What I think you already have
A frontend with cards that are draggable into each bin. I also assume that you can create a new card and delete one in the frontend. The frontend is most likely written in React, Angular or Vue.js. You might also have no frontend framework (a mix of jQuery and vanilla JS), but usually frontend developers end up picking a framework of some sort to help the development.
A backend API in Python (most likely Flask or Django REST framework) that communicates with a SQL database like PostgreSQL or a document database like MongoDB.
I'm making a lot of assumptions here, but your aim should be to understand the technology you will be working with in order to check which hosting would work best. For instance, if the database that is set up is MySQL, you might have trouble with some hosting providers.
What I think you are missing
Currently the frontend and the backend don't communicate with each other. When you drag a card, the change won't persist if you refresh the page. Also, all of this is sitting on your computer and cannot be used by anyone on your staff. You first need to connect the frontend with the backend so that the application has persistence. Then you need to deploy this application somewhere so that it is reachable by your staff.
What I would do is first work locally to make sure that the persistence layer is working. This implies having the API server, the frontend server and the database server running simultaneously on your computer while you develop. You should then fetch data from the API to know which cards are in the database, and create them visually in your frontend at the right spot.
When you drop a card in a new spot after dragging it, that should trigger a POST request to your API server to update the status of that particular card (look at the documentation of your API to check what you need to send).
The server should send back an updated version of the card's status if the POST request was successful, and your application should then just redraw the card at the right spot (in practice it won't make a visible difference, since the card is already in the right spot and your frontend framework most likely won't act on the change because the state hasn't changed). That's all I would do for that part.
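To make the backend half of that flow concrete, here is a minimal sketch (not your actual API, which you'd need to consult): it assumes a Flask backend, an in-memory dict standing in for the database, and a hypothetical POST /cards/<card_id>/status endpoint that the frontend would call when a card is dropped:
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the real database: card id -> current column/status.
CARDS = {"42": "todo", "43": "in-progress"}

@app.route("/cards/<card_id>/status", methods=["POST"])
def update_card_status(card_id):
    # The frontend sends the column the card was dropped into, e.g. {"status": "done"}.
    payload = request.get_json(silent=True) or {}
    new_status = payload.get("status")
    if card_id not in CARDS or new_status is None:
        return jsonify({"error": "unknown card or missing status"}), 400
    CARDS[card_id] = new_status
    # Return the updated card so the frontend can redraw it.
    return jsonify({"id": card_id, "status": new_status})

if __name__ == "__main__":
    app.run(debug=True)
In the real application the dict would be replaced by queries against your database or calls to the internal project-management API.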
I would then move to the deployment phase to make sure that whatever you did locally still works online. I would use Heroku to start instead of jumping directly to AWS. Heroku is a service built on top of AWS which manages a lot of the complexity of AWS for you. This is great for prototyping, and it means that when your stuff is ready you can migrate to AWS easily and be confident that a setup exists to make your app work. You might also be tied to your company's servers, which is another thing I would ask the stakeholders (i.e. where can I put this application and where can't I).
The flow for a frontend + API + database application on Heroku is usually as follows. You create a GitHub repo for your frontend (make it private) and you create an app on Heroku that will watch this repository for changes. It will re-deploy the application for you whenever it sees a change, and host it at a specific Heroku subdomain. You will need to configure a Procfile that tells Heroku what to do with a given application type. This is where you need to double-check what frontend you are using, since that might change the Procfile used. It's most likely a Node.js-based frontend (React, Angular or Vue), so head over here for the documentation on how to put that online.
You will also need a repo for the backend, separate from the frontend; these two entities are distinct and only communicate through HTTP requests (frontend -> backend) and JSON (backend -> frontend). Deploying it follows the same idea as the frontend; head over here.
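As a rough illustration (an assumption about your stack, not a requirement): for a Flask backend served with gunicorn, the backend's Heroku Procfile is typically a single line, where app:app means "the Flask instance named app inside app.py" and gunicorn is listed in requirements.txt:
web: gunicorn app:app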
Once you have these two online, you need to create a database on Heroku. This is done by adding a datastore add-on to your API app; head over here. There is some framework-specific configuration needed to make the API talk to an online database; you will find that in your framework's documentation. The database could also already be up and living on your own server; in that case you just need to configure your online backend to talk to that particular database at its address.
Once all of the above is done, re-test your application to check that you get the same behavior as before. This is a usable MVP; however, there is no layer of security. Anyone with the right URL could just fetch your frontend and start messing around with your data.
There is more engineering that needs to be done to make this a viable end product. This leads to my final remark: why are you not using a product like Trello, Jira, or even GitHub Projects? If it is to save money by not paying for a subscription, I think you should factor in the cost of development, security and maintenance of this application.
Hope it helps!
One simple option is Heroku for deploying your API and your frontend application.

How to secure my Azure WebApp with the built-in authentication mechanism

I created a Flask web service with Python that runs independently inside a Docker container. I then uploaded the Docker image to an Azure Container Registry. From there I can create a Web App (for Containers) with a few clicks in the Azure Portal that runs this container. So far so good. It behaves just as I want it to.
But of course I don't want just anyone to access the service. So I need some kind of authentication. Luckily (or so I thought) there is a built-in authentication mechanism (I think it is based on OAuth ... I am not that well versed in security issues). Its documentation is a bit sparse on what actually happens and also concentrates on solutions in C#.
I first created a project with Google as described here and then configured the WebApp authentication with the client ID and secret. I of course gave Google a JavaScript origin and callback URL, too.
When I now log out of my Google account and try a GET request to my web service in the browser (the GET should just return a "hello world" string), I am greeted with a login screen ... just as I expected.
When I now log in to Google again, I am redirected to the callback URL in the browser with some kind of information in the parameters (a token perhaps?). It looks something like this:
https://myapp.azurewebsites.net/.auth/login/google/callback?state=redirxxx&code=xxx&authuser=xxx&session_state=xxx&prompt=xxx
Here something goes wrong, because an error appears:
An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
Faithfully yours, nginx.
As far as I know, nginx is server software that hosts my code. I can imagine that it is also supposed to handle the authentication process. It obviously lets all requests through to my code when authentication is turned off, but otherwise blocks unauthenticated accesses and redirects to the Google login. Google then checks whether your account is authorized for the application and redirects you to the callback with the access token. This then returns a cookie which should grant my browser access to the app. (I am just reproducing the documentation here.)
So my question is: what goes wrong? Does my browser not accept the cookie? Did I do something wrong when configuring Google+ or the authentication in the WebApp? Do I have to use a certain development stack to use the authentication? Is it simply not supported for some of the technologies I use (Python, Flask, ...)?
EDIT
@miknik:
In Microsoft's documentation of the authentication/authorization it says:
The authentication and authorization module runs in the same sandbox
as your application code. When it's enabled, every incoming HTTP
request passes through it before being handled by your application
code.
...
The module runs separately from your application code and is
configured using app settings. No SDKs, specific languages, or changes
to your application code are required.
So while you are probably right that the information in the callback redirect is the authorization grant/code, and that this code should then be used to get an access token from Google, I don't quite understand how this would work in my situation.
As far as I can see, Microsoft's Web App for Containers resource on Azure should take care of getting the token automatically and return it as part of the response to the callback request. The documentation states 4 steps:
Sign user in: Redirects client to /.auth/login/<provider>.
Post-authentication: Provider redirects client to /.auth/login/<provider>/callback.
Establish authenticated session: App Service adds authenticated cookie to response.
Serve authenticated content: Client includes authentication cookie in subsequent requests (automatically handled by browser).
It seems to me that step 2 fails, and that this is exactly what you wrote: the authorization grant is supposed to be used by the server to get the access token, but isn't.
But I also don't have any control over that. Perhaps someone could clear things up by correcting me on some other things:
First I can't quite figure out which parts of my problem represent which role in the OAuth-scheme.
I think I am the Owner, and by adding users to the list in the Google+-Project I authorize them to use my service.
Google is obviously the authorization server
my web service (or rather my Web App for Containers) is the resource server
and finally an application or Postman making the requests is the client
In the descriptions of OAuth I have read, the problematic step boils down to this: the resource server gets the access token from the authorization server and passes it along to the client. And Azure's Web App resource is prompted (and enabled) to do so by being called with the callback URL. Am I right somewhere in this?
Alas, I agree that I don't quite understand the whole protocol. But I find most descriptions on the net less than helpful because they are not specific to Azure. If anyone knows a good explanation, general or Azure-specific, please make a comment.
I found a way to make it work, and I will try to explain what went wrong as well as I can. Please correct me if I get something wrong or use the wrong words.
As I suspected, the problem wasn't so much that I didn't understand OAuth (or at least how Azure manages it), but the inner workings of the Azure WebApp service (plus some bad programming on my part). Azure runs its own server and does not use Flask's built-in server. The actual problem was that my Flask program didn't expose a proper WSGI interface. As far as I could gather, WSGI is the standard by which Python applications talk to whatever server runs them. So while rudimentary calls from the server (I think Azure uses nginx) were possible, more elaborate calls, like the redirect to the callback URL, went to /dev/null.
I built a new app following this tutorial and then secured it by following the authentication/authorization tutorial, and everything worked fine. The code in the tutorial implements WSGI and probably conforms better to what Azure expects. My Docker solution was too simple.
My conclusion: read up on the WSGI standard that Flask always warned me about (and I didn't listen), and implement it in any code that goes beyond fiddling around in development.
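As a rough illustration of that conclusion (a sketch, not the poster's actual code): a Flask app is itself a WSGI application, and the usual fix is to hand its WSGI callable to a production WSGI server such as gunicorn inside the container, instead of relying on Flask's development server.
# app.py - a minimal Flask service exposing its WSGI callable
from flask import Flask

app = Flask(__name__)  # "app" is the WSGI callable a server such as gunicorn loads

@app.route("/")
def hello():
    return "hello world"

if __name__ == "__main__":
    # Fine for local fiddling only; Flask's built-in server is not meant for production.
    app.run(host="0.0.0.0", port=8000)

# Inside the container a WSGI server serves the same callable instead, e.g.:
#   gunicorn --bind 0.0.0.0:8000 app:app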

Does my default app have to be deployed to appspot.com?

I asked a question about the default app as it relates to microservices on App Engine and got a great response here, but I have another related question.
Does my default app have to be accessible via appspot.com? When I run the deploy command that's where it puts it, but I'd rather it not be accessible like that. I really just want a semi-empty (hello-world-sized) app that satisfies the default app requirement.
It does seem like Google is shoehorning multi-app/microservices into an environment that was originally set up to serve only a single web-facing app backed by other modules. It seems very ungraceful and hacky.
You can customize your app to perform differently based on the URL that was used.
For example, you can use domain-specific routes with webapp2, or you can check the domain in your handler by looking at the value of self.request.url and responding accordingly.
You could, for example, have myapp.appspot.com return a 404 but have www.mydomain.com serve content to users.
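A minimal sketch of the handler-side check (the domain names are placeholders, and it uses self.request.host rather than parsing self.request.url):
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Pretend not to exist on the appspot.com domain ...
        if self.request.host.endswith("appspot.com"):
            self.abort(404)
        # ... but serve content on the mapped custom domain.
        self.response.write("Hello from www.mydomain.com")

app = webapp2.WSGIApplication([('/', MainPage)], debug=False)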
It depends what you mean by "accessible".
Yes, the app will have a presence on appspot.com, in the sense that requests can make it to some instance of some version of some service inside your app, based on the Routing via URL rules, the most generic ones being:
Sends a request to the named service, version, and instance:
https://instance-dot-version-dot-service-dot-app-id.appspot.com
http://instance.version.service.my-custom-domain.com
Also, from Default service:
The default service is defined by explicitly giving a service the name
"default," or by not including the name parameter in the service's
config file. Requests that specify no service or an invalid service
are routed to the default service. You can designate a default version
for a service, when appropriate, in the Google Cloud Platform Console
versions tab.
But how your app code responds to such requests is really up to you. Nothing stops your default service handler from simply returning a 404 or your "Hello world" page, for example, if you don't want it to do anything else, as if it weren't there. Yet it still serves the role of the default service.

GAE: Can't Use Google Server Side API's (Whitelisting Issue)

To use Google APIs, after activating them from the Google Developers Console, one needs to generate credentials. In my case, I have a backend that is supposed to consume the API server side. For this purpose, there is an option to generate what the Google page calls a "Key for server applications". So far so good.
The problem is that in order to generate the key, one has to list the IP addresses of the servers that will be whitelisted. But GAE has no static IP address that I could use there.
There is an option to manually get the IPs by executing:
dig -t TXT _netblocks.google.com @ns1.google.com
However, there is no guarantee that the list is static (furthermore, it is known to change from time to time), and there is no programmatic way I could automate adding the IPs I get from dig into the Google Developers Console.
This leaves me with two choices:
Forget about GAE for this project; ironically, GAE could not be used as a backend for Google APIs (better to use Amazon or some other solution for that), or
Program something like a watchdog over the output of the dig command that notifies me when there is a change, and then manually update the whitelist (no way I am going to do this - too dangerous), or allow all IPs to use the Google API as long as they present my API key. Not the most secure solution, but it works.
Is there any other workaround? Can it be that GAE does not support consuming Google APIs server side?
You can use App Identity to access Google's APIs from App Engine. See: https://developers.google.com/appengine/docs/python/appidentity/. If you set up your app using the Cloud Console, it should have already added your app's identity with permission to your project, but you can always check that. From the "Permissions" tab in the Cloud Console for your project, make sure your service account is added under "Service Accounts" (in the form your_app_id@appspot.gserviceaccount.com).
Furthermore, if you use something like the JSON API libraries available for Python, you can use the bundled oauth2client library to do all of this for you, using AppAssertionCredentials to authorize the API you wish to use. See: https://developers.google.com/api-client-library/python/guide/google_app_engine#ServiceAccounts
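A minimal sketch of that approach (assuming the legacy oauth2client and google-api-python-client libraries of that era; BigQuery is just an example scope and service):
import httplib2
from googleapiclient.discovery import build
from oauth2client.appengine import AppAssertionCredentials  # oauth2client.contrib.appengine in later versions

# The app's own service account (your_app_id@appspot.gserviceaccount.com)
# is used to obtain an access token for the requested scope.
credentials = AppAssertionCredentials(
    scope="https://www.googleapis.com/auth/bigquery")
http = credentials.authorize(httplib2.Http())

# Build a client for whichever enabled API you want to call server side.
bigquery = build("bigquery", "v2", http=http)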
Yes, you should use App Identity. Forget about getting an IP or giving up on GAE :-) Here is an example of how to use BigQuery, for example, inside a GAE application (Java):
static {
    // initializes Big Query
    JsonFactory jsonFactory = new JacksonFactory();
    HttpTransport httpTransport = new UrlFetchTransport();
    AppIdentityCredential credential = new AppIdentityCredential(Arrays.asList(Constants.BIGQUERY_SCOPE));
    bigquery = new Bigquery.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName(Constants.APPLICATION_NAME)
        .setHttpRequestInitializer(credential)
        .setBigqueryRequestInitializer(new BigqueryRequestInitializer(Constants.API_KEY))
        .build();
}

Deploying an app to users' appspot

I am working on a Python app which runs on App Engine. Is there a way I can publish the app to each customer's appspot account, so that the app uses that customer's cloud storage, instead of running the app on my appspot account with all the users storing their data in my cloud space?
Yes, absolutely.
You just need to have each client create an App Engine account with an application to which you have administrator access. You can adjust the settings on the application to forbid downloads of your code by the other administrators if that's appropriate for your agreement with the client. This also allows the clients to be billed directly for their instances' usage, and makes it completely impossible for data to leak between different clients' instances.
Using multiple applications for multiple clients who are licensing your application almost certainly does not violate part 4.4 of the TOS, although don't take this as legal advice.
No, you cannot do that. The app is hosted and run in the administrator's account, which would be yours. What you can do is release the source code and have your users install it in their own appspot accounts, just like creating a new application.
I suppose it's not exactly what you need, but it can give you an idea of where to go. Please check the DryDrop project. It is a small Python application you can ask each user to install on their account; they can then configure it to fetch your site files from your GitHub repo through webhook functionality. I haven't tried it, but theoretically you update your site, commit it to your repo, and all users get your updated application automatically. You can share your thoughts on whether that works for you.
Maybe. If it's an open source app that you're giving away, you can publish the source and instruct users to upload it to their own accounts.
If you're selling the app, displaying ads or otherwise trying to monetize the service, you probably want to stick with one instance. Using multiple instances to avoid paying for quota usage is a direct violation of the App Engine TOS:
4.4. You may not develop multiple Applications to simulate or act as a
single Application or otherwise access
the Service in a manner intended to
avoid incurring fees.
No. Writing an application that deploys other applications is in violation of the terms of service.
Note we don't have any 'hard' limits - those limits that aren't billing-enabled can be increased upon application to us if you provide a reasonable use case.
