My callback webhook is overloaded - what can I do? - python

I use an API to synchronise a lot of data to my app every night. The API itself uses a callback system, so I send a request to their API (including a webhook URL) and when the data is ready they will send a notification to my webhook to tell me to query the API again for the data.
The problem is, these callbacks flood in at a high rate (thousands per minute) to my webhook, which puts an awful lot of strain on my Flask web app (hosted on a Heroku dyno) and causes errors for end users. The webhook itself has been reduced down to forwarding the message on to a RabbitMQ queue (running on separate dynos), which then works through them progressively at its own pace. Unfortunately, this does not seem to be enough.
Is there something else I can do to manage this load? Is it possible to run a particular URL (or set of URLs) on a separate dyno from the public facing parts of the app? i.e. having two web dynos?
Thanks

You can deploy your application with the SAME code on more than one dyno using the free tier. For example, say your application is called rob1, hosted at https://rob1.herokuapp.com, with source code accessible from https://git.heroku.com/rob1.git. You can create an application rob2, accessible from https://rob2.herokuapp.com, with source code hosted at https://git.heroku.com/rob2.git.
Then you can push your code to the 2nd application:
$ cd projects/rob1
$ git remote add heroku2 https://git.heroku.com/rob2.git
$ git push heroku2 master
As a result, you have a single repo on your machine and 2 identical Heroku applications running your project's code. You'll probably need to copy the environment variables (config vars) of the 1st app over to the 2nd one, for example (a sketch using the Heroku CLI; substitute your own variable names):
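$ heroku config --app rob1 --shell              # list rob1's config vars
$ heroku config:set VAR_NAME=value --app rob2   # set each one on rob2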
Either way, you'll have 2 identical apps on the free tier.
Later, if you obtain a domain name, for example robsapp.example.org, you can give it two CNAME DNS records pointing to your Heroku apps to get simple round-robin load balancing:
rob1.herokuapp.com
rob2.herokuapp.com
As a result, your application webhooks are available on robsapp.example.org, and requests are automatically balanced between the rob1 and rob2 apps.

Related

Eliminating nuisance Instance starts

My GCP app has been abused by some users. To stop their usage I have attempted to eliminate features that can be abused, and have employed firewall rules to block certain users. But bad users continue to try to access my app via certain legacy URLs such as myapp.appspot.com/badroute. Of course, I still want users to use the default URL myapp.appspot.com .
I have altered my code in the following manner, but I am still getting Instances to start from them, and I do not want Instances in such cases. What can I do differently to avoid the bad Instances starting OR is there anything I can do to force such Instances to stop quickly instead of after about 15 minutes?
import logging
import webapp2

class Dummy(webapp2.RequestHandler):
    def get(self):
        logging.info("Dummy: ")
        self.redirect("/")

app = webapp2.WSGIApplication(
    [('/', MainPage),        # MainPage is defined elsewhere in the app
     ('/badroute', Dummy)], debug=True)
(I may be referring to Instances when I should be referring to Requests.)
So what's the objective? Do you want users that visit /badroute to be redirected to some /goodroute? Or do you want /badroute to not hit GAE and incur cost?
Putting a google cloud load balancer in front could help.
For the first case you could set up a redirect rule (although you can do this directly within App Engine too, like you did in your code example).
If you just want it to not hit App Engine, you could configure the Google Cloud load balancer to route /badroute to some file in a GCS bucket instead of your GAE service:
https://cloud.google.com/load-balancing/docs/https/ext-load-balancer-backend-buckets
However, you wouldn't be able to use your *.appspot.com base URL. You'd get a static IP, to which you should then map a custom domain.
DISCLAIMER: I'm not 100% sure if this would work.
Create a new service dummy.
Create and deploy a dispatch.yaml (GAE Standard // GAE Flex)
Add the links you want to block to the dispatch.yaml and point them to the dummy service.
Set up the Identity Aware Proxy (IAP) and enable it for the dummy service.
???
Profit
The idea is that the IAP will block the requests before they hit the dummy service. Since the requests never actually get forwarded to the service dummy you will not have an instance start. The bots will get a nice 403 page from Google's own infrastructure instead.
EDIT: Be sure to create the dummy service with 0 instances as the idea is to not have it cost money.
EDIT2:
So let me expand a bit on this answer.
You can have multiple GAE services running within one GCP project. Each service is its own app. You can have one service running a Python Flask app and another running a Java Spring Boot app, and each can be either GAE Standard or GAE Flex. See this doc.
Normally all traffic gets routed to the default service. Using dispatch.yaml you can make requests to certain endpoints go to a specific service.
If you create the dummy service as a GAE Standard app, and you don't actually need it to do anything, you can then route all the abused endpoints to this dummy service using the dispatch.yaml, as sketched below. Using GAE Standard you can have the service use 0 instances (and incur 0 cost).
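A minimal dispatch.yaml for this setup might look like the following (the dummy service name comes from this answer; /badroute is the example path from the question):
dispatch:
  - url: "*/badroute*"
    service: dummy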
Using the IAP you can then make sure only your own Google account can access this app (which you won't do). In effect this means the abusers cannot really access the service: the IAP blocks their requests before they ever hit it, since you've set it up so only your Google account is allowed.
Note that dispatch.yaml is separate from any service; it's one of the per-project configuration files for GAE and not tied to a specific service.
As stated, the dummy app doesn't actually need to do anything, but you do need to deploy it once, as that is what actually creates the service.
Consider using Cloudflare to mitigate bot abuse: customize firewall rules regarding route access, rate-limit IPs, etc. This can be combined with the Google Cloud load balancer if you'd like, as mentioned in https://stackoverflow.com/a/69165767/806876.
References
Cloudflare GCP integration: https://www.cloudflare.com/integrations/google-cloud/
There is a little information I did not provide in my question about my app.yaml:
handlers:
- url: /.*
  script: mainapp.app
By simply removing the .* from the url specification, no instance start is created. The user gets Error: Not Found, though. So that satisfies my needs.
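For clarity, the narrowed handler described above would be (only the exact root URL matches; anything else returns Not Found without starting an instance):
handlers:
- url: /
  script: mainapp.app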
Edo Akse's answer pushed me to this solution by reading here, so I am accepting his answer. I am still not clear how to implement Edo's answer, though.

Running a python script saved in local machine from google sheets on a button press

I am trying to create Jira issues from data populated in a row of a Google Sheet. I plan to put a button in the sheet that reads the contents of the row and creates the Jira issues. I have figured out the Jira API and written the script for it, and also used the Google Sheets API to read the row values to feed into the Jira API.
How do I link the button to the Python script on my local machine in a simple manner? I went through other similar questions here, but they are quite old, and I'm hoping some new way might be available now.
Please help me achieve this in a simple way, any help is greatly appreciated.
Thank You and Stay Safe.
Google sheets cannot run code on your local machine. That means you have a few options:
Click the button locally
Instead of clicking a button on the google sheet, you can run the script yourself from the command line. This is probably how you tested it, and not what you want.
Watch the spreadsheet
You could have your Python script set up to run every few minutes. This has the benefit of being very straightforward to set up (google for cron jobs; see the example entry below), but there is no button, and it may be slower to update. Also, it stops working if you turn off your computer.
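For instance, a crontab entry that runs the script every five minutes could look like this (the interpreter and script paths are hypothetical):
*/5 * * * * /usr/bin/python3 /home/you/sheet_to_jira.py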
Make script available for remote execution
You can make it so that your script can be run remotely, but it requires extra work. You could buy a website domain and point it towards your computer (using dynamic DNS), and then make the Google Sheet request your new URL. This is a lot of work and costs real money, so it is probably not the best way.
Move the script into the cloud
This is probably what you want: cut your machine out of the loop. You can use Google Apps Script and rewrite your Jira code there. You can then configure the Apps Script to run on a button click.
Unfortunately, you really can't get a button press in a Google Sheet to launch a local Python script-- Google Sheets / your browser cannot access your local files and programs in that way.
You can create a button that runs a Google Apps Script (GAS). This is some code based on JavaScript, attached to the spreadsheet, hosted/run by Google. Here's a tutorial on how to run via button press.
If you can port your script into GAS, that is one solution.
If you want to keep the script in Python, you basically need to deploy it and then use GAS to call your Python script. The simplest way I can think of (which is not super simple, but is totally doable!) is as follows:
1. Make your Python script into an API.
Use something like Flask or FastAPI to set up your own API. The aim is that when a certain URL is visited, it triggers your Python program to run a function which does all the work. With FastAPI it might look like this:
from fastapi import FastAPI

app = FastAPI()

def main():
    print("Access Google Sheet via API...")
    # your code here
    print("Upload to JIRA via API...")
    # your code here

@app.get("/")
def root():
    main()
    return {"message": "Done"}
Here, "/" is the API endpoint. When you visit (or make a "get" request) to the URL of the deployed app, simply ending in "/", the root function will get called, which calls your main function. (You could set up different URL endings to do different things).
We can test this locally. If you follow the setup instructions for FastAPI, you should be able to run the command uvicorn main:app --reload which launches a server at http://127.0.0.1:8000. If you visit that URL in your browser, the script should get run and the message "Done" should appear in your browser.
2. Deploy your Python app
There are many services that can host your Python program, such as Heroku or Google Cloud. They may offer free trials but this generally costs money. FastAPI has instructions for deploying to Deta which seems to currently have a free tier.
When your app is up and running, there should be an associated web address such as "https://1kip8d.deta.dev/". If you access this in the browser it will run your script and return the "Done" message.
3. Hit your Python API from Google Sheets, using GAS
The last step is to "hit" that URL using GAS instead of visiting it manually in the browser. Following the tutorial mentioned above, create a GAS script linked to your spreadsheet, and a button which is "assigned" to your script. The script will look something like this:
function myFunction() {
  var response = UrlFetchApp.fetch("https://1kip8d.deta.dev/");
  Logger.log(response.getContentText());
}
Now, whenever you press the button, GAS will visit that URL, which will cause your Python script to execute.
You might want to check out Google Colaboratory. It's a service by Google that can host your Python code (called a "notebook"), connect with your Google Drive (and other Google services), and make calls out to web endpoints (which would be your Jira server). I think those are the three pieces you're dealing with here.
Just to be clear... your code wouldn't be local anymore (if that's really important to you). Instead, it would be hosted by Google. The notebooks are saved to your Google Drive account, so you get the security that provides.

How do I deploy this app for my job: EC2, Elastic Beanstalk, something else entirely?

I'm tasked with creating a web app (I think?) for my job that will track something in our system. It'll be an internal tool that staff use to keep track of the status of one of the things we do. It should look like Trello, with cards that drag from step to step. That frontend exists, but my job is to make the system update when the cards are dragged. This requires using an API in Python and isn't that complicated to grab from/update. I have no idea how to put all of this together. My job is almost completely nontechnical and there's no one internally who knows what I'm doing except for me. I'm in way over my head here and have no idea where to begin. Is this something I should deploy on Elastic Beanstalk? EC2? How do I tie this together and put it somewhere?
Are you trying to pull in live data from Trello or from your company's own internal project management tool?
An EC2 might be useful, but honestly it may be completely unnecessary if your company has its own servers. EC2 is basically just a collection of rental computers to help with scaling. I have never used Beanstalk, so my input would be useless there.
From what I can assume from the question, you could have a python script running to pull from the API and make the changes without an EC2.
First thing you should do is gather as much information about what the end product should look like. From your question, I have the feeling that you have only a vague idea of what the stakeholders want. Don't be afraid to ask more clarification about an unclear task. It's better to spend 30 minutes discussing and taking note than to show the end-product after a month and realizing that's not what your boss/team wanted.
Questions I would ask
Who is going to be using this app? (technical or non-technical person)
For what purpose is this being developed?
Does it need to be on the web or can it be used locally?
How many users need to have access to this application?
Are we handling sensitive information with this application?
Will this need to be augmented with other functionality at some point?
This is just a sample of what I would ask, during the conversation with the stakeholder a lot more will pop up for sure.
What I think you have to do
You need to make a monitoring system for the tasks that need to be done by your development team (like a Kanban)
What I think you already have
A frontend with cards that are draggable to each bin. I also assume that you can create a new card and delete one in the frontend. The frontend is most likely written in React, Angular or Vue.js. You might also have no frontend framework (a mix of jQuery and vanilla JS), but frontend developers usually end up picking a framework of some sort to help development.
A backend API in Python (Flask or Django REST framework, most likely) that communicates with a SQL database like PostgreSQL or a document database like MongoDB.
I'm making a lot of assumptions here, but your aim should be to understand the technology you will be working with in order to check which hosting would work best. For instance, if the database that is set up is a MySQL database, you might have some trouble with certain hosting providers.
What I think you are missing
Currently the frontend and the backend don't communicate with each other. When you drag a card, the change won't persist if you refresh the page. Also, all of this is sitting on your computer and cannot be used by anyone on your staff. You need to first connect the frontend with the backend so that the application has persistence, then deploy the application somewhere so that it is reachable by your staff.
What I would do is first work locally to make sure that the persistence layer is working. This implies having the API server, the frontend server and the database server running simultaneously on your computer while you develop. You should then fetch data from the API to know which cards are in the database, and create them visually in your frontend at the right spot.
When you drop a card in a new spot after dragging it, that should trigger a POST request to your API server to update the status of this particular card (look at the documentation of your API to check what you need to send).
The server should send back an updated version of the card's status if the POST request was successful, so your application should then just redraw the card at the right spot (it won't make a visible difference, since the card is already at the right spot and your frontend framework most likely won't act on the response because the state hasn't changed). That's all I would do for that part; a sketch of such an endpoint follows.
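As a rough illustration (Flask assumed, since the backend is Python; the route, field names, and the in-memory store are hypothetical stand-ins for your real API and database):
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical in-memory store standing in for your real database.
CARDS = {"42": {"id": "42", "title": "Example card", "status": "todo"}}

@app.route("/api/cards/<card_id>", methods=["POST"])
def update_card(card_id):
    payload = request.get_json()        # e.g. {"status": "in-progress"}
    card = CARDS.get(card_id)
    if card is None:
        return jsonify({"error": "not found"}), 404
    card["status"] = payload["status"]  # record the card's new column
    return jsonify(card)                # the frontend redraws from this response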
I would then move to the deployment phase to make sure that whatever you did locally still works online. I would use Heroku to start, instead of jumping directly to AWS. Heroku is a service built on top of AWS which manages a lot of AWS's complexity for you. It is great for prototyping, and it means that when your stuff is ready you can migrate to AWS easily, confident that a setup exists to make your app work. You might also be tied to your company's servers, which is another thing I would ask the stakeholders (i.e. where can I put this application and where can't I).
The flow for a frontend + API + database application on Heroku is usually as follows. You create a GitHub repo for your frontend (make it private) and you create an app on Heroku that will watch this repository for changes; it will re-deploy the application for you when it sees a change, at a specific subdomain of Heroku's hosting. You will need to configure a Procfile that tells Heroku what to do with a given application type. This is where you need to double-check which frontend you are using, since that might change the Procfile. It's most likely a Node.js-based frontend (React, Angular or Vue), so head over here for the documentation on how to put that online.
You will need to make a separate repo for the backend as well; these two entities are distinct and only communicate through HTTP requests (frontend->backend) and JSON (backend->frontend). Follow the same idea as with the frontend to deploy it; head over here.
Once you have these two online, you need to create a database on Heroku. This is done by adding a datastore add-on to your API app; head over here. There is some framework-specific configuration needed to make the API talk to an online database, which you will find in your framework's documentation. The database could also already be up and living on your own server; in that case you just need to configure your online backend to talk to that particular database at a particular address.
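For example, attaching a Postgres add-on from the CLI might look like this (the app name is a placeholder, and plan names change over time):
$ heroku addons:create heroku-postgresql --app your-api-app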
Once all of the above is done, re-test your application to check that you get the same behavior as before. This is a usable MVP; however, there is no security layer. Anyone with the right URL could just fetch your frontend and start messing around with your data.
There is more engineering that needs to be done to make this a viable end product, which leads to my final remark: why are you not using a product like Trello, Jira, or even GitHub Projects? If it is to save some money by not paying for a subscription, I think you should factor in the cost of development, security, and maintenance of this application.
Hope it helps!
One simple option is Heroku, to deploy both your API and your frontend application.

Flask deployment

I've created a shopping site with a backend and a frontend.
The backend is python (3.6.5) with Flask.
I want to deploy my site to Google App Engine (gae).
When in development, everything works fine.
When deployed (in production) each RPC gets its own 'thread' and everything is a mess.
I tried slapping gunicorn on it with sync and gevent worker class, but to no avail.
In deployment, how can I make each connection/session remember its own 'instance of the backend', instead of GAE/Flask/gunicorn serving a new instance of the backend for each request?
I need each user connection to be consistent and 'its own'/'private'.
It's not possible to do this. App Engine will spread the request load to your application across all instances, regardless of which one previously handled a request from a specific IP. A specific instance may also come online or go offline due to load or underlying changes to App Engine (e.g., a data center needs maintenance).
If you need to maintain session state between multiple requests to your app, you have a couple options depending on the architecture:
Keep session state in cookies with flask.session
Keep session state in storage with Memorystore
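For instance, a cookie-based session sketch with Flask (the cart example and secret key are hypothetical; signed session cookies travel with the client, so any instance can read them):
from flask import Flask, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # required to sign session cookies

@app.route("/cart/add/<item_id>")
def add_to_cart(item_id):
    # session data lives in a signed cookie, so it survives across instances
    cart = session.get("cart", [])
    cart.append(item_id)
    session["cart"] = cart
    return {"cart": cart}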

How to run python scripts online?

I have a simple Python script that gets the local weather forecast and sends an email with the data.
I want to run this script daily. I found out that cron is used for this purpose, but online cron jobs require a URL.
I wanted to ask how to host my Python script so that it runs online through a URL, if that is possible.
I would recommend using Heroku with a Python buildpack as a starting point. Using the Flask library, you can very minimally start a web container and expose the endpoint online, which can then be queried from your cron service. Heroku also provides a free account which should ideally fit your need.
As a peek into how easy it is to set up Flask, well..
from flask import Flask

app = Flask(__name__)

@app.route('/cron-me')
def cron_me():
    call_my_function_here()  # your weather-forecast/email logic goes here
    return 'Success'
.. and you're done ¯\_(ツ)_/¯
Try setting up a simple Flask app at www.pythonanywhere.com, I think it will do the job for you even with the free account.
EDIT: And for sending e-mails, you can use Mailgun. With the free version you can send e-mails to a small number of addresses that need to be validated on the recipient side (to avoid people using it for spam :-)).
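For illustration, sending through Mailgun's HTTP API with the requests library might look like this (the domain, API key, and addresses are placeholders from your Mailgun dashboard):
import requests

def send_forecast_email(subject, body):
    # YOUR_DOMAIN and YOUR_API_KEY are placeholders from the Mailgun dashboard
    return requests.post(
        "https://api.mailgun.net/v3/YOUR_DOMAIN/messages",
        auth=("api", "YOUR_API_KEY"),
        data={
            "from": "Weather Bot <bot@YOUR_DOMAIN>",
            "to": ["you@example.com"],
            "subject": subject,
            "text": body,
        },
    )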
