Celery with Google Cloud MemoryStore for a Flask website - python

We are building a simple single-page website using Flask, to be deployed on GKE. The site relies on queries against MSSQL databases (used by another application), and we want to use Celery with a Google Cloud Memorystore Redis instance to run those queries on a schedule once a day, then serve that result data on the website for the rest of the day, because we do not want to query the databases every time there is a visitor to the site (the data is mostly static for a day).
Now, I am quite new to software development, and even newer to DevOps. After reading up on resources online, I couldn't learn much about this and I am still unsure how it works.
Is the result data, after the Celery task completes, stored in the Redis result backend (Cloud Memorystore) for the entire day, so that it can be accessed at any time from my Python code via the Celery task's result whenever a user visits the site? Or should I access the data stored in the Redis result backend using another query to the Google Cloud database from my code? Or is the data in the result backend only temporary, kept until the task is marked as done, and not accessible for the rest of the day? How do I move forward? Can someone please point this out?
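For concreteness, the Celery side of what we have in mind looks roughly like this (a minimal sketch; the Memorystore address in REDIS_URL and the run_mssql_queries() helper are placeholders, not real values):

# tasks.py -- sketch of the intended setup; Memorystore is plain Redis, so it
# is configured like any other Redis broker/result backend
import os

from celery import Celery
from celery.schedules import crontab

REDIS_URL = os.environ.get("REDIS_URL", "redis://10.0.0.3:6379/0")  # placeholder host

celery_app = Celery("site", broker=REDIS_URL, backend=REDIS_URL)

celery_app.conf.beat_schedule = {
    "refresh-report-daily": {
        "task": "tasks.refresh_report",
        "schedule": crontab(hour=2, minute=0),  # once a day, at 02:00
    },
}

@celery_app.task
def refresh_report():
    # run the MSSQL queries once; the return value is what ends up in the
    # Redis result backend
    rows = run_mssql_queries()  # hypothetical helper that queries the MSSQL databases
    return rows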

Related

Google Cloud Platform - Django - Scheduled task to remove expired data

I have a Django application connected to Cloud Run at Google Cloud Platform.
I need to schedule a task to run every day. The task should go through a table in the database and remove rows that are older than today's date.
I have looked in to Cloud functions, however, when I try to create one, it appears to only support Flask, and not Django.
Any ideas on how to proceed to make this scheduled function and access the database?
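The cleanup step itself is straightforward; roughly this (a sketch -- the Item model and its expiry_date field are placeholders for my actual table):

# myapp/management/commands/purge_expired.py -- sketch of the cleanup I want to schedule
from django.core.management.base import BaseCommand
from django.utils import timezone

from myapp.models import Item  # placeholder model


class Command(BaseCommand):
    help = "Delete rows whose date has passed"

    def handle(self, *args, **options):
        today = timezone.localdate()
        deleted, _ = Item.objects.filter(expiry_date__lt=today).delete()
        self.stdout.write(f"Removed {deleted} expired rows")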

How to trigger python script with Hasura event

I'm currently building a self-hosted Vue.js web app (with account logins).
The web app needs to be a GUI for a Python web scraper, where my user has control over the scraper.
So for example, the user fills in an endpoint, starts scraper, view results, trigger a new more depth scraper, etc.
I have the Python scripts for scraping.
And I have decided to go with Vue.js + AWS Cognito + Hasura for the front end.
I have difficulty understanding how to trigger the Python scripts, dump the results in the database, and show them to the front end.
I do like the 3factor approach.
The data from my scrapers can be many DB entries, so I'd rather not insert them into the database via mutations.
Do I have to make Flask endpoints to let Hasura trigger these webhooks?
I'm not familiar with serverless.
How do I make my Python scraper scripts serverless?
Or can I just use SQLalchemy to dump the scraper results into the database?
But how do I notify my frontend user that the data is ready?
There are a lot of questions in this one and the answer(s) will be somewhat opinionated so it might not be the greatest fit for StackOverflow.
That being said, after reading through your post I'd recommend that for your first attempt at this you use SQLalchemy to store the results of your scraper jobs directly into the Database. It sounds like you have the most familiarity with this approach.
With Hasura, you can simply set up a subscription to the job results that you query in your front end, so the UI on the Vue side will update automatically as soon as the results become available.
You'll have to decide how you want to kick off the scraping jobs, you have a few different options:
Expose an API endpoint from your Python app and let the UI trigger it
Use Hasura Actions
Build a GQL server in Python and attach it to Hasura using Remote Schemas
Allow your app to put a record into the database using a GraphQL mutation that includes information about the scrape job, and then have Hasura trigger a webhook endpoint in your Python app using Hasura Event Triggers
Hasura doesn't care how the data gets into the database; it provides a ton of functionality and value even if you're using a different database access layer in another part of your stack.
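If you go the SQLalchemy route, the dump step can stay very small. A rough sketch (the scrape_results table, its columns, the shape of each row dict, and the DATABASE_URL environment variable are all placeholders, not something Hasura requires):

import json
import os

from sqlalchemy import create_engine, text

# point this at the same Postgres database that Hasura is tracking
engine = create_engine(os.environ["DATABASE_URL"])

def save_results(job_id, rows):
    # bulk-insert the scraper output (each row assumed to have "url" and "data");
    # a Hasura subscription on scrape_results will push the new rows to the Vue
    # front end as soon as they land
    with engine.begin() as conn:
        conn.execute(
            text(
                "INSERT INTO scrape_results (job_id, url, payload) "
                "VALUES (:job_id, :url, :payload)"
            ),
            [
                {"job_id": job_id, "url": r["url"], "payload": json.dumps(r["data"])}
                for r in rows
            ],
        )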

My callback webhook is overloaded - what can I do?

I use an API to synchronise a lot of data to my app every night. The API itself uses a callback system, so I send a request to their API (including a webhook URL) and when the data is ready they will send a notification to my webhook to tell me to query the API again for the data.
The problem is, these callbacks flood in at a high rate (thousands per minute) to my webhook, which puts an awful lot of strain on my Flask web app (hosted on a Heroku dyno) and causes errors for end users. The webhook itself has been reduced to simply forwarding the message on to a RabbitMQ queue (running on separate dynos), which then works through the messages progressively at its own pace. Unfortunately, this does not seem to be enough.
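For reference, the handler is already about as thin as I could make it; roughly this (a sketch -- the queue name and the CLOUDAMQP_URL environment variable are placeholders for my actual setup):

import os

import pika
from flask import Flask, request

app = Flask(__name__)

@app.route("/callback", methods=["POST"])
def callback():
    # do as little work as possible per request: hand the raw payload to
    # RabbitMQ and let the worker dynos query the API at their own pace
    # (the connection is opened per request here only to keep the sketch short)
    connection = pika.BlockingConnection(pika.URLParameters(os.environ["CLOUDAMQP_URL"]))
    channel = connection.channel()
    channel.queue_declare(queue="callbacks", durable=True)
    channel.basic_publish(exchange="", routing_key="callbacks", body=request.get_data())
    connection.close()
    return "", 204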
Is there something else I can do to manage this load? Is it possible to run a particular URL (or set of URLs) on a separate dyno from the public-facing parts of the app, i.e. by having two web dynos?
Thanks
You can deploy your application, with the same code, as more than one app on the Heroku free tier. For example, say your application is called rob1, hosted at https://rob1.herokuapp.com, with its source code accessible from https://git.heroku.com/rob1.git. You can create a second application, rob2, accessible at https://rob2.herokuapp.com and with its source code hosted at https://git.heroku.com/rob2.git.
Then you can push your code to the second application:
$ cd projects/rob1
$ git remote add heroku2 https://git.heroku.com/rob2.git
$ git push heroku2 master
As a result, you have a single repo on your machine and two identical Heroku applications running your project's code. You'll probably need to copy the environment variables of the first app over to the second one.
Either way, you end up with two identical apps on the free tier.
Later, once you have a domain name, for example robsapp.example.org, you can give it two CNAME DNS records pointing to your Heroku apps to get load balancing, like this:
rob1.herokuapp.com
rob2.herokuapp.com
As a result, your application's webhooks are available on robsapp.example.org, and requests are automatically load-balanced between the rob1 and rob2 apps.

Scaling Google BigQuery Extracts in an App Engine Flask app for ETL

I am trying to deploy an ETL script that extracts data from BigQuery (via pandas-gbq) and from Google Sheets, and then uploads the transformed result back to BigQuery. I want to deploy it as a Flask app on App Engine.
I am using the Sheets API to access Google Sheets and pandas-gbq to access Google BigQuery. I have increased the app timeout to 6000 seconds. While I get a response for a small number of rows (~100), for larger loads it boots workers with increasing PIDs and then shuts down.
I do not get an error message, and the status of the job displays 'Ran Successfully'; however, the data is not appended to the correct location, as it is when the number of rows is small or when I run it locally.
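The core of the job is roughly this shape (a sketch; the project, query, and destination table names are placeholders, and transform() stands in for the Sheets merge and cleaning steps):

import pandas_gbq

PROJECT_ID = "my-project"  # placeholder

def run_etl():
    # extract: pull the source rows from BigQuery
    df = pandas_gbq.read_gbq(
        "SELECT * FROM `my-project.source_dataset.source_table`",
        project_id=PROJECT_ID,
    )
    # transform: merge with the Google Sheets data, clean columns, etc.
    df = transform(df)  # placeholder for the actual transformation logic
    # load: append the transformed rows back to BigQuery
    pandas_gbq.to_gbq(
        df,
        "dest_dataset.dest_table",
        project_id=PROJECT_ID,
        if_exists="append",
    )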
Do I need more computing power from the VM or another way to run the process? What would be the best way to deploy a bunch of such apps scheduled via a cron job to run at various times in a week?
It would be a difficult task to rewrite all the scripts, so any method to directly deploy them via app engine should help.
Thanks in advance.

Web Scraping with Google Compute Engine / App Engine

I've written a Python script that uses Selenium to scrape information from a website and store it in a CSV file. It works well on my local machine when I execute it manually, but I now want to run the script automatically once per hour for several weeks and save the data in a database. It may take about 5-10 minutes for the script to run.
I've just started off with Google Cloud, and it looks like there are several ways of implementing this with either Compute Engine or App Engine. So far, I get stuck at a certain point with each of the three approaches I've found (e.g. getting the scheduled task to call a URL of my backend instance and getting that instance to kick off the script). I've tried to:
Execute the script via Compute Engine and use datastore or cloud sql. Unclear if crontab can easily be set up.
Use Task Queues and Scheduled Tasks on App Engine.
Use backend instance and Scheduled Tasks on App Engine.
I'd be curious to hear from others what they would recommend as the easiest and most appropriate way given that this is truly a backend script that does not need a user front end.
App Engine is feasible, but only if you limit your use of Selenium to a remote webdriver session out to a site such as http://crossbrowsertesting.com/ -- feasible, but messy.
I'd use Compute Engine -- cron is trivial to use on any Linux image; see e.g. http://www.thegeekstuff.com/2009/06/15-practical-crontab-examples/
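For example, an hourly entry in the crontab might look like this (a sketch; the interpreter and script paths are placeholders):

# run the scraper at the top of every hour and append its output to a log
0 * * * * /usr/bin/python3 /home/username/scraper.py >> /home/username/scraper.log 2>&1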
