How to update RSS Feed every 5 seconds in Python using Flask - python

I did a lot of research and nothing relevant worked. I am trying to scrape an RSS feed and display the data as a table on a webpage built with Flask. I have scraped the data into a dictionary, but the webpage does not show fresh data in real time (or every 5 seconds).
Here is the code for scraping the RSS feed using feedparser, rss_feed.py:
import feedparser
import time
def feed_data():
    RSSFeed = feedparser.parse("https://www.upwork.com/ab/feed/jobs/rss?sort=recency&paging=0%3B10&api_params=1&q=&securityToken=2c2762298fe1b719a51741dbacb7d4f5c1e42965918fbea8d2bf1185644c8ab2907f418fe6b1763d5fca3a9f0e7b34d2047f95b56d12e525bc4ba998ae63f0ff&userUid=424312217100599296&orgUid=424312217104793601")
    feed_dict = {}
    for i in range(len(RSSFeed.entries)):
        feed_list = []
        feed_list.append(RSSFeed.entries[i].title)
        feed_list.append(RSSFeed.entries[i].link)
        feed_list.append(RSSFeed.entries[i].summary)
        published = RSSFeed.entries[i].published
        feed_list.append(published[:len(published)-6])
        feed_dict[i] = feed_list
    return feed_dict
if __name__=='__main__':
    while True:
        feed_dict = feed_data()
        #print(feed_dict)
        #print("==============================")
        time.sleep(5)
Using the time.sleep() works on this script. But when I import it in the app.py, it fails to reload every 5 seconds. Here is the code to run the Flask app, app.py:
from flask import Flask, render_template
import rss_feed
feed_dict = rss_feed.feed_data()
app = Flask(__name__)
@app.route("/")
def hello():
    return render_template('home.html', feed_dict=feed_dict)
I tried using BackgroundScheduler from APScheduler as well, but nothing seems to work. feedparser's 'etag' and 'modified' are not being recognized for some reason (are they deprecated?). I even tried the 'refresh' attribute in a meta tag, but that of course only reloads the rendered Jinja2 template and does not re-run the scraping code:
<meta http-equiv="refresh" content="5">
I am really stuck on this.
Here is a link to the (half complete) app: https://rss-feed-scraper.herokuapp.com/

Your
feed_dict = rss_feed.feed_data()
is at module level.
When Python starts, it executes that line once, as the module is imported, and will not run it again until you restart your app.
If you are interested in this topic, search for import time vs. run time in Python.
That said, I'd suggest that you do the polling with a JavaScript function which polls the remote RSS feed every 5 seconds.
This would look something like
setInterval(function(){
    // code goes here that will be run every 5 seconds.
}, 5000);
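To see why the module-level assignment never refreshes, here is a minimal sketch with no Flask involved. `fetch()` is a hypothetical stand-in for `rss_feed.feed_data()` that returns a new number on every call; `view_stale` and `view_fresh` are likewise illustrative names, not part of the original app.

```python
import itertools

_counter = itertools.count(1)


def fetch():
    """Hypothetical stand-in for rss_feed.feed_data(): new value on each call."""
    return next(_counter)


# Module level: evaluated exactly once, when the module is imported.
snapshot = fetch()


def view_stale():
    # Returns the value captured at import time, however often it is called.
    return snapshot


def view_fresh():
    # Calls fetch() again on every request, so the value keeps changing.
    return fetch()
```

Calling `view_stale()` repeatedly keeps returning the first value, while each `view_fresh()` call yields a newer one. In the real app the price of the fresh variant would be one feed download per page view.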

I tried a bunch of things but this is what I found was the easiest solution to this problem:
from flask import Flask, render_template
from threading import Timer  # needed for Timer below
import rss_feed
app = Flask(__name__)
feed_dict={}
def update_data(interval):
    global feed_dict
    # Schedule the next run first, then refresh the shared dictionary.
    Timer(interval, update_data, [interval]).start()
    feed_dict = rss_feed.feed_data()
update_data(5)
@app.route("/")
def hello():
    #feed_dict = rss_feed.feed_data()
    #feed_dict=feed_data()
    # time.sleep(5)
    return render_template('home.html', feed_dict=feed_dict)
A simple update_data() solved the whole problem; it needed no third-party module, JavaScript, AJAX, etc. (Timer comes from the standard library's threading module).
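One possible hardening of the Timer approach, sketched under assumptions rather than taken from the answer: `PeriodicCache` and its method names are hypothetical. It marks each timer as a daemon so it does not keep the process alive at shutdown, guards the shared value with a lock, and keeps rescheduling even if one fetch raises.

```python
import threading


class PeriodicCache:
    """Hypothetical helper: refresh a cached value every `interval` seconds."""

    def __init__(self, fetch, interval):
        self._fetch = fetch
        self._interval = interval
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._value = None
        self._timer = None

    def refresh(self):
        value = self._fetch()  # do the slow work outside the lock
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value

    def start(self):
        self.refresh()  # populate once so get() is not None after start()
        self._schedule()

    def stop(self):
        self._stopped.set()
        if self._timer is not None:
            self._timer.cancel()

    def _schedule(self):
        if self._stopped.is_set():
            return
        self._timer = threading.Timer(self._interval, self._tick)
        self._timer.daemon = True  # do not block interpreter exit
        self._timer.start()

    def _tick(self):
        try:
            self.refresh()
        finally:
            self._schedule()  # keep polling even if this refresh failed
```

In app.py this would be used roughly as `cache = PeriodicCache(rss_feed.feed_data, 5)` followed by `cache.start()`, with the view passing `cache.get()` to the template. Keep in mind that every process that imports the module starts its own timer, which matters with Flask's debug reloader or a multi-worker server.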

Related

I am unable to load full pickle list on azure webapp

I have an Azure Web App on the Python stack running a Flask app. The app calls a function that returns a list of country names which I have saved in a pickle file. Let's say I have a total of 100 countries: whenever I run the app it should read 100 countries from that pickle file, but sometimes it is stuck at 98 or 99, and I am not sure where I am losing 1 or 2 countries from that list. This issue only happens on the Azure Web App; elsewhere it retrieves the full 100 countries. Below is the code I'm using to load the pickle file holding the list of 100 countries:
import os  # needed for os.getcwd()
import pickle
path=os.getcwd()+'\\'
def example():
    country_list=pickle.load(open(path+"support_file/country_list.p","rb"))
    print(len(country_list))
    return country_list
Here is my flask app.py to call the function:
from other_file import example
from flask import Flask, request
app = Flask(__name__)
@app.route("/", methods=["POST", "GET"])
def query():
    if request.method == "POST":
        return example()
    else:
        return "Hello!"
if __name__ == "__main__":
    app.run()
The above list is then used in a function, and my output depends on all the elements of this list, so if an element or two goes missing while loading the pickle, my output changes. The elements are not missing consistently; it happens in roughly 1 in every 20 runs. Is this a problem with the Azure Web App, or is something wrong with my pickle? I tried recreating the pickle, but the same problem keeps coming up once in a while.
It seems pickle.load() reads only until its buffer is full, so you would have to iterate as below until it raises EOFError. Unfortunately, I could not find a graceful way to run the loop without catching the exception. You might also want to cache the list instead of unpickling on every request, to improve performance.
with open(os.getcwd()+'/support_file/country_list.p','rb') as f:
    country_list = []
    while True:
        try:
            country_list.append(pickle.load(f))
        except EOFError:
            break
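Whether the loop is needed depends on how the file was written, which the question does not show. As a small check using in-memory buffers (the `load_all` helper and the sample data are made up for illustration): a list written with a single `pickle.dump()` comes back whole from one `pickle.load()`, while a file written with one `dump()` per item needs the loop.

```python
import io
import pickle


def load_all(fileobj):
    """Read pickled objects from fileobj until the end of the stream."""
    items = []
    while True:
        try:
            items.append(pickle.load(fileobj))
        except EOFError:
            return items


countries = [f"country_{i}" for i in range(100)]

# Case 1: the whole list was written with a single dump().
one_dump = load_all(io.BytesIO(pickle.dumps(countries)))

# Case 2: each country was written with its own dump() call.
buffer = io.BytesIO()
for name in countries:
    pickle.dump(name, buffer)
buffer.seek(0)
per_item = load_all(buffer)
```

In the first case `one_dump` is a list holding one element (the full list of countries), so the loop would wrap the data one level too deep; in the second case `per_item` is the flat list of names.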

Flask: How to use url_for() outside the app context?

I'm writing a script to collect the email addresses of users who didn't receive a confirmation email and resend it to them. The script obviously runs outside of the Flask app context. I would like to use url_for() but can't get it right.
def resend(self, csv_path):
    self.ctx.push()
    with open(csv_path) as csv_file:
        csv_reader = csv.reader(csv_file)
        for row in csv_reader:
            email = row[0]
            url_token = AccountAdmin.generate_confirmation_token(email)
            confirm_url = url_for('confirm_email', token=url_token, _external=True)
            ...
    self.ctx.pop()
The first thing I had to do was to set SERVER_NAME in config. But then I get this error message:
werkzeug.routing.BuildError: Could not build url for endpoint
'confirm_email' with values ['token']. Did you mean 'static' instead?
This is how it's defined, but I don't think url_for() can even find it, because the rule is not registered when the code is run as a script:
app.add_url_rule('/v5/confirm_email/<token>', view_func=ConfirmEmailV5.as_view('confirm_email'))
Is there a way to salvage url_for() or do I have to build my own url?
Thanks
It is much easier, and more correct, to get the URL from the application context.
You can either import the application and manually push a context with app_context():
https://flask.palletsprojects.com/en/2.0.x/appcontext/#manually-push-a-context
from flask import url_for
from whereyoudefineapp import application
application.config['SERVER_NAME'] = 'example.org'
with application.app_context():
    url_for('yourblueprint.yourpage')
Or you can redefine your application and register the wanted blueprint.
from flask import Flask, url_for
from whereyoudefineyourblueprint import myblueprint
application = Flask(__name__)
application.config['SERVER_NAME'] = 'example.org'
application.register_blueprint(myblueprint)
with application.app_context():
    url_for('myblueprint.mypage')
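Applied to the class-based view from the question, the first option might look like the sketch below. It assumes Flask is installed; `ConfirmEmail` and its body are placeholders for the question's `ConfirmEmailV5`, `build_confirm_url` is a hypothetical helper, and `example.org` is an arbitrary host. The point is that the rule has to be registered on the same app object whose context is pushed, otherwise `url_for()` raises the `BuildError` quoted in the question.

```python
from flask import Flask, url_for
from flask.views import View


class ConfirmEmail(View):
    """Placeholder for the question's ConfirmEmailV5 view."""

    def dispatch_request(self, token):
        return f"confirmed {token}"


app = Flask(__name__)
# SERVER_NAME is needed to build URLs when no request is active.
app.config["SERVER_NAME"] = "example.org"
app.add_url_rule(
    "/v5/confirm_email/<token>",
    view_func=ConfirmEmail.as_view("confirm_email"),
)


def build_confirm_url(token):
    # Push an application context just long enough to build the URL.
    with app.app_context():
        return url_for("confirm_email", token=token, _external=True)
```

The scheme of the generated URL comes from the `PREFERRED_URL_SCHEME` config value, so a script that must emit https links would set that as well.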
We can also imagine different ways to do it without the application, but I don't see any adequate / proper solution.
Despite everything, I will still suggest this dirty solution.
Let's say you have the following blueprint with the following routes inside routes.py.
from flask import Blueprint
frontend = Blueprint('frontend', __name__)
@frontend.route('/mypage')
def mypage():
    return 'Hello'
@frontend.route('/some/other/page')
def someotherpage():
    return 'Hi'
@frontend.route('/wow/<a>')
def wow(a):
    return f'Hi {a}'
You could use the inspect module to get the source code and then parse it to build the URL.
import inspect
import re
BASE_URL = "https://example.org"
class FailToGetUrlException(Exception):
    pass
def get_url(function, complete_url=True):
    # getsource() includes the decorator lines above the def
    source = inspect.getsource(function)
    lines = source.split("\n")
    for line in lines:
        # match e.g. @frontend.route('/mypage') and capture the path
        r = re.match(r'^@[a-zA-Z]+\.route\((["\'])([^\'"]+)\1', line)
        if r:
            if complete_url:
                return BASE_URL + r.group(2)
            else:
                return r.group(2)
    raise FailToGetUrlException
from routes import *
print(get_url(mypage))
print(get_url(someotherpage))
print(get_url(wow).replace('<a>', '456'))
Output:
https://example.org/mypage
https://example.org/some/other/page
https://example.org/wow/456

How can I get my site to rerun some python flask code every time someone visits or every few minutes?

This is my first time developing a website and I am coming across some issues. I have some python code that scrapes some data with beautifulsoup4 and displays the number on my site with flask. However, I found that my site does not automatically update the values at all, and rather only updates when I manually make my host reload.
How can I make it so that my python script "re-scrapes" every time a visitor visits my site, or just every 5 minutes or so? Any help would be greatly appreciated!
Host: PythonAnywhere
Here is my current backend python code:
import bs4 as bs
import urllib.request
from flask import Flask, render_template
app = Flask(__name__)
link = urllib.request.urlopen('https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx')
soup = bs.BeautifulSoup(link, 'lxml')
body = soup.find('body') # get the body so you can do soup.find_all() inside it
tables = soup.find_all('table')
for table in tables:
    table_rows = table.find_all('tr')
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        if row.count('Bucks') > 0:
            print(row[1])
            # Bucksnum shows the amount of cases in bucks county,
            bucksnum = str(row[1])
data = bucksnum
# this is the part that connects the flask file to the html file
@app.route("/")
def home():
    return render_template("template.html", data=data)
@app.route("/")
def index():
    return bucksnum
if __name__ == '__main__':
    app.run(host='0.0.0.0')
    index()
Your application gathers the data only once, when it starts up. If you want it to grab the data every single time someone visits the page, move the code that fetches and processes the table data into the relevant view function, the one marked with the @app.route('/route') decorator; that function runs every time the route is visited.
To refresh on a timer instead, you would need a scheduler; check this thread that discusses a similar issue. You can use one to call a function that updates your data at a fixed interval.
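A middle ground between the two suggestions, sketched with the standard library only: re-scrape on a visit, but only when the cached value is older than some maximum age. `TTLCache` is a hypothetical helper, and the `clock` parameter exists purely so the timing can be replaced in tests.

```python
import threading
import time


class TTLCache:
    """Cache one value and recompute it once it is max_age seconds old."""

    def __init__(self, compute, max_age, clock=time.monotonic):
        self._compute = compute
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._stamp = None  # clock reading at the last successful compute

    def get(self):
        # The lock is held while recomputing, so concurrent visitors wait
        # for one refresh instead of each starting their own scrape.
        with self._lock:
            now = self._clock()
            if self._stamp is None or now - self._stamp >= self._max_age:
                self._value = self._compute()
                self._stamp = now
            return self._value
```

In the app above, the scraping loop would move into a function (say a hypothetical `scrape_bucks_count()`), the module would create `cache = TTLCache(scrape_bucks_count, max_age=300)`, and the view would pass `cache.get()` to the template. The 300-second figure is only an example matching the "every 5 minutes or so" in the question.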

Adding a view.py to a flask web application

I just started working with web tools and Flask. I have a Python script with 9 functions and I am trying to turn it into a Flask application. The main view of this application would do the same thing as my script (some functions are intermediate, meaning they do not produce the final output, and 2 functions produce the final output). Since I have 9 functions for one route, what do you suggest? Should I rename my original script to view.py and call it in app.py (under the corresponding route), or is there a better way?
Creating a separate file is always a better choice. It will make the code more readable and understandable.
Simply create a file and call the method in app.py like this.
from flask import Flask, jsonify
from views import your_method_name
# Initialize the app
app = Flask(__name__)
# Route (e.g. http://127.0.0.1:5000/my-url)
@app.route("/my-url", methods=['POST'])
def parse():
    response = your_method_name() # call the method
    return jsonify(response)
if __name__ == '__main__':
    app.run()

Optimise python function fetching multi-level json attributes

I have a 3-level JSON file. I am fetching the values of some attributes from each of the 3 levels. At the moment, the execution time of my code is pathetic: it takes about 2-3 minutes to get the results on my web page, and I will have a much larger JSON file to deal with in production.
I am new to Python and Flask and haven't done much web programming. Please suggest ways I could optimise the code below. Thanks for the help, much appreciated.
import json
import urllib2
import flask
from flask import request
def Backend():
    url = 'http://localhost:8080/surveillance/api/v1/cameras/'
    response = urllib2.urlopen(url).read()
    response = json.loads(response)
    components = list(response['children'])
    urlComponentChild = []
    for component in components:
        urlComponent = str(url + component + '/')
        responseChild = urllib2.urlopen(urlComponent).read()
        responseChild = json.loads(responseChild)
        camID = str(responseChild['id'])
        camName = str(responseChild['name'])
        compChildren = responseChild['children']
        compChildrenName = list(compChildren)
        for compChild in compChildrenName:
            href = str(compChildren[compChild]['href'])
            ID = str(compChildren[compChild]['id'])
            urlComponentChild.append([href,ID])
    myList = []
    for each in urlComponentChild:
        response = urllib2.urlopen(each[0]).read()
        response = json.loads(response)
        url = each[0] + '/recorder'
        responseRecorder = urllib2.urlopen(url).read()
        responseRecorder = json.loads(responseRecorder)
        username = str(response['subItems']['surveillance:config']['properties']['username'])
        password = str(response['subItems']['surveillance:config']['properties']['password'])
        manufacturer = str(response['properties']['Manufacturer'])
        model = str(response['properties']['Model'])
        status = responseRecorder['recording']
        myList.append([each[1],username,password,manufacturer,model,status])
    return myList
APP = flask.Flask(__name__)
@APP.route('/', methods=['GET', 'POST'])
def index():
    """ Displays the index page accessible at '/'
    """
    if request.method == 'GET':
        return flask.render_template('index.html', response = Backend())
if __name__ == '__main__':
    APP.debug=True
    APP.run(port=62000)
Ok, caching. So what we're going to do is start returning values to the user instantly based on data we already have, rather than generating new data every time. This means that the user might get slightly less up to date data than is theoretically possible to get, but it means that the data they do receive they receive as quickly as is possible given the system you're using.
So we'll keep your backend function as it is. Like I said, you could certainly speed it up with multithreading (If you're still interested in that, the 10 second version is that I would use grequests to asynchronously get data from a list of urls).
But, rather than call it in response to the user every time a user requests data, we'll just call it routinely every once in a while. This is almost certainly something you'd want to do eventually anyway, because it means you don't have to generate brand new data for each user, which is extremely wasteful. We'll just keep some data on hand in a variable, update that variable as often as we can, and return whatever's in that variable every time we get a new request.
from threading import Thread
from time import sleep
import flask
from flask import request
LOOP_DELAY_TIME_SECONDS = 60  # assumed value; the original does not define it
data = None
def Backend():
    .....
def main_loop():
    global data
    while True:
        sleep(LOOP_DELAY_TIME_SECONDS)
        data = Backend()
APP = flask.Flask(__name__)
@APP.route('/', methods=['GET', 'POST'])
def index():
    """ Displays the index page accessible at '/'
    """
    if request.method == 'GET':
        # Return whatever data we currently have cached
        return flask.render_template('index.html', response = data)
if __name__ == '__main__':
    data = Backend() # Need to make sure we grab data before we start the server so we never return None to the user
    Thread(target=main_loop).start() #Loop and grab new data at every loop
    APP.debug=True
    APP.run(port=62000)
DISCLAIMER: I've used Flask and threading before for a few projects, but I am by no means an expert on either, or on web development at all. Test this code before using it for anything important (or better yet, find someone who knows what they're doing before using it for anything important).
Edit: data will have to be a global, sorry about that - hence the disclaimer
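The multithreading idea mentioned above could also be sketched with the standard library's `concurrent.futures` rather than grequests (a swapped-in technique, and written for Python 3 rather than the question's Python 2). `fetch_json` and `fetch_all` are hypothetical helpers; the `fetch` parameter exists so the downloader can be replaced, for example in tests.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download one URL and decode its body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all(urls, fetch=fetch_json, max_workers=8):
    """Fetch every URL on a thread pool; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

In `Backend()`, the per-camera URLs collected in `urlComponentChild` could be passed to `fetch_all` in one go, so the requests overlap instead of running one after another. How much that helps depends on how many parallel requests the camera server tolerates, which is not known here.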
