making a cache for json which can be explicitly purged - python

I have many routes on blueprints that do something along these lines:
# charts.py
@charts.route('/pie')
def pie():
    # ...
    return jsonify(my_data)
The data comes from a CSV which is grabbed once every x hours by a script which is separate from the application. The application reads this using a class which is then bound to the blueprint.
# __init__.py
from flask import Blueprint
from helpers.csv_reader import CSVReader
chart_blueprint = Blueprint('chart_blueprint', __name__)
chart_blueprint.data = CSVReader('statistics.csv')
from . import charts
My goal is to cache several of the responses of the routes, as the data does not change. The more challenging problem, however, is being able to explicitly purge the cache when my fetch script finishes.
How would one go about this? I'm a bit lost, but I do imagine I will need to register a before_request handler on my blueprints.

ETag and Expires were made for exactly this:
class CSVReader(object):
    def read_if_reloaded(self):
        # Do your thing here
        self.expires_on = self.calculate_next_load_time()
        self.checksum = self.calculate_current_checksum()

# charts.py
from flask import request, jsonify

@charts.route('/pie')
def pie():
    # Clients send the previously returned ETag back in the If-None-Match header
    if request.headers.get('If-None-Match') == charts.data.checksum:
        return ('', 304, {})
    # ...
    response = jsonify(my_data)
    response.headers['Expires'] = charts.data.expires_on
    response.headers['ETag'] = charts.data.checksum
    return response
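For reference, here is a minimal sketch of the client side of that conditional request, using the requests library; the URL is just a placeholder for wherever the blueprint is mounted:
import requests

last_etag = None  # remember the ETag from the previous fetch

def fetch_pie(url="http://localhost:5000/pie"):
    global last_etag
    headers = {"If-None-Match": last_etag} if last_etag else {}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 304:
        return None  # unchanged: reuse the previously cached payload
    last_etag = resp.headers.get("ETag")
    return resp.json()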

Sean's answer is great for clients that come back and request the same information before the next batch is read in, but it does not help for clients that are coming in cold.
For new clients you can use cache servers such as Redis or memcached, which can store the pre-calculated results. These servers are very simple key-value stores, but they are very fast. You can even set how long the values will be valid before they expire.
Cache servers help if calculating the result is time-consuming or computationally expensive, but if you are simply returning items from a file it will not make a dramatic improvement.
Here is a Flask pattern for using the Werkzeug cache interface, and here is a link to the Flask-Cache extension.
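To make the explicit-purge part concrete, here is a rough sketch using Redis as the cache; the key name, the read() method on CSVReader, and the purge call are illustrative assumptions, not taken from the question's code:
import json
import redis
from flask import jsonify

r = redis.Redis()
CACHE_KEY = 'charts:pie'  # illustrative key name

@chart_blueprint.route('/pie')
def pie():
    cached = r.get(CACHE_KEY)
    if cached is not None:
        return jsonify(json.loads(cached))
    my_data = chart_blueprint.data.read()  # assumes CSVReader exposes a read() method
    r.set(CACHE_KEY, json.dumps(my_data))
    return jsonify(my_data)

The fetch script can then purge explicitly when it finishes, for example with r.delete(CACHE_KEY) (or redis-cli DEL charts:pie), so the next request repopulates the cache from the fresh CSV.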

Struggling with how to iterate data

I am learning Python 3 and I have a fairly simple task to complete, but I am struggling with how to glue it all together. I need to query an API and return the full list of applications, which I can do; I store this and need to use it again to gather more data for each application from a different API call.
applistfull = requests.get(url, authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"], app["guid"])
        summaryguid = app["guid"]
else:
    print(applistfull.status_code)
Next I have, I think, 'summaryguid', and I need to query a different API again and return a value that could exist many times for each application; in this case, the compiler used to build the code.
I can statically put a GUID in the URL and return the correct information, but I haven't yet figured out how to get it to do the below for all of the above and build a master list:
summary = requests.get(f"url{summaryguid}moreurl", authmethod)
if summary.ok:
    fulldata = summary.json()
    for appsummary in fulldata["static-analysis"]["modules"]["module"]:
        print(appsummary["compiler"])
I would prefer that nobody just type out the right answer yet, but instead drop a few hints and let me continue to work through it logically, so that I learn how to deal with what I assume is a common issue in the future. My thought right now is that I need to move my second if up into my initial block and continue the logic in that space, but I am stuck there.
You are on the right track! Here is the hint: the second API request can be nested inside the loop that iterates through the list of applications in the first API call. By doing so, you can get the information you require by making the second API call for each application.
import requests

applistfull = requests.get("url", authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"], app["guid"])
        summaryguid = app["guid"]
        summary = requests.get(f"url/{summaryguid}/moreurl", authmethod)
        fulldata = summary.json()
        for appsummary in fulldata["static-analysis"]["modules"]["module"]:
            print(app["profile"]["name"], appsummary["compiler"])
else:
    print(applistfull.status_code)
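Since the question also mentions building a master list, one possible shape for that is sketched below; the result structure (a dict keyed by GUID) is just an illustration and not part of the answer above, and it assumes the same data and authmethod as the snippet above:
master = {}
for app in data["_embedded"]["applications"]:
    guid = app["guid"]
    summary = requests.get(f"url/{guid}/moreurl", authmethod)
    if summary.ok:
        fulldata = summary.json()
        # collect every compiler reported for this application
        compilers = [m["compiler"]
                     for m in fulldata["static-analysis"]["modules"]["module"]]
        master[guid] = {"name": app["profile"]["name"], "compilers": compilers}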

Optimal way to initialize heavy services only once in FastAPI

The FastAPI application I started working on uses several services, which I want to initialize only once, when the application starts, and then use the methods of these objects in different places.
It can be a cloud service or any other heavy class.
Possible ways are to do it with lazy loading and with the singleton pattern, but I am looking for a better approach for FastAPI.
Another possible way is to use the Depends class and cache it, but its usage makes sense only with route methods, not with other regular methods that are called from route methods.
Example:
from typing import Optional
from fastapi import Depends

async def common_parameters(q: Optional[str] = None, skip: int = 0, limit: int = 100):
    return {"q": q, "skip": skip, "limit": limit}

async def non_route_function(commons: dict = Depends(common_parameters)):
    print(commons)  # prints `Depends(common_parameters)`, not the resolved dict

@router.get('/test')
async def test_endpoint(commons: dict = Depends(common_parameters)):
    print(commons)  # prints the correct dict
    await non_route_function()
    return {'success': True}
The @app.on_event("startup") event could also be used to initialize the heavy class there, but I have no idea how to make the initialized object accessible from every place without using a singleton.
Another ugly way is to save initialized objects onto the app instance and then get the app from the request, but then you have to pass the request into each non-route function.
All of the ways I have described are either ugly, inconvenient, non-Pythonic, or bad practice. We also don't have thread locals and proxy objects here like in Flask, so what is the best approach for the kind of problem I have described above?
Thanks!
It's usually a good idea to initialize the heavy objects before launching the FastAPI application. That way you're done with initialization when the application starts listening for connections (and is made available by the load balancer).
You can set up these dependencies and do any initialization in the same location as you set up your app and main routers, since they're a part of the application as well. I usually expose the heavy object through a lightweight service that exposes useful operations to the controllers themselves, and the service object is then injected through Depends.
Exactly how you want to perform the initialization will depend on what other requirements you have in the application - for example whether you're planning to re-use the infrastructure in CLI tools or use it from cron as well.
This is the way I've been doing it in a few projects, and so far it has worked out fine and kept code changes located in their own vicinities.
Simulated heavy class in heavylifting/heavy.py with from .heavy import HeavyLifter in __init__.py:
import time

class HeavyLifter:
    def __init__(self, initial):
        self.initial = initial
        time.sleep(self.initial)

    def do_stuff(self):
        return 'we did stuff'
A skeleton project created in a module named foo (heavylifting lives under foo/heavylifting for now to make sense of the imports below):
foo/app.py
from fastapi import FastAPI, APIRouter
from .heavylifting import HeavyLifter
heavy = HeavyLifter(initial=3)
from .views import api_router
app = FastAPI()
app.include_router(api_router)
foo/services.py
The service layer in the application; the services are the operations that the application exposes to controllers, handling business logic and other related activities. If a service needs access to heavy, it adds a Depends requirement on that service.
class HeavyService:
    def __init__(self, heavy):
        self.heavy = heavy

    def operation_that_requires_heavy(self):
        return self.heavy.do_stuff()

class OtherService:
    def __init__(self, heavy_service: HeavyService):
        self.heavy_service = heavy_service

    def other_operation(self):
        return self.heavy_service.operation_that_requires_heavy()
foo/app_services.py
This exposes the defined services to the application as lightweight dependency injections. Since the services only attach their dependencies and get returned, they're quickly created for a request and then discarded afterwards.
from .app import heavy
from .services import HeavyService, OtherService
from fastapi import Depends

async def get_heavy_service():
    return HeavyService(heavy=heavy)

async def get_other_service_that_uses_heavy(heavy_service: HeavyService = Depends(get_heavy_service)):
    return OtherService(heavy_service=heavy_service)
foo/views.py
Example of an exposed endpoint to make FastAPI actually serve something and test the whole service + heavy chain:
from fastapi import APIRouter, Depends
from .services import OtherService
from .app_services import get_other_service_that_uses_heavy

api_router = APIRouter()

@api_router.get('/')
async def index(other_service: OtherService = Depends(get_other_service_that_uses_heavy)):
    return {'hello world': other_service.other_operation()}
main.py
The application entrypoint. Could live in app.py as well.
from foo.app import app

if __name__ == '__main__':
    import uvicorn
    uvicorn.run('foo.app:app', host='0.0.0.0', port=7272, reload=True)
This way the heavy client gets initialized on startup, and uvicorn starts serving requests when everything is live. Depending on how the heavy client is implemented it might need to pool and recreate sockets if they can get disconnected for inactivity (as most database libraries offer).
I'm not sure if the example is easy enough to follow, or that if it serves what you need, but hopefully it'll at least get you a bit further.

python requests...or something else... mysteriously caching? hashes don't change right when file does

I have a very odd bug. I'm writing some code in python3 to check a url for changes by comparing sha256 hashes. The relevant part of the code is as follows:
from copy import deepcopy
from requests import get
from hashlib import sha256

def fetch_and_hash(url):
    file = get(url, stream=True)
    f = file.raw.read()
    hash = sha256()
    hash.update(f)
    return hash.hexdigest()

def check_url(target):  # passed a dict containing hash from previous examination of url
    t = deepcopy(target)
    old_hash = t["hash"]
    new_hash = fetch_and_hash(t["url"])
    if old_hash == new_hash:
        t["justchanged"] = False
        return t
    else:
        t["justchanged"] = True
        return handle_changes(t, new_hash)  # records the changes
So I was testing this on an old webpage of mine. I ran the check, recorded the hash, and then changed the page. Then I re-ran it a few times, and the code above did not reflect a new hash (i.e., it followed the old_hash == new_hash branch).
Then I waited maybe 5 minutes and ran it again without changing the code at all except to add a couple of debugging calls to print(). And this time, it worked.
Naturally, my first thought was "huh, requests must be keeping a cache for a few seconds." But then I googled around and learned that requests doesn't cache.
Again, I changed no code except for print calls. You might not believe me. You might think "he must have changed something else." But I didn't! I can prove it! Here's the commit!
So what gives? Does anyone know why this is going on? If it matters, the webpage is hosted on a standard commercial hosting service, IIRC using Apache, and I'm on a lousy local phone company DSL connection; I don't know if there are any server-side caching settings going on, but it's not on any kind of CDN.
So I'm trying to figure out whether there is some mysterious ISP cache thing going on, or I'm somehow misusing requests... the former I can handle; the latter I need to fix!

Create own IVR that will access Database

I'm currently interning with a company and have been tasked with researching some methods of using telephony. The goal is to provide our clients with the ability to call in and, through IVR-prompted questions, get information back. The information will be from our database.
I have successfully done this using Twilio and a small python app. It does exactly what I'm looking to do, except the cost factor can be a bit high, especially if we have 30,000+ clients calling for minutes on end.
My goal is to find a way to replicate what I've done with Twilio, but on our own server. I've found options like Asterisk and IncrediblePBX, but because of my limited knowledge of Linux, every error I run into results in scouring the internet for answers. Ultimately, I'm not sure if I'm heading in the right direction.
This is an example of what I'd like to accomplish:
A client calls in to the number. They're directed to provide an account number (possibly their phone number). At that point the system takes this information and talks to a database. Having gathered this information, it relays back to the client the status of their account, etc.
Questions:
I was hoping to use Google Voice to route calls similarly to Twilio; is this possible? Alternatively, could my company switch to VoIP and do the same thing?
If I move away from Twilio, can Asterisk perform the necessary tasks? Receiving calls and running the app to gather database information.
Current code for Twilio, in Python:
from flask import Flask, request, redirect
import twilio.twiml
import json
from urllib.request import urlopen

app = Flask(__name__)

callers = {
    "+": "Nicholas",
}

@app.route("/", methods=['GET', 'POST'])
def initial():
    # Get the caller's phone number from the incoming Twilio request
    from_number = request.values.get('From', None)
    resp = twilio.twiml.Response()
    # if the caller is someone we know:
    if from_number in callers:
        # Greet the caller by name
        caller = callers[from_number]
    else:
        caller = ""
    resp = twilio.twiml.Response()
    resp.say("Hello " + caller)
    resp.say("Thank you for calling.")
    your_number = list(from_number)
    del your_number[0]
    del your_number[0]
    resp.say("You are calling from: ")
    x = 0
    while x < len(your_number):
        resp.say(your_number[x])
        x += 1
    print("Please enter the neighborhood I.D. you're looking for.")
    with resp.gather(numDigits=1, action="/handle-first", method="POST") as g:
        g.say("Please enter the neighborhood I.D. you're looking for.")
    return str(resp)

@app.route("/handle-first", methods=['GET', 'POST'])
def handle_key():
    digit_pressed = request.values.get('Digits', '')
    resp = twilio.twiml.Response()
    url = 'http://localhost/...'
    response = urlopen(url)
    data = json.loads(response.readall().decode('utf-8'))
    current = data['rows'][0]['Neighborhood']
    print(current)
    resp.say("You have chosen " + current + "as your neighborhood.")
    with resp.gather(numDigits=1, action="/handle-second", method="POST") as h:
        h.say("Press 1 to choose another Neighborhood?")
    return str(resp)

@app.route("/handle-second", methods=['GET', 'POST'])
def handle_key2():
    digit_pressed = request.values.get('Digits', '')
    resp = twilio.twiml.Response()
    if digit_pressed == "1":
        return redirect("/")
    else:
        resp.say("Thank you for calling. Good-bye.")
    return str(resp)

if __name__ == "__main__":
    app.run(debug=True)
Asterisk is fundamentally different from Twilio; I can honestly say they are somewhat opposites. The track you are seeking to follow is that of Asterisk dialplan in combination with AGI. AGI is the Asterisk Gateway Interface, and it gives you a way to interact with Asterisk via a simple scripting mechanism. The API is fairly simple and there is a whole range of ready-made libraries you can use. Let's put it this way: when it comes to AGI, just pick your poison - PHP, Python, Java, Ruby - all are available.
My personal favorite, and that of many in the Asterisk world as well, is PHP - with an LGPL component called PHP-AGI. It's fairly old and has been around for years, but works well with all versions of Asterisk. If you wish to walk on the bleeding edge and require stronger call control, you can go with ARI (the Asterisk REST Interface); however, judging from your requirements, that's overkill.
You can read a bit about AGI development with PHP at this link:
https://www.packtpub.com/books/content/asterisk-gateway-interface-scripting-php
(Shameful plug - I wrote the book)
In addition, you can refer to the following presentation for additional information (again, my own presentation) - It's a little old, but still valid in terms of concept and usage:
http://osdc.org.il/2006/html/Asterisk_AGI_Programming.ppt
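To make the AGI idea more concrete, below is a rough Python sketch of the stdin/stdout protocol an AGI script speaks; the prompt file name, the account-lookup function, and the dialplan wiring that invokes the script are illustrative assumptions, not something from the answers here:
#!/usr/bin/env python3
# Minimal AGI sketch: Asterisk passes agi_* variables on stdin; the script
# writes AGI commands to stdout and reads a "200 result=..." reply after each.
import sys

def agi_command(command):
    sys.stdout.write(command + "\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()  # e.g. "200 result=12345"

def lookup_balance(account):
    return "4200"  # placeholder for the real database query

# Read the agi_* environment block (terminated by a blank line)
env = {}
while True:
    line = sys.stdin.readline().strip()
    if not line:
        break
    key, _, value = line.partition(": ")
    env[key] = value

agi_command("ANSWER")
# "enter-account" is an assumed sound prompt; wait up to 5000 ms for 10 digits
reply = agi_command("GET DATA enter-account 5000 10")
account = reply.split("result=")[-1].split()[0]
balance = lookup_balance(account)
agi_command('SAY DIGITS {} ""'.format(balance))
agi_command("HANGUP")

In the dialplan this kind of script would typically be invoked with the AGI() application; the same flow also works over FastAGI if the script should run on a separate machine.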
Good luck
Yes, Asterisk can do any task you can program. It even has an API for accessing the raw stream.
However, no, you can't use the SAME code.
For Google Voice, check the chan_motif code.
For DB access, use dialplan + Realtime, func_odbc, or the FastAGI/AGI interface.
You are probably looking for an open source IVR solution. The only one I am aware of is OpenVBX.
I am in no way affiliated with them or Twilio.

Lookup speed: State or Database?

I have a bunch of wordlists on a server of mine, and I've been planning to make a simple open-source JSON API that returns whether a password is on the list1, as a method of validation. I'm doing this in Python with Flask, and literally just returning whether the input is present.
One small problem: the wordlists total about 150 million entries, and 1.1GB of text.
My API (minimal) is below. Is it more efficient to store every row in MongoDB and look up repeatedly, or to store the entire thing in memory using a singleton, and populate it on startup when I call app.run? Or are the differences subjective?
Furthermore, is it even good practice to do the latter? I'm thinking the lookups might start to become taxing if I open this to the public. I've also had someone suggest a Trie for efficient searching.
Update: I've done a bit of testing, and document searching is painfully slow with such a high number of records. Is it justifiable to use a database with proper indexes for a single column of data that needs to be efficiently searched?
from flask import Flask, request, redirect
from flask.views import MethodView
from flask.ext.pymongo import PyMongo
import json

app = Flask(__name__)
mongo = PyMongo(app)

class HashCheck(MethodView):
    def post(self):
        return json.dumps({'result':
            not mongo.db.passwords.find({'pass': request.form["password"]})})
        # Error-handling + test cases to come. Negate is for bool.

    def get(self):
        return redirect('/')

if __name__ == "__main__":
    app.add_url_rule('/api/', view_func=HashCheck.as_view('api'))
    app.run(host="0.0.0.0", debug=True)
1: I'm a security nut. I'm using it in my login forms and rejecting common input. One of the wordlists is UNIQPASS.
What I would suggest is a hybrid approach. As requests are made, do two checks: the first in a local cache and the second in the MongoDB store. If the first fails but the second succeeds, then add the entry to the in-memory cache. Over time the application will "fault in" the most common "bad passwords"/records (a sketch follows the two points below).
This has two advantages:
1) The common words are rejected very fast from within memory.
2) The startup cost is close to zero and amortized over many queries.
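A minimal sketch of that hybrid check, assuming the mongo object from the question's code and a plain Python set as the local cache (both names are illustrative):
local_cache = set()

def is_bad_password(password):
    if password in local_cache:  # fast in-memory hit
        return True
    doc = mongo.db.passwords.find_one({"_id": password})
    if doc is not None:          # fault the word into memory for next time
        local_cache.add(password)
        return True
    return False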
When storing the word list in MongoDB I would make the _id field hold each word. By default you will get an ObjectId, which is a complete waste in this case. We can also then leverage the automatic index on _id. I suspect the poor performance you saw was due to there not being an index on the 'pass' field. You can also try adding one on the 'pass' field with:
mongo.db.passwords.create_index("pass")
To complete the _id scenario: to insert a word:
mongo.db.passwords.insert( { "_id" : "password" } );
Queries then look like:
mongo.db.passwords.find( { "_id" : request.form["password"] } )
As @Madarco mentioned, you can also shave another bit off the query time by ensuring results are returned from the index, by limiting the returned fields to just the _id field ({ "_id" : 1 }):
mongo.db.passwords.find( { "_id" : request.form["password"] }, { "_id" : 1} )
HTH - Rob
P.S. I am not a Python/Pymongo expert so might not have the syntax 100% correct. Hopefully it is still helpful.
Given that your list is totally static and fits in memory, I don't see a compelling reason to use a database.
I agree that a Trie would be efficient for your goal. A hash table would work too.
PS: it's too bad about Python's Global Interpreter Lock. If you used a language with real multithreading, you could take advantage of the unchanging data structure and run the server across multiple cores with shared memory.
I would suggest checking out and trying Redis as an option too. It's fast, very fast, and has nice Python bindings. I would try to create a set in Redis of the word list, then use the SISMEMBER function to check if the word is in the set. SISMEMBER is an O(1) operation, so it should be faster than a Mongo query.
That's assuming you want the whole list in memory, of course, and that you are willing to do away with Mongo...
Here's more info on Redis's SISMEMBER, and the Python bindings for Redis.
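A rough sketch of that approach with the redis-py bindings; the key name and the one-time loading step are illustrative:
import redis

r = redis.Redis()

# One-time load: add every word from the list to a Redis set
# with open("wordlist.txt") as fh:
#     for word in fh:
#         r.sadd("passwords", word.strip())

def is_bad_password(password):
    # SISMEMBER is O(1) regardless of the size of the set
    return r.sismember("passwords", password)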
I'd recommend kyotocabinet; it's very fast. I've used it in similar circumstances:
import sys
import json
import kyotocabinet as kyc
from flask import Flask, request, redirect
from flask.views import MethodView

dbTree = kyc.DB()
if not dbTree.open('./passwords.kct', kyc.DB.OREADER):
    print("open error: " + str(dbTree.error()), file=sys.stderr)
    raise SystemExit

app = Flask(__name__)

class HashCheck(MethodView):
    def post(self):
        return json.dumps({'result':
            dbTree.check(request.form["password"]) > 0})
        # Error-handling + test cases to come. Negate is for bool.

    def get(self):
        return redirect('/')

if __name__ == "__main__":
    app.add_url_rule('/api/', view_func=HashCheck.as_view('api'))
    app.run(host="0.0.0.0", debug=True)
