mongoengine cannot load data correctly under concurrency - python

In a web service (Flask, deployed with gunicorn and gevent workers), there is a request handler that queries a set of objects and updates their status, like below:
def update_status(job_id, info_ids):
    infos = Info.objects(job_id=job_id, info_id__in=info_ids)
    if len(infos) == 0:
        logger.error('infos are not found')
    for i in infos:
        pass
I'm sure the infos are in the database. Other services send requests to this handler concurrently, but for some requests I see the error ('infos are not found') in the logs. I'm quite confused about why the data sometimes cannot be loaded.

I'm not sure about the other parts of your code, but I guess the Info class is your mongoengine document like this:
class Info(mongoengine.Document):
    job_id = mongoengine.StringField(...)
    info_id = mongoengine.StringField(...)
If that's true, you must use Info.objects instead of Info.object to access the queryset of your collection (note the 's' at the end of objects). So your code should be something like this:
infos = Info.objects(job_id=job_id, info_id__in = info_ids)

Related

How to loop GETs until a certain response is received

I'm looking for some advice, or a relevant tutorial regarding the following:
My task is to set up a flask route that POSTs to API endpoint X, receives a new endpoint Y in X's response, then GETs from endpoint Y repeatedly until it receives a certain status message in the body of Y's response, and then returns Y's response.
The code below (irrelevant data redacted) accomplishes that goal in, I think, a very stupid way. It returns the appropriate data occasionally, but not reliably (it times out 60% of the time). When I console-log very thoroughly, it seems as though I have bogged down my server with multiple while loops running constantly, interfering with each other.
I'll also receive this error occasionally:
SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /book
import sys, requests, time, json
from flask import Flask, request

# create the Flask app
app = Flask(__name__)

# main booking route
@app.route('/book', methods=['POST'])  # GET requests will be blocked
def book():
    # defining the api-endpoints
    PRICING_ENDPOINT = ...
    # data to be sent to api
    data = {...}
    # sending post request and saving response as response object
    try:
        r_pricing = requests.post(url=PRICING_ENDPOINT, data=data)
    except requests.exceptions.RequestException as e:
        return e
        sys.exit(1)
    # extracting response text
    POLL_ENDPOINT = r_pricing.headers['location']
    # setting data for poll
    data_for_poll = {...}
    r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)
    # poll loop, looking for 'UpdatesComplete'
    j = 1
    poll_json = r_poll.json()
    update_status = poll_json['Status']
    while update_status == 'UpdatesPending':
        time.sleep(2)
        j = float(j) + float(1)
        r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)
        poll_json = r_poll.json()
        update_status = poll_json['Status']
    return r_poll.text
This is more of an architectural issue than a Flask issue. Long-running tasks in Flask views are always a poor design choice. In this case, the route's response depends on two endpoints of another server. In effect, apart from carrying the responsibility of your own app, you are also carrying the responsibility of another server.
Since the application's design seems to be a proxy for another service, I would recommend creating the proxy in the right way. Just like book() offers the proxy for PRICING_ENDPOINT POST request, create another route for POLL_ENDPOINT GET request and move the polling logic to the client code (JS).
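One way to split the proxy could look like the sketch below, assuming the upstream returns the poll URL in a Location header as in the question's code; the endpoint names and URL here are invented for illustration:

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder for the real upstream pricing endpoint.
PRICING_ENDPOINT = "https://api.example.com/pricing"

@app.route("/book", methods=["POST"])
def book():
    # Forward the booking data, then hand the poll URL back to the
    # client instead of blocking the view while polling it.
    r = requests.post(PRICING_ENDPOINT, data=request.form)
    return jsonify({"poll_url": r.headers["location"]})

@app.route("/poll", methods=["GET"])
def poll():
    # Thin proxy for a single status check; the browser JS calls this
    # repeatedly until it sees the status it wants.
    r = requests.get(request.args["url"])
    return jsonify(r.json())
```

Each request now returns immediately, so no worker is tied up in a minutes-long loop.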
Update:
If you cannot for some reason trust the client (browser -> JS) with the POLL_ENDPOINT information, in a hidden-proxy-like situation, then move the polling to a task runner like Celery or Python RQ. Although it will introduce additional components to your application, it is the right way to go.
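Whichever task runner you pick, the polling loop itself can be factored into a plain function the worker calls. A minimal sketch, with all names hypothetical; inside a Celery or RQ task, fetch would wrap the real requests.get(POLL_ENDPOINT).json():

```python
import time

def poll_until_complete(fetch, max_tries=30, delay=2.0):
    """Poll fetch() until its 'Status' field leaves 'UpdatesPending'.

    fetch is any callable returning a parsed JSON dict; in a worker
    it would wrap requests.get(POLL_ENDPOINT).json().
    """
    for _ in range(max_tries):
        payload = fetch()
        if payload.get("Status") != "UpdatesPending":
            return payload
        time.sleep(delay)
    raise TimeoutError("status never left 'UpdatesPending'")
```

Bounding the retries also gives you a clean failure path instead of an open-ended while loop.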
You probably get that error because of an HTTP connection timeout with the API server you are polling: there are limits on how long an HTTP connection may stay open, and your loop took more time than the connection allows. The first (straightforward) solution is to tweak the Apache configuration and increase the HTTP connection timeout for your WSGI app. You could also open a socket connection, check the update status over it, and close it once the goal is achieved. Or you can move your logic to the client side.

Flask session not persisting after refresh

I have this route in a flask app:
@APP.route('/comparisons', methods=['POST'])
def save_comparison():
    if 'comparisons' not in session or not session['comparisons']:
        session['comparisons'] = []
    entity_id = request.form.get('entity_id')
    session['comparisons'].append(entity_id)
    session.modified = True
    return entity_id

APP.secret_key = 'speakfriend'
This adds entity_ids to session['comparisons'] as expected. However, when I refresh the page, the session object does not have a 'comparisons' property anymore, so the list of comparisons is empty. What am I missing?
Update:
I left out what I didn't know was the important information. The Vue app also makes calls to a Flask API, which sets its own session. The SECRET_KEYs were different, so when there was an API call between webserver calls (or vice versa), the session from one application would be replaced by the session from the other, and neither was intelligible to the other (different SECRET_KEYs). Since these are always deployed together using docker-compose, the solution was to use a common env variable to pass the same secret to both.
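The fix described above could be sketched like this in each Flask app; the environment variable name and fallback value are made up for illustration, with the real value set once in docker-compose.yml:

```python
import os
from flask import Flask

app = Flask(__name__)
# Both containers read the same variable, so a session cookie signed
# by one app validates in the other. The dev fallback is only a
# placeholder; production should always supply the env variable.
app.secret_key = os.environ.get("FLASK_SECRET_KEY", "dev-only-fallback")
```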

Is my API a REST architecture?

I'm currently designing a web service in Python with Flask. I have become very confused about whether it is a RESTful service or just a regular web service. I've been reading quite a few sources about RESTful services, but I'm still not able to say whether my service is a REST architecture or not.
The requests to my API are stateless.
Here is what I have:
from flask import Flask, request
import json

app = Flask(__name__)

@app.route('/', methods=['POST'])
def add():
    """
    This function is mapped to the POST request of the REST interface
    """
    # check if a JSON object is declared in the header
    if request.headers['Content-Type'] == 'application/json; charset=UTF-8' and request.data:
        try:
            data = json.dumps(request.json)
            # check if received JSON object is valid according to the scheme
            # if (validateJSON(data)):
            try:
                saveToMongo(data)
                appLogger.info("Record saved to MongoDB")
                return "JSON Message saved in MongoDB"
            except:
                appLogger.error("Could not write to MongoDB")
        except:
            appLogger.error("Received invalid JSON")
    else:
        appLogger.error("Content-Type not defined or empty content")
        raise FailedRequest

if __name__ == "__main__":
    appLogger.info("RESTful service initialized and started")
    app.run(host="0.0.0.0", port=80, debug=True)
In none of the possible responses do I return JSON; the JSON is actually the payload of the request. The response is always a regular HTTP response with a custom text as the result description.
Is it right that, because of this, it is not a RESTful service, and that if I want to call it RESTful I would need to return a JSON object? Or am I completely wrong? Is my API just a simple RPC?
I see only one resource / to which a POST request can be made. There is no way to GET a collection of objects or a single object saved in this way.
One could argue that such a trivial system does not violate any REST principle. But I think this is not enough to call a system RESTful. It is a trivial RPC system with a single anonymous 'save' method.
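A hedged sketch of what a more resource-oriented surface could look like, with an in-memory dict standing in for MongoDB; the /messages resource name and the storage are invented for illustration:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
messages = {}   # in-memory stand-in for the MongoDB collection
next_id = [0]   # mutable counter, avoids `global` in the view

@app.route('/messages', methods=['POST'])
def create_message():
    next_id[0] += 1
    messages[next_id[0]] = request.get_json()
    # 201 plus the new resource id is the conventional REST reply
    return jsonify({'id': next_id[0]}), 201

@app.route('/messages', methods=['GET'])
def list_messages():
    # the collection is now addressable, not just writable
    return jsonify(sorted(messages))

@app.route('/messages/<int:msg_id>', methods=['GET'])
def get_message(msg_id):
    # each saved object gets its own URI
    return jsonify(messages[msg_id])
```

With GETs for the collection and for individual items, clients can navigate saved state by URI instead of calling one anonymous save method.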

App Engine Backends Configuration (python)

I got a timeout error on my GAE server when it tries to send large files to an EC2 REST server. I found the Backends Python API would be a good solution for my case, but I have had some problems configuring it.
Following some instructions, I added a simple backends.yaml to my project folder. But I still received the following error, which suggests I failed to create a backend instance.
File "\Google\google_appengine\google\appengine\api\background_thread\background_thread.py", line 84, in start_new_background_thread
    raise ERROR_MAP[error.application_error](error.error_detail)
FrontendsNotSupported
Below is my code, and my question is:
Currently, I got timeout error in OutputPage.py, how do I let this script run on a backend instance?
update
Following Jimmy Kane's suggestions, I created a new script przm_batchmodel_backend.py for the backend instance. After starting my GAE server, I now have two ports (a default and a backend) serving my site. Is that correct?
app.yaml
- url: /backend.html
  script: przm_batchmodel.py
backends.yaml
backends:
- name: mybackend
  class: B1
  instances: 1
  options: dynamic
OutputPage.py
from przm import przm_batchmodel
from google.appengine.api import background_thread

class OutputPage(webapp.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        thefile = form['upfile']
        # this is the old way to initiate calculations
        # html = przm_batchmodel.loop_html(thefile)
        przm_batchoutput_backend.przmBatchOutputPageBackend(thefile)
        self.response.out.write(html)

app = webapp.WSGIApplication([('/.*', OutputPage)], debug=True)
przm_batchmodel.py
def loop_html(thefile):
    # parses the uploaded csv and sends its info to the REST server; the returned value is an html page.
    data = csv.reader(thefile.file.read().splitlines())
    response = urlfetch.fetch(url=REST_server, payload=data, method=urlfetch.POST, headers=http_headers, deadline=60)
    return response
przm_batchmodel_backend.py
class BakendHandler(webapp.RequestHandler):
    def post(self):
        t = background_thread.BackgroundThread(target=przm_batchmodel.loop_html, args=[thefile])
        t.start()

app = webapp.WSGIApplication([('/backend.html', BakendHandler)], debug=True)
You need to create an application file/script for the backend to work, just like you do for the main app.
So something like:
app.yaml
- url: /backend.html
  script: przm_batchmodel.py
and on przm_batchmodel.py
class BakendHandler(webapp.RequestHandler):
    def post(self):
        html = 'test'
        self.response.out.write(html)

app = webapp.WSGIApplication([('/backend.html', BakendHandler)], debug=True)
May I also suggest using the new modules feature, which is easier to set up?
Edit due to comment
Possibly the setup was not your problem.
From the docs
Code running on a backend can start a background thread, a thread that
can "outlive" the request that spawns it. They allow backend instances
to perform arbitrary periodic or scheduled tasks or to continue
working in the background after a request has returned to the user.
You can only use background threads on backends.
So edit again. Move the part of the code that is:
t = background_thread.BackgroundThread(target=przm_batchmodel.loop_html, args=[thefile])
t.start()
self.response.out.write(html)
To the backend app

Django beginner issues

I am new to web development and just pieced together my first Django web app, integrated with Apache using mod_wsgi.
The app has some 15 parameters on which you can query multiple SQL Server databases, and the result can be downloaded as an .xls file; I have deployed it on the company network.
The problem is: when I access the web app on one machine and set query parameters, the same parameters get set in the web app when I open it from a different machine (web client).
It's like there is just one global object being served to all the web clients.
I am using Django template tags to set values in the app's html pages.
I am not using any models in the Django project, as I am querying SQL Server DBs which are already built.
The query function from my views.py looks like:
def query(self, request):
    """
    """
    print "\n\n\t inside QUERY PAGE:", request.method, "\n\n"
    self.SummaryOfResults_list = []
    if self.vmd_cursor != -1:
        self.vmd_cursor.close()
    if request.method == 'POST':
        QueryPage_post_dic = request.POST
        print "\n\nQueryPage_post_dic :", QueryPage_post_dic
        self.err_list = []
        self.err_list = db_qry.validate_entry(QueryPage_post_dic)
        if len(self.err_list):
            return HttpResponseRedirect('/error/')
        else:
            channel_numbers, JPEG_Over_HTTP, Codec, format, rate_ctrl, transport, img_sz, BuildInfo_versions, self.numspinner_values_dic = db_qry.process_postdata(QueryPage_post_dic, self.numspinner_values_dic)
            return self.get_result(request, channel_numbers, JPEG_Over_HTTP, Codec, format, rate_ctrl, transport, img_sz, BuildInfo_versions)
    else:
        print "\nself.Cam_Selected_list inside qry :", self.Cam_Selected_list
        if (len(self.Cam_Selected_list) != 1):
            return HttpResponseRedirect('/error/')
        self.tc_dic, self.chnl_dic, self.enbl_dic, self.frmt_dic, self.cdectyp_dic, self.imgsz_dic, self.rtctrl_dic, self.jpg_ovr_http_dic, self.trnsprt_dic, self.cdec_dic, self.typ_dic, self.resolution_dic, self.vmd_cursor = populate_tbls.Read_RefTbls(self.Cam_Selected_list[0])
        c = self.get_the_choices(self.Cam_Selected_list[0])
        c['camera_type'] = self.Cam_Selected_list[0]
        for k, v in self.numspinner_values_dic.items():
            c[k] = v
        self.vmd_cursor.execute("SELECT DISTINCT [GD Build Info] FROM MAIN")
        res_versions = self.vmd_cursor.fetchall()
        version_list = []
        ver_list = ['', ' ']
        for version in res_versions:
            tmp_ver = version[0].encode()
            if (tmp_ver not in ver_list):
                version_list.append(tmp_ver)
        c['build_info'] = version_list
        print "\n\n c dic :", c
        c.update(csrf(request))
        return render_to_response('DBQuery.html', c)
The dictionary being passed to render_to_response holds the values that set the checkboxes and multiselect boxes (dojo).
Thanks.
Its like there is just one global object which is being served to all the web client.
What you're saying is probably exactly what's happening. Unless you're building whatever object self refers to in that example code anew for each request, it will be shared between clients practically at random.
You can store your global variable in the SQL DB that you are using. This way you retain the value/state of the variable across request -> response cycles.
If you need faster response times, explore a key->value in-memory datastore like Redis.
To add to what AKX mentioned, I suggest you read up on the HTTP request -> response cycle and how web applications work.
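The failure mode can be reproduced without Django at all; the class and names below are invented purely to illustrate why instance attributes on a long-lived view object leak state between clients, while locals do not:

```python
# A single long-lived "view" instance keeps state in attributes,
# so every client that calls it sees the last caller's parameters.
class SharedStateView:
    def query(self, params):
        self.params = params        # shared across all callers
        return self.params

# The fix: keep per-request values in locals created inside the call.
def stateless_query(params):
    results = list(params)          # local to this one call only
    return results

view = SharedStateView()            # one instance serves everybody
view.query(['cam1'])                # first client sets its parameters
view.query(['cam2'])                # second client overwrites them
```

After the two calls, view.params holds ['cam2'] for everyone, which is exactly the "one global object served to all clients" behaviour described in the question.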
