My project exports data from one service to another. Doing this requires a long-running background task, and I want to show progress as text describing the current step. The problem is how to get this text out of the long-running task.
I know about Celery and Redis, but they need an additional resource like a server. My project is very small and doesn't expect more than a couple of users once a month, so I don't want to pay for shared hosting or a machine.
I tried saving the current step text into the session, but the response during the long task is always null. I think this is because Flask is busy with the task and doesn't return a valid value for the session. I then tried running the task in a new thread, but got an error about accessing the session from the wrong thread.
My solution
Create a global dict:
step = dict()
Create a random key and pass it, together with the global step dict, into the background task. Start a new thread and return the key:
from secrets import token_urlsafe
import threading

from flask import request, session

@app.route('/api/transfer', methods=['POST'])
def api_transfer():
    global step
    key = token_urlsafe(10)
    service = Service(step, key, session.get('token'))
    t = threading.Thread(target=long_task, args=(service,))
    t.start()
    return {'key': key}
Then send a request with this key to get the current step for it:
@app.route('/api/transfer/current_step', methods=['POST'])
def api_current_step():
    global step
    return {'step': step[request.json['key']]}
Now every export request creates a new key and a step text just for it:
{'jlIG7VKtbMHtXg': 'some step for jlIG7VKtbMHtXg'}
{'jlIG7VKtbMHtXg': 'some step for jlIG7VKtbMHtXg', 'wxr4jKyP7c72sg': 'some step for wxr4jKyP7c72sg'}
And at the end of the task, remove the key from the dictionary:
del self.step[self.key]
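For completeness, a minimal sketch of what the Service / long_task side might look like; the class body and the step texts are illustrative, only the shared step dict, the key, and the final del come from the code above:

import time

class Service:
    # Holds the shared progress dict and the key this task writes to.
    def __init__(self, step, key, token):
        self.step = step    # the global dict shared with the Flask views
        self.key = key      # this request's unique progress key
        self.token = token

def long_task(service):
    # Write a human-readable status before each stage of the export.
    service.step[service.key] = 'connecting to source service'
    time.sleep(1)  # placeholder for the real work
    service.step[service.key] = 'transferring records'
    time.sleep(1)
    # Done: remove the key so the dict does not grow forever.
    del service.step[service.key]

Note that this relies on the app running as a single process; with multiple worker processes each one would get its own copy of step (the same pitfall discussed in the last question below).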
I'm making a script that checks a Google Sheet (populated by a Google Form) and returns the result as a live-feed visualization of a poll. I need to figure out how to update the value counts only when the sheet is updated, as opposed to polling every 60 seconds (or something).
Here is my current setup:
string = ""
while True:
responses = gc.open("QOTD Responses").sheet1
data = pd.DataFrame(responses.get_all_records())
vals = data['Response'].value_counts()
str = "{} currently has {} votes. \n{} currently has {} votes.".format(vals.index[0], vals[0], vals.index[1],
vals[1])
if(str != string):
string = str
print(string)
time.sleep(60) # Updates 1440 times per day
I'm almost certain that there has to be a better way to do this, but what would that be?
Thanks!
You won't be able to do it with Python alone. You'll need to integrate with a trigger function from Google Apps Script.
You could use the onEdit trigger to send a signal to your Python script (via an HTTP call, for example).
To use a simple trigger, simply create a function that uses one of these reserved function names:
onOpen(e) runs when a user opens a spreadsheet, document, presentation, or form that the user has permission to edit.
onEdit(e) runs when a user changes a value in a spreadsheet.
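On the Python side, that signal could land on a small HTTP endpoint. A minimal Flask sketch, assuming the trigger calls UrlFetchApp.fetch() against your server; the route and helper names are hypothetical:

from flask import Flask

app = Flask(__name__)

def refresh_poll_counts():
    # Hypothetical hook: run the gspread/value_counts logic from the
    # question here, only when the trigger actually fires.
    pass

@app.route('/sheet-updated', methods=['POST'])
def sheet_updated():
    # Target of the Apps Script onEdit trigger's HTTP call.
    refresh_poll_counts()
    return '', 204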
I'm new to Redis and was wondering if there is a way to await getting a value by its key until the key exists. Minimal code:
async def handler():
    data = await self._fetch(key)

async def _fetch(key):
    return self.redis_connection.get(key)
As you know, if such a key doesn't exist, it returns None. But since in my project the key-value pair is set in Redis by another application, I want the redis_connection get method to block until the key exists.
Is such an expectation even valid?
It is not possible to do what you are trying to do without implementing some sort of polling of Redis GET on your client. In that case your client would have to do something like:
import asyncio

async def _fetch(key):
    val = self.redis_connection.get(key)
    while val is None:
        # Sleep and retry here; note that asyncio.sleep must be awaited
        await asyncio.sleep(1)
        val = self.redis_connection.get(key)
    return val
However, I would ask you to completely reconsider the pattern you are using for this problem.
It seems to me that what you need is something like Pub/Sub: https://redis.io/topics/pubsub
So the app that performs the SET becomes a publisher, and the app that does the GET and waits until the key is available becomes the subscriber.
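As a rough sketch of that split, using redis-py's asyncio support (an assumption on my part, requiring redis-py >= 4.2; the key and channel names are illustrative):

import asyncio
import redis.asyncio as redis

async def publisher():
    r = redis.Redis()
    # SET the value as usual, then announce the key on a channel.
    await r.set('my_key', 'my_value')
    await r.publish('key-ready', 'my_key')

async def subscriber():
    r = redis.Redis()
    pubsub = r.pubsub()
    await pubsub.subscribe('key-ready')
    async for message in pubsub.listen():
        if message['type'] == 'message':
            key = message['data'].decode()
            return await r.get(key)  # the key is guaranteed to be set by now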
I did a bit of research on this and it looks like you can do it with asyncio_redis:
Subscriber https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/receiver.py.
Sender (Publisher): https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/sender.py
Hope this helps.
Besides the keyspace-notification method mentioned by @Itamar Haber, another solution is to use the blocking operations on LIST.
The handler method calls BRPOP on an empty LIST: BRPOP notify-list timeout, and blocks until notify-list is NOT empty.
The other application pushes the value onto the LIST when it finishes setting the key-value pair as usual: SET key value; LPUSH notify-list value.
The handler wakes from the blocking operation with the value you want, and notify-list is destroyed by Redis automatically.
The advantage of this solution is that you don't need to modify your handler method too much (with the keyspace-notification solution, you need to register a callback function). The disadvantage is that you have to rely on the other application to send the notification (with keyspace notifications, Redis does the notification automatically).
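A minimal redis-py sketch of the two sides (list, key, and value names are illustrative):

import redis

r = redis.Redis()

# Application A: blocks until something is pushed onto notify-list.
def handler():
    _, value = r.brpop('notify-list')  # timeout defaults to 0 = wait forever
    return value

# Application B: sets the key, then notifies the blocked handler.
def setter():
    r.set('key', 'value')
    r.lpush('notify-list', 'value')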
The closest you can get to this behavior is by enabling keyspace notifications and subscribing to the relevant channels (possibly by pattern).
Note, however, that notifications rely on Pub/Sub, which is not guaranteed to deliver messages (at-most-once semantics).
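A rough redis-py sketch of this approach (the key name is illustrative, and notify-keyspace-events can also be enabled in redis.conf instead):

import redis

r = redis.Redis()

# Enable keyspace notifications: 'K' = keyspace channels, 'A' = all event classes.
r.config_set('notify-keyspace-events', 'KA')

p = r.pubsub()
# This channel fires whenever 'my_key' in database 0 changes.
p.subscribe('__keyspace@0__:my_key')

for message in p.listen():
    # The payload of a keyspace event is the command name, e.g. b'set'.
    if message['type'] == 'message' and message['data'] == b'set':
        val = r.get('my_key')
        break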
Since Redis 5.0 there is a built-in stream type that supports blocking reads. The following is sample code with redis-py.
# 'redis' below is a redis-py client instance
# Add a value to my_stream
redis.xadd('my_stream', {'key': 'str_value'})
# Read from the beginning of the stream
last_id = '0'
# Blocking read until there is a value
last_stream_item = redis.xread({'my_stream': last_id}, block=0)
# Update last_id from the entry that was just read
last_id = last_stream_item[0][1][0][0]
# Wait for the next value to arrive on the stream
last_stream_item = redis.xread({'my_stream': last_id}, block=0)
I'm new to both Flask and Python. I've got an application I'm working on to hold weather data, and I allow both GET and POST calls into my Flask application. Unfortunately, the automated calls to my API do not always come back with the proper results. I'm currently storing my data in a global variable; when a POST is made, the new data is appended to the existing data. Sometimes, though, a GET does not receive the most up-to-date version of that global variable. I believe the change is not being passed from the post function up to the global variable before the get is called, because when I run the get manually afterwards the proper result comes back.
weatherData = []  # filled with data read from a CSV on initialization

class FullHistory(Resource):
    def get(self):
        ret = []
        for row in weatherData:
            val = row['DATE']
            ret.append({"DATE": str(val)})
        return ret

    def post(self):
        global weatherData
        newWeatherData = weatherData
        args = parser.parse_args()
        newVal = int(args['DATE'])
        newWeatherData.append({'DATE': int(args['DATE']),
                               'TMAX': float(args['TMAX']),
                               'TMIN': float(args['TMIN'])})
        weatherData = newWeatherData
        # time.sleep(5)
        return {"DATE": str(newVal)}, 201

class SelectHistory(Resource):
    def get(self, date_id):
        val = int(date_id)
        bVal = False
        # time.sleep(5)
        global weatherData
        for row in weatherData:
            if row['DATE'] == val:
                wd = row
                bVal = True
                break
        if bVal:
            return {"DATE": str(wd['DATE']),
                    "TMAX": float(wd['TMAX']),
                    "TMIN": float(wd['TMIN'])}
        else:
            return "HTTP Error code 404", 404

    def delete(self, date_id):
        val = int(date_id)
        wdIter = None
        for row in weatherData:
            if row['DATE'] == val:
                wdIter = row
                break
        if wdIter is not None:
            weatherData.remove(wdIter)
            return {"DATE": str(val)}, 204
        else:
            return "HTTP Error code 404", 404
Is there any way I can ensure that my global variable is up to date, or make my API wait to respond until I'm sure the update has been applied? This was supposed to be a simple application, and I would really rather not learn how to use threads in Python just yet. I've made sure that the GET request does not start until after the POST has returned a response. I know one workaround is to use sleep to delay my responses, but I would rather understand why the update isn't visible immediately in the first place.
I believe your problem is the application context. As stated here:
The application context is created and destroyed as necessary. It never moves between threads and it will not be shared between requests. As such it is the perfect place to store database connection information and other things. The internal stack object is called flask._app_ctx_stack. Extensions are free to store additional information on the topmost level, assuming they pick a sufficiently unique name and should put their information there, instead of on the flask.g object which is reserved for user code.
Though it says you can store data at the "topmost level," it's not reliable, and if you grow your project to use multiple worker processes, with uWSGI for instance, you'll need persistence to share data between workers regardless. You should be using a database, Redis, or at the very least updating your .csv file each time you mutate your data.
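For instance, a minimal sketch of the Redis route, replacing the global list with a shared store (the key name and JSON encoding are illustrative choices, not part of the original code):

import json
import redis

r = redis.Redis()

def load_weather_data():
    # Every worker process reads the same shared copy.
    raw = r.get('weatherData')
    return json.loads(raw) if raw else []

def append_weather_row(row):
    # Read-modify-write inside WATCH/MULTI so concurrent POSTs don't
    # overwrite each other (a WatchError retry loop is omitted for brevity).
    with r.pipeline() as pipe:
        pipe.watch('weatherData')
        data = json.loads(pipe.get('weatherData') or '[]')
        data.append(row)
        pipe.multi()
        pipe.set('weatherData', json.dumps(data))
        pipe.execute()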
I would like to have a computational simulation running in a background process (started with redis rq) where I can query its current state, as well as change parameters, using Django.
For the sake of simplicity, let's say I want to run the following code for a long time (which I would set up through a Python worker):
import time

def simulation(a=1):
    value = 0
    while a is not None:
        value += a
        time.sleep(5)
Then, by visiting a URL, it would tell me the current value of value. I could also POST to a URL to change the value of a, i.e. a=None to stop the simulation or a=-10 to change the behavior.
What is the best way to do this?
The best way I've found to do this is using the cache:
import time

from django.core.cache import cache

def simulation(a=1):
    value = 0
    while a is not None:
        value += a
        cache.set('value', value, 3600)  # publish the current state for the views
        time.sleep(5)
        a = cache.get('a', None)  # pick up parameter changes made by the views
This does work, but it's quite slow for my needs. Perhaps there's a method using sockets, but I wasn't able to get it to work; the socket is blocked in the background process.
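For reference, the Django side of this pattern might look something like the following sketch (URL wiring is omitted and the view names are illustrative):

from django.core.cache import cache
from django.http import JsonResponse

def current_value(request):
    # GET: report the simulation's latest published state.
    return JsonResponse({'value': cache.get('value')})

def set_parameter(request):
    # POST: change 'a'; the worker picks it up on its next loop iteration.
    raw = request.POST.get('a')
    cache.set('a', None if raw in (None, 'None') else int(raw), 3600)
    return JsonResponse({'ok': True})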
I have Python/Django code hosted at dotCloud and Red Hat OpenShift. To handle different users, I use a token and save it in a dictionary. But when I get the value from the dict, it sometimes throws a KeyError.
import threading

thread_queue = {}

def download(request):
    dl_val = request.POST["input1"]
    client_token = str(request.POST["pagecookie"])
    # Save client token as key and thread object as value in the dictionary
    thread_queue[client_token] = DownloadThread(dl_val, client_token)
    thread_queue[client_token].start()
    return render_to_response("progress.html",
                              {"dl_val": dl_val, "token": client_token})
The code below is executed at 1-second intervals via a JavaScript XMLHttpRequest to the server. It checks a variable inside another thread and returns the value to the user's page.
def downloadProgress(request, token):
    # Sometimes I use this to check the content of the dict:
    # resp = HttpResponse("thread_queue = " + str(thread_queue))
    # return resp
    prog, total = thread_queue[str(token)].getValue()  # problematic line!
    if prog == 0:
        # Prevent division by zero
        return HttpResponse("0")
    percent = float(prog) / float(total)
    percent = round(percent * 100, 2)
    if percent >= 100:
        try:
            f_name = thread_queue[token].getFileName()[1]
        except:
            downloadProgress(request, token)
        resp = HttpResponse('<a href="http://' + request.META['HTTP_HOST'] +
                            '/dl/' + token + '/">' + f_name + '</a><br />')
        return resp
    else:
        return HttpResponse(str(percent))
After testing for several days, it sometimes returns:
thread_queue = {}
It sometimes succeeds:
{'wFVdMDF9a2qSQCAXi7za': <DownloadThread>, 'EVukb7QdNdDgCf2ZtVSw': <DownloadThread>, 'C7pkqYRvRadTfEce5j2b': <DownloadThread>, '2xPFhR6wm9bs9BEQNfdd': <DownloadThread>}
I never get this result when I'm running Django locally via manage.py runserver and accessing it with Google Chrome, but when I upload the code to dotCloud or OpenShift, it always shows the above problem.
My questions:
How can I solve this problem?
Do dotCloud and OpenShift limit Python CPU usage?
Or is the problem with the Python dictionary?
Thank you.
dotCloud has 4 worker processes by default for the Python service. When you run the dev server locally, you are running only one process. As @martijn said, your issue comes from the fact that the dict isn't shared between these processes.
To fix this issue, you could use something like Redis or memcached to store the information instead. If you need a longer-term storage solution, then a database is probably better suited.
dotCloud does not limit CPU usage; the CPU is shared among others on the same host and allows bursting, but in the end everyone gets the same amount of CPU.
Looking at your code, you should check that there is a value in the dict before you access it, or at a minimum surround the access with a try/except block to handle the case when the data isn't there:
str_token = str(token)
if str_token in thread_queue:
    prog, total = thread_queue[str_token].getValue()  # problematic line!
else:
    pass  # value isn't there, do something else
Presumably dotCloud and OpenShift run multiple processes of your code; the dict is not going to be shared between these processes.
Note that this also means the extra processes will not have access to your extra thread either.
Use an external database for this kind of information instead. For long-running asynchronous jobs like these you also need to run them in a separate worker process. Look at Celery for an all-in-one solution for asynchronous job handling, for example.
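As a rough sketch of the Celery shape of this (the broker URL and task body are illustrative, not from the original code):

from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task(bind=True)
def download(self, dl_val):
    # Long-running download: report progress through the result backend
    # instead of an in-process dict, so every web worker can see it.
    for done in range(0, 101, 10):
        self.update_state(state='PROGRESS', meta={'percent': done})
    return {'percent': 100}

The Django view would then start the job with download.delay(dl_val), and the progress endpoint would poll download.AsyncResult(task_id).info instead of reading thread_queue.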