I'm working with the huey task queue (https://github.com/coleifer/huey) in Flask. I'm trying to run a task and get a task id back from my initial function:
@main.route('/renew', methods=['GET', 'POST'])
def renew():
    print(request.form)
    user = request.form.get('user')
    pw = request.form.get('pw')
    res = renewer(user, pw)
    res(blocking=True)  # Block for up to 5 seconds
    print(res)
    return res.id
After running this I plug the returned id (which is the same as the result shown in the screenshot) into:
@main.route('/get_result_by_id', methods=['GET', 'POST'])
def get_result_by_id():
    print(request.form)
    id = request.form.get('id')
    from ..tasking.tasks import my_huey
    res = my_huey.result(id)
    if res is None:
        res = 'no value'
    return res
However, I'm getting 'no value'.
How can I access the value in the data store?
When you call res(blocking=True) in renew(), you fetch the result from the result store and effectively remove it. When you then try to fetch the result again using the id, you just get nothing back.
You have 2 options to solve this:
Either use res(blocking=True, preserve=True) to preserve the result in the result store so you can still fetch it with your second call.
Use a storage that uses expiring results like RedisExpireStorage. When configuring this storage while setting up the huey instance, you can specify for how long your results should be stored. This would give you x amount of time to do the second call based on the task/result id.
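A minimal sketch of the first option, assuming the renewer task and my_huey instance from the question; preserve=True is the only change to the original renew() view:

@main.route('/renew', methods=['GET', 'POST'])
def renew():
    user = request.form.get('user')
    pw = request.form.get('pw')
    res = renewer(user, pw)
    # preserve=True keeps the value in the result store, so a later
    # my_huey.result(id) call can still read it
    res(blocking=True, preserve=True)
    return res.id

For the second option, the expiring storage is configured when the huey instance is created; the exact class name and constructor arguments depend on your huey version, so check the RedisExpireStorage section of the huey docs.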
I am defining a method that fetches all account IDs from an organization.
If I am using get_paginator('list_accounts'), am I okay if I do not check the NextToken?
Code to get the list of all AWS account IDs in the organization:
import boto3

def get_all_account_ids():
    org_client = boto3.client('organizations')
    paginator = org_client.get_paginator('list_accounts')
    page_iterator = paginator.paginate()
    account_ids = []
    for page in page_iterator:
        for acct in page['Accounts']:
            print(acct['Id'])  # print the account id
            account_ids.append(acct['Id'])  # add to the account_ids list
    return account_ids
I have seen examples that use either a get_paginator() call or a while loop that checks NextToken, but I have not seen an example that uses both the paginator and NextToken.
No, you don't have to check NextToken. That's the whole point of paginators:
Paginators are a feature of boto3 that act as an abstraction over the process of iterating over an entire result set of a truncated API operation.
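For comparison, here is a rough sketch of the manual NextToken loop that the paginator abstracts away (same list_accounts call as above; this is only to illustrate what you would otherwise have to write yourself):

import boto3

def get_all_account_ids_manual():
    org_client = boto3.client('organizations')
    account_ids = []
    kwargs = {}
    while True:
        resp = org_client.list_accounts(**kwargs)
        account_ids.extend(acct['Id'] for acct in resp['Accounts'])
        next_token = resp.get('NextToken')
        if not next_token:  # no more pages
            break
        kwargs['NextToken'] = next_token
    return account_ids

The paginator version and this manual version return the same list; the paginator simply hides the NextToken bookkeeping.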
I currently have my Flask app running statically: I run the ETL job independently and then use the resulting dataset in Flask to display Chart.js line charts.
However, I'd like to integrate the ETL piece into my web framework so that users can log in and submit the input parameters (the locations of the input files plus a version id) through an HTML form. My ETL job would then run with those parameters and the resulting data would be used directly to display the charts on the same page.
Current Setup:
My custom ETL module has submodules that act together to form a simple pipeline process:
globals.py - holds my globals, such as the S3 location. Ideally I'd like my users' form inputs to be stored here so that they can be used directly in all my submodules wherever necessary.
s3_bkt = 'abc'  # change bucket here
s3_loc = 's3://' + s3_bkt + '/'
ip_loc = 'alv-input/'
# Ideally, I'd like my users' form inputs to be sitting here
# ip1 = 'alv_ip.csv'
# ip2 = 'Input_product_ICL_60K_Seg15.xlsx'
# version = 'v1'
op_loc = 'alv-output/'
---main-module.py - main function
import module3 as m3
import globals as g

def main(ip1,ip2,version):
    data3,ip1,ip2,version = m3.module3(ip1,ip2,version)
    # ----perform some actions on the data and return---
    return res_data
---module3.py
import module2 as m2

def mod3(ip1,ip2,version):
    data2,ip1,ip2,version = m2.mod2(ip1,ip2,version)
    # ----perform some actions on the data and return---
    return data3
---module2.py
import module1 as m1
import globals as g

def mod2(ip1,ip2,version):
    data1,ip1,ip2,version = m1.mod1(ip1,ip2,version)
    data_cnsts = pd.read_csv(ip2)  # this is where I'll be using the user's input for ip2
    # ----perform some actions on the datasets, write them to a location with the version id, then return---
    data1.to_csv(g.op_loc+'data2-'+ version + '.csv', index=False)
    return data2
---module1.py
import globals as g

def mod1(ip1,ip2,version):
    # this is where the form input for the data location should actually be used
    data = pd.read_csv(g.s3_loc+g.ip_loc+ip1)
    # ----perform some actions on the data and return---
    return data1
Flask setup:
import main-module as mm

app = Flask(__name__)

# this is where the user first hits and submits the form
@app.route('/form')
def form():
    return render_template('form.html')

@app.route('/result/', methods=['GET', 'POST'])
def upload():
    msg = ''
    if request.method == 'GET':
        return f"The URL /data is accessed directly. Try going to '/upload' to submit form"
    if request.method == 'POST':
        ip1 = request.form['ip_file']
        ip2 = request.form['ip_sas_file']
        version = request.form['version']
        data = mm.main(ip1, ip2, version)
        grpby_vars = ['a', 'b', 'c']
        grouped_data = data.groupby(['mob'])[grpby_vars].mean().reset_index()
        # labels for the chart
        a = [val for val in grouped_data['a']]
        # values for the chart
        b = [round(val*100, 3) for val in grouped_data['b']]
        c = [round(val*100, 3) for val in grouped_data['c']]
        d = [val for val in grouped_data['d']]
        return render_template('results.html', title='Predictions', a=a, b=b, c=c, d=d)
The Flask setup works perfectly fine without any form inputs from the user (i.e. when the ETL job and Flask are de-coupled: I run the ETL separately and supply the resulting data location to Flask directly).
Problem:
The problem after integration is that I'm not sure how to pass these user inputs to all my sub-modules. Below is the error I get with the current setup.
data3,ip1,ip2,version = m3.module3(ip1,ip2,version)
TypeError: module3() missing 3 required positional arguments: 'ip1', 'ip2', and 'version'
So this is definitely due to an issue with how I pass the parameters across my sub-modules.
My question is: how do I use the data from the form as global variables across my sub-modules? Ideally I'd like them to be stored in my globals so that I don't have to pass them as parameters through all the modules.
Is there a standard way to achieve that? Might sound very trivial but I'm struggling hard to get to my end-state.
Thanks for reading through :)
I realized my dumb mistake, I should've just included
data,ip1,ip2,version = mm.main(ip1,ip2,version)
I could also instead use my globals file by initializing the inputs as empty strings, then importing my globals in the Flask file and updating the values there. That way I can avoid passing the params through all my sub-modules; a sketch of that approach is below.
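A minimal sketch of that globals-based approach, assuming the globals.py, module1.py and Flask view from the question (the ip1, ip2 and version names added to globals.py, and the zero-argument main()/mod1(), are assumptions for illustration):

# globals.py -- initialize the inputs as empty strings
ip1 = ''
ip2 = ''
version = ''

# Flask view -- set the globals from the form before running the ETL
import globals as g

@app.route('/result/', methods=['GET', 'POST'])
def upload():
    if request.method == 'POST':
        g.ip1 = request.form['ip_file']
        g.ip2 = request.form['ip_sas_file']
        g.version = request.form['version']
        data = mm.main()  # sub-modules now read g.ip1, g.ip2, g.version directly

# module1.py -- read the value from globals instead of taking a parameter
import globals as g

def mod1():
    data = pd.read_csv(g.s3_loc + g.ip_loc + g.ip1)
    return data

One caveat: module-level globals are shared across requests, so with several concurrent users the inputs can get mixed up; passing the parameters explicitly, as in the fix above, is the safer design.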
I want to batch-create users for an admin. The trouble is that make_password is a time-consuming operation, so if I only return the created user/password list once all the new users have been created, the front-end user has to wait a long time. Instead, I would like to do something like the code shown below. The problem I ran into is that I cannot figure out how to lock the user_id_list reserved for creation: if someone registers while the thread is running, I get a Duplicate Key Error or something like that.
I am looking forward to your suggestions.
def BatchCreateUser(self, request, context):
    """Batch-create users."""
    num = request.get('num')
    pwd = request.get('pwd')
    pwd_length = request.get('pwd_length') or 10
    latest_user = UserAuthModel.objects.latest('id')  # retrieve the latest registered user id
    start_user_id = latest_user.id + 1  # the first user id to create
    end_user_id = latest_user.id + num  # the last user id to create
    user_id_list = [i for i in range(start_user_id, end_user_id + 1)]  # user ids to create
    raw_passwords = generate_cdkey(num, pwd_length, False)  # generate passwords
    Thread(target=batch_create_user, args=(user_id_list, raw_passwords)).start()  # run the time-consuming creation in a thread
    user_password_list = list(map(list, zip(*[user_id_list, raw_passwords])))  # return ids and passwords immediately so the front-end user doesn't wait
    return {'results': user_password_list}
I'm using django-rq in my project.
What I want to achieve:
I have a first view that loads a template where an image is acquired from webcam and saved on my pc. Then, the view calls a second view, where an asynchronous task to process the image is enqueued using rq. Finally, after a 20-second delay, a third view is called. In this latter view I'd like to retrieve the result of the asynchronous task.
The problem: the job object is correctly created, but the queue is always empty, so I cannot use queue.fetch_job(job_id). Reading here I managed to find the job in the FinishedJobRegistry, but I cannot access it, since the registry is not iterable.
from django_rq import job
import django_rq
from rq import Queue
from redis import Redis
from rq.registry import FinishedJobRegistry

redis_conn = Redis()
q = Queue('default', connection=redis_conn)
last_job_id = ''

def wait(request):  # second view, starts the job
    template = loader.get_template('pepper/wait.html')
    job = q.enqueue(processImage)
    print(q.is_empty())  # this is always True!
    last_job_id = job.id  # this is the expected job id
    return HttpResponse(template.render({}, request))

def ceremony(request):  # third view, retrieves the result
    template = loader.get_template('pepper/ceremony.html')
    print(q.is_empty())  # True
    registry = FinishedJobRegistry('default', connection=redis_conn)
    finished_job_ids = registry.get_job_ids()  # here I have the correct id (last_job_id)
    return HttpResponse(template.render({}, request))
The question: how can I retrieve the result of the asynchronous job from the finished job registry? Or, better, how can I correctly enqueue the job?
I have found another way to do it: I'm simply using a global list of jobs that I modify in the views. Anyway, I'd like to know the right way to do this...
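For reference, a minimal sketch of looking the job up by its id with rq's Job class; the session storage for the id is an assumption (any per-user storage works), and the queue itself is empty because it only holds jobs that are still waiting for a worker:

from rq.job import Job

def wait(request):  # second view, starts the job
    job = q.enqueue(processImage)
    request.session['last_job_id'] = job.id  # keep the id somewhere per-user
    ...

def ceremony(request):  # third view, retrieves the result
    job_id = request.session.get('last_job_id')
    job = Job.fetch(job_id, connection=redis_conn)
    result = job.result  # None until the worker has finished the job
    ...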
I want to get the most-followed followers of a Twitter user using python-twitter, without getting the 'Rate limit exceeded' error message.
I can get the followers of a user and then get the number of followers of each one, but the problem arises when that user has a lot of followers (thousands).
I use the following function to get the follower ids of a particular user:
def GetFollowerIDs(self, userid=None, cursor=-1):
    url = 'http://twitter.com/followers/ids.json'
    parameters = {}
    parameters['cursor'] = cursor
    if userid:
        parameters['user_id'] = userid
    json = self._FetchUrl(url, parameters=parameters)
    data = simplejson.loads(json)
    self._CheckForTwitterError(data)
    return data
and my code is:
import twitter

api = twitter.Api(consumer_key='XXXX',
                  consumer_secret='XXXXX',
                  access_token_key='XXXXX',
                  access_token_secret='XXXXXX')
user = api.GetUser(screen_name="XXXXXX")
users = api.GetFollowerIDs(user)
# then I make a request per follower in users so that I can sort them according to the number of followers
The problem is that when the user has a lot of followers, I get the 'Rate limit exceeded' error message.
I think you need to get the results in chunks, as explained in this link. This is the workaround currently shown on the GitHub page. If you want an unlimited stream, you should upgrade the subscription for your Twitter application.
def GetFollowerIDs(self, userid=None, cursor=-1, count=10):
    url = 'http://twitter.com/followers/ids.json'
    parameters = {}
    parameters['cursor'] = cursor
    if userid:
        parameters['user_id'] = userid
    remaining = count
    while remaining > 1:
        remaining -= 1
        json = self._FetchUrl(url, parameters=parameters)
        try:
            data = simplejson.loads(json)
            self._CheckForTwitterError(data)
        except TwitterError:  # python-twitter's error class
            break
    return data
def main():
    api = twitter.Api(consumer_key='XXXX',
                      consumer_secret='XXXXX',
                      access_token_key='XXXXX',
                      access_token_secret='XXXXXX')
    user = api.GetUser(screen_name="XXXXXX")
    count = 100  # you can find the optimum value by trial & error
    while users_remaining:  # pseudocode: loop while there are still followers to fetch
        users = api.GetFollowerIDs(user, count)
Another possibility might be to run cron jobs at intervals, as explained here:
http://knightlab.northwestern.edu/2014/03/15/a-beginners-guide-to-collecting-twitter-data-and-a-bit-of-web-scraping/
Construct your scripts in a way that cycles through your API keys to stay within the rate limit.
Cronjobs: a time-based job scheduler that lets you run scripts at designated times or intervals (e.g. always at 12:01 a.m. or every 15 minutes).
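To illustrate the key-cycling idea, here is a rough sketch (the credential values, the 15-minute reset window, and the fetch_follower_counts helper are assumptions; python-twitter raises twitter.TwitterError when a limit is hit, and this sketch simply rotates to the next key set or sleeps):

import time
import twitter

# assumed: one credential set per registered Twitter application
CREDENTIALS = [
    {'consumer_key': 'XXXX', 'consumer_secret': 'XXXXX',
     'access_token_key': 'XXXXX', 'access_token_secret': 'XXXXXX'},
    # ... more key sets ...
]

apis = [twitter.Api(**creds) for creds in CREDENTIALS]

def fetch_follower_counts(follower_ids):
    counts = {}
    api_index = 0
    for fid in follower_ids:
        while True:
            try:
                counts[fid] = apis[api_index].GetUser(user_id=fid).followers_count
                break
            except twitter.TwitterError:
                # rate limit (or other API error): rotate to the next key set,
                # and wait for the window to reset once every key has been tried
                api_index = (api_index + 1) % len(apis)
                if api_index == 0:
                    time.sleep(15 * 60)
    return counts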