Python Queue not passing correct data

I just found the Queue module, which is helping me adapt the pyftpdlib module. I'm running a very strict FTP server, and my goal is to restrict the filenames available for upload. This is to prevent people from uploading whatever they want (it's actually the backend of an upload client, not a complete FTP server).
I have this in the ftpserver Authorizer:
def fetch_worlds(queue, username):
    while queue.empty():
        worlds = models.World.objects.filter(is_synced=True, user__username=username)
        print worlds
        queue.put(worlds, timeout=1)
class FTPAuthorizer(ftpserver.DummyAuthorizer):
    def __init__(self):
        self.q = Queue.Queue()
        self.t = None  # Thread
        self.world_item = None

    def has_perm(self, username, perm, path=None):
        print "Checking permission\n"
        if perm not in ['r', 'w']:
            return False
        # Check world name
        self.t = threading.Thread(target=fetch_worlds, args=(self.q, username))
        self.t.daemon = True
        self.t.start()
        self.world_item = self.q.get()
        print "WORLDITEM: %s" % self.world_item
        if path is not None:
            path = os.path.basename(path)
            for world in self.world_item:
                test = "{0}_{1}.zip".format(username, world.name)
                if path == test:
                    print "Match on %s" % test
                    return True
        return False
My issue is: after the server starts, the first time I STOR a file it does an initial db call and gets all the worlds properly. But when I then add another world (for example, set is_synced=True on one), self.q.get() still returns the old data. has_perm() is called every time a file is uploaded, and it needs to return live data (to check whether a file is allowed).
For example, brand new server:
STOR file.zip, self.q.get() returns <World1, World2>
Update the database via other methods etc
STOR file2.zip; inside fetch_worlds, print worlds shows <World1, World2, World3>, but self.q.get() returns <World1, World2>
I just found the Queue module and it seemed like it would be helpful, but I can't get the implementation right.
(also couldn't add tag pyftpdlib)

I think this is what could be happening here:

- When has_perm is called, you create a thread that queries a database (?) to add elements to the queue.
- After the call to start(), the database query takes some time.
- Meanwhile, in your main thread, you enter q.get(), which blocks.
- The db call finishes and the result is added to the queue...
- ...and is immediately removed from the queue again by the blocking q.get().
- The queue is now empty, so your thread enters the while loop again, executes the same query again, and puts the result onto the queue.
- The next call to q.get() returns that stale result instead of what it expects.

So you have a race condition here; that is already apparent from the fact that you add to the queue in a loop while there is no loop when pulling. You also assume the element you get from the queue is the result of what you put onto it before, and that doesn't have to be true: if you call has_perm twice, this results in two calls to fetch_worlds, with the possibility that the queue.empty() check fails for one of the calls, so only one result is put onto the queue. Now you have two threads waiting on q.get(), but only one of them will get a result, while the other waits until one becomes ready...
has_perm looks like it should be a blocking call anyway.
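Since has_perm blocks on q.get() anyway, the thread and queue buy you nothing but the race: you can run the query inline and always get live data. A minimal sketch of that idea, reusing the names from the question:

    def has_perm(self, username, perm, path=None):
        if perm not in ['r', 'w']:
            return False
        # Fresh query on every call, so the result is always live.
        worlds = models.World.objects.filter(is_synced=True, user__username=username)
        if path is not None:
            path = os.path.basename(path)
            for world in worlds:
                if path == "{0}_{1}.zip".format(username, world.name):
                    return True
        return False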

Related

Add webserver to existing python service

I have a script that continually runs, processing data that it gets from an external device. The core logic follows something like:
from external_module import process_data, get_data, load_interesting_things

class MyService:
    def __init__(self):
        self.interesting_items = load_interesting_things()
        self.run()

    def run(self):
        try:
            while True:
                data = get_data()
                for item in self.interesting_items:
                    item.add_datapoint(process_data(data, item))
        except KeyboardInterrupt:
            pass
I would like to add the ability to request information for the various interesting things via a RESTful API.
Is there a way in which I can add something like a Flask web service to the program such that the web service can get a stat from the interesting_items list to return? For example something along the lines of:
@app.route("/item/<idx>/average")
def average(idx: int):
    avg = interesting_items[idx].getAverage()
    return jsonify({"average": avg})
Assume the necessary idx bounds checking and any appropriate locking are implemented.
It does not have to be Flask, but it should be lightweight. I want to avoid using a database. I would prefer a web service, but if that is not possible without completely restructuring the code base I can use a socket instead, though that is less preferable.
The server would be running on a local network only and usually only handling a single user, sometimes it may have a few.
I needed to move the run() call out of the __init__() method, so that I could keep a global reference to the service and start the run method in a separate thread. Something along the lines of:
import threading

import flask
from flask import jsonify

service = MyService()  # __init__ no longer calls self.run()
service_thread = threading.Thread(target=service.run, daemon=True)
service_thread.start()

app = flask.Flask("appname")
...

@app.route("/item/<idx>/average")
def average(idx: int):
    avg = service.interesting_items[idx].getAverage()
    return jsonify({"average": avg})
...

app.run()
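On the locking point from the question: the service thread keeps mutating interesting_items while Flask handlers read them, so a shared lock is one lightweight option. A sketch, assuming run() takes the same module-level lock around its add_datapoint calls:

    lock = threading.Lock()

    @app.route("/item/<int:idx>/average")
    def average(idx: int):
        with lock:  # run() must hold the same lock while mutating items
            if not 0 <= idx < len(service.interesting_items):
                flask.abort(404)
            avg = service.interesting_items[idx].getAverage()
        return jsonify({"average": avg})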

Locust/Python: Splitting a tasks array with if conditions in a SequentialTaskSet

I'm new to Locust, and Python in general. I've been using JMeter for several years, and I'm trying to implement similar logic to what I've used there for handling failures in login.
I want to run a simple scenario: Login, Enter a folder, Logout
Where I'm having trouble is implementing this logic:
If any login call fails, I don't want to attempt the folder enter (to avoid a cascading failure after login), but I still want to run the logout to ensure there is no active session.
My current SequentialTaskSet reads like this, which works for executing the child tasksets:
class folder_enter_scalability_taskset(SequentialTaskSet):
    def on_start(self):
        self.tm = TransactionManager()

    @task
    def seedata_config(self):
        seeddata = next(seeddata_reader)
        self.client.username = seeddata['username']
        self.client.password = seeddata['password']

    tasks = [login_taskset, folder_enter_taskset, logout_taskset]
Is there a way to split up the tasks array into multiple steps within the same SequentialTaskSet?
Always Login
if login passed, enter folder
Always Logout
Thanks to phil in the comments for pointing me in the right direction; I now have working code with try-finally and if-else instead of a tasks array. Hope this can help someone else:
class folder_enter_scalability_taskset(SequentialTaskSet):
    def on_start(self):
        self.tm = TransactionManager()

    @task
    def seedata_config(self):
        seeddata = next(seeddata_reader)
        self.client.username = seeddata['username']
        self.client.password = seeddata['password']
        self.client.userrole = seeddata['userrole']

    @task
    def folder_enter_scalability(self):
        try:
            login_taskset.login(self)
            if self.client.loginFail is False:
                folder_enter_taskset.folder_enter(self)
            else:
                logging.error("Login failed, skipping to logout")
        finally:
            logout_taskset.logout(self)
I wasn't having any luck getting it to skip the folder enter when I placed it in a try-else-finally construct, so I went with adding this self.client.loginFail flag in the login taskset to support the if-else within the try. It is initialized to False and flipped to True if the login taskset fails.
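For reference, the flag only has to be maintained inside the login taskset. A minimal sketch of what that might look like (the /login endpoint and the response handling are assumptions, not code from the question):

    def login(self):
        self.client.loginFail = False
        with self.client.post("/login",
                              {"username": self.client.username,
                               "password": self.client.password},
                              catch_response=True) as response:
            if not response.ok:
                response.failure("login failed")
                self.client.loginFail = True  # checked later by folder_enter_scalability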

Requesting a Simpy resource never succeeds

I'm currently trying to model a service counter with SimPy, but I'm running into difficulties using yield to hold the resources. In Counter.arrive(), if the line "yield req" is present, the entire function seems to be skipped (at least I think that's what happens, since I don't get any of the print output). However, if I comment out that line, the code runs as if nothing happened. This is a problem because without the yield the code is not blocked until the request is approved, and the entire simulation fails because everyone gets to use the resource.
Code snippet as follows:
import simpy

class Counter:
    def __init__(self, env, name, staff):
        self.env = env
        self.staff = simpy.Resource(env, staff)
        self.name = name
        self.dreq = []

    def arrive(self, name):
        ...
        req = self.staff.request()
        yield req
        output = "Req: %s\n" % req
        self.dreq.append(req)
        ...
        print(output)

...
def customer(env, counter, name):
    print("Customer %s arrived at %s" % (name, env.now))
    counter.arrive(name)
    yield env.timeout(5)
    print("Customer %s left at %s" % (name, env.now))
...
env = simpy.Environment()
counter = Counter(env, "A", 1)

def setup(env, counter, MAX_CUST):
    for i in range(MAX_CUST):
        env.process(customer(env, counter, 1))
        yield env.timeout(1)

env.process(setup(env, counter, 5))
env.run(until=100)
Edit: I understand that yield should pause the function until the request is approved, but even the very first request never goes through, which does not make sense since one unit of the resource is available at the start.
Docs for convenience: https://simpy.readthedocs.io/en/3.0.6/topical_guides/resources.html
Requests (and timeouts, and everything else you need to yield) get processed by SimPy, so they have to reach SimPy in order to be processed. You tell SimPy to process customer with env.process:
env.process(customer(env,counter, 1))
In customer you call counter.arrive(name). Because arrive is a generator (because of the yield), the call does nothing until something calls next() on the returned generator. SimPy needs to know about it to process it properly. You can do this with:
env.process(counter.arrive(name))
which should fix your problem.
Note also that this code never releases the resource, so only one customer can ever be served.
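A minimal sketch that applies both points, using a with block so the request is released automatically (keeping the 5-time-unit service from the question's customer inside arrive is an assumption about the intended model):

    def arrive(self, name):
        # the with block releases the slot when it ends
        with self.staff.request() as req:
            yield req
            print("Req: %s" % req)
            yield self.env.timeout(5)  # hold the counter while being served

    def customer(env, counter, name):
        print("Customer %s arrived at %s" % (name, env.now))
        yield env.process(counter.arrive(name))  # hand the generator to SimPy and wait for it
        print("Customer %s left at %s" % (name, env.now))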

rq queue always empty

I'm using django-rq in my project.
What I want to achieve:
I have a first view that loads a template where an image is acquired from a webcam and saved on my PC. Then the view calls a second view, where an asynchronous task to process the image is enqueued using rq. Finally, after a 20-second delay, a third view is called, and in this latter view I'd like to retrieve the result of the asynchronous task.
The problem: the job object is correctly created, but the queue is always empty, so I cannot use queue.fetch_job(job_id). Reading here I managed to find the job in the FinishedJobRegistry, but I cannot access it, since the registry is not iterable.
from django_rq import job
import django_rq
from rq import Queue
from redis import Redis
from rq.registry import FinishedJobRegistry

redis_conn = Redis()
q = Queue('default', connection=redis_conn)
last_job_id = ''

def wait(request):  # second view, starts the job
    template = loader.get_template('pepper/wait.html')
    job = q.enqueue(processImage)
    print(q.is_empty())  # this is always True!
    last_job_id = job.id  # this is the expected job id
    return HttpResponse(template.render({}, request))

def ceremony(request):  # third view, retrieves the result
    template = loader.get_template('pepper/ceremony.html')
    print(q.is_empty())  # True
    registry = FinishedJobRegistry('default', connection=redis_conn)
    finished_job_ids = registry.get_job_ids()  # here I have the correct id (last_job_id)
    return HttpResponse(template.render({}, request))
The question: how can I retrieve the result of the asynchronous job from the finished job registry? Or, better, how can I correctly enqueue the job?
I have found another way to do it: I'm simply using a global list of jobs that I modify in the views. Anyway, I'd like to know the right way to do this...
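For what it's worth, a queue only holds jobs that are waiting to run, so q.is_empty() returning True just means a worker picked the job up quickly; it says nothing about the result. The result can be rehydrated by id with rq's Job.fetch. A minimal sketch building on the ceremony view (treating the last registry id as "your" job is an assumption; in practice you would persist job.id somewhere, e.g. in the session):

    from rq.job import Job

    def ceremony(request):
        registry = FinishedJobRegistry('default', connection=redis_conn)
        finished_job_ids = registry.get_job_ids()
        if finished_job_ids:
            job = Job.fetch(finished_job_ids[-1], connection=redis_conn)
            result = job.result  # return value of processImage
        ...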

Threading memory profiling

So I hope this isn't a duplicate; however, I either haven't been able to find an adequate solution or I'm just not 100% sure what I'm looking for. I've written a program to thread lots of requests. I create a thread to:

1. Fetch responses from a number of APIs such as this: share.yandex.ru/gpp.xml?url=MY_URL, as well as scrape blogs
2. Parse the responses of all the requests from the example above (JSON, using python-goose to extract articles)
3. Return the parsed results back to the primary thread and insert them into a database.

It all went well until it needed to pull back larger amounts of data, which I hadn't tested before. The primary problem is that it takes me over my memory limit on a shared Linux server (512 MB), triggering a kill. That should be enough memory, as it's only a few thousand requests, although I could be wrong. I'm clearing all large data variables/objects within the main thread, but that doesn't seem to help either.
I ran memory_profiler on the primary function that creates the threads, with a thread class that looks like this:
import os
import sys
from threading import Thread
from time import sleep

import chardet

# get_page, response_handler, errors and UPDATE_INTERVAL are defined elsewhere.

class URLThread(Thread):
    def __init__(self, request):
        super(URLThread, self).__init__()
        self.url = request['request']
        self.post_id = request['post_id']
        self.domain_id = request['domain_id']
        self.post_data = request['post_params']
        self.type = request['type']
        self.code = ""
        self.result = ""
        self.final_results = ""
        self.error = ""
        self.encoding = ""

    def run(self):
        try:
            self.request = get_page(self.url, self.type)
            self.code = self.request['code']
            self.result = self.request['result']
            self.final_results = response_handler(dict(result=self.result, type=self.type, orig_url=self.url))
            self.encoding = chardet.detect(self.result)
            self.error = self.request['error']
        except Exception as e:
            exc_type, exc_obj, exc_tb = sys.exc_info()
            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
            errors.append((exc_type, fname, exc_tb.tb_lineno, e, 'NOW()'))

@profile
def multi_get(uris, timeout=2.0):
    def alive_count(lst):
        alive = map(lambda x: 1 if x.isAlive() else 0, lst)
        return reduce(lambda a, b: a + b, alive)

    threads = [URLThread(uri) for uri in uris]
    for thread in threads:
        thread.start()
    while alive_count(threads) > 0 and timeout > 0.0:
        timeout = timeout - UPDATE_INTERVAL
        sleep(UPDATE_INTERVAL)
    return [{"request": x.url,
             "code": str(x.code),
             "result": x.result,
             "post_id": str(x.post_id),
             "domain_id": str(x.domain_id),
             "final_results": x.final_results,
             "error": str(x.error),
             "encoding": str(x.encoding),
             "type": x.type}
            for x in threads]
The results look like this on the first batch of requests I pump through it (linked as an image, since the profiler table isn't readable inline):
http://tinypic.com/r/28c147d/8
And it doesn't seem to drop any of the memory on subsequent passes (I'm batching 100 requests/threads through at a time). By this I mean that once a batch of threads is complete, they seem to stay in memory, and every time another batch runs, more memory is added, as below:
http://tinypic.com/r/nzkeoz/8
Am I doing something really stupid here?
Python will generally free the memory taken up by an object once there are no references to that object left. Your multi_get function returns a list of dicts that hold references to everything each thread produced (result, final_results, and so on), so as long as the caller keeps that list around, Python cannot free that memory. But we would need to see what the code that calls multi_get does with the return value to be sure.
To start freeing the memory, you need to stop holding onto those results after you have inserted them into the database, or at least delete the references at some point (e.g. del results).
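A sketch of the caller side under that advice (batches, insert_into_db, and all_uris are hypothetical names, not code from the question):

    def batches(seq, size):
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    for batch in batches(all_uris, 100):
        results = multi_get(batch)
        insert_into_db(results)  # step 3 from the question: write to the database
        del results              # drop the only reference so the whole batch can be collected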
