Pika - RabbitMQ, using Basic.get to consume a single message from a queue - python

I'm using the method shown here like this:
method_frame = None
while method_frame is None:
    method_frame, header_frame, body = channel.basic_get("test_queue")
This polling doesn't look very efficient, because basic_get returns even when the queue is empty, so I keep getting empty responses.
I need logic that takes a single message only when I have the opportunity to handle it, which is why I chose basic.get rather than basic.consume.
Does anybody have an idea for more efficient polling, perhaps using some other mechanism in the pika library?

Try using basic.consume(ack=true) with basic.qos(prefetch_count=1).
You will need to see how to do that with your particular library.
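In pika 1.x, for example, a minimal sketch of that approach with a BlockingConnection might look like this (the connection parameters, queue name, and handle_message are placeholders, not part of the original answer):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Deliver at most one unacknowledged message at a time.
channel.basic_qos(prefetch_count=1)

def on_message(ch, method, properties, body):
    handle_message(body)  # placeholder for your processing logic
    # Acknowledge only after processing, so the broker sends the next message.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="test_queue", on_message_callback=on_message)
channel.start_consuming()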

Related

multi-thread application with queue, best approach to deliver each reply to the right caller?

Consider a multi-thread application, in which different pieces of code send commands to a background thread/service through a command queue, and consequently the service puts the replies in a reply queue. Is there a commonly accepted “strategy” for ensuring that a specific reply gets delivered to the rightful caller?
Coming to my specific case (a program in Python 3), I was thinking about setting both the command and reply queues to maxsize=1, so that each caller can just put the command and wait for the reply (which will surely be its own), but this could potentially affect the performance of the application. Alternatively, I could send a unique code (a hash or similar) with the command and have the background service include that same string in the reply, so that a caller can go through the replies, looking for its own and putting the other replies back in the queue. Honestly, I don't like either of these. Is there something else that could be done?
I’m asking this because I’ve spent a fair amount of hours investigating online about threading, and reading through the official documentation, but I couldn’t make up my mind on this. I’m unsure which could be the right/best approach and most importantly I’d like to know if there is a mainstream approach to achieve this.
I don’t provide any code because the question deals with general application design.
Associating a unique identifier with each request is basically the standard solution to this problem.
This is the solution employed by protocols from various eras, from DNS to HTTP/2.
You can build whatever abstractions you like on top of it. Consider this semi-example using Twisted's Deferred:
from twisted.internet.defer import Deferred, inlineCallbacks

def request(args):
    uid = next(id_generator)
    request_queue.put((uid, args))
    result = waiting[uid] = Deferred()
    return result

def process_responses():
    uid, response = response_queue.get()
    result = waiting.pop(uid)
    result.callback(response)

@inlineCallbacks
def foo_doer():
    foo = yield request(...)
    # foo is the response from the response queue.
The basic mechanism is nothing more than unique-id-tagged items in the two queues. But the user isn't forced to track these UIDs. Instead, they get an easy-to-use abstraction that just gives them the result they want.
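For comparison, the same tagging scheme can be sketched with just the standard library, using concurrent.futures.Future in place of Deferred (do_work and the wiring below are illustrative, not part of the original example):

import itertools
import queue
import threading
from concurrent.futures import Future

id_generator = itertools.count()
request_queue = queue.Queue()
waiting = {}

def request(args):
    uid = next(id_generator)
    waiting[uid] = future = Future()
    request_queue.put((uid, args))
    return future

def service_loop():
    # Background service: handle each command and resolve the caller's Future.
    while True:
        uid, args = request_queue.get()
        waiting.pop(uid).set_result(do_work(args))  # do_work is a placeholder

threading.Thread(target=service_loop, daemon=True).start()
reply = request("some command").result()  # each caller blocks only on its own reply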

python best way to publish/receive data between programs

I'm trying to figure out the best way to publish and receive data between separate programs. My ideal setup is to have one program constantly receive market data from an external websocket api and to have multiple other programs use this data. Since this is market data from an exchange, the lower the overhead the better.
My first thought was to write out a file and have the others read it, but that seems like there would be file-locking issues. Another approach I tried was to use UDP sockets, but it seems like the socket blocks the rest of the program when receiving. I'm pretty new at writing full-fledged programs instead of little scripts, so sorry if this is a dumb question. Any suggestions would be appreciated. Thanks!
You can use SQS. It is easy to use, and the Python documentation for it is great. If you want a free option, you can use Kafka.
Try something like a message queue, e.g. https://github.com/kr/beanstalkd, and you essentially control it via the clients: one that collects and sends, and one that consumes and marks what it has read, and so on.
Beanstalk is super-lightweight and simple compared to other message queues, which are more like multi-app systems than queues as such.
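As a hypothetical put/reserve round trip, assuming a beanstalkd server on the default port and the third-party greenstalk client (the payload is illustrative, and note that each job goes to a single consumer rather than being broadcast):

import greenstalk  # third-party client: pip install greenstalk

# Producer: the program receiving market data pushes each update.
producer = greenstalk.Client(("127.0.0.1", 11300))
producer.put('{"symbol": "BTC-USD", "price": 42000.0}')

# Consumer: a reading program reserves a job, processes it, then
# deletes it to mark it as done.
consumer = greenstalk.Client(("127.0.0.1", 11300))
job = consumer.reserve()  # blocks until a job is available
print(job.body)
consumer.delete(job)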

Handling multiple clients with twisted and spyne

I'm trying to create a simple python server that can handle multiple RPC calls at the same time. I would like to use twisted for the networking and spyne to handle the RPCs. I found a good example in the spyne github repo here, but when I make a call to say_hello_with_sleep using curl I get an error.
exceptions.AssertionError: It looks like this protocol is not async-compliant yet
This is the only one of the RPCs that doesn't seem to work, and it is the one that demonstrates the kind of nonblocking call I'm looking for.
The final RPCs that I need to implement will take around 40 seconds to process before returning a response, and I'm honestly not sure if this is the best way to go about handling multiple requests at the same time.
Any help or direction would be greatly appreciated. Thanks!
This is fixed and will be released as part of Spyne 2.13.
You can use code from the master branch of http://github.com/arskom/spyne if you can't wait an indefinite amount of time until the release. Code only gets merged there if it passes all tests.
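If you go that route, pip can install straight from the master branch (assuming git is available on your machine):

pip install git+https://github.com/arskom/spyne.git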

Python Tornado - making POST return immediately while async function keeps working

so I have a handler below:
class PublishHandler(BaseHandler):
    def post(self):
        message = self.get_argument("message")
        some_function(message)
        self.write("success")
The problem I'm facing is that some_function() takes some time to execute, and I would like the POST request to return straight away, with some_function() executed in another thread or process if possible.
I'm using berkeley db as the database and what I'm trying to do is relatively simple.
I have a database of users, each with a filter. If a filter matches the message, the server sends the message to that user. Currently I'm testing with thousands of users, so each publication of a message via a POST request iterates through thousands of users to find a match. This is my naive implementation, hence my question: how do I do this better?
You might be able to accomplish this by using your IOLoop's add_callback method like so:
loop.add_callback(lambda: some_function(message))
Tornado will execute the callback in the next IOLoop pass, which may (I'd have to dig into Tornado's guts to know for sure, or alternatively test it) allow the request to complete before that code gets executed.
The drawback is that the long-running code you've written will still take time to execute, and this may end up blocking another request. That's not ideal if you have a lot of these requests coming in at once.
The more foolproof solution is to run it in a separate thread or process. The best way in Python is to use a process, due to the GIL (I'd highly recommend reading up on that if you're not familiar with it). However, on a single-processor machine the threaded implementation will work just as well, and may be simpler to implement.
If you're going the threaded route, you can build a nice "async executor" module with a mutex, a thread, and a queue. Check out the multiprocessing module if you want to go the route of using a separate process.
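A minimal sketch of that threaded executor idea, with the Queue providing the necessary locking (submit and the worker loop are illustrative names, not Tornado APIs):

import queue
import threading

_tasks = queue.Queue()

def _worker():
    # Drain the queue forever, running each submitted callable in turn.
    while True:
        func, args = _tasks.get()
        try:
            func(*args)
        finally:
            _tasks.task_done()

threading.Thread(target=_worker, daemon=True).start()

def submit(func, *args):
    # Enqueue work and return immediately; the handler does not wait for func.
    _tasks.put((func, args))

The handler would then call submit(some_function, message) and write its response straight away.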
I've tried this, and I believe the request does not complete before the callbacks are called.
I think a dirty hack would be to call two levels of add_callback, e.g.:
def get(self):
    ...
    def _defered():
        ioloop.add_callback(<whatever you want>)
    ioloop.add_callback(_defered)
    ...
But these are hacks at best. I'm looking for a better solution right now, probably will end up with some message queue or simple thread solution.

Python twisted asynchronous write using deferred

With regard to the Python Twisted framework, can someone explain to me how to write asynchronously a very large data string to a consumer, say the protocol.transport object?
I think what I am missing is a write(data_chunk) function that returns a Deferred. This is what I would like to do:
data_block = get_lots_and_lots_data()
CHUNK_SIZE = 1024  # write 1 KiB at a time

def write_chunk(data, i):
    d = transport.deferredWrite(data[i:i + CHUNK_SIZE])
    d.addCallback(lambda _: write_chunk(data, i + CHUNK_SIZE))

write_chunk(data_block, 0)
But, after a day of wandering around in the Twisted API and documentation, I can't seem to locate anything like a deferredWrite equivalent. What am I missing?
As Jean-Paul says, you should use IProducer and IConsumer, but you should also note that the lack of deferredWrite is a somewhat intentional omission.
For one thing, creating a Deferred for potentially every byte of data that gets written is a performance problem: we tried it in the web2 project and found that it was the most significant performance issue with the whole system, and we are trying to avoid that mistake as we backport web2 code to twisted.web.
More importantly, however, having a Deferred which gets returned when the write "completes" would provide a misleading impression: that the other end of the wire has received the data that you've sent. There's no reasonable way to discern this. Proxies, smart routers, application bugs and all manner of network contrivances can conspire to fool you into thinking that your data has actually arrived on the other end of the connection, even if it never gets processed. If you need to know that the other end has processed your data, make sure that your application protocol has an acknowledgement message that is only transmitted after the data has been received and processed.
The main reason to use producers and consumers in this kind of code is to avoid allocating memory in the first place. If your code really does read all of the data that it's going to write to its peer into a giant string in memory first (data_block = get_lots_and_lots_data() pretty directly implies that) then you won't lose much by doing transport.write(data_block). The transport will wake up and send a chunk of data as often as it can. Plus, you can simply do transport.write(hugeString) and then transport.loseConnection(), and the transport won't actually disconnect until either all of the data has been sent or the connection is otherwise interrupted. (Again: if you don't wait for an acknowledgement, you won't know if the data got there. But if you just want to dump some bytes into the socket and forget about it, this works okay.)
If get_lots_and_lots_data() is actually reading a file, you can use the included FileSender class. If it's something which is sort of like a file but not exactly, the implementation of FileSender might be a useful example.
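As a rough illustration of using FileSender (send_file and the cleanup callback below are assumptions for the sketch, and error handling is omitted):

from twisted.protocols.basic import FileSender

def send_file(path, transport):
    f = open(path, "rb")
    sender = FileSender()
    # beginFileTransfer streams the file to the transport using the
    # producer/consumer API and returns a Deferred that fires when done.
    d = sender.beginFileTransfer(f, transport)

    def done(last_byte):
        f.close()
        transport.loseConnection()

    d.addCallback(done)
    return d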
The way large amounts of data are generally handled in Twisted is via the producer/consumer APIs. This doesn't give you a write method that returns a Deferred, but it does give you notification about when it's time to write more data.
