Sending messages to Faust topic from local file / List - python

I want to be able to either consume messages from a Kafka broker or a local file with data in it. How do I do this with Faust without writing a very similar function without Faust that just uses a simple for loop to iterate over messages?
Or is it better to just avoid Faust in this case? Still learning all of this, not sure if this should be even done.
@app.agent(input_topic)
async def myagent(stream):
    async for item in stream:
        result = do_something(item)
        await output_topic.send(value=result)
How do I modify this code block to be able to accept messages from a given file/list as well (depending on a config variable that will be set)? Or to send the messages from a file/list to the input topic?

As you said, you don't need Faust for this (plus, Faust can't read files).
Use kafka-python, aiokafka, etc. Open the file with open('file') like you would any other file, read it, and then produce the data from it to the topic.
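A minimal sketch of that approach with kafka-python, assuming a broker at localhost:9092, a topic named input_topic, and a newline-delimited file named messages.txt (all of these names are assumptions, not from the question):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

with open("messages.txt", "rb") as f:
    for line in f:
        line = line.strip()
        if line:
            # each non-empty line becomes one Kafka message
            producer.send("input_topic", value=line)

producer.flush()

The existing Faust agent can then stay unchanged and simply consume from input_topic, whether the data originally came from another producer or from the file.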

If you need stateful processing, bytewax is a good option instead of Faust because of the flexibility of inputs.

Related

Kafka consume single message on request

I want to make a Flask application/API with gunicorn that, on every request:
reads a single value from a Kafka topic
does some processing
and returns the processed value to the user (or whatever application is calling the API).
So far I couldn't find any examples of this. So, is the following function the correct way of doing it?
consumer = KafkaConsumer(
    "first_topic",
    bootstrap_servers='xxxxxx',
    auto_offset_reset='xxxx',
    group_id="my_group")

def get_value_from_topic():
    for msg in consumer:
        return msg

if __name__ == "__main__":
    print(get_value_from_topic())
Or is there any better way of doing this using any library like Faust?
My reason for using Kafka is to avoid all the hassle of synchronization among the Flask workers (as in the case of a traditional database), because I want each value from Kafka to be used only once.
This seems okay at first glance. Your consumer iterator is iterated once, and you return that value.
The more idiomatic way to do that would be like this, however:
def get_value_from_topic():
    return next(consumer)
With your other settings, though, there's no guarantee this polls only one message, because Kafka consumers poll in batches and will auto-commit those batches of offsets. You'll therefore want to disable auto-commit and handle offset commits on your own: committing after handling the HTTP request gives you at-least-once delivery, while committing before gives you at-most-once. Since you're interacting with an HTTP server, Kafka can't give you exactly-once processing.
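As a rough sketch of the at-least-once variant with kafka-python (the broker address, the max_poll_records setting, and the do_something helper are assumptions for illustration):

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "first_topic",
    bootstrap_servers="localhost:9092",
    group_id="my_group",
    enable_auto_commit=False,  # take control of offset commits
    max_poll_records=1)        # fetch at most one record per poll

def get_value_from_topic():
    msg = next(consumer)              # block until one record arrives
    result = do_something(msg.value)  # hypothetical per-request processing
    consumer.commit()                 # commit only after the request is handled
    return result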

RabbitMQ cleanup after consumers

I am having a problem handling the following scenario:
I have one publisher which wants to upload a lot of binary information (like images), so instead I want it to save the image and upload a path or some reference to that file.
I have multiple different consumers which are reading from this MQ and do different things.
In order to do that, I simply send the information in fan-out to some exchange and define a separate queue for each of the different consumers.
This could work just fine, except that it trashes the FS, since no one is responsible for deleting the saved images. I need some way of defining a hook for the moment every consumer is done consuming a message from an exchange. Maybe setting some callback for the cleanup of the message in the exchange?
A few notes:
Everything happens locally, we can assume that everything is on the same FS for simplicity.
I know that I can simply let the publisher save the image and give FS links for the different consumers, but this solution is problematic, since I want the publisher to be oblivious to the consumers. I don't want to update the publisher's code every time a new consumer may be used (or one can be removed).
I am working with Python (the pika module).
I am new to Message Queues, so if you have a better suggestion to get things done, I would love to learn about it.
Once the image is processed by a consumer, publish a FileProcessed message with the information related to the file. That message can be picked up by another consumer which is in charge of cleaning up, and that consumer will remove the file.
Additionally, make sure that your messages are re-queued in case of failure, so they will be picked up later and their processing will be retried. Make sure the retry count is limited; when the limit is reached, route the message to a dead letter exchange.
Some useful links below:
pika.BasicProperties for handling retries.
RabbitMQ tutorial
Pika DLX Implementation
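A minimal sketch of such a cleanup consumer with pika, assuming a file_processed queue and JSON messages containing the file path (queue name, connection parameters, and message format are all assumptions):

import json
import os
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="file_processed", durable=True)

def on_file_processed(ch, method, properties, body):
    info = json.loads(body)        # e.g. {"path": "/tmp/image-123.png"}
    try:
        os.remove(info["path"])    # delete the file once processing is confirmed
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except OSError:
        # re-queue so the cleanup can be retried later
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue="file_processed", on_message_callback=on_file_processed)
channel.start_consuming()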

Is there a way to access MQTT messages in Python as a generator/iterator?

Currently I need to write a Python script which should analyze some data provided via MQTT. The method I have to use for this needs a generator/iterator as a parameter. Sadly, it seems like the paho-mqtt lib in Python can only access the messages via the on_message callback method, and just putting a 'yield' in that callback won't work. Is there a way to access the published messages as a generator, or is there a possibility to put them into one (maybe via multithreading), or is there maybe another package I could use for this? I am not that familiar with Python and I couldn't find a solution.
Hope someone has an idea.
Cheers
Niklas
Meanwhile, there is such a library: asyncio-mqtt
It is a wrapper around paho-mqtt that builds on top of standard asyncio.
The nice thing is that it provides a context-manager/generator interface for subscribing to and reading messages, which makes it really convenient to work with. No more callbacks. From the docs:
async with Client("test.mosquitto.org") as client:
    async with client.filtered_messages("floors/+/humidity") as messages:
        await client.subscribe("floors/#")
        async for message in messages:
            print(message.payload.decode())
It is also nice that it can be combined with other async code to add features, for instance a timeout, or merging several async streams into one (see aiostream). The (async) generator approach is really modular and flexible, which is not the case with the callback approach.
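For example, here is a rough sketch of adding a per-message timeout around the same asyncio-mqtt snippet; the queue-pump pattern and the 5-second timeout are my own additions, not from the asyncio-mqtt docs:

import asyncio
from asyncio_mqtt import Client

async def pump(queue):
    # forward MQTT messages into an asyncio.Queue
    async with Client("test.mosquitto.org") as client:
        async with client.filtered_messages("floors/+/humidity") as messages:
            await client.subscribe("floors/#")
            async for message in messages:
                await queue.put(message)

async def main():
    queue = asyncio.Queue()
    pump_task = asyncio.create_task(pump(queue))
    try:
        while True:
            try:
                # wait for the next message, but give up after 5 seconds
                message = await asyncio.wait_for(queue.get(), timeout=5.0)
            except asyncio.TimeoutError:
                print("no message within 5 seconds, doing other work")
                continue
            print(message.payload.decode())
    finally:
        pump_task.cancel()

asyncio.run(main())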

python best way to publish/receive data between programs

I'm trying to figure out the best way to publish and receive data between separate programs. My ideal setup is to have one program constantly receive market data from an external websocket api and to have multiple other programs use this data. Since this is market data from an exchange, the lower the overhead the better.
My first thought was to write out a file and have the others read it, but it seems like there would be file-locking issues. Another approach I tried was to use UDP sockets, but it seems like the socket blocks the rest of the program when receiving. I'm pretty new at writing full-fledged programs instead of little scripts, so sorry if this is a dumb question. Any suggestions would be appreciated. Thanks!
You can use SQS. It is easy to use and the Python documentation for it is great. If you want a free option, you can use Kafka.
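A quick sketch of what the SQS route could look like with boto3 (the queue URL and message body here are placeholders, not from the question):

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/market-data"

# publisher side: the program receiving websocket data pushes each update
sqs.send_message(QueueUrl=queue_url, MessageBody='{"symbol": "BTCUSD", "price": 42000}')

# consumer side: other programs poll and delete what they have processed
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])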
Try something like a message queue, e.g. https://github.com/kr/beanstalkd, and you essentially control it via the client: one program collects and sends, another consumes and marks what it has read, and so on.
Beanstalk is super-lightweight and simple compared to other message queues, which are more like multi-app systems rather than just queues.
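For instance, a minimal beanstalkd sketch using the greenstalk client (the choice of greenstalk, the example payload, and the localhost address are assumptions; other beanstalkd clients work similarly):

import greenstalk

# producer side: the program that collects market data pushes each update
producer = greenstalk.Client(("127.0.0.1", 11300))
producer.put('{"symbol": "BTCUSD", "price": 42000}')
producer.close()

# consumer side: another program reserves jobs and deletes what it has handled
consumer = greenstalk.Client(("127.0.0.1", 11300))
job = consumer.reserve()
print(job.body)
consumer.delete(job)
consumer.close()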

Pika - RabbitMQ, using Basic.get to consume a single message from a queue

I'm using the method shown here like this:
method_frame = None
while method_frame is None:
    method_frame, header_frame, body = channel.basic_get("test_queue")
It looks like this polling is not very efficient, because basic_get also runs when the queue is empty and brings back empty results.
I need a kind of logic that takes a single message only when I have the opportunity to take care of it; that's why I chose basic.get and not basic.consume.
Does anybody have an idea for more efficient polling, maybe by using some other mechanism from the pika library?
Try using basic.consume (with acknowledgements enabled) together with basic.qos(prefetch_count=1).
You need to see how to do that with your particular library.
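With pika's BlockingConnection that could look roughly like this (queue name, connection parameters, and the do_something helper are assumptions); prefetch_count=1 means the broker will not deliver a second message until the current one has been acknowledged, so you only ever hold one message at a time:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="test_queue")
channel.basic_qos(prefetch_count=1)  # at most one unacknowledged message at a time

def handle(ch, method, properties, body):
    do_something(body)                              # hypothetical processing step
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only once you're done

channel.basic_consume(queue="test_queue", on_message_callback=handle, auto_ack=False)
channel.start_consuming()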
