Kafka producer with Python multithreading

I'm planning to build a critical component that produces messages to Kafka. Is there any way that Python multithreading or asyncio will help us write an efficient Kafka producer?
Here I'm also planning to create threads on demand. The use case is that the Kafka producer script needs to consume text files and produce them to Kafka (the catch is that this needs to happen on demand).
The design will be: the Kafka producer script reads on-demand requests from a Redis/RabbitMQ queue, and once a request arrives, I plan to create a thread for that request. The request contains which text file to read and send.
Is it possible to implement this in Python using multithreading and asyncio? Any help on this is appreciated, thanks.
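A minimal sketch of what I have in mind, assuming kafka-python and redis-py (the topic and queue names below are placeholders):

    import json
    import threading

    import redis
    from kafka import KafkaProducer

    REDIS_QUEUE = "ondemand-requests"   # placeholder list key
    KAFKA_TOPIC = "textfile-lines"      # placeholder topic

    # KafkaProducer is thread-safe, so one instance can be shared by all threads.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    r = redis.Redis(host="localhost", port=6379)

    def handle_request(request):
        """Read the requested text file and produce each line to Kafka."""
        with open(request["file"]) as f:
            for line in f:
                producer.send(KAFKA_TOPIC, line.encode("utf-8"))
        producer.flush()

    def main():
        while True:
            # BLPOP blocks until an on-demand request appears on the Redis list.
            _, raw = r.blpop(REDIS_QUEUE)
            request = json.loads(raw)
            # One thread per request, as described above.
            threading.Thread(target=handle_request, args=(request,), daemon=True).start()

    if __name__ == "__main__":
        main()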

Related

Can I use Producer and Consumer Design with Multiprocessing and Threading together instead of kafka-python

Hi everyone, I need some guidance: is it easier to implement the producer and consumer design using only multiprocessing with threading, or should it be done with the kafka-python lib?
I am thinking of creating one producer process with multiple threads to deal with multiple API calls, and then creating multiple consumer processes which get data from a Queue and do some machine learning task.
Or I could use kafka-python to create the producers and consumers.
So I need some guidance on which solution is better, or if anyone has a more suitable solution, kindly guide me.
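Roughly, the pure multiprocessing/threading variant I have in mind looks like this (fetch_api and the print call stand in for the real API calls and the ML task):

    import multiprocessing
    import threading

    API_ENDPOINTS = ["a", "b", "c"]  # placeholder API identifiers
    NUM_CONSUMERS = 2

    def fetch_api(endpoint, queue):
        """Stand-in for a real API call; puts the result on the shared queue."""
        queue.put(f"data from {endpoint}")

    def producer(queue):
        # One producer process, multiple threads for the concurrent API calls.
        threads = [threading.Thread(target=fetch_api, args=(e, queue)) for e in API_ENDPOINTS]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        for _ in range(NUM_CONSUMERS):
            queue.put(None)  # sentinel telling each consumer to stop

    def consumer(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print("processing", item)  # stand-in for the machine learning task

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        consumers = [multiprocessing.Process(target=consumer, args=(q,)) for _ in range(NUM_CONSUMERS)]
        for c in consumers:
            c.start()
        producer(q)
        for c in consumers:
            c.join()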

Concurrent processing of messages (Python)

I have the following scenario:
There is one thread that manages a long-polling HTTP connection to an API (non-stop). When a new message arrives, it must be processed by a special process() method.
I want to design it so that incoming messages are processed concurrently, but there is another important point: at the end of each processing run, an answer should be passed to an outgoing queue, which is managed in a separate thread. From there the answers will be sent via HTTP.
Let's consider that there can be 30-50 messages per second, and the process() method will take from 1 up to 10 seconds.
The question is: what library or framework can I use to implement this architecture?
As far as I have researched, Tornado has good benchmarks, but I do not need a web framework here, just a tool that can run the message processors concurrently.
Your message rate is pretty low, so you may freely use "standard" tools like RabbitMQ/Redis, Celery, and asyncio.
RabbitMQ/Redis with Celery are great tools to implement queues and manage your tasks and processes.
asyncio is faster than Tornado, but that doesn't matter for your task. What matters more is that asyncio gives you all the benefits of the modern async/await coroutine technique.
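For illustration, a minimal asyncio sketch of that pipeline (long_poll() and the final HTTP send are just placeholders):

    import asyncio
    import random

    async def long_poll():
        """Placeholder for the long-polling HTTP connection."""
        n = 0
        while True:
            await asyncio.sleep(0.03)  # roughly 30 messages per second
            n += 1
            yield f"message-{n}"

    async def process(message, out_queue):
        # Placeholder for the 1-10 second process() method.
        await asyncio.sleep(random.uniform(1, 10))
        await out_queue.put(f"answer for {message}")

    async def sender(out_queue):
        """Drains the outgoing queue; in practice this would send answers via HTTP."""
        while True:
            answer = await out_queue.get()
            print("sending", answer)
            out_queue.task_done()

    async def main():
        out_queue = asyncio.Queue()
        asyncio.create_task(sender(out_queue))
        async for message in long_poll():
            # Each incoming message is processed concurrently in its own task.
            asyncio.create_task(process(message, out_queue))

    asyncio.run(main())

If process() is CPU-bound rather than I/O-bound, hand it off to a process pool via loop.run_in_executor instead of awaiting it directly.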

Need queue module to be shared between two applications

I need to share a queue between two applications on the same machine: one is Tornado, which is going to occasionally add messages to that queue, and the other is a Python script run from cron, which adds new messages on every iteration. Can anyone suggest a module for this?
(Can this be solved with Redis? I want to avoid using MySQL for this purpose.)
I would use Redis with a list. You can lpush an element onto the head and rpop to remove from the tail.
See redis rpop
and redis lpushx
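For example, with redis-py (the key name is a placeholder):

    import redis

    QUEUE_KEY = "shared-queue"  # placeholder list key shared by both applications
    r = redis.Redis(host="localhost", port=6379)

    # In the cron script (or in Tornado): push new messages onto the head of the list.
    r.lpush(QUEUE_KEY, "a new message")

    # On the consuming side: pop from the tail, giving FIFO order relative to lpush.
    message = r.rpop(QUEUE_KEY)
    if message is not None:
        print(message.decode("utf-8"))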
The purest way I can think of to do this is with IPC. Python has very good support for IPC between two processes when one process spawns the other, but not in your scenario. There are Python modules for IPC such as sysv_ipc and posix_ipc. But if your main application is built on Tornado, why not just have it listen on a ZeroMQ socket for published messages?
Here is a link with more information. You want the Publisher-Subscriber model.
http://zeromq.github.io/pyzmq/eventloop.html#tornado-ioloop
Your cron job will start and publish messages to a ZeroMQ socket. Your already-running application will receive them as the subscriber.
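Something like this with pyzmq (address and port are placeholders; note the PUB/SUB "slow joiner" caveat, which is why the short-lived cron publisher pauses briefly after connecting):

    # publisher.py -- run from cron; connects and publishes its messages
    import time
    import zmq

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.connect("tcp://127.0.0.1:5556")
    time.sleep(0.2)  # give the PUB/SUB connection time to establish before sending
    pub.send_string("a new message")
    pub.close()
    ctx.term()

And on the subscriber side, inside the long-running Tornado application:

    import zmq

    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.bind("tcp://127.0.0.1:5556")
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to all messages

    while True:
        print(sub.recv_string())  # with Tornado, wrap the socket in a ZMQStream instead of blocking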
Try RabbitMQ for hosting the queue independently of your applications, then access it using Pika, which even comes with a Tornado adapter. Just pick the appropriate model (queue/exchange/topic) and the message format you want (strings, JSON, XML, YAML) and you are set.
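For example, with Pika (the queue name is a placeholder; basic_get is the simplest way to pull a single message):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="shared-queue")  # placeholder queue name

    # Producer side (cron script or Tornado): publish a message to the queue.
    channel.basic_publish(exchange="", routing_key="shared-queue", body="a new message")

    # Consumer side: pull one message if one is available.
    method, properties, body = channel.basic_get(queue="shared-queue", auto_ack=True)
    if method is not None:
        print(body.decode("utf-8"))

    connection.close()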

Python Server, Job Queue, Launch Multiprocessing Job

I need to create a Python server that can accept multiple job requests. It should process the jobs one at a time, but the server must still be able to accept new jobs while a task is being processed.
Does anyone have any suggestions on how to do this?
Thanks
Sure. Create a multiprocessing.Pool, which by default spawns one process per core (pass processes=1 if jobs really must run strictly one at a time). Then use the original process to run an HTTP service, or something else that accepts jobs via some protocol. The main process listens for new requests and submits them to the pool for asynchronous processing.
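A minimal sketch of that idea with socketserver and multiprocessing.Pool (the port, the line-based protocol, and run_job are placeholders):

    import multiprocessing
    import socketserver
    import time

    def run_job(payload):
        """Placeholder for the real job; here it just sleeps."""
        time.sleep(5)
        return f"done: {payload}"

    class JobHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # Each incoming line is treated as one job request.
            payload = self.rfile.readline().decode("utf-8").strip()
            self.server.pool.apply_async(run_job, (payload,))
            self.wfile.write(b"accepted\n")

    if __name__ == "__main__":
        server = socketserver.ThreadingTCPServer(("127.0.0.1", 9999), JobHandler)
        # One worker process, so jobs queue up and run one at a time
        # while the server keeps accepting new requests.
        server.pool = multiprocessing.Pool(processes=1)
        server.serve_forever()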
Use Twisted. Twisted is an event-driven networking engine, and it supports many common network protocols, including SMTP, POP3, IMAP, SSHv2, and DNS.

Python stdout to queue broker or a websocket

Is there a way to push stdout into a queue broker or to a websocket?
So far I've been unable to find a clear explanation on how to do this.
I have several processes running in parallel, and the idea is to create a UI where you can switch from process to process and take a look at what each one is doing.
One approach that will work (it is non-blocking and can serve multiple clients) is using Python, Twisted, and Autobahn:
Connect one or multiple ProcessProtocol instances
http://twistedmatrix.com/documents/current/core/howto/process.html
to a Twisted WebSocket server
https://github.com/tavendo/AutobahnPython
https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/broadcast
https://github.com/tavendo/AutobahnPython/tree/master/examples/wamp/pubsub/simple
Disclosure: I am author of Autobahn.
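A rough sketch of that combination (worker.py stands in for the script whose output you want to watch):

    import sys

    from autobahn.twisted.websocket import WebSocketServerFactory, WebSocketServerProtocol
    from twisted.internet import protocol, reactor

    class BroadcastFactory(WebSocketServerFactory):
        """Tracks connected WebSocket clients and broadcasts to all of them."""
        def __init__(self, url):
            WebSocketServerFactory.__init__(self, url)
            self.clients = []

        def broadcast(self, data):
            for client in self.clients:
                client.sendMessage(data, isBinary=False)

    class StreamProtocol(WebSocketServerProtocol):
        def onOpen(self):
            self.factory.clients.append(self)

        def onClose(self, wasClean, code, reason):
            if self in self.factory.clients:
                self.factory.clients.remove(self)

    class ChildOutput(protocol.ProcessProtocol):
        """ProcessProtocol that forwards the child's stdout to the WebSocket clients."""
        def __init__(self, factory):
            self.factory = factory

        def outReceived(self, data):
            self.factory.broadcast(data)

    if __name__ == "__main__":
        factory = BroadcastFactory("ws://127.0.0.1:9000")
        factory.protocol = StreamProtocol
        reactor.listenTCP(9000, factory)
        # "-u" keeps the child's stdout unbuffered so output arrives as it is produced.
        reactor.spawnProcess(ChildOutput(factory), sys.executable,
                             [sys.executable, "-u", "worker.py"])
        reactor.run()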
