I have several doubts about how to design a simple Python program.
I have opened a socket to a server that streams data via a simple telnet server.
I receive three types of strings, beginning with RED, BLUE, or YELLOW, each followed by the data. For example:
RED 21763;22;321321
BLUE 1;32132;3432
BLUE 1222;332;3
YELLOW 1;32132;3432
I would like to split the data into three objects, such as queues, and then fork three processes that work on the data in parallel as it arrives on the socket, in a sort of very basic realtime computation.
So, to achieve my goal, should I use threads or forked processes, with objects like queues for interprocess communication?
Or is there a different approach that I could use? I don't know anything about multithreaded programming :)
Thanks for helping.
This should give you a brief idea of threads vs fork.
Creating threads requires much less overhead than forking processes, so I would go with a threaded architecture. Each of the three thread functions is given the queue on which it needs to do its realtime computation. Synchronization and mutual-exclusion mechanisms will prevent unexpected behavior. You can also use valgrind with drd for debugging your multithreaded program (drd checks native POSIX-thread code, so it mainly helps if part of the program is written in C).
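As a rough sketch of the one-queue-per-thread idea, assuming each line is a prefix, a space, and ';'-separated integers as in the question's samples: the `worker` and `dispatch` names and the "sum the fields" computation are placeholders of mine, not something from the question. In a real program the lines would come from the socket (for example via `sock.makefile()`) rather than from a list.

```python
import queue
import threading

COLORS = ("RED", "BLUE", "YELLOW")
STOP = object()  # sentinel that tells a worker to exit


def worker(color, inbox, results):
    """Consume payloads for one color until the STOP sentinel arrives."""
    while True:
        payload = inbox.get()
        if payload is STOP:
            break
        # Placeholder computation: parse the ';'-separated integers and sum them.
        values = [int(v) for v in payload.split(";")]
        results.put((color, sum(values)))


def dispatch(lines):
    """Route each line to the queue for its prefix and collect the results."""
    queues = {color: queue.Queue() for color in COLORS}
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(color, queues[color], results))
        for color in COLORS
    ]
    for t in threads:
        t.start()
    for line in lines:
        color, _, payload = line.strip().partition(" ")
        if color in queues:  # silently skip unknown prefixes
            queues[color].put(payload)
    for q in queues.values():
        q.put(STOP)
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)


sample = [
    "RED 21763;22;321321",
    "BLUE 1;32132;3432",
    "BLUE 1222;332;3",
    "YELLOW 1;32132;3432",
]
totals = dispatch(sample)
print(totals)
```

`queue.Queue` does its own locking, so the workers need no extra mutex as long as they only share data through the queues.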
Related
I need to write a Python script which performs several tasks:
read commands from console and send to server over tcp/ip
receive server response, process and make output to console.
What is the best way to create such a script? Do I have to create a separate thread to listen for server responses while interacting with the user in the main thread? Are there any good examples?
Calling for a best way or code examples is rather off topic, but this is too long to be a comment.
There are three general ways to build such terminal-emulator-like applications:
multiple processes - the way the good old Unix cu worked, with a fork
multiple threads - a variant of the above using lightweight threads instead of processes
the select system call with multiplexed I/O.
Generally, the first two methods are considered more straightforward to code, with one thread (or process) handling upward communication while the other handles the downward one. The third, while trickier to code, is generally considered more efficient.
As Python supports multithreading, multiprocessing, and the select call, you can choose any method, with a slight preference for multithreading over multiprocessing, because threads are lighter than processes and I cannot see a reason to use processes here.
The following is just my opinion:
Unless you are writing a prototype that you plan to rewrite later in a lower-level language, I assume that performance is not the key issue, and my advice would be to use threads here.
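A minimal sketch of the threaded variant, assuming a newline-terminated text protocol (the question does not say what the protocol is): a background thread prints whatever the server sends while the main thread sends commands. The `reader`, `run_client`, and `echo_server` names are made up for illustration, and the echo server over `socket.socketpair()` only stands in for a real TCP server so that the example runs on its own.

```python
import socket
import threading


def reader(sock, on_line):
    """Background thread: deliver each response line until the peer closes."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            on_line(line.rstrip("\r\n"))


def run_client(sock, commands, on_line=print):
    """Send commands from the calling thread while a second thread reads replies."""
    t = threading.Thread(target=reader, args=(sock, on_line), daemon=True)
    t.start()
    for cmd in commands:  # in a real script this could iterate over sys.stdin
        sock.sendall((cmd + "\n").encode("utf-8"))
    sock.shutdown(socket.SHUT_WR)  # tell the server no more commands are coming
    t.join(timeout=5)


def echo_server(conn):
    """Stand-in server (assumption): echoes every line back with a prefix."""
    with conn:
        buf = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                conn.sendall(b"echo: " + line + b"\n")


client, server = socket.socketpair()
threading.Thread(target=echo_server, args=(server,), daemon=True).start()
received = []
run_client(client, ["status", "quit"], on_line=received.append)
client.close()
print(received)
```

With a real server you would replace the `socketpair()` call with `socket.create_connection((host, port))` and pass `print` as `on_line`.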
I need to write a very specific data processing daemon.
Here is how I thought it could work with multiprocessing :
Process #1: One process to fetch some vital metadata. It can be fetched every second, but it must be available in process #2. Process #1 writes the data, and Process #2 reads it.
Process #2: Two processes which will fetch the real data based on what has been received in process #1. Fetched data will be stored into a (big) queue to be processed "later"
Process #3: Two (or more) processes which poll the queue created in Process #2 and process those data. Once done, a new queue is filled up to be used in Process #4
Process #4 : Two processes which will read the queue filled by Process(es) #3 and send the result back over HTTP.
The idea behind all these different processes is to specialize them as much as possible and to make them as independent as possible.
All those processes will be wrapped into a main daemon, which is implemented here:
http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I am wondering whether what I have imagined is relevant/stupid/overkill/etc., especially since I would run daemon multiprocessing.Process(es) within a main parent process which will itself be daemonized.
Furthermore, I am a bit concerned about potential locking problems. In theory, the processes that read and write data use different variables/structures, so that should avoid a few problems, but I am still concerned.
Maybe using multiprocessing for my context is not the right thing to do. I would love to get your feedback about this.
Notes :
I can not use Redis as a data structure server
I thought about using ZeroMQ for IPC but I would avoid using another extra library if multiprocessing can do the job as well.
Thanks in advance for your feedback.
Generally, your division into different workers with different tasks, as well as your plan for letting them communicate, already looks good. However, one thing you should be aware of is whether a processing step is I/O bound or CPU bound. If you are I/O bound, I'd go for the threading module whenever you can: the memory footprint of your application will be smaller and communication between threads can be more efficient, as shared memory is allowed. Only go for multiprocessing if you need additional CPU power. In your system you can use both (it looks like the workers in process #3 will do some heavy computing, while the other workers will predominantly be I/O bound).
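To make the staged layout concrete, here is a small sketch with two workers per stage connected by queues. It uses threads and `queue.Queue` throughout so that it runs as a plain script; for the CPU-bound stage you could swap in `multiprocessing.Process` and `multiprocessing.Queue`, which expose a very similar interface. The `fetch` and `crunch` functions and the helper names are placeholders I made up, not part of the question.

```python
import queue
import threading

STOP = object()  # sentinel, one is sent per worker to shut a stage down


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        outbox.put(func(item))


def start_stage(func, inbox, outbox, workers):
    threads = [
        threading.Thread(target=stage, args=(func, inbox, outbox))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    return threads


def finish_stage(threads, inbox):
    """Send one STOP per worker, then wait for the stage to drain."""
    for _ in threads:
        inbox.put(STOP)
    for t in threads:
        t.join()


def fetch(key):
    # Stand-in for the I/O-bound "fetch the real data" step.
    return {"key": key, "raw": key * 2}


def crunch(record):
    # Stand-in for the heavier processing step.
    return (record["key"], record["raw"] ** 2)


to_fetch, fetched, done = queue.Queue(), queue.Queue(), queue.Queue()
fetchers = start_stage(fetch, to_fetch, fetched, workers=2)
crunchers = start_stage(crunch, fetched, done, workers=2)

for key in range(5):
    to_fetch.put(key)

finish_stage(fetchers, to_fetch)   # stop upstream first so nothing is lost
finish_stage(crunchers, fetched)

results = sorted(done.get() for _ in range(done.qsize()))
print(results)
```

Because each stage only touches its own inbox and outbox, the locking concern mostly disappears: the queues are the only shared objects and they are already thread-safe.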
This is for a moderation bot for C&C Renegade, in case anyone wants some background.
I have a class which will act as a parent to a load of subclasses that provide IRC connections, connections to the gamelog (UDP socket), etc, and I want to know if it is possible to split some of these subclasses (notably the two socket connections [IRC, gamelog]) into their own threads using the threading module.
If anyone has any suggestions, even if it's just saying it can't be done, I'd appreciate the input.
Tom
Edit: I have experience with working with threaded applications, so I'm not a complete noob, honest.
It is feasible, take a look at:
multiprocessing
Besides the simple process forking, it also provides memory sharing - which is likely to be needed.
The best option would be to run your app with gevent coroutines. Those are much more lightweight than threads and processes. The library is built on green threads as its execution units. Here you can find a good comparison and benchmark of the execution models of Eventlet (a Python library that provides a synchronous interface to asynchronous I/O operations, using green threads to achieve cooperative sockets) and node.js.
I'm in the process of writing a Python script to act as "glue" between an application and some external devices. The script itself is quite straightforward and has three distinct processes:
Request data (from a socket connection, via UDP)
Receive response (from a socket connection, via UDP)
Process response and make data available to 3rd party application
However, this will be done repetitively, and for several (around 200) different devices. So once it has reached device #200, it would start requesting data from device #001 again. My main concern here is not to bog down the processor whilst executing the script.
UPDATE:
I am using three threads to do the above, one thread for each of the above processes. The request/response is asynchronous, as each response contains everything I need to be able to process it (including the sender's details).
Is there any way to allow the script to run in the background and consume as few system resources as possible while doing its thing? This will be running on a Windows 2003 machine.
Any advice would be appreciated.
If you are using blocking I/O to your devices, then the script won't consume any processor while waiting for the data. How much processor you use depends on what sorts of computation you are doing with the data.
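As an illustration of the blocking approach, here is a sketch in which one socket sends a request to each device in turn and then blocks in `recvfrom()` (with a timeout) until the reply arrives; while blocked, the thread uses no CPU. This is a simplified single-threaded round-robin, not the three-thread design from the question, and the `status?` request, the `fake_device` helper, and the loopback addresses are all assumptions made so that the example runs without real hardware.

```python
import socket
import threading


def fake_device(sock):
    """Stand-in for a real device (assumption): reply to each request."""
    while True:
        data, addr = sock.recvfrom(1024)
        if data == b"stop":
            break
        sock.sendto(b"reply:" + data, addr)


def poll(devices, timeout=1.0):
    """Send one request per device and block until it answers or times out."""
    replies = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for addr in devices:
            sock.sendto(b"status?", addr)
            try:
                data, sender = sock.recvfrom(1024)  # sleeps in the OS, no busy loop
            except socket.timeout:
                continue  # device did not answer in time; move on to the next one
            replies[sender] = data
    return replies


# Three fake devices on loopback, each on an OS-assigned port.
device_socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(3)]
for dev in device_socks:
    dev.bind(("127.0.0.1", 0))
device_threads = [
    threading.Thread(target=fake_device, args=(dev,), daemon=True)
    for dev in device_socks
]
for t in device_threads:
    t.start()

addresses = [dev.getsockname() for dev in device_socks]
replies = poll(addresses)
print(replies)

# Shut the fake devices down again.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as control:
    for addr in addresses:
        control.sendto(b"stop", addr)
for t in device_threads:
    t.join(timeout=2)
for dev in device_socks:
    dev.close()
```

The timeout matters with around 200 devices: without it, a single silent device would stall the whole cycle.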
Twisted -- the best async framework for Python -- would allow you to perform these tasks with minimal hogging of system resources, most especially, though not exclusively, if you want to process several devices "at once" rather than just round-robin among the several hundred. Pure round-robin might result in too long a cycle time, especially if there's a risk that some device will answer very late, or even fail to answer once in a while and cause a timeout. As a rule of thumb, I'd suggest having at least half a dozen devices "in play" at any given time to avoid this excessive-delay risk.
I'm trying to write a small WSGI application which will put some objects on an external queue after each request. I want to do this in batches, i.e. have the webserver put each object into a buffer-like structure in memory, and have another thread and/or process send these objects to the queue as a batch, either when the buffer is big enough or after a certain timeout, and then clear the buffer. I don't want to fall into NIH syndrome, and I'd rather not bother with threading details myself, but I could not find suitable code for this job. Any suggestions?
Examine https://docs.python.org/library/queue.html to see if it meets your needs.
Since you write "thread and/or process", see also multiprocessing.Queue and multiprocessing.JoinableQueue, available since Python 2.6. Those are interprocess variants of Queue.
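For the thread-based case, a sketch along these lines might be enough, assuming Python 3 and a `send_batch` callable that you supply to talk to the external queue. `BatchSender` is a hypothetical helper name, not an existing library class: request handlers call `put()`, and a background thread flushes when the batch is full or when the oldest buffered item has waited `max_wait` seconds.

```python
import queue
import threading
import time


class BatchSender:
    """Buffer items in memory and hand them to send_batch in groups."""

    def __init__(self, send_batch, max_size=100, max_wait=1.0):
        self._send_batch = send_batch  # callable taking a list of items
        self._max_size = max_size
        self._max_wait = max_wait
        self._queue = queue.Queue()
        self._stop = object()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, item):
        """Called from request handlers; returns immediately."""
        self._queue.put(item)

    def close(self):
        """Flush whatever is left and stop the background thread."""
        self._queue.put(self._stop)
        self._thread.join()

    def _flush(self, batch):
        if batch:
            self._send_batch(list(batch))

    def _run(self):
        batch, deadline = [], None
        while True:
            # With an empty batch there is nothing to time out, so block forever.
            wait = None if deadline is None else max(0.0, deadline - time.monotonic())
            try:
                item = self._queue.get(timeout=wait)
            except queue.Empty:
                self._flush(batch)  # the oldest item has waited long enough
                batch, deadline = [], None
                continue
            if item is self._stop:
                self._flush(batch)
                return
            if not batch:
                deadline = time.monotonic() + self._max_wait
            batch.append(item)
            if len(batch) >= self._max_size:
                self._flush(batch)
                batch, deadline = [], None


sent = []
sender = BatchSender(sent.append, max_size=3, max_wait=5.0)
for i in range(7):
    sender.put(i)
sender.close()  # flushes the final partial batch
print(sent)
```

If the WSGI server runs several worker processes, each process would get its own `BatchSender`; sharing one buffer across processes would need `multiprocessing.Queue` instead.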
Use a buffered stream if you are using Python 3.0.