I need to write a Python script which performs several tasks:
read commands from the console and send them to a server over TCP/IP
receive the server's response, process it, and print the output to the console.
What is the best way to create such a script? Do I have to create a separate thread to listen for the server's response while interacting with the user in the main thread? Are there any good examples?
Asking for the best way or for code examples is rather off topic, but this is too long to be a comment.
There are three general ways to build these terminal-emulator-like applications:
multiple processes - the way the good old Unix cu worked, with a fork
multiple threads - a variant of the above using lightweight threads instead of processes
using the select system call with multiplexed I/O.
Generally, the first two methods are considered more straightforward to code, with one thread (or process) handling the upward communication while the other handles the downward one. The third, while trickier to code, is generally considered more efficient.
As Python supports multithreading, multiprocessing, and the select call, you can choose any of these methods, with a slight preference for multithreading over multiprocessing because threads are lighter than processes and I cannot see a reason to use processes here.
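To illustrate the third method, here is a minimal sketch of a select-based client (Unix only, since select() does not accept sys.stdin on Windows; host and port are placeholders):

```python
import select
import socket
import sys

sock = socket.create_connection(("example.com", 9000))  # placeholder host/port

while True:
    # Block until either stdin or the socket is readable.
    readable, _, _ = select.select([sys.stdin, sock], [], [])
    for source in readable:
        if source is sock:
            data = sock.recv(4096)
            if not data:
                sys.exit(0)          # server closed the connection
            sys.stdout.write(data.decode())
        else:
            line = sys.stdin.readline()
            if not line:
                sys.exit(0)          # EOF on stdin
            sock.sendall(line.encode())
```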
What follows is just my opinion.
Unless you are writing a prototype to be rewritten later in a lower-level language, I assume that performance is not the key issue, and my advice would be to use threads here.
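As an illustration of that advice, a minimal sketch of the threaded approach: a daemon thread prints whatever the server sends, while the main thread handles user input (host and port are placeholders):

```python
import socket
import threading

def reader(sock):
    # Background thread: print everything the server sends.
    while True:
        data = sock.recv(4096)
        if not data:                 # server closed the connection
            print("connection closed by server")
            break
        print(data.decode(), end="")

sock = socket.create_connection(("example.com", 9000))  # placeholder host/port
threading.Thread(target=reader, args=(sock,), daemon=True).start()

# Main thread: read commands from the console and send them.
while True:
    command = input("> ")
    sock.sendall((command + "\n").encode())
```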
Related
I'm currently working on a Python project that receives a lot of AWS SQS messages (more than 1 million each day), processes these messages, and sends them to another SQS queue with additional data. Everything works fine, but now we need to speed up this process a lot!
From what we have seen, our biggest bottleneck is the HTTP requests needed to send and receive messages from the AWS SQS API. So basically, our code is mostly I/O bound due to these HTTP requests.
We are trying to scale this process up with one of the following methods:
Using Python's multiprocessing: this seems like a good idea, but our workers run on small machines, usually with a single core. So creating separate processes may still give some benefit, since the CPU will probably switch processes as one or another is stuck in an I/O operation. But still, that seems like a lot of process-management overhead and resources for an operation that doesn't need to run in parallel, only concurrently.
Using Python's threading: since the GIL locks all threads to a single core, and threads have less overhead than processes, this seems like a good option. As one thread sits waiting for an HTTP response, the CPU can pick up another thread, and so on. This would get us our desired concurrent execution. But my question is: how does Python's threading know that it can switch one thread for another? Does it know that some thread is currently in an I/O operation and can be swapped for another one? Will this approach maximize CPU usage while avoiding busy waiting? Do I specifically have to give up control of the CPU inside a thread, or is this done automatically in Python?
Recently, I also read about a concept called green threads, using Eventlet in Python. From what I saw, they seem like the perfect match for my project. They have little overhead and don't create OS threads the way threading does. But will we have the same problems as threading regarding CPU control? Does a green thread need to explicitly yield so that another one can run? I saw in some examples that Eventlet offers green versions of built-in libraries like urlopen, but not of Requests.
The last option we considered was using Python's asyncio and async libraries such as aiohttp. I have done some basic experimenting with asyncio and wasn't very pleased, though I can understand that most of that comes from the fact that Python is not a naturally asynchronous language. From what I saw, it would behave much like Eventlet.
So what do you think would be the best option here? Which library would allow me to maximize performance on a single-core machine while avoiding busy waits as much as possible?
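For concreteness, a minimal sketch of the threading option described above, using boto3 (the queue URLs and the processing step are placeholders); the interpreter releases the GIL while a thread blocks on the HTTP calls, so the pool overlaps the waits on a single core:

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

IN_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/in"    # placeholder
OUT_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/out"  # placeholder
sqs = boto3.client("sqs")  # boto3 clients are thread-safe

def handle(message):
    # Stand-in for the real processing step.
    enriched = message["Body"] + " (extra data)"
    sqs.send_message(QueueUrl=OUT_QUEUE, MessageBody=enriched)
    sqs.delete_message(QueueUrl=IN_QUEUE,
                       ReceiptHandle=message["ReceiptHandle"])

with ThreadPoolExecutor(max_workers=32) as pool:
    while True:
        resp = sqs.receive_message(QueueUrl=IN_QUEUE,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        pool.map(handle, resp.get("Messages", []))
```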
I'm trying to build a system that would manage a small database and populate it with data from the web.
I would like to have this process run in the background, but still have some way of interacting with it.
How can I go about this in Python?
I would like to know how to do this in two cases:
from within the same Python script: something like daemon = task.start() followed by daemon.get_info() or daemon.do_something()
from the shell (via another program I could make): myclient get_info or myclient do_something
Could someone give me some key concepts to go look into?
edit: I just read this blog post; is socket programming (as indicated in his last example) the best way to go about this?
So in the end I landed on some terminology that I was missing.
The core concept seems to be inter-process communication (IPC).
On Unix variants, the two easiest ways of implementing this are:
Named pipes (one-way communication)
Sockets (two-way)
A Python script that makes use of these could spawn another thread which repeatedly reads from the pipe and communicates messages back to the main thread through a queue.
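A minimal sketch of that named-pipe variant (Unix only; the FIFO path and the commands are placeholders):

```python
import os
import queue
import threading

FIFO = "/tmp/myclient.fifo"   # placeholder path
commands = queue.Queue()

def pipe_reader():
    while True:
        # Opening a FIFO for reading blocks until a writer connects.
        with open(FIFO) as fifo:
            for line in fifo:
                commands.put(line.strip())

if not os.path.exists(FIFO):
    os.mkfifo(FIFO)
threading.Thread(target=pipe_reader, daemon=True).start()

while True:
    # Main loop: do the background work, and poll for commands.
    try:
        cmd = commands.get(timeout=1.0)
    except queue.Empty:
        continue
    if cmd == "get_info":
        print("status: running")   # stand-in for real status reporting
```

With this setup, myclient get_info from the shell can be as simple as echo get_info > /tmp/myclient.fifo.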
I have many doubts about the design of a simple Python program.
I have opened a socket to a server that streams data via a simple telnet server.
There are three types of strings, beginning with RED, BLUE, or YELLOW and followed by the data, for example:
RED 21763;22;321321
BLUE 1;32132;3432
BLUE 1222;332;3
YELLOW 1;32132;3432
I would like to split the data into three objects, like queues, and then fork three processes to work on the data in parallel as it arrives on the socket, in a sort of very basic real-time computation.
So, to achieve my goal, should I use threads/forked processes and objects like queues for inter-process communication?
Or is there a different kind of approach I could use? I don't know anything about multithreaded programming :)
Thanks for helping.
This should give you a brief idea of threads vs fork.
Creating threads requires a lot less overhead, so I would go with the thread architecture. Each of the three thread functions would be supplied with the respective queue on which it needs to do the real-time computation. Synchronization and mutual-exclusion mechanisms will prevent unexpected behavior. You can also use Valgrind with DRD for debugging your multithreaded program.
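A minimal sketch of that architecture, assuming newline-terminated lines in the format shown in the question (host, port, and the per-colour computation are placeholders):

```python
import queue
import socket
import threading

queues = {c: queue.Queue() for c in ("RED", "BLUE", "YELLOW")}

def worker(colour, q):
    # Each worker consumes only its own colour's queue.
    while True:
        values = q.get()
        print(colour, "computed sum:", sum(values))  # stand-in computation

for colour, q in queues.items():
    threading.Thread(target=worker, args=(colour, q), daemon=True).start()

sock = socket.create_connection(("example.com", 2323))  # placeholder host/port
buf = b""
while True:
    data = sock.recv(4096)
    if not data:
        break
    buf += data
    while b"\n" in buf:
        line, buf = buf.split(b"\n", 1)
        colour, _, payload = line.decode().partition(" ")
        if colour in queues:
            # e.g. "RED 21763;22;321321" -> [21763, 22, 321321]
            queues[colour].put([int(x) for x in payload.split(";")])
```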
This is for a moderation bot for C&C Renegade, in case anyone wants some background.
I have a class which will act as a parent to a load of subclasses that provide IRC connections, connections to the gamelog (a UDP socket), etc., and I want to know if it is possible to split some of these subclasses (notably the two socket connections [IRC, gamelog]) into their own threads using the threading module.
If anyone has any suggestions, even if it's just saying it can't be done, I'd appreciate the input.
Tom
Edit: I have experience with working with threaded applications, so I'm not a complete noob, honest.
It is feasible; take a look at:
multiprocessing
Besides simple process forking, it also provides memory sharing, which is likely to be needed here.
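A minimal sketch of that: a worker process is fed through a Queue, and publishes results through a shared Value that stays visible to the parent:

```python
from multiprocessing import Process, Queue, Value

def worker(q, counter):
    while True:
        item = q.get()
        if item is None:              # sentinel: shut down
            break
        with counter.get_lock():      # shared memory, visible to the parent
            counter.value += item

if __name__ == "__main__":
    q, counter = Queue(), Value("i", 0)
    p = Process(target=worker, args=(q, counter))
    p.start()
    for n in (1, 2, 3):
        q.put(n)
    q.put(None)
    p.join()
    print(counter.value)              # -> 6
```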
The best option would be to run your app with gevent coroutines. These are much more lightweight than threads and processes. The library is built on green threads as its execution units. Here you can find a good comparison and benchmark of the execution models of Eventlet (a Python library that provides a synchronous interface for asynchronous I/O operations, using green threads to achieve cooperative sockets) and node.js.
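For illustration, a minimal gevent sketch: after monkey-patching, ordinary blocking code in each greenlet cooperatively yields while waiting on the network (the URLs are placeholders):

```python
from gevent import monkey
monkey.patch_all()   # make the standard socket library cooperative

import gevent
import urllib.request

def fetch(url):
    # Looks like blocking code, but yields to other greenlets while waiting.
    return urllib.request.urlopen(url).read()

jobs = [gevent.spawn(fetch, u) for u in
        ("http://example.com/a", "http://example.com/b")]  # placeholder URLs
gevent.joinall(jobs)
print([len(j.value) for j in jobs])
```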
I'm quite new to Python threading/network programming, but have an assignment involving both of the above.
One of the requirements of the assignment is that for each new request I spawn a new thread, but I need to both send to and receive from the browser at the same time.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread for each request, and I was wondering whether using both threads and the asynchronous loop is overkill, or the correct way to do it.
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a proxy server, and I'm not sure if my client is persistent. My client is my browser (using Firefox for simplicity).
It seems to reconnect for each request. My problem is that if I open one tab with http://www.google.com and another with http://www.stackoverflow.com, I only get one request at a time from each tab, instead of multiple requests from Google and from SO.
I answered a question that sounds amazingly similar to yours, where someone had a homework assignment to create a client-server setup, with each connection being handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
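A minimal sketch of that pattern, with an echo handler standing in for the real per-client logic (the port is a placeholder):

```python
import socket
import threading

def handle(conn, addr):
    # Per-client thread: monitor this connection for new communication.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:              # client disconnected
                break
            conn.sendall(data)        # stand-in: echo back

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", 8080))               # placeholder port
server.listen(5)
while True:
    conn, addr = server.accept()      # blocks until a client connects
    threading.Thread(target=handle, args=(conn, addr), daemon=True).start()
```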
An extra bit about asyncore vs threading
From the asyncore docs:
There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
As this quote suggests, using asyncore and threading should be for the most part mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) does a blocking call to accept a new client. And when it gets one, it spawns a thread which will then continue to handle the communication, and the server goes back into a blocking call again.
In the pattern of using asyncore, you would instead use its async loop which will in turn call your own registered callbacks for various activity that occurs. There is no threading here, but rather a polling of all the open file handles for activity. You get the sense of doing things all concurrently, but under the hood it is scheduling everything serially.
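For contrast, a minimal sketch of the asyncore pattern (note that asyncore is deprecated in modern Python): a single loop polls all open sockets and dispatches to your registered callbacks, with an echo handler again standing in for real logic and a placeholder port:

```python
import asyncore
import socket

class EchoHandler(asyncore.dispatcher_with_send):
    def handle_read(self):
        data = self.recv(4096)
        if data:
            self.send(data)           # stand-in: echo back

class EchoServer(asyncore.dispatcher):
    def __init__(self, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind(("", port))
        self.listen(5)

    def handle_accept(self):
        # Called by the loop when a client connects; no blocking accept().
        pair = self.accept()
        if pair is not None:
            conn, addr = pair
            EchoHandler(conn)

EchoServer(8080)                      # placeholder port
asyncore.loop()                       # single-threaded polling loop
```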