ssh - difference between blocking and non-blocking mode - python

It sounds like a stupid question, since every developer who has used any SSH library has probably asked it at some point. But I can't really find what the difference between blocking and non-blocking mode actually is...
I mean, OK... one blocks until it receives the answer, the other sends the queries and returns immediately, and then you check the reply buffer yourself... I got that part.
But why use one rather than the other? I can't manage to find the answer...
Is it about performance? And if there is a difference, why?
Thanks in advance for any answer to this question.
--- Edit: Forget about the following "bonus question"; I've finally coded the non-blocking mode and experienced the same problem, so it must be something in libssh2. So I still don't get the added value of non-blocking mode... ---
Bonus question:
I'm not really sure, but could this difference explain something I'm experiencing?
I have a Python script which connects to many hosts to run several commands.
It was using the paramiko library in non-blocking mode. Paramiko is pure Python and really slow at establishing SSH connections to many hosts...
I'm replacing it with pylibssh2, the Python bindings for the C library libssh2. Since I didn't get the difference, I started coding in blocking mode.
Results:
- libssh2 is much faster than paramiko (connecting to 230 hosts in parallel takes 4 s instead of 1 min 30 s).
- For running commands successively, libssh2 is also faster.
- When I run commands over SSH from several parallel threads, the libssh2 code in blocking mode becomes slower than paramiko in non-blocking mode.
- I also noticed that CPU consumption is very low compared with the previous version. I guess part of this is due to C vs. Python, but it seems that beyond the SSH API, my script itself performs fewer actions. Do threads block each other when sending commands through SSH in blocking mode?

The reason is that if you want to do two things at once, say read from some other network connection and from your SSH session, you have two options:
use blocking APIs, and use two threads or processes so you can do both;
use non-blocking APIs, so the same thread can do both.
The latter approach is called asynchronous I/O. See for example Twisted, which uses it extensively.
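To make the contrast concrete, here is a minimal sketch of the two styles using plain sockets rather than an SSH library; the host names, port and buffer size are made up for illustration.

    import select
    import socket
    import threading

    def blocking_with_threads():
        # Blocking style: each recv() blocks its own thread, so watching two
        # connections at once needs two threads.
        def watch(host, port):
            s = socket.create_connection((host, port))
            while True:
                data = s.recv(4096)          # blocks this thread only
                if not data:
                    break
                handle(data)
        for target in [("host-a.example", 7), ("host-b.example", 7)]:
            threading.Thread(target=watch, args=target).start()

    def non_blocking_single_thread():
        # Non-blocking style: one thread multiplexes both sockets with select(),
        # reading only when the OS reports that data is ready.
        socks = [socket.create_connection(("host-a.example", 7)),
                 socket.create_connection(("host-b.example", 7))]
        for s in socks:
            s.setblocking(False)
        while socks:
            readable, _, _ = select.select(socks, [], [])
            for s in readable:
                data = s.recv(4096)          # will not block: select said ready
                if not data:
                    socks.remove(s)
                else:
                    handle(data)

    def handle(data):
        print("got %d bytes" % len(data))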

Related

How to handle a burst of connection to a port?

I've built a server listening on a specific port using Python (asyncore and sockets), and I was curious to know whether there is anything I can do when too many people connect to my server at once.
The code itself cannot be changed, but would adding more processes work? Or is it a hardware matter, and should I focus on adding a load balancer in front and spreading the requests over multiple servers?
This question is borderline between Stack Overflow (code/Python) and Server Fault (server management). I decided to go with SO because of the code, but if you think Server Fault is better, let me know.
1.
asyncore relies on the operating system for all of the connection handling, so what you are asking is OS dependent; it has very little to do with Python. Using Twisted instead of asyncore wouldn't solve your problem.
On Windows, for example, you can listen for only 5 connections coming in simultaneously.
So the first requirement is: run it on a *nix platform.
The rest depends on how long your handlers take and on your bandwidth.
2.
What you can do is combine asyncore and threading to speed up waiting for the next connection.
I.e. you can make handlers that run in separate threads. It will be a little messy, but it is one possible solution.
When the server accepts a connection, instead of creating a traditional handler (which would slow down the check for the next connection, because asyncore waits until that handler has done at least a little of its job), you create a handler that treats reading and writing as non-blocking.
I.e. it starts a thread to do the work and, only when it has data ready, sends it on a following loop() check.
This way, you allow asyncore.loop() to check the server's socket more often.
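A rough sketch of that pattern, with a trivial uppercase job standing in for the real (slow) work:

    import asyncore
    import threading

    def do_expensive_job(request):
        # placeholder for the real (slow) work
        return request.upper()

    class ThreadedHandler(asyncore.dispatcher):
        # The slow job runs in a worker thread; the handler only reports itself
        # writable once the reply is ready, so asyncore.loop() keeps returning
        # to the listening socket in the meantime.
        def __init__(self, sock):
            asyncore.dispatcher.__init__(self, sock)
            self._reply = None
            self._lock = threading.Lock()

        def handle_read(self):
            request = self.recv(4096)
            if request:
                threading.Thread(target=self._work, args=(request,)).start()

        def _work(self, request):
            result = do_expensive_job(request)
            with self._lock:
                self._reply = result

        def writable(self):
            with self._lock:
                return self._reply is not None

        def handle_write(self):
            with self._lock:
                self.send(self._reply)
                self._reply = None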
3.
Or you can use two different socket maps with two different asyncore.loop()s.
You use one map (dictionary), say the default one, asyncore.socket_map, to watch the server, with one asyncore.loop() (say, in the main thread) dedicated to it.
And you start the second asyncore.loop() in a thread, using your custom dictionary for the client handlers.
So one loop checks only the server that accepts connections; when a connection arrives, the server creates a handler that goes into the separate handlers' map, which is checked by the other asyncore.loop() running in a thread.
This way, you don't mix server connection checks with client handling: the server is checked again immediately after it accepts a connection, while the other loop balances between the clients.
If you are determined to go even faster, you can exploit multiprocessor machines by having more maps for handlers.
For example, one per CPU, with as many threads running asyncore.loop()s.
Note that sockets are I/O operations using system calls, and select() is one too, so the GIL is released while asyncore.loop() is waiting for results. This means you get the full benefit of multithreading, and each CPU deals with its share of clients essentially in parallel.
What you have to do is make the server distribute the load and start the threaded loops when connections arrive.
Don't forget that asyncore.loop() ends when its map empties, so the loop() in the thread that manages clients must be started when a new connection is accepted, and restarted if at some point no connections are left.
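A self-contained sketch of the two-map layout, with a trivial echo handler standing in for the real one and the loop-restart handling simplified (a real version would have to guard against the client loop exiting just as a new connection arrives):

    import asyncore
    import socket
    import threading

    client_map = {}                     # separate map for the client handlers

    def client_loop():
        # loop() returns when its map becomes empty, so it has to be
        # (re)started whenever a connection is accepted and no loop is running
        asyncore.loop(map=client_map)

    class Handler(asyncore.dispatcher_with_send):
        def __init__(self, sock):
            asyncore.dispatcher_with_send.__init__(self, sock, map=client_map)

        def handle_read(self):
            data = self.recv(4096)
            if data:
                self.send(data)         # placeholder: echo back what arrived

    class Server(asyncore.dispatcher):
        def __init__(self, addr):
            # default map (asyncore.socket_map): watched only by the main loop
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind(addr)
            self.listen(5)

        def handle_accept(self):
            pair = self.accept()
            if pair is None:
                return
            sock, _addr = pair
            Handler(sock)               # goes into client_map, not socket_map
            if len(client_map) == 1:    # first client: start the client loop
                threading.Thread(target=client_loop).start()

    Server(("0.0.0.0", 8000))
    asyncore.loop()                     # main thread: only the listening socket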
4.
If you want to be able to run your server on multiple computers and use them as a cluster, then you put a load balancer in front.
I don't see a serious need for one if you wrote the asyncore server correctly and only want to run it on a single computer.

What's the best way to implement a Python TCP client?

I need to write a Python script which performs several tasks:
read commands from the console and send them to a server over TCP/IP
receive the server's response, process it and write output to the console.
What is the best way to structure such a script? Do I have to create a separate thread to listen for the server's response while interacting with the user in the main thread? Are there any good examples?
Asking for the best way or for code examples is rather off topic, but this is too long for a comment.
There are three general ways to build these terminal-emulator-like applications:
multiple processes - the way the good old Unix cu worked, with a fork
multiple threads - a variant of the above using lightweight threads instead of processes
using the select system call with multiplexed I/O.
Generally, the first two methods are considered more straightforward to code, with one thread (or process) handling the upward communication while the other handles the downward one. The third, while trickier to code, is generally considered more efficient.
As Python supports multithreading, multiprocessing and the select call, you can choose any of these methods, with a slight preference for multithreading over multiprocessing, because threads are lighter than processes and I see no reason to use processes here.
The following is just my opinion:
Unless you are writing a prototype to be rewritten later in a lower-level language, I assume that performance is not the key issue, and my advice would be to use threads here.
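A minimal sketch of the threaded approach, assuming a plain line-based TCP protocol; the host and port are placeholders:

    import socket
    import sys
    import threading

    def reader(sock):
        # downward path: print whatever the server sends
        while True:
            data = sock.recv(4096)
            if not data:
                break
            sys.stdout.write(data.decode())
            sys.stdout.flush()

    def main():
        sock = socket.create_connection(("server.example", 9000))
        t = threading.Thread(target=reader, args=(sock,))
        t.daemon = True
        t.start()
        # upward path: read commands from the console and send them
        for line in sys.stdin:
            sock.sendall(line.encode())
        sock.close()

    if __name__ == "__main__":
        main()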

python SocketServer stuck on waitpid() syscall

I am using the Python (2.7) SocketServer with ForkingMixIn. It worked well.
However, sometimes under heavy usage (tons of rapidly connecting/disconnecting clients) the "server" gets stuck, consuming all the idle CPU (top shows 100% CPU). If I strace the process from the CLI, it shows an endless sequence of waitpid() syscalls. According to the ps command, there are no child processes at that point, though.
After this the server is unusable and only restarting it helps :( Clients can connect but get no answer; I guess only the backlog queue on the OS side is being used, and the Python code never accepts the connections.
It can easily be reproduced, e.g. with a primitive HTTP implementation and a browser (I used Chrome) with CTRL-R (reload) held down for something like 10 seconds. Of course the problem is also triggered without this "brutal" test under normal usage, just more rarely, and it was quite hard even to come up with an idea of what the problem might be. I wrote my own implementation of something like SocketServer using os.fork() and the socket functions, and it does not have this problem, but I would be happier with a ready-made, standard solution.
The problem: this is not a nice situation, as my script implementing a server can be DoS'ed very easily this way.
What I noticed: I installed a signal handler for SIGCHLD. It seems that if I remove it, I can't reproduce the problem, but then I see zombie processes (I guess because they are not wait()'ed for). Even if I install signal.SIG_IGN as the handler, I experience the problem.
Can anybody help with what the problem might be and how I can solve it? I'd like to use a signal handler anyway, since it's also not nice to leave many zombie processes around, especially after a long run.
Thanks for any ideas.
Maybe related: What is the cost of many TIME_WAIT on the server side?
It is possible that you have all your max connections in a TIME_WAIT state.
Check sysctl net.core.somaxconn for the maximum number of connections.
Check sysctl net.ipv4 for other configuration details (e.g. the tcp_tw_* settings).
Check ulimit -n for the maximum number of open file descriptors (sockets included).
You can try sysctl net.ipv4.tcp_tw_reuse=1 to quickly reuse those sockets (don't keep it enabled unless you know what you're doing).
Check for file handle leaks.
[Not-so] stupid question: how is your SocketServer implementation different from the standard one plus ForkingMixIn?
However, it is really easy to abuse a ForkingMixIn (fork bomb); you might want to use green threads instead, e.g. the eventlet library ( http://eventlet.net/doc/index.html ).
This might be your problem:
this: http://bugs.python.org/issue7978
this: http://mail.python.org/pipermail/python-bugs-list/2010-April/095492.html
this: http://twistedmatrix.com/trac/ticket/733
You will see that a SIGCHLD handler is discouraged unless you take some extra measures (calling signal.siginterrupt(signal.SIGCHLD, False) alongside the handler, or using a wake-up fd in the select() call).
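A small sketch of those extra measures, assuming a forking server: reap children with waitpid(WNOHANG) in a loop and ask the OS to restart interrupted syscalls instead of failing them with EINTR.

    import os
    import signal

    def reap_children(signum, frame):
        # collect every child that has already exited, without blocking
        while True:
            try:
                pid, _status = os.waitpid(-1, os.WNOHANG)
            except OSError:          # ECHILD: no children left at all
                break
            if pid == 0:             # children exist but none has exited yet
                break

    signal.signal(signal.SIGCHLD, reap_children)
    # restart slow syscalls (accept/select/...) interrupted by SIGCHLD
    signal.siginterrupt(signal.SIGCHLD, False)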

Python Job Service Daemon?

What packages should I look at for writing a Python daemon and processing jobs? Also, what do I need to do to make a Python daemon?
I'm pretty happy with beanstalkd, which has client libraries available in various languages:
Daemon:
http://kr.github.com/beanstalkd/
Python client library:
http://code.google.com/p/pybeanstalk/
Your question is a bit ambiguous, but I'm assuming you mean you would like to write a Python daemon that processes jobs that get thrown into a queue. If not, please say so. :-)
I've heard a lot of great things about redis. The folks at GitHub built resque as a job-processing daemon for Ruby. If you're flexible about languages, you could just use that; if you're not, you can emulate it in as much or as little depth as you like, using redis as your queue system. Depending on how pluggable and extensible you need it to be, this could be a really simple thing to implement.
Another option I ran across after some more googling is redqueue. It looks like it might already implement most of a job queue.
If you're using Django, you may wish to consider the Celery project. It's a job queue system based on RabbitMQ, which is yet another queuing server with excellent reviews.
As far as creating a daemon in Python goes, there are a number of options. You can look at this page on ActiveState, which is a good start. Better yet, you can use python-daemon to do it all for you. But if you use one of the above options, or beanstalkd as recommended by mczepiel, you probably won't have to make your process run as a daemon.
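For illustration, a very small sketch of the "redis as a queue" idea, assuming the redis Python package is installed; the queue name and job format are made up:

    import json
    import redis

    r = redis.Redis()                      # connects to localhost:6379 by default

    def enqueue(job):
        r.rpush("jobs", json.dumps(job))   # producer side

    def worker():
        while True:
            # blpop blocks until a job is available, so the worker idles cheaply
            _key, raw = r.blpop("jobs")
            process(json.loads(raw))

    def process(job):
        # placeholder for the real job handler
        print("processing %r" % (job,))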
I have recently (this week) implemented a queue in RabbitMQ with a Python daemon extracting the information and storing it in a database (using the Django ORM). The daemon has an intermediate buffer, so it waits a little and writes to the database in batches instead of writing every time a small message arrives.
I did the integration with the queue using the little flopsy module, which is easy to set up. The only problem I had was setting up a timeout for waiting for a message, as the module has no clear way of doing that. After a while playing with the interactive shell and a few dir() calls, I managed to get to the socket object and set the timeout.
I also considered Celery, but it seems more focused on using RabbitMQ internally to let you launch tasks (periodically or asynchronously) than on using a queue to communicate with other systems. In our case, the queue can be fed both by Python systems and by Ruby ones.
Once the processing worked, I made some adjustments to allow running it as a daemon (mostly redirecting standard output to a file for easy logging) and then created a bash script that launches it with start-stop-daemon. I followed more or less this scheme.
I discovered python-daemon just about one day too late, so now that the work is done it makes no sense to revisit it, but it may make more sense for a pure Python project.
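For anyone taking the python-daemon route instead, a minimal sketch (the log path and the worker loop are placeholders):

    import time
    import daemon

    def run_worker():
        while True:
            # placeholder for the real work, e.g. pulling a message off the queue
            time.sleep(1)

    if __name__ == "__main__":
        log = open("/var/log/my-worker.log", "a+")
        # DaemonContext forks, detaches from the terminal and redirects stdio
        with daemon.DaemonContext(stdout=log, stderr=log):
            run_worker()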

Python script performance as a background process

I'm in the process of writing a Python script to act as "glue" between an application and some external devices. The script itself is quite straightforward and has three distinct tasks:
Request data (over a socket connection, via UDP)
Receive the response (over a socket connection, via UDP)
Process the response and make the data available to a third-party application
However, this will be done repetitively for several (+/- 200) different devices. So once it has reached device #200, it starts requesting data from device #001 again. My main concern here is not to bog down the processor while the script is running.
UPDATE:
I am using three threads for the above, one for each task. The request/response handling is asynchronous, as each response contains everything I need to process it (including the sender's details).
Is there any way to let the script run in the background and consume as few system resources as possible while doing its thing? This will be running on a Windows 2003 machine.
Any advice would be appreciated.
If you are using blocking I/O to talk to your devices, the script won't consume any processor time while waiting for data. How much processor you use depends on what sort of computation you do with the data.
Twisted, the best async framework for Python, would let you perform these tasks with minimal hogging of system resources, especially (though not exclusively) if you want to process several devices "at once" rather than just round-robin among the several hundred. The latter might result in too long a cycle time, especially if there's a risk that some device answers very late or even fails to answer once in a while, resulting in a "timeout"; as a rule of thumb, I'd suggest keeping at least half a dozen devices "in play" at any given time to avoid this excessive-delay risk.
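A rough sketch of that Twisted approach, assuming the devices answer a single UDP request sent to a known port; the addresses, the request payload, the window size and the response handler are placeholders, and timeouts/retries are left out:

    from twisted.internet import reactor
    from twisted.internet.protocol import DatagramProtocol

    DEVICES = [("10.0.0.%d" % i, 5000) for i in range(1, 201)]
    WINDOW = 6                             # devices "in play" at any given time

    class Poller(DatagramProtocol):
        def startProtocol(self):
            self.pending = list(DEVICES)
            for _ in range(WINDOW):
                self.send_next()

        def send_next(self):
            if self.pending:
                addr = self.pending.pop(0)
                self.transport.write(b"STATUS?", addr)

        def datagramReceived(self, data, addr):
            handle_response(addr, data)    # hand the data to the 3rd-party side
            self.send_next()               # keep the window full

    def handle_response(addr, data):
        print(addr, data)

    reactor.listenUDP(0, Poller())
    reactor.run()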
