I can't understand polling/select in python

I'm doing some threaded, asynchronous networking experiments in Python, using UDP.
I'd like to understand polling and Python's select module; I've never used them in C/C++.
What are they for? I understand select a little, but does it block while watching a resource? What is the purpose of polling?

Okay, one question at a time.
What are those for?
Here is a simple socket server skeleton:
import socket
s_sock = socket.socket()
s_sock.bind((HOST, PORT))  # HOST and PORT are placeholders
s_sock.listen()
while True:
    c_sock, c_addr = s_sock.accept()
    process_client_sock(c_sock, c_addr)
The server loops, accepting a connection from a client and then calling its process function to communicate over the client socket. There is a problem here: process_client_sock might take a long time, or even contain a loop (which is often the case).
def process_client_sock(c_sock, c_addr):
    while True:
        receive_or_send_data(c_sock)
In that case, the server is unable to accept any more connections.
A simple solution is to use multiple processes or threads: create a new thread to deal with each request, while the main loop keeps listening for new connections.
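from threading import Thread
s_sock = socket.socket()
s_sock.bind((HOST, PORT))  # HOST and PORT are placeholders
s_sock.listen()
while True:
    c_sock, c_addr = s_sock.accept()
    # handle each client in its own thread so accept() can run again
    thread = Thread(target=process_client_sock, args=(c_sock, c_addr))
    thread.start()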
This works, of course, but not well enough where performance matters, because each new process or thread takes extra CPU and memory. That is not ideal for servers that might get thousands of connections.
So the select and poll system calls try to solve this problem. You give select a set of file descriptors and tell it to notify you when any fd is ready to read or write, or when an exception occurs.
Does it (select) block while watching a resource?
Yes or no, depending on the parameter you pass to it.
As the select man page says, it takes a struct timeval parameter:
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
struct timeval {
    long tv_sec;   /* seconds */
    long tv_usec;  /* microseconds */
};
There are three cases:
timeout.tv_sec == 0 and timeout.tv_usec == 0
    Non-blocking: return immediately.
timeout == NULL
    Block until a file descriptor is ready.
timeout is any other value
    Wait for up to that long; if no file descriptor becomes ready, time out and return.
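The same three cases can be tried from Python, where select.select takes the timeout as an optional fourth argument in seconds. This is a minimal sketch, assuming a local socket pair (socket.socketpair) as a stand-in for a real connection; the variable names are only illustrative.

```python
import select
import socket

# A connected pair of sockets stands in for a real client/server link.
a, b = socket.socketpair()

# timeout=0: poll and return immediately; nothing has been sent yet.
readable, _, _ = select.select([a], [], [], 0)
nothing_ready = readable == []

# timeout=0.1: wait at most 100 ms, then give up with empty lists.
readable, _, _ = select.select([a], [], [], 0.1)
still_nothing = readable == []

# No timeout argument: block until a descriptor is ready.
# Data is sent first here, so the call returns right away.
b.sendall(b"ping")
readable, _, _ = select.select([a], [], [])
data = readable[0].recv(16)

a.close()
b.close()
```

Omitting the timeout (or passing None) corresponds to timeout == NULL in the C call.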
What is the purpose of polling?
Put simply: polling frees the CPU for other work while waiting for I/O.
This is based on two simple facts:
The CPU is much faster than I/O.
Waiting for I/O is a waste of time, because the CPU would be idle for most of it.
Hope it helps.

If you call read or recv, you're waiting on only one connection. If you have multiple connections, you have to create multiple processes or threads, which wastes system resources.
With select, poll or epoll, you can monitor multiple connections with only one thread and get notified when any of them has data available; you then call read or recv on the corresponding connection.
select may block indefinitely, block for a given time, or not block at all, depending on the arguments.
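As a rough sketch of that single-threaded pattern, the hypothetical helper below (echo_round is not from any library) runs one select pass over a listening socket and its accepted clients and echoes back whatever arrives. It assumes small messages, so the sendall call is unlikely to block.

```python
import select

def echo_round(listener, clients, timeout=0.5):
    """Run one select() pass: accept new clients and echo any received data."""
    readable, _, _ = select.select([listener] + clients, [], [], timeout)
    for sock in readable:
        if sock is listener:
            # A connection is pending, so accept() returns without waiting.
            conn, _addr = listener.accept()
            clients.append(conn)
        else:
            # Data (or end-of-stream) is waiting, so recv() returns at once.
            data = sock.recv(4096)
            if data:
                sock.sendall(data)
            else:
                # An empty read means the peer closed the connection.
                clients.remove(sock)
                sock.close()
```

Calling echo_round in a loop gives a server that handles many connections from one thread; the timeout bounds how long each pass may wait.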

select() takes in 3 lists of sockets to check for three conditions (read, write, error), then returns (usually shorter, often empty) lists of sockets that actually are ready to be processed for those conditions.
import select
import socket
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind((Local_IP, Port1))
s1.listen(5)
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s2.bind((Local_IP, Port2))
s2.listen(5)
sockets_that_might_be_ready_to_read = [s1, s2]
sockets_that_might_be_ready_to_write_to = [s1, s2]
sockets_that_might_have_errors = [s1, s2]
# select() takes the lists themselves and returns three new lists
ready_to_read, ready_to_write, has_errors = select.select(
    sockets_that_might_be_ready_to_read,
    sockets_that_might_be_ready_to_write_to,
    sockets_that_might_have_errors, timeout)
for sock in ready_to_read:
    c, a = sock.accept()
    data = c.recv(128)  # read from the accepted connection, not the listening socket
    ...
for sock in ready_to_write:
    # process writes
    ...
for sock in has_errors:
    # process errors
    ...
So if no socket has an attempted connection after waiting timeout seconds, the list ready_to_read will be empty, at which point it doesn't matter whether accept() and recv() would block: they won't get called for the empty list.
If a socket is ready to read, then it has something waiting (a pending connection, in the case of a listening socket), so accept() won't block then, either.

Related

Implement gethostbyaddr() with asyncore

I was having fun with socket.gethostbyaddr(), looking for a way to speed up some really simple code that generates IP addresses randomly and tries to resolve them. The problem comes when no host can be found: there is a timeout that can be really long (about 10 seconds...).
By chance, I found this article, whose author solves the problem with multi-threading: https://www.depier.re/attempts_to_speed_up_gethostbyaddr/
I was wondering if it is possible to do something equivalent using asyncore? That's what I tried to do first, but I failed miserably...
Here is a template:
import socket
import random
def get_ip():
    a = str(random.randint(140,150))
    b = str(random.randint(145,150))
    c = str(random.randint(145,150))
    for d in range(100):
        addr = a + "." + b + "." + c +"."+ str(1 + d)
        yield addr
for addr in get_ip():
    try:
        o = socket.gethostbyaddr(addr)
        print addr + "...Ok :"
        print "---->"+ str(o[0])
    except:
        print addr + "...Nothing"

You are looking for a way to convert several IPs to names (or vice versa) in parallel. Basically this is a DNS request/response operation, and gethostbyaddr does the lookup synchronously, i.e. in a blocking manner: it sends the request, waits for the response, and returns the result.
asyncio and similar libraries use so-called coroutines and cooperative scheduling. Cooperative means that coroutines are written to support concurrency. A running coroutine explicitly returns control (using await or yield from) to a waiting scheduler, which then selects another coroutine and runs it until that one returns control, and so on. Only one coroutine can be running at a time. For a smooth run, coroutines must not execute code for a long time without returning control. A blocking operation in a coroutine blocks the whole program. That rules out calling gethostbyaddr directly in a coroutine.
A solution requires support for asynchronous DNS lookups. A coroutine sends the DNS request, sets a timeout, arranges for the DNS response to be passed to it, and returns control. Thus multiple coroutines can send their requests one after another before they wait for all the responses.
There are third-party libraries for async DNS, but I have never used them. Looking at the aiodns examples, it seems quite easy to write the code you are looking for. asyncio.gather would probably be the core of such a function.
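To show the shape of a gather-based solution with only the standard library, here is a hedged sketch. Note that it swaps in a different technique from true asynchronous DNS: the blocking gethostbyaddr call is handed to asyncio's default thread pool via run_in_executor, so the event loop stays free while threads do the waiting. The function names reverse_lookup and reverse_lookup_all, and the lookup parameter, are made up for this example.

```python
import asyncio
import socket

async def reverse_lookup(addr, lookup=socket.gethostbyaddr):
    """Resolve one address without blocking the event loop."""
    loop = asyncio.get_running_loop()
    try:
        # The blocking call runs in the loop's default thread pool.
        hostname, _aliases, _ips = await loop.run_in_executor(None, lookup, addr)
        return addr, hostname
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return addr, None

async def reverse_lookup_all(addrs, lookup=socket.gethostbyaddr):
    """Start every lookup at once and wait for all of them together."""
    return await asyncio.gather(*(reverse_lookup(a, lookup) for a in addrs))
```

A call such as asyncio.run(reverse_lookup_all(list_of_addresses)) returns (address, hostname or None) pairs in input order. The number of lookups in flight is limited by the size of the thread pool, which an async DNS library would not be.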

How can I achieve multi-threading in Python? [socket programming]

I am trying to create a separate thread in a client that receives messages from a server socket in a non-blocking manner. Since my original code is too long and a bit of a hassle to explain, I have created an example program which focuses on what I want to do. I try to create two separate threads, say Thread t1 and Thread t2. Thread t1 polls the socket to check for any received data, whereas Thread t2 does whatever task it is assigned. What I expect is that Thread t1 keeps polling and prints any data it receives to the screen, while Thread t2 executes in parallel doing whatever it is doing. But I cannot get it working for some reason.
My example program is:
import socket
import threading
from time import sleep
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 5555))
s.setblocking(0)
s.sendall(str.encode('Initial Hello'))
def this_thing():
    while True:
        try:
            data = s.recv(4096)
            print(data.decode('utf-8'))
        except:
            pass  # or break? not sure which one to use; neither of them works
def that_thing():
    for i in range(10000):
        sleep(3)
        s.sendall(str.encode('Hello'))
        print('happening2')
threading.Thread(target=this_thing(), args=[]).start()
threading.Thread(target=that_thing(), args=[]).start()
Note: The server is a simple socket server that sends each message it receives to all connected sockets.
When I run the program with break in the exception handler in Thread t1, only Thread t2 keeps running, i.e. Thread t1 does not receive any data sent from the server.

This happens because the target argument takes a callable object.
From the docs (docs.python.org/2/library/threading.html):
"target is the callable object to be invoked by the run() method"
In your version:
threading.Thread(target=this_thing(), args=[]).start()
threading.Thread(target=that_thing(), args=[]).start()
When you write target=this_thing(), Python calls this_thing immediately, in the main thread, and would pass its return value as target. In your case the call enters a while True loop and never returns; if it did finish, it would evaluate to None.
What you want to do is replace these two lines with:
threading.Thread(target=this_thing, args=[]).start()
threading.Thread(target=that_thing, args=[]).start()
Note that you are now passing in the function itself. A function is a callable object.
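A small self-contained illustration of passing the callable rather than calling it. The worker function and its arguments are invented for this sketch and stand in for this_thing and that_thing.

```python
import threading
import time

results = []

def worker(name, delay):
    """Pretend to do some work, then record that this thread finished."""
    time.sleep(delay)
    results.append(name)

# Pass the function object and its arguments; do not call it yourself.
t1 = threading.Thread(target=worker, args=("receiver", 0.2))
t2 = threading.Thread(target=worker, args=("sender", 0.1))

t1.start()
t2.start()
t1.join()
t2.join()
```

Both workers run concurrently here. Writing target=worker("receiver", 0.2) instead would run the first worker to completion in the main thread before any Thread object was even created.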

The recommended solution for Python 3+ isn't multithreading but asyncio.
Check out this awesome talk by David Beazley on the matter (49 mins):
https://www.youtube.com/watch?v=ZzfHjytDceU
Asyncio / sockets example: https://gist.github.com/gregvish/7665915

Stopping a Client Thread

I use the following class to listen to around 20 UDP ports. There is a problem, though, with how I stop it. Since I join the thread in the stop method, I have to wait up to one second for each instance to stop, because recv has a timeout of one second. How would you recommend I solve this issue?
import socket
import threading
class UpdClient(threading.Thread):
    def __init__(self,port):
        super(UpdClient, self).__init__()
        self.port = port
        self.finished = False
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(('225.0.0.10', self.port))
        self.sock.settimeout(1)
    def run(self):
        while not self.finished:
            try:
                message = self.sock.recv(4096)
                print("*")
            except socket.timeout:
                continue
    def stop(self):
        self.finished = True
        if self.is_alive():
            self.join()
        print("Exiting :" + str(self.port))

There is one easy fix you can do to improve this: Split your stop function up into two separate functions, like this:
def stop(self):
    self.finished = True
    print("Stopping :" + str(self.port))
def wait(self):
    self.stop()
    if self.is_alive():
        self.join()
    print("Exiting :" + str(self.port))
And then do this:
for t in threads:
    t.stop()
for t in threads:
    t.wait()
With 20 threads, this should reduce your average stop time from ~10 seconds to ~1.1 seconds.
But if you want better than this, like a guarantee of 1 second, or an average time below 1 second, there's no good, easy way around this. Some possibly-bad and/or hard options include:
send a message to your own socket, as suggested by User. If your code knows how to handle "garbage" messages, or if your protocol makes it simple to add a new message type that can be easily distinguished from the "real" messages, this should wake your threads up to shut them down very quickly.
close the sockets out from under the client threads. On some platforms, this will cause the recv to fail immediately (you'll want an except to handle that, of course). On others, it will cause it to EOF immediately (which you already handle). There are some platforms where neither happens, and it just continues to block. So you'll really need to test on every platform you care about.*
self.daemon = True. Then you can hard-kill all the threads just by exiting without joining them. With all the downsides that implies.
Completely rewrite your app to use a single-threaded reactor or a multi-threaded proactor (ideally indirectly, through something like asyncio, twisted, or gevent…), instead of a thread per client.
Change the 1-second waits to a loop over waits of no more than 100ms (or however long is acceptable for quit time).
Just accept the 1-second time to quit.
* Off the top of my head, I believe Windows guarantees an error, Linux guarantees either an error or continuing to block but usually continues to block, BSD doesn't guarantee anything but usually continues to block, SysV doesn't guarantee anything but usually EOFs. But don't trust the top of my head; test the platforms you care about.
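The first option, waking the thread with a message to its own socket, could look roughly like the sketch below. It assumes a plain UDP socket bound to 127.0.0.1 rather than the multicast setup in the question, and the class name UdpListener and its attributes are hypothetical.

```python
import socket
import threading

class UdpListener(threading.Thread):
    """Hypothetical listener that is woken by a datagram sent to itself."""

    def __init__(self, host="127.0.0.1", port=0):
        super().__init__()
        self.finished = threading.Event()
        self.received = []
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, port))
        self.address = self.sock.getsockname()

    def run(self):
        while True:
            message = self.sock.recv(4096)   # blocks; no timeout needed
            if self.finished.is_set():       # woken up only to shut down
                break
            self.received.append(message)
        self.sock.close()

    def stop(self):
        self.finished.set()
        # The content does not matter: the datagram only makes recv() return.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as waker:
            waker.sendto(b"stop", self.address)
        self.join()
```

With this arrangement stop() returns as soon as the wake-up datagram is delivered, instead of waiting for a receive timeout to expire. A real message that arrives after finished is set is dropped, which may or may not be acceptable for a given protocol.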

Under Windows, add this:
def stop(self):
    self.sock.close()
    # ...
This raises the following error in the thread:
OSError: [WinError 10004] A blocking operation was interrupted by a call to WSACancelBlockingCall

Sending data through a socket from another thread does not work in Python

This is my 'game server'. It's nothing serious; I thought this was a nice way of learning a few things about Python and sockets.
First, the Server class initializes the server.
Then, when someone connects, we create a client thread. In this thread we continually listen on our socket.
Once a certain command comes in (I12345001001, for example), it spawns another thread.
The purpose of this last thread is to send updates to the client.
But even though I can see the server executing this code, the data isn't actually being sent.
Could anyone tell me where it's going wrong?
It's as if I have to receive something before I'm able to send, so I guess I'm missing something somewhere.
#!/usr/bin/env python
"""
An echo server that uses threads to handle multiple clients at a time.
Entering any line of input at the terminal will exit the server.
"""
import select
import socket
import sys
import threading
import time
import Queue
globuser = {}
queue = Queue.Queue()
class Server:
    def __init__(self):
        self.host = ''
        self.port = 2000
        self.backlog = 5
        self.size = 1024
        self.server = None
        self.threads = []
    def open_socket(self):
        try:
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.bind((self.host,self.port))
            self.server.listen(5)
        except socket.error, (value,message):
            if self.server:
                self.server.close()
            print "Could not open socket: " + message
            sys.exit(1)
    def run(self):
        self.open_socket()
        input = [self.server,sys.stdin]
        running = 1
        while running:
            inputready,outputready,exceptready = select.select(input,[],[])
            for s in inputready:
                if s == self.server:
                    # handle the server socket
                    c = Client(self.server.accept(), queue)
                    c.start()
                    self.threads.append(c)
                elif s == sys.stdin:
                    # handle standard input
                    junk = sys.stdin.readline()
                    running = 0
        # close all threads
        self.server.close()
        for c in self.threads:
            c.join()
class Client(threading.Thread):
    initialized=0
    def __init__(self,(client,address), queue):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.queue = queue
        print 'Client thread created!'
    def run(self):
        running = 10
        isdata2=0
        receivedonce=0
        while running > 0:
            if receivedonce == 0:
                print 'Wait for initialisation message'
                data = self.client.recv(self.size)
                receivedonce = 1
            if self.queue.empty():
                print 'Queue is empty'
            else:
                print 'Queue has information'
                data2 = self.queue.get(1, 1)
                isdata2 = 1
                if data2 == 'Exit':
                    running = 0
                    print 'Client is being closed'
                    self.client.close()
            if data:
                print 'Data received through socket! First char: "' + data[0] + '"'
                if data[0] == 'I':
                    print 'Initializing user'
                    user = {'uid': data[1:6] ,'x': data[6:9], 'y': data[9:12]}
                    globuser[user['uid']] = user
                    print globuser
                    initialized=1
                    self.client.send('Beginning - Initialized'+';')
                    m=updateClient(user['uid'], queue)
                    m.start()
                else:
                    print 'Reset receivedonce'
                    receivedonce = 0
                print 'Sending client data'
                self.client.send('Feedback: ' +data+';')
                print 'Client Data sent: ' + data
            data=None
            if isdata2 == 1:
                print 'Data2 received: ' + data2
                self.client.sendall(data2)
                self.queue.task_done()
                isdata2 = 0
            time.sleep(1)
            running = running - 1
        print 'Client has stopped'
class updateClient(threading.Thread):
    def __init__(self,uid, queue):
        threading.Thread.__init__(self)
        self.uid = uid
        self.queue = queue
        global globuser
        print 'updateClient thread started!'
    def run(self):
        running = 20
        test=0
        while running > 0:
            test = test + 1
            self.queue.put('Test Queue Data #' + str(test))
            running = running - 1
            time.sleep(1)
        print 'Updateclient has stopped'
if __name__ == "__main__":
    s = Server()
    s.run()

I don't understand your logic -- in particular, why you deliberately set up two threads writing at the same time on the same socket (which they both call self.client), without any synchronization or coordination, an arrangement that seems guaranteed to cause problems.
Anyway, a definite bug in your code is your use of the send method -- you appear to believe that it guarantees to send all of its argument string, but that's very definitely not the case; see the docs:
Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data.
sendall is the method that you probably want:
Unlike send(), this method continues to send data from string until either all data has been sent or an error occurs.
Other problems include the fact that updateClient is apparently designed to never terminate (differently from the other two thread classes -- when those terminate, updateClient instances won't, and they'll just keep running and keep the process alive), redundant global statements (innocuous, just confusing), some threads trying to read a dict (via the iteritems method) while other threads are changing it, again without any locking or coordination, etc, etc -- I'm sure there may be even more bugs or problems, but, after spotting several, one's eyes tend to start to glaze over;-).

You have three major problems. The first problem is likely the answer to your question.
Blocking (Question Problem)
The socket.recv is blocking. This means that execution is halted and the thread goes to sleep until it can read data from the socket. So your third update thread just fills the queue up but it only gets emptied when you get a message. The queue is also emptied by one message at a time.
This is likely why it will not send data unless you send it data.
Message Protocol On Stream Protocol
You are trying to use the socket stream like a message stream. What I mean is you have:
self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
The SOCK_STREAM part says it is a byte stream, not a message-oriented socket such as SOCK_DGRAM. TCP has no notion of message boundaries, so you have to build the messages yourself, for example by prefixing each one with its length:
data = struct.pack('I', len(msg)) + msg
socket.sendall(data)
The receiving end then looks for the length field and reads the data into a buffer. Once enough data is in the buffer, it can pull out the entire message.
Your current setup is working because your messages are small enough to all be placed into the same packet and also placed into the socket buffer together. However, once you start sending large data over multiple calls with socket.send or socket.sendall you are going to start having multiple messages and partial messages being read unless you implement a message protocol on top of the socket byte stream.
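A minimal sketch of such a length-prefixed protocol, covering both the sending and the receiving side. The helper names (send_msg, recv_exact, recv_msg) are made up for illustration, and the header uses '!I' (4 bytes, network byte order) instead of the native 'I' above so that both ends agree regardless of platform.

```python
import struct

HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    """Send one message: a length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    """Receive one complete message sent with send_msg()."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Each recv_msg call returns exactly one message, however the bytes happened to be split or merged on the wire.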
Threads
Even though threads can be easier to use when starting out, they come with a lot of problems and can degrade performance if used incorrectly, especially in Python. I love threads, so do not get me wrong. Python has the GIL (global interpreter lock), so you get bad performance from threads that are CPU bound. Your code is mostly I/O bound at the moment, but in the future it may become CPU bound. You also have to worry about locking with threads. A thread can be a quick fix but may not be the best fix. There are circumstances where threading is simply the easiest way to break up a time-consuming process, so do not discard threads as evil or terrible. In Python they are considered bad mainly because of the GIL, and in other languages (as well as Python) because of concurrency issues, so most people recommend using multiple processes or asynchronous code in Python. Whether to use a thread is a complex question: it depends on the language (how your code is run), the system (single or multiple processors), contention (sharing a resource with locking), and other factors. Generally, though, asynchronous code is faster because it keeps the CPU busy with less overhead, especially if you are not CPU bound.
The solution is the usage of the select module in Python, or something similar. It will tell you when a socket has data to be read, and you can set a timeout parameter.
You can gain more performance by doing asynchronous work (asynchronous sockets). To put a socket into non-blocking mode you simply call socket.settimeout(0). However, if you then loop on recv, you will constantly eat CPU spinning while waiting for data. The select module and friends will prevent you from spinning.
Generally, for performance you want to do as much as possible asynchronously (in the same thread), and then expand with more threads that are also doing as much asynchronously as possible. However, as previously noted, Python is an exception to this rule because of the GIL (global interpreter lock), which can actually degrade performance from what I have read. If you are interested, you should try writing a test case and find out!
You should also check out the thread locking primitives in the threading module. They are Lock, RLock, and Condition. They can help multiple threads share data without problems.
lock = threading.Lock()
def myfunction(arg):
    with lock:
        arg.do_something()
Some Python objects are thread safe and others are not.
Sending Updates Asynchronously (Improvement)
Instead of using a third thread only to send updates, you could use the client thread to send them by comparing the current time with the last time an update was sent. This would remove the need for a Queue and a Thread. To do this you must convert your client code into asynchronous code and put a timeout on your select so that you can check at intervals whether an update is due.
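One way this could be structured is sketched below: a single loop that sleeps in select until either data arrives or the next update is due. The function client_loop and its parameters (make_update, handle, max_updates) are hypothetical names chosen for the example, not part of the original server.

```python
import select
import time

def client_loop(sock, make_update, handle, interval=1.0, max_updates=None):
    """Handle incoming data and send an update every `interval` seconds."""
    next_update = time.monotonic() + interval
    sent = 0
    while max_updates is None or sent < max_updates:
        # Sleep in select() until data arrives or the next update is due.
        wait = max(0.0, next_update - time.monotonic())
        readable, _, _ = select.select([sock], [], [], wait)
        if readable:
            data = sock.recv(1024)
            if not data:              # peer closed the connection
                break
            handle(data)
        if time.monotonic() >= next_update:
            sock.sendall(make_update())
            sent += 1
            next_update += interval
    return sent
```

Because the timeout is recomputed on every pass, incoming messages do not delay the update schedule, and the thread uses no CPU while idle.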
Summary
I would recommend you rewrite your code using asynchronous socket code. I would also recommend that you use a single thread for all clients and the server. This will improve performance and decrease latency. It would make debugging easier because you would have no possible concurrency issues like you have with threads. Also, fix your message protocol before it fails.

Proper way of cancelling accept and closing a Python processing/multiprocessing Listener connection

(I'm using the pyprocessing module in this example, but replacing processing with multiprocessing should probably work if you run python 2.6 or use the multiprocessing backport)
I currently have a program that listens to a unix socket (using a processing.connection.Listener), accepts connections and spawns a thread to handle each request. At a certain point I want to quit the process gracefully, but the accept() call is blocking and I see no way of cancelling it in a nice way. I have one way that works here (OS X) at least: setting a signal handler and signalling the process from another thread, like so:
import processing
from processing.connection import Listener
import threading
import time
import os
import signal
import socket
import errno
# This is actually called by the connection handler.
def closeme():
    time.sleep(1)
    print 'Closing socket...'
    listener.close()
    os.kill(processing.currentProcess().getPid(), signal.SIGPIPE)
oldsig = signal.signal(signal.SIGPIPE, lambda s, f: None)
listener = Listener('/tmp/asdf', 'AF_UNIX')
# This is a thread that handles one already accepted connection, left out for brevity
threading.Thread(target=closeme).start()
print 'Accepting...'
try:
    listener.accept()
except socket.error, e:
    if e.args[0] != errno.EINTR:
        raise
# Cleanup here...
print 'Done...'
The only other way I've thought about is reaching deep into the connection (listener._listener._socket) and setting the non-blocking option...but that probably has some side effects and is generally really scary.
Does anyone have a more elegant (and perhaps even correct!) way of accomplishing this? It needs to be portable to OS X, Linux and BSD, but Windows portability etc is not necessary.
Clarification:
Thanks all! As usual, ambiguities in my original question are revealed :)
I need to perform cleanup after I have cancelled the listening, and I don't always want to actually exit that process.
I need to be able to access this process from other processes not spawned from the same parent, which makes Queues unwieldy
The reasons for threads are that:
They access a shared state. Actually more or less a common in-memory database, so I suppose it could be done differently.
I must be able to have several connections accepted at the same time, but the actual threads are blocking for something most of the time. Each accepted connection spawns a new thread; this in order to not block all clients on I/O ops.
Regarding threads vs. processes, I use threads for making my blocking ops non-blocking and processes to enable multiprocessing.

Isn't that what select is for?
Only call accept on the socket if select indicates it will not block.
select has a timeout, so you can break out occasionally to check whether it's time to shut down.
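A rough sketch of that idea with a plain listening socket (not a processing/multiprocessing Listener, which would need access to its underlying file descriptor). The function accept_loop and the use of a threading.Event as the shutdown flag are assumptions made for this example.

```python
import select
import threading

def accept_loop(listener, stop_event, handle, poll_interval=0.2):
    """Accept connections until stop_event is set, checking it periodically."""
    while not stop_event.is_set():
        # Wait for a pending connection, but no longer than poll_interval.
        readable, _, _ = select.select([listener], [], [], poll_interval)
        if readable:
            conn, addr = listener.accept()
            handle(conn, addr)
    # Cleanup runs here, in the accepting thread, after a clean exit.
    listener.close()
```

Another thread requests shutdown by calling stop_event.set(); the loop notices within poll_interval seconds. In rare cases a connection can disappear between select and accept, so a stricter version might also put the listener into non-blocking mode.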

I thought I could avoid it, but it seems I have to do something like this:
from processing import connection
connection.Listener.fileno = lambda self: self._listener._socket.fileno()
import select
l = connection.Listener('/tmp/x', 'AF_UNIX')
r, w, e = select.select((l, ), (), ())
if l in r:
    print "Accepting..."
    c = l.accept()
    # ...
I am aware that this breaks the law of demeter and introduces some evil monkey-patching, but it seems this would be the most easy-to-port way of accomplishing this. If anyone has a more elegant solution I would be happy to hear it :)

I'm new to the multiprocessing module, but it seems to me that mixing the processing module and the threading module is counter-intuitive. Aren't they targeted at solving the same problem?
Anyway, how about wrapping your listen functions into a process itself? I'm not clear how this affects the rest of your code, but this may be a cleaner alternative.
from multiprocessing import Process
from multiprocessing.connection import Listener
class ListenForConn(Process):
    def run(self):
        listener = Listener('/tmp/asdf', 'AF_UNIX')
        listener.accept()
        # do your other handling here
listen_process = ListenForConn()
listen_process.start()
print listen_process.is_alive()
listen_process.terminate()
listen_process.join()
print listen_process.is_alive()
print 'No more listen process.'

Probably not ideal, but you can release the block by sending the socket some data from the signal handler or the thread that is terminating the process.
EDIT: Another way to implement this might be to use the Connection Queues, since they seem to support timeouts (apologies, I misread your code in my first read).

I ran into the same issue. I solved it by sending a "stop" command to the listener. In the listener's main thread (the one that processes the incoming messages), every time a new message is received, I just check to see if it's a "stop" command and exit out of the main thread.
Here's the code I'm using:
def start(self):
    """
    Start listening
    """
    # set the command being executed
    self.command = self.COMMAND_RUN
    # startup the 'listener_main' method as a daemon thread
    self.listener = Listener(address=self.address, authkey=self.authkey)
    self._thread = threading.Thread(target=self.listener_main, daemon=True)
    self._thread.start()
def listener_main(self):
    """
    The main application loop
    """
    while self.command == self.COMMAND_RUN:
        # block until a client connection is received
        with self.listener.accept() as conn:
            # receive the subscription request from the client
            message = conn.recv()
            # if it's a shut down command, return to stop this thread
            if isinstance(message, str) and message == self.COMMAND_STOP:
                return
            # process the message
def stop(self):
    """
    Stops the listening thread
    """
    self.command = self.COMMAND_STOP
    client = Client(self.address, authkey=self.authkey)
    client.send(self.COMMAND_STOP)
    client.close()
    self._thread.join()
I'm using an authentication key to prevent would-be hackers from shutting down my service by sending a stop command from an arbitrary client.
Mine isn't a perfect solution. It seems a better solution might be to revise the code in multiprocessing.connection.Listener, and add a stop() method. But, that would require sending it through the process for approval by the Python team.
