I'm making a python URL grabber program. For my purposes, I want it to time out really really fast, so I'm doing
urllib2.urlopen("http://.../", timeout=2)
Of course it times out correctly as it should. However, it doesn't bother to close the connection to the server, so the server thinks the client is still connected. How can I ask urllib2 to just close the connection after it times out?
Running gc.collect() doesn't work, and I'd like to avoid httplib if I can help it.
The closest I can get is: the first try will time out. The server reports that the connection closed just as the second try times out. Then, the server reports the connection closed just as the third try times out. Ad infinitum.
Many thanks.
I have a suspicion that the socket is still open in the stack frames. When Python raises an exception it stores the stack frames so debuggers and other tools can view the stack and introspect values.
For historical reasons, and now for backwards compatibility, the stack information is stored (on a per-thread basis) in sys (see sys.exc_info(), sys.exc_type and others). This is one of the things which has been removed in Python 3.0.
What that means for you is the stack is still alive, and referenced. That stack contains the local data for some function which has the open socket. That's why the socket isn't yet closed. It's only when the stack trace is removed that everything will be gc'ed.
To test if that's the case, insert something like
try:
    1/0
except ZeroDivisionError:
    pass
in your except clause. That's a quick way to replace the current exception with something else.
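If the saved exception state really is what keeps the socket alive, Python 2 also provides sys.exc_clear() to drop it explicitly. A minimal sketch, assuming Python 2.6+ for the timeout parameter (the grab() helper is mine, for illustration; it is not from the question):

import sys
import urllib2

def grab(url):
    try:
        return urllib2.urlopen(url, timeout=2).read()
    except urllib2.URLError:
        # Clear the per-thread exception state so the traceback, its stack
        # frames, and the socket they reference can be gc'ed immediately.
        sys.exc_clear()
        return None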
This is SUCH a hack, but the following code works. If the request is in another function AND it does not raise an exception, then the socket is always closed.
import socket
import urllib2

def _fetch(self, url):
    try:
        return urllib2.urlopen(urllib2.Request(url), timeout=5).read()
    except urllib2.URLError, e:
        if isinstance(e.reason, socket.timeout):
            return None
        else:
            raise e

def fetch(self, url):
    x = None
    while x is None:
        x = self._fetch(url)
        if x is None:
            print "Timeout"
    return x
Does ANYONE have a better way?
I've written a simple multi-threaded game server in python that creates a new thread for each client connection. I'm finding that every now and then, the server will crash because of a broken-pipe/SIGPIPE error. I'm pretty sure it is happening when the program tries to send a response back to a client that is no longer present.
What is a good way to deal with this? My preferred resolution would simply close the server-side connection to the client and move on, rather than exit the entire program.
PS: This question/answer deals with the problem in a generic way; how specifically should I solve it?
Assuming that you are using the standard socket module, you should be catching the socket.error: (32, 'Broken pipe') exception (not IOError as others have suggested). This will be raised in the case that you've described, i.e. sending/writing to a socket for which the remote side has disconnected.
import socket, errno, time

# setup socket to listen for incoming connections
s = socket.socket()
s.bind(('localhost', 1234))
s.listen(1)

remote, address = s.accept()
print "Got connection from: ", address

while 1:
    try:
        remote.send("message to peer\n")
        time.sleep(1)
    except socket.error, e:
        if isinstance(e.args, tuple):
            print "errno is %d" % e[0]
            if e[0] == errno.EPIPE:
                # remote peer disconnected
                print "Detected remote disconnect"
            else:
                # determine and handle different error
                pass
        else:
            print "socket error ", e
        remote.close()
        break
    except IOError, e:
        # Hmmm, Can IOError actually be raised by the socket module?
        print "Got IOError: ", e
        break
Note that this exception will not always be raised on the first write to a closed socket - more usually the second write (unless the number of bytes written in the first write is larger than the socket's buffer size). You need to keep this in mind in case your application thinks that the remote end received the data from the first write when it may have already disconnected.
You can reduce the incidence (but not entirely eliminate) of this by using select.select() (or poll). Check for data ready to read from the peer before attempting a write. If select reports that there is data available to read from the peer socket, read it using socket.recv(). If this returns an empty string, the remote peer has closed the connection. Because there is still a race condition here, you'll still need to catch and handle the exception.
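A sketch of that check-before-write pattern, reusing the remote socket from the listing above (the 4096 read size is an arbitrary choice):

import select

def peer_alive(remote):
    # Zero timeout: poll whether the peer socket is readable right now.
    readable, _, _ = select.select([remote], [], [], 0)
    if readable:
        data = remote.recv(4096)
        if not data:
            # An empty string from recv() means the peer closed the connection.
            return False
        # Otherwise hand `data` to your protocol handler before writing.
    return True

if peer_alive(remote):
    remote.send("message to peer\n")  # still wrap in try/except: the race remains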
Twisted is great for this sort of thing, however, it sounds like you've already written a fair bit of code.
Read up on the try: statement.
try:
    pass  # do something
except socket.error, e:
    pass  # A socket error
except IOError, e:
    if e.errno == errno.EPIPE:
        pass  # EPIPE error
    else:
        pass  # Other error
SIGPIPE (although I think maybe you mean EPIPE?) occurs on sockets when you shut down a socket and then send data to it. The simple solution is not to shut the socket down before trying to send it data. This can also happen on pipes, but it doesn't sound like that's what you're experiencing, since it's a network server.
You can also just apply the band-aid of catching the exception in some top-level handler in each thread.
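That per-thread band-aid can be as simple as wrapping the client loop. In this sketch, handle_client is a hypothetical stand-in for your real per-connection logic:

import socket
import threading

def client_thread(conn, address):
    try:
        handle_client(conn, address)  # your game-protocol code goes here
    except socket.error, e:
        # The peer vanished mid-write; drop this client, keep the server up.
        print "Client %s disconnected: %s" % (address, e)
    finally:
        conn.close()

# spawned once per accepted connection:
# threading.Thread(target=client_thread, args=(remote, address)).start()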
Of course, if you used Twisted rather than spawning a new thread for each client connection, you probably wouldn't have this problem. It's really hard (maybe impossible, depending on your application) to get the ordering of close and write operations correct if multiple threads are dealing with the same I/O channel.
I ran into the same problem. But when I submitted the same code the next time, it just worked.
The first time it broke:
$ packet_write_wait: Connection to 10.. port 22: Broken pipe
The second time it worked:
[1] Done nohup python -u add_asc_dec.py > add2.log 2>&1
I guess the reason may be related to the current server environment.
My answer is very close to S.Lott's, except I'd be even more particular:
import errno

try:
    pass  # do something
except IOError, e:
    # ooops, check the attributes of e to see precisely what happened.
    if e.errno != errno.EPIPE:
        # I don't know how to handle this
        raise
where "23" is the error number you get from EPIPE. This way you won't attempt to handle a permissions error or anything else you're not equipped for.
I've tried looking around online through different python docs, forums, and other people's questions but I haven't found anyone with this same question.
What my scripts typically look like is: I'll create a socket connection that tries connecting to ports 1-9999 and will only tell me when a port is open. When I run this on windows it takes 1 second to scan a port before moving on to the next one (60 ports/min, roughly 16.5 minutes for 1000 ports). When I run the same scripts on linux, it'll cycle through all 9999 ports very quickly, while still returning the same desired results.
I was hoping to be able to build cross-compatible tools, but it appears linux
is just the better operating system when it comes to my networking needs? I have both at my disposal so I don't mind using one over the other. I'd just like to know if there's anything that could be done to make port scanning almost as equally fast on both operating systems, otherwise I won't spend as much time building on/for windows.
The difference in speed is the same regardless of which network I'm on.
My questions are:
- Why is the performance so different on windows compared to linux when given the same functions?
- Is there anything that can be done to make port scanning with sockets faster like it is on linux?
--edit--
here's the piece I use to check ports
from socket import *  # socket, AF_INET, SOCK_STREAM, gethostbyname, getservbyport

def whole_scan(Host_):
    service = ''
    host = Host_
    max_port = 9999
    min_port = 1

    def scan_host(host, port, r_code=1):
        try:
            s = socket(AF_INET, SOCK_STREAM)
            code = s.connect_ex((host, port))
            if code == 0:
                r_code = code
            s.close()
        except Exception, e:
            pass
        return r_code

    hostip = gethostbyname(host)
    for port in range(min_port, max_port):
        try:
            response = scan_host(host, port)
            if response == 0:
                try:
                    service = getservbyport(port)
                except Exception, e:
                    service = 'n/a'
                print(" |--port: %d\t%s" % (port, service.upper()))
        except Exception, e:
            pass
I've also verified my firewall is disabled, and adding the registry value to disable the connection limit made no difference in performance. I'm on windows 10.
Windows limits the concurrent number of half-open connections and that may be at play here if you are opening that many connection requests at a time. For example, on Windows 7 try setting this key value to 0 (to disable it)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableConnectionRateLimiting
I doubt that this is causing the performance problem; however, there is a bug in your scan_host() function.
This function attempts to return r_code; however, r_code is only set if connect_ex() returns 0. Should connect_ex() return a non-zero value, or an exception occur in the same block of code, r_code would not be set, and the return statement would raise a NameError exception. This exception will propagate to the calling code, which then catches and ignores it, along with all other exceptions.
It's not a good idea to ignore exceptions; perhaps you might learn something relevant to the problem, perhaps not, but I suggest that you log the exceptions that are occurring.
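For instance, replacing the bare pass with logging.exception() records the message plus the full traceback with no extra effort (a sketch; scan.log is an arbitrary filename):

import logging
from socket import *

logging.basicConfig(filename='scan.log', level=logging.DEBUG)

def scan_host(host, port, r_code=1):
    try:
        s = socket(AF_INET, SOCK_STREAM)
        code = s.connect_ex((host, port))
        if code == 0:
            r_code = code
        s.close()
    except Exception:
        # Logs the current exception and its traceback to scan.log.
        logging.exception("scan_host(%s, %d) failed", host, port)
    return r_code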
Also, it would be useful if you added some debug print statements into your code. This will help you locate the part of your code where the majority of time is spent.
There is also the line:
hostip = gethostbyname(host)
which never seems to be executed; I can't tell for sure, because the indentation in your post may not be quite right.
Another thing to consider is DNS. Possibly the DNS server used by Windows is slower, or there is some issue there. You could eliminate that by using the IP address instead of host name:
response = scan_host(gethostbyname(host), port)
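A minimal sketch combining both ideas: resolve the name once outside the loop, and cap each connect attempt with settimeout() so a filtered port cannot stall the scan (the 0.5-second value and the example hostname are arbitrary choices):

from socket import *

def scan_port(hostip, port, timeout=0.5):
    s = socket(AF_INET, SOCK_STREAM)
    s.settimeout(timeout)  # bound the time any single connect can take
    try:
        return s.connect_ex((hostip, port)) == 0
    finally:
        s.close()

hostip = gethostbyname('example.com')  # one DNS lookup, not one per port
for port in range(1, 10000):
    if scan_port(hostip, port):
        print(" |--port: %d open" % port)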
I have searched and searched and can't find an answer. I am trying to open a Pyro connection between two unix devices. I can connect 4 times to the device using a Pyro4 Proxy with an identical URI string. On the fifth connection, the instance hangs on my get data function call. It goes through the core.py pyro package and ends up waiting to get the data. Very occasionally, one of these open connections that was created after the fourth one will throw a ConnectionClosedError exception that looks like this:
ConnectionClosedError("receiving: connection lost: "+str(x))
ConnectionClosedError: receiving: connection lost: [Errno 104] Connection reset by peer
If I haven't been clear, the following is what causes this issue:
- Open 4 connections on different SSH sessions to the device and run repeated tests which set up a Pyro proxy. (These work just fine and complete without error.)
- Open more connections, all hanging on my call to get data. They hang for at least 5 minutes, and some will infrequently raise the above exception.
- Not all of them will do this. Once 1 of the 4 running tests finishes, the 5th test that was hanging will pick up and finish just fine. The others will follow, but never any more than 4 at a time.
Lastly, the following code (in socketutil.py) is where the exception is actually happening:
def receiveData(sock, size):
    """Retrieve a given number of bytes from a socket.
    It is expected the socket is able to supply that number of bytes.
    If it isn't, an exception is raised (you will not get a zero length result
    or a result that is smaller than what you asked for). The partial data that
    has been received however is stored in the 'partialData' attribute of
    the exception object."""
    try:
        retrydelay = 0.0
        msglen = 0
        chunks = []
        if hasattr(socket, "MSG_WAITALL"):
            # waitall is very convenient and if a socket error occurs,
            # we can assume the receive has failed. No need for a loop,
            # unless it is a retryable error.
            # Some systems have an erratic MSG_WAITALL and sometimes still return
            # less bytes than asked. In that case, we drop down into the normal
            # receive loop to finish the task.
            while True:
                try:
                    data = sock.recv(size, socket.MSG_WAITALL)
                    if len(data) == size:
                        return data
                    # less data than asked, drop down into normal receive loop to finish
                    msglen = len(data)
                    chunks = [data]
                    break
                except socket.timeout:
                    raise TimeoutError("receiving: timeout")
                except socket.error:
                    x = sys.exc_info()[1]
                    err = getattr(x, "errno", x.args[0])
                    if err not in ERRNO_RETRIES:
                        ################ HERE: ################
                        raise ConnectionClosedError("receiving: connection lost: " + str(x))
                    time.sleep(0.00001 + retrydelay)  # a slight delay to wait before retrying
                    retrydelay = __nextRetrydelay(retrydelay)
Would really appreciate some direction here. Thanks in advance!
Turns out it was the minimum number of threads that the server was creating on boot. For some reason, it wouldn't add any more when it should have.
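If the thread pool is the bottleneck, its size is configurable before the daemon starts. The exact setting names vary across Pyro4 releases, so treat this as a sketch and check Pyro4.config in your version:

import Pyro4

# Allow more than the default number of concurrent proxy connections.
# Older Pyro4 releases spell this THREADPOOL_MINTHREADS / THREADPOOL_MAXTHREADS.
Pyro4.config.THREADPOOL_SIZE = 20

daemon = Pyro4.Daemon()  # create the daemon after changing the config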
I am using the Python module telnetlib to create a telnet session (with a chess server), and I'm having an issue I really can't wrap my brain around. The following code works perfectly:
>>> f = login("my_server") #code for login(host) below.
>>> f.read_very_eager()
This spits out everything the server usually prints upon login. However, when I put it inside a function and then call it thus:
>>> def foo():
... f = login("my_server")
... return f.read_very_eager()
...
>>> foo()
I get nothing (the empty string). I can check that the login is performed properly, but for some reason I can't see the text. So where does it get swallowed?
Many thanks.
For completeness, here is login(host):
def login(host, handle="guest", password=""):
    try:
        f = telnetlib.Telnet(host)  # connect to host
    except:
        raise Error("Could not connect to host")
    f.read_until("login: ")
    try:
        f.write(handle + "\n\r")
    except:
        raise Error("Could not write username to host")
    if handle == "guest":
        f.read_until(":\n\r")
    else:
        f.read_until("password: ")
    try:
        f.write(password + "\n\r")
    except:
        raise Error("Could not write password to host")
    return f
This works when you try it out manually, but not from a function, because when you type the commands interactively the server has enough time to react to the login and send data back. When it's all in one function, you send the password to the server and never wait long enough for the server to reply.
If you prefer a (probably more correct) technical answer:
In the file telnetlib.py (c:\python26\Lib\telnetlib.py on my Windows computer), the function read_very_eager(self) calls self.sock_avail(). Now, sock_avail(self) does the following:
def sock_avail(self):
    """Test whether data is available on the socket."""
    return select.select([self], [], [], 0) == ([self], [], [])
What this does is really simple: if there is -anything- to read from our socket (the server has answered), it'll return True, otherwise it'll return False.
So, what read_very_eager(self) does is: check if there is anything available to read. If there is, then read from the socket, otherwise just return an empty string.
If you look at the code of read_some(self) you'll see that it doesn't check if there is any data available to read. It'll try reading till there is something available, which means that if the server takes for instance 100ms before answering you, it'll wait 100ms before returning the answer.
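One way to sidestep the race is to block on a known prompt instead of polling: read_until() accepts a timeout. A sketch (the "fics% " prompt string is a guess of mine; substitute whatever your chess server actually prints after login):

def foo():
    f = login("my_server")
    # Blocks until the prompt arrives or 10 seconds pass, instead of
    # returning whatever happens to be buffered at call time.
    return f.read_until("fics% ", timeout=10)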
I'm having the same trouble as you. Unfortunately the combination of select.select (which I have in a while loop until I am able to read) and then calling read_some() does not work for me; it still only reads 1% of the actual output. If I put a time.sleep(10) in before I read and do a read_very_eager(), it seems to work. This is a very crude way of doing things, but it does work. I wish there were a better answer, and I wish I had more reputation points so I could respond to user387821 and see if he has any additional tips.
I can't understand what sort of exceptions I should handle 'here and now', what sort I should re-raise or leave unhandled here, and what to do with them later (at a higher tier). For example: I wrote a client/server application in Python 3 with SSL communication. The client is supposed to check files for any differences, and if a diff exists, it should send the updated file to the server.
class BasicConnection:
    # blablabla
    def sendMessage(self, sock, url, port, fileToSend, buffSize):
        try:
            sock.connect((url, port))
            while True:
                data = fileToSend.read(buffSize)
                if not data: break
                sock.send(data)
            return True
        except socket.timeout as toErr:
            raise ConnectionError("TimeOutError trying to send File to remote socket: %s:%d"
                                  % (url, port)) from toErr
        except socket.error as sErr:
            raise ConnectionError("Error trying to send File to remote socket: %s:%d"
                                  % (url, port)) from sErr
        except ssl.SSLError as sslErr:
            raise ConnectionError("SSLError trying to send File to remote socket: %s:%d"
                                  % (url, port)) from sslErr
        finally:
            sock.close()
Is this the right way to use exceptions in Python? The problem is: what if file.read() throws IOError? Should I handle it here, or just do nothing and catch it later? And many other possible exceptions?
The client uses this class (BasicConnection) to send updated files to the server:
class PClient():
    def __init__(self, DATA):
        '''DATA = { 'sendTo'      : {'host':'','port':''},
                    'use_ssl'     : {'use_ssl':'', 'fileKey':'', 'fileCert':'', 'fileCaCert':''},
                    'dirToCheck'  : '',
                    'localStorage': '',
                    'timeToCheck' : '',
                    'buffSize'    : '',
                    'logFile'     : ''} '''
        self._DATA = DATA
        self._running = False
        self.configureLogging()

    def configureLogging(self):
        pass  # blablabla

    def isRun(self):
        return self._running

    def initPClient(self):
        try:
            # blablabla
            return True
        except ConnectionError as conErr:
            self._mainLogger.exception(conErr)
            return False
        except FileCheckingError as fcErr:
            self._mainLogger.exception(fcErr)
            return False
        except IOError as ioErr:
            self._mainLogger.exception(ioErr)
            return False
        except OSError as osErr:
            self._mainLogger.exception(osErr)
            return False

    def startPClient(self):
        try:
            self._running = True
            while self.isRun():
                try:
                    self._mainLogger.debug("Checking differences")
                    diffFiles = FileChecker().checkDictionary(self._dict)
                    if len(diffFiles) != 0:
                        for fileName in diffFiles:
                            try:
                                self._mainLogger.info("Sending updated file: %s to remote socket: %s:%d"
                                    % (fileName, self._DATA['sendTo']['host'], self._DATA['sendTo']['port']))
                                fileToSend = io.open(fileName, "rb")
                                result = False
                                result = BasicConnection().sendMessage(self._sock, self._DATA['sendTo']['host'],
                                    self._DATA['sendTo']['port'], fileToSend, self._DATA['buffSize'])
                                if result:
                                    self._mainLogger.info("Updated file: %s was successfully delivered to remote socket: %s:%d"
                                        % (fileName, self._DATA['sendTo']['host'], self._DATA['sendTo']['port']))
                            except ConnectionError as conErr:
                                self._mainLogger.exception(conErr)
                            except IOError as ioErr:
                                self._mainLogger.exception(ioErr)
                            except OSError as osErr:
                                self._mainLogger.exception(osErr)
                    self._mainLogger.debug("Updating localStorage %s from %s " % (self._DATA['localStorage'], self._DATA['dirToCheck']))
                    FileChecker().updateLocalStorage(self._DATA['dirToCheck'],
                                                     self._DATA['localStorage'])
                    self._mainLogger.info("Directory %s were checked" % (self._DATA['dirToCheck']))
                    time.sleep(self._DATA['timeToCheck'])
                except FileCheckingError as fcErr:
                    self._mainLogger.exception(fcErr)
                except IOError as ioErr:
                    self._mainLogger.exception(ioErr)
                except OSError as osErr:
                    self._mainLogger.exception(osErr)
        except KeyboardInterrupt:
            self._mainLogger.info("Shutting down...")
            self.stopPClient()
        except Exception as exc:
            self._mainLogger.exception(exc)
            self.stopPClient()
            raise RuntimeError("Something goes wrong...") from exc

    def stopPClient(self):
        self._running = False
Is it correct? Maybe someone could spend a bit of their own time and just help me understand the pythonic style of handling exceptions? I can't understand what to do with exceptions such as NameError, TypeError, KeyError, ValueError... and so on. They could be thrown at any statement, at any time... and what should I do with them if I want to log everything?
And what information should people usually log? If an error occurs, what info about it should I log? The whole traceback, or just a relevant message about it, or something else?
I hope somebody helps me.
Thanks a lot.
In general, you should "catch" the exceptions that you expect to happen (because they may be caused by user error, or other environmental problems outside of your program's control), especially if you know what your code might be able to do about them. Just giving more details in an error report is a marginal issue, though some programs' specs may require doing that (e.g. a long-running server that's not supposed to crash due to such problems, but rather log a lot of state information, give the user a summary explanation, and just keep working for future queries).
NameError, TypeError, KeyError, ValueError, SyntaxError, AttributeError, and so on, can be thought of as due to errors in the program -- bugs, not problems outside of the programmer's control. If you're releasing a library or framework, so that your code is going to be called by other code outside of your control, then such bugs may quite likely be in that other code; you should normally let the exception propagate to help the other programmer debug their own bugs. If you're releasing an application, you own the bugs, and you must pick the strategy that helps you find them.
If your bugs show up while an end-user is running the program, you should log a lot of state information, and give the user a summary explanation and apologies (perhaps with a request to send you the log info, if you can't automate that -- or, at least, ask permission before you send anything from the user's machine to yours). You may be able to save some of the user's work so far, but often (in a program that's known to be buggy) that may not work anyway.
Most bugs should show up during your own testing of course; in that case, propagating the exception is useful as you can hook it up to a debugger and explore the bug's details.
Sometimes some exceptions like these show up just because "it's easier to ask forgiveness than permission" (EAFP) -- a perfectly acceptable programming technique in Python. In that case of course you should handle them at once. For example:
try:
    return mylist[theindex]
except IndexError:
    return None
Here you might expect that theindex is generally a valid index into mylist, but occasionally outside of mylist's bounds; the latter case, by the semantics of the hypothetical app in which this snippet belongs, is not an error, just a little anomaly to be fixed by considering the list to be conceptually extended on both sides with infinite numbers of Nones. It's easier to just try/except than to properly check for positive and negative values of the index (and faster, if being out of bounds is a truly rare occurrence).
Similarly appropriate cases for KeyError and AttributeError happen less frequently, thanks to the getattr builtin and get method of dicts (which let you provide a default value), collections.defaultdict, etc; but lists have no direct equivalent of those, so the try/except is seen more frequently for IndexError.
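For comparison, those tools let you state a default up front instead of catching the exception (a small self-contained sketch):

import collections

d = {'a': 1}
print(d.get('b'))        # None instead of a KeyError
print(d.get('b', 0))     # or any default you choose

class Config(object):
    pass

cfg = Config()
print(getattr(cfg, 'missing', None))  # same idea for attributes

counts = collections.defaultdict(int)
counts['new_key'] += 1   # absent keys spring into existence as int() == 0
print(counts['new_key'])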
Trying to catch syntax errors, type errors, value errors, name errors, etc, is a bit rarer and more controversial -- though it would surely be appropriate if the error was diagnosed in a "plug-in", third-party code outside your control which your framework/application is trying to load and execute dynamically (indeed that's the case where you're supplying a library or the like and need to coexist peacefully with code out of your control which might well be buggy). Type and value errors may sometimes occur within an EAFP pattern -- e.g. when you try to overload a function to accept either a string or a number and behave slightly differently in each case, catching such errors may be better than trying to check types -- but the very concept of functions thus overloaded is more often than not quite dubious.
Back to "user and environmental errors", users will inevitably make mistakes when they give you input, indicate a filename that's not actually around (or that you don't have permission to read, or to write if that's what you're supposed to be doing), and so on: all such errors should of course be caught and result in a clear explanation to the user about what's gone wrong, and another chance to get the input right. Networks sometime go down, databases or other external servers may not respond as expected, and so forth -- sometimes it's worth catching such problems and retrying (maybe after a little wait -- maybe with an indication to the user about what's wrong, e.g. they may have accidentally unplugged a cable and you want to give them a chance to fix things and tell you when to try again), sometimes (especially in unattended long-running programs) there's nothing much you can do except an ordered shutdown (and detailed logging of every possibly-relevant aspect of the environment).
So, in brief, the answer to your Q's title is, "it depends";-). I hope I have been of use in listing many of the situations and aspects on which it can depend, and recommending what's generally the most useful attitude to take towards such issues.
To start with, you don't need any _mainLogger.
If you want to catch any exceptions, maybe to log or send them by email or whatever, do that at the highest possible level -- certainly not inside this class.
Also, you definitely don't want to convert every Exception to a RuntimeError. Let it emerge. The stopPClient() method has no purpose right now. When it has, we'll look at it.
You could basically wrap the ConnectionError, IOError and OSError together (like, re-raise as something else), but not much more than that...
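That wrap-and-re-raise might look like the sketch below; PClientError and send_file are hypothetical names of mine, not part of the code above:

import io

class PClientError(Exception):
    """Something went wrong while delivering a file to the server."""

def send_file(connection, sock, host, port, fileName, buffSize):
    try:
        with io.open(fileName, "rb") as fileToSend:
            return connection.sendMessage(sock, host, port, fileToSend, buffSize)
    except (ConnectionError, IOError, OSError) as err:
        # Callers catch one exception type; the original is chained as __cause__.
        raise PClientError("could not send %s to %s:%d" % (fileName, host, port)) from err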