So I'm running socket.connect in Python to connect to a server like so (some parts removed cuz proprietary info):
    self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        self.sock.connect(("XXX.XXX.XXX.XXX", XXXX))
    except socket.timeout:
        DO_SOMETHING
Anyway what's weird is that this only works like 1 out of 10 times. 9 out of 10 times I get an error saying:
"TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
I'm pretty sure I have the network configuration right for both server/client. The fact that it randomly works fine 1 out of 10 times seems to reinforce this. I think the problem is just timing. I think maybe 1 out of 10 times the timeout just so happens to happen after the server responds.
Anyway, I can't change how the server works in my case. Also changing the timeout fields of the socket object in Python does nothing to change this timeout (the timeout stays at a couple seconds, I want to change it to something much longer). This timeout seems to be some lower level Windows thing. Does anyone know how I would change the timeout related to WinError 10060?
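One possible workaround, sketched here under assumptions rather than offered as a confirmed fix: as far as I know, Python's socket timeout can make connect() give up sooner than the operating system would, but it cannot stretch the operating system's own limit. Retrying the connection until an overall deadline passes gives a similar effect to a longer timeout. The function name connect_with_retry and the parameters total_wait, attempt_timeout and pause are hypothetical names chosen for this example.

```python
import socket
import time


def connect_with_retry(host, port, total_wait=60.0, attempt_timeout=5.0, pause=1.0):
    """Retry a TCP connect until it succeeds or total_wait seconds have passed."""
    deadline = time.monotonic() + total_wait
    while True:
        try:
            # create_connection() opens a fresh socket for every attempt.
            return socket.create_connection((host, port), timeout=attempt_timeout)
        except OSError:
            # OSError covers timeouts (WinError 10060) and refusals (WinError 10061).
            if time.monotonic() + pause >= deadline:
                raise  # out of time: surface the last error to the caller
            time.sleep(pause)
```

The returned socket still carries attempt_timeout as its timeout, so call settimeout(None) on it afterwards if blocking reads are wanted.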
I'm trying to make a program that can automatically connect to a computer on the local network, based on the port entered on the server. The client, given the same port, uses the arp -a command to find every computer on the local network and tries to connect to each one.
This is the connection method:
    def connect(self):
        # Collect the raw lines printed by the ARP table command.
        devices = []
        for device in os.popen('arp -a'):
            devices.append(device)
        for ip in devices:
            b = re.findall(r"(?:\s|\A)(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?=\s|\Z)", ip)
            try:
                print(b[0])
                client_socket = socket.socket()
                client_socket.settimeout(3)
                client_socket.connect((b[0], self.port))
                # connect() raises on failure, so reaching this line means success.
                return client_socket
            except Exception as e:
                print(e)
I get a pretty weird issue: when I run the server on one computer, it works just fine. However, when I switch roles and run the client on that computer, it suddenly can't find the target computer, and when it tries the target's IP (which I know, since I checked the computer's IP address with ipconfig), it errors out:
[WinError 10061] No connection could be made because the target machine actively refused it
timed out
I'm trying this on 2 different computers, and it connects perfectly when I use this computer as the server. Any help would be appreciated.
Edit: Also, I thought it would be helpful to note that no matter how high I set the timeout on the socket, it just waits that amount of time with the correct IP, then says it timed out. This is despite the server having already bound the port...
Edit 2.0: I thought about looking at the connection errors of the connection program on the other computer... completely different. No 10061 errors, just timeouts and list index errors, which are perfectly understandable given the nature of the function. Why does the 10061 error only happen on one computer? And why, when they are 2 different computers? I'd like to know.
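For illustration only, here is a minimal sketch of how the scan could skip lines that contain no address, which avoids the list index errors. It assumes the arp -a output has already been read into a string. The names extract_ips and first_reachable are hypothetical, and the regular expression is a simplified stand-in for the one in the question. Note that WinError 10061 generally means the target machine was reached but nothing was accepting connections on that port.

```python
import re
import socket

# Matches dotted-quad strings such as 192.168.1.20 (no range check on each octet).
IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_ips(arp_output):
    """Return the IPv4-looking strings in arp_output, in order, without duplicates."""
    found = []
    for candidate in IPV4_RE.findall(arp_output):
        if candidate not in found:
            found.append(candidate)
    return found


def first_reachable(ips, port, timeout=3.0):
    """Return a connected socket for the first address that accepts, else None."""
    for ip in ips:
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError as exc:
            # Refused (10061) and timed-out (10060) attempts both land here.
            print(ip, exc)
    return None
```

Usage would look like first_reachable(extract_ips(os.popen('arp -a').read()), port), with a None result meaning no listed machine accepted the connection.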
I've got a python script that basically looks something like this:
    #############################
    # MAIN LOOP
    while True:
        client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client_socket.connect((url, socketnum))
            packet = somedata
            client_socket.sendall(packet)
        except Exception as e:
            # an error occurred
            logging.error("An error occurred: {}".format(e))
        finally:
            logging.info("Closing socket...")
            client_socket.close()
        time.sleep(70)
What I find is that if this script is run before an internet connection is established on the computer (an embedded Linux system), naturally, when the socket tries to connect, I get "Errno -3 Temporary failure in name resolution". However, if the internet connection is then established, the program STILL cannot resolve the hostname - the only way to get it to work is to restart the python script.
Since this system is not one where I can guarantee the presence of an internet connection at all times, is there any way to get Python to realise that the internet connection now exists and that name resolution information is now available?
EDIT: Some further testing shows that this only happens if the python program is started before any successful internet connection is established on the machine after a boot up. If the python program is started AFTER an internet connection has previously been established on the machine (even if it's subsequently been disconnected), the program operates correctly and will successfully connect to the internet after internet connectivity is restored.
So:
Bootup->Python started->Internet connection established = program doesn't work
Bootup->Internet connection established->Internet disconnected->Python started = program works fine.
Try flushing the DNS cache on every iteration.
    import os
    ...
    while True:
        os.popen('nscd -i hosts', "r")
        ...
Or try the service nscd restart command instead.
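A slightly more defensive version of the same idea might look like the sketch below. It assumes that nscd is the caching daemon in use and that its -i (--invalidate) option is available. The function name flush_hosts_cache is hypothetical, and whether flushing actually cures the name resolution failure on this system is not something the sketch can guarantee.

```python
import shutil
import subprocess


def flush_hosts_cache(command=("nscd", "-i", "hosts")):
    """Run the cache-flush command; return True only if it ran and exited with 0."""
    if shutil.which(command[0]) is None:
        return False  # the tool is not installed, so there is nothing to flush
    result = subprocess.run(
        list(command), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0
```

Calling flush_hosts_cache() at the top of each loop iteration mirrors the suggestion above, and the return value can be logged to see whether the flush is really happening.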
I'm going crazy writing a little socket server in python. Everything was working fine, but I noticed that in the case where the client just disappears, the server can't tell. I simulate this by pulling the ethernet cable between the client and server, close the client, then plug the cable back in. The server never hears that the client disconnected and will wait forever, never allowing more clients to connect.
I figured I'd solve this by adding a timeout to the read loop so that it would try and read every 10 seconds. I thought maybe if it tried to read from the socket it would notice the client was missing. But then I realized there really is no way for the server to know that.
So I added a heartbeat. If the server goes 10 seconds without reading, it will send data to the client. However, even this is successful (meaning doesn't throw any kind of exception). So I am able to both read and write to a client that isn't there any more. Is there any way to know that the client is gone without implementing some kind of challenge/response protocol between the client and server? That would be a breaking change in this case and I'd like to avoid it.
Here is the core of my code for this:
    def _loop(self):
        command = ""
        while True:
            socket, address = self._listen_socket.accept()
            self._socket = socket
            self._socket.settimeout(10)
            socket.sendall("Welcome\r\n\r\n")
            while True:
                try:
                    data = socket.recv(1)
                except timeout:  # Went 10 seconds without data
                    data = ""  # clear stale data so the heartbeat branch runs
                except Exception as e:  # Likely the client closed the connection
                    break
                if data:
                    command = command + data
                    if data == "\n" or data == "\r":
                        if len(command.strip()) > 0:
                            self._parse_command(command.strip(), socket)
                        command = ""
                    if data == '\x08':
                        command = command[:-2]
                else:  # Timeout on read
                    try:
                        self._socket.sendall("event,heartbeat\r\n")  # Send heartbeat
                    except:
                        self._socket.close()
                        break
The sendall for the heartbeat never throws an exception and the recv only throws a timeout (or another exception if the client properly closes the connection under normal circumstances).
Any ideas? Am I wrong that sending to a client that doesn't ACK should eventually generate an exception? (I've tested for several minutes.)
The behavior you are observing is the expected behavior for a TCP socket connection. In particular, in general the TCP stack has no way of knowing that an ethernet cable has been pulled or that the (now physically disconnected) remote client program has shut down; all it knows is that it has stopped receiving acknowledgement packets from the remote peer, and for all it knows the packets could just be getting dropped by an overloaded router somewhere and the issue will resolve itself momentarily. Given that, it does what TCP always does when its packets don't get acknowledged: it reduces its transmission rate and its number-of-packets-in-flight limit, and retransmits the unacknowledged packets in the hope that they will get through this time.
Assuming the server's socket has outgoing data pending, the TCP stack will eventually (i.e. after a few minutes) decide that no data has gone through for a long-enough time, and unilaterally close the connection. So if you're okay with a problem-detection time of a few minutes, the easiest way to avoid the zombie-connection problem is simply to be sure to periodically send a bit of heartbeat data over the TCP connection, as you described. When the TCP stack tries (and repeatedly fails) to get the outgoing data sent-and-acknowledged, that is what eventually will trigger it to close the connection.
If you want something quicker than that, you'll need to implement your own challenge/response system with timeouts (either over the TCP socket, or over a separate TCP socket, or over UDP), but note that in doing so you are likely to suffer from false positives yourself (e.g. you might end up severing a TCP connection that was not actually dead but only suffering from a temporary condition of lost packets due to congestion). Whether or not that's a worthwhile tradeoff depends on what sort of program you are writing. (Note also that UDP has its own issues, particularly if you want your system to work across firewalls, etc)
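As a rough sketch of the timeout half of such a scheme: the server can treat the peer as gone when nothing at all has arrived for a chosen period. This assumes the client is known to send something (for example a reply to each heartbeat) at least once per period, which is exactly the protocol change discussed above. The name read_until_idle and the parameters idle_limit and poll_interval are hypothetical.

```python
import socket
import time


def read_until_idle(conn, idle_limit=30.0, poll_interval=1.0):
    """Read from conn until the peer closes, errors out, or stays silent too long.

    Returns (data, reason) where reason is "closed", "error" or "idle".
    """
    conn.settimeout(poll_interval)
    last_heard = time.monotonic()
    chunks = []
    while True:
        try:
            data = conn.recv(4096)
        except socket.timeout:
            # Nothing arrived during this poll; give up once the silence is too long.
            if time.monotonic() - last_heard >= idle_limit:
                return b"".join(chunks), "idle"
            continue
        except OSError:
            return b"".join(chunks), "error"
        if not data:
            return b"".join(chunks), "closed"  # orderly shutdown by the peer
        last_heard = time.monotonic()
        chunks.append(data)
```

A short idle_limit detects a vanished client quickly but risks the false positives mentioned above, so the value is a tradeoff rather than a recommendation.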
I have created a Python socket server using a class inherited from SocketServer.BaseRequestHandler, overriding the setup and handle methods. Of course, SocketServer.BaseRequestHandler.setup is called at the end of my own setup.
This is my server class
    class MyServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
        timeout = 30
A typical forking socket server.
Here is how I run my server
    while True:
        try:
            server = MyServer((host, port), MyRequestHandler)
            print('Server listening on', (host, port))
            server.timeout = 300  # seconds
            server.serve_forever()
        except:
            print('Error with server, retrying in 5 seconds...')
            print(sys.exc_info())
            sleep(5)
host and port are predefined, no problem with them.
The server works fine, except when the client count reaches 40. After this number, no new connections are accepted; all are refused. I checked this with a Python client test script from my own system. Only 40!
Why 40? I have checked source code for SocketServer and found nothing related to this. I currently have no clue regarding this issue. Any, and I really mean it, any help is appreciated :))
Thanks in advance
OS: CentOS 6.5
This is probably unrelated to Python. Tune your Linux kernel; in the testing phase, do things like:
turn syncookies off
increase file handles available for the user (every socket opened is also a file handle used - maybe you're running out of them?)
look at stuff like this: http://people.redhat.com/alikins/system_tuning.html#tcp
and: http://people.redhat.com/alikins/system_tuning.html#fds
check if stuff like fail2ban is installed (http://www.fail2ban.org/wiki/index.php/Main_Page)
check if rate limits are applied by iptables (in testing phase you could do iptables -F after making sure that default chain policy is ACCEPT)
and last but not least, check dmesg, /var/log/messages, /var/log/syslog, etc.
One thing that theoretically might be related to Python is SO_REUSEADDR:
http://www.unixguide.net/network/socketfaq/4.5.shtml
Check if you have it set for your socket.
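A minimal sketch of setting and then checking that option on a plain listening socket follows. The helper name make_listener is hypothetical. For a server built on the socketserver module, the class attribute allow_reuse_address serves the same purpose.

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def reuseaddr_enabled(sock):
    """Report whether SO_REUSEADDR is currently set on sock."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
```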
UPDATE:
I just realized that since the 40 connections that your socket server maxes out at is actually pretty low, the simplest option could be running your socket server through strace; just use the -f flag to track forked processes as well. You could, for example, start the socket server, open 35 simultaneous connections, then attach strace to the running process, set up 5 more connections, and see what strace reports. Very often in such situations syscalls fail with errors that are visible in strace and allow pinpointing the root cause relatively easily.
I really have no idea how I missed this in the source!
    class ForkingMixIn:
        """Mix-in class to handle each request in a new process."""
        timeout = 300
        active_children = None
        max_children = 40
Yeah, now I see the max_children property.
Thanks guys
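Given that finding, raising the limit should only need a class attribute override. The sketch below uses the Python 3 module name socketserver (the code above uses the Python 2 name SocketServer) and assumes a platform with fork. The value 200 is an arbitrary example, and EchoHandler is a hypothetical stand-in for the real request handler.

```python
import socketserver


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: send back whatever the client sent first."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


class MyServer(socketserver.ForkingMixIn, socketserver.TCPServer):
    # Override the mix-in's default of 40 simultaneous child processes.
    max_children = 200
    allow_reuse_address = True
```

Each child is a full process, so a much higher limit costs memory and file handles; the right number depends on the machine.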
I'm using python socket to connect to a server but sometimes I get this:
error: [Errno 10060] A connection attempt failed because the connected
party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond
when I call the socket.connect method
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((self._ipv4address, host_port))
    try:
        s.connect((dest_ip, dest_port))
    except:
        raise
Why am I seeing this error? And how do I solve the problem?
You don't need to bind the socket (unless the remote server expects connections from a specific local address or port) - it is extremely rare that this would actually be a requirement to connect.
Instead of using sockets to open a website, use urllib2, or mechanize if you need to twiddle forms. They manage cookies, sessions, page state, etc. Much easier.
Also, if you fail to connect, don't give up! Try again; some sites can be pokey to respond. Some may not respond for a while, so handle it better. Instead of just raising the error, wrap your connection method with an exponential backoff decorator.
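A minimal sketch of such a decorator is shown below. The name retry_with_backoff and its parameters are hypothetical, the delays are examples rather than recommendations, and the sleep parameter exists only so the waiting can be replaced in tests.

```python
import functools
import time


def retry_with_backoff(max_attempts=5, base_delay=1.0, factor=2.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Retry the wrapped function, multiplying the wait by factor after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # no attempts left: let the caller see the error
                    sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

Decorating the connection method with @retry_with_backoff() would then retry on socket errors with waits of 1, 2, 4 and 8 seconds before the final attempt.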