The server that runs my Python code (which gets some data off of a website) has to jump through so many hoops to reach its DNS server that the latency sometimes exceeds the DNS resolution timeout (this is out of my control). Sometimes it works, sometimes it doesn't.
So I am trying to add some exception handling and retries to make it more reliable.
Goal:
Increase the DNS timeout. I am unsure of a good value, but let's go with 30 seconds.
Try to resolve the website's hostname up to 5 times. If it resolves, proceed to scrape the website; if it doesn't, keep trying until the 5 attempts are used up.
Here is the code using google.com as an example.
import socket
import http.client

# Confirm that DNS resolution is successful.
def dns_lookup(host):
    try:
        socket.getaddrinfo(host, 80)
    except socket.gaierror:
        return "DNS resolution to the host failed."
    return True

# Scrape the targeted website.
def request_website_data():
    conn = http.client.HTTPConnection("google.com")
    conn.request("GET", "/")
    res = conn.getresponse()
    if res.status == 200:
        print("Connection to the website worked! Do some more stuff...")
    else:
        print("Connection to the website did not work. Terminating.")

# Attempt DNS resolution 5 times; if it succeeds, immediately request the website and break the loop.
for x in range(5):
    dns_resolution = dns_lookup('google.com')
    if dns_resolution == True:
        request_website_data()
        break
    else:
        print(dns_resolution)
I have been looking at socket.settimeout(value) in the socket library, but I am unsure whether that is what I'm looking for. What would I insert into my code to get a longer, more forgiving DNS resolution time?
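For reference, here is a sketch of the kind of thing I have in mind (assuming a thread-based timeout is a reasonable approach, since as far as I can tell socket.settimeout() does not apply to getaddrinfo(), which blocks inside the OS resolver):

import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def dns_lookup_with_timeout(host, timeout=30, attempts=5):
    for attempt in range(1, attempts + 1):
        # run the lookup in a worker thread so we can give up after `timeout` seconds
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(socket.getaddrinfo, host, 80)
        try:
            future.result(timeout=timeout)
            return True
        except FutureTimeout:
            print("Attempt %d: lookup still hanging after %d seconds" % (attempt, timeout))
        except socket.gaierror as exc:
            print("Attempt %d: resolution failed: %s" % (attempt, exc))
        finally:
            # don't block waiting for a stuck resolver thread
            pool.shutdown(wait=False)
    return False

Is something along these lines the right direction?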
Thank you.
So first I'll describe what I am doing.
A game provides a web interface, but only over IPv4, and I would like people out on the internet to be able to reach it too. From my ISP, however, I only get a public IPv6 address. Since I couldn't find anything on the internet to translate the requests and responses, I wrote a little app that does: IPv6 requests get forwarded to the web server, and the web server's responses get relayed back over IPv6. That part is working fine.
The only troubling bit is that not all requests seem to get through. For example, when I first visit the web page, it sometimes just hangs saying it is waiting for a style.css file, yet the console output shows no new connection. And in general there is a lot of delay when doing anything with the web interface.
Here is my code. (A word of warning: I don't really know exactly what everything in the networking code does; in particular, I don't quite understand whether the code around the sending is even needed, I just found it online.)
import re
import socket
from threading import Thread

def handle_request(return_address):
    ipv4side = socket.create_connection(("127.0.0.1", 7245))
    request = return_address.recv(2048)
    print(request)
    request = str(request, 'utf-8')
    # rewrite the [IPv6]:7250 host part of the request to the local IPv4 server
    p = re.compile(r'\[[^]]*]:7250')
    m = p.search(request)
    request = request.replace(request[m.start():m.end()], '127.0.0.1:7245')
    request = request.encode('utf-8')
    msg_len = len(request)
    totalsent = 0
    while totalsent < msg_len:
        sent = ipv4side.send(request[totalsent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        totalsent += sent
    while True:
        response = ipv4side.recv(2048)
        if len(response) == 0:
            ipv4side.close()
            return
        msg_len = len(response)
        totalsent = 0
        while totalsent < msg_len:
            sent = return_address.send(response[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent += sent

# IPV6 is the machine's public IPv6 address, defined elsewhere in the script
ipv6side = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
ipv6side.bind((IPV6, 7250))
ipv6side.listen(20)
ipv6side.settimeout(30)
while True:
    try:
        connected_socket = ipv6side.accept()[0]
        print("NEW CONNECTION! " + str(connected_socket))
        Thread(target=handle_request, args=(connected_socket,)).start()
    except socket.timeout:
        print("nothing new...")
I hope anyone can help me with this :D
I fixed the problem! (I think)
So, I followed what user253751 said about how HTTP is allowed to use the same connection for more than one request, and made adjustments; for that to work, the code now has to detect the end of each response and request (eor).
The first thing I did was wrap the whole handle_request body in a while loop, to support multiple requests on one connection (see the sketch below).
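Schematically, the new structure looks roughly like this (just a sketch, not the full code):

def handle_request(return_address):
    while True:  # keep serving requests on the same connection
        request = return_address.recv(2048)
        if len(request) == 0:  # end-of-request: the client closed the connection
            return_address.close()
            return
        ...  # rewrite the host part and forward to the IPv4 side as before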
For the end-of-response I am using a regex:
eor = re.compile(r'\r\n\r\n\Z')
and then at the appropriate place:
# I replaced utf-8 with ISO-8859-1, the encoding used by HTTP, just to be safe
eorm = eor.search(str(response, 'ISO-8859-1', 'ignore'))
if eorm or len(response) == 0:
    safe_send(return_address, response)
    ipv4side.close()
    break
For the end-of-request it is basically the same, except it only checks whether the request has zero length.
I put the code responsible for sending into a safe_send function that takes a connection and a message.
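safe_send is essentially the send loop from the original code, factored out (a sketch; the name and signature are just from my description above):

def safe_send(conn, msg):
    # loop because send() may transmit only part of the message
    totalsent = 0
    while totalsent < len(msg):
        sent = conn.send(msg[totalsent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        totalsent += sent

(conn.sendall(msg) would do the same job in a single call.)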
And I coded it so that if the server aborts a connection for some reason (and thus an error is thrown when trying to receive), the request is re-sent on a new connection:
try:
    response = ipv4side.recv(2048)
except ConnectionAbortedError:
    ipv4side = socket.create_connection(("127.0.0.1", 7245))
    safe_send(ipv4side, request)
    continue
I hope this is a good explanation :]
I am trying to create a program that runs 24x7 in the background and performs actions triggered by a change in the up/down (i.e. connected/disconnected) state of the Ethernet adapter port on a PC or laptop. How can I implement this in Python?
For ex:
conn = True

def conCheck():
    if conn == False:
        # trigger onward events
        ...
    else:
        # else statements
        ...
Any help will be greatly appreciated. I am trying to take up and learn Python.
Thanks and regards
You can write a function that requests a web page to check the internet connection.
If there is no internet connection, you will get a ConnectionError.
Otherwise, you'll get a response from the URL, i.e. you're connected to the internet.
Use the following code and try interrupting the connection while it is running; you'll see the difference in output when your internet gets interrupted.
import time
import requests

url = "http://www.kite.com"
timeout = 5

def conCheck():
    try:
        requests.get(url, timeout=timeout)
        print("Connected to the Internet")
        return True
    except (requests.ConnectionError, requests.Timeout):
        print("You're not connected to the internet.")
        return False

while True:
    conCheck()
    time.sleep(1)
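If you want to trigger actions on the transition (up to down, or down to up) rather than on every poll, one sketch, building on conCheck above, is to remember the previous state and compare:

previous = conCheck()
while True:
    time.sleep(1)
    current = conCheck()
    if current != previous:
        if current:
            print("Connection came back up")  # trigger 'up' actions here
        else:
            print("Connection went down")     # trigger 'down' actions here
        previous = current

Note this checks internet reachability rather than the adapter itself; if you need the actual link state of the Ethernet port, the psutil library exposes it through psutil.net_if_stats(), whose per-interface entries have an isup flag.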
My college website is served on a number of ports, something like this:
http://www.college.in:913
I want a program to find the active ones, i.e. the port numbers on which the website is working.
Here is my code, but it takes a lot of time:
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

for i in range(1, 10000):
    req = Request("http://college.edu.in:" + str(i))
    try:
        response = urlopen(req)
    except URLError as e:
        print("Error at port " + str(i))
    else:
        print("Website is working fine on port " + str(i))
It might be faster to try opening a plain socket connection to each port in the range, and only make an HTTP request if the socket is actually open. But iterating through a large range of ports one by one is inherently slow: at 0.5 seconds per port, scanning 10,000 ports is roughly 83 minutes of waiting.
import socket

# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80 - the normal http port
s.connect(("www.python.org", 80))
s.close()
from https://docs.python.org/3/howto/sockets.html
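Applied to the question, a minimal sketch of that approach might look like this (the hostname and port range are taken from the question; the 0.5-second timeout and worker count are assumptions), using a thread pool so the connection timeouts overlap instead of adding up:

import socket
from concurrent.futures import ThreadPoolExecutor

HOST = "college.edu.in"
PORTS = range(1, 10000)

def port_is_open(port, timeout=0.5):
    # connect_ex returns 0 on success instead of raising an exception
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((HOST, port)) == 0

with ThreadPoolExecutor(max_workers=100) as pool:
    for port, is_open in zip(PORTS, pool.map(port_is_open, PORTS)):
        if is_open:
            print("Port", port, "is open - worth an HTTP request")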
You might also consider profiling the code and finding out where the slow parts are.
You can also use python-nmap, a Python interface to the nmap port scanner.
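A minimal sketch of that approach (assuming the nmap binary is installed alongside the python-nmap package):

import nmap  # pip install python-nmap

nm = nmap.PortScanner()
nm.scan('college.edu.in', '1-10000')  # the host and port range from the question
for host in nm.all_hosts():
    for port in nm[host].all_tcp():
        print(port, nm[host]['tcp'][port]['state'])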
I am writing a crawler that uses multiple proxies. Basically, in a single process I start 30 threads; each randomly picks a proxy from a pool of verified proxies and uses it to fetch some URL, and I set the timeout of each request to 30 seconds.
However, after running for a while, I get a 'too many open files' error; I guess some connections are not being closed?
If I don't use proxies with the same number of threads, there is no such error.
Could someone help?
Code:
code starting threads:
...
# the queue of urls to request
queue.add_url(url)

for i in range(numOfThreads):
    x = crawlerThread(...)
    threadList.append(x)
    x.start()

time.sleep(30)
...
code in the crawling loop:
while currentUrl:
    # different sessions use different ips of the servers
    sessionId = (sessionId + 1) % len(self.sessions)
    try:
        session, proxy = self.random_use_ip_session_proxy(sessionId)
        if proxy:
            # if proxy is not None, use it, otherwise use my own ip;
            # the returned proxy is a list of two elements: the first
            # is the proxy itself, the second is for counting
            response = session.get(currentUrl, timeout=60, verify=False, proxies=proxy[0])
        else:
            response = session.get(currentUrl, timeout=60, verify=False)
    except Exception as e:
        # some error handling
        ...
    ...  # analyze the response and produce more urls
Edit:
The program no longer reports the 'too many open files' error (though I still see the number of socket connections grow rapidly toward 10000), but now it suddenly just stops with no error. Is it possible it was somehow killed by the kernel? Where can I check this?
Just as Cory Shay mentioned in the comments, response.close() will close the connection, while the content of the response remains available.
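Applied to the fetch in the question, that might look like this (a sketch; for non-streaming requests the body has already been read by the time close() is called, so .content stays usable):

response = session.get(currentUrl, timeout=60, verify=False, proxies=proxy[0])
try:
    html = response.content  # the body is already downloaded at this point
finally:
    response.close()  # release the socket so file descriptors don't pile up

In recent versions of requests, Response is also a context manager, so "with session.get(...) as response:" achieves the same without the explicit close().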
I have a simple question about Python:
I have another Python script listening on a port on a Linux machine.
I have made it so I can send a request to it, and it will inform another system that it is alive and listening.
My problem is that I don't know how to send this request from another Python script running on the same machine (blush).
I have a script running every minute, and I would like to expand it to also send this request. I don't expect to get a response back; my listening script posts to a database.
In Internet Explorer, I write like this: http://192.168.1.46:8193/?Ping
I would like to know how to do this from Python, and preferably just send and not hang if the other script is not running.
thanks
Michael
It looks like you are doing an HTTP request, rather than an ICMP ping.
urllib2, built into Python 2, can help you do that.
You'll need to override the timeout so you aren't hanging too long. Here is some example code for you to tweak with your desired timeout and URL:
import socket
import urllib2
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
Alternatively, you can set the timeout per call and catch the errors, so the script just moves on if the other script isn't running:

import urllib2

try:
    response = urllib2.urlopen('http://192.168.1.46:8193/?Ping', timeout=2)
    print 'response headers: "%s"' % response.info()
except IOError, e:
    if hasattr(e, 'code'):      # HTTPError
        print 'http error code: ', e.code
    elif hasattr(e, 'reason'):  # URLError
        print "can't connect, reason: ", e.reason
    else:
        raise  # don't know what it is
This is a bit outside my knowledge, but maybe this question will help:
Ping a site in Python?
Have you considered Twisted? What you're trying to achieve could be taken straight from their examples. It might be overkill, but if you'll eventually want to add authentication, authorization, SSL, etc., you might as well start in that direction.