I'm trying to open a URL with urllib2, monkey-patched by gevent, on Windows XP:
from gevent import monkey
monkey.patch_all()
import urllib2
opener = urllib2.build_opener()
request = urllib2.Request("http://www.google.com")
response = opener.open(request)
And I get this exception during the opener.open call:
File "C:\Python26\lib\site-packages\gevent\socket.py", line 768, in getaddrinfo
sockaddr = (inet_ntop(AF_INET6, res), port, 0, 0)
File "C:\Python26\lib\site-packages\gevent\socket.py", line 133, in inet_ntop
raise NotImplementedError('inet_ntop() is not available on this platform')
NotImplementedError: inet_ntop() is not available on this platform
<SERPScrapper at 0xbc0f60> failed with NotImplementedError
Looking at the gevent socket.py source code, it seems to be related to IPv6 support on Windows...
Any ideas or suggestions for solving this problem?
Edit: I don't get the problem with other URLs (e.g. http://www.bing.com). It seems that Google is using IPv6. Is there a way to force an IPv4 response?
Try making your request to http://ipv4.google.com/ instead.
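If you need to keep the original hostname, another option is to force IPv4 resolution yourself. The wrapper below is an untested sketch of my own (not a gevent API): it pins socket.getaddrinfo to AF_INET after monkey-patching, so gevent's IPv6 inet_ntop path is never reached:
from gevent import monkey
monkey.patch_all()

import socket
import gevent.socket
import urllib2

_orig_getaddrinfo = socket.getaddrinfo  # gevent's version after patch_all

def _getaddrinfo_ipv4(host, port, family=0, *args, **kwargs):
    # Always ask for AF_INET results so inet_ntop(AF_INET6, ...) is never called
    return _orig_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

# Patch both modules, since gevent's create_connection resolves the name locally
socket.getaddrinfo = _getaddrinfo_ipv4
gevent.socket.getaddrinfo = _getaddrinfo_ipv4

response = urllib2.urlopen("http://www.google.com")
print response.read()[:200]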
I have this code with httplib, where config.testConnection is an XML string.
import httplib

conn = httplib.HTTPSConnection(url1)
conn.request('POST', url2, config.testConnection, config.headers)
response = conn.getresponse()
data = response.read().decode('utf-8')
But I get this SSL error: socket.sslerror: (1, 'error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure')
With Python 2.4 I cannot use ssl._create_unverified_context(), and I really need a verified HTTPS connection.
I have found pages like https://bugzilla.redhat.com/show_bug.cgi?id=1064942 which say it might be an incompatibility between Python on the server and the web service's Java side.
But I cannot modify any packages like that. Is there a workaround I can put directly in the code?
A related snippet that forces IPv6 name resolution by patching socket.getaddrinfo (Python 3, since it uses unittest.mock):
import requests
import socket
from unittest.mock import patch

orig_getaddrinfo = socket.getaddrinfo

def getaddrinfoIPv6(host, port, family=0, type=0, proto=0, flags=0):
    # Ignore the requested family and always resolve to IPv6 addresses
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET6,
                            type=type, proto=proto, flags=flags)

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv6):
    r = requests.get('http://icanhazip.com')
    print(r.text)
Instead of using an IPv4 proxy to connect to a website, I would like to connect through an IPv6 HTTPS proxy. I have scoured Google for answers and have not found any (that I understand). The closest I have found is... (it does not use the IPv6 proxy; instead it uses my own IPv6 address). I am open to using something besides requests to do this, but requests is preferred. I will be attempting to thread later on.
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
proxy = {"http":"http://username:password#[2604:0180:2:3b5:9ebc:64e9:166c:d9f9]", "https":"https://username:password#[2604:0180:2:3b5:9ebc:64e9:166c:d9f9]"}
url = "https://icanhazip.com"
r = requests.get(url, proxies=proxy, verify=False)
print(r.content)
If the code above does not work, try this instead:
import requests
proxy = {"http": "http://userame:password#168.235.109.30:18117", "https":"https://userame:password#168.235.109.30:18117"}
url = "https://icanhazip.com"
r = requests.get(url, proxies=proxy)
print(r.content)
This is my current provider for my IPv6 HTTPS proxy; however, they tunnel IPv6 over IPv4 to their clients, which is why this code works and the code above does not (if you're using the same provider). If your provider supports IPv6 natively, the code at the top should work for you.
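Since you mention threading later, a hedged suggestion (the port here is a placeholder; your provider's will differ): give each thread its own requests.Session so connections through the proxy are pooled and reused:
import requests

# Placeholder credentials and proxy address; substitute your provider's values
proxy_url = "http://username:password@[2604:0180:2:3b5:9ebc:64e9:166c:d9f9]:8080"

session = requests.Session()
session.proxies = {"http": proxy_url, "https": proxy_url}

r = session.get("https://icanhazip.com", verify=False)
print(r.text)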
You can use https://proxyturk.net/
Example curl command:
curl -m 90 -x http://proxyUsername:proxyPassword@93.104.200.99:20000 http://api6.ipify.org
You will see example result:
2a13:c206:2021:1522:9c5a:3ed5:156b:c1d0
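For the same proxy in Python, the curl command above translates to requests roughly like this (same placeholder credentials):
import requests

# Same placeholder credentials and proxy as the curl example above
proxies = {"http": "http://proxyUsername:proxyPassword@93.104.200.99:20000"}
r = requests.get("http://api6.ipify.org", proxies=proxies, timeout=90)
print(r.text)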
tl;dr: I used httplib to create a connection to a site. I failed; I'd love some guidance!
I've run into some trouble. I read about Python's socket and httplib modules, although it seems I have some problems with the syntax.
Here it is:
connection = httplib.HTTPConnection('www.site.org', 80, timeout=10, 1.2.3.4)
The syntax is this:
httplib.HTTPConnection(host[, port[, strict[, timeout[, source_address]]]])
How does "source_address" behave? Can I make requests with any IP from it?
Wouldn't I need an User-Agent for it?
Also, how do I check if the connect is successful?
if connection:
    print "Connection Successful."
(As far as I know, HTTP doesn't need an "are you alive" ping every second; as long as both client and server are okay, a request will be processed whenever it is made. So I can't constantly ping.)
Creating the object does not actually connect to the website:
HTTPConnection.connect():
Connect to the server specified when the object was created.
source_address is the (host, port) pair the client socket binds to locally before connecting. You can only bind to an address your machine actually owns, so you can't use it to make requests appear to come from arbitrary IPs, and it has nothing to do with the User-Agent header (which is just an HTTP header you can set on the request). Either way, it is an optional parameter.
As for checking whether the connection was made: connect() raises socket.error on failure, so the usual check is a try/except around it.
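A sketch of that check, reusing the placeholder host from the question:
import socket
from httplib import HTTPConnection

conn = HTTPConnection("www.site.org", 80, timeout=10)
try:
    conn.connect()
    print "Connection Successful."
except socket.error as e:
    # Raised for refused connections, timeouts, and DNS failures alike
    print "Connection failed:", e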
Assuming what you want to do is get the contents of the website root, you can use this:
from httplib import HTTPConnection
conn = HTTPConnection("www.site.org", 80, timeout=10)
conn.connect()
conn.request("GET", "http://www.site.org/")
resp = conn.getresponse()
data = resp.read()
print(data)
(slammed together from the HTTPConnection documentation)
Honestly though, you should not be using httplib directly; use urllib2 or another HTTP library that is less... low-level.
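For comparison, a minimal urllib2 version of the same fetch (same placeholder host; a sketch, not tested against a real site):
import urllib2

# urllib2 manages the connection, Host header, and redirects for you
response = urllib2.urlopen("http://www.site.org/", timeout=10)
print(response.read())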
I encountered an error today while trying to retrieve an XML document by sending a 'GET' HTTP request.
from httplib import HTTPConnection
import urllib
params = urllib.urlencode({'sK': 'test', 'sXML': 1})
httpCon = HTTPConnection("http://www.podnapisi.net",80)
httpCon.request('GET', '/en/ppodnapisi/search',params)
r1 = httpCon.getresponse()
and here is the error I got:
.....
File "C:\Python27\lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno 11004] getaddrinfo failed
The XML that I am trying to retrieve is HERE.
How can I fix this error ?
Thanks in Advance ...
Don't put the scheme (http://) in the HTTPConnection constructor:
httpCon = HTTPConnection("www.podnapisi.net", 80)
It already knows it's HTTP; it's an HTTPConnection object :)
You accidentally included the protocol prefix in the domain argument to HTTPConnection. You want:
httpCon = HTTPConnection("www.podnapisi.net", 80)
Generally, this error indicates there was a problem resolving the domain name to an IP address. It might be just intermittent; if the problem persists, check the DNS configuration on your system.
For example, you can set it to use Google's public DNS servers. For more information about how to configure your DNS server on Microsoft Windows, refer to Microsoft's knowledge base.
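Putting the fix together, a corrected version of the original snippet might look like this (note that for a GET the encoded parameters belong in the URL, not in the request body):
from httplib import HTTPConnection
import urllib

params = urllib.urlencode({'sK': 'test', 'sXML': 1})
httpCon = HTTPConnection("www.podnapisi.net", 80)  # host only, no scheme
httpCon.request('GET', '/en/ppodnapisi/search?' + params)
r1 = httpCon.getresponse()
print r1.status, r1.reason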
I'm familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
Generally urllib works fine; the problem is dealing with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Extra info:
urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
I tried @Fenikso's answer but I'm getting this error now:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
Something more to try:
import urllib2
#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers={'User-agent' : 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
Edit 2014:
This seems to be a popular question and answer. However, today I would use the third-party requests module instead.
For one request, just do:
import requests
r = requests.get("http://www.google.com",
proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests, use a Session object so you do not have to pass the proxies parameter with every request:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
I recommend you just use the requests module.
It is much easier than the built-in HTTP clients:
http://docs.python-requests.org/en/latest/index.html
Sample usage:
import requests

r = requests.get('http://www.thepage.com', proxies={"http": "http://myproxy:3129"})
thedata = r.content
Just wanted to mention that you may also have to set the https_proxy OS environment variable in case HTTPS URLs need to be accessed.
In my case it was not obvious to me, and I tried for hours to discover this.
My use case: Windows 7, jython-standalone-2.5.3.jar, setuptools installation via ez_setup.py
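A minimal sketch of setting both variables from Python before the first request is made (the proxy address is the placeholder used above):
import os

# Placeholder proxy address; https_proxy matters whenever https:// URLs are fetched
os.environ["http_proxy"] = "http://myproxy:3129"
os.environ["https_proxy"] = "http://myproxy:3129"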
Python 3:
import urllib.request

url = "http://www.google.com"  # any URL to fetch
htmlsource = urllib.request.FancyURLopener({"http": "http://127.0.0.1:8080"}).open(url).read().decode("utf-8")
I encountered this on a Jython client.
The server was only talking TLS, but the client was using an SSL context:
javax.net.ssl.SSLContext.getInstance("SSL")
Once the client was switched to TLS, things started working.
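In Jython, that change is a single call; a minimal sketch (Java API names, callable from Jython):
from javax.net.ssl import SSLContext

# "SSL" handshakes fail against TLS-only servers; request a TLS context instead
context = SSLContext.getInstance("TLS")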