I'm writing this application where the user can perform a web search to obtain some information from a particular website.
Everything works well except when I'm connected to the Internet via Proxy (it's a corporate proxy).
The thing is, it works sometimes.
By sometimes I mean that if it stops working, all I have to do is use any web browser (Chrome, IE, etc.) to surf the internet, and then Python's requests starts working as before.
The error I get is:
OSError('Tunnel connection failed: 407 Proxy Authentication Required',)
My guess is that some sort of credentials are validated and the proxy tunnel is up again.
I tried with proxy handlers, but the result is the same.
My doubts are:
1. How do I know if the proxy needs authentication, and if so, how do I authenticate without hardcoding the username and password, given that this application will be used by others?
2. Is there a way to use the Windows default proxy configuration so it works the way the browsers do?
3. What do you think happens when I surf the internet and the Python requests then start working again?
I tried with both requests and urllib.request.
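Roughly what I'm doing with requests looks like this (the proxy address below is a placeholder for our corporate one):

import requests

# Placeholder address and port for our corporate proxy.
proxies = {
    "http": "http://proxy.corp.local:8080",
    "https": "http://proxy.corp.local:8080",
}

# Fails with the 407 tunnel error until a browser has "unlocked" the proxy.
response = requests.get("https://example.com", proxies=proxies)
print(response.status_code)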
Any help is appreciated.
Thank you!
Check if there are any proxy settings in Chrome; on Windows, Chrome uses the system-wide proxy configuration.
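As a first check from Python, urllib can report the proxy settings it picks up from the system; on Windows these come from the same Internet Options settings that Chrome and IE use, and requests consults the same function by default:

import urllib.request

# On Windows this reads the system (registry) proxy settings;
# elsewhere it falls back to http_proxy/https_proxy environment variables.
print(urllib.request.getproxies())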
I'm trying to test the response time of webpages hosted on many backends. These hosts sit behind a load balancer, and above that I have my domain.com.
I want to use python+selenium on these backends but with a spoofed hostname, without messing with /etc/hosts or running fake DNS servers. Is that possible with pure selenium drivers?
To illustrate the problem better: curl can send a request straight to a backend's IP while spoofing the Host header, and I'd like to do the same with python+selenium.
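A rough Python equivalent of that curl trick, with a hypothetical backend IP standing in for a real one:

import requests

# 192.0.2.10 stands in for one backend's real IP; the spoofed Host header
# makes the backend serve domain.com's content.
response = requests.get("http://192.0.2.10/", headers={"Host": "domain.com"})
print(response.status_code)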
If you're on a UNIX system, you can try something as explained here:
https://unix.stackexchange.com/questions/10438/can-i-create-a-user-specific-hosts-file-to-complement-etc-hosts
Basically you still use a hosts file, but one that applies only to you: put it at ~/.hosts and point the HOSTALIASES environment variable at it.
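A minimal sketch of that idea, assuming the alias file lives at ~/.hosts; note that whether the browser's resolver honors HOSTALIASES depends on your libc, and many resolvers only consult it for plain, dot-less names:

import os
from selenium import webdriver

# Each line of the alias file maps an alias to a real hostname,
# e.g. "backend1 backend1.internal.example.org".
os.environ["HOSTALIASES"] = os.path.expanduser("~/.hosts")

# The browser selenium launches inherits the variable from this process.
driver = webdriver.Firefox()
driver.get("http://backend1/")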
In short, no.
Selenium drives browsers using the WebDriver standard, which is by definition limited to interactions with page content. Even though you can provide Selenium with configuration options for the browser, no browser provides control over Host headers or DNS resolution outside of a proxy.
But even if you could initiate a request for a particular IP address with a custom Host header, subsequent requests triggered by the content (redirects, page-asset downloads, AJAX calls, etc.) would still be outside of your control and are prohibited from customizing the Host header, leading the browser to fall back to standard DNS resolution.
Your only options are modifying the local DNS resolution (via /etc/hosts) or providing an alternative (via a proxy).
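If you go the proxy route, pointing the selenium-driven browser at it is the easy part; the proxy itself (which maps domain.com to whichever backend you want to test) is up to you. A sketch with Chrome and a hypothetical local proxy on 127.0.0.1:8080:

from selenium import webdriver

# Hypothetical local proxy that routes domain.com to the backend under test.
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://127.0.0.1:8080")

driver = webdriver.Chrome(options=options)
driver.get("http://domain.com/")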
I had been using urllib2 to parse data from HTML web pages. It worked perfectly for some time, then stopped working permanently for one website.
Not only did the script stop working, but I was no longer able to access the website at all, from any browser. In fact, the only way I could reach the website was from a proxy, leading me to believe that requests from my computer were blocked.
Is this possible? Has this happened to anyone else? If that is the case, is there any way to get unblocked?
It is indeed possible; maybe the sysadmin noticed that your IP was making way too many requests and decided to block it.
It could also be that the server has a request limit that you exceeded.
If you don't have a static IP, a restart of your router should reset your IP, making the ban useless.
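If the cause was a request limit, spacing out your fetches may keep you under it in the future; a sketch (URLs are placeholders):

import time
import urllib.request

# Pause between fetches so the server doesn't see a burst of requests.
for url in ["http://example.com/", "http://example.org/"]:
    html = urllib.request.urlopen(url).read()
    time.sleep(5)  # a conservative delay; adjust to the site's tolerance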
We've decided not to use SSL anymore, and unfortunately our server guy has quit, so now I need to fix this. I've revoked the certs from Comodo and removed the SSL app from Heroku, but that was apparently not enough, and now we have serious problems with our site.
When visiting inteokej.nu one gets redirected to the app, but http automatically turns into https, and instead of showing the domain (inteokej.nu) the app link https://inteokej.herokuapp.com is shown (I want inteokej.nu to be shown, not the actual app link).
That is a problem, but not the biggest one, which is that it's not possible to use the site anymore (e.g. login; the static pages still work, though). When I try to log in I first get an https security error, and when I proceed I end up on the following page: https://www.inteokej.nu/cgi-sys/defaultwebpage.cgi ("Sorry! If you are the owner of this website, please contact your hosting provider: webmaster@inteokej.nu").
I've now learned the hard way that SSL is a complex thing, but I really need to get this site up again as soon as possible. So, where should I start and how should I proceed from this point? I guess there's some back-end coding that should be done in the Django code as well?
Thanks a lot in advance!
Your issue doesn't seem to be with SSL but with DNS, or at least with however your server guy set things up.
The error page you're seeing isn't a Heroku error: inteokej.nu isn't being hosted on Heroku but on a server run by your DNS provider, svenskadomaner.se.
If you use the Firefox Live HTTP Headers plugin you can follow the request/response cycle and you'll see that there is a 301 redirect from www.inteokej.nu to inteokej.herokuapp.com (probably an .htaccess redirect).
Check the DNS records for your domain (for example at http://viewdns.info/dnsrecord/?domain=inteokej.nu) and you'll see that there is no CNAME record pointing to Heroku, only an A record for 46.22.116.5, an IP address owned by svenskadomaner.se.
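You can sanity-check the resolution from Python too (this only shows the final A-record resolution, not the CNAME chain):

import socket

# Print where each name currently resolves.
for host in ("inteokej.nu", "www.inteokej.nu", "inteokej.herokuapp.com"):
    print(host, socket.gethostbyname(host))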
So the thing to do is to set up the custom domain as recommended on Heroku's site:
https://devcenter.heroku.com/articles/custom-domains
and set the CNAME to Heroku's recommendation.
One reason your server guy might have set things up like this is that Heroku doesn't easily allow "naked domains", so people often do .htaccess redirects from example.com to www.example.com (which does work easily with CNAMEs).
Good luck!
I am behind my college's ISA Proxy / Forefront Threat Management Gateway. The proxy uses NTLM auth, so we are given credentials along with the proxy server IP and port. I have tried a lot of Python modules, including urllib, urllib2, urllib3, requests, requests-ntlm, httplib, and even cntlm and ntlm proxy. Nothing works in my case: everything returns "407 Proxy Authentication Required (Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied.)" or some socket error. I even tried ntlmaps; it didn't work out either. I know NTLM is already kind of deprecated, but some of you may still be working behind a corporate proxy with NTLM auth, so: any workarounds? I want Pythonic code that works on both Windows and Linux and can communicate with the internet through the intermediate upstream proxy.
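For reference, roughly what I tried with requests-ntlm (the proxy address and credentials are placeholders):

import requests
from requests_ntlm import HttpNtlmAuth

# Placeholders for the proxy details and credentials we were given.
proxies = {"http": "http://10.1.1.1:8080", "https": "http://10.1.1.1:8080"}
auth = HttpNtlmAuth("COLLEGE\\myuser", "mypassword")

r = requests.get("http://example.com", proxies=proxies, auth=auth)
print(r.status_code)  # still 407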
Check out this article: http://www.codemiller.com/blog/2011/05/28/overcoming-auth-pop-ups/
Maybe it will help, maybe it won't, but it's worth trying some of the solutions, as one of them worked for me in getting around the glorious NTLM auth.
Have you tried cntlm with the http_proxy environment variable? If you wish, I can post a step-by-step guide to solving this problem permanently.
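In the meantime, a sketch of the Python side once cntlm is running: cntlm does the NTLM handshake against the upstream TMG proxy and exposes a plain local proxy (by default on 127.0.0.1:3128, per its config file), so Python never has to speak NTLM at all:

import os
import requests

# Point everything at cntlm's local listener; requests honors these variables.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:3128"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:3128"

print(requests.get("http://example.com").status_code)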
My (Python) App Engine program fetches a web page from another site to scrape data from it, but it seems like the third-party site is blocking requests from Google App Engine: I can fetch the page in development mode, but not when deployed.
Can I get around this by using a free proxy of some sort?
Can I use a free proxy to hide the fact that I am requesting from App Engine?
How do I find/choose a proxy? -- what do I need? -- how do I perform the fetch?
Is there anything else I need to know or watch out for?
Probably the correct approach is to request permission from the owners of the site you are scraping.
Even if you use a proxy, there is still a big chance that requests coming through the proxy will end up blocked as well.
Have you considered changing the user-agent?
from google.appengine.api import urlfetch
result = urlfetch.fetch(u, headers={'User-Agent': "Mozilla/5.0"}, allow_truncated=True)
The API will always append "AppEngine-Google;" to the user-agent, but this might work if the restriction is not based on an IP address range.
What you are talking about is a valid bug in the App Engine SDK. Have a look at http://code.google.com/p/googleappengine/issues/detail?id=544 for bug updates and workarounds for Java and Python.
I'm currently having the same problem and I was thinking about this solution (not yet tried):
-> develop an app that fetches what you want
-> run it locally
-> fetch your local server from your initial app
This way the proxy is your own computer, which you know is not blocked.
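A minimal sketch of that local forwarder, using only the standard library (the target URL is hypothetical):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

TARGET = "http://example.com"  # the site that blocks App Engine

class ForwardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fetch the same path from the target site and relay the body.
        body = urlopen(TARGET + self.path).read()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

# Your GAE app would then fetch http://<your-ip>:8000/... instead of the target.
HTTPServer(("", 8000), ForwardHandler).serve_forever()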
Let me know if it works!
Well to be fair, if they don't want you doing that then you probably shouldn't. It's not nice to be mean.
But if you really want to do it, the best approach would be creating a simple proxy script and running it on a VPS or some computer with a decent enough connection.
Basically, you expose a REST API from your server to your GAE app; the server then makes the same requests it receives to the target site and returns the output.
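On the GAE side, the call would then look something like this (the VPS hostname and its /fetch endpoint are hypothetical):

from google.appengine.api import urlfetch

# Hypothetical relay endpoint on your own server; it forwards the
# URL-encoded target URL to the blocked site and returns the response body.
proxy_url = "http://my-vps.example.org/fetch?url=http%3A%2F%2Fwww.example.com%2Fpage"
result = urlfetch.fetch(proxy_url)
page_html = result.content  # the same bytes the VPS received from the target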