[Windows 7 64 bit; Python 2.7]
If I try to use urllib2, I get this error:
Traceback (most recent call last):
File "C:\Users\cYanide\Documents\Python Challenge\1\1.py", line 7, in <module>
response = urllib2.urlopen('http://python.org/')
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 438, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required
Now, I'm behind a college proxy which requires authentication, so that's probably why this is happening. But isn't urllib2 supposed to pull the authentication and proxy information from the system settings?
I understand there's some extra code I can insert into my program to hardcode the proxy information, but I really don't want to do that unless it's a last resort. It would hinder the portability of the program across computers in the college that use different authentication IDs and passwords.
Your program should see the environment variables that are set in Windows, so define these two variables:
HTTP_PROXY = http://username:password@proxyserver.domain.com
HTTPS_PROXY = https://username:password@proxyserver.domain.com
Then go ahead and run your script. It should pick up the proxy credentials and proceed with the connection.
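If you just want to verify that the variables are picked up before touching the system settings, here is a minimal sketch that sets them from within the script (the host and credentials below are placeholders, not real values):

import os
import urllib2

# Placeholder values -- substitute your college proxy's host and your own
# credentials. Setting these once in Windows system settings works the same way.
os.environ['HTTP_PROXY'] = 'http://username:password@proxyserver.domain.com'
os.environ['HTTPS_PROXY'] = 'https://username:password@proxyserver.domain.com'

# urllib2 builds its default opener from the environment on the first
# urlopen call, so this request should now be routed through the proxy.
response = urllib2.urlopen('http://python.org/')
print response.code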
Related
I'm running a Python program that uses Pytube to download a single video from YouTube every hour. On my local machine it works without any problems, but when I deploy the same program onto my Linux server, I receive the following error:
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
Does anyone know why this happens or what I can do to fix it? I'm using a Linode shared-CPU instance running Ubuntu 21.10.
Here is the full error:
Traceback (most recent call last):
File "/root/main.py", line 137, in <module>
main_function()
File "/root/main.py", line 131, in main_function
get_screenshot()
File "/root/main.py", line 60, in get_screenshot
video = youtube.streams.get_highest_resolution()
File "/usr/local/lib/python3.9/dist-packages/pytube/__main__.py", line 291, in streams
self.check_availability()
File "/usr/local/lib/python3.9/dist-packages/pytube/__main__.py", line 206, in check_availability
status, messages = extract.playability_status(self.watch_html)
File "/usr/local/lib/python3.9/dist-packages/pytube/__main__.py", line 98, in watch_html
self._watch_html = request.get(url=self.watch_url)
File "/usr/local/lib/python3.9/dist-packages/pytube/request.py", line 53, in get
response = _execute_request(url, headers=extra_headers, timeout=timeout)
File "/usr/local/lib/python3.9/dist-packages/pytube/request.py", line 37, in _execute_request
return urlopen(request, timeout=timeout) # nosec
File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/usr/lib/python3.9/urllib/request.py", line 555, in error
result = self._call_chain(*args)
File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/usr/lib/python3.9/urllib/request.py", line 747, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/usr/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
From your post and comment I would conclude that YouTube blocks requests coming from Linode (and probably also from most cloud providers).
Unfortunately there is no easy solution, but here are a few ideas:
1.a. Cloud providers (Linode) with Proxy
Keep using Linode and route your requests through a proxy (this is fairly easy with Python's requests module; see the sketch below). There are many free proxies, but the latency is often significant, and sometimes connections time out or fail for other reasons.
You can use a pool of proxies so you have alternatives when one has high latency or other problems.
Also bear in mind that using (public) proxies can be a security hazard, so be extremely careful with the information you send through them (i.e. no passwords, etc.).
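A minimal sketch of the proxy-pool idea using requests; the proxy addresses below are placeholders, not real servers, and wiring a proxy into Pytube itself is a separate step:

import requests

# Hypothetical proxy addresses -- substitute proxies you control or trust.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:3128",
]

def fetch_via_proxy(url, timeout=10):
    """Try each proxy in the pool until one returns a response."""
    for proxy in PROXY_POOL:
        try:
            return requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=timeout,
            )
        except requests.RequestException:
            continue  # dead or slow proxy: fall through to the next one
    raise RuntimeError("all proxies in the pool failed")

# Example usage:
# response = fetch_via_proxy("https://www.youtube.com/")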
1.b. GCP
Google has its own cloud provider, GCP. The prices are typically somewhat higher than competitors', but it might just be that YouTube doesn't block GCP (both being Google); they might sit on some shared network. That could be worth testing.
2. Bare Metal
As you pointed out, your home machine is able to run your script, so you could let it run as a home server.
Alternatively, you could use a single-board computer (e.g. a Raspberry Pi): these typically have very low power consumption, are cheap, can run 24/7, and support most of the software you'd need (Python, Docker, etc.).
This question already has answers here and was closed 10 years ago.
Possible duplicate: Fetch a Wikipedia article with Python
>>> print urllib2.urlopen('http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
You need to provide a User-Agent header, otherwise you'll get a 403, as you did:
On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy. Other MediaWiki installations may have similar policies.
So just add a user-agent to your code and it should work fine.
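For example, with urllib2 (the User-Agent string here is just an illustration; per the policy it should identify your script and include contact information):

import urllib2

url = 'http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C'
# A descriptive User-Agent satisfies the Wikimedia policy.
req = urllib2.Request(url, headers={'User-Agent': 'MyScript/1.0 (contact: you@example.com)'})
print urllib2.urlopen(req).read()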
Try to download the page with wget or cURL.
If you can't then you might have a network problem.
If you can, then Wikipedia might block certain user agents. In that case, use urllib2's add_header to define a custom user agent (to imitate a browser request).
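A minimal sketch of the add_header approach; any reasonable browser UA string works here:

import urllib2

req = urllib2.Request('http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C')
# Imitate a browser request so the server doesn't reject the default UA.
req.add_header('User-Agent', 'Mozilla/5.0')
print urllib2.urlopen(req).read()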
I want to access a web page from a Python script. The URL is: http://www.idealo.de/preisvergleich/Shop/27039.html
When I access it through a web browser it works fine, but when I access it with urllib2:
a = urllib2.urlopen("http://www.idealo.de/preisvergleich/Shop/27039.html")
It gives me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Also I tried to access it with wget:
wget http://www.idealo.de/preisvergleich/Shop/27039.html
The error is:
--2012-04-23 12:42:03-- http://www.idealo.de/preisvergleich/Shop/27039.html
Resolving www.idealo.de (www.idealo.de)... 62.146.49.133
Connecting to www.idealo.de (www.idealo.de)|62.146.49.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-23 12:42:03 ERROR 403: Forbidden.
Can anyone explain why this happens, and how I can access the page using Python?
They're blocking some user agents. If you try with the following:
wget -U "Mozilla/5.0" http://www.idealo.de/preisvergleich/Shop/27039.html
it works. So you have to fake the user agent in your Python code to make it work.
Try this:
import urllib2

opener = urllib2.build_opener()
# Present a browser-like User-Agent so the server accepts the request.
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
a = opener.open("http://www.idealo.de/preisvergleich/Shop/27039.html")
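If you want every subsequent urllib2.urlopen call to use those headers as well, you can install the opener globally with urllib2.install_opener(opener).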
The code looks like:
url ="http://www.example.com"
for a in range(0,10):
opener = urllib2.build_opener()
urllib2.install_opener(opener)
postdata ="info=123456"+str(a)
urllib2.urlopen(url, postdata)
which just posts some data to a specific URL (e.g. http://www.example.com). However, I always get this error message:
Traceback (most recent call last):
File "test.py", line 9, in <module>
urllib2.urlopen(url, postdata)
File "c:\Python26\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "c:\Python26\lib\urllib2.py", line 397, in open
response = meth(req, response)
File "c:\Python26\lib\urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "c:\Python26\lib\urllib2.py", line 435, in error
return self._call_chain(*args)
File "c:\Python26\lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "c:\Python26\lib\urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
I am sure the site is working, so how can I fix the problem? Any help would be greatly appreciated.
You say you're sure the site is working, yet it returns an error. Try doing whatever you did to determine that the site is working while running a network logger like Wireshark, then run your test program and check whether the two really issue the same requests. If they don't, you've found the problem.
Otherwise, take a look at the server's logs; a much more descriptive error message should be there. If it's not your server, consider asking whoever owns it.
Some websites don't accept requests from urllib. Try changing the User-Agent.
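For example, a minimal sketch of the same POST with a browser-like User-Agent; whether this fixes the 500 depends on what the server is actually rejecting:

import urllib2

url = "http://www.example.com"
postdata = "info=1234560"
req = urllib2.Request(url, postdata)
# Some servers reject urllib2's default "Python-urllib" User-Agent.
req.add_header('User-Agent', 'Mozilla/5.0')
response = urllib2.urlopen(req)
print response.read()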
I have completed a program that is outlined in chapter 3 of Head First Programming.
Basically, the program searches a website and stores the price on that page. Then, depending on which option the user selects, a certain message is sent to the user's Twitter account.
Source code from book's website: http://headfirstlabs.com/books/hfprog/chapter03/page108.py
When I run my program, and run the source code from the book's website, I get the same error.
Here is the error:
Traceback (most recent call last):
File "C:\Users\Krysten\Desktop\Ch3.py", line 28, in <module>
send_to_twitter(get_price())
File "C:\Users\Krysten\Desktop\Ch3.py", line 14, in send_to_twitter
resp = urllib.request.urlopen("http://twitter.com/statuses/update.json", params)
File "C:\Python31\lib\urllib\request.py", line 121, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python31\lib\urllib\request.py", line 356, in open
response = meth(req, response)
File "C:\Python31\lib\urllib\request.py", line 468, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python31\lib\urllib\request.py", line 394, in error
return self._call_chain(*args)
File "C:\Python31\lib\urllib\request.py", line 328, in _call_chain
result = func(*args)
File "C:\Python31\lib\urllib\request.py", line 476, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Is the error caused because the book is somewhat outdated and Twitter now has to be accessed in a different way?
Basic authentication is deprecated across most of the Twitter API. Use the OAuth API instead.
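A minimal sketch using the third-party tweepy library; all four credential strings are placeholders you'd obtain by registering an application on Twitter's developer site:

import tweepy

# Placeholder credentials from your registered Twitter application.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

api = tweepy.API(auth)
api.update_status("Latest price update from my script")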