A site is available but it always responds with "Internal Server Error" - python

The code looks like:
import urllib2

url = "http://www.example.com"
for a in range(0, 10):
    opener = urllib2.build_opener()
    urllib2.install_opener(opener)
    postdata = "info=123456" + str(a)
    urllib2.urlopen(url, postdata)
which just posts some data to a specific URL (e.g. http://www.example.com). However, I always get this error message:
Traceback (most recent call last):
File "test.py", line 9, in <module>
urllib2.urlopen(url, postdata)
File "c:\Python26\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "c:\Python26\lib\urllib2.py", line 397, in open
response = meth(req, response)
File "c:\Python26\lib\urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "c:\Python26\lib\urllib2.py", line 435, in error
return self._call_chain(*args)
File "c:\Python26\lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "c:\Python26\lib\urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
I am sure the site is working, so how can I fix the problem? Any help would be greatly appreciated.

You say you're sure the site is working, yet it returns an error. Try doing whatever you did to determine that the site is working while running a network logger like Wireshark, then run your test program and check whether the two are really issuing the same requests. If not, you've found the problem.
Otherwise, take a look at the server's logs; a much more descriptive error message should be found there. If it's not your server, consider asking whoever does own it.
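If you can't see the server's logs, a first client-side step (a minimal sketch wrapping the question's call) is to catch the HTTPError and read its body; in urllib2 the exception object is file-like, and a 500 response sometimes carries the server's own error page:

import urllib2

url = "http://www.example.com"
postdata = "info=1234560"
try:
    urllib2.urlopen(url, postdata)
except urllib2.HTTPError as e:
    # The HTTPError doubles as a response object, so its body is readable
    print e.code
    print e.read()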

Some websites don't accept requests coming from urllib's default User-Agent. Try changing the User-Agent header.
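For example, a minimal sketch that sends the same POST with a browser-like User-Agent (the header value here is only an illustration):

import urllib2

url = "http://www.example.com"
postdata = "info=1234560"
# Replace the default "Python-urllib" User-Agent with a browser-like one
req = urllib2.Request(url, postdata, headers={'User-Agent': 'Mozilla/5.0'})
response = urllib2.urlopen(req)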

Related

urlencode gives HTTP Error 403: FORBIDDEN

import urllib
import urllib2

callurl = "http://vgintnh116:8001/master_data/"
params = urllib.urlencode({'res': 'arovit', 'qfields': 'prod'})
f = urllib2.urlopen(callurl, params)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 397, in open
response = meth(req, response)
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 435, in error
return self._call_chain(*args)
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/u/vgtools2/python-2.6.5/lib/python2.6/urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: FORBIDDEN
But it works with -
callurl = "http://vgintnh116:8001/master_data/res=arovit&qfields=prod"
f = urllib2.urlopen(callurl)
Please help. I want to use urlencode to avoid handling spaces and extra characters.
If you pass the second argument (data), the request will be a POST instead of a GET.
Also, dictionaries in Python do not preserve order. To guarantee the order of the parameters, you should use a sequence of pairs.
import urllib
import urllib2

callurl = "http://vgintnh116:8001/master_data/"
params = urllib.urlencode([('res', 'arovit'), ('qfields', 'prod')])
f = urllib2.urlopen(callurl + params)
From the urllib2 documentation:
the HTTP request will be a POST instead of a GET when the data
parameter is provided
In your working example, you are making a GET request.
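To make the distinction concrete, here is a minimal sketch of both forms (assuming the server also accepts a conventional ?-separated query string for the GET case):

import urllib
import urllib2

callurl = "http://vgintnh116:8001/master_data/"
params = urllib.urlencode([('res', 'arovit'), ('qfields', 'prod')])

# GET: the encoded parameters travel in the URL itself
f = urllib2.urlopen(callurl + "?" + params)

# POST: passing the encoded data as the second argument switches the method
f = urllib2.urlopen(callurl, params)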

Proxy Authentication error in Urllib2 (Python 2.7)

[Windows 7 64 bit; Python 2.7]
If I try to use urllib2, I get this error:
Traceback (most recent call last):
File "C:\Users\cYanide\Documents\Python Challenge\1\1.py", line 7, in <module>
response = urllib2.urlopen('http://python.org/')
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 438, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required
Now, I'm behind a college proxy that requires authentication, so that's probably why this is happening. But isn't urllib2 supposed to pull the authentication and proxy information from the system settings?
I understand there's some extra code I can insert into my program to hardcode the proxy information, but I really don't want to do that unless it's a last resort. It would hinder the portability of the program across computers with different authentication IDs and passwords in the college.
Your program should see the environment variables that are set in Windows, so set these two environment variables (note the user:password@host form):
HTTP_PROXY = http://username:password@proxyserver.domain.com
HTTPS_PROXY = https://username:password@proxyserver.domain.com
And go ahead with executing your script. It should pick up the proper authenticators and proceed with the connection.
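If the environment variables are not picked up, here is a minimal fallback sketch with an explicit ProxyHandler (this is the hardcoding approach the question hoped to avoid; the credentials, host, and port are placeholders):

import urllib2

# Placeholder credentials and proxy host - replace with your own
proxy = urllib2.ProxyHandler({
    'http': 'http://username:password@proxyserver.domain.com:8080',
    'https': 'http://username:password@proxyserver.domain.com:8080',
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

response = urllib2.urlopen('http://python.org/')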

why can't I use urllib2.urlopen for the Wikipedia site? [duplicate]

Possible Duplicate:
Fetch a Wikipedia article with Python
>>> print urllib2.urlopen('http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
You need to provide a user-agent, or else you'll get a 403, as you did.
On Wikimedia wikis, if you don't supply a User-Agent header, or you
supply an empty or generic one, your request will fail with an HTTP
403 error. See our User-Agent policy. Other MediaWiki installations
may have similar policies.
So just add a user-agent to your code and it should work fine.
Try to download the page with wget or cURL.
If you can't then you might have a network problem.
If you can, then Wikipedia might block certain user agents. In that case, use urllib2's add_header to define a custom user agent (to imitate a browser request).
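For instance, a minimal sketch using Request.add_header (per the policy quoted above, a descriptive User-Agent identifying your script is preferable to a generic one; the value below is only an illustration):

import urllib2

url = 'http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C'
req = urllib2.Request(url)
# Wikimedia rejects empty or generic user agents, so identify the client
req.add_header('User-Agent', 'MyWikiScript/1.0 (contact: you@example.com)')
print urllib2.urlopen(req).read()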

URL is not accessible through wget or script

Hello guys! I want to access a web page through a Python script. The URL is: http://www.idealo.de/preisvergleich/Shop/27039.html
When I access it through a web browser it is OK, but when I try to access it with urllib2:
a = urllib2.urlopen("http://www.idealo.de/preisvergleich/Shop/27039.html")
It gives me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Also I tried to access it with wget:
wget http://www.idealo.de/preisvergleich/Shop/27039.html
The error is:
--2012-04-23 12:42:03-- http://www.idealo.de/preisvergleich/Shop/27039.html
Resolving www.idealo.de (www.idealo.de)... 62.146.49.133
Connecting to www.idealo.de (www.idealo.de)|62.146.49.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-23 12:42:03 ERROR 403: Forbidden.
Can anyone explain why this is so? And how can I access it using Python?
They're blocking some user agents. If you try with the following:
wget -U "Mozilla/5.0" http://www.idealo.de/preisvergleich/Shop/27039.html
it works. So you have to find a way to fake the user agent in your Python code to make it work.
Try this:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
a = opener.open("http://www.idealo.de/preisvergleich/Shop/27039.html")

Head First Programming: Error in example program

I have completed a program that is outlined in chapter 3 of Head First Programming.
Basically, the program searches a website and stores the price on that page. Then depending on which option the user selects, a certain message will be sent to the user's twitter account.
Source code from book's website: http://headfirstlabs.com/books/hfprog/chapter03/page108.py
Whether I run my own program or the source code from the book's website, I get the same error.
Here is the error:
Traceback (most recent call last):
File "C:\Users\Krysten\Desktop\Ch3.py", line 28, in <module>
send_to_twitter(get_price())
File "C:\Users\Krysten\Desktop\Ch3.py", line 14, in send_to_twitter
resp = urllib.request.urlopen("http://twitter.com/statuses/update.json", params)
File "C:\Python31\lib\urllib\request.py", line 121, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python31\lib\urllib\request.py", line 356, in open
response = meth(req, response)
File "C:\Python31\lib\urllib\request.py", line 468, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python31\lib\urllib\request.py", line 394, in error
return self._call_chain(*args)
File "C:\Python31\lib\urllib\request.py", line 328, in _call_chain
result = func(*args)
File "C:\Python31\lib\urllib\request.py", line 476, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Is the error caused because the book is somewhat outdated and Twitter has to be accessed in a different way?
In most of the Twitter API, basic authentication has been deprecated. Use the OAuth API instead.
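Here is a minimal sketch using the third-party tweepy library (this assumes you have registered an application on Twitter's developer site and obtained the four OAuth credentials; all values below are placeholders):

import tweepy

# Placeholder OAuth credentials from your registered Twitter application
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# OAuth-authenticated replacement for the book's statuses/update.json POST
api.update_status("new price alert")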
