Python BadStatusLine error on certain DELETE requests

I am trying to use the python-rest-client ( http://code.google.com/p/python-rest-client/wiki/Using_Connection ) to perform testing of some RESTful webservices. Since I'm just learning, I've been pointing my tests at the sample services provided at http://www.predic8.com/rest-demo.htm.
I have no problems with creating entries, updating entries, or retrieving entries (POST and GET requests). When I try to make a DELETE request, it fails. I can use the Firefox REST Client to perform DELETE requests and they work. I can also make DELETE requests against other services, but I've been driving myself crazy trying to figure out why it doesn't work in this case. I'm using Python 3 with an updated httplib2, but I also tried Python 2.5 so that I could use python-rest-client with its included version of httplib2. I see the same problem in either case.
The code is simple, matching the documented use:
from restful_lib import Connection
self.base_url = "http://www.thomas-bayer.com"
self.conn = Connection(self.base_url)
response = self.conn.request_delete('/sqlrest/CUSTOMER/85')
I've looked at the resulting HTTP requests from the browser tool and from my code and I can't see why one works and the other doesn't. This is the trace I receive:
Traceback (most recent call last):
File "/home/fmk/python/rest-client/src/TestExampleService.py", line 68, in test_CRUD
self.Delete()
File "/home/fmk/python/rest-client/src/TestExampleService.py", line 55, in Delete
response = self.conn.request_delete('/sqlrest/CUSTOMER/85')
File "/home/fmk/python/rest-client/src/restful_lib.py", line 64, in request_delete
return self.request(resource, "delete", args, headers=headers)
File "/home/fmk/python/rest-client/src/restful_lib.py", line 138, in request
resp, content = self.h.request("%s://%s%s" % (self.scheme, self.host, '/'.join(request_path)), method.upper(), body=body, headers=headers )
File "/home/fmk/python/rest-client/src/httplib2/__init__.py", line 1175, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/fmk/python/rest-client/src/httplib2/__init__.py", line 931, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/fmk/python/rest-client/src/httplib2/__init__.py", line 897, in _conn_request
response = conn.getresponse()
File "/usr/lib/python3.2/http/client.py", line 1046, in getresponse
response.begin()
File "/usr/lib/python3.2/http/client.py", line 346, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.2/http/client.py", line 316, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
What's breaking? What do I do about it? Actually, I'd settle for advice on debugging it. I've changed the domain in my script and pointed it at my own machine so I could view the request. I've viewed/modified the Firefox requests in BurpProxy to make them match my script requests. The modified Burp requests still work and the Python requests still don't.

Apparently the issue is that the server expects there to be some message body for DELETE requests. That's an unusual expectation for a DELETE, but by specifying Content-Length:0 in the headers, I'm able to successfully perform DELETEs.
Somewhere along the way (in python-rest-client or httplib2), the Content-Length header is wiped out if I try to do:
from restful_lib import Connection
self.base_url = "http://www.thomas-bayer.com"
self.conn = Connection(self.base_url)
response = self.conn.request_delete('/sqlrest/CUSTOMER/85', headers={'Content-Length':'0'})
Just to prove the concept, I went to the point in the stack trace where the request was happening:
File "/home/fmk/python/rest-client/src/httplib2/__init__.py", line 897, in _conn_request
response = conn.getresponse()
I printed the headers parameter there to confirm that the content length wasn't there, then I added:
if method == 'DELETE':
    headers['Content-Length'] = '0'
before the request.
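If you would rather not patch the library, a less invasive workaround (a sketch only, assuming you can call httplib2 directly instead of going through python-rest-client) is to pass both an empty body and the header yourself, so nothing upstream has a chance to drop it:
import httplib2

def delete_with_length(url):
    # Send the DELETE straight through httplib2; supplying body='' together
    # with an explicit Content-Length: 0 header covers library versions that
    # only honour one or the other.
    h = httplib2.Http()
    return h.request(url, 'DELETE', body='',
                     headers={'Content-Length': '0'})

response, content = delete_with_length(
    'http://www.thomas-bayer.com/sqlrest/CUSTOMER/85')
print(response.status)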
I think the real answer is that the service is wonky, but at least I got to know httplib2 a little better. I've seen some other confused people looking for help with REST and Python, so hopefully I'm not the only one who got something out of this.

The following script correctly produces a 404 response from the server:
#!/usr/bin/env python3
import http.client
h = http.client.HTTPConnection('www.thomas-bayer.com', timeout=10)
h.request('DELETE', '/sqlrest/CUSTOMER/85', headers={'Content-Length': 0})
response = h.getresponse()
print(response.status, response.version)
print(response.info())
print(response.read()[:77])
python -V => 3.2
curl -X DELETE http://www.thomas-bayer.com/sqlrest/CUSTOMER/85
curl: (52) Empty reply from server
The Status-Line is not optional; an HTTP server must return it, or at least send a 411 Length Required response.
curl -H 'Content-length: 0' -X DELETE \
http://www.thomas-bayer.com/sqlrest/CUSTOMER/85
Correctly returns 404.

Related

Python Requests library can't handle redirects for HTTPS URLs when behind a proxy

I think I've discovered a problem with the Requests library's handling of redirects when using HTTPS. As far as I can tell, this is only a problem when the server redirects the Requests client to another HTTPS resource.
I can assure you that the proxy I'm using supports HTTPS and the CONNECT method because I can use it with a browser just fine. I'm using version 2.1.0 of the Requests library which is using 1.7.1 of the urllib3 library.
I watched the transactions in wireshark and I can see the first transaction for https://www.paypal.com/ but I don't see anything for https://www.paypal.com/home. I keep getting timeouts when debugging any deeper in the stack with my debugger so I don't know where to go from here. I'm definitely not seeing the request for /home as a result of the redirect. So it must be erroring out in the code before it gets sent to the proxy.
I want to know if this truly is a bug or if I am doing something wrong. It is really easy to reproduce so long as you have access to a proxy that you can send traffic through. See the code below:
import requests
proxiesDict = {
    'http': "http://127.0.0.1:8080",
    'https': "http://127.0.0.1:8080"
}
# This fails with "requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused." when it tries to follow the redirect to /home
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
# This succeeds.
r = requests.get("https://www.paypal.com/home", proxies=proxiesDict)
This also happens when using urllib3 directly. It is probably mainly a bug in urllib3, which Requests uses under the hood, but I'm using the higher level requests library. See below:
import urllib3
proxy = urllib3.proxy_from_url('http://127.0.0.1:8080/')
# This fails with the same error as above.
res = proxy.urlopen('GET', 'https://www.paypal.com/')
# This succeeds.
res = proxy.urlopen('GET', 'https://www.paypal.com/home')
Here is the traceback when using Requests:
Traceback (most recent call last):
File "tests/downloader_tests.py", line 22, in test_proxy_https_request
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 505, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 167, in resolve_redirects
allow_redirects=False,
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
raise ProxyError(e)
requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused.
Update:
The problem only seems to happen with a 302 (Found) redirect, not with normal 301 (Moved Permanently) redirects. Also, I noticed that with the Chrome browser, PayPal doesn't return a redirect. I do see the redirect when using Requests, even though I'm borrowing Chrome's User-Agent for this experiment. I'm looking for more URLs that return a 302 in order to get more data points.
I need this to work for all URLs or at least understand why I'm seeing this behavior.
This is a bug in urllib3. We're tracking it as urllib3 issue #295.
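Until that is fixed upstream, one possible workaround (a sketch, not an official fix) is to turn off automatic redirect handling and follow the Location header yourself, so every hop goes through the proxy as a fresh request:
import requests
from urlparse import urljoin  # Python 2; on Python 3 use urllib.parse.urljoin

proxiesDict = {
    'http': "http://127.0.0.1:8080",
    'https': "http://127.0.0.1:8080"
}

url = "https://www.paypal.com/"
for _ in range(5):  # cap the number of redirects we are willing to follow
    r = requests.get(url, proxies=proxiesDict, allow_redirects=False)
    if r.status_code not in (301, 302, 303, 307):
        break
    # Location may be relative, so resolve it against the current URL
    url = urljoin(url, r.headers['Location'])

print r.status_code, url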

Error in getting OAuth authentication url

I have the following code in a Python desktop application that authorizes users before using the AppHarbor API. I am following the steps mentioned in the knowledge base and have the following authentication code:
def OnAuthenticate(self, event):
    client_id = ""          # My App's client id
    client_secret_key = ""  # My App's secret key
    consumer = oauth2.Consumer(key=client_id, secret=client_secret_key)
    request_token_url = "https://appharbor.com/user/authorizations/new?client_id=" + client_id + "&redirect_uri=http://localhost:8095"
    client = oauth2.Client(consumer)
    resp, content = client.request(request_token_url, "GET")
    ...
However, on sending the request, the response is incorrect. This is the error:
client.request(request_token_url, "GET")
TypeError: must be string or buffer, not None
Is there something that I am missing here?
Edit: Following is the stack trace that is thrown up:
resp, content = client.request(request_token_url, "GET")
File "C:\Python27\Lib\site-packages\oauth2-1.5.211-py2.7.egg\oauth2\__init__.py", line 682, in request
connection_type=connection_type)
File "C:\Python27\lib\site-packages\httplib2-0.7.4-py2.7.egg\httplib2\__init__.py", line 1544, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "C:\Python27\lib\site-packages\httplib2-0.7.4-py2.7.egg\httplib2\__init__.py", line 1342, in _request
(response, content) = self.request(location, redirect_method, body=body, headers = headers, redirections = redirections - 1)
File "C:\Python27\Lib\site-packages\oauth2-1.5.211-py2.7.egg\oauth2\__init__.py", line 662, in request
req.sign_request(self.method, self.consumer, self.token)
File "C:\Python27\Lib\site-packages\oauth2-1.5.211-py2.7.egg\oauth2\__init__.py", line 493, in sign_request
self['oauth_body_hash'] = base64.b64encode(sha(self.body).digest())
TypeError: must be string or buffer, not None
Upon debugging into the call, I reached the httplib2 _request function, which issued the request:
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
This resulted in the following page, with a 302 error response (presented in the content object):
<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="https://appharbor.com/session/new?returnUrl=%2Fuser%
2Fauthorizations%2Fnew%3Foauth_body_hash%3D2jmj7l5rSw0yVb%252FvlWAYkK%252FYBwk%
253D%26oauth_nonce%3D85804131%26oauth_timestamp%3D1340873274%
26oauth_consumer_key%3D26bacb38-ce5a-4699-9342-8e496c16dc49%26oauth_signature_method%
3DHMAC-SHA1%26oauth_version%3D1.0%26redirect_uri%3Dhttp%253A%252F%252Flocalhost%
253A8095%26client_id%3D26bacb38-ce5a-4699-9342-8e496c16dc49%26oauth_signature%
3DXQtYvWIsvML9ZM6Wfs1Wp%252Fy3No8%253D">here</a>.</h2>
</body></html>
The function then removed the body and followed the redirect with another request whose body was set to None, resulting in the error that was thrown.
(response, content) = self.request(location, redirect_method, body=body, headers = headers, redirections = redirections - 1)
I'm not familiar with the Python library, but have you considered whether this is because you need to take the user through the three-legged OAuth flow (as with Twitter) and use the URL you mention in your question as the authorize_url? Once you have the code, you retrieve the token by POSTing to this URL: https://appharbor.com/tokens.
You might also want to take a closer look at the desktop OAuth .NET sample to get a better understanding of how this works.
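For what it's worth, a rough sketch of that token exchange (the parameter names here are an assumption based on typical OAuth-style flows, not taken from AppHarbor's documentation) could look like this once the redirect back to http://localhost:8095 hands you a code:
import urllib
import httplib2

def exchange_code_for_token(client_id, client_secret, code):
    # Hypothetical helper: POST the authorization code to the token endpoint
    # mentioned above and return the raw response for inspection.
    body = urllib.urlencode({
        'client_id': client_id,
        'client_secret': client_secret,
        'code': code,
    })
    h = httplib2.Http()
    return h.request(
        'https://appharbor.com/tokens', 'POST', body=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'})

# resp, content = exchange_code_for_token(client_id, client_secret_key, code)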

Repeated POST request is causing error "socket.error: (99, 'Cannot assign requested address')"

I have a web-service deployed in my box. I want to check the result of this service with various input. Here is the code I am using:
import sys
import httplib
import urllib

apUrl = "someUrl:somePort"
fileName = sys.argv[1]
conn = httplib.HTTPConnection(apUrl)
titlesFile = open(fileName, 'r')
try:
    for title in titlesFile:
        title = title.strip()
        params = urllib.urlencode({'search': 'abcd', 'text': title})
        conn.request("POST", "/somePath/", params)
        response = conn.getresponse()
        data = response.read().strip()
        print data + "\t" + title
        conn.close()
finally:
    titlesFile.close()
This code gives an error after the same number of lines has been printed (28233). Error message:
Traceback (most recent call last):
File "testService.py", line 19, in ?
conn.request("POST", "/somePath/", params)
File "/usr/lib/python2.4/httplib.py", line 810, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.4/httplib.py", line 833, in _send_request
self.endheaders()
File "/usr/lib/python2.4/httplib.py", line 804, in endheaders
self._send_output()
File "/usr/lib/python2.4/httplib.py", line 685, in _send_output
self.send(msg)
File "/usr/lib/python2.4/httplib.py", line 652, in send
self.connect()
File "/usr/lib/python2.4/httplib.py", line 636, in connect
raise socket.error, msg
socket.error: (99, 'Cannot assign requested address')
I am using Python 2.4.3. I am also calling conn.close(). So why is this error occurring?
This is not a Python problem.
In Linux kernel 2.4 the ephemeral port range is 32768 through 61000, so the number of available ports is 61000 - 32768 + 1 = 28233. From what I understood, because the web service in question is quite fast (< 5 ms actually), all the ports get used up. The program then has to wait about a minute or two for the ports to close.
What I did was count the number of conn.close() calls; when the count reached 28000, I waited 90 seconds and reset the counter.
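A minimal sketch of that counting approach (titles stands for the lines of the input file and post_title is a hypothetical helper wrapping the POST code from the question; the 28000 threshold and 90-second pause are the numbers described above):
import time

requests_made = 0
for title in titles:
    post_title(title)        # hypothetical helper wrapping conn.request(...)
    requests_made += 1
    if requests_made >= 28000:
        time.sleep(90)       # let TIME_WAIT sockets expire and free their ports
        requests_made = 0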
BIGYaN identified the problem correctly and you can verify that by calling "netstat -tn" right after the exception occurs. You will see very many connections with state "TIME_WAIT".
The alternative to waiting for port numbers to become available again is to simply use one connection for all requests. You are not required to call conn.close() after each call of conn.request(). You can simply leave the connection open until you are done with your requests.
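A sketch of that single-connection variant, keeping the structure of the script in the question (it assumes the server keeps the connection alive; each response must be read fully before the next request is sent):
import sys
import httplib
import urllib

conn = httplib.HTTPConnection("someUrl:somePort")
titlesFile = open(sys.argv[1], 'r')
try:
    for title in titlesFile:
        title = title.strip()
        params = urllib.urlencode({'search': 'abcd', 'text': title})
        conn.request("POST", "/somePath/", params)
        response = conn.getresponse()
        data = response.read().strip()   # drain the body so the socket can be reused
        print data + "\t" + title
finally:
    titlesFile.close()
    conn.close()                         # one close, at the very end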
I too faced a similar issue while executing multiple POST requests using Python's requests library in Spark. To make it worse, I used multiprocessing on each executor to post to a server, so thousands of connections were created in seconds, and each took a few seconds to leave the TIME_WAIT state and release its port for the next set of connections.
Out of all the solutions available on the internet that speak of disabling keep-alive, using requests.Session(), et al., I found the one that worked to be passing 'Connection': 'close' as a header parameter. You may need to put the header content on a separate line outside the post command, though.
headers = {
    'Connection': 'close'
}
with requests.Session() as session:
    response = session.post('https://xx.xxx.xxx.x/xxxxxx/x', headers=headers, files=files, verify=False)
    results = response.json()
    print results
This was my answer to a similar issue, using the solution above.

Python httplib2 Handling Exceptions

I have this very simple code to check if a site is up or down.
import httplib2
h = httplib2.Http()
response, content = h.request("http://www.folksdhhkjd.com")
if response.status == 200:
print "Site is Up"
else:
print "Site is down"
When I enter a valid URL, it properly prints Site is Up because the status is 200, as expected. But when I enter an invalid URL, should it not print Site is down? Instead it prints an exception like this:
Traceback (most recent call last):
File "C:\Documents and Settings\kripya\Desktop\1.py", line 3, in <module>
response, content = h.request("http://www.folksdhhkjd.com")
File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1436, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1188, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1129, in _conn_request
raise ServerNotFoundError("Unable to find the server at %s" % conn.host)
ServerNotFoundError: Unable to find the server at www.folksdhhkjd.com
How can I catch this exception and print my custom-defined "Site is down" message? Any guidance, please?
EDIT
Also one more question... what is the difference between using
h = httplib2.Http('.cache')
and
h = httplib2.Http()
try:
    response, content = h.request("http://www.folksdhhkjd.com")
    if response.status == 200:
        print "Site is Up"
except httplib2.ServerNotFoundError:
    print "Site is Down"
The issue with your code is that if the host doesn't respond, the request doesn't return ANY status code, and so the library throws an error (I think it's a peculiarity of the library itself, doing some sort of DNS resolution before trying to make the request).
h = httplib2.Http('.cache')
Caches the stuff it retrieves in a directory called .cache so if you do the same request twice it might not have to actually get everything twice; a file starting with a dot is hidden in POSIX filesystems (like on Linux).
h = httplib2.Http()
Doesn't cache its results, so you have to fetch everything again on every request.
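A small sketch to see the difference (it assumes the current directory is writable and that the server sends cacheable headers); httplib2 marks answers served from the cache with response.fromcache:
import httplib2

h = httplib2.Http('.cache')
response, content = h.request("http://www.example.com/")
print response.fromcache   # False: fetched over the network
response, content = h.request("http://www.example.com/")
print response.fromcache   # True if the first response was cacheable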

Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page:
http://en.wikipedia.org/wiki/OpenCola_(drink)
This is the shell session:
>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "c:\Python26\Lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "c:\Python26\Lib\urllib2.py", line 397, in open
response = meth(req, response)
File "c:\Python26\Lib\urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "c:\Python26\Lib\urllib2.py", line 435, in error
return self._call_chain(*args)
File "c:\Python26\Lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "c:\Python26\Lib\urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
This happened to me on two different systems in different continents. Does anyone have an idea why this happens?
Wikipedia's stance is:
Data retrieval: Bots may not be used to retrieve bulk content for any use not directly related to an approved bot task. This includes dynamically loading pages from another website, which may result in the website being blacklisted and permanently denied access. If you would like to download bulk content or mirror a project, please do so by downloading or hosting your own copy of our database.
That is why Python is blocked. You're supposed to download data dumps.
Anyways, you can read pages like this in Python 2:
req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
con = urllib2.urlopen( req )
print con.read()
Or in Python 3:
import urllib.request
req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
con = urllib.request.urlopen( req )
print(con.read())
To debug this, you'll need to trap that exception.
try:
    f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
except urllib2.HTTPError, e:
    print e.fp.read()
When I print the resulting message, it includes the following:
"English: Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes."
Oftentimes websites will filter access by checking whether they are being accessed by a recognised user agent. Wikipedia is just treating your script as a bot and rejecting it. Try spoofing as a browser. The following link takes you to an article showing you how:
http://wolfprojects.altervista.org/changeua.php
Some websites will block access from scripts by reading the headers urllib sends, to avoid 'unnecessary' usage of their servers. I don't know and can't imagine why Wikipedia does/would do this, but have you tried spoofing your headers?
As Jochen Ritzel mentioned, Wikipedia blocks bots.
However, bots will not get blocked if they use the MediaWiki API (api.php).
To get the Wikipedia page titled "love":
http://en.wikipedia.org/w/api.php?format=json&action=query&titles=love&prop=revisions&rvprop=content
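As a sketch (Python 2, standard library only; the descriptive User-Agent string is my own addition, a matter of politeness rather than a documented requirement), fetching that URL and decoding the JSON looks like this:
import json
import urllib2

url = ("http://en.wikipedia.org/w/api.php?format=json&action=query"
       "&titles=love&prop=revisions&rvprop=content")
req = urllib2.Request(url, headers={'User-Agent': 'MyScript/0.1 (me@example.com)'})
data = json.load(urllib2.urlopen(req))
# The revision text lives under query -> pages -> <pageid> -> revisions
for page in data['query']['pages'].values():
    print page['title']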
I made a workaround for this using PHP, which is not blocked by the site I needed.
It can be accessed like this:
import urllib2

path = ('http://phillippowers.com/redirects/get.php?'
        'file=http://website_you_need_to_load.com')
req = urllib2.Request(path)
response = urllib2.urlopen(req)
vdata = response.read()
This will return the HTML code to you.
