HTTP request and response parameters - python

I have a situation here regarding POST HTTP request:
(host and links used here are up and running)
POST /inventory-check.cgi HTTP/1.1
Host: www.joes-hardware.com
Accept-Encoding: identity
Content-Length: 7
Content-Type: text/plain
item=563
When I send the above request string to the host, the server sends me back weird stuff (along with the expected result):
HTTP/1.1 200 OK
Date: Thu, 31 Oct 2013 12:07:48 GMT
Server: Apache/2.2.22 (Unix) DAV/2 FrontPage/5.0.2.2635 mod_ssl/2.2.22 OpenSSL/1.0.1c
Transfer-Encoding: chunked
Content-Type: text/html
6b
<HTML><BODY>
<H1>Joe's Hardware Store Inventory Check</H1>
Yes! Item number 56 is in stock!
</BODY></HTML>
0
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>501 Method Not Implemented</title>
</head><body>
<h1>Method Not Implemented</h1>
<p>3 to /index.html not supported.<br />
</p>
<hr>
<address>Apache/2.2.22 (Unix) DAV/2 FrontPage/5.0.2.2635 mod_ssl/2.2.22 OpenSSL/1.0.1c Server at totty.temp.veriohosting.com Port 80</address>
</body></html>
I checked the request with the urllib module in Python and that gives me only the expected output (here I have omitted the response details):
<HTML><BODY>
<H1>Joe's Hardware Store Inventory Check</H1>
Yes! Item number 56 is in stock!
</BODY></HTML>
What am I missing??
Actually, I am new to HTTP and have experience in C/C++/Python. Any help will be appreciated, thanks in advance.

item=563 is 8 bytes, but you declare a Content-Length of 7. Therefore the server sees two requests: one valid HTTP request for item 56, and one invalid HTTP request consisting only of the character 3, and it sends you two responses.

Your request lies about the content-length.
That said, it's still a bit odd what the server does with the additional character 3.
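If you build the request in Python instead of by hand, measuring the body with len() avoids this mismatch entirely. A minimal sketch using the standard library's http.client (Python 3; host and path taken from the question):
import http.client

body = "item=563"
conn = http.client.HTTPConnection("www.joes-hardware.com")
conn.request(
    "POST",
    "/inventory-check.cgi",
    body=body,
    headers={
        "Content-Type": "text/plain",
        "Content-Length": str(len(body)),  # 8 bytes, measured rather than guessed
    },
)
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.read().decode())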

Related

Partial Content Get Request

I'm trying to get partial content from the website http://dijkstra.cs.bilkent.edu.tr/~cs421/partialt11.txt
Below is my get request:
cmd2 = "GET /{} HTTP/1.0\r\nHost: {}\r\nAuthorization: Basic {}\r\nAccept-Ranges: bytes=1-1800\r\n\r\n".format(path2, host2, token2)
where host2, path2, token2 are defined 100% correctly. matchlist[][] gives the range of bytes I want to recover.
However, no matter what I write into Accept-Ranges: bytes=...-..., I get the same "amount" of the file, and it is not the whole file. I also get a 200 OK status code instead of Partial Content, and the Accept-Ranges header is not even filled in the response. Why is that? Thanks in advance. Below is the response:
'HTTP/1.1 200 OK\r\n
Date: Sun, 20 Mar 2022 08:29:23 GMT\r\n
Server: Apache/2.4.6 () OpenSSL/1.0.2k-fips PHP/5.6.40 mod_perl/2.0.11 Perl/v5.16.3\r\n
Last-Modified: Mon, 07 Mar 2022 10:38:52 GMT\r\n
ETag: "73a-5d99e786e8eae"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 1850\r\n
Connection: close\r\n
Content-Type: text/plain\r\n\r\n
Modem Noise Killer (alpha version)\n\nWith this circuit diagram, some basic tools including a soldering iron, and\nfour or five components from Radio Shack, you should be able to cut the\nnoise/garbage that appear'
Accept-Ranges is a response header by which the server indicates that it will accept partial requests. The client should send Range instead - i.e., in your case:
Range: bytes=1-1800
However, it's worth noting that the server MAY ignore Range.
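A sketch of the corrected request string, assuming path2, host2 and token2 are defined as in the question; the only change is replacing the Accept-Ranges request header with Range:
cmd2 = (
    "GET /{} HTTP/1.0\r\n"
    "Host: {}\r\n"
    "Authorization: Basic {}\r\n"
    "Range: bytes=1-1800\r\n"   # note: byte ranges are zero-based and inclusive
    "\r\n"
).format(path2, host2, token2)
If the server honours the header, it will answer 206 Partial Content with only the requested bytes.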

Content-type is blank in the headers of some requests

I've run this query millions (yes, millions) of times before with other URLs. However, I'm getting a KeyError when checking the content-type of the following webpage.
Code snippet:
r = requests.get("http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests", timeout=10, headers=headers)
if "text/html" in r.headers["content-type"]:
Error:
KeyError: 'content-type'
I checked the content of r.headers and it's:
CaseInsensitiveDict({'date': 'Fri, 20 May 2016 06:44:19 GMT', 'content-length': '0', 'connection': 'keep-alive', 'server': 'BigIP'})
What could be causing this?
Not all servers set a Content-Type header. Use .get() to retrieve a default if it is missing:
if "text/html" in r.headers.get("content-type", ''):
For the URL you gave I can't reproduce this:
$ curl -s -D - -o /dev/null "http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Powered-By: Brightspot
Content-Type: text/html;charset=UTF-8
Date: Fri, 20 May 2016 06:45:12 GMT
Set-Cookie: JSESSIONID=A0C35776067AABCF9E029150C64D8D91; Path=/; HttpOnly
Transfer-Encoding: chunked
but if the header is missing from your response then it usually isn't Python's fault, and certainly not your code's fault.
It could be you encountered a buggy server or temporary glitch, or the server you contacted doesn't like you for one reason or another. Your sample response headers have the content-length set to 0 as well, for example, indicating there was no content to serve at all.
The server that gave you that response is BigIP, a load balancer / network router product from a company called F5. Hard to say exactly what kind (they have global routing servers as well as per-datacenter or cluster load balancers). It could be that the load balancer ran out of back-end servers to serve the request, doesn't have servers in your region, or the load balancer decided that you are sending too many requests and refuses to give you more than just this response, or it is the wrong phase of the moon and Jupiter is in retrograde and it threw a tantrum. We can't know!
But, just in case this happens again, do also look at the response status code. It may well be a 4xx or 5xx status code indicating that something was wrong with your request or with the server. For example, a 429 status code response would indicate you made too many requests in a short amount of time and should slow down. Test for it by checking r.status_code.
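Putting both checks together, a defensive version of the snippet might look like this (URL from the question; requests assumed, extra request headers omitted):
import requests

url = "http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests"
r = requests.get(url, timeout=10)
if r.status_code != 200:
    print("unexpected status:", r.status_code)  # e.g. 429 means slow down
elif "text/html" in r.headers.get("content-type", ""):
    pass  # process the HTML here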

Http version in response BaseHttp module python

I am running a server with BaseHttp with python.
I receive a request from a client which is based on HTTP/1.1.
However, when I answer the client with my response, the client refuses to accept it.
On further analysis I saw that the HTTP version I am sending is HTTP/1.0.
However, I don't know how it is set.
The error on the client side is:
Original message: not well-formed (invalid token): line 2, column 4
Response
HTTP/1.0 200 OK
Server: BaseHTTP/0.3 Python/2.7.5
Date: Wed, 30 Jul 2014 15:11:42 GMT
Content-type: application/soap+xml; charset=utf-8
Content-length: 823
I am setting the header in the following way:
self.send_response(200)
self.send_header("Content-type", "application/soap+xml; charset=utf-8")
self.send_header("Content-length", content_length)
self.end_headers()
Set the protocol_version attribute on your handler class:
handler.protocol_version = 'HTTP/1.1'
This requires that you set a Content-Length header, which you already do.
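A minimal sketch of a handler with the attribute set (Python 2 BaseHTTPServer, matching the question's server; the handler name, port, and SOAP payload are placeholders):
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

class SoapHandler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # emitted in the status line instead of HTTP/1.0

    def do_POST(self):
        body = '<placeholder/>'  # hypothetical response payload
        self.send_response(200)
        self.send_header("Content-type", "application/soap+xml; charset=utf-8")
        self.send_header("Content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('', 8000), SoapHandler).serve_forever()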

Making Head Requests in Twisted

I am relatively new to using Twisted and I am having trouble returning the content-length header when performing a basic head request. I have set up an asynchronous client already but the trouble comes in this bit of code:
def getHeaders(url):
    d = Agent(reactor).request("HEAD", url)
    d.addCallbacks(handleResponse, handleError)
    return d

def handleResponse(r):
    print r.code, r.headers
    whenFinished = twisted.internet.defer.Deferred()
    r.deliverBody(PrinterClient(whenFinished))
    return whenFinished
I am making a HEAD request and passing the URL. As indicated in this documentation, the content-length header is not stored in self.length, but can be accessed from the self.headers response. The output returns the status code as expected, but the header output is not what I expected. Using "http://www.espn.go.com" as an example, it currently returns:
Set-Cookie: SWID=77638195-7A94-4DD0-92A5-348603068D58;
path=/; expires=Fri, 31-Jan-2034 00:50:09 GMT; domain=go.com;
X-Ua-Compatible: IE=edge,chrome=1
Cache-Control: max-age=15
Date: Fri, 31 Jan 2014 00:50:09 GMT
P3P: CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa IVAi IVDi CONi
OUR SAMo OTRo BUS PHY ONL UNI PUR COM NAV INT DEM CNT STA PRE"
Content-Type: text/html; charset=iso-8859-1
As you can see, no content-length field is returned. If the same request is done in requests then the result will contain the content-length header:
r = requests.head("http://www.espn.go.com")
r.headers
({'content-length': '46133', 'content-encoding': 'gzip'...})
(rest omitted for readability)
What is causing this problem? I am sure it is a simple mistake on my part but I for the life of me cannot figure out what I have done wrong. Any help is appreciated.
http://www.espn.go.com/ returns one response if the client sends an Accept-Encoding: gzip header and another response if it doesn't.
One of the differences between the two responses is the inclusion of the Content-Length header.
If you want to make requests using Agent including Accept-Encoding: gzip then take a look at ContentDecoderAgent or the third-party treq package.
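A rough sketch using ContentDecoderAgent (handleResponse and handleError as in the question's code; on recent Twisted versions the method and URL must be bytes):
from twisted.internet import reactor
from twisted.web.client import Agent, ContentDecoderAgent, GzipDecoder

def getHeaders(url):
    # Advertises Accept-Encoding: gzip and transparently decodes the body,
    # so the gzip variant of the response (with its Content-Length) is used.
    agent = ContentDecoderAgent(Agent(reactor), [(b"gzip", GzipDecoder)])
    d = agent.request(b"HEAD", url)
    d.addCallbacks(handleResponse, handleError)
    return d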
HTTP allows (but does not REQUIRE) entity headers in responses to HEAD requests. The only restriction it places is that 200 responses to HEAD requests MUST NOT include an entity payload. It's up to the origin server to decide which, if any, entity headers it would like to include.
In the case of Content-Length, it makes sense for this to be optional for HEAD; if the entity will be computed dynamically (as with compressing/decompressing content), it's better for the server to avoid the extra work of computing the content length when the request won't include the content anyway.

OAuth and the YouTube API

I am trying to use the YouTube services with OAuth. I have been able to obtain request tokens, authorize them and transform them into access tokens.
Now I am trying to use those tokens to actually do requests to the YouTube services. For instance I am trying to add a video to a playlist. Hence I am making a POST request to
https://gdata.youtube.com/feeds/api/playlists/XXXXXXXXXXXX
sending a body of
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:yt="http://gdata.youtube.com/schemas/2007">
<id>XXXXXXXXX</id>
</entry>
and with the headers
Gdata-version: 2
Content-type: application/atom+xml
Authorization: OAuth oauth_consumer_key="www.xxxxx.xx",
oauth_nonce="xxxxxxxxxxxxxxxxxxxxxxxxx",
oauth_signature="XXXXXXXXXXXXXXXXXXX",
oauth_signature_method="HMAC-SHA1",
oauth_timestamp="1310985770",
oauth_token="1%2FXXXXXXXXXXXXXXXXXXXX",
oauth_version="1.0"
X-gdata-key: key="XXXXXXXXXXXXXXXXXXXXXXXXX"
plus some standard headers (Host and Content-Length) which are added by urllib2 (I am using Python) at the moment of the request.
Unfortunately, I get an Error 401: Unknown authorization header, and the headers of the response are
X-GData-User-Country: IT
WWW-Authenticate: GoogleLogin service="youtube",realm="https://www.google.com/youtube/accounts/ClientLogin"
Content-Type: text/html; charset=UTF-8
Content-Length: 179
Date: Mon, 18 Jul 2011 10:42:50 GMT
Expires: Mon, 18 Jul 2011 10:42:50 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Connection: close
In particular I do not know how to interpret the WWW-Authenticate header, whose realm hints to ClientLogin.
I have also tried to play with the OAuth Playground, and the Authorization header sent by that site looks exactly like mine, except for the order of the fields. Still, on the playground everything works. Well, almost: I get an error telling me that a developer key is missing, but that is reasonable since there is no way to add one on the playground. Still, I get past the Error 401.
I have also tried to manually copy the Authorization header from there, and I got an Error 400: Bad request.
What am I doing wrong?
Turns out the problem was the newline before xmlns:yt. I was able to debug this using ncat, as suggested here, and inspecting the full response.
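For reference, a sketch of the same body built as a Python string with the opening entry tag, including xmlns:yt, kept on a single line (the id value stays elided as in the question):
body = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:yt="http://gdata.youtube.com/schemas/2007">'
    '<id>XXXXXXXXX</id>'
    '</entry>'
)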
I would suggest using the oauth Python module, because it is much simpler and takes care of the auth headers :) https://github.com/simplegeo/python-oauth2. As a solution, I suggest you encode your parameters with 'utf-8'; I had a similar problem, and the solution was that Google was expecting UTF-8 encoded strings.
