Partial Content Get Request - python

I'm trying to get partial content from the website http://dijkstra.cs.bilkent.edu.tr/~cs421/partialt11.txt
Below is my get request:
cmd2 = "GET /{} HTTP/1.0\r\nHost: {}\r\nAuthorization: Basic {}\r\nAccept-Ranges: bytes=1-1800\r\n\r\n".format(path2, host2, token2)
where host2, path2, token2 are defined 100% correctly. matchlist[][] gives the range of bytes I want to recover.
However, no matter what I write into Accept-Ranges: bytes=...-..., I get the same "amount" of the file. It is not the whole file. Plus I get the 200 OK message instead of Partial Content as the status code. Even accept-ranges header is not filled in the response. Why is that? Thanks in advance. Below is the response:
'HTTP/1.1 200 OK\r\n
Date: Sun, 20 Mar 2022 08:29:23 GMT\r\n
Server: Apache/2.4.6 () OpenSSL/1.0.2k-fips PHP/5.6.40 mod_perl/2.0.11 Perl/v5.16.3\r\n
Last-Modified: Mon, 07 Mar 2022 10:38:52 GMT\r\n
ETag: "73a-5d99e786e8eae"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 1850\r\n
Connection: close\r\n
Content-Type: text/plain\r\n\r\n
Modem Noise Killer (alpha version)\n\nWith this circuit diagram, some basic tools including a soldering iron, and\nfour or five components from Radio Shack, you should be able to cut the\nnoise/garbage that appear'

Accept-Ranges is a server response indicating that it will accept partial requests. Client should send Range - i.e., in your case:
Range: bytes=1-1800
However, it's worth noting that the server MAY ignore Range

Related

Caching static files in browser

I am trying to enable caching for static files such as .css and .js.
I am running a WSGI server with Python.
I have tried setting the following headers for caching purposes:
headers.add_header('Cache-control', f'public, max-age={expires.strftime(RFC_1123_DATE)}')
headers.add_header('Expires', expires.strftime(RFC_1123_DATE))
headers.add_header('Last-Modified', generate_last_modified())
Headers recieved in the browser:
HTTP/1.0 200 OK
Date: Tue, 21 Apr 2020 08:06:17 GMT
Server: WSGIServer/0.2 CPython/3.6.9
Content-Encodings:
Content-Type: text/css; charset=UTF-8
Cache-control: public, max-age=Tue, 28 Apr 2020 08:06:17 GMT
Expires: Tue, 28 Apr 2020 08:06:17 GMT
Content-Length: 23399
Last-Modified: Tue, 21 Apr 2020 08:06:1587452777S GMT
Accept-Ranges: bytes
When using Chrome this code works and the files are stored and retrieved from the cache as you would expect. Chrome is using the Expires header and is ignoring the Cache-Control header.
I checked my developer tools and Disable Cache is not enabled. I checked my settings in Firefox in about:config, caching seems to be enabled.
So what am I missing here? Am I missing a header, is an ETAG required, why is Expires working in Chrome but not in Firefox?
I found the solution.
Firefox cache was full, so after emptying the cache it started sending the if-modified-since header again.
Also my server was returning the current time as the last-modified time instead of the actual last modified time.
To fix the issue all I had to do was compare the if-modified-since time from the browser with the last modified time from the file and send a 304 status if nothing changed.

can't get facebook profile pic from flask-OAuth package

I am using this code for logging in to my app. It works fine, but when I try to get the url for profile pic
pic = facebook.get("/me/picture?fields=url")
I get None in response.
TypeError: must be string or buffer, not None
If I try sending this GET request from facebook Graph API, with same access-token, I get the profile link.
I checked out the version message by printing out get("/me") request, I get
connection: keep-alive
etag: "2c59a2baef08156f18055c64eaa9d9822e35e8f1"
pragma: no-cache
cache-control: private, no-cache, no-store, must-revalidate
date: Fri, 23 Jun 2017 14:20:04 GMT
**facebook-api-version: v2.9**
access-control-allow-origin: *
content-type: text/javascript; charset=UTF-8
Which shows that version is automatically converting when i send request from flask-OAuth as well. Then what is it that I am missing ?
As documented in the Graph API, "me/picture?" will return a 302 redirect to the picture image. To get access to the data about the picture, we need to include redirect=false in the end of query. So I get the url by;
pic = facebook.get('/me/picture?redirect=false').data
print 'picture:', pic['data']["url"]

Using 'Requests' python module for POST request, receiving response as if it were GET

So I am trying to make a script that checks a reservation availability of a bus. The starting link for this is https://reservation.pc.gc.ca/.
In the reserve box the following needs to be selected:
Reservation: Day Use (Guided Hikes, Lake O’Hara Bus)
Park: Yoho-Lake O'Hara
Arrival Date: Jun 16
Party Size: 2
When those options are entered, it takes you to the following page: https://reservation.pc.gc.ca/Yoho-LakeO'Hara?Calendar
It is my understanding that if I send a POST request to that second link with the correct data it should return the page I'm looking for
If I look in the dev tools network info when I select the correct parameters the form data is:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE: -reallllly long string-
__VIEWSTATEGENERATOR: 8D0E13E6
ctl00$MainContentPlaceHolder$rdbListReservationType: Events
ddlLocations: 213a1bc9-9218-4e98-9a7f-0f209008e437**
ddlArrivalMonth: 2017-06-16
ddlArrivalDay: 19
ddlNights: 1
ddlDepartureMonth:
ddlDepartureDay:
ddlEquipment:
ddlEquipmentSub:
ddlPartySize:2
ctl00$MainContentPlaceHolder$chkExcludeAccessible: on
ctl00$MainContentPlaceHolder$imageButtonCalendar.x: 64
ctl00$MainContentPlaceHolder$imageButtonCalendar.y: 56
So the code I wrote is:
import requests
payload = {
'__EVENTTARGET': '',
'__EVENTARGUMENT': '',
'__VIEWSTATE':-reallly long string-,
'__VIEWSTATEGENERATOR': '8D0E13E6',
'ctl00$MainContentPlaceHolder$rdbListReservationType': 'Events',
'ddlLocations': '213a1bc9-9218-4e98-9a7f-0f209008e437',
'ddlArrivalMonth': 2017-06-16,
'ddlArrivalDay': 19,
'ddlNights': 1,
'ddlDepartureMonth': '',
'ddlDepartureDay': '',
'ddlEquipment': '',
'ddlEquipmentSub': '',
'ddlPartySize': 2,
'ctl00$MainContentPlaceHolder$chkExcludeAccessible': 'on',
'ctl00$MainContentPlaceHolder$imageButtonCalendar.x': 64,
'ctl00$MainContentPlaceHolder$imageButtonCalendar.y': 56
}
r = requests.get(r"https://reservation.pc.gc.ca/Yoho-LakeO'Hara?Calendar", data=payload)
print r.text
r.text ends up just being the second link as if no parameters were entered - as if I just sent a normal GET request to the link. I tried turning the payload values that are integers into strings, I tried removing the empty key:value pairs. No luck. Trying to figure out what I'm missing.
It looks to me like there are 2 things going on:
#errata was correct, and this should be a POST request. You're about halfway there.
What I noticed though is that it seems to post the form data to Home.aspx and the URL that you see after submitting the form is the result of that processing and subsequent redirect.
You might try posting the form data as json to ./Home.aspx.
I found through Postman that this nearly worked, but I had to specify the content-type in order to get the proper results.
If you need to know how to add header and body instructions to the .post() method, it looks like there is a good example (though perhaps slightly outdated) here:
adding header to python request module
Also, fwiw, check out Postman. If you're both inexperienced with requests and with doing it in Python, at least this may lesson some of the burden of trial and error.
You are using
r = requests.get(r"https://reservation.pc.gc.ca/Yoho-LakeO'Hara?Calendar", data=payload)
instead of
r = requests.post(r"https://reservation.pc.gc.ca/Yoho-LakeO'Hara?Calendar", data=payload)
Digging a bit deeper in your problem, I found out that the URL you are calling is actually redirecting to a different URL (returning HTTP response 302):
$ curl -I "https://reservation.pc.gc.ca/Yoho-LakeO'Hara"
HTTP/1.1 302 Found
Cache-Control: private
Content-Length: 77273
Content-Type: text/html; charset=utf-8
Location: https://reservation-pc.fjgc-gccf.gc.ca/GccfLanguage.aspx?lang=eng&ret=https%3a%2f%2freservation.pc.gc.ca%3a443%2fYoho-LakeO%27Hara
Server: Microsoft-IIS/8.0
Set-Cookie: ASP.NET_SessionId=qw4p4e2zxjxx0c2zyq014p45; path=/; secure; HttpOnly
Set-Cookie: CookieLocaleName=en-CA; path=/; secure; HttpOnly
X-Powered-By: ASP.NET
X-Frame-Options: SAMEORIGIN
Date: Wed, 17 May 2017 14:22:53 GMT
However, following the Location from response results also in 302:
$ curl -I "https://reservation-pc.fjgc-gccf.gc.ca/GccfLanguage.aspx?lang=eng&ret=https%3a%2f%2freservation.pc.gc.ca%3a443%2fYoho-LakeO%27Hara"
HTTP/1.1 302 Found
Cache-Control: private
Content-Length: 179
Content-Type: text/html; charset=utf-8
Location: https://reservation.pc.gc.ca:443/Yoho-LakeO'Hara?gccf=true
Server: Microsoft-IIS/8.0
Set-Cookie: ASP.NET_SessionId=rbcuvexfg4fb340ixtcjd1qy; path=/; secure; HttpOnly
Set-Cookie: _gc_lang=eng; domain=.fjgc-gccf.gc.ca; path=/; secure; HttpOnly
X-Powered-By: ASP.NET
X-Frame-Options: SAMEORIGIN
Date: Wed, 17 May 2017 14:24:55 GMT
All this probably results in Requests transforming your POST into GET in the end...

Making Head Requests in Twisted

I am relatively new to using Twisted and I am having trouble returning the content-length header when performing a basic head request. I have set up an asynchronous client already but the trouble comes in this bit of code:
def getHeaders(url):
d = Agent(reactor).request("HEAD", url)
d.addCallbacks(handleResponse, handleError)
return d
def handleResponse(r):
print r.code, r.headers
whenFinished = twisted.internet.defer.Deffered()
r.deliverBody(PrinterClient(whenFinished))
return whenFinished
I am making a head request and passing the url. As indicated in this documentation the content-length header is not stored in self.length, but can be accessed from the self.headers response. The output is returning the status code as expected but the header output is not what is expected. Using "uhttp://www.espn.go.com" as an example it currently returns:
Set-Cookie: SWID=77638195-7A94-4DD0-92A5-348603068D58;
path=/; expires=Fri, 31-Jan-2034 00:50:09 GMT; domain=go.com;
X-Ua-Compatible: IE=edge,chrome=1
Cache-Control: max-age=15
Date: Fri, 31 Jan 2014 00:50:09 GMT
P3P: CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa IVAi IVDi CONi
OUR SAMo OTRo BUS PHY ONL UNI PUR COM NAV INT DEM CNT STA PRE"
Content-Type: text/html; charset=iso-8859-1
As you can see, no content-length field is returned. If the same request is done in requests then the result will contain the content-length header:
r = requests.head("http://www.espn.go.com")
r.headers
({'content-length': '46133', 'content-encoding': 'gzip'...})
(rest omitted for readability)
What is causing this problem? I am sure it is a simple mistake on my part but I for the life of me cannot figure out what I have done wrong. Any help is appreciated.
http://www.espn.go.com/ returns one response if the client sends an Accept-Encoding: gzip header and another response if it doesn't.
One of the differences between the two responses is the inclusion of the Content-Length header.
If you want to make requests using Agent including Accept-Encoding: gzip then take a look at ContentDecoderAgent or the third-party treq package.
http allows (but does not REQUIRE) entity headers in responses to HEAD requests. The only restriction it places is that 200 responses to HEAD requests MUST NOT include an entity payload. Its up to the origin server to decide which, if any entity headers it would like to include.
In the case of Content-Length, it makes sense for this to be optional for HEAD; if the entity will be computed dynamically (as with compressing/decompressing content), it's better for the server to avoid the extra work of computing the content length when the request won't include the content anyway.

HTTP request and response parameters

I have a situation here regarding POST HTTP request:
(host and links used here are up and running)
POST /inventory-check.cgi HTTP/1.1
Host: www.joes-hardware.com
Accept-Encoding: identity
Content-Length: 7
Content-Type: text/plain
item=563
when I send the above request string to the Host, then server sends me weird stuffs (along with expected result)
HTTP/1.1 200 OK
Date: Thu, 31 Oct 2013 12:07:48 GMT
Server: Apache/2.2.22 (Unix) DAV/2 FrontPage/5.0.2.2635 mod_ssl/2.2.22 OpenSSL/1.0.1c
Transfer-Encoding: chunked
Content-Type: text/html
6b
<HTML><BODY>
<H1>Joe's Hardware Store Inventory Check</H1>
Yes! Item number 56 is in stock!
</BODY></HTML>
0
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>501 Method Not Implemented</title>
</head><body>
<h1>Method Not Implemented</h1>
<p>3 to /index.html not supported.<br />
</p>
<hr>
<address>Apache/2.2.22 (Unix) DAV/2 FrontPage/5.0.2.2635 mod_ssl/2.2.22 OpenSSL/1.0.1c Server at totty.temp.veriohosting.com Port 80</address>
</body></html>
I checked the request with urllib module in python and that gives me only the expected output (Here i have omitted response details)
<HTML><BODY>
<H1>Joe's Hardware Store Inventory Check</H1>
Yes! Item number 56 is in stock!
</BODY></HTML>
What am I missing??
Actully I am new to HTTP and have experience in c/c++/python...Any help will be appreciated.. thanks in advance
item=563 is 8 bytes, but you declare a Content-Length of 7. Therefore, the server sees two requests, one which is a valid HTTP request for item 56 and one which is an invalid HTTP request consisting of the character 3 only, and sends you two responses.
Your request lies about the content-length.
That said, it's still a bit odd what the server does with the additional character 3.

Categories

Resources