How to set up Accept-Encoding to gzip in Python client? - python

This is probably a very newbie question, but I'm reading about the Requests HTTP library and I need to write code that makes a request to a server and asks for gzip compression (the server supports gzip compression).
For instance, I have:
import requests
r = requests.get('some_url')
r.json()
I know it has something to do with sending Accept-Encoding: gzip in the header of the HTTP request, but I'm not sure how to do that.

You should be able to set the header by using the headers argument:
import requests
headers = {'Accept-Encoding': 'gzip'}
r = requests.get('some_url', headers=headers)
result = r.json()
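Note that Requests already sends an Accept-Encoding header that includes gzip by default and transparently decompresses gzip responses, so the explicit header mainly serves to restrict which encodings are offered. A minimal sketch to confirm what would be sent, without touching the network (the URL is a placeholder, not a real endpoint):

```python
import requests

# Placeholder URL; nothing is actually sent because the request is only prepared.
URL = 'http://example.invalid/data'

session = requests.Session()

# Headers Requests would send by default (merged from the session defaults).
default_req = session.prepare_request(requests.Request('GET', URL))
default_encoding = default_req.headers['Accept-Encoding']

# Headers when Accept-Encoding is overridden explicitly, as in the answer above.
explicit_req = session.prepare_request(
    requests.Request('GET', URL, headers={'Accept-Encoding': 'gzip'})
)
explicit_encoding = explicit_req.headers['Accept-Encoding']

print(default_encoding)   # the default list of encodings, which includes gzip
print(explicit_encoding)
```

After a real request, r.headers.get('Content-Encoding') shows whether the server actually compressed the response; r.content and r.json() are already decompressed.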

Related

How to send entire http request from file?

I am writing a small application that interprets the http response of a request. I am writing the application in python. I have not found anything that allows me to send the body + headers stored in one file. I can send certain parts like the headers but not the entire request.
For example, if the request is:
GET /index.html HTTP/1.1
Host: localhost
Cookie: bob=lemon
I want to send this entire request in one go. How would I do this in python?
Check out the python requests library. https://requests.readthedocs.io/en/master/user/quickstart/#make-a-request
For the request above it would look something like
import requests
url = 'http://localhost:[YOUR PORT HERE]/'
cookies = {'bob': 'lemon'}
r = requests.get(url, cookies=cookies)
To check whether the request succeeded, look at the status code, which should be 200:
r.status_code
Check out the library's documentation for more; it is very extensive.
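If the request really has to come from a file, one option is to parse the text yourself and hand the pieces to the standard library's http.client. The following is only a sketch: parse_raw_request and send_raw_request are hypothetical helper names, and it assumes a simple request made of a request line, headers, an optional body after a blank line, and a Host header of the form host or host:port.

```python
import http.client


def parse_raw_request(text):
    """Split raw HTTP request text into (method, path, headers, body)."""
    # Normalise line endings, then separate the head from the optional body.
    text = text.replace('\r\n', '\n')
    head, _, body = text.partition('\n\n')
    lines = [line for line in head.split('\n') if line.strip()]
    # Request line, e.g. "GET /index.html HTTP/1.1"
    method, path, _version = lines[0].split(' ', 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    return method, path, headers, body


def send_raw_request(text, timeout=10):
    """Send the parsed request to the host named in its Host header."""
    method, path, headers, body = parse_raw_request(text)
    conn = http.client.HTTPConnection(headers['Host'], timeout=timeout)
    try:
        conn.request(method, path, body=body or None, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


# The example request from the question, parsed but not sent.
raw = "GET /index.html HTTP/1.1\nHost: localhost\nCookie: bob=lemon\n"
method, path, headers, body = parse_raw_request(raw)
print(method, path, headers)
```

With the request saved in a file (say request.txt, a made-up name), send_raw_request(open('request.txt').read()) would return the status code and the response body.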

Why does my PDF have headers['content-type'] set to 'text/html; charset=utf-8'?

When downloading a PDF file from its URL, I am observing a headers['content-type'] of 'text/html; charset=utf-8' when I need 'application/pdf'. Why does this happen even though I am setting the content-type in my headers?
Code example:
import requests
from requests.auth import HTTPBasicAuth
from pprint import pprint
file = 'url.pdf'
username = 'myusername'
password = 'mypassword'
headers = {'content-type': 'application/pdf', 'User-Agent': 'myUser-Agent'}
pdf_fname = 'new.pdf'
# Note: proxyDict is not defined in this snippet; it must be a proxies mapping set up elsewhere.
resp = requests.get(
    file, headers=headers,
    auth=HTTPBasicAuth(username, password),
    proxies=proxyDict
)
with open(pdf_fname, 'wb') as f:
    f.write(resp.content)
pprint(resp.headers['content-type'])
GET requests do not have a content body, so have no need for a Content-Type header. Setting the header there is meaningless. HTTP servers generally will ignore the header on any GET requests they receive.
The header you observe is set by the HTTP server you contacted. If the data you receive from the server really is a PDF file, so that the response simply carries an incorrect Content-Type header, then that's entirely on the server, not on your code or on requests. Just ignore the header, or contact the administrators of the site to ask them to correct the error.
However, if the server is actually sending you HTML, then you may want to save that HTML somewhere and open it in a browser to see what the server is trying to tell you. It may be a specific error message or a login page. We can't tell you whether this is the case here; we simply don't know how this specific website is designed to operate.
Also see another answer of mine that covers troubleshooting cases where requests is treated differently from a web browser for the same URLs.
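One way to act on this is to check what actually came back before writing it to disk. The sketch below uses hypothetical helpers (looks_like_pdf and describe_payload) and relies on PDF files beginning with the bytes %PDF-; the sample byte strings stand in for resp.content and resp.headers['content-type'].

```python
def looks_like_pdf(content):
    """Return True if the bytes start with the PDF signature '%PDF-'."""
    return content.startswith(b'%PDF-')


def describe_payload(content, content_type):
    """Summarise a downloaded payload so a wrong response is easy to spot."""
    if looks_like_pdf(content):
        return 'pdf'
    if 'html' in content_type.lower():
        return 'html (probably an error or login page)'
    return 'unknown'


# A real PDF served with a wrong header, and an HTML page served instead of a PDF.
pdf_kind = describe_payload(b'%PDF-1.7\n...', 'text/html; charset=utf-8')
html_kind = describe_payload(b'<html><body>Please log in</body></html>',
                             'text/html; charset=utf-8')
print(pdf_kind)
print(html_kind)
```

If the payload turns out to be HTML, saving it with an .html extension and opening it in a browser is usually the quickest way to read the server's message.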

python making a post file request

Hi guys, I'm developing a Python 3 Quart asyncio application and I'm trying to set up a test framework around my HTTP API.
Quart has methods to build JSON, form and raw requests, but no files request. I believe I need to build the request packet myself and post a "raw" request.
Using postman I can see that the requests need to look like this:
----------------------------298121837148774387758621\r\n
Content-Disposition: form-data; name="firmware"; filename="image.bin"\r\n
Content-Type: application/octet-stream\r\n
\r\n
\x00#\x00\x10\x91\xa0\t\x08+\xaa\t\x08/\xaa\t\x083\xaa\t\x087\xaa\t\x08;\xaa\t\x08\x00\x00\x00\
....
\xff\xff\xff\xff\xff\xff\xff\xa5\t\tZ\x0c\x00Rotea MLU Main V0.12\x00\x00k%\xea\x06\r\n
----------------------------298121837148774387758621--\r\n
I'd prefer not to encode this myself if there is a method that exists.
Is there a module in Python with which I can build the raw packet data and send it with the Quart API?
I have tried using requests together with the Quart test client:
import requests
from .web_server import app as quart_app
test_client = quart_app.test_client()
firmware_image = 'test.bin'
with open(firmware_image, 'rb') as f:
    data = f.read()
files = {'firmware': (firmware_image, data, 'application/octet-stream')}
firmware_req = requests.Request('POST', 'http://localhost:5000/firmware_update', files=files).prepare()
response = await test_client.post('/firmware_update',
                                  data=firmware_req.body,
                                  headers={'Content-type': 'multipart/form-data'})
Any suggestions would be greatly appreciated.
Cheers. Mitch.
The requests library's Request object has a prepare() method that returns the prepared request, from which you can read the raw headers and body it would send.
import requests
url = 'http://localhost:8080/'
files = {'file': open('z', 'rb'),
         'file2': open('zz', 'rb')}
req = requests.Request('POST', url, files=files)
r = req.prepare()
print(r.headers)
print(r.body)
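When forwarding the prepared body to another client (such as the Quart test client in the question), the Content-Type header generated by prepare() should probably be forwarded too, because it carries the multipart boundary that the body uses; a bare multipart/form-data header gives the server no way to split the parts. A small sketch, using in-memory bytes in place of the firmware file (the field name and URL follow the question; the payload is made up):

```python
import requests

# In-memory stand-in for the contents of the firmware image.
payload = b'\x00\x01\x02\x03'
files = {'firmware': ('image.bin', payload, 'application/octet-stream')}

# prepare() encodes the multipart body locally; nothing is sent.
prepared = requests.Request(
    'POST', 'http://localhost:5000/firmware_update', files=files
).prepare()

content_type = prepared.headers['Content-Type']  # includes "boundary=..."
body = prepared.body                             # bytes of the encoded form

print(content_type)
print(len(body))
```

In the question's test this would mean passing data=body together with headers={'Content-Type': content_type} to test_client.post; that last step is an assumption and has not been verified against Quart here.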

POST Binary (video) File Using Python Requests

I have a working bit of PHP code that uploads a binary to a remote server I don't have shell access to. The PHP code is:
function upload($uri, $filename) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $uri);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('file' => '@' . $filename));
    curl_exec($ch);
    curl_close($ch);
}
This results in a header like:
HTTP/1.1
Host: XXXXXXXXX
Accept: */*
Content-Length: 208045596
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------360aaccde050
I'm trying to port this over to python using requests and I cannot get the server to accept my POST. I have tried every which way to use requests.post, but the header will not mimic the above.
This will successfully transfer the binary to the server (can tell by watching wireshark) but because the header is not what the server is expecting it gets rejected. The response_code is a 200 though.
files = {'bulk_test2.mov': ('bulk_test2.mov', open('bulk_test2.mov', 'rb'))}
response = requests.post(url, files=files)
The requests code results in a header of:
HTTP/1.1
Host: XXXX
Content-Length: 160
Content-Type: multipart/form-data; boundary=250852d250b24399977f365f35c4e060
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.2.1 CPython/2.7.5 Darwin/13.1.0
--250852d250b24399977f365f35c4e060
Content-Disposition: form-data; name="bulk_test2.mov"; filename="bulk_test2.mov"
--250852d250b24399977f365f35c4e060--
Any thoughts on how to make requests match the header that the PHP code generates?
There are two large differences:
The PHP code posts a field named file, your Python code posts a field named bulk_test2.mov.
Your Python code posts an empty file. The Content-Length header is 160 bytes, exactly the amount of space the multipart boundaries and Content-Disposition part header take up. Either the bulk_test2.mov file is indeed empty, or you tried to post the file multiple times without rewinding or reopening the file object.
To fix the first problem, use 'file' as the key in your files dictionary:
files = {'file': open('bulk_test2.mov', 'rb')}
response = requests.post(url, files=files)
I used just the open file object as the value; requests will get the filename directly from the file object in that case.
The second issue is something only you can fix. Make sure you don't reuse files when repeatedly posting. Reopen, or use files['file'].seek(0) to rewind the read position back to the start.
The Expect: 100-continue header is an optional client feature that asks the server to confirm that the body upload can go ahead; it is not a required header and any failure to post your file object is not going to be due to requests using this feature or not. If an HTTP server were to misbehave if you don't use this feature, it is in violation of the HTTP RFCs and you'll have bigger problems on your hands. It certainly won't be something requests can fix for you.
If you do manage to post actual file data, any small variations in Content-Length are due to the (random) boundary being a different length between Python and PHP. This is normal, and not the cause of upload problems, unless your target server is extremely broken. Again, don't try to fix such brokenness with Python.
However, I'd assume you overlooked something much simpler. Perhaps the server blacklists certain User-Agent headers, for example. You could clear some of the default headers requests sets by using a Session object:
files = {'file': open('bulk_test2.mov', 'rb')}
session = requests.Session()
del session.headers['User-Agent']
del session.headers['Accept-Encoding']
response = session.post(url, files=files)
and see if that makes a difference.
If the server fails to handle your request because it fails to handle HTTP persistent connections, you could try to use the session as a context manager to ensure that all session connections are closed:
files = {'file': open('bulk_test2.mov', 'rb')}
with requests.Session() as session:
    response = session.post(url, files=files, stream=True)
and you could add:
response.raw.close()
for good measure.
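The empty-upload case is easy to reproduce offline. In this sketch an io.BytesIO object stands in for the open .mov file and the URL is a placeholder; preparing a request reads the file object to the end, so a second attempt without rewinding encodes an empty file part:

```python
import io
import requests

URL = 'http://example.invalid/upload'  # placeholder; nothing is sent
data = b'fake video bytes'
fileobj = io.BytesIO(data)             # stands in for open('bulk_test2.mov', 'rb')


def encode(fobj):
    """Build the multipart body Requests would send for this file object."""
    return requests.Request('POST', URL, files={'file': fobj}).prepare().body


first = encode(fileobj)    # reads the file object to the end
second = encode(fileobj)   # read position is at EOF, so the file part is empty
fileobj.seek(0)            # rewind before reusing the file object
third = encode(fileobj)

# Whether the file bytes made it into each encoded body.
print(data in first, data in second, data in third)
```

The same applies to requests.post itself: reopen the file or call seek(0) before every retry.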

How to send GET request including headers using python

I'm trying to build a website using web.py, which is able to search the mobile.de database (mobile.de is a German car sales website). For this I need to use the mobile.de API and make a GET request to it doing the following (this is an example from the API docs):
GET /1.0.0/ad/search?exteriorColor=BLACK&modificationTime.min=2012-05-04T18:13:51.0Z HTTP/1.0
Host: services.mobile.de
Authorization: QWxhZGluOnNlc2FtIG9wZW4=
Accept: application/xml
(The authorization needs to be my username and password joined together using a colon and then being encoded using Base64.)
So I use urllib2 to do the request as follows:
>>> import base64
>>> import urllib2
>>> headers = {'Authorization': base64.b64encode('myusername:mypassw'), 'Accept': 'application/xml'}
>>> req = urllib2.Request('http://services.mobile.de/1.0.0/ad/search?exteriorColor=BLACK', headers=headers)
And from here I am unsure how to proceed. req appears to be an instance with some methods to get the information in it. But did it actually send the request? And if so, where can I get the response?
All tips are welcome!
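You need to pass req to urllib2.urlopen() to actually send the request; it returns a response object, and calling read() on that gives you the response body. Creating the Request object by itself sends nothing.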
But you'd be better off using the requests library, which is much easier to use.
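For reference, a Python 3 sketch of the same idea with the standard library (where urllib2 became urllib.request). Building the Request object sends nothing; only urlopen() performs the call. The credentials are the placeholders from the question, and the network call is wrapped in a function (fetch, a hypothetical name) that is not invoked here:

```python
import base64
import urllib.request

# In Python 3, b64encode works on bytes, so encode first and decode the result.
credentials = base64.b64encode(b'myusername:mypassw').decode('ascii')
headers = {'Authorization': credentials, 'Accept': 'application/xml'}
req = urllib.request.Request(
    'http://services.mobile.de/1.0.0/ad/search?exteriorColor=BLACK',
    headers=headers,
)

# At this point nothing has been sent; req only describes the request.
print(req.get_method(), req.full_url)


def fetch(request, timeout=10):
    """Send the request and return (status code, response body as bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

With requests, the equivalent would be r = requests.get(url, headers=headers), after which r.status_code and r.text hold the result.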
