Python requests library adding in an extra Accept header

So I have made a request to a server with Python's requests library. The code looks like this (it uses an adapter, so it needs to match a certain pattern):
def getRequest(self, url, header):
    """
    implementation of a get request
    """
    conn = requests.get(url, headers=header)
    newBody = conn.content
    newHeader = conn.headers
    newHeader['status'] = conn.status_code
    response = {"Headers": newHeader, "Body": newBody.decode('utf-8')}
    self._huddleErrors.handleResponseError(response)
    return response
The header parameter I am passing in is this:
{'Authorization': 'OAuth2 handsOffMyToken', 'Accept': 'application/vnd.huddle.data+json'}
However, I am getting an XML response back from the server. After checking Fiddler, I see the request being sent is:
Accept-Encoding: identity
Accept: */*
Host: api.huddle.dev
Authorization: OAuth2 HandsOffMyToken
Accept: application/vnd.huddle.data+json
Accept-Encoding: gzip, deflate, compress
User-Agent: python-requests/1.2.3 CPython/3.3.2 Windows/2008ServerR2
As we can see, there are two Accept headers! The requests library is adding in this Accept: */* header, which is throwing off the server. Does anyone know how I can stop this?

As stated in the comments, this seems to be a problem with the requests library on Python 3.3. requests has default headers (which can be found in the utils module). When you don't specify your own headers, these default headers are used. However, if you specify your own headers, requests instead tries to merge the headers together to make sure you have all the headers you need.
The problem shows itself in the request() method in sessions.py. Instead of merging all the headers, it puts in its own headers and then chucks in yours, so both Accept headers end up on the wire. For now I have just done the dirty hack of removing the Accept header from the default headers found in utils.
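A less invasive workaround than editing the library is to drop the default Accept header from a Session before making the call, so only your own value goes out. This is a minimal sketch, assuming a reasonably recent requests version; the URL and token are the placeholders from the question:
import requests

# Build a session and remove the library's default Accept: */* header,
# so only the Accept value we pass explicitly is sent.
session = requests.Session()
session.headers.pop('Accept', None)

header = {
    'Authorization': 'OAuth2 handsOffMyToken',            # placeholder token from the question
    'Accept': 'application/vnd.huddle.data+json',
}
conn = session.get('http://api.huddle.dev', headers=header)   # placeholder URL from the question
print(conn.request.headers.get('Accept'))                     # should show only the custom value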

Related

python-requests does not grab JSESSIONID

I'm trying to scrape a website using requests. However, a POST method that I need to use requires the headers below. I can fill in everything apart from the JSESSIONID. The only way I can get this POST method to work is if I manually go into the browser, start a session, and inspect the page to retrieve the JSESSIONID.
I am looking for a way to retrieve this JSESSIONID using the requests package in Python. I saw some suggestions for using a session. However, the requests session does not grab the JSESSIONID, which is the only thing I need. How should I go about a possible solution?
Host:
Connection:
Content-Length:
Accept:
X-Requested-With:
User-Agent:
Content-Type:
Sec-GPC:
Origin:
Sec-Fetch-Site:
Sec-Fetch-Mode:
Sec-Fetch-Dest:
Referer:
Accept-Encoding:
Accept-Language:
Cookie: _1aa19=; JSESSIONID=;
What I currently tried is to use a session from the requests package, which should store the cookies of the session. However, after I use a .get method, s.cookies does not have the JSESSIONID stored:
import requests

query = 'Example%20query'
s = requests.Session()
suggest = s.get(f'https://www.examplewebsite.nl/api_route/suggest?query={query}').json()
print(s.cookies)  # JSESSIONID is not in here
JSESSIONID is generated when you visit the https://www.examplewebsite.nl page first:
import requests

query = 'Example%20query'
s = requests.Session()
s.get('https://www.examplewebsite.nl')  # visit the main page first so the server sets JSESSIONID
suggest = s.get(f'https://www.examplewebsite.nl/api_route/suggest?query={query}').json()
print(s.cookies.get("JSESSIONID"))

How to get the request header sent by requests.get()?

I'd like to know what HTTP GET request headers are sent by requests.get() from the client side.
requests.get('http://localhost:9000')
The request headers sent by the above Python command, as monitored by netcat, are the following. However, I can't find a way to directly monitor the HTTP GET request headers on the client side.
GET / HTTP/1.1
Host: localhost:9000
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.18.4
import requests

req = requests.Request('GET', 'http://localhost:9000')
print(req.headers)
prepared = req.prepare()
s = requests.Session()
page = s.send(prepared)
The request headers sent by the above Python command, as monitored by netcat, are the following:
GET / HTTP/1.1
Host: localhost:9000
Accept-Encoding: identity
Also, req.headers can be used to monitor the headers. It is not exactly the same as what is sent, as it does not contain the Accept-Encoding header. Moreover, the HTTP GET request headers sent this way are also different from those of the first way.
$ ./main.py
{}
Is there a method to directly monitor what HTTP GET headers are sent in the first way?
Also, why are the requests sent by the two methods not the same? Isn't it better to make them consistent to avoid possible confusion?
I'd like to know what HTTP GET request headers are sent by requests.get() from the client side
If I got it right, you want to view the headers that were actually sent by requests.get().
You can access them by using the .request.headers attribute:
import requests
r = requests.get("http://example.com")
print(r.request.headers)
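As for why the two approaches in the question send different headers: requests.get() goes through a Session, which merges its default headers (User-Agent, Accept, Accept-Encoding, Connection) into the prepared request, while calling Request.prepare() directly skips that merge. A minimal sketch to illustrate the difference (example.com is just a stand-in URL):
import requests

req = requests.Request('GET', 'http://example.com')
print(req.prepare().headers)             # bare prepare(): no session defaults merged in

s = requests.Session()
print(s.prepare_request(req).headers)    # Session.prepare_request(): the session's default headers are added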

Why does the requests library add extra headers to the ones I set?

I am trying to do a POST request in Python using the requests library. I set my custom headers, which are the following:
User-Agent: MBAM-C
Content-Type: application/json
Authorization: True
Content-Length: 619
Connection: Close
However, when it sends the request with the custom headers, requests adds its own headers, which causes a bad request response from the server:
User-Agent: MBAM-C
Accept-Encoding: gzip, deflate
Accept: */*
Connection: Close
Content-Type: application/json
Authorization: True
Content-Length: 559
It is due to the design goals of the requests project.
This behavior is documented here. You may want to use a lower-level library if it is problematic for the library to correct the Content-Length or add desirable headers. Requests bills itself as "an elegant and simple HTTP library for Python, built for human beings", and part of that is advertising that it can accept compressed content and all MIME types.
Note: Custom headers are given less precedence than more specific sources of information. For instance:
Authorization headers set with headers= will be overridden if credentials are specified in .netrc, which in turn will be overridden by the auth= parameter.
Authorization headers will be removed if you get redirected off-host.
Proxy-Authorization headers will be overridden by proxy credentials provided in the URL.
Content-Length headers will be overridden when we can determine the length of the content.
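That said, if the automatically added headers really are what upsets the server, requests does (to the best of my knowledge, in recent versions) let you suppress a default header by setting its value to None in the headers dict. A minimal sketch, using httpbin.org as a stand-in endpoint:
import requests

# Setting a header's value to None tells requests to drop that header
# entirely instead of sending its default value.
headers = {
    'User-Agent': 'MBAM-C',
    'Content-Type': 'application/json',
    'Authorization': 'True',
    'Accept': None,              # suppress the default Accept: */*
    'Accept-Encoding': None,     # suppress the default Accept-Encoding
}
r = requests.post('https://httpbin.org/post', data='{"key": "value"}', headers=headers)
print(r.request.headers)         # inspect what was actually sent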

use Python urllib.urlopen to send data accurately

I want to use Python to simulate a login action, which sends some message via an HTTP GET request. So I wrote something like this:
from urllib.request import urlopen, Request
urlopen(Request(URL, data=data_for_verify.encode(), method='GET'))
The problem is, it doesn't do the same as a real login action, which looks like this (captured with Wireshark, HTTP printable data only):
GET /rjsdcctrl?mac%3dfcaa14ec56f3%26ipv4%3d1681312010%26ipv61%3d0%26ipv62%3d0%26ipv63%3d0%26ipv64%3d0%26product%3d33554432%26mainver%3d67108864%26subver%3d1610612736 HTTP/1.1
Accept: text/*
User-Agent: HttpCall
Accept-Language: en-us
Host: 10.0.6.251
Cache-Control: no-cache
And what my program did is:
GET / HTTP/1.1
Accept-Encoding: identity
Content-Type: application/x-www-form-urlencoded
Host: 10.0.6.251:80
User-Agent: Python-urllib/3.4
Connection: close
Content-Length: 161
rjsdcctrl?mac%3dfcaa14ec56f3%26ipv4%3d1681312010%26ipv61%3d0%26ipv62%3d0%26ipv63%3d0%26ipv64%3d0%26product%3d33554432%26mainver%3d67108864%26subver%3d1610612736
A real login request carries everything in the request line and headers: there is no body, and the first GET line contains the real request message, whereas my program sends a bare GET / HTTP/1.1 and puts the data in the body. How can I simulate the real request using Python's urllib?
I use Python 3.4.
You shouldn't use the data parameter if you don't want to send data as part of the body. Append the value to the URL:
full_url = "%s?%s" % (URL, data_for_verify)
urlopen(full_url)
To extend @Daniel's answer, you can make use of urllib.parse.urlencode to prepare your GET parameter string, and pass a headers dict to urllib.request.Request to override default headers. So for example:
from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = 'http://www.example.com/'
data = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}
headers = {
    'Overriden-Header': 'Overriden Header Value'
}
## Update the url and make the actual request
url = '%s?%s' % (url, urlencode(data))
response = urlopen(Request(url, headers=headers))

How to send/receive URL parameters in an HTTP POST correctly?

I am using CakePHP 2.4.5. I want to send an HTTP POST with URL parameters. I am using the Python 2.7 requests module to send the HTTP POST. Please assume the payload is structured correctly, as I have tested that part.
URL_post = 'http://127.0.0.1/webroot/TestFunc?identity_number=S111A/post'
r = requests.post(URL_post, payload)
On the CakePHP side, the controller looks something like this:
public function TestFunc($id = null)
{
    $identity_number = $this->request->query['identity_number'];
    $this->request->data['Model']['associated_id'] = $identity_number;
    $this->Model->saveAll($this->request->data, array('deep' => true));
}
I have tested and found that the query is not received correctly. However, if I am not using an HTTP POST and just throw in a normal URL, the query is received correctly.
What have I done wrong?
The query part of the URL is sent correctly:
import requests

requests.post('http://localhost/webroot/TestFunc?identity_number=S111A/post',
              {'Model': 'data'})
The Request
POST /webroot/TestFunc?identity_number=S111A/post HTTP/1.1
Host: localhost
User-Agent: python-requests/2.2.1 CPython/3.4 Linux/3.2
Accept: */*
Accept-Encoding: gzip, deflate, compress
Content-Type: application/x-www-form-urlencoded
Content-Length: 10
Model=data
You could also make the requests using params:
requests.post('http://localhost/webroot/TestFunc',
              data={'Model': 'data'},
              params={'identity_number': 'S111A/post'})
The only difference is that S111A/post is sent as S111A%2Fpost (the same URL in the end).
Look at http://docs.python-requests.org/en/latest/user/quickstart/#passing-parameters-in-urls.
import requests

payload = {"identity_number": "S111A/post"}
URL_post = "http://127.0.0.1/webroot/TestFunc"
req = requests.post(URL_post, params=payload)
print(req.status_code)
