Formatting Python Requests according to the following headers

I am trying to use the Python Requests library to POST a zipped file as multipart/form-data. The Chrome extension Advanced REST Client is able to upload the file without a problem, but I am having difficulty doing the same from the console using Python Requests.
The general information for the request is:
Remote Address:IP/Address/to/Host:Port
Request URL:/path/to/host/with/parameters/
Request Method:POST
The request headers from Advanced REST Client are:
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Authorization:Basic/Authentication
Connection:keep-alive
Content-Length:1893
Content-Type:multipart/form-data; boundary=----WebKitFormBoundaryu3rhOVbU2LpT89zi
Host:/host/name
Origin:chrome-extension://hgmloofddffdnphfgcellkdfbfbjeloo
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
The payload is as follows:
------WebKitFormBoundaryu3rhOVbU2LpT89zi
Content-Disposition: form-data; name="fileUpload"; filename="filename.zip"
Content-Type: application/x-zip-compressed
------WebKitFormBoundaryu3rhOVbU2LpT89zi--
I formatted this query in Python as follows:
import requests

authentication = requests.auth.HTTPBasicAuth(username=user, password=password)
parameters = {}  # filled with the appropriate parameters
url = ''  # base URL
files = {'file': ('fileUpload', 'application/x-zip-compressed', {})}
response = requests.post(url, params=parameters, auth=authentication, files=files)
While Advanced REST Client gets a 200 OK response, I get a 400 (Bad Request) response. What am I doing wrong?
Thanks!
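Judging by the captured payload, the likely culprit is the files mapping: in requests, the dictionary key becomes the form field name, and the value is a tuple of (filename, file object, content type), not (field name, content type, headers). A minimal sketch, assuming the server expects the fileUpload field and filename.zip shown in the payload above:

import requests

authentication = requests.auth.HTTPBasicAuth(username=user, password=password)
parameters = {}  # the appropriate parameters
url = ''  # base URL

# Key = form field name; value = (filename, file object, content type),
# matching the Content-Disposition line in the captured payload.
with open('filename.zip', 'rb') as f:
    files = {'fileUpload': ('filename.zip', f, 'application/x-zip-compressed')}
    response = requests.post(url, params=parameters, auth=authentication, files=files)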

Related

Scrapy - How does a request sent to an API using the requests library differ from one sent using Scrapy.Request?

I am a beginner with Scrapy and I was trying to scrape this website https://directory.ntschools.net/#/schools, which uses JavaScript to load its contents. So I checked the Network tab and found an API address: https://directory.ntschools.net/api/System/GetAllSchools. If you open this address directly, the data is in XML format, but the Response tab in the Network panel shows the data in JSON format.
I first tried Scrapy and sent the request to the API address WITHOUT any headers; the response it returned was XML, which threw a JSONDecodeError when passed to json.loads(). So I added the header 'Accept': 'application/json' and the response I got was JSON. That worked well:
import scrapy
import json

class NtseSpider_new(scrapy.Spider):
    name = 'ntse_new'
    header = {
        'Accept': 'application/json',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.56',
    }

    def start_requests(self):
        yield scrapy.Request('https://directory.ntschools.net/api/System/GetAllSchools',
                             callback=self.parse, headers=self.header)

    def parse(self, response):
        data = json.loads(response.body)  # returned a JSON response
But then I used the requests module WITHOUT any headers and the response I got was in JSON too!
import requests
import json

res = requests.get('https://directory.ntschools.net/api/System/GetAllSchools')
js = json.loads(res.content)  # returned a JSON response
Can anyone tell me if there's any difference between the two requests? Is there a default response format for the requests module when making a request to an API? Surely I am missing something?
Thanks
It's because Scrapy sets a default Accept header of 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'. You can see that in Scrapy's default settings (DEFAULT_REQUEST_HEADERS).
I experimented and found that the server sends a JSON response if the request has no Accept header.
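You can reproduce the difference with requests alone by toggling the Accept header; a quick sketch, assuming the server still behaves as described above:

import requests

url = 'https://directory.ntschools.net/api/System/GetAllSchools'

# No Accept header: the server falls back to JSON.
print(requests.get(url).headers['Content-Type'])

# Mimic Scrapy's default Accept header: the server responds with XML instead.
xml_accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
print(requests.get(url, headers={'Accept': xml_accept}).headers['Content-Type'])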

Can access website's API via browser but not python requests (cookie not working)

I'm trying to scrape data from a dynamic website using Python requests.
I've had a look through the network requests in developer tools and found the URL the website sends GET requests to in order to access the required data.
When a request is made to the website's API, it returns a cookie (via the Set-Cookie header), which I believe the browser then uses in future GET requests to access the data. I took a screenshot of the request and response headers when the page is first loaded and all previous cookies have been removed (screenshot omitted).
When I load the request URL directly in my browser it works fine (it's able to acquire a valid cookie from the website and load the data). But when I send a GET request to that same URL via the Python requests module, the cookie returned doesn't seem to work (I'm getting a 403 Forbidden error).
My Python code:
import requests

url = '...'  # the request URL found in developer tools

session = requests.Session()
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB,en;q=0.9",
    "Host": "www.oddschecker.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33",
}
response = session.get(url, headers=headers)
# Currently returns a 403 error unless I add the cookie from my browser as a header.
I believe the cookie is the issue because when I instead take the cookie generated by the browser and use it as a header in my Python program, it is able to return the desired information (until the cookie expires).
My goal is for the Python program to be able to acquire a working cookie from this website automatically so it can successfully send requests and gather data.
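One pattern worth sketching here (under the assumption that the cookie, and not some other fingerprint, is what the server checks): let a Session pick up the Set-Cookie response from a normal page load before hitting the API, since a Session replays stored cookies automatically:

import requests

# Hypothetical URLs for illustration; substitute the real page and API URLs.
page_url = 'https://www.oddschecker.com/'
api_url = '...'

# Reuse the same browser-like headers as above (minus Host, which requests sets itself).
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...'}

session = requests.Session()

# The Session stores any Set-Cookie values from this response automatically...
session.get(page_url, headers=headers)

# ...and sends them back on subsequent requests to the same domain.
response = session.get(api_url, headers=headers)
print(response.status_code, session.cookies.get_dict())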

Checking out a SharePoint file via Python

So, I have already searched a lot in different forums, but I just can't make it work.
I want to automate a tool, so I'm trying to check out a SharePoint file in a Python script:
import requests
from requests.auth import HTTPBasicAuth

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'X-RequestDigest': 'form digest value',
}
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/GetFileByServerRelativeUrl('/sites/team/Shared Documents/project/doc.xlsb')/checkout()"
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers)
I'm getting the response "403 Access denied. You do not have permission to perform this action or access this resource." I can check out the file manually, so I clearly have the rights to do it. Is there a problem with the authentication, or are there other solutions?
The problem seems to be that you are not passing the correct form digest value in your headers. The form digest value is a security token that SharePoint requires for any POST request that modifies the state of the server. You can obtain it by making a POST request to the /_api/contextinfo endpoint and extracting the value from the response. For example:
import requests
from requests.auth import HTTPBasicAuth

# Get the form digest value (requesting JSON so that .json() below can parse it)
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD),
                                headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']

# Use the form digest value in the headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'X-RequestDigest': digest_value,
}
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/GetFileByServerRelativeUrl('/sites/team/Shared Documents/project/doc.xlsb')/checkout()?@target='https://company.sharepoint.com/sites/team'"
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers)

# Check the status code
if response.status_code == 200:
    print("File checked out successfully")
else:
    print("Error: ", response.status_code, response.reason)
Explanation:
The form digest value is a way for SharePoint to prevent cross-site request forgery (CSRF) attacks, where a malicious site can send requests to SharePoint on behalf of a user without their consent. The form digest value is a random string that is generated by SharePoint and stored in a hidden input field in the page. When a user submits a form or makes a POST request, SharePoint validates that the form digest value matches the one stored in the server. If they don't match, the request is rejected.
When you are using requests to make POST requests to SharePoint, you need to obtain the form digest value from the /_api/contextinfo endpoint, which returns a JSON object with the form digest value and other information. You need to pass this value in the X-RequestDigest header of your subsequent requests, so that SharePoint can verify that you are authorized to perform the action.
Examples:
Here are some examples of how to use requests to make POST requests to SharePoint with the form digest value:
To create a new folder in a document library:
import requests
from requests.auth import HTTPBasicAuth

# Get the form digest value (requesting JSON so that .json() below can parse it)
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD),
                                headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']

# Use the form digest value in the headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'X-RequestDigest': digest_value,
    'accept': 'application/json;odata=verbose',
    'content-type': 'application/json;odata=verbose',
}

# Specify the folder name and the parent folder path
folder_name = "New Folder"
parent_folder = "/sites/team/Shared Documents/project"

# Construct the payload
payload = {
    '__metadata': {'type': 'SP.Folder'},
    'ServerRelativeUrl': parent_folder + '/' + folder_name
}

# Construct the url
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/folders?@target='https://company.sharepoint.com/sites/team'"

# Make the POST request
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers, json=payload)

# Check the status code
if response.status_code == 201:
    print("Folder created successfully")
else:
    print("Error: ", response.status_code, response.reason)
To upload a file to a document library:
import requests
from requests.auth import HTTPBasicAuth

# Get the form digest value (requesting JSON so that .json() below can parse it)
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD),
                                headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']

# Use the form digest value in the headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'X-RequestDigest': digest_value,
    'accept': 'application/json;odata=verbose',
    'content-type': 'application/json;odata=verbose',
}

# Specify the file name and the file content
file_name = "test.txt"
file_content = b"Hello world"

# Specify the folder path
folder_path = "/sites/team/Shared Documents/project"

# Construct the url
url = ("https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)"
       "/web/GetFolderByServerRelativeUrl('" + folder_path + "')"
       "/Files/add(url='" + file_name + "',overwrite=true)"
       "?@target='https://company.sharepoint.com/sites/team'")

# Make the POST request
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers, data=file_content)

# Check the status code
if response.status_code == 200:
    print("File uploaded successfully")
else:
    print("Error: ", response.status_code, response.reason)
It looks like you're trying to connect to a corporate account.
This probably does not answer your question, but I might suggest another way: using the Microsoft Graph API.
The advantage of this approach is that every user can use the interface with their individual rights. To allow authentication, you first need to register your application in the Azure portal (https://portal.azure.com/#blade/Microsoft_AAD_RegisteredApps/ApplicationsListBlade).
To connect via Python you can use the O365 module (https://pypi.org/project/O365/), which lets you communicate with SharePoint through this interface. There you will also find further explanations on connecting to SharePoint.
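A minimal sketch of that route, assuming an app registration already exists (the credentials and the site keyword below are placeholders) and following the O365 package's documented authentication flow:

from O365 import Account

# Placeholder credentials from your Azure app registration.
credentials = ('client_id', 'client_secret')

account = Account(credentials)
if account.authenticate(scopes=['basic', 'sharepoint']):
    sharepoint = account.sharepoint()
    # Look up the team site by keyword (placeholder keyword).
    sites = sharepoint.search_site('team')
    print(sites)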

Simple GET returns 403 Forbidden but the page loads okay in the browser

I'm trying to get some data from a page, but it returns the error [403 Forbidden].
I thought it was the user agent, but I tried several user agents and it still returns the error.
I also tried to use the fake-useragent library, but I did not succeed:
import requests
from fake_useragent import UserAgent

with requests.Session() as c:
    url = '...'
    # headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36'}
    ua = UserAgent()
    header = {'User-Agent': str(ua.chrome)}
    page = c.get(url, headers=header)
    print page.content
When I access the page manually, everything works.
I'm using Python 2.7.14 and the requests library. Any ideas?
The site could be using anything in the request to trigger the rejection.
So, copy all the headers from the request that your browser makes, then delete them one by one [1] to find out which are essential.
As per Python requests. 403 Forbidden, to add custom headers to the request, do:
result = requests.get(url, headers={'header': 'value', <etc>})
[1] A faster way would be to delete half of them each time instead, but that's more complicated since there are probably multiple essential headers.
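A sketch of that one-by-one elimination, assuming browser_headers holds everything copied from your browser's developer tools (URL and header values below are placeholders):

import requests

url = '...'  # the page that returns 403
browser_headers = {
    # Paste every header your browser sends, e.g.:
    'User-Agent': 'Mozilla/5.0 ...',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}

# Drop one header at a time; if the request starts failing again,
# the dropped header was essential.
for name in list(browser_headers):
    trimmed = {k: v for k, v in browser_headers.items() if k != name}
    status = requests.get(url, headers=trimmed).status_code
    print(name, 'essential' if status == 403 else 'removable', status)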
These are all the headers I can see the browser include in a generic GET request:
Host: <URL>
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Try including those in your request incrementally (one by one) to identify which one(s) are required for a successful request.
Also, take a look at the Cookies and/or Security tabs available in your browser's developer tools under the Network option.

Python requests: disallow cookies

I am using Python requests:
import requests

image_url = my_url
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch',
}
r = requests.get(image_url, headers=headers)
I would like the response to be the same as if I were sending the request from a browser that does NOT allow cookies to be set. The reason for this is that some sites give a different response depending on whether or not my browser allows cookies, and I need the non-cookie response.
Cookies are either sent or not sent. If you don't set a Cookie header, no cookie is sent, so the request in your question is already treated as one that sends no cookies.
The server may send a cookie in its response (via Set-Cookie). If you send it back in the next request, the server will recognize you; if you don't, the server sees a client that doesn't accept cookies.
See http://docs.python-requests.org/en/latest/user/quickstart/#cookies
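If you additionally want a Session that actively refuses to store cookies between requests, here is a sketch using the standard library's cookie-policy hook (this is stdlib machinery, not a requests-specific feature; the test URL is just an example):

import requests
from http import cookiejar

class BlockAllCookies(cookiejar.CookiePolicy):
    # Refuse to store or return any cookie.
    return_ok = set_ok = domain_return_ok = path_return_ok = \
        lambda self, *args, **kwargs: False
    netscape = True
    rfc2965 = hide_cookie2 = False

session = requests.Session()
session.cookies.set_policy(BlockAllCookies())

r = session.get('https://httpbin.org/cookies/set?k=v')  # any URL that sets a cookie
assert not session.cookies  # nothing was stored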
