Check out a SharePoint file with Python

So, I already searched a lot in different forums, but I just can't make it work for me.
I want to automate a tool, so I'm trying to check out a SharePoint file in a Python script:
import requests
from requests.auth import HTTPBasicAuth
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36', 'X-RequestDigest': 'form digest value'}
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/GetFileByServerRelativeUrl('/sites/team/Shared Documents/project/doc.xlsb')/checkout()"
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers)
I'm getting the response "403 Access denied. You do not have permission to perform this action or access this resource." I can check out the file manually, so I clearly have the rights to do it. Is there a problem with the authentication, or are there other solutions?

The problem seems to be that you are not passing the correct form digest value in your headers. The form digest value is a security token that SharePoint requires for any POST request that modifies the state of the server. You can obtain it by making a POST request to the /_api/contextinfo endpoint and extracting the value from the response. For example:
import requests
from requests.auth import HTTPBasicAuth

# Get the form digest value (ask for JSON so the response can be parsed)
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']
# Use the form digest value in the headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36', 'X-RequestDigest': digest_value}
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/GetFileByServerRelativeUrl('/sites/team/Shared Documents/project/doc.xlsb')/checkout()"
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers)
# Check the status code
if response.status_code == 200:
    print("File checked out successfully")
else:
    print("Error: ", response.status_code, response.reason)
Explanation:
The form digest value is a way for SharePoint to prevent cross-site request forgery (CSRF) attacks, where a malicious site can send requests to SharePoint on behalf of a user without their consent. The form digest value is a random string that is generated by SharePoint and stored in a hidden input field in the page. When a user submits a form or makes a POST request, SharePoint validates that the form digest value matches the one stored in the server. If they don't match, the request is rejected.
When you are using requests to make POST requests to SharePoint, you need to obtain the form digest value from the /_api/contextinfo endpoint, which returns a JSON object with the form digest value and other information. You need to pass this value in the X-RequestDigest header of your subsequent requests, so that SharePoint can verify that you are authorized to perform the action.
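The two steps described above can be wrapped in a small helper. This is a sketch rather than a definitive implementation: the site URL and credentials are placeholders, and the accept header is included so SharePoint returns JSON instead of its default XML.

```python
import requests
from requests.auth import HTTPBasicAuth

def contextinfo_url(site_url):
    """Build the /_api/contextinfo endpoint for a site URL."""
    return site_url.rstrip('/') + '/_api/contextinfo'

def get_form_digest(site_url, username, password):
    """POST to /_api/contextinfo and return the FormDigestValue."""
    response = requests.post(
        contextinfo_url(site_url),
        auth=HTTPBasicAuth(username, password),
        # Ask for JSON explicitly; SharePoint answers with XML otherwise
        headers={'accept': 'application/json;odata=verbose'},
    )
    response.raise_for_status()
    return response.json()['d']['GetContextWebInformation']['FormDigestValue']
```

The returned value can then be passed as the X-RequestDigest header of every subsequent POST against the same site.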
Examples:
Here are some examples of how to use requests to make POST requests to SharePoint with the form digest value:
To create a new folder in a document library:
import requests
from requests.auth import HTTPBasicAuth
# Get the form digest value
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']
# Use the form digest value in the headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36', 'X-RequestDigest': digest_value, 'accept': 'application/json;odata=verbose', 'content-type': 'application/json;odata=verbose'}
# Specify the folder name and the parent folder path
folder_name = "New Folder"
parent_folder = "/sites/team/Shared Documents/project"
# Construct the payload
payload = {
    '__metadata': {'type': 'SP.Folder'},
    'ServerRelativeUrl': parent_folder + '/' + folder_name
}
# Construct the url
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/folders?@target='https://company.sharepoint.com/sites/team'"
# Make the POST request
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers, json=payload)
# Check the status code
if response.status_code == 201:
    print("Folder created successfully")
else:
    print("Error: ", response.status_code, response.reason)
To upload a file to a document library:
import requests
from requests.auth import HTTPBasicAuth
# Get the form digest value
digest_url = "https://company.sharepoint.com/sites/team/_api/contextinfo"
digest_response = requests.post(digest_url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers={'accept': 'application/json;odata=verbose'})
digest_value = digest_response.json()['d']['GetContextWebInformation']['FormDigestValue']
# Use the form digest value in the headers (no JSON content type here; the request body is the raw file)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36', 'X-RequestDigest': digest_value, 'accept': 'application/json;odata=verbose'}
# Specify the file name and the file content
file_name = "test.txt"
file_content = b"Hello world"
# Specify the folder path
folder_path = "/sites/team/Shared Documents/project"
# Construct the url
url = "https://company.sharepoint.com/sites/team/_api/SP.AppContextSite(@target)/web/GetFolderByServerRelativeUrl('" + folder_path + "')/Files/add(url='" + file_name + "',overwrite=true)?@target='https://company.sharepoint.com/sites/team'"
# Make the POST request
response = requests.post(url, auth=HTTPBasicAuth(USERNAME, PASSWORD), headers=headers, data=file_content)
# Check the status code
if response.status_code == 200:
    print("File uploaded successfully")
else:
    print("Error: ", response.status_code, response.reason)

It looks like you're trying to connect to a corporate account.
This probably does not answer your question, but I might suggest another way, using the Microsoft Graph API.
The advantage of this approach is that every user can use the interface with their individual rights. To allow authentication, you first need to register your application in the Azure portal (https://portal.azure.com/#blade/Microsoft_AAD_RegisteredApps/ApplicationsListBlade).
To connect via Python you can use the O365 module (https://pypi.org/project/O365/), which lets you communicate with SharePoint through this interface. There you will also find further explanations on connecting to SharePoint.
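For illustration, a minimal sketch of that approach with the O365 module. The client ID and secret come from your Azure app registration, the site host and path are placeholders, and the exact method names should be double-checked against the O365 project's documentation.

```python
# A sketch only: the arguments below are placeholders, and the consent flow
# depends on how your Azure application registration is configured.
SCOPES = ['basic', 'sharepoint']

def get_team_site(client_id, client_secret, site_host, site_path):
    # Imported inside the function so the sketch has no hard dependency
    from O365 import Account
    account = Account((client_id, client_secret))
    if not account.is_authenticated:
        # Opens the interactive consent flow on first use
        account.authenticate(scopes=SCOPES)
    # e.g. get_team_site(cid, secret, 'company.sharepoint.com', '/sites/team')
    return account.sharepoint().get_site(site_host, site_path)
```

Once you have the site object, the module exposes the document libraries and their items with the permissions of the signed-in user.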

Related

Log in to a site with requests-html

I am trying to log in to a site through Python as if I were going to scrape it, but after trying for several hours I could not figure out why, after I receive an authorization token through a successful POST request, I still get an HTML response as if I am not logged in.
The website is https://www.packtpub.com/ and below is my code.
import requests_html
import json
s_async = requests_html.AsyncHTMLSession()
#define vars for post request
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36 Edg/101.0.1210.32"
data = {"username":username,"password":password}
post_url = "https://services.packtpub.com/auth-v1/users/tokens"
headers = {"User-Agent":user_agent}
#post request
r_post = await s_async.post(post_url,json=data,headers=headers)
if r_post.status_code != 200:
    raise Exception("Check response code from post request")
#define vars for get request's header
#auth_token has the 'Bearer ' prefix added as this is how the authorization
#token is sent in the get header by my browser inspected through DevTools
auth_token = 'Bearer '+json.loads(r_post.text)['data']['access']
accept = "application/json, text/plain, */*"
accept_encoding = "gzip, deflate, br"
accept_language = "en-US,en;q=0.9,bg;q=0.8"
origin = "https://account.packtpub.com"
referer = "https://account.packtpub.com/"
#sec_ch_ua= r" Not A;Brand";v="99", "Chromium";v="101", "Microsoft Edge";v="101"
sec_ch_ua_mobile = "?0"
sec_ch_ua_platform = "Windows"
##sec-fetch-dest: empty
sec_fetch_mode = "cors"
sec_fetch_site = "same-site"
#the detailed headers are an attempt to copy get header from the successful get request by the browser
headers = {"User-Agent": user_agent,
           "authorization": auth_token,
           "accept": accept,
           "accept-encoding": accept_encoding,
           "accept-language": accept_language,
           "origin": origin,
           "referer": referer,
           "sec-ch-ua-mobile": sec_ch_ua_mobile,
           "sec-ch-ua-platform": sec_ch_ua_platform,
           "sec-fetch-mode": sec_fetch_mode,
           "sec-fetch-site": sec_fetch_site}
#get request
r_get = await s_async.get("https://www.packtpub.com",headers=headers)
if r_get.status_code != 200:
    raise Exception("Check response code from get request")
await r_get.html.arender()
#looking for signs that the login is successful, more specifically I am looking for
#the absence of the "User Sign In" button
with open(r"C:\packtpub_inspect.html", "wb") as file:
    file.write(r_get.html.raw_html)

Delete default header fields in Python Requests [duplicate]

I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure if it is okay to send this as part of the header, as in the code below:
debug = {'verbose': sys.stderr}
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent, config=debug)
The debug information isn't showing the headers being sent during the request.
Is it acceptable to send this information in the header? If not, how can I send it?
The user-agent should be specified as a field in the header.
Here is a list of HTTP header fields, and you'd probably be interested in request-specific fields, which includes User-Agent.
If you're using requests v2.13 and newer
The simplest way to do what you want is to create a dictionary and specify your headers directly, like so:
import requests
url = 'SOME URL'
headers = {
    'User-Agent': 'My User Agent 1.0',
    'From': 'youremail@domain.example'  # This is another valid field
}
response = requests.get(url, headers=headers)
If you're using requests v2.12.x and older
Older versions of requests clobbered default headers, so you'd want to do the following to preserve default headers and then add your own to them.
import requests
url = 'SOME URL'
# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()
# Update the headers with your custom ones
# You don't have to worry about case-sensitivity with
# the dictionary keys, because default_headers uses a custom
# CaseInsensitiveDict implementation within requests' source code.
headers.update(
    {
        'User-Agent': 'My User Agent 1.0',
    }
)
response = requests.get(url, headers=headers)
It's more convenient to use a session, this way you don't have to remember to set headers each time:
session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})
session.get('https://httpbin.org/headers')
By default, session also manages cookies for you. In case you want to disable that, see this question.
This will send the request like a browser:
import requests
url = 'https://Your-url'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
response = requests.get(url.strip(), headers=headers, timeout=10)

Python Requests - SAML Login Redirect

I'm trying to log in to a website from this URL: "https://pollev.com/login". Since I'm using a school email, the portal redirects to the school's login portal and uses that portal to authenticate the login. It shows up when you type in a uw.edu email (for example, myname@uw.edu). After logging in, UW sends a POST request callback to https://www.polleverywhere.com/auth/washington/callback with a SAMLResponse header. I think I need to simulate the GET request from Poll Everywhere's login page and then send the login headers to the UW login page, but what I'm doing right now isn't working.
Here's my code:
import requests

with requests.session() as s:
    header_data = {
        'user - agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                        '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'referer': 'https://pollev.com/login'
    }
    login_data = {
        'j_username': 'username',
        'j_password': 'password',
        '_eventId_proceed': 'Sign in'
    }
    r = s.get('https://idp.u.washington.edu/idp/profile/SAML2/Redirect/SSO?execution=e2s1',
              headers=header_data, data=login_data)
    print(r.text)
Right now, r.text shows a NoSuchFlowExecutionException html page. What am I missing? Logging into the website normally requires a login, password, Referrer, and X-CSRF token which I was able to do, but I don't know how to navigate a redirect for authentication.
Old question, but I had nearly identical needs and carried on until I solved it. In my case, which may still be the case for the OP, I have the required credentials. I am certain this could be made more efficient / Pythonic and would greatly appreciate those tips / corrections.
import re
import requests
# start HTTP request session
s = requests.Session()
# Prepare for first request - This is the ultimate target URL
url1 = '/URL/needing/shibbolethSAML/authentication'
header_data = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
# Make first request
r1 = s.get(url1, headers = header_data)
# Prepare for second request - extract URL action for next POST from response, append header, and add login credentials
ss1 = re.search('action="', r1.text)
ss2 = re.search('" autocomplete', r1.text)
url2 = 'https://idp.u.washington.edu' + r1.text[ss1.span(0)[1]:ss2.span(0)[0]]
header_data.update({'Accept-Encoding': 'gzip, deflate, br', 'Content-Type': 'application/x-www-form-urlencoded'})
cred = {'j_username': 'username', 'j_password':'password', '_eventId_proceed' : 'Sign in'}
# Make second request
r2 = s.post(url2, headers=header_data, data=cred)
# Prepare for third request - format and extract URL, RelayState, and SAMLResponse
ss3 = re.search('<form action="',r2.text) # expect only one instance of this pattern in string
ss4 = re.search('" method="post">',r2.text) # expect only one instance of this pattern in string
url3 = r2.text[ss3.span(0)[1]:ss4.span(0)[0]].replace('&#x3a;', ':').replace('&#x2f;', '/')  # decode HTML-escaped characters
ss4 = re.search('name="RelayState" value="', r2.text) # expect only one instance of this pattern in string
ss5 = re.search('"/>', r2.text)
relaystate_value = r2.text[ss4.span(0)[1]:ss5.span(0)[0]].replace('&#x3a;', ':')
ss6 = re.search('name="SAMLResponse" value="', r2.text)
ss7 = [m.span(0) for m in re.finditer('"/>', r2.text)] # expect multiple matches with the second match being desired
saml_value = r2.text[ss6.span(0)[1]:ss7[1][0]]
data = {'RelayState': relaystate_value, 'SAMLResponse': [saml_value, 'Continue']}
header_data.update({'Host': 'training.ehs.washington.edu', 'Referer': 'https://idp.u.washington.edu/', 'Connection': 'keep-alive'})
# Make third request
r3 = s.post(url3, headers=header_data, data = data)
# You should now be at the intended URL
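The regex slicing above works but is brittle. As an alternative sketch, a small helper can pull the form action and hidden fields (RelayState, SAMLResponse) out of the IdP's auto-submit page in one place, assuming a standard SAML POST-binding form:

```python
import html
import re

def parse_saml_form(page_text):
    """Extract the form action URL and hidden input values from a SAML POST page.

    Assumes the usual IdP markup where the name attribute precedes value
    within each input tag.
    """
    action = re.search(r'<form[^>]*action="([^"]+)"', page_text).group(1)
    fields = dict(re.findall(
        r'<input[^>]*name="([^"]+)"[^>]*value="([^"]*)"', page_text))
    # Attribute values are HTML-escaped in the page source, so unescape them
    return html.unescape(action), {k: html.unescape(v) for k, v in fields.items()}
```

You could then continue the flow with s.post(action, data=fields) instead of hand-slicing each value out of r2.text.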
You're not going to be successful faking out SAML2 SSO. The identity provider (IdP) at UW is looking to support an authentication request from the service provider (SP) polleverywhere.com. Part of that is verifying that the request actually originated from polleverywhere. This could be as simple as requiring an SSL connection from polleverywhere; it could be as complicated as requiring an encrypted and signed authentication request. Since you don't have those credentials, the resulting response isn't going to be readable. SPs are registered with IdPs.
Now, there may be a different way to sign in to polleverywhere -- a different URL which will not trigger an SSO request -- but that might be network-restricted or require other difficult authentication.

Formatting Python Requests according to the following headers

I am trying to use Python Requests library to POST a zipped file as multipart/form-data. I have currently used the Chrome Extension Advanced REST Client that is able to upload the file without a problem. However, I face difficulties while trying to do the same from the console using Python Requests.
The general information for the request is:
Remote Address:IP/Address/to/Host:Port
Request URL:/path/to/host/with/parameters/
Request Method:POST
The request headers from Advanced REST Client are:
Accept:*/*
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Authorization:Basic/Authentication
Connection:keep-alive
Content-Length:1893
Content-Type:multipart/form-data; boundary=----WebKitFormBoundaryu3rhOVbU2LpT89zi
Host:/host/name
Origin:chrome-extension://hgmloofddffdnphfgcellkdfbfbjeloo
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
The payload is as follows:
------WebKitFormBoundaryu3rhOVbU2LpT89zi
Content-Disposition: form-data; name="fileUpload"; filename="filename.zip"
Content-Type: application/x-zip-compressed
------WebKitFormBoundaryu3rhOVbU2LpT89zi--
I formatted this query in Python as follows:
import requests
authentication = requests.auth.HTTPBasicAuth(username=user, password=password)
parameters = {} # with the appropriate parameters
url = '' # base URL
files = {'file': ('fileUpload', 'application/x-zip-compressed', {})}
response = requests.post(url, params = parameters, auth=authentication, files=files)
While the Chrome App, Advanced REST Client gives me a 200 OK response, I get a 400 response (bad query). What am I doing wrong?
Thanks!
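For reference, requests can build the multipart body itself when files maps the form field name to a (filename, content, content_type) tuple. Here is a minimal offline sketch using a prepared request; the URL and zip bytes are placeholders:

```python
import requests

# requests encodes the multipart/form-data body from the files mapping;
# the field name must match what the server expects ('fileUpload' here,
# as shown in the Advanced REST Client payload).
req = requests.Request(
    'POST',
    'https://example.com/path/to/host',  # placeholder URL
    files={'fileUpload': ('filename.zip',
                          b'PK\x05\x06' + b'\x00' * 18,  # empty zip archive
                          'application/x-zip-compressed')},
)
prepared = req.prepare()
# prepared.body now contains the Content-Disposition part from the question,
# and prepared.headers carries the boundary-bearing Content-Type.
```

The same files mapping can be passed straight to requests.post together with params and auth; the explicit prepare() step just makes the generated body inspectable without sending anything.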

Error: HTTPS site requires a 'Referer header' to be sent by your Web browser, but none was sent

You are seeing this message because this HTTPS site requires a 'Referer
header' to be sent by your Web browser, but none was sent. This header is
required for security reasons, to ensure that your browser is not being
hijacked by third parties.
I was trying to log in to a website using requests but received the error above. How do I create a 'Referer' header?
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req=requests.post(url, data=payload)
print(req.text)
You can pass in headers you want to send on your request as a keyword argument to request.post:
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req = requests.post(url, data=payload, headers={'Referer': 'yourReferer'})
print(req.text)
I guess you are using this library: http://docs.python-requests.org/en/latest/user/quickstart/
If this is the case you have to add a custom header Referer (see section Custom headers). The code would be something like this:
url = '...'
payload = ...
headers = {'Referer': 'https://...'}
r = requests.post(url, data=payload, headers=headers)
For more information on the referer see this wikipedia article: https://en.wikipedia.org/wiki/Referer
I was getting the same error in Chrome. What I did was just disable all my Chrome extensions, including ad blockers. I then reloaded the page from which I wanted to scrape the data and logged in once again. Then in the code, as @Stephan Kulla mentioned, you need to add headers; inside the headers I added user agent, referer, referrer-policy, and origin, all of which you can find in the Network tab of the browser's developer tools.
Add all those in the header and try to log in again using POST; it should work (it worked for me).
ori = 'https:......'
login_route = 'login/....'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36' , 'origin':'https://www.screener.in', 'referer': '/login/','referrer-policy':'same-origin'}
s=requests.session()
csrf = s.get(ori+login_route).cookies['csrftoken']
payload = {
    'username': 'xxxxxx',
    'password': 'yyyyyyy',
    'csrfmiddlewaretoken': csrf
}
login_req = s.post(ori + login_route, headers=header, data=payload)
