How to perform HTTP/XML authentication with requests - Python

I am trying to authenticate to DocuShare with Python 3.4 using requests 2.7. I am relatively new to Python and to the requests module, but I've done a lot of reading and am not able to make any more progress. My code doesn't give any errors and I receive a JSESSIONID cookie back from my POST request, but I'm not getting the AmberUser authentication cookie. I don't know how to investigate further to find out what the problem is.
The form of the request comes from http://docushare.xerox.com/en-us/Help/prog/prog5.htm#Authentication
Request:
POST /dscgi/ds.py/Login HTTP/1.1
Host: docushare.xerox.com
Content-Type: text/xml
Content-Length: xxxx
<?xml version="1.0" ?>
<authorization>
<username>msmith</username>
<password>mysecretstring</password>
</authorization>
My Python / requests code looks like:
import requests
url = "https://mydocusharesite/dsweb/Login"
xml="""<?xml version='1.0' ?>
<authorization>
<username>myusername</username>
<password><![CDATA[mypa$$word]]></password>
<domain>DocuShare</domain>
</authorization>"""
headers = {'DocuShare-Version': '5.0', 'Content-Type': 'text/xml'}
s = requests.Session()
r = s.post(url, data=xml, headers=headers)
print('Status code:', r.status_code)
print('headers:\n', r.headers)
print('request headers:\n', r.request.headers)
c = s.cookies
print('Cookies:\n', c)
The output I get is
Status code: 200
headers:
{'set-cookie': 'JSESSIONID=21B7E5E0D83D1F1267371B9FD1B19BBC.tomcat1; Path=/docushare; Secure', 'transfer-encoding': 'chunked', 'connection': 'close', 'content-type': 'text/html;charset=UTF-8', 'cache-control': 'private', 'date': 'Sun, 07 Jun 2015 02:22:59 GMT', 'expires': '-1'}
request headers:
{'Connection': 'keep-alive', 'DocuShare-Version': '5.0', 'Accept': '*/*', 'Content-Type': 'text/xml', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '153', 'User-Agent': 'python-requests/2.7.0 CPython/3.4.3 Darwin/14.3.0'}
Cookies:
<RequestsCookieJar[<Cookie JSESSIONID=21B7E5E0D83D1F1267371B9FD1B19BBC.tomcat1 for mydocusharesite>]>

Your CDATA section should look like <![CDATA[mypa$$word]]>. Your code is currently sending ![CDATA[mypa$$word]] as the actual password.
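If special characters in the password keep getting mangled, one alternative sketch (not from the original answer) is to XML-escape the password instead of wrapping it in CDATA; xml.sax.saxutils.escape is in the standard library:
from xml.sax.saxutils import escape
password = 'mypa$$word'
# escape() replaces &, < and > so the value can sit in a plain element;
# '$' needs no escaping, but this also covers passwords that do.
xml = """<?xml version='1.0' ?>
<authorization>
<username>myusername</username>
<password>{}</password>
<domain>DocuShare</domain>
</authorization>""".format(escape(password))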

Related

Unable to get the value of an API key from a website

I'm trying to get the value of the API key available within the headers from this website. The value of the API key can be found using this link within the headers (once the page is reloaded).
In dev tools, I found headers like the following, in which the API key and its value are present:
Accept: application/json
Content-Type: application/json
Referer: https://www.pinnacle.com/en/
Sec-Fetch-Mode: cors
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-API-Key: CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R
X-Device-UUID: 3a10d97d-5dc63d32-9b562999-2a023260
However, when I print the headers (using the second link), I get the following items, but not that API key.
{'Date': 'Tue, 20 Aug 2019 03:53:47 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '119', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d43bcbb47c4b830f22e994d7311c5f37d1566273227; expires=Wed, 19-Aug-20 03:53:47 GMT; path=/; domain=.pinnacle.com; HttpOnly', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'HEAD, GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-API-Key, X-Device-UUID, X-Session, X-Language', 'Access-Control-Max-Age': '86400', 'Cache-Control': 'no-cache', 'CF-Cache-Status': 'MISS', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '50916c15eb6ee03b-DFW'}
I've tried with:
import requests
from bs4 import BeautifulSoup
link = 'https://guest.api.arcadia.pinnacle.com/0.1/sports/33/markets/live/straight'
res = requests.get(link)
print(res.headers)
How can I get the value of API key from that site?
Let's break down how requests works.
When you write:
res = requests.get(link)
you are sending the API server a request, and this is where you are supposed to provide the API key. The key isn't something requests receives back after a request; it's something requests needs up front in order to perform the request.
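A minimal sketch of that (the key and device UUID below are the ones captured in dev tools above and are assumed to still be valid):
import requests
link = 'https://guest.api.arcadia.pinnacle.com/0.1/sports/33/markets/live/straight'
# Send the captured key as a request header; the server checks it,
# it does not hand it back to you.
headers = {
    'X-API-Key': 'CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R',
    'X-Device-UUID': '3a10d97d-5dc63d32-9b562999-2a023260',
}
res = requests.get(link, headers=headers)
print(res.status_code)
print(res.json())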

How can I request (get) and read an xml file using python?

I tried requesting an RSS feed on Treasury Direct using Python. In the past I've used the urllib or requests libraries for this purpose and they've worked fine. This time, however, I keep getting a 406 status error, which I understand is the server's way of telling me it won't accept my request's header details. I've tried altering them, to no avail.
This is how I've tried
import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent)
print(response.text)
Environments: Python 2.7 and 3.4.
I also tried accessing via curl with the same exact error.
I believe this to be page specific, but can't figure out how to appropriately frame a request to read this page.
I found an API on the page through which I can read the same data in JSON, so this issue is now more of a curiosity to me than a true problem.
Any answers would be greatly appreciated!
Header Details
{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}
You need to add an Accept header to the request; a 406 means the server cannot produce a response matching the request's Accept headers:
import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)
print(response.text)
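Since the title also asks how to read the XML, here is a short follow-on sketch using the standard library's ElementTree (assuming the feed is ordinary RSS with channel/item/title elements):
import requests
import xml.etree.ElementTree as ET
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail loudly on a 406 or any other error
# Walk the RSS tree and print each announcement's title.
root = ET.fromstring(response.content)
for item in root.iter('item'):
    print(item.findtext('title'))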

Python Requests Not Returning Same Header as Browser Request/cURL

I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers as I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:
HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT
However, when calling requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector I am getting the following response:
>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}
It has pretty much everything, but it's missing the Location key that I need in order to download the .zip file with all of the data I want. The Content-Length value is also different, but I'm not sure if that's an issue.
I think that my issue has something to do with the fact that when you click "Download" on the page it actually sends two requests that I can see in the Chrome network console. The first request is a POST request that yields an HTTP response of 302 and then has the Location in the response header. The second request is a GET request to the url specified in the Location value of the response header.
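(Worth noting: requests follows 302s by default, so a single post() should normally end up at the final URL. A sketch of reproducing the browser's two-step flow explicitly, with url, data and headers standing in for the values captured from the network console:)
import requests
def fetch_zip(url, data, headers):
    # POST without following the redirect, then GET the Location the
    # server points at; this mirrors the two requests seen in Chrome.
    first = requests.post(url, data=data, headers=headers, allow_redirects=False)
    if first.status_code == 302:
        return requests.get(first.headers['Location'])
    return first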
Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW I used curl -X POST -d /*my data*/ and got back this in my terminal:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
Really appreciate any help!
I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:
{'Connection': 'keep-alive', 'Cache-Control': 'max-age=0', 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293', 'Origin': 'http://www.transtats.bts.gov', 'Upgrade-Insecure-Requests': 1, 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36', 'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC', 'Accept-Language': 'en-US,en;q=0.8', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'application/x-www-form-urlencoded'}
And then I just wrote:
res = requests.post(url, data=form_data, headers=headers)
where form_data was copied from the "Form Data" section of the Chrome console. Once I had that response, I used the zipfile and io modules to parse its content, stored in res, like this:
import zipfile, io
# Wrap the raw bytes in an in-memory file and extract the archive to disk.
zipfile.ZipFile(io.BytesIO(res.content)).extractall()
and then the file was in the directory where I ran the Python code.
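(One sanity check worth mentioning: if res.content is not actually a zip, e.g. an HTML error page slipped through, zipfile raises zipfile.BadZipFile (BadZipfile on Python 2), which quickly tells you the POST did not carry the right cookies or form data.)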
Thanks to the users who answered on this thread.

Issue with submitting an HTTP POST request

I am having an issue with submitting an HTTP POST request. The purpose of this program is to scrape the lyrics off a website and then use that string in a text summarizer. I am having an issue submitting the POST request on the summarizer's website. Currently, with the code below, it does not submit the request; it just returns the page. I think it may be due to the content type being different, but I am not sure.
My code:
def summarize(lyrics):
    url = 'http://www.freesummarizer.com'
    values = {'text': lyrics,
              'maxsentences': '1',
              'maxtopwords': '40',
              'email': 'your#email.com'}
    headers = {'User-Agent': 'Mozilla/5.0'}
    cookies = {'_jsuid': '777245265', '_ga': 'GA1.2.164138903.1423973625', '__smToken': 'elPdHJINsP5LvAYhia6OAA68', '__smListBuilderShown': 'true', '_first_pageview': '1', '_gat': '1', '_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%252F%22%2C%22x%22%3A324%2C%22y%22%3A1800%2C%22w%22%3A640%7D%5D%2C%22events%22%3A%5B%5D%7D', 'PHPSESSID': '28b0843d49700e134530fbe32ea62923', '__smSmartbarShown': 'true'}
    r = requests.post(url, data=values, headers=headers)
    print(r.text)
My Response:
{'transfer-encoding': 'chunked', 'set-cookie': 'PHPSESSID=1f10ec11e6f9040cbb5a81e16bfcdf7f; path=/', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'keep-alive': 'timeout=5, max=100', 'server': 'Apache', 'connection': 'Keep-Alive', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Fri, 27 Feb 2015 18:38:41 GMT', 'content-type': 'text/html'}
A successful response on this website:
Host: freesummarizer.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://freesummarizer.com/
Cookie: _jsuid=777245265; _ga=GA1.2.164138903.1423973625; __smToken=elPdHJINsP5LvAYhia6OAA68; __smListBuilderShown=true; _first_pageview=1; _gat=1; _eventqueue=%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%252F%22%2C%22x%22%3A324%2C%22y%22%3A1800%2C%22w%22%3A640%7D%5D%2C%22events%22%3A%5B%5D%7D; PHPSESSID=28b0843d49700e134530fbe32ea62923; __smSmartbarShown=true
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 6044
Everything seems to be working just fine with requests.
But I think the issue here is that you are using the wrong tool for the job.
The tool I believe you are looking for is Selenium.
Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but it is certainly not limited to that; boring web-based administration tasks can (and should!) be automated as well.
You should absolutely take a look at this tool.
Selenium docs
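A minimal sketch of that approach (the field names come from the form data in the question, but the driver choice and the submit mechanics are assumptions to adapt to the real page):
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.freesummarizer.com')
lyrics = 'some scraped lyrics...'
driver.find_element_by_name('text').send_keys(lyrics)
driver.find_element_by_name('maxsentences').clear()
driver.find_element_by_name('maxsentences').send_keys('1')
driver.find_element_by_name('maxtopwords').clear()
driver.find_element_by_name('maxtopwords').send_keys('40')
driver.find_element_by_name('text').submit()  # submits the enclosing form
print(driver.page_source)  # the summary page as the browser renders it
driver.quit()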

Python Requests Digest Authorization

Hi, I am working on a simple program to get a token-id from a router using its REST API. The problem I am facing is that I do not see the Authorization header when I use HTTPDigestAuth. When I use the Google app POSTMAN, I can see the header and it works. What am I missing in my code?
My code:
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth
user = 'pod1u1'
passwd = 'pass'
url = 'https://10.0.236.188/api/v1/auth/token-services'
auth = HTTPDigestAuth(user, passwd)
r = requests.post(url, auth=auth, verify=False)
print 'Request headers:', r.request.headers
print 'Status Code: ', r.status_code
print 'response Headers: ', r.headers
print '######################################'
auth = HTTPBasicAuth(user, passwd)
r = requests.post(url, auth=auth, verify=False)
print 'Request headers:', r.request.headers
print 'Status Code: ', r.status_code
print 'response Headers: ', r.headers
Shell commands w/ output:
My script --
$ python digest.py
Request headers: CaseInsensitiveDict({'Content-Length': '0', 'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.0 CPython/2.7.5 Darwin/13.0.0'})
Status Code: 401
response Headers: CaseInsensitiveDict({'date': 'Tue, 14 Jan 2014 00:28:27 GMT', 'content-length': '83', 'content-type': 'application/json', 'connection': 'keep-alive', 'server': 'nginx/1.4.2'})
######################################
Request headers: CaseInsensitiveDict({'Accept': '*/*', 'Content-Length': '0', 'Accept-Encoding': 'gzip, deflate, compress', 'Authorization': u'Basic cG9kMXUxOkMxc2NvTDF2Mw==', 'User-Agent': 'python-requests/2.2.0 CPython/2.7.5 Darwin/13.0.0'})
Status Code: 401
response Headers: CaseInsensitiveDict({'date': 'Tue, 14 Jan 2014 00:28:27 GMT', 'content-length': '448', 'content-type': 'text/html', 'connection': 'keep-alive', 'server': 'nginx/1.4.2'})
POSTMAN
POST /api/v1/auth/token-services HTTP/1.1
Host: 10.0.236.188
Authorization: Digest username="pod1u1", realm="pod1u1#ecatsrtpdmz.cisco.com", nonce="", uri="/api/v1/auth/token-services", response="08ac88b7f5e0533986e9fc974f132258", opaque=""
Cache-Control: no-cache
{
"kind": "object#auth-token",
"expiry-time": "Tue Jan 14 00:09:27 2014",
"token-id": "Vj7mYUMTrsuljaiXEPoNJNiXLzf8UeDsRnEgh3DvQcU=",
"link": "https://10.0.236.188/api/v1/auth/token-services/9552418862"
}
You are doing a POST, so you need to pass the request body via the data argument of requests.post (params only populates the query string).
You can use a sniffer to see exactly what POSTMAN sends to the URL and do the same.
Just for info, I did a requests.get with digest credentials (on another URL); it worked and I saw the auth headers.
Maybe you could first start with a GET to create a "session", then do your POST. Just a guess :)
[ADDED]
I would also try to use "raw" headers as a workaround :
[...]
headers = {
    "Host": "10.0.236.188",
    "Authorization": '''Digest username="pod1u1", realm="pod1u1#ecatsrtpdmz.cisco.com", nonce="", uri="/api/v1/auth/token-services", response="08ac88b7f5e0533986e9fc974f132258", opaque=""''',
    "Cache-Control": "no-cache"
}
r = requests.post(url, auth=auth, headers=headers, verify=False)
[/ADDED]
The problem turned out to be on the server side; Lukasa on GitHub helped me: "That doesn't look like a service that requires Digest Auth. If Digest Auth is required, the 401 should contain a header like this: WWW-Authenticate: Digest qop="auth". This one does not. Instead, you're being returned a JSON body that contains an error message.
Digest Auth should not send headers on the initial message, because the server needs to inform you how to generate the digest. I invite you to open up the section of code that generates the digest. We require the realm, nonce and qop from the server before we can correctly generate the header."
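A quick way to check that yourself is to probe the endpoint with no auth at all; a Digest-protected resource should answer 401 with a Digest challenge:
import requests
url = 'https://10.0.236.188/api/v1/auth/token-services'
r = requests.post(url, verify=False)
print(r.status_code)                      # expect 401
# If the next line prints None (or no Digest challenge), HTTPDigestAuth
# has nothing to respond to, which matches the explanation above.
print(r.headers.get('WWW-Authenticate'))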
