I'm trying to get the value of the API key available within the headers from this website. The value of the API key can be found within the headers using this link (once the page is reloaded).
In dev tools, I found the following request headers, where the API key and its value are present:
Accept: application/json
Content-Type: application/json
Referer: https://www.pinnacle.com/en/
Sec-Fetch-Mode: cors
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-API-Key: CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R
X-Device-UUID: 3a10d97d-5dc63d32-9b562999-2a023260
However, when I print the headers (using the second link), I get the following items, but the API key is missing.
{'Date': 'Tue, 20 Aug 2019 03:53:47 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '119', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d43bcbb47c4b830f22e994d7311c5f37d1566273227; expires=Wed, 19-Aug-20 03:53:47 GMT; path=/; domain=.pinnacle.com; HttpOnly', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'HEAD, GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-API-Key, X-Device-UUID, X-Session, X-Language', 'Access-Control-Max-Age': '86400', 'Cache-Control': 'no-cache', 'CF-Cache-Status': 'MISS', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '50916c15eb6ee03b-DFW'}
I've tried with:
import requests
from bs4 import BeautifulSoup
link = 'https://guest.api.arcadia.pinnacle.com/0.1/sports/33/markets/live/straight'
res = requests.get(link)
print(res.headers)
How can I get the value of API key from that site?
Let's break down how requests works.
When you say:
res = requests.get(link)
It means you're sending the API server a request, and this is where you're supposed to provide the API key. The key isn't something requests receives back after a request; it's something requests needs in order to perform the request in the first place.
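For illustration, here is a minimal sketch that replays the key and device UUID captured above from dev tools (whether the server accepts a replayed key, or rotates it per session, is a separate question):
import requests

link = 'https://guest.api.arcadia.pinnacle.com/0.1/sports/33/markets/live/straight'
headers = {
    'X-API-Key': 'CmX2KcMrXuFmNg6YFbmTxE0y9CIrOi0R',        # value copied from dev tools
    'X-Device-UUID': '3a10d97d-5dc63d32-9b562999-2a023260',  # value copied from dev tools
    'Referer': 'https://www.pinnacle.com/en/',
}
res = requests.get(link, headers=headers)
print(res.status_code)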
I'm still very new to coding (I've been coding for a week), so I am struggling with a very basic task.
I am trying to log into a website using Python; however, I am having a hard time with the Set-Cookie header.
See my current code below:
import requests

targetURL = "http://hostip/v2_Website/aspx/login.aspx"
proxies = {}  # placeholder: the original post references proxies without showing its definition
headers = {
    "Host": "*host IP*",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"}
response = requests.get(url=targetURL,
                        proxies=proxies,
                        headers=headers)
response_headers = response.headers
When I print the response.headers I get the following:
{'Cache-Control': 'no-cache, no-store', 'Pragma': 'no-cache,no-cache', 'Content-Length': '15671', 'Content-Type': 'text/html; charset=utf-8', 'Expires': '-1', 'Server': 'Microsoft-IIS/7.5', 'X-AspNet-Version': '2.0.50727', 'Set-Cookie': 'ASP.NET_SessionId=vq5q4lzlrqiiebbmxw341yic; path=/; HttpOnly, CookieLoginAttempts=5; expires=Tue, 14-Aug-2018 17:14:09 GMT; path=/', 'X-Powered-By': 'ASP.NET', 'Date': 'Tue, 14 Aug 2018 07:14:10 GMT', 'Connection': 'close'}
Obviously, when I use these headers in my HTTP POST it fails, because the POST carries a Set-Cookie header instead of a Cookie header.
My objectives are as follows:
Update/change the Set-Cookie key to Cookie
Then I would like to remove values that are not needed in the Cookie key
Add other keys and values
Ultimately I would like to change the headers to the following so I can use it for my POST in order for me to pass login credentials:
POST /Test server/aspx/Login.aspx?function=Welcome HTTP/1.1
Host: *Host IP*
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://*HostIP*/v2_Website/aspx/main.aspx?function=Welcome
Cookie: ASP.NET_SessionId=3vy0fy55xsmffhbotikrwh55; CookieLoginAttempts=5; Admin=false
Connection: close
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Content-Length: 220
Is my objective above even possible? If so, how does one achieve it? I don't understand the process of modifying a dictionary I can't see.
Again, please note that I am still very green in the world of coding and trying to "think like a coder", so keeping responses a little less technical would be highly appreciated, just so I can understand your response and advice. Any help would be great!
I was able to find the answer with quite a bit of research.
Instead of trying to edit the headers manually, I did the following:
import requests

session_requests = requests.session()
login_url = "http://*host ip*/v2_Website/aspx/Login.aspx"
result = session_requests.get(login_url)
result = session_requests.post(login_url,
                               headers=dict(referer=login_url))
This pulls through the needed cookie and adds everything as needed. My headers come back as follows:
POST /_v2_Website/aspx/Login.aspx HTTP/1.1
Host: *host IP*
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
referer: http://*hostIP*/v2_Website/aspx/Login.aspx
Cookie: ASP.NET_SessionId=3crqoo45hnn21anuqulmmr55; CookieLoginAttempts=5
Content-Length: 79
Content-Type: application/x-www-form-urlencoded
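For completeness, a hedged sketch of the final step, posting the actual credentials through the same session (the field names 'username' and 'password' here are hypothetical; check the login form's real input names in dev tools):
payload = {'username': 'myuser', 'password': 'mypassword'}  # hypothetical field names
result = session_requests.post(login_url,
                               data=payload,
                               headers=dict(referer=login_url))
Because the session object persists cookies across requests, the ASP.NET_SessionId cookie from the first GET is sent automatically; there is no need to touch Set-Cookie at all.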
I am trying to make a GET request to a webpage, but I keep getting a 404 error using Python 2.7 with the requests package. However, using curl I get a successful response, and it works in the browser.
Python
r = requests.get('https://www.ynet.co.il/articles/07340L-446694800.html')
r.status_code
404
r.headers
{'backend-cache-control': '', 'Content-Length': '20661', 'WAI': '02',
'X-me': '08', 'vg_id': '1', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding',
'Last-Modified': 'Sun, 20 May 2018 01:20:04 GMT', 'Connection': 'keep-alive',
'V-TTL': '47413', 'Date': 'Sun, 20 May 2018 14:55:21 GMT', 'VX-Cache': 'HIT',
'Content-Type': 'text/html; charset=UTF-8', 'Accept-Ranges': 'bytes'}
r.reason
'Not Found'
curl
curl https://www.ynet.co.il/articles/07340L-446694800.html
The code is correct; it works for other sites (see https://repl.it/repls/MemorableUpbeatExams).
This site loads for me in the browser, so I confirm your issue.
It might be that they block Python requests, because they don't want their site scraped and analysed by bots, but they forgot to block curl.
What you are doing is probably violating www.ynet.co.il terms of use, and you shouldn't do that.
A 404 is displayed when:
The URL is incorrect and the response is actually accurate.
There are trailing spaces in the URL.
The website doesn't like HTTP(S) requests coming from Python code. In that case, set a browser-like User-Agent header, or try adding "www." to the URL.
resp = requests.get(r'http://www.xx.xx.xx.xx/server/rest/line/125')
or
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
result = requests.get('https://www.transfermarkt.co.uk', headers=headers)
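By way of illustration, the same User-Agent workaround applied to the URL from the question (a sketch only; the site may still block other request fingerprints):
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get('https://www.ynet.co.il/articles/07340L-446694800.html', headers=headers)
print(r.status_code)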
I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers as I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:
HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT
However, when calling requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector I am getting the following response:
>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}
It has pretty much everything except the Location key that I need in order to download the .zip file with all of the data I want. The Content-Length value is also different, but I'm not sure whether that's an issue.
I think that my issue has something to do with the fact that when you click "Download" on the page it actually sends two requests that I can see in the Chrome network console. The first request is a POST request that yields an HTTP response of 302 and then has the Location in the response header. The second request is a GET request to the url specified in the Location value of the response header.
Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW I used curl -X POST -d /*my data*/ and got back this in my terminal:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
Really appreciate any help!
I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:
{'Connection': 'keep-alive', 'Cache-Control': 'max-age=0', 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293', 'Origin': 'http://www.transtats.bts.gov', 'Upgrade-Insecure-Requests': '1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36', 'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC', 'Accept-Language': 'en-US,en;q=0.8', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'application/x-www-form-urlencoded'}
And then I just wrote:
res = requests.post(url, data=form_data, headers=headers)
where form_data was copied from the "Form Data" section of the Chrome console. Once I had that response, I used the zipfile and io modules to parse the content stored in res, like this:
import zipfile, io

z = zipfile.ZipFile(io.BytesIO(res.content))
z.extractall()  # writes the archive's contents to the current directory
and then the file was in the directory where I ran the Python code.
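As an aside, if you do want to see the 302 and its Location header yourself rather than let requests follow it, a sketch using the library's allow_redirects flag:
res = requests.post(url, data=form_data, headers=headers, allow_redirects=False)
if res.status_code == 302:
    zip_url = res.headers['Location']  # e.g. the tsdata.bts.gov link shown above
    zip_res = requests.get(zip_url)
Note that requests follows redirects by default, so when a 302 does occur, res.headers will be those of the final response.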
Thanks to the users who answered on this thread.
I am trying to authenticate to Docushare with Python 3.4 using requests 2.7. I am relatively new to Python and to the requests module but I've done a lot of reading and am not able to make any more progress. My code doesn't give any errors and I receive a JSESSIONID cookie back from my post request, but I'm not getting the AmberUser authentication cookie. I don't know how to investigate this further to find out what the problem is.
The form of the request comes from http://docushare.xerox.com/en-us/Help/prog/prog5.htm#Authentication
Request:
POST /dscgi/ds.py/Login HTTP/1.1
Host: docushare.xerox.com
Content-Type: text/xml
Content-Length: xxxx
<?xml version="1.0" ?>
<authorization>
<username>msmith</username>
<password>mysecretstring</password>
</authorization>
My Python / requests code looks like:
import requests
url = "https://mydocusharesite/dsweb/Login"
xml="""<?xml version='1.0' ?>
<authorization>
<username>myusername</username>
<password><![CDATA[mypa$$word]]></password>
<domain>DocuShare</domain>
</authorization>"""
headers = {"DocuShare-Version":"5.0", 'Content-Type':'text/xml'}
s = requests.Session()
r = s.post(url, data=xml, headers=headers)
print('Status code:', r.status_code)
print('headers:\n', r.headers)
print('request headers:\n',r.request.headers)
c = s.cookies
print('Cookies:\n', c)
The output I get is
Status code: 200
headers:
{'set-cookie': 'JSESSIONID=21B7E5E0D83D1F1267371B9FD1B19BBC.tomcat1; Path=/docushare; Secure', 'transfer-encoding': 'chunked', 'connection': 'close', 'content-type': 'text/html;charset=UTF-8', 'cache-control': 'private', 'date': 'Sun, 07 Jun 2015 02:22:59 GMT', 'expires': '-1'}
request headers:
{'Connection': 'keep-alive', 'DocuShare-Version': '5.0', 'Accept': '*/*', 'Content-Type': 'text/xml', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '153', 'User-Agent': 'python-requests/2.7.0 CPython/3.4.3 Darwin/14.3.0'}
Cookies:
<RequestsCookieJar[<Cookie JSESSIONID=21B7E5E0D83D1F1267371B9FD1B19BBC.tomcat1 for mydocusharesite>]>
Your CDATA section should look like <![CDATA[mypa$$word]]>. Your code is currently sending ![CDATA[mypa$$word]] as the actual password.
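If CDATA feels fragile, one alternative sketch is to XML-escape the password instead (xml.sax.saxutils.escape is part of the standard library):
from xml.sax.saxutils import escape

password = escape('mypa$$word')  # escapes &, < and > so the value embeds safely in XML
xml = """<?xml version='1.0' ?>
<authorization>
<username>myusername</username>
<password>{}</password>
<domain>DocuShare</domain>
</authorization>""".format(password)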
I am having an issue with submitting an HTTP POST request. The purpose of this program is to scrape the lyrics off a website and then feed that string to a text summarizer. I am having an issue submitting the POST request on the summarizer's website. With the code below, it does not submit the request; it just returns the page. I think it may be due to the Content-Type being different, but I am not sure.
My code:
import requests

def summarize(lyrics):
    url = 'http://www.freesummarizer.com'
    values = {'text': lyrics,
              'maxsentences': '1',
              'maxtopwords': '40',
              'email': 'your#email.com'}
    headers = {'User-Agent': 'Mozilla/5.0'}
    cookies = {'_jsuid': '777245265', '_ga': 'GA1.2.164138903.1423973625', '__smToken': 'elPdHJINsP5LvAYhia6OAA68', '__smListBuilderShown': 'true', '_first_pageview': '1', '_gat': '1', '_eventqueue': '%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%252F%22%2C%22x%22%3A324%2C%22y%22%3A1800%2C%22w%22%3A640%7D%5D%2C%22events%22%3A%5B%5D%7D', 'PHPSESSID': '28b0843d49700e134530fbe32ea62923', '__smSmartbarShown': 'true'}
    r = requests.post(url, data=values, headers=headers)  # note: cookies is defined but never passed
    print(r.text)
My Response:
{'transfer-encoding': 'chunked', 'set-cookie': 'PHPSESSID=1f10ec11e6f9040cbb5a81e16bfcdf7f; path=/', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'keep-alive': 'timeout=5, max=100', 'server': 'Apache', 'connection': 'Keep-Alive', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Fri, 27 Feb 2015 18:38:41 GMT', 'content-type': 'text/html'}
The request headers of a successful submission on this website look like this:
Host: freesummarizer.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://freesummarizer.com/
Cookie: _jsuid=777245265; _ga=GA1.2.164138903.1423973625; __smToken=elPdHJINsP5LvAYhia6OAA68; __smListBuilderShown=true; _first_pageview=1; _gat=1; _eventqueue=%7B%22heatmap%22%3A%5B%7B%22type%22%3A%22heatmap%22%2C%22href%22%3A%22%252F%22%2C%22x%22%3A324%2C%22y%22%3A1800%2C%22w%22%3A640%7D%5D%2C%22events%22%3A%5B%5D%7D; PHPSESSID=28b0843d49700e134530fbe32ea62923; __smSmartbarShown=true
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 6044
Everything seems to be working just fine with requests.
But I think the issue here is that you are using the wrong tool for the job.
The tool I believe you are looking for is Selenium.
Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
You should absolutely take a look at this tool.
Selenium docs
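For a sense of what that looks like, here is a hedged sketch using Selenium's Python bindings (this assumes the pre-4.x find_element_by_name API and a Firefox driver on PATH; the 'text' field name is taken from the form data above, and the submit behaviour is an assumption about the page):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://freesummarizer.com')
box = driver.find_element_by_name('text')  # form field name from the POST data above
box.send_keys('lyrics to summarize')
box.submit()                               # submits the form that contains the field
print(driver.page_source)
driver.quit()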