I am using https://developer.microsoft.com/en-us/graph/graph-explorer to make requests
I am trying to convert them to Python to use for general automation.
I always copy from browser > Postman > code, so I have all the cookies/tokens/etc. I need, and my Python request works until something expires. In this case, that something is a bearer token.
I can't figure out how to get a new, valid bearer token other than redoing the above process or copying just the token and pasting it into my code.
While trying to find an auth request that would spit one out, I came across a Postman collection documented here:
https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow
When I replace {{tenant}} with my org's tenant_id, I get a 200 response with a bearer token, but when I insert this bearer token into my Graph API request code I get the following error:
{"error":{"code":"BadRequest","message":"/me request is only valid with delegated authentication flow.","innerError":{"date":"2022-10-23T14:31:22","request-id":"...","client-request-id":"..."}}}
Here is a screenshot of the Postman Auth
Here is my Graph API call, which only works with bearer tokens copied from graph-explorer:
import requests

def recreate_graph_request1(bearer=None):
    '''
    I went to https://developer.microsoft.com/en-us/graph/graph-explorer
    and picked a request: Outlook > GET emails from a user.
    At first the response was for some generic user, but I logged in with my account and it actually worked.
    Then I used my old copy-curl-as-bash trick to turn it into Python.
    :return: the requests Response
    '''
    # '@' in the filter address must stay URL-encoded as %40 (a raw '#' would start a URL fragment)
    url = "https://graph.microsoft.com/v1.0/me/messages?$filter=(from/emailAddress/address)%20eq%20%27my.boss%40company.com%27"
    headers = {
        'Accept': '*/*',
        'Accept-Language': 'en-US,en;q=0.9',
        'Authorization': bearer,
        'Connection': 'keep-alive',
        'Origin': 'https://developer.microsoft.com',
        'Referer': 'https://developer.microsoft.com/',
        'SdkVersion': 'GraphExplorer/4.0, graph-js/3.0.2 (featureUsage=6)',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-site',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
        'client-request-id': 'n0t_th3_s4m3_4s_1n_P05tm4n',
        'sec-ch-ua': '"Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"'
    }
    response = requests.get(url, headers=headers)
    return response
token_from_ms_auth = 'eyCOPIED_FROM_POSTMAN....'
bearer_from_ms_auth = 'Bearer '+token_from_ms_auth
print(recreate_graph_request1(bearer_from_ms_auth).text)
TBH, I was not overly optimistic that any bearer token would work, even one somehow associated with my tenant, but I hoped it would, and the resulting disappointment has driven me to ask the universe for help. I do not understand these meandering flows, and looking at others' answers only confuses me more. I am hoping someone can help me figure out this scenario.
Access tokens are short-lived. Refresh them after they expire to continue accessing resources. Also note what the error is telling you: the token you got in Postman is an app-only token (client credentials grant), and /me only makes sense for a signed-in user, so you need a token obtained through a delegated flow such as the authorization code flow. That flow also returns a refresh token, which lets you renew the access token without repeating the sign-in.
Please refer to this document: https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow#refresh-the-access-token
Hope this helps.
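For reference, a minimal sketch of the documented refresh call. The tenant/client placeholders and the Mail.Read scope are assumptions; replace them with your app registration's values, and note that a confidential client must also send client_secret:

import requests

tenant_id = "your-tenant-id"     # assumption: the tenant you used in Postman
client_id = "your-client-id"     # assumption: your app registration's client ID

token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
payload = {
    "client_id": client_id,
    "grant_type": "refresh_token",
    "refresh_token": "your-refresh-token",  # returned by the original authorization-code exchange
    "scope": "https://graph.microsoft.com/Mail.Read offline_access",
}
resp = requests.post(token_url, data=payload)
tokens = resp.json()
bearer = "Bearer " + tokens["access_token"]
# Store tokens["refresh_token"] as well; a new one may be issued on each refresh.

You only get a refresh token in the first place if the original authorization request included the offline_access scope.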
Related
I am trying to get the response body of this request "ListByMovieAndDate" from this specific website:
https://hkmovie6.com/movie/d88a803b-4a76-488f-b587-6ccbd3f43d86/SHOWTIME
Screenshot below is the request in Chrome Dev Tool.
I have tried several methods to mimic the request, including:
copying the request as cURL (bash) and using a tool to translate it into a Python request:
import requests
headers = {'authority': 'hkmovie6.com',
'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"',
'authorization': 'eyJhbGciOiJIUzUxMiIsImtpZCI6ImFjY2VzcyIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJtb3ZpZTYiLCJhdWQiOiJyb2xlLmJhc2ljIiwiZXhwIjoxNjI4MDg0NTUxLCJpYXQiOjE2MjgwODI3NTEsImp0aSI6IjQxZjJmZDBjLTk3YzgtNDFiYi04NDRiLTU5YWM5MTY0ZmYyNSJ9.jz_G80XDafzSHyzxog1IAY_xikAdQEEFizJXkiiHkNhwAY-MWF1E11Nel7WrsDlE184tcFtSjUKbHdx7281dFA',
'x-grpc-web': '1',
'language': 'zhHK',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
'content-type': 'application/grpc-web+proto',
'accept': '*/*',
'origin': 'https://hkmovie6.com',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://hkmovie6.com/movie/d88a803b-4a76-488f-b587-6ccbd3f43d86/SHOWTIME',
'accept-language': 'en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,ja;q=0.6',
'cookie': '__stripe_mid=dfb76ec9-1469-48ef-81d6-659f8d7c12da9a119d; lang=zhHK; auth=%7B%22isLogin%22%3Afalse%2C%22access%22%3A%7B%22token%22%3A%22eyJhbGciOiJIUzUxMiIsImtpZCI6ImFjY2VzcyIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJtb3ZpZTYiLCJhdWQiOiJyb2xlLmJhc2ljIiwiZXhwIjoxNjI4MDg0NTUxLCJpYXQiOjE2MjgwODI3NTEsImp0aSI6IjQxZjJmZDBjLTk3YzgtNDFiYi04NDRiLTU5YWM5MTY0ZmYyNSJ9.jz_G80XDafzSHyzxog1IAY_xikAdQEEFizJXkiiHkNhwAY-MWF1E11Nel7WrsDlE184tcFtSjUKbHdx7281dFA%22%2C%22expiry%22%3A1628084551%7D%2C%22refresh%22%3A%7B%22token%22%3A%22eyJhbGciOiJIUzUxMiIsImtpZCI6InJlZnJlc2giLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJtb3ZpZTYiLCJhdWQiOiJyb2xlLmJhc2ljIiwiZXhwIjoxNjMwNjc0NzUxLCJpYXQiOjE2MjgwODI3NTEsImp0aSI6IjM0YWFjNWVhLTkwZTctNDdhYS05OTE3LTQ5N2UxMGUwNmU3YSJ9.Mrwt2iWddQHthQNHafF4mirU-JiynidiTzq0X4J96IMICcWbWEoZBB4M1HhvFdeB2WvU1nHaNDyMZEhkINKK8g%22%2C%22expiry%22%3A1630674751%7D%7D; showtimeMode=time; _gid=GA1.2.2026576359.1628082750; _ga=GA1.2.704463189.1627482203; _ga_8W8P8XEJX1=GS1.1.1628082750.11.1.1628083640.0',
}
data = '$\\u0000\\u0000\\u0000\\u0000,\\n$d88a803b-4a76-488f-b587-6ccbd3f43d86\\u0010\\u0080\xB1\xA7\\u0088\\u0006'
response = requests.post('https://hkmovie6.com/m6-api/showpb.ShowAPI/ListByMovieAndDate', headers=headers, data=data)
All I got was a response header with the message grpc: received message larger than max:
{'Content-Type': 'application/grpc-web+proto', 'grpc-status': '8',
'grpc-message': 'grpc: received message larger than max (1551183920
vs. 4194304)', 'x-envoy-upstream-service-time': '49',
'access-control-allow-origin': 'https://hkmovie6.com',
'access-control-allow-credentials': 'true',
'access-control-expose-headers': 'grpc-status,grpc-message',
'X-Cloud-Trace-Context': '72c873ad3012ad710f938098310f7f11', ...
I also tried using Postman Interceptor to capture the actual request sent when I browsed the site, this time getting a different message:
I managed to get the response body when I used Selenium, but it is far from ideal performance-wise.
I wonder if gRPC is a hint, but I spent several hours reading without getting what I wanted.
My only question is whether it is possible to get the "ListByMovieAndDate" response just by making a simple Python HTTP request to the API URL. Thanks!
An admittedly cursory read suggests that the backend is gRPC and the client you're introspecting uses gRPC-Web, which is a clever solution to the problem of wanting to make gRPC requests from a JavaScript client.
Suffice it to say, you can't access the backend using HTTP/1 and REST if it is indeed gRPC, but you may (!) be able to craft a Python gRPC client that talks to it, provided there are no constraints on e.g. client IP or client type, and no auth.
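One concrete detail supports this: 1551183920 in the error is exactly the big-endian reading of the four ASCII bytes \u00, which suggests the \\u escape sequences in your data string went over the wire as literal text rather than binary. If you want to experiment, here is a hedged sketch that builds a proper gRPC-Web frame: one flag byte (0x00 for data), a 4-byte big-endian payload length, then the serialized protobuf message. The payload fields are my reconstruction of the captured bash $'...' string, not the service's published contract:

import struct
import requests

# Reconstructed from the capture -- an assumption, not the real .proto:
# field 1 appears to be the movie UUID, field 2 a varint Unix timestamp for the date.
movie_id = b"d88a803b-4a76-488f-b587-6ccbd3f43d86"
date_varint = b"\x80\xb1\xa7\x88\x06"          # varint 1628035200 (2021-08-04 00:00 UTC)
payload = b"\x0a\x24" + movie_id + b"\x10" + date_varint
frame = b"\x00" + struct.pack(">I", len(payload)) + payload  # flag + length + payload

response = requests.post(
    "https://hkmovie6.com/m6-api/showpb.ShowAPI/ListByMovieAndDate",
    headers={"content-type": "application/grpc-web+proto", "x-grpc-web": "1"},
    data=frame,
)
print(response.status_code, response.content[:200])

Even if this gets past the framing error, the response body will itself be gRPC-Web-framed protobuf, so you would still have to parse it.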
I am still a beginner at web scraping. I am trying to extract data from an API, but the problem is that it has a bearer token, and this token changes every 5 to 6 hours, so I have to go back to the web page and copy the token again. Is there any way to extract the data without opening the web page and copying the token every time?
I also found this info in the network request; someone told me that I could use the refresh_token for access, but I don't know how to do that:
Cache-Control: no-cache,
Connection: keep-alive,
Content-Length: 177,
Content-Type: application/json;charset=UTF-8,
Cookie: dhh_token=; refresh_token=; _hurrier_session=81556f54bf555a952d1a7f780766b028,
dnt: 1
import json
import requests
import pandas as pd
from time import sleep

def make_request():
    headers = {
        'Connection': 'keep-alive',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
        'sec-ch-ua': '^\\^',
        'Accept': 'application/json',
        'Authorization': 'Bearer eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJMdXRiZlZRUVZhWlpmNTNJbGxhaXFDY3BCVTNyaGtqZiIsInN1YiI6MzEzMTcwLCJleHAiOjE2MjQzMjU2NDcsInJvbCI6ImRpc3BhdGNoZXIiLCJyb2xlcyI6WyJodXJyaWVyLmRpc3BhdGNoZXIiLCJjb2QuY29kX21hbmFnZXIiXSwibmFtIjoiRXNsYW0gWmVmdGF3eSIsImVtYSI6ImV6ZWZ0YXd5QHRhbGFiYXQuY29tIiwidXNlcm5hbWUiOiJlemVmdGF3eUB0YWxhYmF0LmNvbSIsImNvdW50cmllcyI6WyJrdyIsImJoIiwicWEiLCJhZSIsImVnIiwib20iLCJqbyIsInEyIiwiazMiXX0.XYykBij-jaiIS_2tdqKFIfYGfw0uS0rKmcOTSHor8Nk',
        'sec-ch-ua-mobile': '?0',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36',
        'Content-Type': 'application/json;charset=UTF-8',
        'Origin': 'url',
        'Sec-Fetch-Site': 'same-origin',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'empty',
        'Referer': 'url',
        'Accept-Language': 'en-US,en;q=0.9,ar-EG;q=0.8,ar;q=0.7',
        'dnt': '1',
    }
    data = {
        'status': 'picked'
    }
    response = requests.post('url/api', headers=headers, json=data)
    print(response.text)
    return json.loads(response.text)

def extract_data(row):
    data_row = {
        'order_id': row['order']['code'],
        'deadline': row['order']['deadline'].split('.')[0],
        'picked_at': row['picked_at'].split('.')[0],
        'picked_by': row['picked_by'],
        'processed_at': row['processed_at'],
        'type': row['type']
    }
    return data_row

def periodique_extract(delay):
    extract_count = 0
    while True:
        extract_count += 1
        data = make_request()
        df = pd.DataFrame([extract_data(row) for row in data['data']])
        # write the CSV header only on the first pass
        df.to_csv(r"C:\Users\di\Desktop\New folder\a.csv", mode='a',
                  header=(extract_count == 1))
        print('extracting data {} times'.format(extract_count))
        sleep(delay)

# note: the website tracks a live operation, so I extract data every minute
periodique_extract(60)
Sometimes these tokens require JavaScript execution to be set and are then automatically added to API requests. That means you need to open the page in something that actually runs the JavaScript in order to get the token, i.e. actually open the page in a browser.
One solution could be to use something like Selenium or Puppeteer to open the page whenever the token expires, grab a new token, and feed it to your script. The right approach depends on the specifics of the page, which are hard to judge without a link. But if opening the page in your browser, copying the token, and then running your script works, this is very likely to work as well.
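As an illustration only (the localStorage key and login steps are hypothetical, since the site isn't linked), the Selenium route usually looks something like this:

from selenium import webdriver

def fetch_bearer_token(login_url):
    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        # ... perform whatever login steps the site requires ...
        # Hypothetical: many single-page apps keep the token in localStorage;
        # check DevTools > Application > Storage for the real key name.
        return driver.execute_script(
            "return window.localStorage.getItem('access_token');")
    finally:
        driver.quit()

# Then, in the scraper above:
# headers['Authorization'] = 'Bearer ' + fetch_bearer_token('https://the-site/login')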
I am trying to get a request to https://api.dex.guru/v1/tokens/0x7060d3F1CC70A07f4768560B9D9B692ac29244dE using Python. I have tried tons of different things, but they all respond with 403 Forbidden. I have tried everything I can think of and have googled with no success.
Currently my code for this request looks like this:
import requests

headers = {
    'authority': 'api.dex.guru',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '^\\^',
    'sec-ch-ua-mobile': '?0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': '(cookies are here)'
}
response = requests.get('https://api.dex.guru/v1/tradingview/symbols?symbol=0x7060d3f1cc70a07f4768560b9d9b692ac29244de-bsc', headers=headers)
Then I print out the response and it is a 403 error. Please help; I need this data for a project.
Good afternoon.
I have managed to get this to work with the help of another user on Reddit.
The key to getting this API call to work is to use the cloudscraper module:
import cloudscraper
scraper = cloudscraper.create_scraper() # returns a CloudScraper instance
print(scraper.get("https://api.dex.guru/v1/tokens/0x8076C74C5e3F5852037F31Ff0093Eeb8c8ADd8D3-bsc").text)
This gave me a 200 response with the expected JSON content (substitute your URL for mine and you should get the same).
Many thanks
Jimmy
I tried messing around with this myself; it appears the site has some sort of DDoS protection from Cloudflare blocking these API calls. I'm not an expert in Python or headers by any means, so you might be able to supply something to deal with that. However, I looked at their website and it seems the API is still in development. Finally, I was getting 503 errors instead, and I was able to access the API normally through my browser. Happy to tinker with this more if you don't mind explaining what some of the cookies/headers are doing.
Try to check the body of the response (response.content or response.text) as that might give you a more clear picture of why you get blocked.
For me it looks like they do some filtering based on the user-agent. I get a Cloudflare DoS protection page (with an HTTP 503 response, for example). Using a user-agent string that suggests JavaScript won't work, I get an HTTP 200:
headers = {"User-Agent": "HTTPie/2.4.0"}
r = requests.get("https://api.dex.guru/v1/tokens/0x7060d3F1CC70A07f4768560B9D9B692ac29244dE", headers=headers)
I am trying to log into zyBooks using the Requests library in Python. I saw in the Network tab of Google Chrome that I need an auth_token to add to the URL to actually create and perform the login request. Firstly, here is the Network tab snapshot after I log into the website:
So first I need to do the 1st POST request, named 'signin' (the OPTIONS request before it doesn't seem to do or respond with anything). The signin POST request is supposed to respond with an auth_token, which I can then use to log in via the 3rd entry in the list, the first GET request.
The response of the first POST request is the auth_token:
And here is the detail about the first POST request. You can see the request URL and the payload required:
As proof, here is what request URL would look like. As you can see, it needs the auth_token.
I am, however, unable to get the first POST request's auth_token in any way I have tried so far. Both request URLs for the first two 'signin' requests are what is in the code. Here is the code:
import requests

url = 'https://learn.zybooks.com/signin'
payload = {"email": "myemail", "password": "mypassword"}
headers = {
    'Host': 'zyserver.zybooks.com',
    'Connection': 'keep-alive',
    'Content-Length': '52',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'sec-ch-ua': "Chromium;v=88, Google Chrome;v=88, ;Not A Brand;v=99",
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'DNT': '1',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
    'Content-Type': 'application/json',
    'Origin': 'https://learn.zybooks.com',
    'Sec-Fetch-Site': 'same-site',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://learn.zybooks.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
}

session = requests.Session()
req1 = session.post(url)
req2 = session.post(url, data=payload)
print(req2.json())
I just get a JSONDecodeError:
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
From what I have researched in many posts online, this error happens because the response doesn't contain any JSON. But that doesn't make sense, as I need that JSON response with the auth_token to be able to create the GET request that logs in to the site.
Got it. It's because zyBooks runs on Ember.js. There is barely any HTML; it's a JavaScript website. The JavaScript needs to load first; then the form can be filled and submitted.
I did not go through with implementing it myself, but for future people coming here, there are posts on this subject, such as:
using requests to login to a website that has javascript login form
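If you end up automating a browser instead, a sketch of that approach (the CSS selectors here are hypothetical; inspect the rendered form for the real ones):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://learn.zybooks.com/signin")
# Ember renders the form client-side, so wait for it to exist before typing.
wait = WebDriverWait(driver, 15)
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "input[type='email']"))).send_keys("myemail")
driver.find_element(By.CSS_SELECTOR, "input[type='password']").send_keys("mypassword")
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
# After login, the auth_token can be read from the browser's cookies or storage.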
I am trying to make a program that checks for ski lift reservation openings. So far I am able to get the correct response from the API, but it only works for about 15 minutes before some cookie expires. Here is my current process.
Go to the site https://www.keystoneresort.com/plan-your-trip/lift-access/tickets.aspx, look at the network response, then copy the highlighted XHR request as cURL (bash).
website/api in question
I then take that cURL (bash) and import it into Postman to get the response:
Postman response
Then I take the code from Postman so I can run it in Python:
Code used by Postman
import requests, json

url = ("https://www.keystoneresort.com/api/LiftAccessApi/GetLiftTicketControlReservationInventory"
       "?startDate=01%2F21%2F2021&endDate=03%2F06%2F2021&_=1611254694375")

payload = {}
headers = {
    'authority': 'www.keystoneresort.com',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'x-queueit-ajaxpageurl': 'https%3A%2F%2Fwww.keystoneresort.com%2Fplan-your-trip%2Flift-access%2Ftickets.aspx%3FstartDate%3D01%252F23%252F2021%26numberOfDays%3D1%26ageGroup%3DAdult',
    'x-requested-with': 'XMLHttpRequest',
    '__requestverificationtoken': 'mbVIzNL1qZUKDT3Re8H9kXVNoYLmQPC-tgLCSbM_inVSN1v_2Pei-A-GWDaKL7i6NRIVTr0lnlmiYACNvfmd6Zzsikk1:HI8y8wZJXMuP7nsTJwS-adYZu7FoHVPVHWY5naHRiB71dg2PzehuQa8WJy418eIrVqwmvhw-a1F34sJ425mXzWpEANE1',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36',
    'save-data': 'off',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.keystoneresort.com/plan-your-trip/lift-access/tickets.aspx?startDate=01%2F23%2F2021&numberOfDays=1&ageGroup=Adult',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': 'QueueITAccepted-SDFrts345E-V3_vailresortsecomm1=EventId%3Dvailresortsecomm1%26QueueId%3D96d15411-09e1-4443-89a3-f0d6e4cef5d5%26RedirectType%3Dsafetynet%26IssueTime%3D1611254692%26Hash%3D06e1aecd2d5cdf64363d53f4fc63f1c22316f604895cd3ecfd1d8b03f86ba36a; TS019b45a2=01d73c084b0f6abf04d77ffeb9e37953f3d047ebae13a4f5ffa8e69045bf156b4959e093cf10f08359c6f45a491fdc474e068898a9; TS01f060ff=01d73c084b0f6abf04d77ffeb9e37953f3d047ebae13a4f5ffa8e69045bf156b4959e093cf10f08359c6f45a491fdc474e068898a9; AMCV_974C370453295F9A0A490D44%40AdobeOrg=1406116232%7CMCIDTS%7C18649%7CMCMID%7C30886069937558409272202898840476568322%7CMCAAMLH-1611859494%7C9%7CMCAAMB-1611859494%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1611261894s%7CNONE%7CMCAID%7CNONE%7CvVersion%7C2.5.0;'
}

s = requests.Session()
y = s.get(url)
print(y)

response = requests.request("GET", url, headers=headers, data=payload)
todos = json.loads(response.text)
x = json.dumps(todos, indent=2)
print(x)
Now if you run this in Python, it will not work, because the cookies will have expired for this session by the time someone tries it; you would have to follow the process I listed above to see what I am doing. The response I get looks like this, which is what I want, except that it expires.
Python response
I have looked extensively at different ways to get the cookies using requests and Selenium. All the solutions I have tried only get some of the cookies, not all of them. I need the ones in the "cookie" header listed in my code, but I have not found a way to do that without refreshing the page, posting the cURL into Postman, and copying the response. I am still fairly new to Python and coding in general, so don't go too hard on me if the answer is super simple.
I think some of these cookies are rendered by JavaScript, which may be part of the problem. I can also delete some of the cookies in my code and it still works (until it expires). If there is an easier way to do what I am doing, please let me know.
Thanks.
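For what it's worth, a common pattern here (a sketch, not verified against this site) is to let a real browser run the scripts that set the cookies, then mirror every browser cookie into a requests session:

import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.keystoneresort.com/plan-your-trip/lift-access/tickets.aspx")
# ... pause here until the queue/anti-bot scripts have finished setting cookies ...

s = requests.Session()
for c in driver.get_cookies():   # every cookie the browser currently holds for this site
    s.cookies.set(c["name"], c["value"], domain=c["domain"])
driver.quit()

# s.get(url, headers=headers) will now send the same cookies the browser would,
# although they still expire on the server's schedule and must be re-harvested.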