I'm trying to log in to a website through a Python script that uses the requests module. I send a POST request with the appropriate parameters and headers, but the response I get differs from what I see in dev tools, even though the status is always 200. The script also issues a GET request that should fetch the credentials once the login succeeds. Currently, it throws a JSONDecodeError on the last line.
import requests
link = 'https://propwire.com/login'
check_url = 'https://propwire.com/search'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'referer': 'https://propwire.com/login',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'origin': 'https://propwire.com',
}
payload = {"email":"some-email","password":"password","remember":"true"}
with requests.Session() as s:
    r = s.get(link)
    headers['x-xsrf-token'] = r.cookies['XSRF-TOKEN'].rstrip('%3D')
    s.headers.update(headers)
    s.post(link,json=payload)
    res = s.get(check_url)
    print(res.json()['props']['auth'])
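One detail in the script above is worth checking: `str.rstrip('%3D')` removes any trailing `%`, `3`, or `D` characters, not the literal suffix `%3D`, so it can eat valid characters at the end of a token. If the cookie value is percent-encoded (an assumption; the site's actual cookie is not shown here), `urllib.parse.unquote` decodes it instead. A minimal sketch with a made-up token value:

```python
from urllib.parse import unquote

# Hypothetical cookie value, for illustration only:
# it ends in a legitimate "3D" followed by a percent-encoded "=".
raw_token = "eyJpdiI6abc3D%3D"

# rstrip treats its argument as a set of characters, not a suffix,
# so the legitimate trailing "3D" is stripped as well.
stripped = raw_token.rstrip('%3D')

# unquote decodes percent-escapes and leaves everything else untouched.
decoded = unquote(raw_token)

print(stripped)
print(decoded)
```

Separately, a JSONDecodeError on `res.json()` means the body of that response is not valid JSON; printing `res.headers.get('content-type')` and the start of `res.text` shows what actually came back.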
I want to fetch a few Google search results for a machine learning app without getting blocked. To do that, I want a Python script that rotates my IP address between requests. I can't get the script working, because I don't have an API endpoint through which I can connect to NordVPN.
I tried to find the endpoint by inspecting the Chrome extension and its web page, but that got me nowhere.
I'm currently stuck on this issue.
My code:
import requests
access_token = 'my-secret-token'
# Get a list of available server groups
server_groups_url = "https://api.nordvpn.com/server"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/89.0.4389.82 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9,fr;q=0.8,es;q=0.7',
    'Accept-Encoding': 'gzip',
    'Accept': 'application/json',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers',
    'Authorization': f"Bearer {access_token}"
}
server_groups = requests.get(server_groups_url, headers=headers).json()
# Choose a server (e.g. the first server in the list)
hostname = server_groups[0]['domain']
The hostname in the code comes back as something like 'p119.nordvpn.com'.
I don't know how to connect to this VPN from Python code. Can someone help me?
Context:
I'm making GET requests to an API, and the API sometimes returns data that is up to 5 minutes old. However, when I make the same request in Chrome, the data is always up to date. The server is nginx.
This is the API request made when the page is loaded in Chrome:
https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_=1604808126524
Relevant Code:
import random
from datetime import datetime

import requests

def epochTimestamp():
    # Current time in milliseconds, used as the cache-busting "_" parameter
    return int(round(datetime.now().timestamp()*1000))

def getProxies():
    # `proxies` is the list of proxy addresses; its definition is not shown here
    proxy = random.choice(proxies)
    return {'http': fr'socks5h://{proxy}', 'https': fr'socks5h://{proxy}'}

get_purchase_headers = {
    'Host': 'buff.163.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Cache-Control': 'max-age=0'
}
url = f"https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_={epochTimestamp()}"
source = requests.get(url, timeout=10, proxies=getProxies(), headers=get_purchase_headers)
What I have tried:
Including User-Agent headers
'Cache-Control': 'max-age=0'
Including timestamp in the URL
I am trying to log in to www.zalando.it using the requests library, but every time I post my data I get a 403 error. I compared my request with the login call shown in the browser's network tab on Zalando, and it looks the same.
The credentials below are dummy data; you can test by creating a test account.
Here is the code for the login function:
import requests
import pickle
import json
session = requests.session()
headers1 = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
r = session.get('https://www.zalando.it/', headers = headers1)
cookies = r.cookies
url = 'https://www.zalando.it/api/reef/login'
payload = {'username': "email#email.it", 'password': "password", 'wnaMode': "shop"}
headers = {
    'x-xsrf-token': cookies['frsx'],
    #'_abck': str(cookies['_abck']),
    'usercentrics_enabled' : 'true',
    'Connection': 'keep-alive',
    'Content-Type':'application/json; charset=utf-8',
    'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
    'origin':'https://www.zalando.it',
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'GET,PUT,POST,DELETE,OPTIONS',
    'Access-Control-Allow-Headers': 'Origin,X-Requested-With,Content-Type,Accept,content-type,application/json',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'same-origin',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7',
    'dpr': '1.3125',
    'referer': 'https://www.zalando.it/uomo-home/',
    'viewport-width': '1464'
}
x = session.post(url, data = json.dumps(payload), headers = headers, cookies = cookies)
print(x) #error 403
print(x.text) #page that show 403
The initial request needs to look like an actual browser request; after that, the headers need to be modified to look like an XHR (Ajax) request. Also, there are some response headers that need to be added to future requests to the server, along with cookies such as the client ID and an XSRF token.
Here's some example code that is currently working:
import requests
# first load the home page
home_page_link = "https://www.zalando.it/"
login_api_schema = "https://www.zalando.it/api/reef/login/schema"
login_api_post = "https://www.zalando.it/api/reef/login"
headers = {
    'Host': 'www.zalando.it',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection' : 'close',
    'Upgrade-Insecure-Requests': '1'
}
if __name__ == '__main__':
    with requests.Session() as s:
        s.headers.update(headers)
        r = s.get(home_page_link)
        # fetch these cookies: frsx, Zalando-Client-Id
        cookie_dict = s.cookies.get_dict()
        # update the headers
        # remove this header for the xhr requests
        del s.headers['Upgrade-Insecure-Requests']
        # these 2 are taken from some response cookies
        s.headers['x-xsrf-token'] = cookie_dict['frsx']
        s.headers['x-zalando-client-id'] = cookie_dict['Zalando-Client-Id']
        # i didn't pay attention to where these came from
        # just saw them and manually added them
        s.headers['x-zalando-render-page-uri'] = '/'
        s.headers['x-zalando-request-uri'] = '/'
        # this is sent as a response header and is needed to
        # track future requests/responses
        s.headers['x-flow-id'] = r.headers['X-Flow-Id']
        # only accept json data from xhr requests
        s.headers['Accept'] = 'application/json'
        # when clicking the login button this request is sent
        # i didn't test without this request
        r = s.get(login_api_schema)
        # add an origin header
        s.headers['Origin'] = 'https://www.zalando.it'
        # finally log in, this should return a 201 response with a cookie
        login_data = {"username":"email#email.it","password":"password","wnaMode":"modal"}
        r = s.post(login_api_post, json=login_data)
        print(r.status_code)
        print(r.headers)
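The header changes in the code above (drop the browser-only header, copy cookie values into `x-` headers, switch `Accept` to JSON) can also be gathered into one function that works on plain dictionaries. This is only a sketch: `to_xhr_headers` is a hypothetical helper name, the sample values are made up, and the cookie and header names are simply the ones used in the code above, not verified against the live site.

```python
def to_xhr_headers(browser_headers, cookies, flow_id):
    """Return a copy of browser_headers adjusted to look like an XHR request."""
    headers = dict(browser_headers)
    # The code above removes this header for XHR requests.
    headers.pop('Upgrade-Insecure-Requests', None)
    # Values taken from cookies set by the home page response.
    headers['x-xsrf-token'] = cookies['frsx']
    headers['x-zalando-client-id'] = cookies['Zalando-Client-Id']
    # Value taken from the X-Flow-Id response header.
    headers['x-flow-id'] = flow_id
    # Only accept JSON from XHR requests.
    headers['Accept'] = 'application/json'
    return headers


# Made-up values, for illustration only.
example = to_xhr_headers(
    {'User-Agent': 'Mozilla/5.0', 'Upgrade-Insecure-Requests': '1'},
    {'frsx': 'token123', 'Zalando-Client-Id': 'client456'},
    'flow789',
)
print(example)
```

Because it takes and returns plain dictionaries, the function can be checked without touching the network.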
Well, it seems to me that this website is protected by Akamai (looks like Akamai Bot Manager).
See that Server: AkamaiGHost in the response headers of /api/reef/login when you get a 403 response?
Also, have a look at the requests sent during a legitimate browser session: there are many requests sent to /static/{some unique ID}, with some sensor_data, including your user-agent, and some other "gibberish".
The above description seems to fit this one:
The BMP SDK collects behavioral data while the user is interacting with the application. This behavioral data, also known as sensor data, includes the device characteristics, device orientation, accelerometer data, touch events, etc. Reference: BMP SDK
Also, this answer confirms that some of the cookies set by this website in fact belong to Akamai Bot Manager.
Well, I'm not sure there's an easy way of bypassing it. After all, that's a product developed for exactly this purpose: to block web-scraping bots like yours.
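The `Server` header check mentioned above can be written as a small heuristic. This is a sketch under assumptions: `served_by_akamai` is a hypothetical helper, and a matching header only suggests that the response came from Akamai's edge, not which product produced the 403.

```python
def served_by_akamai(status_code, headers):
    """Heuristic: a 403 whose Server header mentions AkamaiGHost."""
    server = headers.get('Server', '')
    return status_code == 403 and 'akamaighost' in server.lower()


# Made-up header sets, for illustration only.
print(served_by_akamai(403, {'Server': 'AkamaiGHost'}))
print(served_by_akamai(403, {'Server': 'nginx'}))
```

With requests, the call would be `served_by_akamai(x.status_code, x.headers)`; `x.headers` is case-insensitive, whereas the plain dictionaries used here need the exact key `Server`.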
I would like to monitor a particular URL and wait until it internally redirects me by using python requests. The website will randomly redirect me after a period of time. However, I am having some issues right now. The strategy I have employed so far is something like this:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
session = requests.Session()
while success is False:
    r = session.get(url, headers=headers, allow_redirects=True)
    if keyword in r.text:
        success = True
    time.sleep(30)
print("Success.")
It seems that every time I make a GET request, the timer is reset, so I am never redirected. I thought a session would fix this, but apparently it doesn't. How am I meant to check for changes to the page without sending a new request every 30 seconds? Looking at the network tab in Chrome, the status code appears to be 307.
If anyone knows how to resolve this issue it would be very helpful, thanks.
Selenium is the quick and ugly answer:
import time
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36")
browser = webdriver.Firefox(profile)
browser.get(url)
# url, keyword and success are the variables from the question's code
while success is False:
    text = browser.page_source
    if keyword in text:
        success = True
    time.sleep(30)
print("Success.")
As far as using requests goes, I'd hazard a guess that your web browser is the one requesting the reload. Does that request in the network tab differ in any way from the initial request? browsermob-proxy is a great tool for deep diving into these sorts of issues; it's effectively the network tab on steroids.
Apologies for the vagueness of the last half, but it's difficult to say more without having seen the website.
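On the requests side, one way to see whether the server actually answered with a 307 is to pass `allow_redirects=False` to `session.get` and inspect the status code and `Location` header of each response yourself. The sketch below keeps that check in a plain function so it runs without network access; `redirect_target`, `REDIRECT_CODES`, and the sample values are hypothetical names and data, not something from the site in question.

```python
# HTTP status codes that carry a Location header pointing elsewhere.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status_code, headers):
    """Return the Location value if this response is a redirect, else None."""
    if status_code in REDIRECT_CODES:
        return headers.get('Location')
    return None


# Made-up responses, for illustration only.
print(redirect_target(307, {'Location': '/queue/done'}))
print(redirect_target(200, {}))
```

In the polling loop this would look like `r = session.get(url, headers=headers, allow_redirects=False)` followed by `redirect_target(r.status_code, r.headers)`. It does not address whether polling itself resets the timer; it only makes the redirect visible when it happens.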