How to bypass Cloudflare with Python on GET requests?

I want to bypass Cloudflare on a GET request. I have tried using Cloudscraper, which worked for me in the past but now seems to be deprecated.
I tried:
import cloudscraper
import requests
ses = requests.Session()
ses.headers = {
    'referer': 'https://magiceden.io/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
    'accept': 'application/json'
}
scraper = cloudscraper.create_scraper(sess=ses)
hookLink = "https://magiceden.io/launchpad/planetarians"
meG = scraper.get("https://api-mainnet.magiceden.io/launchpads/planetarians")
print(meG.status_code)
print(meG.text)
The issue seems to be that I'm getting a captcha on the request.

The Python library works well (I never knew about it); the issue is your user agent. Cloudflare apparently runs some extra checks to determine whether you're faking it.
For me, any of the following works:
ses.headers = {
    'referer': 'https://magiceden.io/',
    'accept': 'application/json'
}
# or, with only the accept header:
ses.headers = {
    'accept': 'application/json'
}
And also just:
scraper = cloudscraper.create_scraper()
meG = scraper.get("https://api-mainnet.magiceden.io/launchpads/planetarians")
EDIT:
You can use this dict syntax instead to fake the user agent (as per the manual):
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'desktop': True
    }
)

Related

Failed to log in to a website using the requests module

I'm trying to log in to a website through a Python script that I've created using the requests module. I've issued a POST request with appropriate parameters and headers to the server, but for some reason I get a different response from that site compared to what I see in dev tools. The status is always 200, though. There is also a GET request in the script that should fetch the credentials once the login is successful. Currently, it throws a JSONDecodeError on the last line.
import requests
link = 'https://propwire.com/login'
check_url = 'https://propwire.com/search'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'referer': 'https://propwire.com/login',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'origin': 'https://propwire.com',
}
payload = {"email":"some-email","password":"password","remember":"true"}
with requests.Session() as s:
    r = s.get(link)
    headers['x-xsrf-token'] = r.cookies['XSRF-TOKEN'].rstrip('%3D')
    s.headers.update(headers)
    s.post(link,json=payload)
    res = s.get(check_url)
    print(res.json()['props']['auth'])
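One detail in the script above may be worth checking: `str.rstrip('%3D')` does not remove the suffix `%3D`; it strips any trailing run of the characters `%`, `3` and `D`. A minimal sketch of the difference, using a made-up cookie value (the token below is hypothetical, and whether the server expects the percent-decoded value in the `x-xsrf-token` header is an assumption to verify against the browser's request):

```python
from urllib.parse import unquote


def decode_xsrf_cookie(raw_value):
    """Percent-decode a cookie value (e.g. turn a trailing %3D back into '=')."""
    return unquote(raw_value)


# Hypothetical, percent-encoded token that happens to end in 'D3' before '%3D'.
raw_cookie = "eyJpdiI6abcD3%3D"

# rstrip treats its argument as a set of characters, so it eats 'D3' as well.
print(raw_cookie.rstrip('%3D'))        # eyJpdiI6abc

# unquote only decodes the escape sequence and keeps the rest intact.
print(decode_xsrf_cookie(raw_cookie))  # eyJpdiI6abcD3=
```

If the decoded form is what the site expects, the header line in the script would become `headers['x-xsrf-token'] = unquote(r.cookies['XSRF-TOKEN'])`.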

Connect to NordVPN using Python in MacOS without using command line tools

I want to fetch a few Google search results for a machine learning app without getting blocked. To avoid being blocked by Google, I want to use a Python script to rotate my IP address while making requests, but I can't get the script working. I don't have an API endpoint through which I can connect to NordVPN.
I tried to figure out the endpoint using the chrome extension and inspecting its webpage. But it was of no use.
Currently I'm stuck at this issue.
My code:
import requests
access_token = 'my-secret-token'
# Get a list of available server groups
server_groups_url = "https://api.nordvpn.com/server"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/89.0.4389.82 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9,fr;q=0.8,es;q=0.7',
    'Accept-Encoding': 'gzip',
    'Accept': 'application/json',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers',
    'Authorization': f"Bearer {access_token}"
}
server_groups = requests.get(server_groups_url, headers=headers).json()
# Choose a server (e.g. the first server in the list)
hostname = server_groups[0]['domain']
The hostname in the code comes back as something like 'p119.nordvpn.com'.
I don't know how to connect to this VPN server from Python code. Can someone help me?

Why is Python requests.get() retrieving outdated data from API?

Context:
I'm making GET requests to an API, and the API sometimes returns data that is up to 5 minutes old. However, when making the same request in Chrome, the data is always up to date. The server is nginx.
This is the API request made when the page is loaded in Chrome:
https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_=1604808126524
Relevant Code:
import random
from datetime import datetime
import requests
# `proxies` (a list of proxy addresses) is defined elsewhere in the script
def epochTimestamp():
    # current time in milliseconds since the Unix epoch
    return int(round(datetime.now().timestamp()*1000))
def getProxies():
    # pick one proxy at random and use it for both schemes
    proxy = random.choice(proxies)
    return {'http': f'socks5h://{proxy}', 'https': f'socks5h://{proxy}'}
get_purchase_headers = {
    'Host': 'buff.163.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Cache-Control': 'max-age=0'
}
url = f"https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_={epochTimestamp()}"
source = requests.get(url, timeout=10, proxies=getProxies(), headers=get_purchase_headers)
What I have tried:
Including User-Agent headers
'Cache-Control': 'max-age=0'
Including timestamp in the URL
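For reference, the timestamp-in-URL approach listed above can be sketched with the standard library alone. This is only an illustration of the cache-busting query parameter the question already uses, not a fix for the stale responses; the names `build_url`, `epoch_millis` and the `now_ms` parameter are hypothetical helpers introduced here.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://buff.163.com/api/market/goods/sell_order"


def epoch_millis():
    """Current time in milliseconds since the Unix epoch."""
    return int(time.time() * 1000)


def build_url(goods_id, game="csgo", now_ms=None):
    """Build the API URL with a `_` cache-busting parameter.

    `now_ms` lets a caller pin the timestamp (useful for testing);
    by default the current time is used so every URL is unique.
    """
    params = {
        "game": game,
        "goods_id": goods_id,
        "_": epoch_millis() if now_ms is None else now_ms,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Reproduces the URL Chrome sent in the question when given the same timestamp.
print(build_url(781660, now_ms=1604808126524))
```

Because the `_` value changes on every call, a cache keyed on the full URL should not serve an old entry; if stale data still arrives, the caching is probably keyed on something else (for example the session, the proxy exit IP, or the server side itself).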

Python Requests login: I get error 403 but the request looks correct

I am trying to log in to www.zalando.it using the requests library, but every time I try to post my data I get a 403 error. I compared my request with the login call shown in the browser's network tab on Zalando, and they look the same.
The credentials below are dummy data; you can test by creating a test account.
Here is the code for the login function:
import requests
import pickle
import json
session = requests.session()
headers1 = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
r = session.get('https://www.zalando.it/', headers = headers1)
cookies = r.cookies
url = 'https://www.zalando.it/api/reef/login'
payload = {'username': "email#email.it", 'password': "password", 'wnaMode': "shop"}
headers = {
    'x-xsrf-token': cookies['frsx'],
    #'_abck': str(cookies['_abck']),
    'usercentrics_enabled' : 'true',
    'Connection': 'keep-alive',
    'Content-Type':'application/json; charset=utf-8',
    'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
    'origin':'https://www.zalando.it',
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'GET,PUT,POST,DELETE,OPTIONS',
    'Access-Control-Allow-Headers': 'Origin,X-Requested-With,Content-Type,Accept,content-type,application/json',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'same-origin',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7',
    'dpr': '1.3125',
    'referer': 'https://www.zalando.it/uomo-home/',
    'viewport-width': '1464'
}
x = session.post(url, data = json.dumps(payload), headers = headers, cookies = cookies)
print(x) #error 403
print(x.text) #page that show 403

The initial request needs to look like an actual browser request; after that, the headers need to be modified to look like an XHR (Ajax) request. Also, there are some response headers that need to be added to future requests to the server, along with cookies such as the client ID and an XSRF token.
Here's some example code that is currently working:
import requests
# first load the home page
home_page_link = "https://www.zalando.it/"
login_api_schema = "https://www.zalando.it/api/reef/login/schema"
login_api_post = "https://www.zalando.it/api/reef/login"
headers = {
    'Host': 'www.zalando.it',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection' : 'close',
    'Upgrade-Insecure-Requests': '1'
}
if __name__ == '__main__':
    with requests.Session() as s:
        s.headers.update(headers)
        r = s.get(home_page_link)
        # fetch these cookies: frsx, Zalando-Client-Id
        cookie_dict = s.cookies.get_dict()
        # update the headers
        # remove this header for the xhr requests
        del s.headers['Upgrade-Insecure-Requests']
        # these 2 are taken from some response cookies
        s.headers['x-xsrf-token'] = cookie_dict['frsx']
        s.headers['x-zalando-client-id'] = cookie_dict['Zalando-Client-Id']
        # i didn't pay attention to where these came from
        # just saw them and manually added them
        s.headers['x-zalando-render-page-uri'] = '/'
        s.headers['x-zalando-request-uri'] = '/'
        # this is sent as a response header and is needed to
        # track future requests/responses
        s.headers['x-flow-id'] = r.headers['X-Flow-Id']
        # only accept json data from xhr requests
        s.headers['Accept'] = 'application/json'
        # when clicking the login button this request is sent
        # i didn't test without this request
        r = s.get(login_api_schema)
        # add an origin header
        s.headers['Origin'] = 'https://www.zalando.it'
        # finally log in, this should return a 201 response with a cookie
        login_data = {"username":"email#email.it","password":"password","wnaMode":"modal"}
        r = s.post(login_api_post, json=login_data)
        print(r.status_code)
        print(r.headers)

Well, it seems to me that this website is protected by Akamai (looks like Akamai Bot Manager).
See that Server: AkamaiGHost in the response headers of /api/reef/login when you get a 403 response?
Also, have a look at the requests sent during a legitimate browser session: there are many requests sent to /static/{some unique ID}, with some sensor_data, including your user-agent, and some other "gibberish".
The above description seems to fit this one:
The BMP SDK collects behavioral data while the user is interacting with the application. This behavioral data, also known as sensor data, includes the device characteristics, device orientation, accelerometer data, touch events, etc. Reference: BMP SDK
Also, this answer confirms that some of the cookies set by this website in fact belong to Akamai Bot Manager.
Well, I'm not sure if there's an easy way of bypassing it. After all, that's a product developed for exactly this purpose: to block web-scraping bots like yours.

Monitoring a website for internal redirects

I would like to monitor a particular URL and wait until it internally redirects me by using python requests. The website will randomly redirect me after a period of time. However, I am having some issues right now. The strategy I have employed so far is something like this:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
session = requests.Session()
while success is False:
    r = session.get(url, headers=headers, allow_redirects=True)
    if keyword in r.text:
        success = True
    time.sleep(30)
print("Success.")
It seems as though every time I make a GET request, the timer is reset, so I am never redirected. I thought a session would fix this, but perhaps not. How am I meant to check for changes to the page without sending a new request every 30 seconds? Looking at the network tab in Chrome, the status code appears to be 307.
If anyone knows how to resolve this issue it would be very helpful, thanks.

Selenium is the quick and ugly answer:
import time
from selenium import webdriver
# `url` and `keyword` are the same values as in the question
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36")
browser = webdriver.Firefox(profile)
browser.get(url)
success = False
while success is False:
    # re-read the live page instead of sending a new request
    text = browser.page_source
    if keyword in text:
        success = True
    time.sleep(30)
print("Success.")
As far as using requests goes, I'd hazard a guess that your web browser is the one requesting the reload. Does that request in the network tab differ in any way from the initial request? browsermob-proxy is a great tool for deep diving into these sorts of issues; it's effectively the network tab on steroids.
Apologies for the vagueness of the last half, but it's difficult to say more without having seen the website.
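On the requests side, one thing that can be checked without a keyword search is whether a server-side redirect (such as the 307 mentioned in the question) was actually followed: requests records each followed redirect in `response.history` and exposes the final address as `response.url`. The sketch below is a rough illustration under assumptions, not a fix for the timer being reset. `wait_for_redirect` and `default_fetch` are hypothetical helpers, and a redirect triggered client-side by JavaScript or a meta refresh would not show up in `history` at all.

```python
import time


def default_fetch(url):
    # Imported here so the polling logic can be exercised without requests installed.
    import requests
    return requests.get(url, allow_redirects=True, timeout=10)


def wait_for_redirect(url, fetch=default_fetch, interval=30,
                      max_attempts=20, sleep=time.sleep):
    """Poll `url` until a response arrives via at least one HTTP redirect.

    Returns the final URL after the redirect, or None if no redirect
    was observed within `max_attempts` polls.
    """
    for attempt in range(max_attempts):
        response = fetch(url)
        # `history` is a list of the intermediate redirect responses
        # (301, 302, 307, ...); it is empty when no redirect happened.
        if response.history:
            return response.url
        if attempt < max_attempts - 1:
            sleep(interval)
    return None


# Example usage (performs real requests, so it is left commented out):
# final_url = wait_for_redirect("https://example.com/waiting-room")
# print(final_url)
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of the network, so a session-based fetcher with custom headers can be swapped in without changing the polling logic.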
