Problems with cloud flare 403 and python

Problems with cloud flare 403 and python - python

I've been trying to make a code auto redeemer for a site theres a problem every time i send a request to the website the. The issue is a 403 error which means i haven't passed the right fooling methods like headers, cookies, CF. But I have so I'm lost I've tried everything the problem is 100% cloud flare having a strange verification I can't find a way to bypass it. I've passed auth headers with correct cookies aswell. I've tried with requests library and with cloudscrape and bs4
The site is
from bs4 import BeautifulSoup
import cloudscraper
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}
scraper = cloudscraper.create_scraper()
r = scraper.get('https://rblxwild.com/api/promo-code/redeem-code', headers=headers)
print(r) > 403
Someone too tell me how to bypass the cloudflare protection methods.

Related

Python Web Scrapping Error 403 even with header User Agent

I'm a newbie learning Python. While using BeautifulSoup and Requests to scrap "https://batdongsan.com.vn/nha-dat-ban-tp-hcm" for collect data on housing price of my hometown, I get blocked by 403 error even though having tried Headers User Agent. Here is my code :
**url3 = "https://batdongsan.com.vn/nha-dat-ban-tp-hcm"
headers = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49"}
page = requests.get(url3, headers = headers)
print(page)**
Result : <Response [403]>
Have anyone tried and succeeded to bypass the same problem. Any help is highly appriciated.
Many thanks

import cloudscraper
scraper = cloudscraper.create_scraper()
soup = BeautifulSoup(scraper.get("https://batdongsan.com.vn/nha-dat-ban-tp-hcm").text)
print(soup.text) ## do what you want with the response
You can install cloudscraper with pip install cloudscraper

What does this robot.txt mean?

There's a website that I need to crawl, I have no financial purpose just to study.
I checked the robots.txt and it was as follows.
User-agent: *
Allow: /
Disallow: /*.notfound.html
Can I crawl this website using request and beautifulSoup?
I checked that crawling without a header causes a 403 error. Does this mean that crawling is not allowed?

status code: 403 means client-side error and from the server-side such type of
error is not responsible for meaning the website is allowed to extract data. To get ride of 403 error you must need to inject something with requests like headers and most of the time but not always will solve this problem just injecting User-Agent as header. Here is an example how to inject User-Agent using requests module with BeautifulSoup.
import requests
from bs4 import BeautifulSoup
header = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
response = requests.get("Your url", headers=headers)
print(response.status_code)
#soup = BeautifulSoup(response .content, "lxml")

Why does grequests.get() return None?

I'm trying to fetch automatically the MAC addresses for some vendors in Python. I found a website that is really helpful, but I'm not being able to access its information from Python. When I run this:
import grequests
rs = (grequests.get(u) for u in ['https://aruljohn.com/mac/000000'])
requests = grequests.map(rs)
for response in requests:
print(response)
It prints None. Does anyone know how to solve this?

Looks like the issue is just not setting a user-agent in the headers. I was able to request the website without any issues. I just used normal python request but it should work fine with grequests. I do think you might want to find a more active library. You could check out aiohttp. Very active and I have had a wonderful experience using aiohttp.
import requests
from lxml import html
def request_website(mac):
url = 'https://aruljohn.com/mac/' + mac
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
r = requests.get(url, headers=headers)
return r.text
response = request_website('000000')
tree = html.fromstring(response)
results = tree.cssselect('.results p')[0].text
print (results)

Python requests user agent not working

I am using python requests to get the html page.
I am using the latest version of chrome in the user agent.
But the response tells that Please update your browser.
Here is my sample code.
import requests
url = 'https://www.choicehotels.com/alabama/mobile/quality-inn-hotels/al045/hotel-reviews/4'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36', 'content-type': 'application/xhtml+xml', 'referer': url}
url_response = s.get(url, headers=headers, timeout=15)
print url_response.text
I am using python 2.7 in a windows server.
But when I ran the same code in my local I got the required output.

Please update your browser is the answer.
You cannot do https with old browser (and request in python2.7 could be old browser). There were a lot of security problems in https protocols, so it seems that servers doesn't allow to connect with unsecure encryptions and connection standards.

GET method with python3 Requests

I am trying to get some data from a page. I open Chrome's development tools and successfully find the data I wanted. It's in XHR with GET method (sorry I don't know how to descript it).Then I copy the params, headers, and put all these to requests.get() method. The response I get is totally different to what I saw on the development tools.
Here is my code
import requests
queryList={
"category":"summary",
"subcategory":"all",
"statsAccumulationType":"0",
"isCurrent":"true",
"playerId":None,
"teamIds":"825",
"matchId":"1103063",
"stageId":None,
"tournamentOptions":None,
"sortBy":None,
"sortAscending":None,
"age":None,
"ageComparisonType":None,
"appearances":None,
"appearancesComparisonType":None,
"field":None,
"nationality":None,
"positionOptions":None,
"timeOfTheGameEnd":None,
"timeOfTheGameStart":None,
"isMinApp":None,
"page":None,
"includeZeroValues":None,
"numberOfPlayersToPick":None,
}
header={
'modei-last-mode':'JL7BrhwmeqKfQpbWy6CpG/eDlC0gPRS2BCvKvImVEts=',
'Referer':'https://www.whoscored.com/Matches/1103063/LiveStatistics/Spain-La-Liga-2016-2017-Leganes-Real-Madrid',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
"x-requested-with":"XMLHttpRequest",
}
url='https://www.whoscored.com/StatisticsFeed/1/GetMatchCentrePlayerStatistics'
test=requests.get(url=url,params=queryList,headers=header)
print(test.text)
I follow this post below but it's already 2 years ago and I believe the structure is changed.
XHR request URL says does not exist when attempting to parse it's content

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Problems with cloud flare 403 and python - python

Related

Python Web Scrapping Error 403 even with header User Agent

What does this robot.txt mean?

Why does grequests.get() return None?

Python requests user agent not working

GET method with python3 Requests

Categories

Resources