Trying to bypass Cloudflare's wall, I wanted to get hold of the cf_clearance cookie...
I tried cfscrape; link to the package: [link].
import cfscrape

cookie_value, user_agent = cfscrape.get_cookie_string("https://somesite.com")
request = "GET / HTTP/1.1\r\n"
request += "Cookie: %s\r\nUser-Agent: %s\r\n" % (cookie_value, user_agent)
print(request)
This should return the cf_clearance and __cfduid cookies, but in my case it raises:
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://somesite.com
I also tried cf_clearance to pass the Cloudflare challenge. This is the code I tried:
from playwright.sync_api import sync_playwright
from cf_clearance import sync_cf_retry, sync_stealth
import requests

# without cf_clearance, the Cloudflare challenge fails
proxies = {
    "all": "socks5://localhost:7890"
}
res = requests.get('https://somesite.com', proxies=proxies)
if '<title>Please Wait... | Cloudflare</title>' in res.text:
    print("cf challenge fail")

# get cf_clearance
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, proxy={"server": "socks5://localhost:7890"})
    page = browser.new_page()
    sync_stealth(page, pure=True)
    page.goto('https://somesite.com')
    res = sync_cf_retry(page)
    if res:
        cookies = page.context.cookies()
        for cookie in cookies:
            if cookie.get('name') == 'cf_clearance':
                cf_clearance_value = cookie.get('value')
                print(cf_clearance_value)
        ua = page.evaluate('() => {return navigator.userAgent}')
        print(ua)
    else:
        print("cf challenge fail")
    browser.close()

# to reuse cf_clearance, the request must come from the same IP and User-Agent
headers = {"user-agent": ua}
cookies = {"cf_clearance": cf_clearance_value}
res = requests.get('https://somesite.com', proxies=proxies, headers=headers, cookies=cookies)
if '<title>Please Wait... | Cloudflare</title>' not in res.text:
    print("cf challenge success")
The above code is from here.
I tried it with and without proxies.
With proxies:
Output:
Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it
Without proxies:
Output:
cf challenge fail
...
NotImplementedError: Encountered recaptcha. Check whether your proxy is an elite proxy.
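A side note on the WinError 10061 above: "actively refused" means nothing is listening on the proxy port at all, so it can be worth checking the local SOCKS port before blaming Cloudflare. A minimal sketch, assuming a local proxy client (Tor, Clash, etc.) on a port you choose:

```python
import socket

def local_proxy_listening(host="127.0.0.1", port=7890, timeout=2):
    """Return True if something accepts TCP connections on host:port.

    WinError 10061 ("actively refused") means no process is listening,
    so this check fails before any proxy traffic is even attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(local_proxy_listening(port=7890))
```

If this prints False, the proxy client simply is not running (or listens on a different port), and no amount of Cloudflare tooling will help until that is fixed.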
I am trying to create a bot to read cookies, but I'm failing to do so. What am I doing wrong?
import urllib.request
import http.cookiejar

URL = 'https://roblox.com'

def extract_cookies():
    cookie_jar = http.cookiejar.CookieJar()
    url_opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
    url_opener.open(URL)
    print(URL)
    for cookie in cookie_jar:
        print("[Cookie Name = %s] [Cookie Value = %s]" % (cookie.name, cookie.value))

if __name__ == '__main__':
    extract_cookies()
It's a permanent redirect: https://roblox.com redirects to https://www.roblox.com. This is why you get an HTTP 308 status code. Note the difference in www.
The server tells you where to go in the HTTP response:
HTTP GET https://roblox.com/
--
HTTP/2 308 Permanent Redirect
location: https://www.roblox.com/
So update your URL to https://www.roblox.com.
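If you want to keep the bare URL around, a small hypothetical helper can rewrite the apex host to the www host before opening it, so the request is never answered with a 308 in the first place:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    # Hypothetical helper: https://roblox.com 308-redirects to
    # https://www.roblox.com, so rewrite the apex host up front.
    parts = urlsplit(url)
    netloc = parts.netloc
    if netloc == 'roblox.com':
        netloc = 'www.' + netloc
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(canonicalize('https://roblox.com'))  # https://www.roblox.com
```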
I am connected to the web via a VPN and I would like to connect to a news site to grab, well, news. For this a library exists: FinNews. And this is the code:
import FinNews as fn
cnbc_feed = fn.CNBC(topics=['finance', 'earnings'])
print(cnbc_feed.get_news())
print(cnbc_feed.possible_topics())
Now, because of the VPN, the connection won't work and it throws:
<urlopen error [WinError 10061] No connection could be made because
the target machine actively refused it ( client - server )
So I started separately to understand how to make a connection work, and it does work (it prints "Connected"):
import urllib.request
from urllib.error import HTTPError, URLError

proxy = "user:pw@proxy:port"
proxies = {"http": "http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers = {'User-agent': 'Mozilla/5.0'}

try:
    proxy_support = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(proxy_support, urllib.request.HTTPHandler(debuglevel=1))
    urllib.request.install_opener(opener)
    req = urllib.request.Request(url, None, headers)
    html = urllib.request.urlopen(req).read()
    #print(html)
    print("Connected")
except (HTTPError, URLError) as err:
    print("No internet connection.")
Now I have figured out how to access the news and how to make a connection via the VPN, but I can't bring the two together. How do I grab the news via the library through the VPN? I am fairly new to Python, so I guess I don't fully get the logic yet.
EDIT: I tried to combine it with feedparser, based on furas' hint:
import urllib.request
from urllib.error import HTTPError, URLError

import feedparser

proxy = "user:pw@proxy:port"
proxies = {"http": "http://%s" % proxy}

#url = "http://www.google.com/search?q=test"
#url = "http://www.reddit.com/r/python/.rss"
url = "https://timesofindia.indiatimes.com/rssfeedstopstories.cms"
headers = {'User-agent': 'Mozilla/5.0'}

try:
    proxy_support = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(proxy_support, urllib.request.HTTPHandler(debuglevel=1))
    urllib.request.install_opener(opener)
    req = urllib.request.Request(url, None, headers)
    html = urllib.request.urlopen(req).read()
    #print(html)
    #print("Connected")
    feed = feedparser.parse(html)
    #print(feed['feed']['link'])
    print("Number of RSS posts :", len(feed.entries))
    entry = feed.entries[1]
    print("Post Title :", entry.title)
except (HTTPError, URLError) as err:
    print("No internet connection.")
But I get the same error... this is a big nut to crack.
May I ask for your advice? Thank you :)
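One way to bridge the two that might be worth trying: urllib honors the standard proxy environment variables, and FinNews appears to fetch its feeds through the same urllib machinery, so exporting the proxy before any request is made should route the library's traffic through it without touching its code. A sketch (the proxy address is a placeholder):

```python
import os
import urllib.request

# Placeholder credentials/host - substitute your real proxy here.
os.environ['http_proxy'] = 'http://user:pw@proxyhost:8080'
os.environ['https_proxy'] = 'http://user:pw@proxyhost:8080'

# urllib (and libraries built on top of it) now picks the proxy up
# automatically for every urlopen call.
print(urllib.request.getproxies())
```

After this, calling `fn.CNBC(...).get_news()` in the same process should go through the proxy, assuming FinNews does not install its own opener.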
I am using Python Requests + the cfscrape module to bypass Cloudflare-enabled websites, but sometimes it does not validate the URL properly and returns a 403 status header.
I am also using a Tor proxy to find the blocked URLs.
import sys
import requests
import cfscrape

# Create the session and set the proxies.
proxies = {'http': 'socks5://127.0.0.1:9050',
           'https': 'socks5://127.0.0.1:9050'}

# Start the session
#s = requests.Session()
s = cfscrape.create_scraper()  # https://github.com/Anorov/cloudflare-scrape/issues/103
# Proxy connection
s.proxies = proxies

# Bypass a Cloudflare-enabled website - https://support.cloudflare.com/hc/en-us/articles/203306930-Does-Cloudflare-block-Tor-
scraper = cfscrape.create_scraper(sess=s, delay=10)

try:
    # user input
    LINK = input('Enter a URL: ')
    response = scraper.get(LINK)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error - maybe the URL is not valid or it can't bypass them")
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
except requests.RequestException as e:
    print("OOPS!! General Error (enter a valid URL - add http/https in front of the URL)")
except (KeyboardInterrupt, SystemExit):
    print("Ok ok, quitting")
    sys.exit(1)
else:
    if response.history:
        print("URL was redirected")
        for resp in response.history:
            print(resp.status_code, resp.url)
        print("Final destination:")
        print(response.status_code, response.url)
    else:
        print(response.status_code, response.url + " - current live and active URL")
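One thing worth ruling out with socks5:// proxies: requests only understands the socks5 scheme when the PySocks extra is installed (`pip install requests[socks]`); without it, every request fails before ever reaching Tor. A quick guard that makes no assumptions about your environment:

```python
# socks5:// proxy URLs require PySocks; without it requests raises
# InvalidSchema ("Missing dependencies for SOCKS support").
try:
    import socks  # provided by the PySocks package
    have_socks = True
except ImportError:
    have_socks = False

print("PySocks available:", have_socks)
```

If this prints False, install the extra before debugging anything Cloudflare-related.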
I wrote a simple Python script to make a request to http://www.lagado.com/proxy-test using the requests module.
This website tells you whether the request came through a proxy or not. According to the website, my request is not going through a proxy and is in fact coming from my own IP address.
Here is the code:
import time

import bs4
import requests

proxiesLocal = {
    'https': proxy
}
headers = RandomHeaders.LoadHeader()  # RandomHeaders is my own helper module
url = "http://www.lagado.com/proxy-test"

res = ''
while res == '':
    try:
        res = requests.get(url, headers=headers, proxies=proxiesLocal)
        proxyTest = bs4.BeautifulSoup(res.text, "lxml")
        items = proxyTest.find_all("p")
        print(len(items))
        for item in items:
            print(item.text)
        quit()
    except:
        print('sleeping')
        time.sleep(5)
        continue
Assuming that proxy is a variable of type string that stores the address of the proxy, what am I doing wrong?
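For what it's worth, one common cause of exactly this symptom: requests selects a proxy by the URL's scheme, and the dictionary above only has an 'https' key while the test URL is plain http://, so the request falls back to a direct connection. A sketch with a placeholder proxy address:

```python
proxy = "http://1.2.3.4:8080"  # placeholder proxy address

# Cover both schemes so http:// URLs are proxied too; with only an
# 'https' key, requests sends http:// traffic directly from your IP.
proxiesLocal = {
    'http': proxy,
    'https': proxy,
}

# requests.get("http://www.lagado.com/proxy-test", proxies=proxiesLocal)
# would now route through the proxy.
print(sorted(proxiesLocal))
```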
I have followed this tutorial but I still fail to get output. Below is my code in view.py:
def index(request):
    #html = "a"
    #url = requests.get("https://www.python.org/")
    #page = urllib.request.urlopen(url)
    #soup = BeautifulSoup(page.read())
    #soup = url.content
    #urllib3.disable_warnings()
    #requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    #url = url.content
    #default_headers = make_headers(basic_auth='myusername:mypassword')
    #http = ProxyManager("https://myproxy.com:8080/", headers=default_headers)
    r = urllib.request.urlopen('http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts').read()
    soup = BeautifulSoup(r, 'html.parser')
    url = type(soup)
    context = {"result": url}
    return render(request, 'index.html', context)
Output:
urlopen error [WinError 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or
established connection failed because connected host has failed to respond
If you are sitting behind a firewall or similar you might have to specify a proxy for the request to get through.
See below example using the requests library.
import requests
proxies = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
r = requests.get('http://example.org', proxies=proxies)
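If several requests share the same proxy, it may be cleaner to attach the mapping to a Session once rather than repeating proxies= on every call:

```python
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# Every request made through this session now uses the proxies.
s = requests.Session()
s.proxies.update(proxies)
print(s.proxies['http'])
```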