How to mock proxied HTTP requests in Python

I'm trying to mock an HTTP request that uses an HTTP proxy:
import requests
import responses
import logging
import http.client as http_client

http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

proxies = {
    'http': 'http://proxy_host.com/'
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

responses.add(responses.GET, 'http://proxy_host.com/', body="Proxy request succeeded", status=200)
responses.add(responses.GET, 'http://actual_host.com/', body="Actual request succeeded", status=200)

response = requests.get('http://actual_host.com/', proxies=proxies, headers=headers)
I get this error message:
requests.exceptions.ProxyError: HTTPConnectionPool(host='proxy_host.com', port=80): Max retries exceeded with url: http://actual_host.com/ (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10ab62820>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known')))
I'm not sure whether there is anything else I'm supposed to add to my code to make this work. The documentation for the responses library doesn't mention proxying at all, so I'm not sure this is even possible.
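One thing worth noting: responses only intercepts calls made while its mock is active, typically via the @responses.activate decorator; calling responses.add alone leaves the real transport in place, which would explain the real connection attempt to proxy_host.com. As far as I can tell, responses patches requests at the adapter level, so when the mock is active the proxy is never contacted and matching happens against the target URL. A minimal sketch, assuming the usual decorator-based activation:

import requests
import responses

@responses.activate
def test_proxied_get():
    # While the mock is active no real (proxy) connection is attempted;
    # the registered URL is matched against the request's target URL.
    responses.add(responses.GET, 'http://actual_host.com/',
                  body="Actual request succeeded", status=200)
    resp = requests.get('http://actual_host.com/',
                        proxies={'http': 'http://proxy_host.com/'})
    assert resp.text == "Actual request succeeded"

test_proxied_get()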


Http - Tunnel connection failed: 403 Forbidden error with Python web scraping

I am trying to scrape an HTTP website and I am getting the error below when I try to read it.
HTTPSConnectionPool(host='proxyvipecc.nb.xxxx.com', port=83): Max retries exceeded with url: http://campanulaceae.myspecies.info/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))
Below is the code I have written for a similar website. I tried using urllib and a user-agent as well, and still hit the same issue.
import requests
from bs4 import BeautifulSoup

url = "http://campanulaceae.myspecies.info/"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'})
soup = BeautifulSoup(response.text, 'html.parser')
Can anyone help me with this issue? Thanks in advance.
You should try adding a proxy when requesting the URL:
proxyDict = {
    'http': "add http proxy",
    'https': "add https proxy"
}
requests.get(url, proxies=proxyDict)
You can find more information in the requests documentation on proxies.
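For reference, a minimal sketch of what the proxy URLs typically look like once filled in; the host, port and credentials below are placeholders, not real values:

import requests

proxyDict = {
    # replace user, password, proxy_host and 8080 with your proxy's actual values;
    # the user:password@ part is only needed for authenticated proxies
    "http": "http://user:password@proxy_host:8080",
    "https": "http://user:password@proxy_host:8080",
}
response = requests.get("http://campanulaceae.myspecies.info/", proxies=proxyDict)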
I tried using User-Agent: Defined and it worked for me.
import requests
from bs4 import BeautifulSoup

url = "http://campanulaceae.myspecies.info/"
headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Defined",
}
response = requests.get(url, headers=headers)
response.raise_for_status()
data = response.text
soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())
If you get an error that says "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser", it means you're not using the right parser: install the lxml module and use "lxml" instead of "html.parser" when you make the soup.
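That is, assuming lxml is installed (pip install lxml), the last two lines above become:

# BeautifulSoup picks up the installed lxml parser by name
soup = BeautifulSoup(data, "lxml")
print(soup.prettify())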

Max retries exceeded with url: / Caused by ProxyError

I wanted to get a proxy list from this web page: https://free-proxy-list.net/
but I am stuck on this error and don't know how to fix it.
requests.exceptions.ProxyError: HTTPSConnectionPool(host='free-proxy-list.net', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000278BFFA1EB0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')))
By the way, this is my related code:
import urllib
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent(cache=False)
header = {
    "User-Agent": str(ua.msie)
}
proxy = {
    "https": "http://95.66.151.101:8080"
}
urls = "https://free-proxy-list.net/"
res = requests.get(urls, proxies=proxy)
soup = BeautifulSoup(res.text, 'lxml')
I tried scraping other websites as well, but I realized that isn't the way to solve this.
You're using https as the key in your proxies dict while your proxy is an http proxy.
Proxies should always follow this format:
For an http proxy:
{"http": "http proxy"}
For an https proxy:
{"https": "https proxy"}
And for the User-Agent:
{"User-Agent": "Opera/9.80 (X11; Linux x86_64; U; de) Presto/2.2.15 Version/10.00"}
Example
import requests
requests.get("https://example.com", proxies={"http":"http://95.66.151.101:8080"}, headers={"User-Agent": "Opera/9.80 (X11; Linux x86_64; U; de) Presto/2.2.15 Version/10.00"})
The fake_useragent module you imported (from fake_useragent import UserAgent) is irrelevant and unnecessary here.
Extra
The error could also have happened because the proxy isn't valid or responded improperly.
If you are looking for free lists of proxies, consider checking out these sources (a quick way to test entries from them follows the list):
https://pastebin.com/raw/VJwVkqRT
https://proxyscrape.com/free-proxy-list
https://www.freeproxylists.net/
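If you want to check whether a proxy from one of those lists actually works before scraping with it, here is a rough sketch; the proxy address below is just the one from the question, and any dead proxy is simply reported as not working:

import requests

def proxy_works(proxy_url, test_url="https://free-proxy-list.net/", timeout=5):
    # A dead or misbehaving proxy raises ProxyError or ConnectTimeout,
    # both of which inherit from requests.exceptions.RequestException.
    try:
        r = requests.get(test_url,
                         proxies={"http": proxy_url, "https": proxy_url},
                         timeout=timeout)
        return r.status_code == 200
    except requests.exceptions.RequestException:
        return False

print(proxy_works("http://95.66.151.101:8080"))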
I have never seen the fake_useragent module and don't know what it's for, but I removed it. I also don't know why you added those header elements, but I don't believe they are necessary for the task you described. Looking at the HTML in your link, the proxies are in section id="list" --> div class="container" --> <tbody>. The code below gives all the elements in the mentioned area, including all the proxies. You can alter this if you want to extract more specific info.
import requests
from bs4 import BeautifulSoup
urls = "https://free-proxy-list.net/"
res = requests.get(urls)
soup = BeautifulSoup(res.text,"html.parser")
tbody = soup.find("tbody")
print(tbody.prettify())
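For example, to pull out just the ip:port pairs instead of the whole table, something like this should work; I'm assuming the first two cells of each row hold the IP address and the port, which is an assumption about the page layout rather than something guaranteed:

for row in tbody.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) >= 2:  # skip rows that don't look like proxy entries
        print(cells[0].get_text() + ":" + cells[1].get_text())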

How can I fix this error in my Python code?

I want to check the login status, so I made a program to check it:
import requests
import json
import datetime

headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
    "Connection": "keep-alive",
    "Content-Length": "50",
    "Content-Type": "application/json;charset=UTF-8",
    "Cookie": "_ga=GA1.2.290443894.1570500092; _gid=GA1.2.963761342.1579153496; JSESSIONID=A4B3165F23FBEA34B4BBE429D00F12DF",
    "Host": "marke.ai",
    "Origin": "http://marke",
    "Referer": "http://marke/event2/login",
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Mobile Safari/537.36",
}

url = "http://mark/api/users/login"
va = {"username": "seg", "password": "egkegn"}

c = requests.post(url, data=json.dumps(va), headers=headers)
if c.status_code != 200:
    print("error")
This works very well locally on Windows with PyCharm, but when I ran the code on Linux I got an error like this:
requests.exceptions.ProxyError: HTTPConnectionPool(host='marke', port=80):
Max retries exceeded with url: http://marke.ai/api/users/login (
Caused by ProxyError('Cannot connect to proxy.',
NewConnectionError('<urllib3.connection.HTTPConnection>: Failed to establish a new connection: [Errno 110] Connection timed out',)
)
)
So, what is the problem? If you know the solution, please teach me. Thank you!
According to your error, it seems you are behind a proxy, so you have to specify your proxy parameters when building the request.
Build your proxies as a dict following this format:
proxies = {
    "http": "http://my_proxy:my_port",
    "https": "https://my_proxy:my_port"
}
If you don't know your proxy parameters, you can get them using the urllib module:
import urllib.request

proxies = urllib.request.getproxies()
There's a proxy server configured on that Linux host, and it can't connect to it. Judging by the requests documentation, you may have an HTTP_PROXY or HTTPS_PROXY environment variable set.
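If that environment proxy is stale or unwanted rather than genuinely needed, another option is to tell requests to ignore the environment entirely. A minimal sketch using Session.trust_env, reusing url, va and headers as defined in the question:

import json
import requests

session = requests.Session()
session.trust_env = False  # don't pick up HTTP_PROXY/HTTPS_PROXY (or .netrc) from the environment

# url, va and headers as defined in the question
c = session.post(url, data=json.dumps(va), headers=headers)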
Modifying @Arkenys' answer. Please try this:
import urllib.request
proxies = urllib.request.getproxies()
# all other things
c = requests.post(url, data=json.dumps(va), headers=headers, proxies=proxies)

Requests.get does not work, Failed to establish a new connection

I am trying to do some web scraping. At first the code was working, but later it stopped. The code is:
import requests
import hashlib
from bs4 import BeautifulSoup

def sha512(x):
    m = hashlib.sha512(x.encode())
    return m.hexdigest()

session = requests.Session()
session.cookies["user-agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"

r = session.post("https://ringzer0ctf.com/login", data={"username":"myusername","password":"mypass"})
r = session.get("https://ringzeractf.com/challenges/13")

soup = BeautifulSoup(r.text, 'html.parser')
It gives an error like:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='ringzeractf.com', port=443): Max retries exceeded
with url: /challenges/13 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x04228490>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
Your URL in the GET request is wrong. Change ringzeractf to ringzer0ctf.
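In other words, the second request should read:

r = session.get("https://ringzer0ctf.com/challenges/13")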

Python's requests module returns a proxy error

I am sending a POST request with proxies but keep running into a proxy error.
I have already tried multiple Stack Overflow solutions for [WinError 10061] No connection could be made because the target machine actively refused it.
I tried changing system settings, verified that the remote server exists and is running, and confirmed that no HTTP_PROXY environment variable is set on the system.
import requests

proxy = {IP_ADDRESS:PORT} # proxy
proxy = {'https': 'https://' + proxy}

# standard header
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Referer": "https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"
}

# payload to be posted
payload = {
    "form_key": "1UGlG3F69LytBaMF",
    "sku": "adi-fw19-003",
    # above two values are dynamically populating the field; hardcoded the values here to help you replicate.
    "fullname": "myname",
    "email": "myemail@gmail.com",
    "address": "myaddress",
    "zipcode": "areacode",
    "city": "mycity",
    "country": "mycountry",
    "phone": "myphonenumber",
    "Size_raffle": "US_11"
}

r = requests.post(url, proxies=proxy, headers=header, verify=False, json=payload)
print(r.status_code)
Expected output: 200, alongside an email verification sent to my email address.
Actual output: requests.exceptions.ProxyError: HTTPSConnectionPool(host='tres-bien.com', port=443): Max retries exceeded with url: /adidas-yeezy-boost-350-v2-black-fu9006-fw19 (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError(': Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
Quite a few things are wrong here... (after looking at the raffle page you're trying to post to, I suspect it is https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19 based on the exception you posted).
1) I'm not sure what's going on with your first definition of proxy as a dict instead of a string. That said, it's probably good practice to use both http and https proxies; if your proxy can support https, then it should be able to support http as well.
proxy = {
    'http': 'http://{}:{}'.format(IP_ADDRESS, PORT),
    'https': 'https://{}:{}'.format(IP_ADDRESS, PORT)
}
2) The second issue is that the raffle you're trying to submit to takes URL-encoded form data, not JSON, so your request should be structured like:
r = requests.post(
    url=url,
    headers=headers,
    data=payload
)
3) That page has a ReCaptcha present, which is missing from your form payload. This isn't why your request is getting a connection error, but you're not going to successfully submit a form that has a ReCaptcha field without a proper token.
4) Finally, I suspect the root of your ProxyError is that you are trying to POST to the wrong URL. Looking at Chrome Inspector, you should be submitting this data to https://tres-bien.com/tbscatalog/manage/rafflepost/, whereas your exception output indicates you are POSTing to https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19.
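Putting 1), 2) and 4) together, the request would look roughly like the sketch below; note that the ReCaptcha token from 3) is still missing, so the form itself may still be rejected:

import requests

# header and payload as defined in the question; IP_ADDRESS and PORT are your proxy's values
url = "https://tres-bien.com/tbscatalog/manage/rafflepost/"
proxy = {
    'http': 'http://{}:{}'.format(IP_ADDRESS, PORT),
    'https': 'https://{}:{}'.format(IP_ADDRESS, PORT)
}
r = requests.post(url, proxies=proxy, headers=header, data=payload)
print(r.status_code)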
Good luck with the shoes.
