I am trying to make a request through a SOCKS5 proxy server over HTTPS, but it fails or returns an empty string. I am using the PySocks library.
Here is my example:
import socks

WEB_SITE_PROXY_CHECK_URL = "whatismyipaddress.com"
REQUEST_SCHEMA = "https://"
host_url = REQUEST_SCHEMA + WEB_SITE_PROXY_CHECK_URL

s = socks.socksocket()
s.set_proxy(socks.SOCKS5, "proxy_host", 1080)  # my SOCKS5 proxy details
s.connect((WEB_SITE_PROXY_CHECK_URL, 443))
request = ("GET / HTTP/1.1\r\n"
           "Host: " + WEB_SITE_PROXY_CHECK_URL + "\r\n"
           "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
           "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11\r\n\r\n")
s.send(request.encode())
response = s.recv(4096)
print(response)
But it doesn't work; it prints an empty string as the response.
Is there any way to make an HTTPS request through a SOCKS5 proxy in Python?
Thanks
As of requests version 2.10.0, released on 2016-04-29, requests supports SOCKS.
It requires PySocks, which can be installed with pip install pysocks.
import requests

host_url = 'https://example.com'

# Fill in your own proxy's details
proxies = {'http': 'socks5://user:pass@host:port',
           'https': 'socks5://user:pass@host:port'}

# Define headers if you need them
headers = {}

response = requests.get(host_url, headers=headers, proxies=proxies)
Beware: when using a SOCKS proxy, requests will make HTTP requests with the full URL (e.g., GET example.com HTTP/1.1 rather than GET / HTTP/1.1), and this behavior may cause problems.
Related
I am currently building a proxy rotator in Python. Everything is running fine so far, except that despite the proxies, the tracker pages return my own IP.
I have already read through dozens of posts in this forum. It often says "something is wrong with the proxy in this case".
I have a long list of proxies (about 600) which I test with my method, and when I scraped them I made sure they were marked either "elite" or "anonymous" before I put them on this list.
So can it be that the majority of free proxies are "junk" when it comes to anonymity, or am I fundamentally doing something wrong?
And is there basically a way to find out how anonymous a proxy actually is?
Python 3.10.
import time
import requests

headers = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
proxi = {"http": ""}
prox_ping_ready = ["173.219.112.85:8080",
                   "43.132.148.107:2080",
                   "216.176.187.99:8886",
                   "193.108.21.234:1234",
                   "151.80.120.192:3128",
                   "139.255.10.234:8080",
                   "120.24.33.141:8000",
                   "12.88.29.66:9080",
                   "47.241.66.249:1081",
                   "51.79.205.165:8080",
                   "63.250.53.181:3128",
                   "160.3.168.70:8080"]
ipTracker = ["wtfismyip.com/text", "api.ip.sb/ip", "ipecho.net/plain", "ifconfig.co/ip"]

for element in prox_ping_ready:
    for choice in ipTracker:
        try:
            proxi["http"] = "http://" + element
            ips = requests.get(f'https://{choice}', proxies=proxi, timeout=1, headers=headers).text
            print(f'My IP address is: {ips}', choice)
        except Exception as e:
            print("Error:", e)
        time.sleep(3)
Output(example):
My IP address is: 89.13.9.135
api.ip.sb/ip
My IP address is: 89.13.9.135
wtfismyip.com/text
My IP address is: 89.13.9.135
ifconfig.co/ip
(My own address every time.)
You only set your proxy for HTTP traffic; you need to include a key for HTTPS traffic as well.
proxi["http"] = "http://" + element
proxi["https"] = "http://" + element # or "https://" + element, depends on the proxy
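To make this harder to get wrong, both entries can be built from one endpoint string with a small helper. This is a minimal sketch (the helper name and the sample endpoint from the list above are just illustrations, not part of any library):

```python
def build_proxies(endpoint, scheme="http"):
    # Route BOTH http and https traffic through the same proxy.
    # `endpoint` is a "host:port" string; the right scheme depends
    # on what the proxy itself speaks.
    proxy_url = f"{scheme}://{endpoint}"
    return {"http": proxy_url, "https": proxy_url}

proxi = build_proxies("173.219.112.85:8080")
print(proxi)  # {'http': 'http://173.219.112.85:8080', 'https': 'http://173.219.112.85:8080'}

# Then, as in the loop above:
# requests.get("https://wtfismyip.com/text", proxies=proxi, timeout=5)
```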
As James mentioned, you should also use an https proxy:
proxi["https"] = "http://" + element
If you are getting "max retries exceeded with url", it most probably means that the proxy is not working or is too slow and overloaded, so you might increase your timeout.
You can verify whether your proxy is working by setting it as an environment variable. I took one from your list:
import os
os.environ["http_proxy"] = "http://173.219.112.85:8080"
os.environ["https_proxy"] = "http://173.219.112.85:8080"
and then run your code without proxy settings by changing your request to
ips = requests.get('https://wtfismyip.com/text', headers=headers).text
On PowerShell, I am currently performing this request, copied from the Network tab in developer tools:
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
Invoke-WebRequest -UseBasicParsing -Uri "https://......?...." `
-WebSession $session `
-Headers @{
"Accept"="*/*"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9"
"Authorization"="Basic mzYw....="
"Referer"="https://......."
"sec-Fetch-Dest"="empty"
"sec-Fetch-Mode"="cors"
"sec-Fetch-Site"="same-origin"
"sec-ch-ua"="`"Chromium`";v=`"106`", `"Google Chrome`";v=`"106`", `"NotlA=Brand`";v=`"99`""
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
} `
-ContentType "application/x-www-form-urlencoded"
which returns the response 200 just fine.
However, when I try to perform the same request, with the same headers, using Python requests, I get an SSL proxy-related error (see SSL_verification wrong version number even with certifi verify).
Is a proxy automatically configured for PowerShell requests? How can I find out which proxy my requests are currently routed through? Otherwise, how can I replicate the PowerShell request 1:1 in Python requests?
I have tried running the ipconfig /all command and using the Primary Dns Suffix field as the proxy argument in requests:
requests.get(url, headers=headers_in_powershell, proxies={'http': 'the_dns_suffix', 'https': 'the_dns_suffix'})
but the request just gets stuck (waits with no response indefinitely).
For most commands, PowerShell uses the system proxy by default (or they have a -Proxy switch to tell them where it is), but some don't and have to be told to use it.
From memory, Invoke-WebRequest can be problematic as (I think) it uses the .NET web client.
Try adding this to the start of the PS script:
[System.Net.WebRequest]::DefaultWebProxy = [System.Net.WebRequest]::GetSystemWebProxy()
[System.Net.WebRequest]::DefaultWebProxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
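On the Python side, the same system proxy settings that .NET picks up can be read with the standard library and handed to requests. A minimal sketch (the actual requests call is commented out because the URL and headers come from your own script):

```python
import urllib.request

# getproxies() reads the platform's proxy configuration (the Windows
# registry / Internet Options on Windows, or the http_proxy/https_proxy
# environment variables on other systems).
system_proxies = urllib.request.getproxies()
print(system_proxies)  # e.g. {'http': 'http://proxyhost:8080', ...} or {}

# Hand the discovered settings straight to requests:
# requests.get(url, headers=headers_in_powershell, proxies=system_proxies)
```

If this prints an empty dict, no system proxy is configured, and the SSL error likely has a different cause.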
I was trying to send HTTP/HTTPS requests via a SOCKS5 proxy, but I can't tell whether the problem is in my code or in the proxy.
I tried this code and it gives me an error:
requests.exceptions.ConnectionError: SOCKSHTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSHTTPSConnection object at 0x000001B656AC9608>: Failed to establish a new connection: Connection closed unexpectedly'))
This is my code:
import requests
url = "https://www.google.com"
proxies = {
"http": "socks5://fsagsa:sacesf241_country-darwedafs_session-421dsafsa@x.xxx.xxx.xx:31112",
"https": "socks5://fsagsa:sacesf241_country-darwedafs_session-421dsafsa@x.xxx.xxx.xx:31112",
}
headers = {
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"Sec-Gpc": "1",
"Sec-Fetch-Site": "same-origin",
"Sec-Fetch-User": "?1",
"Accept-Encoding": "gzip, deflate, br",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Dest": "document",
"Accept-Language": "en-GB,en;q=0.9"
}
r = requests.get(url, headers = headers, proxies = proxies)
print(r)
Then I checked the proxy with an online tool, and the tool manages to send requests through it.
So is the problem in this code? I can't figure out what's wrong.
Edit (15/09/2021)
I added headers but the problem is still there.
Create a local server/mock to handle the request, using pytest or some other testing framework with the responses library, to eliminate variables external to your application/script. I'm quite sure Google will reject requests with empty headers. Also, ensure you installed the correct dependencies to enable SOCKS proxy support in requests (python -m pip install requests[socks]). Furthermore, if you are making a remote request to connect to your proxy, you must change socks5 to socks5h in your proxies dictionary.
References
pytest: https://docs.pytest.org/en/6.2.x/
responses: https://github.com/getsentry/responses
requests[socks]: https://docs.python-requests.org/en/master/user/advanced/#socks
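The local-mock idea can also be sketched with the standard library alone, so the request path is testable without any external service or extra dependency. Everything below is illustrative: the handler class is made up, and the echoed address is a documentation-range IP, not a real one.

```python
import http.server
import threading
import urllib.request

class MockIPHandler(http.server.BaseHTTPRequestHandler):
    # Pretend to be an IP-echo service like wtfismyip.com/text.
    def do_GET(self):
        body = b"203.0.113.7"  # documentation-range address, not real
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet
        pass

# Port 0 asks the OS for any free port; run the server in the background.
server = http.server.HTTPServer(("127.0.0.1", 0), MockIPHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/"
with urllib.request.urlopen(url) as resp:
    echoed = resp.read().decode()
print(echoed)  # 203.0.113.7
server.shutdown()
```

Once requests against the mock behave as expected, any remaining failure is in the proxy itself, not in your code.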
In addition to basic HTTP proxies, Requests also supports proxies using the SOCKS protocol. This is an optional feature that requires that additional third-party libraries be installed before use.
You can get the dependencies for this feature from pip:
$ python -m pip install requests[socks]
Once you’ve installed those dependencies, using a SOCKS proxy is just as easy as using a HTTP one:
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme.
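In other words, the only difference between client-side and proxy-side DNS resolution is the scheme prefix. A tiny sketch with the same placeholder credentials as above:

```python
# Placeholder credentials/host - fill in your own proxy details.
proxies_local_dns = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port',
}

# socks5h: the proxy server resolves hostnames instead of the client.
proxies_remote_dns = {
    scheme: url.replace('socks5://', 'socks5h://', 1)
    for scheme, url in proxies_local_dns.items()
}
print(proxies_remote_dns['https'])  # socks5h://user:pass@host:port
```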
My code works when I make the request from my local machine.
When I try to make the request from AWS EC2, I get the following error:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www1.xyz.com', port=443): Read timed out. (read timeout=20)
I checked the URL and that was not the issue. I then tried to visit the page using the hidemyass web proxy with the location set to that of the AWS EC2 machine; it got a 404.
The code:
# Dummy URLs
header = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
url = 'https://www1.xyz.com/iKeys.jsp?symbol={}&date=31DEC2020'.format(
symbol)
raw_page = requests.get(url, timeout=10, headers=header).text
I have tried setting the proxies to another IP address in the request, which I found online:
proxies = {
"http": "http://125.99.100.193",
"https": "https://125.99.100.193",}
raw_page = requests.get(url, timeout=10, headers=header, proxies=proxies).text
Still got the same error.
1- Do I need to specify the port in proxies? Could this be causing the error when proxy is set?
2- What could be a solution for this?
Thanks
I have a web link as below:
https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp
I use the below code to collect the data but get an error:
requests.exceptions.ConnectionError: ('Connection aborted.',
OSError("(10060, 'WSAETIMEDOUT')",))
My Code:
from requests import Session
import lxml.html

expiry_list = []
try:
    session = Session()
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}
    session.headers.update(headers)
    url = 'https://www1.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp'
    params = {'symbolCode': 9999, 'symbol': 'BANKNIFTY', 'instrument': 'OPTIDX', 'date': '-', 'segmentLink': 17}
    response = session.get(url, params=params)
    soup = lxml.html.fromstring(response.text)
    expiry_list = soup.xpath('//form[@id="ocForm"]//option/text()')
    expiry_list.remove(expiry_list[0])
except Exception as error:
    print("Error:", error)
print("Expiry_Date =", expiry_list)
It's working perfectly on my local machine but giving an error on an Amazon EC2 instance. Do any settings need to be changed to resolve the request timeout error?
AWS houses lots of botnets, so spam blacklists frequently list AWS IPs, and your EC2 instance is probably part of an IP block that is blacklisted. You might be able to verify this by putting your public EC2 IP into https://mxtoolbox.com/.
I would try verifying whether you can even make a request via curl from the command line: curl -v {URL}. If that times out, then I bet your IP is blocked by the remote server's firewall rules.
Since your home IP has access, you can try to set up a VPN on your network, have the EC2 instance connect to your VPN, and then retry your Python script. It should work then, but it will be as if you're making the request from your home (so don't do anything stupid). Most routers allow you to set up an OpenVPN or PPTP VPN right in the admin UI. I suspect that once your EC2's IP changes, you'll trick the upstream server and be able to scrape.
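The curl check can be approximated in Python with a bare TCP probe; if even this times out from the EC2 box while succeeding locally, the block is at the network level rather than in your requests code. A sketch (the helper name is made up, and the commented call just reuses the hostname from the question):

```python
import socket

def reachable(host, port=443, timeout=5):
    # Rough connectivity probe: can this machine open a TCP connection
    # to the target at all? A timeout here suggests a firewall/IP block
    # rather than an application-level problem like headers or SSL.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run on both machines and compare:
# print(reachable("www1.nseindia.com"))
```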