On Powershell, I am currently performing this request, copied from the network tab on developer tools
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
Invoke-WebRequest -useBasicParsing -Uri "https://......?...."
-WebSession $session
-Headers #{
"Accept"="*/*"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9"
"Authorization"="Basic mzYw....="
"Referer"="https://......."
"sec-Fetch-Dest"="empty"
"sec-Fetch-Mode"=="cors"
"sec-Fetch-Site="same-origin"
"sec-ch-ua"="`""Chromium`";v=`"106`",`"Google Chrome`";v=`"106`",`"NotlA=Brand`";v=`"99`""
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
}
-ContentType "application/x-www-form-urlencoded"
which returns me the response 200 just fine.
However, when i tried to perform the same requests, with the headers config on python requests, I am getting a SSL proxy-related error (see SSL_verification wrong version number even with certifi verify).
Is proxy automatically configured on PowerShell requests? How can I find out what proxy are my requests currently routed to? Otherwise, how can I replicate 1:1 powershell requests to python requests?
I have tried running ipconfig /all command and using the Primary Dns Suffix field as proxy arguments in requests
requests.get(url, header = headers_in_powershell, proxies = { 'http': 'the_dns_suffix', 'https': 'the_dns_suffix' }
but the requests just gets stuck (waits with no response indefinitely).
For most commands, Powershell uses the system proxy by default (or they have a -Proxy switch to tell them where it is, but some don't and have to be told to use it.
From memory, Invoke-WebRequest can be problematical as (I think) it uses the .NET web client
Try adding this to the start of the PS script:
[System.Net.WebRequest]::DefaultWebProxy = [System.Net.WebRequest]::GetSystemWebProxy()
[System.Net.WebRequest]::DefaultWebProxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
Related
I am trying to send a Http Post request to a website with these headers :
headers = {
"content-type": "application/x-www-form-urlencoded; charset=UTF-8",
"cookie": "__gpi=UID=00000625243f2b12:T=1654153135:RT=1654342443:S=ALNI_MbdFxSgua2dONohDTz9bEGks8vnoQ; __gads=ID=05dae5d77dbc463f:T=1654153135:S=ALNI_MbLIzKIHhP022gtr7bRBqu9PSxNtQ; PHPSESSID=8a932c5bbe4d667513dfdc3a0051ed37",
"origin": "https://www.dcode.fr",
"pragma": "no-cache",
"referer": "https://www.dcode.fr/cipher-identifier",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36 OPR/87.0.4390.45",
"x-requested-with": "XMLHttpRequest"
}
At first it is working perfectly.
But after some time this stop working.I think because cookies expire.
Erroneous Output :
{"captcha":"<script>$.getScript('https:\/\/www.google.com\/recaptcha\/api.js').done(function( script, textStatus ) {\n $('#captcha').addClass('g-recaptcha').attr({'data-sitekey':'6LeoCVQaAAAAALADLNorGItVJxP40YUjD1Q3S0zp','data-callback':'recaptcha_callback'});\n });\n<\/script>\n<div id='captcha'><\/div>"}
Expected output :
{"caption":"dCode's analyzer suggests to investigate:","results":{"<a href=\"\/rot-13-cipher\" target=\"_blank\">ROT-13 Cipher<\/a>":"\u25a0\u25a0","<a href=\"\/base-58-cipher\" target=\"_blank\">Base 58<\/a>":"\u25a0","<a href=\"\/playfair-cipher\" target=\"_blank\">PlayFair Cipher<\/a>":"\u25a0","<a href=\"\/base-64-encoding\" target=\"_blank\">Base64 Coding<\/a>":"\u25a0","<a href=\"\/substitution-cipher\" target=\"_blank\">Substitution Cipher<\/a>":"\u25aa","<a href=\"\/rot-cipher\" target=\"_blank\">ROT Cipher<\/a>":"\u25aa","<a href=\"\/caesar-cipher\" target=\"_blank\">Caesar Cipher<\/a>":"\u25aa","<a href=\"\/shift-cipher\" target=\"_blank\">Shift Cipher<\/a>":"\u25aa","<a href=\"\/hill-cipher\" target=\"_blank\">Hill Cipher<\/a>":"\u25aa","<a href=\"\/affine-cipher\" target=\"_blank\">Affine Cipher<\/a>":"\u25aa","<a href=\"\/keyboard-change-cipher\" target=\"_blank\">Keyboard Change Cipher<\/a>":"\u25ab","<a href=\"\/vigenere-cipher\" target=\"_blank\">Vigenere Cipher<\/a>":"\u25ab","<a href=\"\/homophonic-cipher\" target=\"_blank\">Homophonic Cipher<\/a>":"\u25ab","<a href=\"\/autoclave-cipher\" target=\"_blank\">Autoclave Cipher<\/a>":"\u25ab","<a href=\"\/beaufort-cipher\" target=\"_blank\">Beaufort Cipher<\/a>":"\u25ab","<a href=\"\/burrows-wheeler-transform\" target=\"_blank\">Burrows\u2013Wheeler Transform<\/a>":"\u25ab"}
If I copy the cookie from capturing requests using browser's developer tools and paste it in code, Then it will work again for a short amount of time.
How can I byoass this recaptcha error ?
the website or api is running some kind of js authentication to block anything that is not a browser to bypass this you have 2 options
either reverse the js and understand how the cookies are constructed and replicate them in python (this is very hard and might take weeks of reverse engineering)
or you can create a selenium instance that visits the site and waits for the cookies to be present then simply passes them to requests you will have to do this each time captcha is presented (this is the easier option but this will make your script slower)
This is not necessarily because cookies are expired, take a look at your output, it's a recaptcha. You need to solve the captcha first.
In addition to that, make sure you are changing requests' default useragent.
Consider using requests.Session if you are not using it already or alternatively selenium if possible
I was trying to send http/https requests via proxy (socks5), but I can't understand if the problem is in my code or in the proxy.
I tried using this code and it gives me an error:
requests.exceptions.ConnectionError: SOCKSHTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSHTTPSConnection object at 0x000001B656AC9608>: Failed to establish a new connection: Connection closed unexpectedly'))
This is my code:
import requests
url = "https://www.google.com"
proxies = {
"http":"socks5://fsagsa:sacesf241_country-darwedafs_session-421dsafsa#x.xxx.xxx.xx:31112",
"https":"socks5://fsagsa:sacesf241_country-darwedafs_session-421dsafsa#x.xxx.xxx.xx:31112",
}
headers = {
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"Sec-Gpc": "1",
"Sec-Fetch-Site": "same-origin",
"Sec-Fetch-User": "?1",
"Accept-Encoding": "gzip, deflate, br",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Dest": "document",
"Accept-Language": "en-GB,en;q=0.9"
}
r = requests.get(url, headers = headers, proxies = proxies)
print(r)
Then, I checked the proxy with an online tool
The tool manages to send requests through the proxy. .
So the problem is in this code? I can't figure out what's wrong.
Edit (15/09/2021)
I added headers but the problem is still there.
Create a local server/mock to handle the request using pytest or some other testing framework with responses library to eliminate variables external to your application/script. I’m quite sure Google will reject requests with empty headers. Also, ensure you installed the correct dependencies to enable SOCKS proxy support in requests (python -m pip install requests[socks]). Furthermore, if you are making a remote request to connect to your proxy you must change socks5 to socks5h in your proxies dictionary.
References
pytest: https://docs.pytest.org/en/6.2.x/
responses: https://github.com/getsentry/responses
requests[socks]: https://docs.python-requests.org/en/master/user/advanced/#socks
In addition to basic HTTP proxies, Requests also supports proxies using the SOCKS protocol. This is an optional feature that requires that additional third-party libraries be installed before use.
You can get the dependencies for this feature from pip:
$ python -m pip install requests[socks]
Once you’ve installed those dependencies, using a SOCKS proxy is just as easy as using a HTTP one:
proxies = {
'http': 'socks5://user:pass#host:port',
'https': 'socks5://user:pass#host:port'
}
Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme.
1 - The target DOMAIN is https://www.dnb.com/
This website is blocking access to it from many countries around the world including mine (Algeria).
So the known solution is clear (use a proxy), which I did.
2 - Configuring the system proxy in the network configuration, and connecting to the website via (Google Chrome) works, also using Firefox with the proxy settings works fine.
3 - I came to my code to start the job
import requests
# 1. Initialize the proxy
proxy = "xxx.xxx.xxx.xxx:3128"
# 2. Setting the Headers (I cloned Firefox request headers)
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep - alive",
"Accept": "text/html, application/xhtml+xml, application/xml;q=0.9, image/webp, */*;q = 0.8",
"Upgrade - Insecure - Requests": "1",
"Host": "www.dnb.com",
"DNT": "1"
}
# 3. URL
URL = "https://www.dnb.com/business-directory/company-profiles.bicicletas_monark_s-a.7ad1f8788ea84850ceef11444c425a52.html"
# 4. Make a get request.
r = requests.get(URL, headers=headers, proxies={"https": proxy})
# Nothing in return and program keep executing (like infinite loop).
Note:
I know this keeps on waiting because the default timeout is set to None, but it is sure that the setup is working, and the requests library must return a response, using the timeout here can be to assess the reliability of the proxy as an example.
So, What the cause for this, it stuck (and I'm also), I'm getting the response and the correct HTML content with (Firefox, Chrome, Postman) with the same configuration.
I checked your code and ran it on my local machine. It seems the issue is with proxy. I added a public proxy and it is working. You can confirm it by adding a "timeout" argument to the requests.get function to some seconds. Also if the code working properly(even the response is 403) it means there is an issue with the proxy.
I'm using requests in Python 3.8 in order to connect to an Amazon web page.
I'm also using tor, in order to connect via SOCKS5.
This is the relevant piece of code:
session = requests.session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/44.0.2403.157 Safari/537.36'})
anon = {'http': "socks5://localhost:9050", 'https': "socks5://localhost:9050"}
r = session.get("myurl", proxies=anon)
print(r.content)
However, it doesn't work. It gives me the Amazon 503 error. What I need to know is if there is some method to bypass this problem or it depends on a sort of "ip blocking".
Thank you
I am using python requests to get the html page.
I am using the latest version of chrome in the user agent.
But the response tells that Please update your browser.
Here is my sample code.
import requests
url = 'https://www.choicehotels.com/alabama/mobile/quality-inn-hotels/al045/hotel-reviews/4'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36', 'content-type': 'application/xhtml+xml', 'referer': url}
url_response = s.get(url, headers=headers, timeout=15)
print url_response.text
I am using python 2.7 in a windows server.
But when I ran the same code in my local I got the required output.
Please update your browser is the answer.
You cannot do https with old browser (and request in python2.7 could be old browser). There were a lot of security problems in https protocols, so it seems that servers doesn't allow to connect with unsecure encryptions and connection standards.