Requests is not honoring the proxies flag.
There is something I am missing about making a request through a proxy with the Python requests library.
If I enable the OS system proxy, it works. But if I set the proxy only through the requests module's proxies setting, the remote machine does not see the proxy configured in requests; it sees my real IP, as if no proxy were set.
The example below shows this effect. At the time of this post the proxy below is alive, but any working proxy should replicate the effect.
import requests

proxy = {
    'http:': 'https://143.208.200.26:7878',
    'https:': 'http://143.208.200.26:7878'
}

data = requests.get(url='http://ip-api.com/json', proxies=proxy).json()
print('Ip: %s\nCity: %s\nCountry: %s' % (data['query'], data['city'], data['country']))
I also tried changing the proxy_dict format:
proxy = {
    'http:': '143.208.200.26:7878',
    'https:': '143.208.200.26:7878'
}
But it still has no effect.
I am using:
- Windows 10
- Python 3.9.6
- urllib3 1.25.8
Many thanks in advance for any response to help sort this out.
OK, it is working now!
Credit for solving this goes to Olvin Rogh. Thanks, Olvin, for your help and for pointing out my problem: I was adding a colon ":" inside the keys.
This code is working now.
import json
import requests

PROXY = {'https': 'https://143.208.200.26:7878',
         'http': 'http://143.208.200.26:7878'}

with requests.Session() as session:
    session.proxies = PROXY
    r = session.get('http://ip-api.com/json')
    print(json.dumps(r.json(), indent=2))
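(For reference: requests looks up the request's URL scheme, 'http' or 'https', as a key in the proxies dict, so a key like 'http:' simply never matches; the request then goes out directly or through whatever system or environment proxy is configured, which is why enabling the OS proxy still appeared to work.)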
I have programmed an application in Python and implemented an auto update mechanism which just retrieves a text file from a cloud server and then checks the version number.
This works fine, but some subsidiaries have their proxies configured in a way that the cloud server can only be accessed through a proxy server.
Now, retrieving something from the web while using a proxy server is generally not a big deal.
I could just use something like this:
import requests
url = 'https://www.cloudserver.com/versionfile'
proxy = 'http://user:pass@proxyserver:port'
proxies = {'http': proxy, 'https': proxy}
requests.get(url, proxies=proxies)
This works wonderfully. The problem is that I don't want my customers to have to enter a username, password and proxy server. OK, I could get the username with getpass.getuser(), but not the password.
Another option that sounded promising was pypac:
>>> from pypac import PACSession
>>> session = PACSession()
>>> session.get('http://example.org')
<Response [407]>
Alas, it answers with 407 - Proxy Authentication Required.
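Looking at the pypac docs, it seems PACSession can also take explicit proxy credentials (treat the exact parameter as an assumption on my part), but that still means asking the user for a password, which is what I want to avoid:
from pypac import PACSession
from requests.auth import HTTPProxyAuth

# Sketch based on pypac's documented proxy_auth parameter; this answers the
# 407 challenge, but still requires knowing the user's password.
session = PACSession(proxy_auth=HTTPProxyAuth('user', 'password'))
response = session.get('http://example.org')
print(response.status_code)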
There are professional programs out there which just magically use the system proxy settings including username and password (or maybe a hashed version or a ticket of some form) and never have to ask the user about anything. It just works, e.g. Firefox seems to do it that way.
Is it possible to extract or reuse the system settings to access the web without asking the user for the credentials in Python?
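For what it's worth, the proxy address itself (though not the credentials) can usually be read from the system settings with the standard library, e.g.:
import urllib.request

# On Windows this reads the proxy configuration from the registry
# (Internet Settings); it does not include any username or password.
print(urllib.request.getproxies())
# e.g. {'http': 'http://proxyserver:port', 'https': 'http://proxyserver:port'}
The missing piece is the authentication itself, which browsers typically handle by speaking NTLM or Negotiate to the proxy using the logged-in Windows session rather than a stored password; the pypac plus requests-ntlm2 approach further down this page goes in that direction, though it still takes an explicit password.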
I'm trying to automate logging in to Costco.com to check some member-only prices.
I used the dev tools Network tab to identify the request that handles the logon, from which I inferred the POST URL and the parameters.
Code looks like:
import requests

s = requests.session()

payload = {'logonId': 'email@email.com',
           'logonPassword': 'mypassword'
           }

# get this value by googling "my user agent"
user_agent = {"User-Agent": "myuseragent"}

url = 'https://www.costco.com/Logon'
response = s.post(url, headers=user_agent, data=payload)
print(response.status_code)
When I run this, it just runs and runs and never returns anything. Waited 5 minutes and still running.
What am I doing wrong?
Maybe you should try making a GET request to get some cookies before making the POST request. If the POST request doesn't work, add a timeout so the script stops and you know it isn't working:
r = requests.get(url, verify=False, timeout=10)
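For example, applied to the code from the question (a sketch; credentials and user agent are placeholders):
import requests

payload = {'logonId': 'email@email.com',
           'logonPassword': 'mypassword'}

with requests.Session() as s:
    s.headers.update({'User-Agent': 'myuseragent'})
    # GET first so the server can set its session cookies
    s.get('https://www.costco.com/LogonForm', timeout=10)
    # then POST, with a timeout so a hang raises an exception instead of blocking forever
    r = s.post('https://www.costco.com/Logon', data=payload, timeout=10)
    print(r.status_code)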
This one is tough. Usually, in order to set the proper cookies, a GET request to the URL is required first. We can go directly to https://www.costco.com/LogonForm as long as we change the user agent from the default python-requests one. This is accomplished as follows:
import requests

agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/85.0.4183.102 Safari/537.36"
)

with requests.Session() as s:
    headers = {'user-agent': agent}
    s.headers.update(headers)
    logon = s.get('https://www.costco.com/LogonForm')
    # Saved the cookies in a variable, explanation below
    cks = s.cookies
The logon GET request is successful, i.e. status code 200! Taking a look at cks:
print(sorted([c.name for c in cks]))
['C_LOC',
'CriteoSessionUserId',
'JSESSIONID',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'_abck',
'ak_bmsc',
'akaas_AS01',
'bm_sz',
'client-zip-short']
Then, using the Network tab in Google Chrome's dev tools and clicking login yields the following form data for the login POST (place this below cks):
data = {'logonId': username,
        'logonPassword': password,
        'reLogonURL': 'LogonForm',
        'isPharmacy': 'false',
        'fromCheckout': '',
        'authToken': '-1002,5M9R2fZEDWOZ1d8MBwy40LOFIV0=',
        'URL': 'Lw=='}

login = s.post('https://www.costco.com/Logon', data=data, allow_redirects=True)
However, simply trying this makes the request just sit there and infinitely redirect.
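One way to see what is going on (not part of the original attempt, just a debugging aid) is to turn off automatic redirects and inspect where the server wants to send you:
login = s.post('https://www.costco.com/Logon', data=data, allow_redirects=False)
print(login.status_code, login.headers.get('Location'))
print(sorted(c.name for c in s.cookies))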
Using Burp Suite, I stepped into the post and found the POST request as made via the browser. This POST has many more cookies than were obtained in the initial GET request.
Quite a few more, in fact:
# cookies comes from the curl command copied out of Burp, converted to a Python requests call
sorted(cookies.keys())
['$JSESSIONID',
'AKA_A2',
'AMCVS_97B21CFE5329614E0A490D45%40AdobeOrg',
'AMCV_97B21CFE5329614E0A490D45%40AdobeOrg',
'C_LOC',
'CriteoSessionUserId',
'OptanonConsent',
'RT',
'WAREHOUSEDELIVERY_WHS',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'WRIgnore',
'WRUIDCD20200731',
'__CT_Data',
'_abck',
'_cs_c',
'_cs_cvars',
'_cs_id',
'_cs_s',
'_fbp',
'ajs_anonymous_id_2',
'ak_bmsc',
'akaas_AS01',
'at_check',
'bm_sz',
'client-zip-short',
'invCheckPostalCode',
'invCheckStateCode',
'mbox',
'rememberedLogonId',
's_cc',
's_sq',
'sto__count',
'sto__session']
Most of these look to be static; however, because there are so many, it's hard to tell which is which and what each is supposed to be. It's here that I myself get stuck, and I am actually really curious how this would be accomplished. In some of the cookie data I can also see some sort of IBM Commerce information, so I am linking Prevent Encryption (Krypto) Of Url Paramaters in IBM Commerce Server 6, as it's the only other relevant SO question pertaining even remotely to this.
Essentially, though, the steps would be to determine the proper cookies to pass for this POST (and then the proper cookies and info for the redirect!). I believe some of these are being set by JavaScript or something similar, since they are not in the GET response from the site. Sorry I can't be of more help here.
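A quick way to see the gap is to diff the cookie names from the Burp-captured browser request against what the requests session actually has (a small sketch using the cookies and cks variables from above):
# cookie names the browser sends that the requests session never received
missing = sorted(set(cookies.keys()) - {c.name for c in cks})
print(missing)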
If you absolutely need to log in, try using Selenium, as it simulates a browser. Otherwise, if you just want to check whether an item is in stock, this guide uses requests and doesn't need to be logged in: https://aryaboudaie.com/python/technical/educational/2020/07/05/using-python-to-buy-a-gift.html
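A minimal sketch of the Selenium route (the field names are taken from the POST payload above; the exact selectors on the live page are an assumption):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.costco.com/LogonForm')
driver.find_element(By.NAME, 'logonId').send_keys('email@email.com')
driver.find_element(By.NAME, 'logonPassword').send_keys('mypassword')
# the submit selector is a guess; adjust it to the real button on the page
driver.find_element(By.CSS_SELECTOR, "input[type='submit']").click()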
Appreciate your help here; thanks in advance.
My Problem:
I am using Python's Requests module for get/post requests to a Django REST API behind a work proxy. I am unable to get past the proxy and I encounter an error. I have summarised this below:
Using the following code (what I've tried):
s = requests.Session()
s.headers = {
    "User-Agent": [someGenericUserAgent]
}
s.trust_env = False

proxies = {
    'http': 'http://[domain]\[userName]:[password]@[proxy]:8080',
    'https': 'https://[domain]\[userName]:[password]@[proxy]:8080'
}

os.environ['NO_PROXY'] = [APIaddress]
os.environ['no_proxy'] = [APIaddress]

r = s.post(url=[APIaddress], proxies=proxies)
With this I get an error:
... OSError('Tunnel connection failed: 407 Proxy Authentication Required')))
Additional Context:
This is on a windows 10 machine.
Work uses an "automatic proxy setup" script (.pac); looking at the script, there are a number of proxies that are automatically assigned depending on the IP address of the machine. I have tried all of these proxies under [proxy] above, with the same error.
The above works when I am not on the work network and don't use the additional proxy settings (removing proxies=proxies), i.e. on my home network.
I have no issues with a get request via my browser via the proxy to the Django REST API view.
Things I am uncertain about:
I don't know if I am using the right [proxy]. Is there a way to verify this? I have tried [findMyProxy].com sites; using the IP addresses they list, it still doesn't work.
I don't know if I am using [domain]\[userName] correctly. Is a \ correct? My work does use a domain.
I'm certain it is not a requests issue, as trying pip install --proxy http://[domain]\[userName]:[password]@[proxy]:8080 someModule produces the same 407 error.
Any help appreciated.
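One thing worth ruling out first (a general gotcha, not from the original post): the backslash and any special characters in the credentials have to be percent-encoded before they go into a proxy URL. A sketch with hypothetical values:
from urllib.parse import quote

# hypothetical credentials; quote() percent-encodes the backslash and any
# special characters so the proxy URL parses correctly
user = quote('DOMAIN\\userName', safe='')
password = quote('p@ss:word', safe='')
proxy = 'http://{}:{}@proxyhost:8080'.format(user, password)
proxies = {'http': proxy, 'https': proxy}
That said, if the proxy only accepts NTLM (as it turned out below), basic credentials in the URL will still get a 407.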
How I came to the solution:
I used curl to establish a <200 response>; after a lot of trial and error, success was:
$ curl -U DOMAIN\USER:PW -v -x http://LOCATION_OF_PAC_FILE --proxy-ntlm www.google.com
Where -U is the domain, user name and password.
-v is verbose; it made debugging easier.
-x is the proxy, in my case the location to the .pac file. Curl automatically determines the proxy IP from the PAC. Requests does not do this by default (that I know of).
I used curl to determine that my proxy was using ntlm.
www.google.com as an external site to test the proxy auth.
NOTE: only one \ between domain and username.
Trying to make requests use NTLM, I found, was impossible by default, so I used requests-ntlm2 instead.
Using the PAC file through requests-ntlm2 did not work, so I used pypac to auto-discover the PAC file and then determine the proxy based on the URL.
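As an aside, pypac can also be used just to look up which proxy the PAC file returns for a given URL (a sketch based on pypac's documented ProxyResolver; treat the exact API as something to verify against the pypac docs):
from pypac import get_pac
from pypac.resolver import ProxyResolver

# get_pac() can auto-discover the PAC via WPAD, or be pointed at the file directly
pac = get_pac(url='http://LOCATION_OF_PAC_FILE')
resolver = ProxyResolver(pac)
print(resolver.get_proxy_for_requests('http://www.google.com'))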
The working code is as follows:
from pypac import PACSession
from requests_ntlm2 import (
    HttpNtlmAuth,
    HttpNtlmAdapter,
    NtlmCompatibility
)

username = 'DOMAIN\\USERNAME'
password = 'PW'

# Don't need the following thanks to pypac
# proxy_ip = 'PROXY_IP'
# proxy_port = "PORT"
# proxies = {
#     'http': 'http://{}:{}'.format(proxy_ip, proxy_port),
#     'https': 'http://{}:{}'.format(proxy_ip, proxy_port)
# }

ntlm_compatibility = NtlmCompatibility.NTLMv2_DEFAULT

# session = requests.Session()  <- replaced with PACSession()
session = PACSession()

# Mount the NTLM adapter on both schemes so the proxy's 407 challenge is answered
session.mount(
    'https://',
    HttpNtlmAdapter(
        username,
        password,
        ntlm_compatibility=ntlm_compatibility
    )
)
session.mount(
    'http://',
    HttpNtlmAdapter(
        username,
        password,
        ntlm_compatibility=ntlm_compatibility
    )
)
session.auth = HttpNtlmAuth(
    username,
    password,
    ntlm_compatibility=ntlm_compatibility
)

# Don't need the following thanks to pypac
# session.proxies = proxies

response = session.get('http://www.google.com')
I'm not sure how else to describe this. I'm trying to log into a website using the requests library with Python, but it doesn't seem to capture all the cookies from when I log in, and subsequent requests to the site go back to the login page.
The code I'm using is as follows (with redactions):
import requests

with requests.Session() as s:
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
Looking at the developer tools in Chrome, I can see the cookies the browser receives (screenshot omitted).
After checking r.cookies, it seems that only PHPSESSID was captured; there's no sign of the amember_nr cookie.
The value in PyCharm only shows:
{RequestsCookieJar: 1}<RequestsCookieJar[<Cookie PHPSESSID=kjlb0a33jm65o1sjh25ahb23j4 for .website.co.uk/>]>
Why does this code fail to save 'amember_nr' and is there any way to retrieve it?
SOLUTION:
It appears the only way I can get this code to work properly is using Selenium, selecting the elements on the page and automating the typing/clicking. The following code produces the desired result.
from seleniumrequests import Chrome
driver = Chrome()
driver.get('http://www.website.co.uk')
username = driver.find_element_by_xpath("//input[@name='amember_login']")
password = driver.find_element_by_xpath("//input[@name='amember_pass']")
username.send_keys("username")
password.send_keys("password")
driver.find_element_by_xpath("//input[@type='submit']").click()  # Page is logged in and all relevant cookies saved.
You can try this:
with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The get request will set the required cookies.
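A quick way to check whether that helps is to print the cookie names after each step (a sketch; if amember_nr is set by JavaScript after login, it still won't show up here):
import requests

with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    print(sorted(c.name for c in s.cookies))   # cookies set by the login page itself
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
    print(sorted(c.name for c in s.cookies))   # does amember_nr appear after the POST?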
FYI, I would use something like Burp Suite to capture ALL the data being sent to the server and sort out which headers etc. are required. Sometimes servers do referrer checking, set cookies via JavaScript or wonky scripting; I've even seen JavaScript obfuscation and blocking of user-agent strings not in a whitelist, etc. It's likely something in the headers that the server wants before it gives you the cookie.
Also, you can have Python use Burp as a proxy so you can see exactly what gets sent to the server and what comes back in the response.
https://github.com/freeload101/Python/blob/master/CS_HIDE/CS_HIDE.py (proxy support)
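For example, a sketch of pointing requests at Burp's default local listener (127.0.0.1:8080; verify=False is needed because Burp re-signs TLS traffic with its own CA certificate):
import requests

burp = {'http': 'http://127.0.0.1:8080',
        'https': 'http://127.0.0.1:8080'}

r = requests.get('https://www.website.co.uk/login', proxies=burp, verify=False)
print(r.status_code)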