How to stay connected in website using requests - python

I want to connect to a website through a proxy and stay connected for, say, 10 seconds.
My script:
import requests
url = 'http://WEBSITE.com/'
proxies ={'http': 'http://IP:PORT'}
s = requests.Session()
s.proxies.update(proxies)
s.get(url)
This is as far as I got. The script connects to the website, but I don't think it stays connected. What should I change so that it connects through the proxy and stays connected?

A Session reuses pooled connections, but nothing guarantees that the server or the proxy keeps a given connection open. Sending explicit keep-alive headers might help:
import requests
url = 'http://WEBSITE.com/'
proxies = {'http': 'http://IP:PORT'}
headers = {
    "connection": "keep-alive",
    "keep-alive": "timeout=10, max=1000"
}
s = requests.Session()
s.proxies.update(proxies)
s.get(url, headers=headers)
See the Connection and Keep-Alive headers :)
Edit: after reviewing the requests documentation, I learned that the Session object can also store headers. Here is a slightly better answer:
import requests
url = 'http://WEBSITE.com/'
proxies = {'http': 'http://IP:PORT'}
headers = {
    "connection": "keep-alive",
    "keep-alive": "timeout=10, max=1000"
}
s = requests.Session()
s.proxies.update(proxies)
s.headers.update(headers)
s.get(url)
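A Session normally reuses its pooled connection for follow-up requests, so one way to stay on the site for roughly 10 seconds is to keep sending requests through the same session. The following is only a minimal sketch under that assumption. The names poll_for, interval, clock, and sleep are hypothetical, and session can be anything with a get(url) method, such as requests.Session().

```python
import time


def poll_for(session, url, duration=10.0, interval=2.0,
             clock=time.monotonic, sleep=time.sleep):
    """Request `url` repeatedly through one session for about `duration` seconds.

    `session` is expected to behave like requests.Session (a .get(url) method).
    Returns the list of responses received.
    """
    responses = []
    deadline = clock() + duration
    while True:
        responses.append(session.get(url))
        # Stop once another wait would take us past the deadline.
        if clock() + interval > deadline:
            break
        sleep(interval)
    return responses
```

With requests this could be called as poll_for(s, url) after s.proxies.update(proxies). Whether the same TCP connection is actually reused still depends on the server and the proxy.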

Related

ProxyError, when trying to query prometheus behind the proxy

I am coding a module that needs to query Prometheus. Prometheus sits behind a proxy, and the module makes its queries from my local environment. My development environment is a virtual machine with the correct environment variables and DNS settings, and it can talk to the Prometheus instance behind the proxy, for example by accessing the front-end GUI.
I've tested my requests.get() call on the network behind the proxy, where it returns the correct values, so I am fairly sure the proxy is causing the problem. For some reason I can't get the program to respect the proxy dictionary I pass to requests. I am using Visual Studio Code and Python 3.9.7.
When I run the code at the bottom of this post I get a long chain of errors, the last of which is shown below. (I removed some values, such as the proxy servers, URL, and query, for privacy reasons; they are correct and in place in my code.)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))
Relevant Python Code:
import requests
import json
http_proxy = ''
https_proxy = ''
ftp_proxy = ''
proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
headers = {
    'Content-Type': 'application/json',
}
response = requests.get(url='' + '/api/v1/query', verify=False, headers=headers, proxies=proxies, params={'query': ''}).text
j = json.loads(response)
print(j)
Any help is greatly appreciated!
There are several open bugs regarding missing or broken no_proxy support in Python Requests (e.g. #4871, #5000).
The only solution for now, as far as I can see, is to code the missing logic in a function like the following one:
import json
import requests
import tldextract
http_proxy = "http://PROXY_ADDR:PROXY_PORT"
https_proxy = "http://PROXY_ADDR:PROXY_PORT"
no_proxy = "example.net,.example.net,127.0.0.1,localhost"
def get_url(url, headers=None, params=None):
    ext = tldextract.extract(url)
    # Hosts listed in no_proxy are contacted without the proxy.
    proxy_exceptions = no_proxy.replace(' ', '').split(',')
    if (ext.domain in proxy_exceptions
            or '.'.join([ext.subdomain, ext.domain, ext.suffix]) in proxy_exceptions
            or '.'.join([ext.domain, ext.suffix]) in proxy_exceptions):
        proxies = {}
    else:
        proxies = {
            "http": http_proxy,
            "https": https_proxy,
        }
    response = requests.get(url=url,
                            verify=False,
                            headers=headers,
                            params=params,
                            proxies=proxies)
    return response
url = "https://example.net/api/v1/query"
headers = {
    'Content-Type': 'application/json',
}
response = get_url(url, headers)
j = json.loads(response.text)
print(j)
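If installing tldextract is not an option, the exception check can be approximated with the standard library alone. The sketch below swaps the tldextract logic for plain suffix matching on the hostname. That is an assumption made for illustration, not how requests itself evaluates no_proxy, and bypasses_proxy is a hypothetical name.

```python
from urllib.parse import urlsplit


def bypasses_proxy(url, no_proxy):
    """Return True if the host of `url` matches an entry in `no_proxy`.

    `no_proxy` is a comma-separated string such as
    "example.net,.example.net,127.0.0.1,localhost". An entry matches the
    host itself and any of its subdomains.
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue  # ignore empty items such as a trailing comma
        if host == entry or host.endswith("." + entry):
            return True
    return False
```

Inside get_url, the result could decide between an empty proxies dict and the configured one.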

Python PUT call fails while curl call doesn't

Best wishes (first things first!)
I want to enable/disable a PoE port on my UniFi switch. For this I am using Python 3.9.1 (for the first time) with the following code:
import requests
import json
import sys
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
gateway = {"ip": "MYSERVER.COM", "port": "8443"}
headers = {"Accept": "application/json", "Content-Type": "application/json"}
login_url = f"https://{gateway['ip']}:{gateway['port']}/api/login"
login_data = {
    "username": "MYUSERNAME",
    "password": "MYPASSWORD"
}
session = requests.Session()
login_response = session.post(login_url, headers=headers, data=json.dumps(login_data), verify=False)
if (login_response.status_code == 200):
    api_url_portoverrides = 'api/s/default/rest/device/MYDEVICEID'
    poe_url = f"https://{gateway['ip']}:{gateway['port']}/{api_url_portoverrides}"
    # build json for port overrides
    json_poe_state_on = '{"port_overrides": [{"port_idx": 6, "portconf_id": "MYPROFILE1"}]}'
    json_poe_state_off = '{"port_overrides": [{"port_idx": 6, "portconf_id": "MYPROFILE2"}]}'
    post_response = session.put(poe_url, headers=headers, data=json.dumps(json_poe_state_off))
    print('Response HTTP Request {request}'.format(request=post_response.request ))
else:
    print("Login failed")
The login works (I get the 2 security cookies and checked them in Paw, a macOS REST API client, to confirm they were OK), but the second call, the PUT, returns OK and nothing happens.
Before doing this in Python, I tried all my calls in Paw first, and there it works. I tried everything in bash with curl, and there it works too. So I am a bit at a loss here.
Does anyone have an idea?
Thank you for your time!
Best regards!
Peter
Solved it! By looking at what was sent with Wireshark, I saw that the payload was different. The culprit was the json.dumps call: the payload was already a JSON string, so json.dumps encoded it a second time, wrapping it in quotes and putting a backslash in front of each double quote.
It works now!
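The effect is easy to reproduce without any network call. This short sketch reuses the payload from the question; the variable names body and double_encoded are only illustrative.

```python
import json

payload = {"port_overrides": [{"port_idx": 6, "portconf_id": "MYPROFILE2"}]}

# Correct: serialize the dict exactly once.
body = json.dumps(payload)

# The bug: `body` is already a JSON string, so a second dumps() wraps it
# in quotes and escapes every inner double quote with a backslash.
double_encoded = json.dumps(body)

print(body)
print(double_encoded)
```

Decoding double_encoded once gives back a string rather than a dict, which is why the controller ignored the request. Passing the dict through the json= argument of session.put is another way to avoid serializing by hand.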

Proxies attribute in Requests module is ignored

I'm building a small script to test certain proxies against an API.
It seems that the actual request isn't sent through the provided proxy. For example, the following request succeeds and I get a response from the API.
import requests
r = requests.post("https://someapi.com", data=request_data,
                  proxies={"http": "http://999.999.999.999:1212"}, timeout=5)
print(r.text)
How come I get a response even though the provided proxy is invalid?
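The request URL uses https, but the dictionary only defines a proxy for http, so the request goes out without a proxy. Define the proxy for every scheme you use, like this: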
import requests
pxy = "http://999.999.999.999:1212"
proxyDict = {
    'http': pxy,
    'https': pxy,
    'ftp': pxy,
    'SOCKS4': pxy
}
r = requests.post("https://someapi.com", data=request_data,
                  proxies=proxyDict, timeout=5)
print(r.text)
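A quick way to catch this kind of mismatch before sending anything is to compare the URL scheme with the keys of the proxies dict. This is a deliberately simplified sketch: has_proxy_for is a hypothetical helper, and it ignores host-specific keys and proxies picked up from the environment.

```python
from urllib.parse import urlsplit


def has_proxy_for(url, proxies):
    """Return True if `proxies` has an entry for the scheme of `url`."""
    return urlsplit(url).scheme.lower() in proxies


only_http = {"http": "http://999.999.999.999:1212"}
print(has_proxy_for("https://someapi.com", only_http))  # the https URL has no matching key
```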

How to use requests library without system-configured proxies

If I supply None or an empty dict to the proxies parameter, requests automatically falls back to the proxies configured for the operating system, as obtained through urllib.request.getproxies() (Python 3) or urllib.getproxies() (Python 2).
import requests
r = requests.get('http://google.com', proxies = {}) # or = None...
print(r.text)
Specifying proxies = { 'http': False } will even cause requests to hang completely for whatever weird reason.
So how do I direct requests to perform HTTP requests directly, without any proxy?
It turns out you have to use an empty string for each protocol that should use a direct connection:
r = requests.get('http://google.com', proxies = { 'http': '', ... })
Weird, but that's life.

Proxies with Python 'Requests' module

Just a short, simple one about the excellent Requests module for Python.
I can't seem to find in the documentation what the variable 'proxies' should contain. When I send it a dict with a standard "IP:PORT" value it rejected it asking for 2 values.
So, I guess (because this doesn't seem to be covered in the docs) that the first value is the ip and the second the port?
The docs mention this only:
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
So I tried this... what should I be doing?
proxy = { ip: port}
and should I convert these to some type before putting them in the dict?
r = requests.get(url,headers=headers,proxies=proxy)
The proxies dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests using the http, https, and ftp protocols:
http_proxy = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy = "ftp://10.10.1.10:3128"
proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
r = requests.get(url, headers=headers, proxies=proxies)
Deduced from the requests documentation:
Parameters:
method – method for the new Request object.
url – URL for the new Request object.
...
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
...
On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables:
export HTTP_PROXY=10.10.1.10:3128
export HTTPS_PROXY=10.10.1.11:1080
export FTP_PROXY=10.10.1.10:3128
On Windows:
set http_proxy=10.10.1.10:3128
set https_proxy=10.10.1.11:1080
set ftp_proxy=10.10.1.10:3128
You can refer to the proxy documentation here.
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "https://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
To use HTTP Basic Auth with your proxy, use the http://user:password@host.com/ syntax:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"
}
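Credentials that contain characters such as @ or : would break a proxy URL of the form user:password@host if pasted in literally. A small sketch of building the URL with percent-encoded credentials follows; build_proxy_url is a hypothetical helper, not part of requests.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build scheme://user:password@host:port/ with credentials percent-encoded."""
    return "{}://{}:{}@{}:{}/".format(
        scheme,
        quote(user, safe=""),      # safe="" also encodes "/" and ":"
        quote(password, safe=""),
        host,
        port,
    )


proxies = {"http": build_proxy_url("user", "pass", "10.10.1.10", 3128)}
print(proxies["http"])
```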
I have found that urllib has some really good code to pick up the system's proxy settings and they happen to be in the correct form to use directly. You can use this like:
import urllib.request
...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())
It works really well and urllib knows about getting Mac OS X and Windows settings as well.
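To see what urllib picks up, the environment-only variant can be tried in isolation. In this sketch the two proxy addresses are example values set just for the demonstration; getproxies_environment() reads only the *_proxy environment variables, while getproxies() can additionally fall back to the OS settings on macOS and Windows.

```python
import os
import urllib.request

# Example values only; a real environment would already define these.
os.environ["http_proxy"] = "http://10.10.1.10:3128"
os.environ["https_proxy"] = "http://10.10.1.10:1080"

# Returns a dict keyed by scheme, in the form requests accepts for `proxies`.
env_proxies = urllib.request.getproxies_environment()
print(env_proxies["http"])
print(env_proxies["https"])
```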
The accepted answer was a good start for me, but I kept getting the following error:
AssertionError: Not supported proxy scheme None
The fix was to specify the scheme (http://) in the proxy URL, like this:
http_proxy = "http://194.62.145.248:8080"
https_proxy = "https://194.62.145.248:8080"
ftp_proxy = "ftp://10.10.1.10:3128"
proxyDict = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
I'd be interested as to why the original works for some people but not me.
Edit: I see the main answer is now updated to reflect this :)
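If the proxy addresses come from a list in plain IP:PORT form, the scheme can be added in one place before they reach requests. This is a small sketch; with_scheme is a hypothetical helper, and defaulting to http is an assumption.

```python
def with_scheme(proxy, default_scheme="http"):
    """Prefix `proxy` with a scheme if it does not already have one."""
    if "://" in proxy:
        return proxy
    return "{}://{}".format(default_scheme, proxy)


proxy_dict = {
    "http": with_scheme("194.62.145.248:8080"),
    "https": with_scheme("https://194.62.145.248:8080"),  # left unchanged
}
print(proxy_dict)
```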
If you'd like to persist cookies and session data, you'd best do it like this:
import requests
proxies = {
    'http': 'http://user:pass@10.10.1.0:3128',
    'https': 'https://user:pass@10.10.1.0:3128',
}
# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies
# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
8 years late. But I like:
import os
import requests
os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['NO_PROXY'] = os.environ['no_proxy'] = '127.0.0.1,localhost,.local'
r = requests.get('https://example.com') # , verify=False
The documentation gives a very clear example of proxy usage:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
What isn't documented, however, is the fact that you can even configure proxies for individual URLs, even if the scheme is the same!
This comes in handy when you want to use different proxies for different websites you wish to scrape.
proxies = {
    'http://example.org': 'http://10.10.1.10:3128',
    'http://something.test': 'http://10.10.1.10:1080',
}
requests.get('http://something.test/some/url', proxies=proxies)
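The lookup can be pictured as trying the most specific key first. The sketch below is a simplified model of that idea and not the library's own code. The name pick_proxy and the exact key order (scheme plus host, then scheme, then "all") are assumptions to check against the requests version in use.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Return the proxy for `url`, preferring host-specific keys, or None."""
    parts = urlsplit(url)
    candidates = [parts.scheme, "all"]
    if parts.hostname:
        # Host-specific keys take priority over scheme-wide ones.
        candidates = [
            parts.scheme + "://" + parts.hostname,
            parts.scheme,
            "all://" + parts.hostname,
            "all",
        ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None
```

With the dictionary above plus a plain 'http' entry, a URL on something.test would get its own proxy and any other http URL would get the plain entry.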
Additionally, requests.get essentially uses requests.Session under the hood, so if you need more control, use it directly:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')
I use it to set a fallback (a default proxy) that handles all traffic that doesn't match the schemes/URLs specified in the dictionary:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.setdefault('http', 'http://127.0.0.1:9009')
session.proxies.update(proxies)
session.get('http://example.org')
I just made a proxy grabber that can also connect through the grabbed proxy without any input.
Here it is:
# Import modules
from termcolor import colored
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import os
import sys
import time
# Proxy grab
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(options=options)
driver.get("https://www.sslproxies.org/")
tbody = driver.find_element(By.TAG_NAME, "tbody")
cell = tbody.find_elements(By.TAG_NAME, "tr")
for column in cell:
    column = column.text.split(" ")
    print(colored(column[0]+":"+column[1],'yellow'))
driver.quit()
print("")
os.system('clear')
os.system('cls')
# Proxy connection
print(colored('Getting Proxies from graber...','green'))
time.sleep(2)
os.system('clear')
os.system('cls')
# The last row read above is the proxy that gets used.
# The target URL is https, so the proxy must be set for that scheme too.
address = "http://" + column[0] + ":" + column[1]
proxy = {"http": address, "https": address}
url = 'https://mobile.facebook.com/login'
r = requests.get(url, proxies=proxy)
print("")
print(colored('Connecting using proxy' ,'green'))
print("")
sts = r.status_code
Here is my basic class in Python for the requests module, with some proxy configuration and a stopwatch:
import requests
import time
class BaseCheck():
    def __init__(self, url):
        self.http_proxy = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy = "http://user:pw@proxy:8080"
        self.proxyDict = {
            "http": self.http_proxy,
            "https": self.https_proxy,
            "ftp": self.ftp_proxy
        }
        self.url = url
    # Helper that runs once while the class body executes; it fills the module-level timer table.
    def makearr(tsteps):
        global stemps
        global steps
        stemps = {}
        for step in tsteps:
            stemps[step] = {'start': 0, 'end': 0}
        steps = tsteps
    makearr(['init', 'check'])
    def starttime(typ=""):
        for stemp in stemps:
            if typ == "":
                stemps[stemp]['start'] = time.time()
            else:
                stemps[stemp][typ] = time.time()
    starttime()
    def __str__(self):
        return str(self.url)
    def getrequests(self):
        g = requests.get(self.url, proxies=self.proxyDict)
        print(g.status_code)
        print(g.content)
        print(self.url)
        stemps['init']['end'] = time.time()
        # Elapsed time since the timers were started
        x = stemps['init']['end'] - stemps['init']['start']
        print(x)
test = BaseCheck(url='http://google.com')
test.getrequests()
It’s a bit late but here is a wrapper class that simplifies scraping proxies and then making an http POST or GET:
ProxyRequests
https://github.com/rootVIII/proxy_requests
I have tested the following code and it works. You need to use HTTPProxyAuth.
import requests
from requests.auth import HTTPProxyAuth
USE_PROXY = True
proxy_user = "aaa"
proxy_password = "bbb"
http_proxy = "http://your_proxy_server:8080"
https_proxy = "http://your_proxy_server:8080"
proxies = {
    "http": http_proxy,
    "https": https_proxy
}
def test(name):
    print(f'Hi, {name}')
    # Create the session and set the proxies.
    session = requests.Session()
    if USE_PROXY:
        session.trust_env = False
        session.proxies = proxies
        session.auth = HTTPProxyAuth(proxy_user, proxy_password)
    r = session.get('https://www.stackoverflow.com')
    print(r.status_code)
if __name__ == '__main__':
    test('aaa')
Here is some code that fetches proxies from https://free-proxy-list.net and stores them in a file compatible with tools like "Elite Proxy Switcher" (format IP:PORT):
## PROXY_UPDATER - get free proxies from https://free-proxy-list.net/
from lxml.html import fromstring
import requests
from itertools import cycle
import traceback
import re
###################### FIND PROXIES #########################################
def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url)
    parser = fromstring(response.text)
    proxies = set()
    for i in parser.xpath('//tbody/tr')[:299]:  # 299 proxies max
        proxy = ":".join([i.xpath('.//td[1]/text()')[0],
                          i.xpath('.//td[2]/text()')[0]])
        proxies.add(proxy)
    return proxies
###################### write to file in format IP:PORT ######################
try:
    proxies = get_proxies()
    with open('proxy_list.txt', 'w') as f:
        for proxy in proxies:
            f.write(proxy + '\n')
    print("DONE")
except Exception:
    # Show what went wrong instead of hiding it.
    traceback.print_exc()
    print("MAJOR ERROR")
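Once the file exists, its IP:PORT lines still have to be turned into the dict form that requests expects. The sketch below shows one possible way to do that and to rotate through the entries with itertools.cycle. parse_proxy_lines is a hypothetical helper, the sample addresses are placeholders, and using http:// for both schemes is an assumption.

```python
from itertools import cycle


def parse_proxy_lines(lines):
    """Turn 'IP:PORT' lines into proxies dicts usable with requests."""
    result = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank and malformed rows
        address = "http://" + line
        result.append({"http": address, "https": address})
    return result


# Placeholder data standing in for the contents of proxy_list.txt.
sample = ["10.10.1.10:3128\n", "\n", "10.10.1.11:1080\n"]
pool = cycle(parse_proxy_lines(sample))
first = next(pool)
second = next(pool)
third = next(pool)  # wraps around to the first entry again
```

Each value taken from pool can be passed as the proxies argument of a request, moving on to the next one when a proxy fails.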
