Port number not showing in headers.host - python

I am making an HTTP request from the frontend, and I can see the port number in the Host field of the request headers in dev tools (e.g. xyz.com:1234). But using Python's requests module, the host only shows xyz.com.
How can I get the port number?

You do not need to create and add a Host header yourself when you make a request with the requests library; the header is filled in from the URL when the request is sent. You can still set one explicitly if you want: just provide the headers keyword argument, e.g. headers={'Host': 'xyz.com:1234'} if using your example above.
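As a rough illustration of why a port may be missing from Host: HTTP clients conventionally leave the port out of the header when it is the default for the scheme. The sketch below mimics that convention with the standard library only. The helper name host_header_for and the DEFAULT_PORTS table are assumptions made for this example; they are not part of requests.

```python
from urllib.parse import urlsplit

# Assumed defaults for this sketch; only the two schemes discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Return the Host value a client would conventionally send for url."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals are written in brackets inside a Host header.
        host = f"[{host}]"
    port = parts.port
    # The port is omitted when absent or equal to the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


print(host_header_for("http://xyz.com:1234/"))
print(host_header_for("https://xyz.com/"))
```

So if the URL you pass to requests uses the default port, seeing only xyz.com in Host is expected.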
Parsing a port number from a URL, a manual approach
Your question seems to be more tied to parsing a port number for a request, however, and for that an example should clear things up for you:
from urllib.parse import urlparse
import requests
def get_port(url: str) -> int:
    schema_ports = {'http': 80, 'https': 443}
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    return schema_ports.get(parsed_url.scheme, None)
ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url)
)
print(ports) # (8001, 80, 443)
This example makes three HTTP GET requests with the requests library. The URLs are already visible in this contrived example, but if you are working from a generic requests.models.Response object, you can get the request URL from its request.url attribute. When the URL does not specify a port explicitly, you need to infer a reasonable default from the scheme. The get_port definition above does this for two common schemes (HTTP and HTTPS).
Read about Python's standard library's urllib.parse module for more information.
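To see what urlparse exposes without making any network request, here is a small sketch. The helper name url_parts is made up for this illustration; the attributes it reads (scheme, hostname, port, netloc) are the ones urllib.parse provides.

```python
from urllib.parse import urlparse


def url_parts(url: str) -> dict:
    """Collect the pieces of a URL that matter for host/port handling."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "hostname": parsed.hostname,
        "port": parsed.port,      # None when the URL has no explicit port
        "netloc": parsed.netloc,  # host plus port exactly as written
    }


print(url_parts("http://xyz.com:1234/path"))
print(url_parts("https://xyz.com/path"))
```

Note that port is None for the second URL, which is the case where a default has to be inferred.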
A more automated approach, leaning on the standard library
The manual approach above shows how to think about this problem in general, but it does not scale easily to the many other schemes that may exist (ssh, gopher, etc.).
On POSIX systems, the /etc/services file maintains mappings for common service schemes to ports/protocols and optional descriptions, e.g.
http 80/udp www www-http # World Wide Web HTTP
http 80/tcp www www-http # World Wide Web HTTP
The getservbyname function in Python's socket library has a way to tap into this type of mapping:
>>> import socket
>>> socket.getservbyname('https')
443
>>> socket.getservbyname('http')
80
With this, we can refine my first example to avoid manually specifying mappings for common schemes:
import socket
from urllib.parse import urlparse
import requests
def get_port(url: str) -> int:
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    try:
        return socket.getservbyname(parsed_url.scheme)
    except OSError:
        return None
ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url)
)
print(ports) # (8001, 80, 443)
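One caveat with relying on the services database is that it may be missing or incomplete on some systems (for example, minimal containers). A possible variant, sketched below under that assumption, falls back to a small built-in table when the lookup fails. The names get_port_offline and FALLBACK_PORTS are hypothetical, and the function works on URL strings directly so it can be tried without making requests.

```python
import socket
from typing import Optional
from urllib.parse import urlparse

# Assumed fallback table for this sketch; extend as needed.
FALLBACK_PORTS = {"http": 80, "https": 443}


def get_port_offline(url: str) -> Optional[int]:
    """Return the explicit port, else a default inferred from the scheme."""
    parsed = urlparse(url)
    if parsed.port is not None:
        return parsed.port
    try:
        # Consults the system services database (e.g. /etc/services).
        return socket.getservbyname(parsed.scheme)
    except OSError:
        # Unknown scheme or no services database: use the built-in table.
        return FALLBACK_PORTS.get(parsed.scheme)


print(get_port_offline("http://localhost:8001"))
print(get_port_offline("https://example.com"))
```

The return type is Optional[int] here because an unknown scheme yields None.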

Related

How to set proxy while using urllib3.PoolManager in python

I am currently using the connection pool provided by urllib3 in Python, like this:
pool = urllib3.PoolManager(maxsize = 10)
resp = pool.request('GET', 'http://example.com')
content = resp.read()
resp.release_conn()
However, I don't know how to set a proxy while using this connection pool. I tried to set the proxy in the request, like pool.request('GET', 'http://example.com', proxies={'http': '123.123.123.123:8888'}), but it didn't work.
Can someone tell me how to set the proxy while using a connection pool?
Thanks~
There is an example for how to use a proxy with urllib3 in the Advanced Usage section of the documentation. I adapted it to fit your example:
import urllib3
proxy = urllib3.ProxyManager('http://123.123.123.123:8888/', maxsize=10)
resp = proxy.request('GET', 'http://example.com/')
content = resp.read()
# You don't actually need to release_conn() if you're reading the full response.
# This will be a harmless no-op:
resp.release_conn()
The ProxyManager behaves the same way as a PoolManager would.

How can I target a specific application server using the HTTP "Host" header (using GoLang)

The question:
Why can't I target a server using its IP address in the request URL and the hostname as the "Host" header using GoLang?
Why does the same thing work using python? (2.7.6 - urllib2)
Background:
I'm writing a systems test that will send HTTP requests to several specific application servers that I'm testing and inspect the results for correctness. Each application server has the same function and should return the same response data. These servers are grouped behind load balancers. These load balancers are then resolved by DNS and the traffic is forwarded to the backend servers as appropriate. In order to target each server independently (for the tests) I am using each server's IP address in the URL instead of the usual hostname and I'm setting the "Host" HTTP header to be the hostname that usually goes in the url. This is to make sure the SSL cert can decode the secure (HTTPS) request.
Current status:
I already have a python script that sends these test requests. Here's the basic idea of that script:
headers = {"Host": "<hostname>"} # this <hostname> is usually what would go in the URL on the next line
request = urllib2.Request('https://<ip-address>/path?query=123', headers=headers)
response = urllib2.urlopen(request)
# etc...
This code has been working fine for several months now. I've verified that it is indeed targeting the correct servers, based on IP address.
Goal:
I want to replicate this script in golang to make use of the concurrent capabilities of go. The reason I'm using go is that I'd like to send lots more requests at a time (using goroutines).
Problem:
Using the same technique as shown above (but in Go) I get the following error:
Get https://<ip-address>/path?query=123: x509: cannot validate certificate for <ip-address> because it doesn't contain any IP SANs
Here's an example of the kind of code I have written:
request, _ := http.NewRequest("GET", "https://<ip-address>/path?query=123", nil)
request.Header.Set("Host", "<hostname>")
client := &http.Client{}
response, err := client.Do(request)
// etc...
Again, why does the python code work while the GoLang code returns an error?
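As per the Python documentation for urllib2:
Warning: HTTPS requests do not do any verification of the server’s certificate.
In other words, the Python script works because it never checks the certificate, whereas Go verifies it by default and rejects it because the certificate does not list the IP address. To replicate the Python behavior in Go see: https://stackoverflow.com/a/12122718/216488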
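If skipping verification is not acceptable, one alternative is to connect to the IP address but ask the TLS layer to validate the certificate against the hostname. Below is a minimal Python 3 sketch of that idea using only the standard library. The function names fetch_from_ip and build_get_request are made up for this example, and the hand-written HTTP/1.1 request is deliberately simplistic; treat it as an illustration, not a replacement for a real HTTP client.

```python
import socket
import ssl


def build_get_request(hostname: str, path: str) -> bytes:
    """Build a bare-bones GET request that names the real host."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_from_ip(ip: str, hostname: str, path: str = "/",
                  port: int = 443, timeout: float = 10.0) -> bytes:
    """Connect to ip, but verify the certificate against hostname."""
    # The default context verifies the chain and checks the hostname.
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw_sock:
        # server_hostname drives both SNI and certificate name checking.
        with context.wrap_socket(raw_sock, server_hostname=hostname) as tls:
            tls.sendall(build_get_request(hostname, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


# Usage (needs network access; placeholders left as in the question):
# raw = fetch_from_ip("<ip-address>", "<hostname>", "/path?query=123")
```

The returned bytes contain the raw status line, headers, and body, so a real test harness would still need to parse them.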

Using urllib2 via proxy

I am trying to use urllib2 through a proxy; however, after trying just about every variation of passing my verification details using urllib2, I either get a request that hangs forever and returns nothing or I get 407 Errors. I can connect to the web fine using my browser which connects to a prox-pac and redirects accordingly; however, I can't seem to do anything via the command line curl, wget, urllib2 etc. even if I use the proxies that the prox-pac redirects to. I tried setting my proxy to all of the proxies from the pac-file using urllib2, none of which work.
My current script looks like this:
import urllib2 as url
proxy = url.ProxyHandler({'http': 'username:password#my.proxy:8080'})
auth = url.HTTPBasicAuthHandler()
opener = url.build_opener(proxy, auth, url.HTTPHandler)
url.install_opener(opener)
url.urlopen("http://www.google.com/")
which throws HTTP Error 407: Proxy Authentication Required and I also tried:
import urllib2 as url
handlePass = url.HTTPPasswordMgrWithDefaultRealm()
handlePass.add_password(None, "http://my.proxy:8080", "username", "password")
auth_handler = url.HTTPBasicAuthHandler(handlePass)
opener = url.build_opener(auth_handler)
url.install_opener(opener)
url.urlopen("http://www.google.com")
which hangs like curl or wget timing out.
What do I need to do to diagnose the problem? How is it possible that I can connect via my browser but not from the command line on the same computer using what would appear to be the same proxy and credentials?
Might it be something to do with the router? If so, how can it distinguish between browser HTTP requests and command line HTTP requests?
Frustrations like this are what drove me to use Requests. If you're doing significant amounts of work with urllib2, you really ought to check it out. For example, to do what you wish to do using Requests, you could write:
import requests
from requests.auth import HTTPProxyAuth
proxy = {'http': 'http://my.proxy:8080'}
auth = HTTPProxyAuth('username', 'password')
r = requests.get('http://www.google.com/', proxies=proxy, auth=auth)
print(r.text)
Or you could wrap it in a Session object and every request will automatically use the proxy information (plus it will store & handle cookies automatically!):
s = requests.Session()
s.proxies = proxy
s.auth = auth
r = s.get('http://www.google.com/')
print(r.text)
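For readers who want to stay with the standard library: one possible cause of the 407 in the question is that HTTPBasicAuthHandler only answers 401 challenges from the origin server, while a 407 comes from the proxy and is handled by ProxyBasicAuthHandler. The sketch below shows that wiring with Python 3's urllib.request (the successor to urllib2). The helper name build_proxy_opener is hypothetical, and this assumes the proxy accepts Basic authentication; proxies that require NTLM or similar schemes would need something else.

```python
import urllib.request


def build_proxy_opener(proxy_url: str, username: str, password: str):
    """Build an opener that routes via proxy_url and answers 407 challenges."""
    # Send both http and https traffic through the same proxy.
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    # Credentials are registered for the proxy itself, for any realm.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_url, username, password)
    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(proxy_handler, proxy_auth_handler)


opener = build_proxy_opener("http://my.proxy:8080", "username", "password")
# Usage (needs a reachable proxy):
# with opener.open("http://www.google.com/", timeout=10) as resp:
#     print(resp.status)
```

Passing a timeout to open also avoids the indefinite hang described in the question.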

Full control over HTTP headers in Python?

Is there any Python HTTP library that helps to imitate one of the popular web browsers and has HTTPS support? I would like to define the order of HTTP headers, the presence of each exact header, the order of cookie values: everything that relates to the "fingerprint" of a browser. We need that to test a specific web server.
httplib.HTTPConnection.request will take an OrderedDict for headers. Some headers are added automatically for protocol compliance; these are left out if you specify them in your supplied headers.
Take a look at the putheader and _send_request methods, which you could override if their behaviour didn't suit your purposes.
>>> import httplib
>>> from collections import OrderedDict
>>> h = OrderedDict([('X-A','a'),('X-B','b'),('X-C','c')])
>>> c = httplib.HTTPConnection('localhost')
>>> c.set_debuglevel(1)
>>> r = c.request('GET','/','',h)
send: 'GET / HTTP/1.1\r\nHost: localhost\r\nAccept-Encoding: identity\r\nX-A: a\r\nX-B: b\r\nX-C: c\r\n\r\n'
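In Python 3 the same module is called http.client, and putrequest accepts skip_host and skip_accept_encoding so that no header is added automatically. The sketch below uses those flags to emit headers in exactly the order given. The CapturingSocket class and the render_request helper are stand-ins invented for this illustration so the bytes can be inspected without a server; in real use you would let the connection open its own socket.

```python
import http.client


class CapturingSocket:
    """Stand-in socket that records what would be sent (sketch only)."""

    def __init__(self):
        self.sent = b""

    def sendall(self, data):
        self.sent += data


def render_request(host, path, headers):
    """Return the raw request bytes for headers given as (name, value) pairs."""
    conn = http.client.HTTPConnection(host)
    conn.sock = CapturingSocket()  # prevents a real connection attempt
    # Suppress the automatic Host and Accept-Encoding headers.
    conn.putrequest("GET", path, skip_host=True, skip_accept_encoding=True)
    for name, value in headers:
        conn.putheader(name, value)
    conn.endheaders()
    return conn.sock.sent


raw = render_request(
    "localhost", "/", [("Host", "localhost"), ("X-A", "a"), ("X-B", "b")]
)
print(raw)
```

Since nothing is added implicitly, you have to supply Host yourself, as the example does.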
Check out Requests, which is very easy to work with and has all you need.
Alternatively, you can drive the web browser itself from Python using Selenium.

python changing headers

How do I change my headers and request so that I appear as Firefox?
When I send a request to some servers like this:
import urllib
f = urllib.urlopen("rss feed")
they deny my request, saying my client doesn't have permission.
I get a reply, but the reply contains "your client doesn't have permission".
So how do I get around this and get the data?
http://vsbabu.org/mt/archives/2003/05/27/urllib2_setting_http_headers.html
If you want to use good old urllib instead of newer, fancier urllib2, then as urllib's docs say, and I quote,
For example, applications may want to specify a different User-Agent header than URLopener defines. This can be accomplished with the following code:
import urllib
class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"
urllib._urlopener = AppURLopener()
Of course, you'll want a version (aka User-Agent header) suitable for whatever version of Firefox (or w/ever else;-) you want to pretend you are;-).
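On Python 3, where urllib.urlopen and urllib2 have been merged into urllib.request, a similar effect can be had by passing a headers dict to Request. This is a minimal sketch: the helper name build_request, the example URL, and the Firefox-style User-Agent string are illustrative assumptions, so substitute whatever browser string you actually want to present.

```python
import urllib.request

# Illustrative Firefox-style value; replace with the string you need.
FIREFOX_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url: str, user_agent: str = FIREFOX_USER_AGENT):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://example.com/feed.xml")
# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# Usage (needs network access):
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```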
