Is there any Python HTTP library that helps to imitate one of the popular web browsers and has HTTPS support? I would like to define the order of HTTP headers, the presence of each exact header, the order of cookie values -- everything that relates to the "fingerprint" of a browser. We need this to test a specific web server.
httplib's HTTPConnection.request will take an OrderedDict for headers. Some headers (such as Host and Accept-Encoding) are added automatically for protocol compliance, but they are left out if you include them in your supplied headers.
Take a look at the putheader and _send_request methods, which you can override if their behaviour doesn't suit your purposes.
>>> import httplib
>>> from collections import OrderedDict
>>> h = OrderedDict([('X-A', 'a'), ('X-B', 'b'), ('X-C', 'c')])
>>> c = httplib.HTTPConnection('localhost')
>>> c.set_debuglevel(1)
>>> c.request('GET', '/', '', h)
send: 'GET / HTTP/1.1\r\nHost: localhost\r\nAccept-Encoding: identity\r\nX-A: a\r\nX-B: b\r\nX-C: c\r\n\r\n'
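In Python 3 the same module is http.client (with http.client.HTTPSConnection for HTTPS), and since Python 3.7 plain dicts preserve insertion order, so an OrderedDict is no longer required. A rough equivalent, assuming a server on localhost:
import http.client
headers = {'X-A': 'a', 'X-B': 'b', 'X-C': 'c'}  # insertion order is preserved in 3.7+
conn = http.client.HTTPConnection('localhost')
conn.set_debuglevel(1)  # prints the raw request line and headers
conn.request('GET', '/', headers=headers)
response = conn.getresponse()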
Check out Requests, which is very easy to work with and has all you need.
Alternatively, you can drive a web browser itself from Python using Selenium.
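A minimal sketch of the Selenium route, assuming Firefox and geckodriver are installed (the URL is a placeholder):
from selenium import webdriver
driver = webdriver.Firefox()  # a real browser, so the fingerprint is the genuine article
driver.get('https://example.com/')
print(driver.page_source[:200])
driver.quit()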
I am making an HTTP request from the frontend, and I can see the port number in the Host field of the request headers in dev tools (e.g. xyz.com:1234). But using Python's requests module, the host shows only xyz.com.
How can I get the port number?
The requests library does not itself create a Host header when it prepares a request (the lower-level HTTP machinery adds one at send time), which is why you don't see it among the prepared headers. You can add a Host header explicitly if you want: just provide the headers keyword argument, e.g. headers={'Host': 'xyz.com:1234'} using your example above.
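For example, a minimal sketch (xyz.com:1234 is the hypothetical host from your question):
import requests
# Setting Host explicitly makes the port visible in the outgoing header.
r = requests.get('http://xyz.com:1234/', headers={'Host': 'xyz.com:1234'})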
Parsing a port number from a URL, a manual approach
Your question seems to be more about parsing the port number from a request URL, however, and for that an example should clear things up:
from urllib.parse import urlparse
from typing import Optional

import requests

def get_port(url: str) -> Optional[int]:
    schema_ports = {'http': 80, 'https': 443}
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    # No explicit port in the URL: fall back to the scheme's default.
    return schema_ports.get(parsed_url.scheme)

ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url),
)
print(ports)  # (8001, 80, 443)
In this example, three HTTP GET requests are made with the requests library. Although in this contrived example you already know the request URL, if you are working from a generic requests.models.Response object you can get the request URL from its request.url attribute. When no port is specified explicitly, you then need to infer a reasonable default. The get_port definition above does this for two common schemes (HTTP and HTTPS).
Read about Python's standard library's urllib.parse module for more information.
A more automated approach, leaning on the standard library
The manual approach above shows how to think about this problem in a generic sense, but it does not scale easily to the many other common schemes that exist (ssh, gopher, etc.).
On POSIX systems, the /etc/services file maintains mappings for common service schemes to ports/protocols and optional descriptions, e.g.
http 80/udp www www-http # World Wide Web HTTP
http 80/tcp www www-http # World Wide Web HTTP
The getservbyname function in Python's socket module taps into this mapping:
>>> import socket
>>> socket.getservbyname('https')
443
>>> socket.getservbyname('http')
80
With this, we can refine my first example to avoid manually specifying mappings for common schemes:
import socket
from urllib.parse import urlparse
from typing import Optional

import requests

def get_port(url: str) -> Optional[int]:
    parsed_url = urlparse(url)
    if parsed_url.port:
        return parsed_url.port
    try:
        # Look the scheme up in the system services database (/etc/services).
        return socket.getservbyname(parsed_url.scheme)
    except OSError:
        return None

ports = (
    get_port(requests.get('http://localhost:8001').request.url),
    get_port(requests.get('http://google.com').request.url),
    get_port(requests.get('https://google.com').request.url),
)
print(ports)  # (8001, 80, 443)
I am trying to make an HTTP GET API call to one of my servers, which supports HTTP basic authentication using a base64-encoded API key. So basically I want to add my authorization header, in base64 encoding, to my request.
The one method of authorization I know of is:
>>> import requests
>>> r = requests.get('https://test.com/test-API-Gateway/v0/deployments', auth=('user', 'password'), verify=False).text
>>> print r
{"statusCode":401,"statusMsg":Unauthorized,"result":[]}
But my server does not accept this, since it does not take an id and password for authentication; rather, it needs the base64-encoded header. Can you please tell me how to achieve this?
Thanks in advance.
The Python Requests library does allow you to add custom headers. You should be able to create the appropriate header (with your base64 encoding) and pass it as a parameter, like so:
import requests
url = 'https://test.com/test-API-Gateway/v0/deployments'
myheaders = {'my-header-param': 'somedata'}
r = requests.get(url, headers=myheaders, verify=False).text
The related documentation can be found here.
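For HTTP basic authentication specifically, the conventional header is 'Authorization: Basic <base64 of user:key>'. A minimal sketch, assuming your server follows that convention (the credentials are placeholders; in Python 3, b64encode needs bytes):
import base64
import requests

url = 'https://test.com/test-API-Gateway/v0/deployments'
token = base64.b64encode(b'user:my-api-key').decode('ascii')  # placeholder credentials
headers = {'Authorization': 'Basic ' + token}
r = requests.get(url, headers=headers, verify=False)
print(r.text)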
I'm trying to build a website using web.py which is able to search the mobile.de database (mobile.de is a German car-sales website). For this I need to use the mobile.de API, making a GET request like the following (this is an example from the API docs):
GET /1.0.0/ad/search?exteriorColor=BLACK&modificationTime.min=2012-05-04T18:13:51.0Z HTTP/1.0
Host: services.mobile.de
Authorization: QWxhZGluOnNlc2FtIG9wZW4=
Accept: application/xml
(The authorization needs to be my username and password joined together with a colon and then encoded using Base64.)
So I use urllib2 to do the request as follows:
>>> import base64
>>> import urllib2
>>> headers = {'Authorization': base64.b64encode('myusername:mypassw'), 'Accept': 'application/xml'}
>>> req = urllib2.Request('http://services.mobile.de/1.0.0/ad/search?exteriorColor=BLACK', headers=headers)
And from here I am unsure how to proceed. req appears to be an instance with some methods to get the information in it. But did it actually send the request? And if so, where can I get the response?
All tips are welcome!
You need to pass req to urllib2.urlopen() and call .read() on the result to actually call the URL and get the response.
But you'd be better off using the requests library, which is much easier to use.
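Continuing the question's code, a rough sketch of both routes (note that requests' auth= sends a standard 'Basic '-prefixed header; if the API really wants the bare base64 token as in its docs, keep setting the header by hand):
# urllib2: actually send the request and read the body
response = urllib2.urlopen(req)
print response.read()

# or with requests:
import requests
r = requests.get('http://services.mobile.de/1.0.0/ad/search?exteriorColor=BLACK',
                 auth=('myusername', 'mypassw'),
                 headers={'Accept': 'application/xml'})
print r.text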
How do I change my headers and request so that I appear as Firefox when making a request to some servers?
import urllib
f = urllib.urlopen("rss feed")
They deny my request: I get a reply, but it contains "your client doesn't have permission...".
So how do I get around this and get the data?
http://vsbabu.org/mt/archives/2003/05/27/urllib2_setting_http_headers.html
If you want to use good old urllib instead of newer, fancier urllib2, then as urllib's docs say, and I quote,
For example, applications may want to specify a different User-Agent header than URLopener defines. This can be accomplished with the following code:
import urllib

class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"  # this string is sent as the User-Agent header

urllib._urlopener = AppURLopener()
Of course, you'll want a version (aka User-Agent header) suitable for whatever version of Firefox (or w/ever else;-) you want to pretend you are;-).
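After that assignment, plain urllib calls pick up the custom User-Agent; for example (the URL is a placeholder):
f = urllib.urlopen("http://example.com/feed")  # sent with User-Agent: App/1.7
print f.read()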
I am using urllib.urlretrieve in Python to download websites. Some websites seem not to want me to download them unless I send a proper referrer from their own site. Does anybody know of a way to set a referrer in one of Python's libraries, or an external one?
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
adapted from http://docs.python.org/library/urllib2.html
urllib makes it hard to send arbitrary headers with the request; you could use urllib2, which lets you build and send a Request object with arbitrary headers (including of course the -- alas, sadly misspelled;-) -- Referer). It doesn't offer urlretrieve, but it's easy to just urlopen as you wish and copy the resulting file-like object to disk if you want (directly, or e.g. via shutil functions), as sketched below.
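A rough sketch of that urlretrieve replacement (the URLs and filename are placeholders):
import shutil
import urllib2

req = urllib2.Request('http://www.example.com/', headers={'Referer': 'http://www.python.org/'})
# Copy the response body straight to disk, much as urlretrieve would.
with open('page.html', 'wb') as out:
    shutil.copyfileobj(urllib2.urlopen(req), out)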
Also, using urllib2 with build_opener you can do this:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('Referer', 'http://www.python.org/')]
opener.open('http://www.example.com/')
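If you want every urllib2.urlopen call to send that header, you can also install the opener globally:
urllib2.install_opener(opener)
# From now on, urllib2.urlopen() uses the custom opener and its headers.
urllib2.urlopen('http://www.example.com/')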