I've been trying to debug a Python script I've inherited. It's trying to POST a CSV to a website via HTTPLib. The problem, as far as I can tell, is that HTTPLib doesn't handle receiving a 100-continue response, as per python http client stuck on 100 continue. Similarly to that post, this "Just Works" via Curl, but for various reasons we need this to run from a Python script.
I've tried to employ the work-around as detailed in an answer on that post, but I can't find a way to use that to submit the CSV after accepting the 100-continue response.
The general flow needs to be like this:
-> establish connection
-> send data including "expect: 100-continue" header, but not including the JSON body yet
<- receive "100-continue"
-> using the same connection, send the JSON body of the request
<- receive the 200 OK message, in a JSON response with other information
Here's the code in its current state, with the commented-out remnants of my 10+ other attempted workarounds removed:
#!/usr/bin/env python
import os
import ssl
import http.client
import binascii
import logging
import json

# classes taken from https://stackoverflow.com/questions/38084993/python-http-client-stuck-on-100-continue
class ContinueHTTPResponse(http.client.HTTPResponse):
    def _read_status(self, *args, **kwargs):
        version, status, reason = super()._read_status(*args, **kwargs)
        if status == 100:
            status = 199
        return version, status, reason

    def begin(self, *args, **kwargs):
        super().begin(*args, **kwargs)
        if self.status == 199:
            self.status = 100

    def _check_close(self, *args, **kwargs):
        return super()._check_close(*args, **kwargs) and self.status != 100


class ContinueHTTPSConnection(http.client.HTTPSConnection):
    response_class = ContinueHTTPResponse

    def getresponse(self, *args, **kwargs):
        logging.debug('running getresponse')
        response = super().getresponse(*args, **kwargs)
        if response.status == 100:
            setattr(self, '_HTTPConnection__state', http.client._CS_REQ_SENT)
            setattr(self, '_HTTPConnection__response', None)
        return response
def uploadTradeIngest(ingestFile, certFile, certPass, host, port, url):
    boundary = binascii.hexlify(os.urandom(16)).decode("ascii")
    headers = {
        "accept": "application/json",
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Expect": "100-continue",
    }
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.load_cert_chain(certfile=certFile, password=certPass)
    connection = ContinueHTTPSConnection(host, port=port, context=context)
    with open(ingestFile, "r") as fh:
        ingest = fh.read()
    ## Create form-data boundary
    ingest = "--%s\r\nContent-Disposition: form-data; " % boundary + \
             "name=\"file\"; filename=\"%s\"" % os.path.basename(ingestFile) + \
             "\r\n\r\n%s\r\n--%s--\r\n" % (ingest, boundary)
    print("pre-request")
    connection.request(method="POST", url=url, headers=headers)
    print("post-request")
    resp = connection.getresponse()
    if resp.status == http.client.CONTINUE:
        resp.read()
        print("pre-send ingest")
        ingest = json.dumps(ingest)
        ingest = ingest.encode()
        print(ingest)
        connection.send(ingest)
        print("post-send ingest")
        resp = connection.getresponse()
    print("response1")
    print(resp)
    print("response2")
    print(resp.read())
    print("response3")
    return resp.read()
But this simply returns a 400 "Bad Request" response. The problem, I think, lies with the formatting and type of the ingest variable: if I don't run it through json.dumps() and encode(), the HTTPConnection.send() method rejects it:
ERROR: Got error: memoryview: a bytes-like object is required, not 'str'
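For reference, http.client's send() just needs a bytes-like object (or a file opened in binary mode), so the minimal change that satisfies it, whether or not it cures the 400, is to encode the multipart string directly instead of JSON-encoding it:
# send() needs bytes; json.dumps() wraps the whole multipart body in quotes
# and escapes every CRLF, which no multipart parser will accept
connection.send(ingest.encode("utf-8"))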
I had a look at using the Requests library instead, but I couldn't get it to use my local certificate bundle to accept the site's certificate. I have a full chain with an encrypted key, which I did decrypt, but still ran into constant SSL_VERIFY errors from Requests. If you have a suggestion to solve my current problem with Requests, I'm happy to go down that path too.
How can I use HTTPLib or Requests (or any other libraries) to achieve what I need to achieve?
In case anyone comes across this problem in the future, I ended up working around it with a bit of a kludge. I tried HTTPLib, Requests, and urllib3, none of which handle the 100-continue response, so... I just wrote a Python wrapper around Curl via the subprocess.run() function, like this:
import subprocess

def sendReq(upFile):
    # curl's -F flag needs the @ prefix to read the upload from a file
    sendFile = f"file=@{upFile}"
    completed = subprocess.run([
        curlPath,
        '--cert', args.cert,
        '--key', args.key,
        targetHost,
        '-H', 'accept: application/json',
        '-H', 'Content-Type: multipart/form-data',
        '-H', 'Expect: 100-continue',
        '-F', sendFile,
        '-s'
    ], stdout=subprocess.PIPE, universal_newlines=True)
    return completed.stdout
The only issue I had with this was that it fails if Curl was built against the NSS libraries, which I resolved by including a statically-built Curl binary with the package; the path to that binary is held in the curlPath variable above. I obtained the binary from this Github repo.
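Typical usage from the rest of the script, assuming curlPath, args.cert, args.key, and targetHost have already been populated by the surrounding argument parsing (the path below is a placeholder):
import json

# Hypothetical call site: upload the ingest file and parse the JSON reply.
reply = sendReq("/path/to/ingest.csv")
print(json.loads(reply))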
Related
I found this code and it seemed reliable and efficient to me, but unfortunately it's for Python 2 and it uses urllib2, while everybody says Requests is faster. What would be the equivalent of the following (or something more efficient or more reliable) in Python 3?
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import urllib2

# This script uses HEAD requests (with fallback in case of 405)
# to follow the redirect path up to the real URL
# (c) 2012 Filippo Valsorda - FiloSottile
# Released under the GPL license

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

class HEADRedirectHandler(urllib2.HTTPRedirectHandler):
    """
    Subclass the HTTPRedirectHandler to make it use our
    HeadRequest also on the redirected URL
    """
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code in (301, 302, 303, 307):
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k, v) for k, v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type"))
            return HeadRequest(newurl,
                               headers=newheaders,
                               origin_req_host=req.get_origin_req_host(),
                               unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)

class HTTPMethodFallback(urllib2.BaseHandler):
    """
    Fallback to GET if HEAD is not allowed (405 HTTP error)
    """
    def http_error_405(self, req, fp, code, msg, headers):
        fp.read()
        fp.close()
        newheaders = dict((k, v) for k, v in req.headers.items()
                          if k.lower() not in ("content-length", "content-type"))
        return self.parent.open(urllib2.Request(req.get_full_url(),
                                                headers=newheaders,
                                                origin_req_host=req.get_origin_req_host(),
                                                unverifiable=True))

# Build our opener
opener = urllib2.OpenerDirector()
for handler in [urllib2.HTTPHandler, urllib2.HTTPDefaultErrorHandler,
                HTTPMethodFallback, HEADRedirectHandler,
                urllib2.HTTPErrorProcessor, urllib2.HTTPSHandler]:
    opener.add_handler(handler())

response = opener.open(HeadRequest(sys.argv[1]))
print(response.geturl())
By the way, a HEAD request is not actually what I need. I only want to know if the link is broken (on some sites, if you give them a broken link they redirect you back to the main page, and I want my code to recognize this too), and a HEAD request was the most efficient solution that came to my mind. If you know any better way, I'd appreciate that too.
Take a look at Requests: http://docs.python-requests.org/en/master/
To do a HEAD request, you simply go:
import requests

r = requests.head('http://www.example.com')
Then you can access the response object for what you need. For example, the status code:
print(r.status_code)
Update:
If you want to check whether a page is live, you'll want to do a GET request instead. I've seen cases of a HEAD request returning a 200 response while a GET request on the same URL returned a 500.
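For the broken-link case described in the question, following redirects with a GET and then comparing the final URL is one way to detect sites that bounce bad links back to their main page. A minimal sketch, with a placeholder URL:
import requests

url = 'http://www.example.com/some/page'  # placeholder

r = requests.get(url, allow_redirects=True, timeout=10)
# requests follows redirects by default; r.url holds the final URL,
# so landing back on the site root suggests a "soft 404"
if r.status_code >= 400:
    print('broken:', r.status_code)
elif r.url.rstrip('/') != url.rstrip('/'):
    print('redirected to', r.url)
else:
    print('ok')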
This is a test script to request data from the Rovi API, provided by the API itself.
test.py
import requests
import time
import hashlib
import urllib

class AllMusicGuide(object):
    api_url = 'http://api.rovicorp.com/data/v1.1/descriptor/musicmoods'
    key = 'my key'
    secret = 'secret'

    def _sig(self):
        timestamp = int(time.time())
        m = hashlib.md5()
        m.update(self.key)
        m.update(self.secret)
        m.update(str(timestamp))
        return m.hexdigest()

    def get(self, resource, params=None):
        """Take a dict of params, and return what we get from the api"""
        if not params:
            params = {}
        params = urllib.urlencode(params)
        sig = self._sig()
        url = "%s/%s?apikey=%s&sig=%s&%s" % (self.api_url, resource, self.key, sig, params)
        resp = requests.get(url)
        if resp.status_code != 200:
            # THROW APPROPRIATE ERROR
            print ('unknown err')
        return resp.content
From another script I import the module:
from roviclient.test import AllMusicGuide
and create an instance of the class inside a mood function:
def mood():
    test = AllMusicGuide()
    print (test.get('[moodids=moodids]'))
According to the documentation, the following is the syntax for requests:
descriptor/musicmoods?apikey=apikey&sig=sig [&moodids=moodids] [&format=format] [&country=country] [&language=language]
but running the script I get the following error:
unknown err
<h1>Gateway Timeout</h1>:
What is wrong?
"504, try once more. 502, it went through."
Your code is fine, this is a network issue. "Gateway Timeout" is a 504. The intermediate host handling your request was unable to complete it. It made its own request to another server on your behalf in order to handle yours, but this request took too long and timed out. Usually this is because of network congestion in the backend; if you try a few more times, does it sometimes work?
In any case, I would talk to your network administrator. There could be any number of reasons for this and they should be able to help fix it for you.
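If you want the script to tolerate an occasional 504 on its own, a simple hedge is to retry the request a few times before giving up. A minimal sketch (the retry count and delay are arbitrary):
import time
import requests

def get_with_retries(url, attempts=3, delay=2):
    """Retry a GET on 5xx responses or connection errors."""
    last = None
    for _ in range(attempts):
        try:
            resp = requests.get(url, timeout=30)
            if resp.status_code < 500:
                return resp
            last = resp
        except requests.exceptions.RequestException as exc:
            last = exc
        time.sleep(delay)  # back off briefly before the next attempt
    if isinstance(last, Exception):
        raise last
    return last  # the final 5xx response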
I have looked and perhaps I missed it. I currently have a file such as the one below:
PUT /URL/TO/SEND/REQUEST
Host: 127.0.0.1
Connection: keep-alive
...
bunch of data here
This file contains the headers and the data I want to send over SSL. I know on Windows I can use Fiddler etc. to send this raw data, BUT I was hoping to use Python. I tried looking (maybe not hard enough) at urllib2, urllib, and httplib to see if I could just send this file as the entire request; I don't want to deal with parsing the file etc... Is this possible?
I did notice that in httplib I can use request(), where "body can be a file object", but from the description it seems as though it still sends the headers separately and that the file is only for the data being sent.
Thanks
It isn't documented, but it looks like you should be able to use httplib.HTTPConnection.send() for this:
In [13]: httplib.HTTPConnection.send??
Type:        instancemethod
String Form: <unbound method HTTPConnection.send>
File:        /usr/local/lib/python2.7/httplib.py
Definition:  httplib.HTTPConnection.send(self, data)
Source:
    def send(self, data):
        """Send `data' to the server."""
        if self.sock is None:
            if self.auto_open:
                self.connect()
            else:
                raise NotConnected()

        if self.debuglevel > 0:
            print "send:", repr(data)
        blocksize = 8192
        if hasattr(data, 'read') and not isinstance(data, array):
            if self.debuglevel > 0: print "sendIng a read()able"
            datablock = data.read(blocksize)
            while datablock:
                self.sock.sendall(datablock)
                datablock = data.read(blocksize)
        else:
            self.sock.sendall(data)
The request() method combines the header and body and passes it to this function, which looks like it should handle strings or file objects.
Of course you will still need to know the host so that you can create the HTTPConnection object, so your code might look something like this (untested):
import httplib
conn = httplib.HTTPConnection('127.0.0.1')
conn.send(open(filename))
response = conn.getresponse()
edit: It turns out there is some internal state stuff that keeps this from working as-is. Here is a workaround (a full example with the Google main page), but it is a bit of a hack. Tested using Python 2.6 and 2.7; it does not appear to work on 3.x by just replacing httplib with http.client:
import httplib
conn = httplib.HTTPConnection('www.google.com')
conn.send('GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n')
conn._HTTPConnection__state = httplib._CS_REQ_SENT
response = conn.getresponse()
The key part here is setting conn.__state (mangled name) to the httplib._CS_REQ_SENT after calling send().
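Under the same assumptions (and with the same caveat that this is a hack), replaying a raw request stored on disk, which was the original use case, would look something like this; the filename is a placeholder:
import httplib

conn = httplib.HTTPConnection('127.0.0.1')
# send() detects read()able objects and streams them in 8192-byte blocks
conn.send(open('raw_request.txt', 'rb'))
conn._HTTPConnection__state = httplib._CS_REQ_SENT
response = conn.getresponse()
print response.status, response.reason
print response.read()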
I am using Python's requests library in one method of my application. The body of the method looks like this:
def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)
    buff = StringIO.StringIO()
    buff.write(response.content)
    ...
    return True
I'd like to write some unit tests for that method; however, what I want to do is pass a fake local url, such as:
class RemoteTest(TestCase):
    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))
When I call requests.get with a local url, I get the KeyError exception below:
requests.get('file:///tmp/dummy.txt')
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
76
77 # Make a fresh ConnectionPool of the desired type
78 pool_cls = pool_classes_by_scheme[scheme]
79 pool = pool_cls(host, port, **self.connection_pool_kw)
80
KeyError: 'file'
The question is how can I pass a local url to requests.get?
PS: I made up the above example. It possibly contains many errors.
As @WooParadog explained, the requests library doesn't know how to handle local files. However, recent versions allow you to define transport adapters.
Therefore you can simply define your own adapter which is able to handle local files, e.g.:
import os

import requests
from requests_testadapter import Resp

class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        file_path = request.url[7:]
        with open(file_path, 'rb') as file:
            buff = bytearray(os.path.getsize(file_path))
            file.readinto(buff)
            resp = Resp(buff)
            r = self.build_response(request, resp)
            return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):
        return self.build_response_from_file(request)

requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://<some_local_path>')
I'm using the requests-testadapter module in the above example.
Here's a transport adapter I wrote which is more featureful than b1r3k's and has no additional dependencies beyond Requests itself. I haven't tested it exhaustively yet, but what I have tried seems to be bug-free.
import requests
import os, sys

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname

class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass
(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k's.) As with the other answer, follow this with:
requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
The easiest way seems to be using requests-file.
https://github.com/dashea/requests-file (available through PyPI too)
"Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs."
This in combination with requests-html is pure magic :)
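A minimal usage sketch, assuming requests-file is installed (pip install requests-file) and the path placeholder is replaced:
import requests
from requests_file import FileAdapter

s = requests.Session()
s.mount('file://', FileAdapter())

resp = s.get('file:///path/to/your/file.txt')
print(resp.text)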
packages/urllib3/poolmanager.py pretty much explains it. Requests doesn't support local URLs:
pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}
In a recent project, I had the same issue. Since requests doesn't support the "file" scheme, I patched our code to load the content locally. First, I define a function to replace requests.get:
import six

def local_get(url):
    "Fetch a stream from local files."
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")

    filename = six.moves.urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')
Then, somewhere in test setup or decorating the test function, I use mock.patch to patch the get function on requests:
@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...
This technique is somewhat brittle -- it doesn't help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation, there didn't seem to be an obvious hook point, so I went with this simpler approach.
To load a file from a local URL, e.g. an image file, you can do this:
import urllib.request

from PIL import Image

Image.open(urllib.request.urlopen('file:///path/to/your/file.png'))
I think a simple solution for this is to create a temporary HTTP server using Python and use it.
Put all your files in a temporary folder, e.g. tempFolder.
Go to that directory and create a temporary HTTP server in the terminal/cmd as per your OS, using the command python -m http.server 8000 (note: 8000 is the port number).
This will give you a link to the HTTP server; you can access it from http://127.0.0.1:8000/.
Open your desired file in the browser and copy that link as your URL.
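Once the server is running, the file can be fetched over plain HTTP; a small sketch, assuming dummy.txt was placed in tempFolder:
import requests

# The temporary server started with `python -m http.server 8000`
# serves the chosen directory at this address.
r = requests.get('http://127.0.0.1:8000/dummy.txt')
print(r.status_code)
print(r.text)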
I have to implement a function to get headers only (without doing a GET or POST) using urllib2. Here is my function:
def getheadersonly(url, redirections=True):
    if not redirections:
        class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
            def http_error_302(self, req, fp, code, msg, headers):
                return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
            http_error_301 = http_error_303 = http_error_307 = http_error_302

        cookieprocessor = urllib2.HTTPCookieProcessor()
        opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
        urllib2.install_opener(opener)

    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    info = {}
    info['headers'] = dict(urllib2.urlopen(HeadRequest(url)).info())
    info['finalurl'] = urllib2.urlopen(HeadRequest(url)).geturl()
    return info
Uses code from this answer and this one. However, it is doing redirection even when the flag is False. I tried the code with:
print getheadersonly("http://ms.com", redirections = False)['finalurl']
print getheadersonly("http://ms.com")['finalurl']
It's giving morganstanley.com in both cases. What is wrong here?
Firstly, your code contains several bugs:
On each call of getheadersonly you install a new global opener, which is then used in subsequent calls of urllib2.urlopen.
You make two HTTP requests to get two different attributes of the response.
The implementation of urllib2.HTTPRedirectHandler.http_error_302 is not so trivial, and I do not understand how it could prevent redirections in the first place.
Basically, you should understand that each handler is installed in an opener to handle a certain kind of response. urllib2.HTTPRedirectHandler is there to convert certain HTTP codes into redirections. If you do not want redirections, do not add a redirect handler to the opener. If you do not want to open ftp links, do not add an FTPHandler, etc.
So all you need to do is create a new opener and add urllib2.HTTPHandler() to it, customize the request to be a 'HEAD' request, pass an instance of the request to the opener, read the attributes, and close the response.
class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'

def getheadersonly(url, redirections=True):
    opener = urllib2.OpenerDirector()
    opener.add_handler(urllib2.HTTPHandler())
    opener.add_handler(urllib2.HTTPDefaultErrorHandler())
    if redirections:
        # HTTPErrorProcessor makes HTTPRedirectHandler work
        opener.add_handler(urllib2.HTTPErrorProcessor())
        opener.add_handler(urllib2.HTTPRedirectHandler())

    try:
        res = opener.open(HeadRequest(url))
    except urllib2.HTTPError, res:
        pass
    res.close()

    return dict(code=res.code, headers=res.info(), finalurl=res.geturl())
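A quick usage sketch, re-running the original test case with the rewritten function:
info = getheadersonly("http://ms.com", redirections=False)
print info['code'], info['finalurl']

info = getheadersonly("http://ms.com")
print info['code'], info['finalurl']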
You can send a HEAD request using httplib. A HEAD request is the same as a GET request, except that the server doesn't send the message body.
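A minimal sketch with Python 2's httplib; the host and path are placeholders:
import httplib

conn = httplib.HTTPConnection("www.example.com")
conn.request("HEAD", "/")      # same as GET, but the server omits the body
res = conn.getresponse()
print res.status, res.reason   # e.g. 200 OK
print res.getheaders()         # list of (header, value) tuples
conn.close()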