I am using Python's requests library in one method of my application. The body of the method looks like this:
def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)
    buff = StringIO.StringIO()
    buff.write(response.content)
    ...
    return True
I'd like to write some unit tests for that method; however, I want to pass it a fake local URL, such as:
class RemoteTest(TestCase):
    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))
When I call requests.get with a local URL, I get the KeyError exception below:
requests.get('file:///tmp/dummy.txt')
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
76
77 # Make a fresh ConnectionPool of the desired type
78 pool_cls = pool_classes_by_scheme[scheme]
79 pool = pool_cls(host, port, **self.connection_pool_kw)
80
KeyError: 'file'
The question is: how can I pass a local URL to requests.get?
PS: I made up the above example. It possibly contains many errors.
As @WooParadog explained, the requests library doesn't know how to handle local files. However, the current version allows you to define transport adapters.
Therefore you can simply define your own adapter which will be able to handle local files, e.g.:
import os

import requests
from requests_testadapter import Resp


class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        file_path = request.url[7:]  # strip the 'file://' prefix
        with open(file_path, 'rb') as file:
            buff = bytearray(os.path.getsize(file_path))
            file.readinto(buff)
            resp = Resp(buff)
            r = self.build_response(request, resp)

            return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):
        return self.build_response_from_file(request)


requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://<some_local_path>')
I'm using the requests-testadapter module in the above example.
Here's a transport adapter I wrote which is more featureful than b1r3k's and has no additional dependencies beyond Requests itself. I haven't tested it exhaustively yet, but what I have tried seems to be bug-free.
import requests
import os, sys

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname


class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass
(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k's.) As with the other answer, follow this with:
requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
The easiest way seems to be using requests-file.
https://github.com/dashea/requests-file (available through PyPI too)
"Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs."
This in combination with requests-html is pure magic :)
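A minimal usage sketch, based on the project's README (FileAdapter is the transport adapter that requests-file provides):

import requests
from requests_file import FileAdapter

# Mount the adapter on a session so file:// URLs are handled locally.
s = requests.Session()
s.mount('file://', FileAdapter())

resp = s.get('file:///tmp/dummy.txt')
print(resp.text)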
packages/urllib3/poolmanager.py pretty much explains it. Requests doesn't support local URLs.
pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}
In a recent project, I had the same issue. Since requests doesn't support the "file" scheme, I decided to patch our code to load the content locally. First, I defined a function to replace requests.get:
import six

def local_get(url):
    "Fetch a stream from local files."
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")
    filename = six.moves.urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')
Then, somewhere in test setup, or as a decorator on the test function, I use mock.patch to patch the get function on requests:
@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...
This technique is somewhat brittle -- it doesn't help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation, there didn't seem to be an obvious hook point, so I went with this simpler approach.
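One hedged variation for covering session-based code: requests.get is itself implemented on top of Session.request, so patching that method catches code that constructs its own Session. The helper name local_session_request is mine, not from the original answer:

import mock
import requests

def local_session_request(self, method, url, *args, **kwargs):
    # Delegate to local_get above; this covers session.get(),
    # session.request(), and plain requests.get() alike.
    return local_get(url)

@mock.patch.object(requests.Session, 'request', local_session_request)
def test_handle_remote_file(self):
    ...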
To load a file from a local URL, e.g. an image file, you can do this:
import urllib.request
from PIL import Image

Image.open(urllib.request.urlopen('file:///path/to/your/file.png'))
I think a simple solution for this is to create a temporary HTTP server with Python and use it:
Put all your files in a temporary folder, e.g. tempFolder.
Go to that directory and start a temporary HTTP server in your terminal/cmd, as appropriate for your OS, with the command python -m http.server 8000 (note: 8000 is the port number).
This will give you a link to the HTTP server; you can access it from http://127.0.0.1:8000/.
Open your desired file in the browser and copy that link as your URL; a test can then fetch it as shown below.
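For example, a quick sketch of the fetch once the server is running inside tempFolder:

import requests

# With `python -m http.server 8000` running in tempFolder, the files are
# served over plain HTTP, so requests works without any modification.
response = requests.get('http://127.0.0.1:8000/dummy.txt')
assert response.status_code == 200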
I've been trying to debug a Python script I've inherited. It's trying to POST a CSV to a website via HTTPLib. The problem, as far as I can tell, is that HTTPLib doesn't handle receiving a 100-continue response, as described in python http client stuck on 100 continue. As in that post, this "Just Works" via curl, but for various reasons we need this to run from a Python script.
I've tried to employ the workaround detailed in an answer to that post, but I can't find a way to use it to submit the CSV after accepting the 100-continue response.
The general flow needs to be like this:
-> establish connection
-> send data including "expect: 100-continue" header, but not including the JSON body yet
<- receive "100-continue"
-> using the same connection, send the JSON body of the request
<- receive the 200 OK message, in a JSON response with other information
Here's the code in its current state, with my 10+ other commented remnants of other attempted workarounds removed:
#!/usr/bin/env python

import os
import ssl
import http.client
import binascii
import logging
import json


# classes taken from https://stackoverflow.com/questions/38084993/python-http-client-stuck-on-100-continue
class ContinueHTTPResponse(http.client.HTTPResponse):
    def _read_status(self, *args, **kwargs):
        version, status, reason = super()._read_status(*args, **kwargs)
        if status == 100:
            status = 199
        return version, status, reason

    def begin(self, *args, **kwargs):
        super().begin(*args, **kwargs)
        if self.status == 199:
            self.status = 100

    def _check_close(self, *args, **kwargs):
        return super()._check_close(*args, **kwargs) and self.status != 100


class ContinueHTTPSConnection(http.client.HTTPSConnection):
    response_class = ContinueHTTPResponse

    def getresponse(self, *args, **kwargs):
        logging.debug('running getresponse')
        response = super().getresponse(*args, **kwargs)
        if response.status == 100:
            setattr(self, '_HTTPConnection__state', http.client._CS_REQ_SENT)
            setattr(self, '_HTTPConnection__response', None)
        return response


def uploadTradeIngest(ingestFile, certFile, certPass, host, port, url):
    boundary = binascii.hexlify(os.urandom(16)).decode("ascii")
    headers = {
        "accept": "application/json",
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Expect": "100-continue",
    }
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.load_cert_chain(certfile=certFile, password=certPass)
    connection = ContinueHTTPSConnection(
        host, port=port, context=context)
    with open(ingestFile, "r") as fh:
        ingest = fh.read()
    ## Create form-data boundary
    ingest = "--%s\r\nContent-Disposition: form-data; " % boundary + \
             "name=\"file\"; filename=\"%s\"" % os.path.basename(ingestFile) + \
             "\r\n\r\n%s\r\n--%s--\r\n" % (ingest, boundary)
    print("pre-request")
    connection.request(
        method="POST", url=url, headers=headers)
    print("post-request")
    resp = connection.getresponse()
    if resp.status == http.client.CONTINUE:
        resp.read()
    print("pre-send ingest")
    ingest = json.dumps(ingest)
    ingest = ingest.encode()
    print(ingest)
    connection.send(ingest)
    print("post-send ingest")
    resp = connection.getresponse()
    print("response1")
    print(resp)
    print("response2")
    print(resp.read())
    print("response3")
    return resp.read()
But this simply returns a 400 "Bad Request" response. The problem, I think, lies with the formatting and type of the ingest variable. If I don't run it through json.dumps() and encode(), then the HTTPConnection.send() method rejects it:
ERROR: Got error: memoryview: a bytes-like object is required, not 'str'
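The error itself makes sense as far as it goes: send() requires a bytes-like body. A minimal sketch of satisfying that type check without json.dumps(), which produces a JSON string literal rather than the raw multipart payload the server expects:

# Hedged sketch: skip json.dumps() and send the multipart body as raw bytes.
# (The boundary and headers are unchanged from the code above.)
connection.send(ingest.encode('utf-8'))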
I had a look at using the Requests library instead, but I couldn't get it to use my local certificate bundle to accept the site's certificate. I have a full chain with an encrypted key, which I did decrypt, but I still ran into constant SSL_VERIFY errors from Requests. If you have a suggestion to solve my current problem with Requests, I'm happy to go down that path too.
How can I use HTTPLib or Requests (or any other libraries) to achieve what I need to achieve?
In case anyone comes across this problem in the future, I ended up working around it with a bit of a kludge. HTTPLib, Requests, and urllib3 are all known to not handle the 100-continue header, so... I just wrote a Python wrapper around curl via the subprocess.run() function, like this:
import subprocess

def sendReq(upFile):
    sendFile = f"file=@{upFile}"
    completed = subprocess.run([
        curlPath,
        '--cert',
        args.cert,
        '--key',
        args.key,
        targetHost,
        '-H',
        'accept: application/json',
        '-H',
        'Content-Type: multipart/form-data',
        '-H',
        'Expect: 100-continue',
        '-F',
        sendFile,
        '-s'
    ], stdout=subprocess.PIPE, universal_newlines=True)
    return completed.stdout
The only issue I had with this was that it fails if curl was built against the NSS libraries. I resolved this by including a statically-built curl binary with the package; the path to it is contained in the curlPath variable in the code. I obtained this binary from this GitHub repo.
This is a test script to request data from the Rovi API, provided by the API itself.
test.py
import requests
import time
import hashlib
import urllib

class AllMusicGuide(object):
    api_url = 'http://api.rovicorp.com/data/v1.1/descriptor/musicmoods'
    key = 'my key'
    secret = 'secret'

    def _sig(self):
        timestamp = int(time.time())
        m = hashlib.md5()
        m.update(self.key)
        m.update(self.secret)
        m.update(str(timestamp))
        return m.hexdigest()

    def get(self, resource, params=None):
        """Take a dict of params, and return what we get from the api"""
        if not params:
            params = {}
        params = urllib.urlencode(params)
        sig = self._sig()
        url = "%s/%s?apikey=%s&sig=%s&%s" % (self.api_url, resource, self.key, sig, params)
        resp = requests.get(url)
        if resp.status_code != 200:
            # THROW APPROPRIATE ERROR
            print ('unknown err')
        return resp.content
From another script I import the module:
from roviclient.test import AllMusicGuide
and create an instance of the class inside a mood function:
def mood():
    test = AllMusicGuide()
    print (test.get('[moodids=moodids]'))
According to the documentation, the following is the syntax for requests:
descriptor/musicmoods?apikey=apikey&sig=sig [&moodids=moodids] [&format=format] [&country=country] [&language=language]
But running the script, I get the following error:
unknown err
<h1>Gateway Timeout</h1>:
What is wrong?
"504, try once more. 502, it went through."
Your code is fine, this is a network issue. "Gateway Timeout" is a 504. The intermediate host handling your request was unable to complete it. It made its own request to another server on your behalf in order to handle yours, but this request took too long and timed out. Usually this is because of network congestion in the backend; if you try a few more times, does it sometimes work?
In any case, I would talk to your network administrator. There could be any number of reasons for this and they should be able to help fix it for you.
In Python, how would I check if a URL ending in .jpg exists?
ex:
http://www.fakedomain.com/fakeImage.jpg
thanks
The code below is equivalent to tikiboy's answer, but uses the high-level, easy-to-use requests library.
import requests

def exists(path):
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print exists('http://www.fakedomain.com/fakeImage.jpg')
The requests.codes.ok equals 200, so you can substitute the exact status code if you wish.
requests.head may throw an exception if the server doesn't respond, so you might want to add a try-except construct.
Also if you want to include codes 301 and 302, consider code 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using 303 redirect.
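Putting both suggestions together, a hedged variant that tolerates connection errors and accepts the redirect codes might look like this:

import requests

def exists(path):
    # Treat network failures as "does not exist"; requests.head does not
    # follow redirects by default, so 301/302/303 surface as status codes.
    try:
        r = requests.head(path)
    except requests.exceptions.RequestException:
        return False
    return r.status_code in (200, 301, 302, 303)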
>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('http://www.fakedomain.com', '/fakeImage.jpg')
False
If the status is anything other than a 200, the resource doesn't exist at the URL. This doesn't mean that it's gone altogether. If the server returns a 301 or 302, this means that the resource still exists, but at a different URL. To alter the function to handle this case, the status check line just needs to be changed to return response.status in (200, 301, 302).
Thanks for all the responses, everyone. I ended up using the following:
try:
    f = urllib2.urlopen(urllib2.Request(url))
    deadLinkFound = False
except:
    deadLinkFound = True
Looks like http://www.fakedomain.com/fakeImage.jpg is automatically redirected to http://www.fakedomain.com/index.html without any error.
Redirects for 301 and 302 responses are automatically followed without giving any response back to the user.
Please take a look at HTTPRedirectHandler; you might need to subclass it to handle that.
Here is the one sample from Dive Into Python:
http://diveintopython3.ep.io/http-web-services.html#redirects
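A hedged sketch of that subclassing idea: returning None from redirect_request refuses the redirect, so urlopen raises an HTTPError for the 3xx instead of silently following it:

import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None refuses the redirect; urlopen then raises HTTPError.
        return None

opener = urllib2.build_opener(NoRedirectHandler)
try:
    opener.open('http://www.fakedomain.com/fakeImage.jpg')
    print 'exists, no redirect'
except urllib2.HTTPError as e:
    print 'got status', e.code  # 301/302 means it moved; 404 means it's gone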
There are problems with the previous answers when the file is on an FTP server (ftp://url.com/file). The following code works when the file is on an FTP, HTTP, or HTTPS server:
import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(request)
        return True
    except:
        return False
Try it with mechanize:
import mechanize

br = mechanize.Browser()
br.set_handle_redirect(False)
try:
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print 'OK'
except:
    print 'KO'
This might be good enough to see if a URL to a file exists.
import urllib

if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
    print 'File exists'
In Python 3.6.5:
import http.client

def exists(site, path):
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")
In Python 3, the module httplib has been renamed to http.client.
Also, you need to remove the http:// and https:// from your URL, because httplib treats everything after the colon as a port number, and the port number must be numeric.
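For example, urllib.parse can split a full URL into the host and path arguments that exists() expects (a small sketch):

from urllib.parse import urlparse

parts = urlparse('http://www.fakedomain.com/fakeImage.jpg')
exists(parts.netloc, parts.path)  # exists('www.fakedomain.com', '/fakeImage.jpg')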
Python 3:
import requests

def url_exists(url):
    """Check if a resource exists."""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return resp.status_code == 200
    except requests.exceptions.RequestException:
        return False
The answer of @z3moon was good, but I think it is for Python 2.x. For Python 3.x, you need to add request to the module call.
import urllib.request

def check_valid_URLs(url) -> bool:
    try:
        if urllib.request.urlopen(url).code == 200:
            return True
        else:
            return False
    except:
        return False
I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.
I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (Twitter, Flickr, YouTube, etc.) with urllib2. Here's some pseudo-code for it:
params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
# check headers, content-length, etc...
# parse the response XML with lxml...
My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).
Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.
And of course, relying on an external source for data in a unit test is a horrible idea.
So how do I write a unit test for this?
urllib2 has functions called build_opener() and install_opener() which you should use to mock the behaviour of urlopen():
import urllib2
from StringIO import StringIO

def mock_response(req):
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)

my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)

response = urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg
It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to have the function/method which uses it accept this mock urlopen somehow, and use urllib2.urlopen otherwise.
This is a fair amount of work, but worthwhile. Remember that Python is very friendly to duck typing, so you just need to provide some semblance of the response object's properties to mock it.
For example:
class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}

    def read(self):
        return self.resp_data

    def getcode(self):
        return self.code

    # Define other members and properties you want

def mock_urlopen(request):
    return MockResponse(r'<xml document>')
Granted, some of these are difficult to mock, because, for example, I believe the normal headers attribute is an HTTPMessage, which implements fun stuff like case-insensitive header names. But you might be able to simply construct an HTTPMessage with your response data.
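In Python 2 an HTTPMessage can, I believe, be built straight from a header string, which may be enough for the mock (a sketch, untested against every consumer of headers):

from StringIO import StringIO
import httplib

# Parse a raw header block into an HTTPMessage, as a real response would.
raw_headers = 'Content-Type: text/xml; charset=utf-8\r\n\r\n'
msg = httplib.HTTPMessage(StringIO(raw_headers))
print msg.getheader('content-type')  # case-insensitive, like a real response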
Build a separate class or module responsible for communicating with your external feeds.
Make this class able to be a test double. You're using python, so you're pretty golden there; if you were using C#, I'd suggest either in interface or virtual methods.
In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
Aand... that's it.
You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.
Edit:
Just a note on the difference between my answer and #Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.
Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.
You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.
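A hedged sketch of what that looks like with pymox, stubbing out urlopen for a single expected call:

import mox
import urllib2
from StringIO import StringIO

m = mox.Mox()
m.StubOutWithMock(urllib2, 'urlopen')

# Expect exactly one call with this URL and hand back a canned response.
fake = urllib2.addinfourl(StringIO('<xml/>'), 'mock headers', 'http://example.com')
urllib2.urlopen('http://example.com').AndReturn(fake)
m.ReplayAll()

assert urllib2.urlopen('http://example.com').read() == '<xml/>'

m.VerifyAll()
m.UnsetStubs()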
I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.
I can elaborate if you need more info.
Here's some code:
import threading, SocketServer, time

# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400)  # token receive
        senddata = file(self.server.datafile).read()  # read data from unit test file
        self.request.send(senddata)
        time.sleep(0.1)  # make sure it finishes receiving request before closing
        self.request.close()

def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    # pass the method itself, not its result, as the thread target
    http_server_thread = threading.Thread(target=server.handle_request)
    http_server_thread.start()
To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
Why not just mock a website that returns the response you expect? Then start the server in a thread in setup and kill it in the teardown. I ended up doing this for testing code that would send email, by mocking an SMTP server, and it works great. Surely something even more trivial could be done for HTTP...
from smtpd import SMTPServer
from threading import Thread
from time import sleep
import asyncore

SMTP_PORT = 6544

class MockSMTPServer(SMTPServer):

    def __init__(self, localaddr, remoteaddr, cb=None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)

    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        self.close()

def start_smtp(cb, port=SMTP_PORT):
    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()
    return Thread(None, smtp_thread)

def test_stuff():
    # .......snip noise
    email_result = []  # a list, so the callback can mutate it (no nonlocal in py2)

    def email_back(*args):
        email_result.append(args)

    t = start_smtp(email_back)
    t.start()
    sleep(1)

    res.form["email"] = self.admin_email
    res = res.form.submit()
    assert res.status_int == 302, "should've redirected"

    sleep(1)
    assert email_result, "didn't get an email"
Trying to improve a bit on @john-la-rooy's answer, I've made a small class allowing simple mocking for unit tests.
It should work with Python 2 and 3.
try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib

from io import BytesIO


class MockHTTPHandler(urllib.HTTPHandler):

    def mock_response(self, req):
        url = req.get_full_url()
        print("incoming request:", url)

        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}
            resp = urllib.addinfourl(BytesIO(resdata), headers, url, 200)
            resp.msg = "OK"
            return resp

        raise RuntimeError('Unhandled URL', url)

    http_open = mock_response

    @classmethod
    def install(cls):
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous

    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)
Used like this:
class TestOther(unittest.TestCase):

    def setUp(self):
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)
How do I seek to a particular position in a remote (HTTP) file so I can download only that part?
Let's say the bytes in a remote file were: 1234567890
I want to seek to 4 and download 3 bytes from there, so I would have: 456
Also, how do I check whether a remote file exists?
I tried os.path.isfile(), but it returns False when I pass a remote file URL.
If you are downloading the remote file through HTTP, you need to set the Range header.
Check this example to see how it can be done. It looks like this:
myUrlclass.addheader("Range","bytes=%s-" % (existSize))
EDIT: I just found a better implementation. This class is very simple to use, as it can be seen in the docstring.
class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is an HTTP feature to
    begin with, so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.headers['Range'] = 'bytes=30-50'
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')
Update: The "better implementation" has moved to github: excid3/urlgrabber in the byterange.py file.
I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like:
import requests
url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"
# Retrieve bytes between offsets 3 and 5 (inclusive).
r = requests.get(url, headers={"range": "bytes=3-5"})
# If a 4XX client error or a 5XX server error is encountered, we raise it.
r.raise_for_status()
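If the server honors the Range header, the response should be a 206 with exactly those three bytes (a quick check):

print(r.status_code)  # expect 206 (Partial Content) if ranges are supported
print(r.content)      # expect exactly 3 bytes: offsets 3, 4, and 5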
AFAIK, this is not possible using fseek() or similar. You need to use the HTTP Range header to achieve this. This header may or may not be supported by the server, so your mileage may vary.
import urllib2

myHeaders = {'Range': 'bytes=0-9'}
req = urllib2.Request('http://www.promotionalpromos.com/mirrors/gnu/gnu/bash/bash-1.14.3-1.14.4.diff.gz', headers=myHeaders)
partialFile = urllib2.urlopen(req)
s2 = partialFile.read()
EDIT: This is of course assuming that by remote file you mean a file stored on an HTTP server...
If the file you want is on an FTP server, FTP only allows you to specify a start offset, not a range. If that is what you want, then the following code should do it (not tested!):
import ftplib
fileToRetrieve = 'somefile.zip'
fromByte = 15
ftp = ftplib.FTP('ftp.someplace.net')
outFile = open('partialFile', 'wb')
ftp.retrbinary('RETR '+ fileToRetrieve, outFile.write, rest=str(fromByte))
outFile.close()
You can use httpio to access remote HTTP files as if they were local:
pip install httpio
import zipfile
import httpio

url = "http://some/large/file.zip"
with httpio.open(url) as fp:
    zf = zipfile.ZipFile(fp)
    print(zf.namelist())
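Since the object returned by httpio.open() is file-like (that is what zipfile relies on), seeking should answer the original question directly. A sketch, assuming the usual seek()/read() semantics:

import httpio

url = "http://some/large/file.zip"
with httpio.open(url) as fp:
    fp.seek(4)          # jump to byte offset 4 (0-based)
    chunk = fp.read(3)  # read 3 bytes from there
    print(chunk)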