I want to use mod_xsendfile (which I've downloaded and installed) to save content from URLs (external pages) that I read in with urllib and urllib2 into the variable one_download. I'm new to this and not sure how to properly configure some of the X-Sendfile properties. In the code below I assume that I can pass the urllib content in one_download directly to X-Sendfile, instead of taking the middle step of saving it to a txt file and then passing that txt file to X-Sendfile.
import urllib2, urllib

def download_from_external_url(request):
    post_data = [('name', 'Dave'),]
    # example url
    #url = http://www.expressen.se/kronikorer/k-g-bergstrom/sexpartiuppgorelsen-rackte-inte--det-star-klart-nu/ - for example
    result = urllib2.urlopen(url, urllib.urlencode(post_data))
    print result
    one_download = result.read()
    # test print the content in one_download in the shell
    print one_download
    # pass the content in one_download, in dict c, to xsendfile
    c = {"one_download": one_download}
    c['Content-Disposition'] = 'attachment; one_download=%s' % smart_str(one_download)
    c["X-Sendfile"] = one_download  # <-- not working
    return HttpResponse(json.dumps(c), 'one_download_index.html', mimetype='application/force-download')
That's not what X-Sendfile is for; it's for serving static files you already have on disk without having to go through Django. Since you're downloading the file dynamically, and it's in memory anyway, you might as well serve it directly.
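For instance, a minimal sketch of serving the downloaded bytes straight from memory (the view name and POST data are carried over from the question; the URL, content type, and attachment filename are assumptions, since the remote file type isn't known):

import urllib, urllib2
from django.http import HttpResponse

def download_from_external_url(request):
    post_data = [('name', 'Dave'),]
    url = 'http://www.example.com/some/page'  # hypothetical URL
    one_download = urllib2.urlopen(url, urllib.urlencode(post_data)).read()
    # Serve the bytes straight from memory -- no X-Sendfile involved.
    response = HttpResponse(one_download, mimetype='application/force-download')
    response['Content-Disposition'] = 'attachment; filename="one_download.html"'
    return response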
Below is the API response I'm getting:
{"contentType":"image/jpeg","createdTime":"2021-10-10T11:00:47.000Z","fileName":"Passport_Chris J Passport Color - pp.jpg","id":10144,"size":105499,"updatedTime":"2021-10-10T11:00:47.000Z","links":[{"rel":"self","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/CompanyRegd.ManagerDetails/43/FileAttachments/10144?download="},{"rel":"canonical","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/CompanyRegd.ManagerDetails/43/FileAttachments/10144"},{"rel":"describedby","href":"https://dafzprod.custhelp.com/services/rest/connect/v1.4/metadata-catalog/CompanyRegd.ManagerDetails/FileAttachments","mediaType":"application/schema+json"}]}
I need to save this file locally to my system in JPG format. Could you please provide me a solution in Python?
You might have to decode the JSON string first (if that's not already done):
import json
json_decoded = json.loads(json_string)
Afterwards you can get the URL to retrieve and the filename from this JSON structure:
url = json_decoded['links'][0]['href']
local_filename = json_decoded['fileName']
Now you can download the file and save it (as seen in How to save an image locally using Python whose URL address I already know?):
import urllib.request
urllib.request.urlretrieve(url, local_filename)
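Putting the pieces together (a sketch; json_string is assumed to hold the API response shown above, and the first links entry, the one with the ?download= href, is assumed to be the download link):

import json
import urllib.request

json_decoded = json.loads(json_string)           # parse the API response
url = json_decoded['links'][0]['href']           # the '?download=' link
local_filename = json_decoded['fileName']        # e.g. the passport JPG name
urllib.request.urlretrieve(url, local_filename)  # fetch and save locally

Note that if the endpoint requires authentication, urlretrieve alone won't supply credentials; you would need to attach them (for example with the requests library) before downloading.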
I would like to download an image file from a shortened or generated URL that doesn't contain the file name.
I have tried to use the Content-Disposition header. However, my file name is not in ASCII, so it can't print the name.
I have found out I can use urlretrieve or requests to download the file, but then I need to save it under a different name.
I want to download it keeping its own name.
How can I do this?
import re
import cgi
from urllib.request import urlopen, urlretrieve

matches = re.match(expression, message, re.I)
url = matches[0]
print(url)

original = urlopen(url)
remotefile = urlopen(url)
#blah = remotefile.info()['Content-Disposition']
#print(blah)
#value, params = cgi.parse_header(blah)
#filename = params["filename*"]
#print(filename)
#print(original.url)
#filename = "filedown.jpg"
#urlretrieve(url, filename)
This is the list of things I have tried, but none of them work.
I was able to get this to work with the requests library, because you can use it to get the URL that the shortened URL redirects to. Then I applied your code to the redirected URL and it worked. There might be a way to do this with only urllib (I assume that's what you are using), but I don't know.
import requests
from urllib.request import urlopen
import cgi

def getFilenameFromURL(url):
    req = requests.request("GET", url)
    # req.url is now the url the shortened url redirects to
    original = urlopen(req.url)
    value, params = cgi.parse_header(original.info()['Content-Disposition'])
    filename = params["filename*"]
    print(filename)
    return filename

getFilenameFromURL("https://shorturl.at/lKOY3")
You can then use urlretrieve with this. It's inefficient, but it works. Also, since you can get the actual URL with the requests library, you can probably get the filename through there.
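For example, a short sketch combining the two (it reuses the getFilenameFromURL function above, with the short URL from the question, and assumes the redirect target really does send a Content-Disposition header):

from urllib.request import urlretrieve

url = "https://shorturl.at/lKOY3"
filename = getFilenameFromURL(url)  # resolves the redirect and reads Content-Disposition
urlretrieve(url, filename)          # urlretrieve follows the redirect on its own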
I am trying to have my server, in Python 3, go grab files from URLs. Specifically, I would like to pass a URL into a function, and have the function go grab an audio file (of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would also like to save that, as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, most notably:
How do I download a file over HTTP using Python?
It's a little old, but I tried several methods from there and always got some sort of issue. I have tried the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this, and is there a recommended library?
For example, most of the ones I have tried do save something, like the HTML page, or, in this case, an empty file called 'file.mp3'.
Streamripper returned a "try changing user agents" error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here. Could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests

url = "http://someurl.com/webcast/something"
r = requests.get(url)

with open('file.mp3', 'wb') as f:
    f.write(r.content)

# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
Edit:
import requests
import ffmpy
import datetime
import os

## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED

def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    contype = r.headers['content-type']
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(str(datetime.datetime.now()))
        with open('file.mp3', 'wb+') as f:
            f.write(r.content)
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(str(datetime.datetime.now()))
        with open(filename, 'wb+') as f:
            f.write(r.content)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))

# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL

# DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'

# CALL THE SCRIPT, PASSING IT YOUR URL
#x = BordersPythonDownloader(url)

# ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
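One suggestion, sketched here rather than taken from the thread: stream the body to disk in chunks instead of holding it all in r.content, and strip the colons that datetime.now() puts into the filename, since those are not legal in Windows filenames:

import datetime
import requests

def save_stream(url):
    # Download in chunks rather than loading the whole body into memory.
    r = requests.get(url, stream=True)
    stamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    filename = '[{}].mp3'.format(stamp)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    return filename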
I'm new to Python and was trying to figure out how to code a script that will download the contents of HTML pages. I was thinking of doing something like:
y = 0
while y != 500:
    x = "example.com/example/" + str(y)
    # (code to download the file)
    y += 1
so (y) is the file name, and I need to download the files from example.com/example/1 all the way to file number 500, regardless of the file type.
Read the official docs page for urllib:
This module provides a high-level interface for fetching data across the World Wide Web.
In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames.
Some restrictions apply — it can only open URLs for reading, and no seek operations are available.
So you have code like this:
import urllib
content = urllib.urlopen("http://www.google.com").read()
#urllib.request.urlopen(...).read() in python 3
The following code should meet your need. It will download the 500 pages and save them to disk.
import urllib2

def grab_html(url):
    response = urllib2.urlopen(url)
    mimetype = response.info().getheader('Content-Type')
    return response.read(), mimetype

for i in range(500):
    filename = str(i)  # Use the digit as the filename
    url = "http://example.com/example/{0}".format(filename)
    contents, _ = grab_html(url)
    with open(filename, "w") as fp:
        fp.write(contents)
Notes:
If you need parallel fetching, the concurrent.futures docs have a great example: https://docs.python.org/3/library/concurrent.futures.html
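For instance, a minimal parallel version of the same loop (Python 3 here, unlike the urllib2 snippet above; the URL pattern is the one from the question):

import concurrent.futures
import urllib.request

def fetch(i):
    # Fetch one page and save it under its digit filename.
    url = "http://example.com/example/{0}".format(i)
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(str(i), "wb") as fp:
        fp.write(data)

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(fetch, range(500))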
I'm currently creating an app that's supposed to take an input in the form of a URL (here a PDF file), recognize it as a PDF, and then upload it to a tmp folder I have on a server.
I have absolutely no idea how to proceed with this. I've already made a form which contains a FileField, which works perfectly, but when it comes to URLs I have no clue.
Thank you for all answers, and sorry about my lacking English skills.
The first 4 bytes of a PDF file are %PDF, so you could just download the first 4 bytes from that URL and compare them to %PDF. If they match, then download the whole file.
Example:
import urllib2

url = 'your_url'
req = urllib2.urlopen(url)
first_four_bytes = req.read(4)

if first_four_bytes == '%PDF':
    pdf_content = urllib2.urlopen(url).read()
    # save to temp folder
else:
    pass  # file is not a PDF
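To complete the "save to temp folder" step, one possibility (a sketch; the tmp directory path is an assumption, so point it at wherever your server's folder actually lives):

import os

TMP_DIR = '/path/to/tmp'  # hypothetical: your server's tmp folder

local_path = os.path.join(TMP_DIR, os.path.basename(url) or 'upload.pdf')
with open(local_path, 'wb') as f:
    f.write(pdf_content)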