Upload file through url in django - python

I'm currently creating an app thats supposed to take a input in form of a url (here a PDF-file) and recognize this as a PDF and then upload it to a tmp folder i have on a server.
I have absolutely no idea how to proceed with this. I've already made a form which contains a FileField which works perfectly, but when it comes to urls i have no clue.
Thank you for all answers, and sorry about the lacking english skills.

The first 4 bytes of a pdf file are %PDF so you could just download the first 4 bytes from that url and compare them to %PDF. If it matches, then download the whole file.
Example:
import urllib2
url = 'your_url'
req = urllib2.urlopen(url)
first_four_bytes = req.read(4)
if first_four_bytes == '%PDF':
pdf_content = urllib2.urlopen(url).read()
# save to temp folder
else:
# file is not PDF

Related

how can I download files from the URL that does not have file name in it and how to get to know its file extension or file type

this is an URL example "https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
if you put it in the browser a file will start downloading in your system.
I want to download this file using python and store it somewhere on my computer
this is how tried
import requests
# first_url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
second_url="https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
myfile = requests.get(second_url , allow_redirects=True)
# this works for the first URL
# open('example.pdf' , 'wb').write(myfile.content)
# this did't work for both of them
# open('example.txt' , 'wb').write(myfile.content)
# this works for the second URL
open('example.doc' , 'wb').write(myfile.content)
first: if I put the first_url in the browser it will download a pdf file, putting second_url will download a .doc file How can I know what type of file will the URL give to us or what type of file will be downloaded so that I use the correct open(...) method?
second: If I use the second URL in the browser a file with the name "T__proc_notices_notices_080_k_notice_doc_79545_770020123.docx" starts downloading. how can I know this file name when I try to download the file?
if you know any better solution kindly let me know for the implementation.
kindly have a quick look at Downloading Files from URLs and zip downloaded files in python question aswell
myfile.headers['content-type'] will give you the MIME-type of the URL's content and myfile.headers['content-disposition'] gives you info like filename etc. (if the response contains this header at all)
you can use response headers content-type like for first url it is application/pdf and sencond url for is application/msword you save file according to it. you can make extension dictinary where you can store possible file format and their types and match with it. your second question is also same like this one so i am taking your two urls from that question and for file name i am using just integers
all_Urls = ['https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx' ,
'https://procurement-notices.undp.org/view_file.cfm?doc_id=257280']
extension_dict = {'application/vnd.openxmlformats-officedocument.wordprocessingml.document':'.docx',
'application/vnd.openxmlformats-officedocument.wordprocessingml.template':'.dotx',
'application/vnd.ms-word.document.macroEnabled.12':'.docm',
'application/vnd.ms-word.template.macroEnabled.12':'.dotm',
'application/pdf':'.pdf',
'application/msword':'.doc'}
for i,url in enumerate(all_Urls):
resp = requests.get(url)
response_headers = resp.headers
file_extension = extensio_dict[response_headers['Content-Type']]
with open(f"{i}.{file_extension}",'wb') as f:
f.write(resp.content)
for MIME-Type see this answer

can someone fix this code for google colab?

so i found this code that lets you upload a file from a direct link to google drive using google colab. but i have to edit the code each time i want to add a url to upload to google drive.
can anyone fix the code so that i can enter the url as a form instead of editing the code and maybe so that i can use the form to manually name the file. or auto naming would be fine. like "1.mp4" "2.mp4" and so on.
this is the code
import requests
file_url = "http://1.droppdf.com/files/5iHzx/automate-the-boring-stuff-with-python-2015-.pdf"
r = requests.get(file_url, stream = True)
with open("/content/gdrive/My Drive/python.pdf", "wb") as file:
for block in r.iter_content(chunk_size = 1024):
if block:
file.write(block)
You can make the file URL a form parameter by adding ##param string to the line:
file_url = "http://1.droppdf.com/files/5iHzx/automate-the-boring-stuff-with-python-2015-.pdf" ##param string

Save streaming audio from URL as MP3, or even just audio file from URL as MP3

I am trying to have my server, in python 3, go grab files from URLs. Specifically, I would like to pass a URL into a function, I would like the function to go grab an audio file(of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would also like to save that, as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, but most notably;
How do I download a file over HTTP using Python?
It's a little old but I tried several methods in there and always get some sort of issue. I have tried using the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this and with a recommended library?
For example, most of the ones I have tried do save something, like the html page, or an empty file called 'file.mp3' in this case.
Streamripper received a try changing user agents error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here, could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
**Edit
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
def BordersPythonDownloader(url):
print('Beginning file download requests')
r = requests.get(url, stream=True)
contype = r.headers['content-type']
if contype == "audio/mpeg":
print("audio file")
filename = '[{}].mp3'.format(str(datetime.datetime.now()))
with open('file.mp3', 'wb+') as f:
f.write(r.content)
ff = ffmpy.FFmpeg(
inputs={'file.mp3': None},
outputs={filename: None}
)
ff.run()
if os.path.exists('file.mp3'):
os.remove('file.mp3')
elif contype == "application/pdf":
print("pdf file")
filename = '[{}].pdf'.format(str(datetime.datetime.now()))
with open(filename, 'wb+') as f:
f.write(r.content)
else:
print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?

Automatically Downloading Files from an ASPX Website in Python

This is my first time posting on stack overflow, and I look forward to getting more involved with the community. I need to download, rename, and save many Excel files from an ASPX website, but I cannot access these Excel files directly via a URL (i.e., the URL does not end with "excelfilename.csv"). What I can do is go to a URL which initiates the download of the file. An example of the URL is below.
https://www.websitename.com/something/ASPXthing.aspx?ReportName=ExcelFileName&Date=SomeDate&reportformat=csv
The inputs that I want to vary via loops are "ExcelFileName" and "SomeDate". I know one can fetch these files with urllib when the Excel files can be accessed directly via a URL, but how can I do it with a URL like this one?
Thanks in advance for helping out!
Using the requests library, you can fetch the file and iterate over chunks to write to file
import requests
report_names = ["Filename1","Filename2"]
dates = ['2016-02-22','2016-02-23'] # as strings
for report_name in report_names:
for date in dates:
with open('%s_%s_fetched.csv' % (report_name.split('.')[0],date,), 'wb') as handle:
response = requests.get('https://www.websitename.com/something/ASPXthing.aspx?ReportName=%s&Date=%s&reportformat=csv' % (report_name,date,), stream=True)
if not response.ok:
# Something went wrong
for block in response.iter_content(1024):
handle.write(block)

How to use downloaded url content with Xsendfile (Django)

I want to use mod - xsendfile (which I've downloaded and installed) to save content from urls, external pages, that I read in with urllib and urllib2 in the variable one_download.I'm new to this and not sure how to properly configure some of the x-sendfile properties. In the code below I assume that I can place the urllib content in one_download directly into xsendfile instead of taking a middle step as saving it to a txt file and then pass that txt - file to xsendfile.
import urllib2,urllib
def download_from_external_url(request):
post_data = [('name','Dave'),]
# example url
#url = http://www.expressen.se/kronikorer/k-g-bergstrom/sexpartiuppgorelsen-rackte-inte--det-star-klart-nu/ - for example
result = urllib2.urlopen(url, urllib.urlencode(post_data))
print result
one_download = result.read()
# testprint content in one_download in shell
print one_download
# pass content in one_download, in dict c, to xsendfile
c = {"one_download":one_download}
c['Content-Disposition']= 'attachment; one_download=%s' %smart_str(one_download)
c["X-Sendfile"] = one_download # <-- not working
return HttpResponse(json.dumps(c),'one_download_index.html', mimetype='application/force-download')
That's not what X-Sendfile is for; it's for serving static files you already have on disk without having to go through Django. Since you're downloading the file dynamically, and it's in memory anyway, you might as well serve it directly.

Categories

Resources