I would like to download image file from shortener url or generated url which doesn't contain file name on it.
I have tried to use [content-Disposition]. However my file name is not in ASCII code. So it can't print the name.
I have found out i can use urlretrieve, request to download file but i need to save as different name.
I want to download by keeping it's own name..
How can i do this?
matches = re.match(expression, message, re.I)
url = matches[0]
print(url)
original = urlopen(url)
remotefile = urlopen(url)
#blah = remotefile.info()['content-Disposition']
#print(blah)
#value, params = cgi.parse_header(blah)
#filename = params["filename*"]
#print(filename)
#print(original.url)
#filename = "filedown.jpg"
#urlretrieve(url, filename)
These are the list that i have try but none of them work
I was able to get this to work with the requests library because you can use it to get the url that the shortened url redirects to. Then, I applied your code to the redirected url and it worked. There might be a way to only use urllib (I assume thats what you are using) with this, but I dont know.
import requests
from urllib.request import urlopen
import cgi
def getFilenameFromURL(url):
req = requests.request("GET", url)
# req.url is now the url the shortened url redirects to
original = urlopen(req.url)
value, params = cgi.parse_header(original.info()['Content-Disposition'])
filename = params["filename*"]
print(filename)
return filename
getFilenameFromURL("https://shorturl.at/lKOY3")
You can then use urlretrieve with this. Its inefficient but it works... Also since you can get the actual url with the requests library, you can probably get the filname through there.
Related
I am downloading some files from the FAO GAEZ database, which uses HTTP POST based login from.
I am thus using the requests module. Here is my code:
my_user = "blabla"
my_pass = "bleble"
site_url = "http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0"
file_url = "http://www.gaez.iiasa.ac.at/w/ctrl?_flow=VwrServ&_view=AAGrid&idR=m1ed3ed864793f16e83ba9a5a975066adaa6bf1b0"
with requests.Session() as s:
s.get(site_url)
s.post(site_url, data={'_username': 'my_user', '_password': 'my_pass'})
r = s.get(file_url)
if r.ok:
with open(my_path + "\\My file.zip", "wb") as c:
c.write(r.content)
However, with this procedure I download the HTML of the page.
I suspect that to solve the problem I have to add the name of the zip file to the url, i.e. new_file_url = file_url + "/file_name.zip". The problem is that I don't know the "file_name". I've tried with the name of the file which I obtain when I download it manually, but it does not work.
Any of idea on how to solve this? If you need more details on GAEZ website, see also: Python - Login and download specific file from website
I have a code that takes a random flag from the Flag Mashup Bot and downloads it:
import requests
DIR = 'C:/Users/myUser/Desktop/Flags/'
URL = 'https://flagsmashupbot.pythonanywhere.com/mashup?passwd=fl4gsm4shupb0t'
def download_image(img_url: str, dest_dir: str):
img_data = requests.get(img_url).content
with open(dest_dir, 'wb') as file:
file.write(img_data)
if __name__ == "__main__":
response = requests.get(URL)
if response.ok:
page = response.text
image_url = page[page.find('data:image', page.find('data:image') + 1):page.find('" download=')]
name = page[page.find('" download=') + 12:page.find('_FlagsMashupBot.png"')]
DIR += (name + '.png')
print(DIR)
download_image(image_url, DIR)
When I run it, I get the following error on line 8:
requests.exceptions.InvalidSchema: No connection adapters were found for [image URL]
When I read about it, I realized that it's because the image URLs from the site don't start with "https://" (or at least that's what I understood).
So, is there a way to use requests.get() without having https at the start of the URL?
The reason you would not get an HTTP/HTTPs based URL is since the data is in href format pointing to the base64 encoded version of the image.
You may use urllib to open up the href download link and save the contents to a file:
data = 'data:image/png;charset=utf-8;base64,iVBORw0KGgoAAAANSUhEUgAABwgAAASwCAIAAABggIlUAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nOzdaZhd92Hf97Pde+fOYGYAzAx2gCBBivsCgqtEUl7piF.......'
response = urllib.request.urlopen(data)
with open('image.png', 'wb') as f:
f.write(response.file.read())
I am trying to have my server, in python 3, go grab files from URLs. Specifically, I would like to pass a URL into a function, I would like the function to go grab an audio file(of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would also like to save that, as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, but most notably;
How do I download a file over HTTP using Python?
It's a little old but I tried several methods in there and always get some sort of issue. I have tried using the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this and with a recommended library?
For example, most of the ones I have tried do save something, like the html page, or an empty file called 'file.mp3' in this case.
Streamripper received a try changing user agents error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here, could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
**Edit
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
def BordersPythonDownloader(url):
print('Beginning file download requests')
r = requests.get(url, stream=True)
contype = r.headers['content-type']
if contype == "audio/mpeg":
print("audio file")
filename = '[{}].mp3'.format(str(datetime.datetime.now()))
with open('file.mp3', 'wb+') as f:
f.write(r.content)
ff = ffmpy.FFmpeg(
inputs={'file.mp3': None},
outputs={filename: None}
)
ff.run()
if os.path.exists('file.mp3'):
os.remove('file.mp3')
elif contype == "application/pdf":
print("pdf file")
filename = '[{}].pdf'.format(str(datetime.datetime.now()))
with open(filename, 'wb+') as f:
f.write(r.content)
else:
print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
Python 3. Probably need to use urllib to do this,
I need to know how to send a request to a direct download link, and get the name of the file it attempts to save.
(As an example, a KSP mod from CurseForge: https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download)
Of course, the file ID (2355387) will be changed. It could be from any project, but always on CurseForge. (If that makes a difference on the way it's downloaded.)
That example link results in the file:
How can I return that file name in Python?
Edit: I should note that I want to avoid saving the file, reading the name, then deleting it if possible. That seems like the worst way to do this.
Using urllib.request, when you request a response from a url, the response contains a reference to the url you are downloading.
>>> from urllib.request import urlopen
>>> url = 'https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download'
>>> response = urlopen(url)
>>> response.url
'https://addons-origin.cursecdn.com/files/2355/387/MechJeb2-2.6.0.0.zip'
You can use os.path.basename to get the filename:
>>> from os.path import basename
>>> basename(response.url)
'MechJeb2-2.6.0.0.zip'
from urllib import request
url = 'file download link'
filename = request.urlopen(request.Request(url)).info().get_filename()
I want to have a user input a file URL and then have my django app download the file from the internet.
My first instinct was to call wget inside my django app, but then I thought there may be another way to get this done. I couldn't find anything when I searched. Is there a more django way to do this?
You are not really dependent on Django for this.
I happen to like using requests library.
Here is an example:
import requests
def download(url, path, chunk=2048):
req = requests.get(url, stream=True)
if req.status_code == 200:
with open(path, 'wb') as f:
for chunk in req.iter_content(chunk):
f.write(chunk)
f.close()
return path
raise Exception('Given url is return status code:{}'.format(req.status_code))
Place this is a file and import into your module whenever you need it.
Of course this is very minimal but this will get you started.
You can use urlopen from urllib2 like in this example:
import urllib2
pdf_file = urllib2.urlopen("http://www.example.com/files/some_file.pdf")
with open('test.pdf','wb') as output:
output.write(pdf_file.read())
For more information, read the urllib2 docs.