Python - download url and save with generic filename - python

I'm trying to download fanart images from the fanart.tv API, so I wrote a script to build the API call and collect the URLs. The code might need some clean-up, but I guess it is functional for now:
APICaller.py
My problem now is to save the images with the generic filename that is given within the URL.
For example
I call my script with these args:
python APICaller.py -a "Madonna" -p "C:/temp" -n "Madonna - Hung up"
As a result I receive:
'http://assets.fanart.tv/fanart/music/79239441-bfd5-4981-a70c-55c3f15c1287/artistbackground/madonna-4fe25d4f1b951.jpg', 'http://assets.fanart.tv/fanart/music/79239441-bfd5-4981-a70c-55c3f15c1287/artistbackground/madonna-4fe2766aac587.jpg',
... and so on
Now I want to save all images to /extrafanart/madonna-4fe25d4f1b951.jpg ...
What is the best way to handle it? urlparse, split or parsing with regex maybe?
Please help, this is very frustrating :(

import os
import urllib

for url in urllist:
    # take the last path segment of the URL as the filename
    filename = url.rstrip('/').rsplit('/', 1)[-1]
    # save into the /extrafanart directory
    path = os.path.join(os.path.sep, 'extrafanart', filename)
    urllib.urlretrieve(url, path)
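If you'd rather use urlparse than plain string splitting, here is a minimal sketch of the same idea (Python 2, reusing the urllist from above):

import os
import urllib
from urlparse import urlparse

for url in urllist:
    # urlparse splits the URL into components; basename keeps only the last path segment
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(os.path.sep, 'extrafanart', filename)
    urllib.urlretrieve(url, path)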

Related

python : wget module downloading file without any extension

I am writing a small Python script to download a file from a follow link and retrieve the original filename and its extension. But I have come across one follow link for which Python downloads the file without any extension, whereas the file has a .txt extension when downloaded using a browser.
Below is the code I am trying :
from urllib.request import urlopen
from urllib.parse import unquote
import wget

filePath = 'D:\\folder_path'
followLink = 'http://example.com/Reports/Download/c4feb46c-8758-4266-bec6-12358'
response = urlopen(followLink)
if response.code == 200:
    print('Follow Link(response url) :' + response.url)
    print('\n')
    unquote_url = unquote(response.url)
    file_name = wget.detect_filename(response.url).replace('|', '_')
    print('file_name - ' + file_name)
    wget.download(response.url, filePath)
The file_name variable in the code above just gives 'c4feb46c-8758-4266-bec6-12358' as the filename, whereas I want to download it as c4feb46c-8758-4266-bec6-12358.txt.
I have also tried to read the file name from the headers, i.e. response.info(), but I am not getting a proper file name.
Can anyone please help me with this? I am stuck in my work. Thanks in advance.
Wget gets the filename from the URL itself. For example, if your URL was https://someurl.com/filename.pdf, it is saved as filename.pdf. If it was https://someurl.com/filename, it is saved as filename. Since wget.download returns the filename of the downloaded file, you can rename it to any extension you want with os.rename(filename, filename+'.<extension>').
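As a rough sketch, reusing response and filePath from the question's code, and assuming .txt is the extension you expect, that rename could look like this:

import os
import wget

# download the file, then add the missing extension
# (wget.download returns the path of the saved file; .txt is an assumption here)
downloaded = wget.download(response.url, filePath)
os.rename(downloaded, downloaded + '.txt')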

Python: Writing script to scrape images off of HTTPS URL database

I was messing around in Python 3.x yesterday, and I wanted to scrape all of the images off an HTTPS website. This is the code I have so far:
import urllib.error
import urllib.request

idnum = 190154
ur = 'https://skystorage.iscorp.com/pictures/IL/Lincolnway//%d' % idnum
url = ur + '.JPG?rev=0'
filename = str(idnum) + '.JPG'
idnum = idnum + 1
try:
    urllib.request.urlretrieve(url, filename)
except urllib.error.URLError as e:
    print(e.reason)
This, however, is not working at all as planned, as the URL is HTTPS and urllib does not seem to support this. How would I be able to do something similarly to scrape the images?
There is a fair amount of work to do here, but I would like to help.
The first thing to know is that, given an HTML page, you must build a list of the URLs of the images you would like to download. To do this it helps to know what a regular expression is and how to use Python's re library: with it you can search the HTML source for the image URLs.
Then write a method that saves to your computer every image in the list you built.
I hope this is helpful.
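Here is a minimal sketch of that idea in Python 3; the page URL and the src pattern are assumptions, so a real page may need a different pattern (or an HTML parser such as BeautifulSoup):

import re
import urllib.request

page_url = 'https://skystorage.iscorp.com/pictures/IL/Lincolnway/'  # assumed starting page
html = urllib.request.urlopen(page_url).read().decode('utf-8', errors='replace')

# naive pattern for src="..." attributes ending in .jpg/.jpeg/.png; adjust for the real markup
image_urls = re.findall(r'src="(https?://[^"]+\.(?:jpg|jpeg|png))"', html, re.IGNORECASE)

for image_url in image_urls:
    filename = image_url.rsplit('/', 1)[-1]
    urllib.request.urlretrieve(image_url, filename)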

Python script to save webpage and rename it while saving (save as - command)

Hi, I searched a lot and ended up with no relevant results on how to save a webpage using Python 2.6 and rename it while saving.
Better to use the requests library:
import requests

pagelink = "http://www.example.com"
page = requests.get(pagelink)
with open('/path/to/file/example.html', "w") as file:
    file.write(page.text)
You may want to use the urllib(2) package to access the webpage, and then save the file object to the desired location (os.path).
It should look something like this:
import urllib2, os

pagelink = "http://www.example.com"
page = urllib2.urlopen(pagelink)
# build a filesystem-safe name from the URL before joining it to the target directory
filename = pagelink.replace('://', '_').replace('/', '_') + '.html'
with open(os.path.join('/(full)path/to/Documents', filename), "w") as file:
    file.write(page.read())

Download a file to a particular folder python

I can download a file from URL the following way.
import urllib2
response = urllib2.urlopen("http://www.someurl.com/file.pdf")
html = response.read()
One way I can think of is to open this file as binary and then re-save it to the different folder I want, but is there a better way?
Thanks
You can use the Python module wget to download the file. Here is some sample code:
import wget
url = 'http://www.example.com/foo.zip'
path = 'path/to/destination'
wget.download(url, out=path)
The function you are looking for is urllib.urlretrieve
import urllib
linkToFile = "http://www.someurl.com/file.pdf"
localDestination = "/home/user/local/path/to/file.pdf"
resultFilePath, responseHeaders = urllib.urlretrieve(linkToFile, localDestination)
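If you would rather stick with the approach from the question (read the response as binary and re-save it), a minimal sketch with urllib2 could look like this; the destination folder is an assumption:

import os
import urllib2

response = urllib2.urlopen("http://www.someurl.com/file.pdf")
destination_folder = "/home/user/downloads"  # assumed target folder

# write the raw response bytes into the chosen folder
with open(os.path.join(destination_folder, "file.pdf"), "wb") as f:
    f.write(response.read())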

How to automate satellite image downloads?

I am looking for a way to automate the process of downloading satellite imagery. The screenshot shows the type and format of files I am interested in downloading (.ntf and the 150MB files).
I encountered the following code from TheBioBucket that looks promising, although the R package XML is obsolete.
require(XML)
dir.create("D:/GIS_DataBase/DEM/")
setwd("D:/GIS_DataBase/DEM/")
doc <- htmlParse("http://www.viewfinderpanoramas.org/dem3.html#alps")
urls <- paste0("http://www.viewfinderpanoramas.org", xpathSApply(doc, '//*/a[contains(@href,"/dem1/N4")]/@href'))
names <- gsub(".*dem1/(\\w+\\.zip)", "\\1", urls)
for (i in 1:length(urls)) download.file(urls[i], names[i])
Is there a good way to automate the process of downloading .ntf files programmatically using R or Python?
Scraping is definitely easy to implement in Python.
# collect.py
import urllib, urllib2, bs4
from urlparse import urljoin

soup = bs4.BeautifulSoup(urllib2.urlopen("http://www.viewfinderpanoramas.org/dem3.html#alps"))
links = soup.find_all('a')
for link in links:
    try:
        if "/dem1/N4" in link['href']:
            url = urljoin("http://www.viewfinderpanoramas.org/", link['href'])
            filename = link['href'].split('/')[-1]
            urllib.urlretrieve(url, filename)
            # break
    except:
        # skip anchors without an href and any downloads that fail
        pass
You might want to change the filename to include the path where you want to put the file.
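For example (the target directory is an assumption), the urlretrieve line inside the loop could become:

import os

target_dir = "D:/GIS_DataBase/DEM"  # assumed download directory

# inside the loop above, save into target_dir instead of the current directory
urllib.urlretrieve(url, os.path.join(target_dir, filename))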
In R the XML package can facilitate what you need fairly easily. Here's a place to start
library(XML)
demdir <- "http://www.viewfinderpanoramas.org/dem1/"
# this returns a data.frame with file names
dems <- readHTMLTable(demdir)[[1]]
# you'll want, for example, to download only zip files
demnames <- dems[grepl(".zip",dems$Name),"Name"]
# (but you can add other subsetting/selection operations here)
# download the files to the same name locally
# (change '.' if you want some other directory)
sapply(demnames, function(demfi) download.file(paste0(demdir,demfi), file.path(".",demfi)))
The only complication I can see is if the filename is too long (if it's truncated in your web-browser), then the filename in dems will also be truncated.
