Web scraping - how to download an image into a folder in Python

I have this code where I would like to download the image and save it into a folder, but so far I am only getting the src of the image. I have gone through Stack Overflow, where I found Batch downloading text and images from URL with Python / urllib / beautifulsoup?, but I have no idea how to proceed.
Here is what I have tried so far:
elm5 = soup.find('div', id="dv-dp-left-content")
img = elm5.find("img")   # first <img> inside that div
src = img["src"]         # this is only the image URL, not the image itself
print(src)
How can I download these images from their URLs into a folder?

EDIT: 2021.07.19
Updated from urllib (Python 2) to urllib.request (Python 3)
import urllib.request

# open a local file in binary mode and write the downloaded image bytes into it
f = open('local_file_name', 'wb')
f.write(urllib.request.urlopen(src).read())
f.close()
src has to be a full URL, for example http://hostname.com/folder1/folder2/filename.ext.
If src is /folder1/folder2/filename.ext, you have to prepend http://hostname.com.
If src is folder2/filename.ext, you have to prepend http://hostname.com/folder1/.
etc.
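Rather than pasting these prefixes together by hand, one option (just a sketch; the page URL below is a placeholder) is to let urllib.parse.urljoin resolve the relative src against the page you scraped:
from urllib.parse import urljoin
page_url = "http://hostname.com/folder1/page.html"  # placeholder for the page you scraped
# urljoin handles both '/folder1/folder2/filename.ext' and 'folder2/filename.ext' forms
full_src = urljoin(page_url, src)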
EDIT: an example of how to download the Stack Overflow logo :)
import urllib.request
f = open('stackoverflow.png','wb')
f.write(urllib.request.urlopen('https://cdn.sstatic.net/Img/unified/sprites.svg?v=fcc0ea44ba27').read())
f.close()

The src attribute contains the image's URL.
You can download it with:
import urllib.request
urllib.request.urlretrieve(src, "image.jpg")
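If you want it in a particular folder, as the question asks, a minimal sketch (the folder name here is just an example) is to join the folder with a file name taken from the URL:
import os
import urllib.request
folder = "images"  # example target folder
os.makedirs(folder, exist_ok=True)
# name the file after the last part of the URL and save it inside the folder
file_name = src.rsplit("/", 1)[-1]
urllib.request.urlretrieve(src, os.path.join(folder, file_name))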

Related

How to download Flickr images in Python using photo URLs (which do not contain .jpg, .png, etc.)

I want to download images from Flickr in Python using the following type of links:
https://www.flickr.com/photos/66176388@N00/2172469872/
https://www.flickr.com/photos/clairity/798067744/
This data is obtained from the XML file given at https://snap.stanford.edu/data/web-flickr.html
Is there any Python script or way to download these images automatically?
Thanks.
I tried to find an answer from other sources and compiled it as follows:
import re
from urllib import request

def download(url, save_name):
    # fetch the photo page and pick out the "large" (_b.jpg) image URL
    html = request.urlopen(url).read()
    html = html.decode('utf-8')
    img_url = re.findall(r'https:[^" \\:]*_b\.jpg', html)[0]
    print(img_url)
    # save the image bytes under the requested file name
    with open(save_name, "wb") as fp:
        fp.write(request.urlopen(img_url).read())

download('https://www.flickr.com/photos/clairity/798067744/sizes/l/', 'image.jpg')
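To apply this to the two links from the question, one approach (a sketch, assuming the same /sizes/l/ page works for each photo) is to loop over them and reuse the download() helper above:
urls = [
    'https://www.flickr.com/photos/66176388@N00/2172469872/',
    'https://www.flickr.com/photos/clairity/798067744/',
]
for i, url in enumerate(urls):
    # append the "large size" page and save under a numbered name
    download(url + 'sizes/l/', 'image_{}.jpg'.format(i))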

python: wget module downloading file without any extension

I am writing a small Python script to download a file from a follow link and retrieve the original filename and its extension. But I have come across a follow link for which Python downloads the file without any extension, whereas the file has a .txt extension when downloaded using a browser.
Below is the code I am trying:
from urllib.request import urlopen
from urllib.parse import unquote
import wget

filePath = 'D:\\folder_path'
followLink = 'http://example.com/Reports/Download/c4feb46c-8758-4266-bec6-12358'
response = urlopen(followLink)
if response.code == 200:
    print('Follow Link(response url) :' + response.url)
    print('\n')
    unquote_url = unquote(response.url)
    file_name = wget.detect_filename(response.url).replace('|', '_')
    print('file_name - ' + file_name)
    wget.download(response.url, filePath)
The file_name variable in the above code just gives 'c4feb46c-8758-4266-bec6-12358' as the filename, whereas I want to download it as c4feb46c-8758-4266-bec6-12358.txt.
I have also tried to read the file name from the headers, i.e. response.info(), but am not getting the proper file name.
Can anyone please help me with this? I am stuck in my work. Thanks in advance.
Wget gets the filename from the URL itself. For example, if your URL was https://someurl.com/filename.pdf, it is saved as filename.pdf. If it was https://someurl.com/filename, it is saved as filename. Since wget.download returns the filename of the downloaded file, you can rename it to any extension you want with os.rename(filename, filename+'.<extension>').
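If you would rather not hard-code the extension, one possible sketch (assuming the server sends a usable Content-Type header) is to guess it from the response headers and rename the downloaded file:
import mimetypes
import os
import wget
from urllib.request import urlopen
followLink = 'http://example.com/Reports/Download/c4feb46c-8758-4266-bec6-12358'
response = urlopen(followLink)
# download first, then guess an extension such as '.txt' from the Content-Type header
filename = wget.download(response.url)
ext = mimetypes.guess_extension(response.headers.get_content_type())
if ext:
    os.rename(filename, filename + ext)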

Entering the URL in the browser downloads the image automatically. How do I write a Python script to download such images?

url="https://images.data.gov.sg/api/traffic-images/2016/02/96128cfd-ab9a-4959-972e-a5e74bb149a9.jpg"
I am trying this:
import urllib
url="https://images.data.gov.sg/api/traffic-images/2016/02/96128cfd-ab9a-4959-972e-a5e74bb149a9.jpg"
IMAGE=url.rsplit("/")[-1]
urllib.urlretrieve(url,IMAGE)
The image is downloaded to the destination folder after execution, but it is corrupt: a "Could not load image" error pops up.
It might be because the domain you are trying to reach has restrictions on its download policy. Check this one out, hope it helps! https://stackoverflow.com/a/8389368/2539771
import urllib
URL = "https://images-na.ssl-images-amazon.com/images/I/714tx9QbaKL.SL1500.jpg"
urllib.urlretrieve(URL, "sample.png")
from PIL import Image
img = Image.open('/home/sks/sample.png')
img.show()
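If the server really is rejecting non-browser clients, a possible workaround (a sketch, not a guaranteed fix) is to send the request with a browser-like User-Agent header and write the bytes yourself, shown here with Python 3's urllib.request:
import urllib.request
url = "https://images.data.gov.sg/api/traffic-images/2016/02/96128cfd-ab9a-4959-972e-a5e74bb149a9.jpg"
# a browser-like User-Agent; the assumption is that the default Python agent is being blocked
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as resp, open(url.rsplit("/")[-1], 'wb') as out:
    out.write(resp.read())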

Download a file to a particular folder in Python

I can download a file from a URL in the following way:
import urllib2
response = urllib2.urlopen("http://www.someurl.com/file.pdf")
html = response.read()
One way I can think of is to open this file as binary and then re-save it to the different folder I want,
but is there a better way?
Thanks
You can use the Python module wget for downloading the file. Here is sample code:
import wget
url = 'http://www.example.com/foo.zip'
path = 'path/to/destination'
wget.download(url,out = path)
The function you are looking for is urllib.urlretrieve
import urllib
linkToFile = "http://www.someurl.com/file.pdf"
localDestination = "/home/user/local/path/to/file.pdf"
resultFilePath, responseHeaders = urllib.urlretrieve(linkToFile, localDestination)
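On Python 3, urlretrieve lives in urllib.request; a minimal sketch (the folder path is only an example) that saves the file into a particular folder:
import os
import urllib.request
linkToFile = "http://www.someurl.com/file.pdf"
folder = "/home/user/local/path/to"  # example destination folder
# build the destination path from the folder and the file name taken from the URL
destination = os.path.join(folder, linkToFile.rsplit("/", 1)[-1])
urllib.request.urlretrieve(linkToFile, destination)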

How to automate satellite image downloads?

I am looking for a way to automate the process of downloading satellite imagery. The screenshot shows the type and format of files I am interested in downloading (.ntf and the 150MB files).
I encountered the following code from TheBioBucket that looks promising, although the R package XML is obsolete.
require(XML)
dir.create("D:/GIS_DataBase/DEM/")
setwd("D:/GIS_DataBase/DEM/")
doc <- htmlParse("http://www.viewfinderpanoramas.org/dem3.html#alps")
urls <- paste0("http://www.viewfinderpanoramas.org", xpathSApply(doc, '//*/a[contains(@href,"/dem1/N4")]/@href'))
names <- gsub(".*dem1/(\\w+\\.zip)", "\\1", urls)
for (i in 1:length(urls)) download.file(urls[i], names[i])
Is there a good way to automate the process of downloading .ntf files programmatically using R or Python?
Scraping is definitely easy to implement in Python.
# collect.py
import urllib, urllib2, bs4
from urlparse import urljoin

soup = bs4.BeautifulSoup(urllib2.urlopen("http://www.viewfinderpanoramas.org/dem3.html#alps"))
links = soup.find_all('a')
for link in links:
    try:
        if "/dem1/N4" in link['href']:
            url = urljoin("http://www.viewfinderpanoramas.org/", link['href'])
            filename = link['href'].split('/')[-1]
            urllib.urlretrieve(url, filename)
            #break
    except:
        pass
You might want to change the filename to include the path where you want to put the file.
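On Python 3 the same idea still works; here is a rough sketch (bs4 with the built-in html.parser is an assumption, and the site layout may have changed since this answer was written):
# collect_py3.py
import urllib.request
from urllib.parse import urljoin
import bs4
base = "http://www.viewfinderpanoramas.org/"
html = urllib.request.urlopen(base + "dem3.html#alps").read()
soup = bs4.BeautifulSoup(html, "html.parser")
for link in soup.find_all('a', href=True):
    if "/dem1/N4" in link['href']:
        url = urljoin(base, link['href'])
        filename = link['href'].split('/')[-1]  # or prepend a target folder here
        urllib.request.urlretrieve(url, filename)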
In R the XML package can facilitate what you need fairly easily. Here's a place to start
library(XML)
demdir <- "http://www.viewfinderpanoramas.org/dem1/"
# this returns a data.frame with file names
dems <- readHTMLTable(demdir)[[1]]
# you'll want, for example, to download only zip files
demnames <- dems[grepl(".zip",dems$Name),"Name"]
# (but you can add other subsetting/selection operations here)
# download the files to the same name locally
# (change '.' if you want some other directory)
sapply(demnames, function(demfi) download.file(paste0(demdir,demfi), file.path(".",demfi)))
The only complication I can see is that if the filename is too long (i.e. it is truncated in your web browser), then the filename in dems will also be truncated.
