I have this script, which downloads all images from a given web URL:
from selenium import webdriver
import urllib
class ChromefoxTest:
    def __init__(self,url):
        self.url=url
        self.uri = []
    def chromeTest(self):
        # file_name = "C:\Users\Administrator\Downloads\images"
        self.driver=webdriver.Chrome()
        self.driver.get(self.url)
        self.r=self.driver.find_elements_by_tag_name('img')
        # output=open(file_name,'w')
        for i, v in enumerate(self.r):
            src = v.get_attribute("src")
            self.uri.append(src)
            pos = len(src) - src[::-1].index('/')
            print src[pos:]
            self.g=urllib.urlretrieve(src, src[pos:])
            # output.write(src)
        # output.close()
if __name__=='__main__':
    FT=ChromefoxTest("http://imgur.com/")
    FT.chromeTest()
My question is: how do I make this script save all the pictures to a specific folder on my Windows machine?
You need to specify the path where you want to save the file. This is explained in the documentation for urllib.urlretrieve:
The method is: urllib.urlretrieve(url[, filename[, reporthook[, data]]]).
And the documentation says:
The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name).
So...
urllib.urlretrieve(src, 'location/on/my/system/foo.png')
This saves the image to the specified folder.
Also, consider reviewing the documentation for os.path. Those functions will help you manipulate file names and paths.
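As a minimal sketch of that advice, assuming Python 3 (where urlretrieve lives in urllib.request rather than urllib), the destination can be built with os.path from the target folder and the last component of the image URL. The helper names local_path_for and save_image are made up for this illustration.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve  # Python 3 location of urlretrieve


def local_path_for(src, folder):
    """Build the destination path for an image URL inside `folder`."""
    # Use the last path component of the URL, ignoring any query string.
    name = os.path.basename(urlsplit(src).path) or "unnamed"
    return os.path.join(folder, name)


def save_image(src, folder):
    """Download `src` into `folder`, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    destination = local_path_for(src, folder)
    urlretrieve(src, destination)
    return destination
```

In the question's loop, the urlretrieve call would then become something like save_image(src, r"C:\Users\Administrator\Downloads\images").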
If you use the requests library you can slurp up really big image files (or small ones) efficiently and arrange to store them in a place of your choice in an obvious way.
Use this code and you'll get a nice picture of a beagle dog!
image_url is the link to the remote image.
file_path is where you want to store the image locally. It can include just a file name or a full path, at your option.
chunk_size is the size of the piece of the file to be downloaded with each slurp from the remote site.
length is the actual size of the piece that is written locally. Since I did this interactively I put this in mainly so that I wouldn't have to look at a long vertical stream of 1024s on my screen.
>>> import requests
>>> image_url = 'http://maxpixel.freegreatpicture.com/static/photo/1x/Eyes-Dog-Portrait-Animal-Familiar-Domestic-Beagle-2507963.jpg'
>>> file_path = r'c:\scratch\beagle.jpg'
>>> r = requests.get(image_url, stream=True)
>>> with open(file_path, 'wb') as beagle:
...     for chunk in r.iter_content(chunk_size=1024):
...         length = beagle.write(chunk)
I'm trying to download Helm's latest release using a script. I want to download the binary and copy it to a file. I tried looking at the documentation, but it's very confusing to read and I don't understand this. I have found a way to download specific files, but nothing regarding the binary. So far, I have:
from github import Github
def get_helm(filename):
    f = open(filename, 'wb') # The file I want to copy the binary to (binary mode)
    g = Github()
    r = g.get_repo("helm/helm")
    # Get binary and use f.write() to transfer it to the file
    f.close()
    return filename
I am also well aware of the limits of queries that I can do since there are no credentials.
For Helm in particular, you're not going to have a good time since they apparently don't publish their release files via GitHub, only the checksum metadata.
See https://github.com/helm/helm/releases/tag/v3.6.0 ...
Otherwise, this would be as simple as:
get the JSON data from https://api.github.com/repos/{repo}/releases
get the first release in the list (it's the newest)
look through the assets of that release to find the file you need (e.g. for your architecture)
download it using your favorite HTTP client (e.g. the one you used to get the JSON data in the first step)
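For a project that does attach its binaries to its GitHub releases, the four steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested tool: it uses only the standard library as the HTTP client, the function names pick_asset_url and download_newest_asset are invented here, and matching an asset by a keyword in its name is just one possible rule.

```python
import json
import shutil
from urllib.request import urlopen


def pick_asset_url(releases, keyword):
    """Return the download URL of the first matching asset in the newest release."""
    if not releases:
        raise ValueError("No releases found")
    newest = releases[0]  # the API lists the newest release first
    for asset in newest.get("assets", []):
        if keyword in asset["name"]:
            return asset["browser_download_url"]
    raise ValueError("No matching asset found")


def download_newest_asset(repo, keyword, dest_filename):
    """Fetch the release list for `repo` and save the matching asset to disk."""
    with urlopen(f"https://api.github.com/repos/{repo}/releases") as resp:
        releases = json.load(resp)
    url = pick_asset_url(releases, keyword)
    # Copy the response body to the file without loading it all into memory.
    with urlopen(url) as binary, open(dest_filename, "wb") as f:
        shutil.copyfileobj(binary, f)
    return dest_filename
```

Keeping the asset selection in its own function means it can be checked against saved JSON without touching the network or the unauthenticated rate limit.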
Nevertheless, here's a script that works for Helm's additional hoops-to-jump-through:
import requests
def download_binary_with_progress(source_url, dest_filename):
    binary_resp = requests.get(source_url, stream=True)
    binary_resp.raise_for_status()
    with open(dest_filename, "wb") as f:
        for chunk in binary_resp.iter_content(chunk_size=524288):
            f.write(chunk)
            print(f.tell(), "bytes written")
    return dest_filename
def download_newest_helm(desired_architecture):
    releases_resp = requests.get(
        f"https://api.github.com/repos/helm/helm/releases"
    )
    releases_resp.raise_for_status()
    releases_data = releases_resp.json()
    newest_release = releases_data[0]
    for asset in newest_release.get("assets", []):
        name = asset["name"]
        # For a project using regular releases, this would be simplified to
        # checking for the desired architecture and doing
        # download_binary_with_progress(asset["browser_download_url"], name)
        if desired_architecture in name and name.endswith(".tar.gz.asc"):
            tarball_filename = name.replace(".tar.gz.asc", ".tar.gz")
            tarball_url = f"https://get.helm.sh/{tarball_filename}"
            return download_binary_with_progress(
                source_url=tarball_url, dest_filename=tarball_filename
            )
    raise ValueError("No matching release found")
download_newest_helm("darwin-arm64")
I see that there are two ways to download images using python-requests.
Using PIL, as stated in the docs (https://requests.readthedocs.io/en/master/user/quickstart/#binary-response-content):
from PIL import Image
from io import BytesIO
i = Image.open(BytesIO(r.content))
Using streamed response content:
r = requests.get(url, stream=True)
with open(image_name, 'wb') as f:
    for chunk in r.iter_content():
        f.write(chunk)
Which is the recommended way to download images, though? Both have their merits, I suppose, and I was wondering which approach is optimal.
I love the minimalist way. There is no single right way; it depends on the task you want to perform and the constraints you have.
import requests
with open('file.png', 'wb') as f:
    f.write(requests.get(url).content)
# the bytes are written unchanged, so naming the file .jpg instead of .png raises no error
I used the lines of code below in a function to save images.
# import the required libraries from Python
import pathlib,urllib.request,os,uuid
# URL of the image you want to download
image_url = "https://example.com/image.png"
# Using the uuid generate new and unique names for your images
filename = str(uuid.uuid4())
# Extract the image extension from its original name
file_ext = pathlib.Path(image_url).suffix
# Join the new image name to the extension
picture_filename = filename + file_ext
# Using pathlib, specify where the image is to be saved
downloads_path = str(pathlib.Path.home() / "Downloads")
# Form a full image path by joining the path to the
# images' new name
picture_path = os.path.join(downloads_path, picture_filename)
# Using "urlretrieve()" from urllib.request save the image
urllib.request.urlretrieve(image_url, picture_path)
# urlretrieve() takes in 2 arguments
# 1. The URL of the image to be downloaded
# 2. The path to save the image under. A bare file name is saved
# inside your current working directory
I am using Selenium to log in to a page and download some TIFF files.
I now have a variable downloadurl, which contains an array of URL links that I scraped from the website. I am using the code below to download the files:
driver = webdriver.Chrome();
driver.get(downloadurl)
All the files do get downloaded, but without proper names, e.g. img(1), img(2) ...
Now my problem is: I want driver.get(downloadurl) to download the files one by one in the order of the downloadurl array, and to rename each file right after it is downloaded according to the title variable, which is also an array, then download the next file, rename it, and so on.
P.S. I avoid using requests because the login procedure is very complicated and requires authorization cookies.
Many thanks for the help!
To elaborate on my comment:
import os
import time
from selenium import webdriver
driver = webdriver.Chrome()  # create the browser once so the same session is reused
for downloadlink, uniqueName in my_list_of_links_and_names:
    driver.get(downloadlink)
    time.sleep(5)  # give it time to download (not sure if this is necessary)
    # the file is now downloaded
    os.rename("img(1).png", uniqueName)  # the name is now changed
This will work assuming that "img(1).png" will be renamed and then the next download will come in as "img(1).png" yet again.
The hardest part would be making my_list_of_links_and_names but if you have the data in separate lists, just zip() them together. You can also generate your own title every loop based on some criteria...
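Building that list with zip() might look like the sketch below. The URLs and titles are placeholder data, and pair_links_with_names is a hypothetical helper added only because zip() silently drops unmatched items when the two lists differ in length.

```python
def pair_links_with_names(links, names):
    """Pair each download link with the name its file should get."""
    if len(links) != len(names):
        # zip() would silently drop the unmatched tail, so fail loudly instead.
        raise ValueError("links and names must have the same length")
    return list(zip(links, names))


# Placeholder data standing in for the scraped links and titles.
download_urls = ["https://example.com/a.tiff", "https://example.com/b.tiff"]
titles = ["first.tiff", "second.tiff"]

my_list_of_links_and_names = pair_links_with_names(download_urls, titles)
for link, name in my_list_of_links_and_names:
    print(link, "->", name)
```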
First we will create a function (Rename_file) that renames the downloaded image inside its download folder.
def Rename_file(new_name, Dl_path): #Renames the downloaded file in the path
    filename = max(os.listdir(Dl_path)) #Alphabetically last file name in the path
    if 'image.png' in filename: #Checks that it is the downloaded 'image.png'
        time.sleep(2) #you can change the value in here depending on your requirements
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png')) #can be changed to .jpg etc
Then we apply this function to the array of URL links, pairing each link with its title:
for link, new_name in zip(downloadurl, title): #Pairs each link with its title
    driver.get(link) #download the said image in link
    Rename_file(new_name,Dl_path)
Sample code:
import os
import time
from selenium import webdriver
driver = webdriver.Chrome()
downloadurl = ['www.sample2.com','www.sample2.com']
Dl_path = "//location//of//image_downloaded"
title = ['Title 1', 'Title 2']
def Rename_file(new_name, Dl_path):
    filename = max(os.listdir(Dl_path))
    if 'image.png' in filename:
        time.sleep(2)
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png'))
for link, new_name in zip(downloadurl, title):
    driver.get(link)
    time.sleep(2)
    Rename_file(new_name,Dl_path)
I'm fairly confident about the rename function I created, but I haven't tested it with an array of URL links since I couldn't think of a site to test it on. Hopefully this works for you. Please let me know :-)
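One caveat: max() over os.listdir() returns the alphabetically last name, which is not necessarily the file that was just downloaded. A possible alternative, sketched here under assumptions, is to snapshot the folder before each download and wait for a name that was not there before. The helper wait_for_new_file is hypothetical, and skipping names ending in .crdownload assumes Chrome's naming for unfinished downloads.

```python
import os
import time


def wait_for_new_file(folder, before, timeout=30.0, poll=0.5):
    """Return the path of a finished file that appeared after the `before` snapshot."""
    deadline = time.monotonic() + timeout
    while True:
        new_names = sorted(
            name for name in os.listdir(folder)
            # Assumption: Chrome marks unfinished downloads with .crdownload.
            if name not in before and not name.endswith(".crdownload")
        )
        if new_names:
            return os.path.join(folder, new_names[0])
        if time.monotonic() >= deadline:
            raise TimeoutError("no new file appeared in " + folder)
        time.sleep(poll)
```

Inside the loop over links and titles, this would be used by taking before = set(os.listdir(Dl_path)) ahead of driver.get(link), then passing the path returned by wait_for_new_file(Dl_path, before) to os.rename.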
I've been attempting to work at this for hours but decided to turn to the experts here on stackoverflow.
I'm trying to download an image from a url:
import urllib
originalphoto = urllib.urlretrieve(bundle.obj.url)
#originalphoto is being saved to the tmp directory in Ubuntu
This works and it saves the image in the tmp directory, but I need to modify this image by resizing it to a 250px by 250px image and then save it to a folder on my Desktop: /home/ubuntu/Desktop/resizedshots
The name of the original image is in bundle.obj.url, for example if bundle.obj.url is:
http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg the name of the image is "09-09-201315-47-571378756077.jpg"
After the image is resized, I need to save it to this folder as 09-09-201315-47-571378756077small.jpg
As you can see, I'm adding the word "small" to the end of the file name. Once all of this is done, I would like to delete the temporary image file that was downloaded so that it doesn't take up disk space.
Any ideas on how this can be done?
Thanks
This is the definition:
def urlretrieve(url, filename=None, reporthook=None, data=None):
You can set the second argument to something you know and then do
import os
os.remove(something_you_know)
If you do not set the second argument you do this:
import urllib, os
url = 'http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg'
file, headers = urllib.urlretrieve(url)
# do something
os.remove(file)
If os.remove does not work, you still have the file open.
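Putting the pieces of the question together, a rough sketch could look like this. It assumes Python 3 (urllib.request.urlretrieve instead of urllib.urlretrieve) and that Pillow is installed for the resize step; the function names small_name_for and download_resized are invented for this example. Note that a plain resize to 250 by 250 does not preserve the aspect ratio.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def small_name_for(url):
    """Turn .../photo.jpg into photosmall.jpg."""
    base = os.path.basename(urlsplit(url).path)
    stem, ext = os.path.splitext(base)
    return stem + "small" + ext


def download_resized(url, out_dir, size=(250, 250)):
    """Download `url` to a temp file, save a resized copy, then delete the temp file."""
    from PIL import Image  # Assumption: Pillow is installed (pip install Pillow).
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, small_name_for(url))
    tmp_path, _headers = urlretrieve(url)  # no filename given, so a temp file is used
    try:
        with Image.open(tmp_path) as img:
            img.resize(size).save(out_path)  # stretches the image to exactly `size`
    finally:
        os.remove(tmp_path)  # the image is closed by now, so removal can succeed
    return out_path
```

For the question's case the call would be download_resized(bundle.obj.url, "/home/ubuntu/Desktop/resizedshots").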
Just wondered if anyone could help. I'm trying to download a NetCDF file from the internet within my code. The website I wish to download from is:
http://www.esrl.noaa.gov/psd/cgi-bin/db_search/DBListFiles.pl?did=3&tid=38354&vid=20
The name of the file I would like to download is air.sig995.2013.nc
and if it's downloaded manually, the link is:
ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2013.nc
Thanks
I would use urllib to retrieve the file, like this:
urllib.urlretrieve(url, filename)
where url is the URL of the download and filename is what you want to name the file.
You can try this:
#!/usr/bin/env python
# Read data from an opendap server
import netCDF4
# specify an url, the JARKUS dataset in this case
url = 'http://dtvirt5.deltares.nl:8080/thredds/dodsC/opendap/rijkswaterstaat/jarkus/profiles/transect.nc'
# for local windows files, note that '\t' defaults to the tab character in python, so use prefix r to indicate that it is a raw string.
url = r'f:\opendap\rijkswaterstaat\jarkus\profiles\transect.nc'
# create a dataset object
dataset = netCDF4.Dataset(url)
# lookup a variable
variable = dataset.variables['id']
# print the first 10 values
print(variable[0:10])
Source: https://publicwiki.deltares.nl/display/OET/Reading%2Bdata%2Bfrom%2BOpenDAP%2Busing%2Bpython