I am using Selenium to log in to a page and download some TIFF files.
I have a variable downloadurl that contains an array of URL links which I scraped from the website. I am using the code below to download the files:
driver = webdriver.Chrome();
driver.get(downloadurl)
All the files do get downloaded, but without meaningful names, e.g. img(1), img(2) ...
My problem is this: I want driver.get(downloadurl) to download the files one by one, in the order of the downloadurl array, and to rename each file right after it is downloaded according to the title variable (also an array), then download the next file, rename it, and so on.
P.S. I am avoiding requests because the login procedure is very complicated and requires authorization cookies.
Many thanks for the help!
To elaborate on my comment:
import os
import time
from selenium import webdriver
driver = webdriver.Chrome()  # create the browser once, outside the loop
for downloadlink, uniqueName in my_list_of_links_and_names:
    driver.get(downloadlink)
    time.sleep(5)  # give it time to download (not sure if this is necessary)
    # the file is now downloaded
    os.rename("img(1).png", uniqueName)  # the name is now changed
This will work assuming that "img(1).png" will be renamed and then the next download will come in as "img(1).png" yet again.
The hardest part would be making my_list_of_links_and_names but if you have the data in separate lists, just zip() them together. You can also generate your own title every loop based on some criteria...
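If the fixed sleep turns out to be unreliable, one alternative is to poll the download folder until a new, finished file appears and rename that one. This is only a minimal sketch under some assumptions: wait_for_new_file and download_and_rename are hypothetical helper names, fetch stands in for driver.get, and files ending in .crdownload or .tmp are assumed to be downloads that are still being written.

```python
import os
import time


def wait_for_new_file(download_dir, known_files, timeout=30.0, poll=0.25):
    """Return the name of a finished file that was not in known_files.

    Hypothetical helper: names ending in .crdownload or .tmp are treated
    as downloads that the browser is still writing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = [
            name for name in os.listdir(download_dir)
            if name not in known_files
            and not name.endswith((".crdownload", ".tmp"))
        ]
        if new_files:
            return new_files[0]
        time.sleep(poll)
    raise TimeoutError("no new file appeared in " + download_dir)


def download_and_rename(fetch, download_dir, links, names):
    """Call fetch(link) for each link, then rename the file that appears."""
    renamed = []
    for link, new_name in zip(links, names):
        before = set(os.listdir(download_dir))  # snapshot before downloading
        fetch(link)  # with Selenium this would be driver.get
        downloaded = wait_for_new_file(download_dir, before)
        target = os.path.join(download_dir, new_name)
        os.rename(os.path.join(download_dir, downloaded), target)
        renamed.append(target)
    return renamed
```

With Selenium this would be called as download_and_rename(driver.get, Dl_path, downloadurl, title), where title holds full file names including the extension.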
First we will create a function (Rename_file) that renames the downloaded image in its folder.
def Rename_file(new_name, Dl_path):  # Renames downloaded files in the path
    filename = max([f for f in os.listdir(Dl_path)])  # picks the last file name in sorted order
    if 'image.png' in filename:  # Finds 'image.png' name in said path
        time.sleep(2)  # you can change the value in here depending on your requirements
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png'))  # can be changed to .jpg etc
Then we apply this function to the array of URL links, pairing each link with its title:
for link, new_name in zip(downloadurl, title):  # takes each link together with its title
    driver.get(link)  # download the said image in link
    Rename_file(new_name, Dl_path)
Sample code:
import os
import time
from selenium import webdriver
driver = webdriver.Chrome()
downloadurl = ['www.sample2.com','www.sample2.com']
Dl_path = "//location//of//image_downloaded"
title = ['Title 1', 'Title 2']
def Rename_file(new_name, Dl_path):
    filename = max([f for f in os.listdir(Dl_path)])
    if 'image.png' in filename:
        time.sleep(2)
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name+'.png'))
for link, new_name in zip(downloadurl, title):  # one title per link, not every title for every link
    driver.get(link)
    time.sleep(2)
    Rename_file(new_name, Dl_path)
I'm fairly confident in the Rename_file function, but I haven't tested this with an array of URL links, since I couldn't think of a site to test it on. Hopefully this works for you. Please let me know :-)
This is the code immediately before I get stuck:
#Start a new post
driver.find_element_by_xpath("//span[normalize-space()='Start a post']").click()
time.sleep(2)
driver.find_element_by_xpath("//li-icon[@type='image-icon']//*[name()='svg']").click()
time.sleep(2)
The above code works well. The next step though has me puzzled.
I would like to upload the latest image file from my Downloads folder. When I click the icon above in LinkedIn, the file dialog opens in my user folder (Melissa).
So I'm looking for the next lines of code to navigate from the default folder to the Downloads folder (C:\Users\Melissa\Downloads), select the newest file, and then click 'Open' so it attaches to the LinkedIn post.
You can upload the image using this method:
import getpass, glob, os
# Replace this with the code to get the upload input tag
upload_button = driver.find_element_by_xpath("//input[@type='file']")
# Get downloads
downloads = glob.glob("C:\\Users\\{}\\Downloads\\*".format(getpass.getuser()))
# Find the latest downloaded file
latest_download = max(downloads, key=os.path.getctime)
# Enter it into the upload input tag
upload_button.send_keys(latest_download)
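The "find the latest file" step can also be written without a hard-coded Windows path. The following is just a sketch, assuming modification time is an acceptable stand-in for creation time; newest_file is a hypothetical helper name and is not part of Selenium.

```python
from pathlib import Path


def newest_file(folder):
    """Return the most recently modified regular file in folder, or None."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    if not files:
        return None
    # Assumption: modification time is used here instead of creation time.
    return max(files, key=lambda p: p.stat().st_mtime)
```

It could then be used as upload_button.send_keys(str(newest_file(Path.home() / "Downloads"))), assuming the Downloads folder sits directly under the user's home directory.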
Hi, I am listing all folders like this:
entries=dbx.files_list_folder('').entries
print (entries[1].name)
print (entries[2].name)
But I am unable to list the files inside these folders. I have searched the internet, but so far I have not found a working function.
After listing entries using files_list_folder (and files_list_folder_continue), you can check the type, and then download them if desired using files_download, like this:
import dropbox
# dbx is assumed to be an authenticated dropbox.Dropbox client
entries = dbx.files_list_folder('').entries
for entry in entries:
    if isinstance(entry, dropbox.files.FileMetadata):  # this entry is a file
        md, res = dbx.files_download(entry.path_lower)
        print(md)  # this is the metadata for the downloaded file
        print(len(res.content))  # `res.content` contains the file data
Note that this code sample doesn't properly paginate using files_list_folder_continue nor does it contain any error handling.
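For the pagination part, a rough sketch could look like the following. It assumes dbx is an authenticated dropbox.Dropbox client and relies on the has_more and cursor fields of the listing result; iter_folder_entries is a hypothetical helper name, and error handling is still left out.

```python
def iter_folder_entries(dbx, path="", recursive=False):
    """Yield every entry under path, following pagination cursors.

    Assumes dbx behaves like an authenticated dropbox.Dropbox client.
    """
    result = dbx.files_list_folder(path, recursive=recursive)
    while True:
        for entry in result.entries:
            yield entry
        if not result.has_more:
            break
        # Ask for the next page using the cursor from the previous one.
        result = dbx.files_list_folder_continue(result.cursor)
```

Passing recursive=True should also return the files inside subfolders, which is what the question asks about; the isinstance check against dropbox.files.FileMetadata can then be applied to each yielded entry.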
There are two possible ways to do that:
Either write the content to a file, or create a link (either redirect the browser to it or just get a downloadable link).
First way:
metadata, response = dbx.files_download(file_path+filename)
with open(metadata.name, "wb") as f:
    f.write(response.content)
Second way:
link = dbx.sharing_create_shared_link(file_path+filename)
print(link.url)
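If you want the link to download the file directly, replace dl=0 with dl=1 in the URL (replacing every "0" would also change zeros elsewhere in the link):
path = link.url.replace("dl=0", "dl=1")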
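Dropbox shared links typically carry a dl=0 query parameter, and switching it to dl=1 asks for a direct download. A minimal sketch that changes only that parameter and leaves the rest of the URL alone, using the standard library; as_direct_download is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def as_direct_download(shared_url):
    """Return shared_url with its dl query parameter set to 1."""
    parts = urlsplit(shared_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"  # add or overwrite only the dl parameter
    return urlunsplit(parts._replace(query=urlencode(query)))
```

It would be called as path = as_direct_download(link.url).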
I have this script, which downloads all images from a given web URL:
from selenium import webdriver
import urllib
class ChromefoxTest:
    def __init__(self,url):
        self.url=url
        self.uri = []
    def chromeTest(self):
        # file_name = "C:\Users\Administrator\Downloads\images"
        self.driver=webdriver.Chrome()
        self.driver.get(self.url)
        self.r=self.driver.find_elements_by_tag_name('img')
        # output=open(file_name,'w')
        for i, v in enumerate(self.r):
            src = v.get_attribute("src")
            self.uri.append(src)
            pos = len(src) - src[::-1].index('/')
            print src[pos:]
            self.g=urllib.urlretrieve(src, src[pos:])
            # output.write(src)
        # output.close()
if __name__=='__main__':
    FT=ChromefoxTest("http://imgur.com/")
    FT.chromeTest()
My question is: how do I make this script save all the pictures to a specific folder on my Windows machine?
You need to specify the path where you want to save the file. This is explained in the documentation for urllib.urlretrieve:
The method is: urllib.urlretrieve(url[, filename[, reporthook[, data]]]).
And the documentation says:
The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name).
So...
urllib.urlretrieve(src, 'location/on/my/system/foo.png')
Will save the image to the specified folder.
Also, consider reviewing the documentation for os.path. Those functions will help you manipulate file names and paths.
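As a rough illustration of combining os.path with urlretrieve, here is a Python 3 sketch (in Python 3 the function lives in urllib.request). The names local_path_for and save_to_folder, and the fallback name download.bin, are made up for this example.

```python
import os
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def local_path_for(url, folder, default_name="download.bin"):
    """Build folder/<last path segment of url>, ignoring the query string."""
    name = os.path.basename(unquote(urlsplit(url).path)) or default_name
    return os.path.join(folder, name)


def save_to_folder(url, folder):
    """Download url into folder and return the local path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    target = local_path_for(url, folder)
    urlretrieve(url, target)
    return target
```

In the script from the question, the urlretrieve call inside the loop could then become save_to_folder(src, folder) with folder set to the desired location.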
If you use the requests library you can slurp up really big image files (or small ones) efficiently and arrange to store them in a place of your choice in an obvious way.
Use this code and you'll get a nice picture of a beagle dog!
image_url is the link to the remote image.
file_path is where you want to store the image locally. It can include just a file name or a full path, at your option.
chunk_size is the size of the piece of the file to be downloaded with each slurp from the remote site.
length is the actual size of the piece that is written locally. Since I did this interactively I put this in mainly so that I wouldn't have to look at a long vertical stream of 1024s on my screen.
>>> import requests
>>> image_url = 'http://maxpixel.freegreatpicture.com/static/photo/1x/Eyes-Dog-Portrait-Animal-Familiar-Domestic-Beagle-2507963.jpg'
>>> file_path = r'c:\scratch\beagle.jpg'
>>> r = requests.get(image_url, stream=True)
>>> with open(file_path, 'wb') as beagle:
...     for chunk in r.iter_content(chunk_size=1024):
...         length = beagle.write(chunk)
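The chunk loop can also be pulled out into a small helper that accepts any iterable of byte chunks, so it is not tied to one response object. This is only a sketch; write_chunks is a hypothetical name, and skipping empty chunks is a precaution rather than something the code above does.

```python
def write_chunks(chunks, file_path):
    """Write an iterable of byte chunks to file_path; return bytes written."""
    total = 0
    with open(file_path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += out.write(chunk)
    return total
```

With requests it would be called as write_chunks(r.iter_content(chunk_size=1024), file_path).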
I am looking for a way to automate the process of downloading satellite imagery. The screenshot shows the type and format of files I am interested in downloading (.ntf and the 150MB files).
I encountered the following code from TheBioBucket that looks promising, although the R package XML is obsolete.
require(XML)
dir.create("D:/GIS_DataBase/DEM/")
setwd("D:/GIS_DataBase/DEM/")
doc <- htmlParse("http://www.viewfinderpanoramas.org/dem3.html#alps")
urls <- paste0("http://www.viewfinderpanoramas.org", xpathSApply(doc,'//*/a[contains(@href,"/dem1/N4")]/@href'))
names <- gsub(".*dem1/(\\w+\\.zip)", "\\1", urls)
for (i in 1:length(urls)) download.file(urls[i], names[i])
Is there a good way to automate the process of downloading .ntf files programmatically using R or Python?
Scraping is definitely easy to implement in Python.
# collect.py
import urllib, urllib2, bs4
from urlparse import urljoin
soup = bs4.BeautifulSoup(urllib2.urlopen("http://www.viewfinderpanoramas.org/dem3.html#alps"))
links = soup.find_all('a')
for link in links:
    try:
        if "/dem1/N4" in link['href']:
            url = urljoin("http://www.viewfinderpanoramas.org/", link['href'])
            filename = link['href'].split('/')[-1]
            urllib.urlretrieve(url, filename)
            #break
    except:
        pass
You might want to change filename to include the path of the folder where you want to put the file.
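The script above is Python 2. For Python 3, the link-collecting step can be sketched with the standard library alone, without bs4. LinkCollector and collect_links are hypothetical names, and the HTML is expected to be passed in as an already decoded string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose href contains needle."""

    def __init__(self, base_url, needle):
        super().__init__()
        self.base_url = base_url
        self.needle = needle
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.needle in href:
            # Resolve relative links against the page URL.
            self.urls.append(urljoin(self.base_url, href))


def collect_links(html_text, base_url, needle):
    """Return all matching link URLs found in html_text."""
    parser = LinkCollector(base_url, needle)
    parser.feed(html_text)
    return parser.urls
```

Each returned URL can then be downloaded in a loop, taking the part after the last '/' as the local file name as in the script above.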
In R, the XML package can do what you need fairly easily. Here's a place to start:
library(XML)
demdir <- "http://www.viewfinderpanoramas.org/dem1/"
# this returns a data.frame with file names
dems <- readHTMLTable(demdir)[[1]]
# you'll want, for example, to download only zip files
demnames <- dems[grepl(".zip",dems$Name),"Name"]
# (but you can add other subsetting/selection operations here)
# download the files to the same name locally
# (change '.' if you want some other directory)
sapply(demnames, function(demfi) download.file(paste0(demdir,demfi), file.path(".",demfi)))
The only complication I can see is if the filename is too long (if it's truncated in your web-browser), then the filename in dems will also be truncated.
I wondered if anyone could help. I'm trying to download a NetCDF file from the internet within my code. The website I wish to download from is:
http://www.esrl.noaa.gov/psd/cgi-bin/db_search/DBListFiles.pl?did=3&tid=38354&vid=20
The name of the file I would like to download is air.sig995.2013.nc
and if it's downloaded manually, the link is:
ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2013.nc
Thanks
I would use urllib to retrieve the file,
like this:
urllib.urlretrieve(url, filename)
where url is the URL of the download and filename is what you want to name the file.
You can try this:
#!/usr/bin/env python
# Read data from an opendap server
import netCDF4
# specify an url, the JARKUS dataset in this case
url = 'http://dtvirt5.deltares.nl:8080/thredds/dodsC/opendap/rijkswaterstaat/jarkus/profiles/transect.nc'
# for local windows files, note that '\t' defaults to the tab character in python, so use prefix r to indicate that it is a raw string.
url = r'f:\opendap\rijkswaterstaat\jarkus\profiles\transect.nc'
# create a dataset object
dataset = netCDF4.Dataset(url)
# lookup a variable
variable = dataset.variables['id']
# print the first 10 values
print variable[0:10]
from
https://publicwiki.deltares.nl/display/OET/Reading%2Bdata%2Bfrom%2BOpenDAP%2Busing%2Bpython