Can I download files from inside folders (subfiles) with the Dropbox Python SDK?

Hi, I am getting all the folders like this:
entries = dbx.files_list_folder('').entries
print(entries[1].name)
print(entries[2].name)
but I am unable to locate the subfiles inside these folders. I have searched the internet, but so far I have not found a working function.

After listing entries using files_list_folder (and files_list_folder_continue), you can check the type, and then download them if desired using files_download, like this:
entries = dbx.files_list_folder('').entries
for entry in entries:
    if isinstance(entry, dropbox.files.FileMetadata):  # this entry is a file
        md, res = dbx.files_download(entry.path_lower)
        print(md)  # this is the metadata for the downloaded file
        print(len(res.content))  # `res.content` contains the file data
Note that this code sample doesn't properly paginate using files_list_folder_continue nor does it contain any error handling.
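A hedged sketch of what that could look like, using recursive=True so that files inside subfolders are listed as well (the access token is a placeholder, and error handling is still omitted):
import dropbox

dbx = dropbox.Dropbox("YOUR_ACCESS_TOKEN")  # placeholder token

# recursive=True lists the contents of subfolders as well
result = dbx.files_list_folder('', recursive=True)
while True:
    for entry in result.entries:
        if isinstance(entry, dropbox.files.FileMetadata):  # skip FolderMetadata entries
            md, res = dbx.files_download(entry.path_lower)
            print(md.path_display, len(res.content))
    if not result.has_more:
        break
    # page through the remaining entries
    result = dbx.files_list_folder_continue(result.cursor)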

There are two possible ways to do that:
Either you can write the content to a file, or you can create a link (either redirected to the browser, or just a downloadable link).
First way:
metadata, response = dbx.files_download(file_path + filename)
with open(metadata.name, "wb") as f:
    f.write(response.content)
Second way:
link = dbx.sharing_create_shared_link(file_path + filename)
print(link.url)
if you want the link to be downloadable, replace the trailing dl=0 with dl=1:
path = link.url.replace("?dl=0", "?dl=1")
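As a rough sketch of using that direct link (my addition, assuming the requests library and the filename variable from the snippets above), it can then be fetched like any other URL:
import requests

direct_url = link.url.replace("?dl=0", "?dl=1")
resp = requests.get(direct_url)
resp.raise_for_status()
with open(filename, "wb") as f:  # filename as in the snippets above
    f.write(resp.content)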

Related

Python 3: How to save downloaded webpages to a specified dir?

I am trying to save all the <a> links within the Python homepage into a folder named 'Downloaded Pages'. However, after two iterations through the for loop I receive the following error:
www.python.org#content
<_io.BufferedWriter name='Downloaded Pages/www.python.org#content'>
www.python.org#python-network
<_io.BufferedWriter name='Downloaded Pages/www.python.org#python-network'>
Traceback (most recent call last):
  File "/Users/Lucas/Python/AP book exercise/Web Scraping/linkVerification.py", line 26, in <module>
    downloadedPage = open(os.path.join('Downloaded Pages', os.path.basename(linkUrlToOpen)), 'wb')
IsADirectoryError: [Errno 21] Is a directory: 'Downloaded Pages/'
I am unsure why this happens, as it appears the pages are being saved: seeing '<_io.BufferedWriter name='Downloaded Pages/www.python.org#content'>' says to me it's the correct path.
This is my code:
import requests, os, bs4

# Create a new folder to download webpages to
os.makedirs('Downloaded Pages', exist_ok=True)

# Download webpage
url = 'https://www.python.org/'
res = requests.get(url)
res.raise_for_status()  # Check if the download was successful

soupObj = bs4.BeautifulSoup(res.text, 'html.parser')  # Collects all text from the webpage

# Find all 'a' links on the webpage
linkElem = soupObj.select('a')
numOfLinks = len(linkElem)

for i in range(numOfLinks):
    linkUrlToOpen = 'https://www.python.org' + linkElem[i].get('href')
    print(os.path.basename(linkUrlToOpen))

    # save each downloaded page to the 'Downloaded Pages' folder
    downloadedPage = open(os.path.join('Downloaded Pages', os.path.basename(linkUrlToOpen)), 'wb')
    print(downloadedPage)

    if linkElem == []:
        print('Error, link does not work')
    else:
        for chunk in res.iter_content(100000):
            downloadedPage.write(chunk)
        downloadedPage.close()
Appreciate any advice, thanks.
The problem is that parsing the basename works when the URL ends in a page name like something.html, but when the URL doesn't specify one, like "http://python.org/", the basename is actually empty (you can try printing first the URL and then the basename between brackets or something to see what I mean). So to work around that, the easiest solution would be to use absolute paths, like @Thyebri said.
And also, remember that the file name you write cannot contain characters like '/', '\' or '?'.
So, I don't know whether the following code is messy or not, but using the re library I would do the following:
filename = re.sub(r'[\\/*:"?]+', '-', linkUrlToOpen.split("://")[1])
downloadedPage = open(os.path.join('Downloaded_Pages', filename), 'wb')
So first I remove the "https://" part, and then with the regular expressions library I replace all the usual symbols that are present in URL links with a dash '-', and that is the name that will be given to the file.
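Putting it together with the question's loop, a minimal sketch (my assembly, not the answerer's: it assumes the same soupObj, and it fetches each link with its own requests.get call, which the original loop was missing):
import re, os, requests

for a in soupObj.select('a'):
    href = a.get('href')
    if not href:
        continue
    linkUrlToOpen = 'https://www.python.org' + href
    # fetch each linked page individually; the original wrote the homepage response every time
    pageRes = requests.get(linkUrlToOpen)
    pageRes.raise_for_status()
    # sanitize the URL into a usable file name
    filename = re.sub(r'[\\/*:"?]+', '-', linkUrlToOpen.split("://")[1])
    with open(os.path.join('Downloaded Pages', filename), 'wb') as downloadedPage:
        for chunk in pageRes.iter_content(100000):
            downloadedPage.write(chunk)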
Hope it works!

Error when downloading and unpacking zip file from website: zipfile.BadZipFile: File is not a zip file

I have a script that downloads zip files to a folder on my desktop and afterwards unpacks them to another location locally. Most of the files work fine, except for some. I have tried to find a solution by searching for the error, but I can't wrap my head around where or why the error occurs.
I am downloading the data from country-specific websites. There are about five country websites that are iterated through to download and unpack the data. As mentioned, the zip files for one of them end up being corrupt. If I download and unpack the data manually it works perfectly fine, so I assume it is a Python error?
The part if Country != "BE" is only a temporary "solution", as the script otherwise crashes.
First the data is downloaded to the DataOut location, and then it should be unpacked to UnpackedDataOut:
with open(os.path.join(DataOut, file_name), "wb") as file:
    response = get(domain + url)
    file.write(response.content)

if Country != "BE":
    with ZipFile(DataOut + "\\" + file_name, "r") as zipObj:
        zipObj.extractall(UnpackedDataOut)
The data for BE was downloaded, but if I try to open it manually after downloading it via Python, it returns the message "error in packed file".
You can simply use this snippet to download and extract a zip file:
import urllib.request
import zipfile

url = "http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip"
extract_dir = "example"

zip_path, _ = urllib.request.urlretrieve(url)
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)
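Since the BE archive opens manually with "error in packed file", the response body is probably not a zip at all, for example an HTML error page returned by the server. A small diagnostic sketch of mine, reusing DataOut, file_name, domain and url from the question (requests.get stands in for the bare get import there):
import os
import zipfile
import requests

response = requests.get(domain + url)
response.raise_for_status()  # fail loudly on an HTTP error instead of saving the error page
print(response.headers.get("Content-Type"))  # an HTML content type hints at an error page
print(response.content[:4])  # a real zip archive starts with the bytes b'PK\x03\x04'

zip_target = os.path.join(DataOut, file_name)
with open(zip_target, "wb") as f:
    f.write(response.content)
print(zipfile.is_zipfile(zip_target))  # False means the saved file is not a valid zip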

(Selenium) Download and Rename File Problem

I am using Selenium to log in to a page and download some TIFF files.
Now I have a variable downloadurl; it contains an array of URL links which I scraped from the website. I am using the code below to download the files:
driver = webdriver.Chrome()
driver.get(downloadurl)
I do get all the files downloaded, but with no names, e.g. img(1), img(2)...
Now my problem is: I want driver.get(downloadurl) to download the files one by one according to the downloadurl array sequence, and to rename each file right after it is downloaded according to the title variable, which is an array, then download the next file, rename it, and so on.
P.S. I avoid using requests because the login procedure is very complicated and requires authorization cookies.
Many thanks for the help!
To elaborate on my comment:
import os
import time

driver = webdriver.Chrome()  # create the driver once, so the session (and login) is reused
for downloadlink, uniqueName in my_list_of_links_and_names:
    driver.get(downloadlink)
    time.sleep(5)  # give it time to download (not sure if this is necessary)
    # the file is now downloaded
    os.rename("img(1).png", uniqueName)  # the name is now changed
This will work assuming that "img(1).png" will be renamed and then the next download will come in as "img(1).png" yet again.
The hardest part would be making my_list_of_links_and_names, but if you have the data in separate lists, just zip() them together. You can also generate your own title each loop based on some criteria...
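One wrinkle worth noting (my addition, not part of the original answer): Chrome stores in-progress downloads as .crdownload files, so instead of a fixed sleep you could poll the download folder until no partial files remain. A rough sketch:
import os
import time

def wait_for_downloads(download_dir, timeout=60):
    # Poll download_dir until Chrome's .crdownload partial files disappear
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not any(name.endswith(".crdownload") for name in os.listdir(download_dir)):
            return True  # no partial downloads left
        time.sleep(0.5)
    return False  # timed out; a download may still be in progress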
First, we will create a function (Rename_file) that renames the downloaded image in its folder.
def Rename_file(new_name, Dl_path):  # Renames downloaded files in the path
    filename = max([f for f in os.listdir(Dl_path)])
    if 'image.png' in filename:  # Finds the 'image.png' name in said path
        time.sleep(2)  # you can change the value here depending on your requirements
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name + '.png'))  # can be changed to .jpg etc.
Then we apply this function over the array of URL links:
for link, new_name in zip(downloadurl, title):  # pair each link with its title
    driver.get(link)  # download the image at the link
    Rename_file(new_name, Dl_path)
Sample code:
downloadurl = ['www.sample2.com', 'www.sample2.com']
Dl_path = "//location//of//image_downloaded"
title = ['Title 1', 'Title 2']

def Rename_file(new_name, Dl_path):
    filename = max([f for f in os.listdir(Dl_path)])
    if 'image.png' in filename:
        time.sleep(2)
        os.rename(os.path.join(Dl_path, filename), os.path.join(Dl_path, new_name + '.png'))

for link, new_name in zip(downloadurl, title):
    driver.get(link)
    time.sleep(2)
    Rename_file(new_name, Dl_path)
I'm quite sure about the Rename_file function I created, but I haven't really tested this with an array of URL links, since I can't think of anywhere I could test it. Hopefully this works for you. Please let me know :-)
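A further suggestion of mine, separate from the answers above: renaming is more reliable if Chrome is pointed at a known, dedicated download directory up front. This sketch uses Selenium's ChromeOptions with Chrome's download preference keys; the directory path is a placeholder:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\path\to\downloads",  # placeholder path
    "download.prompt_for_download": False,  # don't ask where to save each file
})
driver = webdriver.Chrome(options=options)
That way os.listdir only ever sees the files this script downloaded, and the max(...) lookup in Rename_file is less likely to pick up an unrelated file.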

Extract a table from a locally saved HTML file

I have a series of HTML files stored in a local folder ("destination folder"). These HTML files all contain a number of tables. What I'm looking to do is to locate the tables I'm interested in using keywords, grab those tables in their entirety, paste them into a text file and save that file to the same local folder ("destination folder").
This is what I have for now:
from bs4 import BeautifulSoup
import re

filename = open('filename.txt', 'r')
soup = BeautifulSoup(filename, "lxml")
data = []
for keyword in keywords.split(','):
    u = 1
    try:
        txtfile = destinationFolder + ticker + '_' + companyname[:10] + '_' + item[1] + '_' + item[3] + '_' + keyword + u + '.txt'
        mots = soup.find_all(string=re.compile(keyword))
        for mot in mots:
            for row in mot.find("table").find_all("tr"):
                data = [cell.get_text(strip=True) for cell in row.find_all("td")]
            data = data.get_string()
            with open(txtfile, 'wb') as t:
                t.write(data)
            txtfile.close()
            u = u + 1
    except:
        pass
filename.close()
Not sure what's happening in the background, but I don't get my txt file in the end like I'm supposed to. The process doesn't fail: it runs its course till the end, but the txt file is nowhere to be found in my local folder when it's done. I'm sure I'm looking in the correct folder; the same path is used elsewhere in my code and works fine.
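No answer is recorded here, but the bare except: pass swallows every error, which is why the script "runs its course" without producing a file: keyword + u concatenates a string with an integer (str(u) is needed), get_string() is called on a plain list, and txtfile is a string, so txtfile.close() fails too. A minimal hedged sketch of the table-extraction core (keywords and destinationFolder are assumed from the question, the file name is simplified, and find_parent("table") replaces mot.find("table"), since the enclosing table is presumably what was intended):
from bs4 import BeautifulSoup
import re, os

with open('filename.txt', 'r') as fh:
    soup = BeautifulSoup(fh, "lxml")

for keyword in keywords.split(','):  # keywords: assumed comma-separated string
    u = 1
    for mot in soup.find_all(string=re.compile(keyword)):
        table = mot.find_parent("table")  # walk up to the table containing the match
        if table is None:
            continue
        rows = []
        for row in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
            rows.append('\t'.join(cells))
        txtfile = os.path.join(destinationFolder, keyword + str(u) + '.txt')  # simplified name
        with open(txtfile, 'w') as t:
            t.write('\n'.join(rows))
        u += 1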

Automated download of NetCDF file

Just wondered if anyone could help: I'm trying to download a NetCDF file from the internet within my code. The website I wish to download from is:
http://www.esrl.noaa.gov/psd/cgi-bin/db_search/DBListFiles.pl?did=3&tid=38354&vid=20
The file which I would like to download is air.sig995.2013.nc,
and if it's downloaded manually the link is:
ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2013.nc
Thanks
I would use urllib to retrieve the file,
like this:
urllib.request.urlretrieve(url, filename)
where url is the URL of the download and filename is what you want to name the file.
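Applied to the file from the question, a minimal sketch (the FTP link is the one given above; urlretrieve handles ftp:// URLs as well):
import urllib.request

url = 'ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2013.nc'
# saves the file under its own name in the current directory
urllib.request.urlretrieve(url, 'air.sig995.2013.nc')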
You can try this :
#!/usr/bin/env python
# Read data from an OPeNDAP server
import netCDF4

# specify a URL, the JARKUS dataset in this case
url = 'http://dtvirt5.deltares.nl:8080/thredds/dodsC/opendap/rijkswaterstaat/jarkus/profiles/transect.nc'
# for local Windows files, note that '\t' defaults to the tab character in Python, so use the prefix r to indicate that it is a raw string
url = r'f:\opendap\rijkswaterstaat\jarkus\profiles\transect.nc'

# create a dataset object
dataset = netCDF4.Dataset(url)
# look up a variable
variable = dataset.variables['id']
# print the first 10 values
print(variable[0:10])
from
https://publicwiki.deltares.nl/display/OET/Reading%2Bdata%2Bfrom%2BOpenDAP%2Busing%2Bpython
