using tf.keras.utils.get_file() for google drive - python

I am trying to use tf.keras.utils.get_file("URL from google drive")
When I use a URL for a file smaller than 33 MB it works well.
However, when I try to download a file larger than 33 MB it doesn't work.
How can I solve this problem?
_URL = 'URL FROM GOOGLE DRIVE'
path_to_zip = tf.keras.utils.get_file("file_name.zip", origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'art_filename')
I am following https://www.tensorflow.org/tutorials/images/classification
for my practice, and I am trying to use my own data.
In this example, the URL is "storage.googleapi.com..." and points to a large amount of data.
I want to use this code for downloading large data from Google Drive.
Is there any way to solve this problem?
I also tried mounting Google Drive, but since I want to access the folders and files directly,
I am not comfortable working with the mount.
Thanks

Files above a certain size pop up a notification from Drive letting you know that the file cannot be scanned for viruses, which needs to be accepted before the file can download. By appending "&confirm=t" to the end of the download URL, you can bypass that message and download your files.
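Putting that together with the code from the question, a minimal sketch might look like this (the file id here is a placeholder; take it from your own share link):

```python
def drive_download_url(file_id: str) -> str:
    # "&confirm=t" skips Drive's virus-scan warning page for large files
    return ("https://drive.google.com/uc?export=download"
            f"&id={file_id}&confirm=t")

# then pass it to get_file as in the question, e.g.:
#   path_to_zip = tf.keras.utils.get_file("file_name.zip",
#                                         origin=drive_download_url(FILE_ID),
#                                         extract=True)
print(drive_download_url("abc123"))
```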

Related

How to read file from Google Drive after it was updated?

I need to place a .csv file somewhere and then update it on a daily basis for code to read. The person who will be updating the file will not be using code, so it should be as easy for him as uploading from the web. I have read dozens of questions here about how to read or download a file from Google Drive or Google Cloud Storage, but all of them assume updating with code or downloading using an API. I want a slightly simpler solution. For example,
I'm using the code below to read a .csv file from Google Drive (this file is an example; the actual one will be different).
This is the file which will be updated each day; however, each time I update the file (remove it and upload a new one to Google Drive) the link changes.
Is there a way to update the file every time without making changes to the code?
For example, could the code get the file from a particular folder on Google Drive?
The main thing is that I need to do it without using the API and Google OAuth.
If it's not possible, where could the file be uploaded for this purpose? I need to be able to upload the file every day without any code so the code will read the updated data. Is there storage like this?
import pandas as pd
import requests
from io import StringIO

# extract the file id from the share link
url = 'https://drive.google.com/file/d/1976F_8WzIxj9wJXjNyN_uD8Lrl_XtpIf/view?usp=sharing'
file_id = url.split('/')[-2]

# build a direct-download URL and fetch the raw CSV text
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
csv_text = requests.get(dwn_url).text
df = pd.read_csv(StringIO(csv_text))
print(df.head())
create vs update
The first thing you need to be sure of is that the first time you run your code you use the file.create method.
When you update the file, you should be using file.update; this does not create a new file each time, so your file id will remain the same.
google api python client
IMO you should consider using the Python client library; it will make things a little easier for you.
updated_file = service.files().update(
    fileId=file_id,
    body=file,
    newRevision=new_revision,
    media_body=media_body).execute()
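The snippet above is the v2 style. A sketch of the same create-once-then-update pattern, written so the file id stays stable (here `service` is assumed to be an authenticated Drive client, and `media_body` would typically be a `googleapiclient` `MediaFileUpload`; the function name is my own):

```python
def create_or_update(service, media_body, file_id=None, name="data.csv"):
    """Create the file once; afterwards overwrite its contents in place,
    so the file id (and any share link) never changes."""
    if file_id is None:
        created = service.files().create(
            body={"name": name},
            media_body=media_body,
            fields="id").execute()
        return created["id"]
    # updating replaces the contents but keeps the same id
    service.files().update(fileId=file_id, media_body=media_body).execute()
    return file_id
```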
Google sheets api
You are editing a csv file. By using the Google Drive API you are downloading the file and uploading it over and over.
Have you considered converting it to a Google Sheet and using the Google Sheets API to edit the file programmatically? That may save you some processing.
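One convenient side effect of the Sheet route: if the sheet is shared as "anyone with the link can view" (an assumption you'd need to verify against your sharing settings), its CSV export endpoint can be read without the API or OAuth, and the URL never changes even as the contents do. A small sketch:

```python
import re

def sheet_csv_url(link_or_id: str) -> str:
    """Build the CSV export URL for a publicly shared Google Sheet.
    Accepts either the bare sheet id or a full /spreadsheets/d/<id>/... link."""
    m = re.search(r"/spreadsheets/d/([^/]+)", link_or_id)
    sheet_id = m.group(1) if m else link_or_id
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"

# e.g. df = pd.read_csv(sheet_csv_url(SHARE_LINK))
```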

Using Custom Libraries in Google Colab without Mounting Drive

I am using Google Colab and I would like to use my custom libraries/scripts that I have stored on my local machine. My current approach is the following:
# (Question 1)
from google.colab import drive
drive.mount("/content/gdrive")
# Annoying chain of granting access to Google Colab
# and entering the OAuth token.
And then I use:
# (Question 2)
!cp /content/gdrive/My\ Drive/awesome-project/*.py .
Question 1:
Is there a way to avoid the mounting of the drive entirely? Whenever the execution context changes (e.g. when I select "Hardware Acceleration = GPU", or when I wait an hour), I have to re-generate and re-enter the OAuth token.
Question 2:
Is there a way to sync files between my local machine and my Google Colab scripts more elegantly?
Partial (not very satisfying) answer regarding Question 1: I saw that one could install and use Dropbox. Then you can hardcode the API key into the application and mounting is done, regardless of whether or not it is a new execution context. I wonder if a similar approach exists based on Google Drive as well.
Question 1.
Great question, and yes there is. I have been using the workaround below, which is particularly useful if you are a researcher and want others to be able to re-run your code, or just to 'colab'orate when working with larger datasets. It has worked well for a team, since there are challenges when each person keeps their own version of the datasets.
I have used this regularly on 30+ GB of image files downloaded and unzipped to the Colab runtime.
The file id is in the link provided when you share from Google Drive.
You can also select multiple files, share them all, and generate, for example, a .txt or .json file which you can parse to extract the file ids.
from google_drive_downloader import GoogleDriveDownloader as gdd

# some file id / list of file ids parsed from file urls
google_fid_id = '1-4PbytN2awBviPS4Brrb4puhzFb555g2'
destination = 'dir/dir/fid'

# if it's a zip file, add the kwarg unzip=True
gdd.download_file_from_google_drive(file_id=google_fid_id,
                                    dest_path=destination,
                                    unzip=True)
A url parsing function to get file ids from a list of urls might look like this:
def parse_urls():
    with open('/dir/dir/files_urls.txt', 'r') as fb:
        txt = fb.readlines()
    return [url.split('/')[-2] for url in txt[0].split(',')]
One health warning: you can only repeat this a small number of times in a 24-hour window for the same files.
Here's the gdd git repo:
https://github.com/ndrplz/google-drive-downloader
Here is a working example (my own) of how it works inside a bigger script:
https://github.com/fdsig/image_utils
Question 2.
You can connect to a local runtime, but this also means using local resources (GPU/CPU etc.).
Really hope this helps :-).
F~
If your code isn't secret, you can use git to sync your local code to GitHub. Then git clone it into Colab with no need for any authentication.
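After cloning in a Colab cell (e.g. `!git clone https://github.com/your-user/awesome-project.git`, where the repo URL is a placeholder), a small sketch for making the cloned scripts importable without copying them around:

```python
import os
import sys

# hypothetical folder name: wherever git clone placed the repo in the runtime
repo_dir = os.path.join(os.getcwd(), "awesome-project")

# put the repo at the front of the import path so `import my_module` finds it
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)
```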

Access docs on Gdrive via Python

I am looking for a way to access a .csv document that I have stored on Drive in order to perform data analysis. The idea would be to have something similar to pandas' read_csv but accessing a remote file, not a local one. Note that I don't want to access a Google Sheets document: it's a .csv document that I have shared on Google Drive. Ideally, I'd like to be able to save it to Drive as well.
Thank you for the help,
Best,
You will want to use Google Drive File Stream to do this. It basically mounts your Drive to your computer so that you can access it from anywhere.
So on my Windows computer I can open a terminal and then access anything on my Drive. (If you have a Mac, you will find it mounted under /Volumes.)
>>>ls /mnt/g/
$RECYCLE.BIN My Drive Team Drives
>>>ls /mnt/g/My\ Drive/
test.csv
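Once mounted, the file behaves like any local file, so pandas reads and writes it directly. A minimal sketch (a temporary directory stands in here for the real mount point, e.g. the "/mnt/g/My Drive" shown in the listing above):

```python
import os
import tempfile
import pandas as pd

# stand-in for the File Stream mount point; substitute your real mounted path
mount = tempfile.mkdtemp()

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv(os.path.join(mount, "test.csv"), index=False)   # save "to Drive"
back = pd.read_csv(os.path.join(mount, "test.csv"))       # read "from Drive"
print(back.shape)
```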

Download a complete folder hierarchy in google drive api python

How can I download a complete folder hierarchy using the Python Google Drive API? I could query each file in the folder and then download it, but that way the folder hierarchy is lost. Is there a way to do it properly? Thanks
You can achieve this using Google GAM.
gam all users show filelist >filelist.csv
gam all users show filetree >filetree.csv
I got all the answers from this site, which I found very useful:
https://github.com/jay0lee/GAM/wiki
The default max results of a query is 100. You must use pageToken/nextPageToken to page through the rest.
See Python Google Drive API - list the entire drive file tree
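Alternatively, with the official Python client (Drive API v3), the hierarchy can be mirrored by recursing into subfolders while paging with nextPageToken. A sketch under assumptions: `service` is an authenticated Drive service, and `get_media(...).execute()` buffers whole files in memory, which is fine for small files (use `MediaIoBaseDownload` to stream large ones):

```python
import os

def download_tree(service, folder_id, local_dir):
    """Recursively mirror a Drive folder into local_dir, keeping hierarchy."""
    os.makedirs(local_dir, exist_ok=True)
    page_token = None
    while True:  # page through results; a single list() call is capped
        resp = service.files().list(
            q=f"'{folder_id}' in parents and trashed=false",
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token,
        ).execute()
        for f in resp.get("files", []):
            target = os.path.join(local_dir, f["name"])
            if f["mimeType"] == "application/vnd.google-apps.folder":
                download_tree(service, f["id"], target)  # recurse into subfolder
            else:
                # buffers the whole file; fine for small files only
                data = service.files().get_media(fileId=f["id"]).execute()
                with open(target, "wb") as fh:
                    fh.write(data)
        page_token = resp.get("nextPageToken")
        if page_token is None:
            break
```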

Delete file from Google Drive using Google Drive API SDK

How do I delete a file from Drive using Python's Google Drive API SDK?
I want to sync my folder with Google Drive, so that whenever I delete a file from my local machine, the file uploaded to Drive with the same name is deleted too.
I went through: https://developers.google.com/drive/v2/reference/files/delete
But where do I get the fileId from?
Any help would be appreciated.
Thanks in advance...
You need to read and understand https://developers.google.com/drive/v2/reference/files#resource, https://developers.google.com/drive/search-parameters and https://developers.google.com/drive/v2/reference/files/list
At the bottom of the last page is a Try It Now feature which you can use to play with the Drive SDK BEFORE you write a single line of code. Do the same with https://developers.google.com/drive/v2/reference/files/delete
Once you understand them, you will know how to trash or delete files from Drive. Personally I prefer trash, as it's easier to undo my mistakes during testing. @martineau Don't worry too much about the disk space; Google isn't about to run out of disk :-)
The only catch to using trash is that you need to remember to qualify any queries with 'trashed=false', and users will need to empty the trash if they ever hit quota.
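To show where the fileId comes from, here is a sketch in the current v3 style (the links above are v2; in v3 the dedicated trash method is replaced by an update with trashed=True). `service` is assumed to be an authenticated Drive service, and the function name is my own:

```python
def trash_by_name(service, name):
    """Find files matching `name` and move them to Drive's trash (recoverable),
    returning the ids that were trashed."""
    # the fileId comes from files().list; note the trashed=false qualifier
    resp = service.files().list(
        q=f"name = '{name}' and trashed = false",
        fields="files(id, name)",
    ).execute()
    trashed = []
    for f in resp.get("files", []):
        # v3 equivalent of v2's files().trash()
        service.files().update(fileId=f["id"], body={"trashed": True}).execute()
        trashed.append(f["id"])
    return trashed
```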
