Using Custom Libraries in Google Colab without Mounting Drive - python

I am using Google Colab and I would like to use my custom libraries/scripts that I have stored on my local machine. My current approach is the following:
# (Question 1)
from google.colab import drive
drive.mount("/content/gdrive")
# Annoying chain of granting access to Google Colab
# and entering the OAuth token.
And then I use:
# (Question 2)
!cp /content/gdrive/My\ Drive/awesome-project/*.py .
Question 1:
Is there a way to avoid mounting the drive entirely? Whenever the execution context changes (e.g. when I select "Hardware Acceleration = GPU", or when I wait an hour), I have to re-generate and re-enter the OAuth token.
Question 2:
Is there a way to sync files between my local machine and my Google Colab scripts more elegantly?
Partial (not very satisfying) answer regarding Question 1: I saw that one could install and use Dropbox. Then the API key can be hardcoded into the application and mounting works regardless of whether it is a new execution context. I wonder whether a similar approach exists for Google Drive as well.

Question 1.
Great question, and yes there is. I have been using this workaround, which is particularly useful if you are a researcher and want others to be able to re-run your code, or if you just want to 'colab'orate when working with larger datasets. The method below has worked well for teamwork, where each person keeping their own copy of the datasets brings its own challenges.
I have used this regularly on 30+ GB of image files downloaded and unzipped into the Colab runtime.
The file id is in the link provided when you share a file from Google Drive.
You can also select multiple files, share them all, and generate e.g. a .txt or .json file which you can parse to extract the file ids.
from google_drive_downloader import GoogleDriveDownloader as gdd

# Some file id / list of file ids parsed from the shared file urls.
google_fid_id = '1-4PbytN2awBviPS4Brrb4puhzFb555g2'
destination = 'dir/dir/fid'

# If the file is a zip archive, add the kwarg unzip=True.
gdd.download_file_from_google_drive(file_id=google_fid_id,
                                    dest_path=destination,
                                    unzip=True)
A url parsing function to get file ids from a list of urls might look like this:
def parse_urls():
    with open('/dir/dir/files_urls.txt', 'r') as fb:
        txt = fb.readlines()
    return [url.split('/')[-2] for url in txt[0].split(',')]
One health warning is that you can only repeat this a small number of times in a 24 hour window for the same files.
Here's the gdd git repo:
https://github.com/ndrplz/google-drive-downloader
Here is a working example (my own) of how it works inside a bigger script:
https://github.com/fdsig/image_utils
Question 2.
You can connect to a local runtime, but this also means using your local resources (GPU/CPU etc.).
Really hope this helps :-).
F~

If your code isn't secret, you can use git to sync your local code to GitHub. Then you can git clone it into Colab with no need for any authentication.
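A minimal sketch of that flow inside a Colab cell (the repository URL, directory, and module name below are placeholders) might look like this:
!git clone https://github.com/your-username/awesome-project.git  # placeholder repo URL

import sys
sys.path.append('/content/awesome-project')  # make the cloned .py files importable

# import my_module  # any module from the repo can now be imported as usual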

Related

How to read file from Google Drive after it was updated?

I need to place a .csv file somewhere and then update it on a daily basis for my code to read. The catch is that the person who will be updating the file will not be coding, so it should be as easy as uploading it from the web. I have read tens of questions here about how to read or download a file from Google Drive or Google Cloud Storage, but all of them assume updating with code or downloading via an API. I want a slightly simpler solution. For example,
I'm using the code below to read the .csv file from Google Drive (this particular file is just an example; the actual one will be different).
This is the file that will be updated each day; however, each time I update the file (remove it and upload a new one to Google Drive), the link changes.
Is there a way to update the file every time without changing the code?
For example, to make the code fetch the file from a particular folder on Google Drive?
The main constraint is that I need to do it without using an API and Google OAuth.
If that's not possible, where could the file be uploaded for this purpose? I need to be able to upload the file every day without any code so the code reads the updated data. Is there storage like this?
import pandas as pd
import requests
from io import StringIO

# Sharing link of the CSV file on Google Drive.
url = 'https://drive.google.com/file/d/1976F_8WzIxj9wJXjNyN_uD8Lrl_XtpIf/view?usp=sharing'
file_id = url.split('/')[-2]

# Build the direct-download URL from the file id and fetch the CSV text.
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
url2 = requests.get(dwn_url).text
csv_raw = StringIO(url2)
df = pd.read_csv(csv_raw)
print(df.head())
create vs update
The first thing to be sure of is that the first time you run your code you use the files.create method.
However, when you update the file you should use files.update; this does not create a new file each time, so your file id will remain the same.
google api python client
IMO you should consider using the Python client library; it will make things a little easier for you.
updated_file = service.files().update(
    fileId=file_id,
    body=file,
    newRevision=new_revision,
    media_body=media_body).execute()
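For context, an in-place update with the current Drive API v3 Python client might look like the sketch below; it assumes you already have OAuth credentials in creds and know the file_id of the existing file, and the local file name is hypothetical:
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Assumes `creds` holds valid OAuth credentials and `file_id` points at the
# existing Drive file whose content should be replaced.
service = build('drive', 'v3', credentials=creds)
media = MediaFileUpload('daily_data.csv', mimetype='text/csv')  # hypothetical local file
updated = service.files().update(fileId=file_id, media_body=media).execute()
print(updated.get('id'))  # same id as before, so the sharing link keeps working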
Google sheets api
You are editing a CSV file. By using the Google Drive API you are downloading the file and uploading it over and over.
Have you considered converting it to a Google Sheet and using the Google Sheets API to edit the file programmatically? That may save you some processing.
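If you go the Sheets route, a minimal sketch with the gspread library could look like the following; the credentials file and sheet name are placeholders, and it assumes a service account that has been granted access to the sheet:
import gspread  # pip install gspread

# Placeholder credentials file and sheet name.
gc = gspread.service_account(filename='service_account.json')
worksheet = gc.open('daily-data').sheet1

# Overwrite the sheet contents in place, so the file id and link never change.
rows = [['date', 'value'], ['2024-01-01', '42']]
worksheet.update(range_name='A1', values=rows)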

Accessing a file from google drive within Python

I am working on a machine learning task and have saved a Keras model that I want to deploy to GitHub (so that I can host a web demo using Streamlit and/or Flask). However, the model file is so large (> 1 GB) that I cannot upload it to GitHub for free.
My thought process regarding an alternative is to upload it to a cloud service such as Google Drive (or Dropbox, Box, etc.) and then use some sort of Python module to access it from there.
So my question is, can I upload a pickle file containing a pickled Keras model to Google Drive and then access that object from a Python script? If so, how would I go about doing so?
Thank you!
I believe you can. You'll need to pip install oauth2client and gspread. To access the data you would need to enable the API manager on your Google Drive and get credentials in the form of a JSON file. Then you would need to share the file with the email address in the credentials to give it permission. You could then port over the information as you need to; I'm not sure how Keras works, but this would be the first step.
Another important factor is that the Google API is very touchy when it comes to requests that come in too fast. To overcome this, put sleep commands between each request, but if you do that this method may become way too slow for your use case.
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ["https://spreadsheets.google.com/feeds", 'https://www.googleapis.com/auth/spreadsheets',
         "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("Your json file here.json", scope)
client = gspread.authorize(creds)
sheet = client.open("your google sheets name or whatever").sheet1  # Open the spreadsheet
data = sheet.get_all_records()  # you can call all the information with this.
I understand that you require a way to upload and download large files* from Drive using Python. If I understood your situation correctly, you can achieve your goal easily by using the Drive API, as @TimothyChen commented. First, I highly recommend following the Drive API Python Quickstart tutorial to create a working example. Later, you can modify it to use Files.create() and Files.get() to upload/download files as needed. Don't hesitate to ask more questions if you have doubts.
*Please, keep in mind that there is a 5 TB size limit in Drive.
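As a rough sketch of the download side, assuming service is the authorized Drive v3 client from the quickstart, file_id is the id of the uploaded model, and the local file name is hypothetical:
import io
from googleapiclient.http import MediaIoBaseDownload

# Assumes `service` is an authorized Drive v3 client and `file_id` is known.
request = service.files().get_media(fileId=file_id)
with io.FileIO('model.pkl', 'wb') as fh:  # hypothetical local file name
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print(f"Downloaded {int(status.progress() * 100)}%")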

Deleting a very big folder in Google Cloud Storage

I have a very big folder in Google Cloud Storage, and I am currently deleting it with the following Django/Python code on Google App Engine, within the default 30-second HTTP timeout.
def deleteStorageFolder(bucketName, folder):
    from google.cloud import storage
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    logging.info("Deleting : " + folder)
    try:
        bucket.delete_blobs(blobs=bucket.list_blobs(prefix=folder))
    except Exception as e:
        logging.info(str(e))
It is really unbelievable that Google Cloud expects the application to request the information for the objects inside the folder one by one and then delete them one by one.
Obviously, this fails due to the timeout. What would be the best strategy here?
(There should be a way to delete the parent object in the bucket so that all the associated child objects are deleted somewhere in the background while we remove the associated data from our model; Google Storage would then be free to delete the data whenever it wants. Yet, per my understanding, this is not how things are implemented.)
Two simple options come to mind until the client library supports deleting in batch (see https://issuetracker.google.com/issues/142641783); a sketch of both is shown below:
If the GAE image includes the gsutil CLI, you could execute gsutil -m rm ... in a subprocess.
My favorite: use the gcsfs library instead of the Google client library. It supports batch deleting by default; see https://gcsfs.readthedocs.io/en/latest/_modules/gcsfs/core.html#GCSFileSystem.rm
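A minimal sketch of both options; the bucket and folder names are placeholders, and the first option assumes gsutil is on the PATH:
import subprocess
import gcsfs  # pip install gcsfs

BUCKET = "my-bucket"    # placeholder
FOLDER = "big/folder"   # placeholder

# Option 1: shell out to gsutil, which deletes in parallel thanks to -m.
subprocess.run(["gsutil", "-m", "rm", "-r", f"gs://{BUCKET}/{FOLDER}"], check=True)

# Option 2: gcsfs batches the delete requests for you.
fs = gcsfs.GCSFileSystem()
fs.rm(f"{BUCKET}/{FOLDER}", recursive=True)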
There is a workaround. You can do this in 2 steps:
"Move" the files to delete into another bucket with Storage Transfer Service.
Create a transfer from your bucket, with the filters that you want, to another bucket (create a temporary one if needed), and check the "delete from source after transfer" checkbox.
After the successful transfer, delete the temporary bucket. If that takes too long, there is another workaround:
Go to the bucket page.
Click on Lifecycle.
Set up a lifecycle rule that deletes files with age > 0 days (a Python sketch of this is shown below).
In both cases, you rely on Google Cloud's batch features, because doing it yourself is far, far too slow!
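For reference, the lifecycle rule can also be set from Python with the google-cloud-storage client; this is only a sketch and the bucket name is a placeholder:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-temporary-bucket")  # placeholder bucket name

# Add a rule that deletes every object older than 0 days, then apply it.
bucket.add_lifecycle_delete_rule(age=0)
bucket.patch()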

using tf.keras.utils.get_file() for google drive

I am trying to use tf.keras.utils.get_file("URL from google drive").
When I use a URL for a file smaller than 33 MB, it works well.
However, when I try to download a file larger than 33 MB, it does not work.
How can I solve this problem?
_URL = 'URL FROM GOOGLE DRIVE'
path_to_zip = tf.keras.utils.get_file("file_name.zip", origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'art_filename')
I am following https://www.tensorflow.org/tutorials/images/classification for practice, and I am trying to use my own data.
In that example, the URL is on "storage.googleapis.com..." and points to a large amount of data.
I want to use this code to download large data from Google Drive.
Is there any way to solve this problem?
I also tried mounting Google Drive, but since I want to access the folders and files directly, I am not comfortable working with the mount.
Thanks
Files above a certain size trigger a notification from Drive letting you know that the file cannot be scanned for viruses, and this message has to be accepted before the file can download. By appending "&confirm=t" to the end of the download URL, you can bypass that message and download your files.
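For example, a minimal sketch using the direct-download URL form plus the confirm parameter; the file id is a placeholder taken from the Drive sharing link:
import os
import tensorflow as tf

file_id = "YOUR_FILE_ID"  # placeholder: the id from the Drive sharing link
_URL = f"https://drive.google.com/uc?export=download&id={file_id}&confirm=t"

path_to_zip = tf.keras.utils.get_file("file_name.zip", origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), "art_filename")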

Google Adwords API - refresh token - python

I am in the process of setting up with Google Adwords API. They have a fantastic guide (https://developers.google.com/adwords/api/docs/guides/start), with the exception that one of the last steps is rather vague.
I have gotten to the following step from the guide linked above.
I am instructed (for Python) to put the client ID and client secret into my own configuration file. All the other languages have specific files that need to be edited (such as the PHP example in the guide).
I have been working at this for the past 3 hours and have tried googling, YouTubing, and reading through every piece of documentation I can find. All of them just say "add the ID and secret to your config file." I have no idea what that means or how to do it. I've gone into my Python directory and found a file named "config.py", but have no idea how to add these credentials. There are a number of scripts on GitHub (which Google links to), one of them for generating a refresh token, which is what I want. I have no idea how to implement it, though.
https://github.com/googleads/googleads-python-lib/tree/master/examples/adwords/authentication
Thank you in advance for any insight into adding credentials to my python config file or otherwise generating a refresh token.
I found the answer.
In short, the config file is in a directory that was not included in the instructions. It is advisable to download the entire "googleads-python-lib" repository rather than just the "googleads" directory.
https://github.com/googleads/googleads-python-lib
The config file (googleads.yaml) is within this "googleads-python-lib" directory. I unzipped it in my python2.7/site-packages. There are variables in this config file ready to take your authentication credentials.
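Once googleads.yaml has been filled in with your client ID, client secret, developer token and refresh token, loading it from Python is a single call; the path below is a placeholder:
from googleads import adwords  # pip install googleads

# Assumes googleads.yaml already contains developer_token, client_id,
# client_secret and refresh_token; the path is a placeholder.
client = adwords.AdWordsClient.LoadFromStorage('/path/to/googleads.yaml')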
