Cannot download large files from Google Colab using a GCE backend - python

Whenever I try to download large files (>2 GB) from my Google Colab notebook, which uses a GCE backend, I only seem to get partial files (~37 MB). And since Google blocks mounting Drive or using any of the Python API when Colab runs in a GCE environment, I am at a total loss.
I have tried both right-click saving a file and the following:
from google.colab import files
files.download('example.txt')
Are there any other clever ways I could download this file using Python?
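One possible workaround, not from the original thread and only a sketch: a GCE-backed runtime can usually write to a Google Cloud Storage bucket, which you can then download from with gsutil or the Cloud Console on your own machine. This assumes the google-cloud-storage client is available and the runtime's service account has write access to a bucket you own; the bucket name below is a placeholder.
from google.cloud import storage

# Hedged sketch: push the large file to a GCS bucket instead of files.download().
# "my-transfer-bucket" is a placeholder; use a bucket your account can write to.
client = storage.Client()
bucket = client.bucket("my-transfer-bucket")
blob = bucket.blob("example.txt")
blob.upload_from_filename("example.txt")  # then fetch it locally via gsutil or the Cloud Console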

Related

How to upload a specific file to Google Colab?

I have a file on my computer that I want to upload to Google Colab. I know there are numerous ways to do this, such as
from google.colab import files
uploaded = files.upload()
or just uploading manually from the file system. But I want to upload that specific file without needing to choose that file myself.
Something like:
from google.colab import files
file_path = 'path/to/the/file'
files.upload(file_path)
Is there any way to do this?
Providing a file path directly rather than clicking through the GUI for an upload requires access to your local machine's file system. However, when you run IPython magic commands such as %pwd in a Google Colab cell, you'll notice that the current working directory shown is that of the notebook environment, not that of your machine. The ways to work around this are as follows.
1. Local Runtime
Only local runtimes via Jupyter seem to enable such access to the local file system. This necessitates installing jupyterlab, a Jupyter server extension for using a WebSocket, and launching a local server. See this tutorial.
2. Google Drive
If Google Drive is convenient, you can upload files into Google Drive from your local machine programmatically, without clicking through a GUI (see the sketch after these options).
3. Embracing the GUI
If these options seem overkill, you, unfortunately, have to stick with
from google.colab import files
uploaded = files.upload()
as you alluded to.
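For option 2, something like the following could run on your local machine to push the file into Drive without a GUI. This is a rough sketch, assuming pydrive2 is installed and OAuth client credentials (client_secrets.json) are configured; the file name and path are placeholders.
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

# Sketch only: requires a configured client_secrets.json for the Drive API.
gauth = GoogleAuth()
gauth.LocalWebserverAuth()  # opens a browser window for OAuth consent
drive = GoogleDrive(gauth)

f = drive.CreateFile({'title': 'the_file.txt'})  # name it will have in Drive
f.SetContentFile('path/to/the/file')             # placeholder local path
f.Upload()
After uploading, you can mount Drive in Colab (see the other answers on this page) to read the file from the notebook.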

Uploading file to s3 using python and fiftyone api

I am trying to create an automated pipeline that gets files from the FiftyOne API and loads them into S3. From what I saw, the fiftyone package can only download them locally.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    classes=["Cat", "Dog"],
    max_samples=100,
    label_types=["detections"],
    seed=51,
    dataset_name="open-images-pets",
)
That's the code I use to download the files; the thing is, they download locally. Does anyone have experience with this, and how could it be done?
Thank you!
You're right that the code snippet that you shared will download the files from Open Images to whatever local machine you are working on. From there, you can use something like boto3 to upload the files to s3. Then, you may want to check out the examples for using s3fs-fuse and FiftyOne to see how you can mount those cloud files and use them in FiftyOne.
Directly using FiftyOne inside of a Sagemaker notebook is in development.
Note that FiftyOne Teams has more support for cloud data, with methods to upload/download to the cloud and use cloud objects directly rather than with s3fs-fuse.
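As a rough illustration of the boto3 route mentioned above (not an official FiftyOne workflow): after load_zoo_dataset() finishes, the images live on local disk, so you can walk that directory and push each file to S3. The bucket name, key prefix, and local path below are assumptions; FiftyOne downloads zoo datasets under ~/fiftyone by default, but check dataset.first().filepath to confirm where yours landed.
import os
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

bucket = "my-dataset-bucket"   # placeholder bucket name
prefix = "open-images-pets"    # placeholder key prefix
local_dir = os.path.expanduser("~/fiftyone/open-images-v6/validation/data")

for root, _, files in os.walk(local_dir):
    for name in files:
        local_path = os.path.join(root, name)
        key = f"{prefix}/{os.path.relpath(local_path, local_dir)}"
        s3.upload_file(local_path, bucket, key)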

Google colab access Machine's local drives directly using Os.listdir

I am new to Google Colab and I am trying to figure out whether Google Colab can access files on my computer's C: drive directly.
import os
path = 'C:\\Users\\guest\\Desktop\\'
for file in os.listdir(path):
    print(file)
The error message that come out is [Errno 2] No such file or directory: 'C:\Users\zhuan.lim\Desktop\script tools\Python Scripts\'
I searched online and some examples said to upload the files first using:
from google.colab import files
uploaded = files.upload()
However, is there another way for Google Colab to directly read from my drives?
Thanks in advance.
Solution
You can make Google Colab access the files on your computer essentially in three ways:
Upload files to Google Colab.
from google.colab import files
uploaded = files.upload()
Upload your files to your Google Drive account and then mount Google Drive on Colab. In my experience, this has been the most convenient method. Also, note that this allows you to both read and write to Google Drive (as if it were a local drive).
from google.colab import drive
drive.mount('/content/gdrive')
!ls /content/gdrive
Once mounted, click on Files in the left pane to access the file structure.
Note: Alternatively, click on Files >> Mount Drive and this will insert the code snippet to mount Google Drive into your Colab notebook. Once you run that cell, you will see Google Drive getting mounted.
Initiate a local runtime and then access it. In this case, Colab uses your local resources, and the local files are accessible to it as well. Please do read the security concerns/warnings before initiating this option. I have not personally tried it, and you are on your own there.
I will explain option 3 below.
Connecting Colab to Local Runtime
Colab lets you connect to a local runtime. If you have installed jupyter_http_over_ws as explained here, you should be able to just provide the port you used to start the local runtime and connect to it from Colab.
Step-1
Click on Reconnect and then select "Connect to local runtime" (top right corner in Colab).
Step-2
Click the "these instructions" hyperlink in the connection pop-up to install jupyter_http_over_ws, if it is not already installed.
Install and enable the jupyter_http_over_ws jupyter extension (one-time).
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
Start server and authenticate.
New notebook servers are started normally, though you will need to set a flag to explicitly trust WebSocket connections from the Colaboratory frontend.
jupyter notebook \
--NotebookApp.allow_origin='https://colab.research.google.com' \
--port=8888 \
--NotebookApp.port_retries=0
For more details, I encourage you to see these instructions.
Step-3
Provide the correct port number (e.g. 8888) that was used to start the local runtime (jupyter notebook on your local machine).
No, there's no other way than files.upload(), because that is the way. But I think you're looking for a more user-friendly way of getting your files in. You could drag and drop your files into Google Drive, and then mount it in your Google Colab session by inserting the following lines in a cell and executing them:
from google.colab import drive
drive.mount('/content/gdrive')
It will prompt you to go to a URL to authenticate yourself. After you've clicked the URL and allowed Google Colab access to your Google Drive files, you can access your Google Drive files. A more elaborate explanation is here: Import data into Google Colaboratory
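As a small illustration tying this back to the original os.listdir question: once Drive is mounted, the same loop can point at a Drive folder instead of a C:\ path. The folder name below is a placeholder; on older runtimes the Drive root appears as "My Drive" rather than "MyDrive".
import os

# After drive.mount('/content/gdrive'), Drive contents typically appear under MyDrive.
# 'some_folder' is a placeholder; adjust to your own folder structure.
path = '/content/gdrive/MyDrive/some_folder'
for file in os.listdir(path):
    print(file)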

How to process videos from google drive in google colab

I have several videos on my Google Drive. I want to convert these videos to audio (I already have code for this using ffmpeg). However, the videos are very long and I do not want to have to download them locally. Is there a way to process them in Google Colab without downloading each video locally?
I already have a list of file IDs I got using pydrive.
Thanks.
You can mount your Drive via:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive
Change your path accordingly, where /gdrive is your "home". Afterwards, you can load your data as you are used to on your local PC.
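For example, something along these lines could convert each video to audio directly against the mounted Drive, so nothing is downloaded separately. The folder names are placeholders, and this calls the ffmpeg binary rather than your existing conversion code, so adapt it as needed.
import subprocess
from pathlib import Path

# Placeholders: adjust to the actual folders in your mounted Drive.
video_dir = Path('/gdrive/MyDrive/videos')
audio_dir = Path('/gdrive/MyDrive/audio')
audio_dir.mkdir(parents=True, exist_ok=True)

for video_path in video_dir.glob('*.mp4'):
    audio_path = audio_dir / (video_path.stem + '.mp3')
    # -vn drops the video stream; ffmpeg reads and writes straight on the mounted Drive
    subprocess.run(['ffmpeg', '-i', str(video_path), '-vn', str(audio_path)], check=True)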

Accessing drive files in Google Colab

I need access to a dataset of images (600 MB) in Google Colab.
I already uploaded all of my project files to my Drive. The problem is that Google Colab does not seem to recognize data_config.py, which is a file with all the functions I need to get my datasets.
What should I do to use my data_config.py?
Error displayed
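One hedged guess at the usual cause: the folder containing data_config.py may not be on Python's import path. A minimal sketch, assuming the project folder was uploaded to Drive and Drive is mounted at /content/drive; the folder name is a placeholder.
import sys

# Placeholder path: point this at the Drive folder that contains data_config.py
sys.path.append('/content/drive/MyDrive/my_project')

import data_config  # should now be importable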
