Uploading files to S3 using Python and the FiftyOne API

I am trying to create an automated pipeline that gets files from the FiftyOne API and loads them to S3. From what I saw, the fiftyone package can only download them locally.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    classes=["Cat", "Dog"],
    max_samples=100,
    label_types=["detections"],
    seed=51,
    dataset_name="open-images-pets",
)
That's the code I use to download the files; the issue is that they download locally. Does anyone have experience with this and know how it could be done?
Thank you!

You're right that the code snippet you shared will download the files from Open Images to whatever local machine you are working on. From there, you can use something like boto3 to upload the files to S3. Then, you may want to check out the examples for using s3fs-fuse with FiftyOne to see how you can mount those cloud files and work with them directly.
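If it helps, here is a minimal sketch of that boto3 upload step, reusing the dataset from your snippet; the bucket name and key prefix are placeholders, not anything FiftyOne requires:

import os

import boto3
import fiftyone as fo

s3 = boto3.client("s3")
dataset = fo.load_dataset("open-images-pets")  # the dataset created by your snippet

for sample in dataset:
    local_path = sample.filepath  # where FiftyOne stored the image locally
    key = "open-images-pets/" + os.path.basename(local_path)  # placeholder prefix
    s3.upload_file(local_path, "my-bucket", key)  # "my-bucket" is a placeholder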
Directly using FiftyOne inside a SageMaker notebook is in development.
Note that FiftyOne Teams has more support for cloud data, with methods to upload/download to the cloud and use cloud objects directly rather than with s3fs-fuse.

Related

Cannot download large files from Google Colab using a GCE backend

Whenever I try to download large files (>2 GB) from my Google Colab notebook, which uses a GCE backend, I seem to only be able to download partial files (~37 MB). And since Google blocks mounting Drive or using any of the Python APIs when using a GCE environment for Google Colab, I am at a total loss.
I have tried both right-click saving a file and the following:
from google.colab import files
files.download('example.txt')
Are there maybe any clever other ways I could download this file using python?

How to copy files from one cloud storage to another? For example, Google Drive to OneDrive

I want to copy files from Google Drive to OneDrive using any APIs in Python. Google provides some APIs, but I don't see anything related to copying files to another cloud.
One way to achieve this would be to download the files from Google Drive using the Google Drive API and then upload them to OneDrive.
Please share some input if there is a way to achieve this.
If you are running both sync clients on your desktop, you could just use Python to move the files from one folder to another (see How to move a file in Python), as in the sketch below. By moving a file from your Google Drive folder to your OneDrive folder, the services should automatically sync and upload it.
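For example, a minimal sketch with the standard library, assuming both sync clients are installed locally; the folder paths are placeholders for wherever your sync folders live:

import shutil
from pathlib import Path

# Placeholder paths: adjust to your own Google Drive and OneDrive sync folders.
drive_folder = Path.home() / "Google Drive" / "to_copy"
onedrive_folder = Path.home() / "OneDrive" / "copied"

for path in drive_folder.glob("*"):
    # Moving the file into the OneDrive folder lets the OneDrive client
    # pick it up and upload it on its next sync.
    shutil.move(str(path), str(onedrive_folder / path.name))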
(As an aside, if you don't care how the problem gets solved, a service like https://publist.app might be of use.)

How to upload a Data Set to Azure Jupyter Notebook

I am working with an Azure cloud Jupyter notebook, but I don't know how to read my data set, so I need to know how to upload my CSV dataset.
Here's what I found in the FAQ online:
How can I upload my data and access it in a notebook?
A file can be added to the project itself from either the web or your computer, or uploaded using the File menu inside a Jupyter notebook if you choose to save it under the project/ folder. Files outside the project/ folder will not be persisted. If you have multiple files that add up to over 100 MB, you'll need to upload them one by one.
You can also download data using the terminal or shell commands inside of a notebook from publicly accessible web sites, including GitHub, Azure Blob Storage, nasa.gov, etc.
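As a rough example of the latter, something like this works from a notebook cell; the URL and filenames are placeholders:

import urllib.request

import pandas as pd

# Placeholder URL: any publicly accessible CSV will do.
url = "https://raw.githubusercontent.com/user/repo/main/data.csv"
urllib.request.urlretrieve(url, "project/data.csv")  # save under project/ so it persists

df = pd.read_csv("project/data.csv")
print(df.head())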

Interface between Google Colaboratory and Google Cloud

From Google Colaboratory, if I want to read/write to a folder in a given bucket created in Google Cloud, how do I achieve this?
I have created a bucket and a folder within the bucket, and uploaded a bunch of images into it. Now from Colaboratory, using a Jupyter notebook, I want to create multiple sub-directories to organise these images into train, validation and test folders.
Subsequently, I want to access the respective folders for training, validating and testing the model.
With Google Drive, we just update the path to point to a specific directory with the following commands, after authentication.
import sys
sys.path.append('drive/xyz')
We do something similar on the desktop version too:
import os
os.chdir(local_path)
Does something similar exist for Google Cloud Storage?
The Colaboratory FAQs have a procedure for reading and writing a single file, where we need to set the entire path. That would be tedious for re-organising a main directory into sub-directories and accessing them separately.
In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:
Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.
Assuming you'd like to continue regardless of the warning, if you use a Linux OS you may be able to mount it using the Cloud Storage FUSE adapter. See related How to mount Google Bucket as local disk on Linux instance with full access rights.
The recommended way to access GCS from Python apps is using the Cloud Storage Client Libraries, but accessing files will be different than in your snippets. You can find some examples at Python Client for Google Cloud Storage:
from google.cloud import storage
client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
Update:
The Colaboratory doc recommends another method that I forgot about, based on the Google API Client Library for Python, but note that it also doesn't operate like a regular filesystem; it uses an intermediate file on the local filesystem. It has examples for uploading files to GCS and downloading files from GCS.
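Roughly, that approach looks like the sketch below. The bucket and file names are placeholders, and it assumes you have already authenticated in Colab (e.g. with auth.authenticate_user()):

import io

from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload

auth.authenticate_user()
gcs_service = build("storage", "v1")

bucket_name = "my-bucket"  # placeholder

# Upload a local file to the bucket (goes through a file on local disk).
media = MediaFileUpload("/tmp/to_upload.txt", mimetype="text/plain", resumable=True)
request = gcs_service.objects().insert(
    bucket=bucket_name, name="to_upload.txt", media_body=media
)
response = None
while response is None:
    _, response = request.next_chunk()

# Download an object from the bucket to a local file.
request = gcs_service.objects().get_media(bucket=bucket_name, object="to_upload.txt")
with io.FileIO("/tmp/downloaded.txt", "wb") as fh:
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()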

Copy files from S3 to Google Cloud Storage using boto

I'm trying to automate copying of files from S3 to Google Cloud Storage inside a Python script.
Everywhere I look, people recommend using gsutil as a command-line utility.
Does anybody know if this copies the files directly? Or does it first download the files to the computer and then upload them to GCS?
Can this be done using the boto library and Google's OAuth2 plugin?
This is what I've got from Google's documentation and a little bit of trial and error:
import boto
import StringIO  # Python 2

src_uri = boto.storage_uri('bucket/source_file', 's3')
dst_uri = boto.storage_uri('bucket/destination_file', 'gs')

# Read the source object from S3 into an in-memory buffer...
object_contents = StringIO.StringIO()
src_uri.get_key().get_file(object_contents)
object_contents.seek(0)

# ...then write that buffer out to the destination object in GCS
dst_uri.set_contents_from_file(object_contents)
object_contents.close()
From what I understand, I'm reading the file into an in-memory object on the host where the script is running, and later uploading that content to a file in GCS.
Is this right?
Since this question was asked, GCS Transfer Service has become available. If you want to copy files from S3 to GCS without an intermediate, this is a great option.
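If you do want to keep the copy inside a Python script rather than use the Transfer Service, a minimal sketch with the newer boto3 and google-cloud-storage libraries might look like the following; the bucket and object names are placeholders:

import io

import boto3
from google.cloud import storage

s3 = boto3.client("s3")
gcs = storage.Client()

# Placeholder bucket/object names.
buf = io.BytesIO()
s3.download_fileobj("source-bucket", "source_file", buf)  # S3 -> memory
buf.seek(0)
gcs.bucket("destination-bucket").blob("destination_file").upload_from_file(buf)  # memory -> GCS

Note that, like the boto snippet above, this still buffers the object on the machine running the script rather than copying it cloud-to-cloud.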
