How to upload complete folder to Dropbox using Python

I am trying to upload a whole folder to Dropbox at once, but I can't seem to get it done. Is it possible? Also, even when uploading a single file I have to specify the file name and extension in the Dropbox path; is there another way to do it?
Here is the code I am using:
import dropbox

client = dropbox.client.DropboxClient(access_token)
f = open(file_path, 'rb')
response = client.put_file('/pass', f)
but it's not working.

The Dropbox SDK doesn't automatically find all the local files for you, so you'll need to enumerate them yourself and upload them one at a time. os.walk is a convenient way to do that in Python.
Below is working code with some explanation in the comments. Usage is like this: python upload_dir.py abc123xyz /local/folder/to/upload /path/in/Dropbox:
import os
import sys
from dropbox.client import DropboxClient

# get an access token, local (from) directory, and Dropbox (to) directory
# from the command line
access_token, local_directory, dropbox_destination = sys.argv[1:4]

client = DropboxClient(access_token)

# enumerate local files recursively
for root, dirs, files in os.walk(local_directory):
    for filename in files:
        # construct the full local path
        local_path = os.path.join(root, filename)

        # construct the full Dropbox path
        relative_path = os.path.relpath(local_path, local_directory)
        dropbox_path = os.path.join(dropbox_destination, relative_path)

        # upload the file
        with open(local_path, 'rb') as f:
            client.put_file(dropbox_path, f)
EDIT: Note that this code doesn't create empty directories. It will copy all the files to the right location in Dropbox, but if there are empty directories, those won't be created. If you want the empty directories, consider using client.file_create_folder (using each of the directories in dirs in the loop).
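A minimal sketch of that idea, reusing client, local_directory, and dropbox_destination from the script above (the error handling for already-existing folders is an assumption about the v1 SDK's ErrorResponse):
import dropbox

# also create (possibly empty) directories while walking the tree
for root, dirs, files in os.walk(local_directory):
    for dirname in dirs:
        local_dir = os.path.join(root, dirname)
        relative_dir = os.path.relpath(local_dir, local_directory)
        try:
            client.file_create_folder(os.path.join(dropbox_destination, relative_dir))
        except dropbox.rest.ErrorResponse as e:
            # assumed: a 403 means the folder already exists, which is safe to ignore
            if e.status != 403:
                raise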

For me there was a better way: since Dropbox installs a synced folder on the local machine, you can write to that folder with Python the same way you would write to any other folder:
1. Install the Dropbox app (and its folder) on your local machine or server.
2. Write the files and folders you want into the Dropbox folder, just as you would anywhere else (see the sketch below).
3. Let Dropbox do the syncing automatically (do nothing).
Dropbox essentially installs a "share" drive locally. When you upload through the remote API there is a lot of syncing overhead that slows the whole process down. I chose to let Dropbox do the syncing in the background; it made more sense for the problem I was facing, and my guess is that it is the right solution for most problems. Remember that Dropbox is not a remote database; it is a local folder that is mirrored everywhere.
I didn't measure rigorously, but writing to the local folder took me about 10 seconds while the other way took around 22 minutes, so writing locally and letting Dropbox sync was roughly 130 times faster than uploading through the API that people seem to recommend, for reasons unknown to me.
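A minimal sketch of step 2, assuming the Dropbox desktop client is installed and syncing; the folder paths below are hypothetical and depend on your installation:
import os
import shutil

# hypothetical location of the synced Dropbox folder
dropbox_folder = os.path.expanduser("~/Dropbox")
destination = os.path.join(dropbox_folder, "uploaded_folder")  # must not exist yet

# copy a whole local directory tree into the synced folder;
# the desktop client uploads it in the background
shutil.copytree("/local/folder/to/upload", destination)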

Related

Need a way to download scraped data and save in GitHub repo to host in Streamlit

So I wrote some code which first checks a folder for any files, i.e. with os.scandir(dir), and deletes them all.
Then it hits a URL and gets back zip files, which get extracted into this folder using zip.extractall().
Now that I have uploaded the code to GitHub to host it in Streamlit, I am not sure how to delete files and save new ones from the download link, although I can read pre-existing files from the folder using os.listdir().
As the app link needs to be shared and kept public, I don't want my local machine to be involved at all.
Tried:
for entry in os.scandir(dir):
    os.remove(entry.path)
Even os.listdir... but that didn't work either.
The above approach might only work for local directories. I want to know whether there is a Python approach that can be taken for Git repos.
I am fairly new to GitHub and Streamlit. Any help on this or a reference for the approach to take is appreciated. Thanks!
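For reference, a minimal sketch of the refresh flow described in the question (the URL and folder name are placeholders, and requests is assumed to be available):
import io
import os
import zipfile

import requests

DATA_DIR = "data"                                 # placeholder folder in the repo
DOWNLOAD_URL = "https://example.com/archive.zip"  # placeholder URL

# delete whatever is currently in the folder
for entry in os.scandir(DATA_DIR):
    if entry.is_file():
        os.remove(entry.path)

# fetch the zip and extract it into the same folder
response = requests.get(DOWNLOAD_URL)
with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
    zf.extractall(DATA_DIR)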

Download blob from gcp storage bucket with python flask

I have a GCP bucket with some PDF files.
I am able to download the files from the bucket with Python (Flask) on localhost using the code below:
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("file1.pdf")
But when it is deployed on GCP (App Engine) I get this error: "line 1282, in download_to_filename with open(filename, "wb") as file_obj: OSError: [Errno 30] Read-only file system: 'file1.pdf'"
Then I modified the code to:
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("/tmp/file1.pdf")
This works after deployment, but I can't download the files from the bucket to a local folder through my app. How do I do that?
You cannot write or make modifications within Google App Engine's file system after deployment. You can only read files within the app. That is why storage solutions such as Cloud Storage or a database are the recommended way to store files or data.
Google App Engine is serverless. It can scale up quickly because it simply spins up instances based on the container image built from your source code and a pre-built runtime. Once the container image is built during deployment, you can no longer write or edit its contents. You will have to redeploy to make changes.
One exception is the directory called /tmp, where you can store temporary files. This is because that directory is mounted in the instance's RAM, so storing temporary data consumes RAM, and when the instance is deleted, files in the /tmp directory are deleted as well.
You may check this documentation for additional information.
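If the real goal is to let users download the PDF through the Flask app rather than writing it to the server's disk at all, one option is to stream the blob from memory. A minimal sketch, assuming Flask 2.x and a recent google-cloud-storage (older versions use download_as_string and attachment_filename), reusing the names from the question:
import io
from flask import Flask, send_file
from google.cloud import storage

app = Flask(__name__)
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")

@app.route("/download/<file_name>")
def download(file_name):
    blob = bucket.blob(file_name)
    data = blob.download_as_bytes()  # keeps the PDF in memory; nothing touches disk
    return send_file(io.BytesIO(data),
                     mimetype="application/pdf",
                     as_attachment=True,
                     download_name=file_name)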
You can use the tempfile.gettempdir() function to get the temporary directory appropriate to your runtime environment.
For example:
import tempfile
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("{}/file1.pdf".format(tempfile.gettempdir()))
Locally, the file will be downloaded into your OS's default temporary directory.

Can't mount Google drive folder in colab notebook

I'm trying to mount a directory from https://drive.google.com/drive/folders/my_folder_name for use in a google colab notebook.
The instructions for mounting a folder show an example for a directory starting with /content/drive:
from google.colab import drive
drive.mount('/content/drive')
but my directory doesn't start with /content/drive, and the following things I've tried have all resulted in ValueError: Mountpoint must be in a directory that exists:
drive.mount("/content/drive/folders/my_folder_name")
drive.mount("content/drive/folders/my_folder_name")
drive.mount("drive/folders/my_folder_name")
drive.mount("https://drive.google.com/drive/folders/my_folder_name")
How can I mount a google drive location which doesn't start with /content/drive?
The path in drive.mount('/content/drive') is the path (mount point) at which your Google Drive is mounted inside the virtual machine where your notebook is running (compare 'mount point' in Unix/Linux). It does not point to the path inside your Google Drive that you are trying to access.
Leave "/content/drive" intact and work like this instead:
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/path/in/google_drive/from/root" # Your path
gdrive_path = "/content/drive" + "/My Drive" + my_path # Change according to your locale, if neeeded.
# "/content/drive/My Drive/path/in/google_drive/from/root"
Modify my_path to your desired folder in Google Drive (I don't know if "/My Drive/" changes according to your locale). Colab saves notebooks by default in "/Colab Notebooks", so in MY case the root of my Google Drive is actually gdrive_path = "/content/drive/My Drive" (and I'm guessing yours is too).
This leaves us with:
import pandas as pd
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/folders/my_folder_name" # THIS is your GDrive path
gdrive_path = "/content/drive" + "/My Drive" + my_path
# /content/drive/My Drive/folders/my_folder_name
sample_input_file = gdrive_path + "/input.csv" # The specific file you are trying to access
rawdata = pd.read_csv(sample_input_file)
# /content/drive/My Drive/folders/my_folder_name/input.csv
After a successful mount, you will be asked to paste a validation code once you have granted permissions to the drive.mount API.
Update: Colab no longer requires copying and pasting the code; instead you simply confirm you are who you say you are via the usual Google login page.
You can try this way:
drive.mount('/gdrive')
Now access your file from this path
/gdrive/'My Drive'/folders/my_folder_name
In my case, this is what worked. I think this is what Katardin suggested, except that I had to first add these subfolders (that I was given access to via a link) to My Drive:
Right-click on the subfolders in the Google Drive link I was given and select "Add to My Drive."
Log into my Google Drive and add the subfolders to a new folder in my drive, my_folder_name.
Then I could access the contents of those subfolders from colab with the following standard code:
import os
from google.colab import drive

drive.mount('/content/drive')
data_dir = 'drive/My Drive/my_folder_name'
os.listdir(data_dir)  # shows the subfolders that were shared with me
I have found that the reason one can't mount one's own Google Drive for these things is a race condition on Google's side. It was first suggested to change the mount location from /content/gdrive to something else, but that didn't fix it. What I ended up doing was copying the files to Google Drive manually, then installing the Google Drive desktop application; in Windows 10 I then went to the folder (now located on Google Drive), disabled file-permission inheritance, and manually granted full control rights on the folder to the Users group and to the Authenticated Users group. That seems to have fixed it for me. I have also noticed with these Colabs (not this one in particular) that some components, such as trained models, are missing from the repository (as if they had been removed). The only solution for that is to look around for other sources of the files: searching Google, looking at the Git level for branches besides master, and checking projects that forked the repository on GitHub to see if they still include the files.
Open Google Drive and share the link with everybody or with your own accounts.
Then, in Colab:
from google.colab import drive
drive.mount('/content/drive')
You may want to try the following, though it depends on whether you're doing this in a pro or personal account. There is a My Drive folder that Google Drive keeps in place in the file structure after /content/drive/.
drive.mount('/content/drive/My Drive/folders/my_folder_name')
Copy your Colab document link, open it in a Chrome incognito window, and run the command again ;) It should work with no error.

How to mount S3 bucket as local FileSystem?

I have a Python app running in a Jupyter notebook on AWS. I loaded a C library into my Python code which expects a path to a file.
I would like to access this file from the S3 bucket.
I tried to use s3fs:
s3 = s3fs.S3FileSystem(anon=False)
Using s3.ls('..') lists all my bucket files... this is OK so far. But the library I am using would have to use the s3 object internally, where I have no access; I can only pass a path to the C library.
Is there a way to mount the S3 bucket so that I don't have to call s3.open() and can just call open('/path/to/s3'), with the S3 bucket really mounted somewhere as a local filesystem?
I think it should work like this without using s3, because I can't change the library I am using internally to use the s3 variable...
with s3.open("path/to/s3/file",'w') as f:
df.to_csv(f)
with open("path/to/s3/file",'w') as f:
df.to_csv(f)
Or am I doing it completely wrong?
The C library I am using is loaded as a DLL in Python and I call a function:
lib.OpenFile("path/to/s3/file")
I have to pass the S3 path into the library's OpenFile function.
If you're looking to mount the S3 bucket as part of the file system, then use s3fs-fuse
https://github.com/s3fs-fuse/s3fs-fuse
That will make it part of the file system, and the regular file system functions will work as you would expect.
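A minimal sketch of what that looks like from the Python side, assuming the bucket has already been mounted with s3fs-fuse at a hypothetical mount point /mnt/mybucket:
# once mounted, a plain path works with the built-in open() ...
with open("/mnt/mybucket/path/to/file") as f:
    data = f.read()

# ... and the same path string can be handed to the C library from the question,
# e.g. lib.OpenFile("/mnt/mybucket/path/to/file")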
If you are targeting Windows, it is possible to use rclone along with WinFsp to mount an S3 bucket as a local filesystem.
The simplified steps are :
rclone config to create a remote
rclone mount remote:bucket * to mount
https://github.com/rclone/rclone
https://rclone.org/
https://github.com/billziss-gh/winfsp
http://www.secfs.net/winfsp/
This might not be completely relevant to this question, but I am certain it will be to a lot of users coming here.

Python/AWS Lambda Function: How to view /tmp storage?

Lambda functions have access to disk space in their own /tmp directories. My question is, where can I visually view the /tmp directory?
I'm attempting to download files into the /tmp directory to read them, and to write a new file to it as well. I want to actually see that the files I'm working with are getting stored properly in /tmp during execution.
Thank you
You can't 'view' the /tmp directory after the Lambda execution has ended.
Lambda has a distributed architecture, and after the execution all resources used (including all files stored in /tmp) are disposed of.
So if you want to inspect your files afterwards, you might want to consider using EC2 or S3.
If you just want to check, during execution, whether the S3 download was successful, you can try:
import os
os.path.isfile('/tmp/' + filename)
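To actually 'view' the directory contents during execution, a small sketch like the following can be called from the handler; its output shows up in the CloudWatch logs for the invocation:
import os

def list_tmp(path="/tmp"):
    # print every file in /tmp with its size, so it appears in the logs
    for name in os.listdir(path):
        full = os.path.join(path, name)
        print(name, os.path.getsize(full), "bytes")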
As previous answers suggested, you might want to create a tmp/ prefix in your S3 bucket and download/upload your temporary processing files under that prefix before the final cleanup.
You can do the following (I'm not showing the detailed process here):
import boto3
s3 = boto3.client("s3")
s3.put_object(Bucket=Your_bucket_name, Key="tmp/" + Your_file_name)
You download a file from that tmp/ prefix with:
s3.download_file(Your_bucket_name, Your_key_name, Your_file_name)
After you download and process the files, you upload them again to tmp/ with:
s3.upload_file(Your_file_name, Your_bucket_name, Your_key_name)
You can include the tmp/ prefix in Your_key_name.
Then you should be able to list the objects easily (using the same boto3 client as above):
response = s3.list_objects_v2(Bucket=Your_bucket_name, Prefix="tmp/")
for obj in response.get("Contents", []):
    print("{name}\t{size}\t{modified}".format(
        name=obj["Key"],
        size=obj["Size"],
        modified=obj["LastModified"],
    ))
Make sure you keep your downloads and uploads asynchronous, for example by using an async boto package.
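The answer doesn't name the exact async package; as a hedged alternative, plain boto3 calls can be run in parallel with a thread pool, which gives a similar effect (the bucket and key names below are placeholders):
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
keys = ["tmp/a.csv", "tmp/b.csv", "tmp/c.csv"]  # placeholder object keys

def download(key):
    # boto3 clients are thread-safe, so downloads can overlap
    s3.download_file("your-bucket-name", key, "/tmp/" + key.split("/")[-1])

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(download, keys))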
Try using an S3 bucket to store the file and read it from the AWS Lambda function; make sure the Lambda role has access to the S3 bucket.
