I have a GCP bucket with some PDF files.
I am able to download the files from the bucket with Python (Flask) on localhost using the code below:
from google.cloud import storage

storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("file1.pdf")
But when it is deployed on GCP (App Engine) I get this error: "line 1282, in download_to_filename with open(filename, "wb") as file_obj: OSError: [Errno 30] Read-only file system: 'file1.pdf'"
Then I modified the code to:
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("/tmp/file1.pdf")
This works after deployment, but I can't download the files from the bucket to a local folder through my app. How do I do that?
You cannot write to or modify Google App Engine's file system after deployment; you can only read files within the app. That is why storage solutions such as Cloud Storage or a database are the recommended way to store files or data.
Google App Engine is serverless. It can scale up quickly because it simply spins up instances from the container image built from your source code and the pre-built runtime. Once the container image is built during deployment, you can no longer write or edit its contents; you have to redeploy to make changes.
One exception is the /tmp directory, where you can store temporary files. This directory is mounted in the instance's RAM, so storing temporary data consumes RAM, and when the instance is deleted the files in /tmp are deleted as well.
You may check this documentation for additional information.
You can use the tempfile.gettempdir() function to get the temporary directory for your runtime environment.
For example:
import tempfile
from google.cloud import storage

storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
# Write to the platform's temporary directory (/tmp on App Engine)
blob.download_to_filename("{}/file1.pdf".format(tempfile.gettempdir()))
Locally, the file will be downloaded to your OS's default temporary directory.
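If what you actually need is to let users download a file from the bucket through the Flask app itself, one common pattern (a sketch, not part of the original answer; the route and file names are just examples) is to download the blob into the temporary directory and return it with Flask's send_file:
import os
import tempfile

from flask import Flask, send_file
from google.cloud import storage

app = Flask(__name__)
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")

@app.route("/download/<file_name>")
def download(file_name):
    # Save the blob to the instance's writable temporary directory
    local_path = os.path.join(tempfile.gettempdir(), file_name)
    bucket.blob(file_name).download_to_filename(local_path)
    # Stream the temporary copy back to the user's browser
    return send_file(local_path, as_attachment=True)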
Related
I have the code that creates the folder, but how can I access that folder to write a CSV file into it?
import boto3
# Creating folder on S3 for unmatched data
client = boto3.client('s3')
# Variables
target_bucket = obj['source_and_destination_details']['s3_bucket_name']
subfolder = obj['source_and_destination_details']['s3_bucket_uri-new_folder_path'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data']
# Create subfolder (objects)
client.put_object(Bucket = target_bucket, Key = subfolder)
The folder is getting created successfully by the above code, but how do I write a CSV file into it?
Below is the code I have tried, but it's not working:
# Writing csv on AWS S3
df.reindex(idx).to_csv(obj['source_and_destination_details']['s3_bucket_uri-write'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data'] + obj['source_and_destination_details']['file_name_for_unmatched_column_data'], index=False)
An S3 bucket is not a file system.
I assume that the to_csv() method is supposed to write to some sort of file system, but that is not how it works with S3. While there are solutions that mount S3 buckets as file systems, this is not the preferred way.
Usually, you would interact with S3 via the AWS REST APIs, the AWS CLI, or a client library such as Boto, which you’re already using.
So in order to store your content on S3, first create the file locally, e.g. in the system's /tmp folder, then use Boto's put_object() method to upload the file. Remove the local copy afterwards.
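For example, a minimal sketch following that approach (the bucket name and key below are placeholders, and the small DataFrame stands in for df.reindex(idx) from the question):
import os
import boto3
import pandas as pd

# Placeholder values; build these from your config object instead
target_bucket = 'my-target-bucket'
key = 'new_folder_path/unmatched_column_data/unmatched.csv'

# Example DataFrame standing in for df.reindex(idx)
df = pd.DataFrame({'col_a': [1, 2], 'col_b': ['x', 'y']})

# 1. Write the CSV to a local, writable location first
local_path = '/tmp/unmatched.csv'
df.to_csv(local_path, index=False)

# 2. Upload the local file to S3 with put_object()
s3 = boto3.client('s3')
with open(local_path, 'rb') as f:
    s3.put_object(Bucket=target_bucket, Key=key, Body=f)

# 3. Remove the local copy once the upload succeeds
os.remove(local_path)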
I'm very new to AWS, and relatively new to Python. Please go easy on me.
I want to upload files from a SharePoint location to an S3 bucket. From there, I'll be able to perform analysis on those files.
The code below uploads a file from a local directory to an example S3 bucket. I'd like to modify this to upload only new files from a SharePoint location (and not re-upload files that are already there).
import boto3

BUCKET_NAME = "test_bucket"
s3 = boto3.client("s3")

with open("./burger.jpg", "rb") as f:
    s3.upload_fileobj(f, BUCKET_NAME, "burger_new_upload.jpg", ExtraArgs={"ACL": "public-read"})
Would AWS Lambda with Python code be useful here? Thank you for sharing your knowledge.
I have moved files and folders over to Google Cloud Storage (GCS). I am finding it difficult to understand the prefixes and delimiters in the GCS documentation.
What I want to do is essentially replace the path/location of locally stored files with the GCS location. For example, it is currently coded for a local path:
Variable = "C:\\Users\\admin\\Documents\\Folder1\\doc3.csv"
I need it to search for the file in GCS instead, like below:
Variable = "https://storage.cloud.google.com/MYBUCKETNAME/Folder1/doc3.csv?supportedpurview=project"
This obviously doesn't work, but I have the code below, which connects to the bucket; I am struggling to point it at the specific file.
from google.cloud import storage
import os

client = storage.Client()
bucket = client.bucket('mybucketname')
blobs = bucket.list_blobs(prefix='Folder1')
for blob in blobs:
    print(blob.name)
So the output of this gives the following files in that specific folder:
doc1.csv
doc2.csv
doc3.csv
For my variable, what do I write as the file path to doc3.csv? This is what I am struggling with.
You can't refer to a blob in Cloud Storage by its name as if it were a local file. You'll need to transfer the file from Cloud Storage to your local filesystem first:
destination_file_name = ...
blob.download_to_filename(destination_file_name)
Then you can read the file from where you've stored it locally:
with open(destination_file_name) as f:
    contents = f.read()
...or otherwise use it like any other local file on your filesystem.
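Putting it together with the bucket and folder from the question, a minimal sketch (the local destination path here is just an example):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('mybucketname')

# The "folder" is part of the object name, so refer to the blob as 'Folder1/doc3.csv'
blob = bucket.blob('Folder1/doc3.csv')

destination_file_name = '/tmp/doc3.csv'  # any writable local path
blob.download_to_filename(destination_file_name)

with open(destination_file_name) as f:
    contents = f.read()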
See https://cloud.google.com/storage/docs/downloading-objects for more details.
I have a Python app running in a Jupyter notebook on AWS. I loaded a C library into my Python code which expects a path to a file.
I would like to access this file from the S3 bucket.
I tried to use s3fs:
import s3fs
s3 = s3fs.S3FileSystem(anon=False)
Using s3.ls('..') lists all my bucket files... this is OK so far. But the library I am using should actually use the s3 variable internally, where I have no access. I can only pass a path to the C library.
Is there a way to mount the S3 bucket so that I don't have to call s3.open() and can just call open('/path/to/s3'), with the S3 bucket really mounted somewhere underneath as a local filesystem?
I think it should work like this without using s3, because I can't change the library I am using internally to use the s3 variable...
with s3.open("path/to/s3/file", 'w') as f:
    df.to_csv(f)

with open("path/to/s3/file", 'w') as f:
    df.to_csv(f)
Or am I doing it completely wrong?
The C library I am using is loaded as a DLL in Python, and I call a function:
lib.OpenFile(path/to/s3/file)
I have to pass the path to the S3 file into the library's OpenFile function.
If you're looking to mount the S3 bucket as part of the file system, then use s3fs-fuse
https://github.com/s3fs-fuse/s3fs-fuse
That will make it part of the file system, and the regular file system functions will work as you would expect.
If you are targeting Windows, it is possible to use rclone along with WinFsp to mount an S3 bucket as a local file system.
The simplified steps are:
rclone config to create a remote
rclone mount remote:bucket * to mount
https://github.com/rclone/rclone
https://rclone.org/
https://github.com/billziss-gh/winfsp
http://www.secfs.net/winfsp/
This might not be completely relevant to this question, but I am certain it will be to a lot of users coming here.
Lambda functions have access to disk space in their own /tmp directories. My question is, where can I visually view the /tmp directory?
I’m attempting to download the files into the /tmp directory to read them, and to write a new file to it as well. I actually want to see that the files I’m working with are getting stored properly in /tmp during execution.
Thank you
You can't 'view' the /tmp directory after the Lambda execution has ended.
Lambda uses a distributed architecture, and after the execution all resources used (including all files stored in /tmp) are disposed of.
So if you want to check your files afterwards, you might want to consider using EC2 or S3.
If you just want to check whether the S3 download was successful during the execution, you can try:
import os
# True if the file exists in /tmp; print it so the result shows up in the logs
print(os.path.isfile('/tmp/' + filename))
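If you want to see everything that is currently in /tmp while the function runs, you can also log the directory listing; anything printed from a Lambda function ends up in CloudWatch Logs (a minimal sketch):
import os

# Log the name and size of every file currently in /tmp;
# the output is visible in the function's CloudWatch Logs
for name in os.listdir('/tmp'):
    path = os.path.join('/tmp', name)
    print(name, os.path.getsize(path))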
As previous answers suggested, you might want to create a tmp/ prefix ("directory") in your S3 bucket and download/upload your temporary processing files there before the final clean-up.
You can do the following (I'm not showing the detailed process here):
import boto3

s3 = boto3.client("s3")
# The key is just a string; this creates an object under the tmp/ prefix
s3.put_object(Bucket=Your_bucket_name, Key='tmp/' + Your_file_name)
You can then download your file from the tmp/ prefix with:
s3.download_file(Your_bucket_name, Your_key_name, Your_file_name)
After you download and process the files, you can upload them again to tmp/ with:
s3.upload_file(Your_file_name, Your_bucket_name, Your_key_name)
You can include the tmp/ prefix in Your_key_name.
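For example, a sketch of the whole download/process/upload cycle with a tmp/ prefix (the bucket, key, and file names are placeholders):
import boto3

s3 = boto3.client('s3')
bucket_name = 'my-bucket'        # placeholder bucket name
key = 'tmp/data.csv'             # object key under the tmp/ "folder"
local_file = '/tmp/data.csv'     # Lambda's writable scratch space

# Download the object for processing
s3.download_file(bucket_name, key, local_file)

# ... process local_file here ...

# Upload the processed result back under the tmp/ prefix
s3.upload_file(local_file, bucket_name, key)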
Then you should be able to list the bucket easily with this boto3 sample:
response = s3.list_objects_v2(Bucket=Your_bucket_name)
for obj in response.get('Contents', []):
    print("{name}\t{size}\t{modified}".format(
        name=obj['Key'],
        size=obj['Size'],
        modified=obj['LastModified'],
    ))
Make sure you keep your downloads and uploads asynchronous by using an async boto package.
Try using an S3 bucket to store the file and read it from the AWS Lambda function; you should ensure the AWS Lambda role has access to the S3 bucket.
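A minimal sketch of that pattern (the bucket and key are placeholders, and the execution role needs s3:GetObject on the bucket):
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = 'my-bucket'          # placeholder bucket name
    key = 'input/data.txt'        # placeholder object key
    local_path = '/tmp/data.txt'  # /tmp is the only writable path in Lambda

    # Copy the object into /tmp, then read it like a normal local file
    s3.download_file(bucket, key, local_path)
    with open(local_path) as f:
        return f.read()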