How to successfully use the GCS filesystem to read a JPG file [duplicate] - python

As the title indicates...
I have tried two approaches and neither of them works:
First:
I want to talk to GCS programmatically from Python, for example by reading gs://{bucketname}/{blobname} as a path or a file. The only thing I could find is the gsutil module, but it seems to be meant for the command line rather than for a Python application.
I found some code in "Accessing data in google cloud bucket", but I am still confused about how to retrieve the data as the type I need. There is a JPG file in the bucket, and I want to download it for text detection. This will be deployed as a Google Cloud Function.
Second:
The download_as_bytes() method (see the Blob documentation). I import the google.cloud.storage module and provide the GCP key, but an error is raised saying that the Blob object has no attribute download_as_bytes().
Is there anything else I haven't tried? Thank you!
For reference:
def text_detected(user_id):
    bucket=storage_client.bucket(
        'img_platecapture')
    blob=bucket.blob({user_id})
    content= blob.download_as_bytes()
    image = vision.Image(content=content) #insert a content
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
    img = Image.open(input_file) #insert a path
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype("simsun.ttc", 18)
    for text in response.text_annotations[1::]:
        ocr = text.description
        draw.text((bound.vertices[0].x-25, bound.vertices[0].y-25),ocr,fill=(255,0,0),font=font)
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            'yellow',
        )
    texts=response.text_annotations
    a=str(texts[0].description.split())
    b=re.sub(u"([^\u4e00-\u9fa5\u0030-u0039])","",a)
    b1="".join(b)
    print("偵測到的地址為:",b1) # the message means "Detected address:"
    return b1
#handler.add(MessageEvent, message=ImageMessage)
def handle_content_message(event):
    message_content = line_bot_api.get_message_content(event.message.id)
    user = line_bot_api.get_profile(event.source.user_id)
    data=b''
    for chunk in message_content.iter_content():
        data+= chunk
    global bucket_name
    bucket_name = 'img_platecapture'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(f'{user.user_id}.jpg')
    blob.upload_from_string(data)
    text_detected1=text_detected(user.user_id) ####Here's the problem
    line_bot_api.reply_message(
        event.reply_token,
        messages=TextSendMessage(
            text=text_detected1
        ))
Reference code (gcsfs/fsspec):
gcs = gcsfs.GCSFileSystem()
bucket=storage_client.bucket('img_platecapture')
blob=bucket.blob({user_id})
f =fsspec.open("gs://img_platecapture/{user_id}")
with f.open({user_id}, "rb") as fp:
    content = fp.read()
image = vision.Image(content=content)
response = vision_client.text_detection(image=image)

You can do that with the Cloud Storage Python client:
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket and returns its content as bytes."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # The path to which the file should be downloaded
    # (only used by the download_to_filename variant below)
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    # Alternatives:
    # blob.download_to_filename(destination_file_name)
    # blob.download_as_string()
    content = blob.download_as_bytes()
    print(
        "Downloaded storage object {} from bucket {} ({} bytes).".format(
            source_blob_name, bucket_name, len(content)
        )
    )
    return content
You can use any of the following methods:
blob.download_to_filename(destination_file_name)
blob.download_as_string()
blob.download_as_bytes()
To use this library correctly, you have to install the pip package in your virtual env.
Example project structure:
my-project
    requirements.txt
    your_python_script.py
The requirements.txt file:
google-cloud-storage==2.6.0
Run the following command:
pip install -r requirements.txt
In your case, the package may not have been installed correctly in your virtual env (or an older release that predates download_as_bytes may be installed), which would explain why you could not access the download_as_bytes method.
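For the code in the question specifically, a minimal sketch along these lines may help. It assumes the object was uploaded as '{user_id}.jpg' (as handle_content_message does), passes the object name as a plain string rather than the set literal {user_id}, and falls back to download_as_string() if the installed release has no download_as_bytes(). The helper names object_name_for, read_blob_bytes and example_usage are hypothetical.

```python
def object_name_for(user_id: str) -> str:
    """Build the object name used at upload time (assumed to be '<user_id>.jpg')."""
    return f"{user_id}.jpg"


def read_blob_bytes(bucket, blob_name: str) -> bytes:
    """Return the object's content as bytes.

    `bucket` is expected to behave like google.cloud.storage.Bucket.
    """
    blob = bucket.blob(blob_name)  # the name must be a str, not a set like {user_id}
    downloader = getattr(blob, "download_as_bytes", None)
    if downloader is None:
        # Older client releases only offer download_as_string(), which also returns bytes.
        downloader = blob.download_as_string
    return downloader()


def example_usage(user_id: str) -> bytes:
    """Usage sketch; needs google-cloud-storage and valid credentials."""
    from google.cloud import storage  # imported lazily so the helpers above stay dependency-free

    bucket = storage.Client().bucket("img_platecapture")
    return read_blob_bytes(bucket, object_name_for(user_id))
```

The returned bytes can be passed directly as vision.Image(content=...), so no local file is needed for the text detection step.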

I'd use fsspec's GCS filesystem implementation instead.
https://github.com/fsspec/gcsfs/
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(project='my-google-project')
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
...     print(f.read())
b'Hello, world'
https://gcsfs.readthedocs.io/en/latest/#examples
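Applied to the fsspec attempt in the question, a hedged sketch might look like the following. It assumes gcsfs is installed (so that fsspec can resolve gs:// URLs) and that credentials are picked up from the environment; gcs_url and read_gcs_bytes are hypothetical names.

```python
def gcs_url(bucket_name: str, object_name: str) -> str:
    """Build a gs:// URL. An f-string is needed; a plain string would keep the braces literally."""
    return f"gs://{bucket_name}/{object_name}"


def read_gcs_bytes(bucket_name: str, object_name: str) -> bytes:
    """Read one object through fsspec and return its raw bytes."""
    import fsspec  # imported lazily; requires the fsspec and gcsfs packages

    # fsspec.open returns an OpenFile; entering the context yields the file object.
    with fsspec.open(gcs_url(bucket_name, object_name), "rb") as fp:
        return fp.read()
```

For example, read_gcs_bytes("img_platecapture", "some-user-id.jpg") would return the JPG content, where the object name is an assumed placeholder.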

Related

How to deal with GSutil URI not working all the time

I am facing a little issue here that I can't explain.
On some occasions, I am able to open files from my cloud storage buckets using a GSutil URI. For instance, this one works fine:
df = pd.read_csv('gs://poker030120203/ouptut_test.csv')
But on other occasions, this method does not work and returns the error FileNotFoundError: [Errno 2] No such file or directory
This happens, for instance, with the following code:
rank_table_filename = 'gs://poker030120203/rank_table.bin'
rank_table_file = open(rank_table_filename, "r")
preflop_table_filename = 'gs://poker030120203/preflop_table.npy'
self.preflop_table = np.load(preflop_table_filename)
I am not sure whether this is related to the open or load method, or maybe to the file type, but I can't figure out why this returns an error. I do not know whether it matters, but I'm running everything from Vertex (i.e. the AI module that automatically sets up a storage bucket, a VM and a Jupyter notebook).
Thanks a lot for the help
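To read and write files in Google Cloud Storage, you can use the methods Google recommends. pd.read_csv accepts gs:// paths because pandas resolves them through fsspec/gcsfs, whereas the built-in open() and np.load() only understand local paths, hence the FileNotFoundError. It's easier to use the Google client libraries to read from or write to Google Cloud Storage.
Example from the docs: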
from google.cloud import storage
def write_read(bucket_name, blob_name):
    """Write and read a blob from GCS using file-like IO"""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your new GCS object
    # blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    # Mode can be specified as wb/rb for bytes mode.
    # See: https://docs.python.org/3/library/io.html
    with blob.open("w") as f:
        f.write("Hello world")
    with blob.open("r") as f:
        print(f.read())
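For the binary files in the question, one possible approach is to download the object into memory and hand a file-like buffer to the loader. The sketch below assumes a bucket object from google-cloud-storage and relies on np.load accepting a file-like object; blob_to_buffer and load_preflop_table are hypothetical helper names.

```python
import io


def blob_to_buffer(bucket, blob_name: str) -> io.BytesIO:
    """Download one object and wrap its bytes in a seekable in-memory file."""
    data = bucket.blob(blob_name).download_as_bytes()
    return io.BytesIO(data)


def load_preflop_table(bucket):
    """Usage sketch: np.load reads from the buffer instead of a local path."""
    import numpy as np  # imported lazily so blob_to_buffer has no extra dependency

    return np.load(blob_to_buffer(bucket, "preflop_table.npy"))
```

The same buffer can replace open(rank_table_filename, ...) for rank_table.bin, read in binary mode.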

Python - download entire directory from Google Cloud Storage with progress bar

I am downloading an entire directory from Google Cloud Storage using the Python code below:
from google.cloud import storage
from pathlib import Path
def download_blob():
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    bucket_name = "Bucket name"
    # The ID of your GCS object
    blob_name = input("Enter the folder name in "+bucket_name+" : ")
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=blob_name) # Get list of files
    print('Downloading file')
    for blob in blobs:
        if blob.name.endswith("/"):
            continue
        file_split = blob.name.split("/")
        directory = "/".join(file_split[0:-1])
        Path(directory).mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(blob.name)
    print('Download completed')
download_blob()
How can I show a progress bar after printing the line "Downloading file"?
I will assume that you have a Python library capable of showing a progress bar on the console/terminal where you plan to run this program.
What you can do at a coarse level is the following:
You have a list of the blobs that are present under the specific Google Cloud bucket + prefix.
Each blob has a property named size, which tells you the number of bytes in that blob.
You can first sum up the total number of bytes across all the blobs and then start the download_to_filename loop, downloading each blob in turn; every time a download completes, you update the % complete in the progress bar.
Alternatively, if you really want a fine-grained percentage, you probably need to use the start and end parameters of the download_to_filename method, which let you fetch only a specific byte range. Refer to the documentation.
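The coarse-grained approach described above could be sketched as follows. This is only an illustration using plain printed percentages instead of a progress-bar library; download_with_progress, dest_root and report are hypothetical names, and the blobs are assumed to expose name, size and download_to_filename as the objects returned by list_blobs do.

```python
from pathlib import Path


def download_with_progress(blobs, dest_root=".", report=print):
    """Download every blob and report the cumulative percentage of bytes done."""
    # Materialize the iterator and skip "directory" placeholder objects.
    blobs = [b for b in blobs if not b.name.endswith("/")]
    total = sum(b.size or 0 for b in blobs)
    done = 0
    for blob in blobs:
        target = Path(dest_root) / blob.name
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))
        done += blob.size or 0
        percent = 100.0 * done / total if total else 100.0
        report(f"{percent:5.1f}% ({done}/{total} bytes) {blob.name}")
    return done
```

With the question's code, this would be called as download_with_progress(bucket.list_blobs(prefix=blob_name)); the report callback is the place to plug in a real progress bar.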

How to download all blobfiles with a sas url?

Showing all blobs in a (foreign) container is possible with the code below, so I know the provided SAS URL is valid:
from azure.storage.blob import ContainerClient, BlobServiceClient
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
How do I download the contents of the container to a local folder?
With our own containers I would connect with a BlobServiceClient using the provided connection-string, which I don't have for this container.
You are almost there. All you need to do is create a BlobClient from the ContainerClient and the blob name using the get_blob_client method. Once you have that, you can download the blob using the download_blob method.
Your code would be something like:
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
    blob_client = container.get_blob_client(blob.name)
    blob_client.download_blob()
Please ensure that your SAS URL has Read permission; otherwise, the download operation will fail.
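To actually write the downloaded content into a local folder, something like the sketch below may work. It assumes a ContainerClient created from the SAS URL as above and uses its download_blob(...).readall() call; download_container and dest_dir are hypothetical names.

```python
from pathlib import Path


def download_container(container, dest_dir):
    """Save every blob in the container under dest_dir, keeping the blob paths."""
    saved = []
    for props in container.list_blobs():
        target = Path(dest_dir) / props.name
        # Blob names may contain '/', so create the parent folders first.
        target.parent.mkdir(parents=True, exist_ok=True)
        data = container.download_blob(props.name).readall()
        target.write_bytes(data)
        saved.append(target)
    return saved
```

readall() loads each blob fully into memory, which is fine for small files; very large blobs may call for a streaming variant.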
If someone else tries to save CSVs from a blob, here is the code I used with Gaurav's help:
from io import BytesIO
import pandas as pd
from azure.storage.blob import ContainerClient, BlobServiceClient
sas_url = r'[SAS_URL]'
sas_token = r'[SAS_token]'
container = ContainerClient.from_container_url(sas_url)
blob_service_client = BlobServiceClient(account_url="[ACCOUNT NAME]", credential=sas_token)
blob_list = container.list_blobs()
for blob in blob_list:
    name = blob.name
    # Keep only the part after the last '/' as the local file name
    filename = name.rsplit('/', 1)[-1]
    if name.endswith('.csv'):
        try:
            # get_blob_client takes the container name and the blob name
            blob_client = blob_service_client.get_blob_client(container='[CONTAINER]', blob=name)
            blob_data = blob_client.download_blob()
            file = blob_data.readall()
            file = pd.read_csv(BytesIO(file))
            file.to_csv(filename)
        except Exception as exc:
            # Report the failure instead of silently ignoring it
            print(f"Could not save {name}: {exc}")

How do you write a .feather file into GCS?

Previously I worked with .csv files, which were straightforward to upload to GCS.
For CSV I would do the following, which works:
blob = bucket.blob(path)
blob.upload_from_string(dataframe.to_csv(), 'text/csv')
I am trying to do the same, i.e. write the dataframe as a .feather file to the bucket:
blob = bucket.blob(path)
blob.upload_from_string(dataframe.reset_index().to_feather(), 'text/feather')
However, this fails saying to_feather() requires a fname. Any suggestions/guidance on where I went wrong would be helpful.
upload_from_string works with to_csv() because its ‘path’ parameter is optional: when no path is provided, the result is returned as a string. to_feather(), on the other hand, requires a path. So you should write the feather file locally first and then upload that file to GCS.
Refer to the code below:
from google.cloud import storage
bucket_name = "BUCKET-NAME"
source_file_name = "FILE PATH"
destination_blob_name = "GCS Object Name"
# Write the feather file locally first, then upload that file
dataframe.reset_index().to_feather(source_file_name)
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_name)
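If a temporary local file is undesirable, an in-memory buffer may be an alternative. The sketch below assumes that to_feather accepts a file-like object (recent pandas versions document this) and that the blob comes from google-cloud-storage; upload_serialized and write_to are hypothetical names.

```python
import io


def upload_serialized(blob, write_to, content_type="application/octet-stream"):
    """Serialize into memory via write_to(buffer), then upload the bytes to the blob."""
    buffer = io.BytesIO()
    write_to(buffer)  # e.g. dataframe.reset_index().to_feather
    payload = buffer.getvalue()
    blob.upload_from_string(payload, content_type=content_type)
    return len(payload)
```

With the question's variables this would be called as upload_serialized(bucket.blob(path), dataframe.reset_index().to_feather).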

Saving Pillow Images from PDF to Google Cloud Server

I am working on a Django web app that takes in PDF files and performs some image processing on each page of the PDFs. I am given a PDF and I need to save each page to my Google Cloud Storage bucket. I am using pdf2image’s convert_from_path() to generate a list of Pillow images, one for each page in the PDF. Now I want to save these images to Google Cloud Storage, but I can’t figure it out.
I have successfully saved these Pillow images locally, but I do not know how to do this in the cloud.
fullURL = file.pdf.url
client = storage.Client()
bucket = client.get_bucket('name-of-my-bucket')
blob = bucket.blob(file.pdf.name[:-4] + '/')
blob.upload_from_string('', content_type='application/x-www-form-urlencoded;charset=UTF-8')
pages = convert_from_path(fullURL, 400)
for i,page in enumerate(pages):
    blob = bucket.blob(file.pdf.name[:-4] + '/' + str(i) + '.jpg')
    blob.upload_from_string('', content_type='image/jpeg')
    outfile = file.pdf.name[:-4] + '/' + str(i) + '.jpg'
    page.save(outfile)
    of = open(outfile, 'rb')
    blob.upload_from_file(of)
Start off by not using Blobstore. They are trying to get rid of it and get people to use Cloud Storage. First, set up Cloud Storage:
https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage
I use webapp2 and not Django, but I’m sure you can figure it out. Also, I don’t use Pillow images, so you’ll have to open the image that you’re going to upload. Then do something like this (this assumes you’re trying to post the data):
import os
import cloudstorage as gcs
import io
import StringIO
from google.appengine.api import app_identity
Before get and post, in its own section:
def create_file(self, filename, Dacontents):
    write_retry_params = gcs.RetryParams(backoff_factor=1.1)
    gcs_file = gcs.open(filename,
                        'w',
                        content_type='image/jpeg',
                        options={'x-goog-meta-foo': 'foo',
                                 'x-goog-meta-bar': 'bar'},
                        retry_params=write_retry_params)
    gcs_file.write(Dacontents)
    gcs_file.close()
In get, in your HTML:
<form action="/(whatever your URL is)" method="post" enctype="multipart/form-data">
    <input type="file" name="orders"/>
    <input type="submit"/>
</form>
In post:
orders=self.request.POST.get('orders') #this is for webapp2
bucket_name = os.environ.get('BUCKET_NAME',app_identity.get_default_gcs_bucket_name())
bucket = '/' + bucket_name
OpenOrders=orders.file.read()
if OpenOrders:
    filename = bucket + '/whateverYouWantToCallIt'
    self.create_file(filename,OpenOrders)
Since you have saved the files locally, they are available in the local directory where the web app is running.
What you can simply do is iterate through the files in that directory and upload them to Google Cloud Storage one by one.
Here is some sample code:
You will need this library:
google-cloud-storage
Python code:
#Libraries
import os
from google.cloud import storage
#Public variable declarations:
bucket_name = "[BUCKET_NAME]"
local_directory = "local/directory/of/the/files/for/uploading/"
bucket_directory = "uploaded/files/" #Where the files will be uploaded in the bucket
#Upload file from source to destination
def upload_blob(source_file_name, destination_blob_name):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
#Iterate through all files in that directory and upload one by one using the same filename
def upload_files():
    for filename in os.listdir(local_directory):
        upload_blob(local_directory + filename, bucket_directory + filename)
    return "File uploaded!"
#Call this function in your code:
upload_files()
NOTE: I have tested the code in Google App Engine web app and it worked for me. Take the idea of how it is working and modify it according to your needs. I hope that was helpful.
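If writing the pages to local disk first is not desired, the Pillow images could also be serialized in memory and uploaded directly. This is only a sketch, assuming pages is the list returned by convert_from_path and bucket is a google-cloud-storage bucket; upload_pages and prefix are hypothetical names.

```python
import io


def upload_pages(bucket, pages, prefix):
    """Upload each Pillow page as '<prefix>/<index>.jpg' without touching local disk."""
    names = []
    for index, page in enumerate(pages):
        buffer = io.BytesIO()
        page.save(buffer, format="JPEG")  # Pillow needs an explicit format for file objects
        name = f"{prefix}/{index}.jpg"
        bucket.blob(name).upload_from_string(buffer.getvalue(), content_type="image/jpeg")
        names.append(name)
    return names
```

In the question's code, prefix would correspond to file.pdf.name[:-4].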
