I am working on a Django web app that takes in PDF files and performs some image processing on each page. Given a PDF, I need to save each page to my Google Cloud Storage bucket. I am using pdf2image’s convert_from_path() to generate a list of Pillow images, one per page of the PDF. Now I want to save these images to Google Cloud Storage, but I can’t figure it out.
I have successfully saved these Pillow images locally but I do not know how to do this in the cloud.
fullURL = file.pdf.url
client = storage.Client()
bucket = client.get_bucket('name-of-my-bucket')

blob = bucket.blob(file.pdf.name[:-4] + '/')
blob.upload_from_string('', content_type='application/x-www-form-urlencoded;charset=UTF-8')

pages = convert_from_path(fullURL, 400)
for i, page in enumerate(pages):
    blob = bucket.blob(file.pdf.name[:-4] + '/' + str(i) + '.jpg')
    blob.upload_from_string('', content_type='image/jpeg')
    outfile = file.pdf.name[:-4] + '/' + str(i) + '.jpg'
    page.save(outfile)
    of = open(outfile, 'rb')
    blob.upload_from_file(of)
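Since each page is already a Pillow image in memory, the local save/open/upload round trip can be skipped entirely by encoding the JPEG into an io.BytesIO buffer and calling upload_from_string. The sketch below illustrates this under the assumption that `bucket` is a google-cloud-storage Bucket like the one above; `page_to_jpeg_bytes` and `upload_pages` are purely illustrative helper names:

```python
import io

def page_to_jpeg_bytes(page):
    """Encode a Pillow page image as JPEG entirely in memory (no temp file)."""
    buf = io.BytesIO()
    page.save(buf, format="JPEG")
    return buf.getvalue()

def upload_pages(pages, bucket, prefix):
    """Upload each rendered page as <prefix>/<index>.jpg to the given bucket."""
    for i, page in enumerate(pages):
        blob = bucket.blob("%s/%d.jpg" % (prefix, i))
        blob.upload_from_string(page_to_jpeg_bytes(page),
                                content_type="image/jpeg")
```

With this, something like `upload_pages(convert_from_path(fullURL, 400), bucket, file.pdf.name[:-4])` would replace the per-page save/open/upload in the loop above.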
Start off by not using the Blobstore: Google is trying to retire it and move people to Cloud Storage. First set up Cloud Storage:
https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage
I use webapp2 and not Django, but I’m sure you can figure it out. Also, I don’t use Pillow images, so you’ll have to open the image you’re going to upload. Then do something like this (this assumes you’re posting the data):
import cloudstorage as gcs
import io
import StringIO
from google.appengine.api import app_identity
Define this helper before your get and post methods, in its own section:
def create_file(self, filename, Dacontents):
    write_retry_params = gcs.RetryParams(backoff_factor=1.1)
    gcs_file = gcs.open(filename,
                        'w',
                        content_type='image/jpeg',
                        options={'x-goog-meta-foo': 'foo',
                                 'x-goog-meta-bar': 'bar'},
                        retry_params=write_retry_params)
    gcs_file.write(Dacontents)
    gcs_file.close()
In your GET handler, in your HTML:
<form action="/(whatever your url is)" method="post" enctype="multipart/form-data">
<input type="file" name="orders"/>
<input type="submit"/>
</form>
In your POST handler:
orders = self.request.POST.get('orders')  # this is for webapp2
bucket_name = os.environ.get('BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
bucket = '/' + bucket_name
OpenOrders = orders.file.read()
if OpenOrders:
    filename = bucket + '/whateverYouWantToCallIt'
    self.create_file(filename, OpenOrders)
Since you have saved the files locally, they are available in the local directory where the web app is running.
You can simply iterate through the files of that directory and upload them to Google Cloud Storage one by one.
Here is a sample code:
You will need this library:
google-cloud-storage
Python code:
# Libraries
import os
from google.cloud import storage

# Public variable declarations:
bucket_name = "[BUCKET_NAME]"
local_directory = "local/directory/of/the/files/for/uploading/"
bucket_directory = "uploaded/files/"  # Where the files will be uploaded in the bucket

# Upload file from source to destination
def upload_blob(source_file_name, destination_blob_name):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

# Iterate through all files in that directory and upload one by one using the same filename
def upload_files():
    for filename in os.listdir(local_directory):
        upload_blob(local_directory + filename, bucket_directory + filename)
    return "File uploaded!"

# Call this function in your code:
upload_files()
NOTE: I have tested this code in a Google App Engine web app and it worked for me. Take the idea of how it works and modify it according to your needs. I hope that was helpful.
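One caveat with the loop above: os.listdir() also returns subdirectory names, and upload_from_filename() will fail on a directory. A small sketch of filtering to regular files first; `iter_local_files` is an illustrative helper, not part of the original answer:

```python
import os

def iter_local_files(local_directory):
    """Yield (full_path, name) for regular files only, skipping subdirectories."""
    for entry in os.scandir(local_directory):
        if entry.is_file():
            yield entry.path, entry.name
```

Each (path, name) pair can then be fed into upload_blob() exactly as before.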
I am facing a little issue here that I can't explain.
On some occasions, I am able to open files from my cloud storage buckets using a gsutil URI. For instance, this one works fine:
df = pd.read_csv('gs://poker030120203/ouptut_test.csv')
But on other occasions, this method does not work and returns FileNotFoundError: [Errno 2] No such file or directory.
This happens, for instance, with the following code:
rank_table_filename = 'gs://poker030120203/rank_table.bin'
rank_table_file = open(rank_table_filename, "r")
preflop_table_filename = 'gs://poker030120203/preflop_table.npy'
self.preflop_table = np.load(preflop_table_filename)
I am not sure if this is related to the open or load method, or maybe the file type, but I can't figure out why this returns an error. I don't know whether it matters, but I'm running everything from Vertex AI (i.e., the managed service that automatically sets up a storage bucket, a VM and a Jupyter notebook).
Thanks a lot for the help
To read and write files in Google Cloud Storage, use Google's recommended methods; it's easiest to use the Google Cloud client libraries to read or write anything in Cloud Storage.
Example from the docs:
from google.cloud import storage

def write_read(bucket_name, blob_name):
    """Write and read a blob from GCS using file-like IO"""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your new GCS object
    # blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # Mode can be specified as wb/rb for bytes mode.
    # See: https://docs.python.org/3/library/io.html
    with blob.open("w") as f:
        f.write("Hello world")
    with blob.open("r") as f:
        print(f.read())
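The same file-like interface works in bytes mode for the asker's binary files: plain open() and np.load() only understand local paths, but a blob's "rb" handle can be read directly (or handed to np.load). A sketch under the question's bucket and object names; `read_blob_bytes` is just an illustrative helper:

```python
import io

def read_blob_bytes(blob):
    """Read a blob's full contents through its file-like handle in bytes mode."""
    with blob.open("rb") as f:
        return f.read()

# With a real bucket, e.g.:
#   rank_table = read_blob_bytes(bucket.blob("rank_table.bin"))
#   with bucket.blob("preflop_table.npy").open("rb") as f:
#       preflop_table = np.load(f)
```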
As the topic indicates...
I have tried two ways and neither of them works:
First:
I want to programmatically talk to GCS in Python, e.g. reading gs://{bucketname}/{blobname} as a path or a file. The only thing I can find is the gsutil module; however, it seems to be used on the command line instead of in a Python application.
I found some code here: Accessing data in google cloud bucket, but I am still confused about how to retrieve it as the type I need. There is a .jpg file in the bucket that I want to download for text detection; this will be deployed on a Google Cloud Function.
Second:
The download_as_bytes() method (link to the Blob documentation). I import the google.cloud.storage module and provide the GCP key; however, an error is raised saying the Blob has no attribute download_as_bytes().
Is there anything else I haven't tried? Thank you!
For reference:
def text_detected(user_id):
    bucket = storage_client.bucket('img_platecapture')
    blob = bucket.blob(f'{user_id}.jpg')
    content = blob.download_as_bytes()
    image = vision.Image(content=content)  # insert a content
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))

    img = Image.open(input_file)  # insert a path
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype("simsun.ttc", 18)
    for text in response.text_annotations[1:]:
        ocr = text.description
        bound = text.bounding_poly
        draw.text((bound.vertices[0].x - 25, bound.vertices[0].y - 25), ocr, fill=(255, 0, 0), font=font)
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            'yellow',
        )

    texts = response.text_annotations
    a = str(texts[0].description.split())
    b = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039])", "", a)
    b1 = "".join(b)
    print("偵測到的地址為:", b1)
    return b1
@handler.add(MessageEvent, message=ImageMessage)
def handle_content_message(event):
    message_content = line_bot_api.get_message_content(event.message.id)
    user = line_bot_api.get_profile(event.source.user_id)
    data = b''
    for chunk in message_content.iter_content():
        data += chunk
    global bucket_name
    bucket_name = 'img_platecapture'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(f'{user.user_id}.jpg')
    blob.upload_from_string(data)
    text_detected1 = text_detected(user.user_id)  #### Here's the problem
    line_bot_api.reply_message(
        event.reply_token,
        messages=TextSendMessage(
            text=text_detected1))
Reference code (gcsfs/fsspec):
gcs = gcsfs.GCSFileSystem()
bucket = storage_client.bucket('img_platecapture')
blob = bucket.blob(f'{user_id}.jpg')
with fsspec.open(f"gs://img_platecapture/{user_id}.jpg", "rb") as fp:
    content = fp.read()
    image = vision.Image(content=content)
    response = vision_client.text_detection(image=image)
You can do that with the Cloud Storage Python client :
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # source_blob_name = "storage-object-name"
    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
You can use the following methods :
blob.download_to_filename(destination_file_name)
blob.download_as_string()
blob.download_as_bytes()
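These three differ mainly in where the data lands: on disk or in memory (download_as_string is a legacy alias that also returns bytes). A small sketch of the choice, where `fetch_blob` is a purely illustrative helper and `blob` a google-cloud-storage Blob:

```python
def fetch_blob(blob, destination_file_name=None):
    """Stream to a local file when a path is given (good for large objects);
    otherwise return the object's contents as bytes for in-memory use."""
    if destination_file_name is not None:
        blob.download_to_filename(destination_file_name)
        return None
    return blob.download_as_bytes()
```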
To be able to correctly use this library, you have to install the expected pip package in your virtual env.
Example of project structure :
my-project
    requirements.txt
    your_python_script.py
The requirements.txt file :
google-cloud-storage==2.6.0
Run the following command :
pip install -r requirements.txt
In your case, the package was perhaps not installed correctly in your virtual env, which is why you could not access the download_as_bytes method.
I'd be using fsspec's GCS filesystem implementation instead.
https://github.com/fsspec/gcsfs/
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(project='my-google-project')
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
... print(f.read())
b'Hello, world'
https://gcsfs.readthedocs.io/en/latest/#examples
I have a project in hand to back up a website for some reasons. I use Requests in Python to crawl the contents and image URLs. The problem is: how can I save an image to a cloud service (Google Drive, Dropbox, etc.) using only the URL of that image?
I know there is a way to first save the image locally and then upload the local file to the cloud. But I'm wondering if there are APIs that support uploading images by URL rather than from a local file.
It seems like Dropbox has a feature called /save_url that
lets app developers upload files to Dropbox by just providing a URL, without having to download the file first.
https://www.dropbox.com/developers-v1/core/docs#save-url
If you don't mind paying for the storage, you can save it to your own cloud storage. I occasionally have to do a similar action, and handle it as such:
def on_upload_image(self):
    url = self.request.get('url')
    result = urlfetch.fetch(url)
    binary = result.content
    blob_key = functions.get_blob_key_by_data(binary)
    self.url = images.get_serving_url(blob_key, secure_url=True)
    self.json()
import hashlib
import cloudstorage as gcs
from google.appengine.api import app_identity
from google.appengine.ext import blobstore

def get_blob_key_by_data(data):
    bucket = app_identity.get_default_gcs_bucket_name()
    filename = hashlib.sha256(data).hexdigest()
    mime_type = get_mime_type(data)
    if not mime_type:
        return None
    gcs_filename = '/%s/image_%s' % (bucket, filename)
    with gcs.open(gcs_filename, 'w', content_type=mime_type) as f:
        f.write(data)
    blob_key = blobstore.create_gs_key("/gs" + gcs_filename)
    return blob_key
I want to store files and images that I get from an API in the Blobstore (or rather, so that they are accessible from the Blobstore API). Since the Files API is deprecated, how do I do this?
One way is to store images in Cloud Storage (gcs) and access them via the Blobstore API. Basically you call gcs.open() and write the file. Then when you need to use the Blobstore API you call blobkey = blobstore.create_gs_key(). With that you can do things such as use the Images API with calls like images.get_serving_url(blobkey, secure_url=False).
How you do that depends on what your particular goals are. I am using it to serve images in a gallery that I upload. To do that I have a file upload in an HTML form on the front end, which sends the file. On the backend I am doing this (these are just the broad strokes):
# inside the webapp2.RequestHandler get method:
import mimetypes
file_data = self.request.get("photoUpload", default_value=None)
filename = self.request.POST["photoUpload"].filename
folder = "someFolderName"
content_type = mimetypes.guess_type(filename)[0]
Then save the file data to GCS:
from google.appengine.api import app_identity
import cloudstorage as gcs

# gcs_filename must be unique so I'm using bucket/folder/file
# it would be smart to check uniqueness before proceeding
gcs_filename = '/%s/%s/%s' % (bucket or app_identity.get_default_gcs_bucket_name(), folder, filename)
with gcs.open(gcs_filename, 'w', content_type=content_type or b'binary/octet-stream', options={b'x-goog-acl': b'public-read'}) as f:
    f.write(file_data)
Now I can access using the GCS api with calls like:
gcs.delete(gcs_filename)
Or use the Blobstore API by getting the previously mentioned blobkey:
blobkey = blobstore.create_gs_key('/gs' + gcs_filename)
I'm looking for a solution on how to upload a picture from an external URL like http://example.com/image.jpg to Google Cloud Storage using App Engine Python.
I am now using
blobstore.create_upload_url('/uploadSuccess', gs_bucket_name=bucketPath)
for users that want to upload a picture from their computer, calling
images.get_serving_url(gsk, size=180, crop=True)
on uploadSuccess and storing that as their profile image. I'm trying to allow users to use their Facebook or Google profile picture after they log in with OAuth2. I have access to their profile picture link, and I would just like to copy it for consistency. Please help :)
To upload an external image you have to get it and save it.
To get the image you can use this code:
from google.appengine.api import urlfetch

file_name = 'image.jpg'
url = 'http://example.com/%s' % file_name
result = urlfetch.fetch(url)
if result.status_code == 200:
    doSomethingWithResult(result.content)
To save the image you can use the app engine GCS client code shown here
import cloudstorage as gcs
import mimetypes

def doSomethingWithResult(content):
    gcs_file_name = '/%s/%s' % ('bucket_name', file_name)
    content_type = mimetypes.guess_type(file_name)[0]
    with gcs.open(gcs_file_name, 'w', content_type=content_type,
                  options={b'x-goog-acl': b'public-read'}) as f:
        f.write(content)
    return images.get_serving_url(blobstore.create_gs_key('/gs' + gcs_file_name))
Here is my new solution (2019) using the google-cloud-storage library and upload_from_string() function only (see here):
from google.cloud import storage
import urllib.request
import traceback

BUCKET_NAME = "[project_name].appspot.com"  # change the project_name placeholder to your preferences
BUCKET_FILE_PATH = "path/to/your/images"  # change this path

def upload_image_from_url_to_google_storage(img_url, img_name):
    """
    Uploads an image from a URL source to google storage.
    - img_url: string URL of the image, e.g. https://picsum.photos/200/200
    - img_name: string name of the image file to be stored
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(BUCKET_NAME)
    blob = bucket.blob(BUCKET_FILE_PATH + "/" + img_name + ".jpg")

    # try to read the image URL
    try:
        with urllib.request.urlopen(img_url) as response:
            # check if URL contains an image
            info = response.info()
            if info.get_content_type().startswith("image"):
                blob.upload_from_string(response.read(), content_type=info.get_content_type())
                print("Uploaded image from: " + img_url)
            else:
                print("Could not upload image. No image data type in URL")
    except Exception:
        print('Could not upload image. Generic exception: ' + traceback.format_exc())
If you're looking for an updated way of doing this relying on the storages package, I wrote these two functions:
import requests
from storages.backends.gcloud import GoogleCloudStorage

def download_file(file_url, file_name):
    response = requests.get(file_url)
    if response.status_code == 200:
        upload_to_gc(response.content, file_name)

def upload_to_gc(content, file_name):
    gc_file_name = "{}/{}".format("some_container_name_here", file_name)
    with GoogleCloudStorage().open(name=gc_file_name, mode='w') as f:
        f.write(content)
Then simply call download_file() and pass the URL and preferred file name from anywhere within your system.
The GoogleCloudStorage class comes from the django-storages package.
pip install django-storages
Django Storages