I'm trying to use the Google Vision API, but I can't run my Python script without getting the following error:
google.auth.exceptions.DefaultCredentialsError: ('File /root/GoogleCloudStaff/apikey.json is not a valid json file.', ValueError('Invalid control character at: line 5 column 37 (char 172)',))
My python script:
import io
from google.cloud import vision

vision_client = vision.Client()

#file_name = "/var/www/FlaskApp/FlaskApp/static/"#'375px-Guido_van_Rossum_OSCON_2006_cropped.png'
file_name = '1200px-Guido_van_Rossum_OSCON_2006.jpg'
#file_name = "/var/www/FlaskApp/FlaskApp/static/cyou_pic_folders/cyou_folder_2017_11_16_10_26_18/pi_pic_lc_2017_11_16_10_26_1800049.png"

with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = vision_client.image(content=content)

labels = image.detect_labels()
for label in labels:
    print(label.description)
Thanks very much!
DefaultCredentialsError indicates that acquiring the default credentials failed. Have you done the initial setup properly?
Take a look at the vision setup documentation.
The error you are facing is likely caused by an issue with the service account key itself. \n is a control character that signifies a new line; if one appears inside a JSON string value in the key, it would cause exactly this parse error. To solve it, you can either validate the content of the JSON file or download the key from Google Cloud again. The key can be downloaded by following these instructions.
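For example, a quick way to validate the key file is to try parsing it with Python's json module (using the path from your error message):

import json

try:
    with open('/root/GoogleCloudStaff/apikey.json') as f:
        json.load(f)
    print('Key file parses as valid JSON.')
except ValueError as e:
    # Invalid control characters will surface here, as in your traceback.
    print('Key file is not valid JSON:', e)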
After acquiring the service account key, the GOOGLE_APPLICATION_CREDENTIALS environment variable has to be set; how to set it depends on the operating system used.
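Alternatively, you can set it for the current process from Python before any client is created (the path below is a placeholder for wherever you saved the key):

import os

# Placeholder path: point this at your downloaded service account key.
# It must be set before the Vision client is instantiated.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/apikey.json'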
As the final step, run the following Python code which performs a labeling task using the Cloud Vision API. The service account key will be automatically used to authenticate the labeling request.
import io
import os

# Imports the Google Cloud client library
from google.cloud import vision

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.abspath('path/to/file/sample.jpg')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()
image = vision.Image(content=content)

# Performs label detection on the image file
response = client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)
The code prints out the labels which the API returns. The Vision API can detect and extract information about entities in an image, across a broad group of categories.
You seem to be missing the authentication config. From Using the client library:
Using the client library
To run the client library, you must first set up authentication.
I am getting a memory error while creating a simple dataframe read from a CSV file on Azure Machine Learning, using a notebook VM as the compute instance. The VM is a DS13 (56 GB RAM, 8 vCPU, 112 GB storage) running Ubuntu (Linux, ubuntu 16.04). The CSV file is a 5 GB file.
from io import StringIO
import pandas as pd
from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name, account_key)
blobstring = blob_service.get_blob_to_text(container, filepath).content
dffinaldata = pd.read_csv(StringIO(blobstring), sep=',')
What am I doing wrong here?
You need to provide the right encoding when calling get_blob_to_text; please refer to the sample.
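For example (a minimal sketch; utf-8 is an assumption about your file):

# Pass the encoding explicitly when downloading the blob as text.
blobstring = blob_service.get_blob_to_text(
    container, filepath, encoding='utf-8').content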
The code below is what I normally use for reading data files in blob storage. Basically, you can use the blob's URL along with a SAS token in a request. However, you might want to edit the for loop depending on what type of data you have (e.g. csv, jpg, etc.).
-- Python code below --
import requests
from azure.storage.blob import BlockBlobService, BlobPermissions
from azure.storage.blob.baseblobservice import BaseBlobService
from datetime import datetime, timedelta

account_name = '<account_name>'
account_key = '<account_key>'
container_name = '<container_name>'

blob_service = BlockBlobService(account_name, account_key)
service = BaseBlobService(account_name=account_name, account_key=account_key)

generator = blob_service.list_blobs(container_name)
for blob in generator:
    # Build a short-lived, read-only SAS URL for each blob.
    url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob.name}"
    token = service.generate_blob_shared_access_signature(
        container_name, blob.name, permission=BlobPermissions.READ,
        expiry=datetime.utcnow() + timedelta(hours=1))
    url_with_sas = f"{url}?{token}"
    response = requests.get(url_with_sas)
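If the blobs are CSV files, for instance, you could then parse each response into a dataframe (a sketch, assuming UTF-8 CSV content):

import io
import pandas as pd

# Assumes the blob holds UTF-8 encoded CSV data.
df = pd.read_csv(io.StringIO(response.content.decode('utf-8')))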
Please follow the link below to read data on Azure Blob Storage.
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-data
The Google Cloud Vision API works fine on one pdf (pdf1) but returns absolutely nothing on the other (pdf2). I'm unable to make sense of this behavior, as both pdfs are very similar and use almost the same font. Please help.
I'm using the code given in their examples section, uploading these files to a Google Cloud bucket.
def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    import re
    from google.cloud import vision
    from google.cloud import storage
    from google.protobuf import json_format

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = 'application/pdf'
    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.types.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.types.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.types.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config,
        output_config=output_config)

    operation = client.async_batch_annotate_files(
        requests=[async_request])

    print('Waiting for the operation to finish.')
    operation.result(timeout=180)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r'gs://([^/]+)/(.+)', gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name=bucket_name)

    # List objects with the given prefix.
    blob_list = list(bucket.list_blobs(prefix=prefix))
    print('Output files:')
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_string()
    response = json_format.Parse(
        json_string, vision.types.AnnotateFileResponse())

    # The actual response for the first page of the input file.
    first_page_response = response.responses[0]
    annotation = first_page_response.full_text_annotation

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print(u'Full text:\n{}'.format(
        annotation.text))
It probably has nothing to do with the GCloud API; I tried uploading your pdf to the Vision drag-and-drop website and it returns the expected results. Maybe at some point in your pipeline the pdf gets corrupted in some way? What does it look like in gcloud storage?
We also faced this issue, and after a few experiments it seems to be caused by some fonts Google Vision is not able to support.
To solve this, convert the pdf to an image and then send the image for processing; that will produce results.
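For example, here is a rough sketch using the third-party pdf2image package (our choice; any pdf-to-image converter should work) to render each page and run document text detection on it:

import io

from pdf2image import convert_from_path  # assumed converter; requires poppler
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# 'input.pdf' is a placeholder path; render each page at 300 dpi and OCR it.
for page in convert_from_path('input.pdf', dpi=300):
    buf = io.BytesIO()
    page.save(buf, format='PNG')
    image = vision.Image(content=buf.getvalue())
    response = client.document_text_detection(image=image)
    print(response.full_text_annotation.text)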
I'm playing around with some scripts in Python and trying to find out whether this image returns results. However, when running, Python doesn't print anything. I don't get an error but can't seem to figure it out.
import io
import os

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

def run(annotations):
    # Instantiates a client
    client = vision.ImageAnnotatorClient()

    # The name of the image file to annotate
    file_name = os.path.join(
        os.path.dirname(__file__),
        'static/123456.jpg')

    # Loads the image into memory
    with io.open(file_name, 'rb') as image_file:
        content = image_file.read()
        image = types.Image(content=content)

    if annotations.pages_with_matching_images:
        print('\n{} Pages with matching images retrieved'.format(
            len(annotations.pages_with_matching_images)))

    matching = annotations.pages_with_matching_images
    print(matching)
I'm basing the work on these examples
https://cloud.google.com/vision/docs/quickstart-client-libraries#client-libraries-install-python
https://cloud.google.com/vision/docs/internet-detection
You are missing some key parts:
import io
import os

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

def run():  # remove the argument since you aren't using it
    # Instantiates a client
    client = vision.ImageAnnotatorClient()

    # The name of the image file to annotate
    file_name = os.path.join(
        os.path.dirname(__file__),
        'static/123456.jpg')

    # Loads the image into memory
    with io.open(file_name, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)  # dedent this
    web_detection = client.web_detection(image=image).web_detection

    """ annotations doesn't exist in your script as is...
    if annotations.pages_with_matching_images:
        print('\n{} Pages with matching images retrieved'.format(
            len(annotations.pages_with_matching_images)))
    """
    # replace above with this
    if web_detection.pages_with_matching_images:
        print('\n{} Pages with matching images retrieved'.format(
            len(web_detection.pages_with_matching_images)))

if __name__ == '__main__':
    run()
One of the key issues to watch out for when editing tutorial scripts is following your objects along. You can't use an object that is created in the tutorial in your own script if you don't first create that object yourself.
I am trying to run the quickstart demo for the Google Vision APIs on macOS Sierra.
def run_quickstart():
    # [START vision_quickstart]
    import io
    import os

    # Imports the Google Cloud client library
    from google.cloud import vision

    # Instantiates a client
    vision_client = vision.Client()

    # The name of the image file to annotate
    file_name = os.path.join(
        os.path.dirname(__file__),
        'resources/wakeupcat.jpg')

    # Loads the image into memory
    with io.open(file_name, 'rb') as image_file:
        content = image_file.read()

    image = vision_client.image(content=content)

    # Performs label detection on the image file
    labels = image.detect_labels()

    print('Labels:')
    for label in labels:
        print(label.description)
    # [END vision_quickstart]

if __name__ == '__main__':
    run_quickstart()
The script is shown above. I am using a Service Account key file to authenticate. As the documentation suggests, I have installed the google-cloud-vision dependencies via pip and set up an environment variable with:
export GOOGLE_APPLICATION_CREDENTIALS=/my_credentials.json
The environment variable is correctly set. Still, the script raises:
oauth2client.client.HttpAccessTokenRefreshError: invalid_grant: Invalid JWT Signature.
There are similar questions about this error when using API keys, but none of them mention using a Service Account file.
I want to store files and images that I get from an API in the blobstore (or rather, so that they are accessible from the blobstore API). Since the Files API is deprecated, how do I do this?
One way is to store images in Cloud Storage (GCS) and access them via the blobstore API. Basically you call gcs.open() and write the file. Then, when you need to use the blobstore API, you call blobkey = blobstore.create_gs_key(). With that you can do things such as use the images API with calls like images.get_serving_url(blobkey, secure_url=False).
How you do that depends on what your particular goals are. I am using it to serve images in a gallery that I upload to. To do that I have a file upload in an HTML form on the front end, which sends the file. On the backend I am doing this (these are just the broad strokes):
# inside the webapp2.RequestHandler get method:
import mimetypes

file_data = self.request.get("photoUpload", default_value=None)
filename = self.request.POST["photoUpload"].filename
folder = "someFolderName"
content_type = mimetypes.guess_type(filename)[0]
Then save the file data to GCS:
from google.appengine.api import app_identity
import cloudstorage as gcs

# gcs_filename must be unique so I'm using bucket/folder/file
# it would be smart to check uniqueness before proceeding
# (bucket may be None to fall back to the app's default GCS bucket)
gcs_filename = '/%s/%s/%s' % (
    bucket or app_identity.get_default_gcs_bucket_name(), folder, filename)

with gcs.open(gcs_filename, 'w',
              content_type=content_type or b'binary/octet-stream',
              options={b'x-goog-acl': b'public-read'}) as f:
    f.write(file_data)
Now I can access using the GCS api with calls like:
gcs.delete(gcs_filename)
Or use the Blobstore API by getting the previously mentioned blobkey (create_gs_key expects the GCS filename with a '/gs' prefix):

from google.appengine.ext import blobstore

blobkey = blobstore.create_gs_key('/gs' + gcs_filename)
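With that key you can, for example, build a serving URL for an image, as mentioned above (a sketch using the App Engine images API):

from google.appengine.api import images

# Serve the GCS-backed image through the images service.
url = images.get_serving_url(blobkey, secure_url=False)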