Getting vertices where the Google Vision API found words - Python

I'm working with the Google Vision API.
I would like to get the vertices ((x, y) locations) of the rectangles where Google Vision found a block of words. So far I'm getting the text from the Google client:
import io
from google.cloud import vision
from google.cloud.vision import types
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("/api-key.json")
client = vision.ImageAnnotatorClient(credentials=credentials)

# Open the image file
with io.open(path, 'rb') as image_file:
    content = image_file.read()

# Call the API
image = types.Image(content=content)
response = client.document_text_detection(image=image)
document = response.full_text_annotation
What I would like is to get the vertices for each block of words in document.text.

It seems Google has updated the documentation, although it is not easy to find. See the Google Vision API tutorial here.
The vertices can be found in response.text_annotations.
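As a minimal sketch (reusing the `response` object from the snippet above), each entry in `text_annotations` carries a `bounding_poly` whose `vertices` hold the pixel coordinates:

# Sketch: print the bounding-polygon vertices of each detected text
# annotation (assumes `response` from the snippet above)
for annotation in response.text_annotations:
    print(annotation.description)
    for vertex in annotation.bounding_poly.vertices:
        print('  ({}, {})'.format(vertex.x, vertex.y))

Note that the first entry of text_annotations covers all detected text; the following entries are the individual words.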

The Google documentation shows how the API response is structured (BLOCK, PARAGRAPH, ...) and how to retrieve the corresponding vertices. In particular, this function:
import io
from enum import Enum

from google.cloud import vision
from google.cloud.vision import types

class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5

def get_document_bounds(image_file, feature):
    """Returns document bounds given an image."""
    client = vision.ImageAnnotatorClient()

    bounds = []

    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)
    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)
                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)
                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)
            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)
            if feature == FeatureType.PAGE:
                # Page objects expose no bounding_box of their own,
                # so the sample falls back to the block's box here
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds
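A quick usage sketch (the file name is hypothetical); each returned bound is a BoundingPoly whose vertices carry the (x, y) coordinates:

# Example: collect all block-level boxes and print their vertices
bounds = get_document_bounds('page.jpg', FeatureType.BLOCK)
for bound in bounds:
    print([(v.x, v.y) for v in bound.vertices])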

Related

Google Vision API (Python): how to crop a specific object recognized by OBJECT_LOCALIZATION

I'm having difficulties working with normalized bounding polygon vertices:
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("resource/carro4.jpg", 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)
objects = client.object_localization(image=image).localized_object_annotations

print('Number of objects found: {}'.format(len(objects)))
for object_ in objects:
    if object_.name == 'License plate':
        print('\n{} (confidence: {})'.format(object_.name, object_.score))
        print('Normalized bounding polygon vertices: ')
        for vertex in object_.bounding_poly.normalized_vertices:
            print(' - ({}, {})'.format(vertex.x, vertex.y))
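Since normalized vertices are fractions of the image dimensions, one way to crop the detected object is to scale them back to pixels first. A sketch assuming Pillow and the `objects` list from the snippet above (the output file name is hypothetical):

# Sketch: scale normalized vertices to pixels and crop with Pillow
from PIL import Image

img = Image.open("resource/carro4.jpg")
width, height = img.size

for object_ in objects:
    if object_.name == 'License plate':
        xs = [v.x * width for v in object_.bounding_poly.normalized_vertices]
        ys = [v.y * height for v in object_.bounding_poly.normalized_vertices]
        cropped = img.crop((min(xs), min(ys), max(xs), max(ys)))
        cropped.save("license_plate.jpg")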

Have an image link from the IMDb API, how can I change its size? (Python)

I am trying to work with the IMDb API. My code thus far is:
import http.client
import json
import requests

conn = http.client.HTTPSConnection("imdb-api.com", 443)
payload = ''
headers = {'User-agent': 'Chrome/95.0'}
conn.request("GET", "https://imdb-api.com/en/API/MostPopularMovies/<API_Key>", headers=headers)
res = conn.getresponse()
data = res.read()
convertedDict = json.loads(data.decode("utf-8"))

imagepath = r'venv/files/image.jpeg'
req = requests.get(convertedDict['items'][0]['image'], headers=headers)
with open(imagepath, 'wb') as file:
    file.write(req.content)
This allows me to download the image of the first popular movie; however, the image is really small. This is the link I am downloading. I know that if I get rid of everything after the # the image becomes a lot larger. Is there a way to edit the link in code so that I can drop everything after the # and even edit the numbers after UX?
Everything I try to do with string or URL operations gives me an error.
https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg
Thank you in advance
Explanation
(code example below)
Here's how to get a bigger image of the size you want. Given this URL,
https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg
There's a substring of it:
UX128_CR0,3,128,176
This has three important parts:
- The first 128 resizes the image by width, keeping the aspect ratio.
- The second 128 controls the width of the container the image appears in.
- The 176 controls the height of the container the image appears in.
So, we can view the structure like this:
UX<image_width>_CR0,3,<container_width>,<container_height>
As an example, to double the image size:
UX256_CR0,3,256,352_AL_.jpg
(Click here to see: https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX256_CR0,3,256,352_AL_.jpg)
Update: Example of how you might do it in Python.
import re

resize_factor = 2  # Image size multiplier
url = "https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg"

#
# resize_factor : Image size multiplier (e.g., resize_factor = 2 doubles the image size, positive integer only)
# url           : full URL of the image
# return        : string of the new URL
#
def getURL(resize_factor, url):
    # Regex for pattern matching the relevant parts of the URL
    p = re.compile(r".*UX([0-9]*)_CR0,([0-9]*),([0-9]*),([0-9]*).*")
    match = p.search(url)
    if match:
        # Get the image dimensions from the URL
        img_width = str(int(match.group(1)) * resize_factor)
        container_width = str(int(match.group(3)) * resize_factor)
        container_height = str(int(match.group(4)) * resize_factor)
        # Change the image width
        result = re.sub(r"(.*UX)([0-9]*)(.*)", r"\g<1>" + img_width + r"\g<3>", url)
        # Change the container width
        result = re.sub(r"(.*UX[0-9]*_CR0,[0-9]*,)([0-9]*)(.*)", r"\g<1>" + container_width + r"\g<3>", result)
        # Change the container height
        result = re.sub(r"(.*UX[0-9]*_CR0,[0-9]*,[0-9]*,)([0-9]*)(.*)", r"\g<1>" + container_height + r"\g<3>", result)
        return result

#
# Test
#
print(getURL(resize_factor, url))
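To tie this back to the question's download code, one might then fetch the enlarged image like this (a sketch reusing `headers` and `convertedDict` from the snippet above; the output path is hypothetical):

# Sketch: download the enlarged poster image
big_url = getURL(2, convertedDict['items'][0]['image'])
req = requests.get(big_url, headers=headers)
with open('venv/files/image_large.jpeg', 'wb') as file:
    file.write(req.content)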

How to detect emotions with Azure API?

I'd like to create a simple Python app recognizing face emotions from given URL via Azure Face/Emotions API.
I'm following this documentation:
https://learn.microsoft.com/en-us/azure/cognitive-services/face/quickstarts/python-sdk#authenticate-the-client
https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-vision-face/azure.cognitiveservices.vision.face.models.emotion?view=azure-python
So far, I have done the face recognition part, but I'm stuck on how to call the Emotion model and display the results.
import urllib.request
from azure.cognitiveservices.vision.face import FaceClient
from azure.cognitiveservices.vision.face.models import Emotion
from msrest.authentication import CognitiveServicesCredentials

# Image
URL = "https://upload.wikimedia.org/wikipedia/commons/5/55/Dalailama1_20121014_4639.jpg"

# API
KEY = "xxx"
ENDPOINT = "https://happyai.cognitiveservices.azure.com/"

# Now there is a trained endpoint that can be used to make a prediction
predictor = FaceClient(ENDPOINT, CognitiveServicesCredentials(KEY))
detected_faces = predictor.face.detect_with_url(url=URL)
if not detected_faces:
    raise Exception('No face detected from image {}'.format(URL))

# Display the detected face ID in the first single-face image.
# Face IDs are used for comparison to faces (their IDs) detected in other images.
print('Detected face ID from', URL, ':')
for face in detected_faces:
    print(face.face_id)
print()

# Save this ID for use in Find Similar
first_image_face_ID = detected_faces[0].face_id

# Call Emotion model
# Display the results.
Any help would be greatly appreciated.
Thanks!
You can use the following code to do the emotion detection (this snippet assumes a class that posts frames to the Face REST endpoint):
import requests

def det_emotion(self, frame, count):
    image_path = self.path_folder + "/img/frame%d.jpg" % count
    image_data = open(image_path, "rb")
    params = {
        'returnFaceId': 'true',
        'returnFaceLandmarks': 'false',
        'returnRecognitionModel': 'false',
        # Ask the API to include the emotion attribute in the response
        'returnFaceAttributes': 'emotion',
    }
    response = requests.post(self.face_api_url, params=params, data=image_data)
    response.raise_for_status()
    faces = response.json()
    frame = self.add_square(frame, faces)
    return frame
In order to get the emotion in the response, you need to explicitly request the 'emotion' attribute via the "return_face_attributes" parameter of the 'detect_with_url' function. Please refer to the following code:
face_attributes = ['emotion']
detected_faces = predictor.face.detect_with_url(url=URL, return_face_attributes=face_attributes)
Then when you loop through all the detected faces, you can reach the Emotion object of each face by calling:
for face in detected_faces:
    emotionObject = face.face_attributes.emotion
The emotionObject contains 8 different emotions: 'anger', 'contempt', 'disgust', 'fear', 'happiness', 'neutral', 'sadness', 'surprise'.
Since the emotionObject is not iterable and I don't know how to get just the emotion with the highest confidence directly, I wrote a helper function to convert it into a dictionary and called it inside the face iteration loop:
def get_emotion(emoObject):
    emoDict = dict()
    emoDict['anger'] = emoObject.anger
    emoDict['contempt'] = emoObject.contempt
    emoDict['disgust'] = emoObject.disgust
    emoDict['fear'] = emoObject.fear
    emoDict['happiness'] = emoObject.happiness
    emoDict['neutral'] = emoObject.neutral
    emoDict['sadness'] = emoObject.sadness
    emoDict['surprise'] = emoObject.surprise
    # Pick the emotion with the highest confidence
    emo_name = max(emoDict, key=emoDict.get)
    emo_level = emoDict[emo_name]
    return emo_name, emo_level

for face in detected_faces:
    emotion, confidence = get_emotion(face.face_attributes.emotion)
    print("{} emotion with confidence level {}".format(emotion, confidence))

How to get a batch response from google vision text detection API?

I'm currently using Google Vision's text_detection API for single images, but I want to get batch responses. I attempted using BatchAnnotateImagesRequest, but I haven't gotten it working yet.
This is what I'm doing to get a response for one image:
client = vision.ImageAnnotatorClient()

with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
response = client.document_text_detection(image=image)
texts = response.text_annotations
There's information regarding batch requests to Google's text detection API in the public documentation. There you can find samples written in Python that you could use to make batch requests, with a limit of 2,000 files per batch:
from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import six

def sample_async_batch_annotate_images(input_image_uri, output_uri):
    """Perform async batch image annotation"""
    client = vision_v1.ImageAnnotatorClient()

    # input_image_uri = 'gs://cloud-samples-data/vision/label/wakeupcat.jpg'
    # output_uri = 'gs://your-bucket/prefix/'

    if isinstance(input_image_uri, six.binary_type):
        input_image_uri = input_image_uri.decode('utf-8')
    if isinstance(output_uri, six.binary_type):
        output_uri = output_uri.decode('utf-8')

    source = {'image_uri': input_image_uri}
    image = {'source': source}
    type_ = enums.Feature.Type.LABEL_DETECTION
    features_element = {'type': type_}
    type_2 = enums.Feature.Type.IMAGE_PROPERTIES
    features_element_2 = {'type': type_2}
    features = [features_element, features_element_2]
    requests_element = {'image': image, 'features': features}
    requests = [requests_element]
    gcs_destination = {'uri': output_uri}

    # The max number of responses to output in each JSON file
    batch_size = 2
    output_config = {'gcs_destination': gcs_destination, 'batch_size': batch_size}

    operation = client.async_batch_annotate_images(requests, output_config)

    print('Waiting for operation to complete...')
    response = operation.result()

    # The output is written to GCS with the provided output_uri as prefix
    gcs_output_uri = response.output_config.gcs_destination.uri
    print('Output written to GCS with prefix: {}'.format(gcs_output_uri))
Alongside the sample code you can also find a sample of the output you can expect when executing the batched request. More information regarding batch requests can be found here.
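Since the question is about text detection rather than labels, swapping the requested feature type in the sample should adapt it; a minimal sketch under that assumption:

# Sketch: request document text detection instead of label detection
type_ = enums.Feature.Type.DOCUMENT_TEXT_DETECTION
features = [{'type': type_}]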

Is there a cleaner way to rotate smartphone images uploaded via flask before pushing to S3?

I'm building a webapp that takes uploaded images, stores them on Amazon S3 and then stores the URL in a SQLite database. Unfortunately, EXIF tags cause images taken on a smartphone to appear rotated (since they are stored as landscape images with EXIF orientation tags).
Currently, my environment grabs the file from the POST data, saves it to my static files folder, rotates image (if needed) with PIL, pushes to S3 and finally deletes the local copy. Here is a little of the code involved:
import os

from PIL import Image
import boto
from boto.s3.connection import S3Connection
from boto.s3.key import Key

def fix_orientation(filename):
    img = Image.open(filename)
    if hasattr(img, '_getexif'):
        exifdata = img._getexif()
        try:
            orientation = exifdata.get(274)
        except:
            # There was no EXIF Orientation Data
            orientation = 1
    else:
        orientation = 1

    if orientation == 1:    # Horizontal (normal)
        pass
    elif orientation == 2:  # Mirrored horizontal
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    elif orientation == 3:  # Rotated 180
        img = img.rotate(180)
    elif orientation == 4:  # Mirrored vertical
        img = img.rotate(180).transpose(Image.FLIP_LEFT_RIGHT)
    elif orientation == 5:  # Mirrored horizontal then rotated 90 CCW
        img = img.rotate(-90).transpose(Image.FLIP_LEFT_RIGHT)
    elif orientation == 6:  # Rotated 90 CCW
        img = img.rotate(-90)
    elif orientation == 7:  # Mirrored horizontal then rotated 90 CW
        img = img.rotate(90).transpose(Image.FLIP_LEFT_RIGHT)
    elif orientation == 8:  # Rotated 90 CW
        img = img.rotate(90)

    # Save the result and overwrite the originally uploaded image
    img.save(filename)

def push_to_s3(**kwargs):
    try:
        conn = S3Connection(app.config["S3_KEY"], app.config["S3_SECRET"])
        buckets = [bucket.name for bucket in conn.get_all_buckets()]
        bucket = conn.get_bucket(app.config["S3_BUCKET"])
        k = Key(bucket)
        k.key = app.config["S3_UPLOAD_DIR"] + kwargs.get("filename")
        k.set_contents_from_filename(kwargs.get("photo"))
        k.make_public()
        return k
    except Exception as e:
        abort(500)
Here is handling the POST data
# Retrieving Form POST Data
fi = request.files.get("file")
#print "Storing and Rotating File (if needed)"
f = photos.save(fi)
path = photos.path(f)
fix_orientation(path)
#print "Uploading to S3"
img = push_to_s3(photo=path, filename=filename)
#print "Deleting Local Version"
os.remove(path)
The above solution works on Heroku's servers, but it seems like a very duct-taped-together solution. Is there a cleaner way to do what I'm doing? That is, take an uploaded file, rotate it in memory, and then push it to S3?
I'm also using Flask-Uploads to handle storage of the uploaded images.
For what it is worth, Pillow supports a number of inputs other than a file name - including bytearray, buffer, and file-like object. The third is most probably what you are looking for, as anything loaded out of request.files is just a FileStorage file-like object. That simplifies the load-and-transform code to:
from io import BytesIO

def fix_orientation(file_like_object):
    img = Image.open(file_like_object)
    # ... snip ...
    data = BytesIO()
    # A format must be given explicitly when saving to a buffer
    img.save(data, format='JPEG')
    return data
Since we are going to be passing around data without using the filesystem very much, we can also switch to using boto.s3.key.Key's set_contents_from_file method instead of set_contents_from_filename:
def push_to_s3(photo, filename):
    # ... snip ...
    k.set_contents_from_file(photo, rewind=True)
    # ... etc. ...
That simplifies the resulting implementation to:
# Retrieving Form POST Data
fi = request.files.get("file")
# print "Rotating File (if needed)"
fi = fix_orientation(fi)
# print "Uploading to S3"
push_to_s3(photo=fi, filename=filename)
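As a side note, newer Pillow releases (6.0+) ship ImageOps.exif_transpose, which applies the EXIF orientation in a single call; a minimal sketch under that assumption, replacing the manual orientation table (the JPEG format choice is an assumption):

# Sketch (assumes Pillow >= 6.0): let Pillow apply the EXIF orientation
from io import BytesIO
from PIL import Image, ImageOps

def fix_orientation(file_like_object):
    img = Image.open(file_like_object)
    img = ImageOps.exif_transpose(img)  # applies and removes the Orientation tag
    data = BytesIO()
    img.save(data, format='JPEG')  # format assumed; pick to match the upload
    data.seek(0)
    return data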
