I'm currently using Google Vision's text_detection API for single images, but I want to get batch responses. I attempted to use BatchAnnotateImagesRequest, but I haven't gotten it working yet.
Here is what I'm doing to get a response for one image:
import io
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Read the image file into memory
with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
response = client.document_text_detection(image=image)
texts = response.text_annotations
There's information regarding batch requests to Google's text detection API in the public documentation.
In the documentation you can find Python samples you can use to make batch requests, with a limit of 2,000 files per batch:
from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import six

def sample_async_batch_annotate_images(input_image_uri, output_uri):
    """Perform async batch image annotation"""
    client = vision_v1.ImageAnnotatorClient()

    # input_image_uri = 'gs://cloud-samples-data/vision/label/wakeupcat.jpg'
    # output_uri = 'gs://your-bucket/prefix/'

    if isinstance(input_image_uri, six.binary_type):
        input_image_uri = input_image_uri.decode('utf-8')
    if isinstance(output_uri, six.binary_type):
        output_uri = output_uri.decode('utf-8')
    source = {'image_uri': input_image_uri}
    image = {'source': source}
    type_ = enums.Feature.Type.LABEL_DETECTION
    features_element = {'type': type_}
    type_2 = enums.Feature.Type.IMAGE_PROPERTIES
    features_element_2 = {'type': type_2}
    features = [features_element, features_element_2]
    requests_element = {'image': image, 'features': features}
    requests = [requests_element]
    gcs_destination = {'uri': output_uri}

    # The max number of responses to output in each JSON file
    batch_size = 2
    output_config = {'gcs_destination': gcs_destination, 'batch_size': batch_size}

    operation = client.async_batch_annotate_images(requests, output_config)

    print('Waiting for operation to complete...')
    response = operation.result()

    # The output is written to GCS with the provided output_uri as prefix
    gcs_output_uri = response.output_config.gcs_destination.uri
    print('Output written to GCS with prefix: {}'.format(gcs_output_uri))
Along with the sample code, you can also find a sample of the output you can expect when executing the batched request. More information regarding batch requests can be found here.
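As an illustration, once the operation completes you could read the JSON results back from the destination bucket with a sketch like the one below; the bucket and prefix names are placeholders, not part of the documented sample:

# A minimal sketch of reading the batch output back from GCS.
# 'your-bucket' and 'prefix' stand in for the output_uri used above.
import json
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('your-bucket')
for blob in bucket.list_blobs(prefix='prefix'):
    batch = json.loads(blob.download_as_string())
    for response in batch.get('responses', []):
        print(response.get('labelAnnotations', []))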
I am trying to work with the IMDb API. My code thus far is:
import http.client
import json
import requests

conn = http.client.HTTPSConnection("imdb-api.com", 443)
payload = ''
headers = {'User-agent': 'Chrome/95.0'}
conn.request("GET", "https://imdb-api.com/en/API/MostPopularMovies/<API_Key>", headers=headers)
res = conn.getresponse()
data = res.read()
convertedDict = json.loads(data.decode("utf-8"))

imagepath = r'venv/files/image.jpeg'
req = requests.get(convertedDict['items'][0]['image'], headers=headers)
with open(imagepath, 'wb') as file:
    file.write(req.content)
This allows me to download the image of the first popular movie; however, the image is really small. This is the link that I am downloading. I know that if I get rid of everything after # the image will become a lot larger. Is there a way to edit the link with code so that I can drop everything after #, or even edit the numbers after UX?
Everything I try to do with string or URL operations gives me an error.
https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg
Thank you in advance
Explanation
(code example below)
Here's how to get a bigger image of the size you want. Given this URL,
https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg
There's a substring of it:
UX128_CR0,3,128,176
This has three important parts:
The first 128 resizes the image by width, keeping ratio
The second 128 controls the container width that the image appears in
176 controls the container height that the image appears in.
So, we can view the structure like this:
UX<image_width>_CR0,3,<container_width>,<container_height>
As an example, to double the image size:
UX256_CR0,3,256,352_AL_.jpg
(Click here to see it: https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX256_CR0,3,256,352_AL_.jpg)
Here's an example of how you might do it in Python:
import re

resize_factor = 2  # Image size multiplier
url = "https://m.media-amazon.com/images/M/MV5BZWMyYzFjYTYtNTRjYi00OGExLWE2YzgtOGRmYjAxZTU3NzBiXkEyXkFqcGdeQXVyMzQ0MzA0NTM#._V1_UX128_CR0,3,128,176_AL_.jpg"

#
# resize_factor : Image size multiplier (e.g., resize_factor = 2 doubles the image size; positive integer only)
# url           : full URL of the image
# return        : string of the new URL
#
def getURL(resize_factor, url):
    # Regex for pattern matching the relevant parts of the URL
    p = re.compile(r".*UX([0-9]*)_CR0,([0-9]*),([0-9]*),([0-9]*).*")
    match = p.search(url)

    if match:
        # Scale the image and container dimensions found in the URL
        img_width = str(int(match.group(1)) * resize_factor)
        container_width = str(int(match.group(3)) * resize_factor)
        container_height = str(int(match.group(4)) * resize_factor)

        # Substitute the scaled dimensions back into the URL
        result = re.sub(r"(.*UX)([0-9]*)(.*)", r"\g<1>" + img_width + r"\g<3>", url)
        result = re.sub(r"(.*UX[0-9]*_CR0,[0-9]*,)([0-9]*)(.*)", r"\g<1>" + container_width + r"\g<3>", result)
        result = re.sub(r"(.*UX[0-9]*_CR0,[0-9]*,[0-9]*,)([0-9]*)(.*)", r"\g<1>" + container_height + r"\g<3>", result)
        return result

#
# Test
#
print(getURL(resize_factor, url))
I wanted to use the Cloud Vision API to detect labels from ca. 40K photographs and download the results as CSV files. I uploaded the photos to cloud storage and used the following code, but an error occurred. I asked a person who uses Python in his job, but he cannot deal with this error. Can you help me fix it?
TypeError: Invalid constructor input for BatchAnnotateImagesRequest: [{'image': source {
image_uri: "gs://bucket/image-path.jpg"
}
, 'features': [{'type': <Type.LABEL_DETECTION: 4>}]}]
The code I used:
from google.cloud import vision
from google.cloud import storage
from google.cloud.vision_v1 import ImageAnnotatorClient
from google.cloud.vision_v1 import types
import os
import json
import numpy as np

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'C://file-path.json'
#(created in step 1)

# Get GCS bucket
storage_client = storage.Client()
bucket = storage_client.bucket('bucket_name')
image_paths = []
for blob in list(bucket.list_blobs()):
    image_paths.append("gs://bucket_name/" + blob.name)

# We can send a maximum of 16 images per request.
start = 0
end = 16
label_output = []

for i in range(int(np.floor(len(image_paths) / 16)) + 1):
    requests = []
    client = vision.ImageAnnotatorClient()
    for image_path in image_paths[start:end]:
        image = types.Image()
        image.source.image_uri = image_path
        requests.append({'image': image, 'features': [{'type': vision.Feature.Type.LABEL_DETECTION}]})
    response = client.batch_annotate_images(requests)
    for image_path, res in zip(image_paths[start:end], response.responses):
        labels = [{label.description: label.score} for label in res.label_annotations]
        labels = {k: v for d in labels for k, v in d.items()}
        filename = os.path.basename(image_path)
        l = {'filename': filename, 'labels': labels}
        label_output.append(l)
    start = start + 16
    end = end + 16

# Export results to a CSV-style output
for l in label_output:
    print('"' + l['filename'] + '";', end='')
    for label, score in l['labels'].items():
        print('"' + label + '";"' + str(score) + '";', end='')
    print("")
batch_annotate_images() is not getting the contents of requests properly. To fix this, just assign your variable requests explicitly to the parameter requests of batch_annotate_images():
response = client.batch_annotate_images(requests=requests)
See batch_annotate_images() for reference. Also, if you are planning to update your Vision API client to 2.3.1, you might encounter errors on features; see this reference for the updated usage of its parameters.
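For illustration, here is a minimal sketch of how a single label-detection request could be built with the typed classes of the 2.x client; the GCS path is a placeholder and this assumes the google-cloud-vision 2.x API:

# A minimal sketch using the typed request classes of google-cloud-vision 2.x.
# The GCS path below is a placeholder.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image(source=vision.ImageSource(image_uri='gs://bucket_name/image.jpg'))
feature = vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION)
request = vision.AnnotateImageRequest(image=image, features=[feature])

# Note the explicit keyword argument, as described above
response = client.batch_annotate_images(requests=[request])
for res in response.responses:
    for label in res.label_annotations:
        print(label.description, label.score)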
I've successfully trained my own dataset using the Keras yolov3 GitHub project (link) and I've got good predictions:
I would like to deploy this model on the web using Flask to make it work with a stream or with IP cameras.
I saw many tutorials explaining how to do that, but in reality I did not find what I am looking for.
How can I get started?
You can use flask-restful to design a simple REST API.
You can use OpenCV's VideoCapture to grab the video stream and get frames:
import cv2

# Open a sample video available in sample-videos, or a camera stream URL
vcap = cv2.VideoCapture('URL')
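A minimal sketch of pulling frames from that capture, assuming the stream opened above:

# Read frames until the stream ends or fails
while vcap.isOpened():
    ret, frame = vcap.read()
    if not ret:
        break
    # `frame` is a BGR numpy array; encode it and send it to the server
vcap.release()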
The client will take an image/frame, encode it using base64, add other details like height and width, and make a request:
import numpy as np
import base64
import zlib
import requests
import time

t1 = time.time()
for _ in range(1000):  # 1000 continuous requests
    frame = np.random.randint(0, 256, (416, 416, 3), dtype=np.uint8)  # dummy rgb image
    # replace frame with your image

    # compress
    data = frame  # zlib.compress(frame)
    data = base64.b64encode(data)
    data_send = data
    #data2 = base64.b64decode(data)
    #data2 = zlib.decompress(data2)
    #fdata = np.frombuffer(data2, dtype=np.uint8)

    # make a post request
    r = requests.post("http://127.0.0.1:5000/predict", json={'imgb64': data_send.decode(), 'w': 416, 'h': 416})
    # print the response here
t2 = time.time()
print(t2 - t1)
Your server will load the darknet model, and when it receives a POST request it will simply return the model output:
from flask import Flask, request
from flask_restful import Resource, Api, reqparse
import json
import numpy as np
import base64
# compression
import zlib

# load keras model
# model = load_model('model.h5')

app = Flask(__name__)
api = Api(app)

parser = reqparse.RequestParser()
parser.add_argument('imgb64', location='json', help='type error')
parser.add_argument('w', type=int, location='json', help='type error')
parser.add_argument('h', type=int, location='json', help='type error')

class Predict(Resource):
    def post(self):
        request.get_json(force=True)
        data = parser.parse_args()
        if data['imgb64'] == "":
            return {
                'data': '',
                'message': 'No file found',
                'status': 'error'
            }
        img = data['imgb64']
        w = data['w']
        h = data['h']

        data2 = img.encode()
        data2 = base64.b64decode(data2)
        #data2 = zlib.decompress(data2)
        # reshape as (height, width, channels)
        fdata = np.frombuffer(data2, dtype=np.uint8).reshape(h, w, -1)

        # do model inference here

        if img:
            return json.dumps({
                'mean': float(np.mean(fdata)),  # cast numpy types so json.dumps works
                'channel': int(fdata.shape[-1]),
                'message': 'darknet processed',
                'status': 'success'
            })
        return {
            'data': '',
            'message': 'Something went wrong',
            'status': 'error'
        }

api.add_resource(Predict, '/predict')

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000, threaded=True)
In the # do model inference here part, just use your detect/predict function.
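For example, here is a minimal sketch of what that hook could look like with a Keras model; the model file name, input size, and normalization are assumptions, not part of the server above:

# Hypothetical inference hook for the endpoint above; adapt it to your model.
from tensorflow.keras.models import load_model
import numpy as np

model = load_model('model.h5')  # load once at startup, not per request

def run_inference(fdata):
    # fdata: (h, w, 3) uint8 array decoded from the request body
    x = fdata.astype(np.float32) / 255.0  # assumed [0, 1] normalization
    x = np.expand_dims(x, axis=0)         # add a batch dimension
    return model.predict(x)               # raw model outputs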
If you want to use native darknet, see https://github.com/zabir-nabil/tf-model-server4-yolov3
If you want to use gRPC instead of REST, see https://github.com/zabir-nabil/simple-gRPC
I recently started using Google's Vision API. I am trying to annotate a batch of images, so I followed the 'batch image annotation offline' guide from their documentation.
However, it is not clear to me how I can annotate MULTIPLE images in one API call. Let's say I have stored 10 images in my Google Cloud bucket. How can I annotate all of these images at once and store the results in one JSON file? Right now, I wrote a program that calls their example function and it works, but to put it simply, why can't I say: 'Look in this folder and annotate all images in it.'?
Thanks in advance.
import os
from batch_image_labeling import sample_async_batch_annotate_images

counter = 0
for file in os.listdir('my_directory'):
    filename = file
    sample_async_batch_annotate_images('gs://my_bucket/{}'.format(filename), 'gs://my_bucket/{}'.format(counter))
    counter += 1
from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import six

def sample_async_batch_annotate_images(input_image_uri, output_uri):
    """Perform async batch image annotation"""
    client = vision_v1.ImageAnnotatorClient()

    if isinstance(input_image_uri, six.binary_type):
        input_image_uri = input_image_uri.decode('utf-8')
    if isinstance(output_uri, six.binary_type):
        output_uri = output_uri.decode('utf-8')
    source = {'image_uri': input_image_uri}
    image = {'source': source}
    type_ = enums.Feature.Type.LABEL_DETECTION
    features_element = {'type': type_}
    type_2 = enums.Feature.Type.IMAGE_PROPERTIES
    features_element_2 = {'type': type_2}
    features = [features_element, features_element_2]
    requests_element = {'image': image, 'features': features}
    requests = [requests_element]
    gcs_destination = {'uri': output_uri}

    # The max number of responses to output in each JSON file
    batch_size = 2
    output_config = {'gcs_destination': gcs_destination, 'batch_size': batch_size}

    operation = client.async_batch_annotate_images(requests, output_config)

    print('Waiting for operation to complete...')
    response = operation.result()

    # The output is written to GCS with the provided output_uri as prefix
    gcs_output_uri = response.output_config.gcs_destination.uri
    print('Output written to GCS with prefix: {}'.format(gcs_output_uri))
It's somewhat unclear from that example, but your call to async_batch_annotate_images takes a requests parameter, which is a list of multiple requests. So you can do something like this:
from google.cloud import vision_v1
from google.cloud.vision_v1 import enums
import os
import six

def generate_request(input_image_uri):
    if isinstance(input_image_uri, six.binary_type):
        input_image_uri = input_image_uri.decode('utf-8')
    source = {'image_uri': input_image_uri}
    image = {'source': source}
    type_ = enums.Feature.Type.LABEL_DETECTION
    features_element = {'type': type_}
    type_2 = enums.Feature.Type.IMAGE_PROPERTIES
    features_element_2 = {'type': type_2}
    features = [features_element, features_element_2]
    requests_element = {'image': image, 'features': features}
    return requests_element

def sample_async_batch_annotate_images(input_uri, output_uri):
    """Perform async batch image annotation"""
    client = vision_v1.ImageAnnotatorClient()

    if isinstance(output_uri, six.binary_type):
        output_uri = output_uri.decode('utf-8')

    requests = [
        generate_request(input_uri.format(filename))
        for filename in os.listdir('my_directory')
    ]

    gcs_destination = {'uri': output_uri}

    # The max number of responses to output in each JSON file
    batch_size = 1
    output_config = {'gcs_destination': gcs_destination, 'batch_size': batch_size}

    operation = client.async_batch_annotate_images(requests, output_config)

    print('Waiting for operation to complete...')
    response = operation.result()

    # The output is written to GCS with the provided output_uri as prefix
    gcs_output_uri = response.output_config.gcs_destination.uri
    print('Output written to GCS with prefix: {}'.format(gcs_output_uri))
sample_async_batch_annotate_images('gs://my_bucket/{}', 'gs://my_bucket/results')
This can annotate up to 2,000 images in a single request. The only downside is that you can only specify a single output_uri as a destination, so you won't be able to use counter to put each result in a separate file, but you can set batch_size = 1 to ensure each response is written to a separate file if that is what you want.
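If you have more than 2,000 images, one option is to split them into chunks and start one operation per chunk. Here is a hedged sketch reusing the generate_request helper above; the directory, bucket, and prefix names are placeholders:

# Chunk the file list and give each chunk its own output prefix
import os
from google.cloud import vision_v1

client = vision_v1.ImageAnnotatorClient()
filenames = os.listdir('my_directory')
chunk_size = 2000  # maximum images per async batch request

for i in range(0, len(filenames), chunk_size):
    chunk = filenames[i:i + chunk_size]
    requests = [generate_request('gs://my_bucket/{}'.format(f)) for f in chunk]
    output_config = {
        'gcs_destination': {'uri': 'gs://my_bucket/results_{}/'.format(i // chunk_size)},
        'batch_size': 1,
    }
    operation = client.async_batch_annotate_images(requests, output_config)
    operation.result()  # wait for this chunk to finish before starting the next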
I'm working with the Google Vision API.
I would like to get the vertices ((x, y) locations) of the rectangles where Google Vision found a block of words. So far, I'm getting the text from the Google client:
import io
from google.cloud import vision
from google.cloud.vision import types
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("/api-key.json")
client = vision.ImageAnnotatorClient(credentials=credentials)

# open file
with io.open(path, 'rb') as image_file:
    content = image_file.read()

# call api
image = types.Image(content=content)
response = client.document_text_detection(image=image)
document = response.full_text_annotation
What I would like is to get the vertices for each block of words in document.text.
It seems like Google has updated the documentation, although it is not so easy to find.
See the tutorial on the Google Vision API here.
The vertices can be found in response.text_annotations
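For example, a minimal sketch, assuming response comes from a document_text_detection call like the one in the question:

# Each entry in text_annotations carries a bounding_poly with (x, y) vertices
for annotation in response.text_annotations:
    vertices = [(v.x, v.y) for v in annotation.bounding_poly.vertices]
    print(annotation.description, vertices)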
From the Google documentation, you can find how the API response is structured (BLOCK, PARAGRAPH, ...) and how to retrieve the corresponding vertices.
See in particular this function:
import io
from enum import Enum
from google.cloud import vision
from google.cloud.vision import types

class FeatureType(Enum):
    # FeatureType enum referenced below, as defined in the same tutorial
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5

def get_document_bounds(image_file, feature):
    """Returns document bounds given an image."""
    client = vision.ImageAnnotatorClient()

    bounds = []

    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)
    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)
                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)
                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)
            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)
            if feature == FeatureType.PAGE:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds
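For example, to print the block-level boxes (assuming a local image path; 'document.jpg' is a placeholder):

# Hypothetical usage: print the vertices of every detected block
bounds = get_document_bounds('document.jpg', FeatureType.BLOCK)
for bound in bounds:
    print([(v.x, v.y) for v in bound.vertices])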