Analysing URLs using Google Cloud Vision - Python

Is there any way I can analyse URLs using Google Cloud Vision? I know how to analyse images that I store locally, but I can't seem to analyse JPEGs that exist on the internet:
import argparse
import base64
import httplib2
from googleapiclient.discovery import build
import collections
import time
import datetime
import pyodbc
time_start = datetime.datetime.now()
def main(photo_file):
    '''Run a logo detection request on a single image'''
    API_DISCOVERY_FILE = 'https://vision.googleapis.com/$discovery/rest?version=v1'
    http = httplib2.Http()
    service = build('vision', 'v1', http, discoveryServiceUrl=API_DISCOVERY_FILE, developerKey='INSERT API KEY HERE')
    # Read the local file and base64-encode it for the JSON request body
    with open(photo_file, 'rb') as image:
        image_content = base64.b64encode(image.read())
        service_request = service.images().annotate(
            body={
                'requests': [{
                    'image': {
                        'content': image_content
                    },
                    'features': [{
                        'type': 'LOGO_DETECTION',
                        'maxResults': 10,
                    }]
                }]
            })
    response = service_request.execute()
    try:
        logo_description = response['responses'][0]['logoAnnotations'][0]['description']
        logo_description_score = response['responses'][0]['logoAnnotations'][0]['score']
        print logo_description
        print logo_description_score
    except KeyError:
        # No 'logoAnnotations' key means no logo was detected
        print "logo nonexistent"
    print time_start
if __name__ == '__main__':
    main(r"C:\Users\KVadher\Desktop\image_file1.jpg")
Is there any way I can analyse a URL and find out whether the image behind it contains any logos?

I figured out how to do it. I rewrote my code to open the image with urllib, then base64-encoded it and passed it to the Google Cloud Vision logo detection API:
import argparse
import base64
import httplib2
from googleapiclient.discovery import build
import collections
import time
import datetime
import pyodbc
import urllib
import urllib2
time_start = datetime.datetime.now()
#API AND DEVELOPER KEY DETAILS
API_DISCOVERY_FILE = 'https://vision.googleapis.com/$discovery/rest?version=v1'
http = httplib2.Http()
service = build('vision', 'v1', http, discoveryServiceUrl=API_DISCOVERY_FILE, developerKey='INSERT DEVELOPER KEY HERE')
url = "http://www.lcbo.com/content/dam/lcbo/products/218040.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg"
# Download the image instead of opening a local file
opener = urllib.urlopen(url)
image_content = base64.b64encode(opener.read())
service_request = service.images().annotate(
    body={
        'requests': [{
            'image': {
                'content': image_content
            },
            'features': [{
                'type': 'LOGO_DETECTION',
                'maxResults': 10,
            }]
        }]
    })
response = service_request.execute()
try:
    logo_description = response['responses'][0]['logoAnnotations'][0]['description']
    logo_description_score = response['responses'][0]['logoAnnotations'][0]['score']
    print logo_description
    print logo_description_score
except KeyError:
    # No 'logoAnnotations' key means no logo was detected
    print "logo nonexistent"
print time_start

The Google Cloud Vision API lets you specify the image either as base64-encoded content or as a link to a file on Google Cloud Storage. See:
https://cloud.google.com/vision/docs/requests-and-responses#json_request_format
This means you will have to download each image URL in your code (using Python's urllib2 library, for example), encode it in base64, and then add it to service_request.
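As a rough Python 3 sketch of the download-then-encode step described above (the helper names fetch_image_bytes and build_logo_request are made up for illustration, and the request body simply mirrors the one used in the question):

```python
import base64
import urllib.request


def fetch_image_bytes(url, timeout=10):
    """Download the raw bytes of the image at `url` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def build_logo_request(image_bytes, max_results=10):
    """Build an images().annotate() body from raw image bytes."""
    return {
        'requests': [{
            'image': {
                # b64encode returns bytes in Python 3; the JSON body needs str
                'content': base64.b64encode(image_bytes).decode('ascii'),
            },
            'features': [{
                'type': 'LOGO_DETECTION',
                'maxResults': max_results,
            }],
        }]
    }
```

You would then pass build_logo_request(fetch_image_bytes(url)) as the body argument of service.images().annotate(), assuming service is built as in the question.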

Related

Textract Unsupported Document Exception

I'm trying to use boto3 to run a Textract detect_document_text request.
I'm using the following code:
client = boto3.client('textract')
response = client.detect_document_text(
    Document={
        'Bytes': image_b64['document_b64']
    }
)
Where image_b64['document_b64'] is a base64-encoded image that I produced with, for example, the https://base64.guru/converter/encode/image website.
But I'm getting the following error:
UnsupportedDocumentException
What am I doing wrong?

Per the documentation:
If you're using an AWS SDK to call Amazon Textract, you might not need to base64-encode image bytes passed using the Bytes field.
Base64-encoding is only required when directly invoking the REST API. When using Python or NodeJS SDK, use native bytes (binary bytes).

For future reference, I solved that problem by decoding the base64 string back to raw bytes:
import base64
import boto3
client = boto3.client('textract')
image_64_decode = base64.b64decode(image_b64['document_b64'])
image_bytes = bytearray(image_64_decode)  # raw binary, not base64 text
response = client.detect_document_text(
    Document={
        'Bytes': image_bytes
    }
)
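A small sketch of the same decoding step, in case it helps. It assumes (this is my assumption, not something stated in the question) that the string copied from an online converter might carry a "data:image/...;base64," prefix, which has to be removed before decoding. The helper name b64_to_raw_bytes is made up:

```python
import base64


def b64_to_raw_bytes(encoded):
    """Return raw bytes from a base64 string, tolerating a data-URI prefix."""
    # Assumption: input may look like 'data:image/png;base64,<payload>'
    if encoded.startswith('data:') and ',' in encoded:
        encoded = encoded.split(',', 1)[1]
    return base64.b64decode(encoded)
```

The result can be passed as the 'Bytes' value of the Document argument.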

With boto3 in a Jupyter notebook, for an image (.jpg or .png) you can use:
import boto3
# Read the file as raw bytes; no base64 encoding is needed with the SDK
with open(images_path, "rb") as img_file:
    img_str = bytearray(img_file.read())
textract = boto3.client('textract')
response = textract.detect_document_text(Document={'Bytes': img_str})

This worked for me. It assumes you have configured ~/.aws with your AWS credentials:
import boto3
import os
def main():
    client = boto3.client('textract', region_name="ca-central-1")
    for imageFile in os.listdir('./img'):
        # Build the path from the same directory that was listed
        image_file = f"./img/{imageFile}"
        with open(image_file, "rb") as f:
            response = client.analyze_expense(
                Document={
                    'Bytes': f.read(),
                    'S3Object': {
                        'Bucket': 'REDACTED',
                        'Name': imageFile,
                        'Version': '1'
                    }
                })
        print(response)
if __name__ == "__main__":
    main()

How to upload big file to google drive using requests in python?

I have code that uploads my archive to Google Drive using my access token and requests, but if the file is larger than 512MB it fails with a MemoryError. I'm looking for a way to fix this and upload files larger than 512MB. I have already searched for a solution, but I didn't find anything where I could use an access token.
import os
import json
import requests
import ntpath
import oauth2
import httplib2
import oauth2client
from contextlib import closing
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
_CLIENT_ID = 'YOUR_CLIENT_ID'
_CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
_REFRESH_TOKEN = 'YOUR_REFRESH_TOKEN'
_PARENT_FOLDER_ID = 'YOUR_PARENT_FOLDER_ID'
_ARCHIVE_FILE = os.environ['USERPROFILE'] +'\\Desktop\\WobbyChip.zip'
# ====================================================================================
def GetAccessToken(client_id, client_secret, refresh_token):
    cred = oauth2client.client.GoogleCredentials(None,client_id,client_secret,refresh_token,None,'https://accounts.google.com/o/oauth2/token',None)
    http = cred.authorize(httplib2.Http())
    cred.refresh(http)
    obj = json.loads(cred.to_json())
    _ACCESS_TOKEN = obj['access_token']
    return _ACCESS_TOKEN
def UploadFile(local_file, parent_folder_id, access_token,):
    headers = {'Authorization': 'Bearer ' +access_token}
    para = {
        'name': (ntpath.basename(local_file)),
        'parents': [parent_folder_id]}
    files = {
        'data': ('metadata', json.dumps(para), 'application/json; charset=UTF-8'),
        'file': open(local_file, 'rb')}
    requests.post(
        'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart',
        headers=headers,
        files=files)
# ====================================================================================
if __name__ == '__main__':
    UploadFile(_ARCHIVE_FILE, _PARENT_FOLDER_ID, GetAccessToken(_CLIENT_ID, _CLIENT_SECRET, _REFRESH_TOKEN))
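One likely cause of the MemoryError is that requests builds the whole multipart body in memory when files= is used. A possible direction, sketched here under assumptions and not tested against Drive, is to read the file in chunks and send each one with a Content-Range header, as a resumable upload (uploadType=resumable) expects. The helper name iter_chunks and the 8 MiB chunk size are my own choices; resumable chunks other than the last should be multiples of 256 KiB:

```python
import os

CHUNK_SIZE = 32 * 256 * 1024  # 8 MiB, a multiple of 256 KiB (assumed value)


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (headers, data) pairs, holding only one chunk in memory."""
    total = os.path.getsize(path)
    with open(path, 'rb') as f:
        start = 0
        while start < total:
            data = f.read(chunk_size)
            end = start + len(data) - 1
            # Content-Range uses inclusive byte offsets: bytes start-end/total
            headers = {'Content-Range': 'bytes %d-%d/%d' % (start, end, total)}
            yield headers, data
            start = end + 1
```

Each pair would be sent with a PUT to the session URI returned in the Location header of the initial resumable request, with the same Bearer token in the Authorization header.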

http post request to API with azure blob storage

I'm trying to make an HTTP POST request to Microsoft's Face API in order to use it with photos in my Azure Blob Storage account. When I run the following code, I get multiple errors, such as handshake errors or SSL routines errors. I appreciate any help! The problem code is:
api_response = requests.post(url, headers=headers, data=blob)
For context, here is what I ran before that. This first chunk sets up the storage account:
%matplotlib inline
import matplotlib.pyplot as plt
import io
from io import StringIO
import numpy as np
import cv2
from PIL import Image
import os
from array import array
azure_storage_account_name = 'musicsurveyphotostorage'
azure_storage_account_key = None #dont need key... we will access public blob...
if azure_storage_account_name is None:
    raise Exception("You must provide a name for an Azure Storage account")
from azure.storage.blob import BlockBlobService
blob_service = BlockBlobService(azure_storage_account_name, azure_storage_account_key)
# select container (folder) name where the files resides
container_name = 'musicsurveyphotostorage'
# list files in the selected folder
generator = blob_service.list_blobs(container_name)
blob_prefix = 'https://{0}.blob.core.windows.net/{1}/{2}'
# load image file to process
blob_name = 'shiba.jpg' #name of image I have stored
blob = blob_service.get_blob_to_bytes(container_name, blob_name)
image_file_in_mem = io.BytesIO(blob.content)
img_bytes = Image.open(image_file_in_mem)
This second chunk calls out the API and the problematic post request:
#CALL OUT THE API
import requests
import urllib
url_face_api = 'https://eastus.api.cognitive.microsoft.com/face/v1.0'
api_key ='____'
#WHICH PARAMETERS ATTRIBUTES DO YOU WANT RETURNED
headers = {'Content-Type': 'application/octet-stream', 'Ocp-Apim-Subscription-Key': api_key}
params = urllib.parse.urlencode({
    'returnFaceId': 'true',
    'returnFaceLandmarks': 'true',
    'returnFaceAttributes': 'age,gender,smile,facialHair,headPose,glasses',
})
query_string = '?{0}'.format(params)
url = url_face_api + query_string
#THIS IS THE PROBLEM CODE
api_response = requests.post(url, headers=headers, data=blob)
#print out output in json
import json
res_json = json.loads(api_response.content.decode('utf-8'))
print(json.dumps(res_json, indent=2, sort_keys=True))

I could reproduce the issue you mentioned when Fiddler was open. If that is your situation, pause Fiddler's capture while sending the request.
Based on my test, two lines in your code need to change. For more information, refer to the screenshot.
Demo code is also available in the official Azure documentation.
url_face_api = 'https://westcentralus.api.cognitive.microsoft.com/face/v1.0/detect' # your URL is missing /detect
api_response = requests.post(url, headers=headers, data=blob.content) # data should be blob.content, not the blob object
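To make the first change concrete, here is a minimal sketch of assembling the detect URL and headers. The function name build_detect_request is hypothetical; the endpoint, parameters, and header names are the ones from the question:

```python
from urllib.parse import urlencode


def build_detect_request(endpoint, api_key, attributes):
    """Return (url, headers) for a Face API detect call with binary image data."""
    params = urlencode({
        'returnFaceId': 'true',
        'returnFaceLandmarks': 'true',
        'returnFaceAttributes': ','.join(attributes),
    })
    # The operation name /detect must follow the base endpoint
    url = endpoint.rstrip('/') + '/detect?' + params
    headers = {
        'Content-Type': 'application/octet-stream',
        'Ocp-Apim-Subscription-Key': api_key,
    }
    return url, headers
```

The returned pair would be used as requests.post(url, headers=headers, data=blob.content), so that the raw image bytes are sent and not the blob object itself.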

Google Vision API text detection Python example uses project: "google.com:cloudsdktool" and not my own project

I am working through the Python example for the Cloud Vision API from the GitHub repo.
I have already set up the project and activated the service account with its key. I have also run gcloud auth and entered my credentials.
Here is my code (derived from the Python example of Vision API text detection):
import base64
import os
import re
import sys
from googleapiclient import discovery
from googleapiclient import errors
import nltk
from nltk.stem.snowball import EnglishStemmer
from oauth2client.client import GoogleCredentials
import redis
DISCOVERY_URL = 'https://{api}.googleapis.com/$discovery/rest?version={apiVersion}' # noqa
BATCH_SIZE = 10
class VisionApi:
    """Construct and use the Google Vision API service."""
    def __init__(self, api_discovery_file='/home/saadq/Dev/Projects/TM-visual-search/credentials-key.json'):
        self.credentials = GoogleCredentials.get_application_default()
        print self.credentials.to_json()
        self.service = discovery.build(
            'vision', 'v1', credentials=self.credentials,
            discoveryServiceUrl=DISCOVERY_URL)
        print DISCOVERY_URL
    def detect_text(self, input_filenames, num_retries=3, max_results=6):
        """Uses the Vision API to detect text in the given file.
        """
        images = {}
        for filename in input_filenames:
            with open(filename, 'rb') as image_file:
                images[filename] = image_file.read()
        batch_request = []
        for filename in images:
            batch_request.append({
                'image': {
                    'content': base64.b64encode(
                        images[filename]).decode('UTF-8')
                },
                'features': [{
                    'type': 'TEXT_DETECTION',
                    'maxResults': max_results,
                }]
            })
        request = self.service.images().annotate(
            body={'requests': batch_request})
        try:
            responses = request.execute(num_retries=num_retries)
            if 'responses' not in responses:
                return {}
            text_response = {}
            for filename, response in zip(images, responses['responses']):
                if 'error' in response:
                    print("API Error for %s: %s" % (
                        filename,
                        response['error']['message']
                        if 'message' in response['error']
                        else ''))
                    continue
                if 'textAnnotations' in response:
                    text_response[filename] = response['textAnnotations']
                else:
                    text_response[filename] = []
            return text_response
        except errors.HttpError as e:
            print("Http Error for %s: %s" % (filename, e))
        except KeyError as e2:
            print("Key error: %s" % e2)
vision = VisionApi()
print vision.detect_text(['test_article.png'])
This is the error message I am getting:
Http Error for test_article.png: <HttpError 403 when requesting https://vision.googleapis.com/v1/images:annotate?alt=json returned "Google Cloud Vision API has not been used in project google.com:cloudsdktool before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/vision.googleapis.com/overview?project=google.com:cloudsdktool then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.">
I want to be able to use my own project for the example and not the default (google.com:cloudsdktool).

Download the credentials you created and update the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to that file:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials-key.json
Reference: https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python/text#set-up-to-authenticate-with-your-projects-credentials
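If it is unclear which project the credentials belong to, a quick check along these lines may help. It relies on service account key files containing a project_id field; the function name active_project_id is made up for this sketch:

```python
import json
import os


def active_project_id(env=os.environ):
    """Return the project_id of the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        raise RuntimeError('GOOGLE_APPLICATION_CREDENTIALS is not set')
    with open(path) as f:
        return json.load(f)['project_id']
```

If this raises, or prints a project other than your own, the variable is not pointing at the key you downloaded, and the client library may be falling back to other default credentials.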

The export didn't work for me, and neither did setting it in the code:
import os
...
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path to service account json'
os.environ['GCLOUD_PROJECT'] = 'project id'
...
But the fix described in this question worked:
Google cloud speech api throwing 403 when trying to use it

Large inconsistency between Yelp v2 API queries and equivalent searches on Yelp.com

Search results from the Yelp v2 API look very different from what you'd find on their website.
For example, on the website:
http://www.yelp.com/search?find_desc=restaurants&find_loc=Manhattan%2C+NY&ns=1#start=0&sortby=rating
I issued the same search through their API after reading their documentation, with the following Python code:
import json
import argparse
import pprint
import sys
import urllib
import urllib2
import oauth2
HOST = 'api.yelp.com'
PATH = '/v2/search/'
# I put my account's values here, leaving blank for the question
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
TOKEN = ''
TOKEN_SECRET = ''
def main():
    url_params = {
        'term': 'restaurants',
        'location': 'Manhattan,NY',
        'sort': 2, # sort by "Highest Rated"
    }
    url = 'http://{0}{1}?'.format(HOST, PATH)
    consumer = oauth2.Consumer(
        CONSUMER_KEY,
        CONSUMER_SECRET
    )
    oauth_request = oauth2.Request(
        method="GET",
        url=url,
        parameters=url_params
    )
    oauth_request.update(
        {
            'oauth_nonce': oauth2.generate_nonce(),
            'oauth_timestamp': oauth2.generate_timestamp(),
            'oauth_token': TOKEN,
            'oauth_consumer_key': CONSUMER_KEY
        }
    )
    token = oauth2.Token(TOKEN, TOKEN_SECRET)
    oauth_request.sign_request(
        oauth2.SignatureMethod_HMAC_SHA1(),
        consumer,
        token
    )
    signed_url = oauth_request.to_url()
    print 'Querying {0} ...'.format(url)
    conn = urllib2.urlopen(signed_url, None)
    try:
        print conn.read()
    finally:
        conn.close()
I tried many different combinations of query parameters (changing the location phrase to include spaces, no comma, etc. also tried introducing a limit and offset), but with no success. What am I doing wrong?
Results of the API query are here https://code.stypi.com/cmqnfxuo

I think the issue may be in the path. Here is your code:
PATH = '/v2/search/'
So your request URL looks like this:
http://api.yelp.com/v2/search/?
The trailing slash is an error in the URL path. The search path should be:
PATH = '/v2/search'
The request URL then becomes:
http://api.yelp.com/v2/search?term=food&location=San+Francisco
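For illustration only, here is how the corrected path combines with the query parameters in Python 3. This builds the unsigned URL; the OAuth signing from the question is left out, and build_search_url is a name I made up:

```python
from urllib.parse import urlencode

HOST = 'api.yelp.com'
PATH = '/v2/search'  # no trailing slash


def build_search_url(params):
    """Join host, path, and URL-encoded query parameters."""
    return 'http://{0}{1}?{2}'.format(HOST, PATH, urlencode(params))


print(build_search_url({'term': 'food', 'location': 'San Francisco'}))
```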
